aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2012-05-23Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds6-16/+43
Pull user namespace enhancements from Eric Biederman: "This is a course correction for the user namespace, so that we can reach an inexpensive, maintainable, and reasonably complete implementation. Highlights: - Config guards make it impossible to enable the user namespace and code that has not been converted to be user namespace safe. - Use of the new kuid_t type ensures the if you somehow get past the config guards the kernel will encounter type errors if you enable user namespaces and attempt to compile in code whose permission checks have not been updated to be user namespace safe. - All uids from child user namespaces are mapped into the initial user namespace before they are processed. Removing the need to add an additional check to see if the user namespace of the compared uids remains the same. - With the user namespaces compiled out the performance is as good or better than it is today. - For most operations absolutely nothing changes performance or operationally with the user namespace enabled. - The worst case performance I could come up with was timing 1 billion cache cold stat operations with the user namespace code enabled. This went from 156s to 164s on my laptop (or 156ns to 164ns per stat operation). - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value. Most uid/gid setting system calls treat these value specially anyway so attempting to use -1 as a uid would likely cause entertaining failures in userspace. - If setuid is called with a uid that can not be mapped setuid fails. I have looked at sendmail, login, ssh and every other program I could think of that would call setuid and they all check for and handle the case where setuid fails. - If stat or a similar system call is called from a context in which we can not map a uid we lie and return overflowuid. The LFS experience suggests not lying and returning an error code might be better, but the historical precedent with uids is different and I can not think of anything that would break by lying about a uid we can't map. - Capabilities are localized to the current user namespace making it safe to give the initial user in a user namespace all capabilities. My git tree covers all of the modifications needed to convert the core kernel and enough changes to make a system bootable to runlevel 1." Fix up trivial conflicts due to nearby independent changes in fs/stat.c * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) userns: Silence silly gcc warning. cred: use correct cred accessor with regards to rcu read lock userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq userns: Convert cgroup permission checks to use uid_eq userns: Convert tmpfs to use kuid and kgid where appropriate userns: Convert sysfs to use kgid/kuid where appropriate userns: Convert sysctl permission checks to use kuid and kgids. userns: Convert proc to use kuid/kgid where appropriate userns: Convert ext4 to user kuid/kgid where appropriate userns: Convert ext3 to use kuid/kgid where appropriate userns: Convert ext2 to use kuid/kgid where appropriate. userns: Convert devpts to use kuid/kgid where appropriate userns: Convert binary formats to use kuid/kgid where appropriate userns: Add negative depends on entries to avoid building code that is userns unsafe userns: signal remove unnecessary map_cred_ns userns: Teach inode_capable to understand inodes whose uids map to other namespaces. userns: Fail exec for suid and sgid binaries with ids outside our user namespace. userns: Convert stat to return values mapped from kuids and kgids userns: Convert user specfied uids and gids in chown into kuids and kgid userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs ...
2012-05-22Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivialLinus Torvalds2-3/+3
Pull trivial updates from Jiri Kosina: "As usual, it's mostly typo fixes, redundant code elimination and some documentation updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits) edac, mips: don't change code that has been removed in edac/mips tree xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer lib: Change mail address of Oskar Schirmer net: Change mail address of Oskar Schirmer arm/m68k: Change mail address of Sebastian Hess i2c: Change mail address of Oskar Schirmer net: Fix tcp_build_and_update_options comment in struct tcp_sock atomic64_32.h: fix parameter naming mismatch Kconfig: replace "--- help ---" with "---help---" c2port: fix bogus Kconfig "default no" edac: Fix spelling errors. qla1280: Remove redundant NULL check before release_firmware() call remoteproc: remove redundant NULL check before release_firmware() qla2xxx: Remove redundant NULL check before release_firmware() call. aic94xx: Get rid of redundant NULL check before release_firmware() call tehuti: delete redundant NULL check before release_firmware() qlogic: get rid of a redundant test for NULL before call to release_firmware() bna: remove redundant NULL test before release_firmware() tg3: remove redundant NULL test before release_firmware() call typhoon: get rid of redundant conditional before all to release_firmware() ...
2012-05-22Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hidLinus Torvalds1-26/+1
Pull HID subsystem updates from Jiri Kosina: "Apart from various driver updates and added support for a number of new devices (mostly multitouch ones, but not limited to), there is one change that is worth pointing out explicitly: creation of HID device groups and proper autoloading of hid-multitouch, implemented by Henrik Rydberg." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (50 commits) HID: wacom: fix build breakage without CONFIG_LEDS_CLASS HID: waltop: Extend barrel button fix HID: hyperv: Set the hid drvdata correctly HID: wacom: Unify speed setting HID: wacom: Add speed setting for Intuos4 WL HID: wacom: Move Graphire raport header check. HID: uclogic: Add support for UC-Logic TWHL850 HID: explain the signed/unsigned handling in hid_add_field() HID: handle logical min/max signedness properly in parser HID: logitech: read all 32 bits of report type bitfield HID: wacom: Add LED selector control for Wacom Intuos4 WL HID: hid-multitouch: fix wrong protocol detection HID: wiimote: Fix IR data parser HID: wacom: Add tilt reporting for Intuos4 WL HID: multitouch: MT interface matching for Baanto HID: hid-multitouch: Only match MT interfaces HID: Create a common generic driver HID: hid-multitouch: Switch to device groups HID: Create a generic device group HID: Allow bus wildcard matching ...
2012-05-22Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds4-83/+65
Pull cgroup updates from Tejun Heo: "cgroup file type addition / removal is updated so that file types are added and removed instead of individual files so that dynamic file type addition / removal can be implemented by cgroup and used by controllers. blkio controller changes which will come through block tree are dependent on this. Other changes include res_counter cleanup and disallowing kthread / PF_THREAD_BOUND threads to be attached to non-root cgroups. There's a reported bug with the file type addition / removal handling which can lead to oops on cgroup umount. The issue is being looked into. It shouldn't cause problems for most setups and isn't a security concern." Fix up trivial conflict in Documentation/feature-removal-schedule.txt * 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits) res_counter: Account max_usage when calling res_counter_charge_nofail() res_counter: Merge res_counter_charge and res_counter_charge_nofail cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads cgroup: remove cgroup_subsys->populate() cgroup: get rid of populate for memcg cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg cgroup: make css->refcnt clearing on cgroup removal optional cgroup: use negative bias on css->refcnt to block css_tryget() cgroup: implement cgroup_rm_cftypes() cgroup: introduce struct cfent cgroup: relocate __d_cgrp() and __d_cft() cgroup: remove cgroup_add_file[s]() cgroup: convert memcg controller to the new cftype interface memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP cgroup: convert all non-memcg controllers to the new cftype interface cgroup: relocate cftype and cgroup_subsys definitions in controllers cgroup: merge cft_release_agent cftype array into the base files array cgroup: implement cgroup_add_cftypes() and friends cgroup: build list of all cgroups under a given cgroupfs_root cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir() ...
2012-05-22Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpuLinus Torvalds2-8/+8
Pull percpu updates from Tejun Heo: "Contains Alex Shi's three patches to remove percpu_xxx() which overlap with this_cpu_xxx(). There shouldn't be any functional change." * 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: remove percpu_xxx() functions x86: replace percpu_xxx funcs with this_cpu_xxx net: replace percpu_xxx funcs with this_cpu_xxx or __this_cpu_xxx
2012-05-22Merge tag 'tty-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds1-73/+64
Pull TTY updates from Greg Kroah-Hartman: "Here's the big TTY/serial driver pull request for the 3.5-rc1 merge window. Nothing major in here, just lots of incremental changes from Alan and Jiri reworking some tty core things to behave better and to get a more solid grasp on some of the nasty tty locking issues. There are a few tty and serial driver updates in here as well. All of this has been in the linux-next releases for a while with no problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'tty-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (115 commits) serial: bfin_uart: Make MMR access compatible with 32 bits bf609 style controller. serial: bfin_uart: RTS and CTS MMRs can be either 16-bit width or 32-bit width. serial: bfin_uart: narrow the reboot condition in DMA tx interrupt serial: bfin_uart: Adapt bf5xx serial driver to bf60x serial4 controller. Revert "serial_core: Update buffer overrun statistics." tty: hvc_xen: NULL dereference on allocation failure tty: Fix LED error return tty: Allow uart_register/unregister/register tty: move global ldisc idle waitqueue to the individual ldisc serial8250-em: Add DT support serial8250-em: clk_get() IS_ERR() error handling fix serial_core: Update buffer overrun statistics. tty: drop the pty lock during hangup cris: fix missing tty arg in wait_event_interruptible_tty call tty/amiserial: Add missing argument for tty_unlock() tty_lock: Localise the lock pty: Lock the devpts bits privately tty_lock: undo the old tty_lock use on the ctty serial8250-em: Emma Mobile UART driver V2 Add missing call to uart_update_timeout() ...
2012-05-21Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds4-13/+7
Pull security subsystem updates from James Morris: "New notable features: - The seccomp work from Will Drewry - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski - Longer security labels for Smack from Casey Schaufler - Additional ptrace restriction modes for Yama by Kees Cook" Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits) apparmor: fix long path failure due to disconnected path apparmor: fix profile lookup for unconfined ima: fix filename hint to reflect script interpreter name KEYS: Don't check for NULL key pointer in key_validate() Smack: allow for significantly longer Smack labels v4 gfp flags for security_inode_alloc()? Smack: recursive tramsmute Yama: replace capable() with ns_capable() TOMOYO: Accept manager programs which do not start with / . KEYS: Add invalidation support KEYS: Do LRU discard in full keyrings KEYS: Permit in-place link replacement in keyring list KEYS: Perform RCU synchronisation on keys prior to key destruction KEYS: Announce key type (un)registration KEYS: Reorganise keys Makefile KEYS: Move the key config into security/keys/Kconfig KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat Yama: remove an unused variable samples/seccomp: fix dependencies on arch macros Yama: add additional ptrace scopes ...
2012-05-21Merge tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds1-1/+2
Pull virtio updates from Rusty Russell. * tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio: fix typo in comment virtio-mmio: Devices parameter parsing virtio_blk: Drop unused request tracking list virtio-blk: Fix hot-unplug race in remove method virtio: Use ida to allocate virtio index virtio: balloon: separate out common code between remove and freeze functions virtio: balloon: drop restore_common() 9p: disconnect channel when PCI device is removed virtio: update documentation to v0.9.5 of spec
2012-05-229p: disconnect channel when PCI device is removedSasha Levin1-1/+2
When a virtio_9p pci device is being removed, we should close down any active channels and free up resources, we're not supposed to BUG() if there's still an open channel since it's a valid case when removing the PCI device. Otherwise, removing the PCI device with an open channel would cause the following BUG(): [ 1184.671416] ------------[ cut here ]------------ [ 1184.672057] kernel BUG at net/9p/trans_virtio.c:618! [ 1184.672057] invalid opcode: 0000 [#1] PREEMPT SMP [ 1184.672057] CPU 3 [ 1184.672057] Pid: 5, comm: kworker/u:0 Tainted: G W 3.4.0-rc2-next-20120413-sasha-dirty #76 [ 1184.672057] RIP: 0010:[<ffffffff825c9116>] [<ffffffff825c9116>] p9_virtio_remove+0x16/0x90 [ 1184.672057] RSP: 0018:ffff88000d653ac0 EFLAGS: 00010202 [ 1184.672057] RAX: ffffffff836bfb40 RBX: ffff88000c9b2148 RCX: ffff88000d658978 [ 1184.672057] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff880028868000 [ 1184.672057] RBP: ffff88000d653ad0 R08: 0000000000000000 R09: 0000000000000000 [ 1184.672057] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880028868000 [ 1184.672057] R13: ffffffff835aa7c0 R14: ffff880041630000 R15: ffff88000d653da0 [ 1184.672057] FS: 0000000000000000(0000) GS:ffff880035a00000(0000) knlGS:0000000000000000 [ 1184.672057] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1184.672057] CR2: 0000000001181000 CR3: 000000000eba1000 CR4: 00000000000406e0 [ 1184.672057] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 x000000000117a190 *[ 1184.672057] DR3: 00000000000000** 00 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1184.672057] Process kworker/u:0 (pid: 5, threadinfo ffff88000d652000, task ffff88000d658000) [ 1184.672057] Stack: [ 1184.672057] ffff880028868000 ffffffff836bfb40 ffff88000d653af0 ffffffff8193661b [ 1184.672057] ffff880028868008 ffffffff836bfb40 ffff88000d653b10 ffffffff81af1c81 [ 1184.672057] ffff880028868068 ffff880028868008 ffff88000d653b30 ffffffff81af257a [ 1184.795301] Call Trace: [ 1184.795301] [<ffffffff8193661b>] virtio_dev_remove+0x1b/0x60 [ 1184.795301] [<ffffffff81af1c81>] __device_release_driver+0x81/0xd0 [ 1184.795301] [<ffffffff81af257a>] device_release_driver+0x2a/0x40 [ 1184.795301] [<ffffffff81af0d48>] bus_remove_device+0x138/0x150 [ 1184.795301] [<ffffffff81aef08d>] device_del+0x14d/0x1b0 [ 1184.795301] [<ffffffff81aef138>] device_unregister+0x48/0x60 [ 1184.795301] [<ffffffff8193694d>] unregister_virtio_device+0xd/0x10 [ 1184.795301] [<ffffffff8265fc74>] virtio_pci_remove+0x2a/0x6c [ 1184.795301] [<ffffffff818a95ad>] pci_device_remove+0x4d/0x110 [ 1184.795301] [<ffffffff81af1c81>] __device_release_driver+0x81/0xd0 [ 1184.795301] [<ffffffff81af257a>] device_release_driver+0x2a/0x40 [ 1184.795301] [<ffffffff81af0d48>] bus_remove_device+0x138/0x150 [ 1184.795301] [<ffffffff81aef08d>] device_del+0x14d/0x1b0 [ 1184.795301] [<ffffffff81aef138>] device_unregister+0x48/0x60 [ 1184.795301] [<ffffffff818a36fa>] pci_stop_bus_device+0x6a/0x90 [ 1184.795301] [<ffffffff818a3791>] pci_stop_and_remove_bus_device+0x11/0x20 [ 1184.795301] [<ffffffff818c21d9>] remove_callback+0x9/0x10 [ 1184.795301] [<ffffffff81252d91>] sysfs_schedule_callback_work+0x21/0x60 [ 1184.795301] [<ffffffff810cb1a1>] process_one_work+0x281/0x430 [ 1184.795301] [<ffffffff810cb140>] ? process_one_work+0x220/0x430 [ 1184.795301] [<ffffffff81252d70>] ? sysfs_read_file+0x1c0/0x1c0 [ 1184.795301] [<ffffffff810cc613>] worker_thread+0x1f3/0x320 [ 1184.795301] [<ffffffff810cc420>] ? manage_workers.clone.13+0x130/0x130 [ 1184.795301] [<ffffffff810d30b2>] kthread+0xb2/0xc0 [ 1184.795301] [<ffffffff826783f4>] kernel_thread_helper+0x4/0x10 [ 1184.795301] [<ffffffff810deb18>] ? finish_task_switch+0x78/0xf0 [ 1184.795301] [<ffffffff82676574>] ? retint_restore_args+0x13/0x13 [ 1184.795301] [<ffffffff810d3000>] ? kthread_flush_work_fn+0x10/0x10 [ 1184.795301] [<ffffffff826783f0>] ? gs_change+0x13/0x13 [ 1184.795301] Code: c1 9e 0a 00 48 83 c4 08 5b c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49 89 fc 53 48 8b 9f a8 04 00 00 80 3b 00 74 0a <0f> 0b 0f 1f 84 00 00 00 00 00 48 8b 87 88 04 00 00 ff 50 30 31 [ 1184.795301] RIP [<ffffffff825c9116>] p9_virtio_remove+0x16/0x90 [ 1184.795301] RSP <ffff88000d653ac0> [ 1184.952618] ---[ end trace a307b3ed40206b4c ]--- Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-05-22Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into nextJames Morris1-0/+1
Per pull request, for 3.5.
2012-05-21net: drop NET dependency from HAVE_BPF_JITSam Ravnborg1-3/+4
There is no point having the NET dependency on the select target, as it forces all users to depend on NET to tell they support BPF_JIT. Move the config option to the bottom of the file - this could be a nice place also for future "selectable" config symbols. Fix up all users to drop the dependency on NET now that it is not required to supress warnings for non-NET builds. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds543-11997/+18656
Pull networking changes from David Miller: 1) Get rid of the error prone NLA_PUT*() macros that used an embedded goto. 2) Kill off the token-ring and MCA networking drivers, from Paul Gortmaker. 3) Reduce high-order allocations made by datagram AF_UNIX sockets, from Eric Dumazet. 4) Add PTP hardware clock support to IGB and IXGBE, from Richard Cochran and Jacob Keller. 5) Allow users to query timestamping capabilities of a card via ethtool, from Richard Cochran. 6) Add loadbalance mode to the teaming driver, from Jiri Pirko. Part of this is that we can now have BPF filters not attached to sockets, and the loadbalancing function is calculated using one. 7) Francois Romieu went through the network drivers removing gratuitous uses of netdev->base_addr, perhaps some day we can remove it completely but it's used for ISA probing still. 8) Add a BPF JIT for sparc. I know, who cares, right? :-) 9) Move networking sysctl registry away from using the compatability mode interfaces in the sysctl code. From Eric W Biederman. 10) Pavel Emelyanov added a way to save and restore TCP socket state via TCP_REPAIR, TCP_REPAIR_QUEUE, and TCP_QUEUE_SEQ socket options as well as a way to forcefully bind a socket to a port via the sk->sk_reuse value SK_FORCE_REUSE. There is also a TCP_REPAIR_OPTIONS which allows to reinstante the TCP options enabled on the connection. 11) Several enhancements from Eric Dumazet that, in particular, can enhance splice performance on TCP sockets significantly. a) Reset the offset of the per-socket sendmsg page when we know we're the only use of the page in linear_to_page(). b) Add facilities such that skb->data can be backed a page rather than SLAB kmalloc'd memory. In particular devices which were receiving into linear RX buffers can now end up providing paged data. The big result is that code like splice and GRO do not have to copy any more. 12) Allow a pure sender to more gracefully handle ACK backlogs in TCP. What can happen at high rates is that the sender hasn't grown his receive buffer limits at all (he's not receiving data so really doesn't need to), but the non-data ACKs consume receive buffer space. sk_add_backlog() is too aggressive in dropping frames in this case, so relax it's requirements by using the receive buffer plus the send buffer limit as the backlog limit instead of just the former. Also from Eric Dumazet. 13) Add ipv6 support to L2TP, from Benjamin LaHaise, James Chapman, and Chris Elston. 14) Implement TCP early retransmit (RFC 5827), from Yuchung Cheng. Basically, we can start fast retransmit before hiting the dupack threshold under certain conditions. 15) New CODEL active queue management packet scheduler, from Eric Dumazet based upon initial work by Dave Taht. Basically, the big feature is that packets are dropped (or ECN bits are set) based upon how long packets live in the queue, rather than the queue length (which is what RED uses). * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1341 commits) drivers/net/stmmac: seq_file fix memory leak ipv6/exthdrs: strict Pad1 and PadN check USB: qmi_wwan: Add ZTE (Vodafone) K3520-Z USB: qmi_wwan: Add ZTE (Vodafone) K3765-Z USB: qmi_wwan: Make forced int 4 whitelist generic net/ipv4: replace simple_strtoul with kstrtoul net/ipv4/ipconfig: neaten __setup placement net: qmi_wwan: Add Vodafone/Huawei K5005 support net: cdc_ether: Add ZTE WWAN matches before generic Ethernet ipv6: use skb coalescing in reassembly ipv4: use skb coalescing in defragmentation net: introduce skb_try_coalesce() net:ipv6:fixed space issues relating to operators. net:ipv6:fixed a trailing white space issue. ipv6: disable GSO on sockets hitting dst_allfrag tg3: use netdev_alloc_frag() API net: napi_frags_skb() is static ppp: avoid false drop_monitor false positives ipv6: bool/const conversions phase2 ipx: Remove spurious NULL checking in ipx_ioctl(). ...
2012-05-21Merge branch 'dentry-cleanups' (dcache access cleanups and optimizations)Linus Torvalds2-8/+3
This branch simplifies and clarifies the dcache lookup, and allows us to do certain nice optimizations when comparing dentries. It also cleans up the interface to __d_lookup_rcu(), especially around passing the inode information around. * dentry-cleanups: vfs: make it possible to access the dentry hash/len as one 64-bit entry vfs: move dentry name length comparison from dentry_cmp() into callers vfs: do the careful dentry name access for all dentry_cmp cases vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces
2012-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-4/+3
2012-05-20ipv6/exthdrs: strict Pad1 and PadN checkEldad Zack1-1/+14
The following tightens the padding check from commit c1412fce7eccae62b4de22494f6ab3ff8a90c0c6 : * Take into account combinations of consecutive Pad1 and PadN. * Catch the corner case of when only padding is present in the header, when the extention header length is 0 (i.e., 8 bytes). In this case, the header would have exactly 6 bytes of padding: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ : Next Header : Hdr Ext Len=0 : : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + : Padding (Pad1 or PadN) : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-20net/ipv4: replace simple_strtoul with kstrtoulEldad Zack3-3/+21
Replace simple_strtoul with kstrtoul in three similar occurrences, all setup handlers: * route.c: set_rhash_entries * tcp.c: set_thash_entries * udp.c: set_uhash_entries Also check if the conversion failed. Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-20net/ipv4/ipconfig: neaten __setup placementEldad Zack1-3/+2
The __setup macro should follow the corresponding setup handler. Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19ipv6: use skb coalescing in reassemblyEric Dumazet1-6/+20
ip6_frag_reasm() can use skb_try_coalesce() to build optimized skb, reducing memory used by them (truesize), and reducing number of cache line misses and overhead for the consumer. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19ipv4: use skb coalescing in defragmentationEric Dumazet1-6/+20
ip_frag_reasm() can use skb_try_coalesce() to build optimized skb, reducing memory used by them (truesize), and reducing number of cache line misses and overhead for the consumer. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19net: introduce skb_try_coalesce()Eric Dumazet2-64/+89
Move tcp_try_coalesce() protocol independent part to skb_try_coalesce(). skb_try_coalesce() can be used in IPv4 defrag and IPv6 reassembly, to build optimized skbs (less sk_buff, and possibly less 'headers') skb_try_coalesce() is zero copy, unless the copy can fit in destination header (its a rare case) kfree_skb_partial() is also moved to net/core/skbuff.c and exported, because IPv6 will need it in patch (ipv6: use skb coalescing in reassembly). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19net:ipv6:fixed space issues relating to operators.Jeffrin Jose1-2/+2
Fixed space issues relating to operators found by checkpatch.pl tool in net/ipv6/udp.c Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19net:ipv6:fixed a trailing white space issue.Jeffrin Jose1-1/+1
Fixed a trailing white space issue found by checkpatch.pl tool in net/ipv6/udp.c Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19ipv6: disable GSO on sockets hitting dst_allfragEric Dumazet1-1/+4
If the allfrag feature has been set on a host route (due to an ICMPv6 Packet Too Big received indicating a MTU of less than 1280), we hit a very slow behavior in TCP stack, because all big packets are dropped and only a retransmit timer is able to push one MSS frame every 200 ms. One way to handle this is to disable GSO on the socket the first time a super packet is dropped. Adding a specific dst_allfrag() in the fast path is probably overkill since the dst_allfrag() case almost never happen. Result on netperf TCP_STREAM, one flow : Before : 60 kbit/sec After : 1.6 Gbit/sec Reported-by: Tore Anderson <tore@fud.no> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19net: napi_frags_skb() is staticEric Dumazet1-2/+1
No need to export napi_frags_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19ipv6: bool/const conversions phase2Eric Dumazet13-118/+119
Mostly bool conversions, some inline removals and const additions. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19ipx: Remove spurious NULL checking in ipx_ioctl().David S. Miller1-3/+1
We already unconditionally dereference 'sk' via lock_sock(sk) earlier in this function, and our caller (sock_do_ioctl()) makes takes similar liberties. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18ipv6: ip6_fragment() should check CHECKSUM_PARTIALEric Dumazet1-0/+4
Quoting Tore Anderson from : If the allfrag feature has been set on a host route (due to an ICMPv6 Packet Too Big received indicating a MTU of less than 1280), TCP SYN/ACK packets to that destination appears to get an incorrect TCP checksum. This in turn means they are thrown away as invalid. In the case of an IPv4 client behind a link with a MTU of less than 1260, accessing an IPv6 server through a stateless translator, this means that the client can only download a single large file from the server, because once it is in the server's routing cache with the allfrag feature set, new TCP connections can no longer be established. </endquote> It appears ip6_fragment() doesn't handle CHECKSUM_PARTIAL properly. As network drivers are not prepared to fetch correct transport header, a safe fix is to call skb_checksum_help() before fragmenting packet. Reported-by: Tore Anderson <tore@fud.no> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18pktgen: fix module unload for goodEric Dumazet1-2/+2
commit c57b5468406 (pktgen: fix crash at module unload) did a very poor job with list primitives. 1) list_splice() arguments were in the wrong order 2) list_splice(list, head) has undefined behavior if head is not initialized. 3) We should use the list_splice_init() variant to clear pktgen_threads list. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18ipv6: remove csummode in ip6_append_data()Eric Dumazet1-3/+1
csummode variable is always CHECKSUM_NONE in ip6_append_data() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18net: introduce netdev_alloc_frag()Eric Dumazet1-41/+41
Fix two issues introduced in commit a1c7fff7e18f5 ( net: netdev_alloc_skb() use build_skb() ) - Must be IRQ safe (non NAPI drivers can use it) - Must not leak the frag if build_skb() fails to allocate sk_buff This patch introduces netdev_alloc_frag() for drivers willing to use build_skb() instead of __netdev_alloc_skb() variants. Factorize code so that : __dev_alloc_skb() is a wrapper around __netdev_alloc_skb(), and dev_alloc_skb() a wrapper around netdev_alloc_skb() Use __GFP_COLD flag. Almost all network drivers now benefit from skb->head_frag infrastructure. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18ipv6: bool conversions phase1Eric Dumazet2-6/+6
ipv6_opt_accepted() returns a bool, and can use const pointers ipv6_addr_equal(), ipv6_addr_any(), ipv6_addr_loopback(), ipv6_addr_orchid() return a bool. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18ip_frag: struct inet_frags match() method returns a boolEric Dumazet2-9/+10
- match() method returns a boolean - return (A && B && C && D) -> return A && B && C && D - fix indentation Signed-off-by: Eric Dumazet <edumazet@google.com>
2012-05-18econet: remove ancient bug ridden protocolStephen Hemminger5-1217/+0
More spring cleaning! The ancient Econet protocol should go. Most of the bug fixes in recent years have been fixing security vulnerabilities. The hardware hasn't been made since the 90s, it is only interesting as an archeological curiosity. For the truly curious, or insomniac, go read up on it. http://en.wikipedia.org/wiki/Econet Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17lapb: Neaten debuggingJoe Perches5-306/+134
Enable dynamic debugging and remove a bunch of #ifdef/#endifs. Add a lapb_dbg(level, fmt, ...) macro and replace the printk(KERN_DEBUG uses. Add pr_fmt and remove embedded prefixes. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17tcp: do_tcp_sendpages() must try to push data out on oom conditionsWilly Tarreau1-2/+1
Since recent changes on TCP splicing (starting with commits 2f533844 "tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp: tcp_sendpages() should call tcp_push() once"), I started seeing massive stalls when forwarding traffic between two sockets using splice() when pipe buffers were larger than socket buffers. Latest changes (net: netdev_alloc_skb() use build_skb()) made the problem even more apparent. The reason seems to be that if do_tcp_sendpages() fails on out of memory condition without being able to send at least one byte, tcp_push() is not called and the buffers cannot be flushed. After applying the attached patch, I cannot reproduce the stalls at all and the data rate it perfectly stable and steady under any condition which previously caused the problem to be permanent. The issue seems to have been there since before the kernel migrated to git, which makes me think that the stalls I occasionally experienced with tux during stress-tests years ago were probably related to the same issue. This issue was first encountered on 3.0.31 and 3.2.17, so please backport to -stable. Signed-off-by: Willy Tarreau <w@1wt.eu> Acked-by: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org>
2012-05-17drop_monitor: convert to modular buildingNeil Horman2-3/+45
When I first wrote drop monitor I wrote it to just build monolithically. There is no reason it can't be built modularly as well, so lets give it that flexibiity. I've tested this by building it as both a module and monolithically, and it seems to work quite well Change notes: v2) * fixed for_each_present_cpu loops to be more correct as per Eric D. * Converted exit path failures to BUG_ON as per Ben H. v3) * Converted del_timer to del_timer_sync to close race noted by Ben H. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Ben Hutchings <bhutchings@solarflare.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17net: netdev_alloc_skb() use build_skb()Eric Dumazet1-1/+31
netdev_alloc_skb() is used by networks driver in their RX path to allocate an skb to receive an incoming frame. With recent skb->head_frag infrastructure, it makes sense to change netdev_alloc_skb() to use build_skb() and a frag allocator. This permits a zero copy splice(socket->pipe), and better GRO or TCP coalescing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17ipv6: correct the ipv6 option name - Pad0 to Pad1Eldad Zack6-9/+9
The padding destination or hop-by-hop option is called Pad1 and not Pad0. See RFC2460 (4.2) or the IANA ipv6-parameters registry: http://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xml Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17tcp: bool conversionsEric Dumazet8-188/+191
bool conversions where possible. __inline__ -> inline space cleanups Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17net: core: Use pr_<level>Joe Perches6-36/+44
Use the current logging style. This enables use of dynamic debugging as well. Convert printk(KERN_<LEVEL> to pr_<level>. Add pr_fmt. Remove embedded prefixes, use %s, __func__ instead. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17net: ipv6: ndisc: Neaten ND_PRINTx macrosJoe Perches1-125/+91
Why use several macros when one will do? Convert the multiple ND_PRINTKx macros to a single ND_PRINTK macro. Use the new net_<level>_ratelimited mechanism too. Add pr_fmt with "ICMPv6: " as prefix. Remove embedded ICMPv6 prefixes from messages. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17pktgen: Use pr_debugJoe Perches1-23/+18
Convert printk(KERN_DEBUG to pr_debug which can enable dynamic debugging. Remove embedded prefixes from the conversions as pr_fmt adds them. Align arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17net: include/net/sock.h cleanupEric Dumazet3-10/+10
bool/const conversions where possible __inline__ -> inline space cleanups Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17net: l2tp: Standardize logging stylesJoe Perches8-221/+213
Use more current logging styles. Add pr_fmt to prefix output appropriately. Convert printks to pr_<level>. Convert PRINTK macros to new l2tp_<level> macros. Neaten some <foo>_refcount debugging macros. Use print_hex_dump_bytes instead of hand-coded loops. Coalesce formats and align arguments. Some KERN_DEBUG output is not now emitted unless dynamic_debugging is enabled. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller21-97/+121
2012-05-17netfilter: nf_ct_h323: fix usage of MODULE_ALIAS_NFCT_HELPERPablo Neira Ayuso1-1/+3
ctnetlink uses the aliases that are created by MODULE_ALIAS_NFCT_HELPER to auto-load the module based on the helper name. Thus, we have to use RAS, Q.931 and H.245, not H.323. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-17netfilter: xt_CT: remove redundant header includeEldad Zack1-1/+0
nf_conntrack_l4proto.h is included twice. Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-17netfilter: ipset: fix timeout value overflow bugJozsef Kadlecsik1-2/+13
Large timeout parameters could result wrong timeout values due to an overflow at msec to jiffies conversion (reported by Andreas Herz) [ This patch was mangled by Pablo Neira Ayuso since David Laight and Eric Dumazet noticed that we were using hardcoded 1000 instead of MSEC_PER_SEC to calculate the timeout ] Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-17netfilter: nf_ct_tcp: extend log message for invalid ignored packetsPablo Neira Ayuso1-1/+2
Extend log message if packets are ignored to include the TCP state, ie. replace: [ 3968.070196] nf_ct_tcp: invalid packet ignored IN= OUT= SRC=... by: [ 3968.070196] nf_ct_tcp: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=... This information is useful to know in what state we were while ignoring the packet. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-05-17netfilter: xt_HMARK: modulus is expensive for hash calculationPablo Neira Ayuso1-1/+1
Use: ((u64)(HASH_VAL * HASH_SIZE)) >> 32 as suggested by David S. Miller. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>