aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2012-04-01cgroup: convert memcg controller to the new cftype interfaceTejun Heo1-6/+10
Convert memcg to use the new cftype based interface. kmem support abuses ->populate() for mem_cgroup_sockets_init() so it can't be removed at the moment. tcp_memcontrol is updated so that tcp_files[] is registered via a __initcall. This change also allows removing the forward declaration of tcp_files[]. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Hugh Dickins <hughd@google.com> Cc: Greg Thelen <gthelen@google.com>
2012-04-01cgroup: convert all non-memcg controllers to the new cftype interfaceTejun Heo2-12/+4
Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio, net_cls and device controllers to use the new cftype based interface. Termination entry is added to cftype arrays and populate callbacks are replaced with cgroup_subsys->base_cftypes initializations. This is functionally identical transformation. There shouldn't be any visible behavior change. memcg is rather special and will be converted separately. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <paul@paulmenage.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Vivek Goyal <vgoyal@redhat.com>
2012-04-01cgroup: relocate cftype and cgroup_subsys definitions in controllersTejun Heo3-61/+49
blk-cgroup, netprio_cgroup, cls_cgroup and tcp_memcontrol unnecessarily define cftype array and cgroup_subsys structures at the top of the file, which is unconventional and necessiates forward declaration of methods. This patch relocates those below the definitions of the methods and removes the forward declarations. Note that forward declaration of tcp_files[] is added in tcp_memcontrol.c for tcp_init_cgroup(). This will be removed soon by another patch. This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-03-29Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-36/+50
Pull x32 support for x86-64 from Ingo Molnar: "This tree introduces the X32 binary format and execution mode for x86: 32-bit data space binaries using 64-bit instructions and 64-bit kernel syscalls. This allows applications whose working set fits into a 32 bits address space to make use of 64-bit instructions while using a 32-bit address space with shorter pointers, more compressed data structures, etc." Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c} * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) x32: Fix alignment fail in struct compat_siginfo x32: Fix stupid ia32/x32 inversion in the siginfo format x32: Add ptrace for x32 x32: Switch to a 64-bit clock_t x32: Provide separate is_ia32_task() and is_x32_task() predicates x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls x86/x32: Fix the binutils auto-detect x32: Warn and disable rather than error if binutils too old x32: Only clear TIF_X32 flag once x32: Make sure TS_COMPAT is cleared for x32 tasks fs: Remove missed ->fds_bits from cessation use of fd_set structs internally fs: Fix close_on_exec pointer in alloc_fdtable x32: Drop non-__vdso weak symbols from the x32 VDSO x32: Fix coding style violations in the x32 VDSO code x32: Add x32 VDSO support x32: Allow x32 to be configured x32: If configured, add x32 system calls to system call tables x32: Handle process creation x32: Signal-related system calls x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h> ...
2012-03-29Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linuxLinus Torvalds11-83/+59
Pull nfsd changes from Bruce Fields: Highlights: - Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features, moving us closer to a complete 4.1 implementation. - Bernd Schubert fixed a long-standing problem with readdir cookies on ext2/3/4. - Jeff Layton performed a long-overdue overhaul of the server reboot recovery code which will allow us to deprecate the current code (a rather unusual user of the vfs), and give us some needed flexibility for further improvements. - Like the client, we now support numeric uid's and gid's in the auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x. Plus miscellaneous bugfixes and cleanup. Thanks to everyone! There are also some delegation fixes waiting on vfs review that I suppose will have to wait for 3.5. With that done I think we'll finally turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly symbolic as it's been on by default in distro's for a while). And the list of 4.1 todo's should be achievable for 3.5 as well: http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues though we may still want a bit more experience with it before turning it on by default. * 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits) nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled nfsd4: use auth_unix unconditionally on backchannel nfsd: fix NULL pointer dereference in cld_pipe_downcall nfsd4: memory corruption in numeric_name_to_id() sunrpc: skip portmap calls on sessions backchannel nfsd4: allow numeric idmapping nfsd: don't allow legacy client tracker init for anything but init_net nfsd: add notifier to handle mount/unmount of rpc_pipefs sb nfsd: add the infrastructure to handle the cld upcall nfsd: add a header describing upcall to nfsdcld nfsd: add a per-net-namespace struct for nfsd sunrpc: create nfsd dir in rpc_pipefs nfsd: add nfsd4_client_tracking_ops struct and a way to set it nfsd: convert nfs4_client->cl_cb_flags to a generic flags field NFSD: Fix nfs4_verifier memory alignment NFSD: Fix warnings when NFSD_DEBUG is not defined nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes) nfsd: rename 'int access' to 'int may_flags' in nfsd_open() ext4: return 32/64-bit dir name hash according to usage type fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash ...
2012-03-28Merge tag 'nfs-for-3.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-1/+1
Pull NFS client bugfixes for Linux 3.4 from Trond Myklebust Highlights include: - Fix infinite loops in the mount code - Fix a userspace buffer overflow in __nfs4_get_acl_uncached - Fix a memory leak due to a double reference count in rpcb_getport_async() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> * tag 'nfs-for-3.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_error NFSv4.1: Fix layoutcommit error handling NFSv4: Fix two infinite loops in the mount code SUNRPC: Use the already looked-up xprt in rpcb_getport_async() NFS4.1: remove duplicate variable declaration in filelayout_clear_request_commit Fix length of buffer copied in __nfs4_get_acl_uncached
2012-03-28Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_systemLinus Torvalds93-93/+0
Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...
2012-03-28Remove all #inclusions of asm/system.hDavid Howells93-93/+0
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds3-230/+255
Pull Ceph updates for 3.4-rc1 from Sage Weil: "Alex has been busy. There are a range of rbd and libceph cleanups, especially surrounding device setup and teardown, and a few critical fixes in that code. There are more cleanups in the messenger code, virtual xattrs, a fix for CRC calculation/checks, and lots of other miscellaneous stuff. There's a patch from Amon Ott to make inos behave a bit better on 32-bit boxes, some decode check fixes from Xi Wang, and network throttling fix from Jim Schutt, and a couple RBD fixes from Josh Durgin. No new functionality, just a lot of cleanup and bug fixing." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (65 commits) rbd: move snap_rwsem to the device, rename to header_rwsem ceph: fix three bugs, two in ceph_vxattrcb_file_layout() libceph: isolate kmap() call in write_partial_msg_pages() libceph: rename "page_shift" variable to something sensible libceph: get rid of zero_page_address libceph: only call kernel_sendpage() via helper libceph: use kernel_sendpage() for sending zeroes libceph: fix inverted crc option logic libceph: some simple changes libceph: small refactor in write_partial_kvec() libceph: do crc calculations outside loop libceph: separate CRC calculation from byte swapping libceph: use "do" in CRC-related Boolean variables ceph: ensure Boolean options support both senses libceph: a few small changes libceph: make ceph_tcp_connect() return int libceph: encapsulate some messenger cleanup code libceph: make ceph_msgr_wq private libceph: encapsulate connection kvec operations libceph: move prepare_write_banner() ...
2012-03-28Merge tag 'for-linus-3.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fsLinus Torvalds1-3/+23
Pull 9p changes for the 3.4 merge window from Eric Van Hensbergen. * tag 'for-linus-3.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: statfs should not override server f_type net/9p: handle flushed Tclunk/Tremove net/9p: don't allow Tflush to be interrupted
2012-03-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds7-57/+103
Pull networking fixes from David Miller: 1) Name string overrun fix in gianfar driver from Joe Perches. 2) VHOST bug fixes from Michael S. Tsirkin and Nadav Har'El 3) Fix dependencies on xt_LOG netfilter module, from Pablo Neira Ayuso. 4) Fix RCU locking in xt_CT, also from Pablo Neira Ayuso. 5) Add a parameter to skb_add_rx_frag() so we can fix the truesize adjustments in the drivers that use it. The individual drivers aren't fixed by this commit, but will be dealt with using follow-on commits. From Eric Dumazet. 6) Add some device IDs to qmi_wwan driver, from Andrew Bird. 7) Fix a potential rcu_read_lock() imbalancein rt6_fill_node(). From Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: fix a potential rcu_read_lock() imbalance in rt6_fill_node() net: add a truesize parameter to skb_add_rx_frag() gianfar: Fix possible overrun and simplify interrupt name field creation USB: qmi_wwan: Add ZTE (Vodafone) K3570-Z and K3571-Z net interfaces USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces USB: qmi_wwan: Add ZTE (Vodafone) K3565-Z and K4505-Z net interfaces qlcnic: Bug fix for LRO netfilter: nf_conntrack: permanently attach timeout policy to conntrack netfilter: xt_CT: fix assignation of the generic protocol tracker netfilter: xt_CT: missing rcu_read_lock section in timeout assignment netfilter: cttimeout: fix dependency with l4protocol conntrack module netfilter: xt_LOG: use CONFIG_IP6_NF_IPTABLES instead of CONFIG_IPV6 vhost: fix release path lockdep checks vhost: don't forget to schedule() tools/virtio: stub out strong barriers tools/virtio: add linux/hrtimer.h stub tools/virtio: add linux/module.h stub
2012-03-27net: fix a potential rcu_read_lock() imbalance in rt6_fill_node()Eric Dumazet1-2/+6
Commit f2c31e32b378 (net: fix NULL dereferences in check_peer_redir() ) added a regression in rt6_fill_node(), leading to rcu_read_lock() imbalance. Thats because NLA_PUT() can make a jump to nla_put_failure label. Fix this by using nla_put() Many thanks to Ben Greear for his help Reported-by: Ben Greear <greearb@candelatech.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-27SUNRPC: Use the already looked-up xprt in rpcb_getport_async()Bryan Schumaker1-1/+1
rbcb_getport_async() was looking up the rpc_xprt (reference++) and then later looking it up again (reference++) to pass through the rpcbind_args. The xprt would only be dereferenced once, when we were done with the rpcbind_args (reference--). This leaves an extra reference to the transport that would never go away. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-26sunrpc: skip portmap calls on sessions backchannelJ. Bruce Fields1-0/+1
There's obviously no point to doing portmap calls over the sessions backchannel. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26sunrpc: create nfsd dir in rpc_pipefsJeff Layton1-0/+5
Add a new top-level dir in rpc_pipefs to hold the pipe for the clientid upcall. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26Merge nfs containerization work from Trond's treeJ. Bruce Fields28-733/+1559
The nfs containerization work is a prerequisite for Jeff Layton's reboot recovery rework.
2012-03-25net: add a truesize parameter to skb_add_rx_frag()Eric Dumazet1-2/+2
skb_add_rx_frag() API is misleading. Network skbs built with this helper can use uncharged kernel memory and eventually stress/crash machine in OOM. Add a 'truesize' parameter and then fix drivers in followup patches. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-24Merge tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linuxLinus Torvalds1-0/+1
Pull <linux/device.h> avoidance patches from Paul Gortmaker: "Nearly every subsystem has some kind of header with a proto like: void foo(struct device *dev); and yet there is no reason for most of these guys to care about the sub fields within the device struct. This allows us to significantly reduce the scope of headers including headers. For this instance, a reduction of about 40% is achieved by replacing the include with the simple fact that the device is some kind of a struct. Unlike the much larger module.h cleanup, this one is simply two commits. One to fix the implicit <linux/device.h> users, and then one to delete the device.h includes from the linux/include/ dir wherever possible." * tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: device.h: audit and cleanup users in main include dir device.h: cleanup users outside of linux/include (C files)
2012-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctlLinus Torvalds1-15/+9
Pull sysctl updates from Eric Biederman: - Rewrite of sysctl for speed and clarity. Insert/remove/Lookup in sysctl are all now O(NlogN) operations, and are no longer bottlenecks in the process of adding and removing network devices. sysctl is now focused on being a filesystem instead of system call and the code can all be found in fs/proc/proc_sysctl.c. Hopefully this means the code is now approachable. Much thanks is owed to Lucian Grinjincu for keeping at this until something was found that was usable. - The recent proc_sys_poll oops found by the fuzzer during hibernation is fixed. * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits) sysctl: protect poll() in entries that may go away sysctl: Don't call sysctl_follow_link unless we are a link. sysctl: Comments to make the code clearer. sysctl: Correct error return from get_subdir sysctl: An easier to read version of find_subdir sysctl: fix memset parameters in setup_sysctl_set() sysctl: remove an unused variable sysctl: Add register_sysctl for normal sysctl users sysctl: Index sysctl directories with rbtrees. sysctl: Make the header lists per directory. sysctl: Move sysctl_check_dups into insert_header sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy sysctl: Replace root_list with links between sysctl_table_sets. sysctl: Add sysctl_print_dir and use it in get_subdir sysctl: Stop requiring explicit management of sysctl directories sysctl: Add a root pointer to ctl_table_set sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry. sysctl: Normalize the root_table data structure. sysctl: Factor out insert_header and erase_header ...
2012-03-23poll: add poll_requested_events() and poll_does_not_wait() functionsHans Verkuil1-1/+1
In some cases the poll() implementation in a driver has to do different things depending on the events the caller wants to poll for. An example is when a driver needs to start a DMA engine if the caller polls for POLLIN, but doesn't want to do that if POLLIN is not requested but instead only POLLOUT or POLLPRI is requested. This is something that can happen in the video4linux subsystem among others. Unfortunately, the current epoll/poll/select implementation doesn't provide that information reliably. The poll_table_struct does have it: it has a key field with the event mask. But once a poll() call matches one or more bits of that mask any following poll() calls are passed a NULL poll_table pointer. Also, the eventpoll implementation always left the key field at ~0 instead of using the requested events mask. This was changed in eventpoll.c so the key field now contains the actual events that should be polled for as set by the caller. The solution to the NULL poll_table pointer is to set the qproc field to NULL in poll_table once poll() matches the events, not the poll_table pointer itself. That way drivers can obtain the mask through a new poll_requested_events inline. The poll_table_struct can still be NULL since some kernel code calls it internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h). In that case poll_requested_events() returns ~0 (i.e. all events). Very rarely drivers might want to know whether poll_wait will actually wait. If another earlier file descriptor in the set already matched the events the caller wanted to wait for, then the kernel will return from the select() call without waiting. This might be useful information in order to avoid doing expensive work. A new helper function poll_does_not_wait() is added that drivers can use to detect this situation. This is now used in sock_poll_wait() in include/net/sock.h. This was the only place in the kernel that needed this information. Drivers should no longer access any of the poll_table internals, but use the poll_requested_events() and poll_does_not_wait() access functions instead. In order to enforce that the poll_table fields are now prepended with an underscore and a comment was added warning against using them directly. This required a change in unix_dgram_poll() in unix/af_unix.c which used the key field to get the requested events. It's been replaced by a call to poll_requested_events(). For qproc it was especially important to change its name since the behavior of that field changes with this patch since this function pointer can now be NULL when that wasn't possible in the past. Any driver accessing the qproc or key fields directly will now fail to compile. Some notes regarding the correctness of this patch: the driver's poll() function is called with a 'struct poll_table_struct *wait' argument. This pointer may or may not be NULL, drivers can never rely on it being one or the other as that depends on whether or not an earlier file descriptor in the select()'s fdset matched the requested events. There are only three things a driver can do with the wait argument: 1) obtain the key field: events = wait ? wait->key : ~0; This will still work although it should be replaced with the new poll_requested_events() function (which does exactly the same). This will now even work better, since wait is no longer set to NULL unnecessarily. 2) use the qproc callback. This could be deadly since qproc can now be NULL. Renaming qproc should prevent this from happening. There are no kernel drivers that actually access this callback directly, BTW. 3) test whether wait == NULL to determine whether poll would return without waiting. This is no longer sufficient as the correct test is now wait == NULL || wait->_qproc == NULL. However, the worst that can happen here is a slight performance hit in the case where wait != NULL and wait->_qproc == NULL. In that case the driver will assume that poll_wait() will actually add the fd to the set of waiting file descriptors. Of course, poll_wait() will not do that since it tests for wait->_qproc. This will not break anything, though. There is only one place in the whole kernel where this happens (sock_poll_wait() in include/net/sock.h) and that code will be replaced by a call to poll_does_not_wait() in the next patch. Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait() actually waiting. The next file descriptor from the set might match the event mask and thus any possible waits will never happen. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Reviewed-by: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23Merge branch 'master' of git://1984.lsi.us.es/netDavid S. Miller5-53/+95
2012-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds11-24/+17
Pull networking fixes from David Miller: 1) L2TP doesn't get autoloaded when you try to open an L2TP socket due to a missing module alias, fix from Benjamin LaHaise. 2) Netlabel and RDS should propagate gfp flags given to them by callers, fixes from Dan Carpeneter. 3) Recursive locking fix in usbnet wasn't bulletproof and can result in objects going away mid-flight due to races, fix from Ming Lei. 4) Fix up some confusion about a bool module parameter in netfilter's iptable_filter and ip6table_filter, from Rusty Russell. 5) If SKB recycling is used via napi_reuse_skb() we end up with different amounts of headroom reserved than we had at the original SKB allocation. Fix from Eric Dumazet. 6) Fix races in TG3 driver ring refilling, from Michael Chan. 7) We have callbacks for IPSEC replay notifiers, but some call sites were not using the ops method and instead were calling one of the implementations directly. Oops. Fix from Steffen Klassert. 8) Fix IP address validation properly in the bonding driver, the previous fix only works with netlink where the subnet mask and IP address are changed in one atomic operation. When 'ifconfig' ioctls are used the IP address and the subnet mask are changed in two distinct operations. Fix from Andy Gospodarek. 9) Provide a sky2 module operation to work around power management issues with some BIOSes. From Stephen Hemminger. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: usbnet: consider device busy at each recieved packet bonding: remove entries for master_ip and vlan_ip and query devices instead netfilter: remove forward module param confusion. usbnet: don't clear urb->dev in tx_complete usbnet: increase URB reference count before usb_unlink_urb xfrm: Access the replay notify functions via the registered callbacks xfrm: Remove unused xfrm_state from xfrm_state_check_space RDS: use gfp flags from caller in conn_alloc() netlabel: use GFP flags from caller instead of GFP_ATOMIC l2tp: enable automatic module loading for l2tp_ppp cnic: Fix parity error code conflict tg3: Fix RSS ring refill race condition sky2: override for PCI legacy power management net: fix napi_reuse_skb() skb reserve
2012-03-23Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds26-730/+1547
Pull NFS client updates for Linux 3.4 from Trond Myklebust: "New features include: - Add NFS client support for containers. This should enable most of the necessary functionality, including lockd support, and support for rpc.statd, NFSv4 idmapper and RPCSEC_GSS upcalls into the correct network namespace from which the mount system call was issued. - NFSv4 idmapper scalability improvements Base the idmapper cache on the keyring interface to allow concurrent access to idmapper entries. Start the process of migrating users from the single-threaded daemon-based approach to the multi-threaded request-key based approach. - NFSv4.1 implementation id. Allows the NFSv4.1 client and server to mutually identify each other for logging and debugging purposes. - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of having to use the more counterintuitive 'vers=4,minorversion=1'. - SUNRPC tracepoints. Start the process of adding tracepoints in order to improve debugging of the RPC layer. - pNFS object layout support for autologin. Important bugfixes include: - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to fail to wake up all tasks when applied to priority waitqueues. - Ensure that we handle read delegations correctly, when we try to truncate a file. - A number of fixes for NFSv4 state manager loops (mostly to do with delegation recovery)." * tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits) NFS: fix sb->s_id in nfs debug prints xprtrdma: Remove assumption that each segment is <= PAGE_SIZE xprtrdma: The transport should not bug-check when a dup reply is received pnfs-obj: autologin: Add support for protocol autologin NFS: Remove nfs4_setup_sequence from generic rename code NFS: Remove nfs4_setup_sequence from generic unlink code NFS: Remove nfs4_setup_sequence from generic read code NFS: Remove nfs4_setup_sequence from generic write code NFS: Fix more NFS debug related build warnings SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined nfs: non void functions must return a value SUNRPC: Kill compiler warning when RPC_DEBUG is unset SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG NFS: Use cond_resched_lock() to reduce latencies in the commit scans NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner NFS: ncommit count is being double decremented SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up() Try using machine credentials for RENEW calls NFSv4.1: Fix a few issues in filelayout_commit_pagelist NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code ...
2012-03-22bonding: remove entries for master_ip and vlan_ip and query devices insteadAndy Gospodarek1-0/+1
The following patch aimed to resolve an issue where secondary, tertiary, etc. addresses added to bond interfaces could overwrite the bond->master_ip and vlan_ip values. commit 917fbdb32f37e9a93b00bb12ee83532982982df3 Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com> Date: Wed Nov 23 23:37:15 2011 +0000 bonding: only use primary address for ARP That patch was good because it prevented bonds using ARP monitoring from sending frames with an invalid source IP address. Unfortunately, it didn't always work as expected. When using an ioctl (like ifconfig does) to set the IP address and netmask, 2 separate ioctls are actually called to set the IP and netmask if the mask chosen doesn't match the standard mask for that class of address. The first ioctl did not have a mask that matched the one in the primary address and would still cause the device address to be overwritten. The second ioctl that was called to set the mask would then detect as secondary and ignored, but the damage was already done. This was not an issue when using an application that used netlink sockets as the setting of IP and netmask came down at once. The inconsistent behavior between those two interfaces was something that needed to be resolved. While I was thinking about how I wanted to resolve this, Ralf Zeidler came with a patch that resolved this on a RHEL kernel by keeping a full shadow of the entries in dev->ifa_list for the bonding device and vlan devices in the bonding driver. I didn't like the duplication of the list as I want to see the 'bonding' struct and code shrink rather than grow, but liked the general idea. As the Subject indicates this patch drops the master_ip and vlan_ip elements from the 'bonding' and 'vlan_entry' structs, respectively. This can be done because a device's address-list is now traversed to determine the optimal source IP address for ARP requests and for checks to see if the bonding device has a particular IP address. This code could have all be contained inside the bonding driver, but it made more sense to me to EXPORT and call inet_confirm_addr since it did exactly what was needed. I tested this and a backported patch and everything works as expected. Ralf also helped with verification of the backported patch. Thanks to Ralf for all his help on this. v2: Whitespace and organizational changes based on suggestions from Jay Vosburgh and Dave Miller. v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's suggestion. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> CC: Ralf Zeidler <ralf.zeidler@nsn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22netfilter: remove forward module param confusion.Rusty Russell2-14/+4
It used to be an int, and it got changed to a bool parameter at least 7 years ago. It happens that NF_ACCEPT and NF_DROP are 0 and 1, so this works, but it's unclear, and the check that it's in range is not required. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds1-1/+1
Pull s390 patches from Martin Schwidefsky: "The biggest patch is the rework of the smp code, something I wanted to do for some time. There are some patches for our various dump methods and one new thing: z/VM LGR detection. LGR stands for linux-guest- relocation and is the guest migration feature of z/VM. For debugging purposes we keep a log of the systems where a specific guest has lived." Fix up trivial conflict in arch/s390/kernel/smp.c due to the scheduler cleanup having removed some code next to removed s390 code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: [S390] kernel: Pass correct stack for smp_call_ipl_cpu() [S390] Ensure that vmcore_info pointer is never accessed directly [S390] dasd: prevent validate server for offline devices [S390] Remove monolithic build option for zcrypt driver. [S390] stack dump: fix indentation in output [S390] kernel: Add OS info memory interface [S390] Use block_sigmask() [S390] kernel: Add z/VM LGR detection [S390] irq: external interrupt code passing [S390] irq: set __ARCH_IRQ_EXIT_IRQS_DISABLED [S390] zfcpdump: Implement async sdias event processing [S390] Use copy_to_absolute_zero() instead of "stura/sturg" [S390] rework idle code [S390] rework smp code [S390] rename lowcore field [S390] Fix gcc 4.6.0 compile warning
2012-03-23netfilter: nf_conntrack: permanently attach timeout policy to conntrackPablo Neira Ayuso1-17/+22
We need to permanently attach the timeout policy to the conntrack, otherwise we may apply the custom timeout policy inconsistently. Without this patch, the following example: nfct timeout add test inet icmp timeout 100 iptables -I PREROUTING -t raw -p icmp -s 1.1.1.1 -j CT --timeout test Will only apply the custom timeout policy to outgoing packets from 1.1.1.1, but not to reply packets from 2.2.2.2 going to 1.1.1.1. To fix this issue, this patch modifies the current logic to attach the timeout policy when the first packet is seen (which is when the conntrack entry is created). Then, we keep using the attached timeout policy until the conntrack entry is destroyed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-23netfilter: xt_CT: fix assignation of the generic protocol trackerPablo Neira Ayuso1-1/+8
`iptables -p all' uses 0 to match all protocols, while the conntrack subsystem uses 255. We still need `-p all' to attach the custom timeout policies for the generic protocol tracker. Moreover, we may use `iptables -p sctp' while the SCTP tracker is not loaded. In that case, we have to default on the generic protocol tracker. Another possibility is `iptables -p ip' that should be supported as well. This patch makes sure we validate all possible scenarios. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-23netfilter: xt_CT: missing rcu_read_lock section in timeout assignmentPablo Neira Ayuso1-6/+12
Fix a dereference to pointer without rcu_read_lock held. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-23netfilter: cttimeout: fix dependency with l4protocol conntrack modulePablo Neira Ayuso3-24/+48
This patch introduces nf_conntrack_l4proto_find_get() and nf_conntrack_l4proto_put() to fix module dependencies between timeout objects and l4-protocol conntrack modules. Thus, we make sure that the module cannot be removed if it is used by any of the cttimeout objects. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-22xfrm: Access the replay notify functions via the registered callbacksSteffen Klassert1-3/+3
We call the wrong replay notify function when we use ESN replay handling. This leads to the fact that we don't send notifications if we use ESN. Fix this by calling the registered callbacks instead of xfrm_replay_notify(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22xfrm: Remove unused xfrm_state from xfrm_state_check_spaceSteffen Klassert1-2/+2
The xfrm_state argument is unused in this function, so remove it. Also the name xfrm_state_check_space does not really match what this function does. It actually checks if we have enough head and tailroom on the skb. So we rename the function to xfrm_skb_check_space. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22RDS: use gfp flags from caller in conn_alloc()Dan Carpenter3-3/+3
We should be using the gfp flags the caller specified here, instead of GFP_KERNEL. I think this might be a bugfix, depending on the value of "sock->sk->sk_allocation" when we call rds_conn_create_outgoing() in rds_sendmsg(). Otherwise, it's just a cleanup. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22netlabel: use GFP flags from caller instead of GFP_ATOMICDan Carpenter1-1/+1
This function takes a GFP flags as a parameter, but they are never used. We don't take a lock in this function so there is no reason to prefer GFP_ATOMIC over the caller's GFP flags. There is only one caller, cipso_v4_map_cat_rng_ntoh(), and it passes GFP_ATOMIC as the GFP flags so this doesn't change how the code works. It's just a cleanup. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22libceph: isolate kmap() call in write_partial_msg_pages()Alex Elder1-11/+2
In write_partial_msg_pages(), every case now does an identical call to kmap(page). Instead, just call it once inside the CRC-computing block where it's needed. Move the definition of kaddr inside that block, and make it a (char *) to ensure portable pointer arithmetic. We still don't kunmap() it until after the sendpage() call, in case that also ends up needing to use the mapping. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: rename "page_shift" variable to something sensibleAlex Elder1-5/+6
In write_partial_msg_pages() there is a local variable used to track the starting offset within a bio segment to use. Its name, "page_shift" defies the Linux convention of using that name for log-base-2(page size). Since it's only used in the bio case rename it "bio_offset". Use it along with the page_pos field to compute the memory offset when computing CRC's in that function. This makes the bio case match the others more closely. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: get rid of zero_page_addressAlex Elder1-9/+2
There's not a lot of benefit to zero_page_address, which basically holds a mapping of the zero page through the life of the messenger module. Even with our own mapping, the sendpage interface where it's used may need to kmap() it again. It's almost certain to be in low memory anyway. So stop treating the zero page specially in write_partial_msg_pages() and just get rid of zero_page_address entirely. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: only call kernel_sendpage() via helperAlex Elder1-6/+2
Make ceph_tcp_sendpage() be the only place kernel_sendpage() is used, by using this helper in write_partial_msg_pages(). Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: use kernel_sendpage() for sending zeroesAlex Elder1-5/+15
If a message queued for send gets revoked, zeroes are sent over the wire instead of any unsent data. This is done by constructing a message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg(). Since we are already working with a page in this case we can use the sendpage interface instead. Create a new ceph_tcp_sendpage() helper that sets up flags to match the way ceph_tcp_sendmsg() does now. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: fix inverted crc option logicAlex Elder1-10/+10
CRC's are computed for all messages between ceph entities. The CRC computation for the data portion of message can optionally be disabled using the "nocrc" (common) ceph option. The default is for CRC computation for the data portion to be enabled. Unfortunately, the code that implements this feature interprets the feature flag wrong, meaning that by default the CRC's have *not* been computed (or checked) for the data portion of messages unless the "nocrc" option was supplied. Fix this, in write_partial_msg_pages() and read_partial_message(). Also change the flag variable in write_partial_msg_pages() to be "no_datacrc" to match the usage elsewhere in the file. This fixes http://tracker.newdream.net/issues/2064 Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: some simple changesAlex Elder1-4/+9
Nothing too big here. - define the size of the buffer used for consuming ignored incoming data using a symbolic constant - simplify the condition determining whether to unmap the page in write_partial_msg_pages(): do it for crc but not if the page is the zero page Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: small refactor in write_partial_kvec()Alex Elder1-11/+12
Make a small change in the code that counts down kvecs consumed by a ceph_tcp_sendmsg() call. Same functionality, just blocked out a little differently. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: do crc calculations outside loopAlex Elder1-14/+12
Move blocks of code out of loops in read_partial_message_section() and read_partial_message(). They were only was getting called at the end of the last iteration of the loop anyway. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: separate CRC calculation from byte swappingAlex Elder1-14/+17
Calculate CRC in a separate step from rearranging the byte order of the result, to improve clarity and readability. Use offsetof() to determine the number of bytes to include in the CRC calculation. In read_partial_message(), switch which value gets byte-swapped, since the just-computed CRC is already likely to be in a register. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: use "do" in CRC-related Boolean variablesAlex Elder1-20/+20
Change the name (and type) of a few CRC-related Boolean local variables so they contain the word "do", to distingish their purpose from variables used for holding an actual CRC value. Note that in the process of doing this I identified a fairly serious logic error in write_partial_msg_pages(): the value of "do_crc" assigned appears to be the opposite of what it should be. No attempt to fix this is made here; this change preserves the erroneous behavior. The problem I found is documented here: http://tracker.newdream.net/issues/2064 Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22ceph: ensure Boolean options support both sensesAlex Elder1-0/+10
Many ceph-related Boolean options offer the ability to both enable and disable a feature. For all those that don't offer this, add a new option so that they do. Note that ceph_show_options()--which reports mount options currently in effect--only reports the option if it is different from the default value. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: a few small changesAlex Elder1-14/+18
This gathers a number of very minor changes: - use %hu when formatting the a socket address's address family - null out the ceph_msgr_wq pointer after the queue has been destroyed - drop a needless cast in ceph_write_space() - add a WARN() call in ceph_state_change() in the event an unrecognized socket state is encountered - rearrange the logic in ceph_con_get() and ceph_con_put() so that: - the reference counts are only atomically read once - the values displayed via dout() calls are known to be meaningful at the time they are formatted Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: make ceph_tcp_connect() return intAlex Elder1-8/+6
There is no real need for ceph_tcp_connect() to return the socket pointer it creates, since it already assigns it to con->sock, which is visible to the caller. Instead, have it return an error code, which tidies things up a bit. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: encapsulate some messenger cleanup codeAlex Elder1-18/+20
Define a helper function to perform various cleanup operations. Use it both in the exit routine and in the init routine in the event of an error. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22libceph: make ceph_msgr_wq privateAlex Elder1-1/+1
The messenger workqueue has no need to be public. So give it static scope. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>