linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2008-07-21	md: Protect access to mddev->disks list using RCU	NeilBrown	3	-17/+31
	All modifications and most access to the mddev->disks list are made under the reconfig_mutex lock. However there are three places where the list is walked without any locking. If a reconfig happens at this time, havoc (and oops) can ensue. So use RCU to protect these accesses: - wrap them in rcu_read_{,un}lock() - use list_for_each_entry_rcu - add to the list with list_add_rcu - delete from the list with list_del_rcu - delay the 'free' with call_rcu rather than schedule_work Note that export_rdev did a list_del_init on this list. In almost all cases the entry was not in the list anymore so it was a no-op and so safe. It is no longer safe as after list_del_rcu we may not touch the list_head. An audit shows that export_rdev is called: - after unbind_rdev_from_array, in which case the delete has already been done, - after bind_rdev_to_array fails, in which case the delete isn't needed. - before the device has been put on a list at all (e.g. in add_new_disk where reading the superblock fails). - and in autorun devices after a failure when the device is on a different list. So remove the list_del_init call from export_rdev, and add it back immediately before the called to export_rdev for that last case. Note also that ->same_set is sometimes used for lists other than mddev->list (e.g. candidates). In these cases rcu is not needed. Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-21	md: only count actual openers as access which prevent a 'stop'	NeilBrown	2	-4/+8
	Open isn't the only thing that increments ->active. e.g. reading /proc/mdstat will increment it briefly. So to avoid false positives in testing for concurrent access, introduce a new counter that counts just the number of times the md device it open. Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-21	md: linear: Make array_size sector-based and rename it to array_sectors.	Andre Noll	2	-9/+9
	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-21	md: Make mddev->array_size sector-based.	Andre Noll	9	-28/+33
	This patch renames the array_size field of struct mddev_s to array_sectors and converts all instances to use units of 512 byte sectors instead of 1k blocks. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-21	md: Make super_type->rdev_size_change() take sector-based sizes.	Andre Noll	1	-21/+19
	Also, change the type of the size parameter from unsigned long long to sector_t and rename it to num_sectors. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-21	md: Fix check for overlapping devices.	Andre Noll	1	-2/+3
	The checks in overlaps() expect all parameters either in block-based or sector-based quantities. However, its single caller passes two rdev->data_offset arguments as well as two rdev->size arguments, the former being sector counts while the latter are measured in 1K blocks. This could cause rdev_size_store() to accept an invalid size from user space. Fix it by passing only sector-based quantities to overlaps(). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-21	md: Tidy up rdev_size_store a bit:	Neil Brown	2	-10/+9
	- used strict_strtoull in place of simple_strtoull - use my_mddev in place of rdev->mddev (they have the same value) and more significantly, - don't adjust mddev->size to fit, rather reject changes which make rdev->size smaller than mddev->size Adjusting mddev->size is a hangover from bind_rdev_to_array which does a similar thing. But it really is a better design to insist that mddev->size is set as required, then the rdev->sizes are set to allow for that. The previous way invites confusion. Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-11	md: Remove some unused macros.	Andre Noll	1	-3/+0
	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	md: Turn rdev->sb_offset into a sector-based quantity.	Andre Noll	3	-49/+44
	Rename it to sb_start to make sure all users have been converted. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	md: Make calc_dev_sboffset() return a sector count.	Andre Noll	1	-6/+7
	As BLOCK_SIZE_BITS is 10 and MD_NEW_SIZE_SECTORS(2 * x) = 2 * NEW_SIZE_BLOCKS(x), the return value of calc_dev_sboffset() doubles. Fix up all three callers accordingly. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	md: Replace calc_dev_size() by calc_num_sectors().	Andre Noll	1	-11/+7
	Number of sectors is the preferred unit for sizes of raid devices, so change calc_dev_size() so that it returns this unit instead of the number of 1K blocks. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	md: Make update_size() take the number of sectors.	Andre Noll	1	-18/+18
	Changing the internal representations of sizes of raid devices from 1K blocks to sector counts (512B units) is desirable because it allows to get rid of many divisions/multiplications and unnecessary casts that are present in the current code. This patch is a first step in this direction. It replaces the old 1K-based "size" argument of update_size() by "num_sectors" and fixes up its two callers. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	md: Better control of when do_md_stop is allowed to stop the array.	Neil Brown	1	-14/+15
	do_md_stop check the number of active users before allowing the array to be stopped. Two problems: 1/ it assumes the request is coming through an open file descriptor (via ioctl) so it allows for that. This is not always the case. 2/ it doesn't do the check it the array hasn't been activated. This is not good for cases when we use an inactive array to hold some devices in a container. Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	md: get_disk_info(): Don't convert between signed and unsigned and back.	Andre Noll	1	-4/+1
	The current code copies a signed int from user space, converts it to unsigned and passes the unsigned value to find_rdev_nr() which expects a signed value. Simply pass the signed value from user space directly. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	md: Simplify restart_array().	Andre Noll	1	-32/+17
	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	md: alloc_disk_sb(): Return proper error value.	Andre Noll	1	-1/+1
	If alloc_page() fails, ENOMEM is a more suitable error value than EINVAL. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	md: Simplify sb_equal().	Andre Noll	1	-5/+1
	The only caller of sb_equal() tests the return value against zero, so it's OK to return the negated return value of memcmp(). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	md: Simplify uuid_equal().	Andre Noll	1	-9/+4
	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11	Merge branch 'master' into for-next	Neil Brown	110	-369/+2192

2008-07-10	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	33	-128/+351
	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits) tun: Persistent devices can get stuck in xoff state xfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info ipv6: missed namespace context in ipv6_rthdr_rcv netlabel: netlink_unicast calls kfree_skb on error path by itself ipv4: fib_trie: Fix lookup error return tcp: correct kcalloc usage ip: sysctl documentation cleanup Documentation: clarify tcp_{r,w}mem sysctl docs netfilter: nf_nat_snmp_basic: fix a range check in NAT for SNMP netfilter: nf_conntrack_tcp: fix endless loop libertas: fix memory alignment problems on the blackfin zd1211rw: stop beacons on remove_interface rt2x00: Disable synchronization during initialization rc80211_pid: Fix fast_start parameter handling sctp: Add documentation for sctp sysctl variable ipv6: fix race between ipv6_del_addr and DAD timer irda: Fix netlink error path return value irda: New device ID for nsc-ircc irda: via-ircc proper dma freeing sctp: Mark the tsn as received after all allocations finish ...
2008-07-10	tun: Persistent devices can get stuck in xoff state	Max Krasnyansky	1	-0/+6
	The scenario goes like this. App stops reading from tun/tap. TX queue gets full and driver does netif_stop_queue(). App closes fd and TX queue gets flushed as part of the cleanup. Next time the app opens tun/tap and starts reading from it but the xoff state is not cleared. We're stuck. Normally xoff state is cleared when netdev is brought up. But in the case of persistent devices this happens only during initial setup. The fix is trivial. If device is already up when an app opens it we clear xoff state and that gets things moving again. Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10	xfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info	Steffen Klassert	2	-2/+2
	Add a XFRM_STATE_AF_UNSPEC flag to handle the AF_UNSPEC behavior for the selector family. Userspace applications can set this flag to leave the selector family of the xfrm_state unspecified. This can be used to to handle inter family tunnels if the selector is not set from userspace. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10	ipv6: missed namespace context in ipv6_rthdr_rcv	Denis V. Lunev	1	-1/+1
	Signed-off-by: Denis V. Lunev <den@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10	netlabel: netlink_unicast calls kfree_skb on error path by itself	Denis V. Lunev	3	-21/+4
	So, no need to kfree_skb here on the error path. In this case we can simply return. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10	ipv4: fib_trie: Fix lookup error return	Ben Hutchings	1	-11/+6
	In commit a07f5f508a4d9728c8e57d7f66294bf5b254ff7f "[IPV4] fib_trie: style cleanup", the changes to check_leaf() and fn_trie_lookup() were wrong - where fn_trie_lookup() would previously return a negative error value from check_leaf(), it now returns 0. Now fn_trie_lookup() doesn't appear to care about plen, so we can revert check_leaf() to returning the error value. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Tested-by: William Boughton <bill@boughton.de> Acked-by: Stephen Heminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10	tcp: correct kcalloc usage	Milton Miller	1	-1/+1
	kcalloc is supposed to be called with the count as its first argument and the element size as the second. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10	ip: sysctl documentation cleanup	Stephen Hemminger	1	-31/+29
	Reduced version of the spelling cleanup patch. Take out the confusing language in tcp_frto, and organize the undocumented values. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10	Documentation: clarify tcp_{r,w}mem sysctl docs	J. Bruce Fields	1	-11/+15
	Fix some of the defaults and attempt to clarify some language. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10	slub: Fix use-after-preempt of per-CPU data structure	Dmitry Adamushko	1	-1/+3
	Vegard Nossum reported a crash in kmem_cache_alloc(): BUG: unable to handle kernel paging request at da87d000 IP: [<c01991c7>] kmem_cache_alloc+0xc7/0xe0 pde = 28180163 pte = 1a87d160 Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC Pid: 3850, comm: grep Not tainted (2.6.26-rc9-00059-gb190333 #5) EIP: 0060:[<c01991c7>] EFLAGS: 00210203 CPU: 0 EIP is at kmem_cache_alloc+0xc7/0xe0 EAX: 00000000 EBX: da87c100 ECX: 1adad71a EDX: 6b6b6b6b ESI: 00200282 EDI: da87d000 EBP: f60bfe74 ESP: f60bfe54 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 and analyzed it: "The register %ecx looks innocent but is very important here. The disassembly: mov %edx,%ecx shr $0x2,%ecx rep stos %eax,%es:(%edi) <-- the fault So %ecx has been loaded from %edx... which is 0x6b6b6b6b/POISON_FREE. (0x6b6b6b6b >> 2 == 0x1adadada.) %ecx is the counter for the memset, from here: memset(object, 0, c->objsize); i.e. %ecx was loaded from c->objsize, so "c" must have been freed. Where did "c" come from? Uh-oh... c = get_cpu_slab(s, smp_processor_id()); This looks like it has very much to do with CPU hotplug/unplug. Is there a race between SLUB/hotplug since the CPU slab is used after it has been freed?" Good analysis. Yeah, it's possible that a caller of kmem_cache_alloc() -> slab_alloc() can be migrated on another CPU right after local_irq_restore() and before memset(). The inital cpu can become offline in the mean time (or a migration is a consequence of the CPU going offline) so its 'kmem_cache_cpu' structure gets freed ( slab_cpuup_callback). At some point of time the caller continues on another CPU having an obsolete pointer... Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10	exec: fix stack excutability without PT_GNU_STACK	Hugh Dickins	1	-1/+1
	Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32) exec'ing an ELF without a PT_GNU_STACK program header should default to an executable stack; but this got broken by the unlimited argv feature because stack vma is now created before the right personality has been established: so breaking old binaries using nested function trampolines. Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack vm_flags used to be set, before the mprotect_fixup. Checking through our existing VM_flags, none would have changed since insert_vm_struct: so this seems safer than finding a way through the personality labyrinth. Reported-by: pageexec@freemail.hu Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2	Linus Torvalds	1	-7/+7
	* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: Fix flags in ocfs2_file_lock
2008-07-10	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	Linus Torvalds	1	-3/+4
	* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: fix cpu hotplug, cleanup sched: fix cpu hotplug
2008-07-10	sched: fix cpu hotplug, cleanup	Linus Torvalds	1	-6/+5
	Clean up __migrate_task(): to just have separate "done" and "fail" cases, instead of that "out" case with random error behavior. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-10	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	Linus Torvalds	1	-1/+24
	* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix /dev/mem compatibility under PAT
2008-07-10	Fix PREEMPT_RCU without HOTPLUG_CPU	Nick Piggin	1	-12/+8
	PREEMPT_RCU without HOTPLUG_CPU is broken. The rcu_online_cpu is called to initially populate rcu_cpu_online_map with all online CPUs when the hotplug event handler is installed, and also to populate the map with CPUs as they come online. The former case is meant to happen with and without HOTPLUG_CPU, but without HOTPLUG_CPU, the rcu_offline_cpu function is no-oped -- while it still gets called, it does not set the rcu CPU map. With a blank RCU CPU map, grace periods get to tick by completely oblivious to active RCU read side critical sections. This results in free-before-grace bugs. Fix is obvious once the problem is known. (Also, change __devinit to __cpuinit so the function gets thrown away on !HOTPLUG_CPU kernels). Signed-off-by: Nick Piggin <npiggin@suse.de> Reported-and-tested-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ Nick is my personal hero of the day - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10	ftrace: Documentation	Steven Rostedt	1	-0/+1353
	This is the long awaited ftrace.txt. It explains in quite detail how to use ftrace and the various tracers. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10	arch/x86/kernel/.gitignore: Added vmlinux.lds to .gitignore file because it shouldn't be tracked.	Daniel Guilak	1	-0/+1
	Signed-off-by: Daniel Guilak <daniel@danielguilak.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10	kernel/kprobes.c: Made kprobe_blacklist static.	Daniel Guilak	1	-1/+1
	Signed-off-by: Daniel Guilak <daniel@danielguilak.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	Linus Torvalds	1	-2/+8
	* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: chainiv - Invoke completion function
2008-07-10	Merge branch 'for-2.6.26' of git://neil.brown.name/md	Linus Torvalds	1	-6/+1
	* 'for-2.6.26' of git://neil.brown.name/md: md: ensure all blocks are uptodate or locked when syncing
2008-07-10	ocfs2: Fix flags in ocfs2_file_lock	Mark Fasheh	1	-7/+7
	The stack-glue merge changed the way we use flags in dlmglue in that we now use the fs/dlm equivalents. Unfortunately, a merge error left the new flock code only partially updated. This took a while to show up though, because the lock level constants are actually identical between o2dlm and fs/dlm. The _CONVERT and _NOQUEUE flags have different values though, which is eventually causing a crash in flags_to_o2dlm(). Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-10	crypto: chainiv - Invoke completion function	Herbert Xu	1	-2/+8
	When chainiv postpones requests it never calls their completion functions. This causes symptoms such as memory leaks when IPsec is in use. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-07-10	x86: fix /dev/mem compatibility under PAT	Venkatesh Pallipadi	1	-1/+24
	Add ioremap_default(), which gives a sane mapping without worrying about type conflicts. Use it in /dev/mem read in place of ioremap(), as with ioremap(), any mapping of the region (other than UC_MINUS) will cause a conflict and failure of /dev/mem read. Should address the vbetest failure reported at: http://bugzilla.kernel.org/show_bug.cgi?id=11057 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-10	sched: fix cpu hotplug	Dmitry Adamushko	1	-1/+3
	I think we may have a race between try_to_wake_up() and migrate_live_tasks() -> move_task_off_dead_cpu() when the later one may end up looping endlessly. Interrupts are enabled on other CPUs when migration_call(CPU_DEAD, ...) is called so we may get a race between try_to_wake_up() and migrate_live_tasks() -> move_task_off_dead_cpu(). The former one may push a task out of a dead CPU causing the later one to loop endlessly. Heiko Carstens observed: \| That's exactly what explains a dump I got yesterday. Thanks for fixing! :) Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: miaox@cn.fujitsu.com Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Avi Kivity <avi@qumranet.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-10	Merge branch 'for-2.6.26' into for-next	Neil Brown	0	-0/+0
	Conflicts: drivers/md/raid5.c
2008-07-10	md: ensure all blocks are uptodate or locked when syncing	Dan Williams	1	-6/+1
	Remove the dubious attempt to prefer 'compute' over 'read'. Not only is it wrong given commit c337869d (md: do not compute parity unless it is on a failed drive), but it can trigger a BUG_ON in handle_parity_checks5(). Cc: <stable@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-09	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	David S. Miller	9	-25/+63

2008-07-09	netfilter: nf_nat_snmp_basic: fix a range check in NAT for SNMP	David Howells	1	-1/+1
	Fix a range check in netfilter IP NAT for SNMP to always use a big enough size variable that the compiler won't moan about comparing it to ULONG_MAX/8 on a 64-bit platform. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-09	netfilter: nf_conntrack_tcp: fix endless loop	Patrick McHardy	1	-2/+8
	When a conntrack entry is destroyed in process context and destruction is interrupted by packet processing and the packet is an attempt to reopen a closed connection, TCP conntrack tries to kill the old entry itself and returns NF_REPEAT to pass the packet through the hook again. This may lead to an endless loop: TCP conntrack repeatedly finds the old entry, but can not kill it itself since destruction is already in progress, but destruction in process context can not complete since TCP conntrack is keeping the CPU busy. Drop the packet in TCP conntrack if we can't kill the connection ourselves to avoid this. Reported by: hemao77@gmail.com [ Kernel bugzilla #11058 ] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-09	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband	Linus Torvalds	1	-0/+4
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/cxgb3: Fix regression caused by class_device -> device conversion