aboutsummaryrefslogtreecommitdiffstats
path: root/lib/scatterlist.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2008-02-08IPC/semaphores: consolidate SEM_STAT and IPC_STAT commandsPierre Peiffer1-22/+16
These commands (SEM_STAT and IPC_STAT) are rather doing the same things (only the meaning of the id given as input and the return value differ). However, for the semaphores, they are handled in two different places (two different functions). This patch consolidates this for clarification by handling these both commands in the same place in semctl_nolock(). It also removes one unused parameter for this function. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08ipc: uninline some code from util.hPavel Emelyanov2-58/+57
ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be inline. Besides, they give no optimization being inline as they perform calls inside in any case. Moving them into ipc/util.c saves 500 bytes of vmlinux and shortens IPC internal API. $ ./scripts/bloat-o-meter vmlinux-orig vmlinux add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499) function old new delta ipcget - 392 +392 ipc_lock_check_down - 49 +49 ipc_lock_check - 49 +49 sys_semget 119 105 -14 sys_shmget 108 86 -22 sys_msgget 100 78 -22 do_msgsnd 665 631 -34 do_msgrcv 680 644 -36 do_shmat 771 733 -38 sys_msgctl 1302 1229 -73 ipcget_new 80 - -80 sys_semtimedop 1534 1452 -82 sys_semctl 2034 1922 -112 sys_shmctl 1919 1765 -154 ipcget_public 322 - -322 The ipcget() growth is the result of gcc inlining of currently static ipcget_new/_public. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08serial_core: bring mostly into line with coding styleAlan Cox1-27/+28
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-088250: enable rate reporting via termiosAlan Cox1-0/+1
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08serial8250: coding styleAlan Cox1-9/+10
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-088250_pci: coding styleAlan Cox1-55/+54
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-088250_hub6: codding styleAlan Cox1-12/+12
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-088250_hp300: coding styleAlan Cox1-20/+15
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-088250_gsc: coding styleAlan Cox1-9/+8
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-088250_early: coding styleAlan Cox1-8/+15
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08tty_ioctl: drag screaming into compliance with the coding styleAlan Cox1-182/+180
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08tty_io: drag screaming into coding style complianceAlan Cox1-278/+283
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08tty_audit: fix checkpatch complaintAlan Cox1-1/+1
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08rocket: don't let random users reset the controllerAlan Cox1-0/+3
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08rocket: first pass at termios reportingAlan Cox1-10/+8
Also removes a cflag comparison that caused some mode changes to get wrongly ignored Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08n_tty: clean up old code to follow coding style and (mostly) checkpatchAlan Cox1-75/+73
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08moxa: first pass at termios reportingAlan Cox1-8/+15
Signed-off-by: Alan Cox <alan@redhat.com> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08fix "modules: make module_address_lookup() safe"Andrew Morton2-4/+4
Get the constness right, avoid nasty cast. Cc: Ingo Molnar <mingo@elte.hu> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08modules: include sections.h to avoid defining linker variables explicitlyChristoph Lameter1-3/+1
module.c should not define linker variables on its own. We have an include file for that. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08Modules: handle symbols that have a zero valueChristoph Lameter1-9/+14
The module subsystem cannot handle symbols that are zero. If symbols are present that have a zero value then the module resolver prints out a message that these symbols are unresolved. [akinobu.mita@gmail.com: fix __find_symbl() error checks] Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Kay Sievers <kay.sievers@vrfy.org Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08tty: s390 support for termios2.Heiko Carstens3-1/+11
Backend for s390. Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08tty: let architectures override the user/kernel macros.Heiko Carstens1-0/+6
Give architectures that support the new termios2 the possibilty to overide the user_termios_to_kernel_termios and kernel_termios_to_user_termios macros. As soon as all architectures that use the generic variant have been converted the ifdefs can go away again. Architectures in question are avr32, frv, powerpc and s390. Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: David Howells <dhowells@redhat.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08intel-iommu: fault_reason index cleanupmark gross2-7/+6
Fix an off by one bug in the fault reason string reporting function, and clean up some of the code around this buglet. [akpm@linux-foundation.org: cleanup] Signed-off-by: mark gross <mgross@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08intel-iommu: PMEN supportmark gross2-1/+24
Add support for protected memory enable bits by clearing them if they are set at startup time. Some future boot loaders or firmware could have this bit set after it loads the kernel, and it needs to be cleared if DMA's are going to happen effectively. Signed-off-by: mark gross <mgross@intel.com> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: fix ->open'less usage due to ->proc_fops flipAlexey Dobriyan4-11/+47
Typical PDE creation code looks like: pde = create_proc_entry("foo", 0, NULL); if (pde) pde->proc_fops = &foo_proc_fops; Notice that PDE is first created, only then ->proc_fops is set up to final value. This is a problem because right after creation a) PDE is fully visible in /proc , and b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's possible to ->read without ->open (see one class of oopses below). The fix is new API called proc_create() which makes sure ->proc_fops are set up before gluing PDE to main tree. Typical new code looks like: pde = proc_create("foo", 0, NULL, &foo_proc_fops); if (!pde) return -ENOMEM; Fix most networking users for a start. In the long run, create_proc_entry() for regular files will go. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024 printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000 Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: /sys/block/sda/sda1/dev Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2) EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0 EIP is at mutex_lock_nested+0x75/0x25d EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570 ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000) Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb 00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246 00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550 Call Trace: [<c106f7ce>] seq_read+0x24/0x28a [<c106f7aa>] seq_read+0x0/0x28a [<c106f7ce>] seq_read+0x24/0x28a [<c106f7aa>] seq_read+0x0/0x28a [<c10818b8>] proc_reg_read+0x60/0x73 [<c1081858>] proc_reg_read+0x0/0x73 [<c105a34f>] vfs_read+0x6c/0x8b [<c105a6f3>] sys_read+0x3c/0x63 [<c10025f2>] sysenter_past_esp+0x5f/0xa5 [<c10697a7>] destroy_inode+0x24/0x33 ======================= INFO: lockdep is turned off. Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33 EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8 [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: fix the threaded /proc/selfEric W. Biederman1-6/+6
Long ago when the CLONE_THREAD support first went it someone thought it would be wise to point /proc/self at /proc/<tgid> instead of /proc/<pid>. Given that /proc/<tgid> can return information about a very different task (if enough things have been unshared) then our current process /proc/<tgid> seems blatantly wrong. So far I have yet to think up an example where the current behavior would be advantageous, and I can see several places where it is seriously non-intuitive. We may be stuck with the current broken behavior for backwards compatibility reasons but lets try fixing our ancient bug for the 2.6.25 time frame and see if anyone screams. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: "Guillaume Chazarain" <guichaz@yahoo.fr> Cc: "Pavel Emelyanov" <xemul@openvz.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: proper pidns handling for /proc/selfEric W. Biederman1-2/+10
Currently if you access a /proc that is not mounted with your processes current pid namespace /proc/self will point at a completely random task. This patch fixes /proc/self to point to the current process if it is available in the particular mount of /proc or to return -ENOENT if the current process is not visible. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: seqfile convert proc_pid_status to properly handle pid namespacesEric W. Biederman10-101/+103
Currently we possibly lookup the pid in the wrong pid namespace. So seq_file convert proc_pid_status which ensures the proper pid namespaces is passed in. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: another build fix] [akpm@linux-foundation.org: s390 build fix] [akpm@linux-foundation.org: fix task_name() output] [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morgan <morgan@kernel.org> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08seqfile convert proc_pid_statmEric W. Biederman3-6/+9
This conversion is just for code cleanliness, uniformity, and general safety. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: rewrite do_task_stat to correctly handle pid namespaces.Eric W. Biederman3-17/+20
Currently (as pointed out by Oleg) do_task_stat has a race when calling task_pid_nr_ns with the task exiting. In addition do_task_stat is not currently displaying information in the context of the pid namespace that mounted the /proc filesystem. So "cut -d' ' -f 1 /proc/<pid>/stat" may not equal <pid>. This patch fixes the problem by converting to a single_open seq_file show method. Getting the pid namespace from the filesystem superblock instead of current, and simply using the the struct pid from the inode instead of attempting to get that same pid from the task. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: implement proc_single_file_operationsEric W. Biederman2-0/+46
Currently many /proc/pid files use a crufty precursor to the current seq_file api, and they don't have direct access to the pid_namespace or the pid of for which they are displaying data. So implement proc_single_file_operations to make the seq_file routines easy to use, and to give access to the full state of the pid of we are displaying data for. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: detect duplicate names on registrationZhang Rui1-0/+10
Print a warning if PDE is registered with a name which already exists in target directory. Bug report and a simple fix can be found here: http://bugzilla.kernel.org/show_bug.cgi?id=8798 [\n fixlet and no undescriptive variable usage --adobriyan] [akpm@linux-foundation.org: make printk comprehensible] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: remove useless check on symlink removalAlexey Dobriyan1-1/+1
proc symlinks always have valid ->data containing destination of symlink. No need to check it on removal -- proc_symlink() already done it. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: simplify function prototypesAlexey Dobriyan2-15/+6
Move code around so as to reduce the number of forward-declarations. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: less LOCK operations during lookupAlexey Dobriyan1-2/+2
Pseudo-code for lookup effectively is: LOCK kernel LOCK proc_subdir_lock find PDE UNLOCK proc_subdir_lock get inode LOCK proc_subdir_lock goto unlock UNLOCK proc_subdir_lock UNLOCK kernel We can get rid of LOCK/UNLOCK pair after getting inode simply by jumping to unlock_kernel() directly. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08proc: remove MODULE_LICENSEAlexey Dobriyan1-1/+0
proc is not modular, so MODULE_LICENSE just expands to empty space. proc without doubts remains GPLed. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: mark NET_NS with "depends on NAMESPACES"Pavel Emelyanov1-1/+1
There's already an option controlling the net namespaces cloning code, so make it work the same way as all the other namespaces' options. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: cleanup the code managed with PID_NS optionPavel Emelyanov6-194/+220
Just like with the user namespaces, move the namespace management code into the separate .c file and mark the (already existing) PID_NS option as "depend on NAMESPACES" [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: cleanup the code managed with the USER_NS optionPavel Emelyanov4-24/+21
Make the user_namespace.o compilation depend on this option and move the init_user_ns into user.c file to make the kernel compile and work without the namespaces support. This make the user namespace code be organized similar to other namespaces'. Also mask the USER_NS option as "depend on NAMESPACES". [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: move the IPC namespace under IPC_NS optionPavel Emelyanov12-112/+164
Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: move the UTS namespace under UTS_NS optionPavel Emelyanov3-1/+30
Currently all the namespace management code is in the kernel/utsname.c file, so just compile it out and make stubs in the appropriate header. The init namespace itself is in init/version.c and is in the kernel all the time. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: add the NAMESPACES config optionPavel Emelyanov1-0/+9
The option is selectable if EMBEDDED is chosen only. When the EMBEDDED is off namespaces will be on. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08hugetlb: add locking for overcommit sysctlNishanth Aravamudan3-1/+12
When I replaced hugetlb_dynamic_pool with nr_overcommit_hugepages I used proc_doulongvec_minmax() directly. However, hugetlb.c's locking rules require that all counter modifications occur under the hugetlb_lock. Add a callback into the hugetlb code similar to the one for nr_hugepages. Grab the lock around the manipulation of nr_overcommit_hugepages in proc_doulongvec_minmax(). Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08inotify: fix check for one-shot watches before destroying themUlisses Furquim1-1/+1
As the IN_ONESHOT bit is never set when an event is sent we must check it in the watch's mask and not in the event's mask. Signed-off-by: Ulisses Furquim <ulissesf@gmail.com> Reported-by: "Clem Taylor" <clem.taylor@gmail.com> Tested-by: "Clem Taylor" <clem.taylor@gmail.com> Cc: Amy Griffis <amy.griffis@hp.com> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07Convert SG from nopage to fault.Nick Piggin1-12/+11
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Douglas Gilbert <dougg@torque.net> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08dm raid1: report fault statusJonathan Brassow1-8/+36
This patch adds extra information to the mirror status output, so that it can be determined which device(s) have failed. For each mirror device, a character is printed indicating the most severe error encountered. The characters are: * A => Alive - No failures * D => Dead - A write failure occurred leaving mirror out-of-sync * S => Sync - A sychronization failure occurred, mirror out-of-sync * R => Read - A read failure occurred, mirror data unaffected This allows userspace to properly reconfigure the mirror set. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-02-08dm raid1: handle read failuresJonathan Brassow1-45/+211
This patch gives the ability to respond-to/record device failures that happen during read operations. It also adds the ability to read from mirror devices that are not the primary if they are in-sync. There are essentially two read paths in mirroring; the direct path and the queued path. When a read request is mapped, if the region is 'in-sync' the direct path is taken; otherwise the queued path is taken. If the direct path is taken, we must record bio information so that if the read fails we can retry it. We then discover the status of a direct read through mirror_end_io. If the read has failed, we will mark the device from which the read was attempted as failed (so we don't try to read from it again), restore the bio and try again. If the queued path is taken, we discover the results of the read from 'read_callback'. If the device failed, we will mark the device as failed and attempt the read again if there is another device where this region is known to be 'in-sync'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-02-08dm raid1: fix EIO after log failureJonathan Brassow1-11/+90
This patch adds the ability to requeue write I/O to core device-mapper when there is a log device failure. If a write to the log produces and error, the pending writes are put on the "failures" list. Since the log is marked as failed, they will stay on the failures list until a suspend happens. Suspends come in two phases, presuspend and postsuspend. We must make sure that all the writes on the failures list are requeued in the presuspend phase (a requirement of dm core). This means that recovery must be complete (because writes may be delayed behind it) and the failures list must be requeued before we return from presuspend. The mechanisms to ensure recovery is complete (or stopped) was already in place, but needed to be moved from postsuspend to presuspend. We rely on 'flush_workqueue' to ensure that the mirror thread is complete and therefore, has requeued all writes in the failures list. Because we are using flush_workqueue, we must ensure that no additional 'queue_work' calls will produce additional I/O that we need to requeue (because once we return from presuspend, we are unable to do anything about it). 'queue_work' is called in response to the following functions: - complete_resync_work = NA, recovery is stopped - rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it is ready to recover the region (recovery is stopped) or it needs to clear the region in the log* **this doesn't get called while suspending** - rh_recovery_end = NA, recovery is stopped - rh_recovery_start = NA, recovery is stopped - write_callback = 1) Writes w/o failures simply call bio_endio -> mirror_end_io -> rh_dec (see rh_dec above) 2) Writes with failures are put on the failures list and queue_work is called** ** write_callbacks don't happen during suspend ** - do_failures = NA, 'queue_work' not called if suspending - add_mirror (initialization) = NA, only done on mirror creation - queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue is called. 2) No more I/Os are being issued. 3) Re-attempted READs can still be handled. (Write completions are handled through rh_dec/ write_callback - mention above - and do not use queue_bio.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-02-08dm raid1: handle recovery failuresJonathan Brassow1-3/+20
This patch adds the calls to 'fail_mirror' if an error occurs during mirror recovery (aka resynchronization). 'fail_mirror' is responsible for recording the type of error by mirror device and ensuring an event gets raised for the purpose of notifying userspace. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-02-08dm raid1: handle write failuresJonathan Brassow1-26/+224
This patch gives mirror the ability to handle device failures during normal write operations. The 'write_callback' function is called when a write completes. If all the writes failed or succeeded, we report failure or success respectively. If some of the writes failed, we call fail_mirror; which increments the error count for the device, notes the type of error encountered (DM_RAID1_WRITE_ERROR), and selects a new primary (if necessary). Note that the primary device can never change while the mirror is not in-sync (IOW, while recovery is happening.) This means that the scenario where a failed write changes the primary and gives recovery_complete a chance to misread the primary never happens. The fact that the primary can change has necessitated the change to the default_mirror field. We need to protect against reading garbage while the primary changes. We then add the bio to a new list in the mirror set, 'failures'. For every bio in the 'failures' list, we call a new function, '__bio_mark_nosync', where we mark the region 'not-in-sync' in the log and properly set the region state as, RH_NOSYNC. Userspace must also be notified of the failure. This is done by 'raising an event' (dm_table_event()). If fail_mirror is called in process context the event can be raised right away. If in interrupt context, the event is deferred to the kmirrord thread - which raises the event if 'event_waiting' is set. Backwards compatibility is maintained by ignoring errors if the DM_FEATURES_HANDLE_ERRORS flag is not present. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>