aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux (follow)
AgeCommit message (Collapse)AuthorFilesLines
2006-10-03[PATCH] md: replace magic numbers in sb_dirty with well defined bit flagsNeilBrown2-2/+5
Instead of magic numbers (0,1,2,3) in sb_dirty, we have some flags instead: MD_CHANGE_DEVS Some device state has changed requiring superblock update on all devices. MD_CHANGE_CLEAN The array has transitions from 'clean' to 'dirty' or back, requiring a superblock update on active devices, but possibly not on spares MD_CHANGE_PENDING A superblock update is underway. We wait for an update to complete by waiting for all flags to be clear. A flag can be set at any time, even during an update, without risk that the change will be lost. Stop exporting md_update_sb - isn't needed. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] md: fix a comment that is wrong in raid5.hNeilBrown1-2/+3
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] md: the scheduled removal of the START_ARRAY ioctl for mdAdrian Bunk2-2/+1
This patch contains the scheduled removal of the START_ARRAY ioctl for md. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] dm table: add target flushBryn Reeves2-1/+3
This patch adds support for a per-target dm_flush_fn method. This is needed to allow dm-loop to invalidate page cache mappings in response to BLKFLSBUF ioctl commands. Signed-off-by: Bryn Reeves <breeves@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] dm: extract device limit settingBryn Reeves1-0/+5
Separate the setting of device I/O limits from dm_get_device(). dm-loop will use this. Signed-off-by: Bryn Reeves <breeves@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] dm table: add target preresumeMilan Broz2-2/+4
This patch adds a target preresume hook. It is called before the targets are resumed and if it returns an error the resume gets cancelled. The crypt target will use this to indicate that it is unable to process I/O because no encryption key has been supplied. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] dm: export blkdev_driver_ioctlAlasdair G Kergon1-0/+3
Export blkdev_driver_ioctl for device-mapper. If we get as far as the device-mapper ioctl handler, we know the ioctl is not a standard block layer BLK* one, so we don't need to check for them a second time and can call blkdev_driver_ioctl() directly. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] dm: support ioctls on mapped devicesMilan Broz2-1/+6
Extend the core device-mapper infrastructure to accept arbitrary ioctls on a mapped device provided that it has exactly one target and it is capable of supporting ioctls. [We can't use unlocked_ioctl because we need 'inode': 'file' might be NULL. Is it worth changing this?] Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Arnd Bergmann <arnd@arndb.de> wrote: > Am Wednesday 21 June 2006 21:31 schrieb Alasdair G Kergon: > > static struct block_device_operations dm_blk_dops = { > > .open = dm_blk_open, > > .release = dm_blk_close, > > +.ioctl = dm_blk_ioctl, > > .getgeo = dm_blk_getgeo, > > .owner = THIS_MODULE > > I guess this also needs a ->compat_ioctl method, otherwise it won't > work for ioctl numbers that have a compat_ioctl implementation in the > low-level device driver. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] vt: proper prototypes for some console functionsAdrian Bunk2-0/+4
This patch adds proper prototypes to header files for three console init functions used on drivers/char/vt.c Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] fbdev: Honor the return value of device_create_fileAntonino A. Daplas1-0/+1
Check the return value of device_create_file(). If return is 'fail', remove attributes by calling device_remove_file(). Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] fbdev: Add generic ddc read functionalityDennis Munsie1-0/+2
Adds functionality to read the EDID information over the DDC bus in a generic way. This code is based on the DDC implementation in the radeon driver. [adaplas] - separate from fbmon.c and place in new file fb_ddc.c - remove dependency to CONFIG_I2C and CONFIG_I2C_ALGOBIT, otherwise, feature will not compile if i2c support is compiled as a module - feature is selectable only by drivers needing it. It must have a 'select FB_DDC if xxx' in Kconfig - change printk's to dev_*, the i2c people prefers it Signed-off-by: Dennis Munsie <dmunsie@cecropia.com> Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] ide: Fix crash on repeated resetAlan Cox1-0/+3
Michal Miroslaw reported a problem (bugzilla #7023) where a user initiated reset while the IDE layer was already resetting the channel caused a crash, and provided a rough fix. This is a slightly cleaner version of the fix which tracks the reset state and blocks further reset requests while a reset is in progress. Note this is not a security issue - random end users can't access the ioctl in question anyway. Signed-off-by: Alan Cox <alan@redhat.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] ide: remove dma_base2 field from ide_hwif_tSergei Shtylyov1-1/+0
Remove dma_base2 field from ide_hwif_t as it's used only in 2 drivers and without great need. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: John Keller <jpk@sgi.com> Signed-off-by: Jeremy Higdon <jeremy@sgi.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] drivers/ide/: cleanupsAdrian Bunk1-1/+0
- setup-pci.c: remove the unused ide_pci_unregister_driver() - ide-dma.c: remove the unused EXPORT_SYMBOL_GPL(ide_in_drive_list) Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] Make number of IDE interfaces configurableMatt Mackall1-1/+2
Make IDE_HWIFS configurable if EMBEDDED This lets us lop as much as 16k off an x86 build. It's a little ugly, but it's dead simple. Note the fix for HWIFS < 2. Sizing interfaces dynamically unfortunately turns out to be pretty major surgery. add/remove: 0/1 grow/shrink: 0/11 up/down: 0/-16182 (-16182) function old new delta ide_hwifs 16920 1692 -15228 init_irq 1113 750 -363 ideprobe_init 283 138 -145 ide_pci_setup_ports 1329 1193 -136 save_match 85 - -85 ide_register_hw_with_fixup 367 287 -80 ide_setup 1364 1308 -56 is_chipset_set 40 4 -36 create_proc_ide_interfaces 225 205 -20 init_ide_data 84 67 -17 ide_probe_for_cmd640x 1198 1183 -15 ide_unregister 1452 1451 -1 Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] IDE: claim extra DMA ports regardless of channelSergei Shtylylov1-1/+3
- Claim extra DMA I/O ports regardless of what IDE channels are present/enabled. - Remove extra ports handling from ide_mapped_mmio_dma() since it's not applicable to the custom-mapping IDE drivers. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] sched: cleanup sched_group cpu_power setupSiddha, Suresh B2-12/+43
Up to now sched group's cpu_power for each sched domain is initialized independently. This made the setup code ugly as the new sched domains are getting added. Make the sched group cpu_power setup code generic, by using domain child field and new domain flag in sched_domain. For most of the sched domains(except NUMA), sched group's cpu_power is now computed generically using the domain properties of itself and of the child domain. sched groups in NUMA domains are setup little differently and hence they don't use this generic mechanism. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] sched: introduce child field in sched_domainSiddha, Suresh B2-0/+4
Introduce the child field in sched_domain struct and use it in sched_balance_self(). We will also use this field in cleaning up the sched group cpu_power setup(done in a different patch) code. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] Create kallsyms_lookup_size_offset()Franck Bui-Huu1-0/+11
Some uses of kallsyms_lookup() do not need to find out the name of a symbol and its module's name it belongs. This is specially true in arch specific code, which needs to unwind the stack to show the back trace during oops (mips is an example). In this specific case, we just need to retreive the function's size and the offset of the active intruction inside it. Adds a new entry "kallsyms_lookup_size_offset()" This new entry does exactly the same as kallsyms_lookup() but does not require any buffers to store any names. It returns 0 if it fails otherwise 1. Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbersDavid Howells2-2/+2
These patches make the kernel pass 64-bit inode numbers internally when communicating to userspace, even on a 32-bit system. They are required because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS for example. The 64-bit inode numbers are then propagated to userspace automatically where the arch supports it. Problems have been seen with userspace (eg: ld.so) using the 64-bit inode number returned by stat64() or getdents64() to differentiate files, and failing because the 64-bit inode number space was compressed to 32-bits, and so overlaps occur. This patch: Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit inode number so that 64-bit inode numbers can be passed back to userspace. The stat functions then returns the full 64-bit inode number where available and where possible. If it is not possible to represent the inode number supplied by the filesystem in the field provided by userspace, then error EOVERFLOW will be issued. Similarly, the getdents/readdir functions now pass the full 64-bit inode number to userspace where possible, returning EOVERFLOW instead when a directory entry is encountered that can't be properly represented. Note that this means that some inodes will not be stat'able on a 32-bit system with old libraries where they were before - but it does mean that there will be no ambiguity over what a 32-bit inode number refers to. Note similarly that directory scans may be cut short with an error on a 32-bit system with old libraries where the scan would work before for the same reasons. It is judged unlikely that this situation will occur because modern glibc uses 64-bit capable versions of stat and getdents class functions exclusively, and that older systems are unlikely to encounter unrepresentable inode numbers anyway. [akpm: alpha build fix] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] pid.h cleanupAndrew Morton1-15/+15
Make the pid.h macros look less revolting in an 80-col window. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02Add prototype for sigset_from_compat()Linus Torvalds1-0/+1
Duh. I screwed up editing David Howells patch in commit 3f2e05e90e0846c42626e3d272454f26be34a1bc, and the actual declaration for the sigset_from_compat() function went missing. My bad. Olaf Hering saved the day and noticed that I'm a moron. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02Merge git://git.infradead.org/mtd-2.6Linus Torvalds3-5/+30
* git://git.infradead.org/mtd-2.6: [MTD] Cleanup of 'ioremap balanced with iounmap for drivers/mtd subsystem' [MTD] fix nftl_write warning [MTD] fix printk warning [MTD ONENAND] Check OneNAND lock scheme & all block unlock command support [MTD ONENAND] Remove unused MTD_ONENAND_SYNC_READ configuration [MTD ONENAND] Fix OneNAND probe [MTD NAND] Provide prototype for newly-exported nand_wait_ready() [MTD] Remove #ifndef __KERNEL__ hack in <mtd/mtd-abi.h> [MTD NAND] Allow override of page read and write functions. [MTD NAND] Allocate chip->buffers separately to allow it to be overridden [MTD NAND] Split nand_scan() into two parts; allow board driver to intervene [MTD NAND] Export nand_wait_ready() for use by board drivers
2006-10-02Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds4-88/+223
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (35 commits) Input: wistron - add support for Acer TravelMate 2424NWXCi Input: wistron - fix setting up special buttons Input: add KEY_BLUETOOTH and KEY_WLAN definitions Input: add new BUS_VIRTUAL bus type Input: add driver for stowaway serial keyboards Input: make input_register_handler() return error codes Input: remove cruft that was needed for transition to sysfs Input: fix input module refcounting Input: constify input core Input: libps2 - rearrange exports Input: atkbd - support Microsoft Natural Elite Pro keyboards Input: i8042 - disable MUX mode on Toshiba Equium A110 Input: i8042 - get rid of polling timer Input: send key up events at disconnect Input: constify psmouse driver Input: i8042 - add Amoi to the MUX blacklist Input: logips2pp - add sugnature 56 (Cordless MouseMan Wheel), cleanup Input: add driver for Touchwin serial touchscreens Input: add driver for Touchright serial touchscreens Input: add driver for Penmount serial touchscreens ...
2006-10-02[PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.hDavid Howells1-0/+1
Revert Andrew Morton's patch to temporarily hack around the lack of a declaration of sigset_t in linux/compat.h to make the block-disablement patches build on IA64. This got accidentally pushed to Linus and should be fixed in a different manner. Also make linux/compat.h #include asm/signal.h to gain a definition of sigset_t so that it can externally declare sigset_from_compat(). This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and ppc64 and run-tested on frv. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] replace cad_pid by a struct pidCedric Le Goater1-0/+7
There are a few places in the kernel where the init task is signaled. The ctrl+alt+del sequence is one them. It kills a task, usually init, using a cached pid (cad_pid). This patch replaces the pid_t by a struct pid to avoid pid wrap around problem. The struct pid is initialized at boot time in init() and can be modified through systctl with /proc/sys/kernel/cad_pid [ I haven't found any distro using it ? ] It also introduces a small helper routine kill_cad_pid() which is used where it seemed ok to use cad_pid instead of pid 1. [akpm@osdl.org: cleanups, build fix] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] introduce get_task_pid() to fix unsafe get_pid()Oleg Nesterov1-0/+2
proc_pid_make_inode: ei->pid = get_pid(task_pid(task)); I think this is not safe. get_pid() can be preempted after checking "pid != NULL". Then the task exits, does detach_pid(), and RCU frees the pid. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] remove remaining errno and __KERNEL_SYSCALLS__ referencesArnd Bergmann1-5/+1
The last in-kernel user of errno is gone, so we should remove the definition and everything referring to it. This also removes the now-unused lib/execve.c file that was introduced earlier. Also remove every trace of __KERNEL_SYSCALLS__ that still remained in the kernel. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] rename the provided execve functions to kernel_execveArnd Bergmann1-0/+2
Some architectures provide an execve function that does not set errno, but instead returns the result code directly. Rename these to kernel_execve to get the right semantics there. Moreover, there is no reasone for any of these architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so remove these right away. [akpm@osdl.org: build fix] [bunk@stusta.de: build fix] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andi Kleen <ak@muc.de> Acked-by: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] IPC namespace - utilsKirill Korotaev2-1/+20
This patch adds basic IPC namespace functionality to IPC utils: - init_ipc_ns - copy/clone/unshare/free IPC ns - /proc preparations Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] IPC namespace coreKirill Korotaev4-0/+40
This patch set allows to unshare IPCs and have a private set of IPC objects (sem, shm, msg) inside namespace. Basically, it is another building block of containers functionality. This patch implements core IPC namespace changes: - ipc_namespace structure - new config option CONFIG_IPC_NS - adds CLONE_NEWIPC flag - unshare support [clg@fr.ibm.com: small fix for unshare of ipc namespace] [akpm@osdl.org: build fix] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: implement CLONE_NEWUTS flagSerge E. Hallyn2-0/+12
Implement a CLONE_NEWUTS flag, and use it at clone and sys_unshare. [clg@fr.ibm.com: IPC unshare fix] [bunk@stusta.de: cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: remove system_utsnameSerge E. Hallyn1-2/+0
The system_utsname isn't needed now that kernel/sysctl.c is fixed. Nuke it. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: implement utsname namespacesSerge E. Hallyn4-3/+42
This patch defines the uts namespace and some manipulators. Adds the uts namespace to task_struct, and initializes a system-wide init namespace. It leaves a #define for system_utsname so sysctl will compile. This define will be removed in a separate patch. [akpm@osdl.org: build fix, cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: switch to using uts namespacesSerge E. Hallyn1-1/+1
Replace references to system_utsname to the per-process uts namespace where appropriate. This includes things like uname. Changes: Per Eric Biederman's comments, use the per-process uts namespace for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c [jdike@addtoit.com: UML fix] [clg@fr.ibm.com: cleanup] [akpm@osdl.org: build fix] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: introduce temporary helpersSerge E. Hallyn1-0/+10
Define utsname() and init_utsname() which return &system_utsname. Users of system_utsname will be changed to use these helpers, after which system_utsname will disappear. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: incorporate fs namespace into nsproxySerge E. Hallyn4-7/+7
This moves the mount namespace into the nsproxy. The mount namespace count now refers to the number of nsproxies point to it, rather than the number of tasks. As a result, the unshare_namespace() function in kernel/fork.c no longer checks whether it is being shared. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: add nsproxySerge E. Hallyn3-0/+54
This patch adds a nsproxy structure to the task struct. Later patches will move the fs namespace pointer into this structure, and introduce a new utsname namespace into the nsproxy. The vserver and openvz functionality, then, would be implemented in large part by virtualizing/isolating more and more resources into namespaces, each contained in the nsproxy. [akpm@osdl.org: build fix] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] nfsd: lockdep annotationPeter Zijlstra1-2/+9
while doing a kernel make modules_install install over an NFS mount. ============================================= [ INFO: possible recursive locking detected ] --------------------------------------------- nfsd/9550 is trying to acquire lock: (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f but task is already holding lock: (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f other info that might help us debug this: 2 locks held by nfsd/9550: #0: (hash_sem){..--}, at: [<cc895223>] exp_readlock+0xd/0xf [nfsd] #1: (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f stack backtrace: [<c0103508>] show_trace_log_lvl+0x58/0x152 [<c0103b8b>] show_trace+0xd/0x10 [<c0103c2f>] dump_stack+0x19/0x1b [<c012aa57>] __lock_acquire+0x77a/0x9a3 [<c012af4a>] lock_acquire+0x60/0x80 [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e [<c034c845>] mutex_lock+0x1c/0x1f [<c0162edc>] vfs_unlink+0x34/0x8a [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd] [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd] [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd] [<c033e84d>] svc_process+0x3a5/0x5ed [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd] [<c0101005>] kernel_thread_helper+0x5/0xb DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb Leftover inexact backtrace: [<c0103b8b>] show_trace+0xd/0x10 [<c0103c2f>] dump_stack+0x19/0x1b [<c012aa57>] __lock_acquire+0x77a/0x9a3 [<c012af4a>] lock_acquire+0x60/0x80 [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e [<c034c845>] mutex_lock+0x1c/0x1f [<c0162edc>] vfs_unlink+0x34/0x8a [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd] [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd] [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd] [<c033e84d>] svc_process+0x3a5/0x5ed [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd] [<c0101005>] kernel_thread_helper+0x5/0xb ============================================= [ INFO: possible recursive locking detected ] --------------------------------------------- nfsd/9580 is trying to acquire lock: (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f but task is already holding lock: (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f other info that might help us debug this: 2 locks held by nfsd/9580: #0: (hash_sem){..--}, at: [<cc89522b>] exp_readlock+0xd/0xf [nfsd] #1: (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f stack backtrace: [<c0103508>] show_trace_log_lvl+0x58/0x152 [<c0103b8b>] show_trace+0xd/0x10 [<c0103c2f>] dump_stack+0x19/0x1b [<c012aa63>] __lock_acquire+0x77a/0x9a3 [<c012af56>] lock_acquire+0x60/0x80 [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e [<c034cc1d>] mutex_lock+0x1c/0x1f [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd] [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd] [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd] [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd] [<c033ec1d>] svc_process+0x3a5/0x5ed [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd] [<c0101005>] kernel_thread_helper+0x5/0xb DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb Leftover inexact backtrace: [<c0103b8b>] show_trace+0xd/0x10 [<c0103c2f>] dump_stack+0x19/0x1b [<c012aa63>] __lock_acquire+0x77a/0x9a3 [<c012af56>] lock_acquire+0x60/0x80 [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e [<c034cc1d>] mutex_lock+0x1c/0x1f [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd] [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd] [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd] [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd] [<c033ec1d>] svc_process+0x3a5/0x5ed [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd] [<c0101005>] kernel_thread_helper+0x5/0xb Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Neil Brown <neilb@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: make rpc threads pools numa awareGreg Banks1-0/+1
Actually implement multiple pools. On NUMA machines, allocate a svc_pool per NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool. Enqueue sockets on the svc_pool corresponding to the CPU on which the socket bh is run (i.e. the NIC interrupt CPU). Threads have their cpu mask set to limit them to the CPUs in the svc_pool that owns them. This is the patch that allows an Altix to scale NFS traffic linearly beyond 4 CPUs and 4 NICs. Incorporates changes and feedback from Neil Brown, Trond Myklebust, and Christoph Hellwig. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: add svc_set_num_threadsGreg Banks1-5/+16
Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new way of managing the list of all threads in a svc_serv. Add svc_create_pooled() to allow creation of a svc_serv whose threads are managed by the sunrpc code. Add svc_set_num_threads() to manage the number of threads in a service, either per-pool or globally across the service. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: add svc_getGreg Banks1-0/+11
add svc_get() for those occasions when we need to temporarily bump up svc_serv->sv_nrthreads as a pseudo refcount. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: split svc_serv into poolsGreg Banks2-2/+24
Split out the list of idle threads and pending sockets from svc_serv into a new svc_pool structure, and allocate a fixed number (in this patch, 1) of pools per svc_serv. The new structure contains a lock which takes over several of the duties of svc_serv->sv_lock, which is now relegated to protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv. The point is to move the hottest fields out of svc_serv and into svc_pool, allowing a following patch to arrange for a svc_pool per NUMA node or per CPU. This is a major step towards making the NFS server NUMA-friendly. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: convert sk_reserved to atomic_tGreg Banks1-1/+1
Convert the svc_sock->sk_reserved variable from an int protected by svc_serv->sv_lock, to an atomic. This reduces (by 1) the number of places we need to take the (effectively global) svc_serv->sv_lock. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: use new lock for svc_sock deferred listGreg Banks1-0/+1
Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock instead of svc_serv->sv_lock. Using the more fine-grained lock reduces the number of places we need to take the svc_serv lock. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: convert sk_inuse to atomic_tGreg Banks1-1/+1
Convert the svc_sock->sk_inuse counter from an int protected by svc_serv->sv_lock, to an atomic. This reduces the number of places we need to take the (effectively global) svc_serv->sv_lock. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: move tempsock aging to a timerGreg Banks2-0/+3
Following are 11 patches from Greg Banks which combine to make knfsd more Numa-aware. They reduce hitting on 'global' data structures, and create some data-structures that can be node-local. knfsd threads are bound to a particular node, and the thread to handle a new request is chosen from the threads that are attach to the node that received the interrupt. The distribution of threads across nodes can be controlled by a new file in the 'nfsd' filesystem, though the default approach of an even spread is probably fine for most sites. Some (old) numbers that show the efficacy of these patches: N == number of NICs == number of CPUs == nmber of clients. Number of NUMA nodes == N/2 N Throughput, MiB/s CPU usage, % (max=N*100) Before After Before After --- ------ ---- ----- ----- 4 312 435 350 228 6 500 656 501 418 8 562 804 690 589 This patch: Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to a timer which uses a mark-and-sweep algorithm every 6 minutes. This reduces the amount of work that needs to be done in the main RPC loop and the length of time we need to hold the (effectively global) svc_serv->sv_lock. [akpm@osdl.org: cleanup] Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: Drop 'serv' option to svc_recv and svc_processNeilBrown2-2/+2
It isn't needed as it is available in rqstp->rq_server, and dropping it allows some local vars to be dropped. [akpm@osdl.org: build fix] Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist'NeilBrown2-1/+6
Userspace should create and bind a socket (but not connectted) and write the 'fd' to portlist. This will cause the nfs server to listen on that socket. To close a socket, the name of the socket - as read from 'portlist' can be written to 'portlist' with a preceding '-'. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: define new nfsdfs file: portlist - contains list of portsNeilBrown1-0/+1
This file will list all ports that nfsd has open. Default when TCP enabled will be ipv4 udp 0.0.0.0 2049 ipv4 tcp 0.0.0.0 2049 Later, the list of ports will be settable. 'portlist' chosen rather than 'ports', to avoid unnecessary confusion with non-mainline patches which created 'ports' with different semantics. [akpm@osdl.org: cleanups, build fix] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>