aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux (follow)
AgeCommit message (Collapse)AuthorFilesLines
2007-10-17Implement file posix capabilitiesSerge E. Hallyn4-19/+73
Implement file posix capabilities. This allows programs to be given a subset of root's powers regardless of who runs them, without having to use setuid and giving the binary all of root's powers. This version works with Kaigai Kohei's userspace tools, found at http://www.kaigai.gr.jp/index.php. For more information on how to use this patch, Chris Friedhoff has posted a nice page at http://www.friedhoff.org/fscaps.html. Changelog: Nov 27: Incorporate fixes from Andrew Morton (security-introduce-file-caps-tweaks and security-introduce-file-caps-warning-fix) Fix Kconfig dependency. Fix change signaling behavior when file caps are not compiled in. Nov 13: Integrate comments from Alexey: Remove CONFIG_ ifdef from capability.h, and use %zd for printing a size_t. Nov 13: Fix endianness warnings by sparse as suggested by Alexey Dobriyan. Nov 09: Address warnings of unused variables at cap_bprm_set_security when file capabilities are disabled, and simultaneously clean up the code a little, by pulling the new code into a helper function. Nov 08: For pointers to required userspace tools and how to use them, see http://www.friedhoff.org/fscaps.html. Nov 07: Fix the calculation of the highest bit checked in check_cap_sanity(). Nov 07: Allow file caps to be enabled without CONFIG_SECURITY, since capabilities are the default. Hook cap_task_setscheduler when !CONFIG_SECURITY. Move capable(TASK_KILL) to end of cap_task_kill to reduce audit messages. Nov 05: Add secondary calls in selinux/hooks.c to task_setioprio and task_setscheduler so that selinux and capabilities with file cap support can be stacked. Sep 05: As Seth Arnold points out, uid checks are out of place for capability code. Sep 01: Define task_setscheduler, task_setioprio, cap_task_kill, and task_setnice to make sure a user cannot affect a process in which they called a program with some fscaps. One remaining question is the note under task_setscheduler: are we ok with CAP_SYS_NICE being sufficient to confine a process to a cpuset? It is a semantic change, as without fsccaps, attach_task doesn't allow CAP_SYS_NICE to override the uid equivalence check. But since it uses security_task_setscheduler, which elsewhere is used where CAP_SYS_NICE can be used to override the uid equivalence check, fixing it might be tough. task_setscheduler note: this also controls cpuset:attach_task. Are we ok with CAP_SYS_NICE being used to confine to a cpuset? task_setioprio task_setnice sys_setpriority uses this (through set_one_prio) for another process. Need same checks as setrlimit Aug 21: Updated secureexec implementation to reflect the fact that euid and uid might be the same and nonzero, but the process might still have elevated caps. Aug 15: Handle endianness of xattrs. Enforce capability version match between kernel and disk. Enforce that no bits beyond the known max capability are set, else return -EPERM. With this extra processing, it may be worth reconsidering doing all the work at bprm_set_security rather than d_instantiate. Aug 10: Always call getxattr at bprm_set_security, rather than caching it at d_instantiate. [morgan@kernel.org: file-caps clean up for linux/capability.h] [bunk@kernel.org: unexport cap_inode_killpriv] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morgan <morgan@kernel.org> Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17ifdef struct task_struct::securityAlexey Dobriyan1-1/+2
For those who don't care about CONFIG_SECURITY. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17security: Convert LSM into a static interfaceJames Morris1-984/+206
Convert LSM into a static interface, as the ability to unload a security module is not required by in-tree users and potentially complicates the overall security architecture. Needlessly exported LSM symbols have been unexported, to help reduce API abuse. Parameters for the capability and root_plug modules are now specified at boot. The SECURITY_FRAMEWORK_VERSION macro has also been removed. In a nutshell, there is no safe way to unload an LSM. The modular interface is thus unecessary and broken infrastructure. It is used only by out-of-tree modules, which are often binary-only, illegal, abusive of the API and dangerous, e.g. silently re-vectoring SELinux. [akpm@linux-foundation.org: cleanups] [akpm@linux-foundation.org: USB Kconfig fix] [randy.dunlap@oracle.com: fix LSM kernel-doc] Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Acked-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17r/o bind mounts: filesystem helpers for custom 'struct file'sDave Hansen1-0/+9
Why do we need r/o bind mounts? This feature allows a read-only view into a read-write filesystem. In the process of doing that, it also provides infrastructure for keeping track of the number of writers to any given mount. This has a number of uses. It allows chroots to have parts of filesystems writable. It will be useful for containers in the future because users may have root inside a container, but should not be allowed to write to somefilesystems. This also replaces patches that vserver has had out of the tree for several years. It allows security enhancement by making sure that parts of your filesystem read-only (such as when you don't trust your FTP server), when you don't want to have entire new filesystems mounted, or when you want atime selectively updated. I've been using the following script to test that the feature is working as desired. It takes a directory and makes a regular bind and a r/o bind mount of it. It then performs some normal filesystem operations on the three directories, including ones that are expected to fail, like creating a file on the r/o mount. This patch: Some filesystems forego the vfs and may_open() and create their own 'struct file's. This patch creates a couple of helper functions which can be used by these filesystems, and will provide a unified place which the r/o bind mount code may patch. Also, rename an existing, static-scope init_file() to a less generic name. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17PNP: remove null pointer checksBjorn Helgaas1-3/+3
Remove some null pointer checks. Null pointers in these areas indicate programming errors, and I think it's better to oops immediately rather than return an error that is easily ignored. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17bitmap.h: remove dead artifactsAdrian Bunk1-2/+0
bitmap_active() no longer exists and BITMAP_ACTIVE is no longer used. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17ext2 reservationsMartin J. Bligh2-6/+67
Val's cross-port of the ext3 reservations code into ext2. [mbligh@mbligh.org: Small type error for printk [akpm@linux-foundation.org: fix types, sync with ext3] [mbligh@mbligh.org: Bring ext2 reservations code in line with latest ext3] [akpm@linux-foundation.org: kill noisy printk] [akpm@linux-foundation.org: remember to dirty the gdp's block] [akpm@linux-foundation.org: cross-port the missed 5dea5176e5c32ef9f0d1a41d28427b3bf6881b3a] [akpm@linux-foundation.org: cross-port e6022603b9aa7d61d20b392e69edcdbbc1789969] [akpm@linux-foundation.org: Port the omitted 08fb306fe63d98eb86e3b16f4cc21816fa47f18e] [akpm@linux-foundation.org: Backport the missed 20acaa18d0c002fec180956f87adeb3f11f635a6] [akpm@linux-foundation.org: fixes] [cmm@us.ibm.com: fix reservation extension] [bunk@stusta.de: make ext2_get_blocks() static] [hugh@veritas.com: fix hang] [hugh@veritas.com: ext2_new_blocks should reset the reservation window size] [hugh@veritas.com: ext2 balloc: fix off-by-one against rsv_end] [hugh@veritas.com: grp_goal 0 is a genuine goal (unlike -1), so ext2_try_to_allocate_with_rsv should treat it as such] [hugh@veritas.com: rbtree usage cleanup] [pbadari@us.ibm.com: Fix for ext2 reservation] [bunk@kernel.org: remove fs/ext2/balloc.c:reserve_blocks()] [hugh@veritas.com: ext2 balloc: use io_error label] Cc: "Martin J. Bligh" <mbligh@mbligh.org> Cc: Valerie Henson <val_henson@linux.intel.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17introduce I_SYNCJoern Engel2-10/+68
I_LOCK was used for several unrelated purposes, which caused deadlock situations in certain filesystems as a side effect. One of the purposes now uses the new I_SYNC bit. Also document the various bits and change their order from historical to logical. [bunk@stusta.de: make fs/inode.c:wake_up_inode() static] Signed-off-by: Joern Engel <joern@wohnheim.fh-wedel.de> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: David Chinner <dgc@sgi.com> Cc: Anton Altaparmakov <aia21@cam.ac.uk> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17writeback: introduce writeback_control.more_io to indicate more ioFengguang Wu1-0/+1
After making dirty a 100M file, the normal behavior is to start the writeback for all data after 30s delays. But sometimes the following happens instead: - after 30s: ~4M - after 5s: ~4M - after 5s: all remaining 92M Some analyze shows that the internal io dispatch queues goes like this: s_io s_more_io ------------------------- 1) 100M,1K 0 2) 1K 96M 3) 0 96M 1) initial state with a 100M file and a 1K file 2) 4M written, nr_to_write <= 0, so write more 3) 1K written, nr_to_write > 0, no more writes(BUG) nr_to_write > 0 in (3) fools the upper layer to think that data have all been written out. The big dirty file is actually still sitting in s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io becomes empty, and let the loop in generic_sync_sb_inodes() continue: this may starve newly expired inodes in s_dirty. It is also not an option to draw inodes from both s_more_io and s_dirty, an let the loop go on: this might lead to live locks, and might also starve other superblocks in sync time(well kupdate may still starve some superblocks, that's another bug). We have to return when a full scan of s_io completes. So nr_to_write > 0 does not necessarily mean that "all data are written". This patch introduces a flag writeback_control.more_io to indicate this situation. With it the big dirty file no longer has to wait for the next kupdate invocation 5s later. Cc: David Chinner <dgc@sgi.com> Cc: Ken Chen <kenchen@google.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17writeback: fix ntfs with sb_has_dirty_inodes()Fengguang Wu1-0/+1
NTFS's if-condition on dirty inodes is not complete. Fix it with sb_has_dirty_inodes(). Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Ken Chen <kenchen@google.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17writeback: fix periodic superblock dirty inode flushingKen Chen1-0/+1
Current -mm tree has bucketful of bug fixes in periodic writeback path. However, we still hit a glitch where dirty pages on a given inode aren't completely flushed to the disk, and system will accumulate large amount of dirty pages beyond what dirty_expire_interval is designed for. The problem is __sync_single_inode() will move an inode to sb->s_dirty list even when there are more pending dirty pages on that inode. If there is another inode with a small number of dirty pages, we hit a case where the loop iteration in wb_kupdate() terminates prematurely because wbc.nr_to_write > 0. Thus leaving the inode that has large amount of dirty pages behind and it has to wait for another dirty_writeback_interval before we flush it again. We effectively only write out MAX_WRITEBACK_PAGES every dirty_writeback_interval. If the rate of dirtying is sufficiently high, the system will start accumulate a large number of dirty pages. So fix it by having another sb->s_more_io list on which to park the inode while we iterate through sb->s_io and to allow each dirty inode which resides on that sb to have an equal chance of flushing some amount of dirty pages. Signed-off-by: Ken Chen <kenchen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17printk: add KERN_CONT annotationIngo Molnar1-0/+7
printk: add the KERN_CONT annotation (which is empty string but via which checkpatch.pl can notice that the lacking KERN_ level is fine). This useful for multiple calls of hand-crafted printk output done by early debug code or similar. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17F_DUPFD_CLOEXEC implementationUlrich Drepper1-5/+10
One more small change to extend the availability of creation of file descriptors with FD_CLOEXEC set. Adding a new command to fcntl() requires no new system call and the overall impact on code size if minimal. If this patch gets accepted we will also add this change to the next revision of the POSIX spec. To test the patch, use the following little program. Adjust the value of F_DUPFD_CLOEXEC appropriately. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #ifndef F_DUPFD_CLOEXEC # define F_DUPFD_CLOEXEC 12 #endif int main (int argc, char *argv[]) { if (argc > 1) { if (fcntl (3, F_GETFD) == 0) { puts ("descriptor not closed"); exit (1); } if (errno != EBADF) { puts ("error not EBADF"); exit (1); } exit (0); } int fd = fcntl (STDOUT_FILENO, F_DUPFD_CLOEXEC, 0); if (fd == -1 && errno == EINVAL) { puts ("F_DUPFD_CLOEXEC not supported"); return 0; } if (fd != 3) { puts ("program called with descriptors other than 0,1,2"); return 1; } execl ("/proc/self/exe", "/proc/self/exe", "1", NULL); puts ("execl failed"); return 1; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17task_struct: move ->fpu_counter and ->oomkilladjAlexey Dobriyan1-10/+10
There is nice 2 byte hole after struct task_struct::ioprio field into which we can put two 1-byte fields: ->fpu_counter and ->oomkilladj. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17rename signalfd_siginfo fieldsDavide Libenzi1-16/+16
For Michael Kerrisk request, the following patch renames signalfd_siginfo fields in order to keep them consistent with the siginfo_t ones. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17ext3: remove #ifdef CONFIG_EXT3_INDEXEric Sandeen1-12/+2
CONFIG_EXT3_INDEX is not an exposed config option in the kernel, and it is unconditionally defined in ext3_fs.h. tune2fs is already able to turn off dir indexing, so at this point it's just cluttering up the code. Remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Completely remove deprecated IRQ flags (SA_*)Ahmed S. Darwish1-22/+0
Only very little files use the deprecated SA_* IRQ flags in latest pull. This patch series removes such macros from the tree and transfrom old code to the new IRQF_* flags. I've grepped the whole tree to make sure that no more files than the patched ones use such deprecated macros. I hope this series won't introduce build errors. Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Matthew Wilcox <willy@debian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17change inotifyfs magic as the same magic is used for futexfsAndrey Mirkin1-0/+3
Right now futexfs and inotifyfs have one magic 0xBAD1DEA, that looks a little bit confusing. Use 0xBAD1DEA as magic for futexfs and 0x2BAD1DEA as magic for inotifyfs. Signed-off-by: Andrey Mirkin <major@openvz.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17increase AT_VECTOR_SIZE to terminate saved_auxv properlyOlaf Hering3-3/+9
include/asm-powerpc/elf.h has 6 entries in ARCH_DLINFO. fs/binfmt_elf.c has 14 unconditional NEW_AUX_ENT entries and 2 conditional NEW_AUX_ENT entries. So in the worst case, saved_auxv does not get an AT_NULL entry at the end. The saved_auxv array must be terminated with an AT_NULL entry. Make the size of mm_struct->saved_auxv arch dependend, based on the number of ARCH_DLINFO entries. Signed-off-by: Olaf Hering <olh@suse.de> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Remove unused member from nsproxyPavel Emelyanov2-2/+0
The nslock spinlock is not used in the kernel at all. Remove it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17user.c: #ifdef ->mq_bytesAlexey Dobriyan1-0/+2
For those who deselect POSIX message queues. Reduces SLAB size of user_struct from 64 to 32 bytes here, SLUB size -- from 40 bytes to 32 bytes. [akpm@linux-foundation.org: fix build] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17tty: expose new methods needed for drivers to get termios rightAlan Cox1-0/+3
This adds three new functions (or in one case to be more exact makes it always available) tty_termios_copy_hw Copies all the hardware settings from one termios structure to the other. This is intended for drivers that support little or no hardware setting tty_termios_encode_baud_rate Allows you to set the input and output baud rate in a termios structure. A driver is supposed to set the resulting baud rate from a request so most will want to use this function to set the resulting input and output rates to match the hardware values. Internally it knows about keeping Bxxx encoding when possible to maximise compatibility. tty_encode_baud_rate As above but for the tty's own current termios structure I suspect this will initially need some tweaking as it gets enabled by driver patches over the next few mm cycles so consider this lot -mm only for the moment so it can stabilize and end up neat before it goes to base. I've tried not to break any obscure architectures - if you get a speed you can't represent the code will print warnings on non updated termios systems but not break. Once this is merged and seems sane I've got a growing pile of driver updates to use it - notably for USB serial drivers. [akpm@linux-foundation.org: cleanups] Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17add consts where appropriate in fs/nls/*Denys Vlasenko1-4/+4
Add const modifiers to a few struct nls_table's member pointers in include/linux/nls.h and adds a lot of const's in fs/nls/*.c files. Resulting changes as visible by size: text data bss dec hex filename 113612 481216 2368 597196 91ccc nls.org/built-in.o 593548 3296 288 597132 91c8c nls/built-in.o Apparently compiler managed to optimize code a bit better because of const-ness. No other changes are made. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17shrink_dcache_sb speedupDenis V. Lunev1-0/+12
This patch makes shrink_dcache_sb consistent with dentry pruning policy. On the first pass we iterate over dentry unused list and prepare some dentries for removal. However, since the existing code moves evicted dentries to the beginning of the LRU it can happen that fresh dentries from other superblocks will be inserted *before* our dentries. This can result in significant slowdown of shrink_dcache_sb(). Moreover, for virtual filesystems like unionfs which can call dput() during dentries kill existing code results in O(n^2) complexity. We observed 2 minutes shrink_dcache_sb() with only 35000 dentries. To avoid this effects we propose to isolate sb dentries at the end of LRU list. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrey Mirkin <amirkin@openvz.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Make the pr_*() family of macros in kernel.h completeEmil Medve1-5/+17
Other/Some pr_*() macros are already defined in kernel.h, but pr_err() was defined multiple times in several other places Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Tony Lindgren <tony@atomide.com> Reviewed-by: Satyam Sharma <satyam@infradead.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17KEYS: Make request_key() and co fundamentally asynchronousDavid Howells2-86/+125
Make request_key() and co fundamentally asynchronous to make it easier for NFS to make use of them. There are now accessor functions that do asynchronous constructions, a wait function to wait for construction to complete, and a completion function for the key type to indicate completion of construction. Note that the construction queue is now gone. Instead, keys under construction are linked in to the appropriate keyring in advance, and that anyone encountering one must wait for it to be complete before they can use it. This is done automatically for userspace. The following auxiliary changes are also made: (1) Key type implementation stuff is split from linux/key.h into linux/key-type.h. (2) AF_RXRPC provides a way to allocate null rxrpc-type keys so that AFS does not need to call key_instantiate_and_link() directly. (3) Adjust the debugging macros so that they're -Wformat checked even if they are disabled, and make it so they can be enabled simply by defining __KDEBUG to be consistent with other code of mine. (3) Documentation. [alan@lxorguk.ukuu.org.uk: keys: missing word in documentation] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Mutex documentation is unclear about software interrupts, tasklets and timersMatti Linnanvuori1-1/+2
Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17add CONFIG_VT_UNICODEBill Nottingham1-0/+1
As of now, the kernel defaults to non-unicode and XLATE for the keyboard. We've been changing this in Fedora, but that requires patching the defaults in the kernel. The attached introduces CONFIG_VT_UNICODE, which sets the console in unicode mode by default on boot, including both the virtual terminal and the keyboard driver. Signed-off-by: Bill Nottingham <notting@redhat.com> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17constify string/array kparam tracking structuresJan Beulich1-7/+11
.. in an effort to make read-only whatever can be made, so that CONFIG_DEBUG_RODATA can catch as many issues as possible. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17store __setup_str_* in a more compact wayJan Beulich1-1/+1
__setup_str_* are referenced only during boot, hence there's no need to waste image space for aligning these strings (with the aim of improving performance). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Add a "rounddown_pow_of_two" routine to log2.hRobert P. J. Day1-0/+25
To go along with the existing "roundup_pow_of_two" routine, add one for rounding down since that operation appears to crop up on a regular basis in the source tree. [m.kozlowski@tuxland.pl: fix unbalanced parentheses] Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17quota: send messages via netlinkJan Kara1-0/+31
Implement sending of quota messages via netlink interface. The advantage is that in userspace we can better decide what to do with the message - for example display a dialogue in your X session or just write the message to the console. As a bonus, we can get rid of problems with console locking deep inside filesystem code once we remove the old printing mechanism. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17make kernel/profile.c:time_hook staticAdrian Bunk1-3/+0
{,un}register_timer_hook() is the API that should be used. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17remove include/asm-*/ipc.hAdrian Bunk1-0/+28
All asm/ipc.h files do only #include <asm-generic/ipc.h>. This patch therefore removes all include/asm-*/ipc.h files and moves the contents of include/asm-generic/ipc.h to include/linux/ipc.h. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Drop some headers from mm.hAlexey Dobriyan1-2/+0
mm.h doesn't use directly anything from mutex.h and backing-dev.h, so remove them and add them back to files which need them. Cross-compile tested on many configs and archs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17NBD: allow hung network I/O to be cancelledPaul Clements1-0/+2
Allow NBD I/O to be cancelled when a network outage occurs. Previously, I/O would just hang, and if enough I/O was hung in nbd, the system (at least user-level) would completely hang until a TCP timeout (default, 15 minutes) occurred. The patch introduces a new ioctl NBD_SET_TIMEOUT that allows a transmit timeout value (in seconds) to be specified. Any network send that exceeds the timeout will be cancelled and the nbd connection will be shut down. I've tested with various timeout values and 6 seconds seems to be a good choice for the timeout. If the NBD_SET_TIMEOUT ioctl is not called, you get the old (I/O hang) behavior. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17tty.h: remove dead defineAlan Cox1-5/+0
No longer used. TTY_FLIPBUF_SIZE will also go soon but needs a couple of other cleanups first Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Shrink task_struct if CONFIG_FUTEX=nAlexey Dobriyan1-1/+2
robust_list, compat_robust_list, pi_state_list, pi_state_cache are really used if futexes are on. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17add-vmcore: add a prefix "VMCOREINFO_" to the vmcoreinfo macrosKen'ichi Ohmichi1-7/+7
Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. Old vmcoreinfo macros were defined as generic names SYMBOL/SIZE/OFFSET /LENGTH/CONFIG, and it is impossible to grep for them. So these names should be changed. This discussion is the following: http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17add-vmcore: add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_dataKen'ichi Ohmichi1-0/+5
[2/3] Add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_data. The dump filetering command 'makedumpfile'(v1.1.6 or before) had assumed the above values, and it was not good from the reliability viewpoint. So makedumpfile v1.2.0 came to need these values and I created the patch to let the kernel output them. makedumpfile site: https://sourceforge.net/projects/makedumpfile/ Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17add-vmcore: cleanup the coding style according to Andrew's commentsKen'ichi Ohmichi1-7/+9
[1/3] Cleanup the coding style according to Andrew's comments: http://lists.infradead.org/pipermail/kexec/2007-August/000522.html - vmcoreinfo_append_str() should have suitable __attribute__s so that the compiler can check its use. - vmcoreinfo_max_size should have size_t. - Use get_seconds() instead of xtime.tv_sec. - Use init_uts_ns.name.release instead of UTS_RELEASE. Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Add vmcoreinfoKen'ichi Ohmichi1-0/+26
This patch set frees the restriction that makedumpfile users should install a vmlinux file (including the debugging information) into each system. makedumpfile command is the dump filtering feature for kdump. It creates a small dumpfile by filtering unnecessary pages for the analysis. To distinguish unnecessary pages, it needs a vmlinux file including the debugging information. These days, the debugging package becomes a huge file, and it is hard to install it into each system. To solve the problem, kdump developers discussed it at lkml and kexec-ml. As the result, we reached the conclusion that necessary information for dump filtering (called "vmcoreinfo") should be embedded into the first kernel file and it should be accessed through /proc/vmcore during the second kernel. (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html) Dan Aloni created the patch set for the above implementation. (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html) And I updated it for multi architectures and memory models. (http://lists.infradead.org/pipermail/kexec/2007-August/000479.html) Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Fix f_version type: should be u64 instead of unsigned longMathieu Desnoyers2-2/+2
Fix f_version type: should be u64 instead of long There is a type inconsistency between struct inode i_version and struct file f_version. fs.h: struct inode u64 i_version; and struct file unsigned long f_version; Users do: fs/ext3/dir.c: if (filp->f_version != inode->i_version) { So why isn't f_version a u64 ? It becomes a problem if versions gets higher than 2^32 and we are on an architecture where longs are 32 bits. This patch changes the f_version type to u64, and updates the users accordingly. It applies to 2.6.23-rc2-mm2. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Martin Bligh <mbligh@google.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: <linux-ext4@vger.kernel.org> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17make fs/libfs.c:simple_commit_write() staticAdrian Bunk1-2/+0
simple_commit_write() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17kernel/time/timekeeping.c: cleanupsAdrian Bunk1-1/+1
- remove the no longer required __attribute__((weak)) of xtime_lock - remove the following no longer used EXPORT_SYMBOL's: - xtime - xtime_lock Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Shrink struct task_struct::oomkilladjAlexey Dobriyan1-1/+1
oomkilladj is int, but values which can be assigned to it are -17, [-16, 15], thus fitting into s8. While patch itself doesn't help in making task_struct smaller, because of natural alignment of ->link_count, it will make picture clearer wrt futher task_struct reduction patches. My plan is to move ->fpu_counter and ->oomkilladj after ->ioprio filling hole on i386 and x86_64. But that's for later, because bloated distro configs need looking at as well. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17cramfs: error message about endianessAndi Drebes1-0/+1
The README file in the cramfs subdirectory says: "All data is currently in host-endian format; neither mkcramfs nor the kernel ever do swabbing." If somebody tries to mount a cramfs with the wrong endianess, cramfs only complains about a wrong magic but doesn't inform the user that only the endianess isn't right. The following patch adds an error message to the cramfs sources. If a user tries to mount a cramfs with the wrong endianess using the patched sources, cramfs will display the message "cramfs: wrong endianess". Signed-off-by: Andi Drebes <lists-receive@programmierforen.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17include linux/types.h in if_fddi.hOlaf Hering1-0/+2
include/linux/if_fddi.h is an exported header. It uses __be16. Include linux/types.h to get this prototype. Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17remove consolemap.h from header exportsOlaf Hering1-1/+0
Remove linux/consolemap.h from make headers_install It contains no user interfaces. The defines in this file are used only for kernel internal state. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17unicode diacritics supportSamuel Thibault3-1/+12
There have been issues with non-latin1 diacritics and unicode. http://bugzilla.kernel.org/show_bug.cgi?id=7746 Git 759448f459234bfcf34b82471f0dba77a9aca498 `Kernel utf-8 handling' partly resolved it by adding conversion between diacritics and unicode. The patch below goes further by just turning diacritics into unicode, hence providing better future support. The kbd support can be fetched from http://bugzilla.kernel.org/attachment.cgi?id=12313 This was tested in all of latin1, latin9, latin2 and unicode with french and czech dead keys. Turn the kernel accent_table into unicode, and extend ioctls KDGKBDIACR and KDSKBDIACR into their equivalents KDGKBDIACRUC and KDSKBDIACR. New function int conv_uni_to_8bit(u32 uni) for converting unicode into 8bit _input_. No, we don't want to store the translation, as it is potentially sparse and large. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Jan Engelhardt <jengelh@gmx.de> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>