aboutsummaryrefslogtreecommitdiffstats
path: root/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2009-04-01Merge branch 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds6-42/+89
* 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (58 commits) SUNRPC: Ensure IPV6_V6ONLY is set on the socket before binding to a port NSM: Fix unaligned accesses in nsm_init_private() NFS: Simplify logic to compare socket addresses in client.c NFS: Start PF_INET6 callback listener only if IPv6 support is available lockd: Start PF_INET6 listener only if IPv6 support is available SUNRPC: Remove CONFIG_SUNRPC_REGISTER_V4 SUNRPC: rpcb_register() should handle errors silently SUNRPC: Simplify kernel RPC service registration SUNRPC: Simplify svc_unregister() SUNRPC: Allow callers to pass rpcb_v4_register a NULL address SUNRPC: rpcbind actually interprets r_owner string SUNRPC: Clean up address type casts in rpcb_v4_register() SUNRPC: Don't return EPROTONOSUPPORT in svc_register()'s helpers SUNRPC: Use IPv4 loopback for registering AF_INET6 kernel RPC services SUNRPC: Set IPV6ONLY flag on PF_INET6 RPC listener sockets NFS: Revert creation of IPv6 listeners for lockd and NFSv4 callbacks SUNRPC: Remove @family argument from svc_create() and svc_create_pooled() SUNRPC: Change svc_create_xprt() to take a @family argument SUNRPC: svc_setup_socket() gets protocol family from socket SUNRPC: Pass a family argument to svc_register() ...
2009-04-01Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds1-0/+6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits) ext4: Regularize mount options ext4: fix locking typo in mballoc which could cause soft lockup hangs ext4: fix typo which causes a memory leak on error path jbd2: Update locking coments ext4: Rename pa_linear to pa_type ext4: add checks of block references for non-extent inodes ext4: Check for an valid i_mode when reading the inode from disk ext4: Use WRITE_SYNC for commits which are caused by fsync() ext4: Add auto_da_alloc mount option ext4: Use struct flex_groups to calculate get_orlov_stats() ext4: Use atomic_t's in struct flex_groups ext4: remove /proc tuning knobs ext4: Add sysfs support ext4: Track lifetime disk writes ext4: Fix discard of inode prealloc space with delayed allocation. ext4: Automatically allocate delay allocated blocks on rename ext4: Automatically allocate delay allocated blocks on close ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl ext4: Simplify delalloc code by removing mpage_da_writepages() ext4: Save stack space by removing fake buffer heads ...
2009-04-01Merge branch 'devel' into for-linusTrond Myklebust6-42/+89
2009-04-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6Linus Torvalds1-29/+27
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (59 commits) ide-floppy: do not complete rq's prematurely ide: be able to build pmac driver without IDE built-in ide-pmac: IDE cable detection on Apple PowerBook ide: inline SELECT_DRIVE() ide: turn selectproc() method into dev_select() method (take 5) MAINTAINERS: move old ide-{floppy,tape} entries to CREDITS (take 2) ide: move data register access out of tf_{read|load}() methods (take 2) ide: call {in|out}put_data() methods from tf_{read|load}() methods (take 2) ide-io-std: shorten ide_{in|out}put_data() ide: rename IDE_TFLAG_IN_[HOB_]FEATURE ide: turn set_irq() method into write_devctl() method ide: use ATA_HOB ide-disk: use ATA_ERR ide: add support for CFA specified transfer modes (take 3) ide-iops: only clear DMA words on setting DMA mode ide: identify data word 53 bit 1 doesn't cover words 62 and 63 (take 3) au1xxx-ide: auide_{in|out}sw() should be static ide-floppy: use ide_pio_bytes() ide-{floppy,tape}: fix padding for PIO transfers ide: remove CONFIG_BLK_DEV_IDEDOUBLER config option ...
2009-04-01Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6Linus Torvalds7-84/+165
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits) PCI: fix HT MSI mapping fix PCI: don't enable too much HT MSI mapping x86/PCI: make pci=lastbus=255 work when acpi is on PCI: save and restore PCIe 2.0 registers PCI: update fakephp for bus_id removal PCI: fix kernel oops on bridge removal PCI: fix conflict between SR-IOV and config space sizing powerpc/PCI: include pci.h in powerpc MSI implementation PCI Hotplug: schedule fakephp for feature removal PCI Hotplug: rename legacy_fakephp to fakephp PCI Hotplug: restore fakephp interface with complete reimplementation PCI: Introduce /sys/bus/pci/devices/.../rescan PCI: Introduce /sys/bus/pci/devices/.../remove PCI: Introduce /sys/bus/pci/rescan PCI: Introduce pci_rescan_bus() PCI: do not enable bridges more than once PCI: do not initialize bridges more than once PCI: always scan child buses PCI: pci_scan_slot() returns newly found devices PCI: don't scan existing devices ... Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
2009-04-01fbdev: update s1d13xxxfb to differ between revisions and production idsKristoffer Ericson1-6/+10
The s1d13xxx chip provides two values of identification value: the Production id (e.g 13506/13505/13806..) and a revision number 0,1,2,3). Together these can help us to differentiate between similiar setups. This patch adds the proper way of grabbing both those values and save them for future reference (in order to decide what functions a card supports, e.g acceleration). We also move away from the concept of all s1d13xxx = s1d13806 when we really support alot more. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: simplify s1d13xxxfb_probe()] Signed-off-by: Kristoffer Ericson <kristoffer.ericson@gmail.com Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01fbdev: newport: newport_*wait() return 0 on timeoutRoel Kluin1-2/+2
With a postfix decrement t reaches -1 on timeout which results in a return of 0. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01fbdev: uninline lock_fb_info()Andrew Morton1-9/+1
Before: text data bss dec hex filename 3648 2910 32 6590 19be drivers/video/backlight/backlight.o 3226 2812 32 6070 17b6 drivers/video/backlight/lcd.o 30990 16688 8480 56158 db5e drivers/video/console/fbcon.o 15488 8400 24 23912 5d68 drivers/video/fbmem.o After: text data bss dec hex filename 3537 2870 32 6439 1927 drivers/video/backlight/backlight.o 3131 2772 32 5935 172f drivers/video/backlight/lcd.o 30876 16648 8480 56004 dac4 drivers/video/console/fbcon.o 15506 8400 24 23930 5d7a drivers/video/fbmem.o Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01cirrusfb: add accelerator constantKrzysztof Helt1-0/+1
Add an accelerator constant so almost all Cirrus are recognized as accelerators by the fbset command. Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01cirrusfb: Laguna chipset 8bpp fixKrzysztof Helt1-1/+0
Fix 8bpp mode by adding handling of the Laguna chipsets to various places and stop trashing a HDR register which probably does not exist on the Laguna. Fix compilation warnings about uninitialized variables also. Finally, all 8bpp, 16bpp and 32bpp modes work on the Laguna chipset. Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01cirrusfb: add Laguna additional overflow registerKrzysztof Helt1-0/+1
Add additional overflow register setting for Laguna chips. Also, simplify some code in the cirrusfb_pan_display() and cirrusfb_blank(). Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01atyfb: fix header file trailing whitespaceRandy Dunlap2-283/+283
Fix trailing whitespace because quilt complained about it. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01rtc: convert LEAP_YEAR into an inlineAndrew Morton1-0/+6
- the LEAP_YEAR macro is buggy - it references its arg multiple times. Fix this by turning it into a C function. - give it a more approriate name - Move it to rtc.h so that other .c files can use it, instead of copying it. Cc: dann frazier <dannf@hp.com> Acked-by: Alessandro Zummo <alessandro.zummo@towertech.it> Cc: stephane eranian <eranian@googlemail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01autofs4: fix kernel includesIan Kent2-3/+10
autofs_dev-ioctl.h is included by both the kernel module and user space tools and it includes two kernel header files. Compiles work if the kernel headers are installed but fail otherwise. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01spi_mpc83xx: add OF platform driver bindingsAnton Vorontsov1-1/+1
Implement full support for OF SPI bindings. Now the driver can manage its own chip selects without any help from the board files and/or fsl_soc constructors. The "legacy" code is well isolated and could be removed as time goes by. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: David Brownell <david-b@pacbell.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01spi_mpc83xx: rework chip selects handlingAnton Vorontsov1-2/+3
The main purpose of this patch is to pass 'struct spi_device' to the chip select handling routines. This is needed so that we could implement full-fledged OpenFirmware support for this driver. While at it, also: - Replace two {de,activate}_cs routines by single cs_contol(). - Don't duplicate platform data callbacks in mpc83xx_spi struct. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: David Brownell <david-b@pacbell.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01epoll keyed wakeups: introduce new *_poll() wakeup macrosDavide Libenzi1-13/+9
Introduce new wakeup macros that allow passing an event mask to the wakeup targets. They exactly mimic their non-_poll() counterpart, with the added event mask passing capability. I did add only the ones currently requested, avoiding the _nr() and _all() for the moment. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: William Lee Irwin III <wli@movementarian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key()Davide Libenzi1-2/+5
This patchset introduces wakeup hints for some of the most popular (from epoll POV) devices, so that epoll code can avoid spurious wakeups on its waiters. The problem with epoll is that the callback-based wakeups do not, ATM, carry any information about the events the wakeup is related to. So the only choice epoll has (not being able to call f_op->poll() from inside the callback), is to add the file* to a ready-list and resolve the real events later on, at epoll_wait() (or its own f_op->poll()) time. This can cause spurious wakeups, since the wake_up() itself might be for an event the caller is not interested into. The rate of these spurious wakeup can be pretty high in case of many network sockets being monitored. By allowing devices to report the events the wakeups refer to (at least the two major classes - POLLIN/POLLOUT), we are able to spare useless wakeups by proper handling inside the epoll's poll callback. Epoll will have in any case to call f_op->poll() on the file* later on, since the change to be done in order to have the full event set sent via wakeup, is too invasive for the way our f_op->poll() system works (the full event set is calculated inside the poll function - there are too many of them to even start thinking the change - also poll/select would need change too). Epoll is changed in a way that both devices which send event hints, and the ones that don't, are correctly handled. The former will gain some efficiency though. As a general rule for devices, would be to add an event mask by using key-aware wakeup macros, when making up poll wait queues. I tested it (together with the epoll's poll fix patch Andrew has in -mm) and wakeups for the supported devices are correctly filtered. Test program available here: http://www.xmailserver.org/epoll_test.c This patch: Nothing revolutionary here. Just using the available "key" that our wakeup core already support. The __wake_up_locked_key() was no brainer, since both __wake_up_locked() and __wake_up_locked_key() are thin wrappers around __wake_up_common(). The __wake_up_sync() function had a body, so the choice was between borrowing the body for __wake_up_sync_key() and calling it from __wake_up_sync(), or make an inline and calling it from both. I chose the former since in most archs it all resolves to "mov $0, REG; jmp ADDR". Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: William Lee Irwin III <wli@movementarian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01eventfd: improve support for semaphore-like behaviorDavide Libenzi1-1/+11
People started using eventfd in a semaphore-like way where before they were using pipes. That is, counter-based resource access. Where a "wait()" returns immediately by decrementing the counter by one, if counter is greater than zero. Otherwise will wait. And where a "post(count)" will add count to the counter releasing the appropriate amount of waiters. If eventfd the "post" (write) part is fine, while the "wait" (read) does not dequeue 1, but the whole counter value. The problem with eventfd is that a read() on the fd returns and wipes the whole counter, making the use of it as semaphore a little bit more cumbersome. You can do a read() followed by a write() of COUNTER-1, but IMO it's pretty easy and cheap to make this work w/out extra steps. This patch introduces a new eventfd flag that tells eventfd to only dequeue 1 from the counter, allowing simple read/write to make it behave like a semaphore. Simple test here: http://www.xmailserver.org/eventfd-sem.c To be back-compatible with earlier kernels, userspace applications should probe for the availability of this feature via #ifdef EFD_SEMAPHORE fd = eventfd2 (CNT, EFD_SEMAPHORE); if (fd == -1 && errno == EINVAL) <fallback> #else <fallback> #endif Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: <linux-api@vger.kernel.org> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01introduce pr_cont() macroCyrill Gorcunov1-0/+2
We cover all log-levels by pr_... macros except KERN_CONT one. Add it for convenience. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01remove unused include/asm-generic/dma-mapping.hFUJITA Tomonori1-308/+0
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01filesystem freeze: allow SysRq emergency thaw to thaw frozen filesystemsEric Sandeen1-0/+1
Now that the filesystem freeze operation has been elevated to the VFS, and is just an ioctl away, some sort of safety net for unintentionally frozen root filesystems may be in order. The timeout thaw originally proposed did not get merged, but perhaps something like this would be useful in emergencies. For example, freeze /path/to/mountpoint may freeze your root filesystem if you forgot that you had that unmounted. I chose 'j' as the last remaining character other than 'h' which is sort of reserved for help (because help is generated on any unknown character). I've tested this on a non-root fs with multiple (nested) freezers, as well as on a system rendered unresponsive due to a frozen root fs. [randy.dunlap@oracle.com: emergency thaw only if CONFIG_BLOCK enabled] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Takashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01loop: add ioctl to resize a loop deviceJ. R. Okajima1-0/+1
Add the ability to 'resize' the loop device on the fly. One practical application is a loop file with XFS filesystem, already mounted: You can easily enlarge the file (append some bytes) and then call ioctl(fd, LOOP_SET_CAPACITY, new); The loop driver will learn about the new size and you can use xfs_growfs later on, which will allow you to use full capacity of the loop file without the need to unmount. Test app: #include <linux/fs.h> #include <linux/loop.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <sys/types.h> #include <assert.h> #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #define _GNU_SOURCE #include <getopt.h> char *me; void usage(FILE *f) { fprintf(f, "%s [options] loop_dev [backend_file]\n" "-s, --set new_size_in_bytes\n" "\twhen backend_file is given, " "it will be expanded too while keeping the original contents\n", me); } struct option opts[] = { { .name = "set", .has_arg = 1, .flag = NULL, .val = 's' }, { .name = "help", .has_arg = 0, .flag = NULL, .val = 'h' } }; void err_size(char *name, __u64 old) { fprintf(stderr, "size must be larger than current %s (%llu)\n", name, old); } int main(int argc, char *argv[]) { int fd, err, c, i, bfd; ssize_t ssz; size_t sz; __u64 old, new, append; char a[BUFSIZ]; struct stat st; FILE *out; char *backend, *dev; err = EINVAL; out = stderr; me = argv[0]; new = 0; while ((c = getopt_long(argc, argv, "s:h", opts, &i)) != -1) { switch (c) { case 's': errno = 0; new = strtoull(optarg, NULL, 0); if (errno) { err = errno; perror(argv[i]); goto out; } break; case 'h': err = 0; out = stdout; goto err; default: perror(argv[i]); goto err; } } if (optind < argc) dev = argv[optind++]; else goto err; fd = open(dev, O_RDONLY); if (fd < 0) { err = errno; perror(dev); goto out; } err = ioctl(fd, BLKGETSIZE64, &old); if (err) { err = errno; perror("ioctl BLKGETSIZE64"); goto out; } if (!new) { printf("%llu\n", old); goto out; } if (new < old) { err = EINVAL; err_size(dev, old); goto out; } if (optind < argc) { backend = argv[optind++]; bfd = open(backend, O_WRONLY|O_APPEND); if (bfd < 0) { err = errno; perror(backend); goto out; } err = fstat(bfd, &st); if (err) { err = errno; perror(backend); goto out; } if (new < st.st_size) { err = EINVAL; err_size(backend, st.st_size); goto out; } append = new - st.st_size; sz = sizeof(a); while (append > 0) { if (append < sz) sz = append; ssz = write(bfd, a, sz); if (ssz != sz) { err = errno; perror(backend); goto out; } append -= sz; } err = fsync(bfd); if (err) { err = errno; perror(backend); goto out; } } err = ioctl(fd, LOOP_SET_CAPACITY, new); if (err) { err = errno; perror("ioctl LOOP_SET_CAPACITY"); } goto out; err: usage(out); out: return err; } Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Tomas Matejicek <tomas@slax.org> Cc: <util-linux-ng@vger.kernel.org> Cc: Karel Zak <kzak@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01pm: rework includes, remove arch ifdefsMagnus Damm1-3/+0
Make the following header file changes: - remove arch ifdefs and asm/suspend.h from linux/suspend.h - add asm/suspend.h to disk.c (for arch_prepare_suspend()) - add linux/io.h to swsusp.c (for ioremap()) - x86 32/64 bit compile fixes Signed-off-by: Magnus Damm <damm@igel.co.jp> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01shmem: writepage directly to swapHugh Dickins1-0/+5
Synopsis: if shmem_writepage calls swap_writepage directly, most shmem swap loads benefit, and a catastrophic interaction between SLUB and some flash storage is avoided. shmem_writepage() has always been peculiar in making no attempt to write: it has just transferred a shmem page from file cache to swap cache, then let that page make its way around the LRU again before being written and freed. The idea was that people use tmpfs because they want those pages to stay in RAM; so although we give it an overflow to swap, we should resist writing too soon, giving those pages a second chance before they can be reclaimed. That was always questionable, and I've toyed with this patch for years; but never had a clear justification to depart from the original design. It became more questionable in 2.6.28, when the split LRU patches classed shmem and tmpfs pages as SwapBacked rather than as file_cache: that in itself gives them more resistance to reclaim than normal file pages. I prepared this patch for 2.6.29, but the merge window arrived before I'd completed gathering statistics to justify sending it in. Then while comparing SLQB against SLUB, running SLUB on a laptop I'd habitually used with SLAB, I found SLUB to run my tmpfs kbuild swapping tests five times slower than SLAB or SLQB - other machines slower too, but nowhere near so bad. Simpler "cp -a" swapping tests showed the same. slub_max_order=0 brings sanity to all, but heavy swapping is too far from normal to justify such a tuning. The crucial factor on that laptop turns out to be that I'm using an SD card for swap. What happens is this: By default, SLUB uses order-2 pages for shmem_inode_cache (and many other fs inodes), so creating tmpfs files under memory pressure brings lumpy reclaim into play. One subpage of the order is chosen from the bottom of the LRU as usual, then the other three picked out from their random positions on the LRUs. In a tmpfs load, many of these pages will be ones which already passed through shmem_writepage, so already have swap allocated. And though their offsets on swap were probably allocated sequentially, now that the pages are picked off at random, their swap offsets are scattered. But the flash storage on the SD card is very sensitive to having its writes merged: once swap is written at scattered offsets, performance falls apart. Rotating disk seeks increase too, but less disastrously. So: stop giving shmem/tmpfs pages a second pass around the LRU, write them out to swap as soon as their swap has been allocated. It's surely possible to devise an artificial load which runs faster the old way, one whose sizing is such that the tmpfs pages on their second pass are the ones that are wanted again, and other pages not. But I've not yet found such a load: on all machines, under the loads I've tried, immediate swap_writepage speeds up shmem swapping: especially when using the SLUB allocator (and more effectively than slub_max_order=0), but also with the others; and it also reduces the variance between runs. How much faster varies widely: a factor of five is rare, 5% is common. One load which might have suffered: imagine a swapping shmem load in a limited mem_cgroup on a machine with plenty of memory. Before 2.6.29 the swapcache was not charged, and such a load would have run quickest with the shmem swapcache never written to swap. But now swapcache is charged, so even this load benefits from shmem_writepage directly to swap. Apologies for the #ifndef CONFIG_SWAP swap_writepage() stub in swap.h: it's silly because that will never get called; but refactoring shmem.c sensibly according to CONFIG_SWAP will be a separate task. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01vmscan: fix it to take care of nodemaskKAMEZAWA Hiroyuki1-1/+1
try_to_free_pages() is used for the direct reclaim of up to SWAP_CLUSTER_MAX pages when watermarks are low. The caller to alloc_pages_nodemask() can specify a nodemask of nodes that are allowed to be used but this is not passed to try_to_free_pages(). This can lead to unnecessary reclaim of pages that are unusable by the caller and int the worst case lead to allocation failure as progress was not been make where it is needed. This patch passes the nodemask used for alloc_pages_nodemask() to try_to_free_pages(). Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01nommu: there is no mlock() for NOMMU, so don't provide the bitsDavid Howells1-7/+13
The mlock() facility does not exist for NOMMU since all mappings are effectively locked anyway, so we don't make the bits available when they're not useful. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Enrik Berkhan <Enrik.Berkhan@ge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: use debug_kmap_atomicAkinobu Mita2-0/+4
Use debug_kmap_atomic in kmap_atomic, kmap_atomic_pfn, and iomap_atomic_prot_pfn. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: introduce debug_kmap_atomicAkinobu Mita1-0/+12
x86 has debug_kmap_atomic_prot() which is error checking function for kmap_atomic. It is usefull for the other architectures, although it needs CONFIG_TRACE_IRQFLAGS_SUPPORT. This patch exposes it to the other architectures. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: page_mkwrite change prototype to match faultNick Piggin2-2/+3
Change the page_mkwrite prototype to take a struct vm_fault, and return VM_FAULT_xxx flags. There should be no functional change. This makes it possible to return much more detailed error information to the VM (and also can provide more information eg. virtual_address to the driver, which might be important in some special cases). This is required for a subsequent fix. And will also make it easier to merge page_mkwrite() with fault() in future. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: enable hashdist by default on 64bit NUMAAnton Blanchard1-3/+3
On PowerPC we allocate large boot time hashes on node 0. This leads to an imbalance in the free memory, for example on a 64GB box (4 x 16GB nodes): Free memory: Node 0: 97.03% Node 1: 98.54% Node 2: 98.42% Node 3: 98.53% If we switch to using vmalloc (like ia64 and x86-64) things are more balanced: Free memory: Node 0: 97.53% Node 1: 98.35% Node 2: 98.33% Node 3: 98.33% For many HPC applications we are limited by the free available memory on the smallest node, so even though the same amount of memory is used the better balancing helps. Since all 64bit NUMA capable architectures should have sufficient vmalloc space, it makes sense to enable it via CONFIG_64BIT. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: fix proc_dointvec_userhz_jiffies "breakage"Alexey Dobriyan1-2/+2
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9838 On i386, HZ=1000, jiffies_to_clock_t() converts time in a somewhat strange way from the user's point of view: # echo 500 >/proc/sys/vm/dirty_writeback_centisecs # cat /proc/sys/vm/dirty_writeback_centisecs 499 So, we have 5000 jiffies converted to only 499 clock ticks and reported back. TICK_NSEC = 999848 ACTHZ = 256039 Keeping in-kernel variable in units passed from userspace will fix issue of course, but this probably won't be right for every sysctl. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01generic debug pageallocAkinobu Mita3-0/+37
CONFIG_DEBUG_PAGEALLOC is now supported by x86, powerpc, sparc64, and s390. This patch implements it for the rest of the architectures by filling the pages with poison byte patterns after free_pages() and verifying the poison patterns before alloc_pages(). This generic one cannot detect invalid page accesses immediately but invalid read access may cause invalid dereference by poisoned memory and invalid write access can be detected after a long delay. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01memdup_user(): introduceLi Zefan1-0/+1
I notice there are many places doing copy_from_user() which follows kmalloc(): dst = kmalloc(len, GFP_KERNEL); if (!dst) return -ENOMEM; if (copy_from_user(dst, src, len)) { kfree(dst); return -EFAULT } memdup_user() is a wrapper of the above code. With this new function, we don't have to write 'len' twice, which can lead to typos/mistakes. It also produces smaller code and kernel text. A quick grep shows 250+ places where memdup_user() *may* be used. I'll prepare a patchset to do this conversion. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: remove pagevec_swap_free()KOSAKI Motohiro1-1/+0
pagevec_swap_free() is now unused. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01vfs: add/use account_page_dirtied()Edward Shishkin1-0/+1
Add a helper function account_page_dirtied(). Use that from two callsites. reiser4 adds a function which adds a third callsite. Signed-off-by: Edward Shishkin<edward.shishkin@gmail.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: introduce for_each_populated_zone() macroKOSAKI Motohiro1-0/+8
Impact: cleanup In almost cases, for_each_zone() is used with populated_zone(). It's because almost function doesn't need memoryless node information. Therefore, for_each_populated_zone() can help to make code simplify. This patch has no functional change. [akpm@linux-foundation.org: small cleanup] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01get_mm_hiwater_xxx: trivial, s/define/inline/Oleg Nesterov1-2/+9
Andrew pointed out get_mm_hiwater_xxx() evaluate "mm" argument thrice/twice, make them inline. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01proc tty: remove struct tty_operations::read_procAlexey Dobriyan1-2/+0
struct tty_operations::proc_fops took it's place and there is one less create_proc_read_entry() user now! Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01proc tty: add struct tty_operations::proc_fopsAlexey Dobriyan1-0/+1
Used for gradual switch of TTY drivers from using ->read_proc which helps with gradual switch from ->read_proc for the whole tree. As side effect, fix possible race condition when ->data initialized after PDE is hooked into proc tree. ->proc_fops takes precedence over ->read_proc. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-31ide: inline SELECT_DRIVE()Sergei Shtylyov1-1/+0
Since SELECT_DRIVE() has boiled down to a mere dev_select() method call, it now makes sense to just inline it... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-03-31ide: turn selectproc() method into dev_select() method (take 5)Sergei Shtylyov1-3/+3
Turn selectproc() method into dev_select() method by teaching it to write to the device register and moving it from 'struct ide_port_ops' to 'struct ide_tp_ops'. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: benh@kernel.crashing.org Cc: petkovbb@gmail.com [bart: add ->dev_select to at91_ide.c and tx4939.c (__BIG_ENDIAN case)] Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-03-31ide: rename IDE_TFLAG_IN_[HOB_]FEATURESergei Shtylyov1-4/+8
The feature register has never been readable -- when its location is read, one gets the error register value; hence rename IDE_TFLAG_IN_[HOB_]FEATURE into IDE_TFLAG_IN_[HOB_]ERROR and introduce the 'hob_error' field into the 'struct ide_taskfile' (despite the error register not really depending on the HOB bit). Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-03-31ide: turn set_irq() method into write_devctl() methodSergei Shtylyov1-4/+2
Turn set_irq() method with its software reset hack into write_devctl() method (for just writing a value into the device control register) at last... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-03-31ide-floppy: use ide_pio_bytes()Bartlomiej Zolnierkiewicz1-5/+0
* Fix ide_init_sg_cmd() setup for non-fs requests. * Convert ide_pc_intr() to use ide_pio_bytes() for floppy media. * Remove no longer needed ide_io_buffers() and sg/sg_cnt fields from struct ide_atapi_pc. * Remove partial completions; kill idefloppy_update_buffers(), as a result. * Add some more debugging statements. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-03-31ide: decrease size of ->pc_buf field in struct ide_atapi_pcBartlomiej Zolnierkiewicz1-1/+1
struct ide_atapi_pc is often allocated on the stack and size of ->pc_buf size is 256 bytes. However since only ide_floppy_create_read_capacity_cmd() and idetape_create_inquiry_cmd() require such size allocate buffers for these pc-s explicitely and decrease ->pc_buf size to 64 bytes. Cc: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-03-31ide: sanitize ide_build_sglist() and ide_destroy_dmatable()Bartlomiej Zolnierkiewicz1-3/+3
* Move ide_map_sg() calls out from ide_build_sglist() to ide_dma_prepare(). * Pass command to ide_destroy_dmatable(). * Rename ide_build_sglist() to ide_dma_map_sg() and ide_destroy_dmatable() to ide_dma_unmap_sg(). There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-03-31ide: add ->dma_check methodBartlomiej Zolnierkiewicz1-0/+1
* Add (an optional) ->dma_check method for checking if DMA can be used for a given command and fail DMA setup in ide_dma_prepare() if necessary. * Convert alim15x3 and trm290 host drivers to use ->dma_check. * Rename ali15x3_dma_setup() to ali_dma_check() while at it. There should be no functional changes caused by this patch. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-03-31ide: add ide_dma_prepare() helperBartlomiej Zolnierkiewicz1-3/+4
* Add ide_dma_prepare() helper. * Convert ide_issue_pc() and do_rw_taskfile() to use it. * Make ide_build_sglist() static. There should be no functional changes caused by this patch. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-03-31ide: destroy DMA mappings after ending DMA (v2)Bartlomiej Zolnierkiewicz1-0/+1
Move ide_destroy_dmatable() call out from ->dma_end method to {ide_pc,cdrom_newpc,ide_dma}_intr(), ide_dma_timeout_retry() and sgiioc4_resetproc(). This causes minor/safe behavior changes w.r.t.: * cmd64x.c::cmd64{8,x}_dma_end() * cs5536.c::cs5536_dma_end() * icside.c::icside_dma_end() * it821x.c::it821x_dma_end() * scc_pata.c::__scc_dma_end() * sl82c105.c::sl82c105_dma_end() * tx4939ide.c::tx4939ide_dma_end() v2: * Fix build for CONFIG_BLK_DEV_IDEDMA=n (reported by Randy Dunlap). Cc: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>