aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2012-10-06idr: rename MAX_LEVEL to MAX_IDR_LEVELFengguang Wu1-1/+1
To avoid name conflicts: drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined While at it, also make the other names more consistent and add parentheses. [akpm@linux-foundation.org: repair fallout] [sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: walter harms <wharms@bfs.de> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-05ext4: fix ext4_flush_completed_IO wait semanticsDmitry Monakhov6-13/+19
BUG #1) All places where we call ext4_flush_completed_IO are broken because buffered io and DIO/AIO goes through three stages 1) submitted io, 2) completed io (in i_completed_io_list) conversion pended 3) finished io (conversion done) And by calling ext4_flush_completed_IO we will flush only requests which were in (2) stage, which is wrong because: 1) punch_hole and truncate _must_ wait for all outstanding unwritten io regardless to it's state. 2) fsync and nolock_dio_read should also wait because there is a time window between end_page_writeback() and ext4_add_complete_io() As result integrity fsync is broken in case of buffered write to fallocated region: fsync blkdev_completion ->filemap_write_and_wait_range ->ext4_end_bio ->end_page_writeback <-- filemap_write_and_wait_range return ->ext4_flush_completed_IO sees empty i_completed_io_list but pended conversion still exist ->ext4_add_complete_io BUG #2) Race window becomes wider due to the 'ext4: completed_io locking cleanup V4' patch series This patch make following changes: 1) ext4_flush_completed_io() now first try to flush completed io and when wait for any outstanding unwritten io via ext4_unwritten_wait() 2) Rename function to more appropriate name. 3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to prevent endless wait Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2012-10-04Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds6-51/+136
Pull ext3 & udf fixes from Jan Kara: "Shortlog pretty much says it all. The interesting bits are UDF support for direct IO and ext3 fix for a long standing oops in data=journal mode." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: jbd: Fix assertion failure in commit code due to lacking transaction credits UDF: Add support for O_DIRECT ext3: Replace 0 with NULL for pointer in super.c file udf: add writepages support for udf ext3: don't clear orphan list on ro mount with errors reiserfs: Make reiserfs_xattr_handlers static
2012-10-03ore: signedness bug in _sp2d_min_pg()Dan Carpenter1-1/+1
This for loop doesn't work correctly when "p" is unsigned. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-10-03ceph: avoid 32-bit page index overflowAlex Elder1-6/+5
A pgoff_t is defined (by default) to have type (unsigned long). On architectures such as i686 that's a 32-bit type. The ceph address space code was attempting to produce 64 bit offsets by shifting a page's index by PAGE_CACHE_SHIFT, but the result was not what was desired because the shift occurred before the result got promoted to 64 bits. Fix this by converting all uses of page->index used in this way to use the page_offset() macro, which ensures the 64-bit result has the intended value. This fixes http://tracker.newdream.net/issues/3112 Reported-by: Mohamed Pakkeer <pakkeer.mohideen@realimage.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-03ceph: return EIO on invalid layout on GET_DATALOC ioctlSage Weil1-2/+6
If the user calls GET_DATALOC on a file with an invalid (e.g., zeroed) layout, return EIO to userland. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-10-03Merge tag 'jfs-3.7' of git://github.com/kleikamp/linux-shaggyLinus Torvalds10-27/+373
Pull JFS update from Dave Kleikamp: "JFS TRIM support and some minor fixes" * tag 'jfs-3.7' of git://github.com/kleikamp/linux-shaggy: jfs: Fix do_div precision in commit b40c2e66 JFS: use list_move instead of list_del/list_add jfs: Remove obsolete email address fs/jfs: TRIM support for JFS Filesystem
2012-10-02Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds3-3/+7
Pull security subsystem updates from James Morris: "Highlights: - Integrity: add local fs integrity verification to detect offline attacks - Integrity: add digital signature verification - Simple stacking of Yama with other LSMs (per LSS discussions) - IBM vTPM support on ppc64 - Add new driver for Infineon I2C TIS TPM - Smack: add rule revocation for subject labels" Fixed conflicts with the user namespace support in kernel/auditsc.c and security/integrity/ima/ima_policy.c. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits) Documentation: Update git repository URL for Smack userland tools ima: change flags container data type Smack: setprocattr memory leak fix Smack: implement revoking all rules for a subject label Smack: remove task_wait() hook. ima: audit log hashes ima: generic IMA action flag handling ima: rename ima_must_appraise_or_measure audit: export audit_log_task_info tpm: fix tpm_acpi sparse warning on different address spaces samples/seccomp: fix 31 bit build on s390 ima: digital signature verification support ima: add support for different security.ima data types ima: add ima_inode_setxattr/removexattr function and calls ima: add inode_post_setattr call ima: replace iint spinblock with rwlock/read_lock ima: allocating iint improvements ima: add appraise action keywords and default rules ima: integrity appraisal extension vfs: move ima_file_free before releasing the file ...
2012-10-02Merge tag 'upstream-3.7-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds20-578/+451
Pull ubifs changes from Artem Bityutskiy: "No big changes for 3.7 in UBIFS: - Error reporting and debug printing improvements - Power cut emulation fixes - Minor cleanups" Fix trivial conflict in fs/ubifs/debug.c due to the user namespace changes. * tag 'upstream-3.7-rc1' of git://git.infradead.org/linux-ubifs: UBIFS: print less UBIFS: use pr_ helper instead of printk UBIFS: comply with coding style UBIFS: use __aligned() attribute UBIFS: remove __DATE__ and __TIME__ UBIFS: fix power cut emulation for mtdram UBIFS: improve scanning debug output UBIFS: always print full error reports UBIFS: print PID in debug messages
2012-10-02Merge tag 'for-linus-v3.7-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds7-80/+447
Pull xfs update from Ben Myers: "Several enhancements and cleanups: - make inode32 and inode64 remountable options - SEEK_HOLE/SEEK_DATA enhancements - cleanup struct declarations in xfs_mount.h" * tag 'for-linus-v3.7-rc1' of git://oss.sgi.com/xfs/xfs: xfs: Make inode32 a remountable option xfs: add inode64->inode32 transition into xfs_set_inode32() xfs: Fix mp->m_maxagi update during inode64 remount xfs: reduce code duplication handling inode32/64 options xfs: make inode64 as the default allocation mode xfs: Fix m_agirotor reset during AG selection Make inode64 a remountable option xfs: stop the sync worker before xfs_unmountfs xfs: xfs_seek_hole() refinement with hole searching from page cache for unwritten extents xfs: xfs_seek_data() refinement with unwritten extents check up from page cache xfs: Introduce a helper routine to probe data or hole offset from page cache xfs: Remove type argument from xfs_seek_data()/xfs_seek_hole() xfs: fix race while discarding buffers [V4] xfs: check for possible overflow in xfs_ioc_trim xfs: unlock the AGI buffer when looping in xfs_dialloc xfs: kill struct declarations in xfs_mount.h xfs: fix uninitialised variable in xfs_rtbuf_get()
2012-10-02Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds94-2092/+2486
Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...
2012-10-02compat: fs: Generic compat_sys_sendfile implementationCatalin Marinas3-2/+26
This function is used by sparc, powerpc and arm64 for compat support. The patch adds a generic implementation which calls do_sendfile() directly and avoids set_fs(). The sparc architecture has wrappers for the sign extensions while powerpc relies on the compiler to do the this. The patch adds wrappers for powerpc to handle the u32->int type conversion. compat_sys_sendfile64() can be replaced by a sys_sendfile() call since compat_loff_t has the same size as off_t on a 64-bit system. On powerpc, the patch also changes the 64-bit sendfile call from sys_sendile64 to sys_sendfile. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02fs: push rcu_barrier() from deactivate_locked_super() to filesystemsKirill A. Shutemov47-6/+240
There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fast paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02btrfs: reada_extent doesn't need kref for refcountAl Viro1-11/+7
All increments and decrements are under the same spinlock - have to be, since they need to protect the radix_tree it's found in. Just use int, no need to wank with kref... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02coredump: move core dump functionality into its own fileAlex Kelly3-645/+688
This prepares for making core dump functionality optional. The variable "suid_dumpable" and associated functions are left in fs/exec.c because they're used elsewhere, such as in ptrace. Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds1-0/+2
Pull input updates from Dmitry Torokhov: "A few drivers were updated with device tree bindings and others got a few small cleanups and fixes." Fix trivial conflict in drivers/input/keyboard/omap-keypad.c due to changes clashing with a whitespace cleanup. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (28 commits) Input: wacom - mark Intuos5 pad as in-prox when touching buttons Input: synaptics - adjust threshold for treating position values as negative Input: hgpk - use %*ph to dump small buffer Input: gpio_keys_polled - fix dt pdata->nbuttons Input: Add KD[GS]KBDIACRUC ioctls to the compatible list Input: omap-keypad - fixed formatting Input: tegra - move platform data header Input: wacom - add support for EMR on Cintiq 24HD touch Input: s3c2410_ts - make s3c_ts_pmops const Input: samsung-keypad - use of_get_child_count() helper Input: samsung-keypad - use of_match_ptr() Input: uinput - fix formatting Input: uinput - specify exact bit sizes on userspace APIs Input: uinput - mark failed submission requests as free Input: uinput - fix race that can block nonblocking read Input: uinput - return -EINVAL when read buffer size is too small Input: uinput - take event lock when fetching events from buffer Input: get rid of MATCH_BIT() macro Input: rotary-encoder - add DT bindings Input: rotary-encoder - constify platform data pointers ...
2012-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-4/+4
Pull networking changes from David Miller: 1) GRE now works over ipv6, from Dmitry Kozlov. 2) Make SCTP more network namespace aware, from Eric Biederman. 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko. 4) Make openvswitch network namespace aware, from Pravin B Shelar. 5) IPV6 NAT implementation, from Patrick McHardy. 6) Server side support for TCP Fast Open, from Jerry Chu and others. 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel Borkmann. 8) Increate the loopback default MTU to 64K, from Eric Dumazet. 9) Use a per-task rather than per-socket page fragment allocator for outgoing networking traffic. This benefits processes that have very many mostly idle sockets, which is quite common. From Eric Dumazet. 10) Use up to 32K for page fragment allocations, with fallbacks to smaller sizes when higher order page allocations fail. Benefits are a) less segments for driver to process b) less calls to page allocator c) less waste of space. From Eric Dumazet. 11) Allow GRO to be used on GRE tunnels, from Eric Dumazet. 12) VXLAN device driver, one way to handle VLAN issues such as the limitation of 4096 VLAN IDs yet still have some level of isolation. From Stephen Hemminger. 13) As usual there is a large boatload of driver changes, with the scale perhaps tilted towards the wireless side this time around. Fix up various fairly trivial conflicts, mostly caused by the user namespace changes. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits) hyperv: Add buffer for extended info after the RNDIS response message. hyperv: Report actual status in receive completion packet hyperv: Remove extra allocated space for recv_pkt_list elements hyperv: Fix page buffer handling in rndis_filter_send_request() hyperv: Fix the missing return value in rndis_filter_set_packet_filter() hyperv: Fix the max_xfer_size in RNDIS initialization vxlan: put UDP socket in correct namespace vxlan: Depend on CONFIG_INET sfc: Fix the reported priorities of different filter types sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP sfc: Fix loopback self-test with separate_tx_channels=1 sfc: Fix MCDI structure field lookup sfc: Add parentheses around use of bitfield macro arguments sfc: Fix null function pointer in efx_sriov_channel_type vxlan: virtual extensible lan igmp: export symbol ip_mc_leave_group netlink: add attributes to fdb interface tg3: unconditionally select HWMON support when tg3 is enabled. Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT" gre: fix sparse warning ...
2012-10-02Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds108-528/+1011
Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...
2012-10-02Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds1-0/+180
Pull cgroup updates from Tejun Heo: - xattr support added. The implementation is shared with tmpfs. The usage is restricted and intended to be used to manage per-cgroup metadata by system software. tmpfs changes are routed through this branch with Hugh's permission. - cgroup subsystem ID handling simplified. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Define CGROUP_SUBSYS_COUNT according the configuration cgroup: Assign subsystem IDs during compile time cgroup: Do not depend on a given order when populating the subsys array cgroup: Wrap subsystem selection macro cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT cgroup: net_prio: Do not define task_netpioidx() when not selected cgroup: net_cls: Do not define task_cls_classid() when not selected cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h cgroup: trivial fixes for Documentation/cgroups/cgroups.txt xattr: mark variable as uninitialized to make both gcc and smatch happy fs: add missing documentation to simple_xattr functions cgroup: add documentation on extended attributes usage cgroup: rename subsys_bits to subsys_mask cgroup: add xattr support cgroup: revise how we re-populate root directory xattr: extract simple_xattr code from tmpfs
2012-10-02Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds12-34/+17
Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...
2012-10-01Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds32-1547/+4884
Pull CIFS updates from Steve French: "This patchset is the final section of the SMB2.1 support merge for cifs.ko. It also includes improvements to the cifs socket handling from Jeff, and also fixes a few cifs bug fixes. It adds SMB2 support for file and inode operations as well as moves some existing cifs code to use ops server struct of protocol specific callbacks. Most of this code is SMB2 specific. When enabled SMB2.1 does pass various functional tests including most of the connectathon test suite, For SMB2.1, Connectathon test 4 and some related tests fail due to not updating mode bits remotely (cifsacl support where mode bits are approximated with the cifs acl is not enable for smb2), and test8 (symlink) support is not completed for SMB2 yet (note that we will likely have a "Unix Extensions" eventually, at least for Samba, so in the long run posix locks won't have to be emulated when mounting Linux to Linux, but for most NAS and for Windows mounts posix lock emulation will still used for SMB2 in a similar fashion as we do for cifs). SMB2.1 dialect is supported. Although additional fixes to enable smb2 (the original smb2.02) dialect and to add various optional features of the smb3 dialect are expected to be added in the future as testing progresses, currently mounting with the "vers=2.1" is supported (in order to mount using SMB2.1 to servers like Samba 4, and Windows 7, Windows 2008R2)." * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: (82 commits) [CIFS] Fix indentation of fs/cifs/Kconfig entries [CIFS] Fix SMB2 negotiation support to select only one dialect (based on vers=) cifs: obtain file access during backup intent lookup (resend) CIFS: Fix possible freed pointer dereference in CIFS_SessSetup CIFS: Fix possible freed pointer dereference in SMB2_sess_setup CIFS: Make ops->close return void cifs: change DOS/NT/POSIX mapping of ERRnoresource cifs: remove support for deprecated "forcedirectio" and "strictcache" mount options cifs: remove support for CIFS_IOC_CHECKUMOUNT ioctl CIFS: Fix possible memory leaks in SMB2 code CIFS: Fix endian conversion of IndexNumber Trivial endian fixes MARK SMB2 support EXPERIMENTAL Update cifs version number cifs: add FL_CLOSE to fl_flags mask in cifs_read_flock cifs: Mangle string used for unc in /proc/mounts cifs: cleanups for cifs_mkdir_qinfo CIFS: Fix fast lease break after open problem CIFS: Add SMB2.1 lease break support CIFS: Fix cache coherency for read oplock case ...
2012-10-01ceph: propagate layout error on osd request creationSage Weil2-6/+6
If we are creating an osd request and get an invalid layout, return an EINVAL to the caller. We switch up the return to have an error code instead of NULL implying -ENOMEM. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-10-01Merge branch 'next' into for-linusDmitry Torokhov1-0/+2
Prepare first set of updates for 3.7 merge window.
2012-10-01Merge tag 'dlm-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlmLinus Torvalds13-130/+289
Pull dlm updates from David Teigland: "There are two main patches in this set, both related to the userland dlm_controld daemon. The first fixes a deadlock between dlm_controld and the dlm_send workqueue when both access configfs data simultaneously. The second reworks some code to get around a long standing, but intentional, unlock balance warning. The userland daemon no longer takes a lock that is later released from the kernel. The other commits are minor fixes and changes." * tag 'dlm-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: check the maximum size of a request from user dlm: cleanup send_to_sock routine dlm: convert add_sock routine return value type to void dlm: remove redundant variable assignments dlm: fix unlock balance warnings dlm: fix uninitialized spinlock dlm: fix deadlock between dlm_send and dlm_controld
2012-10-01ceph: convert to use le32_add_cpu()Wei Yongjun1-1/+1
Convert cpu_to_le32(le32_to_cpu(E1) + E2) to use le32_add_cpu(). dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-01ceph: Fix oops when handling mdsmap that decreases max_mdsYan, Zheng1-1/+2
When i >= newmap->m_max_mds, ceph_mdsmap_get_addr(newmap, i) return NULL. Passing NULL to memcmp() triggers oops. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-01ceph: let path portion of mount "device" be optionalAlex Elder1-11/+26
A recent change to /sbin/mountall causes any trailing '/' character in the "device" (or fs_spec) field in /etc/fstab to be stripped. As a result, an entry for a ceph mount that intends to mount the root of the name space ends up with now path portion, and the ceph mount option processing code rejects this. That is, an entry in /etc/fstab like: cephserver:port:/ /mnt ceph defaults 0 0 provides to the ceph code just "cephserver:port:" as the "device," and that gets rejected. Although this is a bug in /sbin/mountall, we can have the ceph mount code support an empty/nonexistent path, interpreting it to mean the root of the name space. RFC 5952 offers recommendations for how to express IPv6 addresses, and recommends the usage found in RFC 3986 (which specifies the format for URI's) for representing both IPv4 and IPv6 addresses that include port numbers. (See in particular the definition of "authority" found in the Appendix of RFC 3986.) According to those standards, no host specification will ever contain a '/' character. As a result, it is sufficient to scan a provided "device" from an /etc/fstab entry for the first '/' character, and if it's found, treat that as the beginning of the path. If no '/' character is present, we can treat the entire string as the monitor host specification(s), and assume the path to be the root of the name space. We'll still require a ':' to separate the host portion from the (possibly empty) path portion. This means that we can more formally define how ceph will interpret the "device" it's provided when processing a mount request: "device" will look like: <server_spec>[,<server_spec>...]:[<path>] where <server_spec> is <ip>[:<port>] <path> is optional, but if present must begin with '/' This addresses http://tracker.newdream.net/issues/2919 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-10-01Merge tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds1-0/+6
Pull TTY changes from Greg Kroah-Hartman: "As we skipped the merge window for 3.6-rc1 for the tty tree, everything is now settled down and working properly, so we are ready for 3.7-rc1. Here's the patchset, it's big, but the large changes are removing a firmware file and adding a staging tty driver (it depended on the tty core changes, so it's going through this tree instead of the staging tree.) All of these patches have been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fix up more-or-less trivial conflicts in - drivers/char/pcmcia/synclink_cs.c: tty NULL dereference fix vs tty_port_cts_enabled() helper function - drivers/staging/{Kconfig,Makefile}: add-add conflict (dgrp driver added close to other staging drivers) - drivers/staging/ipack/devices/ipoctal.c: "split ipoctal_channel from iopctal" vs "TTY: use tty_port_register_device" * tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (235 commits) tty/serial: Add kgdb_nmi driver tty/serial/amba-pl011: Quiesce interrupts in poll_get_char tty/serial/amba-pl011: Implement poll_init callback tty/serial/core: Introduce poll_init callback kdb: Turn KGDB_KDB=n stubs into static inlines kdb: Implement disable_nmi command kernel/debug: Mask KGDB NMI upon entry serial: pl011: handle corruption at high clock speeds serial: sccnxp: Make 'default' choice in switch last serial: sccnxp: Remove mask termios caps for SW flow control serial: sccnxp: Report actual baudrate back to core serial: samsung: Add poll_get_char & poll_put_char Powerpc 8xx CPM_UART setting MAXIDL register proportionaly to baud rate Powerpc 8xx CPM_UART maxidl should not depend on fifo size Powerpc 8xx CPM_UART too many interrupts Powerpc 8xx CPM_UART desynchronisation serial: set correct baud_base for EXSYS EX-41092 Dual 16950 serial: omap: fix the reciever line error case 8250: blacklist Winbond CIR port 8250_pnp: do pnp probe before legacy probe ...
2012-10-01Merge tag 'driver-core-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds2-5/+5
Pull driver core merge from Greg Kroah-Hartman: "Here is the big driver core update for 3.7-rc1. A number of firmware_class.c updates (as you saw a month or so ago), and some hyper-v updates and some printk fixes as well. All patches that are outside of the drivers/base area have been acked by the respective maintainers, and have all been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'driver-core-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits) memory: tegra{20,30}-mc: Fix reading incorrect register in mc_readl() device.h: Add missing inline to #ifndef CONFIG_PRINTK dev_vprintk_emit memory: emif: Add ifdef CONFIG_DEBUG_FS guard for emif_debugfs_[init|exit] Documentation: Fixes some translation error in Documentation/zh_CN/gpio.txt Documentation: Remove 3 byte redundant code at the head of the Documentation/zh_CN/arm/booting Documentation: Chinese translation of Documentation/video4linux/omap3isp.txt device and dynamic_debug: Use dev_vprintk_emit and dev_printk_emit dev: Add dev_vprintk_emit and dev_printk_emit netdev_printk/netif_printk: Remove a superfluous logging colon netdev_printk/dynamic_netdev_dbg: Directly call printk_emit dev_dbg/dynamic_debug: Update to use printk_emit, optimize stack driver-core: Shut up dev_dbg_reatelimited() without DEBUG tools/hv: Parse /etc/os-release tools/hv: Check for read/write errors tools/hv: Fix exit() error code tools/hv: Fix file handle leak Tools: hv: Implement the KVP verb - KVP_OP_GET_IP_INFO Tools: hv: Rename the function kvp_get_ip_address() Tools: hv: Implement the KVP verb - KVP_OP_SET_IP_INFO Tools: hv: Add an example script to configure an interface ...
2012-10-01Merge tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64Linus Torvalds1-2/+2
Pull arm64 support from Catalin Marinas: "Linux support for the 64-bit ARM architecture (AArch64) Features currently supported: - 39-bit address space for user and kernel (each) - 4KB and 64KB page configurations - Compat (32-bit) user applications (ARMv7, EABI only) - Flattened Device Tree (mandated for all AArch64 platforms) - ARM generic timers" * tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (35 commits) arm64: ptrace: remove obsolete ptrace request numbers from user headers arm64: Do not set the SMP/nAMP processor bit arm64: MAINTAINERS update arm64: Build infrastructure arm64: Miscellaneous header files arm64: Generic timers support arm64: Loadable modules arm64: Miscellaneous library functions arm64: Performance counters support arm64: Add support for /proc/sys/debug/exception-trace arm64: Debugging support arm64: Floating point and SIMD arm64: 32-bit (compat) applications support arm64: User access library functions arm64: Signal handling support arm64: VDSO support arm64: System calls handling arm64: ELF definitions arm64: SMP support arm64: DMA mapping API ...
2012-10-01[CIFS] Fix indentation of fs/cifs/Kconfig entriesSteve French1-18/+19
make menuconfig for cifs shows multiple entries toward the end of the list with the incorrect indentation (probably a bug in Kconfig parsing of items that are dependant on the module (cifs=m instead of just CONFIG_CIFS). This patch fixes the indentation of all but the last entry (CIFS_ACL) which I don't know how to fix. It also clarifies wording in two places Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-10-01[CIFS] Fix SMB2 negotiation support to select only one dialect (based on vers=)Steve French5-29/+56
Based on whether the user (on mount command) chooses: vers=3.0 (for smb3.0 support) vers=2.1 (for smb2.1 support) or (with subsequent patch, which will allow SMB2 support) vers=2.0 (for original smb2.02 dialect support) send only one dialect at a time during negotiate (we had been sending a list). Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-10-01Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivialLinus Torvalds11-21/+17
Pull the trivial tree from Jiri Kosina: "Tiny usual fixes all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) doc: fix old config name of kprobetrace fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc btrfs: fix the commment for the action flags in delayed-ref.h btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID vfs: fix kerneldoc for generic_fh_to_parent() treewide: fix comment/printk/variable typos ipr: fix small coding style issues doc: fix broken utf8 encoding nfs: comment fix platform/x86: fix asus_laptop.wled_type module parameter mfd: printk/comment fixes doc: getdelays.c: remember to close() socket on error in create_nl_socket() doc: aliasing-test: close fd on write error mmc: fix comment typos dma: fix comments spi: fix comment/printk typos in spi Coccinelle: fix typo in memdup_user.cocci tmiofb: missing NULL pointer checks tools: perf: Fix typo in tools/perf tools/testing: fix comment / output typos ...
2012-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmwLinus Torvalds15-800/+710
Pull GFS2 updates from Steven Whitehouse: "The major feature this time is the "rbm" conversion in the resource group code. The new struct gfs2_rbm specifies the location of an allocatable block in (resource group, bitmap, offset) form. There are a number of added helper functions, and later patches then rewrite some of the resource group code in terms of this new structure. Not only does this give us a nice code clean up, but it also removes some of the previous restrictions where extents could not cross bitmap boundaries, for example. In addition to that, there are a few bug fixes and clean ups, but the rbm work is by far the majority of this patch set in terms of number of changed lines." * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (27 commits) GFS2: Write out dirty inode metadata in delayed deletes GFS2: fix s_writers.counter imbalance in gfs2_ail_empty_gl GFS2: Fix infinite loop in rbm_find GFS2: Consolidate free block searching functions GFS2: Get rid of I_MUTEX_QUOTA usage GFS2: Stop block extents at the end of bitmaps GFS2: Fix unclaimed_blocks() wrapping bug and clean up GFS2: Improve block reservation tracing GFS2: Fall back to ignoring reservations, if there are no other blocks left GFS2: Fix ->show_options() for statfs slow GFS2: Use rbm for gfs2_setbit() GFS2: Use rbm for gfs2_testbit() GFS2: Eliminate unnecessary check for state > 3 in bitfit GFS2: Eliminate redundant calls to may_grant GFS2: Combine functions gfs2_glock_dq_wait and wait_on_demote GFS2: Combine functions gfs2_glock_wait and wait_on_holder GFS2: inline __gfs2_glock_schedule_for_reclaim GFS2: change function gfs2_direct_IO to use a normal gfs2_glock_dq GFS2: rbm code cleanup GFS2: Fix case where reservation finished at end of rgrp ...
2012-09-30ext4: fix mtime update in nodelalloc modeTheodore Ts'o3-6/+9
Commits 5e8830dc85d0 and 41c4d25f78c0 introduced a regression into v3.6-rc1 for ext4 in nodealloc mode, such that mtime updates would not take place for files modified via mmap if the page was already in the page cache. This would also affect ext3 file systems mounted using the ext4 file system driver. The problem was that ext4_page_mkwrite() had a shortcut which would avoid calling __block_page_mkwrite() under some circumstances, and the above two commit transferred the responsibility of calling file_update_time() to __block_page_mkwrite --- which woudln't get called in some circumstances. Since __block_page_mkwrite() only has three callers, block_page_mkwrite(), ext4_page_mkwrite, and nilfs_page_mkwrite(), the best way to solve this is to move the responsibility for calling file_update_time() to its caller. This problem was found via xfstests #215 with a file system mounted with -o nodelalloc. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@vger.kernel.org
2012-09-30ext4: fix ext_remove_space for punch_hole caseDmitry Monakhov1-7/+9
Inode is allowed to have empty leaf only if it this is blockless inode. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-30ext4: punch_hole should wait for DIO writersDmitry Monakhov1-17/+36
punch_hole is the place where we have to wait for all existing writers (writeback, aio, dio), but currently we simply flush pended end_io request which is not sufficient. Other issue is that punch_hole performed w/o i_mutex held which obviously result in dangerous data corruption due to write-after-free. This patch performs following changes: - Guard punch_hole with i_mutex - Recheck inode flags under i_mutex - Block all new dio readers in order to prevent information leak caused by read-after-free pattern. - punch_hole now wait for all writers in flight NOTE: XXX write-after-free race is still possible because new dirty pages may appear due to mmap(), and currently there is no easy way to stop writeback while punch_hole is in progress. [ Fixed error return from ext4_ext_punch_hole() to make sure that we release i_mutex before returning EPERM or ETXTBUSY -- Ted ] Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-29vfs: dcache: fix deadlock in tree traversalMiklos Szeredi1-0/+6
IBM reported a deadlock in select_parent(). This was found to be caused by taking rename_lock when already locked when restarting the tree traversal. There are two cases when the traversal needs to be restarted: 1) concurrent d_move(); this can only happen when not already locked, since taking rename_lock protects against concurrent d_move(). 2) racing with final d_put() on child just at the moment of ascending to parent; rename_lock doesn't protect against this rare race, so it can happen when already locked. Because of case 2, we need to be able to handle restarting the traversal when rename_lock is already held. This patch fixes all three callers of try_to_ascend(). IBM reported that the deadlock is gone with this patch. [ I rewrote the patch to be smaller and just do the "goto again" if the lock was already held, but credit goes to Miklos for the real work. - Linus ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-29ext4: serialize truncate with owerwrite DIO workersDmitry Monakhov1-0/+2
Jan Kara have spotted interesting issue: There are potential data corruption issue with direct IO overwrites racing with truncate: Like: dio write truncate_task ->ext4_ext_direct_IO ->overwrite == 1 ->down_read(&EXT4_I(inode)->i_data_sem); ->mutex_unlock(&inode->i_mutex); ->ext4_setattr() ->inode_dio_wait() ->truncate_setsize() ->ext4_truncate() ->down_write(&EXT4_I(inode)->i_data_sem); ->__blockdev_direct_IO ->ext4_get_block ->submit_io() ->up_read(&EXT4_I(inode)->i_data_sem); # truncate data blocks, allocate them to # other inode - bad stuff happens because # dio is still in flight. In order to serialize with truncate dio worker should grab extra i_dio_count reference before drop i_mutex. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-29ext4: endless truncate due to nonlocked dio readersDmitry Monakhov1-2/+7
If we have enough aggressive DIO readers, truncate and other dio waiters will wait forever inside inode_dio_wait(). It is reasonable to disable nonlock DIO read optimization during truncate. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-29ext4: serialize unlocked dio reads with truncateDmitry Monakhov1-2/+5
Current serialization will works only for DIO which holds i_mutex, but nonlocked DIO following race is possible: dio_nolock_read_task truncate_task ->ext4_setattr() ->inode_dio_wait() ->ext4_ext_direct_IO ->ext4_ind_direct_IO ->__blockdev_direct_IO ->ext4_get_block ->truncate_setsize() ->ext4_truncate() #alloc truncated blocks #to other inode ->submit_io() #INFORMATION LEAK In order to serialize with unlocked DIO reads we have to rearrange wait sequence 1) update i_size first 2) if i_size about to be reduced wait for outstanding DIO requests 3) and only after that truncate inode blocks Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-29ext4: serialize dio nonlocked reads with defrag workersDmitry Monakhov4-0/+44
Inode's block defrag and ext4_change_inode_journal_flag() may affect nonlocked DIO reads result, so proper synchronization required. - Add missed inode_dio_wait() calls where appropriate - Check inode state under extra i_dio_count reference. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-29ext4: completed_io locking cleanupDmitry Monakhov6-169/+121
Current unwritten extent conversion state-machine is very fuzzy. - For unknown reason it performs conversion under i_mutex. What for? My diagnosis: We already protect extent tree with i_data_sem, truncate and punch_hole should wait for DIO, so the only data we have to protect is end_io->flags modification, but only flush_completed_IO and end_io_work modified this flags and we can serialize them via i_completed_io_lock. Currently all these games with mutex_trylock result in the following deadlock truncate: kworker: ext4_setattr ext4_end_io_work mutex_lock(i_mutex) inode_dio_wait(inode) ->BLOCK DEADLOCK<- mutex_trylock() inode_dio_done() #TEST_CASE1_BEGIN MNT=/mnt_scrach unlink $MNT/file fallocate -l $((1024*1024*1024)) $MNT/file aio-stress -I 100000 -O -s 100m -n -t 1 -c 10 -o 2 -o 3 $MNT/file sleep 2 truncate -s 0 $MNT/file #TEST_CASE1_END Or use 286's xfstests https://github.com/dmonakhov/xfstests/blob/devel/286 This patch makes state machine simple and clean: (1) xxx_end_io schedule final extent conversion simply by calling ext4_add_complete_io(), which append it to ei->i_completed_io_list NOTE1: because of (2A) work should be queued only if ->i_completed_io_list was empty, otherwise the work is scheduled already. (2) ext4_flush_completed_IO is responsible for handling all pending end_io from ei->i_completed_io_list Flushing sequence consists of following stages: A) LOCKED: Atomically drain completed_io_list to local_list B) Perform extents conversion C) LOCKED: move converted io's to to_free list for final deletion This logic depends on context which we was called from. D) Final end_io context destruction NOTE1: i_mutex is no longer required because end_io->flags modification is protected by ei->ext4_complete_io_lock Full list of changes: - Move all completion end_io related routines to page-io.c in order to improve logic locality - Move open coded logic from various xx_end_xx routines to ext4_add_complete_io() - remove EXT4_IO_END_FSYNC - Improve SMP scalability by removing useless i_mutex which does not protect io->flags anymore. - Reduce lock contention on i_completed_io_lock by optimizing list walk. - Rename ext4_end_io_nolock to end4_end_io and make it static - Check flush completion status to ext4_ext_punch_hole(). Because it is not good idea to punch blocks from corrupted inode. Changes since V3 (in request to Jan's comments): Fall back to active flush_completed_IO() approach in order to prevent performance issues with nolocked DIO reads. Changes since V2: Fix use-after-free caused by race truncate vs end_io_work Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-28ext4: fix unwritten counter leakageDmitry Monakhov2-8/+19
ext4_set_io_unwritten_flag() will increment i_unwritten counter, so once we mark end_io with EXT4_END_IO_UNWRITTEN we have to revert it back on error path. - add missed error checks to prevent counter leakage - ext4_end_io_nolock() will clear EXT4_END_IO_UNWRITTEN flag to signal that conversion finished. - add BUG_ON to ext4_free_end_io() to prevent similar leakage in future. Visible effect of this bug is that unaligned aio_stress may deadlock Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-28ext4: give i_aiodio_unwritten a more appropriate nameDmitry Monakhov4-7/+7
AIO/DIO prefix is wrong because it account unwritten extents which also may be scheduled from buffered write endio Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-28ext4: ext4_inode_info dietDmitry Monakhov4-8/+15
Generic inode has unused i_private pointer which may be used as cur_aio_dio storage. TODO: If cur_aio_dio will be passed as an argument to get_block_t this allow to have concurent AIO_DIO requests. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-28cifs: obtain file access during backup intent lookup (resend)Shirish Pargaonkar5-41/+80
Rebased and resending the patch. Path based queries can fail for lack of access, especially during lookup during open. open itself would actually succeed becasue of back up intent bit but queries (either path or file handle based) do not have a means to specifiy backup intent bit. So query the file info during lookup using trans2 / findfirst / file_id_full_dir_info to obtain file info as well as file_id/inode value. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Acked-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller10-84/+73
Conflicts: drivers/net/team/team.c drivers/net/usb/qmi_wwan.c net/batman-adv/bat_iv_ogm.c net/ipv4/fib_frontend.c net/ipv4/route.c net/l2tp/l2tp_netlink.c The team, fib_frontend, route, and l2tp_netlink conflicts were simply overlapping changes. qmi_wwan and bat_iv_ogm were of the "use HEAD" variety. With help from Antonio Quartulli. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-28Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2-4/+9
Pull vfs fixes from Al Viro: "A couple of fixes; one for automount/lazy umount race, another a classic "we don't protect the refcount transition to zero with the lock that protects looking for object in hash" kind of crap in lockd." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: close the race in nlmsvc_free_block() do_add_mount()/umount -l races
2012-09-28Merge tag 'v3.6-rc7' into nextJames Morris106-834/+1123
Linux 3.6-rc7 Requested by David Howells so he can merge his key susbsystem work into my tree with requisite -linus changesets.