linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2015-04-15	VFS: Impose ordering on accesses of d_inode and d_flags	David Howells	1	-18/+3
	Impose ordering on accesses of d_inode and d_flags to avoid the need to do this: if (!dentry->d_inode || d_is_negative(dentry)) { when this: if (d_is_negative(dentry)) { should suffice. This check is especially problematic if a dentry can have its type field set to something other than DCACHE_MISS_TYPE when d_inode is NULL (as in unionmount). What we really need to do is stick a write barrier between setting d_inode and setting d_flags and a read barrier between reading d_flags and reading d_inode. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
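The ordering described in this commit can be modelled in userspace with C11 atomics. This is a minimal sketch, not the kernel code: `struct model_dentry`, the `MODEL_*` constants and all function names are hypothetical, and release/acquire ordering stands in for the write and read barriers the message mentions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the dentry type field values. */
enum { MODEL_MISS_TYPE = 0, MODEL_REGULAR_TYPE = 1 };

struct model_inode { int ino; };

struct model_dentry {
	struct model_inode *inode; /* plain pointer, published via flags */
	atomic_uint flags;         /* holds the entry type */
};

static void model_dentry_init(struct model_dentry *d)
{
	d->inode = NULL;
	atomic_init(&d->flags, MODEL_MISS_TYPE);
}

/* Writer: store the inode first, then publish the type with release
 * ordering, which plays the role of the write barrier. */
static void model_instantiate(struct model_dentry *d, struct model_inode *inode)
{
	d->inode = inode;
	atomic_store_explicit(&d->flags, MODEL_REGULAR_TYPE,
			      memory_order_release);
}

/* Reader: load the type with acquire ordering (the read barrier) and only
 * then touch the inode pointer. No separate NULL test is needed. */
static struct model_inode *model_lookup_inode(struct model_dentry *d)
{
	unsigned int type = atomic_load_explicit(&d->flags,
						 memory_order_acquire);

	if (type == MODEL_MISS_TYPE)
		return NULL;
	return d->inode;
}

/* Single-threaded walk-through of the two states. */
static void model_dentry_demo(void)
{
	struct model_inode ino = { 42 };
	struct model_dentry d;

	model_dentry_init(&d);
	assert(model_lookup_inode(&d) == NULL);
	model_instantiate(&d, &ino);
	assert(model_lookup_inode(&d) == &ino);
}
```

Because the type is published only after the pointer is stored, a reader that observes a non-miss type is guaranteed to see the matching inode pointer, which is why the single type check suffices in this model.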
2015-04-15	VFS: Add owner-filesystem positive/negative dentry checks	David Howells	1	-0/+38
	Supply two functions to test whether a filesystem's own dentries are positive or negative (d_really_is_positive() and d_really_is_negative()). The problem is that the DCACHE_ENTRY_TYPE field of dentry->d_flags may be overridden by the union part of a layered filesystem and isn't thus necessarily indicative of the type of dentry. Normally, this would involve a negative dentry (ie. ->d_inode == NULL) having ->d_layer.lower pointed to a lower layer dentry, DCACHE_PINNING_LOWER set and the DCACHE_ENTRY_TYPE field set to something other than DCACHE_MISS_TYPE - but it could also involve, say, a DCACHE_SPECIAL_TYPE being overridden to DCACHE_WHITEOUT_TYPE if a 0,0 chardev is detected in the top layer. However, inside a filesystem, when that fs is looking at its own dentries, it probably wants to know if they are really negative or not - and doesn't care about the fallthrough bits used by the union. To this end, a filesystem should normally use d_really_is_positive/negative() when looking at its own dentries rather than d_is_positive/negative() and should use d_inode() to get at the inode. Anyone looking at someone else's dentries (this includes pathwalk) should use d_is_xxx() and d_backing_inode(). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15	nfs: generic_write_checks() shouldn't be done on swapout...	Al Viro	1	-2/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	mirror O_APPEND and O_DIRECT into iocb->ki_flags	Al Viro	1	-0/+15
	... avoiding write_iter/fcntl races. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	switch generic_write_checks() to iocb and iter	Al Viro	1	-1/+1
	... returning -E... upon error and amount of data left in iter after (possible) truncation upon success. Note, that normal case gives a non-zero (positive) return value, so any tests for != 0 _must_ be updated. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Conflicts: fs/ext4/file.c
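The caller-side consequence of the new return convention can be illustrated with a small userspace model. This is only a sketch under assumptions: `model_write_checks()` and `model_write()` are hypothetical names, and the size-limit logic is a simplified placeholder rather than what generic_write_checks() actually does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of the new convention: a negative errno on error,
 * otherwise the number of bytes left after (possible) truncation. */
static ssize_t model_write_checks(size_t count, off_t pos, off_t limit)
{
	if (pos < 0)
		return -EINVAL;
	if (count == 0)
		return 0;
	if (pos >= limit)
		return -EFBIG;
	if ((off_t)count > limit - pos)
		count = (size_t)(limit - pos); /* truncate to the limit */
	return (ssize_t)count;
}

/* Caller written for the new convention. */
static ssize_t model_write(size_t count, off_t pos, off_t limit)
{
	ssize_t ret = model_write_checks(count, pos, limit);

	/* A test such as "if (ret) return ret;" would now bail out on
	 * every successful non-empty write, so test for <= 0 instead. */
	if (ret <= 0)
		return ret;

	/* ... write ret bytes at pos ... */
	return ret;
}

static void model_write_demo(void)
{
	assert(model_write(100, 0, 1000) == 100);
	assert(model_write(100, 950, 1000) == 50);
}
```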
2015-04-11	Merge branch 'for-linus' into for-next	Al Viro	1	-1/+1

2015-04-11	generic_write_checks(): drop isblk argument	Al Viro	1	-1/+1
	all remaining callers are passing 0; some just obscure that fact. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	direct_IO: remove rw from a_ops->direct_IO()	Omar Sandoval	2	-2/+2
	Now that no one is using rw, remove it completely. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	Remove rw from dax_{do_,}io()	Omar Sandoval	1	-2/+2
	And use iov_iter_rw() instead. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	Remove rw from {,__,do_}blockdev_direct_IO()	Omar Sandoval	1	-10/+12
	Most filesystems call through to these at some point, so we'll start here. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	new helper: iov_iter_rw()	Omar Sandoval	1	-0/+8
	Get either READ or WRITE out of iter->type. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
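The idea behind the helper, and why it lets the `rw` argument disappear from the direct-I/O paths in the following commits, can be sketched as below. The bit layout, `struct model_iter` and every `MODEL_*` name are assumptions made for illustration; only the notion of masking the direction out of the iterator's type field comes from the commit message.

```c
#include <assert.h>

/* Hypothetical encoding: the low bit of type carries the direction, the
 * remaining bits carry the iterator flavour. */
enum { MODEL_READ = 0, MODEL_WRITE = 1, MODEL_RW_MASK = 1 };
enum { MODEL_ITER_IOVEC = 0, MODEL_ITER_KVEC = 2, MODEL_ITER_BVEC = 4 };

struct model_iter {
	int type;
};

/* Recover the direction from the iterator itself. */
static int model_iter_rw(const struct model_iter *i)
{
	return i->type & MODEL_RW_MASK;
}

/* A direct-I/O style entry point no longer needs a separate rw argument:
 * it asks the iterator. Returns 1 for a write, 0 for a read. */
static int model_direct_io_is_write(const struct model_iter *i)
{
	return model_iter_rw(i) == MODEL_WRITE;
}

static void model_iter_demo(void)
{
	struct model_iter rd = { MODEL_ITER_IOVEC | MODEL_READ };
	struct model_iter wr = { MODEL_ITER_BVEC | MODEL_WRITE };

	assert(!model_direct_io_is_write(&rd));
	assert(model_direct_io_is_write(&wr));
}
```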
2015-04-11	->aio_read and ->aio_write removed	Al Viro	1	-2/+0
	no remaining users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	kill do_sync_read/do_sync_write	Al Viro	1	-2/+0
	all remaining instances of aio_{read,write} (all 4 of them) have explicit ->read and ->write resp.; do_sync_read/do_sync_write is never called by __vfs_read/__vfs_write anymore and no other users had been left. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	make new_sync_{read,write}() static	Al Viro	1	-2/+0
	All places outside of core VFS that checked ->read and ->write for being NULL or called the methods directly are gone now, so NULL {read,write} with non-NULL {read,write}_iter will do the right thing in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	new helper: __vfs_write()	Al Viro	1	-0/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	9p: switch p9_client_read() to passing struct iov_iter *	Al Viro	1	-2/+1
	... and make it loop Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	9p: switch p9_client_write() to passing it struct iov_iter *	Al Viro	1	-2/+3
	... and make it loop until it's done Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	net/9p: switch the guts of p9_client_{read,write}() to iov_iter	Al Viro	1	-1/+1
	... and have get_user_pages_fast() mapping fewer pages than requested to generate a short read/write. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	Merge branch 'for-davem' into for-next	Al Viro	150	-1179/+2676

2015-04-11	Merge branch 'iov_iter' into for-next	Al Viro	1	-0/+14

2015-04-11	Merge branch 'iocb' into for-next	Al Viro	3	-70/+23

2015-04-11	VFS: Add iov_iter_fault_in_multipages_readable()	Anton Altaparmakov	1	-0/+1
	Similar to iov_iter_fault_in_readable() but differs in that it is not limited to faulting in the first iovec and instead faults in "bytes" bytes iterating over the iovecs as necessary. Also, instead of only faulting in the first and last page of the range, all pages are faulted in. This function is needed by NTFS when it does multi page file writes. Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	switch security_inode_getattr() to struct path *	Al Viro	1	-4/+3
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	kill struct filename.separate	Al Viro	1	-1/+1
	just make const char iname[] the last member and compare name->name with name->iname instead of checking name->separate. We need to make sure that out-of-line name doesn't end up allocated adjacent to struct filename referring to it; fortunately, it's easy to achieve - just allocate that struct filename with one byte in ->iname[], so that ->iname[0] will be inside the same object and thus have an address different from that of out-of-line name [spotted by Boqun Feng <boqun.feng@gmail.com>] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
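The trick described in this commit can be reproduced in userspace. The sketch below is an illustration only: `struct model_filename`, `MODEL_EMBEDDED_MAX` and the helper names are hypothetical, and malloc() stands in for whatever allocators the kernel uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_EMBEDDED_MAX 16 /* hypothetical cut-off for embedded names */

struct model_filename {
	const char *name; /* points at iname[] or at a separate buffer */
	char iname[];     /* flexible array, must be the last member */
};

static struct model_filename *model_getname(const char *s)
{
	size_t len = strlen(s) + 1;
	struct model_filename *fn;
	char *buf;

	if (len <= MODEL_EMBEDDED_MAX) {
		/* Short name: store it inside the same allocation. */
		fn = malloc(sizeof(*fn) + len);
		if (!fn)
			return NULL;
		memcpy(fn->iname, s, len);
		fn->name = fn->iname;
		return fn;
	}

	/* Long name: still reserve one byte of iname[], so that &iname[0]
	 * lies inside this object and cannot equal the address of the
	 * separately allocated name buffer. */
	fn = malloc(sizeof(*fn) + 1);
	if (!fn)
		return NULL;
	buf = malloc(len);
	if (!buf) {
		free(fn);
		return NULL;
	}
	memcpy(buf, s, len);
	fn->name = buf;
	return fn;
}

/* Replaces a dedicated "separate" flag: a pointer comparison is enough. */
static int model_name_is_separate(const struct model_filename *fn)
{
	return fn->name != fn->iname;
}

static void model_putname(struct model_filename *fn)
{
	if (model_name_is_separate(fn))
		free((void *)fn->name);
	free(fn);
}

static void model_filename_demo(void)
{
	struct model_filename *fn = model_getname("etc");

	assert(fn && !model_name_is_separate(fn));
	model_putname(fn);
}
```

Without the reserved byte, a long-name structure would end exactly where `iname` begins, and an allocator could legitimately place the out-of-line buffer at that address, making the comparison report an embedded name by mistake.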
2015-04-11	new helper: msg_data_left()	Al Viro	1	-0/+5
	convert open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11	Merge remote-tracking branch 'dh/afs' into for-davem	Al Viro	1	-1/+2

2015-04-11	get rid of the size argument of sock_sendmsg()	Al Viro	1	-1/+1
	it's equal to iov_iter_count(&msg->msg_iter) in all cases Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-09	net: switch importing msghdr from userland to {compat_,}import_iovec()	Al Viro	1	-1/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-09	Merge branch 'iov_iter' into for-davem	Al Viro	1	-0/+14

2015-04-09	Merge branch 'iocb' into for-davem	Al Viro	3	-70/+23
	trivial conflict in net/socket.c and non-trivial one in crypto - that one had evaded aio_complete() removal. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-07	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	David S. Miller	2	-13/+17
	Johan Hedberg says: ==================== pull request: bluetooth-next 2015-04-04 Here's what's probably the last bluetooth-next pull request for 4.1: - Fixes for LE advertising data & advertising parameters - Fix for race condition with HCI_RESET flag - New BNEPGETSUPPFEAT ioctl, needed for certification - New HCI request callback type to get the resulting skb - Cleanups to use BIT() macro wherever possible - Consolidate Broadcom device entries in the btusb HCI driver - Check for valid flags in CMTP, HIDP & BNEP - Disallow local privacy & OOB data combo to prevent a potential race - Expose SMP & ECDH selftest results through debugfs - Expose current Device ID info through debugfs Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	7	-18/+30
	Conflicts: drivers/net/ethernet/mellanox/mlx4/cmd.c net/core/fib_rules.c net/ipv4/fib_frontend.c The fib_rules.c and fib_frontend.c conflicts were locking adjustments in 'net' overlapping addition and removal of code in 'net-next'. The mlx4 conflict was a bug fix in 'net' happening in the same place a constant was being replaced with a more suitable macro. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds	4	-17/+10
	Pull networking fixes from David Miller: 1) In TCP, don't register an FRTO for cumulatively ACK'd data that was previously SACK'd, from Neal Cardwell. 2) Need to hold RTNL mutex in ipv4 multicast code namespace cleanup, from Cong WANG. 3) Similarly we have to hold RTNL mutex for fib_rules_unregister(), also from Cong WANG. 4) Revert and rework netns nsid allocation fix, from Nicolas Dichtel. 5) When we encapsulate for a tunnel device, skb->sk still points to the user socket. So this leads to cases where we retraverse the ipv4/ipv6 output path with skb->sk being of some other address family (f.e. AF_PACKET). This can cause things to crash since the ipv4 output path is dereferencing an AF_PACKET socket as if it were an ipv4 one. The short term fix for 'net' and -stable is to elide these socket checks once we've entered an encapsulation sequence by testing xmit_recursion. Longer term we have a better solution wherein we pass the tunnel's socket down through the output paths, but that is way too invasive for 'net' and -stable. From Hannes Frederic Sowa. 6) l2tp_init() failure path forgets to unregister per-net ops, from Cong WANG. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net/mlx4_core: Fix error message deprecation for ConnectX-2 cards net: dsa: fix filling routing table from OF description l2tp: unregister l2tp_net_ops on failure path mvneta: dont call mvneta_adjust_link() manually ipv6: protect skb->sk accesses from recursive dereference inside the stack netns: don't allocate an id for dead netns Revert "netns: don't clear nsid too early on removal" ip6mr: call del_timer_sync() in ip6mr_free_table() net: move fib_rules_unregister() under rtnl lock ipv4: take rtnl_lock and mark mrt table as freed on namespace cleanup tcp: fix FRTO undo on cumulative ACK of SACKed range xen-netfront: transmit fully GSO-sized packets
2015-04-06	fix mremap() vs. ioctx_kill() race	Al Viro	1	-1/+1
	teach ->mremap() method to return an error and have it fail for aio mappings in process of being killed Note that in case of ->mremap() failure we need to undo move_page_tables() we'd already done; we could call ->mremap() first, but then the failure of move_page_tables() would require undoing whatever _successful_ ->mremap() has done, which would be a lot more headache in general. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-06	tc: bpf: add checksum helpers	Alexei Starovoitov	1	-1/+37
	Commit 608cd71a9c7c ("tc: bpf: generalize pedit action") has added the possibility to mangle packet data to BPF programs in the tc pipeline. This patch adds two helpers bpf_l3_csum_replace() and bpf_l4_csum_replace() for fixing up the protocol checksums after the packet mangling. It also adds 'flags' argument to bpf_skb_store_bytes() helper to avoid unnecessary checksum recomputations when BPF programs adjusting l3/l4 checksums and documents all three helpers in uapi header. Moreover, a sample program is added to show how BPF programs can make use of the mangle and csum helpers. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
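Fixing up a checksum after mangling, rather than recomputing it over the whole header, relies on the incremental update rule for the Internet checksum (RFC 1624). The following userspace sketch shows that rule for a single 16-bit word; it is not the implementation of the BPF helpers named above, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t model_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full one's-complement checksum over n 16-bit words. */
static uint16_t model_csum(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~model_csum_fold(sum);
}

/* Incremental update when one word changes from "from" to "to":
 * check' = ~(~check + ~from + to), all in one's-complement arithmetic. */
static uint16_t model_csum_replace(uint16_t check, uint16_t from, uint16_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~from;
	sum += to;
	return (uint16_t)~model_csum_fold(sum);
}

static void model_csum_demo(void)
{
	uint16_t hdr[4] = { 0x4500, 0x0054, 0xc0a8, 0x0001 };
	uint16_t check = model_csum(hdr, 4);

	/* Mangle one word, then fix up the checksum incrementally. */
	check = model_csum_replace(check, hdr[2], 0x0a00);
	hdr[2] = 0x0a00;
	assert(check == model_csum(hdr, 4));
}
```

The incremental form touches only the old and new values of the changed field, which is the kind of saving the `flags` argument on bpf_skb_store_bytes() is described as enabling when a program adjusts checksums itself.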
2015-04-06	ipv6: protect skb->sk accesses from recursive dereference inside the stack	hannes@stressinduktion.org	4	-17/+10
	We should not consult skb->sk for output decisions in xmit recursion levels > 0 in the stack. Otherwise local socket settings could influence the result of e.g. tunnel encapsulation process. ipv6 does not conform with this in three places: 1) ip6_fragment: we do consult ipv6_npinfo for frag_size 2) sk_mc_loop in ipv6 uses skb->sk and checks if we should loop the packet back to the local socket 3) ip6_skb_dst_mtu could query the settings from the user socket and force a wrong MTU Furthermore: In sk_mc_loop we could potentially land in WARN_ON(1) if we use a PF_PACKET socket on top of an IPv6-backed vxlan device. Reuse xmit_recursion as we are currently only interested in protecting tunnel devices. Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04	netfilter: Pass nf_hook_state through arpt_do_table().	David S. Miller	1	-2/+1
	Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04	netfilter: Pass nf_hook_state through nft_set_pktinfo*().	David S. Miller	3	-10/+7
	Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04	netfilter: Pass nf_hook_state through ip6t_do_table().	David S. Miller	1	-2/+1
	Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04	netfilter: Pass nf_hook_state through nf_nat_ipv6_{in,out,fn,local_fn}().	David S. Miller	1	-16/+8
	Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04	netfilter: Pass nf_hook_state through ipt_do_table().	David S. Miller	1	-2/+1
	Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04	netfilter: Pass nf_hook_state through nf_nat_ipv4_{in,out,fn,local_fn}().	David S. Miller	1	-16/+8
	Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04	netfilter: Make nf_hookfn use nf_hook_state.	David S. Miller	1	-3/+1
	Pass the nf_hook_state all the way down into the hook functions themselves. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04	netfilter: Use nf_hook_state in nf_queue_entry.	David S. Miller	1	-5/+1
	That way we don't have to reinstantiate another nf_hook_state on the stack of the nf_reinject() path. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04	netfilter: Create and use nf_hook_state.	David S. Miller	1	-5/+23
	Instead of passing a large number of arguments down into the nf_hook() entry points, create a structure which carries this state down through the hook processing layers. This makes it so that if we want to change the types or signatures of any of these pieces of state, there are fewer places that need to be changed. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-03	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input	Linus Torvalds	1	-1/+2
	Pull input subsystem fixes from Dmitry Torokhov: "A fix for ALPS driver for issue introduced in the latest update and a tweak for yet another Lenovo box in Synaptics. There will be more ALPS tweaks coming.." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: define INPUT_PROP_ACCELEROMETER behavior Input: synaptics - fix min-max quirk value for E440 Input: synaptics - add quirk for Thinkpad E440 Input: ALPS - fix max coordinates for v5 and v7 protocols Input: add MT_TOOL_PALM
2015-04-03	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	Linus Torvalds	1	-0/+1
	Pull block layer fix from Jens Axboe: "Just one patch in this pull request, fixing a regression caused by a 'mathematically correct' change to lcm()" * 'for-linus' of git://git.kernel.dk/linux-block: block: fix blk_stack_limits() regression due to lcm() change
2015-04-03	add fixed_phy_update_state() - update state of fixed_phy	Stas Sergeev	1	-0/+9
	Currently fixed_phy uses a callback to periodically poll the link state. This patch adds the fixed_phy_update_state() API. It solves the following problems: - On link state interrupt, MAC driver can't update status. Instead it needs to provide the callback to periodically query the HW about the link state. It is more efficient to update status after interrupt. - The callback needs to be unregistered before phy_disconnect(), or otherwise it will be called with net_dev==NULL. phy_disconnect() does not have enough info to unregister the callback automatically. - The callback needs to be registered before of_phy_connect() to avoid running with outdated state, but of_phy_connect() returns the phy_device pointer, which is needed to register the callback. Registering it before of_phy_connect() will therefore require a hack to get the pointer earlier. Overall, this addition makes the subsequent patch that implements SGMII link status for mvneta, much cleaner. CC: Florian Fainelli <f.fainelli@gmail.com> CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-03	ebpf: add skb->priority to offset map for usage in {cls, act}_bpf	Daniel Borkmann	1	-0/+1
	This adds the ability to read out the skb->priority from an eBPF program, so that it can be taken into account from a tc filter or action for the use-case where the priority is not being used to directly override the filter classification in a qdisc, but to tag traffic otherwise for the classifier; the priority can be assigned from various places incl. user space, in future we may also mangle it from an eBPF program. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-03	jhash: Update jhash_[321]words functions to use correct initval	Alexander Duyck	1	-6/+11
	Looking over the implementation for jhash2 and comparing it to jhash_3words I realized that the two hashes were in fact very different. Doing a bit of digging led me to "The new jhash implementation" in which lookup2 was supposed to have been replaced with lookup3. In reviewing the patch I noticed that jhash2 had originally initialized a and b to JHASH_GOLDENRATIO and c to initval, but after the patch a, b, and c were initialized to initval + (length << 2) + JHASH_INITVAL. However the changes in jhash_3words simply replaced the initialization of a and b with JHASH_INITVAL. This change corrects what I believe was an oversight so that a, b, and c in jhash_3words all have the same value added consisting of initval + (length << 2) + JHASH_INITVAL so that jhash2 and jhash_3words will now produce the same hash result given the same inputs. Fixes: 60d509c823cca ("The new jhash implementation") Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
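The equivalence this commit aims for can be checked with a small userspace model. This is a sketch under assumptions: the final mixing step and the `0xdeadbeef` constant are written from the public lookup3 design as recalled, not copied from the kernel header, and every `model_` name is hypothetical. The point being illustrated is only that seeding a, b and c identically with initval + (length << 2) + JHASH_INITVAL makes the three-word helper agree with the array version for length 3.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_JHASH_INITVAL 0xdeadbeefu /* assumed value of JHASH_INITVAL */

static uint32_t model_rol32(uint32_t w, unsigned int s)
{
	return (w << s) | (w >> (32 - s));
}

/* Final mixing of three 32-bit values, lookup3 style (assumed constants). */
static uint32_t model_final(uint32_t a, uint32_t b, uint32_t c)
{
	c ^= b; c -= model_rol32(b, 14);
	a ^= c; a -= model_rol32(c, 11);
	b ^= a; b -= model_rol32(a, 25);
	c ^= b; c -= model_rol32(b, 16);
	a ^= c; a -= model_rol32(c, 4);
	b ^= a; b -= model_rol32(a, 14);
	c ^= b; c -= model_rol32(b, 24);
	return c;
}

/* Array version restricted to exactly three words. */
static uint32_t model_jhash2_len3(const uint32_t k[3], uint32_t initval)
{
	uint32_t a, b, c;

	a = b = c = MODEL_JHASH_INITVAL + (3u << 2) + initval;
	c += k[2];
	b += k[1];
	a += k[0];
	return model_final(a, b, c);
}

/* Three-word helper with the corrected seeding described in the commit. */
static uint32_t model_jhash_3words(uint32_t a, uint32_t b, uint32_t c,
				   uint32_t initval)
{
	uint32_t seed = initval + MODEL_JHASH_INITVAL + (3u << 2);

	a += seed;
	b += seed;
	c += seed;
	return model_final(a, b, c);
}

static void model_jhash_demo(void)
{
	const uint32_t k[3] = { 1, 2, 3 };

	assert(model_jhash2_len3(k, 7) == model_jhash_3words(1, 2, 3, 7));
}
```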