linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-12-23	neigh: Send netevent after marking neigh as dead	Ido Schimmel	1	-0/+1
	neigh_cleanup_and_release() is always called after marking a neighbour as dead, but it only notifies user space and not in-kernel listeners of the netevent notification chain. This can cause multiple problems. In my specific use case, it causes the listener (a switch driver capable of L3 offloads) to believe a neighbour entry is still valid, and is thus erroneously kept in the device's table. Fix that by sending a netevent after marking the neighbour as dead. Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23	ipv6: handle -EFAULT from skb_copy_bits	Dave Jones	1	-1/+5
	By setting certain socket options on ipv6 raw sockets, we can confuse the length calculation in rawv6_push_pending_frames triggering a BUG_ON. RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40 RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282 RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002 RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00 RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009 R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030 R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80 Call Trace: [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170 [<ffffffff816da27e>] SyS_sendto+0xe/0x10 [<ffffffff81002910>] do_syscall_64+0x50/0xa0 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25 Handle by jumping to the failure path if skb_copy_bits gets an EFAULT. Reproducer: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #define LEN 504 int main(int argc, char* argv[]) { int fd; int zero = 0; char buf[LEN]; memset(buf, 0, LEN); fd = socket(AF_INET6, SOCK_RAW, 7); setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4); setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN); sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110); } Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23	inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets	Willem de Bruijn	2	-2/+2
	Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within the packet. For sockets that have transport headers pulled, transport offset can be negative. Use signed comparison to avoid overflow. Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Reported-by: Nisar Jagabar <njagabar@cloudmark.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23	net/sched: cls_flower: Mandate mask when matching on flags	Or Gerlitz	1	-11/+12
	When matching on flags, we should require the user to provide the mask and avoid using an all-ones mask. Not doing so causes matching on flags provided w.o mask to hit on the value being unset for all flags, which may not what the user wanted to happen. Fixes: faa3ffce7829 ('net/sched: cls_flower: Add support for matching on flags') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23	net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6	Or Gerlitz	1	-2/+2
	The UDP dst port was provided to the helper function which sets the IPv6 IP tunnel meta-data under a wrong param order, fix that. Fixes: 75bfbca01e48 ('net/sched: act_tunnel_key: Add UDP dst port option') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23	stmmac: CSR clock configuration fix	jpinto	3	-6/+6
	When testing stmmac with my QoS reference design I checked a problem in the CSR clock configuration that was impossibilitating the phy discovery, since every read operation returned 0x0000ffff. This patch fixes the issue. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-22	sg_write()/bsg_write() is not fit to be called under KERNEL_DS	Al Viro	2	-0/+6
	Both damn things interpret userland pointers embedded into the payload; worse, they are actually traversing those. Leaving aside the bad API design, this is very much _not_ safe to call with KERNEL_DS. Bail out early if that happens. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22	ufs: fix function declaration for ufs_truncate_blocks	Jeff Layton	1	-1/+1
	sparse says: fs/ufs/inode.c:1195:6: warning: symbol 'ufs_truncate_blocks' was not declared. Should it be static? Note that the forward declaration in the file is already marked static. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22	fs: exec: apply CLOEXEC before changing dumpable task flags	Aleksa Sarai	1	-2/+8
	If you have a process that has set itself to be non-dumpable, and it then undergoes exec(2), any CLOEXEC file descriptors it has open are "exposed" during a race window between the dumpable flags of the process being reset for exec(2) and CLOEXEC being applied to the file descriptors. This can be exploited by a process by attempting to access /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE. The race in question is after set_dumpable has been (for get_link, though the trace is basically the same for readlink): [vfs] -> proc_pid_link_inode_operations.get_link -> proc_pid_get_link -> proc_fd_access_allowed -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); Which will return 0, during the race window and CLOEXEC file descriptors will still be open during this window because do_close_on_exec has not been called yet. As a result, the ordering of these calls should be reversed to avoid this race window. This is of particular concern to container runtimes, where joining a PID namespace with file descriptors referring to the host filesystem can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect against access of CLOEXEC file descriptors -- file descriptors which may reference filesystem objects the container shouldn't have access to). Cc: dev@opencontainers.org Cc: <stable@vger.kernel.org> # v3.2+ Reported-by: Michael Crosby <crosbymichael@gmail.com> Signed-off-by: Aleksa Sarai <asarai@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22	seq_file: reset iterator to first record for zero offset	Tomasz Majchrzak	1	-0/+7
	If kernfs file is empty on a first read, successive read operations using the same file descriptor will return no data, even when data is available. Default kernfs 'seq_next' implementation advances iterator position even when next object is not there. Kernfs 'seq_start' for following requests will not return iterator as position is already on the second object. This defect doesn't allow to monitor badblocks sysfs files from MD raid. They are initially empty but if data appears at some stage, userspace is not able to read it. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22	vfs: fix isize/pos/len checks for reflink & dedupe	Darrick J. Wong	3	-9/+13
	Strengthen the checking of pos/len vs. i_size, clarify the return values for the clone prep function, and remove pointless code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22	[iov_iter] fix iterate_all_kinds() on empty iterators	Al Viro	1	-29/+26
	Problem similar to ones dealt with in "fold checks into iterate_and_advance()" and followups, except that in this case we really want to do nothing when asked for zero-length operation - unlike zero-length iterate_and_advance(), zero-length iterate_all_kinds() has no side effects, and callers are simpler that way. That got exposed when copy_from_iter_full() had been used by tipc, which builds an msghdr with zero payload and (now) feeds it to a primitive based on iterate_all_kinds() instead of iterate_and_advance(). Reported-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22	move aio compat to fs/aio.c	Al Viro	4	-82/+98
	... and fix the minor buglet in compat io_submit() - native one kills ioctx as cleanup when put_user() fails. Get rid of bogus compat_... in !CONFIG_AIO case, while we are at it - they should simply fail with ENOSYS, same as for native counterparts. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22	perf sched timehist: Fix invalid period calculation	Namhyung Kim	1	-1/+1
	When --time option is given with a value outside recorded time, the last sample time (tprev) was set to that value and run time calculation might be incorrect. This is a problem of the first samples for each cpus since it would skip the runtime update when tprev is 0. But with --time option it had non-zero (which is invalid) value so the calculation is also incorrect. For example, let's see the followging: $ perf sched timehist time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 3195.968367 [0003] <idle> 0.000 0.000 0.000 3195.968386 [0002] Timer[4306/4277] 0.000 0.000 0.018 3195.968397 [0002] Web Content[4277] 0.000 0.000 0.000 3195.968595 [0001] JS Helper[4302/4277] 0.000 0.000 0.000 3195.969217 [0000] <idle> 0.000 0.000 0.621 3195.969251 [0001] kworker/1:1H[291] 0.000 0.000 0.033 The sample starts at 3195.968367 but when I gave a time interval from 3194 to 3196 (in sec) it will calculate the whole 2 second as runtime. In below, 2 cpus accounted it as runtime, other 2 cpus accounted it as idle time. Before: $ perf sched timehist --time 3194,3196 -s \| tail Idle stats: CPU 0 idle for 1995.991 msec CPU 1 idle for 20.793 msec CPU 2 idle for 30.191 msec CPU 3 idle for 1999.852 msec Total number of unique tasks: 23 Total number of context switches: 128 Total run time (msec): 3724.940 After: $ perf sched timehist --time 3194,3196 -s \| tail Idle stats: CPU 0 idle for 10.811 msec CPU 1 idle for 20.793 msec CPU 2 idle for 30.191 msec CPU 3 idle for 18.337 msec Total number of unique tasks: 23 Total number of context switches: 128 Total run time (msec): 18.139 Committer notes: Further testing: Before: Idle stats: CPU 0 idle for 229.785 msec CPU 1 idle for 937.944 msec CPU 2 idle for 188.931 msec CPU 3 idle for 986.185 msec After: # perf sched timehist --time 40602,40603 -s \| tail Idle stats: CPU 0 idle for 229.785 msec CPU 1 idle for 175.407 msec CPU 2 idle for 188.931 msec CPU 3 idle for 223.657 msec Total number of unique tasks: 68 Total number of context switches: 814 Total run time (msec): 97.688 # for cpu in `seq 0 3` ; do echo -n "CPU $cpu idle for " ; perf sched timehist --time 40602,40603 \| grep "\[000${cpu}\].*\<idle\>" \| tr -s ' ' \| cut -d' ' -f7 \| awk '{entries++ ; s+=$1} END {print s " msec (entries: " entries ")"}' ; done CPU 0 idle for 229.721 msec (entries: 123) CPU 1 idle for 175.381 msec (entries: 65) CPU 2 idle for 188.903 msec (entries: 56) CPU 3 idle for 223.61 msec (entries: 102) Difference due to the idle stats being accounted at nanoseconds precision while the <idle> entries in 'perf sched timehist' are trucated at msec.usec. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 853b74071110 ("perf sched timehist: Add option to specify time window of interest") Link: http://lkml.kernel.org/r/20161222060350.17655-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-22	perf sched timehist: Remove hardcoded 'comm_width' check at print_summary	Namhyung Kim	1	-3/+0
	Now that the default 'comm_width' value is 30, no need to check that at print_summary, Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-22	perf sched timehist: Enlarge default 'comm_width'	Namhyung Kim	1	-1/+1
	Current default value is 20 but it's easily changed to a bigger value as task has a long name and different tid and pid. And it makes the output not aligned. So change it to have a large value as summary shows. Committer notes: Before: # perf sched record ^C # perf sched timehist <SNIP> 40602.770537 [0001] rcuos/2[29] 7.970 0.002 0.020 40602.771512 [0003] <idle> 0.003 0.000 0.986 40602.771586 [0001] <idle> 0.020 0.000 1.049 40602.771606 [0001] qemu-system-x86[3593/3510] 0.000 0.002 0.020 40602.771629 [0003] qemu-system-x86[3510] 0.000 0.003 0.116 40602.771776 [0000] <idle> 0.001 0.000 1.892 <SNIP> After: # perf sched timehist <SNIP> 40602.770537 [0001] rcuos/2[29] 7.970 0.002 0.020 40602.771512 [0003] <idle> 0.003 0.000 0.986 40602.771586 [0001] <idle> 0.020 0.000 1.049 40602.771606 [0001] qemu-system-x86[3593/3510] 0.000 0.002 0.020 40602.771629 [0003] qemu-system-x86[3510] 0.000 0.003 0.116 <SNIP> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-22	perf sched timehist: Honour 'comm_width' when aligning the headers	Namhyung Kim	1	-3/+4
	Current default value is 20, but that may change in the future, so make places where we have 20 hardcoded use 'comm_width'. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-22	MAINTAINERS: Update Jacek Anaszewski's email address	Jacek Anaszewski	1	-2/+2
	My previous email address is no longer valid. From now on, jacek.anaszewski@gmail.com should be used instead. Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
2016-12-22	perf/x86: Fix overlap counter scheduling bug	Peter Zijlstra	1	-1/+1
	Jiri reported the overlap scheduling exceeding its max stack. Looking at the constraint that triggered this, it turns out the overlap marker isn't needed. The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if the counter mask of such an event is not a subset of any other counter mask of a constraint with an equal or higher weight". Esp. that latter part is of interest here I think, our overlapping mask is 0x0e, that has 3 bits set and is the highest weight mask in on the PMU, therefore it will be placed last. Can we still create a scenario where we would need to rewind that? The scenario for AMD Fam15h is we're having masks like: 0x3F -- 111111 0x38 -- 111000 0x07 -- 000111 0x09 -- 001001 And we mark 0x09 as overlapping, because it is not a direct subset of 0x38 or 0x07 and has less weight than either of those. This means we'll first try and place the 0x09 event, then try and place 0x38/0x07 events. Now imagine we have: 3 * 0x07 + 0x09 and the initial pick for the 0x09 event is counter 0, then we'll fail to place all 0x07 events. So we'll pop back, try counter 4 for the 0x09 event, and then re-try all 0x07 events, which will now work. The masks on the PMU in question are: 0x01 - 0001 0x03 - 0011 0x0e - 1110 0x0c - 1100 But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 -> 0x1) are of heavier weight, it should all work out. Reported-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Liang Kan <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-22	perf/x86/pebs: Fix handling of PEBS buffer overflows	Stephane Eranian	1	-9/+21
	This patch solves a race condition between PEBS and the PMU handler. In case multiple PEBS events are sampled at the same time, it is possible to have GLOBAL_STATUS bit 62 set indicating PEBS buffer overflow and also seeing at most 3 PEBS counters having their bits set in the status register. This is a sign that there was at least one PEBS record pending at the time of the PMU interrupt. PEBS counters must only be processed via the drain_pebs() calls, and not via the regular sample processing loop coming after that the function, otherwise phony regular samples may be generated in the sampling buffer not marked with the EXACT tag. Another possibility is to have one PEBS event and at least one non-PEBS event whic hoverflows while PEBS has armed. In this case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow status bit for the PEBS counter will be on Skylake. To avoid this problem, we systematically ignore the PEBS-enabled counters from the GLOBAL_STATUS mask and we always process PEBS events via drain_pebs(). The problem manifested itself by having non-exact samples when sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would not have the EXACT flag set. Note that this problem is only present on Skylake processor. This fix is harmless on older processors. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-22	x86/paravirt: Mark unused patch_default label	Peter Zijlstra	2	-2/+2
	A bugfix commit: 45dbea5f55c0 ("x86/paravirt: Fix native_patch()") ... introduced a harmless warning: arch/x86/kernel/paravirt_patch_32.c: In function 'native_patch': arch/x86/kernel/paravirt_patch_32.c:71:1: error: label 'patch_default' defined but not used [-Werror=unused-label] Fix it by annotating the label as __maybe_unused. Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Piotr Gregor <piotrgregor@rsyncme.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 45dbea5f55c0 ("x86/paravirt: Fix native_patch()") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-22	IB/rxe: Don't check for null ptr in send()	Andrew Boyer	1	-2/+1
	pkt->qp was already dereferenced earlier in the function. Fixes Smatch complaint: drivers/infiniband/sw/rxe/rxe_net.c:458 send() warn: variable dereferenced before check 'pkt->qp' (see line 441) Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	IB/rxe: Drop future atomic/read packets rather than retrying	Andrew Boyer	1	-1/+1
	If the completer is in the middle of a large read operation, one lost packet can cause havoc. Going to COMPST_ERROR_RETRY will cause the requester to resend the request. After that, any packet from the first attempt still in the receive queue will be interpreted as an error, restarting the error/retry sequence. The transfer will quickly exhaust its retries. This behavior is very noticeable when doing 512KB reads on a QEMU system configured with 1500B MTU. Also, a resent request here will prompt the responder on the other side to immediately start resending, but the resent packets will get stuck in the already-loaded receive queue and will never be processed. Rather than erroring out every time an unexpected future packet arrives, just drop it. Eventually the retry timer will send a duplicate request; the completer will be able to make progress since the queue will start relatively empty. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sends	Andrew Boyer	1	-1/+2
	Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	qedr: Always notify the verb consumer of flushed CQEs	Amrani, Ram	1	-1/+1
	Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	qedr: clear the vendor error field in the work completion	Amrani, Ram	1	-0/+3
	We clear the vendor error field in the work completion so that if a work completion is erroneous the field won't confuse the caller. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	qedr: post_send/recv according to QP state	Amrani, Ram	1	-4/+4
	Enable posting to SQ only in RTS, ERR and SQD QP state. Enable posting to RQ in ERR QP state. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	qedr: ignore inline flag in read verbs	Amrani, Ram	1	-1/+3
	In the current implementation a read verb with IB_SEND_INLINE may be illegally configured. In this fix we ignore the inline bit in the case of a read verb. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	qedr: modify QP state to error when destroying it	Amrani, Ram	1	-2/+4
	Current code didn't modify the QP state to error because it queried the QP state as a bitmap while it isn't. So the code never got executed. This patch fixes this and queries for each QP state respectively and not at once via a bitmask. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	qedr: return correct value on modify qp	Amrani, Ram	1	-1/+1
	Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	qedr: return error if destroy CQ failed	Amrani, Ram	1	-1/+6
	Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	qedr: configure the number of CQEs on CQ creation	Amrani, Ram	1	-0/+3
	Configure ibcq->cqe when a CQ is created. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	i40iw: Set 128B as the only supported RQ WQE size	Chien Tin Tung	8	-25/+53
	RQ WQE size other than 128B is not supported. Correct RQ size calculation to use 128B only. Since this breaks ABI, add additional code to provide compatibility with v4 user provider, libi40iw. Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	IB/cma: Fix a race condition in iboe_addr_get_sgid()	Bart Van Assche	1	-2/+4
	Code that dereferences the struct net_device ip_ptr member must be protected with an in_dev_get() / in_dev_put() pair. Hence insert calls to these functions. Fixes: commit 7b85627b9f02 ("IB/cma: IBoE (RoCE) IP-based GID addressing") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Roland Dreier <roland@purestorage.com> Cc: <stable@vger.kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22	net: ipv4: Don't crash if passing a null sk to ip_do_redirect.	Lorenzo Colitti	1	-1/+2
	Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.") made ip_do_redirect call sock_net(sk) to determine the network namespace of the passed-in socket. This crashes if sk is NULL. Fix this by getting the network namespace from the skb instead. Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.") Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-22	befs: add NFS export support	Luis de Bethencourt	1	-0/+38
	Implement mandatory export_operations, so it is possible to export befs via nfs. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22	befs: remove trailing whitespaces	Luis de Bethencourt	7	-48/+47
	Removing all trailing whitespaces in befs. I was skeptic about tainting the history with this, but whitespace changes can be ignored by using 'git blame -w' and 'git log -w'. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22	befs: remove signatures from comments	Luis de Bethencourt	3	-7/+1
	No idea why some comments have signatures. These predate git. Removing them since they add noise and no information. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22	befs: fix style issues in header files	Luis de Bethencourt	6	-17/+12
	Fixing checkpatch.pl issues in befs header files: WARNING: Missing a blank line after declarations + befs_inode_addr iaddr; + iaddr.allocation_group = blockno >> BEFS_SB(sb)->ag_shift; WARNING: space prohibited between function name and open parenthesis '(' + return BEFS_SB(sb)->block_size / sizeof (befs_disk_inode_addr); ERROR: "foo * bar" should be "foo bar" + const char key, befs_off_t * value); ERROR: Macros with complex values should be enclosed in parentheses +#define PACKED __attribute__ ((__packed__)) Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22	befs: fix style issues in linuxvfs.c	Luis de Bethencourt	1	-23/+26
	Fix the following type of checkpatch.pl issues: WARNING: line over 80 characters +static struct dentry befs_lookup(struct inode , struct dentry , unsigned int); ERROR: code indent should use tabs where possible + if (!bi)$ WARNING: please, no spaces at the start of a line + if (!bi)$ WARNING: labels should not be indented + unacquire_bh: WARNING: space prohibited between function name and open parenthesis '(' + sizeof (struct befs_inode_info), WARNING: braces {} are not necessary for single statement blocks + if (!out) { + return -ENOMEM; + } WARNING: Block comments use a trailing / on a separate line + in special cases / WARNING: Missing a blank line after declarations + int token; + if (!p) ERROR: do not use assignment in if condition + if (!(bh = sb_bread(sb, sb_block))) { ERROR: space prohibited after that open parenthesis '(' + if( befs_sb->num_blocks > ~((sector_t)0) ) { ERROR: space prohibited before that close parenthesis ')' + if( befs_sb->num_blocks > ~((sector_t)0) ) { ERROR: space required before the open parenthesis '(' + if( befs_sb->num_blocks > ~((sector_t)0) ) { Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22	befs: fix typos in linuxvfs.c	Luis de Bethencourt	1	-8/+6
	Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22	befs: fix style issues in io.c	Luis de Bethencourt	1	-2/+2
	Fixing the two following checkpatch.pl issues: ERROR: trailing whitespace + * Based on portions of file.c and inode.c $ WARNING: labels should not be indented + error: Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22	befs: fix style issues in inode.c	Luis de Bethencourt	1	-6/+6
	Fixing the following checkpatch.pl errors and warning: ERROR: trailing whitespace + * $ WARNING: Block comments use * on subsequent lines +/* + Validates the correctness of the befs inode ERROR: "foo * bar" should be "foo bar" +befs_check_inode(struct super_block sb, befs_inode * raw_inode, Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-22	befs: fix style issues in debug.c	Luis de Bethencourt	1	-6/+8
	Fix all checkpatch.pl errors and warnings in debug.c: ERROR: trailing whitespace + * $ WARNING: Missing a blank line after declarations + va_list args; + va_start(args, fmt); ERROR: "foo * bar" should be "foo bar" +befs_dump_inode(const struct super_block sb, befs_inode * inode) ERROR: "foo * bar" should be "foo bar" +befs_dump_super_block(const struct super_block sb, befs_super_block * sup) ERROR: "foo * bar" should be "foo bar" +befs_dump_small_data(const struct super_block sb, befs_small_data * sd) WARNING: line over 80 characters +befs_dump_index_entry(const struct super_block sb, befs_disk_btree_super super) ERROR: "foo * bar" should be "foo bar" +befs_dump_index_entry(const struct super_block sb, befs_disk_btree_super * super) ERROR: "foo * bar" should be "foo bar" +befs_dump_index_node(const struct super_block sb, befs_btree_nodehead * node) Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2016-12-21	CREDITS: Remove outdated address information	Gertjan van Wingerde	1	-2/+0
	This address hasn't been accurate for several years now. Simply remove it. Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-21	net: fddi: skfp: use %p format specifier for addresses rather than %x	Colin Ian King	3	-8/+8
	Trivial fix: Addresses should be printed using the %p format specifier rather than using %x. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-21	tcp: add a missing barrier in tcp_tasklet_func()	Eric Dumazet	1	-0/+1
	Madalin reported crashes happening in tcp_tasklet_func() on powerpc64 Before TSQ_QUEUED bit is cleared, we must ensure the changes done by list_del(&tp->tsq_node); are committed to memory, otherwise corruption might happen, as an other cpu could catch TSQ_QUEUED clearance too soon. We can notice that old kernels were immune to this bug, because TSQ_QUEUED was cleared after a bh_lock_sock(sk)/bh_unlock_sock(sk) section, but they could have missed a kick to write additional bytes, when NIC interrupts for a given flow are spread to multiple cpus. Affected TCP flows would need an incoming ACK or RTO timer to add more packets to the pipe. So overall situation should be better now. Fixes: b223feb9de2a ("tcp: tsq: add shortcut in tcp_tasklet_func()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Madalin Bucur <madalin.bucur@nxp.com> Tested-by: Madalin Bucur <madalin.bucur@nxp.com> Tested-by: Xing Lei <xing.lei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-21	splice: reinstate SIGPIPE/EPIPE handling	Linus Torvalds	1	-2/+7
	Commit 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()") caused a regression when there were no more readers left on a pipe that was being spliced into: rather than the expected SIGPIPE and -EPIPE return value, the writer would end up waiting forever for space to free up (which obviously was not going to happen with no readers around). Fixes: 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()") Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org> Debugged-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org # v4.9 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-21	net: mvpp2: fix dma unmapping of TX buffers for fragments	Thomas Petazzoni	1	-29/+30
	Since commit 71ce391dfb784 ("net: mvpp2: enable proper per-CPU TX buffers unmapping"), we are not correctly DMA unmapping TX buffers for fragments. Indeed, the mvpp2_txq_inc_put() function only stores in the txq_cpu->tx_buffs[] array the physical address of the buffer to be DMA-unmapped when skb != NULL. In addition, when DMA-unmapping, we use skb_headlen(skb) to get the size to be unmapped. Both of this works fine for TX descriptors that are associated directly to a SKB, but not the ones that are used for fragments, with a NULL pointer as skb: - We have a NULL physical address when calling DMA unmap - skb_headlen(skb) crashes because skb is NULL This causes random crashes when fragments are used. To solve this problem, we need to: - Store the physical address of the buffer to be unmapped unconditionally, regardless of whether it is tied to a SKB or not. - Store the length of the buffer to be unmapped, which requires a new field. Instead of adding a third array to store the length of the buffer to be unmapped, and as suggested by David Miller, this commit refactors the tx_buffs[] and tx_skb[] arrays of 'struct mvpp2_txq_pcpu' into a separate structure 'mvpp2_txq_pcpu_buf', to which a 'size' field is added. Therefore, instead of having three arrays to allocate/free, we have a single one, which also improve data locality, reducing the impact on the CPU cache. Fixes: 71ce391dfb784 ("net: mvpp2: enable proper per-CPU TX buffers unmapping") Reported-by: Raphael G <raphael.glon@corp.ovh.com> Cc: Raphael G <raphael.glon@corp.ovh.com> Cc: stable@vger.kernel.org Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-21	net: ethernet: stmmac: dwmac-rk: make clk enablement first in powerup	Heiko Stübner	1	-4/+4
	Right now the dwmac-rk tries to set up the GRF-specific speed and link options before enabling clocks, phys etc and on previous socs this works because the GRF is supplied on the whole by one clock. On the rk3399 however the GRF (General Register Files) clock-supply has been split into multiple clocks and while there is no specific grf-gmac clock like for other sub-blocks, it seems the mac-specific portions are actually supplied by the general mac clock. This results in hangs on rk3399 boards if the driver is build as module. When built in te problem of course doesn't surface, as the clocks are of course still on at the stage before clock_disable_unused. To solve this, simply move the clock enablement to the first position in the powerup callback. This is also a good idea in general to enable clocks before everything else. Tested on rk3288, rk3368 and rk3399 the dwmac still works on all of them. Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: David S. Miller <davem@davemloft.net>