aboutsummaryrefslogtreecommitdiffstats
path: root/ipc (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2017-11-27race of nfsd inetaddr notifiers vs nn->nfsd_serv changeVasily Averin3-3/+17
nfsd_inet[6]addr_event uses nn->nfsd_serv without taking nfsd_mutex, which can be changed during execution of notifiers and crash the host. Moreover if notifiers were enabled in one net namespace they are enabled in all other net namespaces, from creation until destruction. This patch allows notifiers to access nn->nfsd_serv only after the pointer is correctly initialized and delays cleanup until notifiers are no longer in use. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Tested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27race of lockd inetaddr notifiers vs nlmsvc_rqst changeVasily Averin1-2/+14
lockd_inet[6]addr_event use nlmsvc_rqst without taken nlmsvc_mutex, nlmsvc_rqst can be changed during execution of notifiers and crash the host. Patch enables access to nlmsvc_rqst only when it was correctly initialized and delays its cleanup until notifiers are no longer in use. Note that nlmsvc_rqst can be temporally set to ERR_PTR, so the "if (nlmsvc_rqst)" check in notifiers is insufficient on its own. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Tested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27SUNRPC: make cache_detail structures constBhumika Goyal2-4/+4
Make these const as they are only getting passed to the function cache_create_net having the argument as const. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27NFSD: make cache_detail structures constBhumika Goyal2-4/+4
Make these const as they are only getting passed to the function cache_create_net having the argument as const. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27sunrpc: make the function arg as constBhumika Goyal2-2/+2
Make the struct cache_detail *tmpl argument of the function cache_create_net as const as it is only getting passed to kmemup having the argument as const void *. Add const to the prototype too. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: check for use of the closed special stateidAndrew Elble1-2/+5
Prevent the use of the closed (invalid) special stateid by clients. Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: fix panic in posix_unblock_lock called from nfs4_laundromatNaofumi Honda1-2/+2
From kernel 4.9, my two nfsv4 servers sometimes suffer from "panic: unable to handle kernel page request" in posix_unblock_lock() called from nfs4_laundromat(). These panics diseappear if we revert the commit "nfsd: add a LRU list for blocked locks". The cause appears to be a typo in nfs4_laundromat(), which is also present in nfs4_state_shutdown_net(). Cc: stable@vger.kernel.org Fixes: 7919d0a27f1e "nfsd: add a LRU list for blocked locks" Cc: jlayton@redhat.com Reveiwed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27lockd: lost rollback of set_grace_period() in lockd_down_net()Vasily Averin1-0/+2
Commit efda760fe95ea ("lockd: fix lockd shutdown race") is incorrect, it removes lockd_manager and disarm grace_period_end for init_net only. If nfsd was started from another net namespace lockd_up_net() calls set_grace_period() that adds lockd_manager into per-netns list and queues grace_period_end delayed work. These action should be reverted in lockd_down_net(). Otherwise it can lead to double list_add on after restart nfsd in netns, and to use-after-free if non-disarmed delayed work will be executed after netns destroy. Fixes: efda760fe95e ("lockd: fix lockd shutdown race") Cc: stable@vger.kernel.org Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27lockd: added cleanup checks in exit_net hookVasily Averin1-0/+11
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27grace: replace BUG_ON by WARN_ONCE in exit_net hookVasily Averin1-1/+3
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex classAndrew Elble1-3/+8
The use of the st_mutex has been confusing the validator. Use the proper nested notation so as to not produce warnings. Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27lockd: remove net pointer from messagesVasily Averin4-14/+21
Publishing of net pointer is not safe, use net->ns.inum as net ID in debug messages [ 171.757678] lockd_up_net: per-net data created; net=f00001e7 [ 171.767188] NFSD: starting 90-second grace period (net f00001e7) [ 300.653313] lockd: nuking all hosts in net f00001e7... [ 300.653641] lockd: host garbage collection for net f00001e7 [ 300.653968] lockd: nlmsvc_mark_resources for net f00001e7 [ 300.711483] lockd_down_net: per-net data destroyed; net=f00001e7 [ 300.711847] lockd: nuking all hosts in net 0... [ 300.711847] lockd: host garbage collection for net 0 [ 300.711848] lockd: nlmsvc_mark_resources for net 0 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: remove net pointer from debug messagesVasily Averin1-3/+3
Publishing of net pointer is not safe, replace it in debug meesages by net->ns.inum [ 119.989161] nfsd: initializing export module (net: f00001e7). [ 171.767188] NFSD: starting 90-second grace period (net f00001e7) [ 322.185240] nfsd: shutting down export module (net: f00001e7). [ 322.186062] nfsd: export shutdown complete (net: f00001e7). Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: Fix races with check_stateid_generation()Trond Myklebust1-3/+19
The various functions that call check_stateid_generation() in order to compare a client-supplied stateid with the nfs4_stid state, usually need to atomically check for closed state. Those that perform the check after locking the st_mutex using nfsd4_lock_ol_stateid() should now be OK, but we do want to fix up the others. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: Ensure we check stateid validity in the seqid operation checksTrond Myklebust1-9/+3
After taking the stateid st_mutex, we want to know that the stateid still represents valid state before performing any non-idempotent actions. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: Fix race in lock stateid creationTrond Myklebust1-28/+42
If we're looking up a new lock state, and the creation fails, then we want to unhash it, just like we do for OPEN. However in order to do so, we need to that no other LOCK requests can grab the mutex until we have unhashed it (and marked it as closed). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd4: move find_lock_stateidTrond Myklebust1-19/+19
Trivial cleanup to simplify following patch. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: Ensure we don't recognise lock stateids after freeing themTrond Myklebust1-11/+8
In order to deal with lookup races, nfsd4_free_lock_stateid() needs to be able to signal to other stateful functions that the lock stateid is no longer valid. Right now, nfsd_lock() will check whether or not an existing stateid is still hashed, but only in the "new lock" path. To ensure the stateid invalidation is also recognised by the "existing lock" path, and also by a second call to nfsd4_free_lock_stateid() itself, we can change the type to NFS4_CLOSED_STID under the stp->st_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)Trond Myklebust1-0/+8
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: Fix another OPEN stateid raceTrond Myklebust1-15/+13
If nfsd4_process_open2() is initialising a new stateid, and yet the call to nfs4_get_vfs_file() fails for some reason, then we must declare the stateid closed, and unhash it before dropping the mutex. Right now, we unhash the stateid after dropping the mutex, and without changing the stateid type, meaning that another OPEN could theoretically look it up and attempt to use it. Reported-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27nfsd: Fix stateid races between OPEN and CLOSETrond Myklebust1-8/+59
Open file stateids can linger on the nfs4_file list of stateids even after they have been closed. In order to avoid reusing such a stateid, and confusing the client, we need to recheck the nfs4_stid's type after taking the mutex. Otherwise, we risk reusing an old stateid that was already closed, which will confuse clients that expect new stateids to conform to RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-26Linux 4.15-rc1Linus Torvalds1-2/+2
2017-11-26ARM: BUG if jumping to usermode address in kernel modeRussell King2-0/+24
Detect if we are returning to usermode via the normal kernel exit paths but the saved PSR value indicates that we are in kernel mode. This could occur due to corrupted stack state, which has been observed with "ftracetest". This ensures that we catch the problem case before we get to user code. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-11-24m68k/macboing: Fix missed timer callback assignmentKees Cook1-2/+2
This fixes a missed function prototype callback from the timer conversions. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20171123221902.GA75727@beast
2017-11-24afs: remove redundant assignment of dvnode to itselfColin Ian King1-1/+1
The assignment of dvnode to itself is redundant and can be removed. Cleans up warning detected by cppcheck: fs/afs/dir.c:975: (warning) Redundant assignment of 'dvnode' to itself. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24afs: cell: Remove unnecessary code in afs_lookup_cellGustavo A. R. Silva1-6/+1
Due to recent changes this piece of code is no longer needed. Addresses-Coverity-ID: 1462033 Link: https://lkml.kernel.org/r/4923.1510957307@warthog.procyon.org.uk Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24afs: Fix signal handling in some file opsDavid Howells1-0/+8
afs_mkdir(), afs_create(), afs_link() and afs_symlink() all need to drop the target dentry if a signal causes the operation to be killed immediately before we try to contact the server. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24afs: Fix some dentry handling in dir ops and missing key_putsDavid Howells1-10/+5
Fix some of dentry handling in AFS directory ops: (1) Do d_drop() on the new_dentry before assigning a new inode to it in afs_vnode_new_inode(). It's fine to do this before calling afs_iget() because the operation has taken place on the server. (2) Replace d_instantiate()/d_rehash() with d_add(). (3) Don't d_drop() the new_dentry in afs_rename() on error. Also fix afs_link() and afs_rename() to call key_put() on all error paths where the key is taken. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24afs: Make afs_write_begin() avoid writing to a page that's being storedDavid Howells1-0/+5
Make afs_write_begin() wait for a page that's marked PG_writeback because: (1) We need to avoid interference with the data being stored so that the data on the server ends up in a defined state. (2) page->private is used to track the window of dirty data within a page, but it's also used by the storage code to track what's being written, being cleared by the completion notification. Ownership can't be relinquished by the storage code until completion because it a store fails, the data must be remarked dirty. Tracing shows something like the following (edited): x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-125 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store+ 0-125 x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-2052 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 clear 0-2052 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store 0-0 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 WARN 0-0 The clear (completion) corresponding to the store+ (store continuation from a previous page) happens between the second begin (afs_write_begin) and the store corresponding to that. This results in the second store not seeing any data to write back, leading to the following warning: WARNING: CPU: 2 PID: 114 at ../fs/afs/write.c:403 afs_write_back_from_locked_page+0x19d/0x76c [kafs] Modules linked in: kafs(E) CPU: 2 PID: 114 Comm: kworker/u8:3 Tainted: G E 4.14.0-fscache+ #242 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Workqueue: writeback wb_workfn (flush-afs-2) task: ffff8800cad72600 task.stack: ffff8800cad44000 RIP: 0010:afs_write_back_from_locked_page+0x19d/0x76c [kafs] RSP: 0018:ffff8800cad47aa0 EFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff8800bef33a20 RCX: 0000000000000000 RDX: 000000000000000f RSI: ffffffff81c5d0e0 RDI: ffff8800cad72e78 RBP: ffff8800d31ea1e8 R08: ffff8800c1358000 R09: ffff8800ca00e400 R10: ffff8800cad47a38 R11: ffff8800c5d9e400 R12: 0000000000000000 R13: ffffea0002d9df00 R14: ffffffffa0023c1c R15: 0000000000007fdf FS: 0000000000000000(0000) GS:ffff8800ca700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f85ac6c4000 CR3: 0000000001c10001 CR4: 00000000001606e0 Call Trace: ? clear_page_dirty_for_io+0x23a/0x267 afs_writepages_region+0x1be/0x286 [kafs] afs_writepages+0x60/0x127 [kafs] do_writepages+0x36/0x70 __writeback_single_inode+0x12f/0x635 writeback_sb_inodes+0x2cc/0x452 __writeback_inodes_wb+0x68/0x9f wb_writeback+0x208/0x470 ? wb_workfn+0x22b/0x565 wb_workfn+0x22b/0x565 ? worker_thread+0x230/0x2ac process_one_work+0x2cc/0x517 ? worker_thread+0x230/0x2ac worker_thread+0x1d4/0x2ac ? rescuer_thread+0x29b/0x29b kthread+0x15d/0x165 ? kthread_create_on_node+0x3f/0x3f ? call_usermodehelper_exec_async+0x118/0x11f ret_from_fork+0x24/0x30 Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24sched/debug: Fix task state recording/printoutThomas Gleixner1-3/+3
The recent conversion of the task state recording to use task_state_index() broke the sched_switch tracepoint task state output. task_state_index() returns surprisingly an index (0-7) which is then printed with __print_flags() applying bitmasks. Not really working and resulting in weird states like 'prev_state=t' instead of 'prev_state=I'. Use TASK_REPORT_MAX instead of TASK_STATE_MAX to report preemption. Build a bitmask from the return value of task_state_index() and store it in entry->prev_state, which makes __print_flags() work as expected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@vger.kernel.org Fixes: efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1711221304180.1751@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-24x86/decoder: Add new TEST instruction patternMasami Hiramatsu1-1/+1
The kbuild test robot reported this build warning: Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx) Warning: objdump says 3 bytes, but insn_get_length() says 2 Warning: decoded and checked 1569014 instructions with 1 warnings This sequence seems to be a new instruction not in the opcode map in the Intel SDM. The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8. Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parenthesis)" In that table, opcodes listed by the index REG bits as: 000 001 010 011 100 101 110 111 TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX So, it seems TEST Ib is assigned to 001. Add the new pattern. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-23x86/PCI: Remove unused HyperTransport interrupt supportBjorn Helgaas11-454/+2
There are no in-tree callers of ht_create_irq(), the driver interface for HyperTransport interrupts, left. Remove the unused entry point and all the supporting code. See 8b955b0dddb3 ("[PATCH] Initial generic hypertransport interrupt support"). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-pci@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
2017-11-23x86/umip: Fix insn_get_code_seg_params()'s return valueBorislav Petkov3-4/+4
In order to save on redundant structs definitions insn_get_code_seg_params() was made to return two 4-bit values in a char but clang complains: arch/x86/lib/insn-eval.c:780:10: warning: implicit conversion from 'int' to 'char' changes value from 132 to -124 [-Wconstant-conversion] return INSN_CODE_SEG_PARAMS(4, 8); ~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~ ./arch/x86/include/asm/insn-eval.h:16:57: note: expanded from macro 'INSN_CODE_SEG_PARAMS' #define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4)) Those two values do get picked apart afterwards the opposite way of how they were ORed so wrt to the LSByte, the return value is the same. But this function returns -EINVAL in the error case, which is an int. So make it return an int which is the native word size anyway and thus fix the clang warning. Reported-by: Kees Cook <keescook@google.com> Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ricardo.neri-calderon@linux.intel.com Link: https://lkml.kernel.org/r/20171123091951.1462-1-bp@alien8.de
2017-11-23x86/boot/KASLR: Remove unused variableChao Fan1-3/+2
There are two variables "rc" in mem_avoid_memmap. One at the top of the function and another one inside the while() loop. Drop the outer one as it is unused. Cleanup some whitespace damage while at it. Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gregkh@linuxfoundation.org Cc: n-horiguchi@ah.jp.nec.com Cc: keescook@chromium.org Link: https://lkml.kernel.org/r/20171123090847.15293-1-fanc.fnst@cn.fujitsu.com
2017-11-23genirq/matrix: Make - vs ?: Precedence explicitKees Cook1-1/+1
Noticed with a Clang build. This improves the readability of the ?: expression, as it has lower precedence than the - expression. Show explicitly that - is evaluated first. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20171122205645.GA27125@beast
2017-11-23irqchip/imgpdc: Use resource_size function on resource objectVasyl Gomonovych1-1/+1
drivers/irqchip/irq-imgpdc.c:327:20-23: WARNING: Suspicious code. resource_size is maybe missing with res_regs Generated by: scripts/coccinelle/api/resource_size.cocci Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: marc.zyngier@arm.com Cc: jason@lakedaemon.net Link: https://lkml.kernel.org/r/1511215361-8279-1-git-send-email-gomonovych@gmail.com
2017-11-23irqchip/qcom: Fix u32 comparison with value less than zeroColin Ian King1-1/+1
The comparison of u32 nregs being less than zero is never true since nregs is unsigned. Fix this by making nregs a signed integer. Fixes: f20cc9b00c7b ("irqchip/qcom: Add IRQ combiner driver") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kernel-janitors@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Link: https://lkml.kernel.org/r/20171117183553.2739-1-colin.king@canonical.com
2017-11-24ipvlan: Fix insufficient skb linear check for ipv6 icmpGao Feng1-1/+19
In the function ipvlan_get_L3_hdr, current codes use pskb_may_pull to make sure the skb header has enough linear room for ipv6 header. But it would use the latter memory directly without linear check when it is icmp. So it still may access the unepxected memory in ipvlan_addr_lookup. Now invoke the pskb_may_pull again if it is ipv6 icmp. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24ipvlan: Fix insufficient skb linear check for arpGao Feng1-8/+8
In the function ipvlan_get_L3_hdr, current codes use pskb_may_pull to make sure the skb header has enough linear room for arp header. But it would access the arp payload in func ipvlan_addr_lookup. So it still may access the unepxected memory. Now use arp_hdr_len(port->dev) instead of the arp header as the param. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6Hangbin Liu1-1/+15
Stefano pointed that configure or show UDP_ZERO_CSUM6_RX/TX info doesn't make sense if we haven't enabled CONFIG_IPV6. Fix it by adding if IS_ENABLED(CONFIG_IPV6) check. Fixes: abe492b4f50c ("geneve: UDP checksum configuration via netlink") Fixes: fd7eafd02121 ("geneve: fix fill_info when link down") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHYFlorian Fainelli1-1/+1
The PHY on BCM7278 has an additional bit that needs to be cleared: IDDQ_GLOBAL_PWR, without doing this, the PHY remains stuck in reset out of suspend/resume cycles. Fixes: 0fe9933804eb ("net: dsa: bcm_sf2: Add support for BCM7278 integrated switch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24net: accept UFO datagrams from tuntap and packetWillem de Bruijn15-14/+209
Tuntap and similar devices can inject GSO packets. Accept type VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively. Processes are expected to use feature negotiation such as TUNSETOFFLOAD to detect supported offload types and refrain from injecting other packets. This process breaks down with live migration: guest kernels do not renegotiate flags, so destination hosts need to expose all features that the source host does. Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677. This patch introduces nearly(*) no new code to simplify verification. It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP insertion and software UFO segmentation. It does not reinstate protocol stack support, hardware offload (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception of VIRTIO_NET_HDR_GSO_UDP packets in tuntap. To support SKB_GSO_UDP reappearing in the stack, also reinstate logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD by squashing in commit 939912216fa8 ("net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1 ("net: avoid skb_warn_bad_offload false positives on UFO"). (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id, ipv6_proxy_select_ident is changed to return a __be32 and this is assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted at the end of the enum to minimize code churn. Tested Booted a v4.13 guest kernel with QEMU. On a host kernel before this patch `ethtool -k eth0` shows UFO disabled. After the patch, it is enabled, same as on a v4.13 host kernel. A UFO packet sent from the guest appears on the tap device: host: nc -l -p -u 8000 & tcpdump -n -i tap0 guest: dd if=/dev/zero of=payload.txt bs=1 count=2000 nc -u 192.16.1.1 8000 < payload.txt Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds, packets arriving fragmented: ./with_tap_pair.sh ./tap_send_ufo tap0 tap1 (from https://github.com/wdebruij/kerneltools/tree/master/tests) Changes v1 -> v2 - simplified set_offload change (review comment) - documented test procedure Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com> Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.") Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24net: realtek: r8169: implement set_link_ksettings()Tobias Jakobi1-16/+22
Commit 6fa1ba61520576cf1346c4ff09a056f2950cb3bf partially implemented the new ethtool API, by replacing get_settings() with get_link_ksettings(). This breaks ethtool, since the userspace tool (according to the new API specs) never tries the legacy set() call, when the new get() call succeeds. All attempts to chance some setting from userspace result in: > Cannot set new settings: Operation not supported Implement the missing set() call. Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24net: ipv6: Fixup device for anycast routes during copyDavid Ahern1-1/+1
Florian reported a breakage with anycast routes due to commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address"). Prior to this commit anycast routes were added against the loopback device causing repetitive route entries with no insight into why they existed. e.g.: $ ip -6 ro ls table local type anycast anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium anycast fe80:: dev lo proto kernel metric 0 pref medium anycast fe80:: dev lo proto kernel metric 0 pref medium The point of commit 4832c30d5458 is to add the routes using the device with the address which is causing the route to be added. e.g.,: $ ip -6 ro ls table local type anycast anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium anycast fe80:: dev eth2 proto kernel metric 0 pref medium anycast fe80:: dev eth1 proto kernel metric 0 pref medium For traffic to work as it did before, the dst device needs to be switched to the loopback when the copy is created similar to local routes. Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24net/smc: Fix preinitialization of buf_desc in __smc_buf_create()Geert Uytterhoeven1-1/+1
With gcc-4.1.2: net/smc/smc_core.c: In function ‘__smc_buf_create’: net/smc/smc_core.c:567: warning: ‘bufsize’ may be used uninitialized in this function Indeed, if the for-loop is never executed, bufsize is used uninitialized. In addition, buf_desc is stored for later use, while it is still a NULL pointer. Before, error handling was done by checking if buf_desc is non-NULL. The cleanup changed this to an error check, but forgot to update the preinitialization of buf_desc to an error pointer. Update the preinitializatin of buf_desc to fix this. Fixes: b33982c3a6838d13 ("net/smc: cleanup function __smc_buf_create()") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24net/smc: use sk_rcvbuf as start for rmb creationUrsula Braun1-1/+1
Commit 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers") merged handling of SMC receive and send buffers. It introduced sk_buf_size as merged start value for size determination. But since sk_buf_size is not used at all, sk_sndbuf is erroneously used as start for rmb creation. This patch makes sure, sk_buf_size is really used as intended, and sk_rcvbuf is used as start value for rmb creation. Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24ipv6: Do not consider linkdown nexthops during multipathIdo Schimmel1-0/+5
When the 'ignore_routes_with_linkdown' sysctl is set, we should not consider linkdown nexthops during route lookup. While the code correctly verifies that the initially selected route ('match') has a carrier, it does not perform the same check in the subsequent multipath selection, resulting in a potential packet loss. In case the chosen route does not have a carrier and the sysctl is set, choose the initially selected route. Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop link is down") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24net: sched: fix crash when deleting secondary chainsRoman Kapl1-3/+4
If you flush (delete) a filter chain other than chain 0 (such as when deleting the device), the kernel may run into a use-after-free. The chain refcount must not be decremented unless we are sure we are done with the chain. To reproduce the bug, run: ip link add dtest type dummy tc qdisc add dev dtest ingress tc filter add dev dtest chain 1 parent ffff: flower ip link del dtest Introduced in: commit f93e1cdcf42c ("net/sched: fix filter flushing"), but unless you have KAsan or luck, you won't notice it until commit 0dadc117ac8b ("cls_flower: use tcf_exts_get_net() before call_rcu()") Fixes: f93e1cdcf42c ("net/sched: fix filter flushing") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Roman Kapl <code@rkapl.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSEJesse Chan1-0/+4
This change resolves a new compile-time warning when built as a loadable module: WARNING: modpost: missing MODULE_LICENSE() in drivers/net/phy/cortina.o see include/linux/module.h for more information This adds the license as "GPL", which matches the header of the file. MODULE_DESCRIPTION and MODULE_AUTHOR are also added. Signed-off-by: Jesse Chan <jc@linux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-23kbuild: drop $(extra-y) from real-objs-yMasahiro Yamada1-1/+1
$(real-objs-y) in only used in scripts/Makefile.build to form "targets", but $(extra-y) is added to "targets" in another line. We do not need to add $(extra-y) twice. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>