aboutsummaryrefslogtreecommitdiffstats
path: root/net/dcb/dcbnl.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-01-15tipc: fix uninit-value in in tipc_conn_rcv_subYing Xue1-1/+1
syzbot reported: BUG: KMSAN: uninit-value in tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373 CPU: 0 PID: 66 Comm: kworker/u4:4 Not tainted 4.17.0-rc3+ #88 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: tipc_rcv tipc_conn_recv_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683 tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373 tipc_conn_rcv_from_sock net/tipc/topsrv.c:409 [inline] tipc_conn_recv_work+0x3cd/0x560 net/tipc/topsrv.c:424 process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145 worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279 kthread+0x539/0x720 kernel/kthread.c:239 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412 Local variable description: ----s.i@tipc_conn_recv_work Variable was created at: tipc_conn_recv_work+0x65/0x560 net/tipc/topsrv.c:419 process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145 In tipc_conn_rcv_from_sock(), it always supposes the length of message received from sock_recvmsg() is not smaller than the size of struct tipc_subscr. However, this assumption is false. Especially when the length of received message is shorter than struct tipc_subscr size, we will end up touching uninitialized fields in tipc_conn_rcv_sub(). Reported-by: syzbot+8951a3065ee7fd6d6e23@syzkaller.appspotmail.com Reported-by: syzbot+75e6e042c5bbf691fc82@syzkaller.appspotmail.com Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15sch_cake: Correctly update parent qlen when splitting GSO packetsToke Høiland-Jørgensen1-2/+3
To ensure parent qdiscs have the same notion of the number of enqueued packets even after splitting a GSO packet, update the qdisc tree with the number of packets that was added due to the split. Reported-by: Pete Heist <pete@heistp.net> Tested-by: Pete Heist <pete@heistp.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15sched: Fix detection of empty queues in child qdiscsToke Høiland-Jørgensen3-3/+9
Several qdiscs check on enqueue whether the packet was enqueued to a class with an empty queue, in which case the class is activated. This is done by checking if the qlen is exactly 1 after enqueue. However, if GSO splitting is enabled in the child qdisc, a single packet can result in a qlen longer than 1. This means the activation check fails, leading to a stalled queue. Fix this by checking if the queue is empty *before* enqueue, and running the activation logic if this was the case. Reported-by: Pete Heist <pete@heistp.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15sched: Avoid dereferencing skb pointer after child enqueueToke Høiland-Jørgensen8-16/+23
Parent qdiscs may dereference the pointer to the enqueued skb after enqueue. However, both CAKE and TBF call consume_skb() on the original skb when splitting GSO packets, leading to a potential use-after-free in the parent. Fix this by avoiding dereferencing the skb pointer after enqueueing to the child. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15ip6_gre: update version related info when changing linkHangbin Liu1-0/+4
We forgot to update ip6erspan version related info when changing link, which will cause setting new hwid failed. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 94d7d8f292870 ("ip6_gre: add erspan v2 support") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15net: phy: fix too strict check in phy_start_anegHeiner Kallweit1-12/+7
When adding checks to detect wrong usage of the phylib API we added a check to phy_start_aneg() which is too strict. If the phylib state machine is in state PHY_HALTED we should allow reconfiguring and restarting aneg, and just don't touch the state. Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking") Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15Revert "igb: reduce CPU0 latency when updating statistics"Jeff Kirsher3-10/+10
This reverts commit 59361316afcb08569af21e1af83e89c7051c055a. Due to problems found in additional testing, this causes an illegal context switch in the RCU read-side critical section. CC: Dave Jones <davej@codemonkey.org.uk> CC: Cong Wang <xiyou.wangcong@gmail.com> CC: Jan Jablonsky <jan.jablonsky@thalesgroup.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15selftests/txtimestamp: Fix an equals vs assign bugDan Carpenter1-1/+1
This should be == instead of =. Fixes: b52354aa068e ("selftests: expand txtimestamp with ipv6 dgram + raw and pf_packet") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15net: ipv4: Fix memory leak in network namespace dismantleIdo Schimmel3-6/+15
IPv4 routing tables are flushed in two cases: 1. In response to events in the netdev and inetaddr notification chains 2. When a network namespace is being dismantled In both cases only routes associated with a dead nexthop group are flushed. However, a nexthop group will only be marked as dead in case it is populated with actual nexthops using a nexthop device. This is not the case when the route in question is an error route (e.g., 'blackhole', 'unreachable'). Therefore, when a network namespace is being dismantled such routes are not flushed and leaked [1]. To reproduce: # ip netns add blue # ip -n blue route add unreachable 192.0.2.0/24 # ip netns del blue Fix this by not skipping error routes that are not marked with RTNH_F_DEAD when flushing the routing tables. To prevent the flushing of such routes in case #1, add a parameter to fib_table_flush() that indicates if the table is flushed as part of namespace dismantle or not. Note that this problem does not exist in IPv6 since error routes are associated with the loopback device. [1] unreferenced object 0xffff888066650338 (size 56): comm "ip", pid 1206, jiffies 4294786063 (age 26.235s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff ..........ba.... e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00 ...d............ backtrace: [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220 [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20 [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380 [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690 [<0000000014f62875>] netlink_sendmsg+0x929/0xe10 [<00000000bac9d967>] sock_sendmsg+0xc8/0x110 [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0 [<000000002e94f880>] __sys_sendmsg+0xf7/0x250 [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610 [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<000000003a8b605b>] 0xffffffffffffffff unreferenced object 0xffff888061621c88 (size 48): comm "ip", pid 1206, jiffies 4294786063 (age 26.235s) hex dump (first 32 bytes): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff kkkkkkkk..&_.... backtrace: [<00000000733609e3>] fib_table_insert+0x978/0x1500 [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220 [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20 [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380 [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690 [<0000000014f62875>] netlink_sendmsg+0x929/0xe10 [<00000000bac9d967>] sock_sendmsg+0xc8/0x110 [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0 [<000000002e94f880>] __sys_sendmsg+0xf7/0x250 [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610 [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<000000003a8b605b>] 0xffffffffffffffff Fixes: 8cced9eff1d4 ("[NETNS]: Enable routing configuration in non-initial namespace.") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15ip6_gre: fix tunnel list corruption for x-netnsOlivier Matz1-2/+2
In changelink ops, the ip6gre_net pointer is retrieved from dev_net(dev), which is wrong in case of x-netns. Thus, the tunnel is not unlinked from its current list and is relinked into another net namespace. This corrupts the tunnel lists and can later trigger a kernel oops. Fix this by retrieving the netns from device private area. Fixes: c8632fc30bb0 ("net: ip6_gre: Split up ip6gre_changelink()") Cc: Petr Machata <petrm@mellanox.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15tracing/kprobes: Fix NULL pointer dereference in trace_kprobe_create()Andrea Righi1-3/+9
It is possible to trigger a NULL pointer dereference by writing an incorrectly formatted string to krpobe_events (trying to create a kretprobe omitting the symbol). Example: echo "r:event_1 " >> /sys/kernel/debug/tracing/kprobe_events That triggers this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 6 PID: 1757 Comm: bash Not tainted 5.0.0-rc1+ #125 Hardware name: Dell Inc. XPS 13 9370/0F6P3V, BIOS 1.5.1 08/09/2018 RIP: 0010:kstrtoull+0x2/0x20 Code: 28 00 00 00 75 17 48 83 c4 18 5b 41 5c 5d c3 b8 ea ff ff ff eb e1 b8 de ff ff ff eb da e8 d6 36 bb ff 66 0f 1f 44 00 00 31 c0 <80> 3f 2b 55 48 89 e5 0f 94 c0 48 01 c7 e8 5c ff ff ff 5d c3 66 2e RSP: 0018:ffffb5d482e57cb8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff82b12720 RDX: ffffb5d482e57cf8 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffb5d482e57d70 R08: ffffa0c05e5a7080 R09: ffffa0c05e003980 R10: 0000000000000000 R11: 0000000040000000 R12: ffffa0c04fe87b08 R13: 0000000000000001 R14: 000000000000000b R15: ffffa0c058d749e1 FS: 00007f137c7f7740(0000) GS:ffffa0c05e580000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000497d46004 CR4: 00000000003606e0 Call Trace: ? trace_kprobe_create+0xb6/0x840 ? _cond_resched+0x19/0x40 ? _cond_resched+0x19/0x40 ? __kmalloc+0x62/0x210 ? argv_split+0x8f/0x140 ? trace_kprobe_create+0x840/0x840 ? trace_kprobe_create+0x840/0x840 create_or_delete_trace_kprobe+0x11/0x30 trace_run_command+0x50/0x90 trace_parse_run_command+0xc1/0x160 probes_write+0x10/0x20 __vfs_write+0x3a/0x1b0 ? apparmor_file_permission+0x1a/0x20 ? security_file_permission+0x31/0xf0 ? _cond_resched+0x19/0x40 vfs_write+0xb1/0x1a0 ksys_write+0x55/0xc0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x5a/0x120 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix by doing the proper argument checks in trace_kprobe_create(). Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/lkml/20190111095108.b79a2ee026185cbd62365977@kernel.org Link: http://lkml.kernel.org/r/20190111060113.GA22841@xps-13 Fixes: 6212dd29683e ("tracing/kprobes: Use dyn_event framework for kprobe events") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-15sbitmap: Protect swap_lock from hardirqMing Lei1-2/+3
Because we may call blk_mq_get_driver_tag() directly from blk_mq_dispatch_rq_list() without holding any lock, then HARDIRQ may come and the above DEADLOCK is triggered. Commit ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq") tries to fix this issue by using 'spin_lock_bh', which isn't enough because we complete request from hardirq context direclty in case of multiqueue. Cc: Clark Williams <williams@redhat.com> Fixes: ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq") Cc: Jens Axboe <axboe@kernel.dk> Cc: Ming Lei <ming.lei@redhat.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-15sbitmap: Protect swap_lock from softirqsSteven Rostedt (VMware)1-10/+2
The swap_lock used by sbitmap has a chain with locks taken from softirq, but the swap_lock is not protected from being preempted by softirqs. A chain exists of: sbq->ws[i].wait -> dispatch_wait_lock -> swap_lock Where the sbq->ws[i].wait lock can be taken from softirq context, which means all locks below it in the chain must also be protected from softirqs. Reported-by: Clark Williams <williams@redhat.com> Fixes: 58ab5e32e6fd ("sbitmap: silence bogus lockdep IRQ warning") Fixes: ea86ea2cdced ("sbitmap: amortize cost of clearing bits") Cc: Jens Axboe <axboe@kernel.dk> Cc: Ming Lei <ming.lei@redhat.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-14netfilter: nft_flow_offload: fix checking method of conntrack helperHenry Yen1-1/+4
This patch uses nfct_help() to detect whether an established connection needs conntrack helper instead of using test_bit(IPS_HELPER_BIT, &ct->status). The reason is that IPS_HELPER_BIT is only set when using explicit CT target. However, in the case that a device enables conntrack helper via command "echo 1 > /proc/sys/net/netfilter/nf_conntrack_helper", the status of IPS_HELPER_BIT will not present any change, and consequently it loses the checking ability in the context. Signed-off-by: Henry Yen <henry.yen@mediatek.com> Reviewed-by: Ryder Lee <ryder.lee@mediatek.com> Tested-by: John Crispin <john@phrozen.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-14Linux 5.0-rc2Linus Torvalds1-1/+1
2019-01-14kernel/sys.c: Clarify that UNAME26 does not generate unique versions anymoreJonathan Neuschäfer1-1/+2
UNAME26 is a mechanism to report Linux's version as 2.6.x, for compatibility with old/broken software. Due to the way it is implemented, it would have to be updated after 5.0, to keep the resulting versions unique. Linus Torvalds argued: "Do we actually need this? I'd rather let it bitrot, and just let it return random versions. It will just start again at 2.4.60, won't it? Anybody who uses UNAME26 for a 5.x kernel might as well think it's still 4.x. The user space is so old that it can't possibly care about differences between 4.x and 5.x, can it? The only thing that matters is that it shows "2.4.<largeenough>", which it will do regardless" Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-12phy: fix build breakage: add PHY_MODE_SATAJohn Hubbard2-2/+4
Commit 49e54187ae0b ("ata: libahci_platform: comply to PHY framework") uses the PHY_MODE_SATA, but that enum had not yet been added. This caused a build failure for me, with today's linux.git. Also, there is a potentially conflicting (mis-named) PHY_MODE_SATA, hiding in the Marvell Berlin SATA PHY driver. Fix the build by: 1) Renaming Marvell's defined value to a more scoped name, in order to avoid any potential conflicts: PHY_BERLIN_MODE_SATA. 2) Adding the missing enum, which was going to be added anyway as part of [1]. [1] https://lkml.kernel.org/r/20190108163124.6409-3-miquel.raynal@bootlin.com Fixes: 49e54187ae0b ("ata: libahci_platform: comply to PHY framework") Signed-off-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Olof Johansson <olof@lixom.net> Cc: Grzegorz Jaszczyk <jaz@semihalf.com> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-12bnxt_en: Fix context memory allocation.Michael Chan1-3/+6
When allocating memory pages for context memory, if the last page table should be fully populated, the current code will set nr_pages to 0 when calling bnxt_alloc_ctx_mem_blk(). This will cause the last page table to be completely blank and causing some RDMA failures. Fix it by setting the last page table's nr_pages to the remainder only if it is non-zero. Fixes: 08fe9d181606 ("bnxt_en: Add Level 2 context memory paging support.") Reported-by: Eric Davis <eric.davis@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-12bnxt_en: Fix ring checking logic on 57500 chips.Michael Chan2-3/+5
In bnxt_hwrm_check_pf_rings(), add the proper flag to test the NQ resources. Without the proper flag, the firmware will change the NQ resource allocation and remap the IRQ, causing missing IRQs. This issue shows up when adding MQPRIO TX queues, for example. Fixes: 36d65be9a880 ("bnxt_en: Disable MSIX before re-reserving NQs/CMPL rings.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11mISDN: hfcsusb: Use struct_size() in kzalloc()Gustavo A. R. Silva1-2/+1
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: clear skb->tstamp in bridge forwarding pathPaolo Abeni1-0/+1
Matteo reported forwarding issues inside the linux bridge, if the enslaved interfaces use the fq qdisc. Similar to commit 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths"), we need to clear the tstamp field in the bridge forwarding path. Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Reported-and-tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: bpfilter: disallow to remove bpfilter module while being usedTaehee Yoo3-23/+29
The bpfilter.ko module can be removed while functions of the bpfilter.ko are executing. so panic can occurred. in order to protect that, locks can be used. a bpfilter_lock protects routines in the __bpfilter_process_sockopt() but it's not enough because __exit routine can be executed concurrently. Now, the bpfilter_umh can not run in parallel. So, the module do not removed while it's being used and it do not double-create UMH process. The members of the umh_info and the bpfilter_umh_ops are protected by the bpfilter_umh_ops.lock. test commands: while : do iptables -I FORWARD -m string --string ap --algo kmp & modprobe -rv bpfilter & done splat looks like: [ 298.623435] BUG: unable to handle kernel paging request at fffffbfff807440b [ 298.628512] #PF error: [normal kernel read fault] [ 298.633018] PGD 124327067 P4D 124327067 PUD 11c1a3067 PMD 119eb2067 PTE 0 [ 298.638859] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 298.638859] CPU: 0 PID: 2997 Comm: iptables Not tainted 4.20.0+ #154 [ 298.638859] RIP: 0010:__mutex_lock+0x6b9/0x16a0 [ 298.638859] Code: c0 00 00 e8 89 82 ff ff 80 bd 8f fc ff ff 00 0f 85 d9 05 00 00 48 8b 85 80 fc ff ff 48 bf 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 38 00 0f 85 1d 0e 00 00 48 8b 85 c8 fc ff ff 49 39 47 58 c6 [ 298.638859] RSP: 0018:ffff88810e7777a0 EFLAGS: 00010202 [ 298.638859] RAX: 1ffffffff807440b RBX: ffff888111bd4d80 RCX: 0000000000000000 [ 298.638859] RDX: 1ffff110235ff806 RSI: ffff888111bd5538 RDI: dffffc0000000000 [ 298.638859] RBP: ffff88810e777b30 R08: 0000000080000002 R09: 0000000000000000 [ 298.638859] R10: 0000000000000000 R11: 0000000000000000 R12: fffffbfff168a42c [ 298.638859] R13: ffff888111bd4d80 R14: ffff8881040e9a05 R15: ffffffffc03a2000 [ 298.638859] FS: 00007f39e3758700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 [ 298.638859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 298.638859] CR2: fffffbfff807440b CR3: 000000011243e000 CR4: 00000000001006f0 [ 298.638859] Call Trace: [ 298.638859] ? mutex_lock_io_nested+0x1560/0x1560 [ 298.638859] ? kasan_kmalloc+0xa0/0xd0 [ 298.638859] ? kmem_cache_alloc+0x1c2/0x260 [ 298.638859] ? __alloc_file+0x92/0x3c0 [ 298.638859] ? alloc_empty_file+0x43/0x120 [ 298.638859] ? alloc_file_pseudo+0x220/0x330 [ 298.638859] ? sock_alloc_file+0x39/0x160 [ 298.638859] ? __sys_socket+0x113/0x1d0 [ 298.638859] ? __x64_sys_socket+0x6f/0xb0 [ 298.638859] ? do_syscall_64+0x138/0x560 [ 298.638859] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 298.638859] ? __alloc_file+0x92/0x3c0 [ 298.638859] ? init_object+0x6b/0x80 [ 298.638859] ? cyc2ns_read_end+0x10/0x10 [ 298.638859] ? cyc2ns_read_end+0x10/0x10 [ 298.638859] ? hlock_class+0x140/0x140 [ 298.638859] ? sched_clock_local+0xd4/0x140 [ 298.638859] ? sched_clock_local+0xd4/0x140 [ 298.638859] ? check_flags.part.37+0x440/0x440 [ 298.638859] ? __lock_acquire+0x4f90/0x4f90 [ 298.638859] ? set_rq_offline.part.89+0x140/0x140 [ ... ] Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: bpfilter: restart bpfilter_umh when error occurredTaehee Yoo4-12/+40
The bpfilter_umh will be stopped via __stop_umh() when the bpfilter error occurred. The bpfilter_umh() couldn't start again because there is no restart routine. The section of the bpfilter_umh_{start/end} is no longer .init.rodata because these area should be reused in the restart routine. hence the section name is changed to .bpfilter_umh. The bpfilter_ops->start() is restart callback. it will be called when bpfilter_umh is stopped. The stop bit means bpfilter_umh is stopped. this bit is set by both start and stop routine. Before this patch, Test commands: $ iptables -vnL $ kill -9 <pid of bpfilter_umh> $ iptables -vnL [ 480.045136] bpfilter: write fail -32 $ iptables -vnL All iptables commands will fail. After this patch, Test commands: $ iptables -vnL $ kill -9 <pid of bpfilter_umh> $ iptables -vnL $ iptables -vnL Now, all iptables commands will work. Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: bpfilter: use cleanup callback to release umh_infoTaehee Yoo3-23/+44
Now, UMH process is killed, do_exit() calls the umh_info->cleanup callback to release members of the umh_info. This patch makes bpfilter_umh's cleanup routine to use the umh_info->cleanup callback. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11umh: add exit routine for UMH processTaehee Yoo4-2/+43
A UMH process which is created by the fork_usermode_blob() such as bpfilter needs to release members of the umh_info when process is terminated. But the do_exit() does not release members of the umh_info. hence module which uses UMH needs own code to detect whether UMH process is terminated or not. But this implementation needs extra code for checking the status of UMH process. it eventually makes the code more complex. The new PF_UMH flag is added and it is used to identify UMH processes. The exit_umh() does not release members of the umh_info. Hence umh_info->cleanup callback should release both members of the umh_info and the private data. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11isdn: i4l: isdn_tty: Fix some concurrency double-free bugsJia-Ju Bai1-1/+5
The functions isdn_tty_tiocmset() and isdn_tty_set_termios() may be concurrently executed. isdn_tty_tiocmset isdn_tty_modem_hup line 719: kfree(info->dtmf_state); line 721: kfree(info->silence_state); line 723: kfree(info->adpcms); line 725: kfree(info->adpcmr); isdn_tty_set_termios isdn_tty_modem_hup line 719: kfree(info->dtmf_state); line 721: kfree(info->silence_state); line 723: kfree(info->adpcms); line 725: kfree(info->adpcmr); Thus, some concurrency double-free bugs may occur. These possible bugs are found by a static tool written by myself and my manual code review. To fix these possible bugs, the mutex lock "modem_info_mutex" used in isdn_tty_tiocmset() is added in isdn_tty_set_termios(). Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11vhost/vsock: fix vhost vsock cid hashing inconsistentZha Bin1-1/+1
The vsock core only supports 32bit CID, but the Virtio-vsock spec define CID (dst_cid and src_cid) as u64 and the upper 32bits is reserved as zero. This inconsistency causes one bug in vhost vsock driver. The scenarios is: 0. A hash table (vhost_vsock_hash) is used to map an CID to a vsock object. And hash_min() is used to compute the hash key. hash_min() is defined as: (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)). That means the hash algorithm has dependency on the size of macro argument 'val'. 0. In function vhost_vsock_set_cid(), a 64bit CID is passed to hash_min() to compute the hash key when inserting a vsock object into the hash table. 0. In function vhost_vsock_get(), a 32bit CID is passed to hash_min() to compute the hash key when looking up a vsock for an CID. Because the different size of the CID, hash_min() returns different hash key, thus fails to look up the vsock object for an CID. To fix this bug, we keep CID as u64 in the IOCTLs and virtio message headers, but explicitly convert u64 to u32 when deal with the hash table and vsock core. Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers") Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.tex Signed-off-by: Zha Bin <zhabin@linux.alibaba.com> Reviewed-by: Liu Jiang <gerry@linux.alibaba.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: stmmac: Prevent RX starvation in stmmac_napi_poll()Jose Abreu1-13/+14
Currently, TX is given a budget which is consumed by stmmac_tx_clean() and stmmac_rx() is given the remaining non-consumed budget. This is wrong and in case we are sending a large number of packets this can starve RX because remaining budget will be low. Let's give always the same budget for RX and TX clean. While at it, check if we missed any interrupts while we were in NAPI callback by looking at DMA interrupt status. Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: stmmac: Fix the logic of checking if RX Watchdog must be enabledJose Abreu1-12/+12
RX Watchdog can be disabled by platform definitions but currently we are initializing the descriptors before checking if Watchdog must be disabled or not. Fix this by checking earlier if user wants Watchdog disabled or not. Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: stmmac: Check if CBS is supported before configuringJose Abreu1-0/+2
Check if CBS is currently supported before trying to configure it in HW. Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: stmmac: dwxgmac2: Only clear interrupts that are activeJose Abreu1-3/+3
In DMA interrupt handler we were clearing all interrupts status, even the ones that were not active. Fix this and only clear the active interrupts. Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: stmmac: Fix PCI module removal leakJose Abreu1-0/+10
Since commit b7d0f08e9129, the enable / disable of PCI device is not managed which will result in IO regions not being automatically unmapped. As regions continue mapped it is currently not possible to remove and then probe again the PCI module of stmmac. Fix this by manually unmapping regions on remove callback. Changes from v1: - Fix build error Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Fixes: b7d0f08e9129 ("net: stmmac: Fix WoL for PCI-based setups") Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11ata: ahci: mvebu: request PHY suspend/resume for Armada 3700Miquel Raynal1-0/+3
A feature has been added in the libahci driver: the possibility to set a new flag in hpriv->flags to let the core handle PHY suspend/resume automatically. Make use of this feature to make suspend to RAM work with SATA drives on A3700. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-11ata: ahci: mvebu: add Armada 3700 initialization needed for S2RAMMiquel Raynal1-9/+18
A3700 comphy initialization is done in the firmware (TF-A). Looking at the SATA PHY initialization routine, there is a comment about "vendor specific" registers. Two registers are mentioned. They are not initialized there in the firmware because they are AHCI related, while the firmware at this location does only PHY configuration. The solution to avoid doing such initialization is relying on U-Boot. While this work at boot time, U-Boot is definitely not going to run during a resume after suspending to RAM. Two possible solutions were considered: * Fixing the firmware. * Fixing the kernel driver. The first solution would take ages to propagate, while the second solution is easy to implement as the driver as been a little bit reworked to prepare for such platform configuration. Hence, this patch adds an Armada 3700 configuration function to set these two registers both at boot time (in the probe) and after a suspend (in the resume path). Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-11ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCsMiquel Raynal1-17/+51
At the beginning, only Armada 38x SoCs where supported by the ahci_mvebu.c driver. Commit 15d3ce7b63bd ("ata: ahci_mvebu: add support for Armada 3700 variant") introduced Armada 3700 support. As opposed to Armada 38x SoCs, the 3700 variants do not have to configure mbus and the regret option. This patch took care of avoiding such configuration when not needed in the probe function, but failed to do the same in the resume path. While doing so looks harmless by experience, let's clean the driver logic and avoid doing this useless configuration with Armada 3700 SoCs. Because the logic is very similar between these two places, it has been decided to factorize this code and put it in a "Armada 38x configuration function". This function is part of a new (per-compatible) platform data structure, so that the addition of such configuration function for Armada 3700 will be eased. Fixes: 15d3ce7b63bd ("ata: ahci_mvebu: add support for Armada 3700 variant") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-11ata: ahci: mvebu: remove stale commentMiquel Raynal1-5/+0
For Armada-38x (32-bit) SoCs, PM platform support has been added since: commit 32f9494c9dfd ("ARM: mvebu: prepare pm-board.c for the introduction of Armada 38x support") commit 3cbd6a6ca81c ("ARM: mvebu: Add standby support") For Armada 64-bit SoCs, like the A3700 also using this AHCI driver, PM platform support has always existed. There are even suspend/resume hooks in this driver since: commit d6ecf15814888 ("ata: ahci_mvebu: add suspend/resume support") Remove the stale comment at the end of this driver stating that all the above does not exist yet. Fixes: d6ecf15814888 ("ata: ahci_mvebu: add suspend/resume support") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-11ata: libahci_platform: comply to PHY frameworkMiquel Raynal2-0/+15
Current implementation of the libahci does not take into account the new PHY framework. Correct the situation by adding a call to phy_set_mode() before phy_power_on(). PHYs should also be handled at suspend/resume time. For this, call ahci_platform_enable/disable_phys() at suspend/resume_host() time. These calls are guarded by a HFLAG (AHCI_HFLAG_SUSPEND_PHYS) that the user of the libahci driver must set manually in hpriv->flags at probe time. This is to avoid breaking users that have not been tested with this change. Reviewed-by: Hans de Goede <hdegoede@redhat.com> Suggested-by: Grzegorz Jaszczyk <jaz@semihalf.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-11x86/kvm/nVMX: don't skip emulated instruction twice when vmptr address is not backedVitaly Kuznetsov1-2/+1
Since commit 09abb5e3e5e50 ("KVM: nVMX: call kvm_skip_emulated_instruction in nested_vmx_{fail,succeed}") nested_vmx_failValid() results in kvm_skip_emulated_instruction() so doing it again in handle_vmptrld() when vmptr address is not backed is wrong, we end up advancing RIP twice. Fixes: fca91f6d60b6e ("kvm: nVMX: Set VM instruction error for VMPTRLD of unbacked page") Reported-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11Documentation/virtual/kvm: Update URL for AMD SEV API specificationChristophe de Dinechin1-1/+1
The URL of [api-spec] in Documentation/virtual/kvm/amd-memory-encryption.rst is no longer valid, replaced space with underscore. Signed-off-by: Christophe de Dinechin <dinechin@redhat.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11KVM/VMX: Avoid return error when flush tlb successfully in the hv_remote_flush_tlb_with_range()Lan Tianyu1-1/+1
The "ret" is initialized to be ENOTSUPP. The return value of __hv_remote_flush_tlb_with_range() will be Or with "ret" when ept table potiners are mismatched. This will cause return ENOTSUPP even if flush tlb successfully. This patch is to fix the issue and set "ret" to 0. Fixes: a5c214dad198 ("KVM/VMX: Change hv flush logic when ept tables are mismatched.") Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11kvm: sev: Fail KVM_SEV_INIT if already initializedDavid Rientjes1-0/+3
By code inspection, it was found that multiple calls to KVM_SEV_INIT could deplete asid bits and overwrite kvm_sev_info's regions_list. Multiple calls to KVM_SVM_INIT is not likely to occur with QEMU, but this should likely be fixed anyway. This code is serialized by kvm->lock. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Reported-by: Cfir Cohen <cfir@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11KVM: validate userspace input in kvm_clear_dirty_log_protect()Tomas Bortoli1-2/+7
The function at issue does not fully validate the content of the structure pointed by the log parameter, though its content has just been copied from userspace and lacks validation. Fix that. Moreover, change the type of n to unsigned long as that is the type returned by kvm_dirty_bitmap_bytes(). Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+028366e52c9ace67deb3@syzkaller.appspotmail.com [Squashed the fix from Paolo. - Radim.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11KVM: x86: Fix bit shifting in update_intel_pt_cfgGustavo A. R. Silva1-1/+1
ctl_bitmask in pt_desc is of type u64. When an integer like 0xf is being left shifted more than 32 bits, the behavior is undefined. Fix this by adding suffix ULL to integer 0xf. Addresses-Coverity-ID: 1476095 ("Bad bit shift operation") Fixes: 6c0f0bba85a0 ("KVM: x86: Introduce a function to initialize the PT configuration") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11tty: Don't hold ldisc lock in tty_reopen() if ldisc presentDmitry Safonov1-7/+13
Try to get reference for ldisc during tty_reopen(). If ldisc present, we don't need to do tty_ldisc_reinit() and lock the write side for line discipline semaphore. Effectively, it optimizes fast-path for tty_reopen(), but more importantly it won't interrupt ongoing IO on the tty as no ldisc change is needed. Fixes user-visible issue when tty_reopen() interrupted login process for user with a long password, observed and reported by Lukas. Fixes: c96cf923a98d ("tty: Don't block on IO when ldisc change is pending") Fixes: 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()") Cc: Jiri Slaby <jslaby@suse.com> Reported-by: Lukas F. Hartmann <lukas@mntmn.com> Tested-by: Lukas F. Hartmann <lukas@mntmn.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-11mmc: core: don't override the CD GPIO level when "cd-inverted" is setMartin Blumenstingl1-1/+1
Since commit 89a5e15bcba87d ("gpio/mmc/of: Respect polarity in the device tree") gpiolib-of parses the "cd-gpios" property and flips the polarity if "cd-inverted" is also set. This results in the "cd-inverted" property being evaluated twice, which effectively makes it a no-op: - first in drivers/gpio/gpiolib-of.c (of_xlate_and_get_gpiod_flags) when setting up the CD GPIO - then again in drivers/mmc/core/slot-gpio.c (mmc_gpio_get_cd) when reading the CD GPIO value at runtime On boards which are using device-tree with the "cd-inverted" property being set any inserted card are not detected anymore. This is due to the MMC core treating the CD GPIO with the wrong polarity. Disable "override_cd_active_level" for the card detection GPIO which is parsed using mmc_of_parse. This fixes SD card detection on the boards which are currently using the "cd-inverted" device-tree property (tested on Meson8b Odroid-C1 and Meson8b EC-100). This does not remove the CD GPIO inversion logic from the MMC core because there's at least one driver (sdhci-pci-core for Intel BayTrail based boards) which still passes "override_cd_active_level = true" to mmc_gpiod_request_cd(). Due to lack of hardware for testing this is left untouched. In the future the GPIO inversion logic for both, card and read-only detection can be removed once no driver is using it anymore. Fixes: 89a5e15bcba87d ("gpio/mmc/of: Respect polarity in the device tree") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Tested-by: Anand Moon <linux.amoon@gmail.com> Tested-by: Loys Ollivier <loys.ollivier@gmail.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-01-11cifs: update internal module version numberSteve French1-1/+1
To 2.16 Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11CIFS: Fix error paths in writeback codePavel Shilovsky4-9/+56
This patch aims to address writeback code problems related to error paths. In particular it respects EINTR and related error codes and stores and returns the first error occurred during writeback. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11CIFS: Move credit processing to mid callbacks for SMB3Pavel Shilovsky1-17/+34
Currently we account for credits in the thread initiating a request and waiting for a response. The demultiplex thread receives the response, wakes up the thread and the latter collects credits from the response buffer and add them to the server structure on the client. This approach is not accurate, because it may race with reconnect events in the demultiplex thread which resets the number of credits. Fix this by moving credit processing to new mid callbacks that collect credits granted by the server from the response in the demultiplex thread. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11CIFS: Fix credits calculation for cancelled requestsPavel Shilovsky2-2/+27
If a request is cancelled, we can't assume that the server returns 1 credit back. Instead we need to wait for a response and process the number of credits granted by the server. Create a separate mid callback for cancelled request, parse the number of credits in a response buffer and add them to the client's credits. If the didn't get a response (no response buffer available) assume 0 credits granted. The latter most probably happens together with session reconnect, so the client's credits are adjusted anyway. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11cifs: Fix potential OOB access of lock element arrayRoss Lagerwall2-6/+6
If maxBuf is small but non-zero, it could result in a zero sized lock element array which we would then try and access OOB. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>