aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/mm (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-05-07ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()Oleg Nesterov1-8/+26
Commit cc731525f26a ("signal: Remove kernel interal si_code magic") changed the value of SI_FROMUSER(SI_MESGQ), this means that mq_notify() no longer works if the sender doesn't have rights to send a signal. Change __do_notify() to use do_send_sig_info() instead of kill_pid_info() to avoid check_kill_permission(). This needs the additional notify.sigev_signo != 0 check, shouldn't we change do_mq_notify() to deny sigev_signo == 0 ? Test-case: #include <signal.h> #include <mqueue.h> #include <unistd.h> #include <sys/wait.h> #include <assert.h> static int notified; static void sigh(int sig) { notified = 1; } int main(void) { signal(SIGIO, sigh); int fd = mq_open("/mq", O_RDWR|O_CREAT, 0666, NULL); assert(fd >= 0); struct sigevent se = { .sigev_notify = SIGEV_SIGNAL, .sigev_signo = SIGIO, }; assert(mq_notify(fd, &se) == 0); if (!fork()) { assert(setuid(1) == 0); mq_send(fd, "",1,0); return 0; } wait(NULL); mq_unlink("/mq"); assert(notified); return 0; } [manfred@colorfullife.com: 1) Add self_exec_id evaluation so that the implementation matches do_notify_parent 2) use PIDTYPE_TGID everywhere] Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic") Reported-by: Yoji <yoji.fujihar.min@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Markus Elfring <elfring@users.sourceforge.net> Cc: <1vier1@web.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/e2a782e4-eab9-4f5c-c749-c07a8f7a4e66@colorfullife.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07io_uring: don't use 'fd' for openat/openat2/statxJens Axboe1-25/+7
We currently make some guesses as when to open this fd, but in reality we have no business (or need) to do so at all. In fact, it makes certain things fail, like O_PATH. Remove the fd lookup from these opcodes, we're just passing the 'fd' to generic helpers anyway. With that, we can also remove the special casing of fd values in io_req_needs_file(), and the 'fd_non_neg' check that we have. And we can ensure that we only read sqe->fd once. This fixes O_PATH usage with openat/openat2, and ditto statx path side oddities. Cc: stable@vger.kernel.org: # v5.6 Reported-by: Max Kellermann <mk@cm4all.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-07tools/bootconfig: Fix resource leak in apply_xbc()Yunfeng Ye1-3/+6
Fix the @data and @fd allocations that are leaked in the error path of apply_xbc(). Link: http://lkml.kernel.org/r/583a49c9-c27a-931d-e6c2-6f63a4b18bea@huawei.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07tracing: Make tracing_snapshot_instance_cond() staticZou Wei1-1/+2
Fix the following sparse warning: kernel/trace/trace.c:950:6: warning: symbol 'tracing_snapshot_instance_cond' was not declared. Should it be static? Link: http://lkml.kernel.org/r/1587614905-48692-1-git-send-email-zou_wei@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07tracing: Fix doc mistakes in trace sampleWei Yang1-1/+1
As the example below shows, DECLARE_EVENT_CLASS() is used instead of DEFINE_EVENT_CLASS(). Link: http://lkml.kernel.org/r/20200428214959.11259-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07gpu/trace: Minor comment updates for gpu_mem_total tracepointYiwei Zhang1-1/+1
This change updates the improper comment for the 'size' attribute in the tracepoint definition. Most gfx drivers pre-fault in physical pages instead of making virtual allocations. So we drop the 'Virtual' keyword here and leave this to the implementations. Link: http://lkml.kernel.org/r/20200428220825.169606-1-zzyiwei@google.com Signed-off-by: Yiwei Zhang <zzyiwei@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07tracing: Add a vmalloc_sync_mappings() for safe measureSteven Rostedt (VMware)1-0/+13
x86_64 lazily maps in the vmalloc pages, and the way this works with per_cpu areas can be complex, to say the least. Mappings may happen at boot up, and if nothing synchronizes the page tables, those page mappings may not be synced till they are used. This causes issues for anything that might touch one of those mappings in the path of the page fault handler. When one of those unmapped mappings is touched in the page fault handler, it will cause another page fault, which in turn will cause a page fault, and leave us in a loop of page faults. Commit 763802b53a42 ("x86/mm: split vmalloc_sync_all()") split vmalloc_sync_all() into vmalloc_sync_unmappings() and vmalloc_sync_mappings(), as on system exit, it did not need to do a full sync on x86_64 (although it still needed to be done on x86_32). By chance, the vmalloc_sync_all() would synchronize the page mappings done at boot up and prevent the per cpu area from being a problem for tracing in the page fault handler. But when that synchronization in the exit of a task became a nop, it caused the problem to appear. Link: https://lore.kernel.org/r/20200429054857.66e8e333@oasis.local.home Cc: stable@vger.kernel.org Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Reported-by: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> Suggested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07tracing: Wait for preempt irq delay thread to finishSteven Rostedt (VMware)1-6/+24
Running on a slower machine, it is possible that the preempt delay kernel thread may still be executing if the module was immediately removed after added, and this can cause the kernel to crash as the kernel thread might be executing after its code has been removed. There's no reason that the caller of the code shouldn't just wait for the delay thread to finish, as the thread can also be created by a trigger in the sysfs code, which also has the same issues. Link: http://lore.kernel.org/r/5EA2B0C8.2080706@cn.fujitsu.com Cc: stable@vger.kernel.org Fixes: 793937236d1ee ("lib: Add module for testing preemptoff/irqsoff latency tracers") Reported-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Reviewed-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Reviewed-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07splice: move f_mode checks to do_{splice,tee}()Pavel Begunkov1-27/+18
do_splice() is used by io_uring, as will be do_tee(). Move f_mode checks from sys_{splice,tee}() to do_{splice,tee}(), so they're enforced for io_uring as well. Fixes: 7d67af2c0134 ("io_uring: add splice(2) support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-07objtool: Fix infinite loop in find_jump_table()Josh Poimboeuf1-2/+13
Kristen found a hang in objtool when building with -ffunction-sections. It was caused by evergreen_pcie_gen2_enable.cold() being laid out immediately before evergreen_pcie_gen2_enable(). Since their "pfunc" is always the same, find_jump_table() got into an infinite loop because it didn't recognize the boundary between the two functions. Fix that with a new prev_insn_same_sym() helper, which doesn't cross subfunction boundaries. Reported-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/378b51c9d9c894dc3294bc460b4b0869e950b7c5.1588110291.git.jpoimboe@redhat.com
2020-05-07bdi: move bdi_dev_name out of lineChristoph Hellwig2-9/+10
bdi_dev_name is not a fast path function, move it out of line. This prepares for using it from modular callers without having to export an implementation detail like bdi_unknown_name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-07vboxsf: don't use the source name in the bdi nameChristoph Hellwig1-1/+1
Simplify the bdi name to mirror what we are doing elsewhere, and drop them name in favor of just using a number. This avoids a potentially very long bdi name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-07arm64: hugetlb: avoid potential NULL dereferenceMark Rutland1-0/+2
The static analyzer in GCC 10 spotted that in huge_pte_alloc() we may pass a NULL pmdp into pte_alloc_map() when pmd_alloc() returns NULL: | CC arch/arm64/mm/pageattr.o | CC arch/arm64/mm/hugetlbpage.o | from arch/arm64/mm/hugetlbpage.c:10: | arch/arm64/mm/hugetlbpage.c: In function ‘huge_pte_alloc’: | ./arch/arm64/include/asm/pgtable-types.h:28:24: warning: dereference of NULL ‘pmdp’ [CWE-690] [-Wanalyzer-null-dereference] | ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’ | arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’ | |arch/arm64/mm/hugetlbpage.c:232:10: | |./arch/arm64/include/asm/pgtable-types.h:28:24: | ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’ | arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’ This can only occur when the kernel cannot allocate a page, and so is unlikely to happen in practice before other systems start failing. We can avoid this by bailing out if pmd_alloc() fails, as we do earlier in the function if pud_alloc() fails. Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Kyrill Tkachov <kyrylo.tkachov@arm.com> Cc: <stable@vger.kernel.org> # 4.5.x- Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-05-07usb: chipidea: msm: Ensure proper controller reset using role switch APIBryan O'Donoghue1-1/+1
Currently we check to make sure there is no error state on the extcon handle for VBUS when writing to the HS_PHY_GENCONFIG_2 register. When using the USB role-switch API we still need to write to this register absent an extcon handle. This patch makes the appropriate update to ensure the write happens if role-switching is true. Fixes: 05559f10ed79 ("usb: chipidea: add role switch class support") Cc: stable <stable@vger.kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: linux-usb@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Peter Chen <peter.chen@nxp.com> Link: https://lore.kernel.org/r/20200507004918.25975-2-peter.chen@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-06net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CAREPablo Neira Ayuso3-4/+22
This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver that the frontend does not need counters, this hw stats type request never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests the driver to disable the stats, however, if the driver cannot disable counters, it bails out. TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_* except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED (this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*. Fixes: 319a1d19471e ("flow_offload: check for basic action hw stats type") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper orderLukas Bulwahn1-1/+1
Commit 9b038086f06b ("docs: networking: convert DIM to RST") added a new file entry to DYNAMIC INTERRUPT MODERATION to the end, and not following alphabetical order. So, ./scripts/checkpatch.pl -f MAINTAINERS complains: WARNING: Misordered MAINTAINERS entry - list file patterns in alphabetic order #5966: FILE: MAINTAINERS:5966: +F: lib/dim/ +F: Documentation/networking/net_dim.rst Reorder the file entries to keep MAINTAINERS nicely ordered. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06wireguard: send/receive: use explicit unlikely branch instead of implicit coalescingJason A. Donenfeld2-16/+12
It's very unlikely that send will become true. It's nearly always false between 0 and 120 seconds of a session, and in most cases becomes true only between 120 and 121 seconds before becoming false again. So, unlikely(send) is clearly the right option here. What happened before was that we had this complex boolean expression with multiple likely and unlikely clauses nested. Since this is evaluated left-to-right anyway, the whole thing got converted to unlikely. So, we can clean this up to better represent what's going on. The generated code is the same. Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06wireguard: selftests: initalize ipv6 members to NULL to squelch clang warningJason A. Donenfeld1-2/+2
Without setting these to NULL, clang complains in certain configurations that have CONFIG_IPV6=n: In file included from drivers/net/wireguard/ratelimiter.c:223: drivers/net/wireguard/selftest/ratelimiter.c:173:34: error: variable 'skb6' is uninitialized when used here [-Werror,-Wuninitialized] ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count); ^~~~ drivers/net/wireguard/selftest/ratelimiter.c:123:29: note: initialize the variable 'skb6' to silence this warning struct sk_buff *skb4, *skb6; ^ = NULL drivers/net/wireguard/selftest/ratelimiter.c:173:40: error: variable 'hdr6' is uninitialized when used here [-Werror,-Wuninitialized] ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count); ^~~~ drivers/net/wireguard/selftest/ratelimiter.c:125:22: note: initialize the variable 'hdr6' to silence this warning struct ipv6hdr *hdr6; ^ We silence this warning by setting the variables to NULL as the warning suggests. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06wireguard: send/receive: cond_resched() when processing worker ringbuffersJason A. Donenfeld2-0/+6
Users with pathological hardware reported CPU stalls on CONFIG_ PREEMPT_VOLUNTARY=y, because the ringbuffers would stay full, meaning these workers would never terminate. That turned out not to be okay on systems without forced preemption, which Sultan observed. This commit adds a cond_resched() to the bottom of each loop iteration, so that these workers don't hog the core. Note that we don't need this on the napi poll worker, since that terminates after its budget is expended. Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com> Reported-by: Wang Jian <larkwang@gmail.com> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06wireguard: socket: remove errant restriction on looping to selfJason A. Donenfeld2-15/+51
It's already possible to create two different interfaces and loop packets between them. This has always been possible with tunnels in the kernel, and isn't specific to wireguard. Therefore, the networking stack already needs to deal with that. At the very least, the packet winds up exceeding the MTU and is discarded at that point. So, since this is already something that happens, there's no need to forbid the not very exceptional case of routing a packet back to the same interface; this loop is no different than others, and we shouldn't special case it, but rather rely on generic handling of loops in general. This also makes it easier to do interesting things with wireguard such as onion routing. At the same time, we add a selftest for this, ensuring that both onion routing works and infinite routing loops do not crash the kernel. We also add a test case for wireguard interfaces nesting packets and sending traffic between each other, as well as the loop in this case too. We make sure to send some throughput-heavy traffic for this use case, to stress out any possible recursion issues with the locks around workqueues. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06wireguard: selftests: use normal kernel stack size on ppc64Jason A. Donenfeld1-0/+1
While at some point it might have made sense to be running these tests on ppc64 with 4k stacks, the kernel hasn't actually used 4k stacks on 64-bit powerpc in a long time, and more interesting things that we test don't really work when we deviate from the default (16k). So, we stop pushing our luck in this commit, and return to the default instead of the minimum. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: ethernet: ti: am65-cpsw-nuss: fix irqs typeGrygorii Strashko1-2/+3
The K3 INTA driver, which is source TX/RX IRQs for CPSW NUSS, defines IRQs triggering type as EDGE by default, but triggering type for CPSW NUSS TX/RX IRQs has to be LEVEL as the EDGE triggering type may cause unnecessary IRQs triggering and NAPI scheduling for empty queues. It was discovered with RT-kernel. Fix it by explicitly specifying CPSW NUSS TX/RX IRQ type as IRQF_TRIGGER_HIGH. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06ionic: Use debugfs_create_bool() to export boolGeert Uytterhoeven1-2/+1
Currently bool ionic_cq.done_color is exported using debugfs_create_u8(), which requires a cast, preventing further compiler checks. Fix this by switching to debugfs_create_bool(), and dropping the cast. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: dsa: Do not leave DSA master with NULL netdev_opsFlorian Fainelli1-1/+2
When ndo_get_phys_port_name() for the CPU port was added we introduced an early check for when the DSA master network device in dsa_master_ndo_setup() already implements ndo_get_phys_port_name(). When we perform the teardown operation in dsa_master_ndo_teardown() we would not be checking that cpu_dp->orig_ndo_ops was successfully allocated and non-NULL initialized. With network device drivers such as virtio_net, this leads to a NPD as soon as the DSA switch hanging off of it gets torn down because we are now assigning the virtio_net device's netdev_ops a NULL pointer. Fixes: da7b9e9b00d4 ("net: dsa: Add ndo_get_phys_port_name() for CPU port") Reported-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirredVladimir Oltean1-5/+3
This was caused by a poor merge conflict resolution on my side. The "act = &cls->rule->action.entries[0];" assignment was already present in the code prior to the patch mentioned below. Fixes: e13c2075280e ("net: dsa: refactor matchall mirred action to separate function") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: stricter validation of untrusted gso packetsWillem de Bruijn1-2/+24
Syzkaller again found a path to a kernel crash through bad gso input: a packet with transport header extending beyond skb_headlen(skb). Tighten validation at kernel entry: - Verify that the transport header lies within the linear section. To avoid pulling linux/tcp.h, verify just sizeof tcphdr. tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use. - Match the gso_type against the ip_proto found by the flow dissector. Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06seg6: fix SRH processing to comply with RFC8754Ahmed Abdelsalam1-2/+8
The Segment Routing Header (SRH) which defines the SRv6 dataplane is defined in RFC8754. RFC8754 (section 4.1) defines the SR source node behavior which encapsulates packets into an outer IPv6 header and SRH. The SR source node encodes the full list of Segments that defines the packet path in the SRH. Then, the first segment from list of Segments is copied into the Destination address of the outer IPv6 header and the packet is sent to the first hop in its path towards the destination. If the Segment list has only one segment, the SR source node can omit the SRH as he only segment is added in the destination address. RFC8754 (section 4.1.1) defines the Reduced SRH, when a source does not require the entire SID list to be preserved in the SRH. A reduced SRH does not contain the first segment of the related SR Policy (the first segment is the one already in the DA of the IPv6 header), and the Last Entry field is set to n-2, where n is the number of elements in the SR Policy. RFC8754 (section 4.3.1.1) defines the SRH processing and the logic to validate the SRH (S09, S10, S11) which works for both reduced and non-reduced behaviors. This patch updates seg6_validate_srh() to validate the SRH as per RFC8754. Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not msVladimir Oltean1-2/+9
One may notice that automatically-learnt entries 'never' expire, even though the bridge configures the address age period at 300 seconds. Actually the value written to hardware corresponds to a time interval 1000 times higher than intended, i.e. 83 hours. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Faineli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: dsa: ocelot: the MAC table on Felix is twice as largeVladimir Oltean6-4/+7
When running 'bridge fdb dump' on Felix, sometimes learnt and static MAC addresses would appear, sometimes they wouldn't. Turns out, the MAC table has 4096 entries on VSC7514 (Ocelot) and 8192 entries on VSC9959 (Felix), so the existing code from the Ocelot common library only dumped half of Felix's MAC table. They are both organized as a 4-way set-associative TCAM, so we just need a single variable indicating the correct number of rows. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07ARM: futex: Address build warningThomas Gleixner1-2/+7
Stephen reported the following build warning on a ARM multi_v7_defconfig build with GCC 9.2.1: kernel/futex.c: In function 'do_futex': kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized] 1676 | return oldval == cmparg; | ~~~~~~~^~~~~~~~~ kernel/futex.c:1652:6: note: 'oldval' was declared here 1652 | int oldval, ret; | ^~~~~~ introduced by commit a08971e9488d ("futex: arch_futex_atomic_op_inuser() calling conventions change"). While that change should not make any difference it confuses GCC which fails to work out that oldval is not referenced when the return value is not zero. GCC fails to properly analyze arch_futex_atomic_op_inuser(). It's not the early return, the issue is with the assembly macros. GCC fails to detect that those either set 'ret' to 0 and set oldval or set 'ret' to -EFAULT which makes oldval uninteresting. The store to the callsite supplied oldval pointer is conditional on ret == 0. The straight forward way to solve this is to make the store unconditional. Aside of addressing the build warning this makes sense anyway because it removes the conditional from the fastpath. In the error case the stored value is uninteresting and the extra store does not matter at all. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87pncao2ph.fsf@nanos.tec.linutronix.de
2020-05-06net: dsa: sja1105: the PTP_CLK extts input reacts on both edgesVladimir Oltean1-8/+18
It looks like the sja1105 external timestamping input is not as generic as we thought. When fed a signal with 50% duty cycle, it will timestamp both the rising and the falling edge. When fed a short pulse signal, only the timestamp of the falling edge will be seen in the PTPSYNCTS register, because that of the rising edge had been overwritten. So the moral is: don't feed it short pulse inputs. Luckily this is not a complete deal breaker, as we can still work with 1 Hz square waves. But the problem is that the extts polling period was not dimensioned enough for this input signal. If we leave the period at half a second, we risk losing timestamps due to jitter in the measuring process. So we need to increase it to 4 times per second. Also, the very least we can do to inform the user is to deny any other flags combination than with PTP_RISING_EDGE and PTP_FALLING_EDGE both set. Fixes: 747e5eb31d59 ("net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06selftests: net: tcp_mmap: fix SO_RCVLOWAT settingEric Dumazet1-1/+3
Since chunk_size is no longer an integer, we can not use it directly as an argument of setsockopt(). This patch should fix tcp_mmap for Big Endian kernels. Fixes: 597b01edafac ("selftests: net: avoid ptl lock contention in tcp_mmap") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Arjun Roy <arjunroy@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: hsr: fix incorrect type usage for protocol variableMurali Karicheri1-1/+1
Fix following sparse checker warning:- net/hsr/hsr_slave.c:38:18: warning: incorrect type in assignment (different base types) net/hsr/hsr_slave.c:38:18: expected unsigned short [unsigned] [usertype] protocol net/hsr/hsr_slave.c:38:18: got restricted __be16 [usertype] h_proto net/hsr/hsr_slave.c:39:25: warning: restricted __be16 degrades to integer net/hsr/hsr_slave.c:39:57: warning: restricted __be16 degrades to integer Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: macsec: fix rtnl locking issueAntoine Tenart1-1/+2
netdev_update_features() must be called with the rtnl lock taken. Not doing so triggers a warning, as ASSERT_RTNL() is used in __netdev_update_features(), the first function called by netdev_update_features(). Fix this. Fixes: c850240b6c41 ("net: macsec: report real_dev features when HW offloading is enabled") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()Dan Carpenter1-0/+3
The "info->fs.location" is a u32 that comes from the user via the ethtool_set_rxnfc() function. We need to check for invalid values to prevent a buffer overflow. I copy and pasted this check from the mvpp2_ethtool_cls_rule_ins() function. Fixes: 90b509b39ac9 ("net: mvpp2: cls: Add Classification offload support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: mvpp2: prevent buffer overflow in mvpp22_rss_ctx()Dan Carpenter1-0/+2
The "rss_context" variable comes from the user via ethtool_get_rxfh(). It can be any u32 value except zero. Eventually it gets passed to mvpp22_rss_ctx() and if it is over MVPP22_N_RSS_TABLES (8) then it results in an array overflow. Fixes: 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06selftests: net: tcp_mmap: clear whole tcp_zerocopy_receive structEric Dumazet1-1/+2
We added fields in tcp_zerocopy_receive structure, so make sure to clear all fields to not pass garbage to the kernel. We were lucky because recent additions added 'out' parameters, still we need to clean our reference implementation, before folks copy/paste it. Fixes: c8856c051454 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.") Fixes: 33946518d493 ("tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06tracing/kprobes: Reject new event if loc is NULLMasami Hiramatsu1-0/+6
Reject the new event which has NULL location for kprobes. For kprobes, user must specify at least the location. Link: http://lkml.kernel.org/r/158779376597.6082.1411212055469099461.stgit@devnote2 Cc: Tom Zanussi <zanussi@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Fixes: 2a588dd1d5d6 ("tracing: Add kprobe event command generation functions") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06tracing/boottime: Fix kprobe event API usageMasami Hiramatsu1-12/+8
Fix boottime kprobe events to use API correctly for multiple events. For example, when we set a multiprobe kprobe events in bootconfig like below, ftrace.event.kprobes.myevent { probes = "vfs_read $arg1 $arg2", "vfs_write $arg1 $arg2" } This cause an error; trace_boot: Failed to add probe: p:kprobes/myevent (null) vfs_read $arg1 $arg2 vfs_write $arg1 $arg2 This shows the 1st argument becomes NULL and multiprobes are merged to 1 probe. Link: http://lkml.kernel.org/r/158779375766.6082.201939936008972838.stgit@devnote2 Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Fixes: 29a154810546 ("tracing: Change trace_boot to use kprobe_event interface") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06tracing/kprobes: Fix a double initialization typoMasami Hiramatsu1-1/+1
Fix a typo that resulted in an unnecessary double initialization to addr. Link: http://lkml.kernel.org/r/158779374968.6082.2337484008464939919.stgit@devnote2 Cc: Tom Zanussi <zanussi@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Fixes: c7411a1a126f ("tracing/kprobe: Check whether the non-suffixed symbol is notrace") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06bootconfig: Fix to remove bootconfig data from initrd while bootMasami Hiramatsu1-17/+52
If there is a bootconfig data in the tail of initrd/initramfs, initrd image sanity check caused an error while decompression stage as follows. [ 0.883882] Unpacking initramfs... [ 2.696429] Initramfs unpacking failed: invalid magic at start of compressed archive This error will be ignored if CONFIG_BLK_DEV_RAM=n, but CONFIG_BLK_DEV_RAM=y the kernel failed to mount rootfs and causes a panic. To fix this issue, shrink down the initrd_end for removing tailing bootconfig data while boot the kernel. Link: http://lkml.kernel.org/r/158788401014.24243.17424755854115077915.stgit@devnote2 Cc: Borislav Petkov <bp@alien8.de> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: 7684b8582c24 ("bootconfig: Load boot config from the tail of initrd") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properlyPeter Xu3-0/+3
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06KVM: selftests: Fix build for evmcs.hPeter Xu2-2/+5
I got this error when building kvm selftests: /usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: multiple definition of `current_evmcs'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: first defined here /usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: multiple definition of `current_vp_assist'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: first defined here I think it's because evmcs.h is included both in a test file and a lib file so the structs have multiple declarations when linking. After all it's not a good habit to declare structs in the header files. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200504220607.99627-1-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bitsPaolo Bonzini1-15/+5
Using CPUID data can be useful for the processor compatibility check, but that's it. Using it to compute guest-reserved bits can have both false positives (such as LA57 and UMIP which we are already handling) and false negatives: in particular, with this patch we don't allow anymore a KVM guest to set CR4.PKE when CR4.PKE is clear on the host. Fixes: b9dd21e104bc ("KVM: x86: simplify handling of PKRU") Reported-by: Jim Mattson <jmattson@google.com> Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB pathSean Christopherson1-0/+3
Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so that KVM doesn't interpret clobbered RFLAGS as a VM-Fail. Filling the RSB has always clobbered RFLAGS, its current incarnation just happens clear CF and ZF in the processs. Relying on the macro to clear CF and ZF is extremely fragile, e.g. commit 089dd8e53126e ("x86/speculation: Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such that the ZF flag is always set. Reported-by: Qian Cai <cai@lca.pw> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Fixes: f2fde6a5bcfcf ("KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200506035355.2242-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06docs/virt/kvm: Document configuring and running nested guestsKashyap Chamarthy2-0/+278
This is a rewrite of this[1] Wiki page with further enhancements. The doc also includes a section on debugging problems in nested environments, among other improvements. [1] https://www.linux-kvm.org/page/Nested_Guests Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com> Message-Id: <20200505112839.30534-1-kchamart@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-05RISC-V: Remove unused code from STRICT_KERNEL_RWXAtish Patra2-24/+0
This patch removes the unused functions set_kernel_text_rw/ro. Currently, it is not being invoked from anywhere and no other architecture (except arm) uses this code. Even in ARM, these functions are not invoked from anywhere currently. Fixes: d27c3c90817e ("riscv: add STRICT_KERNEL_RWX support") Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-05neigh: send protocol value in neighbor create notificationRoman Mashak1-3/+3
When a new neighbor entry has been added, event is generated but it does not include protocol, because its value is assigned after the event notification routine has run, so move protocol assignment code earlier. Fixes: df9b0e30d44c ("neighbor: Add protocol attribute") Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Roman Mashak <mrv@mojatatu.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05drm/amd/display: Prevent dpcd reads with passive donglesAurabindo Pillai1-6/+11
[why] During hotplug, a DP port may be connected to the sink through passive adapter which does not support DPCD reads. Issuing reads without checking for this condition will result in errors [how] Ensure the link is in aux_mode before initiating operation that result in a DPCD read. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-05drm/amd/display: fix counter in wait_for_no_pipes_pendingRoman Li1-3/+2
[Why] Wait counter is not being reset for each pipe. [How] Move counter reset into pipe loop scope. Signed-off-by: Roman Li <roman.li@amd.com> Reviewed-by: Zhan Liu <Zhan.Liu@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>