aboutsummaryrefslogtreecommitdiffstats
path: root/lib (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-03-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-8/+1
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-03-31 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add raw BPF tracepoint API in order to have a BPF program type that can access kernel internal arguments of the tracepoints in their raw form similar to kprobes based BPF programs. This infrastructure also adds a new BPF_RAW_TRACEPOINT_OPEN command to BPF syscall which returns an anon-inode backed fd for the tracepoint object that allows for automatic detach of the BPF program resp. unregistering of the tracepoint probe on fd release, from Alexei. 2) Add new BPF cgroup hooks at bind() and connect() entry in order to allow BPF programs to reject, inspect or modify user space passed struct sockaddr, and as well a hook at post bind time once the port has been allocated. They are used in FB's container management engine for implementing policy, replacing fragile LD_PRELOAD wrapper intercepting bind() and connect() calls that only works in limited scenarios like glibc based apps but not for other runtimes in containerized applications, from Andrey. 3) BPF_F_INGRESS flag support has been added to sockmap programs for their redirect helper call bringing it in line with cls_bpf based programs. Support is added for both variants of sockmap programs, meaning for tx ULP hooks as well as recv skb hooks, from John. 4) Various improvements on BPF side for the nfp driver, besides others this work adds BPF map update and delete helper call support from the datapath, JITing of 32 and 64 bit XADD instructions as well as offload support of bpf_get_prandom_u32() call. Initial implementation of nfp packet cache has been tackled that optimizes memory access (see merge commit for further details), from Jakub and Jiong. 5) Removal of struct bpf_verifier_env argument from the print_bpf_insn() API has been done in order to prepare to use print_bpf_insn() soon out of perf tool directly. This makes the print_bpf_insn() API more generic and pushes the env into private data. bpftool is adjusted as well with the print_bpf_insn() argument removal, from Jiri. 6) Couple of cleanups and prep work for the upcoming BTF (BPF Type Format). The latter will reuse the current BPF verifier log as well, thus bpf_verifier_log() is further generalized, from Martin. 7) For bpf_getsockopt() and bpf_setsockopt() helpers, IPv4 IP_TOS read and write support has been added in similar fashion to existing IPv6 IPV6_TCLASS socket option we already have, from Nikita. 8) Fixes in recent sockmap scatterlist API usage, which did not use sg_init_table() for initialization thus triggering a BUG_ON() in scatterlist API when CONFIG_DEBUG_SG was enabled. This adds and uses a small helper sg_init_marker() to properly handle the affected cases, from Prashant. 9) Let the BPF core follow IDR code convention and therefore use the idr_preload() and idr_preload_end() helpers, which would also help idr_alloc_cyclic() under GFP_ATOMIC to better succeed under memory pressure, from Shaohua. 10) Last but not least, a spelling fix in an error message for the BPF cookie UID helper under BPF sample code, from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31rhashtable: add schedule pointsEric Dumazet1-0/+2
Rehashing and destroying large hash table takes a lot of time, and happens in process context. It is safe to add cond_resched() in rhashtable_rehash_table() and rhashtable_free_and_destroy() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30lib/scatterlist: add sg_init_marker() helperPrashant Bhole1-8/+1
sg_init_marker initializes sg_magic in the sg table and calls sg_mark_end() on the last entry of the table. This can be useful to avoid memset in sg_init_table() when scatterlist is already zeroed out For example: when scatterlist is embedded inside other struct and that container struct is zeroed out Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-29test_bpf: Fix NULL vs IS_ERR() check in test_skb_segment()Dan Carpenter1-1/+1
The skb_segment() function returns error pointers on error. It never returns NULL. Fixes: 76db8087c4c9 ("net: bpf: add a test for skb_segment in test_bpf module") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Drop pernet_operations::asyncKirill Tkhai1-1/+0
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25net: bpf: add a test for skb_segment in test_bpf moduleYonghong Song1-2/+91
Without the previous commit, "modprobe test_bpf" will have the following errors: ... [ 98.149165] ------------[ cut here ]------------ [ 98.159362] kernel BUG at net/core/skbuff.c:3667! [ 98.169756] invalid opcode: 0000 [#1] SMP PTI [ 98.179370] Modules linked in: [ 98.179371] test_bpf(+) ... which triggers the bug the previous commit intends to fix. The skbs are constructed to mimic what mlx5 may generate. The packet size/header may not mimic real cases in production. But the processing flow is similar. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller8-10/+154
Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+4
Merge misc fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, thp: do not cause memcg oom for thp mm/vmscan: wake up flushers for legacy cgroups too Revert "mm: page_alloc: skip over regions of invalid pfns where possible" mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink() mm/thp: do not wait for lock_page() in deferred_split_scan() mm/khugepaged.c: convert VM_BUG_ON() to collapse fail x86/mm: implement free pmd/pte page interfaces mm/vmalloc: add interfaces to free unmapped page table h8300: remove extraneous __BIG_ENDIAN definition hugetlbfs: check for pgoff value overflow lockdep: fix fs_reclaim warning MAINTAINERS: update Mark Fasheh's e-mail mm/mempolicy.c: avoid use uninitialized preferred_node
2018-03-22mm/vmalloc: add interfaces to free unmapped page tableToshi Kani1-2/+4
On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may create pud/pmd mappings. A kernel panic was observed on arm64 systems with Cortex-A75 in the following steps as described by Hanjun Guo. 1. ioremap a 4K size, valid page table will build, 2. iounmap it, pte0 will set to 0; 3. ioremap the same address with 2M size, pgd/pmd is unchanged, then set the a new value for pmd; 4. pte0 is leaked; 5. CPU may meet exception because the old pmd is still in TLB, which will lead to kernel panic. This panic is not reproducible on x86. INVLPG, called from iounmap, purges all levels of entries associated with purged address on x86. x86 still has memory leak. The patch changes the ioremap path to free unmapped page table(s) since doing so in the unmap path has the following issues: - The iounmap() path is shared with vunmap(). Since vmap() only supports pte mappings, making vunmap() to free a pte page is an overhead for regular vmap users as they do not need a pte page freed up. - Checking if all entries in a pte page are cleared in the unmap path is racy, and serializing this check is expensive. - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges. Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB purge. Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which clear a given pud/pmd entry and free up a page for the lower level entries. This patch implements their stub functions on x86 and arm64, which work as workaround. [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub] Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings") Reported-by: Lei Li <lious.lilei@hisilicon.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Wang Xuefeng <wxf.wang@hisilicon.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-2/+138
Pull networking fixes from David Miller: 1) Always validate XFRM esn replay attribute, from Florian Westphal. 2) Fix RCU read lock imbalance in xfrm_get_tos(), from Xin Long. 3) Don't try to get firmware dump if not loaded in iwlwifi, from Shaul Triebitz. 4) Fix BPF helpers to deal with SCTP GSO SKBs properly, from Daniel Axtens. 5) Fix some interrupt handling issues in e1000e driver, from Benjamin Poitier. 6) Use strlcpy() in several ethtool get_strings methods, from Florian Fainelli. 7) Fix rhlist dup insertion, from Paul Blakey. 8) Fix SKB leak in netem packet scheduler, from Alexey Kodanev. 9) Fix driver unload crash when link is up in smsc911x, from Jeremy Linton. 10) Purge out invalid socket types in l2tp_tunnel_create(), from Eric Dumazet. 11) Need to purge the write queue when TCP connections are aborted, otherwise userspace using MSG_ZEROCOPY can't close the fd. From Soheil Hassas Yeganeh. 12) Fix double free in error path of team driver, from Arkadi Sharshevsky. 13) Filter fixes for hv_netvsc driver, from Stephen Hemminger. 14) Fix non-linear packet access in ipv6 ndisc code, from Lorenzo Bianconi. 15) Properly filter out unsupported feature flags in macvlan driver, from Shannon Nelson. 16) Don't request loading the diag module for a protocol if the protocol itself is not even registered. From Xin Long. 17) If datagram connect fails in ipv6, make sure the socket state is consistent afterwards. From Paolo Abeni. 18) Use after free in qed driver, from Dan Carpenter. 19) If received ipv4 PMTU is less than the min pmtu, lock the mtu in the entry. From Sabrina Dubroca. 20) Fix sleep in atomic in tg3 driver, from Jonathan Toppins. 21) Fix vlan in vlan untagging in some situations, from Toshiaki Makita. 22) Fix double SKB free in genlmsg_mcast(). From Nicolas Dichtel. 23) Fix NULL derefs in error paths of tcf_*_init(), from Davide Caratti. 24) Unbalanced PM runtime calls in FEC driver, from Florian Fainelli. 25) Memory leak in gemini driver, from Igor Pylypiv. 26) IDR leaks in error paths of tcf_*_init() functions, from Davide Caratti. 27) Need to use GFP_ATOMIC in seg6_build_state(), from David Lebrun. 28) Missing dev_put() in error path of macsec_newlink(), from Dan Carpenter. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (201 commits) macsec: missing dev_put() on error in macsec_newlink() net: dsa: Fix functional dsa-loop dependency on FIXED_PHY hv_netvsc: common detach logic hv_netvsc: change GPAD teardown order on older versions hv_netvsc: use RCU to fix concurrent rx and queue changes hv_netvsc: disable NAPI before channel close net/ipv6: Handle onlink flag with multipath routes ppp: avoid loop in xmit recursion detection code ipv6: sr: fix NULL pointer dereference when setting encap source address ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state net: aquantia: driver version bump net: aquantia: Implement pci shutdown callback net: aquantia: Allow live mac address changes net: aquantia: Add tx clean budget and valid budget handling logic net: aquantia: Change inefficient wait loop on fw data reads net: aquantia: Fix a regression with reset on old firmware net: aquantia: Fix hardware reset when SPI may rarely hangup s390/qeth: on channel error, reject further cmd requests s390/qeth: lock read device while queueing next buffer s390/qeth: when thread completes, wake up all waiters ...
2018-03-22netns: send uevent messagesChristian Brauner1-1/+78
This patch adds a receive method to NETLINK_KOBJECT_UEVENT netlink sockets to allow sending uevent messages into the network namespace the socket belongs to. Currently non-initial network namespaces are already isolated and don't receive uevents. There are a number of cases where it is beneficial for a sufficiently privileged userspace process to send a uevent into a network namespace. One such use case would be debugging and fuzzing of a piece of software which listens and reacts to uevents. By running a copy of that software inside a network namespace, specific uevents could then be presented to it. More concretely, this would allow for easy testing of udevd/ueventd. This will also allow some piece of software to run components inside a separate network namespace and then effectively filter what that software can receive. Some examples of software that do directly listen to uevents and that we have in the past attempted to run inside a network namespace are rbd (CEPH client) or the X server. Implementation: The implementation has been kept as simple as possible from the kernel's perspective. Specifically, a simple input method uevent_net_rcv() is added to NETLINK_KOBJECT_UEVENT sockets which completely reuses existing af_netlink infrastructure and does neither add an additional netlink family nor requires any user-visible changes. For example, by using netlink_rcv_skb() we can make use of existing netlink infrastructure to report back informative error messages to userspace. Furthermore, this implementation does not introduce any overhead for existing uevent generating codepaths. The struct netns got a new uevent socket member that records the uevent socket associated with that network namespace including its position in the uevent socket list. Since we record the uevent socket for each network namespace in struct net we don't have to walk the whole uevent socket list. Instead we can directly retrieve the relevant uevent socket and send the message. At exit time we can now also trivially remove the uevent socket from the uevent socket list. This keeps the codepath very performant without introducing needless overhead and even makes older codepaths faster. Uevent sequence numbers are kept global. When a uevent message is sent to another network namespace the implementation will simply increment the global uevent sequence number and append it to the received uevent. This has the advantage that the kernel will never need to parse the received uevent message to replace any existing uevent sequence numbers. Instead it is up to the userspace process to remove any existing uevent sequence numbers in case the uevent message to be sent contains any. Security: In order for a caller to send uevent messages to a target network namespace the caller must have CAP_SYS_ADMIN in the owning user namespace of the target network namespace. Additionally, any received uevent message is verified to not exceed size UEVENT_BUFFER_SIZE. This includes the space needed to append the uevent sequence number. Testing: This patch has been tested and verified to work with the following udev implementations: 1. CentOS 6 with udevd version 147 2. Debian Sid with systemd-udevd version 237 3. Android 7.1.1 with ueventd Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: add uevent socket memberChristian Brauner1-10/+7
This commit adds struct uevent_sock to struct net. Since struct uevent_sock records the position of the uevent socket in the uevent socket list we can trivially remove it from the uevent socket list during cleanup. This speeds up the old removal codepath. Note, list_del() will hit __list_del_entry_valid() in its call chain which will validate that the element is a member of the list. If it isn't it will take care that the list is not modified. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other archesThadeu Lima de Souza Cascardo1-1/+1
Function bpf_fill_maxinsns11 is designed to not be able to be JITed on x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and commit 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that case. However, it does not fail on other architectures, which have a different JIT compiler design. So, test_bpf has started to fail to load on those. After this fix, test_bpf loads fine on both x86_64 and ppc64el. Fixes: 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19Merge branch 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpuLinus Torvalds1-0/+2
Pull percpu fixes from Tejun Heo: "Late percpu pull request for v4.16-rc6. - percpu allocator pool replenishing no longer triggers OOM or warning messages. Also, the alloc interface now understands __GFP_NORETRY and __GFP_NOWARN. This is to allow avoiding OOMs from userland triggered actions like bpf map creation. Also added cond_resched() in alloc loop. - perpcu allocation now can be interrupted by kill sigs to avoid deadlocking OOM killer. - Added Dennis Zhou as a co-maintainer. He has rewritten the area map allocator, understands most of the code base and has been responsive for all bug reports" * 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu_ref: Update doc to dissuade users from depending on internal RCU grace periods mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() percpu: include linux/sched.h for cond_resched() percpu: add a schedule point in pcpu_balance_workfn() percpu: allow select gfp to be passed to underlying allocators percpu: add __GFP_NORETRY semantics to the percpu balancing path percpu: match chunk allocator declarations with definitions percpu: add Dennis Zhou as a percpu co-maintainer
2018-03-19percpu_ref: Update doc to dissuade users from depending on internal RCU grace periodsTejun Heo1-0/+2
percpu_ref internally uses sched-RCU to implement the percpu -> atomic mode switching and the documentation suggested that this could be depended upon. This doesn't seem like a good idea. * percpu_ref uses sched-RCU which has different grace periods regular RCU. Users may combine percpu_ref with regular RCU usage and incorrectly believe that regular RCU grace periods are performed by percpu_ref. This can lead to, for example, use-after-free due to premature freeing. * percpu_ref has a grace period when switching from percpu to atomic mode. It doesn't have one between the last put and release. This distinction is subtle and can lead to surprising bugs. * percpu_ref allows starting in and switching to atomic mode manually for debugging and other purposes. This means that there may not be any grace periods from kill to release. This patch makes it clear that the grace periods are percpu_ref's internal implementation detail and can't be depended upon by the users. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-14btree: avoid variable-length allocationsJoern Engel1-4/+6
geo->keylen cannot be larger than 4. So we might as well make fixed-size allocations. Given the one remaining user, geo->keylen cannot even be larger than 1. Logfs used to have 64bit and 128bit keys, tcm_qla2xxx only has 32bit keys. But let's not break the code if we don't have to. Signed-off-by: Joern Engel <joern@purestorage.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-09lib/test_kmod.c: fix limit check on number of test devices createdLuis R. Rodriguez1-1/+1
As reported by Dan the parentheses is in the wrong place, and since unlikely() call returns either 0 or 1 it's never less than zero. The second issue is that signed integer overflows like "INT_MAX + 1" are undefined behavior. Since num_test_devs represents the number of devices, we want to stop prior to hitting the max, and not rely on the wrap arround at all. So just cap at num_test_devs + 1, prior to assigning a new device. Link: http://lkml.kernel.org/r/20180224030046.24238-1-mcgrof@kernel.org Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-09lib/bug.c: exclude non-BUG/WARN exceptions from report_bug()Kees Cook1-0/+2
Commit b8347c219649 ("x86/debug: Handle warnings before the notifier chain, to fix KGDB crash") changed the ordering of fixups, and did not take into account the case of x86 processing non-WARN() and non-BUG() exceptions. This would lead to output of a false BUG line with no other information. In the case of a refcount exception, it would be immediately followed by the refcount WARN(), producing very strange double-"cut here": lkdtm: attempting bad refcount_inc() overflow ------------[ cut here ]------------ Kernel BUG at 0000000065f29de5 [verbose debug info unavailable] ------------[ cut here ]------------ refcount_t overflow at lkdtm_REFCOUNT_INC_OVERFLOW+0x6b/0x90 in cat[3065], uid/euid: 0/0 WARNING: CPU: 0 PID: 3065 at kernel/panic.c:657 refcount_error_report+0x9a/0xa4 ... In the prior ordering, exceptions were searched first: do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str, ... if (fixup_exception(regs, trapnr)) return 0; - if (fixup_bug(regs, trapnr)) - return 0; - As a result, fixup_bugs()'s is_valid_bugaddr() didn't take into account needing to search the exception list first, since that had already happened. So, instead of searching the exception list twice (once in is_valid_bugaddr() and then again in fixup_exception()), just add a simple sanity check to report_bug() that will immediately bail out if a BUG() (or WARN()) entry is not found. Link: http://lkml.kernel.org/r/20180301225934.GA34350@beast Fixes: b8347c219649 ("x86/debug: Handle warnings before the notifier chain, to fix KGDB crash") Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-09bug: use %pB in BUG and stack protector failureKees Cook1-1/+1
The BUG and stack protector reports were still using a raw %p. This changes it to %pB for more meaningful output. Link: http://lkml.kernel.org/r/20180301225704.GA34198@beast Fixes: ad67b74d2469 ("printk: hash addresses printed with %p") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Richard Weinberger <richard.weinberger@gmail.com>, Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-07test_rhashtable: add test case for rhltable with duplicate objectsPaul Blakey1-0/+134
Tries to insert duplicates in the middle of bucket's chain: bucket 1: [[val 21 (tid=1)]] -> [[ val 1 (tid=2), val 1 (tid=0) ]] Reuses tid to distinguish the elements insertion order. Signed-off-by: Paul Blakey <paulb@mellanox.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07rhashtable: Fix rhlist duplicates insertionPaul Blakey1-1/+3
When inserting duplicate objects (those with the same key), current rhlist implementation messes up the chain pointers by updating the bucket pointer instead of prev next pointer to the newly inserted node. This causes missing elements on removal and travesal. Fix that by properly updating pprev pointer to point to the correct rhash_head next pointer. Issue: 1241076 Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7 Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface') Signed-off-by: Paul Blakey <paulb@mellanox.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-12/+15
All of the conflicts were cases of overlapping changes. In net/core/devlink.c, we have to make care that the resouce size_params have become a struct member rather than a pointer to such an object. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+3
Pull networking fixes from David Miller: 1) Use an appropriate TSQ pacing shift in mac80211, from Toke Høiland-Jørgensen. 2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk in ip6_route_me_harder, from Eric Dumazet. 3) Fix several shutdown races and similar other problems in l2tp, from James Chapman. 4) Handle missing XDP flush properly in tuntap, for real this time. From Jason Wang. 5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel Borkmann. 6) Fix phy_resume() locking, from Andrew Lunn. 7) IFLA_MTU values are ignored on newlink for some tunnel types, fix from Xin Long. 8) Revert F-RTO middle box workarounds, they only handle one dimension of the problem. From Yuchung Cheng. 9) Fix socket refcounting in RDS, from Ka-Cheong Poon. 10) Don't allow ppp unit registration to an unregistered channel, from Guillaume Nault. 11) Various hv_netvsc fixes from Stephen Hemminger. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits) hv_netvsc: propagate rx filters to VF hv_netvsc: filter multicast/broadcast hv_netvsc: defer queue selection to VF hv_netvsc: use napi_schedule_irqoff hv_netvsc: fix race in napi poll when rescheduling hv_netvsc: cancel subchannel setup before halting device hv_netvsc: fix error unwind handling if vmbus_open fails hv_netvsc: only wake transmit queue if link is up hv_netvsc: avoid retry on send during shutdown virtio-net: re enable XDP_REDIRECT for mergeable buffer ppp: prevent unregistered channels from connecting to PPP units tc-testing: skbmod: fix match value of ethertype mlxsw: spectrum_switchdev: Check success of FDB add operation net: make skb_gso_*_seglen functions private net: xfrm: use skb_gso_validate_network_len() to check gso sizes net: sched: tbf: handle GSO_BY_FRAGS case in enqueue net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len rds: Incorrect reference counting in TCP socket creation net: ethtool: don't ignore return from driver get_fecparam method vrf: check forwarding on the original netdevice when generating ICMP dest unreachable ...
2018-03-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-1/+3
Daniel Borkmann says: ==================== pull-request: bpf 2018-02-28 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Add schedule points and reduce the number of loop iterations the test_bpf kernel module is performing in order to not hog the CPU for too long, from Eric. 2) Fix an out of bounds access in tail calls in the ppc64 BPF JIT compiler, from Daniel. 3) Fix a crash on arm64 on unaligned BPF xadd operations that could be triggered via interpreter and JIT, from Daniel. Please not that once you merge net into net-next at some point, there is a minor merge conflict in test_verifier.c since test cases had been added at the end in both trees. Resolution is trivial: keep all the test cases from both trees. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28Merge tag 'dma-mapping-4.16-3' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-5/+5
Pull dma-mapping fix from Christoph Hellwig: "A single fix for a memory leak regression in the dma-debug code" * tag 'dma-mapping-4.16-3' of git://git.infradead.org/users/hch/dma-mapping: dma-debug: fix memory leak in debug_dma_alloc_coherent
2018-02-28test_bpf: reduce MAX_TESTRUNSEric Dumazet1-1/+1
For tests that are using the maximal number of BPF instruction, each run takes 20 usec. Looping 10,000 times on them totals 200 ms, which is bad when the loop is not preemptible. test_bpf: #264 BPF_MAXINSNS: Call heavy transformations jited:1 19248 18548 PASS test_bpf: #269 BPF_MAXINSNS: ld_abs+get_processor_id jited:1 20896 PASS Lets divide by ten the number of iterations, so that max latency is 20ms. We could use need_resched() to break the loop earlier if we believe 20 ms is too much. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-26test_bpf: add a schedule pointEric Dumazet1-0/+2
test_bpf() is taking 1.6 seconds nowadays, it is time to add a schedule point in it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-26idr: Fix handling of IDs above INT_MAXMatthew Wilcox1-6/+7
Khalid reported that the kernel selftests are currently failing: selftests: test_bpf.sh ======================================== test_bpf: [FAIL] not ok 1..8 selftests: test_bpf.sh [FAIL] He bisected it to 6ce711f2750031d12cec91384ac5cfa0a485b60a ("idr: Make 1-based IDRs more efficient"). The root cause is doing a signed comparison in idr_alloc_u32() instead of an unsigned comparison. I went looking for any similar problems and found a couple (which would each result in the failure to warn in two situations that aren't supposed to happen). I knocked up a few test-cases to prove that I was right and added them to the test-suite. Reported-by: Khalid Aziz <khalid.aziz@oracle.com> Tested-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2018-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-4/+3
2018-02-23Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printkLinus Torvalds1-1/+1
Pull printk fixlet from Petr Mladek: "People expect to see the real pointer value for %px. Let's substitute '(null)' only for the other %p? format modifiers that need to deference the pointer" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: vsprintf: avoid misleading "(null)" for %px
2018-02-22dma-debug: fix memory leak in debug_dma_alloc_coherentMiles Chen1-5/+5
Marty reported a memory leakage introduced by commit 3aaabbf1c39e ("lib/dma-debug.c: fix incorrect pfn calculation"). Fix it by checking the virtual address before allocating the entry. This patch also use virt_addr_valid() instead of virt_to_page() to check if a virtual address is linear. Fixes: 3aaabbf1 ("lib/dma-debug.c: fix incorrect pfn calculation") Reported-by: Marty Faltesek <mfaltesek@google.com> Signed-off-by: Miles Chen <miles.chen@mediatek.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-02-21lib/Kconfig.debug: enable RUNTIME_TESTING_MENUAnders Roxell1-0/+1
Commit d3deafaa8b5c ("lib/: make RUNTIME_TESTS a menuconfig to ease disabling it all") causes a regression when using runtime tests due to it defaults RUNTIME_TESTING_MENU to not set. Link: http://lkml.kernel.org/r/20180214133015.10090-1-anders.roxell@linaro.org Fixes: d3deafaa8b5c ("lib/: make RUNTIME_TESTS a menuconfig to easedisabling it all") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Cc: Vincent Legoll <vincent.legoll@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21ida: do zeroing in ida_pre_get()Rasmus Villemoes2-3/+1
As far as I can tell, the only place the per-cpu ida_bitmap is populated is in ida_pre_get. The pre-allocated element is stolen in two places in ida_get_new_above, in both cases immediately followed by a memset(0). Since ida_get_new_above is called with locks held, do the zeroing in ida_pre_get, or rather let kmalloc() do it. Also, apparently gcc generates ~44 bytes of code to do a memset(, 0, 128): $ scripts/bloat-o-meter vmlinux.{0,1} add/remove: 0/0 grow/shrink: 2/1 up/down: 5/-88 (-83) Function old new delta ida_pre_get 115 119 +4 vermagic 27 28 +1 ida_get_new_above 715 627 -88 Link: http://lkml.kernel.org/r/20180108225634.15340-1-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+5
2018-02-13net: Convert uevent_net_opsKirill Tkhai1-0/+1
uevent_net_init() and uevent_net_exit() create and destroy netlink socket, and these actions serialized in netlink code. Parallel execution with other pernet_operations makes the socket disappear earlier from uevent_sock_list on ->exit. As userspace can't be interested in broadcast messages of dying net, and, as I see, no one in kernel listen them, we may safely make uevent_net_ops async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-12dma-direct: comment the dma_direct_free calling conventionChristoph Hellwig1-0/+4
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-02-12dma-direct: mark as is_physChristoph Hellwig1-0/+1
Various PCI_DMA_BUS_IS_PHYS implementations rely on this flag to make proper decisions for block and networking addressability. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-02-09Merge tag 'kbuild-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds1-1/+0
Pull more Kbuild updates from Masahiro Yamada: "Makefile changes: - enable unused-variable warning that was wrongly disabled for clang Kconfig changes: - warn about blank 'help' and fix existing instances - fix 'choice' behavior to not write out invisible symbols - fix misc weirdness Coccinell changes: - fix false positive of free after managed memory alloc detection - improve performance of NULL dereference detection" * tag 'kbuild-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (21 commits) kconfig: remove const qualifier from sym_expand_string_value() kconfig: add xrealloc() helper kconfig: send error messages to stderr kconfig: echo stdin to stdout if either is redirected kconfig: remove check_stdin() kconfig: remove 'config*' pattern from .gitignnore kconfig: show '?' prompt even if no help text is available kconfig: do not write choice values when their dependency becomes n coccinelle: deref_null: avoid useless computation coccinelle: devm_free: reduce false positives kbuild: clang: disable unused variable warnings only when constant kconfig: Warn if help text is blank nios2: kconfig: Remove blank help text arm: vt8500: kconfig: Remove blank help text MIPS: kconfig: Remove blank help text MIPS: BCM63XX: kconfig: Remove blank help text lib/Kconfig.debug: Remove blank help text Staging: rtl8192e: kconfig: Remove blank help text Staging: rtl8192u: kconfig: Remove blank help text mmc: kconfig: Remove blank help text ...
2018-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-5/+26
Pull networking fixes from David Miller: 1) Make allocations less aggressive in x_tables, from Minchal Hocko. 2) Fix netfilter flowtable Kconfig deps, from Pablo Neira Ayuso. 3) Fix connection loss problems in rtlwifi, from Larry Finger. 4) Correct DRAM dump length for some chips in ath10k driver, from Yu Wang. 5) Fix ABORT handling in rxrpc, from David Howells. 6) Add SPDX tags to Sun networking drivers, from Shannon Nelson. 7) Some ipv6 onlink handling fixes, from David Ahern. 8) Netem packet scheduler interval calcualtion fix from Md. Islam. 9) Don't put crypto buffers on-stack in rxrpc, from David Howells. 10) Fix handling of error non-delivery status in netlink multicast delivery over multiple namespaces, from Nicolas Dichtel. 11) Missing xdp flush in tuntap driver, from Jason Wang. 12) Synchonize RDS protocol netns/module teardown with rds object management, from Sowini Varadhan. 13) Add nospec annotations to mpls, from Dan Williams. 14) Fix SKB truesize handling in TIPC, from Hoang Le. 15) Interrupt masking fixes in stammc from Niklas Cassel. 16) Don't allow ptr_ring objects to be sized outside of kmalloc's limits, from Jason Wang. 17) Don't allow SCTP chunks to be built which will have a length exceeding the chunk header's 16-bit length field, from Alexey Kodanev. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (82 commits) ibmvnic: Remove skb->protocol checks in ibmvnic_xmit bpf: fix rlimit in reuseport net selftest sctp: verify size of a new chunk in _sctp_make_chunk() s390/qeth: fix SETIP command handling s390/qeth: fix underestimated count of buffer elements ptr_ring: try vmalloc() when kmalloc() fails ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZE net: stmmac: remove redundant enable of PMT irq net: stmmac: rename GMAC_INT_DEFAULT_MASK for dwmac4 net: stmmac: discard disabled flags in interrupt status register ibmvnic: Reset long term map ID counter tools/libbpf: handle issues with bpf ELF objects containing .eh_frames selftests/bpf: add selftest that use test_libbpf_open selftests/bpf: add test program for loading BPF ELF files tools/libbpf: improve the pr_debug statements to contain section numbers bpf: Sync kernel ABI header with tooling header for bpf_common.h net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT net: thunder: change q_len's type to handle max ring size tipc: fix skb truesize/datasize ratio control net/sched: cls_u32: fix cls_u32 on filter replace ...
2018-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-5/+26
Daniel Borkmann says: ==================== pull-request: bpf 2018-02-09 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Two fixes for BPF sockmap in order to break up circular map references from programs attached to sockmap, and detaching related sockets in case of socket close() event. For the latter we get rid of the smap_state_change() and plug into ULP infrastructure, which will later also be used for additional features anyway such as TX hooks. For the second issue, dependency chain is broken up via map release callback to free parse/verdict programs, all from John. 2) Fix a libbpf relocation issue that was found while implementing XDP support for Suricata project. Issue was that when clang was invoked with default target instead of bpf target, then various other e.g. debugging relevant sections are added to the ELF file that contained relocation entries pointing to non-BPF related sections which libbpf trips over instead of skipping them. Test cases for libbpf are added as well, from Jesper. 3) Various misc fixes for bpftool and one for libbpf: a small addition to libbpf to make sure it recognizes all standard section prefixes. Then, the Makefile in bpftool/Documentation is improved to explicitly check for rst2man being installed on the system as we otherwise risk installing empty man pages; the man page for bpftool-map is corrected and a set of missing bash completions added in order to avoid shipping bpftool where the completions are only partially working, from Quentin. 4) Fix applying the relocation to immediate load instructions in the nfp JIT which were missing a shift, from Jakub. 5) Two fixes for the BPF kernel selftests: handle CONFIG_BPF_JIT_ALWAYS_ON=y gracefully in test_bpf.ko module and mark them as FLAG_EXPECTED_FAIL in this case; and explicitly delete the veth devices in the two tests test_xdp_{meta,redirect}.sh before dismantling the netnses as when selftests are run in batch mode, then workqueue to handle destruction might not have finished yet and thus veth creation in next test under same dev name would fail, from Yonghong. 6) Fix test_kmod.sh to check the test_bpf.ko module path before performing an insmod, and fallback to modprobe. Especially the latter is useful when having a device under test that has the modules installed instead, from Naresh. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08Merge branch 'idr-2018-02-06' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds2-63/+195
Pull idr updates from Matthew Wilcox: - test-suite improvements - replace the extended API by improving the normal API - performance improvement for IDRs which are 1-based rather than 0-based - add documentation * 'idr-2018-02-06' of git://git.infradead.org/users/willy/linux-dax: idr: Add documentation idr: Make 1-based IDRs more efficient idr: Warn if old iterators see large IDs idr: Rename idr_for_each_entry_ext idr: Remove idr_alloc_ext cls_u32: Convert to idr_alloc_u32 cls_u32: Reinstate cyclic allocation cls_flower: Convert to idr_alloc_u32 cls_bpf: Convert to use idr_alloc_u32 cls_basic: Convert to use idr_alloc_u32 cls_api: Convert to idr_alloc_u32 net sched actions: Convert to use idr_alloc_u32 idr: Add idr_alloc_u32 helper idr: Delete idr_find_ext function idr: Delete idr_replace_ext function idr: Delete idr_remove_ext function IDR test suite: Check handling negative end correctly idr test suite: Fix ida_test_random() radix tree test suite: Remove ARRAY_SIZE
2018-02-08vsprintf: avoid misleading "(null)" for %pxAdam Borowski1-1/+1
Like %pK already does, print "00000000" instead. This confused people -- the convention is that "(null)" means you tried to dereference a null pointer as opposed to printing the address. Link: http://lkml.kernel.org/r/20180204174521.21383-1-kilobyte@angband.pl To: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> To: Steven Rostedt <rostedt@goodmis.org> To: linux-kernel@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Perches <joe@perches.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Roberts, William C" <william.c.roberts@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Adam Borowski <kilobyte@angband.pl> Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-02-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds13-372/+435
Merge misc updates from Andrew Morton: - kasan updates - procfs - lib/bitmap updates - other lib/ updates - checkpatch tweaks - rapidio - ubsan - pipe fixes and cleanups - lots of other misc bits * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) Documentation/sysctl/user.txt: fix typo MAINTAINERS: update ARM/QUALCOMM SUPPORT patterns MAINTAINERS: update various PALM patterns MAINTAINERS: update "ARM/OXNAS platform support" patterns MAINTAINERS: update Cortina/Gemini patterns MAINTAINERS: remove ARM/CLKDEV SUPPORT file pattern MAINTAINERS: remove ANDROID ION pattern mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors mm: docs: fix parameter names mismatch mm: docs: fixup punctuation pipe: read buffer limits atomically pipe: simplify round_pipe_size() pipe: reject F_SETPIPE_SZ with size over UINT_MAX pipe: fix off-by-one error when checking buffer limits pipe: actually allow root to exceed the pipe buffer limits pipe, sysctl: remove pipe_proc_fn() pipe, sysctl: drop 'min' parameter from pipe-max-size converter kasan: rework Kconfig settings crash_dump: is_kdump_kernel can be boolean kernel/mutex: mutex_is_locked can be boolean ...
2018-02-06kasan: rework Kconfig settingsArnd Bergmann2-1/+12
We get a lot of very large stack frames using gcc-7.0.1 with the default -fsanitize-address-use-after-scope --param asan-stack=1 options, which can easily cause an overflow of the kernel stack, e.g. drivers/gpu/drm/i915/gvt/handlers.c:2434:1: warning: the frame size of 46176 bytes is larger than 3072 bytes drivers/net/wireless/ralink/rt2x00/rt2800lib.c:5650:1: warning: the frame size of 23632 bytes is larger than 3072 bytes lib/atomic64_test.c:250:1: warning: the frame size of 11200 bytes is larger than 3072 bytes drivers/gpu/drm/i915/gvt/handlers.c:2621:1: warning: the frame size of 9208 bytes is larger than 3072 bytes drivers/media/dvb-frontends/stv090x.c:3431:1: warning: the frame size of 6816 bytes is larger than 3072 bytes fs/fscache/stats.c:287:1: warning: the frame size of 6536 bytes is larger than 3072 bytes To reduce this risk, -fsanitize-address-use-after-scope is now split out into a separate CONFIG_KASAN_EXTRA Kconfig option, leading to stack frames that are smaller than 2 kilobytes most of the time on x86_64. An earlier version of this patch also prevented combining KASAN_EXTRA with KASAN_INLINE, but that is no longer necessary with gcc-7.0.1. All patches to get the frame size below 2048 bytes with CONFIG_KASAN=y and CONFIG_KASAN_EXTRA=n have been merged by maintainers now, so we can bring back that default now. KASAN_EXTRA=y still causes lots of warnings but now defaults to !COMPILE_TEST to disable it in allmodconfig, and it remains disabled in all other defconfigs since it is a new option. I arbitrarily raise the warning limit for KASAN_EXTRA to 3072 to reduce the noise, but an allmodconfig kernel still has around 50 warnings on gcc-7. I experimented a bit more with smaller stack frames and have another follow-up series that reduces the warning limit for 64-bit architectures to 1280 bytes (without CONFIG_KASAN). With earlier versions of this patch series, I also had patches to address the warnings we get with KASAN and/or KASAN_EXTRA, using a "noinline_if_stackbloat" annotation. That annotation now got replaced with a gcc-8 bugfix (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715) and a workaround for older compilers, which means that KASAN_EXTRA is now just as bad as before and will lead to an instant stack overflow in a few extreme cases. This reverts parts of commit 3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with KASAN=y"). Two patches in linux-next should be merged first to avoid introducing warnings in an allmodconfig build: 3cd890dbe2a4 ("media: dvb-frontends: fix i2c access helpers for KASAN") 16c3ada89cff ("media: r820t: fix r820t_write_reg for KASAN") Do we really need to backport this? I think we do: without this patch, enabling KASAN will lead to unavoidable kernel stack overflow in certain device drivers when built with gcc-7 or higher on linux-4.10+ or any version that contains a backport of commit c5caf21ab0cf8. Most people are probably still on older compilers, but it will get worse over time as they upgrade their distros. The warnings we get on kernels older than this should all be for code that uses dangerously large stack frames, though most of them do not cause an actual stack overflow by themselves.The asan-stack option was added in linux-4.0, and commit 3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with KASAN=y") effectively turned off the warning for allmodconfig kernels, so I would like to see this fix backported to any kernels later than 4.0. I have done dozens of fixes for individual functions with stack frames larger than 2048 bytes with asan-stack, and I plan to make sure that all those fixes make it into the stable kernels as well (most are already there). Part of the complication here is that asan-stack (from 4.0) was originally assumed to always require much larger stacks, but that turned out to be a combination of multiple gcc bugs that we have now worked around and fixed, but sanitize-address-use-after-scope (from v4.10) has a much higher inherent stack usage and also suffers from at least three other problems that we have analyzed but not yet fixed upstream, each of them makes the stack usage more severe than it should be. Link: http://lkml.kernel.org/r/20171221134744.2295529-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06lib/ubsan: remove returns-nonnull-attribute checksAndrey Ryabinin2-29/+0
Similarly to type mismatch checks, new GCC 8.x and Clang also changed for ABI for returns_nonnull checks. While we can update our code to conform the new ABI it's more reasonable to just remove it. Because it's just dead code, we don't have any single user of returns_nonnull attribute in the whole kernel. And AFAIU the advantage that this attribute could bring would be mitigated by -fno-delete-null-pointer-checks cflag that we use to build the kernel. So it's unlikely we will have a lot of returns_nonnull attribute in future. So let's just remove the code, it has no use. [aryabinin@virtuozzo.com: fix warning] Link: http://lkml.kernel.org/r/20180122165711.11510-1-aryabinin@virtuozzo.com Link: http://lkml.kernel.org/r/20180119152853.16806-2-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Sodagudi Prasad <psodagud@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06lib/ubsan: add type mismatch handler for new GCC/ClangAndrey Ryabinin2-10/+52
UBSAN=y fails to build with new GCC/clang: arch/x86/kernel/head64.o: In function `sanitize_boot_params': arch/x86/include/asm/bootparam_utils.h:37: undefined reference to `__ubsan_handle_type_mismatch_v1' because Clang and GCC 8 slightly changed ABI for 'type mismatch' errors. Compiler now uses new __ubsan_handle_type_mismatch_v1() function with slightly modified 'struct type_mismatch_data'. Let's add new 'struct type_mismatch_data_common' which is independent from compiler's layout of 'struct type_mismatch_data'. And make __ubsan_handle_type_mismatch[_v1]() functions transform compiler-dependent type mismatch data to our internal representation. This way, we can support both old and new compilers with minimal amount of change. Link: http://lkml.kernel.org/r/20180119152853.16806-1-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Sodagudi Prasad <psodagud@codeaurora.org> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06lib/ubsan.c: s/missaligned/misaligned/Andrew Morton1-2/+2
A vist from the spelling fairy. Cc: David Laight <David.Laight@ACULAB.COM> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06lib/test_sort.c: add module unload supportPravin Shedge1-0/+6
test_sort.c performs array-based and linked list sort test. Code allows to compile either as a loadable modules or builtin into the kernel. Current code is not allow to unload the test_sort.ko module after successful completion. This patch adds support to unload the "test_sort.ko" module by adding module_exit support. Previous patch was implemented auto unload support by returning -EAGAIN from module_init() function on successful case, but this approach is not ideal. The auto-unload might seem like a nice optimization, but it encourages inconsistent behaviour. And behaviour that is different from all other normal modules. Link: http://lkml.kernel.org/r/1513967133-6843-1-git-send-email-pravin.shedge4linux@gmail.com Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com> Cc: Kostenzer Felix <fkostenzer@live.at> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06lib/: make RUNTIME_TESTS a menuconfig to ease disabling it allVincent Legoll1-2/+5
No need to get into the submenu to disable all related config entries. This makes it easier to disable all RUNTIME_TESTS config options without entering the submenu. It will also enable one to see that en/dis-abled state from the outside menu. This is only intended to change menuconfig UI, not change the config dependencies. Link: http://lkml.kernel.org/r/20171209162742.7363-1-vincent.legoll@gmail.com Signed-off-by: Vincent Legoll <vincent.legoll@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06lib: optimize cpumask_next_and()Clement Courbet3-21/+72
We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and(). It's essentially a joined iteration in search for a non-zero bit, which is currently implemented as a lookup join (find a nonzero bit on the lhs, lookup the rhs to see if it's set there). Implement a direct join (find a nonzero bit on the incrementally built join). Also add generic bitmap benchmarks in the new `test_find_bit` module for new function (see `find_next_and_bit` in [2] and [3] below). For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory usage. Note that on Arm, the new pure-C implementation still outperforms the old one that uses a mix of C and asm (`find_next_bit`) [3]. [1] Approximate benchmark code: ``` unsigned long src1p[nr_cpumask_longs] = {pattern1}; unsigned long src2p[nr_cpumask_longs] = {pattern2}; for (/*a bunch of repetitions*/) { for (int n = -1; n <= nr_cpu_ids; ++n) { asm volatile("" : "+rm"(src1p)); // prevent any optimization asm volatile("" : "+rm"(src2p)); unsigned long result = cpumask_next_and(n, src1p, src2p); asm volatile("" : "+rm"(result)); } } ``` Results: pattern1 pattern2 time_before/time_after 0x0000ffff 0x0000ffff 1.65 0x0000ffff 0x00005555 2.24 0x0000ffff 0x00001111 2.94 0x0000ffff 0x00000000 14.0 0x00005555 0x0000ffff 1.67 0x00005555 0x00005555 1.71 0x00005555 0x00001111 1.90 0x00005555 0x00000000 6.58 0x00001111 0x0000ffff 1.46 0x00001111 0x00005555 1.49 0x00001111 0x00001111 1.45 0x00001111 0x00000000 3.10 0x00000000 0x0000ffff 1.18 0x00000000 0x00005555 1.18 0x00000000 0x00001111 1.17 0x00000000 0x00000000 1.25 ----------------------------- geo.mean 2.06 [2] test_find_next_bit, X86 (skylake) [ 3913.477422] Start testing find_bit() with random-filled bitmap [ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations [ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations [ 3913.478036] find_last_bit: 201638 cycles, 16483 iterations [ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations [ 3913.480216] Start testing find_next_and_bit() with random-filled bitmap [ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations [ 3913.481075] Start testing find_bit() with sparse bitmap [ 3913.481078] find_next_bit: 2536 cycles, 66 iterations [ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations [ 3913.481255] find_last_bit: 2006 cycles, 66 iterations [ 3913.481265] find_first_bit: 17488 cycles, 66 iterations [ 3913.481266] Start testing find_next_and_bit() with sparse bitmap [ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations [3] test_find_next_bit, arm (v7 odroid XU3). [ 267.206928] Start testing find_bit() with random-filled bitmap [ 267.214752] find_next_bit: 4474 cycles, 16419 iterations [ 267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations [ 267.229294] find_last_bit: 4209 cycles, 16419 iterations [ 267.279131] find_first_bit: 1032991 cycles, 16420 iterations [ 267.286265] Start testing find_next_and_bit() with random-filled bitmap [ 267.302386] find_next_and_bit: 2290 cycles, 8140 iterations [ 267.309422] Start testing find_bit() with sparse bitmap [ 267.316054] find_next_bit: 191 cycles, 66 iterations [ 267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations [ 267.329803] find_last_bit: 84 cycles, 66 iterations [ 267.336169] find_first_bit: 4118 cycles, 66 iterations [ 267.342627] Start testing find_next_and_bit() with sparse bitmap [ 267.356919] find_next_and_bit: 91 cycles, 1 iterations [courbet@google.com: v6] Link: http://lkml.kernel.org/r/20171129095715.23430-1-courbet@google.com [geert@linux-m68k.org: m68k/bitops: always include <asm-generic/bitops/find.h>] Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com Signed-off-by: Clement Courbet <courbet@google.com> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Yury Norov <ynorov@caviumnetworks.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>