linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-02-26	sh_eth: fix TSU init on SH7734/R8A7740	Sergei Shtylyov	2	-1/+6
	It appears that the single port Ether controllers having TSU (like SH7734/ R8A7740) need the same kind of treating in sh_eth_tsu_init() as R7S72100 currently has -- they also don't have the TSU registers related e.g. to passing the frames between ports. Add the 'sh_eth_cpu_data::dual_port' flag and use it as a new criterion for taking a "short path" in the TSU init sequence in order to avoid writing to the non-existent registers... Fixes: f0e81fecd4f8 ("net: sh_eth: Add support SH7734") Fixes: 73a0d907301e ("net: sh_eth: add support R8A7740") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	sh_eth: TSU_QTAG0/1 registers the same as TSU_QTAGM0/1	Sergei Shtylyov	2	-13/+6
	The TSU_QTAG0/1 registers found in the Gigabit Ether controllers actually have the same long name as the TSU_QTAGM0/1 registers in the early Ether controllers: Qtag Addition/Deletion Set Register (Port 0/1 to 1/0); thus there's no need to make a difference in sh_eth_tsu_init() between those controllers. Unfortunately, we can't just remove TSU_QTAG0/1 from the register enum because that would break the ethtool register dump... Fixes: b0ca2a21f769 ("sh_eth: Add support of SH7763 to sh_eth") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	tc: python3, string formattings	BTaskaya	2	-5/+5
	This patch converts old type string formattings to new type string formattings for adapting Linux Traffic Control (tc) unit testing suite python3. Linux Traffic Control (tc) unit testing suite's code quality improved is improved with this patch. According to python documentation; "The built-in string class provides the ability to do complex variable substitutions and value formatting via the format() method described in PEP 3101. " but the project was using old type formattings and new type string formattings together, this patch's main purpose is converting all old types to new types. Following files changed: 1. tools/testing/selftests/tc-testing/tdc.py 2. tools/testing/selftests/tc-testing/tdc_batch.py Following PEP rules applied: 1. PEP8 - Code Styling 2. PEP3101 - Advanced Code Formatting Signed-off-by: Batuhan Osman Taskaya <batuhanosmantaskaya@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	xen-netback: make function xenvif_rx_skb static	Colin Ian King	1	-1/+1
	The function xenvif_rx_skb is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: drivers/net/xen-netback/rx.c:422:6: warning: symbol 'xenvif_rx_skb' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	tools: tc-testing: better error reporting	Brenda J. Butler	1	-14/+70
	Do a better job with error handling - in pre- and post-suite, in pre- and post-case. Show a traceback for errors. Signed-off-by: Brenda J. Butler <bjb@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	tools: tc-testing: Fix indentation	Brenda J. Butler	1	-2/+2
	Signed-off-by: Brenda J. Butler <bjb@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	sfc: falcon: remove duplicated bit-wise or of LOOPBACK_SGMII	Colin Ian King	1	-1/+0
	Bit pattern LOOPBACK_SGMII is being bit-wise or'd twice; remove the redundant 2nd LOOPBACK_SGMII Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	mlxsw: spectrum_kvdl: avoid uninitialized variable warning	Arnd Bergmann	1	-0/+2
	gcc warns that 'resource_id' is not initialized if we don't come though any of the three 'case' statements before: drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c: In function 'mlxsw_sp_kvdl_part_init': drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:275:8: error: 'resource_id' may be used uninitialized in this function [-Werror=maybe-uninitialized] In the current code, that won't happen, but it's more robust to explicitly handle this by returning a failure from mlxsw_sp_kvdl_part_init. Fixes: 887839e6960d ("mlxsw: spectrum_kvdl: Add support for dynamic partition set") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	mlxsw: spectrum_kvdl: use div_u64() for 64-bit division	Arnd Bergmann	1	-1/+1
	Calculating the number of entries now uses 64-bit arithmetic that causes a link error on 32-bit architectures: drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.o: In function `mlxsw_sp_kvdl_init': spectrum_kvdl.c:(.text+0x51c): undefined reference to `__aeabi_uldivmod' We could probably use a 32-bit division here as before, but since this is not in a performance critical path, div_u64() seems cleaner here. Fixes: 887839e6960d ("mlxsw: spectrum_kvdl: Add support for dynamic partition set") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	arm: mvebu: 370-rd: Enable PHY interrupt handling	Andrew Lunn	1	-0/+32
	The Ethernet switch has an embedded interrupt controller. Interrupts from the embedded PHYs are part of this interrupt controller. Explicitly list the MDIO bus the embedded PHYs are on, and wire up the interrupts. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	net: dsa: mv88e6xxx: Poll when no interrupt defined	Andrew Lunn	2	-43/+106
	Not all boards have the interrupt output from the switch connected to a GPIO line. In such cases, phylib has to poll the internal PHYs, rather than receive an interrupt when there is a change in the link state. phylib polls once per second, and per PHY reads around 4 words. With a switch typically having 4 internal PHYs, this means 16 MDIO transactions per second. Rather than performing this phylib level polling, have the driver poll the interrupt status register. If the status register indicates an interrupt condition processing of interrupts in the same way as if a GPIO was used. Polling 10 times a second places less load on the MDIO bus. But rather than taking on average 0.5s to detect a link change, it takes less than 0.05s. Additionally, other interrupts, such as the watchdog, ATU and VTU violations will be reported. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	mlxsw: spectrum_switchdev: Allow port enslavement to a VLAN-unaware bridge	Ido Schimmel	1	-7/+5
	Up until now we only allowed VLAN devices to be put in a VLAN-unaware bridge, but some users need the ability to enslave physical ports as well. This is achieved by mapping the port and VID 1 to the bridge's vFID, instead of the port and the VID used by the VLAN device. The above is valid because as long as the port is not enslaved to a bridge, VID 1 is guaranteed to be configured as PVID and egress untagged. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26	samples/bpf: Add program for CPU state statistics	Leo Yan	3	-0/+504
	CPU is active when have running tasks on it and CPUFreq governor can select different operating points (OPP) according to different workload; we use 'pstate' to present CPU state which have running tasks with one specific OPP. On the other hand, CPU is idle which only idle task on it, CPUIdle governor can select one specific idle state to power off hardware logics; we use 'cstate' to present CPU idle state. Based on trace events 'cpu_idle' and 'cpu_frequency' we can accomplish the duration statistics for every state. Every time when CPU enters into or exits from idle states, the trace event 'cpu_idle' is recorded; trace event 'cpu_frequency' records the event for CPU OPP changing, so it's easily to know how long time the CPU stays in the specified OPP, and the CPU must be not in any idle state. This patch is to utilize the mentioned trace events for pstate and cstate statistics. To achieve more accurate profiling data, the program uses below sequence to insure CPU running/idle time aren't missed: - Before profiling the user space program wakes up all CPUs for once, so can avoid to missing account time for CPU staying in idle state for long time; the program forces to set 'scaling_max_freq' to lowest frequency and then restore 'scaling_max_freq' to highest frequency, this can ensure the frequency to be set to lowest frequency and later after start to run workload the frequency can be easily to be changed to higher frequency; - User space program reads map data and update statistics for every 5s, so this is same with other sample bpf programs for avoiding big overload introduced by bpf program self; - When send signal to terminate program, the signal handler wakes up all CPUs, set lowest frequency and restore highest frequency to 'scaling_max_freq'; this is exactly same with the first step so avoid to missing account CPU pstate and cstate time during last stage. Finally it reports the latest statistics. The program has been tested on Hikey board with octa CA53 CPUs, below is one example for statistics result, the format mainly follows up Jesper Dangaard Brouer suggestion. Jesper reminds to 'get printf to pretty print with thousands separators use %' and setlocale(LC_NUMERIC, "en_US")', tried three different arm64 GCC toolchains (5.4.0 20160609, 6.2.1 20161016, 6.3.0 20170516) but all of them cannot support printf flag character %' on arm64 platform, so go back print number without grouping mode. CPU states statistics: state(ms) cstate-0 cstate-1 cstate-2 pstate-0 pstate-1 pstate-2 pstate-3 pstate-4 CPU-0 767 6111 111863 561 31 756 853 190 CPU-1 241 10606 107956 484 125 646 990 85 CPU-2 413 19721 98735 636 84 696 757 89 CPU-3 84 11711 79989 17516 909 4811 5773 341 CPU-4 152 19610 98229 444 53 649 708 1283 CPU-5 185 8781 108697 666 91 671 677 1365 CPU-6 157 21964 95825 581 67 566 684 1284 CPU-7 125 15238 102704 398 20 665 786 1197 Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-23	bpf: add various jit test cases	Daniel Borkmann	1	-0/+89
	Add few test cases that check the rnu-time results under JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23	bpf, x64: save 5 bytes in prologue when ebpf insns came from cbpf	Daniel Borkmann	1	-12/+16
	While it's rather cumbersome to reduce prologue for cBPF->eBPF migrations wrt spill/fill for r15 which is callee saved register due to bpf_error path in bpf_jit.S that is both used by migrations as well as native eBPF, we can still trivially save 5 bytes in prologue for the former since tail calls can never be used there. cBPF->eBPF migrations also have their own custom prologue in BPF asm that xors A and X reg anyway, so it's fine we skip this here. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23	bpf, x64: save few bytes when mul is in alu32	Daniel Borkmann	1	-15/+28
	Add a generic emit_mov_reg() helper in order to reuse it in BPF multiplication to load the src into rax, we can save a few bytes in alu32 while doing so. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23	bpf, x64: save several bytes when mul dest is r0/r3 anyway	Daniel Borkmann	1	-10/+11
	Instead of unconditionally performing push/pop on rax/rdx in case of multiplication, we can save a few bytes in case of dest register being either BPF r0 (rax) or r3 (rdx) since the result is written in there anyway. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23	bpf, x64: save several bytes by using mov over movabsq when possible	Daniel Borkmann	1	-51/+74
	While analyzing some of the more complex BPF programs from Cilium, I found that LLVM generally prefers to emit LD_IMM64 instead of MOV32 BPF instructions for loading unsigned 32-bit immediates into a register. Given we cannot change the current/stable LLVM versions that are already out there, lets optimize this case such that the JIT prefers to emit 'mov %eax, imm32' over 'movabsq %rax, imm64' whenever suitable in order to reduce the image size by 4-5 bytes per such load in the typical case, reducing image size on some of the bigger programs by up to 4%. emit_mov_imm32() and emit_mov_imm64() have been added as helpers. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23	bpf, x64: save one byte per shl/shr/sar when imm is 1	Daniel Borkmann	1	-1/+5
	When we shift by one, we can use a different encoding where imm is not explicitly needed, which saves 1 byte per such op. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23	net: fib_rules: Add new attribute to set protocol	Donald Sharp	4	-8/+20
	For ages iproute2 has used `struct rtmsg` as the ancillary header for FIB rules and in the process set the protocol value to RTPROT_BOOT. Until ca56209a66 ("net: Allow a rule to track originating protocol") the kernel rules code ignored the protocol value sent from userspace and always returned 0 in notifications. To avoid incompatibility with existing iproute2, send the protocol as a new attribute. Fixes: cac56209a66 ("net: Allow a rule to track originating protocol") Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23	selftests/net: ignore background traffic in psock_fanout	Willem de Bruijn	3	-8/+51
	The packet fanout test generates UDP traffic and reads this with a pair of packet sockets, testing the various fanout algorithms. Avoid non-determinism from reading unrelated background traffic. Fanout decisions are made before unrelated packets can be dropped with a filter, so that is an insufficient strategy []. Run the packet socket tests in a network namespace, similar to msg_zerocopy. It it still good practice to install a filter on a packet socket before accepting traffic. Because this is example code, demonstrate that pattern. Open the socket initially bound to no protocol, install a filter, and only then bind to ETH_P_IP. Another source of non-determinism is hash collisions in FANOUT_HASH. The hash function used to select a socket in the fanout group includes the pseudorandom number hashrnd, which is not visible from userspace. To work around this, the test tries to find a pair of UDP source ports that do not collide. It gives up too soon (5 times, every 32 runs) and output is confusing. Increase tries to 20 and revise the error msg. [] another approach would be to add a third socket to the fanout group and direct all unexpected traffic here. This is possible only when reimplementing methods like RR or HASH alongside this extra catch-all bucket, using the BPF fanout method. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23	atm: idt77252: remove redundant bit-wise or'ing of zero	Colin Ian King	1	-8/+4
	Zero is being bit-wise or'd in a calculation twice; these are redundant and can be removed. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23	net_sched: gen_estimator: fix broken estimators based on percpu stats	Eric Dumazet	1	-0/+1
	pfifo_fast got percpu stats lately, uncovering a bug I introduced last year in linux-4.10. I missed the fact that we have to clear our temporary storage before calling __gnet_stats_copy_basic() in the case of percpu stats. Without this fix, rate estimators (tc qd replace dev xxx root est 1sec 4sec pfifo_fast) are utterly broken. Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23	rds: rds_msg_zcopy should return error of null rm->data.op_mmp_znotifier	Sowmini Varadhan	1	-1/+2
	if either or both of MSG_ZEROCOPY and SOCK_ZEROCOPY have not been specified, the rm->data.op_mmp_znotifier allocation will be skipped. In this case, it is invalid ot pass down a cmsghdr with RDS_CMSG_ZCOPY_COOKIE, so return EINVAL from rds_msg_zcopy for this case. Reported-by: syzbot+f893ae7bb2f6456dfbc3@syzkaller.appspotmail.com Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23	r8169: simplify and improve check for dash	Heiner Kallweit	1	-30/+9
	r8168_check_dash() returns false anyway for all chip versions not supporting dash. So we can simplify the check conditions. In addition change the check functions to return bool instead of int, because they actually return a bool value. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23	r8169: disable WOL per default	Heiner Kallweit	1	-2/+3
	Currently, if BIOS enables WOL in the chip, settings are inconsistent because the device isn't marked as wakeup-enabled (if not done explicitly via userspace tools). This causes issues with suspend/ resume because mdio_bus_phy_may_suspend() checks whether device is wakeup-enabled. In detail MDIO bus access in phy_suspend() can fail because the MDIO bus is disabled. In the history of the driver we find two competing approaches: 8f9d5138035d "r8169: remember WOL preferences on driver load" prefers to preserve what the BIOS may have set, whilst bde135a672bf "r8169: only enable PCI wakeups when WOL is active" disabled PCI wakeup per default to work around a bug on one platform. Seems like nobody complained after the latter patch about non-working WOL, what makes me think that nobody uses WOL w/o configuring it explicitly. My opinion: Vast majority of users doesn't use WOL even if the BIOS enables it in the chip. And having WOL being active keeps the PHY(s) from powering down if being idle. If somebody needs WOL, he can enable it during boot, e.g. by configuring systemd.link/WakeOnLan. Therefore, to make WOL consistent again, disable it per default. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23	gianfar: simplify FCS handling and fix memory leak	Andy Spencer	1	-16/+7
	Previously, buffer descriptors containing only the frame check sequence (FCS) were skipped and not added to the skb. However, the page reference count was still incremented, leading to a memory leak. Fixing this inside gfar_add_rx_frag() is difficult due to reserved memory handling and page reuse. Instead, move the FCS handling to gfar_process_frame() and trim off the FCS before passing the skb up the networking stack. Signed-off-by: Andy Spencer <aspencer@spacex.com> Signed-off-by: Jim Gruen <jgruen@spacex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23	ipv6 sit: work around bogus gcc-8 -Wrestrict warning	Arnd Bergmann	1	-1/+1
	gcc-8 has a new warning that detects overlapping input and output arguments in memcpy(). It triggers for sit_init_net() calling ipip6_tunnel_clone_6rd(), which is actually correct: net/ipv6/sit.c: In function 'sit_init_net': net/ipv6/sit.c:192:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict] The problem here is that the logic detecting the memcpy() arguments finds them to be the same, but the conditional that tests for the input and output of ipip6_tunnel_clone_6rd() to be identical is not a compile-time constant. We know that netdev_priv(t->dev) is the same as t for a tunnel device, and comparing "dev" directly here lets the compiler figure out as well that 'dev == sitn->fb_tunnel_dev' when called from sit_init_net(), so it no longer warns. This code is old, so Cc stable to make sure that we don't get the warning for older kernels built with new gcc. Cc: Martin Sebor <msebor@gmail.com> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83456 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23	macvlan: fix use-after-free in macvlan_common_newlink()	Alexey Kodanev	1	-1/+1
	The following use-after-free was reported by KASan when running LTP macvtap01 test on 4.16-rc2: [10642.528443] BUG: KASAN: use-after-free in macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10642.626607] Read of size 8 at addr ffff880ba49f2100 by task ip/18450 ... [10642.963873] Call Trace: [10642.994352] dump_stack+0x5c/0x7c [10643.035325] print_address_description+0x75/0x290 [10643.092938] kasan_report+0x28d/0x390 [10643.137971] ? macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10643.207963] macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10643.275978] macvtap_newlink+0x171/0x260 [macvtap] [10643.334532] rtnl_newlink+0xd4f/0x1300 ... [10646.256176] Allocated by task 18450: [10646.299964] kasan_kmalloc+0xa6/0xd0 [10646.343746] kmem_cache_alloc_trace+0xf1/0x210 [10646.397826] macvlan_common_newlink+0x6de/0x14a0 [macvlan] [10646.464386] macvtap_newlink+0x171/0x260 [macvtap] [10646.522728] rtnl_newlink+0xd4f/0x1300 ... [10647.022028] Freed by task 18450: [10647.061549] __kasan_slab_free+0x138/0x180 [10647.111468] kfree+0x9e/0x1c0 [10647.147869] macvlan_port_destroy+0x3db/0x650 [macvlan] [10647.211411] rollback_registered_many+0x5b9/0xb10 [10647.268715] rollback_registered+0xd9/0x190 [10647.319675] register_netdevice+0x8eb/0xc70 [10647.370635] macvlan_common_newlink+0xe58/0x14a0 [macvlan] [10647.437195] macvtap_newlink+0x171/0x260 [macvtap] Commit d02fd6e7d293 ("macvlan: Fix one possible double free") handles the case when register_netdevice() invokes ndo_uninit() on error and as a result free the port. But 'macvlan_port_get_rtnl(dev))' check (returns dev->rx_handler_data), which was added by this commit in order to prevent double free, is not quite correct: * for macvlan it always returns NULL because 'lowerdev' is the one that was used to register rx handler (port) in macvlan_port_create() as well as to unregister it in macvlan_port_destroy(). * for macvtap it always returns a valid pointer because macvtap registers its own rx handler before macvlan_common_newlink(). Fixes: d02fd6e7d293 ("macvlan: Fix one possible double free") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23	bpf: NULL pointer check is not needed in BPF_CGROUP_RUN_PROG_INET_SOCK	Yafang Shao	1	-1/+1
	sk is already allocated in inet_create/inet6_create, hence when BPF_CGROUP_RUN_PROG_INET_SOCK is executed sk will never be NULL. The logic is as bellow, sk = sk_alloc(); if (!sk) goto out; BPF_CGROUP_RUN_PROG_INET_SOCK(sk); Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-23	arm64: fix unwind_frame() for filtered out fn for function graph tracing	Pratyush Anand	3	-2/+7
	do_task_stat() calls get_wchan(), which further does unwind_frame(). unwind_frame() restores frame->pc to original value in case function graph tracer has modified a return address (LR) in a stack frame to hook a function return. However, if function graph tracer has hit a filtered function, then we can't unwind it as ftrace_push_return_trace() has biased the index(frame->graph) with a 'huge negative' offset(-FTRACE_NOTRACE_DEPTH). Moreover, arm64 stack walker defines index(frame->graph) as unsigned int, which can not compare a -ve number. Similar problem we can have with calling of walk_stackframe() from save_stack_trace_tsk() or dump_backtrace(). This patch fixes unwind_frame() to test the index for -ve value and restore index accordingly before we can restore frame->pc. Reproducer: cd /sys/kernel/debug/tracing/ echo schedule > set_graph_notrace echo 1 > options/display-graph echo wakeup > current_tracer ps -ef \| grep -i agent Above commands result in: Unable to handle kernel paging request at virtual address ffff801bd3d1e000 pgd = ffff8003cbe97c00 [ffff801bd3d1e000] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000006 [#1] SMP [...] CPU: 5 PID: 11696 Comm: ps Not tainted 4.11.0+ #33 [...] task: ffff8003c21ba000 task.stack: ffff8003cc6c0000 PC is at unwind_frame+0x12c/0x180 LR is at get_wchan+0xd4/0x134 pc : [<ffff00000808892c>] lr : [<ffff0000080860b8>] pstate: 60000145 sp : ffff8003cc6c3ab0 x29: ffff8003cc6c3ab0 x28: 0000000000000001 x27: 0000000000000026 x26: 0000000000000026 x25: 00000000000012d8 x24: 0000000000000000 x23: ffff8003c1c04000 x22: ffff000008c83000 x21: ffff8003c1c00000 x20: 000000000000000f x19: ffff8003c1bc0000 x18: 0000fffffc593690 x17: 0000000000000000 x16: 0000000000000001 x15: 0000b855670e2b60 x14: 0003e97f22cf1d0f x13: 0000000000000001 x12: 0000000000000000 x11: 00000000e8f4883e x10: 0000000154f47ec8 x9 : 0000000070f367c0 x8 : 0000000000000000 x7 : 00008003f7290000 x6 : 0000000000000018 x5 : 0000000000000000 x4 : ffff8003c1c03cb0 x3 : ffff8003c1c03ca0 x2 : 00000017ffe80000 x1 : ffff8003cc6c3af8 x0 : ffff8003d3e9e000 Process ps (pid: 11696, stack limit = 0xffff8003cc6c0000) Stack: (0xffff8003cc6c3ab0 to 0xffff8003cc6c4000) [...] [<ffff00000808892c>] unwind_frame+0x12c/0x180 [<ffff000008305008>] do_task_stat+0x864/0x870 [<ffff000008305c44>] proc_tgid_stat+0x3c/0x48 [<ffff0000082fde0c>] proc_single_show+0x5c/0xb8 [<ffff0000082b27e0>] seq_read+0x160/0x414 [<ffff000008289e6c>] __vfs_read+0x58/0x164 [<ffff00000828b164>] vfs_read+0x88/0x144 [<ffff00000828c2e8>] SyS_read+0x60/0xc0 [<ffff0000080834a0>] __sys_trace_return+0x0/0x4 Fixes: 20380bb390a4 (arm64: ftrace: fix a stack tracer's output under function graph tracer) Signed-off-by: Pratyush Anand <panand@redhat.com> Signed-off-by: Jerome Marchand <jmarchan@redhat.com> [catalin.marinas@arm.com: replace WARN_ON with WARN_ON_ONCE] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-22	integrity/security: fix digsig.c build error with header file	Randy Dunlap	1	-0/+1
	security/integrity/digsig.c has build errors on some $ARCH due to a missing header file, so add it. security/integrity/digsig.c:146:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: linux-integrity@vger.kernel.org Link: http://kisskb.ellerman.id.au/kisskb/head/13396/ Signed-off-by: James Morris <james.morris@microsoft.com>
2018-02-22	MIPS: boot: Define __ASSEMBLY__ for its.S build	Kees Cook	1	-0/+1
	The MIPS %.its.S compiler command did not define __ASSEMBLY__, which meant when compiler_types.h was added to kconfig.h, unexpected things appeared (e.g. struct declarations) which should not have been present. As done in the general %.S compiler command, __ASSEMBLY__ is now included here too. The failure was: Error: arch/mips/boot/vmlinux.gz.its:201.1-2 syntax error FATAL ERROR: Unable to parse input tree /usr/bin/mkimage: Can't read arch/mips/boot/vmlinux.gz.itb.tmp: Invalid argument /usr/bin/mkimage Can't add hashes to FIT blob Reported-by: kbuild test robot <lkp@intel.com> Fixes: 28128c61e08e ("kconfig.h: Include compiler types to avoid missed struct attributes") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-22	bpf, arm64: fix out of bounds access in tail call	Daniel Borkmann	2	-2/+29
	I recently noticed a crash on arm64 when feeding a bogus index into BPF tail call helper. The crash would not occur when the interpreter is used, but only in case of JIT. Output looks as follows: [ 347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510 [...] [ 347.043065] [fffb850e96492510] address between user and kernel address ranges [ 347.050205] Internal error: Oops: 96000004 [#1] SMP [...] [ 347.190829] x13: 0000000000000000 x12: 0000000000000000 [ 347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10 [ 347.201427] x9 : 0000000000000000 x8 : 0000000000000000 [ 347.206726] x7 : 0000000000000000 x6 : 001c991738000000 [ 347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a [ 347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500 [ 347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500 [ 347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61) [ 347.235221] Call trace: [ 347.237656] 0xffff000002f3a4fc [ 347.240784] bpf_test_run+0x78/0xf8 [ 347.244260] bpf_prog_test_run_skb+0x148/0x230 [ 347.248694] SyS_bpf+0x77c/0x1110 [ 347.251999] el0_svc_naked+0x30/0x34 [ 347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b) [...] In this case the index used in BPF r3 is the same as in r1 at the time of the call, meaning we fed a pointer as index; here, it had the value 0xffff808fd7cf0500 which sits in x2. While I found tail calls to be working in general (also for hitting the error cases), I noticed the following in the code emission: # bpftool p d j i 988 [...] 38: ldr w10, [x1,x10] 3c: cmp w2, w10 40: b.ge 0x000000000000007c <-- signed cmp 44: mov x10, #0x20 // #32 48: cmp x26, x10 4c: b.gt 0x000000000000007c 50: add x26, x26, #0x1 54: mov x10, #0x110 // #272 58: add x10, x1, x10 5c: lsl x11, x2, #3 60: ldr x11, [x10,x11] <-- faulting insn (f86b694b) 64: cbz x11, 0x000000000000007c [...] Meaning, the tests passed because commit ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper") was using signed compares instead of unsigned which as a result had the test wrongly passing. Change this but also the tail call count test both into unsigned and cap the index as u32. Latter we did as well in 90caccdd8cc0 ("bpf: fix bpf_tail_call() x64 JIT") and is needed in addition here, too. Tested on HiSilicon Hi1616. Result after patch: # bpftool p d j i 268 [...] 38: ldr w10, [x1,x10] 3c: add w2, w2, #0x0 40: cmp w2, w10 44: b.cs 0x0000000000000080 48: mov x10, #0x20 // #32 4c: cmp x26, x10 50: b.hi 0x0000000000000080 54: add x26, x26, #0x1 58: mov x10, #0x110 // #272 5c: add x10, x1, x10 60: lsl x11, x2, #3 64: ldr x11, [x10,x11] 68: cbz x11, 0x0000000000000080 [...] Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-22	bpf, x64: implement retpoline for tail call	Daniel Borkmann	2	-4/+42
	Implement a retpoline [0] for the BPF tail call JIT'ing that converts the indirect jump via jmp %rax that is used to make the long jump into another JITed BPF image. Since this is subject to speculative execution, we need to control the transient instruction sequence here as well when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop. The latter aligns also with what gcc / clang emits (e.g. [1]). JIT dump after patch: # bpftool p d x i 1 0: (18) r2 = map[id:1] 2: (b7) r3 = 0 3: (85) call bpf_tail_call#12 4: (b7) r0 = 2 5: (95) exit With CONFIG_RETPOLINE: # bpftool p d j i 1 [...] 33: cmp %edx,0x24(%rsi) 36: jbe 0x0000000000000072 \|* 38: mov 0x24(%rbp),%eax 3e: cmp $0x20,%eax 41: ja 0x0000000000000072 \| 43: add $0x1,%eax 46: mov %eax,0x24(%rbp) 4c: mov 0x90(%rsi,%rdx,8),%rax 54: test %rax,%rax 57: je 0x0000000000000072 \| 59: mov 0x28(%rax),%rax 5d: add $0x25,%rax 61: callq 0x000000000000006d \|+ 66: pause \| 68: lfence \| 6b: jmp 0x0000000000000066 \| 6d: mov %rax,(%rsp) \| 71: retq \| 72: mov $0x2,%eax [...] * relative fall-through jumps in error case + retpoline for indirect jump Without CONFIG_RETPOLINE: # bpftool p d j i 1 [...] 33: cmp %edx,0x24(%rsi) 36: jbe 0x0000000000000063 \|* 38: mov 0x24(%rbp),%eax 3e: cmp $0x20,%eax 41: ja 0x0000000000000063 \| 43: add $0x1,%eax 46: mov %eax,0x24(%rbp) 4c: mov 0x90(%rsi,%rdx,8),%rax 54: test %rax,%rax 57: je 0x0000000000000063 \| 59: mov 0x28(%rax),%rax 5d: add $0x25,%rax 61: jmpq %rax \|- 63: mov $0x2,%eax [...] relative fall-through jumps in error case - plain indirect jump as before [0] https://support.google.com/faqs/answer/7625886 [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2b Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-22	fs/signalfd: fix build error for BUS_MCEERR_AR	Randy Dunlap	1	-3/+12
	Fix build error in fs/signalfd.c by using same method that is used in kernel/signal.c: separate blocks for different signal si_code values. ./fs/signalfd.c: error: 'BUS_MCEERR_AR' undeclared (first use in this function) Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-02-22	rxrpc: Fix send in rxrpc_send_data_packet()	David Howells	1	-1/+1
	All the kernel_sendmsg() calls in rxrpc_send_data_packet() need to send both parts of the iov[] buffer, but one of them does not. Fix it so that it does. Without this, short IPv6 rxrpc DATA packets may be seen that have the rxrpc header included, but no payload. Fixes: 5a924b8951f8 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22	dsa: ptp: mark dummy helpers as 'inline'	Arnd Bergmann	2	-6/+6
	Declaring a static function in a header leads to a warning every time that header gets included without the function being used: In file included from drivers/net/dsa/mv88e6xxx/chip.c:42: drivers/net/dsa/mv88e6xxx/ptp.h:92:13: error: 'mv88e6xxx_hwtstamp_work' defined but not used [-Werror=unused-function] static long mv88e6xxx_hwtstamp_work(struct ptp_clock_info ptp) In file included from drivers/net/dsa/mv88e6xxx/chip.c:38: drivers/net/dsa/mv88e6xxx/global2.h:355:12: error: 'mv88e6xxx_g2_wait' defined but not used [-Werror=unused-function] static int mv88e6xxx_g2_wait(struct mv88e6xxx_chip chip, int reg, u16 mask) ^~~~~~~~~~~~~~~~~ drivers/net/dsa/mv88e6xxx/global2.h:350:12: error: 'mv88e6xxx_g2_update' defined but not used [-Werror=unused-function] static int mv88e6xxx_g2_update(struct mv88e6xxx_chip chip, int reg, u16 update) ^~~~~~~~~~~~~~~~~~~ drivers/net/dsa/mv88e6xxx/global2.h:345:12: error: 'mv88e6xxx_g2_write' defined but not used [-Werror=unused-function] static int mv88e6xxx_g2_write(struct mv88e6xxx_chip chip, int reg, u16 val) ^~~~~~~~~~~~~~~~~~ drivers/net/dsa/mv88e6xxx/global2.h:340:12: error: 'mv88e6xxx_g2_read' defined but not used [-Werror=unused-function] static int mv88e6xxx_g2_read(struct mv88e6xxx_chip chip, int reg, u16 val) This marks all such functions in dsa inline to make sure we don't warn about them. Fixes: c6fe0ad2c349 ("net: dsa: mv88e6xxx: add rx/tx timestamping support") Fixes: 0d632c3d6fe3 ("net: dsa: mv88e6xxx: add accessors for PTP/TAI registers") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22	net: aquantia: Fix error handling in aq_pci_probe()	Dan Carpenter	1	-4/+10
	We should check "self->aq_hw" for allocation failure, and also we should free it on the error paths. Fixes: 23ee07ad3c2f ("net: aquantia: Cleanup pci functions module") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22	bpf: fix rcu lockdep warning for lpm_trie map_free callback	Yonghong Song	1	-2/+1
	Commit 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback function") fixed a memory leak and removed unnecessary locks in map_free callback function. Unfortrunately, it introduced a lockdep warning. When lockdep checking is turned on, running tools/testing/selftests/bpf/test_lpm_map will have: [ 98.294321] ============================= [ 98.294807] WARNING: suspicious RCU usage [ 98.295359] 4.16.0-rc2+ #193 Not tainted [ 98.295907] ----------------------------- [ 98.296486] /home/yhs/work/bpf/kernel/bpf/lpm_trie.c:572 suspicious rcu_dereference_check() usage! [ 98.297657] [ 98.297657] other info that might help us debug this: [ 98.297657] [ 98.298663] [ 98.298663] rcu_scheduler_active = 2, debug_locks = 1 [ 98.299536] 2 locks held by kworker/2:1/54: [ 98.300152] #0: ((wq_completion)"events"){+.+.}, at: [<00000000196bc1f0>] process_one_work+0x157/0x5c0 [ 98.301381] #1: ((work_completion)(&map->work)){+.+.}, at: [<00000000196bc1f0>] process_one_work+0x157/0x5c0 Since actual trie tree removal happens only after no other accesses to the tree are possible, replacing rcu_dereference_protected(slot, lockdep_is_held(&trie->lock)) with rcu_dereference_protected(slot, 1) fixed the issue. Fixes: 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback function") Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-22	bpf: add schedule points in percpu arrays management	Eric Dumazet	1	-1/+4
	syszbot managed to trigger RCU detected stalls in bpf_array_free_percpu() It takes time to allocate a huge percpu map, but even more time to free it. Since we run in process context, use cond_resched() to yield cpu if needed. Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-22	nfp: advertise firmware for mixed 10G/25G mode	Dirk van der Merwe	1	-0/+1
	The AMDA0099-0001 platform can support the 1x10G + 1x25G mixed mode operation. Recently, firmware has been added for this configuration mode. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>