linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-11-07	ixgbe: allow IPsec Tx offload in VEPA mode	Shannon Nelson	1	-1/+3
	When it's possible that the PF might end up trying to send a packet to one of its own VFs, we have to forbid IPsec offload because the device drops the packets into a black hole. See commit 47b6f50077e6 ("ixgbe: disallow IPsec Tx offload when in SR-IOV mode") for more info. This really is only necessary when the device is in the default VEB mode. If instead the device is running in VEPA mode, the packets will go through the encryption engine and out the MAC/PHY as normal, and get "hairpinned" as needed by the switch. So let's not block IPsec offload when in VEPA mode. To get there with the ixgbe device, use the handy 'bridge' command: bridge link set dev eth1 hwmode vepa Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-07	ixgbe: don't clear_bit on xdp_ring->state if xdp_ring is null	Colin Ian King	1	-1/+2
	There is an earlier check to see if xdp_ring is null when configuring the tx ring, so assuming that it can still be null, the clearing of the xdp_ring->state currently could end up with a null pointer dereference. Fix this by only clearing the bit if xdp_ring is not null. Detected by CoverityScan, CID#1473795 ("Dereference after null check") Fixes: 024aa5800f32 ("ixgbe: added Rx/Tx ring disable/enable functions") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-07	igbvf: Replace spin_is_locked() with lockdep	Lance Roy	1	-2/+2
	lockdep_assert_held() is better suited to checking locking requirements, since it won't get confused when someone else holds the lock. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-06	net: systemport: Unmap queues upon DSA unregister event	Florian Fainelli	1	-6/+50
	Binding and unbinding the switch driver which creates the DSA slave network devices for which we set-up inspection would lead to undesireable effects since we were not clearing the port/queue mapping to the SYSTEMPORT TX queue. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	net: systemport: Simplify queue mapping logic	Florian Fainelli	2	-9/+9
	The use of a bitmap speeds up the finding of the first available queue to which we could start establishing the mapping for, but we still have to loop over all slave network devices to set them up. Simplify the logic to have a single loop, and use the fact that a correctly configured ring has inspect set to true. This will make things simpler to unwind during device unregistration. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	net: dsa: bcm_sf2: Turn on PHY to allow successful registration	Florian Fainelli	1	-0/+4
	We are binding to the PHY using the SF2 slave MDIO bus that we create, binding involves reading the PHY's MII_PHYSID1/2 which won't be possible if the PHY is turned off. Temporarily turn it on/off for the bus probing to succeeed. This fixes unbind/bind problems where the port connecting to that PHY would be in error since it could not connect to it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	net: systemport: Restore Broadcom tag match filters upon resume	Florian Fainelli	2	-0/+13
	Some of the system suspend states that we support wipe out entirely the HW contents. If we had a Wake-on-LAN filter programmed prior to going into suspend, but we did not actually wake-up from Wake-on-LAN and instead used a deeper suspend state, make sure we restore the CID number that we need to match against. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	net: dsa: bcm_sf2: Get rid of unmarshalling functions	Florian Fainelli	1	-310/+0
	Now that we have migrated the CFP rule handling to a list with a software copy, the delete/get operation just returns what is on the list, no need to read from the hardware which is both slow and more error prone. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	net: dsa: bcm_sf2: Restore CFP rules during system resume	Florian Fainelli	3	-0/+41
	The hardware can lose its context during system suspend, and depending on the switch generation (7445 vs. 7278), while the rules are still there, they will have their valid bit cleared (because that's the fastest way for the HW to reset things). Just make sure we re-apply them coming back from resume. The 7445 switch is an older version of the core that has some quirky RAM technology requiring a delete then re-inser to guarantee the RAM entries are properly latched. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	net: dsa: bcm_sf2: Split rule handling from HW operation	Florian Fainelli	1	-35/+54
	In preparation for restoring CFP rules during system wide system suspend/resume where the hardware loses its context, split the rule validation from its actual insertion as well as the rule removal from its actual hardware deletion operation. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	net: dsa: bcm_sf2: Keep copy of inserted rules	Florian Fainelli	3	-11/+137
	We tried hard to use the hardware as a storage area, which made things needlessly complex in that we had to both marshall and unmarshall the ethtool_rx_flow_spec into what the CFP hardware understands but it did not require any driver level allocations, so that was nice. Keep a copy of the ethtool_rx_flow_spec rule we want to insert, and also make sure we don't have a duplicate rule already. This greatly speeds up the deletion time since we only need to clear the slice's valid bit and not perform a full read. This is a preparatory step for being able to restore rules upon system resumption where the hardware loses its context partially or entirely. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	rtnetlink: Add more extack messages to rtnl_newlink	David Ahern	1	-2/+4
	Add extack arg to the nla_parse_nested calls in rtnl_newlink, and add messages for unknown device type and link network namespace id. In particular, it improves the failure message when the wrong link type is used. From $ ip li add bond1 type bonding RTNETLINK answers: Operation not supported to $ ip li add bond1 type bonding Error: Unknown device type. (The module name is bonding but the link type is bond.) Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	net: Add extack argument to ip_fib_metrics_init	David Ahern	4	-11/+25
	Add extack argument to ip_fib_metrics_init and add messages for invalid metrics. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	net: Add extack argument to rtnl_create_link	David Ahern	7	-12/+19
	Add extack arg to rtnl_create_link and add messages for invalid number of Tx or Rx queues. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	ipv6: gro: do not use slow memcmp() in ipv6_gro_receive()	Eric Dumazet	1	-3/+10
	ipv6_gro_receive() compares 34 bytes using slow memcmp(), while handcoding with a couple of ipv6_addr_equal() is much faster. Before this patch, "perf top -e cycles:pp -C <cpu>" would see memcmp() using ~10% of cpu cycles on a 40Gbit NIC receiving IPv6 TCP traffic. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	net: skbuff.h: remove unnecessary unlikely()	Yangtao Li	1	-3/+1
	WARN_ON() already contains an unlikely(), so it's not necessary to use unlikely. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	ISDN: eicon: Remove driver	Olof Johansson	85	-41755/+0
	I started looking at the history of this driver, and last time the maintainer was active on the mailing list was when discussing how to remove it. This was in 2012: https://lore.kernel.org/lkml/4F4DE175.30002@melware.de/ It looks to me like this has in practice been an orphan for quite a while. It's throwing warnings about stack size in a function that is in dire need of refactoring, and it's probably a case of "it's time to call it". Cc: Armin Schindler <mac@melware.de> Cc: Karsten Keil <isdn@linux-pingi.de> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06	ARM: 8809/1: proc-v7: fix Thumb annotation of cpu_v7_hvc_switch_mm	Ard Biesheuvel	1	-1/+1
	Due to what appears to be a copy/paste error, the opening ENTRY() of cpu_v7_hvc_switch_mm() lacks a matching ENDPROC(), and instead, the one for cpu_v7_smc_switch_mm() is duplicated. Given that it is ENDPROC() that emits the Thumb annotation, the cpu_v7_hvc_switch_mm() routine will be called in ARM mode on a Thumb2 kernel, resulting in the following splat: Internal error: Oops - undefined instruction: 0 [#1] SMP THUMB2 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc1-00030-g4d28ad89189d-dirty #488 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 PC is at cpu_v7_hvc_switch_mm+0x12/0x18 LR is at flush_old_exec+0x31b/0x570 pc : [<c0316efe>] lr : [<c04117c7>] psr: 00000013 sp : ee899e50 ip : 00000000 fp : 00000001 r10: eda28f34 r9 : eda31800 r8 : c12470e0 r7 : eda1fc00 r6 : eda53000 r5 : 00000000 r4 : ee88c000 r3 : c0316eec r2 : 00000001 r1 : eda53000 r0 : 6da6c000 Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Note the 'ISA ARM' in the last line. Fix this by using the correct name in ENDPROC(). Cc: <stable@vger.kernel.org> Fixes: 10115105cb3a ("ARM: spectre-v2: add firmware based hardening") Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-11-05	net: alx: make alx_drv_name static	Rasmus Villemoes	2	-2/+1
	alx_drv_name is not used outside main.c, so there's no reason for it to have external linkage. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05	net: bpfilter: fix iptables failure if bpfilter_umh is disabled	Taehee Yoo	1	-3/+3
	When iptables command is executed, ip_{set/get}sockopt() try to upload bpfilter.ko if bpfilter is enabled. if it couldn't find bpfilter.ko, command is failed. bpfilter.ko is generated if CONFIG_BPFILTER_UMH is enabled. ip_{set/get}sockopt() only checks CONFIG_BPFILTER. So that if CONFIG_BPFILTER is enabled and CONFIG_BPFILTER_UMH is disabled, iptables command is always failed. test config: CONFIG_BPFILTER=y # CONFIG_BPFILTER_UMH is not set test command: %iptables -L iptables: No chain/target/match by that name. Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05	sock_diag: fix autoloading of the raw_diag module	Andrei Vagin	1	-0/+1
	IPPROTO_RAW isn't registred as an inet protocol, so inet_protos[protocol] is always NULL for it. Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Fixes: bf2ae2e4bf93 ("sock_diag: request _diag module only when the family or proto has been registered") Signed-off-by: Andrei Vagin <avagin@gmail.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05	net: core: netpoll: Enable netconsole IPv6 link local address	Matwey V. Kornilov	1	-1/+2
	There is no reason to discard using source link local address when remote netconsole IPv6 address is set to be link local one. The patch allows administrators to use IPv6 netconsole without explicitly configuring source address: netconsole=@/,@fe80::5054:ff:fe2f:6012/ Signed-off-by: Matwey V. Kornilov <matwey@sai.msu.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05	ipv6: properly check return value in inet6_dump_all()	Alexey Kodanev	1	-2/+2
	Make sure we call fib6_dump_end() if it happens that skb->len is zero. rtnl_dump_all() can reset cb->args on the next loop iteration there. Fixes: 08e814c9e8eb ("net/ipv6: Bail early if user only wants cloned entries") Fixes: ae677bbb4441 ("net: Don't return invalid table id error when dumping all families") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05	rtnetlink: restore handling of dumpit return value in rtnl_dump_all()	Alexey Kodanev	1	-1/+1
	For non-zero return from dumpit() we should break the loop in rtnl_dump_all() and return the result. Otherwise, e.g., we could get the memory leak in inet6_dump_fib() [1]. The pointer to the allocated struct fib6_walker there (saved in cb->args) can be lost, reset on the next iteration. Fix it by partially restoring the previous behavior before commit c63586dc9b3e ("net: rtnl_dump_all needs to propagate error from dumpit function"). The returned error from dumpit() is still passed further. [1]: unreferenced object 0xffff88001322a200 (size 96): comm "sshd", pid 1484, jiffies 4296032768 (age 1432.542s) hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de ................ 18 09 41 36 00 88 ff ff 18 09 41 36 00 88 ff ff ..A6......A6.... backtrace: [<0000000095846b39>] kmem_cache_alloc_trace+0x151/0x220 [<000000007d12709f>] inet6_dump_fib+0x68d/0x940 [<000000002775a316>] rtnl_dump_all+0x1d9/0x2d0 [<00000000d7cd302b>] netlink_dump+0x945/0x11a0 [<000000002f43485f>] __netlink_dump_start+0x55d/0x800 [<00000000f76bbeec>] rtnetlink_rcv_msg+0x4fa/0xa00 [<000000009b5761f3>] netlink_rcv_skb+0x29c/0x420 [<0000000087a1dae1>] rtnetlink_rcv+0x15/0x20 [<00000000691b703b>] netlink_unicast+0x4e3/0x6c0 [<00000000b5be0204>] netlink_sendmsg+0x7f2/0xba0 [<0000000096d2aa60>] sock_sendmsg+0xba/0xf0 [<000000008c1b786f>] __sys_sendto+0x1e4/0x330 [<0000000019587b3f>] __x64_sys_sendto+0xe1/0x1a0 [<00000000071f4d56>] do_syscall_64+0x9f/0x300 [<000000002737577f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<0000000057587684>] 0xffffffffffffffff Fixes: c63586dc9b3e ("net: rtnl_dump_all needs to propagate error from dumpit function") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05	net/ipv6: Move anycast init/cleanup functions out of CONFIG_PROC_FS	Jeff Barnhill	1	-1/+1
	Move the anycast.c init and cleanup functions which were inadvertently added inside the CONFIG_PROC_FS definition. Fixes: 2384d02520ff ("net/ipv6: Add anycast addresses to a global hashtable") Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05	compiler: remove __no_sanitize_address_or_inline again	Martin Schwidefsky	3	-15/+3
	The __no_sanitize_address_or_inline and __no_kasan_or_inline defines are almost identical. The only difference is that __no_kasan_or_inline does not have the 'notrace' attribute. To be able to replace __no_sanitize_address_or_inline with the older definition, add 'notrace' to __no_kasan_or_inline and change to two users of __no_sanitize_address_or_inline in the s390 code. The 'notrace' option is necessary for e.g. the __load_psw_mask function in arch/s390/include/asm/processor.h. Without the option it is possible to trace __load_psw_mask which leads to kernel stack overflow. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Pointed-out-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-05	tracing/kprobes: Fix strpbrk() argument order	Masami Hiramatsu	1	-1/+1
	Fix strpbrk()'s argument order, it must pass acceptable string in 2nd argument. Note that this can cause a kernel panic where it recovers backup character to code->data. Link: http://lkml.kernel.org/r/154108256792.2604.1816052586385217811.stgit@devbox Fixes: a6682814f371 ("tracing/kprobes: Allow kprobe-events to record module symbol") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-04	bonding/802.3ad: fix link_failure_count tracking	Jarod Wilson	1	-2/+2
	Commit 4d2c0cda07448ea6980f00102dc3964eb25e241c set slave->link to BOND_LINK_DOWN for 802.3ad bonds whenever invalid speed/duplex values were read, to fix a problem with slaves getting into weird states, but in the process, broke tracking of link failures, as going straight to BOND_LINK_DOWN when a link is indeed down (cable pulled, switch rebooted) means we broke out of bond_miimon_inspect()'s BOND_LINK_DOWN case because !link_state was already true, we never incremented commit, and never got a chance to call bond_miimon_commit(), where slave->link_failure_count would be incremented. I believe the simple fix here is to mark the slave as BOND_LINK_FAIL, and let bond_miimon_inspect() transition the link from _FAIL to either _UP or _DOWN, and in the latter case, we now get proper incrementing of link_failure_count again. Fixes: 4d2c0cda0744 ("bonding: speed/duplex update at NETDEV_UP event") CC: Mahesh Bandewar <maheshb@google.com> CC: David S. Miller <davem@davemloft.net> CC: netdev@vger.kernel.org CC: stable@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-04	net: phy: realtek: fix RTL8201F sysfs name	Holger Hoffstätte	1	-1/+1
	Since 4.19 the following error in sysfs has appeared when using the r8169 NIC driver: $cd /sys/module/realtek/drivers $ls -l ls: cannot access 'mdio_bus:RTL8201F 10/100Mbps Ethernet': No such file or directory [..garbled dir entries follow..] Apparently the forward slash in "10/100Mbps Ethernet" is interpreted as directory separator that leads nowhere, and was introduced in commit 513588dd44b ("net: phy: realtek: add RTL8201F phy-id and functions"). Fix this by removing the offending slash in the driver name. Other drivers in net/phy seem to have the same problem, but I cannot test/verify them. Fixes: 513588dd44b ("net: phy: realtek: add RTL8201F phy-id and functions") Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-04	Linux 4.20-rc1	Linus Torvalds	1	-2/+2

2018-11-03	sctp: define SCTP_SS_DEFAULT for Stream schedulers	Xin Long	2	-1/+2
	According to rfc8260#section-4.3.2, SCTP_SS_DEFAULT is required to defined as SCTP_SS_FCFS or SCTP_SS_RR. SCTP_SS_FCFS is used for SCTP_SS_DEFAULT's value in this patch. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	sctp: fix strchange_flags name for Stream Change Event	Xin Long	1	-0/+2
	As defined in rfc6525#section-6.1.3, SCTP_STREAM_CHANGE_DENIED and SCTP_STREAM_CHANGE_FAILED should be used instead of SCTP_ASSOC_CHANGE_DENIED and SCTP_ASSOC_CHANGE_FAILED. To keep the compatibility, fix it by adding two macros. Fixes: b444153fb5a6 ("sctp: add support for generating add stream change event notification") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	mlxsw: spectrum: Fix IP2ME CPU policer configuration	Shalom Toledo	1	-1/+0
	The CPU policer used to police packets being trapped via a local route (IP2ME) was incorrectly configured to police based on bytes per second instead of packets per second. Change the policer to police based on packets per second and avoid packet loss under certain circumstances. Fixes: 9148e7cf73ce ("mlxsw: spectrum: Add policers for trap groups") Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELS	Arnd Bergmann	1	-1/+2
	When CONFIG_CC_OPTIMIZE_FOR_DEBUGGING is enabled, the compiler fails to optimize out a dead code path, which leads to a link failure: net/openvswitch/conntrack.o: In function `ovs_ct_set_labels': conntrack.c:(.text+0x2e60): undefined reference to `nf_connlabels_replace' In this configuration, we can take a shortcut, and completely remove the contrack label code. This may also help the regular optimization. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	qed: fix link config error handling	Arnd Bergmann	1	-2/+5
	gcc-8 notices that qed_mcp_get_transceiver_data() may fail to return a result to the caller: drivers/net/ethernet/qlogic/qed/qed_mcp.c: In function 'qed_mcp_trans_speed_mask': drivers/net/ethernet/qlogic/qed/qed_mcp.c:1955:2: error: 'transceiver_type' may be used uninitialized in this function [-Werror=maybe-uninitialized] When an error is returned by qed_mcp_get_transceiver_data(), we should propagate that to the caller of qed_mcp_trans_speed_mask() rather than continuing with uninitialized data. Fixes: c56a8be7e7aa ("qed: Add supported link and advertise link to display in ethtool.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-04	sched/topology: Fix off by one bug	Peter Zijlstra	1	-1/+1
	With the addition of the NUMA identity level, we increased @level by one and will run off the end of the array in the distance sort loop. Fixed: 051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-03	net: hns3: Fix for out-of-bounds access when setting pfc back pressure	Yunsheng Lin	1	-6/+4
	The vport should be initialized to hdev->vport for each bp group, otherwise it will cause out-of-bounds access and bp setting not correct problem. [ 35.254124] BUG: KASAN: slab-out-of-bounds in hclge_pause_setup_hw+0x2a0/0x3f8 [hclge] [ 35.254126] Read of size 2 at addr ffff803b6651581a by task kworker/0:1/14 [ 35.254132] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted 4.19.0-rc7-hulk+ #85 [ 35.254133] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI RC0 - B052 (V0.52) 09/14/2018 [ 35.254141] Workqueue: events work_for_cpu_fn [ 35.254144] Call trace: [ 35.254147] dump_backtrace+0x0/0x2f0 [ 35.254149] show_stack+0x24/0x30 [ 35.254154] dump_stack+0x110/0x184 [ 35.254157] print_address_description+0x168/0x2b0 [ 35.254160] kasan_report+0x184/0x310 [ 35.254162] __asan_load2+0x7c/0xa0 [ 35.254170] hclge_pause_setup_hw+0x2a0/0x3f8 [hclge] [ 35.254177] hclge_tm_init_hw+0x794/0x9f0 [hclge] [ 35.254184] hclge_tm_schd_init+0x48/0x58 [hclge] [ 35.254191] hclge_init_ae_dev+0x778/0x1168 [hclge] [ 35.254196] hnae3_register_ae_dev+0x14c/0x298 [hnae3] [ 35.254206] hns3_probe+0x88/0xa8 [hns3] [ 35.254210] local_pci_probe+0x7c/0xf0 [ 35.254212] work_for_cpu_fn+0x34/0x50 [ 35.254214] process_one_work+0x4d4/0xa38 [ 35.254216] worker_thread+0x55c/0x8d8 [ 35.254219] kthread+0x1b0/0x1b8 [ 35.254222] ret_from_fork+0x10/0x1c [ 35.254224] The buggy address belongs to the page: [ 35.254228] page:ffff7e00ed994400 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 35.273835] flags: 0xfffff8000008000(head) [ 35.282007] raw: 0fffff8000008000 dead000000000100 dead000000000200 0000000000000000 [ 35.282010] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 [ 35.282012] page dumped because: kasan: bad access detected [ 35.282014] Memory state around the buggy address: [ 35.282017] ffff803b66515700: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe [ 35.282019] ffff803b66515780: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe [ 35.282021] >ffff803b66515800: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe [ 35.282022] ^ [ 35.282024] ffff803b66515880: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe [ 35.282026] ffff803b66515900: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe [ 35.282028] ================================================================== [ 35.282029] Disabling lock debugging due to kernel taint [ 35.282747] hclge driver initialization finished. Fixes: 67bf2541f4b9 ("net: hns3: Fixes the back pressure setting when sriov is enabled") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	net/mlx4_en: use __netdev_tx_sent_queue()	Eric Dumazet	1	-2/+4
	doorbell only depends on xmit_more and netif_tx_queue_stopped() Using __netdev_tx_sent_queue() avoids messing with BQL stop flag, and is more generic. This patch increases performance on GSO workload by keeping doorbells to the minimum required. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	net: do not abort bulk send on BQL status	Eric Dumazet	1	-1/+1
	Before calling dev_hard_start_xmit(), upper layers tried to cook optimal skb list based on BQL budget. Problem is that GSO packets can end up comsuming more than the BQL budget. Breaking the loop is not useful, since requeued packets are ahead of any packets still in the qdisc. It is also more expensive, since next TX completion will push these packets later, while skbs are not in cpu caches. It is also a behavior difference with TSO packets, that can break the BQL limit by a large amount. Note that drivers should use __netdev_tx_sent_queue() in order to have optimal xmit_more support, and avoid useless atomic operations as shown in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	net: bql: add __netdev_tx_sent_queue()	Eric Dumazet	1	-0/+20
	When qdisc_run() tries to use BQL budget to bulk-dequeue a batch of packets, GSO can later transform this list in another list of skbs, and each skb is sent to device ndo_start_xmit(), one at a time, with skb->xmit_more being set to one but for last skb. Problem is that very often, BQL limit is hit in the middle of the packet train, forcing dev_hard_start_xmit() to stop the bulk send and requeue the end of the list. BQL role is to avoid head of line blocking, making sure a qdisc can deliver high priority packets before low priority ones. But there is no way requeued packets can be bypassed by fresh packets in the qdisc. Aborting the bulk send increases TX softirqs, and hot cache lines (after skb_segment()) are wasted. Note that for TSO packets, we never split a packet in the middle because of BQL limit being hit. Drivers should be able to update BQL counters without flipping/caring about BQL status, if the current skb has xmit_more set. Upper layers are ultimately responsible to stop sending another packet train when BQL limit is hit. Code template in a driver might look like the following : send_doorbell = __netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more); Note that __netdev_tx_sent_queue() use is not mandatory, since following patch will change dev_hard_start_xmit() to not care about BQL status. But it is highly recommended so that xmit_more full benefits can be reached (less doorbells sent, and less atomic operations as well) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	s390/qeth: report 25Gbit link speed	Julian Wiedmann	2	-2/+20
	This adds the various identifiers for 25Gbit cards, and wires them up into sysfs and ethtool. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	s390/qeth: sanitize ARP requests	Julian Wiedmann	4	-79/+34
	The ARP_{ADD,REMOVE}_ENTRY cmd structs contain reserved fields. Introduce a common helper that doesn't raw-copy the user-provided data into the cmd, but only sets those fields that are strictly needed for the command. This also sets the correct command length for ARP_REMOVE_ENTRY. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	s390/qeth: fix initial operstate	Julian Wiedmann	4	-10/+25
	Setting the carrier 'on' for an unregistered netdevice doesn't update its operstate. Fix this by delaying the update until the netdevice has been registered. Fixes: 91cc98f51e3d ("s390/qeth: remove duplicated carrier state tracking") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	s390/qeth: unregister netdevice only when registered	Julian Wiedmann	3	-4/+11
	qeth only registers its netdevice when the qeth device is first set online. Thus a device that has never been set online will trigger a WARN ("network todo 'hsi%d' but state 0") in unregister_netdev() when removed. Fix this by protecting the unregister step, just like we already protect against repeated registering of the netdevice. Fixes: d3d1b205e89f ("s390/qeth: allocate netdevice early") Reported-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	s390/qeth: fix HiperSockets sniffer	Julian Wiedmann	1	-3/+5
	Sniffing mode for L3 HiperSockets requires that no IP addresses are registered with the HW. The preferred way to achieve this is for userspace to delete all the IPs on the interface. But qeth is expected to also tolerate a configuration where that is not the case, by skipping the IP registration when in sniffer mode. Since commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") reworked the IP registration logic in the L3 subdriver, this no longer works. When the qeth device is set online, qeth_l3_recover_ip() now unconditionally registers all unicast addresses from our internal IP table. While we could fix this particular problem by skipping qeth_l3_recover_ip() on a sniffer device, the more future-proof change is to skip the IP address registration at the lowest level. This way we a) catch any future code path that attempts to register an IP address without considering the sniffer scenario, and b) continue to build up our internal IP table, so that if sniffer mode is switched off later we can operate just like normal. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	s390/qeth: sanitize strings in debug messages	Julian Wiedmann	4	-151/+119
	As Documentation/s390/s390dbf.txt states quite clearly, using any pointer in sprinf-formatted s390dbf debug entries is dangerous. The pointers are dereferenced whenever the trace file is read from. So if the referenced data has a shorter life-time than the trace file, any read operation can result in a use-after-free. So rip out all hazardous use of indirect data, and replace any usage of dev_name() and such by the Bus ID number. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03	memory_hotplug: cond_resched in __remove_pages	Michal Hocko	1	-0/+1
	We have received a bug report that unbinding a large pmem (>1TB) can result in a soft lockup: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [ndctl:4365] [...] Supported: Yes CPU: 9 PID: 4365 Comm: ndctl Not tainted 4.12.14-94.40-default #1 SLE12-SP4 Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018 task: ffff9cce7d4410c0 task.stack: ffffbe9eb1bc4000 RIP: 0010:__put_page+0x62/0x80 Call Trace: devm_memremap_pages_release+0x152/0x260 release_nodes+0x18d/0x1d0 device_release_driver_internal+0x160/0x210 unbind_store+0xb3/0xe0 kernfs_fop_write+0x102/0x180 __vfs_write+0x26/0x150 vfs_write+0xad/0x1a0 SyS_write+0x42/0x90 do_syscall_64+0x74/0x150 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fd13166b3d0 It has been reported on an older (4.12) kernel but the current upstream code doesn't cond_resched in the hot remove code at all and the given range to remove might be really large. Fix the issue by calling cond_resched once per memory section. Link: http://lkml.kernel.org/r/20181031125840.23982-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Dan Williams <dan.j.williams@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03	bfs: add sanity check at bfs_fill_super()	Tetsuo Handa	1	-3/+6
	syzbot is reporting too large memory allocation at bfs_fill_super() [1]. Since file system image is corrupted such that bfs_sb->s_start == 0, bfs_fill_super() is trying to allocate 8MB of continuous memory. Fix this by adding a sanity check on bfs_sb->s_start, __GFP_NOWARN and printf(). [1] https://syzkaller.appspot.com/bug?id=16a87c236b951351374a84c8a32f40edbc034e96 Link: http://lkml.kernel.org/r/1525862104-3407-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+71c6b5d68e91149fc8a4@syzkaller.appspotmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tigran Aivazian <aivazian.tigran@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03	kernel/sysctl.c: remove duplicated include	Michael Schupikov	1	-1/+0
	Remove one include of <linux/pipe_fs_i.h>. No functional changes. Link: http://lkml.kernel.org/r/20181004134223.17735-1-michael@schupikov.de Signed-off-by: Michael Schupikov <michael@schupikov.de> Reviewed-by: Richard Weinberger <richard@nod.at> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03	kernel/kexec_file.c: remove some duplicated includes	zhong jiang	1	-2/+0
	We include kexec.h and slab.h twice in kexec_file.c. It's unnecessary. hence just remove them. Link: http://lkml.kernel.org/r/1537498098-19171-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>