linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-02-09	platform/x86: mlx-platform: Add support for new 200G IB and Ethernet systems	Vadim Pasternak	1	-0/+142
	It adds support for new Mellanox system types of basic classes qmb7, sn34, sn37, containing systems QMB700 (40x200GbE InfiniBand switch), SN3700 (32x200GbE and 16x400GbE Ethernet switch) and SN3410 (6x400GbE plus 48x50GbE Ethernet switch). These are the Top of the Rack systems, equipped with Mellanox COM-Express carrier board and switch board with Mellanox Quantum device, which supports InfiniBand switching with 40X200G ports and line rate of up to HDR speed or with Mellanox Spectrum-2 device, which supports Ethernet switching with 32X200G ports line rate of up to HDR speed. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-09	platform/x86: mlx-platform: Add support for new msn201x system type	Vadim Pasternak	1	-0/+59
	It adds support for new Mellanox system types of basic half unit size class msn201x, containing system MSN2010 (18x10GbE plus 4x4x25GbE) half and its derivatives. This is the Top of the Rack system, equipped with Mellanox Small Form Factor carrier board and switch board with Mellanox Spectrum device, which supports Ethernet switching with 32X100G ports line rate of up to EDR speed. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-09	platform/x86: mlx-platform: Add support for new msn274x system type	Vadim Pasternak	1	-0/+124
	It adds support for new Mellanox system types of basic class msn274x, containing system MSN2740 (32x100GbE Ethernet switch with cost reduction) and its derivatives. These are the Top of the Rack system, equipped with Mellanox Small Form Factor carrier board and switch board with Mellanox Spectrum device, which supports Ethernet switching with 32X100G ports line rate of up to EDR speed. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-09	ibmvnic: Remove skb->protocol checks in ibmvnic_xmit	John Allen	1	-4/+1
	Having these checks in ibmvnic_xmit causes problems with VLAN tagging and balance-alb/tlb bonding modes. The restriction they imposed can be removed. Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	bpf: fix rlimit in reuseport net selftest	Daniel Borkmann	1	-1/+20
	Fix two issues in the reuseport_bpf selftests that were reported by Linaro CI: [...] + ./reuseport_bpf ---- IPv4 UDP ---- Testing EBPF mod 10... Reprograming, testing mod 5... ./reuseport_bpf: ebpf error. log: 0: (bf) r6 = r1 1: (20) r0 = (u32 )skb[0] 2: (97) r0 %= 10 3: (95) exit processed 4 insns : Operation not permitted + echo FAIL [...] ---- IPv4 TCP ---- Testing EBPF mod 10... ./reuseport_bpf: failed to bind send socket: Address already in use + echo FAIL [...] For the former adjust rlimit since this was the cause of failure for loading the BPF prog, and for the latter add SO_REUSEADDR. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://bugs.linaro.org/show_bug.cgi?id=3502 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	sctp: verify size of a new chunk in _sctp_make_chunk()	Alexey Kodanev	1	-1/+6
	When SCTP makes INIT or INIT_ACK packet the total chunk length can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when transmitting these packets, e.g. the crash on sending INIT_ACK: [ 597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168 put:120156 head:000000007aa47635 data:00000000d991c2de tail:0x1d640 end:0xfec0 dev:<NULL> ... [ 597.976970] ------------[ cut here ]------------ [ 598.033408] kernel BUG at net/core/skbuff.c:104! [ 600.314841] Call Trace: [ 600.345829] <IRQ> [ 600.371639] ? sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.436934] skb_put+0x16c/0x200 [ 600.477295] sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.540630] ? sctp_packet_config+0x890/0x890 [sctp] [ 600.601781] ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp] [ 600.671356] ? sctp_cmp_addr_exact+0x3f/0x90 [sctp] [ 600.731482] sctp_outq_flush+0x663/0x30d0 [sctp] [ 600.788565] ? sctp_make_init+0xbf0/0xbf0 [sctp] [ 600.845555] ? sctp_check_transmitted+0x18f0/0x18f0 [sctp] [ 600.912945] ? sctp_outq_tail+0x631/0x9d0 [sctp] [ 600.969936] sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp] [ 601.041593] ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp] [ 601.104837] ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp] [ 601.175436] ? sctp_eat_data+0x1710/0x1710 [sctp] [ 601.233575] sctp_do_sm+0x182/0x560 [sctp] [ 601.284328] ? sctp_has_association+0x70/0x70 [sctp] [ 601.345586] ? sctp_rcv+0xef4/0x32f0 [sctp] [ 601.397478] ? sctp6_rcv+0xa/0x20 [sctp] ... Here the chunk size for INIT_ACK packet becomes too big, mostly because of the state cookie (INIT packet has large size with many address parameters), plus additional server parameters. Later this chunk causes the panic in skb_put_data(): skb_packet_transmit() sctp_packet_pack() skb_put_data(nskb, chunk->skb->data, chunk->skb->len); 'nskb' (head skb) was previously allocated with packet->size from u16 'chunk->chunk_hdr->length'. As suggested by Marcelo we should check the chunk's length in _sctp_make_chunk() before trying to allocate skb for it and discard a chunk if its size bigger than SCTP_MAX_CHUNK_LEN. Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	s390/qeth: fix SETIP command handling	Julian Wiedmann	2	-6/+13
	send_control_data() applies some special handling to SETIP v4 IPA commands. But current code parses all command types for the SETIP command code. Limit the command code check to IPA commands. Fixes: 5b54e16f1a54 ("qeth: do not spin for SETIP ip assist command") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	s390/qeth: fix underestimated count of buffer elements	Ursula Braun	1	-1/+1
	For a memory range/skb where the last byte falls onto a page boundary (ie. 'end' is of the form xxx...xxx001), the PFN_UP() part of the calculation currently doesn't round up to the next PFN due to an off-by-one error. Thus qeth believes that the skb occupies one page less than it actually does, and may select a IO buffer that doesn't have enough spare buffer elements to fit all of the skb's data. HW detects this as a malformed buffer descriptor, and raises an exception which then triggers device recovery. Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	ptr_ring: try vmalloc() when kmalloc() fails	Jason Wang	1	-5/+8
	This patch switch to use kvmalloc_array() for using a vmalloc() fallback to help in case kmalloc() fails. Reported-by: syzbot+e4d4f9ddd4295539735d@syzkaller.appspotmail.com Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZE	Jason Wang	1	-0/+2
	To avoid slab to warn about exceeded size, fail early if queue occupies more than KMALLOC_MAX_SIZE. Reported-by: syzbot+e4d4f9ddd4295539735d@syzkaller.appspotmail.com Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	net: stmmac: remove redundant enable of PMT irq	Niklas Cassel	2	-4/+1
	For dwmac4, GMAC_INT_DEFAULT_ENABLE already includes GMAC_INT_PMT_EN, so it is redundant to check if hw->pmt is set, and if so, setting the bit again. For dwmac1000, GMAC_INT_DEFAULT_MASK does not include GMAC_INT_DISABLE_PMT, so it is redundant to check if hw->pmt is set, and if so, clearing an already cleared bit. Improve code readability by removing this redundant code. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	net: stmmac: rename GMAC_INT_DEFAULT_MASK for dwmac4	Niklas Cassel	2	-3/+3
	GMAC_INT_DEFAULT_MASK is written to the interrupt enable register. In previous versions of the IP (e.g. dwmac1000), this register was instead an interrupt mask register. To improve clarity and reflect reality, rename GMAC_INT_DEFAULT_MASK to GMAC_INT_DEFAULT_ENABLE. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	net: stmmac: discard disabled flags in interrupt status register	Niklas Cassel	1	-2/+4
	The interrupt status register in both dwmac1000 and dwmac4 ignores interrupt enable (for dwmac4) / interrupt mask (for dwmac1000). Therefore, if we want to check only the bits that can actually trigger an irq, we have to filter the interrupt status register manually. Commit 0a764db10337 ("stmmac: Discard masked flags in interrupt status register") fixed this for dwmac1000. Fix the same issue for dwmac4. Just like commit 0a764db10337 ("stmmac: Discard masked flags in interrupt status register"), this makes sure that we do not get spurious link up/link down prints. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	ibmvnic: Reset long term map ID counter	Thomas Falcon	1	-0/+1
	When allocating RX or TX buffer pools, the driver needs to provide a unique mapping ID to firmware for each pool. This value is assigned using a counter which is incremented after a new pool is created. The ID can be an integer ranging from 1-255. When migrating to a device that requests a different number of queues, this value was not being reset properly. As a result, after enough migrations, the counter exceeded the upper bound and pool creation failed. This is fixed by resetting the counter to one in this case. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context	Trond Myklebust	1	-2/+2
	Calling __UDPX_INC_STATS() from a preemptible context leads to a warning of the form: BUG: using __this_cpu_add() in preemptible [00000000] code: kworker/u5:0/31 caller is xs_udp_data_receive_workfn+0x194/0x270 CPU: 1 PID: 31 Comm: kworker/u5:0 Not tainted 4.15.0-rc8-00076-g90ea9f1 #2 Workqueue: xprtiod xs_udp_data_receive_workfn Call Trace: dump_stack+0x85/0xc1 check_preemption_disabled+0xce/0xe0 xs_udp_data_receive_workfn+0x194/0x270 process_one_work+0x318/0x620 worker_thread+0x20a/0x390 ? process_one_work+0x620/0x620 kthread+0x120/0x130 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x24/0x30 Since we're taking a spinlock in those functions anyway, let's fix the issue by moving the call so that it occurs under the spinlock. Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-09	video: omapfb: fix missing #includes	Tomi Valkeinen	4	-0/+6
	The omapfb driver fails to build after commit 23c35f48f5fb ("pinctrl: remove include file from <linux/device.h>") because it relies on the <linux/pinctrl/consumer.h> and <linux/seq_file.h> being pulled in by the <linux/device.h> header implicitly. Include these headers explicitly to avoid the build failures. Fixes: 23c35f48f5fb ("pinctrl: remove include file from <linux/device.h>") Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Tested-by: Tony Lindgren <tony@atomide.com> [b.zolnierkie: fix include order and patch description] Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-02-09	KVM: PPC: Book3S: Add MMIO emulation for VMX instructions	Jose Ricardo Ziviani	5	-0/+198
	This patch provides the MMIO load/store vector indexed X-Form emulation. Instructions implemented: lvx: the quadword in storage addressed by the result of EA & 0xffff_ffff_ffff_fff0 is loaded into VRT. stvx: the contents of VRS are stored into the quadword in storage addressed by the result of EA & 0xffff_ffff_ffff_fff0. Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-09	KVM: PPC: Book3S HV: Branch inside feature section	Alexander Graf	1	-1/+2
	We ended up with code that did a conditional branch inside a feature section to code outside of the feature section. Depending on how the object file gets organized, that might mean we exceed the 14bit relocation limit for conditional branches: arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4 So instead of doing a conditional branch outside of the feature section, let's just jump at the end of the same, making the branch very short. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-09	KVM: PPC: Book3S HV: Make HPT resizing work on POWER9	David Gibson	2	-9/+24
	This adds code to enable the HPT resizing code to work on POWER9, which uses a slightly modified HPT entry format compared to POWER8. On POWER9, we convert HPTEs read from the HPT from the new format to the old format so that the rest of the HPT resizing code can work as before. HPTEs written to the new HPT are converted to the new format as the last step before writing them into the new HPT. This takes out the checks added by commit bcd3bb63dbc8 ("KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18), now that HPT resizing works on POWER9. On POWER9, when we pivot to the new HPT, we now call kvmppc_setup_partition_table() to update the partition table in order to make the hardware use the new HPT. [paulus@ozlabs.org - added kvmppc_setup_partition_table() call, wrote commit message.] Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-09	KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code	Paul Mackerras	1	-6/+2
	This fixes the computation of the HPTE index to use when the HPT resizing code encounters a bolted HPTE which is stored in its secondary HPTE group. The code inverts the HPTE group number, which is correct, but doesn't then mask it with new_hash_mask. As a result, new_pteg will be effectively negative, resulting in new_hptep pointing before the new HPT, which will corrupt memory. In addition, this removes two BUG_ON statements. The condition that the BUG_ONs were testing -- that we have computed the hash value incorrectly -- has never been observed in testing, and if it did occur, would only affect the guest, not the host. Given that BUG_ON should only be used in conditions where the kernel (i.e. the host kernel, in this case) can't possibly continue execution, it is not appropriate here. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-08	platform/x86: mlx-platform: Fix power cable setting for msn21xx family	Vadim Pasternak	1	-2/+21
	Add dedicated structure with power cable setting for Mellanox msn21xx family. These systems do not have a physical device for the power unit controller. When the power cable is inserted or removed, the relevant interrupt signal is handled, the status is updated, but no device is associated with the signal. Add definition for interrupt low aggregation signal. On system from msn21xx family, low aggregation mask should be removed in order to allow signal to hit CPU. Fixes: 6613d18e9038 ("platform/x86: mlx-platform: Move module from arch/x86") Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-08	platform/x86: mlx-platform: Add define for the negative bus	Vadim Pasternak	1	-0/+1
	Add define for the negative bus ID in order to use it in case no hotplug device is associated with the hotplug interrupt signal. In this case, the signal will be handled by the mlxreg-hotplug driver, but no device will be associated with the signal. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-08	platform/x86: mlx-platform: Use defines for bus assignment	Vadim Pasternak	1	-8/+15
	Add defines for the bus IDs, used for hotplug device topology to improve code readability. Defines added for FAN and power units. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-08	platform/mellanox: mlxreg-hotplug: Fix uninitialized variable	Geert Uytterhoeven	1	-1/+1
	With gcc-4.1.2: drivers/platform/mellanox/mlxreg-hotplug.c: In function ‘mlxreg_hotplug_health_work_helper’: drivers/platform/mellanox/mlxreg-hotplug.c:347: warning: ‘ret’ is used uninitialized in this function Indeed, if mlxreg_core_item.count is zero, ret is used uninitialized. While this is unlikely to happen (it is set to ARRAY_SIZE(...) in x86 board files), this is done in another source file, so fix this by preinitializing ret to zero. Fixes: c6acad68eb2dbffd ("platform/mellanox: mlxreg-hotplug: Modify to use a regmap interface") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-09	tools/libbpf: handle issues with bpf ELF objects containing .eh_frames	Jesper Dangaard Brouer	1	-0/+26
	V3: More generic skipping of relo-section (suggested by Daniel) If clang >= 4.0.1 is missing the option '-target bpf', it will cause llc/llvm to create two ELF sections for "Exception Frames", with section names '.eh_frame' and '.rel.eh_frame'. The BPF ELF loader library libbpf fails when loading files with these sections. The other in-kernel BPF ELF loader in samples/bpf/bpf_load.c, handle this gracefully. And iproute2 loader also seems to work with these "eh" sections. The issue in libbpf is caused by bpf_object__elf_collect() skipping some sections, and later when performing relocation it will be pointing to a skipped section, as these sections cannot be found by bpf_object__find_prog_by_idx() in bpf_object__collect_reloc(). This is a general issue that also occurs for other sections, like debug sections which are also skipped and can have relo section. As suggested by Daniel. To avoid keeping state about all skipped sections, instead perform a direct qlookup in the ELF object. Lookup the section that the relo-section points to and check if it contains executable machine instructions (denoted by the sh_flags SHF_EXECINSTR). Use this check to also skip irrelevant relo-sections. Note, for samples/bpf/ the '-target bpf' parameter to clang cannot be used due to incompatibility with asm embedded headers, that some of the samples include. This is explained in more details by Yonghong Song in bpf_devel_QA. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-09	selftests/bpf: add selftest that use test_libbpf_open	Jesper Dangaard Brouer	2	-2/+61
	This script test_libbpf.sh will be part of the 'make run_tests' invocation, but can also be invoked manually in this directory, and a verbose mode can be enabled via setting the environment variable $VERBOSE like: $ VERBOSE=yes ./test_libbpf.sh The script contains some tests that are commented out, as they currently fail. They are reminders about what we need to improve for the libbpf loader library. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-09	selftests/bpf: add test program for loading BPF ELF files	Jesper Dangaard Brouer	2	-1/+151
	V2: Moved program into selftests/bpf from tools/libbpf This program can be used on its own for testing/debugging if a BPF ELF-object file can be loaded with libbpf (from tools/lib/bpf). If something is wrong with the ELF object, the program have a --debug mode that will display the ELF sections and especially the skipped sections. This allows for quickly identifying the problematic ELF section number, which can be corrolated with the readelf tool. The program signal error via return codes, and also have a --quiet mode, which is practical for use in scripts like selftests/bpf. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-09	tools/libbpf: improve the pr_debug statements to contain section numbers	Jesper Dangaard Brouer	1	-12/+13
	While debugging a bpf ELF loading issue, I needed to correlate the ELF section number with the failed relocation section reference. Thus, add section numbers/index to the pr_debug. In debug mode, also print section that were skipped. This helped me identify that a section (.eh_frame) was skipped, and this was the reason the relocation section (.rel.eh_frame) could not find that section number. The section numbers corresponds to the readelf tools Section Headers [Nr]. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-09	bpf: Sync kernel ABI header with tooling header for bpf_common.h	Jesper Dangaard Brouer	1	-3/+4
	I recently fixed up a lot of commits that forgot to keep the tooling headers in sync. And then I forgot to do the same thing in commit cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes"). Let correct that before people notice ;-). Lawrence did partly fix/sync this for bpf.h in commit d6d4f60c3a09 ("bpf: add selftest for tcpbpf"). Fixes: cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-08	fix parallelism for rpc tasks	Olga Kornievskaia	1	-1/+1
	Hi folks, On a multi-core machine, is it expected that we can have parallel RPCs handled by each of the per-core workqueue? In testing a read workload, observing via "top" command that a single "kworker" thread is running servicing the requests (no parallelism). It's more prominent while doing these operations over krb5p mount. What has been suggested by Bruce is to try this and in my testing I see then the read workload spread among all the kworker threads. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-08	net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT	Heiner Kallweit	1	-1/+1
	This condition wasn't adjusted when PHY_IGNORE_INTERRUPT (-2) was added long ago. In case of PHY_IGNORE_INTERRUPT the MAC interrupt indicates also PHY state changes and we should do what the symbol says. Fixes: 84a527a41f38 ("net: phylib: fix interrupts re-enablement in phy_start") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	net: thunder: change q_len's type to handle max ring size	Dean Nelson	1	-1/+1
	The Cavium thunder nicvf driver supports rx/tx rings of up to 65536 entries per. The number of entires are stored in the q_len member of struct q_desc_mem. The problem is that q_len being a u16, results in 65536 becoming 0. In getting pointers to descriptors in the rings, the driver uses q_len minus 1 as a mask after incrementing the pointer, in order to go back to the beginning and not go past the end of the ring. With the q_len set to 0 the mask is no longer correct and the driver does go beyond the end of the ring, causing various ills. Usually the first thing that shows up is a "NETDEV WATCHDOG: enP2p1s0f1 (nicvf): transmit queue 7 timed out" warning. This patch remedies the problem by changing q_len to a u32. Signed-off-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	tipc: fix skb truesize/datasize ratio control	Hoang Le	1	-2/+2
	In commit d618d09a68e4 ("tipc: enforce valid ratio between skb truesize and contents") we introduced a test for ensuring that the condition truesize/datasize <= 4 is true for a received buffer. Unfortunately this test has two problems. - Because of the integer arithmetics the test if (skb->truesize / buf_roundup_len(skb) > 4) will miss all ratios [4 < ratio < 5], which was not the intention. - The buffer returned by skb_copy() inherits skb->truesize of the original buffer, which doesn't help the situation at all. In this commit, we change the ratio condition and replace skb_copy() with a call to skb_copy_expand() to finally get this right. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	net/sched: cls_u32: fix cls_u32 on filter replace	Ivan Vecera	1	-1/+2
	The following sequence is currently broken: # tc qdisc add dev foo ingress # tc filter replace dev foo protocol all ingress \ u32 match u8 0 0 action mirred egress mirror dev bar1 # tc filter replace dev foo protocol all ingress \ handle 800::800 pref 49152 \ u32 match u8 0 0 action mirred egress mirror dev bar2 Error: cls_u32: Key node flags do not match passed flags. We have an error talking to the kernel, -1 The error comes from u32_change() when comparing new and existing flags. The existing ones always contains one of TCA_CLS_FLAGS_{,NOT}_IN_HW flag depending on offloading state. These flags cannot be passed from userspace so the condition (n->flags != flags) in u32_change() always fails. Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and TCA_CLS_FLAGS_IN_HW are not taken into account. Fixes: 24d3dc6d27ea ("net/sched: cls_u32: Reflect HW offload status") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	mpls, nospec: Sanitize array index in mpls_label_ok()	Dan Williams	1	-10/+14
	mpls_label_ok() validates that the 'platform_label' array index from a userspace netlink message payload is valid. Under speculation the mpls_label_ok() result may not resolve in the CPU pipeline until after the index is used to access an array element. Sanitize the index to zero to prevent userspace-controlled arbitrary out-of-bounds speculation, a precursor for a speculative execution side channel vulnerability. Cc: <stable@vger.kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management	Sowmini Varadhan	11	-30/+76
	An rds_connection can get added during netns deletion between lines 528 and 529 of 506 static void rds_tcp_kill_sock(struct net net) : / code to pull out all the rds_connections that should be destroyed */ : 528 spin_unlock_irq(&rds_tcp_conn_lock); 529 list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) 530 rds_conn_destroy(tc->t_cpath->cp_conn); Such an rds_connection would miss out the rds_conn_destroy() loop (that cancels all pending work) and (if it was scheduled after netns deletion) could trigger the use-after-free. A similar race-window exists for the module unload path in rds_tcp_exit -> rds_tcp_destroy_conns Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled by checking check_net() before enqueuing new work or adding new connections. Concurrency with module-unload is handled by maintaining a module specific flag that is set at the start of the module exit function, and must be checked before enqueuing new work or adding new connections. This commit refactors existing RDS_DESTROY_PENDING checks added by commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown") and consolidates all the concurrency checks listed above into the function rds_destroy_pending(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	net: Whitelist the skbuff_head_cache "cb" field	Kees Cook	1	-1/+3
	Most callers of put_cmsg() use a "sizeof(foo)" for the length argument. Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a result of the cmsg header calculations. This means that hardened usercopy will examine the copy, even though it was technically a fixed size and should be implicitly whitelisted. All the put_cmsg() calls being built from values in skbuff_head_cache are coming out of the protocol-defined "cb" field, so whitelist this field entirely instead of creating per-use bounce buffers, for which there are concerns about performance. Original report was: Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLAB object 'skbuff_head_cache' (offset 64, size 16)! WARNING: CPU: 0 PID: 3663 at mm/usercopy.c:81 usercopy_warn+0xdb/0x100 mm/usercopy.c:76 ... __check_heap_object+0x89/0xc0 mm/slab.c:4426 check_heap_object mm/usercopy.c:236 [inline] __check_object_size+0x272/0x530 mm/usercopy.c:259 check_object_size include/linux/thread_info.h:112 [inline] check_copy_size include/linux/thread_info.h:143 [inline] copy_to_user include/linux/uaccess.h:154 [inline] put_cmsg+0x233/0x3f0 net/core/scm.c:242 sock_recv_errqueue+0x200/0x3e0 net/core/sock.c:2913 packet_recvmsg+0xb2e/0x17a0 net/packet/af_packet.c:3296 sock_recvmsg_nosec net/socket.c:803 [inline] sock_recvmsg+0xc9/0x110 net/socket.c:810 ___sys_recvmsg+0x2a4/0x640 net/socket.c:2179 __sys_recvmmsg+0x2a9/0xaf0 net/socket.c:2287 SYSC_recvmmsg net/socket.c:2368 [inline] SyS_recvmmsg+0xc4/0x160 net/socket.c:2352 entry_SYSCALL_64_fastpath+0x29/0xa0 Reported-by: syzbot+e2d6cfb305e9f3911dea@syzkaller.appspotmail.com Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	net: Extra '_get' in declaration of arch_get_platform_mac_address	Mathieu Malaterre	1	-1/+1
	In commit c7f5d105495a ("net: Add eth_platform_get_mac_address() helper."), two declarations were added: int eth_platform_get_mac_address(struct device dev, u8 mac_addr); unsigned char arch_get_platform_get_mac_address(void); An extra '_get' was introduced in arch_get_platform_get_mac_address, remove it. Fix compile warning using W=1: CC net/ethernet/eth.o net/ethernet/eth.c:523:24: warning: no previous prototype for ‘arch_get_platform_mac_address’ [-Wmissing-prototypes] unsigned char __weak arch_get_platform_mac_address(void) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ AR net/ethernet/built-in.o Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	ibmvnic: queue reset when CRQ gets closed during reset	Nathan Fontenot	1	-1/+5
	While handling a driver reset we get a H_CLOSED return trying to send a CRQ event. When this occurs we need to queue up another reset attempt. Without doing this we see instances where the driver is left in a closed state because the reset failed and there is no further attempts to reset the driver. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	atm: he: use 64-bit arithmetic instead of 32-bit	Gustavo A. R. Silva	1	-4/+4
	Add suffix ULL to constants 272, 204, 136 and 68 in order to give the compiler complete information about the proper arithmetic to use. Notice that these constants are used in contexts that expect expressions of type unsigned long long (64 bits, unsigned). The following expressions are currently being evaluated using 32-bit arithmetic: 272 * mult 204 * mult 136 * mult 68 * mult Addresses-Coverity-ID: 201058 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	cramfs: better MTD dependency expression	Nicolas Pitre	1	-2/+1
	Commit b9f5fb1800d8 ("cramfs: fix MTD dependency") did what it says. Since commit 9059a3493efe ("kconfig: fix relational operators for bool and tristate symbols") it is possible to do it slightly better though. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-08	rtnetlink: require unique netns identifier	Christian Brauner	1	-0/+48
	Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK it is possible for userspace to send us requests with three different properties to identify a target network namespace. This affects at least RTM_{NEW,SET}LINK. Each of them could potentially refer to a different network namespace which is confusing. For legacy reasons the kernel will pick the IFLA_NET_NS_PID property first and then look for the IFLA_NET_NS_FD property but there is no reason to extend this type of behavior to network namespace ids. The regression potential is quite minimal since the rtnetlink requests in question either won't allow IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	tuntap: add missing xdp flush	Jason Wang	1	-0/+15
	When using devmap to redirect packets between interfaces, xdp_do_flush() is usually a must to flush any batched packets. Unfortunately this is missed in current tuntap implementation. Unlike most hardware driver which did XDP inside NAPI loop and call xdp_do_flush() at then end of each round of poll. TAP did it in the context of process e.g tun_get_user(). So fix this by count the pending redirected packets and flush when it exceeds NAPI_POLL_WEIGHT or MSG_MORE was cleared by sendmsg() caller. With this fix, xdp_redirect_map works again between two TAPs. Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	kconfig: send error messages to stderr	Masahiro Yamada	4	-19/+24
	These messages should be directed to stderr. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
2018-02-09	kconfig: echo stdin to stdout if either is redirected	Masahiro Yamada	1	-3/+4
	If stdio is not tty, conf_askvalue() puts additional new line to prevent prompts from being concatenated into a single line. This care is missing in conf_choice(), so a 'choice' prompt and the next prompt are shown in the same line. Move the code into xfgets() to cater to all cases. To improve this more, let's echo stdin to stdout. This clarifies what keys were input from stdio and the stdout looks like as if it were from tty. I removed the isatty(2) check since stderr is unrelated here. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
2018-02-09	kconfig: remove check_stdin()	Masahiro Yamada	1	-14/+0
	Except silentoldconfig, valid_stdin is 1, so check_stdin() is no-op. oldconfig and silentoldconfig work almost in the same way except that the latter generates additional files under include/. Both ask users for input for new symbols. I do not know why only silentoldconfig requires stdio be tty. $ rm -f .config; touch .config $ yes "" \| make oldconfig > stdout $ rm -f .config; touch .config $ yes "" \| make silentoldconfig > stdout make[1]: * [silentoldconfig] Error 1 make: * [silentoldconfig] Error 2 $ tail -n 4 stdout Console input/output is redirected. Run 'make oldconfig' to update configuration. scripts/kconfig/Makefile:40: recipe for target 'silentoldconfig' failed Makefile:507: recipe for target 'silentoldconfig' failed Redirection is useful, for example, for testing where we want to give particular key inputs from a test file, then check the result. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
2018-02-09	kconfig: remove 'config*' pattern from .gitignnore	Masahiro Yamada	1	-1/+0
	I could not figure out why this pattern should be ignored. Checking commit 1e65174a3378 ("Add some basic .gitignore files") did not help. Let's remove this pattern, then see if it is really needed. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
2018-02-09	kconfig: show '?' prompt even if no help text is available	Masahiro Yamada	1	-7/+2
	'make config', 'make oldconfig', etc. always receive '?' as a valid input and show useful information even if no help text is available. ------------------------>8------------------------ foo (FOO) [N/y] (NEW) ? There is no help available for this option. Symbol: FOO [=n] Type : bool Prompt: foo Defined at Kconfig:1 ------------------------>8------------------------ However, '?' is not shown in the prompt if its help text is missing. Let's show '?' all the time so that the prompt and the behavior match. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
2018-02-09	kconfig: do not write choice values when their dependency becomes n	Masahiro Yamada	1	-9/+7
	"# CONFIG_... is not set" for choice values are wrongly written into the .config file if they are once visible, then become invisible later. Test case --------- ---------------------------(Kconfig)---------------------------- config A bool "A" choice prompt "Choice ?" depends on A config CHOICE_B bool "Choice B" config CHOICE_C bool "Choice C" endchoice ---------------------------------------------------------------- ---------------------------(.config)---------------------------- CONFIG_A=y ---------------------------------------------------------------- With the Kconfig and .config above, $ make config scripts/kconfig/conf --oldaskconfig Kconfig * * Linux Kernel Configuration * A (A) [Y/n] n # # configuration written to .config # $ cat .config # # Automatically generated file; DO NOT EDIT. # Linux Kernel Configuration # # CONFIG_A is not set # CONFIG_CHOICE_B is not set # CONFIG_CHOICE_C is not set Here, # CONFIG_CHOICE_B is not set # CONFIG_CHOICE_C is not set should not be written into the .config file because their dependency "depends on A" is unmet. Currently, there is no code that clears SYMBOL_WRITE of choice values. Clear SYMBOL_WRITE for all symbols in sym_calc_value(), then set it again after calculating visibility. To simplify the logic, set the flag if they have non-n visibility, regardless of types, and regardless of whether they are choice values or not. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
2018-02-08	netlink: ensure to loop over all netns in genlmsg_multicast_allns()	Nicolas Dichtel	1	-2/+10
	Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the case when commit 134e63756d5f was pushed. However, there was no reason to stop the loop if a netns does not have listeners. Returns -ESRCH only if there was no listeners in all netns. To avoid having the same problem in the future, I didn't take the assumption that nlmsg_multicast() returns only 0 or -ESRCH. Fixes: 134e63756d5f ("genetlink: make netns aware") CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>