aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-03-01bpf: fix sanitation rewrite in case of non-pointersDaniel Borkmann1-1/+2
Marek reported that he saw an issue with the below snippet in that timing measurements where off when loaded as unpriv while results were reasonable when loaded as privileged: [...] uint64_t a = bpf_ktime_get_ns(); uint64_t b = bpf_ktime_get_ns(); uint64_t delta = b - a; if ((int64_t)delta > 0) { [...] Turns out there is a bug where a corner case is missing in the fix d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths"), namely fixup_bpf_calls() only checks whether aux has a non-zero alu_state, but it also needs to test for the case of BPF_ALU_NON_POINTER since in both occasions we need to skip the masking rewrite (as there is nothing to mask). Fixes: d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths") Reported-by: Marek Majkowski <marek@cloudflare.com> Reported-by: Arthur Fabre <afabre@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/ Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-01Merge branch 'doc-net-ieee802154-move-from-plain-text-to-rst'David S. Miller2-95/+99
Stefan Schmidt says: ==================== doc: net: ieee802154: move from plain text to rst The ieee802154 subsystem doc was still in plain text. With the networking book taking shape I thought it was time to do the first step and move it over to rst. This really is only the minimal conversion. I need to take some time to update and extend the docs. The patches are based on net-next, but they only touch the networking book so I would not expect and trouble. From what I have seen they would go through Jonathan's tree after being acked by Dave? If you want this patches against a different tree let me know. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01doc: net: ieee802154: remove old plain text docs after switching to rstStefan Schmidt1-177/+0
The plain text docs are converted to rst now, which allows us to remove the old text file from the tree. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01doc: net: ieee802154: introduce IEEE 802.15.4 subsystem doc in rst styleStefan Schmidt2-0/+181
Moving the ieee802154 docs from a plain text file into the new rst style. This commit only does the minimal needed change to bring the documentation over. Follow up patches will improve and extend on this. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01devlink: fix kdocJakub Kicinski1-7/+5
devlink suffers from a few kdoc warnings: net/core/devlink.c:5292: warning: Function parameter or member 'dev' not described in 'devlink_register' net/core/devlink.c:5351: warning: Function parameter or member 'port_index' not described in 'devlink_port_register' net/core/devlink.c:5753: warning: Function parameter or member 'parent_resource_id' not described in 'devlink_resource_register' net/core/devlink.c:5753: warning: Function parameter or member 'size_params' not described in 'devlink_resource_register' net/core/devlink.c:5753: warning: Excess function parameter 'top_hierarchy' description in 'devlink_resource_register' net/core/devlink.c:5753: warning: Excess function parameter 'reload_required' description in 'devlink_resource_register' net/core/devlink.c:5753: warning: Excess function parameter 'parent_reosurce_id' description in 'devlink_resource_register' net/core/devlink.c:6451: warning: Function parameter or member 'region' not described in 'devlink_region_snapshot_create' net/core/devlink.c:6451: warning: Excess function parameter 'devlink_region' description in 'devlink_region_snapshot_create' Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01Merge branch 'net-aquantia-minor-bug-fixes-after-static-analysis'David S. Miller11-83/+197
Igor Russkikh says: ==================== net: aquantia: minor bug fixes after static analysis This patchset fixes minor errors and warnings found by smatch and kasan. Extra patch is to replace AQ_HW_WAIT_FOR with readx_poll_timeout to improve readability. V2: use readx_poll resubmitted to net-next since the changeset became quite big. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: aquantia: use better wrappers for state registersNikita Danilov2-5/+5
Replace some direct registers reads with better online functions. Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: aquantia: replace AQ_HW_WAIT_FOR with readx_poll_timeout_atomicNikita Danilov8-72/+184
David noticed the original define was hiding 'err' variable reference. Thats confusing and counterintuitive. Andrew noted the whole macro could be replaced with standard readx_poll kernel macro. This makes code more readable. Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: aquantia: fixed instack structure overflowIgor Russkikh2-4/+4
This is a real stack undercorruption found by kasan build. The issue did no harm normally because it only overflowed 2 bytes after `bitary` array which on most architectures were mapped into `err` local. Fixes: bab6de8fd180 ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.") Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: aquantia: fixed buffer overflowNikita Danilov1-0/+2
The overflow is detected by smatch: drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c: 175 aq_pci_func_free_irqs() error: buffer overflow 'self->aq_vec' 8 <= 31 In reality msix_entry_mask always restricts number of iterations. Adding extra condition to make logic clear and smatch happy. Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: aquantia: added newline at end of fileNikita Danilov1-1/+1
drivers/net/ethernet/aquantia/atlantic/aq_nic.c: 991:1: warning: no newline at end of file Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: aquantia: fixed memcpy sizeNikita Danilov1-1/+1
Not careful array dereference caused analysis tools to think there could be memory overflow. There was actually no corruption because the array is two dimensional. drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c: 140 aq_ethtool_get_strings() error: memcpy() '*aq_ethtool_stat_names' too small (32 vs 704) Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01ipv4: Add ICMPv6 support when parse route ipprotoHangbin Liu4-7/+17
For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers. But for ip -6 route, currently we only support tcp, udp and icmp. Add ICMPv6 support so we can match ipv6-icmp rules for route lookup. v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: eacb9384a3fe ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02MIPS: eBPF: Fix icache flush end addressPaul Burton1-1/+1
The MIPS eBPF JIT calls flush_icache_range() in order to ensure the icache observes the code that we just wrote. Unfortunately it gets the end address calculation wrong due to some bad pointer arithmetic. The struct jit_ctx target field is of type pointer to u32, and as such adding one to it will increment the address being pointed to by 4 bytes. Therefore in order to find the address of the end of the code we simply need to add the number of 4 byte instructions emitted, but we mistakenly add the number of instructions multiplied by 4. This results in the call to flush_icache_range() operating on a memory region 4x larger than intended, which is always wasteful and can cause crashes if we overrun into an unmapped page. Fix this by correcting the pointer arithmetic to remove the bogus multiplication, and use braces to remove the need for a set of brackets whilst also making it obvious that the target field is a pointer. Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.") Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: netdev@vger.kernel.org Cc: bpf@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01Merge branch 'nfp-control-processor-DMA-support-and-RJ45'David S. Miller3-46/+243
Jakub Kicinski says: ==================== nfp: control processor DMA support and RJ45 This series starts with adding support for reporting twisted pair media type in ethtool. Remaining patches add support for using DMA with the control/service processor. Currently we always copy the command data into card's memory. DMA support allows us to have the NSP read the data from host memory by itself. Unfortunately, the FW loading and flashing cannot directly map the buffers for DMA because (a) the firmware ABI returns const buffers, and (b) the buffers may be vmalloc()ed in many mysterious/unmappable way. So just bite the bullet - allocate new host buffer for the command and copy. As Dirk explains, the NSP now supports updating all FWs at once which means the max flashing time grew significantly. He bumps the max wait to avoid timeouts. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01nfp: nsp: set higher timeout for flash bundleDirk van der Merwe1-4/+1
The management firmware now supports being passed a bundle with multiple components to be stored in flash at once. This makes it easier to update all components to a known state with a single user command, however, this also has the potential to increase the time required to perform the update significantly. The management firmware only updates the components out of a bundle which are outdated, however, we need to make sure we can handle the absolute worst case where a CPLD update can take a long time to perform. We set a very conservative total timeout of 900s which already adds a contingency. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01nfp: nsp: allow the use of DMA bufferJakub Kicinski1-5/+191
Newer versions of NSP can access host memory. Simplest access type requires all data to be in one contiguous area. Since we don't have the guarantee on where callers of the NSP ABI will allocate their buffers we allocate a bounce buffer and copy the data in and out. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01nfp: nsp: move default buffer handling into its own functionJakub Kicinski1-42/+51
DMA version of NSP communication is coming, move the code which copies data into the NFP buffer into a separate function. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01nfp: nsp: use fractional size of the bufferJakub Kicinski1-6/+7
NSP expresses the buffer size in MB and 4 kB blocks. For small buffers the kB part may make a difference, so count it in. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01nfp: report RJ45 connector in ethtoolJakub Kicinski2-0/+4
Add support for reporting twisted pair port type. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01lan743x: Fix TX Stall IssueBryan Whitehead1-4/+12
It has been observed that tx queue stalls while downloading from certain web sites (example www.speedtest.net) The cause has been tracked down to a corner case where dma descriptors where not setup properly. And there for a tx completion interrupt was not signaled. This fix corrects the problem by properly marking the end of a multi descriptor transmission. Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: phy: phylink: fix uninitialized variable in phylink_get_mac_stateHeiner Kallweit1-0/+4
When debugging an issue I found implausible values in state->pause. Reason in that state->pause isn't initialized and later only single bits are changed. Also the struct itself isn't initialized in phylink_resolve(). So better initialize state->pause and other not yet initialized fields. v2: - use right function name in subject v3: - initialize additional fields Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: aquantia: regression on cpus with high cores: set mode with 8 queuesDmitry Bogdanov4-0/+29
Recently the maximum number of queues was increased up to 8, but NIC was not fully configured for 8 queues. In setups with more than 4 CPU cores parts of TX traffic gets lost if the kernel routes it to queues 4th-8th. This patch sets a tx hw traffic mode with 8 queues. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202651 Fixes: 71a963cfc50b ("net: aquantia: increase max number of hw queues") Reported-by: Nicholas Johnson <nicholas.johnson@outlook.com.au> Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01selftests: fixes for UDP GROPaolo Abeni2-17/+33
The current implementation for UDP GRO tests is racy: the receiver may flush the RX queue while the sending is still transmitting and incorrectly report RX errors, with a wrong number of packet received. Add explicit timeouts to the receiver for both connection activation (first packet received for UDP) and reception completion, so that in the above critical scenario the receiver will wait for the transfer completion. Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: marvell: neta: disable comphy when setting modeMarek Behún1-5/+23
The comphy driver for Armada 3700 by Miquèl Raynal (which is currently in linux-next) does not actually set comphy mode when phy_set_mode_ext is called. The mode is set at next call of phy_power_on. Update the driver to semantics similar to mvpp2: helper mvneta_comphy_init sets comphy mode and powers it on. When mode is to be changed in mvneta_mac_config, first power the comphy off, then call mvneta_comphy_init (which sets the mode to new one). Only do this when new mode is different from old mode. This should also work for Armada 38x, since in that comphy driver methods power_on and power_off are unimplemented. Signed-off-by: Marek Behún <marek.behun@nic.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01Merge branch 'enetc-Add-mdio-support-and-device-tree-nodes'David S. Miller7-1/+340
Claudiu Manoil says: ==================== enetc: Add mdio support and device tree nodes This is the missing part to enable PCI probing of the ENETC ethernet ports on the LS1028A SoC and external traffic on the LS1028A RDB board. It's one of the first items on the TODO list for the recently merged ENETC ethernet driver. v3: Add DT bindings doc for ENETC connections v4: none ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01dt-bindings: net: freescale: enetc: Add connection bindings for ENETC ethernet nodesClaudiu Manoil1-0/+69
Define connection bindings (external PHY connections and internal links) for the ENETC on-chip ethernet controllers. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01enetc: Add ENETC PF level external MDIO supportClaudiu Manoil4-1/+219
Each ENETC PF has its own MDIO interface, the corresponding MDIO registers are mapped in the ENETC's Port register block. The current patch adds a driver for these PF level MDIO buses, so that each PF can manage directly its own external link. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01arm64: dts: fsl: ls1028a-rdb: Add ENETC external eth ports for the LS1028A RDB boardClaudiu Manoil1-0/+17
The LS1028A RDB board features an Atheros PHY connected over SGMII to the ENETC PF0 (or Port0). ENETC Port1 (PF1) has no external connection on this board, so it can be disabled for now. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01arm64: dts: fsl: ls1028a: Add PCI IERC node and ENETC endpointsClaudiu Manoil1-0/+35
The LS1028A SoC features a PCI Integrated Endpoint Root Complex (IERC) defining several integrated PCI devices, including the ENETC ethernet controller integrated endpoints (IEPs). The IERC implements ECAM (Enhanced Configuration Access Mechanism) to provide access to the PCIe config space of the IEPs. This means the the IEPs (including ENETC) do not support the standard PCIe BARs, instead the Enhanced Allocation (EA) capability structures in the ECAM space are used to fix the base addresses in the system, and the PCI subsystem uses these structures for device enumeration and discovery. The "ranges" entries contain basic information from these EA capabily structures required by the kernel for device enumeration. The current patch also enables the first 2 ENETC PFs (Physiscal Functions) and the associated VFs (Virtual Functions), 2 VFs for each PF. Each of these ENETC PFs has an external ethernet port on the LS1028A SoC. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01bpf: drop refcount if bpf_map_new_fd() fails in map_create()Peng Sun1-2/+2
In bpf/syscall.c, map_create() first set map->usercnt to 1, a file descriptor is supposed to return to userspace. When bpf_map_new_fd() fails, drop the refcount. Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID") Signed-off-by: Peng Sun <sironhide0null@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-28net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390XMaxime Chevallier1-2/+2
Upon setting the cmode on 6390 and 6390X, the associated serdes interfaces must be powered off/on. Both 6390X and 6390 share code to do so, but it currently uses the 6390 specific helper mv88e6390_serdes_power() to disable and enable the serdes interface. This call will fail silently on 6390X when trying so set a 10G interface such as XAUI or RXAUI, since mv88e6390_serdes_power() internally grabs the lane number based on modes supported by the 6390, and returns 0 when getting -ENODEV as a lane number. Using mv88e6390x_serdes_power() should be safe here, since we explicitly rule-out all ports but the 9 and 10, and because modes supported by 6390 ports 9 and 10 are a subset of those supported on 6390X. This was tested on 6390X using RXAUI mode. Fixes: 364e9d7776a3 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28selftests: rtnetlink: use internal netns switch for ip commandsDavid Ahern1-61/+61
'ip' can switch network namespaces internally and then run a given command relative to that namespace without the need to fork and exec another ip instance. Update all references of the form: ip netns exec "$testns" ip ... to ip -netns "$testns" ... Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28Merge branch 's390-qeth-next'David S. Miller6-212/+80
Julian Wiedmann says: ==================== s390/qeth: updates 2019-02-28 please apply one more qeth patch series for net-next. This eliminates some of the quirks in our reset code, and slims down the internal state machine. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: drop redundant state checkingJulian Wiedmann5-43/+13
Now that qeth always uses dev_close() to shutdown the interface, we can trust the locking and remove some custom state checks. qeth_l?_stop_card() is no longer called for a card in UP state, so remove the checks there too. This basically makes the UP state obsolete, so rip out the whole thing (except for the sysfs-visible string). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: don't special-case HW trap during suspendJulian Wiedmann2-12/+4
It makes no difference whether we 1. manually disarm the HW trap and call the offline code with recovery_mode == 1, or 2. call the offline code with recovery_mode == 0, and let it disarm the HW trap for us. So consolidate the two code paths in the suspend callback. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: remove driver-wide workqueueJulian Wiedmann1-18/+1
The qeth-wide workqueue is now only used by a single caller to schedule close_dev work. Just put it on a system queue instead. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: don't defer close_dev work during recoveryJulian Wiedmann4-5/+3
The recovery code already runs in a kthread, we don't have to defer the offlining further. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: remove a redundant check for card->devJulian Wiedmann1-1/+1
smatch complains that __qeth_l3_set_offline() first accesses card->dev, and then later checks whether the pointer is valid. Since commit d3d1b205e89f ("s390/qeth: allocate netdevice early"), the pointer is _always_ valid - that patch merely missed to remove this one check. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: call dev_close() during recoveryJulian Wiedmann3-75/+20
When resetting an interface ("recovery"), qeth currently attempts to elide the call to dev_close(). We initially only call .ndo_close to quiesce the data path, and then offline & online the ccwgroup device. If the reset succeeded, a call to .ndo_open then resumes the data path along with some internal setup (dev_addr validation, RX modeset) that dev_open() would have usually triggered. dev_close() only gets called (via the close_dev worker) if the reset action fails. It's unclear whether this was initially done due to locking concerns, or rather to execute the reset transparently. Either way, temporarily closing the interface without dev_close() is fragile, and means we're susceptible to various races and unexpected behaviour. For instance: - Bypassing dev_deactivate_many() means that the qdiscs are not set to __QDISC_STATE_DEACTIVATED. Consequently any intermittent TX completion can wake up the txq, resulting in calls to .ndo_start_xmit while the data path is down. We have custom state checking to detect this case and drop such packets. - Because the IFF_UP flag doesn't reflect the interface's actual state during a reset, we have custom state checking in .ndo_open and .ndo_close to guard against invalid calls. - Considering that the reset might take a considerable amount of time (in particular if an IO fails and we end up waiting for its timeout), we _do_ want NETDEV_GOING_DOWN and NETDEV_DOWN events so that components like bonding, team, bridge, macvlan, vlan, ... can take appropriate action. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: unconditionally clear MAC_REGISTERED flagJulian Wiedmann2-2/+1
In its attempt to run only the minimal amount of tear down steps, qeth_l2_stop_card() fails to reset the "is dev_addr registered?" flag in some rare scenarios. But a future change to the tear down sequence would cause us to _always_ hit this issue, so patch it up before that code lands. Fix it by unconditionally clearing the flag bit. This also allows us to remove the additional cleanup step in qeth_dev_layer2_store(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: enable/disable the HW trap a little earlierJulian Wiedmann2-14/+17
When setting a L2 qeth device online, enable the HW trap as soon as the control plane is available. This allows us to catch any error that occurs during the very first commands. In the same spirit, the offline code should disable the HW trap as the very first step of its processing. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: remove RECOVER stateJulian Wiedmann6-47/+25
The offline code uses a specific RECOVER state to indicate that the interface should be brought up when a qeth device is set online again. Rather than having a specific card-state for this, just put it in an internal flag bit and set the state to DOWN. When working with the card's state transitions, this reduces the complexity quite a bit. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: dsa: mv88e6xxx: Fix u64 statisticsAndrew Lunn1-1/+1
The switch maintains u64 counters for the number of octets sent and received. These are kept as two u32's which need to be combined. Fix the combing, which wrongly worked on u16's. Fixes: 80c4627b2719 ("dsa: mv88x6xxx: Refactor getting a single statistic") Reported-by: Chris Healy <Chris.Healy@zii.aero> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28xen-netback: don't populate the hash cache on XenBus disconnectIgor Druzhinin2-0/+9
Occasionally, during the disconnection procedure on XenBus which includes hash cache deinitialization there might be some packets still in-flight on other processors. Handling of these packets includes hashing and hash cache population that finally results in hash cache data structure corruption. In order to avoid this we prevent hashing of those packets if there are no queues initialized. In that case RCU protection of queues guards the hash cache as well. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net/smc: allow pnetid-less configurationUrsula Braun1-1/+41
Without hardware pnetid support there must currently be a pnet table configured to determine the IB device port to be used for SMC RDMA traffic. This patch enables a setup without pnet table, if the used handshake interface belongs already to a RoCE port. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28xen-netback: fix occasional leak of grant ref mappings under memory pressureIgor Druzhinin1-5/+5
Zero-copy callback flag is not yet set on frag list skb at the moment xenvif_handle_frag_list() returns -ENOMEM. This eventually results in leaking grant ref mappings since xenvif_zerocopy_callback() is never called for these fragments. Those eventually build up and cause Xen to kill Dom0 as the slots get reused for new mappings: "d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005" That behavior is observed under certain workloads where sudden spikes of page cache writes coexist with active atomic skb allocations from network traffic. Additionally, rework the logic to deal with frag_list deallocation in a single place. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: sched: pie: avoid slow division in drop probability decayLeslie Monis1-1/+2
As per RFC 8033, it is sufficient for the drop probability decay factor to have a value of (1 - 1/64) instead of 98%. This avoids the need to do slow division. Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28sctp: chunk.c: correct format string for size_t in printkMatthias Maennich1-1/+1
According to Documentation/core-api/printk-formats.rst, size_t should be printed with %zu, rather than %Zu. In addition, using %Zu triggers a warning on clang (-Wformat-extra-args): net/sctp/chunk.c:196:25: warning: data argument not used by format string [-Wformat-extra-args] __func__, asoc, max_data); ~~~~~~~~~~~~~~~~^~~~~~~~~ ./include/linux/printk.h:440:49: note: expanded from macro 'pr_warn_ratelimited' printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~ ./include/linux/printk.h:424:17: note: expanded from macro 'printk_ratelimited' printk(fmt, ##__VA_ARGS__); \ ~~~ ^ Fixes: 5b5e0928f742 ("lib/vsprintf.c: remove %Z support") Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Matthias Maennich <maennich@google.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: netem: fix skb length BUG_ON in __skb_to_sgvecSheng Lan1-3/+7
It can be reproduced by following steps: 1. virtio_net NIC is configured with gso/tso on 2. configure nginx as http server with an index file bigger than 1M bytes 3. use tc netem to produce duplicate packets and delay: tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90% 4. continually curl the nginx http server to get index file on client 5. BUG_ON is seen quickly [10258690.371129] kernel BUG at net/core/skbuff.c:4028! [10258690.371748] invalid opcode: 0000 [#1] SMP PTI [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2 [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202 [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002 [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028 [10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000 [10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0 [10258690.372094] Call Trace: [10258690.372094] <IRQ> [10258690.372094] skb_to_sgvec+0x11/0x40 [10258690.372094] start_xmit+0x38c/0x520 [virtio_net] [10258690.372094] dev_hard_start_xmit+0x9b/0x200 [10258690.372094] sch_direct_xmit+0xff/0x260 [10258690.372094] __qdisc_run+0x15e/0x4e0 [10258690.372094] net_tx_action+0x137/0x210 [10258690.372094] __do_softirq+0xd6/0x2a9 [10258690.372094] irq_exit+0xde/0xf0 [10258690.372094] smp_apic_timer_interrupt+0x74/0x140 [10258690.372094] apic_timer_interrupt+0xf/0x20 [10258690.372094] </IRQ> In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's linear data size and nonlinear data size, thus BUG_ON triggered. Because the skb is cloned and a part of nonlinear data is split off. Duplicate packet is cloned in netem_enqueue() and may be delayed some time in qdisc. When qdisc len reached the limit and returns NET_XMIT_DROP, the skb will be retransmit later in write queue. the skb will be fragmented by tso_fragment(), the limit size that depends on cwnd and mss decrease, the skb's nonlinear data will be split off. The length of the skb cloned by netem will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(), the BUG_ON trigger. To fix it, netem returns NET_XMIT_SUCCESS to upper stack when it clones a duplicate packet. Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()") Signed-off-by: Sheng Lan <lansheng@huawei.com> Reported-by: Qin Ji <jiqin.ji@huawei.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>