path: root/include
Age | Commit message | Author | Files | Lines
2020-01-18 | Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 file | -2/+2

Pull rseq fixes from Ingo Molnar:
 "Two rseq bugfixes:

  - CLONE_VM !CLONE_THREAD didn't work properly: the kernel would end up
    corrupting the TLS of the parent. Technically a change in the ABI, but
    the previous behavior couldn't reasonably have been relied on by
    applications, so this looks like a valid exception to the ABI rule.

  - Make the RSEQ_FLAG_UNREGISTER ABI behavior consistent with the
    handling of other flags. This is not thought to impact any
    applications either"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Unregister rseq for clone CLONE_VM
  rseq: Reject unknown flags on rseq unregister
2020-01-17 | XArray: Add xa_for_each_range | Matthew Wilcox (Oracle) | 1 file | -5/+32

This function supports iterating over a range of an array. Also add documentation links for xa_for_each_start().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
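A minimal usage sketch of the new iterator, assuming the macro takes (xa, index, entry, start, last) as described:

    #include <linux/printk.h>
    #include <linux/xarray.h>

    static void dump_range(struct xarray *xa)
    {
        unsigned long index;
        void *entry;

        /* Visit every present entry whose index lies in [16, 31]. */
        xa_for_each_range(xa, index, entry, 16, 31)
            pr_info("index %lu -> %p\n", index, entry);
    }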
2020-01-17 | XArray: Add wrappers for nested spinlocks | Matthew Wilcox (Oracle) | 1 file | -0/+8

Some users need to take an xarray lock while holding another xarray lock.

Reported-by: Doug Gilbert <dgilbert@interlog.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
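A hedged sketch of how such a wrapper is typically used; it assumes the new helper is xa_lock_nested(xa, subclass), mirroring spin_lock_nested():

    #include <linux/xarray.h>

    static void move_entry(struct xarray *src, struct xarray *dst)
    {
        xa_lock(dst);
        /* Tell lockdep the second xarray lock is a distinct level. */
        xa_lock_nested(src, SINGLE_DEPTH_NESTING);

        /* ... move an entry with __xa_erase()/__xa_store() ... */

        xa_unlock(src);
        xa_unlock(dst);
    }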
2020-01-18 | Merge tag 'drm-misc-fixes-2020-01-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes | Dave Airlie | 1 file | -0/+6

virtio: maintain obj reservation lock when submitting cmds (Gerd)
rockchip: increase link rate var size to accommodate rates (Tobias)
mst: serialize down messages and clear timeslots on unplug (Wayne)

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Tobias Schramm <t.schramm@manjaro.org>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116162856.GA11524@art_vandelay
2020-01-17 | Merge tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 file | -4/+4

Pull block fixes from Jens Axboe:
 "Three fixes that should go into this release:

  - The 32-bit segment size fix that I mentioned last week (Ming)
  - Use uint for the block size (Mikulas)
  - A null_blk zone write handling fix (Damien)"

* tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-block:
  block: fix an integer overflow in logical block size
  null_blk: Fix zone write handling
  block: fix get_max_segment_size() overflow on 32bit arch
2020-01-17 | netns: Constify exported functions | Guillaume Nault | 1 file | -5/+5

Mark function parameters as 'const' where possible.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-17 | net: phy: Maintain MDIO device and bus statistics | Florian Fainelli | 1 file | -0/+11

We maintain global statistics for an entire MDIO bus, as well as broken-down, per-MDIO-bus-address statistics. Given that it is possible for MDIO devices such as switches to access MDIO bus addresses for which there is no mdio_device instance created (and therefore no corresponding device directory in sysfs either), we also maintain per-address statistics under the statistics folder. The layout looks like this:

    /sys/class/mdio_bus/../statistics/
        transfers
        errors
        writes
        reads
        transfers_<addr>
        errors_<addr>
        writes_<addr>
        reads_<addr>

When a mdio_device instance is registered, a statistics/ folder is created with the transfers, errors, writes and reads attributes which point to the appropriate MDIO bus statistics structure. Statistics are 64-bit unsigned quantities and maintained through the u64_stats_sync.h helper functions.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
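A hedged sketch of the u64_stats_sync.h pattern the commit describes; the struct and field names here are illustrative, not the driver's:

    #include <linux/u64_stats_sync.h>

    struct mdio_stats_sketch {              /* hypothetical name */
        u64_stats_t transfers;
        struct u64_stats_sync syncp;
    };

    static void count_transfer(struct mdio_stats_sketch *s)
    {
        u64_stats_update_begin(&s->syncp);
        u64_stats_inc(&s->transfers);
        u64_stats_update_end(&s->syncp);
    }

    static u64 read_transfers(struct mdio_stats_sketch *s)
    {
        unsigned int start;
        u64 val;

        do {    /* retry if a writer updated the counter meanwhile */
            start = u64_stats_fetch_begin(&s->syncp);
            val = u64_stats_read(&s->transfers);
        } while (u64_stats_fetch_retry(&s->syncp, start));

        return val;
    }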
2020-01-16 | devmap: Adjust tracepoint for map-less queue flush | Jesper Dangaard Brouer | 1 file | -17/+12

Now that we don't have a reference to a devmap when flushing the device bulk queue, change the devmap_xmit tracepoint to remove the map_id and map_index fields entirely. Rearrange the fields so 'drops' and 'sent' stay in the same position in the tracepoint struct, to make it possible for the xdp_monitor utility to read both the old and the new format.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/157918768613.1458396.9165902403373826572.stgit@toke.dk
2020-01-16 | xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths | Toke Høiland-Jørgensen | 3 files | -61/+63

Since the bulk queue used by XDP_REDIRECT now lives in struct net_device, we can re-use the bulking for the non-map version of the bpf_redirect() helper. This is a simple matter of having xdp_do_redirect_slow() queue the frame on the bulk queue instead of sending it out with __bpf_tx_xdp().

Unfortunately we can't make the bpf_redirect() helper return an error if the ifindex doesn't exist (as bpf_redirect_map() does), because we don't have a reference to the network namespace of the ingress device at the time the helper is called. So we have to leave it as-is and keep the device lookup in xdp_do_redirect_slow(). Since this leaves less reason to have the non-map redirect code in a separate function, we get rid of the xdp_do_redirect_slow() function entirely.

This does lose us the tracepoint disambiguation, but fortunately the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint entry structures. This means both can contain a map index, so we can just amend the tracepoint definitions so we always emit the xdp_redirect(_err) tracepoints, but with the map ID only populated if a map is present. This means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep the definitions around in case someone is still listening for them.

With this change, the performance of the xdp_redirect sample program goes from 5Mpps to 8.4Mpps (a 68% increase).

Since the flush functions are no longer map-specific, rename the flush() functions to drop _map from their names. One of the renamed functions is the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To keep from having to update all drivers, use a #define to keep the old name working, and only update the virtual drivers in this patch.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
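A sketch of the compatibility define described in the entry above, assuming the renamed function is xdp_do_flush() (the _map-less name the text implies):

    /* New map-agnostic flush entry point. */
    void xdp_do_flush(void);

    /* Keep existing drivers that still call xdp_do_flush_map()
     * building without touching each of them. */
    #define xdp_do_flush_map xdp_do_flush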
2020-01-16 | xdp: Move devmap bulk queue into struct net_device | Toke Høiland-Jørgensen | 2 files | -6/+9

Commit 96360004b862 ("xdp: Make devmap flush_list common for all map instances") changed devmap flushing to be a global operation instead of a per-map operation. However, the queue structure used for bulking was still allocated as part of the containing map.

This patch moves the devmap bulk queue into struct net_device. The motivation for this is reusing it for the non-map variant of XDP_REDIRECT, which will be changed in a subsequent commit. To avoid other fields of struct net_device moving to different cache lines, we also move a couple of other members around.

We defer the actual allocation of the bulk queue structure until the NETDEV_REGISTER notification in devmap.c. This makes it possible to check for ndo_xdp_xmit support before allocating the structure, which is not possible at the time struct net_device is allocated. However, we keep the freeing in free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.

Because of this change, we lose the reference back to the map that originated the redirect, so change the tracepoint to always return 0 as the map ID and index. Otherwise no functional change is intended with this patch.

After this patch, the relevant part of struct net_device looks like this, according to pahole:

    /* --- cacheline 14 boundary (896 bytes) --- */
    struct netdev_queue *      _tx __attribute__((__aligned__(64))); /* 896   8 */
    unsigned int               num_tx_queues;                        /* 904   4 */
    unsigned int               real_num_tx_queues;                   /* 908   4 */
    struct Qdisc *             qdisc;                                /* 912   8 */
    unsigned int               tx_queue_len;                         /* 920   4 */
    spinlock_t                 tx_global_lock;                       /* 924   4 */
    struct xdp_dev_bulk_queue * xdp_bulkq;                           /* 928   8 */
    struct xps_dev_maps *      xps_cpus_map;                         /* 936   8 */
    struct xps_dev_maps *      xps_rxqs_map;                         /* 944   8 */
    struct mini_Qdisc *        miniq_egress;                         /* 952   8 */
    /* --- cacheline 15 boundary (960 bytes) --- */
    struct hlist_head          qdisc_hash[16];                       /* 960 128 */
    /* --- cacheline 17 boundary (1088 bytes) --- */
    struct timer_list          watchdog_timer;                       /* 1088 40 */
    /* XXX last struct has 4 bytes of padding */
    int                        watchdog_timeo;                       /* 1128  4 */
    /* XXX 4 bytes hole, try to pack */
    struct list_head           todo_list;                            /* 1136 16 */
    /* --- cacheline 18 boundary (1152 bytes) --- */

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk
2020-01-16 | Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc | Linus Torvalds | 1 file | -3/+3

Pull ARM SoC fixes from Olof Johansson:
 "I've been sitting on these longer than I meant, so the patch count is
  a bit higher than ideal for this part of the release. There are also
  some reverts of double-applied patches that bring the diffstat up a
  bit.

  With that said, the biggest changes are:

  - Revert of duplicate i2c device addition on two Aspeed (BMC) devicetrees
  - Move of two device nodes that got applied to the wrong part of the tree on Aspeed G6
  - Regulator fix for Beaglebone X15 (adding 12/5V supplies)
  - Use interrupts for keys on Amlogic SM1 to avoid missed polls

  In addition to that, there is a collection of smaller DT fixes:

  - Power supply assignment fixes for i.MX6
  - Fix of interrupt line for magnetometer on i.MX8 Librem5 devkit
  - Build fixlets (selects) for davinci/omap2+
  - More interrupt number fixes for Stratix10, Amlogic SM1, etc.
  - ... and more similar fixes across different platforms

  And some non-DT stuff:

  - optee fix to register multiple shared pages properly
  - Clock calculation fixes for MMP3
  - Clock fixes for OMAP as well"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
  MAINTAINERS: Add myself as the co-maintainer for Actions Semi platforms
  ARM: dts: imx7: Fix Toradex Colibri iMX7S 256MB NAND flash support
  ARM: dts: imx6sll-evk: Remove incorrect power supply assignment
  ARM: dts: imx6sl-evk: Remove incorrect power supply assignment
  ARM: dts: imx6sx-sdb: Remove incorrect power supply assignment
  ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment
  ARM: dts: imx6q-icore-mipi: Use 1.5 version of i.Core MX6DL
  ARM: omap2plus: select RESET_CONTROLLER
  ARM: davinci: select CONFIG_RESET_CONTROLLER
  ARM: dts: aspeed: rainier: Fix fan fault and presence
  ARM: dts: aspeed: rainier: Remove duplicate i2c busses
  ARM: dts: aspeed: tacoma: Remove duplicate flash nodes
  ARM: dts: aspeed: tacoma: Remove duplicate i2c busses
  ARM: dts: aspeed: tacoma: Fix fsi master node
  ARM: dts: aspeed-g6: Fix FSI master location
  ARM: dts: mmp3: Fix the TWSI ranges
  clk: mmp2: Fix the order of timer mux parents
  ARM: mmp: do not divide the clock rate
  arm64: dts: rockchip: Fix IR on Beelink A1
  optee: Fix multi page dynamic shm pool alloc
  ...
2020-01-16 | net/mlx5: Allow creating autogroups with reserved entries | Paul Blakey | 1 file | -0/+1

Exclude the last n entries for an autogrouped flow table. Reserving entries at the end of the FT will ensure that this FG will be the last to be evaluated. This will be used in the next patch to create a miss group enabling custom actions on FT miss.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | net/mlx5: Add ignore level support fwd to table rules | Paul Blakey | 1 file | -0/+1

If the user sets the ignore flow level flag on a rule, that rule can point to a flow table of any level, including those with levels equal to or lower than the level of the flow table it is added on. Together with unmanaged tables, this will be used to create an FDB chain/prio hierarchy much larger than the currently supported level range.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | net/mlx5: fs_core: Introduce unmanaged flow tables | Paul Blakey | 1 file | -0/+2

Currently, most of the steering tree is statically declared ahead of time, with steering prio instances allocated for each FDB chain to assign a max number of levels for each of them. This allows fs_core to manage the connections and levels of the flow table hierarchy to prevent loops, but restricts us in the number of supported chains and priorities.

Introduce unmanaged flow tables, allowing the user to manage the flow table connections. An unmanaged table is detached from the fs_core flow table hierarchy, and is only connected back to the hierarchy by explicit FTE forward actions. This will be used together with firmware that supports ignoring the flow table levels to increase the number of supported chains and prios.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | Saeed Mahameed | 4 files | -17/+263

This merge syncs with the latest mlx5-next HW bits and layout updates for upcoming features; in addition, one patch improves the mlx5_create_auto_grouped_flow_table() API across all mlx5 users.

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
  net/mlx5e: Add discard counters per priority
  net/mlx5e: Expose FEC fields and related capability bit
  net/mlx5: Add mlx5_ifc definitions for connection tracking support
  net/mlx5: Add copy header action struct layout
  net/mlx5: Expose resource dump register mapping
  net/mlx5: Add structures and defines for MIRC register
  net/mlx5: Read MCAM register groups 1 and 2
  net/mlx5: Add structures layout for new MCAM access reg groups
  net/mlx5: Expose vDPA emulation device capabilities
  net/mlx5: Add Virtio Emulation related device capabilities

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | net/mlx5: Refactor mlx5_create_auto_grouped_flow_table | Paul Blakey | 1 file | -8/+8

Refactor mlx5_create_auto_grouped_flow_table() to use an ft_attr param which already carries the max_fte, prio and flags members, and is used the same way in the similar mlx5_create_flow_table() function.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
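A hedged sketch of what a call looks like after this refactor; the attr field names (in particular the autogroup member) are assumptions based on the description, not quoted from the patch:

    struct mlx5_flow_table_attr ft_attr = {
        .prio    = prio,
        .max_fte = max_fte,
        .flags   = flags,
    };

    /* Autogroup-specific parameters ride along in the same attr. */
    ft_attr.autogroup.max_num_groups = num_groups;

    ft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);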
2020-01-16 | net/mlx5e: Add discard counters per priority | Aharon Landau | 1 file | -1/+3

Add counters that count (per priority) the number of received packets that were dropped due to a lack of buffers on a physical port. If this counter increases, it implies that the adapter is congested and cannot absorb the traffic coming from the network.

Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | net/mlx5e: Expose FEC fields and related capability bit | Aya Levin | 1 file | -1/+15

Introduce the 50G-per-lane FEC modes capability bit and the newly supported fields in the PPLM register which allow this configuration.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | net/mlx5: Add mlx5_ifc definitions for connection tracking support | Paul Blakey | 1 file | -4/+20

Add the required hardware definitions to mlx5_ifc: ignore_flow_level, registers, copy_header, and fwd_and_modify cap.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | net/mlx5: Add copy header action struct layout | Hamdan Igbaria | 1 file | -0/+16

Add a definition for the copy header action; the copy action is used to copy header fields from source to destination.

Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | net/mlx5: Expose resource dump register mapping | Aya Levin | 2 files | -1/+130

Add new register enumeration for resource dump. Add layout mapping for resource dump: access command and response.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | net/mlx5: Add structures and defines for MIRC register | Eran Ben Elisha | 2 files | -0/+9

Add needed structures, layouts and defines for the MIRC (Management Image Re-activation Control) register. This structure will be used for the FSM reactivation flow in the downstream patches.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | net/mlx5: Read MCAM register groups 1 and 2 | Eran Ben Elisha | 2 files | -2/+14

On load, the driver caches MCAM (Management Capabilities Mask Register) registers. In addition to the only MCAM register group (0) the driver already reads, here we add support for reading groups 1 and 2.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | net/mlx5: Add structures layout for new MCAM access reg groups | Eran Ben Elisha | 1 file | -0/+24

MCAM has 3 access_reg_groups (0-2). Define data structures in order to read and parse access_reg_groups #1 and #2.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 | netfilter: bitwise: add support for shifts. | Jeremy Sowden | 1 file | -2/+7

Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR and XOR. Extend it to do shifts as well.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 | netfilter: bitwise: add NFTA_BITWISE_DATA attribute. | Jeremy Sowden | 1 file | -0/+3

Add a new bitwise netlink attribute that will be used by shift operations to store the size of the shift. It is not used by boolean operations.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 | netfilter: bitwise: add NFTA_BITWISE_OP netlink attribute. | Jeremy Sowden | 1 file | -0/+12

Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a value of a new enum, nft_bitwise_ops. It describes the type of operation an expression contains. Currently, it only has one value: NFT_BITWISE_BOOL. More values will be added later to implement shifts.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
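A minimal sketch of the uapi addition as the text describes it, one enum with a single value for now:

    /* Operation type carried by NFTA_BITWISE_OP; shift values are
     * to be added by later patches. */
    enum nft_bitwise_ops {
        NFT_BITWISE_BOOL,
    };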
2020-01-16 | netfilter: flowtable: refresh flow if hardware offload fails | Pablo Neira Ayuso | 1 file | -0/+1

If nf_flow_offload_add() fails to add the flow to hardware, then the NF_FLOW_HW_REFRESH flag bit is set and the flow remains in the flowtable software path. If flowtable hardware offload is enabled, this patch enqueues a new request to offload this flow to hardware.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 | netfilter: flowtable: add nf_flowtable_hw_offload() helper function | Pablo Neira Ayuso | 1 file | -0/+5

This function checks for the NF_FLOWTABLE_HW_OFFLOAD flag, meaning that the flowtable hardware offload is enabled.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
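A sketch of what such a helper plausibly looks like, assuming the flags live in a flags field on struct nf_flowtable:

    static inline bool nf_flowtable_hw_offload(struct nf_flowtable *flowtable)
    {
        /* True when the flowtable was created with hardware offload. */
        return flowtable->flags & NF_FLOWTABLE_HW_OFFLOAD;
    }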
2020-01-16 | netfilter: flowtable: use atomic bitwise operations for flow flags | Pablo Neira Ayuso | 1 file | -7/+9

Originally, all flow flag bits were set only from the workqueue. With the introduction of the flow teardown state and hardware offload, this is no longer true. Let's be safe and use atomic bitwise operations to operate on the flow flags.

Fixes: 59c466dd68e7 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
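A hedged before/after sketch of the change; the flag and callee names are illustrative assumptions, not quoted from the patch:

    /* Before: plain read-modify-write, racy once several contexts
     * update the flags concurrently. */
    flow->flags |= FLOW_OFFLOAD_TEARDOWN;

    /* After: atomic bit operations on an unsigned long bitmap. */
    set_bit(NF_FLOW_TEARDOWN, &flow->flags);
    if (test_bit(NF_FLOW_TEARDOWN, &flow->flags))
        flow_offload_teardown(flow);    /* illustrative callee */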
2020-01-16 | netfilter: flowtable: remove dying bit, use teardown bit instead | Pablo Neira Ayuso | 1 file | -5/+0

The dying bit removes the conntrack entry if the netdev that owns this flow is going down. Instead, use the teardown mechanism to push back the flow to conntrack to let the classic software path decide what to do with it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16 | netfilter: nft_bitwise: correct uapi header comment. | Jeremy Sowden | 1 file | -1/+1

The comment documenting how bitwise expressions work includes a table which summarizes the mask and xor arguments combined to express the supported boolean operations. However, the row for OR:

    mask    xor
    0       x

is incorrect:

    dreg = (sreg & 0) ^ x

is not equivalent to:

    dreg = sreg | x

What the code actually does is:

    dreg = (sreg & ~x) ^ x

Update the documentation to match.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
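A quick userspace check of the identity: for every bit where x is 1, (sreg & ~x) ^ x yields 1, and where x is 0 it passes sreg through, i.e. it computes sreg | x, whereas (sreg & 0) ^ x would collapse to just x:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        for (uint32_t sreg = 0; sreg < 16; sreg++)
            for (uint32_t x = 0; x < 16; x++)
                assert(((sreg & ~x) ^ x) == (sreg | x));
        return 0;
    }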
2020-01-16 | IB/core: Add interface to advise_mr for kernel users | Moni Shoua | 1 file | -0/+3

Allow ULPs to call advise_mr, so they can control ODP regions in the same way as user space applications.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 | IB/core: Introduce ib_reg_user_mr | Moni Shoua | 1 file | -0/+6

Add ib_reg_user_mr() for kernel ULPs to register user MRs. The common use case for this function is a userspace application that allocates memory for HCA access, while the responsibility for registering the memory at the HCA lies with a kernel ULP that acts as an agent for the userspace application.

This function is intended to be used without a user context, so vendor drivers need to be aware that the reg_user_mr() device operation may be called with udata equal to NULL. Among all drivers, i40iw is the only driver which relies on the presence of udata, so check udata existence for that driver.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 | IB: Allow calls to ib_umem_get from kernel ULPs | Moni Shoua | 2 files | -5/+5

So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore have a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides.

This patch changes ib_umem_get() and ib_umem_odp_get() to get the IB device directly, relying on the fact that both UVERBS and ULPs set that field correctly.

Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
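A hedged sketch of the resulting signature change; the prototypes below are inferred from the description, not quoted from the patch:

    /* Before: the device was derived from the udata/uverbs context. */
    struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
                                size_t size, int access);

    /* After: kernel ULPs can pass the device directly. */
    struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
                                size_t size, int access);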
2020-01-16 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf | David S. Miller | 3 files | -7/+14

Daniel Borkmann says:

====================
pull-request: bpf 2020-01-15

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 9 day(s) which contain a total of 13 files changed, 95 insertions(+), 43 deletions(-).

The main changes are:

1) Fix refcount leak for TCP time wait and request sockets for socket lookup related BPF helpers, from Lorenz Bauer.

2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann.

3) Batch of several sockmap and related TLS fixes found while operating more complex BPF programs with Cilium and OpenSSL, from John Fastabend.

4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue() to avoid purging data upon teardown, from Lingpeng Chen.

5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly dump a BPF map's value with BTF, from Martin KaFai Lau.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 | block: fix an integer overflow in logical block size | Mikulas Patocka | 1 file | -4/+4

Logical block size has type unsigned short. That means it can be at most 32768. However, there are architectures that can run with 64k pages (for example arm64), and on these architectures it may be possible to create block devices with 64k block size. For example (run this on an architecture with 64k pages):

Mount will fail with this error because it tries to read the superblock using 2-sector access:

    device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
    EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned int to avoid the overflow.

Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-15 | bpf: Sockmap/tls, push write_space updates through ulp updates | John Fastabend | 2 files | -6/+12

When a sockmap sock with TLS enabled is removed, we clean up the bpf/psock state and call tcp_update_ulp() to push updates to the TLS ULP on top. However, we don't push the write_space callback up and instead simply overwrite the op with the previously stored psock op. This may or may not be correct, so to ensure we don't overwrite the TLS write space hook, pass this field to the ULP and have it fix up the ctx.

This completes a previous fix that pushed the ops through to the ULP but at the time missed doing this for write_space, presumably because the write_space TLS hook was added around the same time.

Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
2020-01-15 | bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop | John Fastabend | 1 file | -0/+1

When a sockmap is freed and a socket in the map has TLS enabled, we tear down the bpf context on the socket, the psock struct and state, and then call tcp_update_ulp(). The tcp_update_ulp() call is to inform the tls stack it needs to update its saved sock ops so that when the tls socket is later destroyed it doesn't try to call the now-destroyed psock hooks.

This is about keeping stacked ULPs in good shape so they always have the right set of stacked ops.

However, recently the unhash() hook was removed from the TLS side. But the sockmap/bpf side is not doing any extra work to update the unhash op when it is torn down, instead expecting the TLS side to manage it. So both TLS and sockmap believe the other side is managing the op, nobody updates the hook, and it continues to point at tcp_bpf_unhash(). When the unhash hook is called we call tcp_bpf_unhash(), which detects that the psock has already been destroyed and calls sk->sk_prot->unhash(), which calls tcp_bpf_unhash() yet again, and so on, looping and hanging the core.

To fix, have the sockmap teardown logic fix up the stale pointer.

Fixes: 5d92e631b8be ("net/tls: partially revert fix transition through disconnect with close")
Reported-by: syzbot+83979935eb6304f8cd46@syzkaller.appspotmail.com
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-2-john.fastabend@gmail.com
2020-01-15 | Merge tag 'batadv-next-for-davem-20200114' of git://git.open-mesh.org/linux-merge | David S. Miller | 2 files | -2/+2

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich
 - fix typo and kerneldocs, by Sven Eckelmann
 - use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer
 - silence some endian sparse warnings by adding annotations, by Sven Eckelmann
 - Update copyright years to 2020, by Sven Eckelmann
 - Disable deprecated sysfs configuration by default, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 | drm/dp_mst: Have DP_Tx send one msg at a time | Wayne Lin | 1 file | -0/+6

[Why]
Noticed this while testing MST with the 4-port MST hub from StarTech.com. Sometimes we can't light up monitors normally and get the error message 'sideband msg build failed'. Looking into the aux transactions, we found that the source sometimes sends out another down request before receiving the down reply of the previous down request. On the other hand, in drm_dp_get_one_sb_msg(), the current code doesn't handle the interleaved-replies case. Hence, the source can't build up the message completely and can't light up the monitors.

[How]
For good compatibility, enforce the source to send out one down request at a time. Add a flag, is_waiting_for_dwn_reply, to determine whether the source can send out a down request immediately:

- Check the flag before calling process_single_down_tx_qlock to send out a msg
- Set the flag when we successfully send out a down request
- Clear the flag when we successfully build up a down reply
- Clear the flag when we find errors while sending out a down request
- Clear the flag when we find errors while building up a down reply
- Clear the flag when a timeout occurs while waiting for a down reply
- Use drm_dp_mst_kick_tx() to try to send another down request in the queue at the end of drm_dp_mst_wait_tx_reply() (attempt to send out messages in the queue when errors occur)

Cc: Lyude Paul <lyude@redhat.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200113093649.11755-1-Wayne.Lin@amd.com
2020-01-15 | bpf: Add batch ops to all htab bpf map | Yonghong Song | 2 files | -0/+4

htab can't use the generic batch support due to some problematic behaviours inherent to the data structure, i.e. while iterating the bpf map a concurrent program might delete the next entry that the batch was about to use; in that case there's no easy solution to retrieve the next entry, and the issue has been discussed multiple times (see [1] and [2]).

The only way htab can be traversed without the problem previously exposed is by making sure that the map is traversed in entire buckets. This commit implements those strict requirements for htab; the implementation follows the same interaction as the generic support, with some exceptions:

- If the keys/values buffer is not big enough to traverse a bucket, ENOSPC will be returned.
- out_batch contains the value of the next bucket in the iteration, not the next key, but this is transparent for the user since the user should never use out_batch for anything other than bpf batch syscalls.

This commit implements BPF_MAP_LOOKUP_BATCH and adds support for the new command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete batch ops it is possible to use the generic implementations.

[1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
[2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com
2020-01-15 | bpf: Add generic support for update and delete batch ops | Brian Vazquez | 2 files | -0/+12

This commit adds generic support for update and delete batch ops that can be used by almost all bpf maps. These commands share the same UAPI attr that the lookup and lookup_and_delete batch ops use, and the syscall commands are:

    BPF_MAP_UPDATE_BATCH
    BPF_MAP_DELETE_BATCH

The main difference between update/delete and lookup batch ops is that for update/delete, keys/values must be specified by userspace, and because of that neither in_batch nor out_batch is used.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com
2020-01-15 | bpf: Add generic support for lookup batch op | Brian Vazquez | 2 files | -0/+23

This commit introduces generic support for the bpf_map_lookup_batch. This implementation can be used by almost all the bpf maps since its core implementation relies on the existing map_get_next_key and map_lookup_elem. The bpf syscall subcommand introduced is:

    BPF_MAP_LOOKUP_BATCH

The UAPI attribute is:

    struct { /* struct used by BPF_MAP_*_BATCH commands */
        __aligned_u64   in_batch;   /* start batch,
                                     * NULL to start from beginning
                                     */
        __aligned_u64   out_batch;  /* output: next start batch */
        __aligned_u64   keys;
        __aligned_u64   values;
        __u32           count;      /* input/output:
                                     * input: # of key/value elements
                                     * output: # of filled elements
                                     */
        __u32           map_fd;
        __u64           elem_flags;
        __u64           flags;
    } batch;

in_batch/out_batch are opaque values used to communicate between user/kernel space; in_batch/out_batch must be of key_size length. To start iterating from the beginning, in_batch must be NULL; count is the number of key/value elements to retrieve. Note that the 'keys' buffer must be a buffer of key_size * count size and the 'values' buffer must be value_size * count, where value_size must be aligned to 8 bytes by userspace if it's dealing with percpu maps. 'count' will contain the number of keys/values successfully retrieved. Note that 'count' is an input/output variable and it can contain a lower value after a call.

If there are no more entries to retrieve, ENOENT will be returned. If the error is ENOENT, count might be > 0 in case it copied some values but there were no more entries to retrieve. Note that if the return code is an error and not -EFAULT, count indicates the number of elements successfully processed.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com
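A hedged userspace sketch of driving BPF_MAP_LOOKUP_BATCH through the libbpf wrapper that accompanied this UAPI; the wrapper name and signature are assumptions about the contemporaneous libbpf, not part of this kernel commit:

    #include <bpf/bpf.h>
    #include <errno.h>

    /* Fetch up to 'max' u32 key/value pairs in one syscall. */
    static int dump_once(int map_fd, __u32 *keys, __u32 *values, __u32 max)
    {
        __u32 count = max;
        __u32 out_batch;                 /* opaque, key_size bytes */
        DECLARE_LIBBPF_OPTS(bpf_map_batch_opts, opts);

        /* in_batch == NULL: start from the beginning of the map. */
        int err = bpf_map_lookup_batch(map_fd, NULL, &out_batch,
                                       keys, values, &count, &opts);
        if (err && errno != ENOENT)      /* ENOENT: iteration done */
            return -errno;

        return count;                    /* elements actually copied */
    }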
2020-01-15 | bpf: Fix incorrect verifier simulation of ARSH under ALU32 | Daniel Borkmann | 1 file | -1/+1

Anatoly has been fuzzing with kBdysch harness and reported a hang in one of the outcomes:

    0: R1=ctx(id=0,off=0,imm=0) R10=fp0
    0: (85) call bpf_get_socket_cookie#46
    1: R0_w=invP(id=0) R10=fp0
    1: (57) r0 &= 808464432
    2: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
    2: (14) w0 -= 810299440
    3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
    3: (c4) w0 s>>= 1
    4: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
    4: (76) if w0 s>= 0x30303030 goto pc+216
    221: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
    221: (95) exit
    processed 6 insns (limit 1000000) [...]

Taking a closer look, the program was xlated as follows:

    # ./bpftool p d x i 12
    0: (85) call bpf_get_socket_cookie#7800896
    1: (bf) r6 = r0
    2: (57) r6 &= 808464432
    3: (14) w6 -= 810299440
    4: (c4) w6 s>>= 1
    5: (76) if w6 s>= 0x30303030 goto pc+216
    6: (05) goto pc-1
    7: (05) goto pc-1
    8: (05) goto pc-1
    [...]
    220: (05) goto pc-1
    221: (05) goto pc-1
    222: (95) exit

Meaning, the visible effect is very similar to f54c7898ed1c ("bpf: Fix precision tracking for unbounded scalars"), that is, the fall-through branch in the instruction 5 is considered to be never taken given the conclusion from the min/max bounds tracking in w6, and therefore the dead-code sanitation rewrites it as goto pc-1. However, real-life input disagrees with verification analysis since a soft-lockup was observed.

The bug sits in the analysis of the ARSH. The definition is that we shift the target register value right by K bits through shifting in copies of its sign bit. In adjust_scalar_min_max_vals(), we do first coerce the register into 32 bit mode, same happens after simulating the operation. However, for the case of simulating the actual ARSH, we don't take the mode into account and act as if it's always 64 bit, but location of sign bit is different:

    dst_reg->smin_value >>= umin_val;
    dst_reg->smax_value >>= umin_val;
    dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val);

Consider an unknown R0 where bpf_get_socket_cookie() (or others) would for example return 0xffff. With the above ARSH simulation, we'd see the following results:

    [...]
    1: R1=ctx(id=0,off=0,imm=0) R2_w=invP65535 R10=fp0
    1: (85) call bpf_get_socket_cookie#46
    2: R0_w=invP(id=0) R10=fp0
    2: (57) r0 &= 808464432
      -> R0_runtime = 0x3030
    3: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
    3: (14) w0 -= 810299440
      -> R0_runtime = 0xcfb40000
    4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
                      (0xffffffff)
    4: (c4) w0 s>>= 1
      -> R0_runtime = 0xe7da0000
    5: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
                      (0x67c00000)           (0x7ffbfff8)
    [...]

In insn 3, we have a runtime value of 0xcfb40000, which is '1100 1111 1011 0100 0000 0000 0000 0000', the result after the shift has 0xe7da0000 that is '1110 0111 1101 1010 0000 0000 0000 0000', where the sign bit is correctly retained in 32 bit mode. In insn4, the umax was 0xffffffff, and changed into 0x7ffbfff8 after the shift, that is, '0111 1111 1111 1011 1111 1111 1111 1000' and means here that the simulation didn't retain the sign bit. With above logic, the updates happen on the 64 bit min/max bounds and given we coerced the register, the sign bits of the bounds are cleared as well, meaning, we need to force the simulation into s32 space for 32 bit alu mode.

Verification after the fix below. We're first analyzing the fall-through branch on 32 bit signed >= test eventually leading to rejection of the program in this specific case:

    0: R1=ctx(id=0,off=0,imm=0) R10=fp0
    0: (b7) r2 = 808464432
    1: R1=ctx(id=0,off=0,imm=0) R2_w=invP808464432 R10=fp0
    1: (85) call bpf_get_socket_cookie#46
    2: R0_w=invP(id=0) R10=fp0
    2: (bf) r6 = r0
    3: R0_w=invP(id=0) R6_w=invP(id=0) R10=fp0
    3: (57) r6 &= 808464432
    4: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
    4: (14) w6 -= 810299440
    5: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
    5: (c4) w6 s>>= 1
    6: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
                      (0x67c00000)          (0xfffbfff8)
    6: (76) if w6 s>= 0x30303030 goto pc+216
    7: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
    7: (30) r0 = *(u8 *)skb[808464432]
    BPF_LD_[ABS|IND] uses reserved fields
    processed 8 insns (limit 1000000) [...]

Fixes: 9cbe1f5a32dc ("bpf/verifier: improve register value range tracking with ARSH")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115204733.16648-1-daniel@iogearbox.net
2020-01-15 | bpf: Add bpf_send_signal_thread() helper | Yonghong Song | 1 file | -2/+17

Commit 8b401f9ed244 ("bpf: implement bpf_send_signal() helper") added the helper bpf_send_signal() which permits a bpf program to send a signal to the current process. The signal may be delivered to any thread in the process.

We found a use case where sending the signal to the current thread is preferable:

- A bpf program will collect the stack trace and then send a signal to the user application.
- The user application will add some thread-specific information to the just-collected stack trace for later analysis.

If bpf_send_signal() is used, the user application will need to check whether the thread receiving the signal matches the thread collecting the stack by checking the thread id. If not, it will need to send the signal to another thread through pthread_kill().

This patch proposes a new helper, bpf_send_signal_thread(), which sends the signal to the thread corresponding to the current kernel task. This way, user space is guaranteed that the bpf program execution context and the user space signal handling context are the same thread.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115035002.602336-1-yhs@fb.com
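A hedged BPF-program-side sketch of the new helper, assuming the usual libbpf helper headers; the signal number is spelled as a literal since uapi signal headers aren't normally included in BPF objects:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("perf_event")
    int collect_and_notify(void *ctx)
    {
        /* ... collect the stack trace here ... */

        /* Deliver SIGUSR1 (10) to the thread that triggered this
         * program, not just any thread of the process. */
        bpf_send_signal_thread(10);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";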
2020-01-15 | Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds | 1 file | -1/+0

Pull vfs fixes from Al Viro:
 "Fixes for mountpoint_last() bugs (by converting to use of
  lookup_last()) and an autofs regression fix from this cycle (caused
  by follow_managed() breakage introduced in barrier fixes series)"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix autofs regression caused by follow_managed() changes
  reimplement path_mountpoint() with less magic
2020-01-15 | net: bridge: vlan: add rtnetlink group and notify support | Nikolay Aleksandrov | 1 file | -0/+2

Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN - and add support for sending vlan notifications (both single and ranges). No functional changes intended; the notification support will be used by later patches.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
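A hedged userspace sketch of subscribing to the new group; this is plain rtnetlink socket code, nothing specific to this patch beyond the RTNLGRP_BRVLAN constant it introduces:

    #include <sys/socket.h>
    #include <linux/netlink.h>
    #include <linux/rtnetlink.h>

    #ifndef SOL_NETLINK
    #define SOL_NETLINK 270
    #endif

    int listen_brvlan(void)
    {
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        unsigned int grp = RTNLGRP_BRVLAN;

        if (fd < 0)
            return -1;
        /* Join the bridge vlan notification group; RTM_NEWVLAN /
         * RTM_DELVLAN messages will now arrive on this socket. */
        if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                       &grp, sizeof(grp)) < 0)
            return -1;
        return fd;
    }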
2020-01-15 | net: bridge: vlan: add rtm range support | Nikolay Aleksandrov | 1 file | -0/+1

Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE - which causes RTM_NEWVLAN/DELVLAN to act on a range. Dumps now automatically compress similar vlans into ranges. This will also be used when per-vlan options are introduced: when vlans' options match, they will be put into a single range which is encapsulated in one netlink attribute. We need to run checks similar to those in br_process_vlan_info() because these ranges will be used for options setting and they'll be able to skip br_process_vlan_info().

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 | net: bridge: vlan: add rtm definitions and dump support | Nikolay Aleksandrov | 2 files | -0/+35

This patch adds vlan rtm definitions:

- NEWVLAN: to be used for creating vlans, setting options and notifications
- DELVLAN: to be used for deleting vlans
- GETVLAN: used for dumping vlan information

Dumping vlans, which can span multiple messages, is added now with basic information (vid and flags). We use nlmsg_parse() to validate the header length in order to be able to extend the message with filtering attributes later.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>