aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-07net/smc: original socket family in inet_sock_diagKarsten Graul1-2/+1
Commit ed75986f4aae ("net/smc: ipv6 support for smc_diag.c") changed the value of the diag_family field. The idea was to indicate the family of the IP address in the inet_diag_sockid field. But the change makes it impossible to distinguish an inet_sock_diag response message from SMC sock_diag response. This patch restores the original behaviour and sends AF_SMC as value of the diag_family field. Fixes: ed75986f4aae ("net/smc: ipv6 support for smc_diag.c") Reported-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/smc: move code to clear the conn->lgr fieldKarsten Graul1-2/+3
The lgr field of an smc_connection is set in smc_conn_create() and should be cleared in smc_conn_free() for consistency reasons, so move the responsible code. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/smc: use client and server LGR pending locks for SMC-RHans Wippel1-16/+28
If SMC client and server connections are both established at the same time, smc_connect_rdma() cannot send a CLC confirm message while smc_listen_work() is waiting for one due to lock contention. This can result in timeouts in smc_clc_wait_msg() and failed SMC connections. In case of SMC-R, there are two types of LGRs (client and server LGRs) which can be protected by separate locks. So, this patch splits the LGR pending lock into two separate locks for client and server to avoid the locking issue for SMC-R. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/smc: unlock LGR pending lock earlier for SMC-DHans Wippel1-5/+9
If SMC client and server connections are both established at the same time, smc_connect_ism() cannot send a CLC confirm message while smc_listen_work() is waiting for one due to lock contention. This can result in timeouts in smc_clc_wait_msg() and failed SMC connections. In case of SMC-D, the LGR pending lock is not needed while smc_listen_work() is waiting for the CLC confirm message. So, this patch releases the lock earlier for SMC-D to avoid the locking issue. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/smc: use smc_curs_copy() for SMC-DUrsula Braun1-4/+5
SMC already provides a wrapper for atomic64 calls to be architecture independent. Use this wrapper for SMC-D as well. Reported-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/smc: postpone release of clcsockUrsula Braun2-17/+23
According to RFC7609 (http://www.rfc-editor.org/info/rfc7609) first the SMC-R connection is shut down and then the normal TCP connection FIN processing drives cleanup of the internal TCP connection. The unconditional release of the clcsock during active socket closing has to be postponed if the peer has not yet signalled socket closing. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health dump {get,clear} commandsEran Ben Elisha1-0/+63
Add devlink health dump commands, in order to run an dump operation over a specific reporter. The supported operations are dump_get in order to get last saved dump (if not exist, dump now) and dump_clear to clear last saved dump. It is expected from driver's callback for diagnose command to fill it via the devlink fmsg API. Devlink will parse it and convert it to netlink nla API in order to pass it to the user. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health diagnose commandEran Ben Elisha1-0/+46
Add devlink health diagnose command, in order to run a diagnose operation over a specific reporter. It is expected from driver's callback for diagnose command to fill it via the devlink fmsg API. Devlink will parse it and convert it to netlink nla API in order to pass it to the user. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health recover commandEran Ben Elisha1-0/+20
Add devlink health recover command to the uapi, in order to allow the user to execute a recover operation over a specific reporter. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health set commandEran Ben Elisha1-0/+36
Add devlink health set command, in order to set configuration parameters for a specific reporter. Supported parameters are: - graceful_period: Time interval between auto recoveries (in msec) - auto_recover: Determines if the devlink shall execute recover upon receiving error for the reporter Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health get commandEran Ben Elisha1-0/+149
Add devlink health get command to provide reporter/s data for user space. Add the ability to get data per reporter or dump data from all available reporters. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health report functionalityEran Ben Elisha1-0/+119
Upon error discover, every driver can report it to the devlink health mechanism via devlink_health_report function, using the appropriate reporter registered to it. Driver can pass error specific context which will be delivered to it as part of the dump / recovery callbacks. Once an error is reported, devlink health will do the following actions: * A log is being send to the kernel trace events buffer * Health status and statistics are being updated for the reporter instance * Object dump is being taken and stored at the reporter instance (as long as there is no other dump which is already stored) * Auto recovery attempt is being done. Depends on: - Auto Recovery configuration - Grace period vs. Time since last recover Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health reporter create/destroy functionalityEran Ben Elisha1-0/+92
Devlink health reporter is an instance for reporting, diagnosing and recovering from run time errors discovered by the reporters. Define it's data structure and supported operations. In addition, expose devlink API to create and destroy a reporter. Each devlink instance will hold it's own reporters list. As part of the allocation, driver shall provide a set of callbacks which will be used by devlink in order to handle health reports and user commands related to this reporter. In addition, driver is entitled to provide some priv pointer, which can be fetched from the reporter by devlink_health_reporter_priv function. For each reporter, devlink will hold a metadata of statistics, dump msg and status. For passing dumps and diagnose data to the user-space, it will use devlink fmsg API. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add devlink formatted message (fmsg) APIEran Ben Elisha1-0/+483
Devlink fmsg is a mechanism to pass descriptors between drivers and devlink, in json-like format. The API allows the driver to add nested attributes such as object, object pair and value array, in addition to attributes such as name and value. Driver can use this API to fill the fmsg context in a format which will be translated by the devlink to the netlink message later. There is no memory allocation in advance (other than the initial list head), and it dynamically allocates messages descriptors and add them to the list on the fly. When it needs to send the data using SKBs to the netlink layer, it fragments the data between different SKBs. In order to do this fragmentation, it uses virtual nests attributes, to avoid actual nesting use which cannot be divided between different SKBs. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06Merge branch 'for_net-next-5.1/rds-tos-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linuxDavid S. Miller15-52/+155
Santosh Shilimkar says: ==================== rds: add tos support RDS applications make use of tos to classify database traffic. This feature has been used in shipping products from 2.6.32 based kernels. Its tied with RDS v4.1 protocol version and the compatibility gets negotiated as part of connections setup. Patchset keeps full backward compatibility using existing connection negotiation scheme. Currently the feature is exploited by RDMA transport and for TCP transport the user tos values are mapped to same default class (0). For RDMA transports, RDMA CM service type API is used to set up different SL(service lanes) and the IB fabric is configured for tos mapping using Subnet Manager(SL to VL mappings). Similarly for ROCE fabric, user priority is mapped with different DSCP code points which are associated with different switch queues in the fabric. The original code was developed by Bang Nguyen in downstream kernel back in 2.6.32 kernel days and it has evolved significantly over period of time. Thanks to Yanjun for doing testing with various combinations of host like v3.1<->v4.1, v4.1.<->v3.1, v4.1 upstream to shipping v4.1 etc etc ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-5/+5
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-02-07 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add a riscv64 JIT for BPF, from Björn. 2) Implement BTF deduplication algorithm for libbpf which takes BTF type information containing duplicate per-compilation unit information and reduces it to an equivalent set of BTF types with no duplication and without loss of information, from Andrii. 3) Offloaded and native BPF XDP programs can coexist today, enable also offloaded and generic ones as well, from Jakub. 4) Expose various BTF related helper functions in libbpf as API which are in particular helpful for JITed programs, from Yonghong. 5) Fix the recently added JMP32 code emission in s390x JIT, from Heiko. 6) Fix BPF kselftests' tcp_{server,client}.py to be able to run inside a network namespace, also add a fix for libbpf to get libbpf_print() working, from Stanislav. 7) Fixes for bpftool documentation, from Prashant. 8) Type cleanup in BPF kselftests' test_maps.c to silence a gcc8 warning, from Breno. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_IDFlorian Fainelli5-73/+15
Now that we have a dedicated NDO for getting a port's parent ID, get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the NDO exclusively. This is a preliminary change to getting rid of switchdev_ops eventually. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net: dsa: Implement ndo_get_port_parent_id()Florian Fainelli1-6/+12
DSA implements SWITCHDEV_ATTR_ID_PORT_PARENT_ID and we want to get rid of switchdev_ops eventually, ease that migration by implementing a ndo_get_port_parent_id() function which returns what switchdev_port_attr_get() would do. Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net: Introduce ndo_get_port_parent_id()Florian Fainelli5-5/+82
In preparation for getting rid of switchdev_ops, create a dedicated NDO operation for getting the port's parent identifier. There are essentially two classes of drivers that need to implement getting the port's parent ID which are VF/PF drivers with a built-in switch, and pure switchdev drivers such as mlxsw, ocelot, dsa etc. We introduce a helper function: dev_get_port_parent_id() which supports recursion into the lower devices to obtain the first port's parent ID. Convert the bridge, core and ipv4 multicast routing code to check for such ndo_get_port_parent_id() and call the helper function when valid before falling back to switchdev_port_attr_get(). This will allow us to convert all relevant drivers in one go instead of having to implement both switchdev_port_attr_get() and ndo_get_port_parent_id() operations, then get rid of switchdev_port_attr_get(). Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06devlink: add hardware errors tracing facilityNir Dotan1-0/+1
Define a tracepoint and allow user to trace messages in case of an hardware error code for hardware associated with devlink instance. Signed-off-by: Nir Dotan <nird@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06ethtool: add ethtool_rx_flow_spec to flow_rule structure translatorPablo Neira Ayuso1-0/+241
This patch adds a function to translate the ethtool_rx_flow_spec structure to the flow_rule representation. This allows us to reuse code from the driver side given that both flower and ethtool_rx_flow interfaces use the same representation. This patch also includes support for the flow type flags FLOW_EXT, FLOW_MAC_EXT and FLOW_RSS. The ethtool_rx_flow_spec_input wrapper structure is used to convey the rss_context field, that is away from the ethtool_rx_flow_spec structure, and the ethtool_rx_flow_spec structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06cls_flower: don't expose TC actions to drivers anymorePablo Neira Ayuso1-5/+0
Now that drivers have been converted to use the flow action infrastructure, remove this field from the tc_cls_flower_offload structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add statistics retrieval infrastructure and use itPablo Neira Ayuso1-0/+4
This patch provides the flow_stats structure that acts as container for tc_cls_flower_offload, then we can use to restore the statistics on the existing TC actions. Hence, tcf_exts_stats_update() is not used from drivers anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06cls_api: add translator to flow_action representationPablo Neira Ayuso2-0/+113
This patch implements a new function to translate from native TC action to the new flow_action representation. Moreover, this patch also updates cls_flower to use this new function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add flow action infrastructurePablo Neira Ayuso3-5/+33
This new infrastructure defines the nic actions that you can perform from existing network drivers. This infrastructure allows us to avoid a direct dependency with the native software TC action representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add flow_rule and flow_match structures and use themPablo Neira Ayuso3-14/+178
This patch wraps the dissector key and mask - that flower uses to represent the matching side - around the flow_match structure. To avoid a follow up patch that would edit the same LoCs in the drivers, this patch also wraps this new flow match structure around the flow rule object. This new structure will also contain the flow actions in follow up patches. This introduces two new interfaces: bool flow_rule_match_key(rule, dissector_id) that returns true if a given matching key is set on, and: flow_rule_match_XYZ(rule, &match); To fetch the matching side XYZ into the match container structure, to retrieve the key and the mask with one single call. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net: xdp: allow generic and driver XDP on one interfaceJakub Kicinski1-5/+5
Since commit a25717d2b604 ("xdp: support simultaneous driver and hw XDP attachment") users can load an XDP program for offload and in native driver mode simultaneously. Allow a similar mix of offload and SKB mode/generic XDP. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-04rds: rdma: update rdma transport for tosSantosh Shilimkar6-22/+29
For RDMA transports, RDS TOS is an extension of IB QoS(Annex A13) to provide clients the ability to segregate traffic flows for different type of data. RDMA CM abstract it for ULPs using rdma_set_service_type(). Internally, each traffic flow is represented by a connection with all of its independent resources like that of a normal connection, and is differentiated by service type. In other words, there can be multiple qp connections between an IP pair and each supports a unique service type. The feature has been added from RDSv4.1 onwards and supports rolling upgrades. RDMA connection metadata also carries the tos information to set up SL on end to end context. The original code was developed by Bang Nguyen in downstream kernel back in 2.6.32 kernel days and it has evolved over period of time. Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes] Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-04rds: add transport specific tos_map hookSantosh Shilimkar4-4/+24
RDMA transport maps user tos to underline virtual lanes(VL) for IB or DSCP values. RDMA CM transport abstract thats for RDS. TCP transport makes use of default priority 0 and maps all user tos values to it. Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes] Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-04rds: add type of service(tos) infrastructureSantosh Shilimkar10-17/+61
RDS Service type (TOS) is user-defined and needs to be configured via RDS IOCTL interface. It must be set before initiating any traffic and once set the TOS can not be changed. All out-going traffic from the socket will be associated with its TOS. Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes] Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-04rds: rdma: add consumer rejectSantosh Shilimkar3-2/+22
For legacy protocol version incompatibility with non linux RDS, consumer reject reason being used to convey it to peer. But the choice of reject reason value as '1' was really poor. Anyway for interoperability reasons with shipping products, it needs to be supported. For any future versions, properly encoded reject reason should to be used. Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes] Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-04rds: make v3.1 as compat versionSantosh Shilimkar4-17/+29
Mark RDSv3.1 as compat version and add v4.1 version macro's. Subsequent patches enable TOS(Type of Service) feature which is tied with v4.1 for RDMA transport. Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes] Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-03net: Fix fall through warning in y2038 tstamp changes.David S. Miller1-0/+1
net/core/sock.c: In function 'sock_setsockopt': net/core/sock.c:914:3: warning: this statement may fall through [-Wimplicit-fallthrough=] sock_set_flag(sk, SOCK_TSTAMP_NEW); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/sock.c:915:2: note: here case SO_TIMESTAMPING_OLD: ^~~~ Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03bpfilter: remove extra header search paths for bpfilter_umhMasahiro Yamada2-2/+1
Currently, the header search paths -Itools/include and -Itools/include/uapi are not used. Let's drop the unused code. We can remove -I. too by fixing up one C file. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03net: Fix ip_mc_{dec,inc}_group allocation contextFlorian Fainelli2-13/+26
After 4effd28c1245 ("bridge: join all-snoopers multicast address"), I started seeing the following sleep in atomic warnings: [ 26.763893] BUG: sleeping function called from invalid context at mm/slab.h:421 [ 26.771425] in_atomic(): 1, irqs_disabled(): 0, pid: 1658, name: sh [ 26.777855] INFO: lockdep is turned off. [ 26.781916] CPU: 0 PID: 1658 Comm: sh Not tainted 5.0.0-rc4 #20 [ 26.787943] Hardware name: BCM97278SV (DT) [ 26.792118] Call trace: [ 26.794645] dump_backtrace+0x0/0x170 [ 26.798391] show_stack+0x24/0x30 [ 26.801787] dump_stack+0xa4/0xe4 [ 26.805182] ___might_sleep+0x208/0x218 [ 26.809102] __might_sleep+0x78/0x88 [ 26.812762] kmem_cache_alloc_trace+0x64/0x28c [ 26.817301] igmp_group_dropped+0x150/0x230 [ 26.821573] ip_mc_dec_group+0x1b0/0x1f8 [ 26.825585] br_ip4_multicast_leave_snoopers.isra.11+0x174/0x190 [ 26.831704] br_multicast_toggle+0x78/0xcc [ 26.835887] store_bridge_parm+0xc4/0xfc [ 26.839894] multicast_snooping_store+0x3c/0x4c [ 26.844517] dev_attr_store+0x44/0x5c [ 26.848262] sysfs_kf_write+0x50/0x68 [ 26.852006] kernfs_fop_write+0x14c/0x1b4 [ 26.856102] __vfs_write+0x60/0x190 [ 26.859668] vfs_write+0xc8/0x168 [ 26.863059] ksys_write+0x70/0xc8 [ 26.866449] __arm64_sys_write+0x24/0x30 [ 26.870458] el0_svc_common+0xa0/0x11c [ 26.874291] el0_svc_handler+0x38/0x70 [ 26.878120] el0_svc+0x8/0xc while toggling the bridge's multicast_snooping attribute dynamically. Pass a gfp_t down to igmpv3_add_delrec(), introduce __igmp_group_dropped() and introduce __ip_mc_dec_group() to take a gfp_t argument. Similarly introduce ____ip_mc_inc_group() and __ip_mc_inc_group() to allow caller to specify gfp_t. IPv6 part of the patch appears fine. Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03net: devlink: report cell size of shared buffersJakub Kicinski1-0/+3
Shared buffer allocation is usually done in cell increments. Drivers will either round up the allocation or refuse the configuration if it's not an exact multiple of cell size. Drivers know exactly the cell size of shared buffer, so help out users by providing this information in dumps. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEWDeepa Dinamani1-14/+39
Add new socket timeout options that are y2038 safe. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: ccaulfie@redhat.com Cc: davem@davemloft.net Cc: deller@gmx.de Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: cluster-devel@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixesDeepa Dinamani1-2/+2
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval as the time format. struct timeval is not y2038 safe. The subsequent patches in the series add support for new socket timeout options with _NEW suffix that will use y2038 safe data structures. Although the existing struct timeval layout is sufficiently wide to represent timeouts, because of the way libc will interpret time_t based on user defined flag, these new flags provide a way of having a structure that is the same for all architectures consistently. Rename the existing options with _OLD suffix forms so that the right option is enabled for userspace applications according to the architecture and time_t definition of libc. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: ccaulfie@redhat.com Cc: deller@gmx.de Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: cluster-devel@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Add SO_TIMESTAMPING_NEWDeepa Dinamani5-20/+61
Add SO_TIMESTAMPING_NEW variant of socket timestamp options. This is the y2038 safe versions of the SO_TIMESTAMPING_OLD for all architectures. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: chris@zankel.net Cc: fenghua.yu@intel.com Cc: rth@twiddle.net Cc: tglx@linutronix.de Cc: ubraun@linux.ibm.com Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-s390@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Add SO_TIMESTAMP[NS]_NEWDeepa Dinamani5-22/+91
Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of socket timestamp options. These are the y2038 safe versions of the SO_TIMESTAMP_OLD and SO_TIMESTAMPNS_OLD for all architectures. Note that the format of scm_timestamping.ts[0] is not changed in this patch. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: jejb@parisc-linux.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: linux-alpha@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Use old_timeval types for socket timestampsDeepa Dinamani5-8/+8
As part of y2038 solution, all internal uses of struct timeval are replaced by struct __kernel_old_timeval and struct compat_timeval by struct old_timeval32. Make socket timestamps use these new types. This is mainly to be able to verify that the kernel build is y2038 safe when such non y2038 safe types are not supported anymore. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: isdn@linux-pingi.de Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLDDeepa Dinamani7-21/+21
SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the way they are currently defined, are not y2038 safe. Subsequent patches in the series add new y2038 safe versions of these options which provide 64 bit timestamps on all architectures uniformly. Hence, rename existing options with OLD tag suffixes. Also note that kernel will not use the untagged SO_TIMESTAMP* and SCM_TIMESTAMP* options internally anymore. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: deller@gmx.de Cc: dhowells@redhat.com Cc: jejb@parisc-linux.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: linux-afs@lists.infradead.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: move compat timeout handling into sock.cArnd Bergmann3-89/+46
This is a cleanup to prepare for the addition of 64-bit time_t in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems unnecessarily complex and error-prone, moving it all into the main setsockopt()/getsockopt() implementation requires half as much code and is easier to extend. 32-bit user space can now use old_timeval32 on both 32-bit and 64-bit machines, while 64-bit code can use __old_kernel_timeval. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03ipv4/igmp: Don't drop IGMP pkt with zeros src addrEdward Chron1-1/+2
Don't drop IGMP packets with a source address of all zeros which are IGMP proxy reports. This is documented in Section 2.1.1 IGMP Forwarding Rules of RFC 4541 IGMP and MLD Snooping Switches Considerations. Signed-off-by: Edward Chron <echron@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2-6/+22
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-02-01 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) introduce bpf_spin_lock, from Alexei. 2) convert xdp samples to libbpf, from Maciej. 3) skip verifier tests for unsupported program/map types, from Stanislav. 4) powerpc64 JIT support for BTF line info, from Sandipan. 5) assorted fixed, from Valdis, Jesper, Jiong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01ethtool: add compat for devlink infoJakub Kicinski2-0/+70
If driver did not fill the fw_version field, try to call into the new devlink get_info op and collect the versions that way. We assume ethtool was always reporting running versions. v4: - use IS_REACHABLE() to avoid problems with DEVLINK=m (kbuildbot). v3 (Jiri): - do a dump and then parse it instead of special handling; - concatenate all versions (well, all that fit :)). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01devlink: add version reporting to devlink info APIJakub Kicinski1-0/+57
ethtool -i has a few fixed-size fields which can be used to report firmware version and expansion ROM version. Unfortunately, modern hardware has more firmware components. There is usually some datapath microcode, management controller, PXE drivers, and a CPLD load. Running ethtool -i on modern controllers reveals the fact that vendors cram multiple values into firmware version field. Here are some examples from systems I could lay my hands on quickly: tg3: "FFV20.2.17 bc 5720-v1.39" i40e: "6.01 0x800034a4 1.1747.0" nfp: "0.0.3.5 0.25 sriov-2.1.16 nic" Add a new devlink API to allow retrieving multiple versions, and provide user-readable name for those versions. While at it break down the versions into three categories: - fixed - this is the board/fixed component version, usually vendors report information like the board version in the PCI VPD, but it will benefit from naming and common API as well; - running - this is the running firmware version; - stored - this is firmware in the flash, after firmware update this value will reflect the flashed version, while the running version may only be updated after reboot. v3: - add per-type helpers instead of using the special argument (Jiri). RFCv2: - remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now versions are mixed with other info attrs)l - have the driver report versions from the same callback as other info. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01devlink: add device information APIJakub Kicinski1-0/+112
ethtool -i has served us well for a long time, but its showing its limitations more and more. The device information should also be reported per device not per-netdev. Lay foundation for a simple devlink-based way of reading device info. Add driver name and device serial number as initial pieces of information exposed via this new API. v3: - rename helpers (Jiri); - rename driver name attr (Jiri); - remove double spacing in commit message (Jiri). RFC v2: - wrap the skb into an opaque structure (Jiri); - allow the serial number of be any length (Jiri & Andrew); - add driver name (Jonathan). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01ipconfig: add carrier_timeout kernel parameterMartin Kepplinger1-5/+22
commit 3fb72f1e6e61 ("ipconfig wait for carrier") added a "wait for carrier" policy, with a fixed worst case maximum wait of two minutes. Now make the wait for carrier timeout configurable on the kernel commandline and use the 120s as the default. The timeout messages introduced with commit 5e404cd65860 ("ipconfig: add informative timeout messages while waiting for carrier") are done in a fixed interval of 20 seconds, just like they were before (240/12). Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01ipv4: fib: use struct_size() in kzalloc()Gustavo A. R. Silva1-1/+1
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>