Age | Commit message (Collapse) | Author | Files | Lines |
|
Despite recent improvements, the handling of error codes and return
values at reception of messages in the socket layer is still confusing.
In this commit, we try to make it more comprehensible. First, we
separate between the return values coming from the functions called
by tipc_sk_rcv(), -those are TIPC specific error codes, and the
return values returned by tipc_sk_rcv() itself. Second, we don't
use the returned TIPC error code as indication for whether a buffer
should be forwarded/rejected or not; instead we use the buffer pointer
passed along with filter_msg(). This separation is necessary because
we sometimes want to forward messages even when there is no error
(i.e., protocol messages and successfully secondary looked up data
messages).
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The most common usage of namespace information is when we fetch the
own node addess from the net structure. This leads to a lot of
passing around of a parameter of type 'struct net *' between
functions just to make them able to obtain this address.
However, in many cases this is unnecessary. The own node address
is readily available as a member of both struct tipc_sock and
tipc_link, and can be fetched from there instead.
The fact that the vast majority of functions in socket.c and link.c
anyway are maintaining a pointer to their respective base structures
makes this option even more compelling.
In this commit, we introduce the inline functions tsk_own_node()
and link_own_node() to make it easy for functions to fetch the node
address from those structs instead of having to pass along and
dereference the namespace struct.
In particular, we make calls to the msg_xx() functions in msg.{h,c}
context independent by directly passing them the own node address
as parameter when needed. Those functions should be regarded as
leaves in the code dependency tree, and it is hence desirable to
keep them namspace unaware.
Apart from a potential positive effect on cache behavior, these
changes make it easier to introduce the changes that will follow
later in this series.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass the static attribute groups and the driver data via
tty_port_register_device_attr() instead of manual device_create_file()
and device_remove_file() calls.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Bas Peters <baspeters93@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Bas Peters <baspeters93@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Bas Peters <baspeters93@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes following sparse warnings :
net/ipv6/sit.c:1509:32: warning: incorrect type in assignment (different base types)
net/ipv6/sit.c:1509:32: expected restricted __be16 [usertype] sport
net/ipv6/sit.c:1509:32: got unsigned short
net/ipv6/sit.c:1514:32: warning: incorrect type in assignment (different base types)
net/ipv6/sit.c:1514:32: expected restricted __be16 [usertype] dport
net/ipv6/sit.c:1514:32: got unsigned short
net/ipv6/sit.c:1711:38: warning: incorrect type in argument 3 (different base types)
net/ipv6/sit.c:1711:38: expected unsigned short [unsigned] [usertype] value
net/ipv6/sit.c:1711:38: got restricted __be16 [usertype] sport
net/ipv6/sit.c:1713:38: warning: incorrect type in argument 3 (different base types)
net/ipv6/sit.c:1713:38: expected unsigned short [unsigned] [usertype] value
net/ipv6/sit.c:1713:38: got restricted __be16 [usertype] dport
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types)
include/net/ipv6.h:713:22: expected restricted __be32 [usertype] hash
include/net/ipv6.h:713:22: got unsigned int
include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer
include/net/ipv6.h:719:22: warning: invalid assignment: ^=
include/net/ipv6.h:719:22: left side has type restricted __be32
include/net/ipv6.h:719:22: right side has type unsigned int
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
netdev_adjacent_add_links() and netdev_adjacent_del_links()
are static.
queue->qdisc has __rcu annotation, need to use RCU_INIT_POINTER()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(struct flow_keys)->n_proto is in network order, use
proper type for this.
Fixes following sparse errors :
net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:139:39: expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:139:39: got restricted __be16 [assigned] [usertype] proto
net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:237:23: expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:237:23: got restricted __be16 [assigned] [usertype] proto
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This allows for a VXLAN-GBP socket to talk to a Linux VXLAN socket by
not setting any of the bits.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
info is in network byte order, change it back to host byte order
before use. In particular, the current code sets the MTU of the tunnel
to a wrong (too big) value.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of manual calls of device_create_file() and
device_remove_files(), assign the static attribute groups to netdev
groups array. This simplifies the code and avoids the possible
races.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of manual calls of device_create_file() and
device_remove_files(), assign the static attribute groups to netdev
groups array. This simplifies the code and avoids the possible
races.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The functions kfree() and vfree() perform also input parameter validation.
Thus the test around their calls is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
this patch fixes following sparse warning:
cpsw-common.c:23:5: warning: symbol 'cpsw_am33xx_cm_get_macid' was not declared. Should it be static?
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After d75b1ade567f ("net: less interrupt masking in NAPI")
driver's NAPI poll routine is expected to return
exact budget value if it wants to be re-called.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Fixes: d75b1ade567f ("net: less interrupt masking in NAPI")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The release_firmware() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
cxgb_busy_poll, corresponding to ndo_busy_poll, gets called by the socket
waiting for data.
With busy_poll enabled, improvement is seen in latency numbers as observed by
collecting netperf TCP_RR numbers.
Below are latency number, with and without busy-poll, in a switched environment
for a particular msg size:
netperf command: netperf -4 -H <ip> -l 30 -t TCP_RR -- -r1,1
Latency without busy-poll: ~16.25 us
Latency with busy-poll : ~08.79 us
Based on original work by Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 9a05dde59a35eee5643366d3d1e1f43fc9069adb.
Requested by Scott Feldman.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FQ has a fast path for skb attached to a socket, as it does not
have to compute a flow hash. But for other packets, FQ being non
stochastic means that hosts exposed to random Internet traffic
can allocate million of flows structure (104 bytes each) pretty
easily. Not only host can OOM, but lookup in RB trees can take
too much cpu and memory resources.
This patch adds a new attribute, orphan_mask, that is adding
possibility of having a stochastic hash for orphaned skb.
Its default value is 1024 slots, to mimic SFQ behavior.
Note: This does not apply to locally generated TCP traffic,
and no locally generated traffic will share a flow structure
with another perfect or stochastic flow.
This patch also handles the specific case of SYNACK messages:
They are attached to the listener socket, and therefore all map
to a single hash bucket. If listener have set SO_MAX_PACING_RATE,
hoping to have new accepted socket inherit this rate, SYNACK
might be paced and even dropped.
This is very similar to an internal patch Google have used more
than one year.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we added pacing to TCP, we decided to let sch_fq take care
of actual pacing.
All TCP had to do was to compute sk->pacing_rate using simple formula:
sk->pacing_rate = 2 * cwnd * mss / rtt
It works well for senders (bulk flows), but not very well for receivers
or even RPC :
cwnd on the receiver can be less than 10, rtt can be around 100ms, so we
can end up pacing ACK packets, slowing down the sender.
Really, only the sender should pace, according to its own logic.
Instead of adding a new bit in skb, or call yet another flow
dissection, we tweak skb->truesize to a small value (2), and
we instruct sch_fq to use new helper and not pace pure ack.
Note this also helps TCP small queue, as ack packets present
in qdisc/NIC do not prevent sending a data packet (RPC workload)
This helps to reduce tx completion overhead, ack packets can use regular
sock_wfree() instead of tcp_wfree() which is a bit more expensive.
This has no impact in the case packets are sent to loopback interface,
as we do not coalesce ack packets (were we would detect skb->truesize
lie)
In case netem (with a delay) is used, skb_orphan_partial() also sets
skb->truesize to 1.
This patch is a combination of two patches we used for about one year at
Google.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch gets rid of the manual rhashtable walk in nft_hash
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.
Note that I'm leaving nft_hash_destroy alone since it's only
invoked on shutdown and it shouldn't be affected by changes
to rhashtable internals (or at least not what I'm planning to
change).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch gets rid of the manual rhashtable walk in netlink
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.
In fact the existing code was very buggy. Some sockets weren't
shown at all while others were shown more than once.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some existing rhashtable users get too intimate with it by walking
the buckets directly. This prevents us from easily changing the
internals of rhashtable.
This patch adds the helpers rhashtable_walk_init/exit/start/next/stop
which will replace these custom walkers.
They are meant to be usable for both procfs seq_file walks as well
as walking by a netlink dump. The iterator structure should fit
inside a netlink dump cb structure, with at least one element to
spare.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current being_destroyed check in rhashtable_expand is not
enough since if we start a shrinking process after freeing all
elements in the table that's also going to crash.
This patch adds a being_destroyed check to the deferred worker
thread so that we bail out as soon as we take the lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The RSS support requires enablement based on the features reported by
the hardware. The setting of this flag is missing. Add support to
set the RSS enablement flag based on the reported hardware features.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The functions cpsw_ale_destroy() and of_dev_put() test whether their argument
is NULL and then return immediately. Thus the test around the call
is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The of_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The number of traffic classes reported by the hardware is zero-based
so increment the value returned to get an actual count.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In tcf_exts_dump_stats(), ensure that exts->actions is not empty before
accessing the first element of that list and calling tcf_action_copy_stats()
on it. This fixes some random segvs when adding filters of type "basic" with
no particular action.
This also fixes the dumping of those "no-action" filters, which more often
than not made calls to tcf_action_copy_stats() fail and consequently netlink
attributes added by the caller to be removed by a call to nla_nest_cancel().
Fixes: 33be62715991 ("net_sched: act: use standard struct list_head")
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Configuring fq with quantum 0 hangs the system, presumably because of a
non-interruptible infinite loop. Either way quantum 0 does not make sense.
Reproduce with:
sudo tc qdisc add dev lo root fq quantum 0 initial_quantum 0
ping 127.0.0.1
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit [8975626ea35a: drm/cirrus: allow 32bpp framebuffers for
cirrus drm] broke X modesetting driver because cirrus driver still
provides the full list of modes up to 1280x1024 while the 32bpp can
support only up to 800x600.
We might be able to filter out the invalid modes in mode_valid
callback, but unfortunately the bpp in question can't be referred
there for now (let me know if there is a better way to retrieve the
bpp for the probed fb).
So, instead, this patch adds the bpp module option to specify the
maximal bpp explicitly and limits the resolutions in get_modes
depending on its value.
The default value is set to 24 so that the existing stuff keeps
working. If you need a new 32bpp feature, specify cirrus.bpp=32
option explicitly.
Fixes: 8975626ea35a ('drm/cirrus: allow 32bpp framebuffers for cirrus drm')
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Notify users when TX vlan offload feature changed with ethtool.
Relevant command - ethtool -K <eth> txvlan on/off.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch improves memory utilization and therefore the packets rate
for special MTU's. Instead of setting the frag_stride to the maximal
hard coded frag_size, use the actual frag_size that is set according to
the MTU, when setting the stride of the last frag.
So, for example, for MTU 1600, where the frag_size of the 2nd frag is
86, the frag_size is set to 128 instead of 4096. See below:
Before:
frag:0 - size:1536 prefix:0 stride:1536
frag:1 - size:86 prefix:1536 stride:4096
frag 0 allocator: - size:32768 frags:21
frag 1 allocator: - size:32768 frags:8
After:
frag:0 - size:1536 prefix:0 stride:1536
frag:1 - size:86 prefix:1536 stride:128
frag 0 allocator: - size:32768 frags:21
frag 1 allocator: - size:32768 frags:256
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After Initialization of page_alloc, print actual allocated page
size and number of frags it contains. prints is done only when drv
message level is set on the interface.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Align the IDs in the code with the modinfo, lspci -n, etc tools outputs.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We do support cache line sizes of 32 and 64 bytes without activating the
CQE stride feature. Fix a misleading print saying that these cache line
sizes aren't supported.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add Initialization to struct config_dev before filling and using it.
Fix to warning:
warning: config_dev.rx_checksum_val may be used uninitialized in this function
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
a) Previously, mlx4_mr_rereg_write filled the MPT's start
and length with the old MPT's values.
Fixing the initialization to take the new start and length.
b) In addition access flags in mpt_status were initialized instead of
status due to bad boolean operation. Fixing the operation.
c) Initialization of pd_slave caused a protection error.
Fix - removing this initialization.
d) In resource_tracker.c: Fixing vf encoding to be one-based.
Fixes: e630664c ('mlx4_core: Add helper functions to support MR re-registration')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the mlx4 IB (RoCE) device works in link aggregation mode, it
exposes a single port to upper layers. Therefore, applications always
set '1' in port_num attribute when modifying a QP or creating an address handle.
To make sure that a node uses all available ports the mlx4 driver will
override the port_num attribute with a round robin policy.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In port aggregation mode flows for port #1 (the only port) should be mirrored
on port #2. This is because packets can arrive from either physical ports.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register the interface with the mlx4 core driver with port aggregation support
and check for port aggregation mode when the 'add' function is called.
In this mode, only one physical port is reported to the upper layer
(RoCE/IB core stack and ULPs).
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This function is implemented twice... get rid of one copy.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Capture NETDEV events generated by the bonding driver and based on that
make decisions of how to configure port aggregation in the mlx4 core driver.
This includes setting the V2P port table and re-creating the interested
interfaces in bonded/non-bonded mode.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Supply interface functions to bond and unbond ports of a mlx4 internal
interfaces. Example for such an interface is the one registered by the
mlx4 IB driver under RoCE.
There are
1. Functions to go in/out to/from bonded mode
2. Function to remap virtual ports to physical ports
The bond_mutex prevents simultaneous access to data that keep status of
the device in bonded mode.
The upper mlx4 interface marks to the mlx4 core module that they
want to be subject for such bonding by setting the MLX4_INTFF_BONDING
flag. Interface which goes to/from bonded mode is re-created.
The mlx4 Ethernet driver does not set this flag when registering the
interface, the IB driver does.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|