Age | Commit message (Collapse) | Author | Files | Lines |
|
bd_super is only set by get_tree_bdev and mount_bdev, and thus not by
other openers like btrfs or the XFS realtime and log devices, as well as
block devices directly opened from user space. Check bd_openers
instead.
Fixes: 77032ca66f86 ("Return EBUSY from BLKRRPART for mounted whole-dev fs")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In case rdma accept fails at nvmet_rdma_queue_connect(), release work is
scheduled. Later on, a new RDMA CM event may arrive since we didn't
destroy the cm-id and call nvmet_rdma_queue_connect_fail(), which
schedule another release work. This will cause calling
nvmet_rdma_free_queue twice. To fix this we implicitly destroy the cm_id
with non-zero ret code, which guarantees that new rdma_cm events will
not arrive afterwards. Also add a qp pointer to nvmet_rdma_queue
structure, so we can use it when the cm_id pointer is NULL or was
destroyed.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Unburden the drivers from checking if a call to commit_rqs() is valid by
not calling it when there are no requests to commit.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The original patch was to resolve the lldd being able to be unloaded
while being used to talk to the boot device of the system. However, the
end result of the original patch is that any driver unload while a nvme
controller is live via the lldd is now being prohibited. Given the module
reference, the module teardown routine can't be called, thus there's no
way, other than manual actions to terminate the controllers.
Fixes: 863fbae929c7 ("nvme_fc: add module to ops template to allow module references")
Cc: <stable@vger.kernel.org> # v5.4+
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The deadlock combines 4 flows in parallel:
- ns scanning (triggered from reconnect)
- request timeout
- ANA update (triggered from reconnect)
- I/O coming into the mpath device
(1) ns scanning triggers disk revalidation -> update disk info ->
freeze queue -> but blocked, due to (2)
(2) timeout handler reference the g_usage_counter - > but blocks in
the transport .timeout() handler, due to (3)
(3) the transport timeout handler (indirectly) calls nvme_stop_queue() ->
which takes the (down_read) namespaces_rwsem - > but blocks, due to (4)
(4) ANA update takes the (down_write) namespaces_rwsem -> calls
nvme_mpath_set_live() -> which synchronize the ns_head srcu
(see commit 504db087aacc) -> but blocks, due to (5)
(5) I/O came into nvme_mpath_make_request -> took srcu_read_lock ->
direct_make_request > blk_queue_enter -> but blocked, due to (1)
==> the request queue is under freeze -> deadlock.
The fix is making ANA update take a read lock as the namespaces list
is not manipulated, it is just the ns and ns->head that are being
updated (which is protected with the ns->head lock).
Fixes: 0d0b660f214dc ("nvme: add ANA support")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
RDMA_CM_EVENT_ADDR_CHANGE event occur in the case of bonding failover
on normal as well as on listening cm_ids. Hence this event will
immediately trigger a NULL dereference trying to disconnect a queue
for a cm_id that actually belongs to the port.
To fix this we provide a different handler for the listener cm_ids
that will defer a work to disable+(re)enable the port which essentially
destroys and setups another listener cm_id
Reported-by: Alex Lyakas <alex@zadara.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Tested-by: Alex Lyakas <alex@zadara.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
If the backing device for a loop device is itself a block device,
then mirror the "write zeroes" capabilities of the underlying
block device into the loop device. Copy this capability into both
max_write_zeroes_sectors and max_discard_sectors of the loop device.
The reason for this is that REQ_OP_DISCARD on a loop device translates
into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This
presents a consistent interface for loop devices (that discarded data
is zeroed), regardless of the backing device type of the loop device.
There should be no behavior change for loop devices backed by regular
files.
This change fixes blktest block/003, and removes an extraneous
error print in block/013 when testing on a loop device backed
by a block device that does not support discard.
Signed-off-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Gwendal Grignou <gwendal@chromium.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[used updated version of Evan's comment in loop_config_discard()]
[moved backingq to local scope, removed redundant braces]
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Properly plumb out EOPNOTSUPP from loop driver operations, which may
get returned when for instance a discard operation is attempted but not
supported by the underlying block device. Before this change, everything
was reported in the log as an I/O error, which is scary and not
helpful in debugging.
Signed-off-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Gwendal Grignou <gwendal@chromium.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When item release is called, the parent is already null. We need the
parent to pass to nvmet_referral_disable so hook it up to
->disconnect_notify.
Reported-by: Tony Asleson <tasleson@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
If the backing device require stable pages, we need to set it on the
stack mpath device as well. This applies to rdma/fc transports when
doing data integrity and tcp transport calculating digests.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs
don't get offlined while there are active cgwbs on them. However, it
ends up making offlining unordered sometimes causing parents to be
offlined before children.
Let's fix this by making child blkcgs pin the parents' online states.
Note that pin/unpin names are chosen over get/put intentionally
because css uses get/put online for something different.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs
don't get offlined while there are active cgwbs on them. However, it
ends up making offlining unordered sometimes causing parents to be
offlined before children.
To fix it, we want child blkcgs to pin the parents' online states
turning the refcnt into a more generic online pinning mechanism.
In prepartion,
* blkcg->cgwb_refcnt -> blkcg->online_pin
* blkcg_cgwb_get/put() -> blkcg_pin/unpin_online()
* Take them out of CONFIG_CGROUP_WRITEBACK
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If the target misbehaves and sends us unexpected payload we
need to make sure to fail the controller and stop processing
the input stream. We clear the rd_enabled flag and stop
the io_work, but we may still requeue it if we still have pending
sends and then in the next invocation we will process the input
stream as the check is only in the .data_ready upcall.
To fix this we need to make sure not to self-requeue io_work
upon a recv flow error.
This fixes the crash:
nvme nvme2: receive failed: -22
BUG: unable to handle page fault for address: ffffbeb5816c3b48
nvme_ns_head_make_request: 29 callbacks suppressed
block nvme0n5: no usable path - requeuing I/O
block nvme0n5: no usable path - requeuing I/O
block nvme0n7: no usable path - requeuing I/O
block nvme0n7: no usable path - requeuing I/O
block nvme0n3: no usable path - requeuing I/O
block nvme0n3: no usable path - requeuing I/O
block nvme0n3: no usable path - requeuing I/O
block nvme0n7: no usable path - requeuing I/O
block nvme0n3: no usable path - requeuing I/O
block nvme0n3: no usable path - requeuing I/O
#PF: supervisor read access inkernel mode
#PF: error_code(0x0000) - not-present page
PGD 1039157067 P4D 1039157067 PUD 103915a067 PMD 102719f067 PTE 0
Oops: 0000 [#1] SMP PTI
CPU: 8 PID: 411 Comm: kworker/8:1H Not tainted 5.3.0-40-generic #32~18.04.1-Ubuntu
Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
RIP: 0010:nvme_tcp_recv_skb+0x2ae/0xb50 [nvme_tcp]
RSP: 0018:ffffbeb5806cfd10 EFLAGS: 00010246
RAX: ffffbeb5816c3b48 RBX: 00000000000003d0 RCX: 0000000000000008
RDX: 00000000000003d0 RSI: 0000000000000001 RDI: ffff9a3040684b40
RBP: ffffbeb5806cfd90 R08: 0000000000000000 R09: ffffffff946e6900
R10: ffffbeb5806cfce0 R11: 0000000000000001 R12: 0000000000000000
R13: ffff9a2ff86501c0 R14: 00000000000003d0 R15: ffff9a30b85f2798
FS: 0000000000000000(0000) GS:ffff9a30bf800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffbeb5816c3b48 CR3: 000000088400a006 CR4: 00000000003626e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcp_read_sock+0x8c/0x290
? __release_sock+0x9d/0xe0
? nvme_tcp_write_space+0xb0/0xb0 [nvme_tcp]
nvme_tcp_io_work+0x4b4/0x830 [nvme_tcp]
? finish_task_switch+0x163/0x270
process_one_work+0x1fd/0x3f0
worker_thread+0x34/0x410
kthread+0x121/0x140
? process_one_work+0x3f0/0x3f0
? kthread_park+0xb0/0xb0
ret_from_fork+0x35/0x40
Reported-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Every remaining user just has the error case returning -EFAULT.
In fact, the exception was __get_user_asm_nozero(), which was removed in
commit 4b842e4e25b1 ("x86: get rid of small constant size cases in
raw_copy_{to,from}_user()"), and the other __get_user_xyz() macros just
followed suit for consistency.
Fix up some macro whitespace while at it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The last user was removed by commit 4b842e4e25b1 ("x86: get rid of small
constant size cases in raw_copy_{to,from}_user()"). Get rid of the
left-overs before somebody tries to use it again.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In case memory resources for buf were allocated, release them before
return.
Addresses-Coverity-ID: 1492011 ("Resource leak")
Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Included nic tls statistics in ethtool stats.
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix an oops in dsa_port_phylink_mac_change() caused by a combination
of a20f997010c4 ("net: dsa: Don't instantiate phylink for CPU/DSA
ports unless needed") and the net-dsa-improve-serdes-integration
series of patches 65b7a2c8e369 ("Merge branch
'net-dsa-improve-serdes-integration'").
Unable to handle kernel NULL pointer dereference at virtual address 00000124
pgd = c0004000
[00000124] *pgd=00000000
Internal error: Oops: 805 [#1] SMP ARM
Modules linked in: tag_edsa spi_nor mtd xhci_plat_hcd mv88e6xxx(+) xhci_hcd armada_thermal marvell_cesa dsa_core ehci_orion libdes phy_armada38x_comphy at24 mcp3021 sfp evbug spi_orion sff mdio_i2c
CPU: 1 PID: 214 Comm: irq/55-mv88e6xx Not tainted 5.6.0+ #470
Hardware name: Marvell Armada 380/385 (Device Tree)
PC is at phylink_mac_change+0x10/0x88
LR is at mv88e6352_serdes_irq_status+0x74/0x94 [mv88e6xxx]
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A testing message was brought by 13d0f7b814d9 ("net/bpfilter: fix dprintf
usage for /dev/kmsg") but should've been deleted before patch submission.
Although it doesn't cause any harm to the code or functionality itself, it's
totally unpleasant to have it displayed on every loop iteration with no real
use case. Thus remove it unconditionally.
Fixes: 13d0f7b814d9 ("net/bpfilter: fix dprintf usage for /dev/kmsg")
Signed-off-by: Bruno Meneguele <bmeneg@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
fixed-link nodes are treated as PHY nodes by of_mdiobus_child_is_phy().
We must check if the interface is a fixed-link before looking up for PHY
nodes.
Fixes: 7897b071ac3b ("net: macb: convert to phylink")
Tested-by: Cristian Birsan <cristian.birsan@microchip.com>
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KSZ protocol tag is needed by the KSZ DSA drivers.
Fixes: 0b9f9dfbfab4 ("dsa: Allow tag drivers to be built as modules")
Tested-by: Cristian Birsan <cristian.birsan@microchip.com>
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update kselftest help information.
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
In error recovery we might be removing the queue so check we
can actually poll before we do.
Reported-by: Mark Wunderlich <mark.wunderlich@intel.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We cannot look at blk_rq_payload_bytes without first checking
that the request has a mappable physical segments first (e.g.
blk_rq_nr_phys_segments(rq) != 0) and only then to take the
request payload bytes. This caused us to send a wrong sgl to
the target or even dereference a non-existing buffer in case
we actually got to the data send sequence (if it was in-capsule).
Reported-by: Tony Asleson <tasleson@redhat.com>
Suggested-by: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Fix typo in comment: about should be abort
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chiatanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
|
|
Use a semicolon at the end of an assignment expression.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
There's been a longstanding bug of LS completions which freed ls ops,
particularly the disconnect LS, while executing on a work context that
is in the memory being free. Not a good thing to do.
Rework LS handling to make callbacks in the rport context rather than
the ls_request context.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
On a 32-bit kernel, the upper bits of userspace addresses passed via
various ioctls are silently ignored by the nvme driver.
However on a 64-bit kernel running a compat task, these upper bits are
not ignored and are in fact required to be zero for the ioctls to work.
Unfortunately, this difference matters. 32-bit smartctl submits the
NVME_IOCTL_ADMIN_CMD ioctl with garbage in these upper bits because it
seems the pointer value it puts into the nvme_passthru_cmd structure is
sign extended. This works fine on 32-bit kernels but fails on a 64-bit
one because (at least on my setup) the addresses smartctl uses are
consistently above 2G. For example:
# smartctl -x /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.5.11] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Bad address
Since changing 32-bit kernels to actually check all of the submitted
address bits now would break existing userspace, this patch fixes the
compat problem by explicitly zeroing the upper bits in the compat case.
This enables 32-bit smartctl to work on a 64-bit kernel.
Signed-off-by: Nick Bowler <nbowler@draconx.ca>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
In case memory resources for dummy_data were allocated, release them
before return.
Addresses-Coverity-ID: 1491997 ("Resource leak")
Fixes: 7ef19d3b1d5e ("devlink: report error once U32_MAX snapshot ids have been used")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add EHL SGMII 2.5Gbps PCI info and PCI ID
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add EHL PSE0/1 RGMII & SGMII 1Gbps PCI info and PCI ID
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As stmmac_pci.c file is getting bigger and more complex, it is reasonable
to separate all the Intel specific dwmac pci device to a different file.
This move includes Intel Quark, TGL and EHL. A new kernel config
CONFIG_DWMAC_INTEL is introduced and depends on X86. For this initial
patch, all the necessary function such as probe() and exit() are identical
besides the function name.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The port to which the ASP is connected on 7278 is not capable of
processing VLAN tags as part of the Ethernet frame, so allow an user to
configure the egress VLAN policy they want to see applied by purposing
the h_ext.data[1] field. Bit 0 is used to indicate that 0=tagged,
1=untagged.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update relevant code paths to support the programming and matching of
VLAN TCI, this is the only member of the ethtool_flow_ext that we can
match, the switch does not permit matching the VLAN Ethernet Type field.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for matching VLANs, move the writing of CFP_DATA(5) into
the IPv4 and IPv6 slicing logic since they are part of the per-flow
configuration.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We do not currently support matching on FLOW_EXT or FLOW_MAC_EXT, but we
were not checking for those bits being set in the flow specification.
The check for FLOW_EXT and FLOW_MAC_EXT are separated out because a
subsequent commit will add support for matching VLAN TCI which are
covered by FLOW_EXT.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We don't want to enable learning for the ASP port since it only receives
directed traffic, this allows us to bypass ARL-driven forwarding rules
which could conflict with Broadcom tags and/or CFP forwarding.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On 7278, port 7 connects to the ASP which should only receive frames
through the use of CFP rules, it is not desirable to have it be part of
a bridge at all since that would make it pick up unwanted traffic that
it may not even be able to filter or sustain.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On 7278, port 7 of the switch connects to the ASP UniMAC which is not
capable of processing VLAN tagged frames. We can still allow the port to
be part of a VLAN entry, and we may want it to be untagged on egress on
that VLAN because of that limitation.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The first time b53_configure_vlan() is called we have not configured any
VLAN entries yet, since that happens later when interfaces get brought
up. When b53_configure_vlan() is called again from suspend/resume we
need to restore all VLAN entries though.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit f949a12fd697 ("net: dsa: bcm_sf2: fix buffer overflow doing
set_rxnfc") tried to fix the some user controlled buffer overflows in
bcm_sf2_cfp_rule_set() and bcm_sf2_cfp_rule_del() but the fix was using
CFP_NUM_RULES, which while it is correct not to overflow the bitmaps, is
not representative of what the device actually supports. Correct that by
using bcm_sf2_cfp_rule_size() instead.
The latter subtracts the number of rules by 1, so change the checks from
greater than or equal to greater than accordingly.
Fixes: f949a12fd697 ("net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The vzalloc_node(), already rounds the total size to whole pages, and
sizeof(u64) is smaller than sizeof(struct recv_comp_data). So
round_up of recv_completion_cnt is not necessary, and may cause extra
memory allocation.
To save memory, remove this unnecessary round_up for recv_completion_cnt.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add test cases that verify that each registered packet trap policer:
* Honors that imposed limitations of rate and burst size
* Able to police trapped packets to the specified rate
* Able to police trapped packets to the specified burst size
* Able to be unbound from its trap group
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement support for setting of packet trap group parameters by
invoking the trap_group_init() callback with the new parameters.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some packet traps are currently exposed to user space as being member of
"l3_drops" trap group, but internally they are member of a different
group.
Switch these traps to use the correct group so that they are all subject
to the same policer, as exposed to user space.
Set the trap priority of packets trapped due to loopback error during
routing to the lowest priority. Such packets are not routed again by the
kernel and therefore should not mask other traps (e.g., host miss) that
should be routed.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The policer is now initialized as part of the registration with devlink,
so there is no need to initialize it before the registration.
Remove the initialization.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register supported packet trap policers with devlink and implement
callbacks to change their parameters and read their counters.
Prevent user space from passing invalid policer parameters down to the
device by checking their validity and communicating the failure via an
appropriate extack message.
v2:
* Remove the max/min validity checks from __mlxsw_sp_trap_policer_set()
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prepare an array of policer IDs to register with devlink and their
associated parameters.
The array is composed from both policers that are currently bound to
exposed trap groups and policers that are not bound to any trap group.
v2:
* Provide max/min rate/burst size when registering policers
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During initialization the driver configures various packet trap groups
and binds policers to them.
Currently, most of these groups are not exposed to user space and
therefore their policers should not be exposed as well. Otherwise, user
space will be able to alter policer parameters without knowing which
packet traps are policed by the policer.
Use a bitmap to track the used policer IDs so that these policers will
not be registered with devlink in a subsequent patch.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The QoS Policer Configuration Register (QPCR) is used to configure
hardware policers. Extend this register with following fields and
defines which will be used by subsequent patches:
1. Violate counter: reads number of packets dropped by the policer
2. Clear counter: to ensure we start counting from 0
3. Rate and burst size limits
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|