|
This is now feasible. We protect the submission queue ring with
->sq_lock, and the completion side with ->cq_lock.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Split the completion of events into a two part process:
1) Reap the events inside the queue lock
2) Complete the events outside the queue lock
Since we never wrap the queue, we can access it locklessly after we've
updated the completion queue head. This patch started off with batching
events on the stack, but with this trick we don't have to. Keith Busch
<keith.busch@intel.com> came up with that idea.
Note that this kills the ->cqe_seen as well. I haven't been able to
trigger any ill effects of this. If we do race with polling every so
often, it should be rare enough NOT to trigger any issues.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: refactored, restored poll early exit optimization]
Signed-off-by: Christoph Hellwig <hch@lst.de>
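
A minimal sketch of the reap/complete split described above. This is not the driver's code: the ring size, the struct layout, the lock_held flag (a stand-in for the queue spinlock) and the done[] array are assumptions made for illustration. It assumes, as the message says, that the queue is never overrun, so entries between the old and new head stay valid after the lock is dropped.

```c
#include <assert.h>
#include <stdint.h>

#define Q_DEPTH 8               /* hypothetical ring size */

struct cqe {
    uint16_t command_id;
    uint16_t status;            /* bit 0 carries the phase tag */
};

struct queue {
    struct cqe cqes[Q_DEPTH];
    uint16_t cq_head;
    uint8_t cq_phase;
    int lock_held;              /* stand-in for the queue spinlock */
    uint16_t done[Q_DEPTH];     /* command ids handed to the completion path */
    unsigned int nr_done;
};

/* An entry is new when its phase tag matches the phase we expect. */
static int cqe_pending(const struct queue *q)
{
    return (q->cqes[q->cq_head].status & 1) == q->cq_phase;
}

/* Step 1: inside the lock, only advance the head past pending entries. */
static void reap(struct queue *q, uint16_t *start, uint16_t *end)
{
    q->lock_held = 1;
    *start = q->cq_head;
    while (cqe_pending(q)) {
        if (++q->cq_head == Q_DEPTH) {
            q->cq_head = 0;
            q->cq_phase ^= 1;   /* expected phase flips on every wrap */
        }
    }
    *end = q->cq_head;
    q->lock_held = 0;
}

/* Step 2: outside the lock, complete everything in [start, end). */
static void complete(struct queue *q, uint16_t start, uint16_t end)
{
    assert(!q->lock_held);
    while (start != end) {
        assert(q->nr_done < Q_DEPTH);
        q->done[q->nr_done++] = q->cqes[start].command_id;
        if (++start == Q_DEPTH)
            start = 0;
    }
}
```

The only state the second step needs is the pair of head values captured under the lock, which is why no on-stack batch of events is required.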
|
|
We only clear it dynamically in nvme_suspend_queue(). When we do, make
sure to do a full flush so that any nvme_queue_rq() invocation will see it.
Ideally we'd kill this check completely, but we're using it to flush
requests on a dying queue.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We always check the completion queue after submitting, but in my testing
this isn't a win even on DRAM/xpoint devices. In some cases it's
actually worse. Kill it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We always look at the current CQ head and phase, so don't pass these
as separate arguments, and rename the function to nvme_cqe_pending.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We'll need that in the PCIe driver soon as we'll read it straight off the
CQ.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
AER handling expects that a successful return from slot_reset means the
driver made the device functional again. The nvme driver had been using
an asynchronous reset to recover the device, so the device
may still be initializing after control is returned to the
AER handler. This creates problems for subsequent event handling,
causing the initialization to fail.
This patch fixes that by syncing the controller reset before returning
to the AER driver, and reporting the true state of the reset.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199657
Reported-by: Alex Gagniuc <mr.nuke.me@gmail.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Tested-by: Alex Gagniuc <mr.nuke.me@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
It is possible the driver's remove may have freed the controller if
the remove callback is invoked prior to the async_schedule starting
the reset_work. This patch fixes that by holding a reference on the
controller.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
Currently nvme reconfigures discard for every disk revalidation. This
is problematic because any O_WRONLY or O_RDWR open will trigger a
partition scan through udev/systemd, and we will reconfigure discard.
This blows away any user settings, like discard_max_bytes.
Only re-configure the user settable settings if we need to.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[removed redundant queue flag setting]
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
This patch schedules the initial controller reset in an async_domain
so that it can be synchronized from wait_for_device_probe(). This way
the kernel waits for the initial nvme controller scan to complete for
all devices before proceeding with the boot sequence, which may have
nvme dependencies.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
Add a new lightnvm quirk to identify CNEX’s Granby controller.
Signed-off-by: Wei Xu <wxu@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
Add the Seagate Nytro Flash Storage nvme drive to the quirk list for
NVME_QUIRK_DELAY_BEFORE_CHK_RDY. This solves a bug where the drive is
probed on hot-add before the firmware is ready, I/O errors are generated
while reading sector 0, and Linux is "unable to read partition table".
Signed-off-by: Micah Parrish <micah.parrish@hpe.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
Keith reported that command submission and command completion
tracepoints have the order of the cmdid and qid fields swapped.
While it isn't easily possible to change the command submission
tracepoint, as there is a regression test parsing it in blktests we
can swap the command completion tracepoint to have the fields aligned.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
Provide a descriptive error in case an lport to rport association
isn't found when creating the FC-NVME controller.
Currently it's very hard to debug the reason for a failed connect
attempt without a look at the source.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
This reverts commit 37c7c6c76d431dd7ef9c29d95f6052bd425f004c.
It turns out that some drivers (mostly FC drivers) may not use managed
IRQ affinity and have their own customized .map_queues, so keep this
code to avoid regressions.
Reported-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Cc: Ewan Milne <emilne@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As it came up in discussion on the mailing list that the semantic
meaning of 'blk_mq_ctx' and 'blk_mq_hw_ctx' isn't completely
obvious to everyone, let's add some minimal kerneldoc for a
starter.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since Michael had to step back, Coly has agreed to be the new
maintainer. Mark him as such.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Too much to do with other projects. I've enjoyed working with everyone
here, and hope to occasionally contribute on bcache.
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
For the MAC read operation, the device can return up to two (LAN and WoL)
MAC addresses. Without access to adequate memory, the device will return
an error. Fixed this by allocating the right amount of memory. Also, logic
to detect and copy the LAN MAC address into the port_info structure has
been added. Note that the WoL MAC address is ignored currently as the WoL
feature isn't supported yet.
Fixes: dc49c7723676 ("ice: Get MAC/PHY/link info and scheduler topology")
Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The Debian toolchain defaults to PIE, and I guess that will also be the case
of most distributions. This causes the following build failure:
AS arch/riscv/kernel/vdso/getcpu.o
AS arch/riscv/kernel/vdso/flush_icache.o
VDSOLD arch/riscv/kernel/vdso/vdso.so.dbg
OBJCOPY arch/riscv/kernel/vdso/vdso.so
AS arch/riscv/kernel/vdso/vdso.o
VDSOLD arch/riscv/kernel/vdso/vdso-dummy.o
LD arch/riscv/kernel/vdso/vdso-syms.o
riscv64-linux-gnu-ld: attempted static link of dynamic object `arch/riscv/kernel/vdso/vdso-dummy.o'
make[2]: *** [arch/riscv/kernel/vdso/Makefile:43: arch/riscv/kernel/vdso/vdso-syms.o] Error 1
make[1]: *** [scripts/Makefile.build:575: arch/riscv/kernel/vdso] Error 2
make: *** [Makefile:1018: arch/riscv/kernel] Error 2
While the root Makefile correctly passes "-fno-PIE" to build individual
object files, the RISC-V kernel also builds vdso-dummy.o as an
executable, which is therefore linked as PIE. Fix that by updating this
specific link rule to also include "-no-pie".
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
So don't list it as generic-y.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
DMA_DIRECT_OPS is defined in lib/Kconfig, so don't duplicate it in
arch/riscv/Kconfig.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
Associate an arbitrary ID with each ARFS filter, so that expiry can be
queried properly. The association is maintained in a hash table, which is
protected by a spinlock.
v3: fix build warnings when CONFIG_RFS_ACCEL is disabled (thanks lkp-robot).
v2: fixed uninitialised variable (thanks davem and lkp-robot).
Fixes: 3af0f34290f6 ("sfc: replace asynchronous filter operations")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While adding support for ethtool::get_fecparam and set_fecparam, the kernel
doc for these functions was missed. Add it.
Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction modes")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Updates to the bitfields in struct packet_sock are not atomic.
Serialize these read-modify-write cycles.
Move po->running into a separate variable. Its writes are protected by
po->bind_lock (except for one startup case at packet_create). Also
replace a textual precondition warning with lockdep annotation.
All others are set only in packet_setsockopt. Serialize these
updates by holding the socket lock. Analogous to other field updates,
also hold the lock when testing whether a ring is active (pg_vec).
Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg")
Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
Reported-by: Byoungyoung Lee <byoungyoung@purdue.edu>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
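
The lost update that motivates this serialization can be sketched deterministically. The example below is an illustration only, not code from af_packet: the two flag names are hypothetical stand-ins for bitfields that share one storage unit, and the interleaving of two writers is spelled out by hand rather than produced by real threads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags that share one byte, like adjacent bitfields do. */
#define FLAG_A 0x1u
#define FLAG_B 0x2u

/*
 * Two writers each perform load, modify, store on the shared byte.
 * With this interleaving the second store overwrites the first update.
 */
static uint8_t racy_update(void)
{
    uint8_t flags = 0;
    uint8_t a = flags;          /* writer A loads */
    uint8_t b = flags;          /* writer B loads the same old value */

    a |= FLAG_A;                /* A modifies its private copy */
    b |= FLAG_B;                /* B modifies its private copy */
    flags = a;                  /* A stores */
    flags = b;                  /* B stores, FLAG_A is lost */
    return flags;
}

/*
 * With a lock held around each read-modify-write cycle, the second
 * writer cannot load until the first has stored.
 */
static uint8_t serialized_update(void)
{
    uint8_t flags = 0;

    flags |= FLAG_A;            /* writer A, whole cycle under the lock */
    flags |= FLAG_B;            /* writer B, whole cycle under the lock */
    return flags;
}
```

Moving a field that has its own lock into a separate variable removes it from the shared storage unit, so its writers no longer interfere with the others.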
|
|
According to the hardware spec, checking the INTEVENT bit isn't a
reliable way to detect if an OICR interrupt has occurred. This is
because this bit can be cleared by the hardware/firmware before the
interrupt service routine has run. So instead, just check for OICR
events every time.
Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt")
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Action type 5 defines large action generic values. Fix comment to
reflect that better.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
ice_sched_add_nodes_to_layer is used recursively, and so we start
with num_nodes_added being 0. This way, in case of an error or if
num_nodes is NULL, the function just returns 0 to indicate that no
nodes were added.
Fixes: 5513b920a4f7 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
When Qav mode is enabled, queue 0 should be kept on Stream Reservation
mode. From the i210 datasheet, section 8.12.19:
"Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to
Qav." ("QueueMode 1b" represents the Stream Reservation mode)
The solution is to give queue 0 all the credits it might need, so
it has priority over queue 1.
A situation where this can happen is when cbs is "installed" only on
queue 1, leaving queue 0 alone. For example:
$ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
$ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \
hicredit 30 sendslope -980000 idleslope 20000 offload 1
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The current error handling for failed resource setup for xdp_ring
data breaks out of the loop and returns 0, indicating that everything
was OK when in fact it was not. Fix this by exiting via the
error exit label err_setup_tx, which will clean up the resources
correctly and return an error status.
Detected by CoverityScan, CID#1466879 ("Logically dead code")
Fixes: 21092e9ce8b1 ("ixgbevf: Add support for XDP_TX action")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
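
The general shape of this fix, jumping to an unwind label instead of breaking out of the loop, can be sketched as follows. The ring type, the helper names and the fail_at parameter are hypothetical and only serve to make the example testable; this is not the ixgbevf code.

```c
#include <assert.h>
#include <errno.h>

#define NUM_RINGS 4             /* hypothetical ring count */

struct ring {
    int ready;
};

/* Pretend allocation that can be forced to fail. */
static int setup_ring(struct ring *r, int should_fail)
{
    if (should_fail)
        return -ENOMEM;
    r->ready = 1;
    return 0;
}

static void free_ring(struct ring *r)
{
    r->ready = 0;
}

/* Set up all rings, or set up none and report the error. */
static int setup_all_rings(struct ring *rings, int fail_at)
{
    int i, err;

    for (i = 0; i < NUM_RINGS; i++) {
        err = setup_ring(&rings[i], i == fail_at);
        if (err)
            goto err_setup;     /* a plain break would fall through to return 0 */
    }
    return 0;

err_setup:
    /* Unwind the rings that were set up before the failure. */
    while (i--)
        free_ring(&rings[i]);
    return err;
}
```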
|
|
The same fix as in Commit dbe173079ab5 ("bridge: fix netconsole
setup over bridge") is also needed for the team driver.
While at it, remove the unnecessary parameter *team from
team_port_enable_netpoll().
v1->v2:
- fix it in a better way, as does bridge.
Fixes: 0fb52a27a04a ("team: cleanup netpoll clode")
Reported-by: João Avelino Bellomo Filho <jbellomo@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The SFP eeprom indicates the transceiver signals (Rx LOS, Tx Fault, etc.)
that it supports. Update the driver to include checking the eeprom data
when deciding whether to use a transceiver signal.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update xgbe-phy-v2.c to make use of the auto-negotiation (AN) phy hooks
to improve the ability to successfully complete Clause 73 AN when running
at 10gbps. Hardware can sometimes have issues with CDR lock when the
AN DME page exchange is being performed.
The AN and KR training hooks are used as follows:
- The pre AN hook is used to disable CDR tracking in the PHY so that the
DME page exchange can be successfully and consistently completed.
- The post KR training hook is used to re-enable the CDR tracking so that
KR training can successfully complete.
- The post AN hook is used to check for an unsuccessful AN which will
increase a CDR tracking enablement delay (up to a maximum value).
Add two debugfs entries to allow control over use of the CDR tracking
workaround. The debugfs entries allow the CDR tracking workaround to
be disabled and determine whether to re-enable CDR tracking before or
after link training has been initiated.
Also, with these changes the receiver reset cycle that is performed during
the link status check can be performed less often.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add hooks to the driver auto-negotiation (AN) flow to allow the different
phy implementations to perform any steps necessary to improve AN.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We must validate sockaddr_len, otherwise userspace can pass less data
than we expect and we end up accessing invalid data.
Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers")
Reported-by: syzbot+4f03bdf92fdf9ef5ddab@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check sockaddr_len before dereferencing sp->sa_protocol, to ensure that
it actually points to valid data.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Reported-by: syzbot+a70ac890b23b1bf29f5c@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
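
A minimal sketch of the kind of length check these two fixes add. The structure layout and the function name are hypothetical; the real address structures and connect handlers differ. The point is only that the caller-supplied length is compared against the size of the data about to be read before any field is touched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address structure, not the real PPP or L2TP layout. */
struct demo_sockaddr {
    unsigned short sa_family;
    unsigned int sa_protocol;
    unsigned char sa_addr[8];
};

/*
 * Reject short buffers before reading sa_protocol. Without the check,
 * a caller passing sockaddr_len == 2 would make us read past its data.
 */
static int demo_connect(const void *uaddr, size_t sockaddr_len,
                        unsigned int *protocol)
{
    struct demo_sockaddr sa;

    if (sockaddr_len < sizeof(sa))
        return -EINVAL;

    memcpy(&sa, uaddr, sizeof(sa));
    *protocol = sa.sa_protocol;
    return 0;
}
```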
|
|
If a WOL event has happened once, the LED[2] interrupt pin will not be
cleared unless we read the CSISR register. If interrupts are in use,
the normal interrupt handling will clear the WOL event. Let's clear the
WOL event before enabling it if !phy_interrupt_is_valid().
Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KMSAN reported use of uninit-value that I tracked to lack
of proper size check on RTA_TABLE attribute.
I also believe RTA_PREFSRC lacks a similar check.
Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config")
Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After Commit 8a8efa22f51b ("bonding: sync netpoll code with bridge"), it
would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev
if bond->dev->npinfo was set.
However now slave_dev npinfo is set with bond->dev->npinfo before calling
slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called
in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup().
As a result, the lower dev of this slave dev can't set its npinfo.
One way to reproduce it:
# modprobe bonding
# brctl addbr br0
# brctl addif br0 eth1
# ifconfig bond0 192.168.122.1/24 up
# ifenslave bond0 eth2
# systemctl restart netconsole
# ifenslave bond0 br0
# ifconfig eth2 down
# systemctl restart netconsole
The netpoll won't really work.
This patch removes that slave_dev npinfo setting in bond_enslave().
Fixes: 8a8efa22f51b ("bonding: sync netpoll code with bridge")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The old code reads the "opsize" variable from out-of-bounds memory (first
byte behind the segment) if a broken TCP segment ends directly after an
opcode that is neither EOL nor NOP.
The result of the read isn't used for anything, so the worst thing that
could theoretically happen is a pagefault; and since the physmap is usually
mostly contiguous, even that seems pretty unlikely.
The following C reproducer triggers the out-of-bounds read - however, you
can't actually see anything happen unless you put something like a
pr_warn() in tcp_parse_md5sig_option() to print the opsize.
====================================
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <stdlib.h>
#include <errno.h>
#include <stdarg.h>
#include <net/if.h>
#include <linux/if.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/in.h>
#include <linux/if_tun.h>
#include <err.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <assert.h>
void systemf(const char *command, ...) {
  char *full_command;
  va_list ap;
  va_start(ap, command);
  if (vasprintf(&full_command, command, ap) == -1)
    err(1, "vasprintf");
  va_end(ap);
  printf("systemf: <<<%s>>>\n", full_command);
  system(full_command);
}
char *devname;
int tun_alloc(char *name) {
  int fd = open("/dev/net/tun", O_RDWR);
  if (fd == -1)
    err(1, "open tun dev");
  static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI };
  strcpy(req.ifr_name, name);
  if (ioctl(fd, TUNSETIFF, &req))
    err(1, "TUNSETIFF");
  devname = req.ifr_name;
  printf("device name: %s\n", devname);
  return fd;
}
#define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24))
void sum_accumulate(unsigned int *sum, void *data, int len) {
  /* the loop sums 16-bit words, so len must be even */
  assert((len&1)==0);
  for (int i=0; i<len/2; i++) {
    *sum += ntohs(((unsigned short *)data)[i]);
  }
}
unsigned short sum_final(unsigned int sum) {
  sum = (sum >> 16) + (sum & 0xffff);
  sum = (sum >> 16) + (sum & 0xffff);
  return htons(~sum);
}
void fix_ip_sum(struct iphdr *ip) {
  unsigned int sum = 0;
  sum_accumulate(&sum, ip, sizeof(*ip));
  ip->check = sum_final(sum);
}
void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) {
  unsigned int sum = 0;
  struct {
    unsigned int saddr;
    unsigned int daddr;
    unsigned char pad;
    unsigned char proto_num;
    unsigned short tcp_len;
  } fakehdr = {
    .saddr = ip->saddr,
    .daddr = ip->daddr,
    .proto_num = ip->protocol,
    .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4)
  };
  sum_accumulate(&sum, &fakehdr, sizeof(fakehdr));
  sum_accumulate(&sum, tcp, tcp->doff*4);
  tcp->check = sum_final(sum);
}
int main(void) {
  int tun_fd = tun_alloc("inject_dev%d");
  systemf("ip link set %s up", devname);
  systemf("ip addr add 192.168.42.1/24 dev %s", devname);
  struct {
    struct iphdr ip;
    struct tcphdr tcp;
    unsigned char tcp_opts[20];
  } __attribute__((packed)) syn_packet = {
    .ip = {
      .ihl = sizeof(struct iphdr)/4,
      .version = 4,
      .tot_len = htons(sizeof(syn_packet)),
      .ttl = 30,
      .protocol = IPPROTO_TCP,
      /* FIXUP check */
      .saddr = IPADDR(192,168,42,2),
      .daddr = IPADDR(192,168,42,1)
    },
    .tcp = {
      .source = htons(1),
      .dest = htons(1337),
      .seq = 0x12345678,
      .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4,
      .syn = 1,
      .window = htons(64),
      .check = 0 /*FIXUP*/
    },
    .tcp_opts = {
      /* INVALID: trailing MD5SIG opcode after NOPs */
      1, 1, 1, 1, 1,
      1, 1, 1, 1, 1,
      1, 1, 1, 1, 1,
      1, 1, 1, 1, 19
    }
  };
  fix_ip_sum(&syn_packet.ip);
  fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp);
  while (1) {
    int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet));
    if (write_res != sizeof(syn_packet))
      err(1, "packet write failed");
  }
}
====================================
Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
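
For illustration, a standalone sketch of an option walk that checks the remaining length before reading the size byte. It is modelled on the description above rather than copied from the kernel; the function name and the OPT_* constants are assumptions made for this example (19 and 18 are the MD5 signature option kind and length from RFC 2385).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL        0
#define OPT_NOP        1
#define OPT_MD5SIG     19
#define OPTLEN_MD5SIG  18

/*
 * Walk a TCP option area of `length` bytes and return a pointer to the
 * MD5 signature payload, or NULL if there is none or the options are
 * malformed.
 */
static const uint8_t *find_md5_option(const uint8_t *ptr, int length)
{
    while (length > 0) {
        int opcode = *ptr++;
        int opsize;

        switch (opcode) {
        case OPT_EOL:
            return NULL;
        case OPT_NOP:
            length--;
            continue;
        default:
            /* The size byte must lie inside the option area. */
            if (length < 2)
                return NULL;
            opsize = *ptr++;
            if (opsize < 2 || opsize > length)
                return NULL;
            if (opcode == OPT_MD5SIG)
                return opsize == OPTLEN_MD5SIG ? ptr : NULL;
        }
        ptr += opsize - 2;
        length -= opsize;
    }
    return NULL;
}
```

With the option bytes from the reproducer (nineteen NOPs followed by a lone 19), the walk stops at the `length < 2` check and never touches the byte behind the segment.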
|
|
Postpone calling virt_to_page() translation on memory locations not
guaranteed to be backed by a struct page. Try first to map memory from
the device coherent memory pool, then perform translation if that fails.
On some architectures, specifically SH when configured with the SPARSEMEM
memory model, assuming that a struct page is always assigned to a memory
address leads to unexpected hangs during the virtual to page address
translation. This patch fixes that specific issue but applies in the
general case too.
Suggested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The use of "correctly mapped" here is misleading, since it can give the
wrong expectation in the case that the memory *should* have been mapped
from the per-device pool, but doing so failed for other reasons.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
When an allocation with a lower dma_coherent mask fails, dma_direct_alloc()
retries the allocation with GFP_DMA. But this is useless for
architectures that have no ZONE_DMA.
Fix it by checking CONFIG_ZONE_DMA before retrying the
allocation.
Fixes: 95f183916d4b ("dma-direct: retry allocations using GFP_DMA for small masks")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
Avoid using the value stored in the login response buffer when
cleaning TX and RX buffer pools since it could be inconsistent
depending on the device state. Instead use the field in the driver's
private data that tracks the number of active pools.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch checks that the sk buffer data is available before dereferencing
the ife header. If not, NULL is returned to signal a malformed ife packet.
This avoids crashing the kernel from outside.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Reviewed-by: Yotam Gigi <yotam.gi@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is currently no check for an invalid tlv length. This
patch adds such handling to avoid killing the kernel with a malformed
ife packet.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Reviewed-by: Yotam Gigi <yotam.gi@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to record stats for received metadata that we don't know how
to process. Have find_decode_metaid() return -ENOENT to capture this.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Reviewed-by: Yotam Gigi <yotam.gi@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
struct sock's sk_rcvtimeo is initialized to
LONG_MAX/MAX_SCHEDULE_TIMEOUT in sock_init_data. Calling
mod_delayed_work with a timeout of LONG_MAX causes spurious execution of
the work function. timer->expires is set equal to jiffies + LONG_MAX.
When timer_base->clk falls behind the current value of jiffies,
the delta between timer_base->clk and jiffies + LONG_MAX causes the
expiration to be in the past. Returning early from strp_start_timer if
timeo == LONG_MAX solves this problem.
Found while testing net/tls_sw recv path.
Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
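
The arithmetic behind the spurious expiry can be sketched in userspace. This is an illustration under stated assumptions, not kernel code: expiry_in_past() models a wrap-safe comparison of an expiry against the timer base clock, and start_timer() is a hypothetical stand-in for the early return added to strp_start_timer.

```c
#include <assert.h>
#include <limits.h>

/*
 * Wrap-safe check: the expiry counts as "in the past" when the unsigned
 * distance from clk to expires exceeds LONG_MAX, which is the case where
 * the same distance interpreted as a signed long would be negative.
 */
static int expiry_in_past(unsigned long clk, unsigned long expires)
{
    return expires - clk > (unsigned long)LONG_MAX;
}

/*
 * Arm a timer only for finite timeouts. Returns 1 and sets *expires if
 * a timer was armed, 0 if the timeout means "wait forever".
 */
static int start_timer(unsigned long now, long timeo, unsigned long *expires)
{
    if (timeo == LONG_MAX)
        return 0;
    *expires = now + (unsigned long)timeo;
    return 1;
}
```

If the base clock equals the current tick count, an expiry of now + LONG_MAX is still just in the future. As soon as the base clock lags by even one tick, the distance exceeds LONG_MAX and the timer looks expired.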
|
|
In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src()
in order to set the src addr of the outer IPv6 header.
The net_device is required for set_tun_src(). However, calling ip6_dst_idev()
on the dst_entry in case of IPv4 traffic results in the following bug.
Using just dst->dev should fix this BUG.
[ 196.242461] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 196.242975] PGD 800000010f076067 P4D 800000010f076067 PUD 10f060067 PMD 0
[ 196.243329] Oops: 0000 [#1] SMP PTI
[ 196.243468] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd input_leds glue_helper led_class pcspkr serio_raw mac_hid video autofs4 hid_generic usbhid hid e1000 i2c_piix4 ahci pata_acpi libahci
[ 196.244362] CPU: 2 PID: 1089 Comm: ping Not tainted 4.16.0+ #1
[ 196.244606] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 196.244968] RIP: 0010:seg6_do_srh_encap+0x1ac/0x300
[ 196.245236] RSP: 0018:ffffb2ce00b23a60 EFLAGS: 00010202
[ 196.245464] RAX: 0000000000000000 RBX: ffff8c7f53eea300 RCX: 0000000000000000
[ 196.245742] RDX: 0000f10000000000 RSI: ffff8c7f52085a6c RDI: ffff8c7f41166850
[ 196.246018] RBP: ffffb2ce00b23aa8 R08: 00000000000261e0 R09: ffff8c7f41166800
[ 196.246294] R10: ffffdce5040ac780 R11: ffff8c7f41166828 R12: ffff8c7f41166808
[ 196.246570] R13: ffff8c7f52085a44 R14: ffffffffb73211c0 R15: ffff8c7e69e44200
[ 196.246846] FS: 00007fc448789700(0000) GS:ffff8c7f59d00000(0000) knlGS:0000000000000000
[ 196.247286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 196.247526] CR2: 0000000000000000 CR3: 000000010f05a000 CR4: 00000000000406e0
[ 196.247804] Call Trace:
[ 196.247972] seg6_do_srh+0x15b/0x1c0
[ 196.248156] seg6_output+0x3c/0x220
[ 196.248341] ? prandom_u32+0x14/0x20
[ 196.248526] ? ip_idents_reserve+0x6c/0x80
[ 196.248723] ? __ip_select_ident+0x90/0x100
[ 196.248923] ? ip_append_data.part.50+0x6c/0xd0
[ 196.249133] lwtunnel_output+0x44/0x70
[ 196.249328] ip_send_skb+0x15/0x40
[ 196.249515] raw_sendmsg+0x8c3/0xac0
[ 196.249701] ? _copy_from_user+0x2e/0x60
[ 196.249897] ? rw_copy_check_uvector+0x53/0x110
[ 196.250106] ? _copy_from_user+0x2e/0x60
[ 196.250299] ? copy_msghdr_from_user+0xce/0x140
[ 196.250508] sock_sendmsg+0x36/0x40
[ 196.250690] ___sys_sendmsg+0x292/0x2a0
[ 196.250881] ? _cond_resched+0x15/0x30
[ 196.251074] ? copy_termios+0x1e/0x70
[ 196.251261] ? _copy_to_user+0x22/0x30
[ 196.251575] ? tty_mode_ioctl+0x1c3/0x4e0
[ 196.251782] ? _cond_resched+0x15/0x30
[ 196.251972] ? mutex_lock+0xe/0x30
[ 196.252152] ? vvar_fault+0xd2/0x110
[ 196.252337] ? __do_fault+0x1f/0xc0
[ 196.252521] ? __handle_mm_fault+0xc1f/0x12d0
[ 196.252727] ? __sys_sendmsg+0x63/0xa0
[ 196.252919] __sys_sendmsg+0x63/0xa0
[ 196.253107] do_syscall_64+0x72/0x200
[ 196.253305] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 196.253530] RIP: 0033:0x7fc4480b0690
[ 196.253715] RSP: 002b:00007ffde9f252f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 196.254053] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007fc4480b0690
[ 196.254331] RDX: 0000000000000000 RSI: 000000000060a360 RDI: 0000000000000003
[ 196.254608] RBP: 00007ffde9f253f0 R08: 00000000002d1e81 R09: 0000000000000002
[ 196.254884] R10: 00007ffde9f250c0 R11: 0000000000000246 R12: 0000000000b22070
[ 196.255205] R13: 20c49ba5e353f7cf R14: 431bde82d7b634db R15: 00007ffde9f278fe
[ 196.255484] Code: a5 0f b6 45 c0 41 88 41 28 41 0f b6 41 2c 48 c1 e0 04 49 8b 54 01 38 49 8b 44 01 30 49 89 51 20 49 89 41 18 48 8b 83 b0 00 00 00 <48> 8b 30 49 8b 86 08 0b 00 00 48 8b 40 20 48 8b 50 08 48 0b 10
[ 196.256190] RIP: seg6_do_srh_encap+0x1ac/0x300 RSP: ffffb2ce00b23a60
[ 196.256445] CR2: 0000000000000000
[ 196.256676] ---[ end trace 71af7d093603885c ]---
Fixes: 8936ef7604c11 ("ipv6: sr: fix NULL pointer dereference when setting encap source address")
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|