Age | Commit message (Collapse) | Author | Files | Lines |
|
Replace C structure-based XDR decoding with more portable code that
instead uses pointer arithmetic.
This is a refactoring change only.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Clean up: extract the logic to save pages under I/O into a helper to
add a big documenting comment without adding clutter in the send
path.
This is a refactoring change only.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The Send Queue depth is temporarily reduced to 1 SQE per credit. The
new rdma_rw API does an internal computation, during QP creation, to
increase the depth of the Send Queue to handle RDMA Read and Write
operations.
This change has to come before the NFSD code paths are updated to
use the rdma_rw API. Without this patch, rdma_rw_init_qp() increases
the size of the SQ too much, resulting in memory allocation failures
during QP creation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Introduce a helper to DMA-map a reply's transport header before
sending it. This will in part replace the map vector cache.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Clean up: Move the ib_send_wr off the stack, and move common code
to post a Send Work Request into a helper.
This is a refactoring change only.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
consider the sequence of commands:
mkdir -p /import/nfs /import/bind /import/etc
mount --bind / /import/bind
mount --make-private /import/bind
mount --bind /import/etc /import/bind/etc
exportfs -o rw,no_root_squash,crossmnt,async,no_subtree_check localhost:/
mount -o vers=4 localhost:/ /import/nfs
ls -l /import/nfs/etc
You would not expect this to report a stale file handle.
Yet it does.
The manipulations under /import/bind cause the dentry for
/etc to get the DCACHE_MOUNTED flag set, even though nothing
is mounted on /etc. This causes nfsd to call
nfsd_cross_mnt() even though there is no mountpoint. So an
upcall to mountd for "/etc" is performed.
The 'crossmnt' flag on the export of / causes mountd to
report that /etc is exported as it is a descendant of /. It
assumes the kernel wouldn't ask about something that wasn't
a mountpoint. The filehandle returned identifies the
filesystem and the inode number of /etc.
When this filehandle is presented to rpc.mountd, via
"nfsd.fh", the inode cannot be found associated with any
name in /etc/exports, or with any mountpoint listed by
getmntent(). So rpc.mountd says the filehandle doesn't
exist. Hence ESTALE.
This is fixed by teaching nfsd not to trust DCACHE_MOUNTED
too much. It is just a hint, not a guarantee.
Change nfsd_mountpoint() to return '1' for a certain mountpoint,
'2' for a possible mountpoint, and 0 otherwise.
Then change nfsd_crossmnt() to check if follow_down()
actually found a mountpount and, if not, to avoid performing
a lookup if the location is not known to certainly require
an export-point.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
kstrdup() already checks for NULL.
(Brought to our attention by Jason Yann noticing (from sparse output)
that it should have been declared static.)
Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Include <linux/types.h> and consistently use types it provides
to fix the following linux/nfsd/cld.h userspace compilation errors:
/usr/include/linux/nfsd/cld.h:40:2: error: unknown type name 'uint16_t'
uint16_t cn_len; /* length of cm_id */
/usr/include/linux/nfsd/cld.h:46:2: error: unknown type name 'uint8_t'
uint8_t cm_vers; /* upcall version */
/usr/include/linux/nfsd/cld.h:47:2: error: unknown type name 'uint8_t'
uint8_t cm_cmd; /* upcall command */
/usr/include/linux/nfsd/cld.h:48:2: error: unknown type name 'int16_t'
int16_t cm_status; /* return code */
/usr/include/linux/nfsd/cld.h:49:2: error: unknown type name 'uint32_t'
uint32_t cm_xid; /* transaction id */
/usr/include/linux/nfsd/cld.h:51:3: error: unknown type name 'int64_t'
int64_t cm_gracetime; /* grace period start time */
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.
Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long reply
can violate those assumptions. This was observed to cause crashes.
So, insist that the argument not be any longer than we expect.
Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.
As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer. This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative. Add
checks to catch these.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Use a couple shortcuts that will simplify a following bugfix.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.
Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long reply
can violate those assumptions. This was observed to cause crashes.
Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.
So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.
As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.
We may also consider rejecting calls that have any extra garbage
appended. That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
Reducing real_num_tx_queues needs to be in sync with skb queue_mapping
otherwise skbs with queue_mapping greater than real_num_tx_queues
can be sent to the underlying driver and can result in kernel panic.
One such event is running netconsole and enabling VF on the same
device. Or running netconsole and changing number of tx queues via
ethtool on same device.
e.g.
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 0000000000001525
tsk->{mm,active_mm}->pgd = fff800130ff9a000
\|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
kworker/48:1(475): Oops [#1]
CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G OE
4.11.0-rc3-davem-net+ #7
Workqueue: events queue_process
task: fff80013113299c0 task.stack: fff800131132c000
TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y:
00000000 Tainted: G OE
TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]>
g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3:
0000000000000001
g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7:
00000000000000c0
o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3:
0000000000000003
o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc:
000000000049ed94
RPC: <set_next_entity+0x34/0xb80>
l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3:
0000000000000000
l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7:
fff8001fa7605028
i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3:
0000000000000000
i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7:
00000000103fa4b0
I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]>
Call Trace:
[00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
[0000000000998c74] netpoll_start_xmit+0xf4/0x200
[0000000000998e10] queue_process+0x90/0x160
[0000000000485fa8] process_one_work+0x188/0x480
[0000000000486410] worker_thread+0x170/0x4c0
[000000000048c6b8] kthread+0xd8/0x120
[0000000000406064] ret_from_fork+0x1c/0x2c
[0000000000000000] (null)
Disabling lock debugging due to kernel taint
Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
Caller[0000000000998e10]: queue_process+0x90/0x160
Caller[0000000000485fa8]: process_one_work+0x188/0x480
Caller[0000000000486410]: worker_thread+0x170/0x4c0
Caller[000000000048c6b8]: kthread+0xd8/0x120
Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
Caller[0000000000000000]: (null)
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrey Konovalov reported a BUG caused by the ip6mr code which is caused
because we call unregister_netdevice_many for a device that is already
being destroyed. In IPv4's ipmr that has been resolved by two commits
long time ago by introducing the "notify" parameter to the delete
function and avoiding the unregister when called from a notifier, so
let's do the same for ip6mr.
The trace from Andrey:
------------[ cut here ]------------
kernel BUG at net/core/dev.c:6813!
invalid opcode: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Workqueue: netns cleanup_net
task: ffff880069208000 task.stack: ffff8800692d8000
RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
FS: 0000000000000000(0000) GS:ffff88006cb00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
Call Trace:
unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
__raw_notifier_call_chain kernel/notifier.c:394
raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
call_netdevice_notifiers net/core/dev.c:1663
rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
unregister_netdevice_many net/core/dev.c:7880
default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
kthread+0x35e/0x430 kernel/kthread.c:231
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
---[ end trace e0b29c57e9b3292c ]---
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add various related files that have been missing under
BPF entry covering essential parts of its infrastructure
and also add myself as co-maintainer.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If skb_pad() fails then it frees the skb so we should check for errors.
Fixes: bdabad3e363d ("net: Add Qualcomm IPC router")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Maps of per-cpu type have their value element size adjusted to 8 if it
is specified smaller during various map operations.
This makes test_maps as a 32-bit binary fail, in fact the kernel
writes past the end of the value's array on the user's stack.
To be quite honest, I think the kernel should reject creation of a
per-cpu map that doesn't have a value size of at least 8 if that's
what the kernel is going to silently adjust to later.
If the user passed something smaller, it is a sizeof() calcualtion
based upon the type they will actually use (just like in this testcase
code) in later calls to the map operations.
Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
Andrey reported a fault in the IPv6 route code:
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880069809600 task.stack: ffff880062dc8000
RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
RSP: 0018:ffff880062dced30 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
Call Trace:
ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
...
Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
set. Flags passed to the kernel are blindly copied to the allocated
rt6_info by ip6_route_info_create making a newly inserted route appear
as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
and expects rt->dst.from to be set - which it is not since it is not
really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
generates the fault.
Fix by checking for the flag and failing with EINVAL.
Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
pointer") assumes that all SKBs in a frag_list (except maybe the last
one) contain the same amount of GSO payload.
This assumption is not always correct, resulting in the following
warning message in the log:
skb_segment: too many frags
For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
one frag, and some with 2 frags.
After GRO, the frag_list SKBs end up having different amounts of payload.
If this frag_list SKB is then forwarded, the aforementioned assumption
is violated.
Validate the assumption, and fall back to software GSO if it not true.
Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212
Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can use skb_cow_head() to properly deal with clones,
especially the ones coming from TCP stack that allow their head being
modified. This avoids a copy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: cc28a20e77b2 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The icmpv6_param_prob() function already does a kfree_skb(),
this patch removes the duplicate one.
Fixes: 1ababeba4a21f3dba3da3523c670b207fb2feb62 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We want people to report bugs to the netdev list.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 6afcf8ef0ca0 ("mm, compaction: fix NR_ISOLATED_* stats for pfn
based migration") moved the dec_node_page_state() call (along with the
page_is_file_cache() call) to after putback_lru_page().
But page_is_file_cache() can change after putback_lru_page() is called,
so it should be called before putback_lru_page(), as it was before that
patch, to prevent NR_ISOLATE_* stats from going negative.
Without this fix, non-CONFIG_SMP kernels end up hanging in the
while(too_many_isolated()) { congestion_wait() } loop in
shrink_active_list() due to the negative stats.
Mem-Info:
active_anon:32567 inactive_anon:121 isolated_anon:1
active_file:6066 inactive_file:6639 isolated_file:4294967295
^^^^^^^^^^
unevictable:0 dirty:115 writeback:0 unstable:0
slab_reclaimable:2086 slab_unreclaimable:3167
mapped:3398 shmem:18366 pagetables:1145 bounce:0
free:1798 free_pcp:13 free_cma:0
Fixes: 6afcf8ef0ca0 ("mm, compaction: fix NR_ISOLATED_* stats for pfn based migration")
Link: http://lkml.kernel.org/r/1492683865-27549-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ming Ling <ming.ling@spreadtrum.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit 374ad05ab64.
While the patch worked great for userspace allocations, the fact that
softirq loses the per-cpu allocator caused problems. It needs to be
redone taking into account that a separate list is needed for hard/soft
IRQs or alternatively find a cheap way of detecting reentry due to an
interrupt. Both are possible but sufficiently tricky that it shouldn't
be rushed.
Jesper had one method for allowing softirqs but reported that the cost
was high enough that it performed similarly to a plain revert. His
figures for netperf TCP_STREAM were as follows
Baseline v4.10.0 : 60316 Mbit/s
Current 4.11.0-rc6: 47491 Mbit/s
Jesper's patch : 60662 Mbit/s
This patch : 60106 Mbit/s
As this is a regression, I wish to revert to noirq allocator for now and
go back to the drawing board.
Link: http://lkml.kernel.org/r/20170415145350.ixy7vtrzdzve57mh@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Tariq Toukan <ttoukan.linux@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If we have a scheduler attached, blk_mq_tag_to_rq() on the
scheduled tags will return NULL if a request is no longer
in flight. This is different than using the normal tags,
where it will always return the fixed request. Check for
this condition for polling, in case we happen to enter
polling for a completed request.
The request address remains valid, so this check and return
should be perfectly safe.
Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Tested-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
There's a report that it malfunctions with APST on.
See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
I got a couple more reports: the Samsung APST issues appears to
affect multiple 950-series devices in Dell XPS 15 9550 and Precision
5510 laptops. Change the quirk: rather than blacklisting the
firmware on the first problematic SSD that was reported, disable
APST on all 144d:a802 devices if they're installed in the two
affected Dell models. While we're at it, disable only the deepest
sleep state instead of all of them -- the reporters say that this is
sufficient to fix the problem.
(I have a device that appears to be entirely identical to one of the
affected devices, but I have a different Dell laptop, so it's not
the case that all Samsung devices with firmware BXW75D0Q are broken
under all circumstances.)
Samsung engineers have an affected system, and hopefully they'll
give us a better workaround some time soon. In the mean time, this
should minimize regressions.
See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Policing filters do not use the TCA_ACT_* enum and the tb[]
nlattr array in tcf_action_init_1() doesn't get filled for
them so we should not try to look for a TCA_ACT_COOKIE
attribute in the then uninitialized array.
The error handling in cookie allocation then calls
tcf_hash_release() leading to invalid memory access later
on.
Additionally, if cookie allocation fails after an already
existing non-policing filter has successfully been changed,
tcf_action_release() should not be called, also we would
have to roll back the changes in the error handling, so
instead we now allocate the cookie early and assign it on
success at the end.
CVE-2017-7979
Fixes: 1045ba77a596 ("net sched actions: Add support for user cookies")
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change ieee_setpfc() callback implementation to populate traffic class
count with the user provided value.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
qed_dcbnl_get_dcbx() API uses kmalloc in GFT_KERNEL mode. The API gets
invoked in the interrupt context by qed_dcbnl_getdcbx callback. Need
to invoke this kmalloc in atomic mode.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PFC error-mask value is not supported by MFW, but this bit could be
set in the pfc bit-map of the operational parameters if remote device
supports it. These operational parameters are used as basis for
populating the dcbx config parameters. User provided configs will be
applied on top of these parameters and then send them to MFW when
requested. Driver need to clear the error-mask bit before sending the
config parameters to MFW.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some adapters may not publish the max_tc value. Populate the default
value for max_tc field in case the mfw didn't provide one.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver was failing to check that the SKB wasn't cloned
before adding checksum data.
Replace existing handling to extend/copy the header buffer
with skb_cow_head.
Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Mugunthan V N, who was reviewing TI's CPSW driver patches is
not working for TI anymore and wont be reviewing patches for
that driver.
Drop Mugunthan as the maintiainer for this driver.
Grygorii continues to be a reviewer. Dave Miller applies the
patches directly and adding a maintainer is actually
misleading since get_maintainer.pl script stops suggesting
that Dave Miller be copied.
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch is prompted by a static checker warning about a potential
use after free. The concern is that netif_rx_ni() can free "skb" and we
call it twice.
When I look at the commit that added this, it looks like some stray
lines were added accidentally. It doesn't make sense to me that we
would recieve the same data two times. I asked the author but never
recieved a response.
I can't test this code, but I'm pretty sure my patch is correct.
Fixes: 4b063258ab93 ("dp83640: Delay scheduled work.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes an out-of-bounds access in seg6_validate_srh() when the
trailing data is less than sizeof(struct sr6_tlv).
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'psock_fanout' has been failing since commit 4d7b9dc1f36a9 ("tools:
psock_lib: harden socket filter used by psock tests"). That commit
changed the CBPF filter to examine the full ethernet frame, and was
tested on 'psock_tpacket' which uses SOCK_RAW. But 'psock_fanout' was
also using this same CBPF in two places, for filtering and fanout, on a
SOCK_DGRAM socket.
Change 'psock_fanout' to use SOCK_RAW so that the CBPF program used with
SO_ATTACH_FILTER can examine the entire frame. Create a new CBPF
program for use with PACKET_FANOUT_DATA which ignores the header, as it
cannot see the ethernet header.
Tested: Ran tools/testing/selftests/net/psock_{fanout,tpacket} 10 times,
and they all passed.
Fixes: 4d7b9dc1f36a9 ("tools: psock_lib: harden socket filter used by psock tests")
Signed-off-by: 'Mike Maloney <maloneykernel@gmail.com>'
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
AP/AP_VLAN modes don't accept any real 802.11 multicast data
frames, but since they do need to accept broadcast management
frames the same is currently permitted for data frames. This
opens a security problem because such frames would be decrypted
with the GTK, and could even contain unicast L3 frames.
Since the spec says that ToDS frames must always have the BSSID
as the RA (addr1), reject any other data frames.
The problem was originally reported in "Predicting, Decrypting,
and Abusing WPA2/802.11 Group Keys" at usenix
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/vanhoef
and brought to my attention by Jouni.
Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
--
Dave, I didn't want to send you a new pull request for a single
commit yet again - can you apply this one patch as is?
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The vectors_per_node is calculated from the remaining available vectors.
The current vector starts after pre_vectors, so we need to subtract that
from the current to properly account for the number of remaining vectors
to assign.
Fixes: 3412386b531 ("irq/affinity: Fix extra vecs calculation")
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Link: http://lkml.kernel.org/r/1492645870-13019-1-git-send-email-keith.busch@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Currently for DDR50 card, it need tuning in default. We meet tuning fail
issue for DDR50 card and some data CRC error when DDR50 sd card works.
This is because the default pad I/O drive strength can't make sure DDR50
card work stable. So increase the pad I/O drive strength for DDR50 card,
and use pins_100mhz.
This fixes DDR50 card support for IMX since DDR50 tuning was enabled from
commit 9faac7b95ea4 ("mmc: sdhci: enable tuning for DDR50")
Tested-and-reported-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Cc: stable@vger.kernel.org # v4.4+
Acked-by: Dong Aisheng <aisheng.dong@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
It apears that devices designed around Wacom's G11 chipset (e.g. Lenovo
ThinkPad Yoga 260, Lenovo ThinkPad X1 Yoga, Dell XPS 12 9250, Dell Venue
8 Pro 5855, etc.) suffer from a common issue in their HID descriptors.
The logical maximum is not updated for the "Contact Identifier" usage,
leaving it as just "1" despite these devices being capable of tracking
far more touches.
Commit 60a221869803 began ignoring usages with out-of-range values,
causing problems for devices based on this chipset. Touches after
the first will have an out-of-range Contact Identifier, and ignoring
that usage will cause the kernel to incorrectly slot each finger's
events (along with all the knock-on userspace effects that entails).
This commit checks for these buggy descriptors and updates the maximum
where required. Prior chipsets have used "255" as the maximum (and the
G11, at least, doesn't seem to actually use IDs outside the range of
1..CONTACTMAX) so continue using this value.
Cc: stable@vger.kernel.org
Fixes: 60a221869803 ("HID: wacom: generic: add support for touchring")
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
I noticed that reading the snapshot file when it is empty no longer gives a
status. It suppose to show the status of the snapshot buffer as well as how
to allocate and use it. For example:
># cat snapshot
# tracer: nop
#
#
# * Snapshot is allocated *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')
But instead it just showed an empty buffer:
># cat snapshot
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
What happened was that it was using the ring_buffer_iter_empty() function to
see if it was empty, and if it was, it showed the status. But that function
was returning false when it was empty. The reason was that the iter header
page was on the reader page, and the reader page was empty, but so was the
buffer itself. The check only tested to see if the iter was on the commit
page, but the commit page was no longer pointing to the reader page, but as
all pages were empty, the buffer is also.
Cc: stable@vger.kernel.org
Fixes: 651e22f2701b ("ring-buffer: Always reset iterator to reader page")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Andrey reported a use-after-free in __ns_get_path():
spin_lock include/linux/spinlock.h:299 [inline]
lockref_get_not_dead+0x19/0x80 lib/lockref.c:179
__ns_get_path+0x197/0x860 fs/nsfs.c:66
open_related_ns+0xda/0x200 fs/nsfs.c:143
sock_ioctl+0x39d/0x440 net/socket.c:1001
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
We are under rcu read lock protection at that point:
rcu_read_lock();
d = atomic_long_read(&ns->stashed);
if (!d)
goto slow;
dentry = (struct dentry *)d;
if (!lockref_get_not_dead(&dentry->d_lockref))
goto slow;
rcu_read_unlock();
but don't use a proper RCU API on the free path, therefore a parallel
__d_free() could free it at the same time. We need to mark the stashed
dentry with DCACHE_RCUACCESS so that __d_free() will be called after all
readers leave RCU.
Fixes: e149ed2b805f ("take the targets of /proc/*/ns/* symlinks to separate fs")
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Geert has reported a freeze during PM resume and some additional
debugging has shown that the device_resume worker cannot make a forward
progress because it waits for an event which is stuck waiting in
drain_all_pages:
INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
Not tainted 4.11.0-rc7-koelsch-00029-g005882e53d62f25d-dirty #3476
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u4:0 D 0 5 2 0x00000000
Workqueue: events_unbound async_run_entry_fn
__schedule
schedule
schedule_timeout
wait_for_common
dpm_wait_for_superior
device_resume
async_resume
async_run_entry_fn
process_one_work
worker_thread
kthread
[...]
bash D 0 1703 1694 0x00000000
__schedule
schedule
schedule_timeout
wait_for_common
flush_work
drain_all_pages
start_isolate_page_range
alloc_contig_range
cma_alloc
__alloc_from_contiguous
cma_allocator_alloc
__dma_alloc
arm_dma_alloc
sh_eth_ring_init
sh_eth_open
sh_eth_resume
dpm_run_callback
device_resume
dpm_resume
dpm_resume_end
suspend_devices_and_enter
pm_suspend
state_store
kernfs_fop_write
__vfs_write
vfs_write
SyS_write
[...]
Showing busy workqueues and worker pools:
[...]
workqueue mm_percpu_wq: flags=0xc
pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
delayed: drain_local_pages_wq, vmstat_update
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
delayed: drain_local_pages_wq BAR(1703), vmstat_update
Tetsuo has properly noted that mm_percpu_wq is created as WQ_FREEZABLE
so it is frozen this early during resume so we are effectively
deadlocked. Fix this by dropping WQ_FREEZABLE when creating
mm_percpu_wq. We really want to have it operational all the time.
Fixes: ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq")
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|