Age | Commit message (Collapse) | Author | Files | Lines |
|
In theory we could map other things, but there's a reason that function
is called "user_iov". Using anything else (like splice can do) just
confuses it.
Reported-and-tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We accidentally introduced a dereference before the NULL check in
xmit_descs() as part of silencing a GCC warning.
Fixes: 16f46050e709 ("dbri: Fix compiler warning")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The shmem hole punching with fallocate(FALLOC_FL_PUNCH_HOLE) does not
want to race with generating new pages by faulting them in.
However, the wait-queue used to delay the page faulting has a serious
problem: the wait queue head (in shmem_fallocate()) is allocated on the
stack, and the code expects that "wake_up_all()" will make sure that all
the queue entries are gone before the stack frame is de-allocated.
And that is not at all necessarily the case.
Yes, a normal wake-up sequence will remove the wait-queue entry that
caused the wakeup (see "autoremove_wake_function()"), but the key
wording there is "that caused the wakeup". When there are multiple
possible wakeup sources, the wait queue entry may well stay around.
And _particularly_ in a page fault path, we may be faulting in new pages
from user space while we also have other things going on, and there may
well be other pending wakeups.
So despite the "wake_up_all()", it's not at all guaranteed that all list
entries are removed from the wait queue head on the stack.
Fix this by introducing a new wakeup function that removes the list
entry unconditionally, even if the target process had already woken up
for other reasons. Use that "synchronous" function to set up the
waiters in shmem_fault().
This problem has never been seen in the wild afaik, but Dave Jones has
reported it on and off while running trinity. We thought we fixed the
stack corruption with the blk-mq rq_list locking fix (commit
7fe311302f7d: "blk-mq: update hardware and software queues for sleeping
alloc"), but it turns out there was _another_ stack corruptor hiding
in the trinity runs.
Vegard Nossum (also running trinity) was able to trigger this one fairly
consistently, and made us look once again at the shmem code due to the
faults often being in that area.
Reported-and-tested-by: Vegard Nossum <vegard.nossum@oracle.com>.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When using the negative logic (i.e. FLUSH state), after the RQ/SQ reopen
we will have a time interval that the RQ/SQ is not really ready and the
state indicates that its not in FLUSH state because the initial SQ/RQ struct
memory starts as zeros.
Now we changed the state to indicate if the SQ/RQ is opened and we will
set the READY state after finishing preparing all the SQ/RQ resources.
Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close")
Fixes: f2fde18c52a7 ("net/mlx5e: Don't wait for RQ completions on close")
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are doing SQ descriptors cleanup in driver.
Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to do this a couple of steps ahead anyway.
Fixes: d3c9bc2743dc ("net/mlx5e: Added ICO SQs")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In old FWs query ISSI command is not supported and for some of those FWs
it might fail with status other than "MLX5_CMD_STAT_BAD_OP_ERR".
In such case instead of failing the driver load, we will treat any FW
status other than 0 for Query ISSI FW command as ISSI not supported and
assume ISSI=0 (most basic driver/FW interface).
In case of driver syndrom (query ISSI failure by driver) we will fail
driver load.
Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support ConnectX-4
Ethernet functionality')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove duplicate pci dev name printing from mlx5_core_warn/dbg.
Fixes: 5a7883989b1c ('net/mlx5_core: Improve mlx5 messages')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Verify the mlx5_core module parameters by making sure that they are in
the expected range and if they aren't restore them to their default
values.
Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 80cca775cdc4 ("net: fec: cache statistics while device is down")
introduced unconditional statistics-related actions.
However, when driver is compiled with CONFIG_M5272, staticsics-related
definitions do not exist, which results into build errors.
Fix that by adding explicit handling of !defined(CONFIG_M5272) case.
Fixes: 80cca775cdc4 ("net: fec: cache statistics while device is down")
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
OPCODE_COMMON_GET_FN_PRIVILEGES is returning only DEVSEC
privilege (Unrestricted Administrative Privilege) for Lancer NIC functions.
So, driver is failing SET_HSW_CONFIG command, as DEVSEC privilege was not
set in the privilege bitmap. This patch fixes the problem by setting DEVSEC
privilege in SET_HSW_CONFIG’s privilege bitmap.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
pointer to the stack and it will OOPS. Copy the address to the heap
to prevent the crash.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Laura Abbott <labbott@redhat.com>
Reported-by: zbyszek@in.waw.pl
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There have been some reports lately about TCP connection stalls caused
by NIC drivers that aren't setting gso_size on aggregated packets on rx
path. This causes TCP to assume that the MSS is actually the size of the
aggregated packet, which is invalid.
Although the proper fix is to be done at each driver, it's often hard
and cumbersome for one to debug, come to such root cause and report/fix
it.
This patch amends this situation in two ways. First, it adds a warning
on when this situation occurs, so it gives a hint to those trying to
debug this. It also limit the maximum probed MSS to the adverised MSS,
as it should never be any higher than that.
The result is that the connection may not have the best performance ever
but it shouldn't stall, and the admin will have a hint on what to look
for.
Tested with virtio by forcing gso_size to 0.
v2: updated msg per David's suggestion
v3: use skb_iif to find the interface and also log its name, per Eric
Dumazet's suggestion. As the skb may be backlogged and the interface
gone by then, we need to check if the number still has a meaning.
v4: use helper tcp_gro_dev_warn() and avoid pr_warn_once inside __once, per
David's suggestion
Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix a wrong condition preventing the higher net device flags
IFF_LOWER_UP etc to be defined if net/if.h is included before
linux/if.h.
The comment makes it clear the intention was to allow partial
definition with either parts.
This fixes compilation of userspace programs trying to use
IFF_LOWER_UP, IFF_DORMANT or IFF_ECHO.
Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
WR_OSR_LMT and RD_OSR_LMT have a reset value of 1.
Since the reset value wasn't cleared before writing, the value in the
register would be incorrect if specifying an uneven value for
snps,wr_osr_lmt/snps,rd_osr_lmt.
Zero is a valid value for the properties, since the databook specifies:
maximum outstanding requests = WR_OSR_LMT + 1.
We do not want to change the behavior for existing users when the
property is missing. Therefore, default to 1 if the property is missing,
since that is the same as the reset value.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is wrong to schedule a work from sk_destruct using the socket
as the memory reserve because the socket will be freed immediately
after the return from sk_destruct.
Instead we should do the deferral prior to sk_free.
This patch does just that.
Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
File is in uapi directory but not being copied on
make install_headers
Fixes commit 4ec9c8fbbc22 ("netfilter: nft_log: complete
NFTA_LOG_FLAGS attr support").
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes commit 735cffe5d800 ("net_sched: Introduce skbmod action")
Not used by iproute2 but maybe in future.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we unload the ep93xx_eth, whether we have opened the network
interface or not, we will either hit a kernel paging request error, or a
simple NULL pointer de-reference because:
- if ep93xx_open has been called, we have created a valid DMA mapping
for ep->descs, when we call ep93xx_stop, we also call
ep93xx_free_buffers, ep->descs now has a stale value
- if ep93xx_open has not been called, we have a NULL pointer for
ep->descs, so performing any operation against that address just won't
work
Fix this by adding a NULL pointer check for ep->descs which means that
ep93xx_free_buffers() was able to successfully tear down the descriptors
and free the DMA cookie as well.
Fixes: 1d22e05df818 ("[PATCH] Cirrus Logic ep93xx ethernet driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Only the 578xx adapters are capable of configuring UDP ports for
the purpose of tunnelling - doing the same on 577xx might lead to
a firmware assertion.
We're already not claiming support for any related feature for such
devices, but we also need to prevent the configuration of the UDP
ports to the device in this case.
Fixes: f34fa14cc033 ("bnx2x: Add vxlan RSS support")
Reported-by: Anikina Anna <anikina@gmail.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Until interface is up [and assuming ringparams weren't explicitly
configured] when queried for the size of its rings bnx2x would
claim they're the maximal size by default.
That is incorrect as by default the maximal number of buffers would
be equally divided between the various rx rings.
This prevents the user from actually setting the number of elements
on each rx ring to be of maximal size prior to transitioning the
interface into up state.
To fix this, make a rough estimation about the number of buffers.
It wouldn't always be accurate, but it would be much better than
current estimation and would allow users to increase number of
buffers during early initialization of the interface.
Reported-by: Seymour, Shane <shane.seymour@hpe.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In function hfc4s8s_probe(), the value of return variable err should be
negative on failures. However, when the call to request_region() returns
NULL, the value of err is 0. This patch fixes the bug, assigning
"-EBUSY" to err on the path that request_region() fails.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188931
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Macro BNX2X_ALLOC_AND_SET(arr, lbl, func) calls kmalloc() to allocate
memory, and jumps to label "lbl" if the allocation fails. Label "lbl"
first cleans memory and then returns variable rc. Before calling the
macro, the value of variable rc is 0. Because 0 means no error, the
callers of bnx2x_init_firmware() may be misled. This patch fixes the bug,
assigning "-ENOMEM" to rc before calling macro NX2X_ALLOC_AND_SET().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189141
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When calling dma_mapping_error(), the value of return variable rc is 0.
And when the call returns an unexpected value, rc is not set to a
negative errno. Thus, it will return 0 on the error path, and its
callers cannot detect the bug. This patch fixes the bug, assigning
"-ENOMEM" to err.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189041
Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It returns variable "error" when ioremap_nocache() returns a NULL
pointer. The value of "error" is 0 then, which will mislead the callers
to believe that there is no error. This patch fixes the bug, returning
"-ENOMEM".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189021
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the calls to kzalloc() fail, the value of return variable ret may
be 0. 0 means success in this context. This patch fixes the bug,
assigning "-ENOMEM" to ret before calling kzalloc().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188971
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The check of the return value of sock_register() is ineffective.
"if(!err)" seems to be a typo. It is better to propagate the error code
to the callers of caif_sktinit_module(). This patch removes the check
statment and directly returns the result of sock_register().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188751
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prior to commit c0371da6047a ("put iov_iter into msghdr") in v3.19, there
was no check that the iovec contained enough bytes for an ICMP header,
and the read loop would walk across neighboring stack contents. Since the
iov_iter conversion, bad arguments are noticed, but the returned error is
EFAULT. Returning EINVAL is a clearer error and also solves the problem
prior to v3.19.
This was found using trinity with KASAN on v3.18:
BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0
Read of size 8 by task trinity-c2/9623
page:ffffffbe034b9a08 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G BU 3.18.0-dirty #15
Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT)
Call trace:
[<ffffffc000209c98>] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90
[<ffffffc000209e54>] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffc000f18dc4>] dump_stack+0x7c/0xd0 lib/dump_stack.c:50
[< inline >] print_address_description mm/kasan/report.c:147
[< inline >] kasan_report_error mm/kasan/report.c:236
[<ffffffc000373dcc>] kasan_report+0x380/0x4b8 mm/kasan/report.c:259
[< inline >] check_memory_region mm/kasan/kasan.c:264
[<ffffffc00037352c>] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507
[<ffffffc0005b9624>] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15
[< inline >] memcpy_from_msg include/linux/skbuff.h:2667
[<ffffffc000ddeba0>] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674
[<ffffffc000dded30>] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714
[<ffffffc000dc91dc>] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749
[< inline >] __sock_sendmsg_nosec net/socket.c:624
[< inline >] __sock_sendmsg net/socket.c:632
[<ffffffc000cab61c>] sock_sendmsg+0x124/0x164 net/socket.c:643
[< inline >] SYSC_sendto net/socket.c:1797
[<ffffffc000cad270>] SyS_sendto+0x178/0x1d8 net/socket.c:1761
CVE-2016-8399
Reported-by: Qidan He <i@flanker017.me>
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In function lanai_dev_open(), when the call to ioremap() fails, the
value of return variable result is 0. 0 means no error in this context.
This patch fixes the bug, assigning "-ENOMEM" to result when ioremap()
returns a NULL pointer.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188791
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In function lan78xx_probe(), variable ret takes the errno code on
failures. However, when the call to usb_alloc_urb() fails, its value
will keeps 0. 0 indicates success in the context, which is inconsistent
with the execution result. This patch fixes the bug, assigning
"-ENOMEM" to ret when usb_alloc_urb() returns a NULL pointer.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188771
Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Function br_sysfs_addbr() does not set error code when the call
kobject_create_and_add() returns a NULL pointer. It may be better to
return "-ENOMEM" when kobject_create_and_add() fails.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188781
Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding space after switch keyword before open
parenthesis for readability purpose.
This patch fixes the checkpatch.pl warning:
space required before the open parenthesis '('
Signed-off-by: Suraj Deshmukh <surajssd009005@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Function bnxt_hwrm_stat_ctx_alloc() always returns 0, even if the call
to _hwrm_send_message() fails. It may be better to propagate the errors
to the caller of bnxt_hwrm_stat_ctx_alloc().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188661
Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It has been reported that update_suffix can be expensive when it is called
on a large node in which most of the suffix lengths are the same. The time
required to add 200K entries had increased from around 3 seconds to almost
49 seconds.
In order to address this we need to move the code for updating the suffix
out of resize and instead just have it handled in the cases where we are
pushing a node that increases the suffix length, or will decrease the
suffix length.
Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
Reported-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Robert Shearman <rshearma@brocade.com>
Tested-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It wasn't necessary to pass a leaf in when doing the suffix updates so just
drop it. Instead just pass the suffix and work with that.
Since we dropped the leaf there is no need to include that in the name so
the names are updated to node_push_suffix and node_pull_suffix.
Finally I noticed that the logic for pulling the suffix length back
actually had some issues. Specifically it would stop prematurely if there
was a longer suffix, but it was not as long as the original suffix. I
updated the code to address that in node_pull_suffix.
Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
Suggested-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Robert Shearman <rshearma@brocade.com>
Tested-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
In function dcbnl_cee_fill(), returns the value of variable err on
errors. However, on some error paths (e.g. nla put fails), its value may
be 0. It may be better to explicitly set a negative errno to variable
err before returning.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188881
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Boris Zhmurov has reported RCU stalls during the kswapd reclaim:
INFO: rcu_sched detected stalls on CPUs/tasks:
23-...: (22 ticks this GP) idle=92f/140000000000000/0 softirq=2638404/2638404 fqs=23
(detected by 4, t=6389 jiffies, g=786259, c=786258, q=42115)
Task dump for CPU 23:
kswapd1 R running task 0 148 2 0x00000008
Call Trace:
shrink_node+0xd2/0x2f0
kswapd+0x2cb/0x6a0
mem_cgroup_shrink_node+0x160/0x160
kthread+0xbd/0xe0
__switch_to+0x1fa/0x5c0
ret_from_fork+0x1f/0x40
kthread_create_on_node+0x180/0x180
a closer code inspection has shown that we might indeed miss all the
scheduling points in the reclaim path if no pages can be isolated from
the LRU list. This is a pathological case but other reports from Donald
Buczek have shown that we might indeed hit such a path:
clusterd-989 [009] .... 118023.654491: mm_vmscan_direct_reclaim_end: nr_reclaimed=193
kswapd1-86 [001] dN.. 118023.987475: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239830 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118024.320968: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239844 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118024.654375: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239858 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118024.987036: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239872 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118025.319651: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239886 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118025.652248: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239900 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118025.984870: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239914 nr_taken=0 file=1
[...]
kswapd1-86 [001] dN.. 118084.274403: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4241133 nr_taken=0 file=1
this is minute long snapshot which didn't take a single page from the
LRU. It is not entirely clear why only 1303 pages have been scanned
during that time (maybe there was a heavy IRQ activity interfering).
In any case it looks like we can really hit long periods without
scheduling on non preemptive kernels so an explicit cond_resched() in
shrink_node_memcg which is independent on the reclaim operation is due.
Link: http://lkml.kernel.org/r/20161202095841.16648-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Boris Zhmurov <bb@kernelpanic.ru>
Tested-by: Boris Zhmurov <bb@kernelpanic.ru>
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Reported-by: "Christopher S. Aker" <caker@theshore.net>
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg
aware") has made the workingset shadow nodes shrinker memcg aware. The
implementation is not correct though because memcg_kmem_enabled() might
become true while we are doing a global reclaim when the sc->memcg might
be NULL which is exactly what Marek has seen:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000400
IP: [<ffffffff8122d520>] mem_cgroup_node_nr_lru_pages+0x20/0x40
PGD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 60 Comm: kswapd0 Tainted: G O 4.8.10-12.pvops.qubes.x86_64 #1
task: ffff880011863b00 task.stack: ffff880011868000
RIP: mem_cgroup_node_nr_lru_pages+0x20/0x40
RSP: e02b:ffff88001186bc70 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88001186bd20 RCX: 0000000000000002
RDX: 000000000000000c RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88001186bc70 R08: 28f5c28f5c28f5c3 R09: 0000000000000000
R10: 0000000000006c34 R11: 0000000000000333 R12: 00000000000001f6
R13: ffffffff81c6f6a0 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880013c00000(0000) knlGS:ffff880013d00000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000400 CR3: 00000000122f2000 CR4: 0000000000042660
Call Trace:
count_shadow_nodes+0x9a/0xa0
shrink_slab.part.42+0x119/0x3e0
shrink_node+0x22c/0x320
kswapd+0x32c/0x700
kthread+0xd8/0xf0
ret_from_fork+0x1f/0x40
Code: 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 3b 35 dd eb b1 00 55 48 89 e5 73 2c 89 d2 31 c9 31 c0 4c 63 ce 48 0f a3 ca 73 13 <4a> 8b b4 cf 00 04 00 00 41 89 c8 4a 03 84 c6 80 00 00 00 83 c1
RIP mem_cgroup_node_nr_lru_pages+0x20/0x40
RSP <ffff88001186bc70>
CR2: 0000000000000400
---[ end trace 100494b9edbdfc4d ]---
This patch fixes the issue by checking sc->memcg rather than
memcg_kmem_enabled() which is sufficient because shrink_slab makes sure
that only memcg aware shrinkers will get non-NULL memcgs and only if
memcg_kmem_enabled is true.
Fixes: 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware")
Link: http://lkml.kernel.org/r/20161201132156.21450-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
Tested-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When building a specific target such as bzImage, modules aren't normally
built. However if CONFIG_TRIM_UNUSED_KSYMS is enabled, no built modules
means none of the exported symbols are used and therefore they will all
be trimmed away from the final kernel. A subsequent "make modules" will
fail because modpost cannot find the needed symbols for those modules in
the kernel binary.
Let's make sure modules are also built whenever CONFIG_TRIM_UNUSED_KSYMS
is enabled and that the kernel binary is properly rebuilt accordingly.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...
Note that before commit 82981930125a ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.
This needs to be backported to all known linux kernels.
Again, many thanks to syzkaller team for discovering this gem.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
only needed as an argument to ip_tunnel_ecn_encap(), move this
directly in the function call.
Fixes: 08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.
As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.
Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We want to set the other "err" variable here so that we can return it
later. My version of GCC misses this issue but I caught it with a
static checker.
Fixes: 9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When streaming a lot of data and the RZ/A1 can't keep up, some status bits
will get set that are not being checked or cleared which cause the
following messages and the Ethernet driver to stop working. This
patch fixes that issue.
irq 21: nobody cared (try booting with the "irqpoll" option)
handlers:
[<c036b71c>] sh_eth_interrupt
Disabling IRQ #21
Fixes: db893473d313a4ad ("sh_eth: Add support for r7s72100")
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
__bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the
same struct device during unmap that was used for the map operation,
which makes DMA-API debugging warn about it. Fix this by always using
&priv->pdev->dev throughout the driver, using an identical device
reference for all map/unmap calls.
Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ozgur Karatas reported that the very first entry in the CREDITS file had
the wrong tag for name (M: instead of N: - it happened when moving the
entry from the MAINTAINERS file, where 'M:' stands for "Maintainer").
And when I went looking, I found a couple of other cases of wrong
tagging too.
Reported-by: Ozgur Karatas <mueddib@yandex.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch adds support for PID 0x1040 of Telit LE922A.
The qmi adapter requires to have DTR set for proper working,
so QMI_WWAN_QUIRK_DTR has been enabled.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
introduced a work-around in usbnet_cdc_status() for devices that exported
cdc carrier on twice on connect. Before the commit, this behavior caused
the link state to be incorrect. It was assumed that all CDC Ethernet
devices would either export this behavior, or send one off and then one on
notification (which seems to be the default behavior).
Unfortunately, it turns out multiple devices sends a connection
notification multiple times per second (via an interrupt), even when
connection state does not change. This has been observed with several
different USB LAN dongles (at least), for example 13b1:0041 (Linksys).
After bfe9b9d2df66, the link state has been set as down and then up for
each notification. This has caused a flood of Netlink NEWLINK messages and
syslog to be flooded with messages similar to:
cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped
This commit fixes the behavior by reverting usbnet_cdc_status() to how it
was before bfe9b9d2df66. The work-around has been moved to a separate
status-function which is only called when a known, affect device is
detected.
v1->v2:
* Do not open-code netif_carrier_ok() (thanks Henning Schild).
* Call netif_carrier_off() instead of usb_link_change(). This prevents
calling schedule_work() twice without giving the work queue a chance to be
processed (thanks Bjørn Mork).
Fixes: bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
segs needs to be checked for being NULL in ipv6_gso_segment() before calling
skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:
[ 97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
[ 97.819112] IP: [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[ 97.825214] PGD 0 [ 97.827047]
[ 97.828540] Oops: 0000 [#1] SMP
[ 97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5
nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel
snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device
snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport
sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon
broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core
i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod
[ 97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1
[ 97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010
[ 97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000
[ 97.947720] RIP: 0010:[<ffffffff816e52f9>] [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[ 97.956251] RSP: 0018:ffff88012fc43a10 EFLAGS: 00010207
[ 97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594
[ 97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000
[ 97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000
[ 97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3
[ 97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000
[ 97.997198] FS: 0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000
[ 98.005280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0
[ 98.018149] Stack:
[ 98.020157] 00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e
[ 98.027584] 0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3
[ 98.035010] ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98
[ 98.042437] Call Trace:
[ 98.044879] <IRQ> [ 98.046803] [<ffffffffa017ad0a>] ? tg3_start_xmit+0x84a/0xd60 [tg3]
[ 98.053156] [<ffffffff815eeee0>] skb_mac_gso_segment+0xb0/0x130
[ 98.059158] [<ffffffff815eefd3>] __skb_gso_segment+0x73/0x110
[ 98.064985] [<ffffffff815ef40d>] validate_xmit_skb+0x12d/0x2b0
[ 98.070899] [<ffffffff815ef5d2>] validate_xmit_skb_list+0x42/0x70
[ 98.077073] [<ffffffff81618560>] sch_direct_xmit+0xd0/0x1b0
[ 98.082726] [<ffffffff815efd86>] __dev_queue_xmit+0x486/0x690
[ 98.088554] [<ffffffff8135c135>] ? cpumask_next_and+0x35/0x50
[ 98.094380] [<ffffffff815effa0>] dev_queue_xmit+0x10/0x20
[ 98.099863] [<ffffffffa09ce057>] br_dev_queue_push_xmit+0xa7/0x170 [bridge]
[ 98.106907] [<ffffffffa09ce161>] br_forward_finish+0x41/0xc0 [bridge]
[ 98.113430] [<ffffffff81627cf2>] ? nf_iterate+0x52/0x60
[ 98.118735] [<ffffffff81627d6b>] ? nf_hook_slow+0x6b/0xc0
[ 98.124216] [<ffffffffa09ce32c>] __br_forward+0x14c/0x1e0 [bridge]
[ 98.130480] [<ffffffffa09ce120>] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
[ 98.137785] [<ffffffffa09ce4bd>] br_forward+0x9d/0xb0 [bridge]
[ 98.143701] [<ffffffffa09cfbb7>] br_handle_frame_finish+0x267/0x560 [bridge]
[ 98.150834] [<ffffffffa09d0064>] br_handle_frame+0x174/0x2f0 [bridge]
[ 98.157355] [<ffffffff8102fb89>] ? sched_clock+0x9/0x10
[ 98.162662] [<ffffffff810b63b2>] ? sched_clock_cpu+0x72/0xa0
[ 98.168403] [<ffffffff815eccf5>] __netif_receive_skb_core+0x1e5/0xa20
[ 98.174926] [<ffffffff813659f9>] ? timerqueue_add+0x59/0xb0
[ 98.180580] [<ffffffff815ed548>] __netif_receive_skb+0x18/0x60
[ 98.186494] [<ffffffff815ee625>] process_backlog+0x95/0x140
[ 98.192145] [<ffffffff815edccd>] net_rx_action+0x16d/0x380
[ 98.197713] [<ffffffff8170cff1>] __do_softirq+0xd1/0x283
[ 98.203106] [<ffffffff8170b2bc>] do_softirq_own_stack+0x1c/0x30
[ 98.209107] <EOI> [ 98.211029] [<ffffffff8108a5c0>] do_softirq+0x50/0x60
[ 98.216166] [<ffffffff815ec853>] netif_rx_ni+0x33/0x80
[ 98.221386] [<ffffffffa09eeff7>] tun_get_user+0x487/0x7f0 [tun]
[ 98.227388] [<ffffffffa09ef3ab>] tun_sendmsg+0x4b/0x60 [tun]
[ 98.233129] [<ffffffffa0b68932>] handle_tx+0x282/0x540 [vhost_net]
[ 98.239392] [<ffffffffa0b68c25>] handle_tx_kick+0x15/0x20 [vhost_net]
[ 98.245916] [<ffffffffa0abacfe>] vhost_worker+0x9e/0xf0 [vhost]
[ 98.251919] [<ffffffffa0abac60>] ? vhost_umem_alloc+0x40/0x40 [vhost]
[ 98.258440] [<ffffffff81003a47>] ? do_syscall_64+0x67/0x180
[ 98.264094] [<ffffffff810a44d9>] kthread+0xd9/0xf0
[ 98.268965] [<ffffffff810a4400>] ? kthread_park+0x60/0x60
[ 98.274444] [<ffffffff8170a4d5>] ret_from_fork+0x25/0x30
[ 98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66
[ 98.299425] RIP [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[ 98.305612] RSP <ffff88012fc43a10>
[ 98.309094] CR2: 00000000000000cc
[ 98.312406] ---[ end trace 726a2c7a2d2d78d0 ]---
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|