Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
The race was introduced by me in commit 971316f0503a ("epoll:
ep_unregister_pollwait() can use the freed pwq->whead"). I did not
realize that nothing can protect eventpoll after ep_poll_callback() sets
->whead = NULL, only whead->lock can save us from the race with
ep_free() or ep_remove().
Move ->whead = NULL to the end of ep_poll_callback() and add the
necessary barriers.
TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even
before this patch.
Hopefully this explains use-after-free reported by syzcaller:
BUG: KASAN: use-after-free in debug_spin_lock_before
...
_raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159
ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148
this is spin_lock(eventpoll->lock),
...
Freed by task 17774:
...
kfree+0xe8/0x2c0 mm/slub.c:3883
ep_free+0x22c/0x2a0 fs/eventpoll.c:865
Fixes: 971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead")
Reported-by: 范龙飞 <long7573@126.com>
Cc: stable@vger.kernel.org
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After commit dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC")
we preserve the secpath for the whole skb lifecycle, but we also
end up leaking a reference to it.
We must clear the head state on skb reception, if secpath is
present.
Fixes: dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC")
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for
stacked devices") added the 'offload_fwd_mark' bit to the skb in order
to allow drivers to indicate to the bridge driver that they already
forwarded the packet in L2.
In case the bit is set, before transmitting the packet from each port,
the port's mark is compared with the mark stored in the skb's control
block. If both marks are equal, we know the packet arrived from a switch
device that already forwarded the packet and it's not re-transmitted.
However, if the packet is transmitted from the bridge device itself
(e.g., br0), we should clear the 'offload_fwd_mark' bit as the mark
stored in the skb's control block isn't valid.
This scenario can happen in rare cases where a packet was trapped during
L3 forwarding and forwarded by the kernel to a bridge device.
Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Yotam Gigi <yotamg@mellanox.com>
Tested-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the
device in case a port is enslaved to a master netdev such as bridge or
bond.
Since the driver ignores events unrelated to its ports and their
uppers, it's possible to engineer situations in which the device's data
path differs from the kernel's.
One example to such a situation is when a port is enslaved to a bond
that is already enslaved to a bridge. When the bond was enslaved the
driver ignored the event - as the bond wasn't one of its uppers - and
therefore a bridge port instance isn't created in the device.
Until such configurations are supported forbid them by checking that the
upper device doesn't have uppers of its own.
Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Nogah Frankel <nogahf@mellanox.com>
Tested-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Do not check which flash type the SoC was booted from before
using this driver. Assume that the device tree is correct and use this
driver when it was added to device tree. This also removes a build
dependency to the SoC code.
All device trees I am aware of only have one correct flash device entry
in it. The device tree is anyway bundled with the kernel in all systems
using device tree I know of.
The boot mode can be specified with some pin straps and will select the
flash type the rom code will boot from. One SPI, NOR or NAND flash chip
can be connect to the EBU and used to load the first stage boot loader
from.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-spi@vger.kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
When mounting to older servers, such as Windows XP (or even Windows 7),
the limited error messages that can be passed back to user space can
get confusing since the default dialect has changed from SMB1 (CIFS) to
more secure SMB3 dialect. Log additional information when the user chooses
to use the default dialects and when the server does not support the
dialect requested.
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
jfs had previously avoided the use of MAX_LFS_FILESIZE because it hadn't
accounted for the whole 32-bit index range on 32-bit systems. That has
been fixed by commit 0cc3b0ec23ce ("Clarify (and fix) MAX_LFS_FILESIZE
macros"), so we can simplify the code now.
Suggested by Andreas Dilger.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: jfs-discussion@lists.sourceforge.net
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
dtc uses an incorrect format specifier for printing a uint64_t value.
uint64_t may be either 'unsigned long' or 'unsigned long long' depending
on the host architecture.
Fix this by using %llx and casting to unsigned long long, which ensures
that we always have a wide enough variable to print 64 bits of hex.
HOSTCC scripts/dtc/checks.o
scripts/dtc/checks.c: In function 'check_simple_bus_reg':
scripts/dtc/checks.c:876:2: warning: format '%zx' expects argument of type 'size_t', but argument 4 has type 'uint64_t' [-Wformat=]
snprintf(unit_addr, sizeof(unit_addr), "%zx", reg);
^
scripts/dtc/checks.c:876:2: warning: format '%zx' expects argument of type 'size_t', but argument 4 has type 'uint64_t' [-Wformat=]
Link: http://lkml.kernel.org/r/20170829222034.GJ20805@n2100.armlinux.org.uk
Fixes: 828d4cdd012c ("dtc: check.c fix compile error")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit c7acec713d14 ("kernel.h: handle pointers to arrays better in
container_of()") made use of __compiletime_assert() from container_of()
thus increasing the usage of this macro, allowing developers to notice
type conflicts in usage of container_of() at compile time.
However, the implementation of __compiletime_assert relies on compiler
optimizations to report an error. This means that if a developer uses
"-O0" with any code that performs container_of(), the compiler will always
report an error regardless of whether there is an actual problem in the
code.
This patch disables compile_time_assert when optimizations are disabled to
allow such code to compile with CFLAGS="-O0".
Example compilation failure:
./include/linux/compiler.h:547:38: error: call to `__compiletime_assert_94' declared with attribute error: pointer type mismatch in container_of()
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^
./include/linux/compiler.h:530:4: note: in definition of macro `__compiletime_assert'
prefix ## suffix(); \
^~~~~~
./include/linux/compiler.h:547:2: note: in expansion of macro `_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:46:37: note: in expansion of macro `compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
./include/linux/kernel.h:860:2: note: in expansion of macro `BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
^~~~~~~~~~~~~~~~
[akpm@linux-foundation.org: use do{}while(0), per Michal]
Link: http://lkml.kernel.org/r/20170829230114.11662-1-joe@ovn.org
Fixes: c7acec713d14c6c ("kernel.h: handle pointers to arrays better in container_of()")
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Ian Abbott <abbotti@mev.co.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Wendy Wang reported off-list that a RAS HWPOISON-SOFT test case failed
and bisected it to the commit 479f854a207c ("mm, page_alloc: defer
debugging checks of pages allocated from the PCP").
The problem is that a page that was poisoned with madvise() is reused.
The commit removed a check that would trigger if DEBUG_VM was enabled
but re-enabling the check only fixes the problem as a side-effect by
printing a bad_page warning and recovering.
The root of the problem is that an madvise() can leave a poisoned page
on the per-cpu list. This patch drains all per-cpu lists after pages
are poisoned so that they will not be reused. Wendy reports that the
test case in question passes with this patch applied. While this could
be done in a targeted fashion, it is over-complicated for such a rare
operation.
Link: http://lkml.kernel.org/r/20170828133414.7qro57jbepdcyz5x@techsingularity.net
Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Wang, Wendy <wendy.wang@intel.com>
Tested-by: Wang, Wendy <wendy.wang@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Hansen, Dave" <dave.hansen@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for
write killable") made it possible to kill a forking task while it is
waiting to acquire its ->mmap_sem for write, in dup_mmap().
However, it was overlooked that this introduced an new error path before
the new mm_struct's ->uprobes_state.xol_area has been set to NULL after
being copied from the old mm_struct by the memcpy in dup_mm(). For a
task that has previously hit a uprobe tracepoint, this resulted in the
'struct xol_area' being freed multiple times if the task was killed at
just the right time while forking.
Fix it by setting ->uprobes_state.xol_area to NULL in mm_init() rather
than in uprobe_dup_mmap().
With CONFIG_UPROBE_EVENTS=y, the bug can be reproduced by the same C
program given by commit 2b7e8665b4ff ("fork: fix incorrect fput of
->exe_file causing use-after-free"), provided that a uprobe tracepoint
has been set on the fork_thread() function. For example:
$ gcc reproducer.c -o reproducer -lpthread
$ nm reproducer | grep fork_thread
0000000000400719 t fork_thread
$ echo "p $PWD/reproducer:0x719" > /sys/kernel/debug/tracing/uprobe_events
$ echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable
$ ./reproducer
Here is the use-after-free reported by KASAN:
BUG: KASAN: use-after-free in uprobe_clear_state+0x1c4/0x200
Read of size 8 at addr ffff8800320a8b88 by task reproducer/198
CPU: 1 PID: 198 Comm: reproducer Not tainted 4.13.0-rc7-00015-g36fde05f3fb5 #255
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
Call Trace:
dump_stack+0xdb/0x185
print_address_description+0x7e/0x290
kasan_report+0x23b/0x350
__asan_report_load8_noabort+0x19/0x20
uprobe_clear_state+0x1c4/0x200
mmput+0xd6/0x360
do_exit+0x740/0x1670
do_group_exit+0x13f/0x380
get_signal+0x597/0x17d0
do_signal+0x99/0x1df0
exit_to_usermode_loop+0x166/0x1e0
syscall_return_slowpath+0x258/0x2c0
entry_SYSCALL_64_fastpath+0xbc/0xbe
...
Allocated by task 199:
save_stack_trace+0x1b/0x20
kasan_kmalloc+0xfc/0x180
kmem_cache_alloc_trace+0xf3/0x330
__create_xol_area+0x10f/0x780
uprobe_notify_resume+0x1674/0x2210
exit_to_usermode_loop+0x150/0x1e0
prepare_exit_to_usermode+0x14b/0x180
retint_user+0x8/0x20
Freed by task 199:
save_stack_trace+0x1b/0x20
kasan_slab_free+0xa8/0x1a0
kfree+0xba/0x210
uprobe_clear_state+0x151/0x200
mmput+0xd6/0x360
copy_process.part.8+0x605f/0x65d0
_do_fork+0x1a5/0xbd0
SyS_clone+0x19/0x20
do_syscall_64+0x22f/0x660
return_from_SYSCALL_64+0x0/0x7a
Note: without KASAN, you may instead see a "Bad page state" message, or
simply a general protection fault.
Link: http://lkml.kernel.org/r/20170830033303.17927-1-ebiggers3@gmail.com
Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If the worker thread continues getting work, it will hog the cpu and rcu
stall complains. Make it a good citizen. This is triggered in a loop
block device test.
Link: http://lkml.kernel.org/r/5de0a179b3184e1a2183fc503448b0269f24d75b.1503697127.git.shli@fb.com
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We are doing a last second memory allocation attempt before calling
out_of_memory(). But since slab shrinker functions might indirectly
wait for other thread's __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory
allocations via sleeping locks, calling slab shrinker functions from
node_reclaim() from get_page_from_freelist() with oom_lock held has
possibility of deadlock. Therefore, make sure that last second memory
allocation attempt does not call slab shrinker functions.
Link: http://lkml.kernel.org/r/1503577106-9196-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The invalidate_page callback suffered from two pitfalls. First it used
to happen after the page table lock was release and thus a new page
might have setup before the call to invalidate_page() happened.
This is in a weird way fixed by commit c7ab0d2fdc84 ("mm: convert
try_to_unmap_one() to use page_vma_mapped_walk()") that moved the
callback under the page table lock but this also broke several existing
users of the mmu_notifier API that assumed they could sleep inside this
callback.
The second pitfall was invalidate_page() being the only callback not
taking a range of address in respect to invalidation but was giving an
address and a page. Lots of the callback implementers assumed this
could never be THP and thus failed to invalidate the appropriate range
for THP.
By killing this callback we unify the mmu_notifier callback API to
always take a virtual address range as input.
Finally this also simplifies the end user life as there is now two clear
choices:
- invalidate_range_start()/end() callback (which allow you to sleep)
- invalidate_range() where you can not sleep but happen right after
page table update under page table lock
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Adam Borowski <kilobyte@angband.pl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: axie <axie@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Changed since v1 (Linus Torvalds)
- remove now useless kvm_arch_mmu_notifier_invalidate_page()
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org (moderated for non-subscribers)
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linux-rdma@vger.kernel.org
Cc: Dean Luick <dean.luick@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Cc: linux-rdma@vger.kernel.org
Cc: Artemy Kovalyov <artemyko@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and now are bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Replace all mmu_notifier_invalidate_page() calls by *_invalidate_range()
and make sure it is bracketed by calls to *_invalidate_range_start()/end().
Note that because we can not presume the pmd value or pte value we have
to assume the worst and unconditionaly report an invalidation as
happening.
Changed since v2:
- try_to_unmap_one() only one call to mmu_notifier_invalidate_range()
- compute end with PAGE_SIZE << compound_order(page)
- fix PageHuge() case in try_to_unmap_one()
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Adam Borowski <kilobyte@angband.pl>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: axie <axie@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Replace all mmu_notifier_invalidate_page() calls by *_invalidate_range()
and make sure it is bracketed by calls to *_invalidate_range_start()/end().
Note that because we can not presume the pmd value or pte value we have
to assume the worst and unconditionaly report an invalidation as
happening.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Adam Borowski <kilobyte@angband.pl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: axie <axie@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ceph_readpage() unlocks page prematurely prematurely in the case
that page is reading from fscache. Caller of readpage expects that
page is uptodate when it get unlocked. So page shoule get locked
by completion callback of fscache_read_or_alloc_pages()
Cc: stable@vger.kernel.org # 4.1+, needs backporting for < 4.7
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
wl1251: add a missing spin_lock_init()
This fixes the following kernel warning:
[ 5668.771453] BUG: spinlock bad magic on CPU#0, kworker/u2:3/9745
[ 5668.771850] lock: 0xce63ef20, .magic: 00000000, .owner: <none>/-1,
.owner_cpu: 0
[ 5668.772277] CPU: 0 PID: 9745 Comm: kworker/u2:3 Tainted: G W
4.12.0-03002-gec979a4-dirty #40
[ 5668.772796] Hardware name: Nokia RX-51 board
[ 5668.773071] Workqueue: phy1 wl1251_irq_work
[ 5668.773345] [<c010c9e4>] (unwind_backtrace) from [<c010a274>]
(show_stack+0x10/0x14)
[ 5668.773803] [<c010a274>] (show_stack) from [<c01545a4>]
(do_raw_spin_lock+0x6c/0xa0)
[ 5668.774230] [<c01545a4>] (do_raw_spin_lock) from [<c06ca578>]
(_raw_spin_lock_irqsave+0x10/0x18)
[ 5668.774658] [<c06ca578>] (_raw_spin_lock_irqsave) from [<c048c010>]
(wl1251_op_tx+0x38/0x5c)
[ 5668.775115] [<c048c010>] (wl1251_op_tx) from [<c06a12e8>]
(ieee80211_tx_frags+0x188/0x1c0)
[ 5668.775543] [<c06a12e8>] (ieee80211_tx_frags) from [<c06a138c>]
(__ieee80211_tx+0x6c/0x130)
[ 5668.775970] [<c06a138c>] (__ieee80211_tx) from [<c06a3dbc>]
(ieee80211_tx+0xdc/0x104)
[ 5668.776367] [<c06a3dbc>] (ieee80211_tx) from [<c06a4af0>]
(__ieee80211_subif_start_xmit+0x454/0x8c8)
[ 5668.776824] [<c06a4af0>] (__ieee80211_subif_start_xmit) from
[<c06a4f94>] (ieee80211_subif_start_xmit+0x30/0x2fc)
[ 5668.777343] [<c06a4f94>] (ieee80211_subif_start_xmit) from
[<c0578848>] (dev_hard_start_xmit+0x80/0x118)
...
by adding the missing spin_lock_init().
Reported-by: Pavel Machek <pavel@ucw.cz>
Cc: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PowerA gamepad initialization quirk worked with the PowerA
wired gamepad I had around (0x24c6:0x543a), but a user reported [0]
that it didn't work for him, even though our gamepads shared the
same vendor and product IDs.
When I initially implemented the PowerA quirk, I wanted to avoid
actually triggering the rumble action during init. My tests showed
that my gamepad would work correctly even if it received a rumble
of 0 intensity, so that's what I went with.
Unfortunately, this apparently isn't true for all models (perhaps
a firmware difference?). This non-working gamepad seems to require
the real magic rumble packet that the Microsoft driver sends, which
actually vibrates the gamepad. To counteract this effect, I still
send the old zero-rumble PowerA quirk packet which cancels the
rumble effect before the motors can spin up enough to vibrate.
[0]: https://github.com/paroj/xpad/issues/48#issuecomment-313904867
Reported-by: Kyle Beauchamp <kyleabeauchamp@gmail.com>
Tested-by: Kyle Beauchamp <kyleabeauchamp@gmail.com>
Fixes: 81093c9848a7 ("Input: xpad - support some quirky Xbox One pads")
Cc: stable@vger.kernel.org # v4.12
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
The Lenovo Miix2 8 DSDT contains an i2c clk / bus speed of 1700000 Hz
for one if its devices, which is not supported.
This is the second DSDT to show up with an unsupported clk in a short
time, remove the hardcoded fix for DSDTs with a 1 MiHz clock and simply
always round down the clk to the nearest supported value.
Reported-by: russianneuromancer@ya.ru
Fixes: 682c6c2188 ("i2c: designware: Some broken DSTDs use 1MiHz ...")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
A 31-bit compat process can force a BUG_ON in crst_table_upgrade
with specific, invalid mmap calls, e.g.
mmap((void*) 0x7fff8000, 0x10000, 3, 32, -1, 0)
The arch_get_unmapped_area[_topdown] functions miss an if condition
in the decision to do a page table upgrade.
Fixes: 9b11c7912d00 ("s390/mm: simplify arch_get_unmapped_area[_topdown]")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
The mm->context.asce field of a new process is not set up correctly
in case of a fork with a 5 level page table.
Add the missing case to init_new_context().
Fixes: 1aea9b3f9210 ("s390/mm: implement 5 level pages tables")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
This reverts commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ("net: phy:
Correctly process PHY_HALTED in phy_stop_machine()") because it is
creating the possibility for a NULL pointer dereference.
David Daney provide the following call trace and diagram of events:
When ndo_stop() is called we call:
phy_disconnect()
+---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
+---> phy_stop_machine()
| +---> phy_state_machine()
| +----> queue_delayed_work(): Work queued.
+--->phy_detach() implies: phydev->attached_dev = NULL;
Now at a later time the queued work does:
phy_state_machine()
+---->netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:
CPU 12 Unable to handle kernel paging request at virtual address
0000000000000048, epc == ffffffff80de37ec, ra == ffffffff80c7c
Oops[#1]:
CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
Workqueue: events_power_efficient phy_state_machine
task: 80000004021ed100 task.stack: 8000000409d70000
$ 0 : 0000000000000000 ffffffff84720060 0000000000000048 0000000000000004
$ 4 : 0000000000000000 0000000000000001 0000000000000004 0000000000000000
$ 8 : 0000000000000000 0000000000000000 00000000ffff98f3 0000000000000000
$12 : 8000000409d73fe0 0000000000009c00 ffffffff846547c8 000000000000af3b
$16 : 80000004096bab68 80000004096babd0 0000000000000000 80000004096ba800
$20 : 0000000000000000 0000000000000000 ffffffff81090000 0000000000000008
$24 : 0000000000000061 ffffffff808637b0
$28 : 8000000409d70000 8000000409d73cf0 80000000271bd300 ffffffff80c7804c
Hi : 000000000000002a
Lo : 000000000000003f
epc : ffffffff80de37ec netif_carrier_off+0xc/0x58
ra : ffffffff80c7804c phy_state_machine+0x48c/0x4f8
Status: 14009ce3 KX SX UX KERNEL EXL IE
Cause : 00800008 (ExcCode 02)
BadVA : 0000000000000048
PrId : 000d9501 (Cavium Octeon III)
Modules linked in:
Process kworker/12:1 (pid: 1502, threadinfo=8000000409d70000,
task=80000004021ed100, tls=0000000000000000)
Stack : 8000000409a54000 80000004096bab68 80000000271bd300 80000000271c1e00
0000000000000000 ffffffff808a1708 8000000409a54000 80000000271bd300
80000000271bd320 8000000409a54030 ffffffff80ff0f00 0000000000000001
ffffffff81090000 ffffffff808a1ac0 8000000402182080 ffffffff84650000
8000000402182080 ffffffff84650000 ffffffff80ff0000 8000000409a54000
ffffffff808a1970 0000000000000000 80000004099e8000 8000000402099240
0000000000000000 ffffffff808a8598 0000000000000000 8000000408eeeb00
8000000409a54000 00000000810a1d00 0000000000000000 8000000409d73de8
8000000409d73de8 0000000000000088 000000000c009c00 8000000409d73e08
8000000409d73e08 8000000402182080 ffffffff808a84d0 8000000402182080
...
Call Trace:
[<ffffffff80de37ec>] netif_carrier_off+0xc/0x58
[<ffffffff80c7804c>] phy_state_machine+0x48c/0x4f8
[<ffffffff808a1708>] process_one_work+0x158/0x368
[<ffffffff808a1ac0>] worker_thread+0x150/0x4c0
[<ffffffff808a8598>] kthread+0xc8/0xe0
[<ffffffff808617f0>] ret_from_kernel_thread+0x14/0x1c
The original motivation for this change originated from Marc Gonzales
indicating that his network driver did not have its adjust_link callback
executing with phydev->link = 0 while he was expecting it.
PHYLIB has never made any such guarantees ever because phy_stop() merely just
tells the workqueue to move into PHY_HALTED state which will happen
asynchronously.
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reported-by: David Daney <ddaney.cavm@gmail.com>
Fixes: 7ad813f20853 ("net: phy: Correctly process PHY_HALTED in phy_stop_machine()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
BCM7278 has only 128 entries while BCM7445 has the full 256 entries set,
fix that.
Fixes: 7318166cacad ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzkaller had no problem to trigger a deadlock, attaching a KCM socket
to another one (or itself). (original syzkaller report was a very
confusing lockdep splat during a sendmsg())
It seems KCM claims to only support TCP, but no enforcement is done,
so we might need to add additional checks.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sch_tbf calls qdisc_watchdog_cancel() in both its ->reset and ->destroy
callbacks but it may fail before the timer is initialized due to missing
options (either not supplied by user-space or set as a default qdisc),
also q->qdisc is used by ->reset and ->destroy so we need it initialized.
Reproduce:
$ sysctl net.core.default_qdisc=tbf
$ ip l set ethX up
Crash log:
[ 959.160172] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 959.160323] IP: qdisc_reset+0xa/0x5c
[ 959.160400] PGD 59cdb067
[ 959.160401] P4D 59cdb067
[ 959.160466] PUD 59ccb067
[ 959.160532] PMD 0
[ 959.160597]
[ 959.160706] Oops: 0000 [#1] SMP
[ 959.160778] Modules linked in: sch_tbf sch_sfb sch_prio sch_netem
[ 959.160891] CPU: 2 PID: 1562 Comm: ip Not tainted 4.13.0-rc6+ #62
[ 959.160998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 959.161157] task: ffff880059c9a700 task.stack: ffff8800376d0000
[ 959.161263] RIP: 0010:qdisc_reset+0xa/0x5c
[ 959.161347] RSP: 0018:ffff8800376d3610 EFLAGS: 00010286
[ 959.161531] RAX: ffffffffa001b1dd RBX: ffff8800373a2800 RCX: 0000000000000000
[ 959.161733] RDX: ffffffff8215f160 RSI: ffffffff8215f160 RDI: 0000000000000000
[ 959.161939] RBP: ffff8800376d3618 R08: 00000000014080c0 R09: 00000000ffffffff
[ 959.162141] R10: ffff8800376d3578 R11: 0000000000000020 R12: ffffffffa001d2c0
[ 959.162343] R13: ffff880037538000 R14: 00000000ffffffff R15: 0000000000000001
[ 959.162546] FS: 00007fcc5126b740(0000) GS:ffff88005d900000(0000) knlGS:0000000000000000
[ 959.162844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 959.163030] CR2: 0000000000000018 CR3: 000000005abc4000 CR4: 00000000000406e0
[ 959.163233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 959.163436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 959.163638] Call Trace:
[ 959.163788] tbf_reset+0x19/0x64 [sch_tbf]
[ 959.163957] qdisc_destroy+0x8b/0xe5
[ 959.164119] qdisc_create_dflt+0x86/0x94
[ 959.164284] ? dev_activate+0x129/0x129
[ 959.164449] attach_one_default_qdisc+0x36/0x63
[ 959.164623] netdev_for_each_tx_queue+0x3d/0x48
[ 959.164795] dev_activate+0x4b/0x129
[ 959.164957] __dev_open+0xe7/0x104
[ 959.165118] __dev_change_flags+0xc6/0x15c
[ 959.165287] dev_change_flags+0x25/0x59
[ 959.165451] do_setlink+0x30c/0xb3f
[ 959.165613] ? check_chain_key+0xb0/0xfd
[ 959.165782] rtnl_newlink+0x3a4/0x729
[ 959.165947] ? rtnl_newlink+0x117/0x729
[ 959.166121] ? ns_capable_common+0xd/0xb1
[ 959.166288] ? ns_capable+0x13/0x15
[ 959.166450] rtnetlink_rcv_msg+0x188/0x197
[ 959.166617] ? rcu_read_unlock+0x3e/0x5f
[ 959.166783] ? rtnl_newlink+0x729/0x729
[ 959.166948] netlink_rcv_skb+0x6c/0xce
[ 959.167113] rtnetlink_rcv+0x23/0x2a
[ 959.167273] netlink_unicast+0x103/0x181
[ 959.167439] netlink_sendmsg+0x326/0x337
[ 959.167607] sock_sendmsg_nosec+0x14/0x3f
[ 959.167772] sock_sendmsg+0x29/0x2e
[ 959.167932] ___sys_sendmsg+0x209/0x28b
[ 959.168098] ? do_raw_spin_unlock+0xcd/0xf8
[ 959.168267] ? _raw_spin_unlock+0x27/0x31
[ 959.168432] ? __handle_mm_fault+0x651/0xdb1
[ 959.168602] ? check_chain_key+0xb0/0xfd
[ 959.168773] __sys_sendmsg+0x45/0x63
[ 959.168934] ? __sys_sendmsg+0x45/0x63
[ 959.169100] SyS_sendmsg+0x19/0x1b
[ 959.169260] entry_SYSCALL_64_fastpath+0x23/0xc2
[ 959.169432] RIP: 0033:0x7fcc5097e690
[ 959.169592] RSP: 002b:00007ffd0d5c7b48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 959.169887] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007fcc5097e690
[ 959.170089] RDX: 0000000000000000 RSI: 00007ffd0d5c7b90 RDI: 0000000000000003
[ 959.170292] RBP: ffff8800376d3f98 R08: 0000000000000001 R09: 0000000000000003
[ 959.170494] R10: 00007ffd0d5c7910 R11: 0000000000000246 R12: 0000000000000006
[ 959.170697] R13: 000000000066f1a0 R14: 00007ffd0d5cfc40 R15: 0000000000000000
[ 959.170900] ? trace_hardirqs_off_caller+0xa7/0xcf
[ 959.171076] Code: 00 41 c7 84 24 14 01 00 00 00 00 00 00 41 c7 84 24
98 00 00 00 00 00 00 00 41 5c 41 5d 41 5e 5d c3 66 66 66 66 90 55 48 89
e5 53 <48> 8b 47 18 48 89 fb 48 8b 40 48 48 85 c0 74 02 ff d0 48 8b bb
[ 959.171637] RIP: qdisc_reset+0xa/0x5c RSP: ffff8800376d3610
[ 959.171821] CR2: 0000000000000018
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently only a memory allocation failure can lead to this, so let's
initialize the timer first.
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
netem can fail in ->init due to missing options (either not supplied by
user-space or used as a default qdisc) causing a timer->base null
pointer deref in its ->destroy() and ->reset() callbacks.
Reproduce:
$ sysctl net.core.default_qdisc=netem
$ ip l set ethX up
Crash log:
[ 1814.846943] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1814.847181] IP: hrtimer_active+0x17/0x8a
[ 1814.847270] PGD 59c34067
[ 1814.847271] P4D 59c34067
[ 1814.847337] PUD 37374067
[ 1814.847403] PMD 0
[ 1814.847468]
[ 1814.847582] Oops: 0000 [#1] SMP
[ 1814.847655] Modules linked in: sch_netem(O) sch_fq_codel(O)
[ 1814.847761] CPU: 3 PID: 1573 Comm: ip Tainted: G O 4.13.0-rc6+ #62
[ 1814.847884] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1814.848043] task: ffff88003723a700 task.stack: ffff88005adc8000
[ 1814.848235] RIP: 0010:hrtimer_active+0x17/0x8a
[ 1814.848407] RSP: 0018:ffff88005adcb590 EFLAGS: 00010246
[ 1814.848590] RAX: 0000000000000000 RBX: ffff880058e359d8 RCX: 0000000000000000
[ 1814.848793] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880058e359d8
[ 1814.848998] RBP: ffff88005adcb5b0 R08: 00000000014080c0 R09: 00000000ffffffff
[ 1814.849204] R10: ffff88005adcb660 R11: 0000000000000020 R12: 0000000000000000
[ 1814.849410] R13: ffff880058e359d8 R14: 00000000ffffffff R15: 0000000000000001
[ 1814.849616] FS: 00007f733bbca740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
[ 1814.849919] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1814.850107] CR2: 0000000000000000 CR3: 0000000059f0d000 CR4: 00000000000406e0
[ 1814.850313] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1814.850518] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1814.850723] Call Trace:
[ 1814.850875] hrtimer_try_to_cancel+0x1a/0x93
[ 1814.851047] hrtimer_cancel+0x15/0x20
[ 1814.851211] qdisc_watchdog_cancel+0x12/0x14
[ 1814.851383] netem_reset+0xe6/0xed [sch_netem]
[ 1814.851561] qdisc_destroy+0x8b/0xe5
[ 1814.851723] qdisc_create_dflt+0x86/0x94
[ 1814.851890] ? dev_activate+0x129/0x129
[ 1814.852057] attach_one_default_qdisc+0x36/0x63
[ 1814.852232] netdev_for_each_tx_queue+0x3d/0x48
[ 1814.852406] dev_activate+0x4b/0x129
[ 1814.852569] __dev_open+0xe7/0x104
[ 1814.852730] __dev_change_flags+0xc6/0x15c
[ 1814.852899] dev_change_flags+0x25/0x59
[ 1814.853064] do_setlink+0x30c/0xb3f
[ 1814.853228] ? check_chain_key+0xb0/0xfd
[ 1814.853396] ? check_chain_key+0xb0/0xfd
[ 1814.853565] rtnl_newlink+0x3a4/0x729
[ 1814.853728] ? rtnl_newlink+0x117/0x729
[ 1814.853905] ? ns_capable_common+0xd/0xb1
[ 1814.854072] ? ns_capable+0x13/0x15
[ 1814.854234] rtnetlink_rcv_msg+0x188/0x197
[ 1814.854404] ? rcu_read_unlock+0x3e/0x5f
[ 1814.854572] ? rtnl_newlink+0x729/0x729
[ 1814.854737] netlink_rcv_skb+0x6c/0xce
[ 1814.854902] rtnetlink_rcv+0x23/0x2a
[ 1814.855064] netlink_unicast+0x103/0x181
[ 1814.855230] netlink_sendmsg+0x326/0x337
[ 1814.855398] sock_sendmsg_nosec+0x14/0x3f
[ 1814.855584] sock_sendmsg+0x29/0x2e
[ 1814.855747] ___sys_sendmsg+0x209/0x28b
[ 1814.855912] ? do_raw_spin_unlock+0xcd/0xf8
[ 1814.856082] ? _raw_spin_unlock+0x27/0x31
[ 1814.856251] ? __handle_mm_fault+0x651/0xdb1
[ 1814.856421] ? check_chain_key+0xb0/0xfd
[ 1814.856592] __sys_sendmsg+0x45/0x63
[ 1814.856755] ? __sys_sendmsg+0x45/0x63
[ 1814.856923] SyS_sendmsg+0x19/0x1b
[ 1814.857083] entry_SYSCALL_64_fastpath+0x23/0xc2
[ 1814.857256] RIP: 0033:0x7f733b2dd690
[ 1814.857419] RSP: 002b:00007ffe1d3387d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1814.858238] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f733b2dd690
[ 1814.858445] RDX: 0000000000000000 RSI: 00007ffe1d338820 RDI: 0000000000000003
[ 1814.858651] RBP: ffff88005adcbf98 R08: 0000000000000001 R09: 0000000000000003
[ 1814.858856] R10: 00007ffe1d3385a0 R11: 0000000000000246 R12: 0000000000000002
[ 1814.859060] R13: 000000000066f1a0 R14: 00007ffe1d3408d0 R15: 0000000000000000
[ 1814.859267] ? trace_hardirqs_off_caller+0xa7/0xcf
[ 1814.859446] Code: 10 55 48 89 c7 48 89 e5 e8 45 a1 fb ff 31 c0 5d c3
31 c0 c3 66 66 66 66 90 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 8b
45 30 <4c> 8b 20 41 8b 5c 24 38 31 c9 31 d2 48 c7 c7 50 8e 1d 82 41 89
[ 1814.860022] RIP: hrtimer_active+0x17/0x8a RSP: ffff88005adcb590
[ 1814.860214] CR2: 0000000000000000
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is very unlikely to happen but the backlogs memory allocation
could fail and will free q->flows, but then ->destroy() will free
q->flows too. For correctness remove the first free and let ->destroy
clean up.
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CBQ can fail on ->init by wrong nl attributes or simply for missing any,
f.e. if it's set as a default qdisc then TCA_OPTIONS (opt) will be NULL
when it is activated. The first thing init does is parse opt but it will
dereference a null pointer if used as a default qdisc, also since init
failure at default qdisc invokes ->reset() which cancels all timers then
we'll also dereference two more null pointers (timer->base) as they were
never initialized.
To reproduce:
$ sysctl net.core.default_qdisc=cbq
$ ip l set ethX up
Crash log of the first null ptr deref:
[44727.907454] BUG: unable to handle kernel NULL pointer dereference at (null)
[44727.907600] IP: cbq_init+0x27/0x205
[44727.907676] PGD 59ff4067
[44727.907677] P4D 59ff4067
[44727.907742] PUD 59c70067
[44727.907807] PMD 0
[44727.907873]
[44727.907982] Oops: 0000 [#1] SMP
[44727.908054] Modules linked in:
[44727.908126] CPU: 1 PID: 21312 Comm: ip Not tainted 4.13.0-rc6+ #60
[44727.908235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[44727.908477] task: ffff88005ad42700 task.stack: ffff880037214000
[44727.908672] RIP: 0010:cbq_init+0x27/0x205
[44727.908838] RSP: 0018:ffff8800372175f0 EFLAGS: 00010286
[44727.909018] RAX: ffffffff816c3852 RBX: ffff880058c53800 RCX: 0000000000000000
[44727.909222] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff8800372175f8
[44727.909427] RBP: ffff880037217650 R08: ffffffff81b0f380 R09: 0000000000000000
[44727.909631] R10: ffff880037217660 R11: 0000000000000020 R12: ffffffff822a44c0
[44727.909835] R13: ffff880058b92000 R14: 00000000ffffffff R15: 0000000000000001
[44727.910040] FS: 00007ff8bc583740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000
[44727.910339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44727.910525] CR2: 0000000000000000 CR3: 00000000371e5000 CR4: 00000000000406e0
[44727.910731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[44727.910936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[44727.911141] Call Trace:
[44727.911291] ? lockdep_init_map+0xb6/0x1ba
[44727.911461] ? qdisc_alloc+0x14e/0x187
[44727.911626] qdisc_create_dflt+0x7a/0x94
[44727.911794] ? dev_activate+0x129/0x129
[44727.911959] attach_one_default_qdisc+0x36/0x63
[44727.912132] netdev_for_each_tx_queue+0x3d/0x48
[44727.912305] dev_activate+0x4b/0x129
[44727.912468] __dev_open+0xe7/0x104
[44727.912631] __dev_change_flags+0xc6/0x15c
[44727.912799] dev_change_flags+0x25/0x59
[44727.912966] do_setlink+0x30c/0xb3f
[44727.913129] ? check_chain_key+0xb0/0xfd
[44727.913294] ? check_chain_key+0xb0/0xfd
[44727.913463] rtnl_newlink+0x3a4/0x729
[44727.913626] ? rtnl_newlink+0x117/0x729
[44727.913801] ? ns_capable_common+0xd/0xb1
[44727.913968] ? ns_capable+0x13/0x15
[44727.914131] rtnetlink_rcv_msg+0x188/0x197
[44727.914300] ? rcu_read_unlock+0x3e/0x5f
[44727.914465] ? rtnl_newlink+0x729/0x729
[44727.914630] netlink_rcv_skb+0x6c/0xce
[44727.914796] rtnetlink_rcv+0x23/0x2a
[44727.914956] netlink_unicast+0x103/0x181
[44727.915122] netlink_sendmsg+0x326/0x337
[44727.915291] sock_sendmsg_nosec+0x14/0x3f
[44727.915459] sock_sendmsg+0x29/0x2e
[44727.915619] ___sys_sendmsg+0x209/0x28b
[44727.915784] ? do_raw_spin_unlock+0xcd/0xf8
[44727.915954] ? _raw_spin_unlock+0x27/0x31
[44727.916121] ? __handle_mm_fault+0x651/0xdb1
[44727.916290] ? check_chain_key+0xb0/0xfd
[44727.916461] __sys_sendmsg+0x45/0x63
[44727.916626] ? __sys_sendmsg+0x45/0x63
[44727.916792] SyS_sendmsg+0x19/0x1b
[44727.916950] entry_SYSCALL_64_fastpath+0x23/0xc2
[44727.917125] RIP: 0033:0x7ff8bbc96690
[44727.917286] RSP: 002b:00007ffc360991e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[44727.917579] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007ff8bbc96690
[44727.917783] RDX: 0000000000000000 RSI: 00007ffc36099230 RDI: 0000000000000003
[44727.917987] RBP: ffff880037217f98 R08: 0000000000000001 R09: 0000000000000003
[44727.918190] R10: 00007ffc36098fb0 R11: 0000000000000246 R12: 0000000000000006
[44727.918393] R13: 000000000066f1a0 R14: 00007ffc360a12e0 R15: 0000000000000000
[44727.918597] ? trace_hardirqs_off_caller+0xa7/0xcf
[44727.918774] Code: 41 5f 5d c3 66 66 66 66 90 55 48 8d 56 04 45 31 c9
49 c7 c0 80 f3 b0 81 48 89 e5 41 55 41 54 53 48 89 fb 48 8d 7d a8 48 83
ec 48 <0f> b7 0e be 07 00 00 00 83 e9 04 e8 e6 f7 d8 ff 85 c0 0f 88 bb
[44727.919332] RIP: cbq_init+0x27/0x205 RSP: ffff8800372175f0
[44727.919516] CR2: 0000000000000000
Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Depending on where ->init fails we can get a null pointer deref due to
uninitialized hires timer (watchdog) or a double free of the qdisc hash
because it is already freed by ->destroy().
Fixes: 8d5537387505 ("net/sched/hfsc: allocate tcf block for hfsc root class")
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If sch_hhf fails in its ->init() function (either due to wrong
user-space arguments as below or memory alloc failure of hh_flows) it
will do a null pointer deref of q->hh_flows in its ->destroy() function.
To reproduce the crash:
$ tc qdisc add dev eth0 root hhf quantum 2000000 non_hh_weight 10000000
Crash log:
[ 690.654882] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 690.655565] IP: hhf_destroy+0x48/0xbc
[ 690.655944] PGD 37345067
[ 690.655948] P4D 37345067
[ 690.656252] PUD 58402067
[ 690.656554] PMD 0
[ 690.656857]
[ 690.657362] Oops: 0000 [#1] SMP
[ 690.657696] Modules linked in:
[ 690.658032] CPU: 3 PID: 920 Comm: tc Not tainted 4.13.0-rc6+ #57
[ 690.658525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 690.659255] task: ffff880058578000 task.stack: ffff88005acbc000
[ 690.659747] RIP: 0010:hhf_destroy+0x48/0xbc
[ 690.660146] RSP: 0018:ffff88005acbf9e0 EFLAGS: 00010246
[ 690.660601] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
[ 690.661155] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff821f63f0
[ 690.661710] RBP: ffff88005acbfa08 R08: ffffffff81b10a90 R09: 0000000000000000
[ 690.662267] R10: 00000000f42b7019 R11: ffff880058578000 R12: 00000000ffffffea
[ 690.662820] R13: ffff8800372f6400 R14: 0000000000000000 R15: 0000000000000000
[ 690.663769] FS: 00007f8ae5e8b740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
[ 690.667069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 690.667965] CR2: 0000000000000000 CR3: 0000000058523000 CR4: 00000000000406e0
[ 690.668918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 690.669945] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 690.671003] Call Trace:
[ 690.671743] qdisc_create+0x377/0x3fd
[ 690.672534] tc_modify_qdisc+0x4d2/0x4fd
[ 690.673324] rtnetlink_rcv_msg+0x188/0x197
[ 690.674204] ? rcu_read_unlock+0x3e/0x5f
[ 690.675091] ? rtnl_newlink+0x729/0x729
[ 690.675877] netlink_rcv_skb+0x6c/0xce
[ 690.676648] rtnetlink_rcv+0x23/0x2a
[ 690.677405] netlink_unicast+0x103/0x181
[ 690.678179] netlink_sendmsg+0x326/0x337
[ 690.678958] sock_sendmsg_nosec+0x14/0x3f
[ 690.679743] sock_sendmsg+0x29/0x2e
[ 690.680506] ___sys_sendmsg+0x209/0x28b
[ 690.681283] ? __handle_mm_fault+0xc7d/0xdb1
[ 690.681915] ? check_chain_key+0xb0/0xfd
[ 690.682449] __sys_sendmsg+0x45/0x63
[ 690.682954] ? __sys_sendmsg+0x45/0x63
[ 690.683471] SyS_sendmsg+0x19/0x1b
[ 690.683974] entry_SYSCALL_64_fastpath+0x23/0xc2
[ 690.684516] RIP: 0033:0x7f8ae529d690
[ 690.685016] RSP: 002b:00007fff26d2d6b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 690.685931] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f8ae529d690
[ 690.686573] RDX: 0000000000000000 RSI: 00007fff26d2d700 RDI: 0000000000000003
[ 690.687047] RBP: ffff88005acbff98 R08: 0000000000000001 R09: 0000000000000000
[ 690.687519] R10: 00007fff26d2d480 R11: 0000000000000246 R12: 0000000000000002
[ 690.687996] R13: 0000000001258070 R14: 0000000000000001 R15: 0000000000000000
[ 690.688475] ? trace_hardirqs_off_caller+0xa7/0xcf
[ 690.688887] Code: 00 00 e8 2a 02 ae ff 49 8b bc 1d 60 02 00 00 48 83
c3 08 e8 19 02 ae ff 48 83 fb 20 75 dc 45 31 f6 4d 89 f7 4d 03 bd 20 02
00 00 <49> 8b 07 49 39 c7 75 24 49 83 c6 10 49 81 fe 00 40 00 00 75 e1
[ 690.690200] RIP: hhf_destroy+0x48/0xbc RSP: ffff88005acbf9e0
[ 690.690636] CR2: 0000000000000000
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The below commit added a call to ->destroy() on init failure, but multiq
still frees ->queues on error in init, but ->queues is also freed by
->destroy() thus we get double free and corrupted memory.
Very easy to reproduce (eth0 not multiqueue):
$ tc qdisc add dev eth0 root multiq
RTNETLINK answers: Operation not supported
$ ip l add dumdum type dummy
(crash)
Trace log:
[ 3929.467747] general protection fault: 0000 [#1] SMP
[ 3929.468083] Modules linked in:
[ 3929.468302] CPU: 3 PID: 967 Comm: ip Not tainted 4.13.0-rc6+ #56
[ 3929.468625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 3929.469124] task: ffff88003716a700 task.stack: ffff88005872c000
[ 3929.469449] RIP: 0010:__kmalloc_track_caller+0x117/0x1be
[ 3929.469746] RSP: 0018:ffff88005872f6a0 EFLAGS: 00010246
[ 3929.470042] RAX: 00000000000002de RBX: 0000000058a59000 RCX: 00000000000002df
[ 3929.470406] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff821f7020
[ 3929.470770] RBP: ffff88005872f6e8 R08: 000000000001f010 R09: 0000000000000000
[ 3929.471133] R10: ffff88005872f730 R11: 0000000000008cdd R12: ff006d75646d7564
[ 3929.471496] R13: 00000000014000c0 R14: ffff88005b403c00 R15: ffff88005b403c00
[ 3929.471869] FS: 00007f0b70480740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
[ 3929.472286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3929.472677] CR2: 00007ffcee4f3000 CR3: 0000000059d45000 CR4: 00000000000406e0
[ 3929.473209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3929.474109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3929.474873] Call Trace:
[ 3929.475337] ? kstrdup_const+0x23/0x25
[ 3929.475863] kstrdup+0x2e/0x4b
[ 3929.476338] kstrdup_const+0x23/0x25
[ 3929.478084] __kernfs_new_node+0x28/0xbc
[ 3929.478478] kernfs_new_node+0x35/0x55
[ 3929.478929] kernfs_create_link+0x23/0x76
[ 3929.479478] sysfs_do_create_link_sd.isra.2+0x85/0xd7
[ 3929.480096] sysfs_create_link+0x33/0x35
[ 3929.480649] device_add+0x200/0x589
[ 3929.481184] netdev_register_kobject+0x7c/0x12f
[ 3929.481711] register_netdevice+0x373/0x471
[ 3929.482174] rtnl_newlink+0x614/0x729
[ 3929.482610] ? rtnl_newlink+0x17f/0x729
[ 3929.483080] rtnetlink_rcv_msg+0x188/0x197
[ 3929.483533] ? rcu_read_unlock+0x3e/0x5f
[ 3929.483984] ? rtnl_newlink+0x729/0x729
[ 3929.484420] netlink_rcv_skb+0x6c/0xce
[ 3929.484858] rtnetlink_rcv+0x23/0x2a
[ 3929.485291] netlink_unicast+0x103/0x181
[ 3929.485735] netlink_sendmsg+0x326/0x337
[ 3929.486181] sock_sendmsg_nosec+0x14/0x3f
[ 3929.486614] sock_sendmsg+0x29/0x2e
[ 3929.486973] ___sys_sendmsg+0x209/0x28b
[ 3929.487340] ? do_raw_spin_unlock+0xcd/0xf8
[ 3929.487719] ? _raw_spin_unlock+0x27/0x31
[ 3929.488092] ? __handle_mm_fault+0x651/0xdb1
[ 3929.488471] ? check_chain_key+0xb0/0xfd
[ 3929.488847] __sys_sendmsg+0x45/0x63
[ 3929.489206] ? __sys_sendmsg+0x45/0x63
[ 3929.489576] SyS_sendmsg+0x19/0x1b
[ 3929.489901] entry_SYSCALL_64_fastpath+0x23/0xc2
[ 3929.490172] RIP: 0033:0x7f0b6fb93690
[ 3929.490423] RSP: 002b:00007ffcee4ed588 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3929.490881] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f0b6fb93690
[ 3929.491198] RDX: 0000000000000000 RSI: 00007ffcee4ed5d0 RDI: 0000000000000003
[ 3929.491521] RBP: ffff88005872ff98 R08: 0000000000000001 R09: 0000000000000000
[ 3929.491801] R10: 00007ffcee4ed350 R11: 0000000000000246 R12: 0000000000000002
[ 3929.492075] R13: 000000000066f1a0 R14: 00007ffcee4f5680 R15: 0000000000000000
[ 3929.492352] ? trace_hardirqs_off_caller+0xa7/0xcf
[ 3929.492590] Code: 8b 45 c0 48 8b 45 b8 74 17 48 8b 4d c8 83 ca ff 44
89 ee 4c 89 f7 e8 83 ca ff ff 49 89 c4 eb 49 49 63 56 20 48 8d 48 01 4d
8b 06 <49> 8b 1c 14 48 89 c2 4c 89 e0 65 49 0f c7 08 0f 94 c0 83 f0 01
[ 3929.493335] RIP: __kmalloc_track_caller+0x117/0x1be RSP: ffff88005872f6a0
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Fixes: f07d1501292b ("multiq: Further multiqueue cleanup")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit below added a call to the ->destroy() callback for all qdiscs
which failed in their ->init(), but some were not prepared for such
change and can't handle partially initialized qdisc. HTB is one of them
and if any error occurs before the qdisc watchdog timer and qdisc work are
initialized then we can hit either a null ptr deref (timer->base) when
canceling in ->destroy or lockdep error info about trying to register
a non-static key and a stack dump. So to fix these two move the watchdog
timer and workqueue init before anything that can err out.
To reproduce userspace needs to send broken htb qdisc create request,
tested with a modified tc (q_htb.c).
Trace log:
[ 2710.897602] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2710.897977] IP: hrtimer_active+0x17/0x8a
[ 2710.898174] PGD 58fab067
[ 2710.898175] P4D 58fab067
[ 2710.898353] PUD 586c0067
[ 2710.898531] PMD 0
[ 2710.898710]
[ 2710.899045] Oops: 0000 [#1] SMP
[ 2710.899232] Modules linked in:
[ 2710.899419] CPU: 1 PID: 950 Comm: tc Not tainted 4.13.0-rc6+ #54
[ 2710.899646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 2710.900035] task: ffff880059ed2700 task.stack: ffff88005ad4c000
[ 2710.900262] RIP: 0010:hrtimer_active+0x17/0x8a
[ 2710.900467] RSP: 0018:ffff88005ad4f960 EFLAGS: 00010246
[ 2710.900684] RAX: 0000000000000000 RBX: ffff88003701e298 RCX: 0000000000000000
[ 2710.900933] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003701e298
[ 2710.901177] RBP: ffff88005ad4f980 R08: 0000000000000001 R09: 0000000000000001
[ 2710.901419] R10: ffff88005ad4f800 R11: 0000000000000400 R12: 0000000000000000
[ 2710.901663] R13: ffff88003701e298 R14: ffffffff822a4540 R15: ffff88005ad4fac0
[ 2710.901907] FS: 00007f2f5e90f740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000
[ 2710.902277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2710.902500] CR2: 0000000000000000 CR3: 0000000058ca3000 CR4: 00000000000406e0
[ 2710.902744] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2710.902977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2710.903180] Call Trace:
[ 2710.903332] hrtimer_try_to_cancel+0x1a/0x93
[ 2710.903504] hrtimer_cancel+0x15/0x20
[ 2710.903667] qdisc_watchdog_cancel+0x12/0x14
[ 2710.903866] htb_destroy+0x2e/0xf7
[ 2710.904097] qdisc_create+0x377/0x3fd
[ 2710.904330] tc_modify_qdisc+0x4d2/0x4fd
[ 2710.904511] rtnetlink_rcv_msg+0x188/0x197
[ 2710.904682] ? rcu_read_unlock+0x3e/0x5f
[ 2710.904849] ? rtnl_newlink+0x729/0x729
[ 2710.905017] netlink_rcv_skb+0x6c/0xce
[ 2710.905183] rtnetlink_rcv+0x23/0x2a
[ 2710.905345] netlink_unicast+0x103/0x181
[ 2710.905511] netlink_sendmsg+0x326/0x337
[ 2710.905679] sock_sendmsg_nosec+0x14/0x3f
[ 2710.905847] sock_sendmsg+0x29/0x2e
[ 2710.906010] ___sys_sendmsg+0x209/0x28b
[ 2710.906176] ? do_raw_spin_unlock+0xcd/0xf8
[ 2710.906346] ? _raw_spin_unlock+0x27/0x31
[ 2710.906514] ? __handle_mm_fault+0x651/0xdb1
[ 2710.906685] ? check_chain_key+0xb0/0xfd
[ 2710.906855] __sys_sendmsg+0x45/0x63
[ 2710.907018] ? __sys_sendmsg+0x45/0x63
[ 2710.907185] SyS_sendmsg+0x19/0x1b
[ 2710.907344] entry_SYSCALL_64_fastpath+0x23/0xc2
Note that probably this bug goes further back because the default qdisc
handling always calls ->destroy on init failure too.
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recent patch had an endian warning ie
cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
Currently the maximum size of SMB2/3 header is set incorrectly which
leads to hanging of directory listing operations on encrypted SMB3
connections. Fix this by setting the maximum size to 170 bytes that
is calculated as RFC1002 length field size (4) + transform header
size (52) + SMB2 header size (64) + create response size (56).
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
|
|
cq_period_mode assignment was mistakenly removed so it was always set to "0",
which is EQE based moderation, regardless of the device CAPs and
requested value in ethtool.
Fixes: 6a9764efb255 ("net/mlx5e: Isolate open_channels from priv->params")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Fix inline header size, make sure it is not greater than skb len.
This bug effects small packets, for example L2 packets with size < 18.
Fixes: ae76715d153e ("net/mlx5e: Check the minimum inline header mode before xmit")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
When changing from switchdev to legacy mode, all the representor port
devices (uplink nic and reps) are cleaned up. Part of this cleaning
process is removing the neigh entries and the hash table containing them.
However, a representor neigh entry might be linked to the uplink port
hash table and if the uplink nic is cleaned first the cleaning of the
representor will end up in null deref.
Fix that by unloading the representors in the opposite order of load.
Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|