Age | Commit message (Collapse) | Author | Files | Lines |
|
C-SKY page table attributes only have 'Dirty' and 'Valid' to
emulate 'PRESENT, READ, WRITE, EXEC, DIRTY, ACCESSED'.
This patch cleanup unnecessary definition.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
When the system memory is exhausted, linux will trigger kswapd to
shrink memory page cache. We found the csky's .text file mapping
pages would be reclaimed earlier than arm's elf. Because csky
doesn't give _PAGE_ACCESSED for default pgprot and in zap_pte_range
if (pte_young(ptent) &&
likely(!(vma->vm_flags & VM_SEQ_READ)))
mark_page_accessed(page);
mark_page_accessed will put the pages into active lru list.
[ 3.652722] delete busybox page from inactive file list
Call Trace:
[<9012a376>] dump_stack+0xe/0x24
[<9012a370>] dump_stack+0x8/0x24
[<9005b780>] activate_page+0x2b4/0x2d4
[<90132502>] vsnprintf+0x2c6/0x374
[<9005b880>] mark_page_accessed+0xe0/0x150
[<9006903e>] unmap_page_range+0x166/0x33c
[<90021844>] get_signal+0x98/0x3b4
[<90069232>] unmap_single_vma+0x1e/0x24
[<90069462>] unmap_vmas+0x26/0x40
[<9006d3d8>] exit_mmap+0x60/0xbc
[<9006a140>] handle_mm_fault+0x700/0xcec
[<900426b2>] ktime_get_with_offset+0x86/0x130
[<90017566>] mmput+0x2e/0x90
[<9001a30a>] do_exit+0x13e/0x6f0
[<90015448>] page_fault_end+0x14/0x74
[<9001b4bc>] SyS_exit_group+0x0/0xc
[<9001b47c>] do_group_exit+0x2c/0x6c
[<9001b4c8>] __wake_up_parent+0x0/0x20
[<9001399e>] csky_systemcall+0x6e/0x72
csky will throw the pages at first and keep them in active lru
list later after real accessed, but arm would keep them in active
lru list at the beginning.
The following are statistics of different architecture styles:
Default _PAGE_ACCESSED: alpha, arm, arm64, ia64, m68k, microblaze,
openrisc, powerpc, riscv, sh, um, x86,
xtensa
Not def _PAGE_ACCESSED: arc, c6x, h8300, hexgon, mips, s390, nds32,
nios2, parisc, sparc
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Xu Kai <xukai@nationalchip.com>
Signed-off-by: Xu Kai <xukai@nationalchip.com>
|
|
Remove including <linux/version.h> that don't need it.
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
"*" is missed in size determination as we are passing register set
rather than a pointer.
Fixes: dcad7854fcce ("sky: switch to ->regset_get()")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
Reconstruct vdso framework to support future vsyscall,
vgettimeofday features. These are very important features to reduce
system calls into the kernel for performance improvement.
The patch is reference RISC-V's
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Pick up the patch from the 'Link' made by Mark Rutland. Keep the
same with x86, arm, arm64, arc, sh, power.
Link: https://lore.kernel.org/linux-arm-kernel/1499782763-31418-1-git-send-email-mark.rutland@arm.com/
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Mark Rutland <mark.rutland@arm.com>
|
|
Sync arch/riscv/mm/fault.c into arch/csky for easy maintenance.
Here are the patches related to the modification:
cac4d1d "riscv/mm/fault: Move no context handling to no_context()"
ac416a7 "riscv/mm/fault: Move vmalloc fault handling to vmalloc_fault()"
6c11ffb "riscv/mm/fault: Move fault error handling to mm_fault_error()"
afb8c6f "riscv/mm/fault: Move access error check to function"
bda281d "riscv/mm/fault: Simplify fault error handling"
a51271d "riscv/mm/fault: Move bad area handling to bad_area()"
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
We must succeed parent's context irq status in page fault handler.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
Similar to other architectures:
In addition to in_atomic, we also need pagefault_disabled() to
check.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
The function update_mmu_cache could be called by user-io mapping.
There is no space of struct page in mem_map for the pte. Just
ignore the user-io mmaping in update_mmu_cache.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
The past code only passes the FAULT_FLAG_WRITE into
handle_mm_fault and missing USER & DEFAULT & RETRY.
The patch references to arch/riscv/mm/fault.c, but there is no
FAULT_FLAG_INSTRUCTION in csky hw.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
Print all 1024 jtlb entries and 16 iutlb entries and 16 dutlb
entries in show_regs.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
TLB invalidate didn't contain a barrier operation in csky cpu and
we need to prevent previous PTW response after TLB invalidation
instruction. Of cause, the ASID changing also needs to take care
of the issue.
CPU0 CPU1
=============== ===============
set_pte
sync_is() -> See the previous set_pte for all harts
tlbi.vas -> Invalidate all harts TLB entry & flush pipeline
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
Here is the log after enabled:
[ 1.798972] kmemleak: Kernel memory leak detector initialized (mem pool available: 15851)
[ 1.798983] kmemleak: Automatic memory scanning thread started
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
There is a prologue on page fault handler which marking pages dirty
and/or accessed in page attributes, but all of these have been
handled in handle_pte_fault.
- Add flush_tlb_one in vmalloc page fault instead of prologue.
- Using cmxchg_fixup C codes in do_page_fault instead of ASM one.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
Fixup commit c2d1adfa9a24 "csky: Add memory layout 2.5G(user):1.5G
(kernel)". That patch broke the global bit in PTE.
C-SKY TLB's entry contain two pages:
vpn, vpn + 1 -> ppn0, ppn1
All PPN's attributes contain global bit and final global is PPN0.G
& PPN1.G. So we must keep PPN0.G and PPN1.G same in one TLB's
entry.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
There are two implementation of spinlock in arch/csky:
- simple one (NR_CPU = 1,2)
- tick's one (NR_CPU = 3,4)
Remove the simple one.
There is already smp_mb in spinlock, so remove the definition of
smp_mb__after_spinlock.
Link: https://lore.kernel.org/linux-csky/20200807081253.GD2674@hirez.programming.kicks-ass.net/#t
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Peter Zijlstra <peterz@infradead.org>k
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
Optimize the performance of cmpxchg by using more fine-grained
acquire/release barriers.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul E. McKenney <paulmck@kernel.org>
|
|
Arnd said:
I would guess that for csky, this is a mistake, as the architecture
is fairly new and should be able to implement it.
Guo reply:
The c610, c807, c810 don't support SMP, so futex_cmpxchg_enabled = 1
with asm-generic's implementation.
For c860, there is no HAVE_FUTEX_CMPXCHG and cmpxchg_inatomic/inuser
implementation, so futex_cmpxchg_enabled = 0.
Thx for point it out, we'll implement cmpxchg_inatomic/inuser for
C860 and still use asm-generic for non-smp CPUs.
LTP test:
futex_wait01 1 TPASS : futex_wait(): errno=ETIMEDOUT(110): Connection timed out
futex_wait01 2 TPASS : futex_wait(): errno=EAGAIN/EWOULDBLOCK(11): Resource temporarily unavailable
futex_wait01 3 TPASS : futex_wait(): errno=ETIMEDOUT(110): Connection timed out
futex_wait01 4 TPASS : futex_wait(): errno=EAGAIN/EWOULDBLOCK(11): Resource temporarily unavailable
futex_wait02 1 TPASS : futex_wait() woken up
futex_wait03 1 TPASS : futex_wait() woken up
futex_wait04 1 TPASS : futex_wait() returned -1: errno=EAGAIN/EWOULDBLOCK(11): Resource temporarily unavailable
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/lkml/CAK8P3a3+WaQNyJ6Za2qfu6=0mBgU1hApnRXrdp1b1=P7wwyRUg@mail.gmail.com/
|
|
Remove shareable bit for ordering barrier, just keep ordering
in current hart is enough for SMP. Using three continuous
sync.is as PTW barrier to prevent speculative PTW in 860
microarchitecture.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
Use generic atomic implementation based on cmpxchg. So remove csky
asm/atomic.h.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
|
|
Current show_regs didn't display regs->usp and it confused debug.
So fixup wrong SP display and add PT_REGS.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
Current perf init will failed with:
[ 1.452433] csky-pmu: probe of soc:pmu failed with error -16
This patch fix it up with adding CPUHP_AP_PERF_CSKY_ONLINE in
cpuhotplug.h.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
|
|
There are two ways for translating va to pa for csky:
- Use TLB(Translate Lookup Buffer) and PTW (Page Table Walk)
- Use SSEG0/1 (Simple Segment Mapping)
We use tlb mapping 0-2G and 3G-4G virtual address area and SSEG0/1
are for 2G-2.5G and 2.5G-3G translation. We could disable SSEG0
to use 2G-2.5G as TLB user mapping.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
|
|
Section 3.11 was incorrectly called 3.9, fix it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Change my email contact ahead of a likely painful eleven-month migration
to a certain cobalt enteprisey groupware cloud product that will totally
break my workflow. Some day I may get used to having to email being
sequestered behind both claret and cerulean oath2+sms 2fa layers, but
for now I'll stick with keying in one password to receive an email vs.
the required four.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't want
its internals like ->files and ->mm to be poked by the SQPOLL task, it
have never been nice and recently got racy. That can happen when the
owner undergoes destruction and SQPOLL tasks tries to submit new
requests in parallel, and so calls io_sq_thread_acquire*().
That patch halts SQPOLL submissions when sqo_task dies by introducing
sqo_dead flag. Once set, the SQPOLL task must not do any submission,
which is synchronised by uring_lock as well as the new flag.
The tricky part is to make sure that disabling always happens, that
means either the ring is discovered by creator's do_exit() -> cancel,
or if the final close() happens before it's done by the creator. The
last is guaranteed by the fact that for SQPOLL the creator task and only
it holds exactly one file note, so either it pins up to do_exit() or
removed by the creator on the final put in flush. (see comments in
uring_flush() around file->f_count == 2).
One more place that can trigger io_sq_thread_acquire_*() is
__io_req_task_submit(). Shoot off requests on sqo_dead there, even
though actually we don't need to. That's because cancellation of
sqo_task should wait for the request before going any further.
note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
caller would enter the ring to get an error, but it still doesn't
guarantee that the flag won't be cleared.
note 2: if final __userspace__ close happens not from the creator
task, the file note will pin the ring until the task dies.
Fixed: b1b6b5a30dce8 ("kernel/io_uring: cancel io_uring before task works")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
files_cancel() should cancel all relevant requests and drop file notes,
so we should never have file notes after that, including on-exit fput
and flush. Add a WARN_ONCE to be sure.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A simple preparation change inlining io_uring_attempt_task_drop() into
io_uring_flush().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We expect io_rw_reissue() to take place only during submission with
uring_lock held. Add a lockdep annotation to check that invariant.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in incompat feature
set, it means the cache device is created with obsoleted layout with
obso_bucket_site_hi. Now bcache does not support this feature bit, a new
BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat feature bit is added
for a better layout to support large bucket size.
For the legacy compatibility purpose, if a cache device created with
obsoleted BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all bcache
devices attached to this cache set should be set to read-only. Then the
dirty data can be written back to backing device before re-create the
cache device with BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE feature bit
by the latest bcache-tools.
This patch checks BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit
when running a cache set and attach a bcache device to the cache set. If
this bit is set,
- When run a cache set, print an error kernel message to indicate all
following attached bcache device will be read-only.
- When attach a bcache device, print an error kernel message to indicate
the attached bcache device will be read-only, and ask users to update
to latest bcache-tools.
Such change is only for cache device whose bucket size >= 32MB, this is
for the zoned SSD and almost nobody uses such large bucket size at this
moment. If you don't explicit set a large bucket size for a zoned SSD,
such change is totally transparent to your bcache device.
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
was introduced into the incompat feature set. It used bucket_size_hi
(which was added at the tail of struct cache_sb_disk) to extend current
16bit bucket size to 32bit with existing bucket_size in struct
cache_sb_disk.
This is not a good idea, there are two obvious problems,
- Bucket size is always value power of 2, if store log2(bucket size) in
existing bucket_size of struct cache_sb_disk, it is unnecessary to add
bucket_size_hi.
- Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
struct cache_sb_disk, bucket_size_hi was added after d[] which makes
csum_set calculate an unexpected super block checksum.
To fix the above problems, this patch introduces a new incompat feature
bit BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE, when this bit is set, it
means bucket_size in struct cache_sb_disk stores the order of power-of-2
bucket size value. When user specifies a bucket size larger than 32768
sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE will be set to
incompat feature set, and bucket_size stores log2(bucket size) more
than store the real bucket size value.
The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore,
it is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and still only
recognized by kernel driver for legacy compatible purpose. The previous
bucket_size_hi is renmaed to obso_bucket_size_hi in struct cache_sb_disk
and not used in bcache-tools anymore.
For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature,
bcache-tools and kernel driver still recognize the feature string and
display it as "obso_large_bucket".
With this change, the unnecessary extra space extend of bcache on-disk
super block can be avoided, and csum_set() may generate expected check
sum as well.
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch adds the check for features which is incompatible for
current supported feature sets.
Now if the bcache device created by bcache-tools has features that
current kernel doesn't support, read_super() will fail with error
messoage. E.g. if an unsupported incompatible feature detected,
bcache register will fail with dmesg "bcache: register_bcache() error :
Unsupported incompatible feature found".
Fixes: d721a43ff69c ("bcache: increase super block version for cache device and backing device")
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch fixes the following typos,
from BCH_FEATURE_COMPAT_SUUP to BCH_FEATURE_COMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_INCOMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_RO_COMPAT_SUPP
Fixes: d721a43ff69c ("bcache: increase super block version for cache device and backing device")
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is no need to reassign pdev_set_uuid in the second loop iteration,
so move it to the place before second loop.
Signed-off-by: Yi Li <yili@winhong.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
HSDK has hardware floating point and the common use case is with
glibc+hf so enable that as default.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
The kernel test robot reported a -5.8% performance regression on the
"poll2" test of will-it-scale, and bisected it to commit d55564cfc222
("x86: Make __put_user() generate an out-of-line call").
I didn't expect an out-of-line __put_user() to matter, because no normal
core code should use that non-checking legacy version of user access any
more. But I had overlooked the very odd poll() usage, which does a
__put_user() to update the 'revents' values of the poll array.
Now, Al Viro correctly points out that instead of updating just the
'revents' field, it would be much simpler to just copy the _whole_
pollfd entry, and then we could just use "copy_to_user()" on the whole
array of entries, the same way we use "copy_from_user()" a few lines
earlier to get the original values.
But that is not what we've traditionally done, and I worry that threaded
applications might be concurrently modifying the other fields of the
pollfd array. So while Al's suggestion is simpler - and perhaps worth
trying in the future - this instead keeps the "just update revents"
model.
To fix the performance regression, use the modern "unsafe_put_user()"
instead of __put_user(), with the proper "user_write_access_begin()"
guarding in place. This improves code generation enormously.
Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Laight <David.Laight@aculab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit 757055ae8dedf5333af17b3b5b4b70ba9bc9da4e.
The commit caused that ttynull was used as the default console
on several systems[1][2][3]. As a result, the console was
blank even when a better alternative existed.
It happened when there was no console configured
on the command line and ttynull_init() was the first initcall
calling register_console().
Or it happened when /dev/ did not exist when console_on_rootfs()
was called. It was not able to open /dev/console even though
a console driver was registered. It tried to add ttynull console
but it obviously did not help. But ttynull became the preferred
console and was used by /dev/console when it was available later.
The commit tried to fix a historical problem that have been there
for ages. The primary motivation was the commit 3cffa06aeef7ece30f6
("printk/console: Allow to disable console output by using console=""
or console=null"). It provided a clean solution for a workaround
that was widely used and worked only by chance.
This revert causes that the console="" or console=null command line
options will again work only by chance. These options will cause that
a particular console will be preferred and the default (tty) ones
will not get enabled. There will be no console registered at
all. As a result there won't be stdin, stdout, and stderr for
the init process. But it worked exactly this way even before.
The proper solution has to fulfill many conditions:
+ Register ttynull only when explicitly required or as
the ultimate fallback.
+ ttynull should get associated with /dev/console but it must
not become preferred console when used as a fallback.
Especially, it must still be possible to replace it
by a better console later.
Such a change requires clean up of the register_console() code.
Otherwise, it would be even harder to follow. Especially, the use
of has_preferred_console and CON_CONSDEV flag is tricky. The clean
up is risky. The ordering of consoles is not well defined. And
any changes tend to break existing user settings.
Do the revert at the least risky solution for now.
[1] https://lore.kernel.org/linux-kselftest/20201221144302.GR4077@smile.fi.intel.com/
[2] https://lore.kernel.org/lkml/d2a3b3c0-e548-7dd1-730f-59bc5c04e191@synopsys.com/
[3] https://patchwork.ozlabs.org/project/linux-um/patch/20210105120128.10854-1-thomas@m3y3r.de/
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Vineet Gupta <vgupta@synopsys.com>
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
hwmon, specifically hwmon_num_channel_attrs, expects the config
array in the hwmon_channel_info structure to be terminated by
a zero entry. amd_energy does not honor this convention. As
result, a KASAN warning is possible. Fix this by adding an
additional entry and setting it to zero.
Fixes: 8abee9566b7e ("hwmon: Add amd_energy driver to report energy counters")
Signed-off-by: David Arcari <darcari@redhat.com>
Cc: Naveen Krishna Chatradhi <nchatrad@amd.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
Link: https://lore.kernel.org/r/20210107144707.6927-1-darcari@redhat.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
dtc points out that the interrupts for some devices are not parsable:
picoxcell-pc3x2.dtsi:45.19-49.5: Warning (interrupts_property): /paxi/gem@30000: Missing interrupt-parent
picoxcell-pc3x2.dtsi:51.21-55.5: Warning (interrupts_property): /paxi/dmac@40000: Missing interrupt-parent
picoxcell-pc3x2.dtsi:57.21-61.5: Warning (interrupts_property): /paxi/dmac@50000: Missing interrupt-parent
picoxcell-pc3x2.dtsi:233.21-237.5: Warning (interrupts_property): /rwid-axi/axi2pico@c0000000: Missing interrupt-parent
There are two VIC instances, so it's not clear which one needs to be
used. I found the BSP sources that reference VIC0, so use that:
https://github.com/r1mikey/meta-picoxcell/blob/master/recipes-kernel/linux/linux-picochip-3.0/0001-picoxcell-support-for-Picochip-picoXcell-SoC.patch
Acked-by: Jamie Iles <jamie@jamieiles.com>
Link: https://lore.kernel.org/r/20201230152010.3914962-1-arnd@kernel.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Showing the hctx flags for when BLK_MQ_F_TAG_HCTX_SHARED is set gives
something like:
root@debian:/home/john# more /sys/kernel/debug/block/sda/hctx0/flags
alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED|3
Add the decoding for that flag.
Fixes: 32bc15afed04b ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We had kernel panic, it is caused by unload module and last
close confirmation.
call trace:
[1196029.743127] free_sess+0x15/0x50 [rtrs_client]
[1196029.743128] rtrs_clt_close+0x4c/0x70 [rtrs_client]
[1196029.743129] ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client]
[1196029.743130] close_rtrs+0x25/0x50 [rnbd_client]
[1196029.743131] rnbd_client_exit+0x93/0xb99 [rnbd_client]
[1196029.743132] __x64_sys_delete_module+0x190/0x260
And in the crashdump confirmation kworker is also running.
PID: 6943 TASK: ffff9e2ac8098000 CPU: 4 COMMAND: "kworker/4:2"
#0 [ffffb206cf337c30] __schedule at ffffffff9f93f891
#1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98
#2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938
#3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7
#4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e
#5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client]
#6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client]
#7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client]
#8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client]
#9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client]
The problem is both code path try to close same session, which lead to
panic.
To fix it, just skip the sess if the refcount already drop to 0.
Fixes: f7a7a5c228d4 ("block/rnbd: client: main functionality")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Adding name to the Contributors List
Signed-off-by: Swapnil Ingle <ingleswapnil@gmail.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Acked-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since dynamically allocate sglist is used for rnbd_iu, we can't free sg
table after send_usr_msg since the callback function (cqe.done) could
still access the sglist.
Otherwise KASAN reports UAF issue:
[ 4856.600257] BUG: KASAN: use-after-free in dma_direct_unmap_sg+0x53/0x290
[ 4856.600772] Read of size 4 at addr ffff888206af3a98 by task swapper/1/0
[ 4856.601729] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G W 5.10.0-pserver #5.10.0-1+feature+linux+next+20201214.1025+0910d71
[ 4856.601748] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
[ 4856.601766] Call Trace:
[ 4856.601785] <IRQ>
[ 4856.601822] dump_stack+0x99/0xcb
[ 4856.601856] ? dma_direct_unmap_sg+0x53/0x290
[ 4856.601888] print_address_description.constprop.7+0x1e/0x230
[ 4856.601913] ? freeze_kernel_threads+0x73/0x73
[ 4856.601965] ? mark_held_locks+0x29/0xa0
[ 4856.602019] ? dma_direct_unmap_sg+0x53/0x290
[ 4856.602039] ? dma_direct_unmap_sg+0x53/0x290
[ 4856.602079] kasan_report.cold.9+0x37/0x7c
[ 4856.602188] ? mlx5_ib_post_recv+0x430/0x520 [mlx5_ib]
[ 4856.602209] ? dma_direct_unmap_sg+0x53/0x290
[ 4856.602256] dma_direct_unmap_sg+0x53/0x290
[ 4856.602366] complete_rdma_req+0x188/0x4b0 [rtrs_client]
[ 4856.602451] ? rtrs_clt_close+0x80/0x80 [rtrs_client]
[ 4856.602535] ? mlx5_ib_poll_cq+0x48b/0x16e0 [mlx5_ib]
[ 4856.602589] ? radix_tree_insert+0x3a0/0x3a0
[ 4856.602610] ? do_raw_spin_lock+0x119/0x1d0
[ 4856.602647] ? rwlock_bug.part.1+0x60/0x60
[ 4856.602740] rtrs_clt_rdma_done+0x3f7/0x670 [rtrs_client]
[ 4856.602804] ? rtrs_clt_rdma_cm_handler+0xda0/0xda0 [rtrs_client]
[ 4856.602857] ? check_flags.part.31+0x6c/0x1f0
[ 4856.602927] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 4856.602963] ? rcu_read_lock_bh_held+0xc0/0xc0
[ 4856.603137] __ib_process_cq+0x10a/0x350 [ib_core]
[ 4856.603309] ib_poll_handler+0x41/0x1c0 [ib_core]
[ 4856.603358] irq_poll_softirq+0xe6/0x280
[ 4856.603392] ? lockdep_hardirqs_on_prepare+0x111/0x210
[ 4856.603446] __do_softirq+0x10d/0x646
[ 4856.603540] asm_call_irq_on_stack+0x12/0x20
[ 4856.603563] </IRQ>
[ 4856.605096] Allocated by task 8914:
[ 4856.605510] kasan_save_stack+0x19/0x40
[ 4856.605532] __kasan_kmalloc.constprop.7+0xc1/0xd0
[ 4856.605552] __kmalloc+0x155/0x320
[ 4856.605574] __sg_alloc_table+0x155/0x1c0
[ 4856.605594] sg_alloc_table+0x1f/0x50
[ 4856.605620] send_msg_sess_info+0x119/0x2e0 [rnbd_client]
[ 4856.605646] remap_devs+0x71/0x210 [rnbd_client]
[ 4856.605676] init_sess+0xad8/0xe10 [rtrs_client]
[ 4856.605706] rtrs_clt_reconnect_work+0xd6/0x170 [rtrs_client]
[ 4856.605728] process_one_work+0x521/0xa90
[ 4856.605748] worker_thread+0x65/0x5b0
[ 4856.605769] kthread+0x1f2/0x210
[ 4856.605789] ret_from_fork+0x22/0x30
[ 4856.606159] Freed by task 8914:
[ 4856.606559] kasan_save_stack+0x19/0x40
[ 4856.606580] kasan_set_track+0x1c/0x30
[ 4856.606601] kasan_set_free_info+0x1b/0x30
[ 4856.606622] __kasan_slab_free+0x108/0x150
[ 4856.606642] slab_free_freelist_hook+0x64/0x190
[ 4856.606661] kfree+0xe2/0x650
[ 4856.606681] __sg_free_table+0xa4/0x100
[ 4856.606707] send_msg_sess_info+0x1d6/0x2e0 [rnbd_client]
[ 4856.606733] remap_devs+0x71/0x210 [rnbd_client]
[ 4856.606763] init_sess+0xad8/0xe10 [rtrs_client]
[ 4856.606792] rtrs_clt_reconnect_work+0xd6/0x170 [rtrs_client]
[ 4856.606813] process_one_work+0x521/0xa90
[ 4856.606833] worker_thread+0x65/0x5b0
[ 4856.606853] kthread+0x1f2/0x210
[ 4856.606872] ret_from_fork+0x22/0x30
The solution is to free iu's sgtable after the iu is not used anymore.
And also move sg_alloc_table into rnbd_get_iu accordingly.
Fixes: 5a1328d0c3a7 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
KASAN detect following BUG:
[ 778.215311] ==================================================================
[ 778.216696] BUG: KASAN: use-after-free in rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.219037] Read of size 8 at addr ffff88b1d6516c28 by task tee/8842
[ 778.220500] CPU: 37 PID: 8842 Comm: tee Kdump: loaded Not tainted 5.10.0-pserver #5.10.0-1+feature+linux+next+20201214.1025+0910d71
[ 778.220529] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
[ 778.220555] Call Trace:
[ 778.220609] dump_stack+0x99/0xcb
[ 778.220667] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.220715] print_address_description.constprop.7+0x1e/0x230
[ 778.220750] ? freeze_kernel_threads+0x73/0x73
[ 778.220896] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.220932] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.220994] kasan_report.cold.9+0x37/0x7c
[ 778.221066] ? kobject_put+0x80/0x270
[ 778.221102] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.221184] rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.221240] rnbd_srv_dev_session_force_close_store+0x6a/0xc0 [rnbd_server]
[ 778.221304] ? sysfs_file_ops+0x90/0x90
[ 778.221353] kernfs_fop_write+0x141/0x240
[ 778.221451] vfs_write+0x142/0x4d0
[ 778.221553] ksys_write+0xc0/0x160
[ 778.221602] ? __ia32_sys_read+0x50/0x50
[ 778.221684] ? lockdep_hardirqs_on_prepare+0x13d/0x210
[ 778.221718] ? syscall_enter_from_user_mode+0x1c/0x50
[ 778.221821] do_syscall_64+0x33/0x40
[ 778.221862] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 778.221896] RIP: 0033:0x7f4affdd9504
[ 778.221928] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 778.221956] RSP: 002b:00007fffebb36b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 778.222011] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f4affdd9504
[ 778.222038] RDX: 0000000000000002 RSI: 00007fffebb36c50 RDI: 0000000000000003
[ 778.222066] RBP: 00007fffebb36c50 R08: 0000556a151aa600 R09: 00007f4affeb1540
[ 778.222094] R10: fffffffffffffc19 R11: 0000000000000246 R12: 0000556a151aa520
[ 778.222121] R13: 0000000000000002 R14: 00007f4affea6760 R15: 0000000000000002
[ 778.222764] Allocated by task 3212:
[ 778.223285] kasan_save_stack+0x19/0x40
[ 778.223316] __kasan_kmalloc.constprop.7+0xc1/0xd0
[ 778.223347] kmem_cache_alloc_trace+0x186/0x350
[ 778.223382] rnbd_srv_rdma_ev+0xf16/0x1690 [rnbd_server]
[ 778.223422] process_io_req+0x4d1/0x670 [rtrs_server]
[ 778.223573] __ib_process_cq+0x10a/0x350 [ib_core]
[ 778.223709] ib_cq_poll_work+0x31/0xb0 [ib_core]
[ 778.223743] process_one_work+0x521/0xa90
[ 778.223773] worker_thread+0x65/0x5b0
[ 778.223802] kthread+0x1f2/0x210
[ 778.223833] ret_from_fork+0x22/0x30
[ 778.224296] Freed by task 8842:
[ 778.224800] kasan_save_stack+0x19/0x40
[ 778.224829] kasan_set_track+0x1c/0x30
[ 778.224860] kasan_set_free_info+0x1b/0x30
[ 778.224889] __kasan_slab_free+0x108/0x150
[ 778.224919] slab_free_freelist_hook+0x64/0x190
[ 778.224947] kfree+0xe2/0x650
[ 778.224982] rnbd_destroy_sess_dev+0x2fa/0x3b0 [rnbd_server]
[ 778.225011] kobject_put+0xda/0x270
[ 778.225046] rnbd_srv_sess_dev_force_close+0x30/0x60 [rnbd_server]
[ 778.225081] rnbd_srv_dev_session_force_close_store+0x6a/0xc0 [rnbd_server]
[ 778.225111] kernfs_fop_write+0x141/0x240
[ 778.225140] vfs_write+0x142/0x4d0
[ 778.225169] ksys_write+0xc0/0x160
[ 778.225198] do_syscall_64+0x33/0x40
[ 778.225227] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 778.226506] The buggy address belongs to the object at ffff88b1d6516c00
which belongs to the cache kmalloc-512 of size 512
[ 778.227464] The buggy address is located 40 bytes inside of
512-byte region [ffff88b1d6516c00, ffff88b1d6516e00)
The problem is in the sess_dev release function we call
rnbd_destroy_sess_dev, and could free the sess_dev already, but we still
set the keep_id in rnbd_srv_sess_dev_force_close, which lead to use
after free.
To fix it, move the keep_id before the sysfs removal, and cache the
rnbd_srv_session for lock accessing,
Fixes: 786998050cbc ("block/rnbd-srv: close a mapped device from server side.")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
lkp reboot following build error:
drivers/block/rnbd/rnbd-clt.c: In function 'rnbd_softirq_done_fn':
>> drivers/block/rnbd/rnbd-clt.c:387:2: error: implicit declaration of function 'sg_free_table_chained' [-Werror=implicit-function-declaration]
387 | sg_free_table_chained(&iu->sgt, RNBD_INLINE_SG_CNT);
| ^~~~~~~~~~~~~~~~~~~~~
The reason is CONFIG_SG_POOL is not enabled in the config, to
avoid such failure, select SG_POOL in Kconfig for RNBD_CLIENT.
Fixes: 5a1328d0c3a7 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Shakeel Butt reported in [1] that a user can request a task to be moved
to a resource group even if the task is already in the group. It just
wastes time to do the move operation which could be costly to send IPI
to a different CPU.
Add a sanity check to ensure that the move operation only happens when
the task is not already in the resource group.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/962ede65d8e95be793cb61102cca37f7bb018e66.1608243147.git.reinette.chatre@intel.com
|
|
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.
This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.
This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.com
[ bp: Massage commit message and drop the two update_task_closid_rmid()
variants. ]
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.1608243147.git.reinette.chatre@intel.com
|