linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2021-08-14	clocksource/drivers/fttmr010: Pass around less pointers	Linus Walleij	1	-16/+16
	Just pass bool flags from the different initcalls and use the flags to set the right pointers. This results in fewer pointers being passed around in init. Cc: Cédric Le Goater <clg@kaod.org> Cc: Joel Stanley <joel@jms.id.au> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210724224424.2085404-1-linus.walleij@linaro.org
2021-08-14	clocksource/drivers/mediatek: Optimize systimer irq clear flow on shutdown	Fengquan Chen	1	-2/+6
	mtk_syst_clkevt_shutdown() is called after IRQs are disabled in the suspend flow. Clear any pending systimer IRQ on shutdown to avoid the suspend being aborted due to a pending timer IRQ. Also, on MediaTek SoCs the timer must first be enabled before the systimer IRQ can be cleared. Fixes: e3af677607d9 ("clocksource/drivers/timer-mediatek: Add support for system timer") Signed-off-by: Fengquan Chen <fengquan.chen@mediatek.com> Tested-by: Hsin-Yi Wang <hsinyi@chromium.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1617960162-1988-2-git-send-email-Fengquan.Chen@mediatek.com
2021-08-14	clocksource/drivers/ingenic: Use bitfield macro helpers	周琰杰 (Zhou Yanjie)	1	-6/+7
	Use "FIELD_GET()" and "FIELD_PREP()" to simplify the code. [dlezcano] : Changed title Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> Reviewed-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1627638188-116163-1-git-send-email-zhouyanjie@wanyeetech.com
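A minimal userspace sketch of the idea behind the bitfield helpers named above. The `EX_*` macros, the register layout, and both functions are hypothetical stand-ins, not the kernel's `FIELD_GET()`/`FIELD_PREP()` or the Ingenic driver's registers; `__builtin_ctz` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel helpers: derive the shift from the
 * mask so that callers never hand-code shift constants. */
#define EX_FIELD_SHIFT(mask)      (__builtin_ctz(mask))
#define EX_FIELD_GET(mask, reg)   (((reg) & (mask)) >> EX_FIELD_SHIFT(mask))
#define EX_FIELD_PREP(mask, val)  (((uint32_t)(val) << EX_FIELD_SHIFT(mask)) & (mask))

/* Hypothetical register layout: a 3-bit prescaler field in bits 5..3. */
#define EX_CTRL_PRESCALE_MASK 0x00000038u

/* Replace only the prescaler field and leave all other bits untouched. */
static uint32_t ex_set_prescale(uint32_t ctrl, uint32_t prescale)
{
    assert(prescale <= 7); /* value must fit in the 3-bit field */
    ctrl &= ~EX_CTRL_PRESCALE_MASK;
    ctrl |= EX_FIELD_PREP(EX_CTRL_PRESCALE_MASK, prescale);
    return ctrl;
}

/* Extract the prescaler field as a plain integer. */
static uint32_t ex_get_prescale(uint32_t ctrl)
{
    return EX_FIELD_GET(EX_CTRL_PRESCALE_MASK, ctrl);
}
```

Deriving the shift from the mask keeps a single definition per field, which is the simplification the commit message refers to.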
2021-08-13	clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel	Phong Hoang	1	-12/+18
	If a CMT instance has at least two channels, one channel will be used as a clock source and another one as a clock event device. In that case, no IRQ is requested for the clock source channel, so sh_cmt_clock_event_program_verify() might work incorrectly. Besides, when a channel is only used as a clock source, there is no need to re-set the next match_value since it should stay at the maximum timeout. On the other hand, due to the missing IRQ, total_cycles is not counted up when the compare match time is reached (timer counter resets to zero), so sh_cmt_clocksource_read() returns an unexpected value. Therefore, using a 64-bit clocksource mask for the 32-bit or 16-bit variants will also lead to a wrong delta calculation. Hence, this mask should correspond to the timer counter width, and the above function just returns the raw value of the timer counter register. Fixes: bfa76bb12f23 ("clocksource: sh_cmt: Request IRQ for clock event device only") Fixes: 37e7742c55ba ("clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines") Signed-off-by: Phong Hoang <phong.hoang.wz@renesas.com> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210422123443.73334-1-niklas.soderlund+renesas@ragnatech.se
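The point about the mask matching the counter width can be illustrated with a small sketch. It assumes the usual masked-subtraction form of a clocksource delta; `ex_cs_delta()` and `ex_cs_mask()` are hypothetical helper names, not functions from the sh_cmt driver.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed cycles between two raw counter reads,
 * truncated to the counter width by the mask. */
static uint64_t ex_cs_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}

/* Hypothetical helper: mask for a counter that is `bits` wide (1..64). */
static uint64_t ex_cs_mask(unsigned int bits)
{
    return bits >= 64 ? ~0ULL : (1ULL << bits) - 1;
}
```

For a 16-bit counter that wrapped from 0xFFFB to 0x0005, `ex_cs_delta(0x0005, 0xFFFB, ex_cs_mask(16))` yields the 10 elapsed cycles, whereas a 64-bit mask leaves the borrow in the upper bits and produces a huge bogus delta.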
2021-08-13	dt-bindings: timer: convert rockchip,rk-timer.txt to YAML	Ezequiel Garcia	2	-27/+64
	Convert Rockchip Timer dt-bindings to YAML. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210506111136.3941-4-ezequiel@collabora.com
2021-08-13	clocksource/drivers/exynos_mct: Mark MCT device as CLOCK_EVT_FEAT_PERCPU	Will Deacon	1	-1/+2
	The "mct_tick" is a per-cpu clockevents device. Set the CLOCK_EVT_FEAT_PERCPU feature to prevent e.g. mct_tick0 being unsafely designated as the global broadcast timer and instead treat the device as a per-cpu wakeup timer. Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210608154341.10794-3-will@kernel.org
2021-08-13	clocksource/drivers/exynos_mct: Prioritise Arm arch timer on arm64	Will Deacon	1	-2/+11
	All arm64 CPUs feature an architected timer, which offers a relatively low-latency interface to a per-cpu clocksource and timer. For the most part, using this interface is a no-brainer, with the exception of SoCs where it cannot be used to wake up from deep idle state (i.e. CLOCK_EVT_FEAT_C3STOP is set). On the contrary, the Exynos MCT is extremely slow to access yet can be used as a wakeup source. In preparation for using the Exynos MCT as a potential wakeup timer for the Arm architected timer, reduce its ratings so that the architected timer is preferred. This effectively reverts the decision made in 6282edb72bed ("clocksource/drivers/exynos_mct: Increase priority over ARM arch timer") for arm64, as the reasoning for the original change was to work around a 32-bit SoC design. Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Chanwoo Choi <cw00.choi@samsung.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Will Deacon <will@kernel.org> Tested-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> # exynos-5422 Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210608154341.10794-2-will@kernel.org
2021-08-12	hrtimer: Unbreak hrtimer_force_reprogram()	Thomas Gleixner	1	-20/+20
	Since the recent consolidation of reprogramming functions, hrtimer_force_reprogram() is affected by a check whether the new expiry time is past the current expiry time. This breaks the NOHZ logic as that relies on the fact that the tick hrtimer is moved into the future. That means cpu_base->expires_next becomes stale and subsequent reprogramming attempts fail as well until the situation is cleaned up by an hrtimer interrupt. For some yet unknown reason this leads to a complete stall, so for now partially revert the offending commit to a known working state. The root cause for the stall is still being investigated and will be fixed in a subsequent commit. Fixes: b14bca97c9f5 ("hrtimer: Consolidate reprogramming code") Reported-by: Mike Galbraith <efault@gmx.de> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <efault@gmx.de> Link: https://lore.kernel.org/r/8735recskh.ffs@tglx
2021-08-12	hrtimer: Use raw_cpu_ptr() in clock_was_set()	Thomas Gleixner	1	-2/+3
	clock_was_set() can be invoked from preemptible context. Use raw_cpu_ptr() to check whether high resolution mode is active or not. It does not matter whether the task migrates after acquiring the pointer. Fixes: e71a4153b7c2 ("hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case") Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/875ywacsmb.ffs@tglx
2021-08-10	hrtimer: Avoid more SMP function calls in clock_was_set()	Thomas Gleixner	1	-9/+65
	By unconditionally updating the offsets there are more indicators whether the SMP function calls on clock_was_set() can be avoided: - When the offset update already happened on the remote CPU then the remote update attempt will yield the same sequence number and no IPI is required. - When the remote CPU is currently handling hrtimer_interrupt(). In that case the remote CPU will reevaluate the timer bases before reprogramming anyway, so nothing to do. - After updating it can be checked whether the first expiring timer in the affected clock bases moves before the first expiring (softirq) timer of the CPU. If that's not the case then sending the IPI is not required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.887322464@linutronix.de
2021-08-10	hrtimer: Avoid unnecessary SMP function calls in clock_was_set()	Marcelo Tosatti	1	-2/+33
	Setting of clocks triggers an unconditional SMP function call on all online CPUs to reprogram the clock event device. However, only some clocks have their offsets updated and therefore potentially require a reprogram. That's CLOCK_REALTIME and CLOCK_TAI and in the case of resume (delayed sleep time injection) also CLOCK_BOOTTIME. Instead of sending an IPI unconditionally, check each per CPU hrtimer base whether it has active timers in the affected clock bases which are indicated by the caller in the @bases argument of clock_was_set(). If that's not the case, skip the IPI and update the offsets remotely which ensures that any subsequently armed timers on the affected clocks are evaluated with the correct offsets. [ tglx: Adopted to the new bases argument, removed the softirq_active check, added comment, fixed up stale comment ] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.787536542@linutronix.de
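The per-CPU check described above comes down to intersecting two bitmasks. The following is a simplified userspace sketch; the enum, the `EX_*` masks, `struct ex_cpu_base` and `ex_needs_ipi()` are hypothetical and do not reproduce the kernel's hrtimer definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clock base indices; the kernel's enum differs. */
enum ex_base {
    EX_BASE_MONOTONIC,
    EX_BASE_REALTIME,
    EX_BASE_BOOTTIME,
    EX_BASE_TAI,
};

#define EX_BIT(n)        (1u << (n))
/* Bases whose offsets change when the clock is set. */
#define EX_SETTIME_BASES (EX_BIT(EX_BASE_REALTIME) | EX_BIT(EX_BASE_TAI))
/* On resume, sleep time injection additionally affects BOOTTIME. */
#define EX_RESUME_BASES  (EX_SETTIME_BASES | EX_BIT(EX_BASE_BOOTTIME))

/* Hypothetical per-CPU state: one bit per base that has queued timers. */
struct ex_cpu_base {
    uint32_t active_bases;
};

/* An IPI is only worthwhile if the CPU has timers on an affected base. */
static bool ex_needs_ipi(const struct ex_cpu_base *cpu, uint32_t bases)
{
    return (cpu->active_bases & bases) != 0;
}
```

A CPU with only CLOCK_MONOTONIC timers queued would be skipped for both masks, while one with a CLOCK_BOOTTIME timer would be kicked only in the resume case.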
2021-08-10	hrtimer: Add bases argument to clock_was_set()	Thomas Gleixner	3	-10/+17
	clock_was_set() unconditionally invokes retrigger_next_event() on all online CPUs. This was necessary because that mechanism was also used for resume from suspend to idle which is no longer the case. The bases argument allows the callers of clock_was_set() to hand in a mask which tells clock_was_set() which of the hrtimer clock bases are affected by the clock setting. This mask will be used in the next step to check whether a CPU base has timers queued on a clock base affected by the event and avoid the SMP function call if there are none. Add a @bases argument, provide defines for the active bases masking and fix up all callsites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.691083465@linutronix.de
2021-08-10	time/timekeeping: Avoid invoking clock_was_set() twice	Thomas Gleixner	1	-8/+10
	do_adjtimex() might end up scheduling a delayed clock_was_set() via timekeeping_advance() and then invoke clock_was_set() directly which is pointless. Make timekeeping_advance() return whether an invocation of clock_was_set() is required and handle it at the call sites which allows do_adjtimex() to issue a single direct call if required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.580966888@linutronix.de
2021-08-10	timekeeping: Distangle resume and clock-was-set events	Thomas Gleixner	5	-11/+18
	Resuming timekeeping is a clock-was-set event and uses the clock-was-set notification mechanism. This is in the way of making the clock-was-set update for hrtimers selective so unnecessary IPIs are avoided when a CPU base does not have timers queued which are affected by the clock setting. Disentangle it by invoking hrtimer_resume() on each unfreezing CPU and invoke the new timerfd_resume() function from timekeeping_resume() which is the only place where this is needed. Rename hrtimer_resume() to hrtimer_resume_local() to reflect the change. With this the clock_was_set*() functions are no longer required to IPI all CPUs unconditionally and can get some smarts to avoid them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.488853478@linutronix.de
2021-08-10	timerfd: Provide timerfd_resume()	Thomas Gleixner	2	-0/+18
	Resuming timekeeping is a clock-was-set event and uses the clock-was-set notification mechanism. This is in the way of making the clock-was-set update for hrtimers selective so unnecessary IPIs are avoided when a CPU base does not have timers queued which are affected by the clock setting. Provide a separate timerfd_resume() interface so the resume logic and the clock-was-set mechanism can be disentangled in the core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.395287410@linutronix.de
2021-08-10	hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case	Thomas Gleixner	1	-28/+59
	When CONFIG_HIGH_RES_TIMERS is disabled, but NOHZ is enabled then clock_was_set() is not doing anything. With HIGHRES=n the kernel relies on the periodic tick to update the clock offsets, but when NOHZ is enabled and active then CPUs which are in a deep idle sleep do not have a periodic tick which means the expiry of timers affected by clock_was_set() can be arbitrarily delayed up to the point where the CPUs are brought out of idle again. Make the clock_was_set() logic unconditionally available so that idle CPUs are kicked out of idle to handle the update. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.288697903@linutronix.de
2021-08-10	hrtimer: Ensure timerfd notification for HIGHRES=n	Thomas Gleixner	3	-21/+19
	If high resolution timers are disabled the timerfd notification about a clock-was-set event is not happening for all cases which use clock_was_set_delayed() because that's a NOP for HIGHRES=n, which is wrong. Make clock_was_set_delayed() unconditionally available to fix that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.196661266@linutronix.de
2021-08-10	hrtimer: Consolidate reprogramming code	Peter Zijlstra	1	-43/+29
	This code is mostly duplicated. The redundant store in the force reprogram case does no harm and the in-hrtimer-interrupt condition cannot be true for the force reprogram invocations. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.054424875@linutronix.de
2021-08-10	hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns()	Thomas Gleixner	1	-7/+53
	If __hrtimer_start_range_ns() is invoked with an already armed hrtimer then the timer has to be canceled first and then added back. If the timer is the first expiring timer then on removal the clockevent device is reprogrammed to the next expiring timer to avoid that the pending expiry fires needlessly. If the new expiry time ends up to be the first expiry again then the clock event device has to be reprogrammed again. Avoid this by checking whether the timer is the first to expire and in that case, keep the timer on the current CPU and delay the reprogramming up to the point where the timer has been enqueued again. Reported-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135157.873137732@linutronix.de
2021-08-10	posix-cpu-timers: Recalc next expiration when timer_settime() ends up not queueing	Frederic Weisbecker	2	-7/+41
	There are several scenarios that can result in posix_cpu_timer_set() not queueing the timer but still leaving the threadgroup cputime counter running or keeping the tick dependency around for a random amount of time. 1) If timer_settime() is called with a 0 expiration on a timer that is already disabled, the process wide cputime counter will be started and won't ever get a chance to be stopped by stop_process_timer() since no timer is actually armed to be processed. The following snippet is enough to trigger the issue. void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); timer_settime(id, TIMER_ABSTIME, &val, NULL); timer_delete(id); } 2) If timer_settime() is called with a 0 expiration on a timer that is already armed, the timer is dequeued but not really disarmed. So the process wide cputime counter and the tick dependency may still remain a while around. The following code snippet keeps this overhead around for one week after the timer deletion: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; val.it_value.tv_sec = 604800; timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); timer_settime(id, 0, &val, NULL); timer_delete(id); } 3) If the timer was initially deactivated, this call to timer_settime() with an early expiration may have started the process wide cputime counter even though the timer hasn't been queued and armed because it has fired early and inline within posix_cpu_timer_set() itself. As a result the process wide cputime counter may never stop until a new timer is ever armed in the future. 
The following code snippet can reproduce this: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; signal(SIGALRM, SIG_IGN); timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); val.it_value.tv_nsec = 1; timer_settime(id, TIMER_ABSTIME, &val, NULL); } 4) If the timer was initially armed with a former expiration value before this call to timer_settime() and the current call sets an early deadline that has already expired, the timer fires inline within posix_cpu_timer_set(). In this case it must have been dequeued before firing inline with its new expiration value, yet it hasn't been disarmed in this case. So the process wide cputime counter and the tick dependency may still be around for a while even after the timer fired. The following code snippet can reproduce this: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; signal(SIGALRM, SIG_IGN); timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); val.it_value.tv_sec = 100; timer_settime(id, TIMER_ABSTIME, &val, NULL); val.it_value.tv_sec = 0; val.it_value.tv_nsec = 1; timer_settime(id, TIMER_ABSTIME, &val, NULL); } Fix all these issues with triggering the related base next expiration recalculation on the next tick. This also implies to re-evaluate the need to keep around the process wide cputime counter and the tick dependency, in a similar fashion to disarm_timer(). Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-7-frederic@kernel.org
2021-08-10	posix-cpu-timers: Consolidate timer base accessor	Frederic Weisbecker	1	-15/+13
	Remove the ad-hoc timer base accessors and provide a consolidated one. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-6-frederic@kernel.org
2021-08-10	posix-cpu-timers: Remove confusing return value override	Frederic Weisbecker	1	-2/+0
	The end of the function cannot be reached with an error in variable ret. Unconfuse reviewers about that. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-5-frederic@kernel.org
2021-08-10	posix-cpu-timers: Force next expiration recalc after itimer reset	Frederic Weisbecker	1	-2/+0
	When an itimer deactivates a previously armed expiration, it simply doesn't do anything. As a result the process wide cputime counter keeps running and the tick dependency stays set until it reaches the old ghost expiration value. This can be reproduced with the following snippet: void trigger_process_counter(void) { struct itimerval n = {}; n.it_value.tv_sec = 100; setitimer(ITIMER_VIRTUAL, &n, NULL); n.it_value.tv_sec = 0; setitimer(ITIMER_VIRTUAL, &n, NULL); } Fix this with resetting the relevant base expiration. This is similar to disarming a timer. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-4-frederic@kernel.org
2021-08-10	posix-cpu-timers: Force next_expiration recalc after timer deletion	Frederic Weisbecker	2	-2/+35
	A timer deletion only dequeues the timer but it doesn't shutdown the related costly process wide cputimer counter and the tick dependency. The following code snippet keeps this overhead around for one week after the timer deletion: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; val.it_value.tv_sec = 604800; timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); timer_settime(id, 0, &val, NULL); timer_delete(id); } Make sure the next target's tick recalculates the nearest expiration and clears the process wide counter and tick dependency if necessary. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-3-frederic@kernel.org
2021-08-10	posix-cpu-timers: Assert task sighand is locked while starting cputime counter	Frederic Weisbecker	3	-0/+23
	Starting the process wide cputime counter needs to be done in the same sighand locking sequence as actually arming the related timer, otherwise this races against concurrent timers setting/expiring in the same threadgroup. Detecting that the cputime counter is started without holding the sighand lock is a first step toward debugging such situations. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-2-frederic@kernel.org
2021-08-10	posix-timers: Remove redundant initialization of variable ret	Colin Ian King	1	-1/+1
	The variable ret is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210721120147.109570-1-colin.king@canonical.com
2021-08-10	clocksource: Replace deprecated CPU-hotplug functions.	Sebastian Andrzej Siewior	1	-3/+3
	The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210803141621.780504-35-bigeasy@linutronix.de
2021-08-08	Linux 5.14-rc5	Linus Torvalds	1	-1/+1

2021-08-06	Revert "riscv: Remove CONFIG_PHYS_RAM_BASE_FIXED"	Alexandre Ghiti	1	-0/+6
	This reverts commit 9b79878ced8f7ab85c57623f8b1f6882e484a316. The removal of this config exposes CONFIG_PHYS_RAM_BASE for all kernel types: this value being implementation-specific, this breaks the genericity of the RISC-V kernel so revert it. Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Tested-by: Emil Renner Berthing <kernel@esmil.dk> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-08-06	riscv: Get rid of CONFIG_PHYS_RAM_BASE in kernel physical address conversion	Alexandre Ghiti	2	-8/+16
	The usage of CONFIG_PHYS_RAM_BASE for all kernel types was a mistake: this value is implementation-specific and this breaks the genericity of the RISC-V kernel. Fix this by introducing a new variable phys_ram_base that holds this value at runtime and use it in the kernel physical address conversion macro. Since this value is used only for XIP kernels, evaluate it only if CONFIG_XIP_KERNEL is set which in addition optimizes this macro for standard kernels at compile-time. Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Tested-by: Emil Renner Berthing <kernel@esmil.dk> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Fixes: 44c922572952 ("RISC-V: enable XIP") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-08-06	kyber: make trace_block_rq call consistent with documentation	Vincent Fu	1	-1/+1
	The kyber ioscheduler calls trace_block_rq_insert() after the request is added to the queue but the documentation for trace_block_rq_insert() says that the call should be made before the request is added to the queue. Move the tracepoint for the kyber ioscheduler so that it is consistent with the documentation. Signed-off-by: Vincent Fu <vincent.fu@samsung.com> Link: https://lore.kernel.org/r/20210804194913.10497-1-vincent.fu@samsung.com Reviewed by: Adam Manzanares <a.manzanares@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-06	ext4: fix potential htree corruption when growing large_dir directories	Theodore Ts'o	1	-1/+1
	Commit b5776e7524af ("ext4: fix potential htree index checksum corruption") removed a required restart when multiple levels of index nodes need to be split. Fix this to avoid directory htree corruptions when using the large_dir feature. Cc: stable@kernel.org # v5.11 Cc: Благодаренко Артём <artem.blagodarenko@gmail.com> Fixes: b5776e7524af ("ext4: fix potential htree index checksum corruption") Reported-by: Denis <denis@voxelsoft.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-06	tracepoint: Use rcu get state and cond sync for static call updates	Mathieu Desnoyers	1	-14/+67
	State transitions from 1->0->1 and N->2->1 callbacks require RCU synchronization. Rather than performing the RCU synchronization every time the state change occurs, which is quite slow when many tracepoints are registered in batch, instead keep a snapshot of the RCU state on the most recent transitions which belong to a chain, and conditionally wait for a grace period on the last transition of the chain if one g.p. has not elapsed since the last snapshot. This applies to both RCU and SRCU. This brings the performance regression caused by commit 231264d6927f ("Fix: tracepoint: static call function vs data state mismatch") back to what it was originally. Before this commit: # trace-cmd start -e all # time trace-cmd start -p nop real 0m10.593s user 0m0.017s sys 0m0.259s After this commit: # trace-cmd start -e all # time trace-cmd start -p nop real 0m0.878s user 0m0.000s sys 0m0.103s Link: https://lkml.kernel.org/r/20210805192954.30688-1-mathieu.desnoyers@efficios.com Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/ Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Stefan Metzmacher <metze@samba.org> Fixes: 231264d6927f ("Fix: tracepoint: static call function vs data state mismatch") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-06	io-wq: fix lack of acct->nr_workers < acct->max_workers judgement	Hao Xu	1	-1/+9
	There should be this judgement before we create an io-worker Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-06	io-wq: fix no lock protection of acct->nr_worker	Hao Xu	1	-6/+17
	There is an acct->nr_worker access without lock protection. Think about the case: two callers call io_wqe_wake_worker(), one is the original context and the other one is an io-worker (by calling io_wqe_enqueue(wqe, linked)), on two cpus in parallel, this may cause nr_worker to be larger than max_worker. Let's fix it by adding a lock for it, and let's do nr_workers++ before create_io_worker. There may be an edge case that the first caller fails to create an io-worker, but the second caller doesn't know it and then quits creating an io-worker as well: say nr_worker = max_worker - 1 cpu 0 cpu 1 io_wqe_wake_worker() io_wqe_wake_worker() nr_worker < max_worker nr_worker++ create_io_worker() nr_worker == max_worker failed return return But the chance of this case is very slim. Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> [axboe: fix unconditional create_io_worker() call] Signed-off-by: Jens Axboe <axboe@kernel.dk>
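The "check the limit and increment under one lock, then create the worker" pattern can be sketched in userspace as follows. This uses a pthread mutex in place of the kernel's locking; `struct ex_acct`, `ex_reserve_worker()` and `ex_release_worker()` are hypothetical names, and handing the slot back on a failed creation is an assumption of this sketch rather than something the commit message spells out.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical accounting structure, loosely modelled on the description. */
struct ex_acct {
    pthread_mutex_t lock;
    unsigned int nr_workers;
    unsigned int max_workers;
};

/* Check the limit and take a slot in one critical section, so two
 * concurrent callers can never both pass the check for the last slot. */
static bool ex_reserve_worker(struct ex_acct *acct)
{
    bool reserved = false;

    pthread_mutex_lock(&acct->lock);
    if (acct->nr_workers < acct->max_workers) {
        acct->nr_workers++;
        reserved = true;
    }
    pthread_mutex_unlock(&acct->lock);
    return reserved;
}

/* Assumed cleanup path: give the slot back if worker creation failed. */
static void ex_release_worker(struct ex_acct *acct)
{
    pthread_mutex_lock(&acct->lock);
    acct->nr_workers--;
    pthread_mutex_unlock(&acct->lock);
}
```

Because the slot is taken before the worker exists, a second caller may see the limit reached and give up even though the first caller's creation later fails, which is the rare edge case the message accepts.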
2021-08-06	perf/x86/intel: Apply mid ACK for small core	Kan Liang	2	-8/+30
	A warning as below may be occasionally triggered in an ADL machine when these conditions occur: - Two perf record commands run one by one. Both record a PEBS event. - Both run on small cores. - They have different adaptive PEBS configuration (PEBS_DATA_CFG). [ ] WARNING: CPU: 4 PID: 9874 at arch/x86/events/intel/ds.c:1743 setup_pebs_adaptive_sample_data+0x55e/0x5b0 [ ] RIP: 0010:setup_pebs_adaptive_sample_data+0x55e/0x5b0 [ ] Call Trace: [ ] <NMI> [ ] intel_pmu_drain_pebs_icl+0x48b/0x810 [ ] perf_event_nmi_handler+0x41/0x80 [ ] </NMI> [ ] __perf_event_task_sched_in+0x2c2/0x3a0 Different from the big core, the small core requires the ACK right before re-enabling counters in the NMI handler, otherwise a stale PEBS record may be dumped into the later NMI handler, which triggers the warning. Add a new mid_ack flag to track the case. Add all PMI handler bits in the struct x86_hybrid_pmu to track the bits for different types of PMUs. Apply mid ACK for the small cores on an Alder Lake machine. The existing hybrid() macro has a compile error when taking the address of a bit-field variable. Add a new macro hybrid_bit() to get the bit-field value of a given PMU. Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Tested-by: Ammy Yi <ammy.yi@intel.com> Link: https://lkml.kernel.org/r/1627997128-57891-1-git-send-email-kan.liang@linux.intel.com
2021-08-05	RDMA/iw_cxgb4: Fix refcount underflow while destroying cqs.	Dakshaja Uppalapati	3	-8/+13
	Previous atomic increment/decrement logic expects the atomic count to be '0' after the final decrement. Replacing atomic count with refcount does not allow that, as refcount_dec() considers count of 1 as underflow and triggers a kernel splat. Fix the current refcount logic by using the usual pattern of decrementing the refcount and test if it is '0' on the final deref in c4iw_destroy_cq(). Use wait_for_completion() instead of wait_event(). Fixes: 7183451f846d ("RDMA/cxgb4: Use refcount_t instead of atomic_t for reference counting") Link: https://lore.kernel.org/r/1628167412-12114-1-git-send-email-dakshaja@chelsio.com Signed-off-by: Dakshaja Uppalapati <dakshaja@chelsio.com> Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
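The "decrement and test for zero on the final deref" pattern mentioned above can be sketched with C11 atomics. This is a single-threaded userspace illustration only: `struct ex_cq`, `ex_cq_get()` and `ex_cq_put()` are hypothetical, and the `released` flag merely stands in for signalling a completion that the destroy path would wait on.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with a reference count that starts at 1 (the owner). */
struct ex_cq {
    atomic_uint refcnt;
    bool released; /* stand-in for signalling a completion */
};

/* Take an additional reference. */
static void ex_cq_get(struct ex_cq *cq)
{
    atomic_fetch_add(&cq->refcnt, 1);
}

/* Drop one reference. Only the caller that takes the count to zero
 * signals the release; returns true in that case. */
static bool ex_cq_put(struct ex_cq *cq)
{
    /* atomic_fetch_sub() returns the value held before the decrement. */
    if (atomic_fetch_sub(&cq->refcnt, 1) == 1) {
        cq->released = true;
        return true;
    }
    return false;
}
```

In this scheme the count legitimately reaches zero exactly once, on the last put, instead of relying on a plain decrement from 1 that a refcount API reports as an underflow.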
2021-08-05	drm/amdgpu: add DID for beige goby	Chengming Gui	1	-0/+7
	Add device ids. Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-05	drm/amdgpu/display: fix DMUB firmware version info	Shirish S	1	-1/+1
	DMUB firmware info is printed before it gets initialized. Correct this order to ensure true value is conveyed. Signed-off-by: Shirish S <shirish.s@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-08-05	drm/amd/display: workaround for hard hang on HPD on native DP	Qingqing Zhuo	1	-1/+3
	[Why] HPD disable and enable sequences are not mutually exclusive on Linux. For HPDs that spans over 1s (i.e. HPD low = 1s), part of the disable sequence (specifically, a request to SMU to lower refclk) could come right before the call to PHY enable, causing DMUB to access an unresponsive PHY and thus a hard hang on the system. [How] Disable 48mhz refclk off on native DP. Reviewed-by: Hersen Wu <hersenxs.wu@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-05	drm/amd/display: Fix resetting DCN3.1 HW when resuming from S4	Jude Shih	1	-3/+5
	[Why] On S4 resume we also need to fix detection of when to reload DMCUB firmware because we're currently using the VBIOS version which isn't compatible with the driver version. [How] Update the hardware init check for DCN31 since it's the ASIC that has this issue. Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Jude Shih <jude.shih@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-05	drm/amd/display: Increase stutter watermark for dcn303	Bing Guo	1	-2/+2
	[Why&How] Hardware team suggested using SRExitTime = 35.5us as a workaround to prevent underflow in certain modes. Reviewed-by: Martin Leung <martin.leung@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Bing Guo <bing.guo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-05	drm/amd/display: Fix Dynamic bpp issue with 8K30 with Navi 1X	Bing Guo	1	-1/+1
	Why: In DCN2x, HW doesn't automatically divide MASTER_UPDATE_LOCK_DB_X by the number of pipes ODM Combined. How: Set MASTER_UPDATE_LOCK_DB_X to the value that is adjusted by the number of pipes ODM Combined. Reviewed-by: Martin Leung <martin.leung@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Bing Guo <bing.guo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-05	drm/amd/display: Assume LTTPR interop for DCN31+	Wesley Chalmers	4	-19/+40
	[WHY] For DCN31 onward, LTTPR is to be enabled and set to Transparent by VBIOS. Driver is to assume that VBIOS has done this without needing to check the VBIOS interop bit. [HOW] Add LTTPR enable and interop VBIOS bits into dc->caps, and force-set the interop bit to true for DCN31+. Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-05	drm/amdgpu: fix checking pmops when PM_SLEEP is not enabled	Randy Dunlap	1	-1/+1
	'pm_suspend_target_state' is only available when CONFIG_PM_SLEEP is set/enabled. OTOH, when both SUSPEND and HIBERNATION are not set, PM_SLEEP is not set, so this variable cannot be used. ../drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c: In function ‘amdgpu_acpi_is_s0ix_active’: ../drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:1046:11: error: ‘pm_suspend_target_state’ undeclared (first use in this function); did you mean ‘__KSYM_pm_suspend_target_state’? return pm_suspend_target_state == PM_SUSPEND_TO_IDLE; ^~~~~~~~~~~~~~~~~~~~~~~ __KSYM_pm_suspend_target_state Also use shorter IS_ENABLED(CONFIG_foo) notation for checking the 2 config symbols. Fixes: 91e273712ab8dd ("drm/amdgpu: Check pmops for desired suspend state") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-next@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
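The build failure above comes from referencing a variable that only exists under a particular config option. A minimal userspace sketch of the guard follows, assuming hypothetical `SKETCH_*` macros in place of the Kconfig symbols and a hypothetical `sketch_target_state` in place of `pm_suspend_target_state`. It illustrates the general idea only and is not the amdgpu change itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the config symbols; the kernel derives these from Kconfig. */
#define SKETCH_CONFIG_SUSPEND      0
#define SKETCH_CONFIG_HIBERNATION  0
/* Mirrors the rule quoted above: PM_SLEEP is unset when both of the others are unset. */
#define SKETCH_CONFIG_PM_SLEEP \
    (SKETCH_CONFIG_SUSPEND || SKETCH_CONFIG_HIBERNATION)

enum sketch_suspend_state { SKETCH_SUSPEND_ON, SKETCH_SUSPEND_TO_IDLE };

#if SKETCH_CONFIG_PM_SLEEP
/* Only declared when sleep support is built in. */
static enum sketch_suspend_state sketch_target_state = SKETCH_SUSPEND_TO_IDLE;
#endif

static bool sketch_is_s0ix_active(void)
{
#if SKETCH_CONFIG_PM_SLEEP
    /* Safe: the variable exists in this configuration. */
    return sketch_target_state == SKETCH_SUSPEND_TO_IDLE;
#else
    /* Without sleep support the variable is absent, so report "not active". */
    return false;
#endif
}
```

Setting either `SKETCH_CONFIG_SUSPEND` or `SKETCH_CONFIG_HIBERNATION` to 1 compiles in the first branch; with both at 0 the function still builds because the missing variable is never named.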
2021-08-05	drm/amd/pm: update yellow carp pmfw interface version	Xiaomeng Hou	1	-1/+1
	Correct yellow carp driver-PMFW interface version to v4. Signed-off-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-05	tracepoint: Fix static call function vs data state mismatch	Mathieu Desnoyers	1	-20/+82
	On a 1->0->1 callbacks transition, there is an issue with the new callback using the old callback's data. Considering __DO_TRACE_CALL: do { \ struct tracepoint_func *it_func_ptr; \ void *__data; \ it_func_ptr = \ rcu_dereference_raw((&__tracepoint_##name)->funcs); \ if (it_func_ptr) { \ __data = (it_func_ptr)->data; \ ----> [ delayed here on one CPU (e.g. vcpu preempted by the host) ] static_call(tp_func_##name)(__data, args); \ } \ } while (0) It has loaded the tp->funcs of the old callback, so it will try to use the old data. This can be fixed by adding a RCU sync anywhere in the 1->0->1 transition chain. On a N->2->1 transition, we need an rcu-sync because you may have a sequence of 3->2->1 (or 1->2->1) where the element 0 data is unchanged between 2->1, but was changed from 3->2 (or from 1->2), which may be observed by the static call. This can be fixed by adding an unconditional RCU sync in transition 2->1. Note, this fixes a correctness issue at the cost of adding a tremendous performance regression to the disabling of tracepoints. Before this commit: # trace-cmd start -e all # time trace-cmd start -p nop real 0m0.778s user 0m0.000s sys 0m0.061s After this commit: # trace-cmd start -e all # time trace-cmd start -p nop real 0m10.593s user 0m0.017s sys 0m0.259s A follow up fix will introduce a more lightweight scheme based on RCU get_state and cond_sync, that will return the performance back to what it was. As both this change and the lightweight versions are complex on their own, for bisecting any issues that this may cause, they are kept as two separate changes. Link: https://lkml.kernel.org/r/20210805132717.23813-3-mathieu.desnoyers@efficios.com Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/ Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Stefan Metzmacher <metze@samba.org> Fixes: d25e37d89dd2 ("tracepoint: Optimize using static_call()") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-05	tracepoint: static call: Compare data on transition from 2->1 callees	Mathieu Desnoyers	1	-1/+1
	On transition from 2->1 callees, we should be comparing .data rather than .func, because the same callback can be registered twice with different data, and what we care about here is that the data of array element 0 is unchanged to skip rcu sync. Link: https://lkml.kernel.org/r/20210805132717.23813-2-mathieu.desnoyers@efficios.com Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/ Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Stefan Metzmacher <metze@samba.org> Fixes: 547305a64632 ("tracepoint: Fix out of sync data passing by static caller") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
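Why comparing `.func` is not enough can be shown with a small userspace sketch. The names here (`sketch_func`, `sketch_cb`, the two comparison helpers) are hypothetical and only mirror the shape of a function/data pair array; this is not the tracepoint implementation.

```c
#include <stdbool.h>

/* Hypothetical (callback, data) pair, loosely modelled on a probe entry. */
struct sketch_func {
    void (*func)(void *data);
    void *data;
};

/* A callback that may be registered more than once with different data. */
static void sketch_cb(void *data)
{
    (void)data;
}

/* Comparison on the function pointer of element 0. */
static bool sketch_elem0_func_unchanged(const struct sketch_func *before,
                                        const struct sketch_func *after)
{
    return before[0].func == after[0].func;
}

/* Comparison on the data pointer of element 0, as the message above calls for. */
static bool sketch_elem0_data_unchanged(const struct sketch_func *before,
                                        const struct sketch_func *after)
{
    return before[0].data == after[0].data;
}
```

If `sketch_cb` is registered twice with different data and the first registration is removed, element 0 keeps the same function but gets different data. The function comparison reports "unchanged" while the data comparison correctly reports a change, which is the case where skipping the sync would be unsafe.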
2021-08-05	spi: cadence-quadspi: Fix check condition for DTR ops	Apurva Nandan	1	-3/+18
	buswidth and dtr fields in spi_mem_op are only valid when the corresponding spi_mem_op phase has a non-zero length. For example, SPI NAND core doesn't set buswidth when using SPI_MEM_OP_NO_ADDR phase. Fix the dtr checks in set_protocol() and supports_mem_op() to ignore empty spi_mem_op phases, as checking for dtr field in empty phase will result in false negatives. Signed-off-by: Apurva Nandan <a-nandan@ti.com> Link: https://lore.kernel.org/r/20210716232504.182-3-a-nandan@ti.com Signed-off-by: Mark Brown <broonie@kernel.org>
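The idea of ignoring empty phases when checking the dtr field can be sketched as follows. The structures are simplified, hypothetical stand-ins (`sketch_phase`, `sketch_op`) rather than the real `spi_mem_op` layout, and the helpers are an illustration of the rule described above, not the driver's code.

```c
#include <stdbool.h>

/* Hypothetical, simplified phase: only length and the DTR flag. */
struct sketch_phase {
    unsigned int nbytes;  /* zero means the phase is absent */
    bool dtr;             /* only meaningful when nbytes != 0 */
};

/* Hypothetical operation made of four phases. */
struct sketch_op {
    struct sketch_phase cmd, addr, dummy, data;
};

/* An empty phase never vetoes the "all DTR" verdict. */
static bool sketch_phase_ok_dtr(const struct sketch_phase *p)
{
    return p->nbytes == 0 || p->dtr;
}

/* An empty phase never vetoes the "no DTR" verdict either. */
static bool sketch_phase_ok_sdr(const struct sketch_phase *p)
{
    return p->nbytes == 0 || !p->dtr;
}

static bool sketch_all_dtr(const struct sketch_op *op)
{
    return sketch_phase_ok_dtr(&op->cmd) && sketch_phase_ok_dtr(&op->addr) &&
           sketch_phase_ok_dtr(&op->dummy) && sketch_phase_ok_dtr(&op->data);
}

static bool sketch_none_dtr(const struct sketch_op *op)
{
    return sketch_phase_ok_sdr(&op->cmd) && sketch_phase_ok_sdr(&op->addr) &&
           sketch_phase_ok_sdr(&op->dummy) && sketch_phase_ok_sdr(&op->data);
}

/* Mixed DTR and non-DTR among the non-empty phases is treated as unsupported. */
static bool sketch_dtr_consistent(const struct sketch_op *op)
{
    return sketch_all_dtr(op) || sketch_none_dtr(op);
}
```

With this shape, an operation that has a DTR command and DTR data but no address or dummy phase is still classified as all-DTR, because the unset dtr flags of the empty phases are never consulted.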
2021-08-05	Bluetooth: defer cleanup of resources in hci_unregister_dev()	Tetsuo Handa	4	-24/+45
	syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to calling lock_sock() with rw spinlock held [1]. It seems that history of this locking problem is a trial and error. Commit b40df5743ee8 ("[PATCH] bluetooth: fix socket locking in hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to lock_sock() as an attempt to fix lockdep warning. Then, commit 4ce61d1c7a8e ("[BLUETOOTH]: Fix locking in hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to local_bh_disable() + bh_lock_sock_nested() as an attempt to fix the sleep in atomic context warning. Then, commit 4b5dd696f81b ("Bluetooth: Remove local_bh_disable() from hci_sock.c") in 3.3-rc1 removed local_bh_disable(). Then, commit e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to lock_sock() as an attempt to fix CVE-2021-3573. This difficulty comes from current implementation that hci_sock_dev_event(HCI_DEV_UNREG) is responsible for dropping all references from sockets because hci_unregister_dev() immediately reclaims resources as soon as returning from hci_sock_dev_event(HCI_DEV_UNREG). But the history suggests that hci_sock_dev_event(HCI_DEV_UNREG) was not doing what it should do. Therefore, instead of trying to detach sockets from device, let's accept not detaching sockets from device at hci_sock_dev_event(HCI_DEV_UNREG), by moving actual cleanup of resources from hci_unregister_dev() to hci_cleanup_dev() which is called by bt_host_release() when all references to this unregistered device (which is a kobject) are gone. Since hci_sock_dev_event(HCI_DEV_UNREG) no longer resets hci_pi(sk)->hdev, we need to check whether this device was unregistered and return an error based on HCI_UNREGISTER flag. There might be subtle behavioral difference in "monitor the hdev" functionality; please report if you found something went wrong due to this patch. Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1] Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object") Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>