| Age | Commit message (Collapse) | Author | Files | Lines |
|
RK3588j uses a different set of OPPs for its GPU, both in terms of
allowed frequencies and in terms of voltages.
Move the GPU OPPs table into per-variant .dtsi files to accommodate
for this difference.
The table for RK3588j is adapted from Rockchip downstream sources [1],
while RK3588 one is moved verbatim into the per-variant .dtsi file.
The values provided for RK3588 in the downstream sources match those
in the original commit.
[1] https://github.com/rockchip-linux/kernel/blob/604cec4004abe5a96c734f2fab7b74809d2d742f/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
Fixes: 6fca4edb93d3 ("arm64: dts: rockchip: Add rk3588 GPU node")
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Link: https://lore.kernel.org/r/20240617-rk-dts-additions-v5-8-c1f5f3267f1e@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
RK3588j is the 'industrial' variant of RK3588, and it uses a different
set of OPPs both in terms of allowed frequencies and in terms of
applicable voltages at each frequency setpoint.
Add the OPPs that apply to RK3588j (and apparently RK3588m too) to
enable dynamic CPU frequency scaling.
OPP values are derived from Rockchip downstream sources [1] by taking
only those OPPs which have the highest frequency for a given voltage
level and dropping the rest (if they are included, the kernel complains
at boot time about them being inefficient)
[1] https://github.com/rockchip-linux/kernel/blob/604cec4004abe5a96c734f2fab7b74809d2d742f/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Link: https://lore.kernel.org/r/20240617-rk-dts-additions-v5-7-c1f5f3267f1e@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
By default the CPUs on RK3588 start up in a conservative performance
mode. Add frequency and voltage mappings to the device tree to enable
dynamic scaling via cpufreq.
OPP values are adapted from Radxa's downstream kernel for Rock 5B [1],
stripping them down to the minimum frequency and voltage combinations
as expected by the generic upstream cpufreq-dt driver, and also dropping
those OPPs that don't differ in voltage but only in frequency (keeping
the top frequency OPP in each case).
Note that this patch ignores voltage scaling for the CPU memory
interface which the downstream kernel does through a custom cpufreq
driver, and which is why the downstream version has two sets of voltage
values for each OPP (the second one being meant for the memory
interface supply regulator). This is done instead via regulator
coupling between CPU and memory interface supplies on affected boards.
This has been tested on Rock 5B with u-boot 2023.11 compiled from
Collabora's integration tree [2] with binary bl31 and appears to be
stable both under active cooling and passive cooling (with throttling)
[1] https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
[2] https://gitlab.collabora.com/hardware-enablement/rockchip-3588/u-boot
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Link: https://lore.kernel.org/r/20240617-rk-dts-additions-v5-6-c1f5f3267f1e@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
RK3588 chips allow for their CPU cores to be powered by a different
supply vs. their corresponding memory interfaces, and two of the
boards currently upstream do that (EVB1 and QuartzPro64).
The voltage of the memory interface though has to match that of the
CPU cores that use it, which downstream kernels achieve by the means
of a custom cpufreq driver which adjusts both at the same time.
It seems that regulator coupling is a more appropriate generic
interface for it, so this patch introduces coupling to affected
device trees to ensure that memory interface voltage is also updated
whenever cpufreq switches between CPU OPPs.
Note that other boards, such as Radxa Rock 5B, define both the CPU
and memory interface regulators as aliases to the same DT node, so
this doesn't apply there.
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Link: https://lore.kernel.org/r/20240617-rk-dts-additions-v5-5-c1f5f3267f1e@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
align with other Radxa products.
- mmc0 is eMMC
- mmc1 is microSD
for ZERO 3E, there is no eMMC, but aliases should start at 0, so mmc0
is microSD as exception.
Fixes: 1a5c8d307c83 ("arm64: dts: rockchip: Add Radxa ZERO 3W/3E")
Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Changes in v3:
- fix syntax error in rk3566-radxa-zero-3e.dts
Changes in v2:
- microSD is mmc0 instead of mmc1 for ZERO 3E
Link: https://lore.kernel.org/r/20240620224435.2752-1-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
LBA3368 is a RK3368 based industrial board from Neardi.
Specs:
- 1 GB DDR3 DRAM
- 8/16 GB eMMC
- µSD slot
- 100 mbit ethernet (optional 12V PoE)
- Ampak AP6255 Wifi/BT combo
- ADC button
- 4 x USB 2.0 via onboard GL852G HUB connected to SoC's ehci host
- 2 exposed as USB-A
- 2 via 2-mm-4-pin connectors
- micro USB OTG connector
- 2 x UART TTL (2-mm-4-pin connectors)
- CSI connector
- DSI connector
- eDP connector
- HDMI 2.0a output (type A)
- touchpad connector (I2C, 3.3V)
- ALC5640 audio codec
- combined headphone/microphone jack
- speaker connector pads
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Link: https://lore.kernel.org/r/20240623090116.670607-5-knaerzche@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Add Neardi LBA3368, a RK3368 based industrial board.
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240623090116.670607-3-knaerzche@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Add vendor prefix for Shanghai Neardi Technology Co., Ltd.
(http://neardi.com/)
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240623090116.670607-2-knaerzche@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
The PinePhone Pro has a vibrator attached via GPIO so
lets enable it.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Link: https://lore.kernel.org/r/20240623165326.1273944-3-pbrobinson@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Enable the IMU sensor on the PinePhone Pro including
the i2c4 bus that it's attached to.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Link: https://lore.kernel.org/r/20240623165326.1273944-2-pbrobinson@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
The PinePhone Pro has a cluster of 3 single RGB GPIO LEDs.
Add the GPIO entries for the 3 red/green/blue LEDs and an
entry for the multi-color group to allow them to be used
as a combined RGB LED.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Link: https://lore.kernel.org/r/20240623165326.1273944-1-pbrobinson@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
The PinePhone Pro as SPI flash on board so enable the SPI
interface and the flash.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Link: https://lore.kernel.org/r/20240623204616.1344806-1-pbrobinson@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
SPI NOR flash chip may vary, so use safe(lowest) spi-max-frequency.
Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Link: https://lore.kernel.org/r/20240623023329.1044-3-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
This commit adds SFC node for Radxa ROCK 5A.
since sdhci and sfc on RK3588s share pins(i.e. exclusive), it cannot
be enabled both nodes at the same time. so status = "okay" is omitted
here.
you may be able to enable sfc (and disable sdhci) by fdt overlay.
SPI NOR flash chip may vary, so use safe(lowest) spi-max-frequency.
Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Link: https://lore.kernel.org/r/20240623023329.1044-2-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
This commit adds support for SPI NOR flash on Radxa ROCK 5B.
SPI NOR flash chip may vary, so use safe(lowest) spi-max-frequency.
Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Link: https://lore.kernel.org/r/20240623023329.1044-1-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
This links the PWM fan on Radxa Rock 5B as an active cooling device
managed automatically by the thermal subsystem, with a target SoC
temperature of 65C and a minimum-spin interval from 55C to 65C to
ensure airflow when the system gets warm
Helped-by: Dragan Simic <dsimic@manjaro.org>
Reviewed-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Link: https://lore.kernel.org/r/20240617-rk-dts-additions-v5-4-c1f5f3267f1e@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
As the GPU support on RK3588 has been merged upstream, along with OPP
values, add a corresponding cooling map for passive cooling using the GPU.
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Link: https://lore.kernel.org/r/20240617-rk-dts-additions-v5-3-c1f5f3267f1e@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
This enables the on-chip thermal monitoring sensor (TSADC) on all
RK3588(s) boards that don't have it enabled yet. It provides temperature
monitoring for the SoC and emergency thermal shutdowns, and is thus
important to have in place before CPU DVFS is enabled, as high CPU
operating performance points can overheat the chip quickly in the
absence of thermal management.
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Link: https://lore.kernel.org/r/20240617-rk-dts-additions-v5-2-c1f5f3267f1e@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
This includes the necessary device tree data to allow thermal
monitoring on RK3588(s) using the on-chip TSADC device, along with
trip points for automatic thermal management.
Each of the CPU clusters (one for the little cores and two for
the big cores) get a passive cooling trip point at 85C, which
will trigger DVFS throttling of the respective cluster upon
reaching a high temperature condition.
All zones also have a critical trip point at 115C, which will
trigger a reset.
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Link: https://lore.kernel.org/r/20240617-rk-dts-additions-v5-1-c1f5f3267f1e@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Rename the Rockchip RK3588 SoC dtsi files and, consequently, adjust their
contents appropriately, to prepare them for the ability to specify different
CPU and GPU OPPs for each of the supported RK3588 SoC variants.
As already discussed, [1][2][3][4] some of the RK3588 SoC variants require
different OPPs, and it makes more sense to have the OPPs already defined when
a board dts(i) file includes one of the SoC variant dtsi files (rk3588.dtsi,
rk3588j.dtsi or rk3588s.dtsi), rather than requiring the board dts(i) file
to also include a separate rk3588*-opp.dtsi file. The choice of the SoC
variant is already made by the inclusion of the SoC dtsi file into the board
dts(i) file, and it doesn't make much sense to, effectively, allow the board
dts(i) file to include and use an incompatible set of OPPs for the already
selected RK3588 SoC variant.
The new naming scheme for the RK3588 SoC dtsi files uses "-base" and "-extra"
suffixes to denote the DT data shared between all RK5588 SoC variants, and
the DT data shared between the unrestricted SoC variants, respectively.
For example, the DT data for the RK3588 includes both rk3588-base.dtsi and
rk3588-extra.dtsi, because it's an unrestricted SoC variant, while the DT
data for the RK3588S variant includes rk3588-base.dtsi only, because it's
a restricted SoC variant, feature- and interface-wise. This achieves a more
logical naming of the RK3588 SoC dtsi files, which reflects the way DT data
for the SoC variants is built by "stacking" the SoC variant features made
available through the "-base" and "-extra" SoC dtsi files. Additionally,
the SoC variant dtsi files (rk3588.dtsi, rk3588j.dtsi and rk3588s.dtsi) are
no longer parents to any other SoC variant dtsi files, which should help with
making the new "stacking" approach cleaner and easier to follow.
The RK3588 pinctrl dtsi files are also renamed in the same way, for the sake
of consistency. This also keeps the "-base" and "-extra" groups of the dtsi
files together when looked at in a directory listing, which is helpful.
The per-SoC-variant OPPs should go directly into the SoC dtsi files, if no
more than one SoC variant uses those OPPs, or be put into a separate "-opp"
dtsi file that's shared between and included from two or more SoC variant
dtsi files. An example for the former is the non-shared OPP data that should
go directly into the RK3588J SoC variant dtsi file (i.e. rk3588j.dtsi), and
an example for the latter is the shared OPP data that should be put into
rk3588-opp.dtsi and be included from the RK3588 and RK3588S SoC variant dtsi
files (i.e. rk3588.dtsi and rk3588s.dtsi, respectively). Consequently, if
the OPPs for the RK3588 and RK3588S SoC variants are ever made different,
the shared rk3588-opp.dtsi file should be deleted and the new OPPs should
be put directly into rk3588.dtsi and rk3588s.dtsi. [4]
No functional changes are introduced, which was validated by decompiling and
comparing all affected dtb files before and after these changes.
As a side note, due to the nature of introduced changes, this commit is best
viewed using the --break-rewrites option for git-log(1).
[1] https://lore.kernel.org/linux-rockchip/646a33e0-5c1b-471c-8183-2c0df40ea51a@cherry.de/
[2] https://lore.kernel.org/linux-rockchip/CABjd4Yxi=+3gkNnH3BysUzzYsji-=-yROtzEc8jM_g0roKB0-w@mail.gmail.com/
[3] https://lore.kernel.org/linux-rockchip/035a274be262528012173d463e25b55f@manjaro.org/
[4] https://lore.kernel.org/linux-rockchip/673dcf47596e7bc8ba065034e339bb1bbf9cdcb0.1716948159.git.dsimic@manjaro.org/T/#u
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Link: https://lore.kernel.org/r/9ffedc0e2ca7f167d9d795b2a8f43cb9f56a653b.1717923308.git.dsimic@manjaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
The CM3588 NAS by FriendlyElec pairs the CM3588 compute module, based on
the Rockchip RK3588 SoC, with the CM3588 NAS Kit carrier board.
To reflect the hardware setup, add device tree sources for the SoM and
the NAS daughter board as separate files.
Hardware features:
- Rockchip RK3588 SoC
- 4GB/8GB/16GB LPDDR4x RAM
- 64GB eMMC
- MicroSD card slot
- 1x RTL8125B 2.5G Ethernet
- 4x M.2 M-Key with PCIe 3.0 x1 (via bifurcation) for NVMe SSDs
- 2x USB 3.0 (USB 3.1 Gen1) Type-A, 1x USB 2.0 Type-A
- 1x USB 3.0 Type-C with DP AltMode support
- 2x HDMI 2.1 out, 1x HDMI in
- MIPI-CSI Connector, MIPI-DSI Connector
- 40-pin GPIO header
- 4 buttons: power, reset, recovery, MASK, user button
- 3.5mm Headphone out, 2.0mm PH-2A Mic in
- 5V Fan connector, PWM beeper, IR receiver, RTC battery connector
PCIe bifurcation is used to handle all four M.2 sockets at PCIe 3.0 x1
speed. Data lane mapping in the DT is done like described in commit
f8020dfb311d ("phy: rockchip-snps-pcie3: fix bifurcation on rk3588").
This device tree includes support for eMMC, SD card, ethernet, all USB2
and USB3 ports, all four M.2 slots, GPU, beeper, IR, RTC, UART debugging
as well as the buttons and LEDs.
The GPIOs are labeled according to the schematics.
Reviewed-by: Space Meyer <git@the-space.agency>
Signed-off-by: Sebastian Kropatsch <seb-dev@mail.de>
Link: https://lore.kernel.org/r/20240616215354.40999-3-seb-dev@mail.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Add devicetree bindings for the FriendlyElec CM3588 NAS board.
The CM3588 NAS by FriendlyElec pairs the CM3588 compute module, based on
the Rockchip RK3588 SoC, with the CM3588 NAS Kit carrier board.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Sebastian Kropatsch <seb-dev@mail.de>
Link: https://lore.kernel.org/r/20240616215354.40999-2-seb-dev@mail.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
By default the BT WAKE signal inside the M.2 key E connector on Radxa
Rock 5B is driven low, which results in the Bluetooth function being
disabled even if the inserted M.2 card supports it. Expose this signal
as an RFKILL device so that it can be enabled by the userspace.
Tested with an Intel AX210 card, which connects a Bluetooth device over
the USB 2.0 bus.
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Link: https://lore.kernel.org/r/20240517122509.4626-1-alchark@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Radxa ROCK S0 is a single-board computer based on the Rockchip RK3308B
SoC in an ultra-compact form factor.
Add initial support for eMMC, SD-card, Ethernet, WiFi/BT and USB.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20240521212247.1240226-3-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Add devicetree binding for the Radxa ROCK S0 board.
Radxa ROCK S0 is a single-board computer based on the Rockchip RK3308B
SoC in an ultra-compact form factor.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240521212247.1240226-2-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Update WiFi SDIO and BT UART related props to better reflect details
about the optional onboard RTL8723DS WiFi/BT module.
Also correct the compatible used for bluetooth to match the WiFi/BT
module used on the board.
Fixes: bc3753aed81f ("arm64: dts: rockchip: rock-pi-s add more peripherals")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20240521211029.1236094-14-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
The VCCIO4 io-domain used for WiFi/BT is using 1v8 IO signal voltage.
Add io-domains node with the VCCIO supplies connected on the board.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20240521211029.1236094-13-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Add a disabled RK3308 IO voltage domains node to SoC DT.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20240521211029.1236094-12-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
The RK3308 SoC contains a controller for one-time-programmable memory,
add a device node for it.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20240521211029.1236094-9-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Be explicit about the Ethernet port and define mdio and ethernet-phy
nodes in the device tree for ROCK Pi S.
Fixes: bc3753aed81f ("arm64: dts: rockchip: rock-pi-s add more peripherals")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20240521211029.1236094-8-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
UAR0 CTS/RTS is not wired to any pin and is not used for the default
serial console use of UART0 on ROCK Pi S.
Override the SoC defined pinctrl props to limit configuration of the
two xfer pins wired to one of the GPIO pin headers.
Fixes: 2e04c25b1320 ("arm64: dts: rockchip: add ROCK Pi S DTS support")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20240521211029.1236094-6-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Add cap-mmc-highspeed to allow use of high speed MMC mode using an eMMC
to uSD board. Use disable-wp to signal that no physical write-protect
line is present. Also add vcc_io used for card and IO line power as
vmmc-supply.
Fixes: 2e04c25b1320 ("arm64: dts: rockchip: add ROCK Pi S DTS support")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20240521211029.1236094-5-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
The Radxa ZERO 3W/3E is an ultra-small, high-performance single board
computer based on the Rockchip RK3566, with a compact form factor and
rich interfaces.
The ZERO 3W and ZERO 3E are basically the same size and model, but
differ only in storage and network interfaces.
- eMMC (3W)
- SD-card (both)
- Ethernet (3E)
- WiFi/BT (3W)
Add initial support for eMMC, SD-card, Ethernet, HDMI and USB.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20240521202810.1225636-3-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Add devicetree binding documentation for Radxa ZERO 3W/3E boards.
The Radxa ZERO 3W/3E is an ultra-small, high-performance single board
computer based on the Rockchip RK3566, with a compact form factor and
rich interfaces.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240521202810.1225636-2-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
|
|
percpu.h depends on smp.h, but doesn't include it directly because of
circular header dependency issues; percpu.h is needed in a bunch of low
level headers.
This fixes a randconfig build error on mips:
include/linux/alloc_tag.h: In function '__alloc_tag_ref_set':
include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id' [-Werror=implicit-function-declaration]
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 24e44cc22aa3 ("mm: percpu: enable per-cpu allocation tagging")
Closes: https://lore.kernel.org/oe-kbuild-all/202405210052.DIrMXJNz-lkp@intel.com/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit 617824a7f0f73e4de325cf8add58e55b28c12493.
This made a simple 'perf record -e cycles:pp make -j199' stop working on
the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed
at length in the threads in the Link tags below.
The fix provided by Ian wasn't acceptable and work to fix this will take
time we don't have at this point, so lets revert this and work on it on
the next devel cycle.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Cc: Ethan Adams <j.ethan.adams@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com
Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Occasionally, the generic/001 xfstest will fail indicating corruption in
one of the copy chains when run on cifs against a server that supports
FSCTL_DUPLICATE_EXTENTS_TO_FILE (eg. Samba with a share on btrfs). The
problem is that the remote_i_size value isn't updated by cifs_setsize()
when called by smb2_duplicate_extents(), but i_size *is*.
This may cause cifs_remap_file_range() to then skip the bit after calling
->duplicate_extents() that sets sizes.
Fix this by calling netfs_resize_file() in smb2_duplicate_extents() before
calling cifs_setsize() to set i_size.
This means we don't then need to call netfs_resize_file() upon return from
->duplicate_extents(), but we also fix the test to compare against the pre-dup
inode size.
[Note that this goes back before the addition of remote_i_size with the
netfs_inode struct. It should probably have been setting cifsi->server_eof
previously.]
Fixes: cfc63fc8126a ("smb3: fix cached file size problems in duplicate extents (reflink)")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix smb3_insert_range() to move the zero_point over to the new EOF.
Without this, generic/147 fails as reads of data beyond the old EOF point
return zeroes.
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The commit 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page
deduplication limit") introduced a possible failure case in the
stable_tree_insert(), where we may free the new allocated stable_node_dup
if we fail to prepare the missing chain node.
Then that kfolio return and unlock with a freed stable_node set... And
any MM activities can come in to access kfolio->mapping, so UAF.
Fix it by moving folio_set_stable_node() to the end after stable_node
is inserted successfully.
Link: https://lkml.kernel.org/r/20240513-b4-ksm-stable-node-uaf-v1-1-f687de76f452@linux.dev
Fixes: 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page deduplication limit")
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When I did memory failure tests recently, below panic occurs:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:1009!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:__del_page_from_free_list+0x151/0x180
RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
FS: 00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
Call Trace:
<TASK>
__rmqueue_pcplist+0x23b/0x520
get_page_from_freelist+0x26b/0xe40
__alloc_pages_noprof+0x113/0x1120
__folio_alloc_noprof+0x11/0xb0
alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
__alloc_fresh_hugetlb_folio+0xe7/0x140
alloc_pool_huge_folio+0x68/0x100
set_max_huge_pages+0x13d/0x340
hugetlb_sysctl_handler_common+0xe8/0x110
proc_sys_call_handler+0x194/0x280
vfs_write+0x387/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xc2/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff916114887
RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
And before the panic, there had an warning about bad page state:
BUG: Bad page state in process page-types pfn:8cee00
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
page_type: 0xffffff7f(buddy)
raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
Call Trace:
<TASK>
dump_stack_lvl+0x83/0xa0
bad_page+0x63/0xf0
free_unref_page+0x36e/0x5c0
unpoison_memory+0x50b/0x630
simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
debugfs_attr_write+0x42/0x60
full_proxy_write+0x5b/0x80
vfs_write+0xcd/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xc2/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f189a514887
RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
</TASK>
The root cause should be the below race:
memory_failure
try_memory_failure_hugetlb
me_huge_page
__page_handle_poison
dissolve_free_hugetlb_folio
drain_all_pages -- Buddy page can be isolated e.g. for compaction.
take_page_off_buddy -- Failed as page is not in the buddy list.
-- Page can be putback into buddy after compaction.
page_ref_inc -- Leads to buddy page with refcnt = 1.
Then unpoison_memory() can unpoison the page and send the buddy page back
into buddy list again leading to the above bad page state warning. And
bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
allocate this page.
Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.
Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
Fixes: ceaf8fbea79a ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After switching smaps_rollup to use VMA iterator, searching for next entry
is part of the condition expression of the do-while loop. So the current
VMA needs to be addressed before the continue statement.
Otherwise, with some VMAs skipped, userspace observed memory
consumption from /proc/pid/smaps_rollup will be smaller than the sum of
the corresponding fields from /proc/pid/smaps.
Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com
Fixes: c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Syzbot has reported a potential hang in nilfs_detach_log_writer() called
during nilfs2 unmount.
Analysis revealed that this is because nilfs_segctor_sync(), which
synchronizes with the log writer thread, can be called after
nilfs_segctor_destroy() terminates that thread, as shown in the call trace
below:
nilfs_detach_log_writer
nilfs_segctor_destroy
nilfs_segctor_kill_thread --> Shut down log writer thread
flush_work
nilfs_iput_work_func
nilfs_dispose_list
iput
nilfs_evict_inode
nilfs_transaction_commit
nilfs_construct_segment (if inode needs sync)
nilfs_segctor_sync --> Attempt to synchronize with
log writer thread
*** DEADLOCK ***
Fix this issue by changing nilfs_segctor_sync() so that the log writer
thread returns normally without synchronizing after it terminates, and by
forcing tasks that are already waiting to complete once after the thread
terminates.
The skipped inode metadata flushout will then be processed together in the
subsequent cleanup work in nilfs_segctor_destroy().
Link: https://lkml.kernel.org/r/20240520132621.4054-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+e3973c409251e136fdd0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A potential and reproducible race issue has been identified where
nilfs_segctor_sync() would block even after the log writer thread writes a
checkpoint, unless there is an interrupt or other trigger to resume log
writing.
This turned out to be because, depending on the execution timing of the
log writer thread running in parallel, the log writer thread may skip
responding to nilfs_segctor_sync(), which causes a call to schedule()
waiting for completion within nilfs_segctor_sync() to lose the opportunity
to wake up.
The reason why waking up the task waiting in nilfs_segctor_sync() may be
skipped is that updating the request generation issued using a shared
sequence counter and adding an wait queue entry to the request wait queue
to the log writer, are not done atomically. There is a possibility that
log writing and request completion notification by nilfs_segctor_wakeup()
may occur between the two operations, and in that case, the wait queue
entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of
nilfs_segctor_sync() will be carried over until the next request occurs.
Fix this issue by performing these two operations simultaneously within
the lock section of sc_state_lock. Also, following the memory barrier
guidelines for event waiting loops, move the call to set_current_state()
in the same location into the event waiting loop to ensure that a memory
barrier is inserted just before the event condition determination.
Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: fix log writer related issues".
This bug fix series covers three nilfs2 log writer-related issues,
including a timer use-after-free issue and potential deadlock issue on
unmount, and a potential freeze issue in event synchronization found
during their analysis. Details are described in each commit log.
This patch (of 3):
A use-after-free issue has been reported regarding the timer sc_timer on
the nilfs_sc_info structure.
The problem is that even though it is used to wake up a sleeping log
writer thread, sc_timer is not shut down until the nilfs_sc_info structure
is about to be freed, and is used regardless of the thread's lifetime.
Fix this issue by limiting the use of sc_timer only while the log writer
thread is alive.
Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com
Fixes: fdce895ea5dd ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: "Bai, Shuangpeng" <sjb7183@psu.edu>
Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Fix warnings like:
In file included from uffd-unit-tests.c:8:
uffd-unit-tests.c: In function `uffd_poison_handle_fault':
uffd-common.h:45:33: warning: format `%llu' expects argument of type
`long long unsigned int', but argument 3 has type `__u64' {aka `long
unsigned int'} [-Wformat=]
By switching to unsigned long long for u64 for ppc64 builds.
Link: https://lkml.kernel.org/r/20240521030219.57439-1-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Klara Modin reported warnings for a kernel configured with BPF_JIT but
without MODULES:
[ 44.131296] Trying to vfree() bad address (000000004a17c299)
[ 44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[ 44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G D W 6.9.0-01786-g2c9e5d4a0082 #25
[ 44.158229] Hardware name: Raspberry Pi 3 Model B (DT)
[ 44.164433] Workqueue: events bpf_prog_free_deferred
[ 44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[ 44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[ 44.188772] sp : ffff800082a13c70
[ 44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000
[ 44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000
[ 44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880
[ 44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006
[ 44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002
[ 44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000
[ 44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370
[ 44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370
[ 44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240
[ 44.275703] Call trace:
[ 44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[ 44.283858] vfree (mm/vmalloc.c:3322)
[ 44.287835] execmem_free (mm/execmem.c:70)
[ 44.292347] bpf_jit_free_exec+0x10/0x1c
[ 44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006)
[ 44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195)
[ 44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474)
[ 44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785)
[ 44.317785] process_one_work (kernel/workqueue.c:3273)
[ 44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2))
[ 44.327292] kthread (kernel/kthread.c:388)
[ 44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861)
The problem is because bpf_arch_text_copy() silently fails to write to the
read-only area as a result of patch_map() faulting and the resulting
-EFAULT being chucked away.
Update patch_map() to use CONFIG_EXECMEM instead of
CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.
Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org
Fixes: 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reported-by: Klara Modin <klarasmodin@gmail.com>
Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.com
Tested-by: Klara Modin <klarasmodin@gmail.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. Anyways, in general, avoid a division by zero by
exiting the program beforehand. While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.
Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|