wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2025-02-16	ASoC: tas2770: Fix volume scale	Hector Martin	1	-1/+1
	The scale starts at -100dB, not -128dB. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20250208-asoc-tas2770-v1-1-cf50ff1d59a3@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-13	ASoC: imx-audmix: remove cpu_mclk which is from cpu dai device	Shengjiu Wang	1	-31/+0
	When defer probe happens, there may be below error: platform 59820000.sai: Resources present before probing The cpu_mclk clock is from the cpu dai device, if it is not released, then the cpu dai device probe will fail for the second time. The cpu_mclk is used to get rate for rate constraint, rate constraint may be specific for each platform, which is not necessary for machine driver, so remove it. Fixes: b86ef5367761 ("ASoC: fsl: Add Audio Mixer machine driver") Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Link: https://patch.msgid.link/20250213070518.547375-1-shengjiu.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-10	ASoC: cs35l41: Fix acpi_device_hid() not found	Stefan Binding	1	-0/+7
	Function acpi_device_hid() is only defined if CONFIG_ACPI is set. Use #ifdef CONFIG_ACPI to ensure that cs35l41 driver only calls this function is CONFIG_ACPI is define. Fixes: 1d44a30ae3f9 ("ASoC: cs35l41: Fallback to using HID for system_name if no SUB is available") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502090100.SbXmGFqs-lkp@intel.com/ Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://patch.msgid.link/20250210163256.1722350-1-sbinding@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-10	ASoC: SOF: amd: Add branch prediction hint in ACP IRQ handler	Cristian Ciocaltea	1	-1/+1
	The conditional involving sdev->first_boot in acp_sof_ipc_irq_thread() will succeed only once, i.e. during the very first run of the DSP firmware. Use the unlikely() annotation to help improve branch prediction accuracy. Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Reviewed-by: Venkata Prasad Potturu <venkataprasad.potturu@amd.com> Link: https://patch.msgid.link/20250207-sof-vangogh-fixes-v1-4-67824c1e4c9a@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-10	ASoC: SOF: amd: Handle IPC replies before FW_BOOT_COMPLETE	Cristian Ciocaltea	1	-7/+16
	In some cases, e.g. during resuming from suspend, there is a possibility that some IPC reply messages get received by the host while the DSP firmware has not yet reached the complete boot state. Detect when this happens and do not attempt to process the unexpected replies from DSP. Instead, provide proper debugging support. Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20250207-sof-vangogh-fixes-v1-3-67824c1e4c9a@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-10	ASoC: SOF: amd: Drop unused includes from Vangogh driver	Cristian Ciocaltea	2	-6/+0
	Remove all the includes for headers which are not (directly) used from the Vangogh SOF driver sources. Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Reviewed-by: Venkata Prasad Potturu <venkataprasad.potturu@amd.com> Link: https://patch.msgid.link/20250207-sof-vangogh-fixes-v1-2-67824c1e4c9a@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-10	ASoC: SOF: amd: Add post_fw_run_delay ACP quirk	Cristian Ciocaltea	3	-0/+20
	Stress testing resume from suspend on Valve Steam Deck OLED (Galileo) revealed that the DSP firmware could enter an unrecoverable faulty state, where the kernel ring buffer is flooded with IPC related error messages: [ +0.017002] snd_sof_amd_vangogh 0000:04:00.5: acp_sof_ipc_send_msg: Failed to acquire HW lock [ +0.000054] snd_sof_amd_vangogh 0000:04:00.5: ipc3_tx_msg_unlocked: ipc message send for 0x30100000 failed: -22 [ +0.000005] snd_sof_amd_vangogh 0000:04:00.5: Failed to setup widget PIPELINE.6.ACPHS1.IN [ +0.000004] snd_sof_amd_vangogh 0000:04:00.5: Failed to restore pipeline after resume -22 [ +0.000003] snd_sof_amd_vangogh 0000:04:00.5: PM: dpm_run_callback(): pci_pm_resume returns -22 [ +0.000009] snd_sof_amd_vangogh 0000:04:00.5: PM: failed to resume async: error -22 [...] [ +0.002582] PM: suspend exit [ +0.065085] snd_sof_amd_vangogh 0000:04:00.5: ipc tx error for 0x30130000 (msg/reply size: 12/0): -22 [ +0.000499] snd_sof_amd_vangogh 0000:04:00.5: error: failed widget list set up for pcm 1 dir 0 [ +0.000011] snd_sof_amd_vangogh 0000:04:00.5: error: set pcm hw_params after resume [ +0.000006] snd_sof_amd_vangogh 0000:04:00.5: ASoC: error at snd_soc_pcm_component_prepare on 0000:04:00.5: -22 [...] A system reboot would be necessary to restore the speakers functionality. However, by delaying a bit any host to DSP transmission right after the firmware boot completed, the issue could not be reproduced anymore and sound continued to work flawlessly even after performing thousands of suspend/resume cycles. Introduce the post_fw_run_delay ACP quirk to allow providing the aforementioned delay via the snd_sof_dsp_ops->post_fw_run() callback for the affected devices. Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20250207-sof-vangogh-fixes-v1-1-67824c1e4c9a@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-10	ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt713_vb_l2_rt1320_l13	Peter Ujfalusi	1	-3/+3
	s/lnl/ptl Fixes: a7ebb0255188 ("ASoC: Intel: soc-acpi-intel-ptl-match: add rt713_vb_l2_rt1320_l13 support") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20250210031954.6287-3-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-10	ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt712_vb + rt1320 support	Peter Ujfalusi	1	-3/+3
	s/lnl/ptl Fixes: bd40d912728f ("ASoC: Intel: soc-acpi-intel-ptl-match: add rt712_vb + rt1320 support") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20250210031954.6287-2-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-06	ASoC: tas2781: drop a redundant code	Shenghao Ding	1	-3/+1
	Report from internal ticket, priv->cali_data.data devm_kzalloc twice, drop the first one, it is the unnecessary one. Signed-off-by: Shenghao Ding <shenghao-ding@ti.com> Link: https://patch.msgid.link/20250206123808.1590-1-shenghao-ding@ti.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-06	ASoC: SOF: Intel: hda: add softdep pre to snd-hda-codec-hdmi module	Terry Cheong	1	-0/+1
	In enviornment without KMOD requesting module may fail to load snd-hda-codec-hdmi, resulting in HDMI audio not usable. Add softdep to loading HDMI codec module first to ensure we can load it correctly. Signed-off-by: Terry Cheong <htcheong@chromium.org> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Johny Lin <lpg76627@gmail.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://patch.msgid.link/20250206094723.18013-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-06	ASoC: SOF: ipc4-topology: Harden loops for looking up ALH copiers	Peter Ujfalusi	1	-2/+10
	Other, non DAI copier widgets could have the same stream name (sname) as the ALH copier and in that case the copier->data is NULL, no alh_data is attached, which could lead to NULL pointer dereference. We could check for this NULL pointer in sof_ipc4_prepare_copier_module() and avoid the crash, but a similar loop in sof_ipc4_widget_setup_comp_dai() will miscalculate the ALH device count, causing broken audio. The correct fix is to harden the matching logic by making sure that the 1. widget is a DAI widget - so dai = w->private is valid 2. the dai (and thus the copier) is ALH copier Fixes: a150345aa758 ("ASoC: SOF: ipc4-topology: add SoundWire/ALH aggregation support") Reported-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com> Link: https://github.com/thesofproject/sof/pull/9652 Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20250206084642.14988-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-06	ASoC: cs35l41: Fallback to using HID for system_name if no SUB is available	Stefan Binding	1	-7/+16
	For systems which load firmware on the cs35l41 which use ACPI, the _SUB value is used to differentiate firmware and tuning files for the individual systems. In the case where a system does not have a _SUB defined in ACPI node for cs35l41, there needs to be a fallback to allow the files for that system to be differentiated. Since all ACPI nodes for cs35l41 should have a HID defined, the HID should be a safe option. Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com> Reviewed-by: André Almeida <andrealmeid@igalia.com> Tested-by: André Almeida <andrealmeid@igalia.com> Link: https://patch.msgid.link/20250205164806.414020-1-sbinding@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-06	ASoC: arizona/madera: use fsleep() in up/down DAPM event delays.	Vitaly Rodionov	3	-16/+16
	Using `fsleep` instead of `msleep` resolves some customer complaints regarding the precision of up/down DAPM event timing. `fsleep()` automatically selects the appropriate sleep function, making the delay time more predictable. Signed-off-by: Vitaly Rodionov <vitalyr@opensource.cirrus.com> Link: https://patch.msgid.link/20250205160849.500306-1-vitalyr@opensource.cirrus.com Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-05	ASoC: SOF: pcm: Clear the susbstream pointer to NULL on close	Peter Ujfalusi	1	-0/+2
	The spcm->stream[substream->stream].substream is set during open and was left untouched. After the first PCM stream it will never be NULL and we have code which checks for substream NULLity as indication if the stream is active or not. For the compressed cstream pointer the same has been done, this change will correct the handling of PCM streams. Fixes: 090349a9feba ("ASoC: SOF: Add support for compress API for stream data/offset") Cc: stable@vger.kernel.org Reported-by: Curtis Malainey <cujomalainey@chromium.org> Closes: https://github.com/thesofproject/linux/pull/5214 Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Curtis Malainey <cujomalainey@chromium.org> Link: https://patch.msgid.link/20250205135232.19762-3-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-05	ASoC: SOF: stream-ipc: Check for cstream nullity in sof_ipc_msg_data()	Peter Ujfalusi	1	-1/+5
	The nullity of sps->cstream should be checked similarly as it is done in sof_set_stream_data_offset() function. Assuming that it is not NULL if sps->stream is NULL is incorrect and can lead to NULL pointer dereference. Fixes: 090349a9feba ("ASoC: SOF: Add support for compress API for stream data/offset") Cc: stable@vger.kernel.org Reported-by: Curtis Malainey <cujomalainey@chromium.org> Closes: https://github.com/thesofproject/linux/pull/5214 Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Curtis Malainey <cujomalainey@chromium.org> Link: https://patch.msgid.link/20250205135232.19762-2-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-05	ASoC: rsnd: adjust convert rate limitation	Kuninori Morimoto	1	-22/+76
	Current rsnd driver supports Synchronous SRC Mode, but HW allow to update rate only within 1% from current rate. Adjust to it. Becially, this feature is used to fine-tune subtle difference that occur during sampling rate conversion in SRC. So, it should be called within 1% margin of rate difference. If there was difference over 1%, it will apply with 1% increments by using loop without indicating error message. Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://patch.msgid.link/871pwd2qe8.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-05	ASoC: rsnd: don't indicate warning on rsnd_kctrl_accept_runtime()	Kuninori Morimoto	3	-16/+17
	rsnd_kctrl_accept_runtime() (1) is used for runtime convert rate (= Synchronous SRC Mode). Now, rsnd driver has 2 kctrls for it (A): "SRC Out Rate Switch" (B): "SRC Out Rate" // it calls (1) (A): can be called anytime (B): can be called only runtime, and will indicate warning if it was used at non-runtime. To use runtime convert rate (= Synchronous SRC Mode), user might uses command in below order. (X): > amixer set "SRC Out Rate" on > aplay xxx.wav & (Y): > amixer set "SRC Out Rate" 48010 // convert rate to 48010Hz (Y): calls B (X): calls both A and B. In this case, when user calls (X), it calls both (A) and (B), but it is not yet start running. So, (B) will indicate warning. This warning was added by commit b5c088689847 ("ASoC: rsnd: add warning message to rsnd_kctrl_accept_runtime()"), but the message sounds like the operation was not correct. Let's update warning message. The message is very SRC specific, implement it in src.c Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://patch.msgid.link/8734gt2qed.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-05	ASoC: rsnd: indicate unsupported clock rate	Kuninori Morimoto	1	-1/+2
	It will indicate "unsupported clock rate" when setup clock failed. But it is unclear what kind of rate was failed. Indicate it. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://patch.msgid.link/874j192qej.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-05	ASoC: simple-card-utils.c: add missing dlc->of_node	Kuninori Morimoto	1	-0/+1
	commit 90de551c1bf ("ASoC: simple-card-utils.c: enable multi Component support") added muiti Component support, but was missing to add dlc->of_node. Because of it, Sound device list will indicates strange name if it was DPCM connection and driver supports dai->driver->dai_args, like below > aplay -l card X: sndulcbmix [xxxx], device 0: fe.(null).rsnd-dai.0 () [] ... ^^^^^^ It will be fixed by this patch > aplay -l card X: sndulcbmix [xxxx], device 0: fe.sound@ec500000.rsnd-dai.0 () [] ... ^^^^^^^^^^^^^^ Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Link: https://patch.msgid.link/87ikpp2rtb.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-04	ASoC: rockchip: i2s-tdm: fix shift config for SND_SOC_DAIFMT_DSP_[AB]	John Keeping	1	-2/+2
	Commit 2f45a4e289779 ("ASoC: rockchip: i2s_tdm: Fixup config for SND_SOC_DAIFMT_DSP_A/B") applied a partial change to fix the configuration for DSP A and DSP B formats. The shift control also needs updating to set the correct offset for frame data compared to LRCK. Set the correct values. Fixes: 081068fd64140 ("ASoC: rockchip: add support for i2s-tdm controller") Signed-off-by: John Keeping <jkeeping@inmusicbrands.com> Link: https://patch.msgid.link/20250204161311.2117240-1-jkeeping@inmusicbrands.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-04	ASoC: Intel: soc-acpi-intel-mtl-match: declare adr as ull	Bard Liao	1	-1/+1
	The adr is u64. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://patch.msgid.link/20250204033134.92332-3-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-04	ASoC: Intel: soc-acpi-intel-tgl-match: declare adr as ull	Bard Liao	1	-8/+8
	The adr is u64. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://patch.msgid.link/20250204033134.92332-2-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-04	ASoC: Intel: sof_sdw: Add support for Fatcat board with BT offload enabled in PTL platform	Uday M Bhat	1	-0/+10
	This change adds an entry for fatcat boards in soundwire quirk table and also, enables BT offload for PTL RVP. Signed-off-by: Uday M Bhat <uday.m.bhat@intel.com> Signed-off-by: Jairaj Arava <jairaj.arava@intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20250204053943.93596-4-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-04	ASoC: Intel: sof_sdw: Add quirk for Asus Zenbook S14	Richard Fitzgerald	1	-0/+1
	Asus laptops with sound PCI subsystem ID 1043:1e13 have the DMICs connected to the host instead of the CS42L43 so need the SOC_SDW_CODEC_MIC quirk. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20250204053943.93596-3-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-04	ASoC: Intel: sof_sdw: Add lookup of quirk using PCI subsystem ID	Richard Fitzgerald	1	-6/+24
	Add lookup of PCI subsystem vendor:device ID to find a quirk. The subsystem ID (SSID) is part of the PCI specification to uniquely identify a particular system-specific implementation of a hardware device. Unlike DMI information, it identifies the sound hardware itself, rather than a specific model of PC. SSID can be more reliable and stable than DMI strings, and is preferred by some vendors as the way to identify the actual sound hardware. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20250204053943.93596-2-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-03	ASoC: fsl_micfil: Enable default case in micfil_set_quality()	Nikita Zhandarovich	1	-0/+2
	If 'micfil->quality' received from micfil_quality_set() somehow ends up with an unpredictable value, switch() operator will fail to initialize local variable qsel before regmap_update_bits() tries to utilize it. While it is unlikely, play it safe and enable a default case that returns -EINVAL error. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: bea1d61d5892 ("ASoC: fsl_micfil: rework quality setting") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://patch.msgid.link/20250116142436.22389-1-n.zhandarovich@fintech.ru Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-02	Linux 6.14-rc1	Linus Torvalds	1	-2/+2

2025-02-02	tools/power turbostat: version 2025.02.02	Len Brown	1	-1/+1
	Summary of Changes since 2024.11.30: Fix regression in 2023.11.07 that affinitized forked child in one-shot mode. Harden one-shot mode against hotplug online/offline Enable RAPL SysWatt column by default. Add initial PTL, CWF platform support. Harden initial PMT code in response to early use. Enable first built-in PMT counter: CWF c1e residency Refuse to run on unsupported platforms without --force, to encourage updating to a version that supports the system, and to avoid no-so-useful measurement results. Signed-off-by: Len Brown <len.brown@intel.com>
2025-02-01	MAINTAINERS: include linux-mm for xarray maintenance	Andrew Morton	1	-0/+1
	MM developers have an interest in the xarray code. Cc: David Gow <davidgow@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	revert "xarray: port tests to kunit"	Andrew Morton	16	-410/+294
	Revert c7bb5cf9fc4e ("xarray: port tests to kunit"). It broke the build when compiing the xarray userspace test harness code. Reported-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Closes: https://lkml.kernel.org/r/07cf896e-adf8-414f-a629-a808fc26014a@oracle.com Cc: David Gow <davidgow@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Tamir Duberstein <tamird@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	MAINTAINERS: add lib/test_xarray.c	Tamir Duberstein	1	-0/+1
	Ensure test-only changes are sent to the relevant maintainer. Link: https://lkml.kernel.org/r/20250129-xarray-test-maintainer-v1-1-482e31f30f47@gmail.com Signed-off-by: Tamir Duberstein <tamird@gmail.com> Cc: Mattew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mailmap, MAINTAINERS, docs: update Carlos's email address	Carlos Bilbao	3	-6/+8
	Update .mailmap to reflect my new (and final) primary email address, carlos.bilbao@kernel.org. Also update contact information in files Documentation/translations/sp_SP/index.rst and MAINTAINERS. Link: https://lkml.kernel.org/r/20250130012248.1196208-1-carlos.bilbao@kernel.org Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Carlos Bilbao <bilbao@vt.edu> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mattew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/hugetlb: fix hugepage allocation for interleaved memory nodes	Ritesh Harjani (IBM)	1	-1/+1
	gather_bootmem_prealloc() assumes the start nid as 0 and size as num_node_state(N_MEMORY). That means in case if memory attached numa nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail to scan few of these nodes. Since memory attached numa nodes can be interleaved in any fashion, hence ensure that the current code checks for all numa node ids (.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that it can distributes all nr_node_ids among the these many no. threads. e.g. qemu cmdline ======================== numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20" mem_cmd="-object memory-backend-ram,id=mem1,size=16G" w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): ========================== ~ # cat /proc/meminfo \|grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 0 kB with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): =========================== ~ # cat /proc/meminfo \|grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 2 HugePages_Free: 2 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 2097152 kB Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.1736592534.git.ritesh.list@gmail.com Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reported-by: Pavithra Prakash <pavrampu@linux.ibm.com> Suggested-by: Muchun Song <muchun.song@linux.dev> Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Gang Li <gang.li@linux.dev> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: gup: fix infinite loop within __get_longterm_locked	Zhaoyang Huang	1	-10/+4
	We can run into an infinite loop in __get_longterm_locked() when collect_longterm_unpinnable_folios() finds only folios that are isolated from the LRU or were never added to the LRU. This can happen when all folios to be pinned are never added to the LRU, for example when vm_ops->fault allocated pages using cma_alloc() and never added them to the LRU. Fix it by simply taking a look at the list in the single caller, to see if anything was added. [zhaoyang.huang@unisoc.com: move definition of local] Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()") Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Aijun Sun <aijun.sun@unisoc.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm, swap: fix reclaim offset calculation error during allocation	Kairui Song	1	-1/+1
	There is a code error that will cause the swap entry allocator to reclaim and check the whole cluster with an unexpected tail offset instead of the part that needs to be reclaimed. This may cause corruption of the swap map, so fix it. Link: https://lkml.kernel.org/r/20250130115131.37777-1-ryncsn@gmail.com Fixes: 3b644773eefd ("mm, swap: reduce contention on device lock") Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Chris Li <chrisl@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	.mailmap: update email address for Christopher Obbard	Christopher Obbard	1	-0/+1
	Update my email address. Link: https://lkml.kernel.org/r/20250122-wip-obbardc-update-email-v2-1-12bde6b79ad0@linaro.org Signed-off-by: Christopher Obbard <christopher.obbard@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	kfence: skip __GFP_THISNODE allocations on NUMA systems	Marco Elver	1	-0/+2
	On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on a particular node, and failure to allocate on the desired node will result in a failed allocation. Skip __GFP_THISNODE allocations if we are running on a NUMA system, since KFENCE can't guarantee which node its pool pages are allocated on. Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Alexander Potapenko <glider@google.com> Cc: Chistoph Lameter <cl@linux.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	nilfs2: fix possible int overflows in nilfs_fiemap()	Nikita Zhandarovich	1	-3/+3
	Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result by being prepared to go through potentially maxblocks == INT_MAX blocks, the value in n may experience an overflow caused by left shift of blkbits. While it is extremely unlikely to occur, play it safe and cast right hand expression to wider type to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com Fixes: 622daaff0a89 ("nilfs2: fiemap support") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: compaction: use the proper flag to determine watermarks	yangge	1	-4/+25
	There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of memory. I have configured 16GB of CMA memory on each NUMA node, and starting a 32GB virtual machine with device passthrough is extremely slow, taking almost an hour. Long term GUP cannot allocate memory from CMA area, so a maximum of 16 GB of no-CMA memory on a NUMA node can be used as virtual machine memory. There is 16GB of free CMA memory on a NUMA node, which is sufficient to pass the order-0 watermark check, causing the __compaction_suitable() function to consistently return true. For costly allocations, if the __compaction_suitable() function always returns true, it causes the __alloc_pages_slowpath() function to fail to exit at the appropriate point. This prevents timely fallback to allocating memory on other nodes, ultimately resulting in excessively long virtual machine startup times. Call trace: __alloc_pages_slowpath if (compact_result == COMPACT_SKIPPED \|\| compact_result == COMPACT_DEFERRED) goto nopage; // should exit __alloc_pages_slowpath() from here We could use the real unmovable allocation context to have __zone_watermark_unusable_free() subtract CMA pages, and thus we won't pass the order-0 check anymore once the non-CMA part is exhausted. There is some risk that in some different scenario the compaction could in fact migrate pages from the exhausted non-CMA part of the zone to the CMA part and succeed, and we'll skip it instead. But only __GFP_NORETRY allocations should be affected in the immediate "goto nopage" when compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY anyway and won't fail without trying to compact-migrate the non-CMA pageblocks into CMA pageblocks first, so it should be fine. After this fix, it only takes a few tens of seconds to start a 32GB virtual machine with device passthrough functionality. Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/ Link: https://lkml.kernel.org/r/1737788037-8439-1-git-send-email-yangge1116@126.com Signed-off-by: yangge <yangge1116@126.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	kernel: be more careful about dup_mmap() failures and uprobe registering	Liam R. Howlett	2	-3/+18
	If a memory allocation fails during dup_mmap(), the maple tree can be left in an unsafe state for other iterators besides the exit path. All the locks are dropped before the exit_mmap() call (in mm/mmap.c), but the incomplete mm_struct can be reached through (at least) the rmap finding the vmas which have a pointer back to the mm_struct. Up to this point, there have been no issues with being able to find an mm_struct that was only partially initialised. Syzbot was able to make the incomplete mm_struct fail with recent forking changes, so it has been proven unsafe to use the mm_struct that hasn't been initialised, as referenced in the link below. Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to invalid mm") fixed the uprobe access, it does not completely remove the race. This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the oom side (even though this is extremely unlikely to be selected as an oom victim in the race window), and sets MMF_UNSTABLE to avoid other potential users from using a partially initialised mm_struct. When registering vmas for uprobe, skip the vmas in an mm that is marked unstable. Modifying a vma in an unstable mm may cause issues if the mm isn't fully initialised. Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/ Link: https://lkml.kernel.org/r/20250127170221.1761366-1-Liam.Howlett@oracle.com Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/fake-numa: handle cases with no SRAT info	Bruno Faccini	1	-1/+10
	Handle more gracefully cases where no SRAT information is available, like in VMs with no Numa support, and allow fake-numa configuration to complete successfully in these cases Link: https://lkml.kernel.org/r/20250127171623.1523171-1-bfaccini@nvidia.com Fixes: 63db8170bf34 (“mm/fake-numa: allow later numa node hotplug”) Signed-off-by: Bruno Faccini <bfaccini@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <hyeonggon.yoo@sk.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Len Brown <lenb@kernel.org> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: kmemleak: fix upper boundary check for physical address objects	Catalin Marinas	1	-1/+1
	Memblock allocations are registered by kmemleak separately, based on their physical address. During the scanning stage, it checks whether an object is within the min_low_pfn and max_low_pfn boundaries and ignores it otherwise. With the recent addition of __percpu pointer leak detection (commit 6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")), kmemleak started reporting leaks in setup_zone_pageset() and setup_per_cpu_pageset(). These were caused by the node_data[0] object (initialised in alloc_node_data()) ending on the PFN_PHYS(max_low_pfn) boundary. The non-strict upper boundary check introduced by commit 84c326299191 ("mm: kmemleak: check physical address when scan") causes the pg_data_t object to be ignored (not scanned) and the __percpu pointers it contains to be reported as leaks. Make the max_low_pfn upper boundary check strict when deciding whether to ignore a physical address object and not scan it. Link: https://lkml.kernel.org/r/20250127184233.2974311-1-catalin.marinas@arm.com Fixes: 84c326299191 ("mm: kmemleak: check physical address when scan") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: Patrick Wang <patrick.wang.shcn@gmail.com> Cc: <stable@vger.kernel.org> [6.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mailmap: add an entry for Hamza Mahfooz	Hamza Mahfooz	1	-0/+1
	Map my previous work email to my current one. Link: https://lkml.kernel.org/r/20250120205659.139027-1-hamzamahfooz@linux.microsoft.com Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hans verkuil <hverkuil@xs4all.nl> Cc: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	MAINTAINERS: mailmap: update Yosry Ahmed's email address	Yosry Ahmed	2	-1/+2
	Moving to a linux.dev email address. Link: https://lkml.kernel.org/r/20250123231344.817358-1-yosry.ahmed@linux.dev Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	scripts/gdb: fix aarch64 userspace detection in get_current_task	Jan Kiszka	1	-1/+1
	At least recent gdb releases (seen with 14.2) return SP_EL0 as signed long which lets the right-shift always return 0. Link: https://lkml.kernel.org/r/dcd2fabc-9131-4b48-8419-6444e2d67454@siemens.com Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Barry Song <baohua@kernel.org> Cc: Kieran Bingham <kbingham@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/vmscan: accumulate nr_demoted for accurate demotion statistics	Li Zhijian	1	-3/+4
	In shrink_folio_list(), demote_folio_list() can be called 2 times. Currently stat->nr_demoted will only store the last nr_demoted( the later nr_demoted is always zero, the former nr_demoted will get lost), as a result number of demoted pages is not accurate. Accumulate the nr_demoted count across multiple calls to demote_folio_list(), ensuring accurate reporting of demotion statistics. [lizhijian@fujitsu.com: introduce local nr_demoted to fix nr_reclaimed double counting] Link: https://lkml.kernel.org/r/20250111015253.425693-1-lizhijian@fujitsu.com Link: https://lkml.kernel.org/r/20250110122133.423481-1-lizhijian@fujitsu.com Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu> Tested-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	ocfs2: fix incorrect CPU endianness conversion causing mount failure	Heming Zhao	1	-1/+1
	Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") introduced a regression bug. The blksz_bits value is already converted to CPU endian in the previous code; therefore, the code shouldn't use le32_to_cpu() anymore. Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()	Hyeonggon Yoo	1	-1/+1
	Commit c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()") introduces is_first_zpdesc() function. However, the function is only used when CONFIG_DEBUG_VM=y. When building with LLVM=1 and W=1 option, the following warning is generated: $ make -j12 W=1 LLVM=1 mm/zsmalloc.o mm/zsmalloc.c:455:20: error: function 'is_first_zpdesc' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration] 455 \| static inline bool is_first_zpdesc(struct zpdesc *zpdesc) \| ^~~~~~~~~~~~~~~ 1 error generated. Fix the warning by adding __maybe_unused attribute to the function. No functional change intended. Link: https://lkml.kernel.org/r/20250127231631.4363-1-42.hyeyoo@gmail.com Fixes: c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()") Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501240958.4ILzuBrH-lkp@intel.com/ Cc: Alex Shi <alexs@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/vmscan: fix hard LOCKUP in function isolate_lru_folios	liuye	2	-1/+6
	This fixes the following hard lockup in isolate_lru_folios() during memory reclaim. If the LRU mostly contains ineligible folios this may trigger watchdog. watchdog: Watchdog detected hard LOCKUP on cpu 173 RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0 Call Trace: _raw_spin_lock_irqsave+0x31/0x40 folio_lruvec_lock_irqsave+0x5f/0x90 folio_batch_move_lru+0x91/0x150 lru_add_drain_per_cpu+0x1c/0x40 process_one_work+0x17d/0x350 worker_thread+0x27b/0x3a0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 lruvec->lru_lock owner： PID: 2865 TASK: ffff888139214d40 CPU: 40 COMMAND: "kswapd0" #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555 #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171 #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920 #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4 #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde [exception RIP: isolate_lru_folios+403] RIP: ffffffffa597df53 RSP: ffffc90006fb7c28 RFLAGS: 00000002 RAX: 0000000000000001 RBX: ffffc90006fb7c60 RCX: ffffea04a2196f88 RDX: ffffc90006fb7c60 RSI: ffffc90006fb7c60 RDI: ffffea04a2197048 RBP: ffff88812cbd3010 R8: ffffea04a2197008 R9: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: ffffea04a2197008 R13: ffffea04a2197048 R14: ffffc90006fb7de8 R15: 0000000003e3e937 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 <NMI exception stack> #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53 #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788 #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0 #8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354 #9 [ffffc90006fb7ef8] kthread at ffffffffa5748238 crash> Scenario: User processe are requesting a large amount of memory and keep page active. Then a module continuously requests memory from ZONE_DMA32 area. Memory reclaim will be triggered due to ZONE_DMA32 watermark alarm reached. However pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area. Reproduce: Terminal 1: Construct to continuously increase pages active(anon). mkdir /tmp/memory mount -t tmpfs -o size=1024000M tmpfs /tmp/memory dd if=/dev/zero of=/tmp/memory/block bs=4M tail /tmp/memory/block Terminal 2: vmstat -a 1 active will increase. procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ... r b swpd free inact active si so bi bo 1 0 0 1445623076 45898836 83646008 0 0 0 1 0 0 1445623076 43450228 86094616 0 0 0 1 0 0 1445623076 41003480 88541364 0 0 0 1 0 0 1445623076 38557088 90987756 0 0 0 1 0 0 1445623076 36109688 93435156 0 0 0 1 0 0 1445619552 33663256 95881632 0 0 0 1 0 0 1445619804 31217140 98327792 0 0 0 1 0 0 1445619804 28769988 100774944 0 0 0 1 0 0 1445619804 26322348 103222584 0 0 0 1 0 0 1445619804 23875592 105669340 0 0 0 cat /proc/meminfo \| head Active(anon) increase. MemTotal: 1579941036 kB MemFree: 1445618500 kB MemAvailable: 1453013224 kB Buffers: 6516 kB Cached: 128653956 kB SwapCached: 0 kB Active: 118110812 kB Inactive: 11436620 kB Active(anon): 115345744 kB Inactive(anon): 945292 kB When the Active(anon) is 115345744 kB, insmod module triggers the ZONE_DMA32 watermark. perf record -e vmscan:mm_vmscan_lru_isolate -aR perf script isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2 nr_skipped=2 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29 nr_skipped=29 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon See nr_scanned=28835844. 28835844 * 4k = 115343376KB approximately equal to 115345744 kB. If increase Active(anon) to 1000G then insmod module triggers the ZONE_DMA32 watermark. hard lockup will occur. In my device nr_scanned = 0000000003e3e937 when hard lockup. Convert to memory size 0x0000000003e3e937 * 4KB = 261072092 KB. [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53 ffffc90006fb7c30: 0000000000000020 0000000000000000 ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000 ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8 ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48 ffffc90006fb7c70: 0000000000000000 0000000000000000 ffffc90006fb7c80: 0000000000000000 0000000000000000 ffffc90006fb7c90: 0000000000000000 0000000000000000 ffffc90006fb7ca0: 0000000000000000 0000000003e3e937 ffffc90006fb7cb0: 0000000000000000 0000000000000000 ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000 About the Fixes: Why did it take eight years to be discovered? The problem requires the following conditions to occur: 1. The device memory should be large enough. 2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area. 3. The memory in ZONE_DMA32 needs to reach the watermark. If the memory is not large enough, or if the usage design of ZONE_DMA32 area memory is reasonable, this problem is difficult to detect. notes: The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL, but other suitable scenarios may also trigger the problem. Link: https://lkml.kernel.org/r/20241119060842.274072-1-liuye@kylinos.cn Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis") Signed-off-by: liuye <liuye@kylinos.cn> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>