aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/robust-futexes.txt (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2018-12-10platform/x86: intel_telemetry: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li1-36/+6
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-12-10platform/x86: intel_pmc_core: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li1-41/+8
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-12-10platform/x86: thinkpad_acpi: Cleanup quirks macrosJouke Witteveen1-32/+13
- Use generic quirks macros for fan quirks The fan-specific quirks macros were duplicates of the generic ones. - Remove useless #undef lines The referenced macros are not defined anywhere. Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-12-07platform/x86: touchscreen_dmi: Add info for the Mediacom Flexbook Edge 11Hans de Goede1-0/+8
Add a DMI match for the Mediacom Flexbook Edge 11, this is the same hw as the Trekstor Primebook C11, so we use the same settings. Reported-by: rmbg <alexofrichardmilitiabg@hotmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-12-03platform/x86: Fix config space access for intel_atomisp2_pmVille Syrjälä1-20/+48
We lose even config space access when we power gate the ISP via the PUNIT. That makes lspci & co. produce gibberish. To fix that let's try to implement actual runtime pm hooks and inform the pci core that the device always goes to D3cold. That will cause the pci core to resume the device before attempting config space access. This introduces another annoyance though. We get the following error every time we try to resume the device: intel_atomisp2_pm 0000:00:03.0: Refused to change power state, currently in D3 The reason being that the pci core tries to put the device back into D0 via the standard PCI PM mechanism before calling the driver resume hook. To fix this properly we'd need to infiltrate the platform pm hooks (could turn ugly real fast), or use pm domains (which don't seem to exist on x86), or some extra early resume hook for the driver (which doesn't exist either). So maybe we just choose to live with the error? Cc: Hans de Goede <hdegoede@redhat.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-12-03platform/x86: Add the VLV ISP PCI ID to atomisp2_pmVille Syrjälä1-0/+1
If the ISP is exposed as a PCI device VLV machines need the same treatment as CHV machines to power gate the ISP. Otherwise s0ix will not work. Cc: Hans de Goede <hdegoede@redhat.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-12-03platform/x86: intel_ips: Convert to use DEFINE_SHOW_ATTRIBUTE macroAndy Shevchenko1-44/+15
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-12-03platform/x86: intel_ips: Remove never happen conditionAndy Shevchenko1-3/+0
At ->remove() stage we know that device had been instantiated properly, so, it can't be an invalid pointer to the driver data. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-12-03platform/x86: intel_ips: NULL check before some freeing functions is not neededThomas Meyer1-3/+1
NULL check before some freeing functions is not needed. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-12-03platform/x86: intel_ips: remove unnecessary checks in ips_debugfs_initYueHaibing1-20/+3
As Greg KH explained in: https://lkml.org/lkml/2015/8/15/114 There no need to check the return value of debugfs_create_file() and debugfs_create_dir(). This also fix static code checker warnings: drivers/platform/x86/intel_ips.c:1314 ips_debugfs_init() warn: passing zero to 'PTR_ERR' drivers/platform/x86/intel_ips.c:1328 ips_debugfs_init() warn: passing zero to 'PTR_ERR' Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-12-03iio: inv_mpu6050: Use i2c_acpi_get_i2c_resource() helperAndy Shevchenko1-10/+6
ACPI provides a generic helper to get I2C Serial Bus resources. Use it instead of open coded variant. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2018-12-03ACPI / scan: Create platform device for INT3515 ACPI nodesAndy Shevchenko3-4/+14
The ACPI device with INT3515 _HID is representing a complex USB PD hardware infrastructure which includes several I2C slave ICs. We add an ID to the I2C multi instantiate list to enumerate all I2C slaves correctly. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-03platform/x86: i2c-multi-instantiate: Allow to have same slavesAndy Shevchenko1-2/+2
Currently the driver will not enumerate the devices where I2C slaves are of the same type. Add an instance number to make them unique. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-12-03platform/x86: i2c-multi-instantiate: Introduce IOAPIC IRQ supportAndy Shevchenko1-0/+9
If ACPI table provides an Interrupt() resource we may consider to use it instead of GpioInt() one. Here we leave an error condition, when getting IRQ resource, to the driver to decide how to proceed, because some drivers may consider IRQ resource optional. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-12-03platform/x86: i2c-multi-instantiate: Distinguish IRQ resource typeAndy Shevchenko1-9/+18
As a preparatory patch switch the driver to distinguish IRQ resource type. For now, only GpioInt() is supported. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-12-03platform/x86: i2c-multi-instantiate: Count I2cSerialBus() resourcesAndy Shevchenko1-4/+37
Instead of relying on hard coded and thus expected number of I2C clients, count the real amount provided by firmware. This allows to support non-fixed amount of the slaves. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-12-03i2c: acpi: Introduce i2c_acpi_get_i2c_resource() helperAndy Shevchenko2-12/+40
Besides current two users one more is coming. Definitely makes sense to introduce a helper. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Wolfram Sang <wsa@the-dreams.de>
2018-12-03i2c: acpi: Use ACPI_FAILURE instead of !ACPI_SUCCESSAndy Shevchenko1-1/+1
Convert to use ACPI_FAILURE instead of !ACPI_SUCCESS. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Wolfram Sang <wsa@the-dreams.de>
2018-12-03platform/x86: i2c-multi-instantiate: Get rid of obsolete conditionalAndy Shevchenko1-7/+3
Now, when i2c_acpi_new_device() never returns NULL, there is no point to check for it. Besides that, i2c_acpi_new_device() returns -EPROBE_DEFER directly and caller doesn't need to guess is better. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-12-03platform/x86: intel_cht_int33fe: Get rid of obsolete conditionalAndy Shevchenko1-19/+5
Now, when i2c_acpi_new_device() never returns NULL, there is no point to check for it. Besides that, i2c_acpi_new_device() returns -EPROBE_DEFER directly and caller doesn't need to guess is better. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-12-03i2c: acpi: Return error pointers from i2c_acpi_new_device()Andy Shevchenko1-6/+15
The caller would like to know the reason why the i2c_acpi_new_device() fails. For example, if adapter is not available, it might be in the future and we would like to re-probe the clients again. But at the same time we would like to bail out if the error seems unrecoverable, such as invalid argument supplied. To achieve this, return error pointer in some cases. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Wolfram Sang <wsa@the-dreams.de>
2018-12-03platform/x86: i2c-multi-instantiate: Defer probe when no adapter foundAndy Shevchenko1-1/+1
Likewise the rest of the i2c_acpi_new_device() users, defer the probe of the i2c-multi-intantiate driver in case adapter is not yet found. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-12-03platform/x86: i2c-multi-instantiate: Accept errors of i2c_acpi_new_device()Andy Shevchenko1-2/+7
In the future i2c_acpi_new_device() will return error pointer in some cases. Prepare i2c-multi-instantiate driver to support that. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-12-03platform/x86: intel_cht_int33fe: Accept errors of i2c_acpi_new_device()Andy Shevchenko1-5/+23
In the future i2c_acpi_new_device() will return error pointer in some cases. Prepare intel_cht_int33fe driver to support that. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-12-03platform/x86: intel_cht_int33fe: Remove duplicate NULL checkAndy Shevchenko1-4/+2
Since i2c_unregister_device() became NULL-aware we may remove duplicate NULL check. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2018-11-28platform/x86: dell-laptop: Mark expected switch fall-throughsGustavo A. R. Silva1-0/+2
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-11-27platform/x86: ideapad-laptop: Add Yoga 2 13 to no_hw_rfkill listLoic WEI YU NENG1-0/+7
Some Lenovo IdeaPad models lack a physical rfkill switch. On Lenovo models Yoga 2 13, ideapad-laptop would wrongly report all radios as blocked by hardware which caused wireless network connections to fail. Add these models without an rfkill switch to the no_hw_rfkill list. Signed-off-by: Loic WEI YU NENG <loic.wyn@gmail.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-11-10platform/x86: intel_pmc_core: Decode Snoop / Non Snoop LTRRajneesh Bhardwaj2-2/+67
The LTR values follow PCIE LTR encoding format and can be decoded as per https://pcisig.com/sites/default/files/specification_documents/ECN_LatencyTolnReporting_14Aug08.pdf This adds support to translate the raw LTR values as read from the PMC to meaningful values in nanosecond units of time. Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-11-10platform/x86: intel_pmc_core: Fix LTR IGNORE Max offsetRajneesh Bhardwaj2-2/+6
Cannonlake PCH allows us to ignore LTR from more IPs than Sunrisepoint PCH so make the LTR ignore platform specific. Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-11-10platform/x86: intel_pmc_core: Show Latency Tolerance infoRajneesh Bhardwaj2-7/+119
This adds support to show the Latency Tolerance Reporting for the IPs on the PCH as reported by the PMC. The format shown here is raw LTR data payload that can further be decoded as per the PCI specification. This also fixes some minor alignment issues in the header file by removing spaces and converting to tabs at some places. Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-11-07platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codesJoão Paulo Rechi Vita1-2/+0
According to Asus firmware engineers, the meaning of these codes is only to notify the OS that the screen brightness has been turned on/off by the EC. This does not match the meaning of KEY_DISPLAYTOGGLE / KEY_DISPLAY_OFF, where userspace is expected to change the display brightness. Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-11-07platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCKJoão Paulo Rechi Vita1-0/+1
When the OS registers to handle events from the display off hotkey the EC will send a notification with 0x35 for every key press, independent of the backlight state. The behavior of this key on Windows, with the ATKACPI driver from Asus installed, is turning off the backlight of all connected displays with a fading effect, and any cursor input or key press turning the backlight back on. The key press or cursor input that wakes up the display is also passed through to the application under the cursor or under focus. The key that matches this behavior the closest is KEY_SCREENLOCK. Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-11-07platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkeyJoão Paulo Rechi Vita1-1/+2
In the past, Asus firmwares would change the panel backlight directly through the EC when the display off hotkey (Fn+F7) was pressed, and only notify the OS of such change, with 0x33 when the LCD was ON and 0x34 when the LCD was OFF. These are currently mapped to KEY_DISPLAYTOGGLE and KEY_DISPLAY_OFF, respectively. Most recently the EC on Asus most machines lost ability to toggle the LCD backlight directly, but unless the OS informs the firmware it is going to handle the display toggle hotkey events, the firmware still tries change the brightness through the EC, to no effect. The end result is a long list (at Endless we counted 11) of Asus laptop models where the display toggle hotkey does not perform any action. Our firmware engineers contacts at Asus were surprised that there were still machines out there with the old behavior. Calling WMNB(ASUS_WMI_DEVID_BACKLIGHT==0x00050011, 2) on the _WDG device tells the firmware that it should let the OS handle the display toggle event, in which case it will simply notify the OS of a key press with 0x35, as shown by the DSDT excerpts bellow. Scope (_SB) { (...) Device (ATKD) { (...) Name (_WDG, Buffer (0x28) { /* 0000 */ 0xD0, 0x5E, 0x84, 0x97, 0x6D, 0x4E, 0xDE, 0x11, /* 0008 */ 0x8A, 0x39, 0x08, 0x00, 0x20, 0x0C, 0x9A, 0x66, /* 0010 */ 0x4E, 0x42, 0x01, 0x02, 0x35, 0xBB, 0x3C, 0x0B, /* 0018 */ 0xC2, 0xE3, 0xED, 0x45, 0x91, 0xC2, 0x4C, 0x5A, /* 0020 */ 0x6D, 0x19, 0x5D, 0x1C, 0xFF, 0x00, 0x01, 0x08 }) Method (WMNB, 3, Serialized) { CreateDWordField (Arg2, Zero, IIA0) CreateDWordField (Arg2, 0x04, IIA1) Local0 = (Arg1 & 0xFFFFFFFF) (...) If ((Local0 == 0x53564544)) { (...) If ((IIA0 == 0x00050011)) { If ((IIA1 == 0x02)) { ^^PCI0.SBRG.EC0.SPIN (0x72, One) ^^PCI0.SBRG.EC0.BLCT = One } Return (One) } } (...) } (...) } (...) } (...) Scope (_SB.PCI0.SBRG.EC0) { (...) Name (BLCT, Zero) (...) Method (_Q10, 0, NotSerialized) // _Qxx: EC Query { If ((BLCT == Zero)) { Local0 = One Local0 = RPIN (0x72) Local0 ^= One SPIN (0x72, Local0) If (ATKP) { Local0 = (0x34 - Local0) ^^^^ATKD.IANE (Local0) } } ElseIf ((BLCT == One)) { If (ATKP) { ^^^^ATKD.IANE (0x35) } } } (...) } Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-11-07platform/x86: thinkpad_acpi: Change the keymap for Favorites hotkeyZhang Xianwei1-1/+1
The keycode KEY_FAVORITES(0x16c) used in thinkpad_acpi driver is too big (out of range > 255) for xorg to handle. xkeyboard-config has already mapped KEY_BOOKMARKS(156) to XF86Favorites: keycodes/evdev: <I164> = 164; // #define KEY_BOOKMARKS 156 symbols/inet: key <I164> { [ XF86Favorites ] }; So change the keymap to KEY_BOOKMARKS for Favorites hotkey. Signed-off-by: Zhang Xianwei <zhang.xianwei8@zte.com.cn> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-11-04Linux 4.20-rc1Linus Torvalds1-2/+2
2018-11-04sched/topology: Fix off by one bugPeter Zijlstra1-1/+1
With the addition of the NUMA identity level, we increased @level by one and will run off the end of the array in the distance sort loop. Fixed: 051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-03memory_hotplug: cond_resched in __remove_pagesMichal Hocko1-0/+1
We have received a bug report that unbinding a large pmem (>1TB) can result in a soft lockup: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [ndctl:4365] [...] Supported: Yes CPU: 9 PID: 4365 Comm: ndctl Not tainted 4.12.14-94.40-default #1 SLE12-SP4 Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018 task: ffff9cce7d4410c0 task.stack: ffffbe9eb1bc4000 RIP: 0010:__put_page+0x62/0x80 Call Trace: devm_memremap_pages_release+0x152/0x260 release_nodes+0x18d/0x1d0 device_release_driver_internal+0x160/0x210 unbind_store+0xb3/0xe0 kernfs_fop_write+0x102/0x180 __vfs_write+0x26/0x150 vfs_write+0xad/0x1a0 SyS_write+0x42/0x90 do_syscall_64+0x74/0x150 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fd13166b3d0 It has been reported on an older (4.12) kernel but the current upstream code doesn't cond_resched in the hot remove code at all and the given range to remove might be really large. Fix the issue by calling cond_resched once per memory section. Link: http://lkml.kernel.org/r/20181031125840.23982-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Dan Williams <dan.j.williams@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03bfs: add sanity check at bfs_fill_super()Tetsuo Handa1-3/+6
syzbot is reporting too large memory allocation at bfs_fill_super() [1]. Since file system image is corrupted such that bfs_sb->s_start == 0, bfs_fill_super() is trying to allocate 8MB of continuous memory. Fix this by adding a sanity check on bfs_sb->s_start, __GFP_NOWARN and printf(). [1] https://syzkaller.appspot.com/bug?id=16a87c236b951351374a84c8a32f40edbc034e96 Link: http://lkml.kernel.org/r/1525862104-3407-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+71c6b5d68e91149fc8a4@syzkaller.appspotmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tigran Aivazian <aivazian.tigran@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03kernel/sysctl.c: remove duplicated includeMichael Schupikov1-1/+0
Remove one include of <linux/pipe_fs_i.h>. No functional changes. Link: http://lkml.kernel.org/r/20181004134223.17735-1-michael@schupikov.de Signed-off-by: Michael Schupikov <michael@schupikov.de> Reviewed-by: Richard Weinberger <richard@nod.at> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03kernel/kexec_file.c: remove some duplicated includeszhong jiang1-2/+0
We include kexec.h and slab.h twice in kexec_file.c. It's unnecessary. hence just remove them. Link: http://lkml.kernel.org/r/1537498098-19171-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmaskMichal Hocko5-77/+40
THP allocation mode is quite complex and it depends on the defrag mode. This complexity is hidden in alloc_hugepage_direct_gfpmask from a large part currently. The NUMA special casing (namely __GFP_THISNODE) is however independent and placed in alloc_pages_vma currently. This both adds an unnecessary branch to all vma based page allocation requests and it makes the code more complex unnecessarily as well. Not to mention that e.g. shmem THP used to do the node reclaiming unconditionally regardless of the defrag mode until recently. This was not only unexpected behavior but it was also hardly a good default behavior and I strongly suspect it was just a side effect of the code sharing more than a deliberate decision which suggests that such a layering is wrong. Get rid of the thp special casing from alloc_pages_vma and move the logic to alloc_hugepage_direct_gfpmask. __GFP_THISNODE is applied to the resulting gfp mask only when the direct reclaim is not requested and when there is no explicit numa binding to preserve the current logic. Please note that there's also a slight difference wrt MPOL_BIND now. The previous code would avoid using __GFP_THISNODE if the local node was outside of policy_nodemask(). After this patch __GFP_THISNODE is avoided for all MPOL_BIND policies. So there's a difference that if local node is actually allowed by the bind policy's nodemask, previously __GFP_THISNODE would be added, but now it won't be. From the behavior POV this is still correct because the policy nodemask is used. Link: http://lkml.kernel.org/r/20180925120326.24392-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: fix clusters leak in ocfs2_defrag_extent()Larry Chen1-0/+17
ocfs2_defrag_extent() might leak allocated clusters. When the file system has insufficient space, the number of claimed clusters might be less than the caller wants. If that happens, the original code might directly commit the transaction without returning clusters. This patch is based on code in ocfs2_add_clusters_in_btree(). [akpm@linux-foundation.org: include localalloc.h, reduce scope of data_ac] Link: http://lkml.kernel.org/r/20180904041621.16874-3-lchen@suse.com Signed-off-by: Larry Chen <lchen@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: dlmglue: clean up timestamp handlingArnd Bergmann1-17/+9
The handling of timestamps outside of the 1970..2038 range in the dlm glue is rather inconsistent: on 32-bit architectures, this has always wrapped around to negative timestamps in the 1902..1969 range, while on 64-bit kernels all timestamps are interpreted as positive 34 bit numbers in the 1970..2514 year range. Now that the VFS code handles 64-bit timestamps on all architectures, we can make the behavior more consistent here, and return the same result that we had on 64-bit already, making the file system y2038 safe in the process. Outside of dlmglue, it already uses 64-bit on-disk timestamps anway, so that part is fine. For consistency, I'm changing ocfs2_pack_timespec() to clamp anything outside of the supported range to the minimum and maximum values. This avoids a possible ambiguity of values before 1970 in particular, which used to be interpreted as times at the end of the 2514 range previously. Link: http://lkml.kernel.org/r/20180619155826.4106487-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: don't put and assigning null to bh allocated outsideChangwei Ge1-18/+59
ocfs2_read_blocks() and ocfs2_read_blocks_sync() are both used to read several blocks from disk. Currently, the input argument *bhs* can be NULL or NOT. It depends on the caller's behavior. If the function fails in reading blocks from disk, the corresponding bh will be assigned to NULL and put. Obviously, above process for non-NULL input bh is not appropriate. Because the caller doesn't even know its bhs are put and re-assigned. If buffer head is managed by caller, ocfs2_read_blocks and ocfs2_read_blocks_sync() should not evaluate it to NULL. It will cause caller accessing illegal memory, thus crash. Link: http://lkml.kernel.org/r/HK2PR06MB045285E0F4FBB561F9F2F9B3D5680@HK2PR06MB0452.apcprd06.prod.outlook.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Guozhonghua <guozhonghua@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: fix a misuse a of brelse after failing ocfs2_check_dir_entryChangwei Ge1-2/+1
Somehow, file system metadata was corrupted, which causes ocfs2_check_dir_entry() to fail in function ocfs2_dir_foreach_blk_el(). According to the original design intention, if above happens we should skip the problematic block and continue to retrieve dir entry. But there is obviouse misuse of brelse around related code. After failure of ocfs2_check_dir_entry(), current code just moves to next position and uses the problematic buffer head again and again during which the problematic buffer head is released for multiple times. I suppose, this a serious issue which is long-lived in ocfs2. This may cause other file systems which is also used in a the same host insane. So we should also consider about bakcporting this patch into linux -stable. Link: http://lkml.kernel.org/r/HK2PR06MB045211675B43EED794E597B6D56E0@HK2PR06MB0452.apcprd06.prod.outlook.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Suggested-by: Changkuo Shi <shi.changkuo@h3c.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: don't use iocb when EIOCBQUEUED returnsChangwei Ge1-2/+2
When -EIOCBQUEUED returns, it means that aio_complete() will be called from dio_complete(), which is an asynchronous progress against write_iter. Generally, IO is a very slow progress than executing instruction, but we still can't take the risk to access a freed iocb. And we do face a BUG crash issue. Using the crash tool, iocb is obviously freed already. crash> struct -x kiocb ffff881a350f5900 struct kiocb { ki_filp = 0xffff881a350f5a80, ki_pos = 0x0, ki_complete = 0x0, private = 0x0, ki_flags = 0x0 } And the backtrace shows: ocfs2_file_write_iter+0xcaa/0xd00 [ocfs2] aio_run_iocb+0x229/0x2f0 do_io_submit+0x291/0x540 SyS_io_submit+0x10/0x20 system_call_fastpath+0x16/0x75 Link: http://lkml.kernel.org/r/1523361653-14439-1-git-send-email-ge.changwei@h3c.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: without quota support, avoid calling quota recoveryGuozhonghua1-17/+34
During one dead node's recovery by other node, quota recovery work will be queued. We should avoid calling quota when it is not supported, so check the quota flags. Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA401071AC9FB@H3CMLB12-EX.srv.huawei-3com.com Signed-off-by: guozhonghua <guozhonghua@h3c.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: remove ocfs2_is_o2cb_active()Gang He3-10/+1
Remove ocfs2_is_o2cb_active(). We have similar functions to identify which cluster stack is being used via osb->osb_cluster_stack. Secondly, the current implementation of ocfs2_is_o2cb_active() is not totally safe. Based on the design of stackglue, we need to get ocfs2_stack_lock before using ocfs2_stack related data structures, and that active_stack pointer can be NULL in the case of mount failure. Link: http://lkml.kernel.org/r/1495441079-11708-1-git-send-email-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Eric Ren <zren@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappingsAndrea Arcangeli1-2/+30
THP allocation might be really disruptive when allocated on NUMA system with the local node full or hard to reclaim. Stefan has posted an allocation stall report on 4.12 based SLES kernel which suggests the same issue: kvm: page allocation stalls for 194572ms, order:9, mode:0x4740ca(__GFP_HIGHMEM|__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|__GFP_MOVABLE|__GFP_DIRECT_RECLAIM), nodemask=(null) kvm cpuset=/ mems_allowed=0-1 CPU: 10 PID: 84752 Comm: kvm Tainted: G W 4.12.0+98-ph <a href="/view.php?id=1" title="[geschlossen] Integration Ramdisk" class="resolved">0000001</a> SLE15 (unreleased) Hardware name: Supermicro SYS-1029P-WTRT/X11DDW-NT, BIOS 2.0 12/05/2017 Call Trace: dump_stack+0x5c/0x84 warn_alloc+0xe0/0x180 __alloc_pages_slowpath+0x820/0xc90 __alloc_pages_nodemask+0x1cc/0x210 alloc_pages_vma+0x1e5/0x280 do_huge_pmd_wp_page+0x83f/0xf00 __handle_mm_fault+0x93d/0x1060 handle_mm_fault+0xc6/0x1b0 __do_page_fault+0x230/0x430 do_page_fault+0x2a/0x70 page_fault+0x7b/0x80 [...] Mem-Info: active_anon:126315487 inactive_anon:1612476 isolated_anon:5 active_file:60183 inactive_file:245285 isolated_file:0 unevictable:15657 dirty:286 writeback:1 unstable:0 slab_reclaimable:75543 slab_unreclaimable:2509111 mapped:81814 shmem:31764 pagetables:370616 bounce:0 free:32294031 free_pcp:6233 free_cma:0 Node 0 active_anon:254680388kB inactive_anon:1112760kB active_file:240648kB inactive_file:981168kB unevictable:13368kB isolated(anon):0kB isolated(file):0kB mapped:280240kB dirty:1144kB writeback:0kB shmem:95832kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 81225728kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no Node 1 active_anon:250583072kB inactive_anon:5337144kB active_file:84kB inactive_file:0kB unevictable:49260kB isolated(anon):20kB isolated(file):0kB mapped:47016kB dirty:0kB writeback:4kB shmem:31224kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 31897600kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no The defrag mode is "madvise" and from the above report it is clear that the THP has been allocated for MADV_HUGEPAGA vma. Andrea has identified that the main source of the problem is __GFP_THISNODE usage: : The problem is that direct compaction combined with the NUMA : __GFP_THISNODE logic in mempolicy.c is telling reclaim to swap very : hard the local node, instead of failing the allocation if there's no : THP available in the local node. : : Such logic was ok until __GFP_THISNODE was added to the THP allocation : path even with MPOL_DEFAULT. : : The idea behind the __GFP_THISNODE addition, is that it is better to : provide local memory in PAGE_SIZE units than to use remote NUMA THP : backed memory. That largely depends on the remote latency though, on : threadrippers for example the overhead is relatively low in my : experience. : : The combination of __GFP_THISNODE and __GFP_DIRECT_RECLAIM results in : extremely slow qemu startup with vfio, if the VM is larger than the : size of one host NUMA node. This is because it will try very hard to : unsuccessfully swapout get_user_pages pinned pages as result of the : __GFP_THISNODE being set, instead of falling back to PAGE_SIZE : allocations and instead of trying to allocate THP on other nodes (it : would be even worse without vfio type1 GUP pins of course, except it'd : be swapping heavily instead). Fix this by removing __GFP_THISNODE for THP requests which are requesting the direct reclaim. This effectivelly reverts 5265047ac301 on the grounds that the zone/node reclaim was known to be disruptive due to premature reclaim when there was memory free. While it made sense at the time for HPC workloads without NUMA awareness on rare machines, it was ultimately harmful in the majority of cases. The existing behaviour is similar, if not as widespare as it applies to a corner case but crucially, it cannot be tuned around like zone_reclaim_mode can. The default behaviour should always be to cause the least harm for the common case. If there are specialised use cases out there that want zone_reclaim_mode in specific cases, then it can be built on top. Longterm we should consider a memory policy which allows for the node reclaim like behavior for the specific memory ranges which would allow a [1] http://lkml.kernel.org/r/20180820032204.9591-1-aarcange@redhat.com Mel said: : Both patches look correct to me but I'm responding to this one because : it's the fix. The change makes sense and moves further away from the : severe stalling behaviour we used to see with both THP and zone reclaim : mode. : : I put together a basic experiment with usemem configured to reference a : buffer multiple times that is 80% the size of main memory on a 2-socket : box with symmetric node sizes and defrag set to "always". The defrag : setting is not the default but it would be functionally similar to : accessing a buffer with madvise(MADV_HUGEPAGE). Usemem is configured to : reference the buffer multiple times and while it's not an interesting : workload, it would be expected to complete reasonably quickly as it fits : within memory. The results were; : : usemem : vanilla noreclaim-v1 : Amean Elapsd-1 42.78 ( 0.00%) 26.87 ( 37.18%) : Amean Elapsd-3 27.55 ( 0.00%) 7.44 ( 73.00%) : Amean Elapsd-4 5.72 ( 0.00%) 5.69 ( 0.45%) : : This shows the elapsed time in seconds for 1 thread, 3 threads and 4 : threads referencing buffers 80% the size of memory. With the patches : applied, it's 37.18% faster for the single thread and 73% faster with two : threads. Note that 4 threads showing little difference does not indicate : the problem is related to thread counts. It's simply the case that 4 : threads gets spread so their workload mostly fits in one node. : : The overall view from /proc/vmstats is more startling : : 4.19.0-rc1 4.19.0-rc1 : vanillanoreclaim-v1r1 : Minor Faults 35593425 708164 : Major Faults 484088 36 : Swap Ins 3772837 0 : Swap Outs 3932295 0 : : Massive amounts of swap in/out without the patch : : Direct pages scanned 6013214 0 : Kswapd pages scanned 0 0 : Kswapd pages reclaimed 0 0 : Direct pages reclaimed 4033009 0 : : Lots of reclaim activity without the patch : : Kswapd efficiency 100% 100% : Kswapd velocity 0.000 0.000 : Direct efficiency 67% 100% : Direct velocity 11191.956 0.000 : : Mostly from direct reclaim context as you'd expect without the patch. : : Page writes by reclaim 3932314.000 0.000 : Page writes file 19 0 : Page writes anon 3932295 0 : Page reclaim immediate 42336 0 : : Writes from reclaim context is never good but the patch eliminates it. : : We should never have default behaviour to thrash the system for such a : basic workload. If zone reclaim mode behaviour is ever desired but on a : single task instead of a global basis then the sensible option is to build : a mempolicy that enforces that behaviour. This was a severe regression compared to previous kernels that made important workloads unusable and it starts when __GFP_THISNODE was added to THP allocations under MADV_HUGEPAGE. It is not a significant risk to go to the previous behavior before __GFP_THISNODE was added, it worked like that for years. This was simply an optimization to some lucky workloads that can fit in a single node, but it ended up breaking the VM for others that can't possibly fit in a single node, so going back is safe. [mhocko@suse.com: rewrote the changelog based on the one from Andrea] Link: http://lkml.kernel.org/r/20180925120326.24392-2-mhocko@kernel.org Fixes: 5265047ac301 ("mm, thp: really limit transparent hugepage allocation to local node") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Debugged-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Mel Gorman <mgorman@techsingularity.net> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: <stable@vger.kernel.org> [4.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03include/linux/notifier.h: SRCU: fix ctagsSam Protsenko1-2/+1
ctags indexing ("make tags" command) throws this warning: ctags: Warning: include/linux/notifier.h:125: null expansion of name pattern "\1" This is the result of DEFINE_PER_CPU() macro expansion. Fix that by getting rid of line break. Similar fix was already done in commit 25528213fe9f ("tags: Fix DEFINE_PER_CPU expansions"), but this one probably wasn't noticed. Link: http://lkml.kernel.org/r/20181030202808.28027-1-semen.protsenko@linaro.org Fixes: 9c80172b902d ("kernel/SRCU: provide a static initializer") Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>