aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2017-10-19doc: Fix RCU's docbook optionsPaul E. McKenney1-14/+0
Commit 764f80798b95 ("doc: Add RCU files to docbook-generation files") added :external: options for RCU source files in the file Documentation/core-api/kernel-api.rst. However, this now means nothing, so this commit removes them. Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-19membarrier: Provide register expedited private commandMathieu Desnoyers5-11/+66
This introduces a "register private expedited" membarrier command which allows eventual removal of important memory barrier constraints on the scheduler fast-paths. It changes how the "private expedited" membarrier command (new to 4.14) is used from user-space. This new command allows processes to register their intent to use the private expedited command. This affects how the expedited private command introduced in 4.14-rc is meant to be used, and should be merged before 4.14 final. Processes are now required to register before using MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM. This fixes a problem that arose when designing requested extensions to sys_membarrier() to allow JITs to efficiently flush old code from instruction caches. Several potential algorithms are much less painful if the user register intent to use this functionality early on, for example, before the process spawns the second thread. Registering at this time removes the need to interrupt each and every thread in that process at the first expedited sys_membarrier() system call. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-19parisc: Fix detection of nonsynchronous cr16 cycle countersHelge Deller1-1/+4
For CPUs which have an unknown or invalid CPU location (physical location) assume that their cycle counters aren't syncronized across CPUs. Signed-off-by: Helge Deller <deller@gmx.de> Fixes: c8c3735997a3 ("parisc: Enhance detection of synchronous cr16 clocksources") Cc: stable@vger.kernel.org # 4.13+ Signed-off-by: Helge Deller <deller@gmx.de>
2017-10-19parisc: Export __cmpxchg_u64 unconditionallyGuenter Roeck1-1/+1
__cmpxchg_u64 is built and used outside CONFIG_64BIT and thus needs to be exported. This fixes the following build error seen when building parisc:allmodconfig. ERROR: "__cmpxchg_u64" [drivers/net/ethernet/intel/i40e/i40e.ko] undefined! Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Helge Deller <deller@gmx.de>
2017-10-19parisc: Fix double-word compare and exchange in LWS code on 32-bit kernelsJohn David Anglin1-3/+3
As discussed on the debian-hppa list, double-wordcompare and exchange operations fail on 32-bit kernels. Looking at the code, I realized that the ",ma" completer does the wrong thing in the "ldw,ma 4(%r26), %r29" instruction. This increments %r26 and causes the following store to write to the wrong location. Note by Helge Deller: The patch applies cleanly to stable kernel series if this upstream commit is merged in advance: f4125cfdb300 ("parisc: Avoid trashing sr2 and sr3 in LWS code"). Signed-off-by: John David Anglin <dave.anglin@bell.net> Tested-by: Christoph Biedl <debian.axhn@manchmal.in-ulm.de> Fixes: 89206491201c ("parisc: Implement new LWS CAS supporting 64 bit operations.") Cc: stable@vger.kernel.org # 3.13+ Signed-off-by: Helge Deller <deller@gmx.de>
2017-10-19commoncap: move assignment of fs_ns to avoid null pointer dereferenceColin Ian King1-1/+2
The pointer fs_ns is assigned from inode->i_ib->s_user_ns before a null pointer check on inode, hence if inode is actually null we will get a null pointer dereference on this assignment. Fix this by only dereferencing inode after the null pointer check on inode. Detected by CoverityScan CID#1455328 ("Dereference before null check") Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-18ALSA: hda - Fix incorrect TLV callback check introduced during set_fs() removalTakashi Iwai3-42/+89
The commit 99b5c5bb9a54 ("ALSA: hda - Remove the use of set_fs()") converted the get_kctl_0dB_offset() call for killing set_fs() usage in HD-audio codec code. The conversion assumed that the TLV callback used in HD-audio code is only snd_hda_mixer_amp() and applies the TLV calculation locally. Although this assumption is correct, and all slave kctls are actually with that callback, the current code is still utterly buggy; it doesn't hit this condition and falls back to the next check. It's because the function gets called after adding slave kctls to vmaster. By assigning a slave kctl, the slave kctl object is faked inside vmaster code, and the whole kctl ops are overridden. Thus the callback op points to a different value from what we've assumed. More badly, as reported by the KERNEXEC and UDEREF features of PaX, the code flow turns into the unexpected pitfall. The next fallback check is SNDRV_CTL_ELEM_ACCESS_TLV_READ access bit, and this always hits for each kctl with TLV. Then it evaluates the callback function pointer wrongly as if it were a TLV array. Although currently its side-effect is fairly limited, this incorrect reference may lead to an unpleasant result. For addressing the regression, this patch introduces a new helper to vmaster code, snd_ctl_apply_vmaster_slaves(). This works similarly like the existing map_slaves() in hda_codec.c: it loops over the slave list of the given master, and applies the given function to each slave. Then the initializer function receives the right kctl object and we can compare the correct pointer instead of the faked one. Also, for catching the similar breakage in future, give an error message when the unexpected TLV callback is found and bail out immediately. Fixes: 99b5c5bb9a54 ("ALSA: hda - Remove the use of set_fs()") Reported-by: PaX Team <pageexec@freemail.hu> Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-10-18ALSA: hda: Remove superfluous '-' added by printk conversionTakashi Iwai1-1/+1
While converting the error messages to the standard macros in the commit 4e76a8833fac ("ALSA: hda - Replace with standard printk"), a superfluous '-' slipped in the code mistakenly. Its influence is almost negligible, merely shows a dB value as negative integer instead of positive integer (or vice versa) in the rare error message. So let's kill this embarrassing byte to show more correct value. Fixes: 4e76a8833fac ("ALSA: hda - Replace with standard printk") Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-10-18ALSA: hda: Abort capability probe at invalid register readTakashi Iwai1-0/+5
The loop in snd_hdac_bus_parse_capabilities() may go to nirvana when it hits an invalid register value read: BUG: unable to handle kernel paging request at ffffad5dc41f3fff IP: pci_azx_readl+0x5/0x10 [snd_hda_intel] Call Trace: snd_hdac_bus_parse_capabilities+0x3c/0x1f0 [snd_hda_core] azx_probe_continue+0x7d5/0x940 [snd_hda_intel] ..... This happened on a new Intel machine, and we need to check the value and abort the loop accordingly. [Note: the fixes tag below indicates only the commit where this patch can be applied; the original problem was introduced even before that commit] Fixes: 6720b38420a0 ("ALSA: hda - move bus_parse_capabilities to core") Cc: <stable@vger.kernel.org> Acked-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-10-18pkcs7: Prevent NULL pointer dereference, since sinfo is not always set.Eric Sesterhenn1-0/+3
The ASN.1 parser does not necessarily set the sinfo field, this patch prevents a NULL pointer dereference on broken input. Fixes: 99db44350672 ("PKCS#7: Appropriately restrict authenticated attributes and content type") Signed-off-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de> Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org # 4.3+
2017-10-18KEYS: load key flags and expiry time atomically in proc_keys_show()Eric Biggers1-10/+14
In proc_keys_show(), the key semaphore is not held, so the key ->flags and ->expiry can be changed concurrently. We therefore should read them atomically just once. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18KEYS: Load key expiry time atomically in keyring_search_iterator()Eric Biggers1-1/+3
Similar to the case for key_validate(), we should load the key ->expiry once atomically in keyring_search_iterator(), since it can be changed concurrently with the flags whenever the key semaphore isn't held. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18KEYS: load key flags and expiry time atomically in key_validate()Eric Biggers1-3/+4
In key_validate(), load the flags and expiry time once atomically, since these can change concurrently if key_validate() is called without the key semaphore held. And we don't want to get inconsistent results if a variable is referenced multiple times. For example, key->expiry was referenced in both 'if (key->expiry)' and in 'if (now.tv_sec >= key->expiry)', making it theoretically possible to see a spurious EKEYEXPIRED while the expiration time was being removed, i.e. set to 0. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18KEYS: don't let add_key() update an uninstantiated keyDavid Howells1-0/+10
Currently, when passed a key that already exists, add_key() will call the key's ->update() method if such exists. But this is heavily broken in the case where the key is uninstantiated because it doesn't call __key_instantiate_and_link(). Consequently, it doesn't do most of the things that are supposed to happen when the key is instantiated, such as setting the instantiation state, clearing KEY_FLAG_USER_CONSTRUCT and awakening tasks waiting on it, and incrementing key->user->nikeys. It also never takes key_construction_mutex, which means that ->instantiate() can run concurrently with ->update() on the same key. In the case of the "user" and "logon" key types this causes a memory leak, at best. Maybe even worse, the ->update() methods of the "encrypted" and "trusted" key types actually just dereference a NULL pointer when passed an uninstantiated key. Change key_create_or_update() to wait interruptibly for the key to finish construction before continuing. This patch only affects *uninstantiated* keys. For now we still allow a negatively instantiated key to be updated (thereby positively instantiating it), although that's broken too (the next patch fixes it) and I'm not sure that anyone actually uses that functionality either. Here is a simple reproducer for the bug using the "encrypted" key type (requires CONFIG_ENCRYPTED_KEYS=y), though as noted above the bug pertained to more than just the "encrypted" key type: #include <stdlib.h> #include <unistd.h> #include <keyutils.h> int main(void) { int ringid = keyctl_join_session_keyring(NULL); if (fork()) { for (;;) { const char payload[] = "update user:foo 32"; usleep(rand() % 10000); add_key("encrypted", "desc", payload, sizeof(payload), ringid); keyctl_clear(ringid); } } else { for (;;) request_key("encrypted", "desc", "callout_info", ringid); } } It causes: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: encrypted_update+0xb0/0x170 PGD 7a178067 P4D 7a178067 PUD 77269067 PMD 0 PREEMPT SMP CPU: 0 PID: 340 Comm: reproduce Tainted: G D 4.14.0-rc1-00025-g428490e38b2e #796 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff8a467a39a340 task.stack: ffffb15c40770000 RIP: 0010:encrypted_update+0xb0/0x170 RSP: 0018:ffffb15c40773de8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8a467a275b00 RCX: 0000000000000000 RDX: 0000000000000005 RSI: ffff8a467a275b14 RDI: ffffffffb742f303 RBP: ffffb15c40773e20 R08: 0000000000000000 R09: ffff8a467a275b17 R10: 0000000000000020 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8a4677057180 R15: ffff8a467a275b0f FS: 00007f5d7fb08700(0000) GS:ffff8a467f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 0000000077262005 CR4: 00000000001606f0 Call Trace: key_create_or_update+0x2bc/0x460 SyS_add_key+0x10c/0x1d0 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x7f5d7f211259 RSP: 002b:00007ffed03904c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000f8 RAX: ffffffffffffffda RBX: 000000003b2a7955 RCX: 00007f5d7f211259 RDX: 00000000004009e4 RSI: 00000000004009ff RDI: 0000000000400a04 RBP: 0000000068db8bad R08: 000000003b2a7955 R09: 0000000000000004 R10: 000000000000001a R11: 0000000000000246 R12: 0000000000400868 R13: 00007ffed03905d0 R14: 0000000000000000 R15: 0000000000000000 Code: 77 28 e8 64 34 1f 00 45 31 c0 31 c9 48 8d 55 c8 48 89 df 48 8d 75 d0 e8 ff f9 ff ff 85 c0 41 89 c4 0f 88 84 00 00 00 4c 8b 7d c8 <49> 8b 75 18 4c 89 ff e8 24 f8 ff ff 85 c0 41 89 c4 78 6d 49 8b RIP: encrypted_update+0xb0/0x170 RSP: ffffb15c40773de8 CR2: 0000000000000018 Cc: <stable@vger.kernel.org> # v2.6.12+ Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric Biggers <ebiggers@google.com>
2017-10-18KEYS: Fix race between updating and finding a negative keyDavid Howells14-57/+80
Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection error into one field such that: (1) The instantiation state can be modified/read atomically. (2) The error can be accessed atomically with the state. (3) The error isn't stored unioned with the payload pointers. This deals with the problem that the state is spread over three different objects (two bits and a separate variable) and reading or updating them atomically isn't practical, given that not only can uninstantiated keys change into instantiated or rejected keys, but rejected keys can also turn into instantiated keys - and someone accessing the key might not be using any locking. The main side effect of this problem is that what was held in the payload may change, depending on the state. For instance, you might observe the key to be in the rejected state. You then read the cached error, but if the key semaphore wasn't locked, the key might've become instantiated between the two reads - and you might now have something in hand that isn't actually an error code. The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error code if the key is negatively instantiated. The key_is_instantiated() function is replaced with key_is_positive() to avoid confusion as negative keys are also 'instantiated'. Additionally, barriering is included: (1) Order payload-set before state-set during instantiation. (2) Order state-read before payload-read when using the key. Further separate barriering is necessary if RCU is being used to access the payload content after reading the payload pointers. Fixes: 146aa8b1453b ("KEYS: Merge the type-specific data with the payload data") Cc: stable@vger.kernel.org # v4.4+ Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Eric Biggers <ebiggers@google.com>
2017-10-18KEYS: checking the input id parameters before finding asymmetric keyChun-Yi Lee1-0/+2
For finding asymmetric key, the input id_0 and id_1 parameters can not be NULL at the same time. This patch adds the BUG_ON checking for id_0 and id_1. Cc: David Howells <dhowells@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Chun-Yi Lee <jlee@suse.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18KEYS: Fix the wrong index when checking the existence of second idChun-Yi Lee1-1/+1
Fix the wrong index number when checking the existence of second id in function of finding asymmetric key. The id_1 is the second id that the index in array must be 1 but not 0. Fixes: 9eb029893ad5 (KEYS: Generalise x509_request_asymmetric_key()) Cc: David Howells <dhowells@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Chun-Yi Lee <jlee@suse.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18security/keys: BIG_KEY requires CONFIG_CRYPTOArnd Bergmann1-0/+1
The recent rework introduced a possible randconfig build failure when CONFIG_CRYPTO configured to only allow modules: security/keys/big_key.o: In function `big_key_crypt': big_key.c:(.text+0x29f): undefined reference to `crypto_aead_setkey' security/keys/big_key.o: In function `big_key_init': big_key.c:(.init.text+0x1a): undefined reference to `crypto_alloc_aead' big_key.c:(.init.text+0x45): undefined reference to `crypto_aead_setauthsize' big_key.c:(.init.text+0x77): undefined reference to `crypto_destroy_tfm' crypto/gcm.o: In function `gcm_hash_crypt_remain_continue': gcm.c:(.text+0x167): undefined reference to `crypto_ahash_finup' crypto/gcm.o: In function `crypto_gcm_exit_tfm': gcm.c:(.text+0x847): undefined reference to `crypto_destroy_tfm' When we 'select CRYPTO' like the other users, we always get a configuration that builds. Fixes: 428490e38b2e ("security/keys: rewrite all of big_key crypto") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18ALSA: seq: Enable 'use' locking in all configurationsBen Hutchings2-16/+0
The 'use' locking macros are no-ops if neither SMP or SND_DEBUG is enabled. This might once have been OK in non-preemptible configurations, but even in that case snd_seq_read() may sleep while relying on a 'use' lock. So always use the proper implementations. Cc: stable@vger.kernel.org Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-10-18Revert "tools/power turbostat: stop migrating, unless '-m'"Len Brown1-9/+1
This reverts commit c91fc8519d87715a3a173475ea3778794c139996. That change caused a C6 and PC6 residency regression on large idle systems. Users also complained about new output indicating jitter: turbostat: cpu6 jitter 3794 9142 Signed-off-by: Len Brown <len.brown@intel.com> Cc: 4.13+ <stable@vger.kernel.org> # v4.13+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-10-18i2c: omap: Fix error handling for clk_get()Tony Lindgren1-0/+14
Otherwise we can get the following if the fck alias is missing: Unable to handle kernel paging request at virtual address fffffffe ... PC is at clk_get_rate+0x8/0x10 LR is at omap_i2c_probe+0x278/0x6ec ... [<c056eb08>] (clk_get_rate) from [<c06f4f08>] (omap_i2c_probe+0x278/0x6ec) [<c06f4f08>] (omap_i2c_probe) from [<c0610944>] (platform_drv_probe+0x50/0xb0) [<c0610944>] (platform_drv_probe) from [<c060e900>] (driver_probe_device+0x264/0x2ec) [<c060e900>] (driver_probe_device) from [<c060cda0>] (bus_for_each_drv+0x70/0xb8) [<c060cda0>] (bus_for_each_drv) from [<c060e5b0>] (__device_attach+0xcc/0x13c) [<c060e5b0>] (__device_attach) from [<c060db10>] (bus_probe_device+0x88/0x90) [<c060db10>] (bus_probe_device) from [<c060df68>] (deferred_probe_work_func+0x4c/0x14c) Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-10-17tracing/samples: Fix creation and deletion of simple_thread_fn creationSteven Rostedt (VMware)1-3/+11
Commit 7496946a8 ("tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()") added template examples for all the events. It created a DEFINE_EVENT_FN() example which reused the foo_bar_reg and foo_bar_unreg functions. Enabling both the TRACE_EVENT_FN() and DEFINE_EVENT_FN() example trace events caused the foo_bar_reg to be called twice, creating the test thread twice. The foo_bar_unreg would remove it only once, even if it was called multiple times, leaving a thread existing when the module is unloaded, causing an oops. Add a ref count and allow foo_bar_reg() and foo_bar_unreg() be called by multiple trace events. Cc: stable@vger.kernel.org Fixes: 7496946a8 ("tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-10-17fs: Avoid invalidation in interrupt context in dio_complete()Lukas Czerner1-6/+13
Currently we try to defer completion of async DIO to the process context in case there are any mapped pages associated with the inode so that we can invalidate the pages when the IO completes. However the check is racy and the pages can be mapped afterwards. If this happens we might end up calling invalidate_inode_pages2_range() in dio_complete() in interrupt context which could sleep. This can be reproduced by generic/451. Fix this by passing the information whether we can or can't invalidate to the dio_complete(). Thanks Eryu Guan for reporting this and Jan Kara for suggesting a fix. Fixes: 332391a9935d ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO") Reported-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-16xfs: move two more RT specific functions into CONFIG_XFS_RTArnd Bergmann1-24/+24
The last cleanup introduced two harmless warnings: fs/xfs/xfs_fsmap.c:480:1: warning: '__xfs_getfsmap_rtdev' defined but not used fs/xfs/xfs_fsmap.c:372:1: warning: 'xfs_getfsmap_rtdev_rtbitmap_helper' defined but not used This moves those two functions as well. Fixes: bb9c2e543325 ("xfs: move more RT specific code under CONFIG_XFS_RT") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-16xfs: trim writepage mapping to within eofBrian Foster3-0/+25
The writeback rework in commit fbcc02561359 ("xfs: Introduce writeback context for writepages") introduced a subtle change in behavior with regard to the block mapping used across the ->writepages() sequence. The previous xfs_cluster_write() code would only flush pages up to EOF at the time of the writepage, thus ensuring that any pages due to file-extending writes would be handled on a separate cycle and with a new, updated block mapping. The updated code establishes a block mapping in xfs_writepage_map() that could extend beyond EOF if the file has post-eof preallocation. Because we now use the generic writeback infrastructure and pass the cached mapping to each writepage call, there is no implicit EOF limit in place. If eofblocks trimming occurs during ->writepages(), any post-eof portion of the cached mapping becomes invalid. The eofblocks code has no means to serialize against writeback because there are no pages associated with post-eof blocks. Therefore if an eofblocks trim occurs and is followed by a file-extending buffered write, not only has the mapping become invalid, but we could end up writing a page to disk based on the invalid mapping. Consider the following sequence of events: - A buffered write creates a delalloc extent and post-eof speculative preallocation. - Writeback starts and on the first writepage cycle, the delalloc extent is converted to real blocks (including the post-eof blocks) and the mapping is cached. - The file is closed and xfs_release() trims post-eof blocks. The cached writeback mapping is now invalid. - Another buffered write appends the file with a delalloc extent. - The concurrent writeback cycle picks up the just written page because the writeback range end is LLONG_MAX. xfs_writepage_map() attributes it to the (now invalid) cached mapping and writes the data to an incorrect location on disk (and where the file offset is still backed by a delalloc extent). This problem is reproduced by xfstests test generic/464, which triggers racing writes, appends, open/closes and writeback requests. To address this problem, trim the mapping used during writeback to within EOF when the mapping is validated. This ensures the mapping is revalidated for any pages encountered beyond EOF as of the time the current mapping was cached or last validated. Reported-by: Eryu Guan <eguan@redhat.com> Diagnosed-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-16fs: invalidate page cache after end_io() in dio completionEryu Guan2-25/+36
Commit 332391a9935d ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO") moved page cache invalidation from iomap_dio_rw() to iomap_dio_complete() for iomap based direct write path, but before the dio->end_io() call, and it re-introdued the bug fixed by commit c771c14baa33 ("iomap: invalidate page caches should be after iomap_dio_complete() in direct write"). I found this because fstests generic/418 started failing on XFS with v4.14-rc3 kernel, which is the regression test for this specific bug. So similarly, fix it by moving dio->end_io() (which does the unwritten extent conversion) before page cache invalidation, to make sure next buffer read reads the final real allocations not unwritten extents. I also add some comments about why should end_io() go first in case we get it wrong again in the future. Note that, there's no such problem in the non-iomap based direct write path, because we didn't remove the page cache invalidation after the ->direct_IO() in generic_file_direct_write() call, but I decided to fix dio_complete() too so we don't leave a landmine there, also be consistent with iomap_dio_complete(). Fixes: 332391a9935d ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO") Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2017-10-16xfs: cancel dirty pages on invalidationDave Chinner1-12/+22
Recently we've had warnings arise from the vm handing us pages without bufferheads attached to them. This should not ever occur in XFS, but we don't defend against it properly if it does. The only place where we remove bufferheads from a page is in xfs_vm_releasepage(), but we can't tell the difference here between "page is dirty so don't release" and "page is dirty but is being invalidated so release it". In some places that are invalidating pages ask for pages to be released and follow up afterward calling ->releasepage by checking whether the page was dirty and then aborting the invalidation. This is a possible vector for releasing buffers from a page but then leaving it in the mapping, so we really do need to avoid dirty pages in xfs_vm_releasepage(). To differentiate between invalidated pages and normal pages, we need to clear the page dirty flag when invalidating the pages. This can be done through xfs_vm_invalidatepage(), and will result xfs_vm_releasepage() seeing the page as clean which matches the bufferhead state on the page after calling block_invalidatepage(). Hence we can re-add the page dirty check in xfs_vm_releasepage to catch the case where we might be releasing a page that is actually dirty and so should not have the bufferheads on it removed. This will remove one possible vector of "dirty page with no bufferheads" and so help narrow down the search for the root cause of that problem. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-16ALSA: usb-audio: Add native DSD support for Pro-Ject Pre Box S2 DigitalJussi Laako1-0/+1
Add native DSD support quirk for Pro-Ject Pre Box S2 Digital USB id 2772:0230. Signed-off-by: Jussi Laako <jussi@sonarnerd.net> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-10-16Documentation: Add a file explaining the Linux kernel license enforcement policyGreg Kroah-Hartman2-0/+148
This adds a short document describing the views of how the Linux kernel community feels about enforcing the license of the kernel. Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Alex Elder (Linaro) <elder@linaro.org> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Andy Gross <andy.gross@linaro.org> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Anna Schumaker <schumaker.anna@gmail.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Acked-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Bhumika Goyal <bhumirks@gmail.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Darrick J. Wong (Oracle) <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Acked-by: David Kershner <david.kershner@unisys.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Doug Ledford <dledford@redhat.com> Acked-by: Fabio Estevam <festevam@gmail.com> Acked-by: Felipe Balbi <balbi@kernel.org> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Hannes Reinecke <hare@suse.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Ivan Safonov <insafonov@gmail.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Acked-by: Jan Kara (SUSE) <jack@suse.cz> Acked-by: Javier Martinez Canillas <javier@dowhile0.org> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Jes Sorensen <Jes.Sorensen@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Joe Perches <joe@perches.com> Acked-by: Joerg Roedel (SUSE) <jroedel@suse.de> Acked-by: Johan Hovold <johan@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: Khalid Aziz <khalid@gonehiking.org> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Acked-by: Laura Abbott <laura@labbott.name> Acked-by: Lee Jones <lee.jones@linaro.org> Acked-by: Leon Romanovsky <leon@kernel.org> Acked-by: Linus Walleij (Linaro) <linus.walleij@linaro.org> Acked-by: Lv Zheng <zetalog@gmail.com> Acked-by: Martin K. Petersen (Oracle) <martin.petersen@oracle.com> Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Marshall <hubcap@omnibond.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paul Burton <paul.burton@mips.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Richard Weinberger <richard@nod.at> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Sebastian Reichel (Collabora) <sre@kernel.org> Acked-by: Shawn Guo <shawnguo@kernel.org> Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Simon Horman <horms@verge.net.au> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Sven Eckelmann <sven@narfation.org> Acked-by: Takashi Iwai (SUSE) <tiwai@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Thierry Reding <thierry.reding@gmail.com> Acked-by: Tony Luck <tony.luck@gmail.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Vinod Koul <vkoul@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-16s390: fix zfcpdump-configDimitri John Ledkov1-0/+2
zipl from s390-tools generates root=/dev/ram0 kernel cmdline for zfcpdump, thus BLK_DEV_RAM is required. zfcpdump initrd mounts DEBUG_FS, thus is also required. Bug-Ubuntu: https://launchpad.net/bugs/1722735 Bug-Ubuntu: https://launchpad.net/bugs/1719290 Signed-off-by: Dimitri John Ledkov <xnox@ubuntu.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-16s390/cputime: fix guest/irq/softirq times after CPU hotplugChristian Borntraeger1-0/+3
On CPU hotplug some cpu stats contain bogus values: $ cat /proc/stat cpu 0 0 49 1280 0 0 0 3 0 0 cpu0 0 0 49 618 0 0 0 3 0 0 cpu1 0 0 0 662 0 0 0 0 0 0 [...] $ echo 0 > /sys/devices/system/cpu/cpu1/online $ echo 1 > /sys/devices/system/cpu/cpu1/online $ cat /proc/stat cpu 0 0 49 3200 0 450359962737 450359962737 3 0 0 cpu0 0 0 49 1956 0 0 0 3 0 0 cpu1 0 0 0 1244 0 450359962737 450359962737 0 0 0 [...] pcpu_attach_task() needs the same assignments as vtime_task_switch. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: b7394a5f4ce9 ("sched/cputime, s390: Implement delayed accounting of system time") Cc: stable@vger.kernel.org # 4.11+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-15Linux 4.14-rc5Linus Torvalds1-1/+1
2017-10-14x86/microcode: Do the family check firstBorislav Petkov1-9/+18
On CPUs like AMD's Geode, for example, we shouldn't even try to load microcode because they do not support the modern microcode loading interface. However, we do the family check *after* the other checks whether the loader has been disabled on the command line or whether we're running in a guest. So move the family checks first in order to exit early if we're being loaded on an unsupported family. Reported-and-tested-by: Sven Glodowski <glodi1@arcor.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.11.. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396 Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-14locking/lockdep: Disable cross-release features for nowIngo Molnar1-2/+2
Johan Hovold reported a big lockdep slowdown on his system, caused by lockdep: > I had noticed that the BeagleBone Black boot time appeared to have > increased significantly with 4.14 and yesterday I finally had time to > investigate it. > > Boot time (from "Linux version" to login prompt) had in fact doubled > since 4.13 where it took 17 seconds (with my current config) compared to > the 35 seconds I now see with 4.14-rc4. > > I quick bisect pointed to lockdep and specifically the following commit: > > 28a903f63ec0 ("locking/lockdep: Handle non(or multi)-acquisition of a crosslock") Because the final v4.14 release is close, disable the cross-release lockdep features for now. Bisected-by: Johan Hovold <johan@kernel.org> Debugged-by: Johan Hovold <johan@kernel.org> Reported-by: Johan Hovold <johan@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Lindgren <tony@atomide.com> Cc: kernel-team@lge.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mm@kvack.org Cc: linux-omap@vger.kernel.org Link: http://lkml.kernel.org/r/20171014072659.f2yr6mhm5ha3eou7@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-14x86/mm: Flush more aggressively in lazy TLB modeAndy Lutomirski3-49/+136
Since commit: 94b1b03b519b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking") x86's lazy TLB mode has been all the way lazy: when running a kernel thread (including the idle thread), the kernel keeps using the last user mm's page tables without attempting to maintain user TLB coherence at all. From a pure semantic perspective, this is fine -- kernel threads won't attempt to access user pages, so having stale TLB entries doesn't matter. Unfortunately, I forgot about a subtlety. By skipping TLB flushes, we also allow any paging-structure caches that may exist on the CPU to become incoherent. This means that we can have a paging-structure cache entry that references a freed page table, and the CPU is within its rights to do a speculative page walk starting at the freed page table. I can imagine this causing two different problems: - A speculative page walk starting from a bogus page table could read IO addresses. I haven't seen any reports of this causing problems. - A speculative page walk that involves a bogus page table can install garbage in the TLB. Such garbage would always be at a user VA, but some AMD CPUs have logic that triggers a machine check when it notices these bogus entries. I've seen a couple reports of this. Boris further explains the failure mode: > It is actually more of an optimization which assumes that paging-structure > entries are in WB DRAM: > > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables > performance optimization that assumes PML4, PDP, PDE, and PTE entries > are in cacheable WB-DRAM; memory type checks may be bypassed, and > addresses outside of WB-DRAM may result in undefined behavior or NB > protocol errors. 1=Disables performance optimization and allows PML4, > PDP, PDE and PTE entries to be in any memory type. Operating systems > that maintain page tables in memory types other than WB- DRAM must set > TlbCacheDis to insure proper operation." > > The MCE generated is an NB protocol error to signal that > > "Link: A specific coherent-only packet from a CPU was issued to an > IO link. This may be caused by software which addresses page table > structures in a memory type other than cacheable WB-DRAM without > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for > example, when page table structure addresses are above top of memory. In > such cases, the NB will generate an MCE if it sees a mismatch between > the memory operation generated by the core and the link type." > > I'm assuming coherent-only packets don't go out on IO links, thus the > error. To fix this, reinstate TLB coherence in lazy mode. With this patch applied, we do it in one of two ways: - If we have PCID, we simply switch back to init_mm's page tables when we enter a kernel thread -- this seems to be quite cheap except for the cost of serializing the CPU. - If we don't have PCID, then we set a flag and switch to init_mm the first time we would otherwise need to flush the TLB. The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed to override the default mode for benchmarking. In theory, we could optimize this better by only flushing the TLB in lazy CPUs when a page table is freed. Doing that would require auditing the mm code to make sure that all page table freeing goes through tlb_remove_page() as well as reworking some data structures to implement the improved flush logic. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Adam Borowski <kilobyte@angband.pl> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Biggers <ebiggers@google.com> Cc: Johannes Hirte <johannes.hirte@datenkhaos.de> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 94b1b03b519b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking") Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-13mm, swap: use page-cluster as max window of VMA based swap readaheadHuang Ying2-44/+7
When the VMA based swap readahead was introduced, a new knob /sys/kernel/mm/swap/vma_ra_max_order was added as the max window of VMA swap readahead. This is to make it possible to use different max window for VMA based readahead and original physical readahead. But Minchan Kim pointed out that this will cause a regression because setting page-cluster sysctl to zero cannot disable swap readahead with the change. To fix the regression, the page-cluster sysctl is used as the max window of both the VMA based swap readahead and original physical swap readahead. If more fine grained control is needed in the future, more knobs can be added as the subordinate knobs of the page-cluster sysctl. The vma_ra_max_order knob is deleted. Because the knob was introduced in v4.14-rc1, and this patch is targeting being merged before v4.14 releasing, there should be no existing users of this newly added ABI. Link: http://lkml.kernel.org/r/20171011070847.16003-1-ying.huang@intel.com Fixes: ec560175c0b6fce ("mm, swap: VMA based swap readahead") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reported-by: Minchan Kim <minchan@kernel.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13mm: page_vma_mapped: ensure pmd is loaded with READ_ONCE outside of lockWill Deacon1-15/+10
Loading the pmd without holding the pmd_lock exposes us to races with concurrent updaters of the page tables but, worse still, it also allows the compiler to cache the pmd value in a register and reuse it later on, even if we've performed a READ_ONCE in between and seen a more recent value. In the case of page_vma_mapped_walk, this leads to the following crash when the pmd loaded for the initial pmd_trans_huge check is all zeroes and a subsequent valid table entry is loaded by check_pmd. We then proceed into map_pte, but the compiler re-uses the zero entry inside pte_offset_map, resulting in a junk pointer being installed in pvmw->pte: PC is at check_pte+0x20/0x170 LR is at page_vma_mapped_walk+0x2e0/0x540 [...] Process doio (pid: 2463, stack limit = 0xffff00000f2e8000) Call trace: check_pte+0x20/0x170 page_vma_mapped_walk+0x2e0/0x540 page_mkclean_one+0xac/0x278 rmap_walk_file+0xf0/0x238 rmap_walk+0x64/0xa0 page_mkclean+0x90/0xa8 clear_page_dirty_for_io+0x84/0x2a8 mpage_submit_page+0x34/0x98 mpage_process_page_bufs+0x164/0x170 mpage_prepare_extent_to_map+0x134/0x2b8 ext4_writepages+0x484/0xe30 do_writepages+0x44/0xe8 __filemap_fdatawrite_range+0xbc/0x110 file_write_and_wait_range+0x48/0xd8 ext4_sync_file+0x80/0x4b8 vfs_fsync_range+0x64/0xc0 SyS_msync+0x194/0x1e8 This patch fixes the problem by ensuring that READ_ONCE is used before the initial checks on the pmd, and this value is subsequently used when checking whether or not the pmd is present. pmd_check is removed and the pmd_present check is inlined directly. Link: http://lkml.kernel.org/r/1507222630-5839-1-git-send-email-will.deacon@arm.com Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()") Signed-off-by: Will Deacon <will.deacon@arm.com> Tested-by: Yury Norov <ynorov@caviumnetworks.com> Tested-by: Richard Ruigrok <rruigrok@codeaurora.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13kmemleak: clear stale pointers from task stacksKonstantin Khlebnikov2-1/+5
Kmemleak considers any pointers on task stacks as references. This patch clears newly allocated and reused vmap stacks. Link: http://lkml.kernel.org/r/150728990124.744199.8403409836394318684.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13fs/binfmt_misc.c: node could be NULL when evicting inodeEryu Guan1-1/+1
inode->i_private is assigned by a Node pointer only after registering a new binary format, so it could be NULL if inode was created by bm_fill_super() (or iput() was called by the error path in bm_register_write()), and this could result in NULL pointer dereference when evicting such an inode. e.g. mount binfmt_misc filesystem then umount it immediately: mount -t binfmt_misc binfmt_misc /proc/sys/fs/binfmt_misc umount /proc/sys/fs/binfmt_misc will result in BUG: unable to handle kernel NULL pointer dereference at 0000000000000013 IP: bm_evict_inode+0x16/0x40 [binfmt_misc] ... Call Trace: evict+0xd3/0x1a0 iput+0x17d/0x1d0 dentry_unlink_inode+0xb9/0xf0 __dentry_kill+0xc7/0x170 shrink_dentry_list+0x122/0x280 shrink_dcache_parent+0x39/0x90 do_one_tree+0x12/0x40 shrink_dcache_for_umount+0x2d/0x90 generic_shutdown_super+0x1f/0x120 kill_litter_super+0x29/0x40 deactivate_locked_super+0x43/0x70 deactivate_super+0x45/0x60 cleanup_mnt+0x3f/0x70 __cleanup_mnt+0x12/0x20 task_work_run+0x86/0xa0 exit_to_usermode_loop+0x6d/0x99 syscall_return_slowpath+0xba/0xf0 entry_SYSCALL_64_fastpath+0xa3/0xa Fix it by making sure Node (e) is not NULL. Link: http://lkml.kernel.org/r/20171010100642.31786-1-eguan@redhat.com Fixes: 83f918274e4b ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()") Signed-off-by: Eryu Guan <eguan@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13fs/mpage.c: fix mpage_writepage() for pages with buffersMatthew Wilcox3-5/+16
When using FAT on a block device which supports rw_page, we can hit BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we call clean_buffers() after unlocking the page we've written. Introduce a new clean_page_buffers() which cleans all buffers associated with a page and call it from within bdev_write_page(). [akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew] Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reported-by: Toshi Kani <toshi.kani@hpe.com> Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Tested-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13linux/kernel.h: add/correct kernel-doc notationRandy Dunlap1-16/+74
Add kernel-doc notation for some macros. Correct kernel-doc comments & typos for a few macros. Link: http://lkml.kernel.org/r/76fa1403-1511-be4c-e9c4-456b43edfad3@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13tty: fall back to N_NULL if switching to N_TTY fails during hangupJohannes Weiner1-6/+5
We have seen NULL-pointer dereference crashes in tty->disc_data when the N_TTY fallback driver failed to open during hangup. The immediate cause of this open to fail has been addressed in the preceding patch to vmalloc(), but this code could be more robust. As Alan pointed out in commit 8a8dabf2dd68 ("tty: handle the case where we cannot restore a line discipline"), the N_TTY driver, historically the safe fallback that could never fail, can indeed fail, but the surrounding code is not prepared to handle this. To avoid crashes he added a new N_NULL driver to take N_TTY's place as the last resort. Hook that fallback up to the hangup path. Update tty_ldisc_reinit() to reflect the reality that n_tty_open can indeed fail. Link: http://lkml.kernel.org/r/20171004185959.GC2136@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Alan Cox <alan@llwyncelyn.cymru> Cc: Christoph Hellwig <hch@lst.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13Revert "vmalloc: back off when the current task is killed"Johannes Weiner1-6/+0
This reverts commits 5d17a73a2ebe ("vmalloc: back off when the current task is killed") and 171012f56127 ("mm: don't warn when vmalloc() fails due to a fatal signal"). Commit 5d17a73a2ebe ("vmalloc: back off when the current task is killed") made all vmalloc allocations from a signal-killed task fail. We have seen crashes in the tty driver from this, where a killed task exiting tries to switch back to N_TTY, fails n_tty_open because of the vmalloc failing, and later crashes when dereferencing tty->disc_data. Arguably, relying on a vmalloc() call to succeed in order to properly exit a task is not the most robust way of doing things. There will be a follow-up patch to the tty code to fall back to the N_NULL ldisc. But the justification to make that vmalloc() call fail like this isn't convincing, either. The patch mentions an OOM victim exhausting the memory reserves and thus deadlocking the machine. But the OOM killer is only one, improbable source of fatal signals. It doesn't make sense to fail allocations preemptively with plenty of memory in most cases. The patch doesn't mention real-life instances where vmalloc sites would exhaust memory, which makes it sound more like a theoretical issue to begin with. But just in case, the OOM access to memory reserves has been restricted on the allocator side in cd04ae1e2dc8 ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"), which should take care of any theoretical concerns on that front. Revert this patch, and the follow-up that suppresses the allocation warnings when we fail the allocations due to a signal. Link: http://lkml.kernel.org/r/20171004185906.GB2136@cmpxchg.org Fixes: 171012f56127 ("mm: don't warn when vmalloc() fails due to a fatal signal") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alan Cox <alan@llwyncelyn.cymru> Cc: Christoph Hellwig <hch@lst.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13mm/cma.c: take __GFP_NOWARN into account in cma_alloc()Boris Brezillon1-1/+1
cma_alloc() unconditionally prints an INFO message when the CMA allocation fails. Make this message conditional on the non-presence of __GFP_NOWARN in gfp_mask. This patch aims at removing INFO messages that are displayed when the VC4 driver tries to allocate buffer objects. From the driver perspective an allocation failure is acceptable, and the driver can possibly do something to make following allocation succeed (like flushing the VC4 internal cache). Link: http://lkml.kernel.org/r/20171004125447.15195-1-boris.brezillon@free-electrons.com Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Laura Abbott <labbott@redhat.com> Cc: Jaewon Kim <jaewon31.kim@samsung.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Eric Anholt <eric@anholt.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13scripts/kallsyms.c: ignore symbol type 'n'Guenter Roeck1-1/+1
gcc on aarch64 may emit synbols of type 'n' if the kernel is built with '-frecord-gcc-switches'. In most cases, those symbols are reported with nm as 000000000000000e n $d and with objdump as 0000000000000000 l d .GCC.command.line 0000000000000000 .GCC.command.line 000000000000000e l .GCC.command.line 0000000000000000 $d Those symbols are detected in is_arm_mapping_symbol() and ignored. However, if "--prefix-symbols=<prefix>" is configured as well, the situation is different. For example, in efi/libstub, arm64 images are built with '--prefix-alloc-sections=.init --prefix-symbols=__efistub_'. In combination with '-frecord-gcc-switches', the symbols are now reported by nm as: 000000000000000e n __efistub_$d and by objdump as: 0000000000000000 l d .GCC.command.line 0000000000000000 .GCC.command.line 000000000000000e l .GCC.command.line 0000000000000000 __efistub_$d Those symbols are no longer ignored and included in the base address calculation. This results in a base address of 000000000000000e, which in turn causes kallsyms to abort with kallsyms failure: relative symbol value 0xffffff900800a000 out of range in relative mode The problem is seen in little endian arm64 builds with CONFIG_EFI enabled and with '-frecord-gcc-switches' set in KCFLAGS. Explicitly ignore symbols of type 'n' since those are clearly debug symbols. Link: http://lkml.kernel.org/r/1507136063-3139-1-git-send-email-linux@roeck-us.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13userfaultfd: selftest: exercise -EEXIST only in background transferAndrea Arcangeli1-5/+20
I was stress testing some backports and with high load, after some time, the latest version of the selftest showed some false positive in connection with the uffdio_copy_retry. This seems to fix it while still exercising -EEXIST in the background transfer once in a while. The fork child will quit after the last UFFDIO_COPY is run, so a repeated UFFDIO_COPY may not return -EEXIST. This change restricts the -EEXIST stress to the background transfer where the memory can't go away from under it. Also updated uffdio_zeropage, so the interface is consistent. Link: http://lkml.kernel.org/r/20171004171541.1495-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13mm: only display online cpus of the numa nodeZhen Lei1-2/+10
When I execute numactl -H (which reads /sys/devices/system/node/nodeX/cpumap and displays cpumask_of_node for each node), I get different result on X86 and arm64. For each numa node, the former only displayed online CPUs, and the latter displayed all possible CPUs. Unfortunately, both Linux documentation and numactl manual have not described it clear. I sent a mail to ask for help, and Michal Hocko replied that he preferred to print online cpus because it doesn't really make much sense to bind anything on offline nodes. Will said: "I suspect the vast majority (if not all) code that reads this file was developed for x86, so having the same behaviour for arm64 sounds like something we should do ASAP before people try to special case with things like #ifdef __aarch64__. I'd rather have this in 4.14 if possible." Link: http://lkml.kernel.org/r/1506678805-15392-2-git-send-email-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Libin <huawei.libin@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13mm: remove unnecessary WARN_ONCE in page_vma_mapped_walk().Zi Yan1-2/+1
A non present pmd entry can appear after pmd_lock is taken in page_vma_mapped_walk(), even if THP migration is not enabled. The WARN_ONCE is unnecessary. Link: http://lkml.kernel.org/r/20171003142606.12324-1-zi.yan@sent.com Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path") Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13mm/mempolicy: fix NUMA_INTERLEAVE_HIT counterAndrey Ryabinin1-2/+5
Commit 3a321d2a3dde ("mm: change the call sites of numa statistics items") separated NUMA counters from zone counters, but the NUMA_INTERLEAVE_HIT call site wasn't updated to use the new interface. So alloc_page_interleave() actually increments NR_ZONE_INACTIVE_FILE instead of NUMA_INTERLEAVE_HIT. Fix this by using __inc_numa_state() interface to increment NUMA_INTERLEAVE_HIT. Link: http://lkml.kernel.org/r/20171003191003.8573-1-aryabinin@virtuozzo.com Fixes: 3a321d2a3dde ("mm: change the call sites of numa statistics items") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Kemi Wang <kemi.wang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13include/linux/of.h: provide of_n_{addr,size}_cells wrappers for !CONFIG_OFArnd Bergmann1-0/+10
The pci-rcar driver is enabled for compile tests, and this has shown that the driver cannot build without CONFIG_OF, following the inclusion of commit f8f2fe7355fb ("PCI: rcar: Use new OF interrupt mapping when possible"): drivers/pci/host/pcie-rcar.c: In function 'pci_dma_range_parser_init': drivers/pci/host/pcie-rcar.c:1039:2: error: implicit declaration of function 'of_n_addr_cells' [-Werror=implicit-function-declaration] parser->pna = of_n_addr_cells(node); ^ As pointed out by Ben Dooks and Geert Uytterhoeven, this is actually supposed to build fine, which we can achieve if we make the declaration of of_irq_parse_and_map_pci conditional on CONFIG_OF and provide an empty inline function otherwise, as we do for a lot of other of interfaces. This lets us build the rcar_pci driver again without CONFIG_OF for build testing. All platforms using this driver select OF, so this doesn't change anything for the users. [akpm@linux-foundation.org: be consistent with surrounding code] Link: http://lkml.kernel.org/r/20170911200805.3363318-1-arnd@arndb.de Fixes: c25da4778803 ("PCI: rcar: Add Renesas R-Car PCIe driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Magnus Damm <damm@opensource.se> Cc: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>