aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2017-04-07SMB3: Rename clone_range to copychunk_rangeSachin Prabhu3-15/+16
Server side copy is one of the most important mechanisms smb2/smb3 supports and it was unintentionally disabled for most use cases. Renaming calls to reflect the underlying smb2 ioctl called. This is similar to the name duplicate_extents used for a similar ioctl which is also used to duplicate files by reusing fs blocks. The name change is to avoid confusion. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2017-04-07Handle mismatched open callsSachin Prabhu9-13/+143
A signal can interrupt a SendReceive call which result in incoming responses to the call being ignored. This is a problem for calls such as open which results in the successful response being ignored. This results in an open file resource on the server. The patch looks into responses which were cancelled after being sent and in case of successful open closes the open fids. For this patch, the check is only done in SendReceive2() RH-bz: 1403319 Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Cc: Stable <stable@vger.kernel.org>
2017-04-05nfp: fix potential use after free on xdp progJakub Kicinski1-1/+2
We should unregister the net_device first, before we give back our reference on xdp_prog. Otherwise xdp_prog may be freed before .ndo_stop() disabled the datapath. Found by code inspection. Fixes: ecd63a0217d5 ("nfp: add XDP support in the driver") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05tcp: fix reordering SNMP under-countingYuchung Cheng1-13/+14
Currently the reordering SNMP counters only increase if a connection sees a higher degree then it has previously seen. It ignores if the reordering degree is not greater than the default system threshold. This significantly under-counts the number of reordering events and falsely convey that reordering is rare on the network. This patch properly and faithfully records the number of reordering events detected by the TCP stack, just like the comment says "this exciting event is worth to be remembered". Note that even so TCP still under-estimate the actual reordering events because TCP requires TS options or certain packet sequences to detect reordering (i.e. ACKing never-retransmitted sequence in recovery or disordered state). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05tcp: fix lost retransmit SNMP under-countingYuchung Cheng1-1/+2
The lost retransmit SNMP stat is under-counting retransmission that uses segment offloading. This patch fixes that so all retransmission related SNMP counters are consistent. Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05sctp: get sock from transport in sctp_transport_update_pmtuXin Long7-28/+22
This patch is almost to revert commit 02f3d4ce9e81 ("sctp: Adjust PMTU updates to accomodate route invalidation."). As t->asoc can't be NULL in sctp_transport_update_pmtu, it could get sk from asoc, and no need to pass sk into that function. It is also to remove some duplicated codes from that function. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05ring-buffer: Fix return value check in test_ringbuffer()Wei Yongjun1-4/+4
In case of error, the function kthread_run() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com Cc: stable@vger.kernel.org Fixes: 6c43e554a ("ring-buffer: Add ring buffer startup selftest") Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-05mfd: cros-ec: Fix host command buffer sizeVic Yang1-1/+2
For SPI, we can get up to 32 additional bytes for response preamble. The current overhead (2 bytes) may cause problems when we try to receive a big response. Update it to 32 bytes. Without this fix we could see a kernel BUG when we receive a big response from the Chrome EC when is connected via SPI. Signed-off-by: Vic Yang <victoryang@google.com> Tested-by: Enric Balletbo i Serra <enric.balletbo.collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-04net: ethernet: ti: cpsw: fix race condition during open()Sekhar Nori1-6/+8
TI's cpsw driver handles both OF and non-OF case for phy connect. Unfortunately of_phy_connect() returns NULL on error while phy_connect() returns ERR_PTR(). To handle this, cpsw_slave_open() overrides the return value from phy_connect() to make it NULL or error. This leaves a small window, where cpsw_adjust_link() may be invoked for a slave while slave->phy pointer is temporarily set to -ENODEV (or some other error) before it is finally set to NULL. _cpsw_adjust_link() only handles the NULL case, and an oops results when ERR_PTR() is seen by it. Note that cpsw_adjust_link() checks PHY status for each slave whenever it is invoked. It can so happen that even though phy_connect() for a given slave returns error, _cpsw_adjust_link() is still called for that slave because the link status of another slave changed. Fix this by using a temporary pointer to store return value of {of_}phy_connect() and do a one-time write to slave->phy. Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reported-by: Yan Liu <yan-liu@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-04l2tp: fix PPP pseudo-wire auto-loadingGuillaume Nault1-1/+1
PPP pseudo-wire type is 7 (11 is L2TP_PWTYPE_IP). Fixes: f1f39f911027 ("l2tp: auto load type modules") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-04bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*Colin Ian King2-12/+12
Trival fix, rename HW_INTERRUT_ASSERT_SET_* to HW_INTERRUPT_ASSERT_SET_* Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-04l2tp: take reference on sessions being dumpedGuillaume Nault5-11/+27
Take a reference on the sessions returned by l2tp_session_find_nth() (and rename it l2tp_session_get_nth() to reflect this change), so that caller is assured that the session isn't going to disappear while processing it. For procfs and debugfs handlers, the session is held in the .start() callback and dropped in .show(). Given that pppol2tp_seq_session_show() dereferences the associated PPPoL2TP socket and that l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to call the session's .ref() callback to prevent the socket from going away from under us. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Fixes: 0ad6614048cf ("l2tp: Add debugfs files for dumping l2tp debug info") Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03tcp: minimize false-positives on TCP/GRO checkMarcelo Ricardo Leitner1-5/+9
Markus Trippelsdorf reported that after commit dcb17d22e1c2 ("tcp: warn on bogus MSS and try to amend it") the kernel started logging the warning for a NIC driver that doesn't even support GRO. It was diagnosed that it was possibly caused on connections that were using TCP Timestamps but some packets lacked the Timestamps option. As we reduce rcv_mss when timestamps are used, the lack of them would cause the packets to be bigger than expected, although this is a valid case. As this warning is more as a hint, getting a clean-cut on the threshold is probably not worth the execution time spent on it. This patch thus alleviates the false-positives with 2 quick checks: by accounting for the entire TCP option space and also checking against the interface MTU if it's available. These changes, specially the MTU one, might mask some real positives, though if they are really happening, it's possible that sooner or later it will be triggered anyway. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03sctp: check for dst and pathmtu update in sctp_packet_configXin Long2-34/+50
This patch is to move sctp_transport_dst_check into sctp_packet_config from sctp_packet_transmit and add pathmtu check in sctp_packet_config. With this fix, sctp can update dst or pathmtu before appending chunks, which can void dropping packets in sctp_packet_transmit when dst is obsolete or dst's mtu is changed. This patch is also to improve some other codes in sctp_packet_config. It updates packet max_size with gso_max_size, checks for dst and pathmtu, and appends ecne chunk only when packet is empty and asoc is not NULL. It makes sctp flush work better, as we only need to set up them once for one flush schedule. It's also safe, since asoc is NULL only when the packet is created by sctp_ootb_pkt_new in which it just gets the new dst, no need to do more things for it other than set packet with transport's pathmtu. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03flow dissector: correct size of storage for ARPSimon Horman1-1/+1
The last argument to __skb_header_pointer() should be a buffer large enough to store struct arphdr. This can be a pointer to a struct arphdr structure. The code was previously using a pointer to a pointer to struct arphdr. By my counting the storage available both before and after is 8 bytes on x86_64. Fixes: 55733350e5e8 ("flow disector: ARP support") Reported-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03drm/msm: Make sure to detach the MMU during GPU cleanupJordan Crouse2-13/+19
We should be detaching the MMU before destroying the address space. To do this cleanly, the detach has to happen in adreno_gpu_cleanup() because it needs access to structs in adreno_gpu.c. Plus it is better symmetry to have the attach and detach at the same code level. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-03drm/msm/hdmi: redefinitions of macros not requiredVinay Simha BN1-7/+0
4 macros already defined in hdmi.h, which is not required to redefine in hdmi_audio.c Signed-off-by: Vinay Simha BN <simhavcs@gmail.com> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-03drm/msm/mdp5: Update SSPP_MAX valueArchit Taneja1-1/+2
'SSPP_MAX + 1' is the max number of hwpipes that can be present on a MDP5 platform. Recently, 2 new cursor hwpipes were added, which caused overflows in arrays that used SSPP_MAX to represent the number of elements. Update the SSPP_MAX value to incorporate the extra hwpipes. Signed-off-by: Archit Taneja <architt@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-03drm/msm/dsi: Fix bug in dsi_mgr_phy_enableArchit Taneja1-1/+1
A recent commit introduces a bug in dsi_mgr_phy_enable. In the non dual DSI mode, we reset the mdsi (master DSI) PHY. This isn't right since master and slave DSI exist only in dual DSI mode. For the normal mode of operation, we should simply reset the PHY of the DSI device (i.e. msm_dsi) corresponding to the current bridge. Usage of the wrong DSI pointer also resulted in a static checker warning. That too is resolved with this fix. Fixes: b62aa70a98c5 (drm/msm/dsi: Move PHY operations out of host) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Archit Taneja <architt@codeaurora.org> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-03drm/msm: Don't allow zero sized buffer objectsJordan Crouse1-0/+6
Zero sized buffer objects tend to make various bits of the GEM infrastructure complain: WARNING: CPU: 1 PID: 2323 at drivers/gpu/drm/drm_mm.c:389 drm_mm_insert_node_generic+0x258/0x2f0 Modules linked in: CPU: 1 PID: 2323 Comm: drm-api-test Tainted: G W 4.9.0-rc4-00906-g693af44 #213 Hardware name: Qualcomm Technologies, Inc. DB820c (DT) task: ffff8000d7353400 task.stack: ffff8000d7720000 PC is at drm_mm_insert_node_generic+0x258/0x2f0 LR is at drm_vma_offset_add+0x4c/0x70 Zero sized buffers serve no appreciable value to the user so disallow them at create time. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-03drm/msm: Fix wrong pointer check in a5xx_destroyJordan Crouse1-2/+2
Instead of checking for a5xx_gpu->gpmu_iova during destroy we accidently check a5xx_gpu->gpmu_bo. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-03drm/msm: adreno: fix build error without debugfsArnd Bergmann1-0/+2
The newly added a5xx support fails to build when debugfs is diabled: drivers/gpu/drm/msm/adreno/a5xx_gpu.c:849:4: error: 'struct msm_gpu_funcs' has no member named 'show' drivers/gpu/drm/msm/adreno/a5xx_gpu.c:849:11: error: 'a5xx_show' undeclared here (not in a function); did you mean 'a5xx_irq'? This adds a missing #ifdef. Fixes: b5f103ab98c7 ("drm/msm: gpu: Add A5XX target support") Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-03xfs: fix kernel memory exposure problemsDarrick J. Wong1-1/+1
Fix a memory exposure problems in inumbers where we allocate an array of structures with holes, fail to zero the holes, then blindly copy the kernel memory contents (junk and all) into userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-04-03xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of filesCalvin Owens1-1/+9
When punching past EOF on XFS, fallocate(mode=PUNCH_HOLE|KEEP_SIZE) will round the file size up to the nearest multiple of PAGE_SIZE: calvinow@vm-disks/generic-xfs-1 ~$ dd if=/dev/urandom of=test bs=2048 count=1 calvinow@vm-disks/generic-xfs-1 ~$ stat test Size: 2048 Blocks: 8 IO Block: 4096 regular file calvinow@vm-disks/generic-xfs-1 ~$ fallocate -n -l 2048 -o 2048 -p test calvinow@vm-disks/generic-xfs-1 ~$ stat test Size: 4096 Blocks: 8 IO Block: 4096 regular file Commit 3c2bdc912a1cc050 ("xfs: kill xfs_zero_remaining_bytes") replaced xfs_zero_remaining_bytes() with calls to iomap helpers. The new helpers don't enforce that [pos,offset) lies strictly on [0,i_size) when being called from xfs_free_file_space(), so by "leaking" these ranges into xfs_zero_range() we get this buggy behavior. Fix this by reintroducing the checks xfs_zero_remaining_bytes() did against i_size at the bottom of xfs_free_file_space(). Reported-by: Aaron Gao <gzh@fb.com> Fixes: 3c2bdc912a1cc050 ("xfs: kill xfs_zero_remaining_bytes") Cc: Christoph Hellwig <hch@lst.de> Cc: Brian Foster <bfoster@redhat.com> Cc: <stable@vger.kernel.org> # 4.8+ Signed-off-by: Calvin Owens <calvinowens@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03xfs: rework the inline directory verifiersDarrick J. Wong5-56/+66
The inline directory verifiers should be called on the inode fork data, which means after iformat_local on the read side, and prior to ifork_flush on the write side. This makes the fork verifier more consistent with the way buffer verifiers work -- i.e. they will operate on the memory buffer that the code will be reading and writing directly. Furthermore, revise the verifier function to return -EFSCORRUPTED so that we don't flood the logs with corruption messages and assert notices. This has been a particular problem with xfs/348, which triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the kernel when CONFIG_XFS_DEBUG=y. Disk corruption isn't supposed to do that, at least not in a verifier. Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-02nios2: reserve boot memory for device treeTobias Klauser2-0/+10
Make sure to reserve the boot memory for the flattened device tree. Otherwise it might get overwritten, e.g. when initial_boot_params is copied, leading to a corrupted FDT and a boot hang/crash: bootconsole [early0] enabled Early console on uart16650 initialized at 0xf8001600 OF: fdt: Error -11 processing FDT Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree! ---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree! Guenter Roeck says: > I think I found the problem. In unflatten_and_copy_device_tree(), with added > debug information: > > OF: fdt: initial_boot_params=c861e400, dt=c861f000 size=28874 (0x70ca) > > ... and then initial_boot_params is copied to dt, which results in corrupted > fdt since the memory overlaps. Looks like the initial_boot_params memory > is not reserved and (re-)allocated by early_init_dt_alloc_memory_arch(). Cc: stable@vger.kernel.org Reported-by: Guenter Roeck <linux@roeck-us.net> Reference: http://lkml.kernel.org/r/20170226210338.GA19476@roeck-us.net Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
2017-04-02net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeoutGrygorii Strashko1-0/+2
In case, if TX watchdog is fired some or all netdev TX queues will be stopped and as part of recovery it is required not only to drain and reinitailize CPSW TX channeles, but also wake up stoppted TX queues what doesn't happen now and netdevice will stop transmiting data until reopenned. Hence, add netif_tx_wake_all_queues() call in .ndo_tx_timeout() to complete recovery and restore TX path. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-02Linux 4.11-rc5Linus Torvalds1-1/+1
2017-04-01l2tp: take a reference on sessions used in genetlink handlersGuillaume Nault3-16/+35
Callers of l2tp_nl_session_find() need to hold a reference on the returned session since there's no guarantee that it isn't going to disappear from under them. Relying on the fact that no l2tp netlink message may be processed concurrently isn't enough: sessions can be deleted by other means (e.g. by closing the PPPOL2TP socket of a ppp pseudowire). l2tp_nl_cmd_session_delete() is a bit special: it runs a callback function that may require a previous call to session->ref(). In particular, for ppp pseudowires, the callback is l2tp_session_delete(), which then calls pppol2tp_session_close() and dereferences the PPPOL2TP socket. The socket might already be gone at the moment l2tp_session_delete() calls session->ref(), so we need to take a reference during the session lookup. So we need to pass the do_ref variable down to l2tp_session_get() and l2tp_session_get_by_ifname(). Since all callers have to be updated, l2tp_session_find_by_ifname() and l2tp_nl_session_find() are renamed to reflect their new behaviour. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01l2tp: hold session while sending creation notificationsGuillaume Nault1-2/+4
l2tp_session_find() doesn't take any reference on the returned session. Therefore, the session may disappear while sending the notification. Use l2tp_session_get() instead and decrement session's refcount once the notification is sent. Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01l2tp: fix duplicate session creationGuillaume Nault3-56/+84
l2tp_session_create() relies on its caller for checking for duplicate sessions. This is racy since a session can be concurrently inserted after the caller's verification. Fix this by letting l2tp_session_create() verify sessions uniqueness upon insertion. Callers need to be adapted to check for l2tp_session_create()'s return code instead of calling l2tp_session_find(). pppol2tp_connect() is a bit special because it has to work on existing sessions (if they're not connected) or to create a new session if none is found. When acting on a preexisting session, a reference must be held or it could go away on us. So we have to use l2tp_session_get() instead of l2tp_session_find() and drop the reference before exiting. Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01l2tp: ensure session can't get removed during pppol2tp_session_ioctl()Guillaume Nault1-4/+11
Holding a reference on session is required before calling pppol2tp_session_ioctl(). The session could get freed while processing the ioctl otherwise. Since pppol2tp_session_ioctl() uses the session's socket, we also need to take a reference on it in l2tp_session_get(). Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01l2tp: fix race in l2tp_recv_common()Guillaume Nault4-23/+88
Taking a reference on sessions in l2tp_recv_common() is racy; this has to be done by the callers. To this end, a new function is required (l2tp_session_get()) to atomically lookup a session and take a reference on it. Callers then have to manually drop this reference. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01sctp: use right in and out stream cntXin Long4-12/+11
Since sctp reconf was added in sctp, the real cnt of in/out stream have not been c.sinit_max_instreams and c.sinit_num_ostreams any more. This patch is to replace them with stream->in/outcnt. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01bpf: add various verifier test cases for self-testsDaniel Borkmann3-6/+283
Add a couple of test cases, for example, probing for xadd on a spilled pointer to packet and map_value_adj register, various other map_value_adj tests including the unaligned load/store, and trying out pointer arithmetic on map_value_adj register itself. For the unaligned load/store, we need to figure out whether the architecture has efficient unaligned access and need to mark affected tests accordingly. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01bpf, verifier: fix rejection of unaligned access checks for map_value_adjDaniel Borkmann1-20/+38
Currently, the verifier doesn't reject unaligned access for map_value_adj register types. Commit 484611357c19 ("bpf: allow access into map value arrays") added logic to check_ptr_alignment() extending it from PTR_TO_PACKET to also PTR_TO_MAP_VALUE_ADJ, but for PTR_TO_MAP_VALUE_ADJ no enforcement is in place, because reg->id for PTR_TO_MAP_VALUE_ADJ reg types is never non-zero, meaning, we can cause BPF_H/_W/_DW-based unaligned access for architectures not supporting efficient unaligned access, and thus worst case could raise exceptions on some archs that are unable to correct the unaligned access or perform a different memory access to the actual requested one and such. i) Unaligned load with !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on r0 (map_value_adj): 0: (bf) r2 = r10 1: (07) r2 += -8 2: (7a) *(u64 *)(r2 +0) = 0 3: (18) r1 = 0x42533a00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+11 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 7: (61) r1 = *(u32 *)(r0 +0) 8: (35) if r1 >= 0xb goto pc+9 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=10 R10=fp 9: (07) r0 += 3 10: (79) r7 = *(u64 *)(r0 +0) R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R1=inv,min_value=0,max_value=10 R10=fp 11: (79) r7 = *(u64 *)(r0 +2) R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R1=inv,min_value=0,max_value=10 R7=inv R10=fp [...] ii) Unaligned store with !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on r0 (map_value_adj): 0: (bf) r2 = r10 1: (07) r2 += -8 2: (7a) *(u64 *)(r2 +0) = 0 3: (18) r1 = 0x4df16a00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+19 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 7: (07) r0 += 3 8: (7a) *(u64 *)(r0 +0) = 42 R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp 9: (7a) *(u64 *)(r0 +2) = 43 R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp 10: (7a) *(u64 *)(r0 -2) = 44 R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp [...] For the PTR_TO_PACKET type, reg->id is initially zero when skb->data was fetched, it later receives a reg->id from env->id_gen generator once another register with UNKNOWN_VALUE type was added to it via check_packet_ptr_add(). The purpose of this reg->id is twofold: i) it is used in find_good_pkt_pointers() for setting the allowed access range for regs with PTR_TO_PACKET of same id once verifier matched on data/data_end tests, and ii) for check_ptr_alignment() to determine that when not having efficient unaligned access and register with UNKNOWN_VALUE was added to PTR_TO_PACKET, that we're only allowed to access the content bytewise due to unknown unalignment. reg->id was never intended for PTR_TO_MAP_VALUE{,_ADJ} types and thus is always zero, the only marking is in PTR_TO_MAP_VALUE_OR_NULL that was added after 484611357c19 via 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers"). Above tests will fail for non-root environment due to prohibited pointer arithmetic. The fix splits register-type specific checks into their own helper instead of keeping them combined, so we don't run into a similar issue in future once we extend check_ptr_alignment() further and forget to add reg->type checks for some of the checks. Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01bpf, verifier: fix alu ops against map_value{, _adj} register typesDaniel Borkmann1-0/+1
While looking into map_value_adj, I noticed that alu operations directly on the map_value() resp. map_value_adj() register (any alu operation on a map_value() register will turn it into a map_value_adj() typed register) are not sufficiently protected against some of the operations. Two non-exhaustive examples are provided that the verifier needs to reject: i) BPF_AND on r0 (map_value_adj): 0: (bf) r2 = r10 1: (07) r2 += -8 2: (7a) *(u64 *)(r2 +0) = 0 3: (18) r1 = 0xbf842a00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+2 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 7: (57) r0 &= 8 8: (7a) *(u64 *)(r0 +0) = 22 R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp 9: (95) exit from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp 9: (95) exit processed 10 insns ii) BPF_ADD in 32 bit mode on r0 (map_value_adj): 0: (bf) r2 = r10 1: (07) r2 += -8 2: (7a) *(u64 *)(r2 +0) = 0 3: (18) r1 = 0xc24eee00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+2 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 7: (04) (u32) r0 += (u32) 0 8: (7a) *(u64 *)(r0 +0) = 22 R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 9: (95) exit from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp 9: (95) exit processed 10 insns Issue is, while min_value / max_value boundaries for the access are adjusted appropriately, we change the pointer value in a way that cannot be sufficiently tracked anymore from its origin. Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register that is PTR_TO_MAP_VALUE{,_ADJ} was probably unintended, in fact, all the test cases coming with 484611357c19 ("bpf: allow access into map value arrays") perform BPF_ADD only on the destination register that is PTR_TO_MAP_VALUE_ADJ. Only for UNKNOWN_VALUE register types such operations make sense, f.e. with unknown memory content fetched initially from a constant offset from the map value memory into a register. That register is then later tested against lower / upper bounds, so that the verifier can then do the tracking of min_value / max_value, and properly check once that UNKNOWN_VALUE register is added to the destination register with type PTR_TO_MAP_VALUE{,_ADJ}. This is also what the original use-case is solving. Note, tracking on what is being added is done through adjust_reg_min_max_vals() and later access to the map value enforced with these boundaries and the given offset from the insn through check_map_access_adj(). Tests will fail for non-root environment due to prohibited pointer arithmetic, in particular in check_alu_op(), we bail out on the is_pointer_value() check on the dst_reg (which is false in root case as we allow for pointer arithmetic via env->allow_ptr_leaks). Similarly to PTR_TO_PACKET, one way to fix it is to restrict the allowed operations on PTR_TO_MAP_VALUE{,_ADJ} registers to 64 bit mode BPF_ADD. The test_verifier suite runs fine after the patch and it also rejects mentioned test cases. Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01r8152: The Microsoft Surface docks also use R8152 v2René Rebe2-0/+18
Without this the generic cdc_ether grabs the device, and does not really work. Signed-off-by: René Rebe <rene@exactcode.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01openvswitch: Fix ovs_flow_key_update()Yi-Hung Wei1-2/+8
ovs_flow_key_update() is called when the flow key is invalid, and it is used to update and revalidate the flow key. Commit 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key") introduces mac_proto field to flow key and use it to determine whether the flow key is valid. However, the commit does not update the code path in ovs_flow_key_update() to revalidate the flow key which may cause BUG_ON() on execute_recirc(). This patch addresses the aforementioned issue. Fixes: 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01net/faraday: Explicitly include linux/of.h and linux/property.hMark Brown1-0/+2
This driver uses interfaces from linux/of.h and linux/property.h but relies on implict inclusion of those headers which means that changes in other headers could break the build, as happened in -next for arm today. Add a explicit includes. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01net: hns: Add ACPI support to check SFP presentDaode Huang2-5/+34
The current code only supports DT to check SFP present. This patch adds ACPI support as well. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01tty: pl011: fix earlycon work-around for QDF2400 erratum 44Timur Tabi1-2/+21
The work-around for the Qualcomm Datacenter Technologies QDF2400 erratum 44 sets the "qdf2400_e44_present" global variable if the work-around is needed. However, this check does not happen until after earlycon is initialized, which means the work-around is not used, and the console hangs as soon as it displays one character. Fixes: d8a4995bcea1 ("tty: pl011: Work around QDF2400 E44 stuck BUSY bit") Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31kasan: do not sanitize kexec purgatoryMike Galbraith1-0/+1
Fixes this: kexec: Undefined symbol: __asan_load8_noabort kexec-bzImage64: Loading purgatory failed Link: http://lkml.kernel.org/r/1489672155.4458.7.camel@gmx.de Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31drivers/rapidio/devices/tsi721.c: make module parameter variable name uniqueRandy Dunlap2-4/+4
kbuild test robot reported a non-static variable name collision between a staging driver and a RapidIO driver, with a generic variable name of 'dbg_level'. Both drivers should be changed so that they don't use this generic public variable name. This patch fixes the RapidIO driver but does not change the user interface (name) for the module parameter. drivers/staging/built-in.o:(.bss+0x109d0): multiple definition of `dbg_level' drivers/rapidio/built-in.o:(.bss+0x16c): first defined here Link: http://lkml.kernel.org/r/ab527fc5-aa3c-4b07-5d48-eef5de703192@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm/hugetlb.c: don't call region_abort if region_chg failsMike Kravetz1-1/+3
Changes to hugetlbfs reservation maps is a two step process. The first step is a call to region_chg to determine what needs to be changed, and prepare that change. This should be followed by a call to call to region_add to commit the change, or region_abort to abort the change. The error path in hugetlb_reserve_pages called region_abort after a failed call to region_chg. As a result, the adds_in_progress counter in the reservation map is off by 1. This is caught by a VM_BUG_ON in resv_map_release when the reservation map is freed. syzkaller fuzzer (when using an injected kmalloc failure) found this bug, that resulted in the following: kernel BUG at mm/hugetlb.c:742! Call Trace: hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493 evict+0x481/0x920 fs/inode.c:553 iput_final fs/inode.c:1515 [inline] iput+0x62b/0xa20 fs/inode.c:1542 hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306 newseg+0x422/0xd30 ipc/shm.c:575 ipcget_new ipc/util.c:285 [inline] ipcget+0x21e/0x580 ipc/util.c:639 SYSC_shmget ipc/shm.c:673 [inline] SyS_shmget+0x158/0x230 ipc/shm.c:657 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742 Link: http://lkml.kernel.org/r/1490821682-23228-1-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31kasan: report only the first error by defaultMark Rutland5-5/+55
Disable kasan after the first report. There are several reasons for this: - Single bug quite often has multiple invalid memory accesses causing storm in the dmesg. - Write OOB access might corrupt metadata so the next report will print bogus alloc/free stacktraces. - Reports after the first easily could be not bugs by itself but just side effects of the first one. Given that multiple reports usually only do harm, it makes sense to disable kasan after the first one. If user wants to see all the reports, the boot-time parameter kasan_multi_shot must be used. [aryabinin@virtuozzo.com: wrote changelog and doc, added missing include] Link: http://lkml.kernel.org/r/20170323154416.30257-1-aryabinin@virtuozzo.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31hugetlbfs: initialize shared policy as part of inode allocationMike Kravetz1-13/+12
Any time after inode allocation, destroy_inode can be called. The hugetlbfs inode contains a shared_policy structure, and mpol_free_shared_policy is unconditionally called as part of hugetlbfs_destroy_inode. Initialize the policy as part of inode allocation so that any quick (error path) calls to destroy_inode will be handed an initialized policy. syzkaller fuzzer found this bug, that resulted in the following: BUG: KASAN: user-memory-access in atomic_inc include/asm-generic/atomic-instrumented.h:87 [inline] at addr 000000131730bd7a BUG: KASAN: user-memory-access in __lock_acquire+0x21a/0x3a80 kernel/locking/lockdep.c:3239 at addr 000000131730bd7a Write of size 4 by task syz-executor6/14086 CPU: 3 PID: 14086 Comm: syz-executor6 Not tainted 4.11.0-rc3+ #364 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: atomic_inc include/asm-generic/atomic-instrumented.h:87 [inline] __lock_acquire+0x21a/0x3a80 kernel/locking/lockdep.c:3239 lock_acquire+0x1ee/0x590 kernel/locking/lockdep.c:3762 __raw_write_lock include/linux/rwlock_api_smp.h:210 [inline] _raw_write_lock+0x33/0x50 kernel/locking/spinlock.c:295 mpol_free_shared_policy+0x43/0xb0 mm/mempolicy.c:2536 hugetlbfs_destroy_inode+0xca/0x120 fs/hugetlbfs/inode.c:952 alloc_inode+0x10d/0x180 fs/inode.c:216 new_inode_pseudo+0x69/0x190 fs/inode.c:889 new_inode+0x1c/0x40 fs/inode.c:918 hugetlbfs_get_inode+0x40/0x420 fs/hugetlbfs/inode.c:734 hugetlb_file_setup+0x329/0x9f0 fs/hugetlbfs/inode.c:1282 newseg+0x422/0xd30 ipc/shm.c:575 ipcget_new ipc/util.c:285 [inline] ipcget+0x21e/0x580 ipc/util.c:639 SYSC_shmget ipc/shm.c:673 [inline] SyS_shmget+0x158/0x230 ipc/shm.c:657 entry_SYSCALL_64_fastpath+0x1f/0xc2 Analysis provided by Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/1490477850-7944-1-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm: fix section name for .data..ro_after_initKees Cook5-9/+6
A section name for .data..ro_after_init was added by both: commit d07a980c1b8d ("s390: add proper __ro_after_init support") and commit d7c19b066dcf ("mm: kmemleak: scan .data.ro_after_init") The latter adds incorrect wrapping around the existing s390 section, and came later. I'd prefer the s390 naming, so this moves the s390-specific name up to the asm-generic/sections.h and renames the section as used by kmemleak (and in the future, kernel/extable.c). Link: http://lkml.kernel.org/r/20170327192213.GA129375@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390 parts] Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Eddie Kovsky <ewk@edkovsky.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()Naoya Horiguchi1-2/+4
I found the race condition which triggers the following bug when move_pages() and soft offline are called on a single hugetlb page concurrently. Soft offlining page 0x119400 at 0x700000000000 BUG: unable to handle kernel paging request at ffffea0011943820 IP: follow_huge_pmd+0x143/0x190 PGD 7ffd2067 PUD 7ffd1067 PMD 0 [61163.582052] Oops: 0000 [#1] SMP Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check] CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P OE 4.11.0-rc2-mm1+ #2 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:follow_huge_pmd+0x143/0x190 RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202 RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000 RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80 RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000 R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800 R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000 FS: 00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0 Call Trace: follow_page_mask+0x270/0x550 SYSC_move_pages+0x4ea/0x8f0 SyS_move_pages+0xe/0x10 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fc976e03949 RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949 RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827 RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004 R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650 R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000 Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0 CR2: ffffea0011943820 ---[ end trace e4f81353a2d23232 ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled This bug is triggered when pmd_present() returns true for non-present hugetlb, so fixing the present check in follow_huge_pmd() prevents it. Using pmd_present() to determine present/non-present for hugetlb is not correct, because pmd_present() checks multiple bits (not only _PAGE_PRESENT) for historical reason and it can misjudge hugetlb state. Fixes: e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()") Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm: workingset: fix premature shadow node shrinking with cgroupsJohannes Weiner1-1/+1
Commit 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware") enabled cgroup-awareness in the shadow node shrinker, but forgot to also enable cgroup-awareness in the list_lru the shadow nodes sit on. Consequently, all shadow nodes are sitting on a global (per-NUMA node) list, while the shrinker applies the limits according to the amount of cache in the cgroup its shrinking. The result is excessive pressure on the shadow nodes from cgroups that have very little cache. Enable memcg-mode on the shadow node LRUs, such that per-cgroup limits are applied to per-cgroup lists. Fixes: 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware") Link: http://lkml.kernel.org/r/20170322005320.8165-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov@tarantool.org> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>