linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2016-06-14	ipv6: fix checksum annotation in udp6_csum_init	Hannes Frederic Sowa	1	-2/+5
	Cc: Tom Herbert <tom@herbertland.com> Fixes: 4068579e1e098fa ("net: Implmement RFC 6936 (zero RX csums for UDP/IPv6") Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14	ipv6: tcp: fix endianness annotation in tcp_v6_send_response	Hannes Frederic Sowa	1	-2/+2
	Cc: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Fixes: 1d13a96c74fc ("ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT") Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14	ipv6: fix endianness error in icmpv6_err	Hannes Frederic Sowa	1	-1/+1
	The IPv6 ping socket error handler doesn't correctly convert the new 32-bit MTU to host endianness before using it. Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 6d0bfe22611602f ("net: ipv6: Add IPv6 support to the ping socket.") Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
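The endianness problem described in this message can be illustrated with a small sketch. It is not the kernel code: the function names are hypothetical, and the kernel itself would use a helper such as ntohl() for the conversion. The sketch only shows what happens when a 32-bit value in network (big-endian) byte order is read on a little-endian host without conversion.

```python
import struct


def mtu_from_wire(raw: bytes) -> int:
    """Decode a 32-bit MTU field carried in network (big-endian) byte order."""
    (mtu,) = struct.unpack("!I", raw)
    return mtu


def mtu_misread_le(raw: bytes) -> int:
    """Hypothetical buggy read: a little-endian host using the raw bytes as-is."""
    (mtu,) = struct.unpack("<I", raw)
    return mtu


# 1280 is the IPv6 minimum link MTU; encode it as it would appear on the wire.
wire = struct.pack("!I", 1280)
```

With the conversion the value is 1280; without it, a little-endian host would see 327680 for the same four bytes.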
2016-06-14	phy: marvell: fix LED configuration via marvell,reg-init	Clemens Gruber	1	-12/+26
	Configuring the PHY LED registers for the Marvell 88E1510 and others is not possible, because regardless of the values in marvell,reg-init, it is later overridden in m88e1121_config_aneg with a non-standard default. This patch moves that default configuration to .config_init to allow setting the LED configuration through marvell,reg-init in the device tree, which should override said default if it exists. Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-12	net: ethernet: ti: cpsw: use destroy ctlr to destroy channels	Ivan Khoronzhuk	1	-4/+0
	There is no reason to destroy channels one by one, because they are all destroyed when cpdma_ctlr is destroyed. There is therefore no need to remember how many channels were created. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-12	net_sched: prio: properly report out of memory errors	Eric Dumazet	1	-20/+12
	At Qdisc creation or change time, prio_tune() creates missing pfifo qdiscs but does not return an error code if one qdisc could not be allocated. Leaving a qdisc in non operational state without telling user anything about this problem is not good. Also, testing if we replace something different than noop_qdisc a second time makes no sense so I removed useless code. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11	ipconfig: Protect ic_addrservaddr with IPCONFIG_DYNAMIC.	David S. Miller	1	-0/+2
	>> net/ipv4/ipconfig.c:130:15: warning: 'ic_addrservaddr' defined but not used [-Wunused-variable] static __be32 ic_addrservaddr = NONE; /* IP Address of the IP addresses'server */ Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: au1000_eth: fix PHY detection	Manuel Lauss	1	-2/+2
	Commit 7f854420fbfe9d49afe2ffb1df052cfe8e215541 ("phy: Add API for {un}registering an mdio device to a bus.") broke PHY detection on this driver with a copy-paste bug: The code is looking 32 times for a PHY at address 0. Fixes ethernet on AMD DB1100/DB1500/DB1550 boards which have their (autodetected) PHYs at address 31. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	Revert "net: au1000_eth: fix PHY detection"	David S. Miller	1	-6/+7
	This reverts commit a2f27217e4e60e663b5b971b0ccb287a9548b04e. I applied the wrong version of this. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: au1000_eth: fix PHY detection	Manuel Lauss	1	-7/+6
	Commit 7f854420fbfe9d49afe2ffb1df052cfe8e215541 ("phy: Add API for {un}registering an mdio device to a bus.") broke PHY detection on this driver with a copy-paste bug: The code is looking 32 times for a PHY at address 0. Fixes ethernet on AMD DB1100/DB1500/DB1550 boards which have their (autodetected) PHYs at address 31. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: remove superfluous queue wake up call	John Crispin	1	-5/+2
	The code checks if the queue should be stopped because we are below the threshold of free descriptors only to check if it should be started again. If we do end up in a state where we are at the threshold limit, it makes more sense to just stop the queue and wait for the next IRQ to trigger the TX housekeeping again. There is no rush in enqueuing the next packet, it needs to wait for all the others in the queue to be dispatched first anyway. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: only wake the queue if it is stopped	John Crispin	1	-1/+16
	The current code unconditionally wakes up the queue at the end of each tx_poll action. Change the code to only wake up the queues if any of them have actually been stopped before. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: fix off by one in the TX ring allocation	John Crispin	1	-2/+1
	The TX ring setup has an off by one error causing it to not utilise all descriptors. This has the side effect that we need to reset the next pointer at runtime to make it work. Fix the off by one and remove the code fixing the ring at runtime. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: increase watchdog_timeo	John Crispin	1	-1/+1
	During stress testing, after reducing the threshold value, we have seen TX timeouts that were caused by the watchdog_timeo value being too low. Increase the value to 5 * HZ which is a value commonly used by many other drivers. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: fix threshold value	John Crispin	1	-2/+1
	The logic to calculate the threshold value for stopping the TX queue is bad. Currently it will always use 1/2 of the ring's size, which is way too much. Set the threshold to MAX_SKB_FRAGS. This makes sure that the queue is stopped when there is not enough room to accept an additional segment. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: disable all interrupts during probe	John Crispin	1	-1/+1
	The current code only disables those IRQs that we will later use. To ensure that we have a predefined state, we really want to disable all IRQs. Change the code to disable all IRQs to achieve this. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: add next data pointer coherency protection	John Crispin	2	-1/+2
	The QDMA engine can fail to update the register pointing to the next TX descriptor if the next data pointer coherency protection bit does not get set in the QDMA configuration register. Not setting this bit can result in invalid values inside the TX ring's registers, which will cause TX stalls. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: dropped rx packets are not being counted properly	John Crispin	1	-0/+2
	There are two places inside mtk_poll_rx where rx_dropped is not being incremented properly. Fix this by adding the missing code to increment the counter. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: invalid buffer lookup in mtk_tx_map()	John Crispin	1	-1/+1
	The lookup of the tx_buffer in the error path inside mtk_tx_map() uses the wrong descriptor pointer. This looks like a copy & paste error. Change the code to use the correct pointer. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: fix missing free of scratch memory	John Crispin	2	-5/+15
	Scratch memory gets allocated in mtk_init_fq_dma() but the corresponding code to free it is missing inside mtk_dma_free() causing a memory leak. With this patch applied, we can run ifconfig up/down several thousand times without any problems. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: mediatek: add missing return code check	John Crispin	1	-0/+3
	The code fails to check if the scratch memory was properly allocated. Add this check and return with an error if the allocation failed. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: ipconfig: avoid warning by making ic_addrservaddr static	Ben Dooks	1	-1/+1
	The symbol ic_addrservaddr is not static, but has no declaration to match so make it static to fix the following warning: net/ipv4/ipconfig.c:130:8: warning: symbol 'ic_addrservaddr' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: diag: add missing declarations	Ben Dooks	2	-6/+6
	The functions inet_diag_msg_common_fill and inet_diag_msg_attrs_fill seem to have been missed from the include/linux/inet_diag.h header file. Add them to fix the following warnings: net/ipv4/inet_diag.c:69:6: warning: symbol 'inet_diag_msg_common_fill' was not declared. Should it be static? net/ipv4/inet_diag.c:108:5: warning: symbol 'inet_diag_msg_attrs_fill' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	bridge: Fix incorrect re-injection of STP packets	Ido Schimmel	1	-3/+12
	Commit 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") fixed incorrect usage of NF_HOOK's return value by consuming packets in okfn via br_pass_frame_up(). However, this function re-injects packets to the Rx path with skb->dev set to the bridge device, which breaks kernel's STP, as all STP packets appear to originate from the bridge device itself. Instead, if STP is enabled and bridge isn't a 802.1ad bridge, then learn packet's SMAC and inject it back to the Rx path for further processing by the packet handlers. The patch also makes netfilter's behavior consistent with regards to packets destined to the Bridge Group Address, as no hook registered at LOCAL_IN will ever be called, regardless if STP is enabled or not. Cc: Florian Westphal <fw@strlen.de> Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	net: phy: smsc: reintroduced unconditional soft reset	Manfred Schlaegl	1	-13/+4
	We detected some problems using the smsc lan8720a in combination with i.MX28 and tracked this down to commit 21009686662f ("net: phy: smsc: move smsc_phy_config_init reset part in a soft_reset function") With 2100968666 the generic soft reset is replaced by a specific function which handles power down state correctly. But additionally the soft reset itself got conditional and is therefore also only performed if the phy is in power down state. This patch keeps the conditional wake up from power down, but re-introduces the unconditional soft reset using the generic soft reset function. It was tested on linux-4.1.25 and linux-4.7.0-rc2. Signed-off-by: Manfred Schlaegl <manfred.schlaegl@ginzinger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	vmxnet3: segCnt can be 1 for LRO packets	Shrikrishna Khare	2	-3/+3
	The device emulation may send segCnt of 1 for LRO packets. Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: Jin Heo <heoj@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	packet: compat support for sock_fprog	Willem de Bruijn	3	-2/+41
	Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains a pointer into user memory. If userland is 32-bit and kernel is 64-bit the two disagree about the layout of struct sock_fprog. Add compat setsockopt support to convert a 32-bit compat_sock_fprog to a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for SO_REUSEPORT added in commit 1957598840f4 ("soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF"). Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
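The layout mismatch described in this message can be sketched as follows. This is an illustration only, not the kernel's compat code: it assumes a structure made of a 16-bit length followed by a pointer, with the padding of common 32-bit and 64-bit ABIs, and all names are hypothetical.

```python
import struct

# Assumed layouts (little-endian, explicit padding):
#   32-bit userland: u16 len, 2 pad bytes, 4-byte pointer  ->  8 bytes
#   64-bit kernel:   u16 len, 6 pad bytes, 8-byte pointer  -> 16 bytes
COMPAT_FPROG = struct.Struct("<HxxI")
NATIVE_FPROG = struct.Struct("<H6xQ")


def compat_to_native(buf: bytes) -> bytes:
    """Widen a 32-bit layout into the 64-bit layout, field by field."""
    length, filter_ptr = COMPAT_FPROG.unpack(buf)
    return NATIVE_FPROG.pack(length, filter_ptr)
```

Because the two layouts differ in both size and pointer offset, the 32-bit buffer cannot simply be reinterpreted; each field has to be copied into its new position.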
2016-06-09	stmmac: fix parameter to dwmac4_set_umac_addr()	Ben Dooks	1	-1/+1
	dwmac4_set_umac_addr() takes a struct mac_device_info as the first parameter, but is being passed an ioaddr instead from dwmac4_set_filter(). Fix the warning/bug by changing the first parameter. drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: warning: incorrect type in argument 1 (different address spaces) drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: expected struct mac_device_info hw drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: got void [noderef] <asn:2>ioaddr Note: this was only compile-tested, as I do not have any hardware that uses it. Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5e: Fix blue flame quota logic	Eli Cohen	1	-1/+2
	Blue flame is a latency enhancement feature that allows the driver to write the packet data directly to the NIC's registers thus making the read of the packet data from host memory redundant. We maintain a quota for the blue flame which is reloaded whenever we identify that the hardware is processing send requests and processes them fast enough so by the time we post the next send request it was able to process all the pending ones. This indicates that the hardware is capable of processing more blue flame requests efficiently. The blue flame quota is decremented whenever we send using blue flame. The current code erroneously clears the budget if we did not use blue flame for the current post send operation and we fix it here. Fixes: 88a85f99e51f ('net/mlx5e: TX latency optimization to save DMA reads') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5e: Use ndo_stop explicitly at shutdown flow	Eran Ben Elisha	1	-4/+1
	The current implementation copies the flow of ndo_stop instead of calling it explicitly. Fix it. Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5: E-Switch, always set mc_promisc for allmulti vports	Mohamad Haj Yahia	1	-0/+1
	Set the mc_promisc flag also in the case of adding new mc address to existing allmulti vport. Fixes: a35f71f27a61 ('net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5: E-Switch, Modify node guid on vf set MAC	Noa Osherovich	4	-4/+68
	In RoCE, the RDMA-CM needs the node guid to establish connection between nodes. Today, the node guid exposed to mlx5 Ethernet VFs is zero, therefore RDMA-CM on the VF is broken. Whenever the administrator sets a MAC for a VF, derive the node guid from it and set it as well in the following way: MAC: e4:1d:2d:b3:f4:01 -> node_guid: e4:1d:2d:ff:fe:b3:f4:01 Fixes: 77256579c6b43 ('net/mlx5: E-Switch, Introduce Vport...') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
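The MAC-to-GUID mapping given in this message can be reproduced with a short sketch. The function name is hypothetical and this is not the driver code; it follows only the example in the message, which inserts ff:fe between the third and fourth octets and leaves the other octets unchanged.

```python
def mac_to_node_guid(mac: str) -> str:
    """Derive an 8-octet GUID string from a 6-octet MAC by inserting ff:fe."""
    octets = mac.lower().split(":")
    if len(octets) != 6:
        raise ValueError("expected a MAC address with six octets")
    # Keep the first three octets, insert ff:fe, then append the last three.
    return ":".join(octets[:3] + ["ff", "fe"] + octets[3:])
```

For the address in the message, mac_to_node_guid("e4:1d:2d:b3:f4:01") yields "e4:1d:2d:ff:fe:b3:f4:01".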
2016-06-09	net/mlx5: E-Switch, Fix vport enable flow	Mohamad Haj Yahia	1	-4/+1
	Reorder vport enable flow to mark the vport as enabled before calling the vport change handler which was modified to handle the case for when vport is not enabled. This fixes the case for when the PF netdev is open before sriov is enabled, once sriov is enabled at esw_enable_vport, esw_vport_change_handle_locked didn't read the PF context since it thought the PF vport was not enabled. When we enable the vport, arming for events is not required anymore, since it's done on the vport change handle Fixes: 586cfa7f1d58 ('net/mlx5: E-Switch, Use vport event handler for vport cleanup') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5: E-Switch, Use the correct error check on returned pointers	Or Gerlitz	1	-17/+17
	The mlx5 flow-steering API (mlx5_create_flow_table/group/rule) never returns a null pointer on error. Even if it did, checking for IS_ERR_OR_NULL(p) and then returning PTR_ERR(p) would have caused bugs, since PTR_ERR(NULL) is 0, which reads as success and leads to a crash. To make things more robust and protect against related future bugs, convert all IS_ERR_OR_NULL checks on returned values to IS_ERR. Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs') Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
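The pitfall with a null pointer and PTR_ERR() can be modelled outside the kernel. The sketch below is a simplified model, not the kernel macros themselves: pointers are represented as unsigned 64-bit integers, error pointers are assumed to occupy the top 4095 values of the address space, and all function names are hypothetical stand-ins.

```python
MAX_ERRNO = 4095
WORD = 1 << 64  # model pointers as unsigned 64-bit integers


def err_ptr(errno: int) -> int:
    """Encode a negative errno (for example -12) as an error pointer."""
    return errno % WORD


def is_err(ptr: int) -> bool:
    """True only for values in the assumed error-pointer range."""
    return ptr >= WORD - MAX_ERRNO


def is_err_or_null(ptr: int) -> bool:
    return ptr == 0 or is_err(ptr)


def ptr_err(ptr: int) -> int:
    """Reinterpret the pointer value as a signed integer."""
    return ptr - WORD if ptr >= WORD // 2 else ptr
```

In this model a null pointer passes the is_err_or_null() check, yet ptr_err(0) is 0, so a caller that returns that value reports success although no object was created.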
2016-06-09	net/mlx5: E-Switch, Use the correct free() function	Or Gerlitz	1	-3/+3
	We must use kvfree() for something that could have been allocated with vzalloc(), do that. Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs') Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5: Fix E-Switch flow steering capabilities check	Maor Gottlieb	1	-13/+15
	Add missing capabilities check for E-Switch FDB and ACLs flow tables before creating their namespace in flow steering. Fixes: efdc810ba39d ('net/mlx5: Flow steering, Add vport ACL support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5: Fix flow steering NIC capabilities check	Maor Gottlieb	2	-1/+15
	Flow steering infrastructure is currently used only on link layer ethernet, therefore the driver should initialize the flow steering when the device link layer is ethernet. In addition, add missing capability check before initializing the namespace of NIC RX flow tables. Fixes: 2530236303d9 ('net/mlx5_core: Flow steering tree initialization') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5: Fix root flow table update	Maor Gottlieb	1	-1/+1
	When we destroy the last flow table we need to update the root_ft to NULL. It fixes an issue for when the last flow table is destroyed and recreated again, root_ft pointer will not be updated, as a result traffic will be dropped. Fixes: 2cc43b494a6c ('net/mlx5_core: Managing root flow table') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly	Shahar Klein	2	-3/+2
	Having MLX5_CMD_OP_MAX in another file causes us to repeatedly miss accounting for new commands added to the driver, and hence there are no entries for them in debugfs. To solve that, we integrate it into the commands enum as the last entry. Fixes: 34a40e689393 ('net/mlx5_core: Introduce modify flow table command') Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5: Fix masking of reserved bits in XRCD number	Majd Dibbiny	1	-1/+1
	Mask the reserved bits when reading the number of newly created XRCD. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09	net/mlx5: Fix the size of modify QP mailbox	Majd Dibbiny	1	-0/+1
	Add 16 reserved bytes at the end of mlx5_modify_qp_mbox_in to match the hardware spec definition. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	drm/amdgpu: fix warning with powerplay disabled.	Dave Airlie	1	-1/+1
	This just fixes a warning when you disable powerplay. Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09	mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED	Oleg Drokin	1	-0/+11
	I noticed that the logic in the fadvise64_64 syscall is incorrect for partial pages. While the first page of the region is correctly skipped if it is partial, the last page of the region is mistakenly discarded. This leads to problems for applications that read data in non-page-aligned chunks, discarding already processed data between the reads. A somewhat misguided application that does something like write(XX bytes (non-page-aligned)); drop the data it just wrote; repeat gets a significant penalty in performance as a result. Link: http://lkml.kernel.org/r/1464917140-1506698-1-git-send-email-green@linuxhacker.ru Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
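The intended behaviour, discarding only pages that lie entirely inside the advised region, can be sketched as a page-index calculation. This is an illustration under stated assumptions, not the patch itself: the page size of 4096 bytes and the function name are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def full_page_range(offset: int, length: int):
    """Return (first, last) inclusive indices of pages fully covered by
    [offset, offset + length), or None if no page is fully covered."""
    end = offset + length
    first = (offset + PAGE_SIZE - 1) // PAGE_SIZE  # round the start up
    last = end // PAGE_SIZE - 1                    # round the end down
    if last < first:
        return None
    return first, last
```

For a region of 5000 bytes starting at offset 0, only page 0 qualifies; the partially covered page 1 is left alone, which is the behaviour the message asks for at the end of the region.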
2016-06-09	mm: introduce dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_all	Wang Sheng-Hui	1	-1/+19
	This patch is based on https://patchwork.ozlabs.org/patch/574623/. Tejun submitted commit 23d11a58a9a6 ("workqueue: skip flush dependency checks for legacy workqueues") for the legacy create*_workqueue() interface. But some workqueues created by alloc_workqueue() still report warnings on memory reclaim, e.g. nvme_workq with flag WQ_MEM_RECLAIM set: workqueue: WQ_MEM_RECLAIM nvme:nvme_reset_work is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu ------------[ cut here ]------------ WARNING: CPU: 0 PID: 6 at SoC/linux/kernel/workqueue.c:2448 check_flush_dependency+0xb4/0x10c ... check_flush_dependency+0xb4/0x10c flush_work+0x54/0x140 lru_add_drain_all+0x138/0x188 migrate_prep+0xc/0x18 alloc_contig_range+0xf4/0x350 cma_alloc+0xec/0x1e4 dma_alloc_from_contiguous+0x38/0x40 __dma_alloc+0x74/0x25c nvme_alloc_queue+0xcc/0x36c nvme_reset_work+0x5c4/0xda8 process_one_work+0x128/0x2ec worker_thread+0x58/0x434 kthread+0xd4/0xe8 ret_from_fork+0x10/0x50 That's because lru_add_drain_all() will schedule the drain work on system_wq, whose flag is set to 0, !WQ_MEM_RECLAIM. Introduce a dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_all(), aiding in getting memory freed. Link: http://lkml.kernel.org/r/1464917521-9775-1-git-send-email-shhuiw@foxmail.com Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-09	kernel/relay.c: fix potential memory leak	Zhouyi Zhou	1	-0/+1
	When relay_open_buf() fails in relay_open(), code will goto free_bufs, but chan is nowhere freed. Link: http://lkml.kernel.org/r/1464777927-19675-1-git-send-email-yizhouzhou@ict.ac.cn Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-09	mm: thp: broken page count after commit aa88b68c3b1d	Gerald Schaefer	1	-1/+4
	Christian Borntraeger reported a kernel panic after corrupt page counts, and it turned out to be a regression introduced with commit aa88b68c3b1d ("thp: keep huge zero page pinned until tlb flush"), at least on s390. put_huge_zero_page() was moved over from zap_huge_pmd() to release_pages(), and it was replaced by tlb_remove_page(). However, release_pages() might not always be triggered by (the arch-specific) tlb_remove_page(). On s390 we call free_page_and_swap_cache() from tlb_remove_page(), and not tlb_flush_mmu() -> free_pages_and_swap_cache() like the generic version, because we don't use the MMU-gather logic. Although both functions have very similar names, they do very different things: in particular, free_page_xxx just does a put_page(), while free_pages_xxx calls release_pages(). This of course results in very harmful put_page()s on the huge zero page, on architectures where tlb_remove_page() is implemented in this way. It seems to affect only s390 and sh, but sh doesn't have THP support, so the problem (currently) probably only exists on s390. The following quick hack fixed the issue: Link: http://lkml.kernel.org/r/20160602172141.75c006a9@thinkpad Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <stable@vger.kernel.org> [4.6.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-09	revert "mm: memcontrol: fix possible css ref leak on oom"	Andrew Morton	1	-1/+1
	Revert commit 1383399d7be0 ("mm: memcontrol: fix possible css ref leak on oom"). Johannes points out "There is a task_in_memcg_oom() check before calling mem_cgroup_oom()". Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-09	kasan: change memory hot-add error messages to info messages	Shuah Khan	1	-2/+2
	Change the following memory hot-add error messages to info messages. There is no need for these to be errors. kasan: WARNING: KASAN doesn't support memory hot-add kasan: Memory hot-add will be disabled Link: http://lkml.kernel.org/r/1464794430-5486-1-git-send-email-shuahkh@osg.samsung.com Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-09	mm/hugetlb: fix huge page reserve accounting for private mappings	Mike Kravetz	1	-2/+40
	When creating a private mapping of a hugetlbfs file, it is possible to unmap pages via ftruncate or fallocate hole punch. If subsequent faults repopulate these mappings, the reserve counts will go negative. This is because the code currently assumes all faults to private mappings will consume reserves. The problem can be recreated as follows: - mmap(MAP_PRIVATE) a file in hugetlbfs filesystem - write fault in pages in the mapping - fallocate(FALLOC_FL_PUNCH_HOLE) some pages in the mapping - write fault in pages in the hole This will result in negative huge page reserve counts and negative subpool usage counts for the hugetlbfs. Note that this can also be recreated with ftruncate, but fallocate is more straightforward. This patch modifies the routines vma_needs_reserves and vma_has_reserves to examine the reserve map associated with private mappings similar to that for shared mappings. However, the reserve map semantics for private and shared mappings are very different. This results in subtly different code that is explained in the comments. Link: http://lkml.kernel.org/r/1464720957-15698-1-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-09	mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name()	Ido Schimmel	2	-43/+26
	When rtnl_fill_ifinfo() is called for a certain netdevice it queries its various parameters such as switch id and physical port name. The function might get called in an atomic context, which means the underlying driver must not sleep during the query operation. Don't query the device and sleep during ndo_get_phys_port_name(), but instead store the needed parameters at port creation time. Fixes: 2bf9a58675c5 ("mlxsw: spectrum: Add support for physical port names") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>