linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2015-10-22	mISDN: fix OOM condition for sending queued I-Frames	Karsten Keil	1	-34/+20
	The old code did not check the return value of skb_clone(). The extra skb_clone() is not needed at all, if using skb_realloc_headroom() instead, which gives us a private copy with enough headroom as well. We need to requeue the original skb if the call failed, because we cannot inform upper layers about the data loss. Restructure the code to minimise rollback effort if it happens. This fix kernel bug #86091 Thanks to Insu Yun <wuninsu@gmail.com> to remind me on this issue. Signed-off-by: Karsten Keil <keil@b1-systems.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22	ISDN: fix OOM condition for sending queued I-Frames	Karsten Keil	1	-12/+8
	The skb_clone() return value was not checked and the skb_realloc_headroom() usage was wrong, the old skb was not freed. It turned out, that the skb_clone is not needed at all, the skb_realloc_headroom() will create a private copy with enough headroom and the original SKB can be used for the ACK queue. We need to requeue the original skb if the call failed, since the upper layer cannot be informed about memory shortage. Thanks to Insu Yun <wuninsu@gmail.com> to remind me on this issue. Signed-off-by: Karsten Keil <keil@b1-systems.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22	VSOCK: sock_put wasn't safe to call in interrupt context	Jorgen Hansen	2	-91/+86
	In the vsock vmci_transport driver, sock_put wasn't safe to call in interrupt context, since that may call the vsock destructor which in turn calls several functions that should only be called from process context. This change defers the callling of these functions to a worker thread. All these functions were deallocation of resources related to the transport itself. Furthermore, an unused callback was removed to simplify the cleanup. Multiple customers have been hitting this issue when using VMware tools on vSphere 2015. Also added a version to the vmci transport module (starting from 1.0.2.0-k since up until now it appears that this module was sharing version with vsock that is currently at 1.0.1.0-k). Reviewed-by: Aditya Asarwade <asarwade@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22	netlink: fix locking around NETLINK_LIST_MEMBERSHIPS	David Herrmann	1	-2/+2
	Currently, NETLINK_LIST_MEMBERSHIPS grabs the netlink table while copying the membership state to user-space. However, grabing the netlink table is effectively a write_lock_irq(), and as such we should not be triggering page-faults in the critical section. This can be easily reproduced by the following snippet: int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); void p = mmap(0, 4096, PROT_READ\|PROT_WRITE, MAP_PRIVATE\|MAP_ANON, -1, 0); int r = getsockopt(s, 0x10e, 9, p, (void)((char*)p + 4092)); This should work just fine, but currently triggers EFAULT and a possible WARN_ON below handle_mm_fault(). Fix this by reducing locking of NETLINK_LIST_MEMBERSHIPS to a read-side lock. The write-lock was overkill in the first place, and the read-lock allows page-faults just fine. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22	net: phy: dp83848: Add TI DP83848 Ethernet PHY	Andrew F. Davis	3	-0/+105
	Add support for the TI DP83848 Ethernet PHY device. The DP83848 is a highly reliable, feature rich, IEEE 802.3 compliant single port 10/100 Mb/s Ethernet Physical Layer Transceiver supporting the MII and RMII interfaces. Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: Dan Murphy <dmurphy@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	net: sun4i-emac: Properly free resources on probe failure and remove	Hans de Goede	1	-4/+16
	Fix sun4i-emac not releasing the following resources: -iomapped memory not released on probe-failure nor on remove -clock not getting disabled on probe-failure nor on remove -sram not being released on remove And while at it also add error checking to the clk_prepare_enable call done on probe. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	openvswitch: Serialize nested ct actions if provided	Joe Stringer	1	-4/+10
	If userspace provides a ct action with no nested mark or label, then the storage for these fields is zeroed. Later when actions are requested, such zeroed fields are serialized even though userspace didn't originally specify them. Fix the behaviour by ensuring that no action is serialized in this case, and reject actions where userspace attempts to set these fields with mask=0. This should make netlink marshalling consistent across deserialization/reserialization. Reported-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	openvswitch: Mark connections new when not confirmed.	Joe Stringer	1	-5/+5
	New, related connections are marked as such as part of ovs_ct_lookup(), but they are not marked as "new" if the commit flag is used. Make this consistent by setting the "new" flag whenever !nf_ct_is_confirmed(ct). Reported-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	openvswitch: Clarify conntrack COMMIT behaviour	Joe Stringer	1	-1/+2
	The presence of this attribute does not modify the ct_state for the current packet, only future packets. Make this more clear in the header definition. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	openvswitch: Reject ct_state masks for unknown bits	Joe Stringer	2	-12/+9
	Currently, 0-bits are generated in ct_state where the bit position is undefined, and matches are accepted on these bit-positions. If userspace requests to match the 0-value for this bit then it may expect only a subset of traffic to match this value, whereas currently all packets will have this bit set to 0. Fix this by rejecting such masks. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	tcp: remove improper preemption check in tcp_xmit_probe_skb()	Renato Westphal	1	-1/+1
	Commit e520af48c7e5a introduced the following bug when setting the TCP_REPAIR sockoption: [ 2860.657036] BUG: using __this_cpu_add() in preemptible [00000000] code: daemon/12164 [ 2860.657045] caller is __this_cpu_preempt_check+0x13/0x20 [ 2860.657049] CPU: 1 PID: 12164 Comm: daemon Not tainted 4.2.3 #1 [ 2860.657051] Hardware name: Dell Inc. PowerEdge R210 II/0JP7TR, BIOS 2.0.5 03/13/2012 [ 2860.657054] ffffffff81c7f071 ffff880231e9fdf8 ffffffff8185d765 0000000000000002 [ 2860.657058] 0000000000000001 ffff880231e9fe28 ffffffff8146ed91 ffff880231e9fe18 [ 2860.657062] ffffffff81cd1a5d ffff88023534f200 ffff8800b9811000 ffff880231e9fe38 [ 2860.657065] Call Trace: [ 2860.657072] [<ffffffff8185d765>] dump_stack+0x4f/0x7b [ 2860.657075] [<ffffffff8146ed91>] check_preemption_disabled+0xe1/0xf0 [ 2860.657078] [<ffffffff8146edd3>] __this_cpu_preempt_check+0x13/0x20 [ 2860.657082] [<ffffffff817e0bc7>] tcp_xmit_probe_skb+0xc7/0x100 [ 2860.657085] [<ffffffff817e1e2d>] tcp_send_window_probe+0x2d/0x30 [ 2860.657089] [<ffffffff817d1d8c>] do_tcp_setsockopt.isra.29+0x74c/0x830 [ 2860.657093] [<ffffffff817d1e9c>] tcp_setsockopt+0x2c/0x30 [ 2860.657097] [<ffffffff81767b74>] sock_common_setsockopt+0x14/0x20 [ 2860.657100] [<ffffffff817669e1>] SyS_setsockopt+0x71/0xc0 [ 2860.657104] [<ffffffff81865172>] entry_SYSCALL_64_fastpath+0x16/0x75 Since tcp_xmit_probe_skb() can be called from process context, use NET_INC_STATS() instead of NET_INC_STATS_BH(). Fixes: e520af48c7e5 ("tcp: add TCPWinProbe and TCPKeepAlive SNMP counters") Signed-off-by: Renato Westphal <renatow@taghos.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	tipc: conditionally expand buffer headroom over udp tunnel	Jon Paul Maloy	1	-0/+5
	In commit d999297c3dbbe ("tipc: reduce locking scope during packet reception") we altered the packet retransmission function. Since then, when restransmitting packets, we create a clone of the original buffer using __pskb_copy(skb, MIN_H_SIZE), where MIN_H_SIZE is the size of the area we want to have copied, but also the smallest possible TIPC packet size. The value of MIN_H_SIZE is 24. Unfortunately, __pskb_copy() also has the effect that the headroom of the cloned buffer takes the size MIN_H_SIZE. This is too small for carrying the packet over the UDP tunnel bearer, which requires a minimum headroom of 28 bytes. A change to just use pskb_copy() lets the clone inherit the original headroom of 80 bytes, but also assumes that the copied data area is of at least that size, something that is not always the case. So that is not a viable solution. We now fix this by adding a check for sufficient headroom in the transmit function of udp_media.c, and expanding it when necessary. Fixes: commit d999297c3dbbe ("tipc: reduce locking scope during packet reception") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	net: cavium: change NET_VENDOR_CAVIUM to bool	Andreas Schwab	1	-1/+1
	CONFIG_NET_VENDOR_CAVIUM is only used to hide/show config options and to include subdirectories in the build, so it doesn't make sense to make it tristate. Signed-off-by: Andreas Schwab <schwab@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	tipc: allow non-linear first fragment buffer	Jon Paul Maloy	1	-3/+9
	The current code for message reassembly is erroneously assuming that the the first arriving fragment buffer always is linear, and then goes ahead resetting the fragment list of that buffer in anticipation of more arriving fragments. However, if the buffer already happens to be non-linear, we will inadvertently drop the already attached fragment list, and later on trig a BUG() in __pskb_pull_tail(). We see this happen when running fragmented TIPC multicast across UDP, something made possible since commit d0f91938bede ("tipc: add ip/udp media type") We fix this by not resetting the fragment list when the buffer is non- linear, and by initiatlizing our private fragment list tail pointer to the tail of the existing fragment list. Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	openvswitch: Allocate memory for ovs internal device stats.	James Morse	1	-3/+43
	"openvswitch: Remove vport stats" removed the per-vport statistics, in order to use the netdev's statistics fields. "openvswitch: Fix ovs_vport_get_stats()" fixed the export of these stats to user-space, by using the provided netdev_ops to collate them - but ovs internal devices still use an unallocated dev->tstats field to count packets, which are no longer exported by this api. Allocate the dev->tstats field for ovs internal devices, and wire up ndo_get_stats64 with the original implementation of ovs_vport_get_stats(). On its own, "openvswitch: Fix ovs_vport_get_stats()" fixes the OOPs, unmasking a full-on panic on arm64: =============%<============== [<ffffffbffc00ce4c>] internal_dev_recv+0xa8/0x170 [openvswitch] [<ffffffbffc0008b4>] do_output.isra.31+0x60/0x19c [openvswitch] [<ffffffbffc000bf8>] do_execute_actions+0x208/0x11c0 [openvswitch] [<ffffffbffc001c78>] ovs_execute_actions+0xc8/0x238 [openvswitch] [<ffffffbffc003dfc>] ovs_packet_cmd_execute+0x21c/0x288 [openvswitch] [<ffffffc0005e8c5c>] genl_family_rcv_msg+0x1b0/0x310 [<ffffffc0005e8e60>] genl_rcv_msg+0xa4/0xe4 [<ffffffc0005e7ddc>] netlink_rcv_skb+0xb0/0xdc [<ffffffc0005e8a94>] genl_rcv+0x38/0x50 [<ffffffc0005e76c0>] netlink_unicast+0x164/0x210 [<ffffffc0005e7b70>] netlink_sendmsg+0x304/0x368 [<ffffffc0005a21c0>] sock_sendmsg+0x30/0x4c [SNIP] Kernel panic - not syncing: Fatal exception in interrupt =============%<============== Fixes: 8c876639c985 ("openvswitch: Remove vport stats.") Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	net: Really fix vti6 with oif in dst lookups	David Ahern	2	-1/+5
	6e28b000825d ("net: Fix vti use case with oif in dst lookups for IPv6") is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them. Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	tipc: extend broadcast link window size	Jon Paul Maloy	1	-3/+5
	The default fix broadcast window size is currently set to 20 packets. This is a very low value, set at a time when we were still testing on 10 Mb/s hubs, and a change to it is long overdue. Commit 7845989cb4b3da1db ("net: tipc: fix stall during bclink wakeup procedure") revealed a problem with this low value. For messages of importance LOW, the backlog queue limit will be calculated to 30 packets, while a single, maximum sized message of 66000 bytes, carried across a 1500 MTU network consists of 46 packets. This leads to the following scenario (among others leading to the same situation): 1: Msg 1 of 46 packets is sent. 20 packets go to the transmit queue, 26 packets to the backlog queue. 2: Msg 2 of 46 packets is attempted sent, but rejected because there is no more space in the backlog queue at this level. The sender is added to the wakeup queue with a "pending packets chain size" number of 46. 3: Some packets in the transmit queue are acked and released. We try to wake up the sender, but the pending size of 46 is bigger than the LOW wakeup limit of 30, so this doesn't happen. 5: Subsequent acks releases all the remaining buffers. Each time we test for the wakeup criteria and find that 46 still is larger than 30, even after both the transmit and the backlog queues are empty. 6: The sender is never woken up and given a chance to send its message. He is stuck. We could now loosen the wakeup criteria (used by link_prepare_wakeup()) to become equal to the send criteria (used by tipc_link_xmit()), i.e., by ignoring the "pending packets chain size" value altogether, or we can just increase the queue limits so that the criteria can be satisfied anyway. There are good reasons (potentially multiple waiting senders) to not opt for the former solution, so we choose the latter one. This commit fixes the problem by giving the broadcast link window a default value of 50 packets. We also introduce a new minimum link window size BCLINK_MIN_WIN of 32, which is enough to always avoid the described situation. Finally, in order to not break any existing users which may set the window explicitly, we enforce that the window is set to the new minimum value in case the user is trying to set it to anything lower. Fixes: 7845989cb4b3da1db ("net: tipc: fix stall during bclink wakeup procedure") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	irda: precedence bug in irlmp_seq_hb_idx()	Dan Carpenter	1	-1/+1
	This is decrementing the pointer, instead of the value stored in the pointer. KASan detects it as an out of bounds reference. Reported-by: "Berry Cheng 程君(成淼)" <chengmiao.cj@alibaba-inc.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	xen-netfront: update num_queues to real created	Joe Jin	1	-7/+7
	Sometimes xennet_create_queues() may failed to created all requested queues, we need to update num_queues to real created to avoid NULL pointer dereference. Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	vsock: fix missing cleanup when misc_register failed	Gao feng	1	-3/+4
	reset transport and unlock if misc_register failed. Signed-off-by: Gao feng <omarapazanadi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	net: mv643xx_eth: Defer writing the first TX descriptor when using TSO	Philipp Kirchhofer	1	-3/+23
	To prevent a race between the TX DMA engine and the CPU the writing of the first transmit descriptor must be deferred until all following descriptors have been updated. The network card may otherwise start transmitting before all packet descriptors are set up correctly, which leads to data corruption or an aborted transmit operation. This deferral is already done in the non-TSO TX path, implement it also in the TSO TX path. Signed-off-by: Philipp Kirchhofer <philipp@familie-kirchhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	net: mv643xx_eth: Ensure proper data alignment in TSO TX path	Philipp Kirchhofer	1	-5/+17
	The TX DMA engine requires that buffers with a size of 8 bytes or smaller must be 64 bit aligned. This requirement may be violated when doing TSO, as in this case larger skb frags can be broken up and transmitted in small parts with then inappropriate alignment. Fix this by checking for proper alignment before handing a buffer to the DMA engine. If the data is misaligned realign it by copying it into the TSO header data area. Signed-off-by: Philipp Kirchhofer <philipp@familie-kirchhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	net: phy: smsc: disable energy detect mode	Heiko Schocher	2	-5/+38
	On some boards the energy enable detect mode leads in trouble with some switches, so make the enabling of this mode configurable through DT. Signed-off-by: Heiko Schocher <hs@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	drivers: net: cpsw: add phy-handle parsing	Heiko Schocher	2	-4/+12
	add the ability to parse "phy-handle". This is needed for phys, which have a DT node, and need to parse DT properties. Signed-off-by: Heiko Schocher <hs@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21	bcm63xx_enet: check 1000BASE-T advertisement configuration	Simon Arlott	1	-14/+19
	If a gigabit ethernet PHY is connected to a fast ethernet MAC, then it can detect 1000 support from the partner but not use it. This results in a forced speed of 1000 and RX/TX failure. Check for 1000BASE-T support and then check the advertisement configuration before setting the MAC speed to 1000mbit. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18	net: bcmgenet: Fix early link interrupt enabling	Florian Fainelli	1	-9/+23
	Link interrupts are enabled in init_umac(), which is too early for us to process them since we do not yet have a valid PHY device pointer. On BCM7425 chips for instance, we will crash calling phy_mac_interrupt() because phydev is NULL. Fix this by moving the link interrupts enabling in bcmgenet_netif_start(), under a specific function: bcmgenet_link_intr_enable() and while at it, update the comments surrounding the code. Fixes: 6cc8e6d4dcb36 ("net: bcmgenet: Delay PHY initialization to bcmgenet_open()") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18	tunnels: Don't require remote endpoint or ID during creation.	Jesse Gross	2	-10/+9
	Before lightweight tunnels existed, it really didn't make sense to create a tunnel that was not fully specified, such as without a destination IP address - the resulting packets would go nowhere. However, with lightweight tunnels, the opposite is true - it doesn't make sense to require this information when it will be provided later on by the route. This loosens the requirements for this information. An alternative would be to allow the relaxed version only when COLLECT_METADATA is enabled. However, since there are several variations on this theme (such as NBMA tunnels in GRE), just dropping the restrictions seems the most consistent across tunnels and with the existing configuration. CC: John Linville <linville@tuxdriver.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18	openvswitch: Scrub skb between namespaces	Joe Stringer	1	-0/+9
	If OVS receives a packet from another namespace, then the packet should be scrubbed. However, people have already begun to rely on the behaviour that skb->mark is preserved across namespaces, so retain this one field. This is mainly to address information leakage between namespaces when using OVS internal ports, but by placing it in ovs_vport_receive() it is more generally applicable, meaning it should not be overlooked if other port types are allowed to be moved into namespaces in future. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18	xen-netback: correctly check failed allocation	Insu Yun	1	-0/+6
	Since vzalloc can be failed in memory pressure, writes -ENOMEM to xenstore to indicate error. Signed-off-by: Insu Yun <wuninsu@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18	net: asix: add support for the Billionton GUSB2AM-1G-B USB adapter	Chia-Sheng Chang	2	-0/+5
	Just another AX88178-based 10/100/1000 USB-to-Ethernet dongle. This one shows up in lsusb as: "ID 08dd:0114 Billionton Systems, Inc". Signed-off-by: Chia-Sheng Chang <changchias@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Luca Ceresoli <luca@lucaceresoli.net> Cc: Christoph Jaeger <cj@linux.com> Cc: "Woojung.Huh@microchip.com" <Woojung.Huh@microchip.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Markus Elfring <elfring@users.sourceforge.net> Cc: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Cc: netdev@vger.kernel.org Cc: linux-usb@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18	netlink: Trim skb to alloc size to avoid MSG_TRUNC	Arad, Ronen	1	-12/+22
	netlink_dump() allocates skb based on the calculated min_dump_alloc or a per socket max_recvmsg_len. min_alloc_size is maximum space required for any single netdev attributes as calculated by rtnl_calcit(). max_recvmsg_len tracks the user provided buffer to netlink_recvmsg. It is capped at 16KiB. The intention is to avoid small allocations and to minimize the number of calls required to obtain dump information for all net devices. netlink_dump packs as many small messages as could fit within an skb that was sized for the largest single netdev information. The actual space available within an skb is larger than what is requested. It could be much larger and up to near 2x with align to next power of 2 approach. Allowing netlink_dump to use all the space available within the allocated skb increases the buffer size a user has to provide to avoid truncaion (i.e. MSG_TRUNG flag set). It was observed that with many VLANs configured on at least one netdev, a larger buffer of near 64KiB was necessary to avoid "Message truncated" error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when min_alloc_size was only little over 32KiB. This patch trims skb to allocated size in order to allow the user to avoid truncation with more reasonable buffer size. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18	Linux 4.3-rc6	Linus Torvalds	1	-1/+1

2015-10-18	i2c: designware: Do not use parameters from ACPI on Dell Inspiron 7348	Mika Westerberg	1	-0/+20
	ACPI SSCN/FMCN methods were originally added because then the platform can provide the most accurate HCNT/LCNT values to the driver. However, this seems not to be true for Dell Inspiron 7348 where using these causes the touchpad to fail in boot: i2c_hid i2c-DLL0675:00: failed to retrieve report from device. i2c_designware INT3433:00: i2c_dw_handle_tx_abort: lost arbitration i2c_hid i2c-DLL0675:00: failed to retrieve report from device. i2c_designware INT3433:00: controller timed out The values received from ACPI are (in fast mode): HCNT: 72 LCNT: 160 this translates to following timings (input clock is 100MHz on Broadwell): tHIGH: 720 ns (spec min 600 ns) tLOW: 1600 ns (spec min 1300 ns) Bus period: 2920 ns (assuming 300 ns tf and tr) Bus speed: 342.5 kHz Both tHIGH and tLOW are within the I2C specification. The calculated values when ACPI parameters are not used are (in fast mode): HCNT: 87 LCNT: 159 which translates to: tHIGH: 870 ns (spec min 600 ns) tLOW: 1590 ns (spec min 1300 ns) Bus period 3060 ns (assuming 300 ns tf and tr) Bus speed 326.8 kHz These values are also within the I2C specification. Since both ACPI and calculated values meet the I2C specification timing requirements it is hard to say why the touchpad does not function properly with the ACPI values except that the bus speed is higher in this case (but still well below the max 400kHz). Solve this by adding DMI quirk to the driver that disables using ACPI parameters on this particulare machine. Reported-by: Pavel Roskin <plroskin@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Pavel Roskin <plroskin@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2015-10-17	net: add pfmemalloc check in sk_add_backlog()	Eric Dumazet	1	-0/+8
	Greg reported crashes hitting the following check in __sk_backlog_rcv() BUG_ON(!sock_flag(sk, SOCK_MEMALLOC)); The pfmemalloc bit is currently checked in sk_filter(). This works correctly for TCP, because sk_filter() is ran in tcp_v[46]_rcv() before hitting the prequeue or backlog checks. For UDP or other protocols, this does not work, because the sk_filter() is ran from sock_queue_rcv_skb(), which might be called _after_ backlog queuing if socket is owned by user by the time packet is processed by softirq handler. Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Greg Thelen <gthelen@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-17	netfilter: ipset: Fix sleeping memory allocation in atomic context	Nikolay Borisov	1	-1/+1
	Commit 00590fdd5be0 introduced RCU locking in list type and in doing so introduced a memory allocation in list_set_add, which is done in an atomic context, due to the fact that ipset rcu list modifications are serialised with a spin lock. The reason why we can't use a mutex is that in addition to modifying the list with ipset commands, it's also being modified when a particular ipset rule timeout expires aka garbage collection. This gc is triggered from set_cleanup_entries, which in turn is invoked from a timer thus requiring the lock to be bh-safe. Concretely the following call chain can lead to "sleeping function called in atomic context" splat: call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL). And since GFP_KERNEL allows initiating direct reclaim thus potentially sleeping in the allocation path. To fix the issue change the allocation type to GFP_ATOMIC, to correctly reflect that it is occuring in an atomic context. Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list type") Signed-off-by: Nikolay Borisov <kernel@kyup.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16	sh: add copy_user_page() alias for __copy_user()	Ross Zwisler	1	-0/+1
	copy_user_page() is needed by DAX. Without this we get a compile error for DAX on SH: fs/dax.c:280:2: error: implicit declaration of function `copy_user_page' [-Werror=implicit-function-declaration] copy_user_page(vto, (void __force *)vfrom, vaddr, to); ^ This was done with a random config that happened to include DAX support. This patch has only been compile tested. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-16	lib/Kconfig: ZLIB_DEFLATE must select BITREVERSE	Andrew Morton	1	-0/+1
	lib/built-in.o: In function `__bitrev32': deftree.c:(.text+0x1e799): undefined reference to `byte_rev_table' deftree.c:(.text+0x1e7a0): undefined reference to `byte_rev_table' deftree.c:(.text+0x1e7b4): undefined reference to `byte_rev_table' deftree.c:(.text+0x1e7c1): undefined reference to `byte_rev_table' Anything which uses bitrevX() has to select BITREVERSE, to grab lib/bitrev.o. Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-16	mm, dax: fix DAX deadlocks	Ross Zwisler	2	-41/+31
	The following two locking commits in the DAX code: commit 843172978bb9 ("dax: fix race between simultaneous faults") commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX") introduced a number of deadlocks and other issues which need to be fixed for the v4.3 kernel. The list of issues in DAX after these commits (some newly introduced by the commits, some preexisting) can be found here: https://lkml.org/lkml/2015/9/25/602 (Subject: "Re: [PATCH] dax: fix deadlock in __dax_fault"). This undoes most of the changes introduced by those two commits, essentially returning us to the DAX locking scheme that was used in v4.2. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Tested-by: Dave Chinner <dchinner@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-16	memcg: convert threshold to bytes	Shaohua Li	1	-0/+1
	page_counter_memparse() returns pages for the threshold, while mem_cgroup_usage() returns bytes for memory usage. Convert the threshold to bytes. Fixes: 3e32cb2e0a12b6915 ("memcg: rename cgroup_event to mem_cgroup_event"). Signed-off-by: Shaohua Li <shli@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-16	builddeb: remove debian/files before build	Riku Voipio	1	-2/+2
	Commit 3716001bcb7f ("deb-pkg: add source package") added the ability to create a debian changelog file. This exposed that previously the builddeb script hasn't cleared debian/files between builds. As debian/files keeps accumulating entries, the changes file will end up growing indefinelty. With outdated entries in debian/files, builddeb script will exit with failure. This regression impacts those who use "make deb-pkg" target to build kernel into a .deb package and never use "make mrproper" or other means to clean kernel tree from generated directories. To fix the regression, remove debian/files before starting build and in the generated clean rule. Fixes: 3716001bcb7f ("deb-pkg: add source package") Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Reported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net> Tested-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Ben Hutchings <ben@decadent.org.uk> Cc: Michal Marek <mmarek@suse.cz> Cc: maximilian attems <maks@stro.at> Cc: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-16	mm, fs: obey gfp_mapping for add_to_page_cache()	Michal Hocko	6	-18/+22
	Commit 6afdb859b710 ("mm: do not ignore mapping_gfp_mask in page cache allocation paths") has caught some users of hardcoded GFP_KERNEL used in the page cache allocation paths. This, however, wasn't complete and there were others which went unnoticed. Dave Chinner has reported the following deadlock for xfs on loop device: : With the recent merge of the loop device changes, I'm now seeing : XFS deadlock on my single CPU, 1GB RAM VM running xfs/073. : : The deadlocked is as follows: : : kloopd1: loop_queue_read_work : xfs_file_iter_read : lock XFS inode XFS_IOLOCK_SHARED (on image file) : page cache read (GFP_KERNEL) : radix tree alloc : memory reclaim : reclaim XFS inodes : log force to unpin inodes : <wait for log IO completion> : : xfs-cil/loop1: <does log force IO work> : xlog_cil_push : xlog_write : <loop issuing log writes> : xlog_state_get_iclog_space() : <blocks due to all log buffers under write io> : <waits for IO completion> : : kloopd1: loop_queue_write_work : xfs_file_write_iter : lock XFS inode XFS_IOLOCK_EXCL (on image file) : <wait for inode to be unlocked> : : i.e. the kloopd, with it's split read and write work queues, has : introduced a dependency through memory reclaim. i.e. that writes : need to be able to progress for reads make progress. : : The problem, fundamentally, is that mpage_readpages() does a : GFP_KERNEL allocation, rather than paying attention to the inode's : mapping gfp mask, which is set to GFP_NOFS. : : The didn't used to happen, because the loop device used to issue : reads through the splice path and that does: : : error = add_to_page_cache_lru(page, mapping, index, : GFP_KERNEL & mapping_gfp_mask(mapping)); This has changed by commit aa4d86163e4 ("block: loop: switch to VFS ITER_BVEC"). This patch changes mpage_readpage{s} to follow gfp mask set for the mapping. There are, however, other places which are doing basically the same. lustre:ll_dir_filler is doing GFP_KERNEL from the function which apparently uses GFP_NOFS for other allocations so let's make this consistent. cifs:readpages_get_pages is called from cifs_readpages and __cifs_readpages_from_fscache called from the same path obeys mapping gfp. ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well regardless it uses mapping_gfp_mask for the page allocation. ext4_mpage_readpages is the called from the page cache allocation path same as read_pages and read_cache_pages As I've noticed in my previous post I cannot say I would be happy about sprinkling mapping_gfp_mask all over the place and it sounds like we should drop gfp_mask argument altogether and use it internally in __add_to_page_cache_locked that would require all the filesystems to use mapping gfp consistently which I am not sure is the case here. From a quick glance it seems that some file system use it all the time while others are selective. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Dave Chinner <david@fromorbit.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Ming Lei <ming.lei@canonical.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-16	rbd: use writefull op for object size writes	Ilya Dryomov	2	-6/+16
	This covers only the simplest case - an object size sized write, but it's still useful in tiering setups when EC is used for the base tier as writefull op can be proxied, saving an object promotion. Even though updating ceph_osdc_new_request() to allow writefull should just be a matter of fixing an assert, I didn't do it because its only user is cephfs. All other sites were updated. Reflects ceph.git commit 7bfb7f9025a8ee0d2305f49bf0336d2424da5b5b. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-10-16	rbd: set max_sectors explicitly	Ilya Dryomov	1	-0/+1
	Commit 30e2bc08b2bb ("Revert "block: remove artifical max_hw_sectors cap"") restored a clamp on max_sectors. It's now 2560 sectors instead of 1024, but it's not good enough: we set max_hw_sectors to rbd object size because we don't want object sized I/Os to be split, and the default object size is 4M. So, set max_sectors to max_hw_sectors in rbd at queue init time. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-10-16	timekeeping: Increment clock_was_set_seq in timekeeping_init()	Thomas Gleixner	1	-1/+1
	timekeeping_init() can set the wall time offset, so we need to increment the clock_was_set_seq counter. That way hrtimers will pick up the early offset immediately. Otherwise on a machine which does not set wall time later in the boot process the hrtimer offset is stale at 0 and wall time timers are going to expire with a delay of 45 years. Fixes: 868a3e915f7f "hrtimer: Make offset update smarter" Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stefan Liebler <stli@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
2015-10-16	genirq/msi: Do not use pci_msi_[un]mask_irq as default methods	Marc Zyngier	2	-5/+5
	When we create a generic MSI domain, that MSI_FLAG_USE_DEF_CHIP_OPS is set, and that any of .mask or .unmask are NULL in the irq_chip structure, we set them to pci_msi_[un]mask_irq. This is a bad idea for at least two reasons: - PCI_MSI might not be selected, kernel fails to build (yes, this is legitimate, at least on arm64!) - This may not be a PCI/MSI domain at all (platform MSI, for example) Either way, this looks wrong. Move the overriding of mask/unmask to the PCI counterpart, and panic is any of these two methods is not set in the core code (they really should be present). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: http://lkml.kernel.org/r/1444760085-27857-1-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-16	via-rhine: fix VLAN receive handling regression.	Andrej Ota	1	-1/+2
	Because eth_type_trans() consumes ethernet header worth of bytes, a call to read TCI from end of packet using rhine_rx_vlan_tag() no longer works as it's reading from an invalid offset. Tested to be working on PCEngines Alix board. Fixes: 810f19bcb862 ("via-rhine: add consistent memory barrier in vlan receive code.") Signed-off-by: Andrej Ota <andrej@ota.si> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16	ipv6: Initialize rt6_info properly in ip6_blackhole_route()	Martin KaFai Lau	1	-15/+5
	ip6_blackhole_route() does not initialize the newly allocated rt6_info properly. This patch: 1. Call rt6_info_init() to initialize rt6i_siblings and rt6i_uncached 2. The current rt->dst._metrics init code is incorrect: - 'rt->dst._metrics = ort->dst._metris' is not always safe - Not sure what dst_copy_metrics() is trying to do here considering ip6_rt_blackhole_cow_metrics() always returns NULL Fix: - Always do dst_copy_metrics() - Replace ip6_rt_blackhole_cow_metrics() with dst_cow_metrics_generic() 3. Mask out the RTF_PCPU bit from the newly allocated blackhole route. This bug triggers an oops (reported by Phil Sutter) in rt6_get_cookie(). It is because RTF_PCPU is set while rt->dst.from is NULL. Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Phil Sutter <phil@nwl.cc> Tested-by: Phil Sutter <phil@nwl.cc> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: Phil Sutter <phil@nwl.cc> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16	ipv6: Move common init code for rt6_info to a new function rt6_info_init()	Martin KaFai Lau	1	-6/+11
	Introduce rt6_info_init() to do the common init work for 'struct rt6_info' (after calling dst_alloc). It is a prep work to fix the rt6_info init logic in the ip6_blackhole_route(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: Phil Sutter <phil@nwl.cc> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16	Bluetooth: Fix initializing conn_params in scan phase	Jakub Pawlowski	2	-8/+20
	This patch makes sure that conn_params that were created just for explicit_connect, will get properly deleted during cleanup. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-16	Bluetooth: Fix conn_params list update in hci_connect_le_scan_cleanup	Johan Hedberg	1	-4/+19
	After clearing the params->explicit_connect variable the parameters may need to be either added back to the right list or potentially left absent from both the le_reports and the le_conns lists. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>