linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-04-11	net: vrf: Fix dev refcnt leak due to IPv6 prefix route	David Ahern	1	-0/+10
	ifupdown2 found a kernel bug with IPv6 routes and movement from the main table to the VRF table. Sequence of events: Create the interface and add addresses: ip link add dev eth4.105 link eth4 type vlan id 105 ip addr add dev eth4.105 8.105.105.10/24 ip -6 addr add dev eth4.105 2008:105:105::10/64 At this point IPv6 has inserted a prefix route in the main table even though the interface is 'down'. From there the VRF device is created: ip link add dev vrf105 type vrf table 105 ip addr add dev vrf105 9.9.105.10/32 ip -6 addr add dev vrf105 2000:9:105::10/128 ip link set vrf105 up Then the interface is enslaved, while still in the 'down' state: ip link set dev eth4.105 master vrf105 Since the device is down the VRF driver cycling the device does not send the NETDEV_UP and NETDEV_DOWN but rather the NETDEV_CHANGE event which does not flush the routes inserted prior. When the link is brought up ip link set dev eth4.105 up the prefix route is added in the VRF table, but does not remove the route from the main table. Fix by handling the NETDEV_CHANGEUPPER event similar what was implemented for IPv4 in 7f49e7a38b77 ("net: Flush local routes when device changes vrf association") Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11	net: vrf: Fix dst reference counting	David Ahern	5	-167/+30
	Vivek reported a kernel exception deleting a VRF with an active connection through it. The root cause is that the socket has a cached reference to a dst that is destroyed. Converting the dst_destroy to dst_release and letting proper reference counting kick in does not work as the dst has a reference to the device which needs to be released as well. I talked to Hannes about this at netdev and he pointed out the ipv4 and ipv6 dst handling has dst_ifdown for just this scenario. Rather than continuing with the reinvented dst wheel in VRF just remove it and leverage the ipv4 and ipv6 versions. Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver") Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11	tipc: purge deferred updates from dead nodes	Erik Hugne	1	-0/+19
	If a peer node becomes unavailable, in addition to removing the nametable entries from this node we also need to purge all deferred updates associated with this node. Signed-off-by: Erik Hugne <erik.hugne@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11	tipc: make dist queue pernet	Erik Hugne	3	-9/+11
	Nametable updates received from the network that cannot be applied immediately are placed on a defer queue. This queue is global to the TIPC module, which might cause problems when using TIPC in containers. To prevent nametable updates from escaping into the wrong namespace, we make the queue pernet instead. Signed-off-by: Erik Hugne <erik.hugne@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11	cxgb4: Stop Rx Queues before freeing it up	Hariprasad Shenai	3	-3/+53
	Stop all Ethernet RX Queues before freeing up various Ingress/Egress Queues, etc. We were seeing cases of Ingress Queues not getting serviced during the shutdown process leading to Ingress Paths jamming up through the chip and blocking the shutdown effort itself. One such case involved the Firmware sending a "Flush Token" through the ULP-TX -> ULP-RX path for an Ethernet TX Queue being freed in order to make sure there weren't any remaining TX Work Requests in the pipeline. But the return path was stalled by Ingress Data unable to be delivered to the Host because those Ingress Queues were no longer being serviced. Based on original work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-10	net: stmmac: socfgpa: Ensure emac bit set in System Manger for PTP	Phil Reid	1	-3/+13
	When using the PTP fpga to hps clock source for the stmmac module the appropriate bit in the System Manager FPGA Interface Group register needs to be set. This is not set by the bootloader setup when the HPS emac pins are being for this emac module. This allows the PTP clock to be sourced from the FPGA and also connects the PTP pps and ext trig signals to the stmmac PTP hardware. Patch proposed by Phil Collins. Signed-off-by: Phil Reid <preid@electromag.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-10	netlink: don't send NETLINK_URELEASE for unbound sockets	Dmitry Ivanov	1	-1/+1
	All existing users of NETLINK_URELEASE use it to clean up resources that were previously allocated to a socket via some command. As a result, no users require getting this notification for unbound sockets. Sending it for unbound sockets, however, is a problem because any user (including unprivileged users) can create a socket that uses the same ID as an existing socket. Binding this new socket will fail, but if the NETLINK_URELEASE notification is generated for such sockets, the users thereof will be tricked into thinking the socket that they allocated the resources for is closed. In the nl80211 case, this will cause destruction of virtual interfaces that still belong to an existing hostapd process; this is the case that Dmitry noticed. In the NFC case, it will cause a poll abort. In the case of netlink log/queue it will cause them to stop reporting events, as if NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called. Fix this problem by checking that the socket is bound before generating the NETLINK_URELEASE notification. Cc: stable@vger.kernel.org Signed-off-by: Dmitry Ivanov <dima@ubnt.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-10	decnet: Do not build routes to devices without decnet private data.	David S. Miller	1	-1/+8
	In particular, make sure we check for decnet private presence for loopback devices. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-10	sctp: avoid refreshing heartbeat timer too often	Marcelo Ricardo Leitner	5	-34/+47
	Currently on high rate SCTP streams the heartbeat timer refresh can consume quite a lot of resources as timer updates are costly and it contains a random factor, which a) is also costly and b) invalidates mod_timer() optimization for not editing a timer to the same value. It may even cause the timer to be slightly advanced, for no good reason. As suggested by David Laight this patch now removes this timer update from hot path by leaving the timer on and re-evaluating upon its expiration if the heartbeat is still needed or not, similarly to what is done for TCP. If it's not needed anymore the timer is re-scheduled to the new timeout, considering the time already elapsed. For this, we now record the last tx timestamp per transport, updated in the same spots as hb timer was restarted on tx. Also split up sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the heartbeat one. On loopback with MTU of 65535 and data chunks with 1636, so that we have a considerable amount of chunks without stressing system calls, netperf -t SCTP_STREAM -l 30, perf looked like this before: Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000 Overhead Command Shared Object Symbol + 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string - 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore - _raw_write_unlock_irqrestore - 96,54% _raw_spin_unlock_irqrestore - 36,14% mod_timer + 97,24% sctp_transport_reset_timers + 2,76% sctp_do_sm + 33,65% __wake_up_sync_key + 28,77% sctp_ulpq_tail_event + 1,40% del_timer - 1,84% mod_timer + 99,03% sctp_transport_reset_timers + 0,97% sctp_do_sm + 1,50% sctp_ulpq_tail_event And after this patch, now with netperf -l 60: Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000 Overhead Command Shared Object Symbol + 5,65% netperf [kernel.vmlinux] [k] memcpy_erms + 5,59% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string - 5,05% netperf [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore - _raw_spin_unlock_irqrestore + 49,89% __wake_up_sync_key + 45,68% sctp_ulpq_tail_event - 2,85% mod_timer + 76,51% sctp_transport_reset_t3_rtx + 23,49% sctp_do_sm + 1,55% del_timer + 2,50% netperf [sctp] [k] sctp_datamsg_from_user + 2,26% netperf [sctp] [k] sctp_sendmsg Throughput-wise, from 6800mbps without the patch to 7050mbps with it, ~3.7%. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-09	sh_eth: re-enable-E-MAC interrupts in sh_eth_set_ringparam()	Sergei Shtylyov	1	-5/+1
	The E-MAC interrupts are left disabled when the ring parameters are changed via 'ethtool'. In order to fix this, it's enough to call sh_eth_dev_init() with 'true' instead of 'false' for the second argument (which conveniently allows us to remove the following code re-enabling E-DMAC interrupts and reception). Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08	bridge, netem: mark mailing lists as moderated	stephen hemminger	1	-2/+2
	I moderate these (lightly loaded) lists to block spam. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08	tuntap: restore default qdisc	Jason Wang	1	-2/+2
	After commit f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev"), default qdisc was changed to noqueue because tuntap does not set tx_queue_len during .setup(). This patch restores default qdisc by setting tx_queue_len in tun_setup(). Fixes: f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev") Cc: Phil Sutter <phil@nwl.cc> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08	orangefs: remove unused variable	Martin Brandenburg	1	-3/+1
	Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08	orangefs: Add KERN_<LEVEL> to gossip_<level> macros	Joe Perches	1	-14/+17
	Emit the logging messages at the appropriate levels. Miscellanea: o Change format to fmt o Use the more common ##__VA_ARGS__ Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08	orangefs: strncpy -> strscpy	Martin Brandenburg	1	-1/+5
	It would have been possible for a rogue client-core to send in a symlink target which is not NUL terminated. This returns EIO if the client-core gives us corrupt data. Leave debugfs and superblock code as is for now. Other dcache.c and namei.c strncpy instances are safe because ORANGEFS_NAME_MAX = NAME_MAX + 1; there is always enough space for a name plus a NUL byte. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08	orangefs: clean up truncate ctime and mtime setting	Martin Brandenburg	1	-15/+1
	The ctime and mtime are always updated on a successful ftruncate and only updated on a successful truncate where the size changed. We handle the ``if the size changed'' bit. This matches FUSE's behavior. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08	Orangefs: fix ifnullfree.cocci warnings	kbuild test robot	1	-2/+1
	fs/orangefs/orangefs-debugfs.c:130:2-26: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. NULL check before some freeing functions is not needed. Based on checkpatch warning "kfree(NULL) is safe this check is probably not required" and kfreeaddr.cocci by Julia Lawall. Generated by: scripts/coccinelle/free/ifnullfree.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08	Orangefs: optimize boilerplate code.	Mike Marshall	2	-2/+2
	Suggested by David Binderman <dcb314@hotmail.com> The former can potentially be a performance win over the latter. memcpy(d, s, len); memset(d+len, c, size-len); memset(d, c, size); memcpy(d, s, len); Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08	Orangefs: xattr.c cleanup	Mike Marshall	1	-16/+1
	1. It is nonsense to test for negative size_t, suggested by David Binderman <dcb314@hotmail.com> 2. By the time Orangefs gets called, the vfs has ensured that name != NULL, and that buffer and size are sane. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08	mpls: find_outdev: check for err ptr in addition to NULL check	Roopa Prabhu	1	-0/+3
	find_outdev calls inet{,6}_fib_lookup_dev() or dev_get_by_index() to find the output device. In case of an error, inet{,6}_fib_lookup_dev() returns error pointer and dev_get_by_index() returns NULL. But the function only checks for NULL and thus can end up calling dev_put on an ERR_PTR. This patch adds an additional check for err ptr after the NULL check. Before: Trying to add an mpls route with no oif from user, no available path to 10.1.1.8 and no default route: $ip -f mpls route add 100 as 200 via inet 10.1.1.8 [ 822.337195] BUG: unable to handle kernel NULL pointer dereference at 00000000000003a3 [ 822.340033] IP: [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182 [ 822.340033] PGD 1db38067 PUD 1de9e067 PMD 0 [ 822.340033] Oops: 0000 [#1] SMP [ 822.340033] Modules linked in: [ 822.340033] CPU: 0 PID: 11148 Comm: ip Not tainted 4.5.0-rc7+ #54 [ 822.340033] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 822.340033] task: ffff88001db82580 ti: ffff88001dad4000 task.ti: ffff88001dad4000 [ 822.340033] RIP: 0010:[<ffffffff8148781e>] [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182 [ 822.340033] RSP: 0018:ffff88001dad7a88 EFLAGS: 00010282 [ 822.340033] RAX: ffffffffffffff9b RBX: ffffffffffffff9b RCX: 0000000000000002 [ 822.340033] RDX: 00000000ffffff9b RSI: 0000000000000008 RDI: 0000000000000000 [ 822.340033] RBP: ffff88001ddc9ea0 R08: ffff88001e9f1768 R09: 0000000000000000 [ 822.340033] R10: ffff88001d9c1100 R11: ffff88001e3c89f0 R12: ffffffff8187e0c0 [ 822.340033] R13: ffffffff8187e0c0 R14: ffff88001ddc9e80 R15: 0000000000000004 [ 822.340033] FS: 00007ff9ed798700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 [ 822.340033] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 822.340033] CR2: 00000000000003a3 CR3: 000000001de89000 CR4: 00000000000006f0 [ 822.340033] Stack: [ 822.340033] 0000000000000000 0000000100000000 0000000000000000 0000000000000000 [ 822.340033] 0000000000000000 0801010a00000000 0000000000000000 0000000000000000 [ 822.340033] 0000000000000004 ffffffff8148749b ffffffff8187e0c0 000000000000001c [ 822.340033] Call Trace: [ 822.340033] [<ffffffff8148749b>] ? mpls_rt_alloc+0x2b/0x3e [ 822.340033] [<ffffffff81488e66>] ? mpls_rtm_newroute+0x358/0x3e2 [ 822.340033] [<ffffffff810e7bbc>] ? get_page+0x5/0xa [ 822.340033] [<ffffffff813b7d94>] ? rtnetlink_rcv_msg+0x17e/0x191 [ 822.340033] [<ffffffff8111794e>] ? __kmalloc_track_caller+0x8c/0x9e [ 822.340033] [<ffffffff813c9393>] ? rht_key_hashfn.isra.20.constprop.57+0x14/0x1f [ 822.340033] [<ffffffff813b7c16>] ? __rtnl_unlock+0xc/0xc [ 822.340033] [<ffffffff813cb794>] ? netlink_rcv_skb+0x36/0x82 [ 822.340033] [<ffffffff813b4507>] ? rtnetlink_rcv+0x1f/0x28 [ 822.340033] [<ffffffff813cb2b1>] ? netlink_unicast+0x106/0x189 [ 822.340033] [<ffffffff813cb5b3>] ? netlink_sendmsg+0x27f/0x2c8 [ 822.340033] [<ffffffff81392ede>] ? sock_sendmsg_nosec+0x10/0x1b [ 822.340033] [<ffffffff81393df1>] ? ___sys_sendmsg+0x182/0x1e3 [ 822.340033] [<ffffffff810e4f35>] ? __alloc_pages_nodemask+0x11c/0x1e4 [ 822.340033] [<ffffffff8110619c>] ? PageAnon+0x5/0xd [ 822.340033] [<ffffffff811062fe>] ? __page_set_anon_rmap+0x45/0x52 [ 822.340033] [<ffffffff810e7bbc>] ? get_page+0x5/0xa [ 822.340033] [<ffffffff810e85ab>] ? __lru_cache_add+0x1a/0x3a [ 822.340033] [<ffffffff81087ea9>] ? current_kernel_time64+0x9/0x30 [ 822.340033] [<ffffffff813940c4>] ? __sys_sendmsg+0x3c/0x5a [ 822.340033] [<ffffffff8148f597>] ? entry_SYSCALL_64_fastpath+0x12/0x6a [ 822.340033] Code: 83 08 04 00 00 65 ff 00 48 8b 3c 24 e8 40 7c f2 ff eb 13 48 c7 c3 9f ff ff ff eb 0f 89 ce e8 f1 ae f1 ff 48 89 c3 48 85 db 74 15 <48> 8b 83 08 04 00 00 65 ff 08 48 81 fb 00 f0 ff ff 76 0d eb 07 [ 822.340033] RIP [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182 [ 822.340033] RSP <ffff88001dad7a88> [ 822.340033] CR2: 00000000000003a3 [ 822.435363] ---[ end trace 98cc65e6f6b8bf11 ]--- After patch: $ip -f mpls route add 100 as 200 via inet 10.1.1.8 RTNETLINK answers: Network is unreachable Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reported-by: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07	ipv6: Count in extension headers in skb->network_header	Jakub Sitnicki	1	-4/+4
	When sending a UDPv6 message longer than MTU, account for the length of fragmentable IPv6 extension headers in skb->network_header offset. Same as we do in alloc_new_skb path in __ip6_append_data(). This ensures that later on __ip6_make_skb() will make space in headroom for fragmentable extension headers: /* move skb->data to ip header from ext header */ if (skb->data < skb_network_header(skb)) __skb_pull(skb, skb_network_offset(skb)); Prevents a splat due to skb_under_panic: skbuff: skb_under_panic: text:ffffffff8143397b len:2126 put:14 \ head:ffff880005bacf50 data:ffff880005bacf4a tail:0x48 end:0xc0 dev:lo ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] KASAN CPU: 0 PID: 160 Comm: reproducer Not tainted 4.6.0-rc2 #65 [...] Call Trace: [<ffffffff813eb7b9>] skb_push+0x79/0x80 [<ffffffff8143397b>] eth_header+0x2b/0x100 [<ffffffff8141e0d0>] neigh_resolve_output+0x210/0x310 [<ffffffff814eab77>] ip6_finish_output2+0x4a7/0x7c0 [<ffffffff814efe3a>] ip6_output+0x16a/0x280 [<ffffffff815440c1>] ip6_local_out+0xb1/0xf0 [<ffffffff814f1115>] ip6_send_skb+0x45/0xd0 [<ffffffff81518836>] udp_v6_send_skb+0x246/0x5d0 [<ffffffff8151985e>] udpv6_sendmsg+0xa6e/0x1090 [...] Reported-by: Ji Jianwen <jiji@redhat.com> Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07	Revert "ib_srpt: Convert to percpu_ida tag allocation"	Bart Van Assche	2	-17/+40
	This reverts commit 0fd10721fe3664f7549e74af9d28a509c9a68719. That patch causes the ib_srpt driver to crash as soon as the first SCSI command is received: kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439! invalid opcode: 0000 [#1] SMP Workqueue: target_completion target_complete_ok_work [target_core_mod] RIP: srpt_queue_response+0x437/0x4a0 [ib_srpt] Call Trace: srpt_queue_data_in+0x9/0x10 [ib_srpt] target_complete_ok_work+0x152/0x2b0 [target_core_mod] process_one_work+0x197/0x480 worker_thread+0x49/0x490 kthread+0xea/0x100 ret_from_fork+0x22/0x40 Aside from the crash, the shortcomings of that patch are as follows: - It makes the ib_srpt driver use I/O contexts allocated by transport_alloc_session_tags() but it does not initialize these I/O contexts properly. All the initializations performed by srpt_alloc_ioctx() are skipped. - It swaps the order of the send ioctx allocation and the transition to RTR mode which is wrong. - The amount of memory that is needed for I/O contexts is doubled. - srpt_rdma_ch.free_list is no longer used but is not removed. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-07	RDS: fix congestion map corruption for PAGE_SIZE > 4k	shamir rabinovitch	1	-1/+1
	When PAGE_SIZE > 4k single page can contain 2 RDS fragments. If 'rds_ib_cong_recv' ignore the RDS fragment offset in to the page it then read the data fragment as far congestion map update and lead to corruption of the RDS connection far congestion map. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07	RDS: memory allocated must be align to 8	shamir rabinovitch	1	-2/+2
	Fix issue in 'rds_ib_cong_recv' when accessing unaligned memory allocated by 'rds_page_remainder_alloc' using uint64_t pointer. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07	GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU	Alexander Duyck	5	-4/+29
	This patch fixes an issue I found in which we were dropping frames if we had enabled checksums on GRE headers that were encapsulated by either FOU or GUE. Without this patch I was barely able to get 1 Gb/s of throughput. With this patch applied I am now at least getting around 6 Gb/s. The issue is due to the fact that with FOU or GUE applied we do not provide a transport offset pointing to the GRE header, nor do we offload it in software as the GRE header is completely skipped by GSO and treated like a VXLAN or GENEVE type header. As such we need to prevent the stack from generating it and also prevent GRE from generating it via any interface we create. Fixes: c3483384ee511 ("gro: Allow tunnel stacking in the case of FOU/GUE") Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07	PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs	Strashko, Grygorii	1	-0/+2
	Now wakeirq stops working for device if wakeup option for this device will be reconfigured through sysfs, like: echo disabled > /sys/devices/platform/extcon_usb1/power/wakeup echo enabled > /sys/devices/platform/extcon_usb1/power/wakeup Once above set of commands is executed the device's wakeup_source opject will be recreated and dev->power.wakeup->wakeirq field will contain NULL. As result, device_wakeup_arm_wake_irqs() will not arm wakeirq for the affected device. Hece, lets try to fix it in the following way: check for dev->wakeirq field when device_wakeup_attach() is called and if !NULL re-attach wakeirq to the device Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	tools/power turbostat: work around RC6 counter wrap	Len Brown	1	-4/+15
	Sometimes the rc6 sysfs counter spontaneously resets, causing turbostat prints a very large number as it tries to calcuate % = 100 * (old - new) / interval When we see (old > new), print *.% instead of a bogus huge number. Note that this detection is not fool-proof, as the counter could reset several times and still result in new > old. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	tools/power turbostat: initial KBL support	Len Brown	1	-0/+14
	KBL is similar to SKL Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	tools/power turbostat: initial SKX support	Len Brown	1	-1/+8
	SKX has a lot in common with HSX Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	tools/power turbostat: decode BXT TSC frequency via CPUID	Len Brown	1	-1/+4
	Hard-code BXT ART to 19200MHz, so turbostat --debug can fully enumerate TSC: CPUID(0x15): eax_crystal: 3 ebx_tsc: 186 ecx_crystal_hz: 0 TSC: 1190 MHz (19200000 Hz * 186 / 3 / 1000000) Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	tools/power turbostat: initial BXT support	Len Brown	1	-0/+9
	Broxton has a lot in common with SKL Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	tools/power turbostat: print IRTL MSRs	Len Brown	2	-3/+64
	Some processors use the Interrupt Response Time Limit (IRTL) MSR value to describe the maximum IRQ response time latency for deep package C-states. (Though others have the register, but do not use it) Lets print it out to give insight into the cases where it is used. IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	tools/power turbostat: SGX state should print only if --debug	Len Brown	1	-1/+1
	The CPUID.SGX bit was printed, even if --debug was used Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Add KBL support	Len Brown	1	-0/+2
	KBL is similar to SKL Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Add SKX support	Len Brown	1	-0/+34
	SKX is similar to BDX Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Clean up all registered devices on exit.	Richard Cochran	1	-1/+8
	This driver registers cpuidle devices when a CPU comes online, but it leaves the registrations in place when a CPU goes offline. The module exit code only unregisters the currently online CPUs, leaving the devices for offline CPUs dangling. This patch changes the driver to clean up all registrations on exit, even those from CPUs that are offline. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Propagate hot plug errors.	Richard Cochran	1	-2/+5
	If a cpuidle registration error occurs during the hot plug notifier callback, we should really inform the hot plug machinery instead of just ignoring the error. This patch changes the callback to properly return on error. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Don't overreact to a cpuidle registration failure.	Richard Cochran	1	-1/+1
	The helper function, intel_idle_cpu_init, registers one new device with the cpuidle layer. If the registration should fail, that function immediately calls intel_idle_cpuidle_devices_uninit() to unregister every last CPU's device. However, it makes no sense to do so, when called from the hot plug notifier callback. This patch moves the call to intel_idle_cpuidle_devices_uninit() outside of the helper function to the one call site that actually needs to perform the de-registrations. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Setup the timer broadcast only on successful driver load.	Richard Cochran	1	-7/+8
	This driver sets the broadcast tick quite early on during probe and does not clean up again in cast of failure. This patch moves the setup call after the registration, placing the on_each_cpu() calls within the global CPU lock region. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Avoid a double free of the per-CPU data.	Richard Cochran	1	-4/+3
	The helper function, intel_idle_cpuidle_devices_uninit, frees the globally allocated per-CPU data. However, this function is invoked from the hot plug notifier callback at a time when freeing that data is not safe. If the call to cpuidle_register_driver() should fail (say, due to lack of memory), then the driver will free its per-CPU region. On the next CPU_ONLINE event, the driver will happily use the region again and even free it again if the failure repeats. This patch fixes the issue by moving the call to free_percpu() outside of the helper function at the two call sites that actually need to free the per-CPU data. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Fix dangling registration on error path.	Richard Cochran	1	-4/+5
	In the module_init() method, if the per-CPU allocation fails, then the active cpuidle registration is not cleaned up. This patch fixes the issue by attempting the allocation before registration, and then cleaning it up again on registration failure. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Fix deallocation order on the driver exit path.	Richard Cochran	1	-3/+3
	In the module_exit() method, this driver first frees its per-CPU pointer, then unregisters a callback making use of the pointer. Furthermore, the function, intel_idle_cpuidle_devices_uninit, is racy against CPU hot plugging as it calls for_each_online_cpu(). This patch corrects the issues by unregistering first on the exit path while holding the hot plug lock. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Remove redundant initialization calls.	Richard Cochran	1	-6/+0
	The function, intel_idle_cpuidle_driver_init, makes calls on each CPU to auto_demotion_disable() and c1e_promotion_disable(). These calls are redundant, as intel_idle_cpu_init() does the same calls just a bit later on. They are also premature, as the driver registration may yet fail. This patch removes the redundant code. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: Fix a helper function's return value.	Richard Cochran	1	-3/+1
	The function, intel_idle_cpuidle_driver_init, delivers no error codes at all. This patch changes the function to return 'void' instead of returning zero. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	intel_idle: remove useless return from void function.	Richard Cochran	1	-2/+0
	Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07	iommu/vt-d: Silence an uninitialized variable warning	Dan Carpenter	1	-1/+1
	My static checker complains that "dma_alias" is uninitialized unless we are dealing with a pci device. This is true but harmless. Anyway, we can flip the condition around to silence the warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-04-07	iommu/rockchip: Fix "is stall active" check	John Keeping	1	-4/+4
	Since commit cd6438c5f844 ("iommu/rockchip: Reconstruct to support multi slaves") rk_iommu_is_stall_active() always returns false because the bitwise AND operates on the boolean flag promoted to an integer and a value that is either zero or BIT(2). Explicitly convert the right-hand value to a boolean so that both sides are guaranteed to be either zero or one. rk_iommu_is_paging_enabled() does not suffer from the same problem since RK_MMU_STATUS_PAGING_ENABLED is BIT(0), but let's apply the same change for consistency and to make it clear that it's correct without needing to lookup the value. Fixes: cd6438c5f844 ("iommu/rockchip: Reconstruct to support multi slaves") Signed-off-by: John Keeping <john@metanate.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-04-07	iommu: Don't overwrite domain pointer when there is no default_domain	Joerg Roedel	1	-1/+2
	IOMMU drivers that do not support default domains, but make use of the the group->domain pointer can get that pointer overwritten with NULL on device add/remove. Make sure this can't happen by only overwriting the domain pointer when it is NULL. Cc: stable@vger.kernel.org # v4.4+ Fixes: 1228236de5f9 ('iommu: Move default domain allocation to iommu_group_get_for_dev()') Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-04-07	virtio: add VIRTIO_CONFIG_S_NEEDS_RESET device status bit	Stefan Hajnoczi	1	-0/+2
	The VIRTIO 1.0 specification added the DEVICE_NEEDS_RESET device status bit in "VIRTIO-98: Add DEVICE_NEEDS_RESET". This patch defines the device status bit in the uapi header file so that both the kernel and userspace applications can use it. The bit is currently unused by the virtio guest drivers and vhost. According to the spec "a good implementation will try to recover by issuing a reset". This is not attempted here because it requires auditing the virtio drivers to ensure there are no resource leaks or crashes if the device needs to be reset mid-operation. See "2.1 Device Status Field" in the VIRTIO 1.0 specification for details. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07	MAINTAINERS: add entry for QEMU	Michael S. Tsirkin	1	-0/+7
	Gabriel merged support for QEMU FW CFG interface, but there's apparently no official maintainer. It's also possible that this will grow more interfaces in future. I'll happily co-maintain it and handle pull requests together with the rest of the PV stuff I maintain. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gabriel Somlo <somlo@cmu.edu> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gabriel Somlo <somlo@cmu.edu>