linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2015-04-08	Copy the kernel module data from user space in chunks	Linus Torvalds	1	-1/+18
	Unlike most (all?) other copies from user space, kernel module loading is almost unlimited in size. So we do a potentially huge "copy_from_user()" when we copy the module data from user space to the kernel buffer, which can be a latency concern when preemption is disabled (or voluntary). Also, because 'copy_from_user()' clears the tail of the kernel buffer on failures, even a failed copy can end up wasting a lot of time. Normally neither of these are concerns in real life, but they do trigger when doing stress-testing with trinity. Running in a VM seems to add its own overhead, causing trinity module load testing to even trigger the watchdog. The simple fix is to just chunk up the module loading, so that it never tries to copy insanely big areas in one go. That bounds the latency, and also the amount of (unnecessarily, in this case) cleared memory for the failure case. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
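The chunking idea can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel's code: `fake_copy_from_user()` and `fake_cond_resched()` are hypothetical stand-ins for `copy_from_user()` and a rescheduling point, and the 64 KiB `COPY_CHUNK_SIZE` is an assumed value.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Assumed chunk size for illustration: 16 pages of 4 KiB. */
#define COPY_CHUNK_SIZE (16 * 4096UL)

static unsigned long chunks_copied; /* number of bounded copies performed */

/* Hypothetical stand-in for copy_from_user(): returns bytes NOT copied (0 = success). */
static unsigned long fake_copy_from_user(void *dst, const void *src, unsigned long n)
{
	memcpy(dst, src, n);
	chunks_copied++;
	return 0;
}

/* Hypothetical stand-in for cond_resched(): a point where the CPU could be yielded. */
static void fake_cond_resched(void)
{
}

/* Copy len bytes without ever issuing a single copy larger than COPY_CHUNK_SIZE. */
static int copy_chunked(void *dst, const void *src, unsigned long len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len) {
		unsigned long n = len < COPY_CHUNK_SIZE ? len : COPY_CHUNK_SIZE;

		if (fake_copy_from_user(d, s, n) != 0)
			return -EFAULT; /* at most this chunk's tail would have been cleared */
		fake_cond_resched();
		d += n;
		s += n;
		len -= n;
	}
	return 0;
}
```

With this shape, a failure stops after at most one chunk, so both the time spent in a single copy and the amount of memory cleared on failure are bounded by the chunk size.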
2015-04-08	x86: clean up/fix 'copy_in_user()' tail zeroing	Linus Torvalds	2	-9/+8
	The rule for 'copy_from_user()' is that it zeroes the remaining kernel buffer even when the copy fails halfway, just to make sure that we don't leave uninitialized kernel memory around. Because even if we check for errors, some kernel buffers stay around after the copy (think page cache). However, the x86-64 logic for user copies uses a copy_user_generic() function for all the cases, that set the "zerorest" flag for any fault on the source buffer. Which meant that it didn't just try to clear the kernel buffer after a failure in copy_from_user(), it also tried to clear the destination user buffer for the "copy_in_user()" case. Not only is that pointless, it also means that the clearing code has to worry about the tail clearing taking page faults for the user buffer case. Which is just stupid, since that case shouldn't happen in the first place. Get rid of the whole "zerorest" thing entirely, and instead just check if the destination is in kernel space or not. And then just use memset() to clear the tail of the kernel buffer if necessary. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
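The rule this fix enforces (zero the uncopied tail only when the destination is a kernel buffer) can be modeled with a simulated copy. `sim_copy()` and its `fault_at` parameter are hypothetical; a real user copy learns about the fault from the hardware rather than from an argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Simulated user copy that "faults" after fault_at bytes (hypothetical parameter).
 * Returns the number of bytes not copied, like copy_from_user().
 * Only a kernel-space destination gets its uncopied tail zeroed with memset().
 */
static unsigned long sim_copy(void *dst, const void *src, unsigned long len,
			      unsigned long fault_at, bool dst_is_kernel)
{
	unsigned long copied = fault_at < len ? fault_at : len;
	unsigned long left = len - copied;

	memcpy(dst, src, copied);
	if (left && dst_is_kernel)
		memset((char *)dst + copied, 0, left); /* never leave stale kernel memory */
	return left;
}
```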
2015-04-08	Revert "sparc/PCI: Clip bridge windows to fit in upstream windows"	Bjorn Helgaas	1	-4/+1
	This reverts commit d63e2e1f3df904bf6bd150bdafb42ddbb3257ea8. David Ahern reported that d63e2e1f3df9 breaks booting on an 8-socket T5 sparc system. He also verified that the system boots with d63e2e1f3df9 reverted. Yinghai has some fixes, but they need a little more polishing than we can do before v4.0. Link: http://lkml.kernel.org/r/5514391F.2030300@oracle.com # report Link: http://lkml.kernel.org/r/1427857069-6789-1-git-send-email-yinghai@kernel.org # patches Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.19+
2015-04-08	PCI: Don't look for ACPI hotplug parameters if ACPI is disabled	Bjorn Helgaas	1	-0/+3
	Booting a v3.18 or newer Xen domU kernel with PCI devices passed through results in an oops (this is a 32-bit 3.13.11 dom0 with a 64-bit 4.4.0 hypervisor and 32-bit domU): BUG: unable to handle kernel paging request at 0030303e IP: [<c06ed0e6>] acpi_ns_validate_handle+0x12/0x1a Call Trace: [<c06eda4d>] ? acpi_evaluate_object+0x31/0x1fc [<c06b78e1>] ? pci_get_hp_params+0x111/0x4e0 [<c0407bc7>] ? xen_force_evtchn_callback+0x17/0x30 [<c04085fb>] ? xen_restore_fl_direct_reloc+0x4/0x4 [<c0699d34>] ? pci_device_add+0x24/0x450 Don't look for ACPI configuration information if ACPI has been disabled. I don't think this is the best fix, because we can boot plain Linux (no Xen) with "acpi=off", and we don't need this check in pci_get_hp_params(). There should be a better fix that would make Xen domU work the same way. The domU kernel has ACPI support but it has no AML. There should be a way to initialize the ACPI data structures so things fail gracefully rather than oopsing. This is an interim fix to address the regression. Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration") Link: https://bugzilla.kernel.org/show_bug.cgi?id=96301 Reported-by: Michael D Labriola <mlabriol@gdeb.com> Tested-by: Michael D Labriola <mlabriol@gdeb.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.18+
2015-04-08	drm: fix drm_mode_getconnector() locking imbalance regression	Tommi Rantala	1	-1/+3
	Regression in commit 2caa80e72b57c6216aec6f6a11fcfb4fec46daa0 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Sun Feb 22 11:38:36 2015 +0100 drm: Fix deadlock due to getconnector locking changes If the drm_connector_find() call returns NULL, we should no longer call drm_modeset_unlock() to avoid locking imbalance. Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-07	mm: numa: disable change protection for vma(VM_HUGETLB)	Naoya Horiguchi	1	-1/+3
	Currently when a process accesses a hugetlb range protected with PROTNONE, unexpected COWs are triggered, which finally puts the hugetlb subsystem into a broken/uncontrollable state, where for example h->resv_huge_pages is subtracted too much and wraps around to a very large number, and the free hugepage pool is no longer maintainable. This patch simply stops changing protection for vma(VM_HUGETLB) to fix the problem. And this also allows us to avoid useless overhead of minor faults. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Suggested-by: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-07	include/linux/dmapool.h: declare struct device	Mark Brown	1	-0/+2
	dmapool uses struct device in function arguments but relies on an implicit inclusion to declare struct device causing warnings in some configurations: include/linux/dmapool.h:31:7: warning: 'struct device' declared inside parameter list Fix this by adding a struct device declaration to the file. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
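A minimal C illustration of why the forward declaration matters, assuming a compiler such as GCC: without the `struct device;` line, the prototype would declare a new `struct device` scoped to its own parameter list, and the later definition would typically be rejected as conflicting. `demo_pool_align()` is a hypothetical function, not part of the dmapool API.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration at file scope, as added to include/linux/dmapool.h. */
struct device;

/* Hypothetical prototype that only needs a pointer to the incomplete type. */
size_t demo_pool_align(struct device *dev, size_t align);

/* The full definition can come later (in the kernel, from another header). */
struct device {
	const char *name;
};

size_t demo_pool_align(struct device *dev, size_t align)
{
	return dev ? align : 0;
}
```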
2015-04-07	mm: move zone lock to a different cache line than order-0 free page lists	Mel Gorman	1	-4/+3
	Huang Ying reported the following problem due to commit 3484b2de9499 ("mm: rearrange zone fields into read-only, page alloc, statistics and page reclaim lines") from the Intel performance tests:
	24b7e5819ad5cbef  3484b2de9499df23c4604a513b
	----------------  --------------------------
	         %stddev     %change         %stddev
	             \          |                \
	    152288 ±  0%     -46.2%      81911 ±  0%  aim7.jobs-per-min
	       237 ±  0%     +85.6%        440 ±  0%  aim7.time.elapsed_time
	       237 ±  0%     +85.6%        440 ±  0%  aim7.time.elapsed_time.max
	     25026 ±  0%     +70.7%      42712 ±  0%  aim7.time.system_time
	   2186645 ±  5%     +32.0%    2885949 ±  4%  aim7.time.voluntary_context_switches
	   4576561 ±  1%     +24.9%    5715773 ±  0%  aim7.time.involuntary_context_switches
	The problem is specific to very large machines under stress. It was not reproducible with the machines I had used to justify the original patch because large numbers of CPUs are required. When pressure is high enough, the cache line is bouncing between CPUs trying to acquire the lock and the holder of the lock adjusting free lists. The intention was that the acquirer of the lock would automatically have the cache line holding the free lists but according to Huang, this is not a universal win. One possibility is to move the zone lock to its own cache line but it increases the size of the zone. This patch moves the lock to the other end of the free lists where they do not contend under high pressure. It does mean the page allocator paths now require more cache lines but Huang reports that it restores performance to previous levels on large machines:
	         %stddev     %change         %stddev
	             \          |                \
	     84568 ±  1%     +94.3%     164280 ±  1%  aim7.jobs-per-min
	   2881944 ±  2%     -35.1%    1870386 ±  8%  aim7.time.voluntary_context_switches
	       681 ±  1%      -3.4%        658 ±  0%  aim7.time.user_time
	   5538139 ±  0%     -12.1%    4867884 ±  0%  aim7.time.involuntary_context_switches
	     44174 ±  1%     -46.0%      23848 ±  1%  aim7.time.system_time
	       426 ±  1%     -48.4%        219 ±  1%  aim7.time.elapsed_time
	       426 ±  1%     -48.4%        219 ±  1%  aim7.time.elapsed_time.max
	       468 ±  1%     -43.1%        266 ±  2%  uptime.boot
	Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Huang Ying <ying.huang@intel.com> Tested-by: Huang Ying <ying.huang@intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
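The layout idea can be checked with `offsetof`. The struct below is a simplified, hypothetical stand-in for `struct zone` (it assumes a 64-byte cache line, 11 free-list orders, and an `int` in place of `spinlock_t`). It only shows that, under these assumptions, placing the lock after the free lists puts it on a different cache line from `free_area[0]`.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed cache-line size */
#define NR_ORDERS 11  /* illustrative number of free-list orders */

struct demo_free_area {
	void *head;            /* hypothetical free-list head */
	unsigned long nr_free;
};

/* Illustrative layout only: not the real struct zone. */
struct demo_zone {
	_Alignas(CACHE_LINE) struct demo_free_area free_area[NR_ORDERS];
	unsigned long flags;
	int lock;              /* stand-in for spinlock_t, placed after the free lists */
};

/* Index of the cache line that a given byte offset falls into. */
static size_t line_of(size_t offset)
{
	return offset / CACHE_LINE;
}
```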
2015-04-07	Revert "libceph: use memalloc flags for net IO"	Ilya Dryomov	1	-8/+1
	This reverts commit 89baaa570ab0b476db09408d209578cfed700e9f. Dirty page throttling should be sufficient for us in the general case so there is no need to use __GFP_MEMALLOC - it would be needed only in the swap-over-rbd case, which we currently don't support. (It would probably take approximately the commit that is being reverted to add that support, but we would also need the "swap" option to distinguish from the general case and make sure swap ceph_client-s aren't shared with anything else.) See ceph-devel threads [1] and [2] for the details of why enabling pfmemalloc reserves for all cases is a bad thing. On top of potential system lockups related to drained emergency reserves, this turned out to cause ceph lockups in case peers are on the same host and communicating via loopback due to sk_filter() dropping pfmemalloc skbs on the receiving side because the receiving loopback socket is not tagged with SOCK_MEMALLOC. [1] "SOCK_MEMALLOC vs loopback" http://www.spinics.net/lists/ceph-devel/msg22998.html [2] "[PATCH] libceph: don't set memalloc flags in loopback case" http://www.spinics.net/lists/ceph-devel/msg23392.html Conflicts: net/ceph/messenger.c [ context: tcp_nodelay option ] Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Mel Gorman <mgorman@suse.de> Cc: Sage Weil <sage@redhat.com> Cc: stable@vger.kernel.org # 3.18+, needs backporting Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Mel Gorman <mgorman@suse.de>
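The loopback failure mode reduces to a simple predicate: an skb allocated from pfmemalloc reserves is dropped by a receiving socket that is not tagged with `SOCK_MEMALLOC`. The types and field names below are hypothetical simplifications of `struct sk_buff` and `struct sock`, not the kernel's `sk_filter()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplifications; real field names differ. */
struct demo_skb  { bool pfmemalloc; };    /* buffer came from emergency reserves */
struct demo_sock { bool sock_memalloc; }; /* socket tagged like SOCK_MEMALLOC */

/* Returns true if the receive-side filter would drop the packet. */
static bool demo_filter_drops(const struct demo_skb *skb, const struct demo_sock *sk)
{
	return skb->pfmemalloc && !sk->sock_memalloc;
}
```

In the loopback case the sender's reserve-backed skb arrives at an untagged receiving socket, which is exactly the combination this predicate drops.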
2015-04-07	drm/i915/vlv: remove wait for previous GFX clk disable request	Jesse Barnes	1	-14/+0
	Looks like it was introduced in: commit 650ad970a39f8b6164fe8613edc150f585315289 Author: Imre Deak <imre.deak@intel.com> Date: Fri Apr 18 16:35:02 2014 +0300 drm/i915: vlv: factor out vlv_force_gfx_clock and check for pending force-of but I'm not sure why. It has caused problems for us in the past (see 85250ddff7a6 "drm/i915/chv: Remove Wait for a previous gfx force-off" and 8d4eee9cd7a1 "drm/i915: vlv: increase timeout when forcing on the GFX clock") and doesn't seem to be required, so let's just drop it. References: https://bugs.freedesktop.org/show_bug.cgi?id=89611 Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Tested-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Cc: stable@vger.kernel.org # c9c52e24194a: drm/i915/chv: Remove Wait ... Cc: stable@vger.kernel.org Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-04-07	drm/i915/chv: Remove Wait for a previous gfx force-off	Deepak S	1	-2/+4
	On CHV, the PUNIT team confirmed that 'VLV_GFX_CLK_STATUS_BIT' is not a sticky bit and it will always be set. So ignore the check for a previous GFX force-off during suspend and allow forcing the clock as part of the S0ix sequence. Signed-off-by: Deepak S <deepak.s@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-04-07	drm/i915/vlv: save/restore the power context base reg	Jesse Barnes	2	-0/+3
	Some BIOSes (e.g. the one on the Minnowboard) don't save/restore this reg. If it's unlocked, we can just restore the previous value, and if it's locked (in case the BIOS re-programmed it for us) the write will be ignored and we'll still have the "did it move" sanity check in the PM code to warn us if something is still amiss. References: https://bugs.freedesktop.org/show_bug.cgi?id=89611 Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Tested-by: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-04-07	Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"	Rafael J. Wysocki	1	-20/+1
	Commit 84c91b7ae07c (PM / hibernate: avoid unsafe pages in e820 reserved regions) is reported to make resume from hibernation on Lenovo x230 unreliable, so revert it. We will revisit the issue the commit in question was supposed to fix in the future. Link: https://bugzilla.kernel.org/show_bug.cgi?id=96111 Reported-by: rhn <kebuac.rhn@porcupinefactory.org> Cc: 3.17+ <stable@vger.kernel.org> # 3.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-06	Linux 4.0-rc7	Linus Torvalds	1	-1/+1

2015-04-06	net/mlx4_core: Fix error message deprecation for ConnectX-2 cards	Jack Morgenstein	1	-1/+2
	Commit 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") did the deprecation only for port 1 of the card. Need to deprecate for port 2 as well. Fixes: 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06	net: dsa: fix filling routing table from OF description	Pavel Nakonechny	2	-17/+10
	According to the description in 'include/net/dsa.h', in cascaded switch configurations where there is more than one interconnected device, the 'rtable' array in the 'dsa_chip_data' structure indicates which port on this switch should be used to send packets destined for the corresponding switch. However, dsa_of_setup_routing_table() fills 'rtable' with port numbers of the _target_ switch, not the current one. This commit removes redundant devicetree parsing and adds the needed port number as a function argument, so dsa_of_setup_routing_table() now just looks up the target switch number by parsing the parent of the 'link' device node. To avoid possible misunderstandings about how the target switch number is determined, a corresponding comment was added to the source code and to the DSA device tree bindings documentation file. This was tested on a custom board with two Marvell 88E6095 switches with the following routing tables: { -1, 10 } and { 8, -1 }. Signed-off-by: Pavel Nakonechny <pavel.nakonechny@skitlab.ru> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06	l2tp: unregister l2tp_net_ops on failure path	WANG Cong	1	-0/+1
	Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06	mvneta: dont call mvneta_adjust_link() manually	Stas Sergeev	1	-6/+1
	mvneta_adjust_link() is a callback for of_phy_connect() and should not be called directly. The result of calling it directly is as below: Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06	ipv6: protect skb->sk accesses from recursive dereference inside the stack	hannes@stressinduktion.org	7	-19/+34
	We should not consult skb->sk for output decisions in xmit recursion levels > 0 in the stack. Otherwise local socket settings could influence the result of e.g. tunnel encapsulation process. ipv6 does not conform with this in three places: 1) ip6_fragment: we do consult ipv6_npinfo for frag_size 2) sk_mc_loop in ipv6 uses skb->sk and checks if we should loop the packet back to the local socket 3) ip6_skb_dst_mtu could query the settings from the user socket and force a wrong MTU Furthermore: In sk_mc_loop we could potentially land in WARN_ON(1) if we use a PF_PACKET socket on top of an IPv6-backed vxlan device. Reuse xmit_recursion as we are currently only interested in protecting tunnel devices. Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-05	Input: alps - document stick behavior for protocol V2	Hans de Goede	1	-0/+8
	Document that protocol V2 uses standard (bare) PS/2 mouse packets for the DualPoint stick. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-By: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2015-04-05	Input: alps - report V2 Dualpoint Stick events via the right evdev node	Hans de Goede	1	-1/+6
	On V2 devices the DualPoint Stick reports bare packets; these should be reported via the "AlpsPS/2 ALPS DualPoint Stick" dev2 evdev node, which also has the INPUT_PROP_POINTING_STICK propbit set. Note that since there is no way to distinguish these packets from an external PS/2 mouse (insofar as these laptops have an external PS/2 port) this means that we will be reporting PS/2 mouse events via this evdev node too, as we've been doing in kernel 3.19 and older. This has been tested on a Dell Latitude D620 and a Dell Latitude E6400, which both have a V2 touchpad + a DualPoint Stick which reports bare packets. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2015-04-05	Input: alps - report interleaved bare PS/2 packets via dev3	Hans de Goede	1	-14/+18
	Bare packets should be reported via the same evdev device independent of whether they are detected at the beginning of a packet or in the middle of a packet. This has been tested on a Dell Latitude E6400, where the DualPoint Stick reports bare packets, which get reported via dev3 when the touchpad is idle, and via dev2 when the touchpad and stick are used simultaneously. This commit fixes this inconsistency by always reporting bare packets via dev3. Note that since they come from a DualPoint Stick they really should be reported via dev2; this gets fixed in a later commit. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2015-04-03	netns: don't allocate an id for dead netns	Nicolas Dichtel	1	-1/+3
	First, let's explain the problem. Suppose you have an ipip interface that stands in the netns foo and its link part in the netns bar (so the netns bar has an nsid into the netns foo). Now, you remove the netns bar: - the bar nsid into the netns foo is removed - the netns exit method of ipip is called, thus our ipip iface is removed: => a netlink message is built in the netns foo to advertise this deletion => this netlink message requests an nsid for bar, thus a new nsid is allocated for bar and never removed. This patch adds a check in peernet2id() so that an id cannot be allocated for a netns which is currently being destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
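A minimal model of the added check, assuming that a namespace whose reference count has dropped to zero is being destroyed. `struct demo_netns`, `demo_peernet2id()` and the `-1` sentinel are hypothetical; the kernel's `peernet2id()` differs in signature and locking.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NSID_NOT_ASSIGNED (-1) /* hypothetical sentinel */

struct demo_netns {
	int refcount; /* assumed: 0 once the namespace is being destroyed */
	int nsid;     /* DEMO_NSID_NOT_ASSIGNED until allocated */
};

static int demo_next_id = 0;

/* Look up (and optionally allocate) an id, refusing to allocate for a dying netns. */
static int demo_peernet2id(struct demo_netns *peer, bool alloc)
{
	if (peer->nsid != DEMO_NSID_NOT_ASSIGNED)
		return peer->nsid;
	if (!alloc || peer->refcount == 0)
		return DEMO_NSID_NOT_ASSIGNED; /* no new id that would never be freed */
	peer->nsid = demo_next_id++;
	return peer->nsid;
}
```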
2015-04-03	Revert "netns: don't clear nsid too early on removal"	Nicolas Dichtel	1	-15/+9
	This reverts commit 4217291e592d ("netns: don't clear nsid too early on removal"). This is not the right fix, it introduces races. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-03	cpuidle: ACPI: do not overwrite name and description of C0	Thomas Schlichter	1	-1/+1
	Fix a bug that leads to showing the name and description of C-state C0 as "<null>" in sysfs after the ACPI C-states changed (e.g. after AC->DC or DC->AC transition). The function poll_idle_init() in drivers/cpuidle/driver.c initializes the state 0 during cpuidle_register_driver(), so we had better not overwrite it again with '\0' during acpi_processor_cst_has_changed(). Signed-off-by: Thomas Schlichter <thomas.schlichter@web.de> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-03	cpuidle: remove state_count field from struct cpuidle_device	Bartlomiej Zolnierkiewicz	3	-6/+3
	Thomas Schlichter reports the following issue on his Samsung NC20: "The BIOS provides the C-states C1 and C2 to the OS when connected to AC, and additionally provides the C3 C-state when disconnected from AC. However, the number of C-states shown in sysfs is fixed to the number of C-states present at boot. If I boot with AC connected, I always only see the C-states up to C2 even if I disconnect AC. The reason is commit 130a5f692425 (ACPI / cpuidle: remove dev->state_count setting). It removes the update of dev->state_count, but sysfs uses exactly this variable to show the C-states. The fix is to use drv->state_count in sysfs. As this is currently the last user of dev->state_count, this variable can be completely removed." Remove dev->state_count as per the above. Reported-by: Thomas Schlichter <thomas.schlichter@web.de> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-03	cpufreq: Schedule work for the first-online CPU on resume	Viresh Kumar	1	-8/+11
	All CPUs except the first online CPU are hotplugged out on suspend, and the cpufreq core stops managing them. On resume, we need to call cpufreq_update_policy() for this CPU's policy to make sure its frequency is in sync with cpufreq's cached value, as it might have got updated by hardware during suspend/resume. The policies are always added to the head of the policy list. So, in normal circumstances, CPU 0's policy will be the last one in the list, and so the code checks for the last policy. But there are cases where this fails. Consider a quad-core system with a per-core policy. If CPU0 is hotplugged out and added back again, the last policy will be on CPU1 :( To fix this in a proper way, always look for the policy of the first online CPU. That way we will be sure that we are calling cpufreq_update_policy() for the only CPU that wasn't hotplugged out. Cc: 3.15+ <stable@vger.kernel.org> # 3.15+ Fixes: 2f0aea936360 ("cpufreq: suspend governors on system suspend/hibernate") Reported-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
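The difference between the two strategies can be modeled with a plain array (policies inserted at the head, newest first) and a CPU bitmask. Both helpers are hypothetical; the kernel uses its own cpumask helpers and policy list rather than these.

```c
#include <assert.h>

/* Old approach (hypothetical model): assume the last list entry belongs to CPU 0. */
static int demo_last_policy_cpu(const int *list, int n)
{
	return list[n - 1];
}

/* New approach (hypothetical model): bit n of online_mask set means CPU n is online. */
static int demo_first_online_cpu(unsigned long online_mask)
{
	for (int cpu = 0; cpu < (int)(8 * sizeof online_mask); cpu++)
		if (online_mask & (1UL << cpu))
			return cpu;
	return -1; /* no CPU online */
}
```

After boot the list is { 3, 2, 1, 0 } and the last entry is CPU 0; after CPU 0 is replugged its policy moves to the head, giving { 0, 3, 2, 1 }, so the "last policy" heuristic picks CPU 1 while the first online CPU is still CPU 0.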
2015-04-02	ip6mr: call del_timer_sync() in ip6mr_free_table()	WANG Cong	1	-1/+1
	We need to wait for any in-flight timers, since we are going to free the mrtable right afterwards. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02	net: move fib_rules_unregister() under rtnl lock	WANG Cong	6	-5/+8
	We have to hold rtnl lock for fib_rules_unregister() otherwise the following race could happen:
	fib_rules_unregister():          fib_nl_delrule():
	...                              ...
	...                              ops = lookup_rules_ops();
	list_del_rcu(&ops->list);        list_for_each_entry(ops->rules) {
	fib_rules_cleanup_ops(ops);        ...
	  list_del_rcu();                  list_del_rcu();
	                                 }
	Note, net->rules_mod_lock is actually not needed at all, either upper layer netns code or rtnl lock guarantees we are safe. Cc: Alexander Duyck <alexander.h.duyck@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02	ipv4: take rtnl_lock and mark mrt table as freed on namespace cleanup	WANG Cong	1	-0/+5
	This is the IPv4 part for commit 905a6f96a1b1 (ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup). Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02	[media] rtl28xxu: return success for unimplemented FE callback	Antti Palosaari	1	-2/+0
	Return success from the FE callback in the case where we don't have any special implementation. The fc0013 tuner driver calls that callback in order to switch the antenna input, even though we don't provide an antenna switch. Returning an error caused the fc0013 driver to give up tuning. Signed-off-by: Antti Palosaari <crope@iki.fi> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-02	[media] rtl2832: disable regmap register cache	Antti Palosaari	1	-1/+1
	Caching register reads causes some random I/O errors on channel change. Disable caching for now in order to avoid those errors. Partly reverts commit dcadb82. Signed-off-by: Antti Palosaari <crope@iki.fi> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-02	tcp: fix FRTO undo on cumulative ACK of SACKed range	Neal Cardwell	1	-3/+4
	On processing cumulative ACKs, the FRTO code was not checking the SACKed bit, meaning that there could be a spurious FRTO undo on a cumulative ACK of a previously SACKed skb. The FRTO code should only consider a cumulative ACK to indicate that an original/unretransmitted skb is newly ACKed if the skb was not yet SACKed. The effect of the spurious FRTO undo would typically be to make the connection think that all previously-sent packets were in flight when they really weren't, leading to a stall and an RTO. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO") Signed-off-by: David S. Miller <davem@davemloft.net>
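The corrected condition can be sketched as a predicate on an skb's SACK/retransmission state. The flag names and bit values below are illustrative assumptions, not the kernel's `TCPCB_*` definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits (assumed values, not the kernel's TCPCB_* constants). */
#define DEMO_SACKED_ACKED   0x01 /* already SACKed by the receiver */
#define DEMO_SACKED_RETRANS 0x02 /* has been retransmitted */

/*
 * Does a cumulative ACK of this skb prove that an original (never
 * retransmitted) transmission was newly delivered? A previously SACKed
 * skb was already accounted for, so it must not count again.
 */
static bool demo_newly_acked_original(unsigned int sacked)
{
	if (sacked & DEMO_SACKED_RETRANS)
		return false;
	return !(sacked & DEMO_SACKED_ACKED);
}
```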
2015-04-02	xen-netfront: transmit fully GSO-sized packets	Jonathan Davies	1	-4/+1
	xen-netfront limits transmitted skbs to be at most 44 segments in size. However, GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448 bytes each. This slight reduction in the size of packets means a slight loss in efficiency. Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER, where XEN_NETIF_MAX_TX_SIZE is 65535 bytes. The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s 6c09fa09d) in determining when to split an skb into two is sk->sk_gso_max_size - 1 - MAX_TCP_HEADER. So the maximum permitted size of an skb is calculated to be (XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER. Intuitively, this looks like the wrong formula -- we don't need two TCP headers. Instead, there is no need to deviate from the default gso_max_size of 65536 as this already accommodates the size of the header. Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments of 1448 bytes each), as observed via tcpdump. This patch makes netfront send skbs of up to 65160 bytes (45 segments of 1448 bytes each). Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as it relates to the size of the whole packet, including the header. Fixes: 9ecd1a75d977 ("xen-netfront: reduce gso_max_size to account for max TCP header") Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
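The arithmetic above can be reproduced in a few lines of C. `MAX_TCP_HEADER` depends on the kernel configuration, so the value 320 used here is an assumption chosen only to illustrate the effect, and the pacing-rate term in `tcp_tso_autosize` is ignored.

```c
#include <assert.h>

#define DEMO_MSS 1448           /* segment size quoted in the message */
#define DEMO_MAX_TCP_HEADER 320 /* ASSUMED; the real MAX_TCP_HEADER is config-dependent */

/* Segments per skb = (gso_max_size - 1 - MAX_TCP_HEADER) / mss (pacing ignored). */
static unsigned int demo_max_segs(unsigned int gso_max_size)
{
	return (gso_max_size - 1 - DEMO_MAX_TCP_HEADER) / DEMO_MSS;
}
```

Under this assumption, the old netfront setting (65535 - MAX_TCP_HEADER) yields 44 segments (63712 bytes), while the default gso_max_size of 65536 yields 45 segments (65160 bytes), matching the sizes quoted above.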
2015-04-02	IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic	Shachar Raindel	1	-0/+8
	Properly verify that the resulting page aligned end address is larger than both the start address and the length of the memory area requested. Both the start and length arguments for ib_umem_get are controlled by the user. A misbehaving user can provide values which will cause an integer overflow when calculating the page aligned end address. This overflow can also cause miscalculation of the number of pages mapped, and additional logic issues. Addresses: CVE-2014-8159 Cc: <stable@vger.kernel.org> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
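A user-space sketch of the kind of check described, assuming 4 KiB pages and 64-bit addresses: if the page-aligned end is not greater than both the start address and the length, the arithmetic wrapped around. `demo_range_ok()` and `DEMO_PAGE_ALIGN` are hypothetical, not the code in `ib_umem_get()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL /* assumed page size */
#define DEMO_PAGE_ALIGN(x) (((x) + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1))

/* Reject (addr, size) pairs whose page-aligned end wraps around (unsigned arithmetic). */
static bool demo_range_ok(uint64_t addr, uint64_t size)
{
	uint64_t end = DEMO_PAGE_ALIGN(addr + size);

	return end > addr && end > size;
}
```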
2015-04-02	perf/x86/intel: Fix Haswell CYCLE_ACTIVITY.* counter constraints	Andi Kleen	1	-3/+3
	Some of the CYCLE_ACTIVITY.* events can only be scheduled on counter 2. Due to a typo, Haswell matched those with INTEL_EVENT_CONSTRAINT, which led to the events never matching, as the comparison does not expect anything in the umask. Fix the typo. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1425925222-32361-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
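The matching failure can be shown with a simplified constraint model in which an event matches when `(config & cmask) == code`. The struct, the two masks and the example encoding `0x08a3` (event 0xa3, umask 0x08) are illustrative assumptions rather than the perf code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a perf event constraint (hypothetical). */
struct demo_constraint {
	uint64_t code;  /* event code the constraint applies to */
	uint64_t cmask; /* which config bits take part in the comparison */
};

#define DEMO_EVENT_MASK  0x00ffULL /* event-select byte only */
#define DEMO_UEVENT_MASK 0xffffULL /* event-select plus umask */

static bool demo_matches(const struct demo_constraint *c, uint64_t config)
{
	return (config & c->cmask) == c->code;
}
```

With the event-only mask the umask bits are masked off from `config` but still present in `code`, so the comparison can never succeed; the event-plus-umask mask compares like with like.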
2015-04-02	perf/x86/intel: Filter branches for PEBS event	Kan Liang	1	-2/+2
	To support Intel LBR branch filtering, the Intel LBR sharing logic was introduced in commit b36817e88630 ("perf/x86: Add Intel LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to configure lbr_sel, which is finally used to set LBR_SELECT. However, the intel_shared_regs_constraints() function is called after intel_pebs_constraints(). The PEBS event will return immediately after intel_pebs_constraints(), so it's impossible to filter branches for PEBS events. This patch moves intel_shared_regs_constraints() ahead of intel_pebs_constraints(). We can safely do that because the intel_shared_regs_constraints() function only returns an empty constraint if it's rejecting the event; otherwise it returns NULL so that we continue calling intel_pebs_constraints() and x86_get_event_constraint(). Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1427467105-9260-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	drm/radeon: fix wait in radeon_mn_invalidate_range_start	Christian König	1	-7/+4
	We need to wait for all fences, not just the exclusive one. Signed-off-by: Christian König <christian.koenig@amd.com> Cc: <stable@vger.kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-04-02	drm/radeon: add extra check in radeon_ttm_tt_unpin_userptr	Christian König	1	-0/+4
	We somehow try to free the SG table twice. Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=89734 Signed-off-by: Christian König <christian.koenig@amd.com> Cc: <stable@vger.kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-04-02	drm: Exynos: Respect framebuffer pitch for FIMD/Mixer	Daniel Stone	2	-10/+15
	When performing a modeset, use the framebuffer pitch value to set FIMD IMG_SIZE and Mixer SPAN registers. These are both defined as pitch - the distance between contiguous lines (bytes for FIMD, pixels for mixer). Fixes display on Snow (1366x768). Signed-off-by: Daniel Stone <daniels@collabora.com> Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-04-02	[media] vb2: Fix dma_dir setting for dma-contig mem type	Sakari Ailus	1	-2/+1
	The last argument of vb2_dc_get_user_pages() is of type enum dma_data_direction, but the caller, vb2_dc_get_userptr() passes a value which is the result of comparison dma_dir == DMA_FROM_DEVICE. This results in the write parameter to get_user_pages() being zero in all cases, i.e. that the caller has no intent to write there. This was broken by patch "vb2: replace 'write' by 'dma_dir'". Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: stable@vger.kernel.org # for v3.19 Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
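Why the comparison silently breaks things follows from the enum values. The enum below mirrors the ordering of the kernel's `enum dma_data_direction` (`DMA_BIDIRECTIONAL` = 0, `DMA_TO_DEVICE` = 1, `DMA_FROM_DEVICE` = 2, `DMA_NONE` = 3); `demo_write_flag()` is a hypothetical stand-in for the callee's logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as the kernel's enum dma_data_direction. */
enum demo_dma_dir {
	DEMO_DMA_BIDIRECTIONAL = 0,
	DEMO_DMA_TO_DEVICE = 1,
	DEMO_DMA_FROM_DEVICE = 2,
	DEMO_DMA_NONE = 3,
};

/* Hypothetical callee: the device writes into user pages only for FROM_DEVICE. */
static bool demo_write_flag(enum demo_dma_dir dir)
{
	return dir == DEMO_DMA_FROM_DEVICE;
}
```

Passing the boolean `dma_dir == DMA_FROM_DEVICE` produces 0 or 1, which the callee reads as `DMA_BIDIRECTIONAL` or `DMA_TO_DEVICE`; neither equals `DMA_FROM_DEVICE`, so the write flag is always false.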
2015-04-02	kgdb/x86: Fix reporting of 'si' in kgdb on x86_64	Steffen Liebergeld	1	-1/+1
	This patch fixes an error in kgdb for x86_64 which would report the value of dx when asked to give the value of si. Signed-off-by: Steffen Liebergeld <steffen.liebergeld@kernkonzept.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
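A minimal sketch of a debugger-style register map, assuming a hypothetical struct regs and lookup table (not kgdb's actual definitions). Each name maps to an offset in the saved register frame; an "si" entry holding the offset of dx would reproduce the symptom described above, so the sketch shows the corrected entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of a saved register frame. */
struct regs { unsigned long ax, bx, cx, dx, si, di; };

/* Register name -> offset in struct regs. */
struct reg_def { const char *name; size_t offset; };

static const struct reg_def reg_defs[] = {
	{ "ax", offsetof(struct regs, ax) },
	{ "bx", offsetof(struct regs, bx) },
	{ "cx", offsetof(struct regs, cx) },
	{ "dx", offsetof(struct regs, dx) },
	{ "si", offsetof(struct regs, si) }, /* pointing at dx here = the bug */
	{ "di", offsetof(struct regs, di) },
};

/* Returns the value of the named register, or 0 if the name is unknown. */
unsigned long read_reg(const struct regs *r, const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(reg_defs) / sizeof(reg_defs[0]); i++)
		if (strcmp(reg_defs[i].name, name) == 0)
			return *(const unsigned long *)((const char *)r +
							reg_defs[i].offset);
	return 0;
}
```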
2015-04-02	x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set	Andy Lutomirski	1	-1/+15
	When I wrote the opportunistic SYSRET code, I missed an important difference between SYSRET and IRET. Both instructions are capable of setting EFLAGS.TF, but they behave differently when doing so: - IRET will not issue a #DB trap after execution when it sets TF. This is critical -- otherwise you'd never be able to make forward progress when returning to userspace. - SYSRET, on the other hand, will trap with #DB immediately after returning to CPL3, and the next instruction will never execute. This breaks anything that opportunistically SYSRETs to a user context with TF set. For example, running this code with TF set and a SIGTRAP handler loaded never gets past 'post_nop': extern unsigned char post_nop[]; asm volatile ("pushfq\n\t" "popq %%r11\n\t" "nop\n\t" "post_nop:" : : "c" (post_nop) : "r11"); In my defense, I can't find this documented in the AMD or Intel manual. Fix it by using IRET to restore TF. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 2a23c6b8a9c4 ("x86_64, entry: Use sysret to return to userspace when possible") Link: http://lkml.kernel.org/r/9472f1ca4c19a38ecda45bba9c91b7168135fcfa.1427923514.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
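A sketch of the decision the fix adds, written in C for readability even though the real check lives in the entry assembly. struct ret_state and can_use_sysret() are hypothetical; other_sysret_conditions_ok stands in for the other opportunistic-SYSRET checks, which are not modeled here. TF is EFLAGS bit 8.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_TF (1UL << 8) /* trap flag, EFLAGS bit 8 */

/* Hypothetical subset of the state checked on return to user space. */
struct ret_state { uint64_t flags; bool other_sysret_conditions_ok; };

/* Use SYSRET only when it is safe; with TF set, fall back to IRET,
 * which restores TF without trapping immediately. */
bool can_use_sysret(const struct ret_state *s)
{
	if (!s->other_sysret_conditions_ok)
		return false;
	if (s->flags & X86_EFLAGS_TF)
		return false;
	return true;
}
```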
2015-04-02	drm/i915: Reject the colorkey ioctls for primary and cursor planes	Ville Syrjälä	1	-2/+2
	The legacy colorkey ioctls are only implemented for sprite planes, so reject the ioctl for primary/cursor planes. If we want to support colorkeying with these planes (assuming we have hw support of course) we should just move ahead with the colorkey property conversion. Testcase: kms_legacy_colorkey Cc: Tommi Rantala <tt.rantala@gmail.com> Cc: stable@vger.kernel.org Reference: http://mid.gmane.org/CA+ydwtr+bCo7LJ44JFmUkVRx144UDFgOS+aJTfK6KHtvBDVuAw@mail.gmail.com Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
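A minimal sketch of rejecting the legacy colorkey request for anything that is not a sprite plane. The plane types, struct plane and set_colorkey() are hypothetical models loosely based on the primary/overlay/cursor split, and the -ENOENT return value is an assumption chosen for illustration.

```c
#include <errno.h>

/* Hypothetical plane types: primary, overlay (sprite), cursor. */
enum plane_type { PLANE_PRIMARY, PLANE_OVERLAY, PLANE_CURSOR };

struct plane { enum plane_type type; unsigned int colorkey; };

/* Legacy colorkey handler: only sprite (overlay) planes support it. */
int set_colorkey(struct plane *p, unsigned int key)
{
	if (p->type != PLANE_OVERLAY)
		return -ENOENT; /* error code assumed for illustration */
	p->colorkey = key;
	return 0;
}
```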
2015-04-02	cpufreq: hisilicon: add acpu driver	Leo Yan	3	-0/+52
	Add an acpu driver for the HiSilicon SoC; acpu is the application processor subsystem. Currently the acpu has a coupled clock domain for its two clusters, so this driver directly uses the cpufreq-dt driver as its backend. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-01	cpufreq: powernv: Report cpu frequency throttling	Shilpasri G Bhat	1	-1/+46
	The power and thermal safety of the system is taken care of by an On-Chip-Controller (OCC), which is a real-time subsystem embedded within the POWER8 processor. OCC continuously monitors the memory and core temperature, the total system power, and the state of the power supply and fan. The cpu frequency can be throttled by OCC for the following reasons: 1) If a processor crosses its power and temperature limits, OCC will lower its Pmax to reduce the frequency and voltage. 2) If OCC crashes, the system is forced to the Psafe frequency. 3) If OCC fails to recover, the kernel is not allowed to make any further frequency changes and the chip will remain in Psafe. When the frequency is throttled, the user sees a drop in performance without knowing that throttling is the cause. So detect and report such a condition, so that the user can check the OCC status to reboot the system or check for power supply or fan failures. The current status of the core is read from the Power Management Status Register (PMSR) to check whether any of the throttling conditions has occurred, and the appropriate throttling message is reported. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
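A sketch of classifying the three throttling cases from a status word. The PMSR bit layout below (Pmax in bits 32..39, two single-bit flags) is hypothetical and only illustrates the approach; check_throttle() and enum throttle_reason are likewise illustrative names, not the powernv driver's code. The most severe condition is checked first.

```c
#include <stdint.h>

/* Hypothetical PMSR layout, for illustration only. */
#define PMSR_PMAX(x)        ((int8_t)(((x) >> 32) & 0xFF))
#define PMSR_PSAFE_ENABLE   (1ULL << 30)
#define PMSR_SPR_EM_DISABLE (1ULL << 31)

enum throttle_reason {
	THROTTLE_NONE,   /* running normally */
	THROTTLE_PMAX,   /* case 1: OCC lowered Pmax */
	THROTTLE_PSAFE,  /* case 2: forced to Psafe */
	THROTTLE_LOCKED, /* case 3: OS frequency control disabled */
};

/* Classify the throttling state, checking the most severe reason first. */
enum throttle_reason check_throttle(uint64_t pmsr, int8_t max_pstate)
{
	if (pmsr & PMSR_SPR_EM_DISABLE)
		return THROTTLE_LOCKED;
	if (pmsr & PMSR_PSAFE_ENABLE)
		return THROTTLE_PSAFE;
	if (PMSR_PMAX(pmsr) < max_pstate)
		return THROTTLE_PMAX;
	return THROTTLE_NONE;
}
```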
2015-04-01	cxgb4: Fix to dump devlog, even if FW is crashed	Hariprasad Shenai	4	-26/+65
	Add new Common Code routines to retrieve the Firmware Device Log parameters from PCIE_FW_PF[7]. The firmware initializes its Device Log very early on and stores the parameters for its location/size in that register. Using the parameters from the register allows us to access the Firmware Device Log even when the firmware crashes very early on or we're not attached to the firmware. Based on original work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
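A sketch of decoding device-log location and size from a single register word, so the log can be found without talking to firmware. The bit layout, macro names, struct devlog_params and decode_devlog_params() are all hypothetical; they only illustrate the idea of a "valid" bit plus packed memory type, start address and entry count.

```c
#include <stdint.h>

/* Hypothetical bit layout of a "devlog parameters" register word. */
#define DEVLOG_VALID       (1u << 31)
#define DEVLOG_MEMTYPE(x)  (((x) >> 28) & 0x7u)
#define DEVLOG_NENTRIES(x) (((x) >> 24) & 0xFu) /* (n + 1) * 128 entries */
#define DEVLOG_ADDR16(x)   ((x) & 0xFFFFFFu)    /* start address / 16 */

struct devlog_params { unsigned memtype; uint32_t start; uint32_t nentries; };

/* Decode the parameters; returns -1 if firmware has not published them. */
int decode_devlog_params(uint32_t reg, struct devlog_params *p)
{
	if (!(reg & DEVLOG_VALID))
		return -1;
	p->memtype = DEVLOG_MEMTYPE(reg);
	p->start = DEVLOG_ADDR16(reg) << 4;
	p->nentries = (DEVLOG_NENTRIES(reg) + 1) * 128;
	return 0;
}
```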
2015-04-01	cxgb4: Firmware macro changes for fw version 1.13.32.0	Hariprasad Shenai	2	-6/+41
	Adds a new macro and a few macro changes for fw version 1.13.32.0, and also changes the version string in the driver to match 1.13.32.0. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
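A sketch of packing a four-part firmware version into one 32-bit word, assuming one byte per field in major.minor.micro.build order. The macro names and fw_version_at_least() are hypothetical, not the cxgb4 definitions. With this packing, 1.13.32.0 becomes 0x010D2000, and plain integer comparison orders versions correctly because the most significant field sits in the highest byte.

```c
#include <stdint.h>

/* Hypothetical packing: one byte each for major.minor.micro.build. */
#define FW_VERSION(major, minor, micro, build) \
	(((uint32_t)(major) << 24) | ((uint32_t)(minor) << 16) | \
	 ((uint32_t)(micro) << 8) | (uint32_t)(build))

#define FW_VERSION_MAJOR(v) (((v) >> 24) & 0xFF)
#define FW_VERSION_MINOR(v) (((v) >> 16) & 0xFF)
#define FW_VERSION_MICRO(v) (((v) >> 8) & 0xFF)
#define FW_VERSION_BUILD(v) ((v) & 0xFF)

/* Nonzero if the running firmware is at least the wanted version. */
int fw_version_at_least(uint32_t have, uint32_t want)
{
	return have >= want;
}
```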
2015-04-01	lguest: now needs PCI_DIRECT.	Rusty Russell	1	-1/+1
	Since commit 8e7094694396 ("lguest: add a dummy PCI host bridge.") lguest uses PCI, but it needs you to frob the ports directly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-01	bnx2x: Fix kdump when iommu=on	Yuval Mintz	1	-23/+16
	When IOMMU VT-d is active, once the main kernel crashes, unfinished DMAE transactions will be blocked, putting the HW in an error state which will cause further transactions to time out. The currently employed logic uses the wrong macros, so the first function is the only function that cleans up that error state during its probe/load. This patch allows all the functions to successfully re-load in the kdump kernel. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>