linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-01-28	NTB: Set dma mask and dma coherent mask to NTB devices	Serge Semin	4	-2/+15
	The dma_mask and dma_coherent_mask fields of the NTB struct device weren't initialized in hardware drivers. In fact it should be done instead of PCIe interface usage, since NTB clients are supposed to use NTB API and left unaware of real hardware implementation. In addition to that ntb_device_register() method shouldn't clear the passed ntb_dev structure, since it dma_mask is initialized by hardware drivers. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	NTB: Rename NTB messaging API methods	Serge Semin	2	-33/+28
	There is a common methods signature form used over all the NTB API like functions naming scheme, arguments names and order, etc. Recently added NTB messaging API IO callbacks were named a bit different so should be renamed to be in compliance with the rest of the API. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_hw_switchtec: fix logic error	Arnd Bergmann	1	-1/+1
	Newer gcc (version 7 and 8 presumably) warn about a statement mixing the << operator with logical and: drivers/ntb/hw/mscc/ntb_hw_switchtec.c: In function 'switchtec_ntb_init_sndev': drivers/ntb/hw/mscc/ntb_hw_switchtec.c:888:24: error: '<<' in boolean context, did you mean '<' ? [-Werror=int-in-bool-context] My interpretation here is that the author must have intended a bitmask rather than a comparison, so I'm changing the '&&' to '&', which makes a lot more sense in the context. Fixes: 1b249475275d ("ntb_hw_switchtec: Allow using Switchtec NTB in multi-partition setups") Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_hw_switchtec: Check for alignment of the buffer in mw_set_trans()	Logan Gunthorpe	1	-0/+13
	With Switchtec hardware, the buffer used for a memory window must be aligned to its size (the hardware only replaces the lower bits). In certain circumstances dma_alloc_coherent() will not provide a buffer that adheres to this requirement like when using the CMA and CONFIG_CMA_ALIGNMENT is set lower than the buffer size. When we get an unaligned buffer mw_set_trans() should return an error. We also log an error so we know the cause of the problem. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_transport: Fix bug with max_mw_size parameter	Logan Gunthorpe	1	-0/+3
	When using the max_mw_size parameter of ntb_transport to limit the size of the Memory windows, communication cannot be established and the queues freeze. This is because the mw_size that's reported to the peer is correctly limited but the size used locally is not. So the MW is initialized with a buffer smaller than the window but the TX side is using the full window. This means the TX side will be writing to a region of the window that points nowhere. This is easily fixed by applying the same limit to tx_size in ntb_transport_init_queue(). Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	MAINTAINERS: NTB: Update contact info	Allen Hubbe	1	-1/+1
	I am no longer employed by Dell EMC. For the purposes of NTB driver development and maintenance, please contact me via my personal email. Signed-off-by: Allen Hubbe <allenbh@gmail.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_hw_switchtec: Force down the link before initializing	Logan Gunthorpe	1	-7/+50
	If one host crashes and soft reboots, the other host may not see a link down event. Then when the crashed host comes back up, the surviving host may not know the link was reset and the NTB clients may not work without being reset. To solve this, we send a LINK_FORCE_DOWN message to each peer every time we come up, before we register the NTB device. If a surviving host still thinks the link is up it will take it down immediately. In this way, once the crashed host comes up fully, it will send a regular link up event as per usual and the link will be properly restarted. While we are in the area, this also fixes the MSG_LINK_UP message that was in the link down function that was reported by Doug Meyers. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reported-by: ThanhTuThai <cruisethai@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_hw_switchtec: Crosslink doorbells and messages	Logan Gunthorpe	1	-10/+55
	In a crosslink configuration doorbells and messages largely work the same but the NTB registers must be accessed through the reserved LUT window. Also, as a bonus, seeing there are now two independent sets of NTB links, both partitions can actually use all 60 doorbell registers instead of them having to be split into two for each partition. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_hw_switchtec: Add initialization code for crosslink	Logan Gunthorpe	2	-11/+206
	Crosslink is a feature of the Switchtec switches that is similar to the B2B mode of other NTB devices. It allows a system to be designed that is perfectly symmetric with two identical switches that link two hosts together. In order for the system to be symmetric, there is an empty host-less partition between the two switches which the host must enumerate and assign BAR addresses to. The firmware in the switch manages this specially so that the BAR addresses on both sides of the empty partition will be identical despite being in the same partition with the same address space. The driver determines whether crosslink is enabled by a flag set in the NTB partition info registers which are set by the switch's configuration file. When crosslink is enabled, a reserved LUT window is setup to point to the peer's switch's NTB registers and the local MWs are set to forward to the host-less partition's BARs. (Yes, this hurts my brain too.) Once this is setup, largely the same NTB infrastructure is used to communicate between the two hosts. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_hw_switchtec: Expand PFF CSR registers	Logan Gunthorpe	1	-1/+14
	The PFF CSR registers actual mirrors the PCI configuration space for all the ports in the switch. Previously, this was not needed by the driver but will be used by the crosslink code to enumerate the bus in an host-less centre partition. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_hw_switchtec: Make switchtec_ntb_init_req_id_table() more general	Logan Gunthorpe	1	-36/+56
	This is a prep patch in order to support the crosslink feature which will require the driver to setup the requester ID table in another partition as well as it's own. To aid this, create a helper function which sets up the requester IDs from an array. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_hw_switchtec: Create helper function to setup reserved LUT MWs	Logan Gunthorpe	1	-29/+43
	This is a prep patch in order to support the crosslink feature which will require the driver to use another reserved LUT window. To simplify this we move the code which sets up the reserved LUT window into a helper function which will be used by the crosslink initialization. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_hw_switchtec: Keep track of the number of LUT windows used by the driver	Logan Gunthorpe	1	-4/+8
	This is a prep patch in order to support the crosslink feature which will require the driver to use another reserved LUT window. To simplify this, we add some code to track the number of reserved LUT windows in use instead of assuming this is always 1. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb_hw_switchtec: Allow using Switchtec NTB in multi-partition setups	Kelvin Cao	2	-9/+56
	Allow using Switchtec NTB in setups that have more than two partitions. Note: this does not enable having multi-host communication, it only allows for a single NTB link between two hosts in a network that might have more than two. Use following logic to determine the NT peer partition: 1) If there are 2 partitions, and the target vector is set in the Switchtec configuration, use the partition specified in target vector. 2) If there are 2 partitions and target vector is unset use the only other partition as specified in the NT EP map. 3) If there are more than 2 partitions and target vector is set use the other partition specified in target vector. 4) If there are more than 2 partitions and target vector is unset, this is invalid and report an error. Signed-off-by: Kelvin Cao <kelvin.cao@microsemi.com> [logang@deltatee.com: commit message fleshed out] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	NTB: switchtec_ntb: Add new line on appropriate printks	Jon Mason	1	-21/+21
	Trivial addition of "\n" to the dev_* prints where necessary Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	NTB: switchtec_ntb: fix spelling mistake: "peforming" -> "performing"	Colin Ian King	1	-1/+1
	Trivial fix to spelling mistake in dev_err error message Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-By: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb: remove Intel Atom NTB driver support	Dave Jiang	2	-363/+4
	Removing dead code since this is not being used. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	ntb: remove unneeded DRIVER_LICENSE #defines	Greg Kroah-Hartman	4	-8/+4
	There is no need to #define the license of the driver, just put it in the MODULE_LICENSE() line directly as a text string. This allows tools that check that the module license matches the source code license to work properly, as there is no need to unwind the unneeded dereference, especially when the string is defined just a few lines above the usage of it. Reported-and-reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Allen Hubbe <Allen.Hubbe@emc.com> Cc: Gary R Hook <gary.hook@amd.com> Cc: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	NTB: ntb_hw_switchtec: Fix peer BAR bug in switchtec_ntb_init_shared_mw	Doug Meyer	1	-4/+5
	This resolves a bug which may incorrectly configure the peer host's LUT for shared memory window access. The code was using the local host's first BAR number, rather than the peer hosts's first BAR number, to determine what peer NT control register to program. The bug will cause the Switchtec NTB link to work only if both peers have the same first NTB BAR configured. In all other configurations, the link will not come up, failing silently. When both hosts have the same first BAR, the configuration works only because the first BAR numbers happent to be the same. When the hosts do not have the same first BAR, then the LUT translation will not be configured in the correct peer LUT and will not give the peer the shared memory window access required for the link to operate. Signed-off-by: Doug Meyer <dmeyer@gigaio.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Fixes: 678784a44ae8 ("NTB: switchtec_ntb: Initialize hardware for memory windows") Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-28	Linux 4.15	Linus Torvalds	1	-1/+1

2018-01-28	x86/ftrace: Add one more ENDPROC annotation	Josh Poimboeuf	1	-1/+1
	When ORC support was added for the ftrace_64.S code, an ENDPROC for function_hook() was missed. This results in the following warning: arch/x86/kernel/ftrace_64.o: warning: objtool: .entry.text+0x0: unreachable instruction Fixes: e2ac83d74a4d ("x86/ftrace: Fix ORC unwinding from ftrace handlers") Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180128022150.dqierscqmt3uwwsr@treble
2018-01-27	hrtimer: Reset hrtimer cpu base proper on CPU hotplug	Thomas Gleixner	1	-0/+3
	The hrtimer interrupt code contains a hang detection and mitigation mechanism, which prevents that a long delayed hrtimer interrupt causes a continous retriggering of interrupts which prevent the system from making progress. If a hang is detected then the timer hardware is programmed with a certain delay into the future and a flag is set in the hrtimer cpu base which prevents newly enqueued timers from reprogramming the timer hardware prior to the chosen delay. The subsequent hrtimer interrupt after the delay clears the flag and resumes normal operation. If such a hang happens in the last hrtimer interrupt before a CPU is unplugged then the hang_detected flag is set and stays that way when the CPU is plugged in again. At that point the timer hardware is not armed and it cannot be armed because the hang_detected flag is still active, so nothing clears that flag. As a consequence the CPU does not receive hrtimer interrupts and no timers expire on that CPU which results in RCU stalls and other malfunctions. Clear the flag along with some other less critical members of the hrtimer cpu base to ensure starting from a clean state when a CPU is plugged in. Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the root cause of that hard to reproduce heisenbug. Once understood it's trivial and certainly justifies a brown paperbag. Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic") Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Sewior <bigeasy@linutronix.de> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
2018-01-27	x86: Mark hpa as a "Designated Reviewer" for the time being	H. Peter Anvin	1	-11/+1
	Due to some unfortunate events, I have not been directly involved in the x86 kernel patch flow for a while now. I have also not been able to ramp back up by now like I had hoped to, and after reviewing what I will need to work on both internally at Intel and elsewhere in the near term, it is clear that I am not going to be able to ramp back up until late 2018 at the very earliest. It is not acceptable to not recognize that this load is currently taken by Ingo and Thomas without my direct participation, so I mark myself as R: (designated reviewer) rather than M: (maintainer) until further notice. This is in fact recognizing the de facto situation for the past few years. I have obviously no intention of going away, and I will do everything within my power to improve Linux on x86 and x86 for Linux. This, however, puts credit where it is due and reflects a change of focus. This patch also removes stale entries for portions of the x86 architecture which have not been maintained separately from arch/x86 for a long time. If there is a reason to re-introduce them then that can happen later. Signed-off-by: H. Peter Anvin <h.peter.anvin@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180125195934.5253-1-hpa@zytor.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-26	VSOCK: set POLLOUT \| POLLWRNORM for TCP_CLOSING	Stefan Hajnoczi	1	-1/+1
	select(2) with wfds but no rfds must return when the socket is shut down by the peer. This way userspace notices socket activity and gets -EPIPE from the next write(2). Currently select(2) does not return for virtio-vsock when a SEND+RCV shutdown packet is received. This is because vsock_poll() only sets POLLOUT \| POLLWRNORM for TCP_CLOSE, not the TCP_CLOSING state that the socket is in when the shutdown is received. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26	dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state	Alexey Kodanev	1	-0/+3
	ccid2_hc_tx_rto_expire() timer callback always restarts the timer again and can run indefinitely (unless it is stopped outside), and after commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time"), which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from dccp_destroy_sock() to sk_destruct(), this started to happen quite often. The timer prevents releasing the socket, as a result, sk_destruct() won't be called. Found with LTP/dccp_ipsec tests running on the bonding device, which later couldn't be unloaded after the tests were completed: unregister_netdevice: waiting for bond0 to become free. Usage count = 148 Fixes: 2a91aa396739 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26	Update the RISC-V MAINTAINERS file	Palmer Dabbelt	1	-2/+2
	Now that we're upstream in Linux we've been able to make some infrastructure changes so our port works a bit more like other ports. Specifically: * We now have a mailing list specific to the RISC-V Linux port, hosted at lists.infreadead.org. * We now have a kernel.org git tree where work on our port is coordinated. This patch changes the RISC-V maintainers entry to reflect these new bits of infrastructure. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-01-26	x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels	Andy Lutomirski	1	-13/+9
	On a 5-level kernel, if a non-init mm has a top-level entry, it needs to match init_mm's, but the vmalloc_fault() code skipped over the BUG_ON() that would have checked it. While we're at it, get rid of the rather confusing 4-level folded "pgd" logic. Cleans-up: b50858ce3e2a ("x86/mm/vmalloc: Add 5-level paging support") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Neil Berrington <neil.berrington@datacore.com> Link: https://lkml.kernel.org/r/2ae598f8c279b0a29baf75df207e6f2fdddc0a1b.1516914529.git.luto@kernel.org
2018-01-26	x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems	Andy Lutomirski	1	-5/+29
	Neil Berrington reported a double-fault on a VM with 768GB of RAM that uses large amounts of vmalloc space with PTI enabled. The cause is that load_new_mm_cr3() was never fixed to take the 5-level pgd folding code into account, so, on a 4-level kernel, the pgd synchronization logic compiles away to exactly nothing. Interestingly, the problem doesn't trigger with nopti. I assume this is because the kernel is mapped with global pages if we boot with nopti. The sequence of operations when we create a new task is that we first load its mm while still running on the old stack (which crashes if the old stack is unmapped in the new mm unless the TLB saves us), then we call prepare_switch_to(), and then we switch to the new stack. prepare_switch_to() pokes the new stack directly, which will populate the mapping through vmalloc_fault(). I assume that we're getting lucky on non-PTI systems -- the old stack's TLB entry stays alive long enough to make it all the way through prepare_switch_to() and switch_to() so that we make it to a valid stack. Fixes: b50858ce3e2a ("x86/mm/vmalloc: Add 5-level paging support") Reported-and-tested-by: Neil Berrington <neil.berrington@datacore.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: stable@vger.kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/346541c56caed61abbe693d7d2742b4a380c5001.1516914529.git.luto@kernel.org
2018-01-25	net: vrf: Add support for sends to local broadcast address	David Ahern	1	-2/+3
	Sukumar reported that sends to the local broadcast address (255.255.255.255) are broken. Check for the address in vrf driver and do not redirect to the VRF device - similar to multicast packets. With this change sockets can use SO_BINDTODEVICE to specify an egress interface and receive responses. Note: the egress interface can not be a VRF device but needs to be the enslaved device. https://bugzilla.kernel.org/show_bug.cgi?id=198521 Reported-by: Sukumar Gopalakrishnan <sukumarg1973@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25	r8169: fix memory corruption on retrieval of hardware statistics.	Francois Romieu	1	-7/+2
	Hardware statistics retrieval hurts in tight invocation loops. Avoid extraneous write and enforce strict ordering of writes targeted to the tally counters dump area address registers. Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Tested-by: Oliver Freyermuth <o.freyermuth@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25	orangefs: fix deadlock; do not write i_size in read_iter	Martin Brandenburg	2	-16/+2
	After do_readv_writev, the inode cache is invalidated anyway, so i_size will never be read. It will be fetched from the server which will also know about updates from other machines. Fixes deadlock on 32-bit SMP. See https://marc.info/?l=linux-fsdevel&m=151268557427760&w=2 Signed-off-by: Martin Brandenburg <martin@omnibond.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mike Marshall <hubcap@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-26	drm/nouveau: Move irq setup/teardown to pci ctor/dtor	Lyude Paul	1	-15/+31
	For a while we've been having issues with seemingly random interrupts coming from nvidia cards when resuming them. Originally the fix for this was thought to be just re-arming the MSI interrupt registers right after re-allocating our IRQs, however it seems a lot of what we do is both wrong and not even nessecary. This was made apparent by what appeared to be a regression in the mainline kernel that started introducing suspend/resume issues for nouveau: a0c9259dc4e1 (irq/matrix: Spread interrupts on allocation) After this commit was introduced, we started getting interrupts from the GPU before we actually re-allocated our own IRQ (see references below) and assigned the IRQ handler. Investigating this turned out that the problem was not with the commit, but the fact that nouveau even free/allocates it's irqs before and after suspend/resume. For starters: drivers in the linux kernel haven't had to handle freeing/re-allocating their IRQs during suspend/resume cycles for quite a while now. Nouveau seems to be one of the few drivers left that still does this, despite the fact there's no reason we actually need to since disabling interrupts from the device side should be enough, as the kernel is already smart enough to know to disable host-side interrupts for us before going into suspend. Since we were tearing down our IRQs by hand however, that means there was a short period during resume where interrupts could be received before we re-allocated our IRQ which would lead to us getting an unhandled IRQ. Since we never handle said IRQ and re-arm the interrupt registers, this would cause us to miss all of the interrupts from the GPU and cause our init process to start timing out on anything requiring interrupts. So, since this whole setup/teardown every suspend/resume cycle is useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor functions instead so they're only called at driver load and driver unload. This should fix most of the issues with pending interrupts on resume, along with getting suspend/resume for nouveau to work again. As well, this probably means we can also just remove the msi rearm call inside nvkm_pci_init(). But since our main focus here is to fix suspend/resume before 4.15, we'll save that for a later patch. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <efault@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-01-25	net: don't call update_pmtu unconditionally	Nicolas Dichtel	9	-18/+20
	Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to: "BUG: unable to handle kernel NULL pointer dereference at (null)" Let's add a helper to check if update_pmtu is available before calling it. Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") CC: Roman Kapl <code@rkapl.cz> CC: Xin Long <lucien.xin@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25	net: tcp: close sock if net namespace is exiting	Dan Streetman	3	-0/+28
	When a tcp socket is closed, if it detects that its net namespace is exiting, close immediately and do not wait for FIN sequence. For normal sockets, a reference is taken to their net namespace, so it will never exit while the socket is open. However, kernel sockets do not take a reference to their net namespace, so it may begin exiting while the kernel socket is still open. In this case if the kernel socket is a tcp socket, it will stay open trying to complete its close sequence. The sock's dst(s) hold a reference to their interface, which are all transferred to the namespace's loopback interface when the real interfaces are taken down. When the namespace tries to take down its loopback interface, it hangs waiting for all references to the loopback interface to release, which results in messages like: unregister_netdevice: waiting for lo to become free. Usage count = 1 These messages continue until the socket finally times out and closes. Since the net namespace cleanup holds the net_mutex while calling its registered pernet callbacks, any new net namespace initialization is blocked until the current net namespace finishes exiting. After this change, the tcp socket notices the exiting net namespace, and closes immediately, releasing its dst(s) and their reference to the loopback interface, which lets the net namespace continue exiting. Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811 Signed-off-by: Dan Streetman <ddstreet@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25	perf/x86: Fix perf,x86,cpuhp deadlock	Peter Zijlstra	1	-15/+18
	More lockdep gifts, a 5-way lockup race: perf_event_create_kernel_counter() perf_event_alloc() perf_try_init_event() x86_pmu_event_init() __x86_pmu_event_init() x86_reserve_hardware() #0 mutex_lock(&pmc_reserve_mutex); reserve_ds_buffer() #1 get_online_cpus() perf_event_release_kernel() _free_event() hw_perf_event_destroy() x86_release_hardware() #0 mutex_lock(&pmc_reserve_mutex) release_ds_buffer() #1 get_online_cpus() #1 do_cpu_up() perf_event_init_cpu() #2 mutex_lock(&pmus_lock) #3 mutex_lock(&ctx->mutex) sys_perf_event_open() mutex_lock_double() #3 mutex_lock(ctx->mutex) #4 mutex_lock_nested(ctx->mutex, 1); perf_try_init_event() #4 mutex_lock_nested(ctx->mutex, 1) x86_pmu_event_init() intel_pmu_hw_config() x86_add_exclusive() #0 mutex_lock(&pmc_reserve_mutex) Fix it by using ordering constructs instead of locking. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-25	perf/core: Fix ctx::mutex deadlock	Peter Zijlstra	1	-1/+7
	Lockdep noticed the following 3-way lockup scenario: sys_perf_event_open() perf_event_alloc() perf_try_init_event() #0 ctx = perf_event_ctx_lock_nested(1) perf_swevent_init() swevent_hlist_get() #1 mutex_lock(&pmus_lock) perf_event_init_cpu() #1 mutex_lock(&pmus_lock) #2 mutex_lock(&ctx->mutex) sys_perf_event_open() mutex_lock_double() #2 mutex_lock() #0 mutex_lock_nested() And while we need that perf_event_ctx_lock_nested() for HW PMUs such that they can iterate the sibling list, trying to match it to the available counters, the software PMUs need do no such thing. Exclude them. In particular the swevent triggers the above invertion, while the tpevent PMU triggers a more elaborate one through their event_mutex. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-25	perf/core: Fix another perf,trace,cpuhp lock inversion	Peter Zijlstra	1	-2/+24
	Lockdep noticed the following 3-way lockup race: perf_trace_init() #0 mutex_lock(&event_mutex) perf_trace_event_init() perf_trace_event_reg() tp_event->class->reg() := tracepoint_probe_register #1 mutex_lock(&tracepoints_mutex) trace_point_add_func() #2 static_key_enable() #2 do_cpu_up() perf_event_init_cpu() #3 mutex_lock(&pmus_lock) #4 mutex_lock(&ctx->mutex) perf_ioctl() #4 ctx = perf_event_ctx_lock() _perf_iotcl() ftrace_profile_set_filter() #0 mutex_lock(&event_mutex) Fudge it for now by noting that the tracepoint state does not depend on the event <-> context relation. Ugly though :/ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-25	perf/core: Fix lock inversion between perf,trace,cpuhp	Peter Zijlstra	1	-2/+11
	Lockdep gifted us with noticing the following 4-way lockup scenario: perf_trace_init() #0 mutex_lock(&event_mutex) perf_trace_event_init() perf_trace_event_reg() tp_event->class->reg() := tracepoint_probe_register #1 mutex_lock(&tracepoints_mutex) trace_point_add_func() #2 static_key_enable() #2 do_cpu_up() perf_event_init_cpu() #3 mutex_lock(&pmus_lock) #4 mutex_lock(&ctx->mutex) perf_event_task_disable() mutex_lock(&current->perf_event_mutex) #4 ctx = perf_event_ctx_lock() #5 perf_event_for_each_child() do_exit() task_work_run() __fput() perf_release() perf_event_release_kernel() #4 mutex_lock(&ctx->mutex) #5 mutex_lock(&event->child_mutex) free_event() _free_event() event->destroy() := perf_trace_destroy #0 mutex_lock(&event_mutex); Fix that by moving the free_event() out from under the locks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-24	net/ibm/emac: wrong bit is used for STA control register write	Ivan Mikhaylov	1	-1/+1
	STA control register has areas of mode and opcodes for opeations. 18 bit is using for mode selection, where 0 is old MIO/MDIO access method and 1 is indirect access mode. 19-20 bits are using for setting up read/write operation(STA opcodes). In current state 'read' is set into old MIO/MDIO mode with 19 bit and write operation is set into 18 bit which is mode selection, not a write operation. To correlate write with read we set it into 20 bit. All those bit operations are MSB 0 based. Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24	net/ibm/emac: add 8192 rx/tx fifo size	Ivan Mikhaylov	2	-0/+8
	emac4syn chips has availability to use 8192 rx/tx fifo buffer sizes, in current state if we set it up in dts 8192 as example, we will get only 2048 which may impact on network speed. Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24	Revert "Input: synaptics_rmi4 - use devm_device_add_group() for attributes in F01"	Nick Dyer	1	-3/+9
	Since the sysfs attribute hangs off the RMI bus, which doesn't go away during firmware flash, it needs to be explicitly removed, otherwise we would try and register the same attribute twice. This reverts commit 36a44af5c176d619552d99697433261141dd1296. Signed-off-by: Nick Dyer <nick@shmanahar.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-01-24	vhost: do not try to access device IOTLB when not initialized	Jason Wang	1	-0/+4
	The code will try to access dev->iotlb when processing VHOST_IOTLB_INVALIDATE even if it was not initialized which may lead to NULL pointer dereference. Fixes this by check dev->iotlb before. Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24	vhost: use mutex_lock_nested() in vhost_dev_lock_vqs()	Jason Wang	1	-1/+1
	We used to call mutex_lock() in vhost_dev_lock_vqs() which tries to hold mutexes of all virtqueues. This may confuse lockdep to report a possible deadlock because of trying to hold locks belong to same class. Switch to use mutex_lock_nested() to avoid false positive. Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API") Reported-by: syzbot+dbb7c1161485e61b0241@syzkaller.appspotmail.com Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24	i40e: flower: check if TC offload is enabled on a netdev	Jakub Kicinski	1	-0/+2
	Since TC block changes drivers are required to check if the TC hw offload flag is set on the interface themselves. Fixes: 2f4b411a3d67 ("i40e: Enable cloud filters via tc-flower") Fixes: 44ae12a768b7 ("net: sched: move the can_offload check from binding phase to rule insertion phase") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Amritha Nambiar <amritha.nambiar@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24	sparc64: fix typo in CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64	Corentin Labbe	1	-1/+1
	This patch fixes the typo CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64 Fixes: 81658ad0d923 ("sparc64: Add CAMELLIA driver making use of the new camellia opcodes.") Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24	qed: Free reserved MR tid	Michal Kalderon	1	-11/+17
	A tid was allocated for reserved MR during initialization but not freed. This lead to an annoying output message during rdma unload flow. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24	qed: Remove reserveration of dpi for kernel	Michal Kalderon	1	-3/+0
	Double reservation for kernel dedicated dpi was performed. Once in the core module and once in qedr. Remove the reservation from core. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24	kcm: Check if sk_user_data already set in kcm_attach	Tom Herbert	1	-2/+14
	This is needed to prevent sk_user_data being overwritten. The check is done under the callback lock. This should prevent a socket from being attached twice to a KCM mux. It also prevents a socket from being attached for other use cases of sk_user_data as long as the other cases set sk_user_data under the lock. Followup work is needed to unify all the use cases of sk_user_data to use the same locking. Reported-by: syzbot+114b15f2be420a8886c3@syzkaller.appspotmail.com Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Tom Herbert <tom@quantonium.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24	kcm: Only allow TCP sockets to be attached to a KCM mux	Tom Herbert	1	-2/+7
	TCP sockets for IPv4 and IPv6 that are not listeners or in closed stated are allowed to be attached to a KCM mux. Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot+8865eaff7f9acd593945@syzkaller.appspotmail.com Signed-off-by: Tom Herbert <tom@quantonium.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25	MAINTAINERS: update email address for James Morris	James Morris	1	-1/+1
	Update my email address. Signed-off-by: James Morris <jmorris@namei.org>