linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-08-10	powerpc: Print the kernel load address at the end of prom_init()	Benjamin Herrenschmidt	1	-1/+1
	This makes it easier to debug crashes that happen very early before the kernel takes over Open Firmware by allowing us to relate the OF reported crashing addresses to offsets within the kernel. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10	powerpc/ptrace: Fix coredump since ptrace TM changes	Cyril Bur	3	-28/+19
	Commit 8d460f6156cd ("powerpc/process: Add the function flush_tmregs_to_thread") added flush_tmregs_to_thread() and included the assumption that it would only be called for a task which is not current. Although this is correct for ptrace, when generating a core dump, some of the routines which call flush_tmregs_to_thread() are called. This leads to a WARNing such as: Not expecting ptrace on self: TM regs may be incorrect ------------[ cut here ]------------ WARNING: CPU: 123 PID: 7727 at arch/powerpc/kernel/process.c:1088 flush_tmregs_to_thread+0x78/0x80 CPU: 123 PID: 7727 Comm: libvirtd Not tainted 4.8.0-rc1-gcc6x-g61e8a0d #1 task: c000000fe631b600 task.stack: c000000fe63b0000 NIP: c00000000001a1a8 LR: c00000000001a1a4 CTR: c000000000717780 REGS: c000000fe63b3420 TRAP: 0700 Not tainted (4.8.0-rc1-gcc6x-g61e8a0d) MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28004222 XER: 20000000 ... NIP [c00000000001a1a8] flush_tmregs_to_thread+0x78/0x80 LR [c00000000001a1a4] flush_tmregs_to_thread+0x74/0x80 Call Trace: flush_tmregs_to_thread+0x74/0x80 (unreliable) vsr_get+0x64/0x1a0 elf_core_dump+0x604/0x1430 do_coredump+0x5fc/0x1200 get_signal+0x398/0x740 do_signal+0x54/0x2b0 do_notify_resume+0x98/0xb0 ret_from_except_lite+0x70/0x74 So fix flush_tmregs_to_thread() to detect the case where it is called on current, and a transaction is active, and in that case flush the TM regs to the thread_struct. This patch also moves flush_tmregs_to_thread() into ptrace.c as it is only called from that file. Fixes: 8d460f6156cd ("powerpc/process: Add the function flush_tmregs_to_thread") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10	powerpc/32: Fix csum_partial_copy_generic()	Christophe Leroy	1	-3/+4
	Commit 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()") introduced a bug when destination address is odd and initial csum is not null In that (rare) case the initial csum value has to be rotated one byte as well as the resulting value is This patch also fixes related comments Fixes: 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10	cxl: Set psl_fir_cntl to production environment value	Frederic Barrat	1	-3/+6
	Switch the setting of psl_fir_cntl from debug to production environment recommended value. It mostly affects the PSL behavior when an error is raised in psl_fir1/2. Tested with cxlflash. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs	Benjamin Herrenschmidt	1	-10/+20
	The generic allocation code may sometimes decide to assign a prefetchable 64-bit BAR to the M32 window. In fact it may also decide to allocate a 64-bit non-prefetchable BAR to the M64 one ! So using the resource flags as a test to decide which window was used for PE allocation is just wrong and leads to insane PE numbers. Instead, compare the addresses to figure it out. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Rename the function as agreed by Ben & Gavin] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/book3s: Fix MCE console messages for unrecoverable MCE.	Mahesh Salgaonkar	2	-1/+3
	When machine check occurs with MSR(RI=0), it means MC interrupt is unrecoverable and kernel goes down to panic path. But the console message still shows it as recovered. This patch fixes the MCE console messages. Fixes: 36df96f8acaf ("powerpc/book3s: Decode and save machine check event.") Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/pci: Fix endian bug in fixed PHB numbering	Michael Ellerman	1	-2/+5
	The recent commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties"), added code to read a 64-bit property from the device tree, and if not found read a 32-bit property (reg). There was a bug in the 32-bit case, on big endian machines, due to the use of the 64-bit value to read the 32-bit property. The cast of &prop means we end up writing to the high 32-bit of prop, leaving the low 32-bits containing whatever junk was on the stack. If that junk value was non-zero, and < MAX_PHBS, we would end up using it as the PHB id. This results in users seeing what appear to be random PHB ids. Fix it by reading into a u32 property and then assigning that to the u64 value, letting the CPU do the correct conversions for us. Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/eeh: Switch to conventional PCI address output in EEH log	Guilherme G. Piccoli	1	-2/+2
	This is a very minor/trivial fix for the output of PCI address on EEH logs. The PCI address on "OF node" field currently is using ":" as a separator for the function, but the usual separator is ".". This patch changes the separator to dot, so the PCI address is printed as usual. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	cxl: Fix sparse warnings	Andrew Donnellan	2	-2/+2
	Make native_irq_wait() static and use NULL rather than 0 to initialise phb->cfg_data in cxl_pci_vphb_add() to remove sparse warnings. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests	Andrew Donnellan	3	-4/+4
	Commit f67a6722d650 ("cxl: Workaround PE=0 hardware limitation in Mellanox CX4") added a "min_pe" field to struct cxl_service_layer_ops, to allow us to work around a Mellanox CX-4 hardware limitation. When allocating the PE number in cxl_context_init(), we read from ctx->afu->adapter->native->sl_ops->min_pe to get the minimum PE number. Unsurprisingly, in a PowerVM guest ctx->afu->adapter->native is NULL, and guests don't have a cxl_service_layer_ops struct anywhere. Move min_pe from struct cxl_service_layer_ops to struct cxl so it's accessible in both native and PowerVM environments. For the Mellanox CX-4, set the min_pe value in set_sl_ops(). Fixes: f67a6722d650 ("cxl: Workaround PE=0 hardware limitation in Mellanox CX4") Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	cxl: Use fixed width predefined types in data structure.	Philippe Bergheaud	1	-2/+2
	This patch fixes a regression introduced by commit b810253bd934 ("cxl: Add mechanism for delivering AFU driver specific events"). It changes the type u8 to __u8 in the uapi header cxl.h, because the former is a kernel internal type, and may not be defined in userland build environments, in particular when cross-compiling libcxl on x86_64 linux machines (RHEL6.7 and Ubuntu 16.04). This patch also changes the size of the field data_size, and makes it constant, to support 32-bit userland applications running on big-endian ppc64 kernels transparently. mpe: This is an ABI change, however the ABI was only added during the 4.8 merge window so has never been part of a released kernel - therefore we give ourselves permission to change it. Fixes: b810253bd934 ("cxl: Add mechanism for delivering AFU driver specific events") Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/vdso: Add missing include file	Guenter Roeck	1	-0/+1
	Some powerpc builds fail with the following buld error. In file included from ./arch/powerpc/include/asm/mmu_context.h:11:0, from arch/powerpc/kernel/vdso.c:28: arch/powerpc/include/asm/cputhreads.h: In function 'get_tensr': arch/powerpc/include/asm/cputhreads.h:101:2: error: implicit declaration of function 'cpu_has_feature' Fixes: b92a226e5284 ("powerpc: Move cpu_has_feature() to a separate file") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc: Fix unused function warning 'lmb_to_memblock'	Alastair D'Silva	1	-13/+13
	This patch fixes the following warning: arch/powerpc/platforms/pseries/hotplug-memory.c:323:29: error: 'lmb_to_memblock' defined but not used [-Werror=unused-function] static struct memory_block lmb_to_memblock(struct of_drconf_cell lmb) ^~~~~~~~~~~~~~~ The only consumer of this function is 'dlpar_remove_lmb', which is enabled with CONFIG_MEMORY_HOTREMOVE, so move it into the same ifdef block. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.	Mahesh Salgaonkar	1	-29/+40
	The current implementation of MCE early handling modifies CR0/1 registers without saving its old values. Fix this by moving early check for powersaving mode to machine_check_handle_early(). The power architecture 2.06 or later allows the possibility of getting machine check while in nap/sleep/winkle. The last bit of HSPRG0 is set to 1, if thread is woken up from winkle. Hence, clear the last bit of HSPRG0 (r13) before MCE handler starts using it as paca pointer. Also, the current code always puts the thread into nap state irrespective of whatever idle state it woke up from. Fix that by looking at paca->thread_idle_state and put the thread back into same state where it came from. Fixes: 1c51089f777b ("powerpc/book3s: Return from interrupt if coming from evil context.") Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h	Mahesh Salgaonkar	2	-12/+13
	Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h so that MCE handler changes in subsequent patch can use it. No functionality change. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/powernv: Load correct TOC pointer while waking up from winkle.	Mahesh Salgaonkar	1	-1/+4
	The function pnv_restore_hyp_resource() loads the TOC into r2 from the invalid PACA pointer before fixing r13 value. This do not affect POWER ISA 3.0 but it does have an impact on POWER ISA 2.07 or less leading CPU to get stuck forever. login: [ 471.830433] Processor 120 is stuck. This can be easily reproducible using following steps: - Turn off SMT $ ppc64_cpu --smt=off - offline/online any online cpu (Thread 0 of any core which is online) $ echo 0 > /sys/devices/system/cpu/cpu<num>/online $ echo 1 > /sys/devices/system/cpu/cpu<num>/online For POWER ISA 2.07 or less, the last bit of HSPRG0 is set indicating that thread is waking up from winkle. Hence, the last bit of HSPRG0(r13) needs to be clear before accessing it as PACA to avoid loading invalid values from invalid PACA pointer. Fix this by loading TOC after r13 register is corrected. Fixes: bcef83a00dc4 ("powerpc/powernv: Add platform support for stop instruction") Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again	Alexey Kardashevskiy	1	-1/+1
	Commit fd141d1a99a3 ("powerpc/powernv/pci: Rework accessing the TCE invalidate register") broke TCE invalidation on IODA2/PHB3 for real mode. This makes invalidate work again. Fixes: fd141d1a99a3 ("powerpc/powernv/pci: Rework accessing the TCE invalidate register") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/cell: Add missing error code in spufs_mkgang()	Dan Carpenter	1	-1/+3
	We should return -ENOMEM if alloc_spu_gang() fails. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	powerpc/xics: Properly set Edge/Level type and enable resend	Benjamin Herrenschmidt	6	-10/+63
	This sets the type of the interrupt appropriately. We set it as follow: - If not mapped from the device-tree, we use edge. This is the case of the virtual interrupts and PCI MSIs for example. - If mapped from the device-tree and #interrupt-cells is 2 (PAPR compliant), we use the second cell to set the appropriate type - If mapped from the device-tree and #interrupt-cells is 1 (current OPAL on P8 does that), we assume level sensitive since those are typically going to be the PSI LSIs which are level sensitive. Additionally, we mark the interrupts requested via the opal_interrupts property all level. This is a bit fishy but the best we can do until we fix OPAL to properly expose them with a complete descriptor. It is also correct for the current HW anyway as OPAL interrupts are currently PCI error and PSI interrupts which are level. Finally now that edge interrupts are properly identified, we can enable CONFIG_HARDIRQS_SW_RESEND which will make the core re-send them if they occur while masked, which some drivers rely upon. This fixes issues with lost interrupts on some Mellanox adapters. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09	crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading	Anton Blanchard	1	-1/+2
	This patch utilises the GENERIC_CPU_AUTOPROBE infrastructure to automatically load the crc32c-vpmsum module if the CPU supports it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-08	powerpc/pasemi: Fix coherent_dma_mask for dma engine	Darren Stevens	1	-0/+5
	Commit 817820b0226a ("powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask) adds a check of coherent_dma_mask for dma allocations. Unfortunately current PASemi code does not set this value for the DMA engine, which ends up with the default value of 0xffffffff, the result is on a PASemi system with >2Gb ram and iommu enabled the the onboard ethernet stops working due to an inability to allocate memory. Add an initialisation to pci_dma_dev_setup_pasemi(). Signed-off-by: Darren Stevens <darren@stevens-zone.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-07	Linux 4.8-rc1	Linus Torvalds	1	-2/+2

2016-08-07	block: rename bio bi_rw to bi_opf	Jens Axboe	51	-157/+158
	Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07	target: iblock_execute_sync_cache() should use bio_set_op_attrs()	Jens Axboe	1	-1/+1
	The original commit missed this function, it needs to mark it a write flush. Cc: Mike Christie <mchristi@redhat.com> Fixes: e742fc32fcb4 ("target: use bio op accessors") Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07	mm: make __swap_writepage() use bio_set_op_attrs()	Jens Axboe	1	-2/+3
	Cleaner than manipulating bio->bi_rw flags directly. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07	block/mm: make bdev_ops->rw_page() take a bool for read/write	Jens Axboe	11	-53/+51
	Commit abf545484d31 changed it from an 'rw' flags type to the newer ops based interface, but now we're effectively leaking some bdev internals to the rest of the kernel. Since we only care about whether it's a read or a write at that level, just pass in a bool 'is_write' parameter instead. Then we can also move op_is_write() and friends back under CONFIG_BLOCK protection. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07	fs: return EPERM on immutable inode	Eryu Guan	4	-4/+5
	In most cases, EPERM is returned on immutable inode, and there're only a few places returning EACCES. I noticed this when running LTP on overlayfs, setxattr03 failed due to unexpected EACCES on immutable inode. So converting all EACCES to EPERM on immutable inode. Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-05	ramoops: use persistent_ram_free() instead of kfree() for freeing prz	Hiraku Toyooka	1	-3/+3
	persistent_ram_zone(=prz) structures are allocated by persistent_ram_new(), which includes vmap() or ioremap(). But they are currently freed by kfree(). This uses persistent_ram_free() for correct this asymmetry usage. Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.kw@hitachi.com> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Seiji Aguchi <seiji.aguchi.tr@hitachi.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2016-08-05	ramoops: use DT reserved-memory bindings	Kees Cook	4	-33/+56
	Instead of a ramoops-specific node, use a child node of /reserved-memory. This requires that of_platform_device_create() be explicitly called for the node, though, since "/reserved-memory" does not have its own "compatible" property. Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rob Herring <robh@kernel.org>
2016-08-05	NTB: ntb_hw_intel: use local variable pdev	Allen Hubbe	1	-5/+5
	Clean up duplicated expression by replacing it with the equivalent local variable pdev. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	NTB: ntb_hw_intel: show BAR size in debugfs info	Allen Hubbe	1	-1/+38
	It will be useful to know the hardware configured BAR size to diagnose issues with NTB memory windows. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_test: Add a selftest script for the NTB subsystem	Logan Gunthorpe	2	-0/+423
	This script automates testing doorbells, scratchpads and memory windows for an NTB device. It can be run locally, with the NTB looped back to the same host or use SSH to remotely control the second host. In the single host case, the script just needs to be passed two arguments: a PCI ID for each side of the link. In the two host case the -r option must be used to specify the remote hostname (which must be SSH accessible and should probably have ssh-keys exchanged). A sample run looks like this: $ sudo ./ntb_test.sh 0000:03:00.1 0000:83:00.1 -p 29 Starting ntb_tool tests... Running link tests on: 0000:03:00.1 / 0000:83:00.1 Passed Running link tests on: 0000:83:00.1 / 0000:03:00.1 Passed Running db tests on: 0000:03:00.1 / 0000:83:00.1 Passed Running db tests on: 0000:83:00.1 / 0000:03:00.1 Passed Running spad tests on: 0000:03:00.1 / 0000:83:00.1 Passed Running spad tests on: 0000:83:00.1 / 0000:03:00.1 Passed Running mw0 tests on: 0000:03:00.1 / 0000:83:00.1 Passed Running mw0 tests on: 0000:83:00.1 / 0000:03:00.1 Passed Running mw1 tests on: 0000:03:00.1 / 0000:83:00.1 Passed Running mw1 tests on: 0000:83:00.1 / 0000:03:00.1 Passed Starting ntb_pingpong tests... Running ping pong tests on: 0000:03:00.1 / 0000:83:00.1 Passed Starting ntb_perf tests... Running local perf test without DMA 0: copied 536870912 bytes in 164453 usecs, 3264 MBytes/s Passed Running remote perf test without DMA 0: copied 536870912 bytes in 164453 usecs, 3264 MBytes/s Passed Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_perf: clear link_is_up flag when the link goes down.	Logan Gunthorpe	1	-19/+9
	When the link goes down, the link_is_up flag did not return to false. This could have caused some subtle corner case bugs when the link goes up and down quickly. Once that was fixed, there was found to be a race if the link was brought down then immediately up. The link_cleanup work would occasionally be scheduled after the next link up event. This would cancel the link_work that was supposed to occur and leave ntb_perf in an unusable state. To fix this we get rid of the link_cleanup work and put the actions directly in the link_down event. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_pingpong: Add a debugfs file to get the ping count	Logan Gunthorpe	1	-1/+61
	This commit adds a debugfs 'count' file to ntb_pingpong. This is so testing with ntb_pingpong can be automated beyond just checking the logs for pong messages. The count file returns a number which increments every pong. The counter can be cleared by writing a zero. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_tool: Add link status and files to debugfs	Logan Gunthorpe	1	-0/+92
	In order to more successfully script with ntb_tool it's useful to have a link file to check the link status so that the script doesn't use the other files until the link is up. This commit adds a 'link' file to the debugfs directory which reads boolean (Y or N) depending on the link status. Writing to the file change the link state using ntb_link_enable or ntb_link_disable. A 'link_event' file is also provided so an application can block until the link changes to the desired state. If the user writes a 1, it will block until the link is up. If the user writes a 0, it will block until the link is down. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_tool: Postpone memory window initialization for the user	Logan Gunthorpe	1	-138/+228
	In order to make the interface closer to the raw NTB API, this commit changes memory windows so they are not initialized on link up. Instead, the 'peer_trans' debugfs files are introduced. When read, they return information provided by ntb_mw_get_range. When written, they create a buffer and initialize the memory window. The value written is taken as the requested size of the buffer (which is then rounded for alignment). Writing a value of zero frees the buffer and tears down the memory window translation. The 'peer_mw' file is only created once the memory window translation is setup by the user. Additionally, it was noticed that the read and write functions for the 'peer_mw*' files should have checked for a NULL pointer. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_perf: Wait for link before running test	Logan Gunthorpe	1	-1/+4
	Instead of returning immediately with an error when the link is down, wait for the link to come up (or the user sends a SIGINT). This is to make scripting ntb_perf easier. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_perf: Return results by reading the run file	Logan Gunthorpe	1	-12/+55
	Instead of having to watch logs, allow the results to be retrieved by reading back the run file. This file will return "running" when the test is running and nothing if no tests have been run yet. It returns 1 line per thread, and will display an error message if the corresponding thread returns an error. With the above change, the pr_info calls that returned the results are then changed to pr_debug calls. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_perf: Improve thread handling to increase robustness	Logan Gunthorpe	1	-48/+76
	This commit accomplishes a few things: 1) Properly prevent multiple sets of threads from running at once using a mutex. Lots of race issues existed with the thread_cleanup. 2) The mutex allows us to ensure that threads are finished before tearing down the device or module. 3) Don't use kthread_stop when the threads can exit by themselves, as this is counter-indicated by the kthread_create documentation. Threads now wait for kthread_stop to occur. 4) Writing to the run file now blocks until the threads are complete. The test can then be safely interrupted by a SIGINT. Also, while I was at it: 5) debugfs_run_write shouldn't return 0 in the early check cases as this could cause debugfs_run_write to loop undesirably. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_perf: Schedule based on time not on performance	Logan Gunthorpe	1	-2/+4
	When debugging performance problems, if some issue causes the ntb hardware to be significantly slower than expected, ntb_perf will hang requiring a reboot because it only schedules once every 4GB. Instead, schedule based on jiffies so it will not hang the CPU if the transfer is slow. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_transport: Check the number of spads the hardware supports	Logan Gunthorpe	2	-4/+13
	I'm working on hardware that currently has a limited number of scratchpad registers and ntb_ndev fails with no clue as to why. I feel it is better to fail early and provide a reasonable error message then to fail later on. The same is done to ntb_perf, but it doesn't currently require enough spads to actually fail. I've also removed the unused SPAD_MSG and SPAD_ACK enums so that MAX_SPAD accurately reflects the number of spads used. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_tool: Add memory window debug support	Logan Gunthorpe	1	-1/+257
	We allocate some memory window buffers when the link comes up, then we provide debugfs files to read/write each side of the link. This is useful for debugging the mapping when writing new drivers. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_perf: Allow limiting the size of the memory windows	Logan Gunthorpe	1	-0/+8
	On my system, dma_alloc_coherent won't produce memory anywhere near the size of the BAR. So I needed a way to limit this. It's pretty much copied straight from ntb_transport. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	NTB: allocate number transport entries depending on size of ring size	Dave Jiang	1	-2/+27
	Currently we only allocate a fixed default number of descriptors for the tx and rx side. We should dynamically resize it to the number of descriptors resides in the transport rings. We should know the number of transmit descriptors at initializaiton. We will allocate the default number of descriptors for receive side and allocate additional ones when we know the actual max entries for receive. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <allen.hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_tool: BUG: Ensure the buffer size is large enough to return all spads	Logan Gunthorpe	1	-2/+8
	On hardware with 32 scratchpad registers the spad field in ntb tool could chop off the end. The maximum buffer size is increased from 256 to 15 times the number or scratchpads. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05	ntb_tool: Fix infinite loop bug when writing spad/peer_spad file	Logan Gunthorpe	1	-4/+5
	If you tried to write two spads in one line, as per the example: root@peer# echo '0 0x01010101 1 0x7f7f7f7f' > $DBG_DIR/peer_spad then the CPU would freeze in an infinite loop. This wasn't immediately obvious but 'pos' was not incrementing the buffer, so after reading the second pair of values, 'pos' would once again be 3 and it would re-read the second pair of values ad infinitum. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>