linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2015-10-27	dmaengine: Kconfig: edma: Select TI_DMA_CROSSBAR in case of ARCH_OMAP	Peter Ujfalusi	1	-0/+1
	Since the crossbar is needed for eDMA when it is used on OMAP like platforms (am335x/am437x and later DRA7xx), select the crossbar to be built if ARCH_OMAP is set. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-27	dmaengine: ti-dma-crossbar: Add support for crossbar on AM33xx/AM43xx	Peter Ujfalusi	2	-32/+234
	The DMA event crossbar on AM33xx/AM43xx is different from the one found in DRA7x family. Instead of a single event crossbar it has 64 identical mux attached to each eDMA event line. When the 0 event mux is selected, the default mapped event is going to be routed to the corresponding eDMA event line. If different mux is selected, then the selected event is going to be routed to the given eDMA event. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-27	dmaengine: edma: Merge the of parsing functions	Peter Ujfalusi	1	-16/+8
	Instead of nesting functions just merge them since the resulting function is still small and readable. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-27	dmaengine: edma: Do not allocate memory for edma_rsv_info in case of DT boot	Peter Ujfalusi	1	-6/+0
	The channel/slot reservation is not supported when booted with DT so there is not need to allocate memory. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-27	dmaengine: edma: Refactor the dma device and channel struct initialization	Peter Ujfalusi	1	-42/+37
	Move all code under one function to do the dma device and eDMA channel related setup so they are not scattered around the driver. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-27	dmaengine: edma: Get qDMA channel information from HW also	Peter Ujfalusi	1	-0/+6
	Query the number of qDMA channels from CCCFG register. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-27	dmaengine: edma: Merge map_dmach_to_queue into assign_channel_eventq	Peter Ujfalusi	1	-34/+22
	edma_assign_channel_eventq() is a wrapper around edma_map_dmach_to_queue() We can merge the content of the later so we will have only one function to be used for mapping channels to given eventq Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-27	dmaengine: edma: Correct PaRAM access function names (_parm_ to _param_)	Peter Ujfalusi	1	-12/+12
	These inline functions are designed to modify parts of the PaRAM in eDMA. Change the names accordingly. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-27	dmaengine: edma: Simplify function parameter list for channel operations	Peter Ujfalusi	1	-273/+123
	Instead of passing a pointer to struct edma_cc and the channel number, pass only the pointer to the edma_chan structure for the given channel. This struct contains all the information needed by the functions and the use of this makes it obvious that most of the sanity checks can be removed from the driver. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-27	dmaengine: edma: Optimize memcpy operation	Peter Ujfalusi	1	-21/+75
	If the transfer is shorted then 64K we can complete it with one ACNT burst by configuring ACNT to the length of the copy, this require one paRAM slot. Otherwise we use two paRAM slots for the copy: slot1: will copy (length / 32767) number of 32767 byte long blocks slot2: will be configured to copy the remaining data. According to tests this patch increases the throughput of memcpy from ~3MB/s to 15MB/s Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-27	dmaengine: edma: Remove alignment constraint for memcpy	Peter Ujfalusi	1	-7/+6
	Despite the claim by the original commit adding the memcpy support, eDMA does not have constraint on the alignment of src, dst or length in increment mode. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Dynamic paRAM slot handling if HW supports it	Peter Ujfalusi	1	-49/+52
	If the eDMA3 has support for channel paRAM slot mapping we can utilize it to allocate slots on demand and save precious slots for real transfers. On am335x the eDMA has 64 channels which means we can unlock 64 paRAM slots out from the available 256. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Rename bitfields for slot and channel usage tracking	Peter Ujfalusi	1	-25/+26
	The names chosen for the bitfields were quite confusing and given no real information on what they are used for... edma_inuse -> slot_inuse: tracks the slot usage/availability edma_unused -> channel_unused: tracks the channel usage/availability Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Read channel mapping support only once from HW	Peter Ujfalusi	1	-2/+6
	Instead of directly reading it from CCCFG register take the information out once when we set up the configuration from the HW. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Simplify and optimize ccerr interrupt handler	Peter Ujfalusi	1	-47/+35
	No need to run through the bits in QEMR and CCERR events since they will not trigger any action, so just clearing the errors there is fine. In case of the missed event the loop can be optimized so we spend less time to handle the event. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Move the pending error check into helper function	Peter Ujfalusi	1	-8/+12
	In the ccerr interrupt handler the code checks for pending errors in the error status registers in two different places. Move the check out to a helper function. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Simplify the interrupt handling	Peter Ujfalusi	1	-245/+205
	With the merger of the arch/arm/common/edma.c code into the dmaengine driver, there is no longer need to have per channel callback/data storage for interrupt events. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Consolidate the comments for functions	Peter Ujfalusi	1	-75/+11
	Remove or rewrite the comments for the internal functions. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Print warning when linking slots from different eDMA	Peter Ujfalusi	1	-0/+3
	Warning message in case of linking between paRAM slots in different eDMA controllers. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Use the edma_write_slot instead open coded memcpy_toio	Peter Ujfalusi	1	-2/+1
	edma_write_slot() is for writing an entire paRAM slot. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Use dev_dbg instead pr_debug	Peter Ujfalusi	1	-12/+12
	We have access to dev, so it is better to use the dev_dbg for debug prints. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Cleanup regarding the use of dev around the code	Peter Ujfalusi	1	-31/+30
	Be consistent and do not mix the use of dev, &pdev->dev, etc in the functions. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Use devm_kcalloc when possible	Peter Ujfalusi	1	-2/+2
	When allocating a memory for number of items it is better (looks better) to use devm_kcalloc. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Allocate memory dynamically for bitmaps and structures	Peter Ujfalusi	1	-28/+34
	Instead of using defines to specify the size of different arrays and bitmaps, allocate the memory for them based on the information we get from the HW itself. Since these defines are set based on the worst case, there are devices where they are not valid. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	ARM/dmaengine: edma: Merge the two drivers under drivers/dma/	Peter Ujfalusi	8	-1587/+1431
	Move the code out from arch/arm/common and merge it inside of the dmaengine driver. This change is done with as minimal (if eny) functional change to the code as possible to avoid introducing regression. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	ARM: davinci: Add dma_mask to eDMA devices	Peter Ujfalusi	4	-0/+5
	The upcoming change to merge the arch/arm/common/edma.c into drivers/dma/edma.c will need this change when booting daVinci devices in no DT mode. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	ARM: davinci: Use platform_device_register_full() to create pdev for eDMA	Peter Ujfalusi	4	-39/+57
	Convert the eDMA platform device creation to use struct platform_device_info XXXXXX __initconst and platform_device_register_full() This will allow us to cleanly specify the dma_mask for the devices in an upcoming patch. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	ARM/dmaengine: edma: Remove limitation on the number of eDMA controllers	Peter Ujfalusi	2	-26/+13
	Since the driver stack no longer depends on lookup with id number in a global array of pointers, the limitation for the number of eDMAs are no longer needed. We can handle as many eDMAs in legacy and DT boot as we have memory for them to allocate the needed structures. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	ARM/dmaengine: edma: Public API to use private struct pointer	Peter Ujfalusi	3	-208/+214
	Instead of relying on indexes pointing to edma private date in the global pointer array, pass the private data pointer via the public API. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	ARM: common: edma: Internal API to use pointer to 'struct edma'	Peter Ujfalusi	1	-203/+197
	Merge the iomem into the 'struct edma' and change the internal (static) functions to use pointer to the edma_cc instead of the ctlr number. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	ARM/dmaengine: edma: Move of_dma_controller_register to the dmaengine driver	Peter Ujfalusi	2	-10/+16
	If the of_dma_controller is registered in the non dmaengine driver we could have race condition: the of_dma_controller has been registered, but the dmaengine driver is not yet probed. Drivers requesting DMA channels during this window will fail since we do not yet have dmaengine drivers registered. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	ARM: davinci/common: Convert edma driver to handle one eDMA instance per driver	Peter Ujfalusi	6	-326/+234
	Currently we have one device created to handle all (maximum 2) eDMAs in the system. With this change all eDMA instance will have it's own device/driver. This change is needed for further cleanups in the eDMA driver stack since the one device/driver to handle all eDMAs in the system was not flexible enough and prevents the upcoming work. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	dmaengine: edma: Simplify and optimize the edma_execute path	Peter Ujfalusi	1	-47/+29
	The code path in edma_execute() and edma_callback() can be simplified and make it more optimal. There is not need to call in to edma_execute() when the transfer has been finished for example. Also the handling of missed/first or next batch of paRAMs can be done in a more optimal way. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	ARM: common: edma: Remove unused functions	Peter Ujfalusi	2	-409/+0
	We no longer have users for these functions so they can be removed. Remove also unused enums from the header file. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-10-14	ARM: common: edma: Fix channel parameter for irq callbacks	Peter Ujfalusi	1	-2/+4
	In case when the interrupt happened for the second eDMA the channel number was incorrectly passed to the client driver. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> CC: <stable@vger.kernel.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-09-12	Linux 4.3-rc1	Linus Torvalds	1	-2/+2

2015-09-12	blk: rq_data_dir() should not return a boolean	Linus Torvalds	1	-1/+1
	rq_data_dir() returns either READ or WRITE (0 == READ, 1 == WRITE), not a boolean value. Now, admittedly the "!= 0" doesn't really change the value (0 stays as zero, 1 stays as one), but it's not only redundant, it confuses gcc, and causes gcc to warn about the construct switch (rq_data_dir(req)) { case READ: ... case WRITE: ... that we have in a few drivers. Now, the gcc warning is silly and stupid (it seems to warn not about the switch value having a different type from the case statements, but about _any_ boolean switch value), but in this case the code itself is silly and stupid too, so let's just change it, and get rid of warnings like this: drivers/block/hd.c: In function ‘hd_request’: drivers/block/hd.c:630:11: warning: switch condition has boolean value [-Wswitch-bool] switch (rq_data_dir(req)) { The odd '!= 0' came in when "cmd_flags" got turned into a "u64" in commit 5953316dbf90 ("block: make rq->cmd_flags be 64-bit") and is presumably because the old code (that just did a logical 'and' with 1) would then end up making the type of rq_data_dir() be u64 too. But if we want to retain the old regular integer type, let's just cast the result to 'int' rather than use that rather odd '!= 0'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-12	writeback: plug writeback in wb_writeback() and writeback_inodes_wb()	Linus Torvalds	1	-0/+6
	We had to revert the pluggin in writeback_sb_inodes() because the wb->list_lock is held, but we could easily plug at a higher level before taking that lock, and unplug after releasing it. This does that. Chris will run performance numbers, just to verify that this approach is comparable to the alternative (we could just drop and re-take the lock around the blk_finish_plug() rather than these two commits. I'd have preferred waiting for actual performance numbers before picking one approach over the other, but I don't want to release rc1 with the known "sleeping function called from invalid context" issue, so I'll pick this cleanup version for now. But if the numbers show that we really want to plug just at the writeback_sb_inodes() level, and we should just play ugly games with the spinlock, we'll switch to that. Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	thermal: fix intel PCH thermal driver mismerge	Linus Torvalds	1	-7/+4
	I didn't notice this when merging the thermal code from Zhang, but his merge (commit 5a924a07f882: "Merge branches 'thermal-core' and 'thermal-intel' of .git into next") of the thermal-core and thermal-intel branches was wrong. In thermal-core, commit 17e8351a7739 ("thermal: consistently use int for temperatures") converted the thermal layer to use "int" for temperatures. But in parallel, in the thermal-intel branch commit d0a12625d2ff ("thermal: Add Intel PCH thermal driver") added support for the intel PCH thermal sensor using the old interfaces that used "unsigned long" pointers. This resulted in warnings like this: drivers/thermal/intel_pch_thermal.c:184:14: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types] .get_temp = pch_thermal_get_temp, ^ drivers/thermal/intel_pch_thermal.c:184:14: note: (near initialization for ‘tzd_ops.get_temp’) drivers/thermal/intel_pch_thermal.c:186:19: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types] .get_trip_temp = pch_get_trip_temp, ^ drivers/thermal/intel_pch_thermal.c:186:19: note: (near initialization for ‘tzd_ops.get_trip_temp’) This fixes it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	ARCv2: [axs103_smp] Reduce clk for SMP FPGA configs	Vineet Gupta	1	-0/+2
	Newer bitfiles needs the reduced clk even for SMP builds Cc: <stable@vger.kernel.org> #4.2 Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	revert "ocfs2/dlm: use list_for_each_entry instead of list_for_each"	Andrew Morton	1	-2/+4
	Revert commit f83c7b5e9fd6 ("ocfs2/dlm: use list_for_each_entry instead of list_for_each"). list_for_each_entry() will dereference its `pos' argument, which can be NULL in dlm_process_recovery_data(). Reported-by: Julia Lawall <julia.lawall@lip6.fr> Reported-by: Fengguang Wu <fengguang.wu@gmail.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	mm/early_ioremap: add explicit #include of asm/early_ioremap.h	Ard Biesheuvel	1	-0/+1
	Commit 6b0f68e32ea8 ("mm: add utility for early copy from unmapped ram") introduces a function copy_from_early_mem() into mm/early_ioremap.c which itself calls early_memremap()/early_memunmap(). However, since early_memunmap() has not been declared yet at this point in the .c file, nor by any explicitly included header files, we are depending on a transitive include of asm/early_ioremap.h to declare it, which is fragile. So instead, include this header explicitly. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Salter <msalter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void	Joe Perches	4	-50/+45
	The seq_<foo> function return values were frequently misused. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") All uses of these return values have been removed, so convert the return types to void. Miscellanea: o Move seq_put_decimal_<type> and seq_escape prototypes closer the other seq_vprintf prototypes o Reorder seq_putc and seq_puts to return early on overflow o Add argument names to seq_vprintf and seq_printf o Update the seq_escape kernel-doc o Convert a couple of leading spaces to tabs in seq_escape Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mark Brown <broonie@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	selftests: enhance membarrier syscall test	Mathieu Desnoyers	1	-25/+75
	Update the membarrier syscall self-test to match the membarrier interface. Extend coverage of the interface. Consider ENOSYS as a "SKIP" test, since it is a valid configuration, but does not allow testing the system call. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	selftests: add membarrier syscall test	Pranith Kumar	4	-0/+84
	Add a self test for the membarrier system call. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	sys_membarrier(): system-wide memory barrier (generic, x86)	Mathieu Desnoyers	11	-1/+151
	Here is an implementation of a new system call, sys_membarrier(), which executes a memory barrier on all threads running on the system. It is implemented by calling synchronize_sched(). It can be used to distribute the cost of user-space memory barriers asymmetrically by transforming pairs of memory barriers into pairs consisting of sys_membarrier() and a compiler barrier. For synchronization primitives that distinguish between read-side and write-side (e.g. userspace RCU [1], rwlocks), the read-side can be accelerated significantly by moving the bulk of the memory barrier overhead to the write-side. The existing applications of which I am aware that would be improved by this system call are as follows: * Through Userspace RCU library (http://urcu.so) - DNS server (Knot DNS) https://www.knot-dns.cz/ - Network sniffer (http://netsniff-ng.org/) - Distributed object storage (https://sheepdog.github.io/sheepdog/) - User-space tracing (http://lttng.org) - Network storage system (https://www.gluster.org/) - Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf) - Financial software (https://lkml.org/lkml/2015/3/23/189) Those projects use RCU in userspace to increase read-side speed and scalability compared to locking. Especially in the case of RCU used by libraries, sys_membarrier can speed up the read-side by moving the bulk of the memory barrier cost to synchronize_rcu(). * Direct users of sys_membarrier - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198) Microsoft core dotnet GC developers are planning to use the mprotect() side-effect of issuing memory barriers through IPIs as a way to implement Windows FlushProcessWriteBuffers() on Linux. They are referring to sys_membarrier in their github thread, specifically stating that sys_membarrier() is what they are looking for. To explain the benefit of this scheme, let's introduce two example threads: Thread A (non-frequent, e.g. executing liburcu synchronize_rcu()) Thread B (frequent, e.g. executing liburcu rcu_read_lock()/rcu_read_unlock()) In a scheme where all smp_mb() in thread A are ordering memory accesses with respect to smp_mb() present in Thread B, we can change each smp_mb() within Thread A into calls to sys_membarrier() and each smp_mb() within Thread B into compiler barriers "barrier()". Before the change, we had, for each smp_mb() pairs: Thread A Thread B previous mem accesses previous mem accesses smp_mb() smp_mb() following mem accesses following mem accesses After the change, these pairs become: Thread A Thread B prev mem accesses prev mem accesses sys_membarrier() barrier() follow mem accesses follow mem accesses As we can see, there are two possible scenarios: either Thread B memory accesses do not happen concurrently with Thread A accesses (1), or they do (2). 1) Non-concurrent Thread A vs Thread B accesses: Thread A Thread B prev mem accesses sys_membarrier() follow mem accesses prev mem accesses barrier() follow mem accesses In this case, thread B accesses will be weakly ordered. This is OK, because at that point, thread A is not particularly interested in ordering them with respect to its own accesses. 2) Concurrent Thread A vs Thread B accesses Thread A Thread B prev mem accesses prev mem accesses sys_membarrier() barrier() follow mem accesses follow mem accesses In this case, thread B accesses, which are ensured to be in program order thanks to the compiler barrier, will be "upgraded" to full smp_mb() by synchronize_sched(). * Benchmarks On Intel Xeon E5405 (8 cores) (one thread is calling sys_membarrier, the other 7 threads are busy looping) 1000 non-expedited sys_membarrier calls in 33s =3D 33 milliseconds/call. * User-space user of this system call: Userspace RCU library Both the signal-based and the sys_membarrier userspace RCU schemes permit us to remove the memory barrier from the userspace RCU rcu_read_lock() and rcu_read_unlock() primitives, thus significantly accelerating them. These memory barriers are replaced by compiler barriers on the read-side, and all matching memory barriers on the write-side are turned into an invocation of a memory barrier on all active threads in the process. By letting the kernel perform this synchronization rather than dumbly sending a signal to every process threads (as we currently do), we diminish the number of unnecessary wake ups and only issue the memory barriers on active threads. Non-running threads do not need to execute such barrier anyway, because these are implied by the scheduler context switches. Results in liburcu: Operations in 10s, 6 readers, 2 writers: memory barriers in reader: 1701557485 reads, 2202847 writes signal-based scheme: 9830061167 reads, 6700 writes sys_membarrier: 9952759104 reads, 425 writes sys_membarrier (dyn. check): 7970328887 reads, 425 writes The dynamic sys_membarrier availability check adds some overhead to the read-side compared to the signal-based scheme, but besides that, sys_membarrier slightly outperforms the signal-based scheme. However, this non-expedited sys_membarrier implementation has a much slower grace period than signal and memory barrier schemes. Besides diminishing the number of wake-ups, one major advantage of the membarrier system call over the signal-based scheme is that it does not need to reserve a signal. This plays much more nicely with libraries, and with processes injected into for tracing purposes, for which we cannot expect that signals will be unused by the application. An expedited version of this system call can be added later on to speed up the grace period. Its implementation will likely depend on reading the cpu_curr()->mm without holding each CPU's rq lock. This patch adds the system call to x86 and to asm-generic. [1] http://urcu.so membarrier(2) man page: MEMBARRIER(2) Linux Programmer's Manual MEMBARRIER(2) NAME membarrier - issue memory barriers on a set of threads SYNOPSIS #include <linux/membarrier.h> int membarrier(int cmd, int flags); DESCRIPTION The cmd argument is one of the following: MEMBARRIER_CMD_QUERY Query the set of supported commands. It returns a bitmask of supported commands. MEMBARRIER_CMD_SHARED Execute a memory barrier on all threads running on the system. Upon return from system call, the caller thread is ensured that all running threads have passed through a state where all memory accesses to user-space addresses match program order between entry to and return from the system call (non-running threads are de facto in such a state). This covers threads from all pro=E2=80=90 cesses running on the system. This command returns 0. The flags argument needs to be 0. For future extensions. All memory accesses performed in program order from each targeted thread is guaranteed to be ordered with respect to sys_membarrier(). If we use the semantic "barrier()" to represent a compiler barrier forcing memory accesses to be performed in program order across the barrier, and smp_mb() to represent explicit memory barriers forcing full memory ordering across the barrier, we have the following ordering table for each pair of barrier(), sys_membarrier() and smp_mb(): The pair ordering is detailed as (O: ordered, X: not ordered): barrier() smp_mb() sys_membarrier() barrier() X X O smp_mb() X O O sys_membarrier() O O O RETURN VALUE On success, these system calls return zero. On error, -1 is returned, and errno is set appropriately. For a given command, with flags argument set to 0, this system call is guaranteed to always return the same value until reboot. ERRORS ENOSYS System call is not implemented. EINVAL Invalid arguments. Linux 2015-04-15 MEMBARRIER(2) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nicholas Miell <nmiell@comcast.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	MODSIGN: fix a compilation warning in extract-cert	David Howells	1	-1/+1
	Fix the following warning when compiling extract-cert: scripts/extract-cert.c: In function `write_cert': scripts/extract-cert.c:89:2: warning: format not a string literal and no format arguments [-Wformat-security] ERR(!i2d_X509_bio(wb, x509), cert_dst); ^ whereby the ERR() macro is taking cert_dst as the format string. "%s" should be used as the format string as the path could contain special characters. Signed-off-by: David Howells <dhowells@redhat.com> Reported-by: Jim Davis <jim.epost@gmail.com> Acked-by : David Woodhouse <david.woodhouse@intel.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	Revert "writeback: plug writeback at a high level"	Linus Torvalds	1	-3/+4
	This reverts commit d353d7587d02116b9732d5c06615aed75a4d3a47. Doing the block layer plug/unplug inside writeback_sb_inodes() is broken, because that function is actually called with a spinlock held: wb->list_lock, as pointed out by Chris Mason. Chris suggested just dropping and re-taking the spinlock around the blk_finish_plug() call (the plgging itself can happen under the spinlock), and that would technically work, but is just disgusting. We do something fairly similar - but not quite as disgusting because we at least have a better reason for it - in writeback_single_inode(), so it's not like the caller can depend on the lock being held over the call, but in this case there just isn't any good reason for that "release and re-take the lock" pattern. [ In general, we should really strive to avoid the "release and retake" pattern for locks, because in the general case it can easily cause subtle bugs when the caller caches any state around the call that might be invalidated by dropping the lock even just temporarily. ] But in this case, the plugging should be easy to just move up to the callers before the spinlock is taken, which should even improve the effectiveness of the plug. So there is really no good reason to play games with locking here. I'll send off a test-patch so that Dave Chinner can verify that that plug movement works. In the meantime this just reverts the problematic commit and adds a comment to the function so that we hopefully don't make this mistake again. Reported-by: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11	scsi_dh: fix randconfig build error	Christoph Hellwig	1	-1/+1
	It looks like the Kconfig check that was meant to fix this (commit fe9233fb6914a0eb20166c967e3020f7f0fba2c9 [SCSI] scsi_dh: fix kconfig related build errors) was actually reversed, but no-one noticed until the new set of patches which separated DM and SCSI_DH). Fixes: fe9233fb6914a0eb20166c967e3020f7f0fba2c9 Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-09-11	target: use stringify.h instead of own definition	David Disseldorp	2	-5/+2
	Signed-off-by: David Disseldorp <ddiss@suse.de> Acked-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>