aboutsummaryrefslogtreecommitdiffstats
path: root/ipc (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2015-02-17ipc,sem: use current->state helpersDavidlohr Bueso1-1/+1
Call __set_current_state() instead of assigning the new state directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments, keeping track of who changed the state. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13shmdt: use i_size_read() instead of ->i_sizeDave Hansen1-2/+3
Andrew Morton noted http://lkml.kernel.org/r/20141104142027.a7a0d010772d84560b445f59@linux-foundation.org that the shmdt uses inode->i_size outside of i_mutex being held. There is one more case in shm.c in shm_destroy(). This converts both users over to use i_size_read(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segmentsDave Hansen1-5/+13
This is a highly-contrived scenario. But, a single shmdt() call can be induced in to unmapping memory from mulitple shm segments. Example code is here: http://www.sr71.net/~dave/intel/shmfun.c The fix is pretty simple: Record the 'struct file' for the first VMA we encounter and then stick to it. Decline to unmap anything not from the same file and thus the same segment. I found this by inspection and the odds of anyone hitting this in practice are pretty darn small. Lightly tested, but it's a pretty small patch. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13ipc/msg: increase MSGMNI, remove scalingManfred Spraul6-266/+19
SysV can be abused to allocate locked kernel memory. For most systems, a small limit doesn't make sense, see the discussion with regards to SHMMAX. Therefore: increase MSGMNI to the maximum supported. And: If we ignore the risk of locking too much memory, then an automatic scaling of MSGMNI doesn't make sense. Therefore the logic can be removed. The code preserves auto_msgmni to avoid breaking any user space applications that expect that the value exists. Notes: 1) If an administrator must limit the memory allocations, then he can set MSGMNI as necessary. Or he can disable sysv entirely (as e.g. done by Android). 2) MSGMAX and MSGMNB are intentionally not increased, as these values are used to control latency vs. throughput: If MSGMNB is large, then msgsnd() just returns and more messages can be queued before a task switch to a task that calls msgrcv() is forced. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()Manfred Spraul1-3/+10
When I fixed bugs in the sem_lock() logic, I was more conservative than necessary. Therefore it is safe to replace the smp_mb() with smp_rmb(). And: With smp_rmb(), semop() syscalls are up to 10% faster. The race we must protect against is: sem->lock is free sma->complex_count = 0 sma->sem_perm.lock held by thread B thread A: A: spin_lock(&sem->lock) B: sma->complex_count++; (now 1) B: spin_unlock(&sma->sem_perm.lock); A: spin_is_locked(&sma->sem_perm.lock); A: XXXXX memory barrier A: if (sma->complex_count == 0) Thread A must read the increased complex_count value, i.e. the read must not be reordered with the read of sem_perm.lock done by spin_is_locked(). Since it's about ordering of reads, smp_rmb() is sufficient. [akpm@linux-foundation.org: update sem_lock() comment, from Davidlohr] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-04copy address of proc_ns_ops into ns_commonAl Viro2-0/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-04new helpers: ns_alloc_inum/ns_free_inumAl Viro1-3/+3
take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-04make proc_ns_operations work with struct ns_common * instead of void *Al Viro1-9/+3
We can do that now. And kill ->inum(), while we are at it - all instances are identical. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-04switch the rest of proc_ns_operations to working with &...->nsAl Viro1-6/+9
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-04common object embedded into various struct ....nsAl Viro2-5/+5
for now - just move corresponding ->proc_inum instances over there Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-03ipc/sem.c: fully initialize sem_array before making it visibleManfred Spraul1-7/+8
ipc_addid() makes a new ipc identifier visible to everyone. New objects start as locked, so that the caller can complete the initialization after the call. Within struct sem_array, at least sma->sem_base and sma->sem_nsems are accessed without any locks, therefore this approach doesn't work. Thus: Move the ipc_addid() to the end of the initialization. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reported-by: Rik van Riel <riel@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-19new helper: audit_file()Al Viro1-2/+2
... for situations when we don't have any candidate in pathnames - basically, in descriptor-based syscalls. [Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-14ipc: resolve shadow warningsMark Rustad1-46/+46
Resolve some shadow warnings produced in W=2 builds by changing the name of some parameters and local variables. Change instances of "s64" because that clashes with the well-known typedef. Also change a local variable with the name "up" because that clashes with the name of of the "up" function for semaphores. These are hazards so eliminate the hazards by renaming them. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14ipc/util.c: use __seq_open_private() instead of seq_open()Rob Jones1-16/+4
Using __seq_open_private() removes boilerplate code from sysvipc_proc_open(). The resultant code is shorter and easier to follow. However, please note that __seq_open_private() call kzalloc() rather than kmalloc() which may affect timing due to the memory initialisation overhead. Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14ipc/shm: kill the historical/wrong mm->start_stack checkOleg Nesterov1-7/+0
do_shmat() is the only user of ->start_stack (proc just reports its value), and this check looks ugly and wrong. The reason for this check is not clear at all, and it wrongly assumes that the stack can only grow down. But the main problem is that in general mm->start_stack has nothing to do with stack_vma->vm_start. Not only the application can switch to another stack and even unmap this area, setup_arg_pages() expands the stack without updating mm->start_stack during exec(). This means that in the likely case "addr > start_stack - size - PAGE_SIZE * 5" is simply impossible after find_vma_intersection() == F, or the stack can't grow anyway because of RLIMIT_STACK. Many thanks to Hugh for his explanations. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14ipc: always handle a new value of auto_msgmniAndrey Vagin1-2/+1
proc_dointvec_minmax() returns zero if a new value has been set. So we don't need to check all charecters have been handled. Below you can find two examples. In the new value has not been handled properly. $ strace ./a.out open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3 write(3, "0\n\0", 3) = 2 close(3) = 0 exit_group(0) $ cat /sys/kernel/debug/tracing/trace $strace ./a.out open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3 write(3, "0\n", 2) = 2 close(3) = 0 $ cat /sys/kernel/debug/tracing/trace a.out-697 [000] .... 3280.998235: unregister_ipcns_notifier <-proc_ipcauto_dointvec_minmax Fixes: 9eefe520c814 ("ipc: do not use a negative value to re-enable msgmni automatic recomputin") Signed-off-by: Andrey Vagin <avagin@openvz.org> Cc: Mathias Krause <minipli@googlemail.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Joe Perches <joe@perches.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-09Documentation: Docbook: Fix generated DocBook/kernel-api.xmlMasanari Iida1-3/+3
This patch fix spelling typo found in DocBook/kernel-api.xml. It is because the file is generated from the source comments, I have to fix the comments in source codes. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-08-08shm: allow exit_shm in parallel if only marking orphansJack Miller1-32/+35
If shm_rmid_force (the default state) is not set then the shmids are only marked as orphaned and does not require any add, delete, or locking of the tree structure. Seperate the sysctl on and off case, and only obtain the read lock. The newly added list head can be deleted under the read lock because we are only called with current and will only change the semids allocated by this task and not manipulate the list. This commit assumes that up_read includes a sufficient memory barrier for the writes to be seen my others that later obtain a write lock. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Jack Miller <millerjo@us.ibm.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08shm: make exit_shm work proportional to task activityJack Miller1-11/+11
This is small set of patches our team has had kicking around for a few versions internally that fixes tasks getting hung on shm_exit when there are many threads hammering it at once. Anton wrote a simple test to cause the issue: http://ozlabs.org/~anton/junkcode/bust_shm_exit.c Before applying this patchset, this test code will cause either hanging tracebacks or pthread out of memory errors. After this patchset, it will still produce output like: root@somehost:~# ./bust_shm_exit 1024 160 ... INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 116, t=2111 jiffies, g=241, c=240, q=7113) INFO: Stall ended before state dump start ... But the task will continue to run along happily, so we consider this an improvement over hanging, even if it's a bit noisy. This patch (of 3): exit_shm obtains the ipc_ns shm rwsem for write and holds it while it walks every shared memory segment in the namespace. Thus the amount of work is related to the number of shm segments in the namespace not the number of segments that might need to be cleaned. In addition, this occurs after the task has been notified the thread has exited, so the number of tasks waiting for the ns shm rwsem can grow without bound until memory is exausted. Add a list to the task struct of all shmids allocated by this task. Init the list head in copy_process. Use the ns->rwsem for locking. Add segments after id is added, remove before removing from id. On unshare of NEW_IPCNS orphan any ids as if the task had exited, similar to handling of semaphore undo. I chose a define for the init sequence since its a simple list init, otherwise it would require a function call to avoid include loops between the semaphore code and the task struct. Converting the list_del to list_del_init for the unshare cases would remove the exit followed by init, but I left it blow up if not inited. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Jack Miller <millerjo@us.ibm.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-29namespaces: Use task_lock and not rcu to protect nsproxyEric W. Biederman1-3/+3
The synchronous syncrhonize_rcu in switch_task_namespaces makes setns a sufficiently expensive system call that people have complained. Upon inspect nsproxy no longer needs rcu protection for remote reads. remote reads are rare. So optimize for same process reads and write by switching using rask_lock instead. This yields a simpler to understand lock, and a faster setns system call. In particular this fixes a performance regression observed by Rafael David Tinoco <rafael.tinoco@canonical.com>. This is effectively a revert of Pavel Emelyanov's commit cf7b708c8d1d7a27736771bcf4c457b332b0f818 Make access to task's nsproxy lighter from 2007. The race this originialy fixed no longer exists as do_notify_parent uses task_active_pid_ns(parent) instead of parent->nsproxy. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-06-06ipc: convert use of typedef ctl_table to struct ctl_tableJoe Perches2-13/+13
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc/sem.c: add a printk_once for semctl(GETNCNT/GETZCNT)Manfred Spraul1-0/+11
The actual Linux implementation for semctl(GETNCNT) and semctl(GETZCNT) always (since 0.99.10) reported a thread as sleeping on all semaphores that are listed in the semop() call. The documented behavior (both in the Linux man page and in the Single Unix Specification) is that a task should be reported on exactly one semaphore: The semaphore that caused the thread to got to sleep. This patch adds a pr_info_once() that is triggered if a thread hits the relevant case. The code triggers slightly too often, otherwise it would be necessary to replicate the old code. As there are no known users of GETNCNT or GETZCNT, this is done to prevent unnecessary bloat. The task that triggered is reported with name (tsk->comm) and pid. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc/sem.c: make semctl(,,{GETNCNT,GETZCNT}) standard compliantManfred Spraul1-21/+13
SUSv4 clearly defines how semncnt and semzcnt must be calculated: A task waits on exactly one semaphore: The semaphore from the first operation in the sop array that cannot proceed. The Linux implementation never followed the standard, it tried to count all semaphores that might be the reason why a task sleeps. This patch fixes that. Note: a) The implementation assumes that GETNCNT and GETZCNT are rare operations, therefore the code counts them only on demand. (If they wouldn't be rare, then the non-compliance would have been found earlier) b) compared to the initial version of the patch, the BUG_ONs were removed and it was clarified that the new behavior conforms to SUS. Back-compatibility concerns: Manfred: : - there is no application in Fedora that uses GETNCNT or GETZCNT. : : - application that use only single-sop semop() are also safe, the : difference only affects complex apps. : : - portable application are also safe, the new behavior is standard : compliant. : : But that's it. The old behavior existed in Linux from 0.99.something : until now. Michael: : * These operations seem to be very little used. Grepping the public : source that is contained Fedora 20 source DVD, there appear to be no : uses. Of course, this says nothing about uses in private / : non-mainstream FOSS code, but it seems likely that the same pattern : is followed there. : : * The existing behavior is hard enough to understand that I suspect : that no one understood it well enough to rely on it anyway : (especially as that behavior contradicted both man page and POSIX). : : So, there's a chance of breakage, but I estimate that it's minute. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc/sem.c: store which operation blocks in perform_atomic_semop()Manfred Spraul1-0/+3
Preparation for the next patch: In the slow-path of perform_atomic_semop(), store a pointer to the operation that caused the operation to block. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc/sem.c: change perform_atomic_semop parametersManfred Spraul1-19/+19
Right now, perform_atomic_semop gets the content of sem_queue as individual fields. Changes that, instead pass a pointer to sem_queue. This is a preparation for the next patch: it uses sem_queue to store the reason why a task must sleep. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc/sem.c: remove code duplicationManfred Spraul1-56/+51
count_semzcnt and count_semncnt are more of less identical. The patch creates a single function that either counts the number of tasks waiting for zero or waiting due to a decrease operation. Compared to the initial version, the BUG_ONs were removed. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc/sem.c: bugfix for semctl(,,GETZCNT)Manfred Spraul1-0/+10
GETZCNT is supposed to return the number of threads that wait until a semaphore value becomes 0. The current implementation overlooks complex operations that contain both wait-for-zero operation and operations that alter at least one semaphore. The patch fixes that. It's intentionally copy&paste, this will be cleaned up in the next patch. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc,msg: document volatile r_msgDavidlohr Bueso1-3/+7
The need for volatile is not obvious, document it. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc,msg: move some msgq ns code aroundDavidlohr Bueso1-69/+63
Nothing big and no logical changes, just get rid of some redundant function declarations. Move msg_[init/exit]_ns down the end of the file. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc,msg: use current->state helpersDavidlohr Bueso1-2/+2
Call __set_current_state() instead of assigning the new state directly. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Manfred Spraul <manfred@colorfullif.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc/shm.c: check for integer overflow during shmget.Manfred Spraul1-0/+3
SHMMAX is the upper limit for the size of a shared memory segment, counted in bytes. The actual allocation is that size, rounded up to the next full page. Add a check that prevents the creation of segments where the rounded up size causes an integer overflow. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc/shm.c: check for overflows of shm_totManfred Spraul1-1/+2
shm_tot counts the total number of pages used by shm segments. If SHMALL is ULONG_MAX (or nearly ULONG_MAX), then the number can overflow. Subsequent calls to shmctl(,SHM_INFO,) would return wrong values for shm_tot. The patch adds a detection for overflows. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc/shm.c: check for ulong overflows in shmatManfred Spraul1-0/+3
The increase of SHMMAX/SHMALL is a 4 patch series. The change itself is trivial, the only problem are interger overflows. The overflows are not new, but if we make huge values the default, then the code should be free from overflows. SHMMAX: - shmmem_file_setup places a hard limit on the segment size: MAX_LFS_FILESIZE. On 32-bit, the limit is > 1 TB, i.e. 4 GB-1 byte segments are possible. Rounded up to full pages the actual allocated size is 0. --> must be fixed, patch 3 - shmat: - find_vma_intersection does not handle overflows properly. --> must be fixed, patch 1 - the rest is fine, do_mmap_pgoff limits mappings to TASK_SIZE and checks for overflows (i.e.: map 2 GB, starting from addr=2.5GB fails). SHMALL: - after creating 8192 segments size (1L<<63)-1, shm_tot overflows and returns 0. --> must be fixed, patch 2. Userspace: - Obviously, there could be overflows in userspace. There is nothing we can do, only use values smaller than ULONG_MAX. I ended with "ULONG_MAX - 1L<<24": - TASK_SIZE cannot be used because it is the size of the current task. Could be 4G if it's a 32-bit task on a 64-bit kernel. - The maximum size is not standardized across archs: I found TASK_MAX_SIZE, TASK_SIZE_MAX and TASK_SIZE_64. - Just in case some arch revives a 4G/4G split, nearly ULONG_MAX is a valid segment size. - Using "0" as a magic value for infinity is even worse, because right now 0 means 0, i.e. fail all allocations. This patch (of 4): find_vma_intersection() does not work as intended if addr+size overflows. The patch adds a manual check before the call to find_vma_intersection. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc, kernel: clear whitespacePaul McQuade5-28/+27
trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc, kernel: use Linux headersPaul McQuade5-5/+5
Use #include <linux/uaccess.h> instead of <asm/uaccess.h> Use #include <linux/types.h> instead of <asm/types.h> Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ipc: constify ipc_opsMathias Krause5-20/+19
There is no need to recreate the very same ipc_ops structure on every kernel entry for msgget/semget/shmget. Just declare it static and be done with it. While at it, constify it as we don't modify the structure at runtime. Found in the PaX patch, written by the PaX Team. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07ipc: use device_initcallDavidlohr Bueso3-3/+3
... since __initcall is now deprecated. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07ipc/compat.c: remove sc_semopm macroDavidlohr Bueso1-3/+0
This macro appears to have been introduced back in the 2.5 era for semtimedop32 backward compatibility on ia32: https://lkml.org/lkml/2003/4/28/78 Nowadays, this syscall in compat just defaults back to the code found in sem.c, so it is no longer used and can thus be removed: long compat_sys_semtimedop(int semid, struct sembuf __user *tsems, unsigned nsops, const struct compat_timespec __user *timeout) { struct timespec __user *ts64; if (compat_convert_timespec(&ts64, timeout)) return -EFAULT; return sys_semtimedop(semid, tsems, nsops, ts64); } Furthermore, there are no users in compat.c. After this change, kernel builds just fine with both CONFIG_SYSVIPC_COMPAT and CONFIG_SYSVIPC. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-16ipc: Fix 2 bugs in msgrcv() MSG_COPY implementationMichael Kerrisk1-0/+2
While testing and documenting the msgrcv() MSG_COPY flag that Stanislav Kinsbursky added in commit 4a674f34ba04 ("ipc: introduce message queue copy feature" => kernel 3.8), I discovered a couple of bugs in the implementation. The two bugs concern MSG_COPY interactions with other msgrcv() flags, namely: (A) MSG_COPY + MSG_EXCEPT (B) MSG_COPY + !IPC_NOWAIT The bugs are distinct (and the fix for the first one is obvious), however my fix for both is a single-line patch, which is why I'm combining them in a single mail, rather than writing two mails+patches. ===== (A) MSG_COPY + MSG_EXCEPT ===== With the addition of the MSG_COPY flag, there are now two msgrcv() flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp' argument in unrelated ways. Specifying both in the same call is a logical error that is currently permitted, with the effect that MSG_COPY has priority and MSG_EXCEPT is ignored. The call should give an error if both flags are specified. The patch below implements that behavior. ===== (B) (B) MSG_COPY + !IPC_NOWAIT ===== The test code that was submitted in commit 3a665531a3b7 ("selftests: IPC message queue copy feature test") shows MSG_COPY being used in conjunction with IPC_NOWAIT. In other words, if there is no message at the position 'msgtyp'. return immediately with the error in ENOMSG. What was not (fully) tested is the behavior if MSG_COPY is specified *without* IPC_NOWAIT, and there is an odd behavior. If the queue contains less than 'msgtyp' messages, then the call blocks until the next message is written to the queue. At that point, the msgrcv() call returns a copy of the newly added message, regardless of whether that message is at the ordinal position 'msgtyp'. This is clearly bogus, and problematic for applications that might want to make use of the MSG_COPY flag. I considered the following possible solutions to this problem: (1) Force the call to block until a message *does* appear at the position 'msgtyp'. (2) If the MSG_COPY flag is specified, the kernel should implicitly add IPC_NOWAIT, so that the call fails with ENOMSG for this case. (3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate an error (probably, EINVAL is the right one). I do not know if any application would really want to have the functionality of solution (1), especially since an application can determine in advance the number of messages in the queue using msgctl() IPC_STAT. Obviously, this solution would be the most work to implement. Solution (2) would have the effect of silently fixing any applications that tried to employ broken behavior. However, it would mean that if we later decided to implement solution (1), then user-space could not easily detect what the kernel supports (but, since I'm somewhat doubtful that solution (1) is needed, I'm not sure that this is much of a problem). Solution (3) would have the effect of informing broken applications that they are doing something broken. The downside is that this would cause a ABI breakage for any applications that are currently employing the broken behavior. However: a) Those applications are almost certainly not getting the results they expect. b) Possibly, those applications don't even exist, because MSG_COPY is currently hidden behind CONFIG_CHECKPOINT_RESTORE. The upside of solution (3) is that if we later decided to implement solution (1), user-space could determine what the kernel supports, via the error return. In my view, solution (3) is mildly preferable to solution (2), and solution (1) could still be done later if anyone really cares. The patch below implements solution (3). PS. For anyone out there still listening, it's the usual story: documenting an API (and the thinking about, and the testing of the API, that documentation entails) is the one of the single best ways of finding bugs in the API, as I've learned from a lot of experience. Best to do that documentation before releasing the API. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-06ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter typesHeiko Carstens1-8/+8
In order to allow the COMPAT_SYSCALL_DEFINE macro generate code that performs proper zero and sign extension convert all 64 bit parameters to their corresponding 32 bit compat counterparts. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-06ipc/compat: convert to COMPAT_SYSCALL_DEFINEHeiko Carstens2-12/+13
Convert all compat system call functions where all parameter types have a size of four or less than four bytes, or are pointer types to COMPAT_SYSCALL_DEFINE. The implicit casts within COMPAT_SYSCALL_DEFINE will perform proper zero and sign extension to 64 bit of all parameters if needed. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-06ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_tHeiko Carstens1-2/+2
Change the type of compat_sys_msgrcv's msgtyp parameter from long to compat_long_t, since compat user space passes only a 32 bit signed value. Let the compat wrapper do proper sign extension to 64 bit of this parameter. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-02-25ipc,mqueue: remove limits for the amount of system-wide queuesDavidlohr Bueso2-9/+15
Commit 93e6f119c0ce ("ipc/mqueue: cleanup definition names and locations") added global hardcoded limits to the amount of message queues that can be created. While these limits are per-namespace, reality is that it ends up breaking userspace applications. Historically users have, at least in theory, been able to create up to INT_MAX queues, and limiting it to just 1024 is way too low and dramatic for some workloads and use cases. For instance, Madars reports: "This update imposes bad limits on our multi-process application. As our app uses approaches that each process opens its own set of queues (usually something about 3-5 queues per process). In some scenarios we might run up to 3000 processes or more (which of-course for linux is not a problem). Thus we might need up to 9000 queues or more. All processes run under one user." Other affected users can be found in launchpad bug #1155695: https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695 Instead of increasing this limit, revert it entirely and fallback to the original way of dealing queue limits -- where once a user's resource limit is reached, and all memory is used, new queues cannot be created. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reported-by: Madars Vitolins <m@silodev.com> Acked-by: Doug Ledford <dledford@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> [3.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-02compat: Get rid of (get|put)_compat_time(val|spec)H. Peter Anvin2-25/+6
We have two APIs for compatiblity timespec/val, with confusingly similar names. compat_(get|put)_time(val|spec) *do* handle the case where COMPAT_USE_64BIT_TIME is set, whereas (get|put)_compat_time(val|spec) do not. This is an accident waiting to happen. Clean it up by favoring the full-service version; the limited version is replaced with double-underscore versions static to kernel/compat.c. A common pattern is to convert a struct timespec to kernel format in an allocation on the user stack. Unfortunately it is open-coded in several places. Since this allocation isn't actually needed if COMPAT_USE_64BIT_TIME is true (since user format == kernel format) encapsulate that whole pattern into the function compat_convert_timespec(). An equivalent function should be written for struct timeval if it is needed in the future. Finally, get rid of compat_(get|put)_timeval_convert(): each was only used once, and the latter was not even doing what the function said (no conversion actually was being done.) Moving the conversion into compat_sys_settimeofday() itself makes the code much more similar to sys_settimeofday() itself. v3: Remove unused compat_convert_timeval(). v2: Drop bogus "const" in the destination argument for compat_convert_time*(). Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Hans Verkuil <hans.verkuil@cisco.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Tested-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-27ipc: fix compat msgrcv with negative msgtypMateusz Guzik1-1/+1
Compat function takes msgtyp argument as u32 and passes it down to do_msgrcv which results in casting to long, thus the sign is lost and we get a big positive number instead. Cast the argument to signed type before passing it down. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Reported-by: Gabriellla Schmidt <gsc@bruker.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27ipc,msg: document barriersDavidlohr Bueso1-2/+17
Both expunge_all() and pipeline_send() rely on both a nil msg value and a full barrier to guarantee the correct ordering when waking up a task. While its counterpart at the receiving end is well documented for the lockless recv algorithm, we still need to document these specific smp_mb() calls. [akpm@linux-foundation.org: fix typo, per Mike] [akpm@linux-foundation.org: mroe tpyos] Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27ipc: delete seq_max field in struct ipc_idsDavidlohr Bueso2-11/+3
This field is only used to reset the ids seq number if it exceeds the smaller of INT_MAX/SEQ_MULTIPLIER and USHRT_MAX, and can therefore be moved out of the structure and into its own macro. Since each ipc_namespace contains a table of 3 pointers to struct ipc_ids we can save space in instruction text: text data bss dec hex filename 56232 2348 24 58604 e4ec ipc/built-in.o 56216 2348 24 58588 e4dc ipc/built-in.o-after Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Jonathan Gonzalez <jgonzalez@linets.cl> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27ipc: simplify sysvipc_proc_open() returnDavidlohr Bueso1-5/+4
Get rid of silly/useless label jumping. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27ipc: remove useless return statementDavidlohr Bueso1-4/+0
Only found in ipc_rmid(). Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27ipc: remove braces for single statementsDavidlohr Bueso4-19/+16
Deal with checkpatch messages: WARNING: braces {} are not necessary for single statement blocks Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>