path: root/sys/kern/subr_pool.c
Commit message | Author | Age | Files | Lines
* spellingjsg2021-03-101-3/+3
| | | | ok gnezdo@ semarie@ mpi@
* Add dt(4) TRACEPOINTs for pool_get() and pool_put(), this is similar to theclaudio2021-01-061-1/+6
| | | | | | ones added to malloc() and free(). Pass the struct pool pointer as argv1 since it is currently not possible to pass the pool name to btrace. OK mpi@
* pool(9): remove tickscheloha2021-01-021-11/+26
| | | | | | | | | | | | | | | | | | | | | | | | Change the pool(9) timeouts to use the system uptime instead of ticks. - Change the timeouts from variables to macros so we can use SEC_TO_NSEC(). This means these timeouts are no longer patchable via ddb(4). dlg@ does not think this will be a problem, as the timeout intervals have not changed in years. - Use low-res time to keep things fast. Add a local copy of getnsecuptime() to subr_pool.c to keep the diff small. We will need to move getnsecuptime() into kern_tc.c and document it later if we ever have other users elsewhere in the kernel. - Rename ph_tick -> ph_timestamp and pr_cache_tick -> pr_cache_timestamp. Prompted by tedu@ some time ago, but the effort stalled (may have been my fault). Input from kettenis@ and dlg@. Special thanks to mpi@ for help with struct shuffling. This change does not increase the size of struct pool_page_header or struct pool. ok dlg@ mpi@
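The move from tick counts to uptime timestamps can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: `SEC_TO_NSEC()` is redefined locally as a stand-in for the kernel macro, and `EXAMPLE_IDLE_WAIT`, `struct example_page` and `example_page_expired()` are hypothetical names, not the ones used in subr_pool.c.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's SEC_TO_NSEC(); an assumption of this sketch. */
#define SEC_TO_NSEC(s)	((uint64_t)(s) * 1000000000ULL)

/* Hypothetical interval; being a macro, it cannot be patched at runtime. */
#define EXAMPLE_IDLE_WAIT	SEC_TO_NSEC(8)

/* Hypothetical page header that stores an uptime timestamp, not a tick count. */
struct example_page {
	uint64_t timestamp;	/* uptime in nanoseconds when the page went idle */
};

/* Return nonzero if the page has been idle for longer than the interval. */
static int
example_page_expired(const struct example_page *pg, uint64_t now_nsec)
{
	/* Uptime is monotonic, so "now" should never precede the stamp. */
	assert(now_nsec >= pg->timestamp);
	return (now_nsec - pg->timestamp > EXAMPLE_IDLE_WAIT);
}
```

In this sketch the caller passes the current uptime in, so the comparison can be exercised without a kernel clock source.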
* pool(9): replace custom TAILQ concatenation loops with TAILQ_CONCAT(3)cheloha2020-01-241-11/+3
| | | | | | | TAILQ_CONCAT(3) apparently wasn't in-tree when this code was written. Using it leaves us with less code *and* better performance. ok tedu@
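A minimal user-space sketch of the difference, assuming a `<sys/queue.h>` that provides TAILQ_CONCAT. The `struct page` type and the helper names are hypothetical and only stand in for whatever lists the pool code was concatenating.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; not the kernel's page header. */
struct page {
	int id;
	TAILQ_ENTRY(page) entry;
};
TAILQ_HEAD(pagelist, page);

/* Hand-rolled style: move entries one at a time, O(n) in the length of src. */
static void
concat_loop(struct pagelist *dst, struct pagelist *src)
{
	struct page *pg;

	while ((pg = TAILQ_FIRST(src)) != NULL) {
		TAILQ_REMOVE(src, pg, entry);
		TAILQ_INSERT_TAIL(dst, pg, entry);
	}
}

/* Macro style: splice src onto the tail of dst and leave src empty. */
static void
concat_macro(struct pagelist *dst, struct pagelist *src)
{
	TAILQ_CONCAT(dst, src, entry);
}

/* Count the entries on a list. */
static int
list_length(struct pagelist *l)
{
	struct page *pg;
	int n = 0;

	TAILQ_FOREACH(pg, l, entry)
		n++;
	return n;
}
```

Both helpers leave the source list empty and preserve element order; the macro version does so without walking the list.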
* pool(9): pl_sleep(): drop unused timeout argumentcheloha2020-01-231-9/+9
| | | | | | | | | | All sleeps have been indefinite since introduction of this interface ~5 years ago, so remove the timeout argument and make indefinite sleeps implicit. While here: *sleep(9) -> *sleep_nsec(9) "i don't think we're going to use timeouts [here]" tedu@, ok mpi@
* After the kernel has reached the sysctl kern.maxclusters limit,bluhm2019-07-191-1/+7
| | | | | | | | operations get stuck while holding the net lock. Increasing the limit did not help as there was no wakeup of the waiting pools. So introduce pool_wakeup() and run through the mbuf pool request list when the limit changes. OK dlg@ visa@
* Remove file name and line number output from witness(4)visa2019-04-231-29/+22
| | | | | | | | | | | | | Reduce code clutter by removing the file name and line number output from witness(4). Typically it is easy enough to locate offending locks using the stack traces that are shown in lock order conflict reports. Tricky cases can be tracked using sysctl kern.witness.locktrace=1 . This patch additionally removes the witness(4) wrapper for mutexes. Now each mutex implementation has to invoke the WITNESS_*() macros in order to utilize the checker. Discussed with and OK dlg@, OK mpi@
* revert revert revert. there are many other archs that use custom allocs.tedu2019-02-101-22/+24
|
* if waitok flag is set, have the interrupt multipage allocator redirecttedu2019-02-101-1/+9
| | | | to the not interrupt allocator.
* make it possible to reduce kmem pressure by letting some pools use a moretedu2019-02-101-24/+14
| | | | | | | | | | accommodating allocator. an interrupt safe pool may also be used in process context, as indicated by waitok flags. thanks to the garbage collector, we can always free pages in process context. the only complication is where to put the pages. solve this by saving the allocation flags in the pool page header so the free function can examine them. not actually used in this diff. (coming soon.) arm testing and compile fixes from phessler
* Constipate all the struct lock_type's so they go into .rodataguenther2018-06-081-5/+5
| | | | ok visa@
* slightly randomize the order that new pages populate their item lists in.dlg2018-02-061-2/+13
| | | | ok tedu@ deraadt@
* While booting it does not make sense to wait for memory, there isbluhm2018-01-181-1/+7
| | | | | | | no other process which could free it. Better panic in malloc(9) or pool_get(9) instead of sleeping forever. tested by visa@ patrick@ Jan Klemkow suggested by kettenis@; OK deraadt@
* New flag PR_RWLOCK for pool_init(9) makes the pool use rwlocks insteadguenther2017-08-131-69/+248
| | | | | | | of mutexes. Use this immediately for the pool_cache futex pools. Mostly worked out with dlg@ during e2k17 ok mpi@ tedu@
* Compute the level of contention only once.visa2017-07-121-4/+5
| | | | Suggested by and OK dlg@
* When there is no contention on a pool cache lock, lower the numbervisa2017-07-121-1/+4
| | | | | | | | of items that a cache list is allowed to hold. This lets the cache release resources back to the common pool after pressure on the cache has decreased. OK dlg@
* set the alignment of the per cpu cache structures to CACHELINESIZE.dlg2017-06-231-3/+3
| | | | hardcoding 64 is too optimistic.
* change the semantic for calculating when to grow the size of a cache list.dlg2017-06-231-14/+8
| | | | | | | | | | | | | | previously it would figure out if there's enough items overall for all the cpus to have full active and inactive free lists. this included currently allocated items, which pools wont actually hold on a free list and cannot predict when they will come back. instead, see if there's enough items in the idle lists in the depot that could instead go on all the free lists on the cpus. if there's enough idle items, then we can grow. tested by hrvoje popovski and amit kulkarni ok visa@
* dynamically scale the size of the per cpu cache lists.dlg2017-06-191-1/+22
| | | | | | | | | | | | | | | | | if the lock around the global depot of extra cache lists is contended a lot in between the gc task runs, consider growing the number of entries a free list can hold. the size of the list is bounded by the number of pool items the current set of pages can represent to avoid having cpus starve each other. im not sure this semantic is right (or the least worst) but we're putting it in now to see what happens. this also means reality matches the documentation i just committed in pool_cache_init.9. tested by hrvoje popovski and amit kulkarni ok visa@
* add garbage collection of unused lists percpu cached items.dlg2017-06-161-2/+40
| | | | | | | | | | | | | | | | | | | | | | the cpu caches in pools amortise the cost of accessing global structures by moving lists of items around instead of individual items. excess lists of items are stored in the global pool struct, but these idle lists never get returned back to the system for use elsewhere. this adds a timestamp to the global idle list, which is updated when the idle list stops being empty. if the idle list hasn't been empty for a while, it means the per cpu caches arent using the idle entries and they can be recovered. timestamping the pages prevents recovery of a lot of items that may be used again shortly. eg, rx ring processing and replenishing from rate limited interrupts tends to allocate and free items in large chunks, which the timestamping smooths out. gc'ed lists are returned to the pool pages, which in turn get gc'ed back to uvm. ok visa@
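The timestamping rule described above can be sketched in user space as follows. This is an illustration under assumptions, not the kernel code: `struct ex_depot` and the `ex_*` functions are hypothetical, the depot is reduced to a counter of idle lists, and reclaiming one list per gc pass with a fresh stamp is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical depot: only counts idle lists and remembers one timestamp. */
struct ex_depot {
	unsigned int nidle;	/* number of idle lists held */
	uint64_t stamp;		/* time the depot last stopped being empty */
};

/* A cpu hands a full list to the depot. */
static void
ex_depot_put(struct ex_depot *d, uint64_t now)
{
	if (d->nidle == 0)
		d->stamp = now;	/* empty -> non-empty transition */
	d->nidle++;
}

/* A cpu takes a list back; returns 0 if the depot is empty. */
static int
ex_depot_take(struct ex_depot *d)
{
	if (d->nidle == 0)
		return 0;
	d->nidle--;
	return 1;
}

/*
 * gc pass: if the depot has been non-empty for longer than `wait`,
 * the cpus evidently do not need the idle lists, so reclaim one.
 * Returns the number of lists reclaimed (0 or 1).
 */
static unsigned int
ex_depot_gc(struct ex_depot *d, uint64_t now, uint64_t wait)
{
	assert(now >= d->stamp);
	if (d->nidle == 0 || now - d->stamp <= wait)
		return 0;
	d->nidle--;
	d->stamp = now;		/* assumption: restart the clock after reclaiming */
	return 1;
}
```

A depot that keeps draining to empty and refilling, as in the rx ring example above, keeps getting a fresh stamp and is therefore left alone by the gc pass.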
* split returning an item to the pool pages out of pool_put as pool_do_put.dlg2017-06-161-24/+36
| | | | | | | | | | | | | | | | this lets pool_cache_list_put return items to the pages. currently, if pool_cache_list_put is called while the per cpu caches are enabled, the items on the list will be put straight back onto another list in the cpu cache. this also avoids counting puts for these items twice. a put for the items has already been counted when the items went to a cpu cache, it doesnt need to be counted again when it goes back to the pool pages. another side effect of this is that pool_cache_list_put can take the pool mutex once when returning all the items in the list with pool_do_put, rather than once per item. ok visa@
* report contention on caches global data to userland.dlg2017-06-151-1/+2
|
* white space tweaks. no functional change.dlg2017-06-151-4/+4
|
* implement the backend of the sysctls that report pool cache info.dlg2017-06-151-17/+126
| | | | | | | | | | | | | | | KERN_POOL_CACHE reports info about the global cache info, like how long the lists of cache items the cpus build should be and how many of these lists are idle on the pool struct. KERN_POOL_CACHE_CPUS reports counters from each cpu. the counters are for how many item and list operations the cache has handled on a cpu. the sysctl provides an array of ncpusfound * struct kinfo_pool_cache_cpu, not a single struct kinfo_pool_cache_cpu. tested by hrvoje popovski ok mikeb@ millert@
* when enabling cpu caches, check the item size against the right thingdlg2017-06-131-2/+3
| | | | | | lists of free items on the per cpu caches are built out of the pool items as struct pool_cache_items, not struct pool_cache. make the KASSERT in pool_cache_init check that properly.
* Tweak lock inits to make the system runnable with witness(4)visa2017-04-201-3/+3
| | | | on amd64 and i386.
* revert 1.206 because it allows deadlocks.dlg2017-02-201-1/+4
| | | | | | | | | | | if the gc task is running on a cpu that handles interrupts it is possible to allow a deadlock. the gc task may be cleaning up a pool and holding its mutex when a non-MPSAFE interrupt arrives and tries to take the kernel lock. another cpu may already be holding the kernel lock when it then tries to use the same pool that the pool GC is currently processing. thanks to sthen@ and mpi@ for chasing this down.
* the splvm() in pool_gc_pages is unnecessary now.dlg2017-02-081-4/+1
| | | | | | | all pools set their ipls unconditionally now, so there isn't a need to second guess them. pointed out by and ok jmatthew@
* Force a context switch for every pool_get(9) with the PR_WAITOK flagmpi2017-01-241-2/+2
| | | | | | if pool_debug is equal to 2, just like we do for malloc(9). ok dlg@
* let pool page allocators advertise what sizes they can provide.dlg2016-11-211-14/+42
| | | | | | | | | | | | | | | | to keep things concise i let the multi page allocators provide multiple sizes of pages, but this feature was implicit inside pool_init and only usable if the caller of pool_init did not specify a page allocator. callers of pool_init can now supply a page allocator that provides multiple page sizes. pool_init will try to fit 8 items onto a page still, but will scale its page size down until it fits into what the allocator provides. supported page sizes are specified as a bit field in the pa_pagesz member of a pool_allocator. setting the low bit in that word indicates that the pages can be aligned to their size.
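One way to picture the bit field and the scale-down search is the sketch below. It is an assumption-laden illustration, not the pool_init code: each supported power-of-two page size is assumed to contribute its own bit, the low bit is the "aligned" flag, and `EX_ALIGNED`, `ex_roundup_pow2()`, `ex_pick_pagesz()` and the `minpgsz` parameter are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define EX_ALIGNED	1UL	/* low bit: pages can be aligned to their size */

/* Smallest power of two that is >= n, for n >= 1. */
static unsigned long
ex_roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Pick a page size for items of `itemsz` bytes.  Start from a power of
 * two large enough for 8 items (but at least `minpgsz`), then halve
 * until the allocator's bit field says the size is supported.  Returns
 * 0 if no supported size can hold even one item.
 */
static unsigned long
ex_pick_pagesz(unsigned long supported, unsigned long itemsz,
    unsigned long minpgsz)
{
	unsigned long sizes = supported & ~EX_ALIGNED;	/* drop the flag bit */
	unsigned long pgsz;

	assert(itemsz > 0);
	pgsz = ex_roundup_pow2(itemsz * 8);
	if (pgsz < minpgsz)
		pgsz = minpgsz;

	while (pgsz > 1 && pgsz >= itemsz) {
		if (sizes & pgsz)
			return pgsz;
		pgsz >>= 1;
	}
	return 0;
}
```

With an allocator advertising 4096, 8192 and 16384 byte pages, a 4096 byte item would start the search at 32768 and settle on 16384, which holds fewer than 8 items but is the largest size on offer.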
* rename some types and functions to make the code easier to read.dlg2016-11-071-143/+149
| | | | | | | | | | pool_item_header is now pool_page_header. the more useful change is pool_list is now pool_cache_item. that's what items going into the per cpu pool caches are cast to, and they get linked together to make a list. the functions operating on what is now pool_cache_items have been renamed to make it more obvious what they manipulate.
* poison the TAILQ_ENTRY in items in the per cpu pool cache.dlg2016-11-021-12/+52
|
* add poisoning of items on the per cpu caches.dlg2016-11-021-4/+38
| | | | | | | | | it copies the existing pool code, except it works on pool_list structures instead of pool_item structures. after this id like to poison the words used by the TAILQ_ENTRY in the pool_list struct that arent used until a list of items is moved into the global depot.
* use a TAILQ to maintain the list of item lists used by the percpu code.dlg2016-11-021-10/+8
| | | | | | it makes it more readable, and fixes a bug in pool_list_put where it was returning the next item in the current list rather than the next list to be freed.
* add per cpu caches for free pool items.dlg2016-11-021-1/+316
| | | | | | | | | | | | | | | | | | | | | | this is modelled on whats described in the "Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources" paper by Jeff Bonwick and Jonathan Adams. the main semantic borrowed from the paper is the use of two lists of free pool items on each cpu, and only moving one of the lists in and out of a global depot of free lists to mitigate against a cpu thrashing against that global depot. unlike slabs, pools do not maintain or cache constructed items, which allows us to use the items themselves to build the free list rather than having to allocate arrays to point at constructed pool items. the per cpu caches are built on top of the cpumem api. this has been kicked a bit by hrvoje popovski and simon mages (thank you). im putting it in now so it is easier to work on and test. ok jmatthew@
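The two-list scheme can be sketched as a single-threaded user-space toy. All names (`ex_item`, `ex_list`, `ex_depot`, `ex_cpu_cache`, `EX_LIST_MAX`, `EX_DEPOT_MAX`) are hypothetical, there is no locking or cpumem, and the depot is a small fixed array; the sketch only shows how the free items themselves form the lists and how whole lists, not single items, move to and from the depot.

```c
#include <assert.h>
#include <stddef.h>

#define EX_LIST_MAX	4	/* hypothetical items per list */
#define EX_DEPOT_MAX	8	/* hypothetical depot capacity, in lists */

/* Free items link to each other directly; no separate pointer array. */
struct ex_item { struct ex_item *next; };
struct ex_list { struct ex_item *head; unsigned int n; };
struct ex_depot { struct ex_list lists[EX_DEPOT_MAX]; unsigned int nlists; };
struct ex_cpu_cache { struct ex_list actv, prev; };	/* the two per cpu lists */

static void
ex_push(struct ex_list *l, struct ex_item *it)
{
	it->next = l->head;
	l->head = it;
	l->n++;
}

static struct ex_item *
ex_pop(struct ex_list *l)
{
	struct ex_item *it = l->head;

	assert(it != NULL);
	l->head = it->next;
	l->n--;
	return it;
}

/* Get: use actv, else fall back to prev, else pull a whole list from the depot. */
static struct ex_item *
ex_cache_get(struct ex_cpu_cache *cc, struct ex_depot *d)
{
	if (cc->actv.n == 0) {
		if (cc->prev.n > 0) {
			cc->actv = cc->prev;
			cc->prev = (struct ex_list){ NULL, 0 };
		} else if (d->nlists > 0)
			cc->actv = d->lists[--d->nlists];
		else
			return NULL;	/* caller would go to the pool pages */
	}
	return ex_pop(&cc->actv);
}

/* Put: when actv is full, rotate it to prev and push the old prev to the depot. */
static int
ex_cache_put(struct ex_cpu_cache *cc, struct ex_depot *d, struct ex_item *it)
{
	if (cc->actv.n >= EX_LIST_MAX) {
		if (cc->prev.n > 0) {
			if (d->nlists >= EX_DEPOT_MAX)
				return 0;	/* caller would return it to the pages */
			d->lists[d->nlists++] = cc->prev;
		}
		cc->prev = cc->actv;
		cc->actv = (struct ex_list){ NULL, 0 };
	}
	ex_push(&cc->actv, it);
	return 1;
}
```

Because a cpu only touches the depot when both of its lists are exhausted or both are full, a workload that alternates gets and puts around a list boundary stays on the per cpu lists.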
* all pools have their ipl set via pool_setipl, so fold it into pool_init.dlg2016-09-151-28/+13
| | | | | | | | | | | | | | | | | | | | | | the ioff argument to pool_init() is unused and has been for many years, so this replaces it with an ipl argument. because the ipl will be set on init we no longer need pool_setipl. most of these changes have been done with coccinelle using the spatch below. cocci sucks at formatting code though, so i fixed that by hand. the manpage and subr_pool.c bits i did myself. ok tedu@ jmatthew@ @ipl@ expression pp; expression ipl; expression s, a, o, f, m, p; @@ -pool_init(pp, s, a, o, f, m, p); -pool_setipl(pp, ipl); +pool_init(pp, s, a, ipl, f, m, p);
* move pools to using the subr_tree version of rb treesdlg2016-09-151-9/+11
| | | | this is half way to recovering the space used by the subr_tree code.
* revert moving pools from tree.h to subr_tree.c rb trees.dlg2016-09-051-11/+9
| | | | itll go in again when i dont break userland.
* move pool red-black trees from tree.h code to subr_tree.c codedlg2016-09-051-9/+11
| | | | ok tedu@
* add a "show socket" command to ddbdlg2016-01-151-4/+6
| | | | | | should help inspecting socket issues in the future. enthusiasm from mpi@ bluhm@ deraadt@
* Now that interrupt-safe uvm maps are properly locked, the interrupt-safekettenis2015-09-111-5/+1
| | | | | | | multi page backend allocator implementation no longer needs to grab the kernel lock. ok mlarkin@, dlg@
* Give the pool page allocator backends more sensible names. We now have:kettenis2015-09-081-20/+19
| | | | | | | | * pool_allocator_single: single page allocator, always interrupt safe * pool_allocator_multi: multi-page allocator, interrupt safe * pool_allocator_multi_ni: multi-page allocator, not interrupt-safe ok deraadt@, dlg@
* Now that msleep(9) no longer requires the kernel lock (as long as PCATCHkettenis2015-09-081-15/+2
| | | | | | | isn't specified) the default backend allocator implementation no longer needs to grab the kernel lock. ok visa@, guenther@
* We no longer need to grab the kernel lock for allocating and freeing pageskettenis2015-09-061-5/+11
| | | | | | | | | | in the (default) single page pool backend allocator. This means it is now safe to call pool_get(9) and pool_put(9) for "small" items while holding a mutex without holding the kernel lock as well as these functions will no longer acquire the kernel lock under any circumstances. For "large" items (where large is larger than 1/8th of a page) this still isn't safe though. ok dlg@
* Push down the KERNEL_LOCK/KERNEL_UNLOCK calls into the back-end allocatorkettenis2015-09-011-7/+21
| | | | | | | | | functions. Note that these calls are deliberately not added to the special-purpose back-end allocators in the various pmaps. Those allocators either don't need to grab the kernel lock, are always called with the kernel lock already held, or are only used on non-MULTIPROCESSOR platforms. ok tedu@, deraadt@, dlg@
* re-enable *8.dlg2015-08-211-2/+2
| | | | | | | if we're allowed to try and use large pages, we try and fit at least 8 of the items. this amortises the per page cost of an item a bit. "be careful" deraadt@
* remove the POOL_NEEDS_CATCHUP macro, it isnt used.dlg2015-07-231-4/+1
| | | | from martin natano
* Move `ticks' declaration to sys/kernel.h.uebayasi2015-07-201-3/+1
|
* disable *8 again for now. incoherent archs arent having much fun with it.dlg2015-04-211-2/+2
|
* nothing uses pool_sleep, so get rid of itdlg2015-04-071-3/+1
|