path: root/sys/kern/subr_pool.c
Commit log, most recent first. Each entry is annotated [author, date; files changed, lines -removed/+added].
...
* introduce a garbage collector for (very) idle pool pages. [dlg, 2015-04-07; 1 file, -2/+53]
  now that idle pool pages are timestamped we can tell how long they've been idle. this adds a task that runs every second and iterates over all the pools looking for pages that have been idle for 8 seconds so it can free them. this idea probably came from a conversation with tedu@ months ago. ok tedu@ kettenis@
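  as a rough illustration of the mechanism described above (all names here are hypothetical, not the kernel's): a once-a-second pass walks every pool and frees pages that have sat empty past the threshold.

      #include <stdlib.h>
      #include <time.h>

      #define POOL_GC_IDLE 8              /* seconds a page may stay empty */

      struct page {
              struct page *next;          /* empty-page list, oldest first */
              time_t idle_since;          /* set when the page became empty */
      };

      struct pool {
              struct pool *next;          /* global list of all pools */
              struct page *empty_pages;
      };

      /* run once a second from a task in the real thing */
      void
      pool_gc_pass(struct pool *all_pools)
      {
              time_t now = time(NULL);
              struct pool *pp;
              struct page *ph;

              for (pp = all_pools; pp != NULL; pp = pp->next) {
                      /* oldest pages are at the head, so stop at the
                       * first page that is still too young */
                      while ((ph = pp->empty_pages) != NULL &&
                          now - ph->idle_since >= POOL_GC_IDLE) {
                              pp->empty_pages = ph->next;
                              free(ph);   /* stands in for the backend */
                      }
              }
      }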
* reintroduce r1.173: [dlg, 2015-03-20; 1 file, -2/+2]
  > if we're able to use large page allocators, try and place at least
  > 8 items on a page. this reduces the number of allocator operations
  > we have to do per item on large items.
  this was backed out because of fallout on landisk, which has since been fixed. putting this in again early in the cycle so we can look for more fallout. hopefully it will stick. ok deraadt@
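  a sketch of that sizing rule as i read it (my reconstruction, not the committed code): grow the pool page size in powers of two until at least eight items fit.

      unsigned int
      pool_large_pgsz(unsigned int item_size, unsigned int arch_pgsz)
      {
              unsigned int pgsz = arch_pgsz;

              /* capped at 8 arch pages; with 4k pages that is 32k */
              while (pgsz / item_size < 8 && pgsz < arch_pgsz * 8)
                      pgsz <<= 1;
              return pgsz;
      }

      /* e.g. a 3000-byte item: a 4k page holds 1, so this grows
       * 8k -> 16k -> 32k, where 10 items fit per backend call */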
* Remove some includes that include-what-you-use claims don't [jsg, 2015-03-14; 1 file, -2/+1]
  have any direct symbols used. Tested for indirect use by compiling amd64/i386/sparc64 kernels. ok tedu@ deraadt@
* reintroduce page item cache colouring. [dlg, 2015-02-10; 1 file, -7/+20]
  if you're having trouble understanding how this helps, imagine your cpu's caches are a hash table. by moving the base address of items around (colouring them), you give it more bits to hash with. in turn that makes it less likely that you will overflow buckets in your hash. i mean cache.
  it was inadvertently removed in my churn of this subsystem, but as tedu has said on this issue:
  > The history of pool is filled with features getting trimmed because they
  > seemed unnecessary or in the way, only to later discover how important they
  > were. Having slowly learned that lesson, I think our default should be "if
  > bonwick says do it, we do it" until proven otherwise.
  until proven otherwise we can keep the functionality, especially as the code cost is minimal. ok many including tedu@ guenther@ deraadt@ millert@
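  a toy version of the colouring itself, assuming a 64-byte cache line (field names are made up): each fresh page starts its items one more cache line in than the last, so the same slot on different pages lands in different cache sets.

      #include <stdint.h>

      #define COLOUR_UNIT 64                  /* one cache line */

      struct pool {
              unsigned int pr_curcolour;      /* colour for the next page */
              unsigned int pr_ncolours;       /* colours the slack allows, >= 1 */
      };

      /* pick where the first item on a fresh page should start */
      void *
      pool_page_first_item(struct pool *pp, void *page)
      {
              uintptr_t base = (uintptr_t)page +
                  (uintptr_t)pp->pr_curcolour * COLOUR_UNIT;

              pp->pr_curcolour = (pp->pr_curcolour + 1) % pp->pr_ncolours;
              return (void *)base;
      }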
* pool_chk_page iterates over a page's free item list and checks that [dlg, 2015-01-22; 1 file, -11/+10]
  the item's address is within the page. it does that by masking the item address with the page mask and comparing the result to the page address. however, if we're using large pages with external page headers, we don't request that the large page be aligned to its size. eg, on an arch with 4k pages, an 8k large page could be aligned to 4k, so masking bits to get the page address won't work. these incorrect checks were distracting while i was debugging large pages on landisk. this changes it to do range checks to see if the item is within the page. it also checks that the item is on the page before checking whether its magic values or poison are right. ok miod@
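  the two checks side by side, as i read the commit (helper names are mine):

      #include <stdbool.h>
      #include <stdint.h>

      /* only right when the page is aligned to its own size */
      static bool
      item_on_page_by_mask(void *item, void *page, uintptr_t pgsz)
      {
              return ((uintptr_t)item & ~(pgsz - 1)) == (uintptr_t)page;
      }

      /* right for any alignment, eg an 8k page aligned to only 4k */
      static bool
      item_on_page_by_range(void *item, void *page, uintptr_t pgsz)
      {
              uintptr_t i = (uintptr_t)item, p = (uintptr_t)page;

              return i >= p && i - p < pgsz;
      }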
* white space fixes. no binary change. [dlg, 2015-01-19; 1 file, -7/+7]
* splassert on some archs (or just sparc64) checks that you're not in [dlg, 2015-01-05; 1 file, -1/+7]
  an interrupt handler at an ipl level higher than what you're splasserting you should be at. if you think code should be protected by IPL_BIO and it's entered from an interrupt handler established at IPL_NET, you have a bug. add some asserts to pool gets and puts so we can pick those cases up.
* back out r1.173, aka the "* 8" diff. it tickles a problem on some [dlg, 2015-01-04; 1 file, -2/+2]
  landisk machines that we've been unable to figure out due to a lack of hardware (on my part) or time. discussed with and ok miod@
* avoid the use of an uninitialised variable in one of the codepaths in [jsg, 2015-01-04; 1 file, -2/+2]
  pool_setlowat(). ok dlg@ tedu@
* remove some unused fields from pool. ok dlg [tedu, 2014-12-22; 1 file, -7/+2]
* if we're able to use large page allocators, try and place at least [dlg, 2014-12-22; 1 file, -2/+2]
  8 items on a page. this reduces the number of allocator operations we have to do per item on large items. ok tedu@
* timestamp empty pages, and only free them if they've been idle for at least [dlg, 2014-12-19; 1 file, -2/+8]
  a second. this basically brings back the functionality that was trimmed in r1.53, except this version uses ticks instead of very slow hardware clock reads. ok tedu@
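  the test itself is just integer math on the tick counter; a minimal sketch, assuming the kernel's ticks/hz globals and a hypothetical ph_tick field:

      extern volatile unsigned int ticks;     /* bumped hz times a second */
      extern unsigned int hz;

      struct page_hdr {
              unsigned int ph_tick;           /* stamped when page emptied */
      };

      static inline int
      page_idle_a_second(const struct page_hdr *ph)
      {
              /* compare a delta, not absolutes, so wraparound is harmless */
              return ticks - ph->ph_tick > hz;
      }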
* the last commit changed LIST_INSERT_HEAD to TAILQ_INSERT_TAIL because the [dlg, 2014-12-19; 1 file, -3/+3]
  latter is cheaper, but i forgot to change the thing that pulls pages off those lists to match the change in direction. the page lists went from LIFO to FIFO. this changes pool_update_curpage to use TAILQ_LAST so we go back to LIFO. pointed out by and ok tedu@
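  a small userland demo of the pattern using <sys/queue.h>: inserting at the tail stays cheap, and pulling from the tail with TAILQ_LAST keeps the order LIFO.

      #include <sys/queue.h>
      #include <stdio.h>

      struct page {
              int id;
              TAILQ_ENTRY(page) entry;
      };
      TAILQ_HEAD(pagelist, page);

      int
      main(void)
      {
              struct pagelist pl = TAILQ_HEAD_INITIALIZER(pl);
              struct page a = { .id = 1 }, b = { .id = 2 };

              TAILQ_INSERT_TAIL(&pl, &a, entry);
              TAILQ_INSERT_TAIL(&pl, &b, entry);

              /* TAILQ_FIRST would hand back page 1 (FIFO, the bug);
               * TAILQ_LAST hands back page 2 (LIFO, the fix) */
              printf("%d\n", TAILQ_LAST(&pl, pagelist)->id);
              return 0;
      }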
* replace the page LISTs with page TAILQs. this will let me pull pages from [dlg, 2014-12-19; 1 file, -41/+41]
  either end of the lists cheaply. ok kettenis@ tedu@
* init the mutex used in sleeping pool_gets with the right ipl if the [dlg, 2014-12-04; 1 file, -3/+4]
  pool hasn't had pool_setipl called. ok kettenis@ ages ago
* move arc4random prototype to systm.h. more appropriate for most code [tedu, 2014-11-18; 1 file, -2/+1]
  to include that than rndvar.h. ok deraadt dlg
* hoist the slowdown handling up to the pool_do_get callers. this lets [dlg, 2014-11-15; 1 file, -19/+16]
  us handle the slowdown where we already give up pr_mtx, and gets rid of an ugly goto. ok tedu@, who i think has more tweaks coming
* move the slowdown back up. it needs to take place after the allocated page [tedu, 2014-11-14; 1 file, -10/+18]
  has been added to the pool, else it doesn't help because the memory isn't available. lost in the locking rework. tested by blambert and sthen
* Grab the pool mutex in sysctl_dopool(), but only for pools for which [kettenis, 2014-11-10; 1 file, -4/+5]
  pool_setipl(9) has been called. This avoids the panic introduced in rev 1.139 (which was subsequently backed out) while still effectively guaranteeing a consistent snapshot. Pools used from interrupt handlers should use the appropriate pool IPL. ok dlg@, deraadt@
* remove color support. discussed with dlg and mikeb [tedu, 2014-11-01; 1 file, -14/+4]
* take the pool_item pi_magic touching out from under #ifdef DIAGNOSTIC. [dlg, 2014-10-13; 1 file, -16/+4]
  i couldn't measure a significant performance difference with or without it. this is likely a function of the memory involved being close to bits that are already being touched, the implementation being simple macros that let registers stay hot, and a lack of conditionals that would stall a cpu pipeline. this means we're unconditionally poisoning the first two u_longs of pool items on all kernels. i think it also makes the code easier to read. discussed with deraadt@
* massage the pool item header and pool item magic words. [dlg, 2014-10-10; 1 file, -31/+55]
  previously they were ints, but this bumps them to long-sized words. in the pool item headers they were followed by the XSIMPLEQ entries, which are basically pointers that get long-word alignment. this meant there was a 4 byte gap on 64bit architectures between the magic and the list entry that wasn't being poisoned or checked.
  this change also uses the header magic (which is sourced from arc4random) xored with the item address to poison the item magic value. this is inspired by tedu's XSIMPLEQ lists, and means we'll be exercising memory with more bit patterns.
  lastly, this takes more care around the handling of the pool_debug flag. pool pages read it when they're created and stash a local copy of it. from then on, all items returned to the page will be poisoned based on the page's local copy of the flag. items allocated off the page will be checked for valid poisoning only if both the page and pool_debug flags are set. this avoids a race where pool_debug was not set when an item was freed (so it wouldn't get poisoned), then gets set, then an item gets allocated and fails the poison checks because pool_debug wasn't set when the item was freed.
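  a sketch of the xor scheme (structure names are mine): the page header carries a random magic word, and a free item's magic is that word xored with the item's own address, so no two free items hold the same bit pattern.

      #include <stdint.h>

      struct page_hdr {
              unsigned long ph_magic;         /* random, picked at page setup */
      };

      struct item {
              unsigned long pi_magic;         /* first word of a free item */
      };

      void
      item_poison_magic(const struct page_hdr *ph, struct item *pi)
      {
              pi->pi_magic = ph->ph_magic ^ (uintptr_t)pi;
      }

      int
      item_magic_ok(const struct page_hdr *ph, const struct item *pi)
      {
              return pi->pi_magic == (ph->ph_magic ^ (uintptr_t)pi);
      }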
* in pool_destroy, enter and leave the mutex as necessary to satisfy assertions. [tedu, 2014-09-28; 1 file, -1/+3]
  ok dlg
* fix the calculation of the number of items to prime the pool with [dlg, 2014-09-26; 1 file, -2/+3]
  in pool_setlowat. this was stopping arm machines from getting spare items into their pmap entry pools, so things that really needed them in a delicate part of boot were failing. reported by rapha@, co-debugged with miod@
* Only compile poison-related code if DIAGNOSTIC instead of if !SMALL_KERNEL, [miod, 2014-09-23; 1 file, -11/+7]
  since subr_poison.c will not get compiled at all on !DIAGNOSTIC kernels. Found the hard way by deraadt@
* rework the pool code to make the locking more obvious (to me at [dlg, 2014-09-22; 1 file, -444/+334]
  least). after this i am confident that pools are mpsafe, ie, can be called without the kernel biglock being held.
  the page allocation and setup code has been split into four parts:
  pool_p_alloc is called without any locks held to ask the pool_allocator backend to get a page and page header and set up the item list.
  pool_p_insert is called with the pool lock held to insert the newly minted page on the pool's internal free page list and update its internal accounting.
  once the pool has finished with a page it calls the following:
  pool_p_remove is called with the pool lock held to take the now unnecessary page off the free page list and uncount it.
  pool_p_free is called without the pool lock and does a bunch of checks to verify that the items aren't corrupted and have all been returned to the page before giving it back to the pool_allocator to be freed.
  instead of pool_do_get doing all the work for pool_get, it is now only responsible for a single item allocation. if for any reason it can't get an item, it just returns NULL. pool_get is now responsible for checking if the allocation is allowed (according to high watermarks etc) and for potentially sleeping waiting for resources if required. sleeping for resources is built on top of pool_requests, which are modelled on how the scsi midlayer schedules access to scsibus resources.
  the pool code now calls pool_allocator backends inside its own calls to KERNEL_LOCK and KERNEL_UNLOCK, so users of pools don't have to hold biglock to call pool_get or pool_put.
  tested by krw@ (who found a SMALL_KERNEL issue, thank you). noone objected.
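  a compilable skeleton of the four-way split, with a pthread mutex standing in for pr_mtx; the bodies are placeholders, the lock discipline in the comments is the point.

      #include <pthread.h>
      #include <stdlib.h>

      struct page { struct page *next; };

      struct pool {
              pthread_mutex_t pr_mtx;
              struct page *pr_pages;          /* the free page list */
      };

      /* no locks held: the backend may sleep for memory here */
      struct page *
      pool_p_alloc(struct pool *pp)
      {
              (void)pp;
              return calloc(1, sizeof(struct page));
      }

      /* called with pr_mtx held: publish the page, update accounting */
      void
      pool_p_insert(struct pool *pp, struct page *ph)
      {
              ph->next = pp->pr_pages;
              pp->pr_pages = ph;
      }

      /* called with pr_mtx held: unhook a page we no longer need */
      void
      pool_p_remove(struct pool *pp, struct page *ph)
      {
              pp->pr_pages = ph->next;        /* simplified: assumes head */
      }

      /* no locks held: verify the items, then hand the page back */
      void
      pool_p_free(struct pool *pp, struct page *ph)
      {
              (void)pp;
              free(ph);
      }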
* if userland asks for an unknown sysctl, return EOPNOTSUPP instead [dlg, 2014-09-17; 1 file, -2/+2]
  of EINVAL like other sysctl things do.
* disable taking the mutex to read pool stats. [dlg, 2014-09-16; 1 file, -3/+4]
  some pool users (eg, mbufs and mbuf clusters) protect calls to pools with their own locks that operate at high spl levels, rather than using pool_setipl() to have the pools protect themselves. this means the pool's mtx_enter doesn't necessarily prevent interrupts that will use the pool, so we get code paths that try to mtx_enter twice, which blows up. reported by vlado at bsdbg dot net and matt bettinger; diagnosed by kettenis@
* tweak panics so they use __func__ consistently. [dlg, 2014-09-16; 1 file, -41/+37]
* deprecate PR_DEBUG and MALLOC_DEBUG in pools. [dlg, 2014-09-16; 1 file, -24/+1]
  poked by kspillner@. ok miod@
* remove unneeded proc.h includes [jsg, 2014-09-14; 1 file, -2/+1]
  ok mpi@ kspillner@
* change some (flags & PR_WAITOK) to ISSET(flags, PR_WAITOK) [dlg, 2014-09-08; 1 file, -5/+5]
  no functional change.
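  for reference, ISSET is the stock flag-test macro from <sys/param.h>, so the change is purely about making intent explicit:

      #define ISSET(t, f)     ((t) & (f))

      /* (flags & PR_WAITOK)  ==  ISSET(flags, PR_WAITOK) */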
* deprecate the use of the PR_PHINPAGE flag by replacing it with a test [dlg, 2014-09-08; 1 file, -26/+23]
  of pr_phoffset. ok doug@ guenther@
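  as i understand the replacement, a non-zero header offset now means the header lives inside the page; something like this hypothetical predicate:

      struct pool { unsigned int pr_phoffset; /* ... */ };

      /* non-zero offset: the pool_item_header sits inside the page */
      #define POOL_INPGHDR(pp)        ((pp)->pr_phoffset != 0)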
* KASSERT that the page header pool will use in-page headers. [dlg, 2014-09-05; 1 file, -1/+4]
* rework how pools with large pages (>PAGE_SIZE) are implemented. [dlg, 2014-09-04; 1 file, -122/+102]
  this moves the size of the pool page (not the arch page) out of the pool allocator into struct pool. this lets us create only two pools for the automatically determined large page allocations instead of 256 of them. while here, support using slack space in large pages for the pool_item_header by requiring km_alloc to provide pool-page-aligned memory. lastly, instead of doing incorrect math to figure out how many arch pages to use for large pool pages, just use powers of two. ok mikeb@
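  a worked example of the slack-space point, assuming 4k arch pages and a 96-byte header (both numbers illustrative): a 2200-byte item rounds up to a 16k pool page, and the tail left over is big enough to hold the pool_item_header in the page itself.

      #include <stdio.h>

      int
      main(void)
      {
              unsigned int pgsz = 16384, itemsz = 2200, hdrsz = 96;
              unsigned int items = pgsz / itemsz;             /* 7 */
              unsigned int slack = pgsz - items * itemsz;     /* 984 */

              printf("%u items, %u bytes slack: header %s\n", items, slack,
                  slack >= hdrsz ? "fits in-page" : "must be external");
              return 0;
      }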
* deprecate the "item offset" handling. nothing uses it, so we candlg2014-08-271-21/+6
| | | | | | cut it out of the code to simplify things. ok mikeb@
* bring back r1.130: [dlg, 2014-08-20; 1 file, -22/+32]
  add an explicit rwlock around the global state (the pool list and serial number) rather than relying on implicit process exclusion, splhigh, and splvm. the only things touching the global state come from process context, so we can get away with an rwlock instead of a mutex. thankfully. ok matthew@
* external page headers use an RB tree to find the page header [dlg, 2014-08-18; 1 file, -30/+25]
  containing an item when it's returned to the pool. this means you need to do an inexact comparison between an item's address and the page address, because a pool page can contain many items. previously this used RB_FIND with a compare function that would do math on every node comparison to see if one node (the key) was within the other node (the tree element). this cuts it over to using RB_NFIND to find the closest tree node instead of an exact tree node. the node compares turn into simple < and > operations, which inline very nicely with the RB_NFIND. the constraint (an item must be within a page) is then checked only once after the NFIND call. feedback from matthew@ and tedu@
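  a userland sketch of the RB_NFIND trick using the BSD <sys/tree.h> (names are mine). the comparator deliberately sorts pages in descending base-address order, so RB_NFIND handed an item address returns the page with the greatest base address <= the item, and one range check then confirms containment.

      #include <sys/tree.h>
      #include <stddef.h>
      #include <stdint.h>

      struct phdr {
              RB_ENTRY(phdr) node;
              uintptr_t base;                 /* page base address */
              size_t size;                    /* pool page size */
      };

      static int
      phdr_cmp(const struct phdr *a, const struct phdr *b)
      {
              /* reversed on purpose: descending order makes NFIND work */
              if (a->base > b->base)
                      return -1;
              if (a->base < b->base)
                      return 1;
              return 0;
      }

      RB_HEAD(phtree, phdr);
      RB_GENERATE_STATIC(phtree, phdr, node, phdr_cmp)

      struct phdr *
      find_page(struct phtree *t, void *item)
      {
              struct phdr key = { .base = (uintptr_t)item };
              struct phdr *ph = RB_NFIND(phtree, t, &key);

              /* one containment check instead of math in every compare */
              if (ph != NULL && (uintptr_t)item - ph->base < ph->size)
                      return ph;
              return NULL;
      }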
* sigh. when returning ENOENT in the sysctl path, unlock on the way out. [dlg, 2014-08-12; 1 file, -2/+2]
* i accidentally removed the check for whether the requested pool in [dlg, 2014-08-12; 1 file, -2/+6]
  the sysctl path exists. return ENOENT instead of trying a NULL deref.
* bring back r1.135: [dlg, 2014-08-12; 1 file, -1/+2]
  matthew@ noticed i wasn't populating npages in the kinfo_pool sent to userland.
* bring back r1.134: [dlg, 2014-08-12; 1 file, -3/+3]
  inline is the new __inline
* bring back r1.133. this is a bit different because we're still using splvm to [dlg, 2014-08-12; 1 file, -36/+21]
  protect pool_list rather than the rwlock that made i386 blow up: use pool_count to report the number of pools to userland rather than walking the list and counting the elements as we go. use sysctl_rdint, sysctl_rdstring, and sysctl_rdstruct instead of handcrafted copyouts.
* bring back r1.132: [dlg, 2014-08-11; 1 file, -1/+4]
  provide a pool_count global so we can figure out how many pools are active without having to walk the global pool_list.
* bring back r1.131: [dlg, 2014-08-11; 1 file, -1/+5]
  take the pool's mutex when copying stats out of it in the sysctl path so we are guaranteed a consistent snapshot.
* drain some boolean_t poison [tedu, 2014-07-10; 1 file, -2/+2]
* hide the biglock thrashing under pool_debug so it can be turned off [tedu, 2014-07-10; 1 file, -2/+2]
* Revert back to 1.129: pool_init() is called before rwlocks can be [guenther, 2014-07-03; 1 file, -66/+60]
  used on some archs.
* matthew@ noticed i wasn't populating npages in the kinfo_pool sent to [dlg, 2014-07-02; 1 file, -1/+2]
  userland.
* inline is the new __inline [dlg, 2014-07-02; 1 file, -3/+3]