path: root/arch/x86_64/mm/srat.c
Age | Commit message | Author | Files | Lines
2007-07-21 | x86_64: fake apicid_to_node mapping for fake numa | David Rientjes | 1 | -1/+12
When we are in the emulated NUMA case, we need to make sure that all existing apicid_to_node mappings that point to real node IDs now point to the equivalent fake node IDs. If we simply iterate over all apicid_to_node[] members for each node, we risk remapping an entry if it shares a node ID with a real node. Since apicids may not be consecutive, we're forced to create an automatic array of apicid_to_node mappings and then copy it over once we have finished remapping fake to real nodes. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
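The scratch-array approach described in this commit message can be sketched in user space roughly as follows. This is an illustration only, not the kernel code: `MAX_APICID`, `fake_to_real[]` and `remap_apicid_to_fake()` are hypothetical names, and the table is kept tiny on purpose.

```c
#include <assert.h>
#include <string.h>

#define MAX_APICID	8	/* hypothetical, tiny for illustration */
#define NID_INVAL	(-1)

/*
 * Rewrite apicid_to_node[] so that entries naming a real node name the
 * first fake node carved out of that real node instead.
 * fake_to_real[f] is the real node that fake node f lives on (assumed
 * input for this sketch). Results go into a scratch array first so an
 * entry is never remapped twice when a fake ID equals a real ID.
 */
static void remap_apicid_to_fake(int apicid_to_node[MAX_APICID],
				 const int fake_to_real[], int nr_fake)
{
	int scratch[MAX_APICID];
	int i, j;

	assert(nr_fake >= 0);
	for (j = 0; j < MAX_APICID; j++)
		scratch[j] = NID_INVAL;

	for (i = 0; i < nr_fake; i++)
		for (j = 0; j < MAX_APICID; j++)
			if (apicid_to_node[j] == fake_to_real[i] &&
			    scratch[j] == NID_INVAL)
				scratch[j] = i;

	/* Copy over only once every fake node has been processed. */
	memcpy(apicid_to_node, scratch, sizeof(scratch));
}
```

With two real nodes split into four fake nodes (`fake_to_real = {0, 0, 1, 1}`), remapping in place would first turn the node-0 entries into fake node 1 and then treat them as real node 1 on a later pass; the scratch array avoids that.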
2007-07-21 | x86_64: fake pxm-to-node mapping for fake numa | David Rientjes | 1 | -3/+73
For NUMA emulation, our SLIT should represent the true NUMA topology of the system but our proximity domain to node ID mapping needs to reflect the emulated state. When NUMA emulation has successfully set up fake nodes on the system, a new function, acpi_fake_nodes() is called. This function determines the proximity domain (_PXM) for each true node found on the system. It then finds which emulated nodes have been allocated on this true node as determined by its starting address. The node ID to PXM mapping is changed so that each fake node ID points to the PXM of the true node that it is located on. If the machine failed to register a SLIT, then we assume there is no special requirement for emulated node affinity so we use the default LOCAL_DISTANCE, which is newly exported to this code, as our measurement if the emulated nodes appear in the same PXM. Otherwise, we use REMOTE_DISTANCE. PXM_INVAL and NID_INVAL are also exported to the ACPI header file so that we can compare node_to_pxm() results in generic code (in this case, the SRAT code). Cc: Len Brown <lenb@kernel.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
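The no-SLIT fallback described above might look roughly like this. It is a sketch under assumptions: `fake_node_to_pxm[]` and `fake_node_distance()` are hypothetical names standing in for the kernel's node-to-PXM mapping, and the values 10 and 20 are the conventional local and remote distances.

```c
#include <assert.h>

#define LOCAL_DISTANCE	10
#define REMOTE_DISTANCE	20

/*
 * Distance between two emulated nodes when no SLIT was registered.
 * fake_node_to_pxm[] (hypothetical) holds, for each fake node ID, the
 * PXM of the true node it was allocated on.
 */
static int fake_node_distance(const int fake_node_to_pxm[], int nr_fake,
			      int a, int b)
{
	assert(a >= 0 && a < nr_fake && b >= 0 && b < nr_fake);
	return fake_node_to_pxm[a] == fake_node_to_pxm[b] ?
		LOCAL_DISTANCE : REMOTE_DISTANCE;
}
```

Two fake nodes carved from the same true node therefore look local to each other, while fake nodes on different true nodes look remote.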
2007-07-21 | x86_64: various cleanups in NUMA scan node | David Rientjes | 1 | -3/+3
In acpi_scan_nodes(), we immediately return -1 if acpi_numa <= 0, meaning we haven't detected any underlying ACPI topology or we have explicitly disabled its use from the command-line with numa=noacpi. acpi_table_print_srat_entry() and acpi_table_parse_srat() are only referenced within drivers/acpi/numa.c, so we can mark them as static and remove their prototypes from the header file. Likewise, pxm_to_node_map[] and node_to_pxm_map[] are only used within drivers/acpi/numa.c, so we mark them as static and remove their externs from the header file. The automatic 'result' variable is unused in acpi_numa_init(), so it's removed. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 | x86_64: Use LOCAL_DISTANCE and REMOTE_DISTANCE in x86_64 ACPI code | David Rientjes | 1 | -3/+3
Use LOCAL_DISTANCE and REMOTE_DISTANCE in x86_64 ACPI code Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-02 | [PATCH] x86-64: set node_possible_map at runtime - try 2 | Suresh Siddha | 1 | -3/+5
Set the node_possible_map at runtime on x86_64. On a non NUMA system, num_possible_nodes() will now say '1'. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <clameter@engr.sgi.com>
2007-02-02 | ACPICA: Remove duplicate table definitions (non-conflicting), cont | Alexey Starikovskiy | 1 | -23/+25
Signed-off-by: Len Brown <len.brown@intel.com>
2006-10-21 | [PATCH] x86-64: x86_64 hot-add memory srat.c fix | keith mannthey | 1 | -2/+2
This patch corrects the logic used in srat.c to figure out what action to take when registering hot-add areas. Hot-add areas should only be added to the node information for the MEMORY_HOTPLUG_RESERVE case. When booting with MEMORY_HOTPLUG_SPARSE, hot-add areas on everything but the last node are getting included in the node data; during kernel boot the pages are set up, and then the kernel dies when the pages are used. This patch fixes this issue. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-01 | [PATCH] hot-add-mem x86_64: memory_add_physaddr_to_nid node fixup | Keith Mannthey | 1 | -0/+2
In cases where the ACPI memory-add event does not contain the pxm (node) information, allow the driver to look up node info based on the address. The acpi_get_node call returns -1 if it can't decode the pxm info, which causes add_memory to panic. acpi_get_node would have to decode the resource from the handle (a lengthy proposition). This seems to be the cleanest point to interject the hook. [kamezawa.hiroyu@jp.fujitsu.com: build fixes] [y-goto@jp.fujitsu.com: build fixes] Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 | [PATCH] hot-add-mem x86_64: memory_add_physaddr_to_nid enable | Keith Mannthey | 1 | -1/+12
The API for hot-add memory already has a construct for finding nodes based on an address, memory_add_physaddr_to_nid. This patch allows the function to do something besides return 0. It uses the nodes_add information to look up the node info for a hot-add event. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
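An address-to-node lookup over per-node hot-add ranges could be sketched like this. The names `struct hotadd_range`, `MAX_NODES` and `physaddr_to_nid()` are hypothetical stand-ins; the commit only states that the nodes_add information is consulted, so the exact matching rule here is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES	4	/* hypothetical, small for illustration */

struct hotadd_range {
	uint64_t start;	/* first byte of the hot-add area */
	uint64_t end;	/* first byte past the hot-add area */
};

/*
 * Return the node whose hot-add area contains addr, or 0 when no area
 * matches (0 being the old unconditional answer).
 */
static int physaddr_to_nid(const struct hotadd_range nodes_add[MAX_NODES],
			   uint64_t addr)
{
	int i;

	assert(nodes_add);
	for (i = 0; i < MAX_NODES; i++)
		if (nodes_add[i].start <= addr && addr < nodes_add[i].end)
			return i;
	return 0;
}
```

Nodes without a hot-add area simply carry an empty range (`start == end`), which can never match.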
2006-10-01 | [PATCH] hot-add-mem x86_64: Enable SPARSEMEM in srat.c | Keith Mannthey | 1 | -22/+29
Enable x86_64 srat.c to share code between both reserve and sparsemem based add memory paths. Both paths need the hot-add area node locality information (nodes_add). This patch refactors the code path to allow this. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] Allow an arch to expand node boundaries | Mel Gorman | 1 | -0/+2
Arch-independent zone-sizing determines the size of a node (pgdat->node_spanned_pages) based on the physical memory that was registered by the architecture. However, when CONFIG_MEMORY_HOTPLUG_RESERVE is set, the architecture expects that the spanned_pages will be much larger and that mem_map will be allocated that is used later on memory hot-add. This patch allows an architecture that sets CONFIG_MEMORY_HOTPLUG_RESERVE to call push_node_boundaries() which will set the node beginning and end to at *least* the requested boundary. Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
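The "at *least* the requested boundary" behaviour can be illustrated with a small user-space sketch. `struct node_span` and `push_boundaries()` are hypothetical names for this example and do not reproduce the kernel's push_node_boundaries() implementation.

```c
#include <assert.h>

struct node_span {
	unsigned long start_pfn;	/* first page frame of the node */
	unsigned long end_pfn;		/* first page frame past the node */
};

/*
 * Grow a node's span so that it covers at least [start_pfn, end_pfn).
 * The span is only ever widened, never shrunk.
 */
static void push_boundaries(struct node_span *node,
			    unsigned long start_pfn, unsigned long end_pfn)
{
	assert(node && start_pfn <= end_pfn);
	if (start_pfn < node->start_pfn)
		node->start_pfn = start_pfn;
	if (end_pfn > node->end_pfn)
		node->end_pfn = end_pfn;
}
```

A request that lies inside the existing span leaves the node untouched, which is what lets the reserve case widen a node for future hot-add memory without disturbing what was already registered.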
2006-09-27 | [PATCH] Account for holes that are outside the range of physical memory | Mel Gorman | 1 | -1/+3
absent_pages_in_range() made the assumption that users of the API would not care about holes beyond the end of physical memory. This was not the case. This patch will account for ranges outside of physical memory as holes correctly. Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] Have x86_64 use add_active_range() and free_area_init_nodes | Mel Gorman | 1 | -4/+7
Size zones and holes in an architecture independent manner for x86_64. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 | [PATCH] Clean up acpi_numa variable | Andi Kleen | 1 | -0/+2
Move it into srat.c. No need to clutter up setup.c for it. And remove use in setup.c completely - it only guarded a printk which can be done unconditionally. Signed-off-by: Andi Kleen <ak@suse.de>
2006-06-23 | [PATCH] Unify pxm_to_node() and node_to_pxm() | Yasunori Goto | 1 | -32/+1
Consolidate the various arch-specific implementations of pxm_to_node() and node_to_pxm() into a single generic version. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-30 | [PATCH] x86_64: Handle empty node zero | Daniel Yeisley | 1 | -1/+3
From: Daniel Yeisley <dan.yeisley@unisys.com> It is possible to boot a Unisys ES7000 with CPUs from multiple cells, and not also include the memory from those cells. This can create a scenario where node 0 has cpus, but no associated memory. The system will boot fine in a configuration where node 0 has memory, but nodes 2 and 3 do not. [AK: I rechecked the code and generic code seems to indeed handle that already. Dan's original patch had a change for mm/slab.c that seems to be already in now.] Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-16 | [PATCH] x86_64: Fix memory hotadd heuristics | Andi Kleen | 1 | -4/+11
This fixes some boot failures on Dell and Unisys systems with memory hotadd added.
- Set hotadd_percent to 0 by default. This means anybody using hotadd memory needs to specify the value on the command line. That's because there are lots of Intel boxes which have a bogus hotplug area in their SRAT and they would waste a lot of memory before.
- Fix calculation of how much memory to use when the hotplug area exceeds hotadd_percent
- Fix fallback when the
- Fix fallback if memory hotadd is not compiled in.
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 | [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory | Andi Kleen | 1 | -0/+6
The node setup code would try to allocate the node metadata in the node itself, but that fails if there is no memory in there. This can happen with memory hotplug when the hotplug area defines a so far empty node. Now use bootmem to try to allocate the mem_map in other nodes. And if it fails don't panic, but just ignore the node. To make this work I added a new __alloc_bootmem_nopanic function that does what its name implies. TBD should try to use nearby nodes here. Currently we just use any. It's hard to do it better because bootmem doesn't have proper fallback lists yet. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 | [PATCH] x86_64: Reserve SRAT hotadd memory on x86-64 | Andi Kleen | 1 | -6/+158
From: Keith Mannthey, Andi Kleen
Implement memory hotadd without sparsemem. The memory in the SRAT hotadd area is just preserved instead and can be activated later.
There are a few restrictions:
- Only one continuous hotadd area allowed per node
The main problem is dealing with the many buggy SRAT tables that are out there. The strategy here is to reject anything suspicious.
Originally from Keith Mannthey, with several hacks and changes by AK and also contributions from Andrew Morton
[ TBD: Problems pointed out by KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
1) Goto's rebuild_zonelist patch will not work if CONFIG_MEMORY_HOTPLUG=n. Rebuilding the zonelist is necessary when the system has just memory < 4G at boot and hot-adds memory > 4G, because x86_64 has DMA32: ZONE_NORMAL is not included in the zonelist at boot time if the system doesn't have memory > 4G at boot. [AK: should just force the higher zones at boot time when SRAT tells us]
2) zone and node's spanned_pages and present_pages are not incremented. They should be. For example, our server (ia64/Fujitsu PrimeQuest) can equip memory from 4G to 1T (maybe 2T in future), and SRAT will *always* say we have possible 1T memory. (Microsoft requires "write all possible memory in SRAT") When we reserve memmap for possible 1T memory, Linux will not work well in minimum 4G configuration ;) [AK: needs limiting to 5-10% of max memory] ]
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 | [PATCH] x86_64: Rename struct node in x86-64 NUMA code to struct bootnode | Andi Kleen | 1 | -4/+4
It conflicts with the struct node in node.h. Actually the x86-64 version was there first, but .. Suggested by Jan Beulich. Cc: jbeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 | [PATCH] x86_64: Always pass full number of nodes to NUMA hash computation | Andi Kleen | 1 | -1/+1
Previously the numa hash code would be confused by holes in the node space and stop early. This is the first part of the fix for the non boot issue with empty nodes on Opterons. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 | [PATCH] x86_64: Relax SRAT covers all memory check a bit | Andi Kleen | 1 | -1/+2
Code was refusing good SRATs because about 12K got lost somewhere. Allow less than 1MB of difference before rejecting it. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
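A coverage check with this kind of slack might be sketched as below. The function name, its parameters and the 4 KiB page size are assumptions for illustration; the comparison follows the wording of the commit message (less than 1MB of difference is tolerated) rather than the exact kernel source.

```c
#include <assert.h>

#define PAGE_SHIFT	12				/* assuming 4 KiB pages */
#define SLACK_PAGES	(1UL << (20 - PAGE_SHIFT))	/* 1 MiB in pages */

static_assert(PAGE_SHIFT < 20, "slack must be at least one page");

/*
 * Return 1 if the SRAT should be accepted: the pages described by its
 * PXMs cover the RAM reported by e820, give or take less than 1 MiB.
 */
static int srat_covers_memory(unsigned long pxm_pages,
			      unsigned long e820_pages)
{
	if (pxm_pages >= e820_pages)
		return 1;
	return e820_pages - pxm_pages < SLACK_PAGES;
}
```

Under these assumptions, the 12K shortfall mentioned above is three pages and passes, while a table missing a full megabyte or more is still rejected.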
2006-02-04 | [PATCH] x86_64: Do more checking in the SRAT header code | Andi Kleen | 1 | -4/+15
- Check if the processor/memory affinity entries are long enough according to the ACPI 3.0 spec.
- Ignore memory affinity entries that define a zero length region.
All based on BIOS issues found in the field @)
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
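For memory affinity entries, the two checks listed above could be sketched as follows. `struct mem_affinity`, `enum srat_verdict` and `check_mem_affinity()` are hypothetical, trimmed-down names for this example, and the 40-byte length reflects my reading of the ACPI 3.0 memory affinity structure rather than anything stated in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed length in bytes of an ACPI 3.0 SRAT memory affinity entry. */
#define SRAT_MEM_AFFINITY_LEN	40

enum srat_verdict { SRAT_USE, SRAT_IGNORE, SRAT_REJECT };

/* Hypothetical, trimmed-down view of a memory affinity entry. */
struct mem_affinity {
	uint8_t  length;	/* length field from the entry header */
	uint64_t base;		/* start of the region */
	uint64_t size;		/* length of the region in bytes */
};

static enum srat_verdict check_mem_affinity(const struct mem_affinity *ma)
{
	assert(ma);
	if (ma->length < SRAT_MEM_AFFINITY_LEN)
		return SRAT_REJECT;	/* entry too short to trust */
	if (ma->size == 0)
		return SRAT_IGNORE;	/* zero length region */
	return SRAT_USE;
}
```

Keeping "ignore this entry" separate from "reject the table" mirrors the two bullet points: a short entry is suspicious, an empty region is merely useless.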
2006-02-04 | [PATCH] x86_64: Clear more state when ignoring empty node in SRAT parsing | Andi Kleen | 1 | -6/+20
Might fix boot failures on systems with empty PXMs in SRAT Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 | [PATCH] x86_64: Reject SRAT tables that don't cover all memory | Andi Kleen | 1 | -0/+33
Broken BIOS on Iwill 8way systems reports these and it causes the bootmem allocator to crash. Add a sanity check if all the PXMs in the SRAT table cover all memory as reported by e820. If the sanity check fails the SRAT is rejected and the code will fall back to discover the NUMA topology using the K8 northbridge registers when applicable. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 | [PATCH] x86_64: Return -1 for unknown PCI bus affinity | Andi Kleen | 1 | -3/+4
When we don't know the node a PCI bus is connected to, return -1. This matches the generic code. Noticed by Ravikiran G Thirumalai <kiran@scalex86.org> Cc: Ravikiran G Thirumalai <kiran@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 | [PATCH] x86_64: Validate SLIT table | Andi Kleen | 1 | -0/+27
A lot of Opteron BIOSes just pass 10 in all SLIT entries (10 is the normalized unit). This is actually worse than the default heuristic because it leads to pci_distance not knowing the difference between local and remote nodes anymore. This messes up some NUMA heuristics in generic code. In this case it's better to fall back to the default heuristic which just does nodea == nodeb ? 10 : 20. This patch does some basic sanity checking on the SLIT and only accepts the SLIT when it passes. Invariants enforced are:
- Node to itself shall be 10
- Any other distance shouldn't be 10
- Distances smaller than 10 are illegal
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
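The three invariants can be checked in a single pass over the distance matrix. The sketch below assumes the SLIT is available as a row-major array of `n * n` bytes; the function name and signature are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_DISTANCE	10

/*
 * Return 1 if an n x n SLIT (row-major in entry[]) satisfies the
 * invariants from the commit message, 0 otherwise.
 */
static int slit_valid(const uint8_t *entry, int n)
{
	int i, j;

	assert(entry && n > 0);
	for (i = 0; i < n; i++) {
		for (j = 0; j < n; j++) {
			uint8_t val = entry[n * i + j];

			if (i == j) {
				/* node to itself shall be 10 */
				if (val != LOCAL_DISTANCE)
					return 0;
			} else if (val <= LOCAL_DISTANCE) {
				/* remote distance of 10 or less is bogus */
				return 0;
			}
		}
	}
	return 1;
}
```

A table filled with 10 everywhere fails on its first off-diagonal entry, so a caller would fall back to the default `nodea == nodeb ? 10 : 20` heuristic.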
2005-11-14 | [PATCH] x86_64: Make node boundaries consistent | Magnus Damm | 1 | -4/+0
The current x86_64 NUMA memory code is inconsistent when it comes to node memory ranges. The exact behaviour varies depending on which config option is used. setup_node_bootmem() has start and end as arguments and these are used to calculate the size of the node like this: (end - start). This is all fine if end is pointing to the first non-available byte. The problem is that the current x86_64 code sometimes treats it as the last present byte and sometimes as the first non-available byte. The result is that some configurations might lose a page at the end of the range. This patch tries to fix CONFIG_ACPI_NUMA, CONFIG_K8_NUMA and CONFIG_NUMA_EMU so they all treat the end variable as the first non-available byte. This is the same way as the single node code. The patch is boot tested on dual x86_64 hardware with the above configurations, but maybe the removed code is needed as some workaround? Signed-off-by: Magnus Damm <magnus@valinux.co.jp> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
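The lost page is easy to reproduce with plain arithmetic. In this sketch `node_pages()` is a hypothetical helper and the 4 KiB page size is an assumption; only the (end - start) computation comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE	4096ULL	/* assuming 4 KiB pages */

/* Whole pages in [start, end), where end is the first non-available byte. */
static uint64_t node_pages(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return (end - start) / PAGE_SIZE;
}
```

For a 1 MiB node starting at 0, passing `end = 0x100000` (first non-available byte) yields 256 pages, while passing `end = 0xFFFFF` (last present byte) yields 255: the final page disappears.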
2005-11-14 | [PATCH] x86_64: Speed up numa_node_id by putting it directly into the PDA | Andi Kleen | 1 | -1/+1
Do not go from the CPU number through a mapping array. The node number is often used now in fast paths. This also adds a generic numa_node_id to all the topology includes. Suggested by Eric Dumazet. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 | [PATCH] x86-64: Use correct mask to compute conflicting nodes in SRAT | Andi Kleen | 1 | -1/+1
The nodes are not set online yet at this point. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 | [PATCH] x86-64: reset apicid<->node tables when SRAT cannot be parsed | Andi Kleen | 1 | -0/+3
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 | [PATCH] x86-64: Clean up the SRAT node list before computing the hash function | Andi Kleen | 1 | -9/+11
Also use for_each_node_mask instead of hand crafted loops. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 | [PATCH] x86-64: Improve error handling for overlapping PXMs in SRAT. | Andi Kleen | 1 | -6/+13
- Report PXMs instead of nodes
- Report the correct PXM, not always the one of node 1.
- Only warn for the case of a PXM overlapping by itself
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 | [PATCH] x86-64: Use ACPI PXM to parse PCI<->node assignments | Andi Kleen | 1 | -0/+7
Since this is shared code I had to implement it for i386 too. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 | [PATCH] x86-64: Don't assign CPU numbers in SRAT parsing | Andi Kleen | 1 | -14/+3
Do that later when the CPU boots. SRAT just stores the APIC<->Node mapping. This fixes problems on systems where the order of SRAT entries does not match the MADT. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 | [PATCH] x86_64: fix cpu_to_node setup for sparse apic_ids | Ravikiran G Thirumalai | 1 | -6/+11
While booting with SMT disabled in the BIOS, when using the ACPI SRAT to set up cpu_to_node[], sparse apic_ids create problems. Without this patch, Intel x86_64 boxes with hyperthreading disabled in the BIOS (and which rely on SRAT for NUMA setup) end up having incorrect values in cpu_to_node[] arrays, causing sched domains to be built incorrectly etc. Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 | [PATCH] x86_64: Print a boot message for hotplug memory zones | Andi Kleen | 1 | -1/+4
From: Keith Manning Print a boot message for hotplug memory zones Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 | Linux-2.6.12-rc2 | Linus Torvalds | 1 | -0/+217
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!