aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2016-07-25PCI: Allow additional bus numbers for hotplug bridgesKeith Busch4-0/+21
A user may hot add a switch requiring more than one bus to enumerate. This previously required a system reboot if BIOS did not sufficiently pad the bus resource, which they frequently don't do. Add a kernel parameter so a user can specify the minimum number of bus numbers to reserve for a hotplug bridge's subordinate buses so rebooting won't be necessary. The default is 1, which is equivalent to previous behavior. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-07-25PCI: Remove DPC tristate module optionKeith Busch1-4/+1
Change the Downstream Port Containment config type from tristate to bool. The driver doesn't automatically load based on any rules, so it needs to be built-in in order to bind to devices it needs to drive. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-07-25PCI: Bind DPC to Root Ports as well as Downstream PortsKeith Busch1-1/+1
PCIe port type values are not flags, so OR'ing them is not correct. Previously the result was equivalent to PCIe Downstream Ports, so we were missing binding to DPC-capable Root Ports. Change the type to 'any' so we can bind to both port types. While this will cause the code to check Upstream Ports, the driver won't claim them since they are not DPC-capable. Reported-by: Alexander Antonov <alexanderx.v.antonov@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Mika Westerberg <mika.westerberg@linux.intel.com>
2016-07-19PCI: Add DMA alias quirk for Adaptec 3805Alex Williamson1-0/+3
Add a DMA alias quirk for the Adaptec 3805, just like the 3405 quirk added in commit d3d2ab43ddae ("PCI: Add DMA alias quirk for Adaptec 3405"). Link: https://www.redhat.com/archives/vfio-users/2016-July/msg00046.html Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-07-19PCI: Include <asm/dma.h> for isa_dma_bridge_buggyBen Dooks1-0/+1
At least on arm, <asm/dma.h> does not get included when building drivers/pci/pci.o. This causes the following build warning which can be fixed by including <asm/dma.h>: drivers/pci/pci.c:37:5: warning: symbol 'isa_dma_bridge_buggy' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-21PCI: Fix whitespace in struct dpc_devMika Westerberg1-2/+2
Remove unnecessary spaces before tabs. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Keith Busch <keith.busch@intel.com>
2016-06-21PCI: Convert Downstream Port Containment driver to use devm_* functionsMika Westerberg1-9/+4
Use the device resource management (devm) interfaces so we don't need to explicitly release resources on failure paths or when the driver is removed. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Keith Busch <keith.busch@intel.com>
2016-06-20PCI: pciehp: Ignore interrupts during D3coldLukas Wunner1-0/+4
If a hotplug port is suspended to D3cold, its slot status register cannot be read. If that hotplug port happens to share its IRQ with other devices, whenever an interrupt occurs for one of these devices, pciehp logs a "no response from device" message and tries to read the PCI_EXP_SLTSTA register, even though we know that will fail. Ignore interrupts while we're in D3cold. [bhelgaas: changelog] Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-17PCI: Document connection between pci_power_t and hardware PM capabilityBjorn Helgaas1-0/+4
The dev.pme_support field, pci_pm_init(), pci_pme_capable(), and pci_raw_set_power_state() depend on the fact that the pci_power_t values (PCI_D0, PCI_D1, etc.) match the definition of the Capabilities PME_Support and the Control/Status PowerState fields in the Power Management capability (see PCI Bus Power Management spec r1.2, sec 3.2.3). Add a note to this effect at the pci_power_t typedef. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2016-06-15PCI/MSI: irqchip: Fix PCI_MSI dependenciesArnd Bergmann5-24/+31
The PCI_MSI symbol is used inconsistently throughout the tree, with some drivers using 'select' and others using 'depends on', or using conditional selects. This keeps causing problems; the latest one is a result of ARCH_ALPINE using a 'select' statement to enable its platform-specific MSI driver without enabling MSI: warning: (ARCH_ALPINE) selects ALPINE_MSI which has unmet direct dependencies (PCI && PCI_MSI) drivers/irqchip/irq-alpine-msi.c:104:15: error: variable 'alpine_msix_domain_info' has initializer but incomplete type static struct msi_domain_info alpine_msix_domain_info = { ^~~~~~~~~~~~~~~ drivers/irqchip/irq-alpine-msi.c:105:2: error: unknown field 'flags' specified in initializer .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS | ^ drivers/irqchip/irq-alpine-msi.c:105:11: error: 'MSI_FLAG_USE_DEF_DOM_OPS' undeclared here (not in a function) .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS | ^~~~~~~~~~~~~~~~~~~~~~~~ There is little reason to enable PCI support for a platform that uses MSI but then leave MSI disabled at compile time. Select PCI_MSI from irqchips that implement MSI, and make PCI host bridges that use MSI on ARM depend on PCI_MSI_IRQ_DOMAIN. For all three architectures that support PCI_MSI_IRQ_DOMAIN (ARM, ARM64, X86), enable it by default whenever MSI is enabled. [bhelgaas: changelog, omit crypto config change] Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2016-06-13PCI: Add runtime PM support for PCIe portsMika Westerberg2-0/+54
Add back runtime PM support for PCIe ports that was removed by fe9a743a2601 ("PCI/PM: Drop unused runtime PM support code for PCIe ports"). We cannot enable it automatically for all ports since there have been problems previously [1]. In summary suspended PCIe ports were not able to deal with ACPI-based hotplug reliably. One reason why this might happen is the fact that when a PCIe port is powered down, config space access to the devices behind the port is not possible. If the BIOS hotplug SMI handler assumes the port is always in D0 it will not be able to find the hotplugged devices. To be on the safe side only enable runtime PM if the port does not claim to support hotplug. For PCIe ports not using hotplug, we enable and allow runtime PM automatically. Since 'bridge_d3' can be changed any time we check this in driver ->runtime_idle() and ->runtime_suspend() and only allow runtime suspend if the flag is still set. Use autosuspend with default of 100ms idle time to prevent the port from repeatedly suspending and resuming on continuous configuration space access of devices behind the port. The actual power transition to D3 and back is handled in the PCI core. Idea to automatically unblock (allow) runtime PM for PCIe ports came from Dave Airlie. [1] https://bugzilla.kernel.org/show_bug.cgi?id=53811 This includes a fix for lockdep issue reported by Valdis Kletnieks. Tested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-13ACPI / hotplug / PCI: Runtime resume bridge before rescanMika Westerberg1-0/+4
If a PCI bridge (or PCIe port) that is runtime suspended gets an ACPI hotplug event, such as BUS_CHECK we need to make sure it is resumed before devices below the bridge are re-scanned. Otherwise the devices behind the port are not accessible and will be treated as hot-unplugged. To fix this, resume PCI bridges from runtime suspend while rescanning. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-13PCI: Power on bridges before scanning new devicesMika Westerberg1-0/+9
When a PCI device is removed through sysfs interface, the upstream bridge (PCIe port) can be runtime suspended if it was the last device on that bus. Now, if the bridge is in D3 we cannot find devices below the bridge anymore. For example following fails to find the removed device again: # echo 1 > /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/remove # echo 1 > /sys/bus/pci/devices/0000:00:01.0/rescan Where 0000:00:01.0 is the bridge device. In order to be able to rescan devices below the bridge add pm_runtime_get_sync()/pm_runtime_put() calls to pci_scan_bridge(). This should keep bridges powered on while their children devices are being scanned. Reported-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-13PCI: Put PCIe ports into D3 during suspendMika Westerberg9-5/+203
Currently the Linux PCI core does not touch power state of PCI bridges and PCIe ports when system suspend is entered. Leaving them in D0 consumes power unnecessarily and may prevent the CPU from entering deeper C-states. With recent PCIe hardware we can power down the ports to save power given that we take into account few restrictions: - The PCIe port hardware is recent enough, starting from 2015. - Devices connected to PCIe ports are effectively in D3cold once the port is transitioned to D3 (the config space is not accessible anymore and the link may be powered down). - Devices behind the PCIe port need to be allowed to transition to D3cold and back. There is a way both drivers and userspace can forbid this. - If the device behind the PCIe port is capable of waking the system it needs to be able to do so from D3cold. This patch adds a new flag to struct pci_device called 'bridge_d3'. This flag is set and cleared by the PCI core whenever there is a change in power management state of any of the devices behind the PCIe port. When system later on is suspended we only need to check this flag and if it is true transition the port to D3 otherwise we leave it in D0. Also provide override mechanism via command line parameter "pcie_port_pm=[off|force]" that can be used to disable or enable the feature regardless of the BIOS manufacturing date. Tested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-13PCI: Don't clear d3cold_allowed for PCIe portsMika Westerberg1-5/+0
The PCI core skips bridges and ports when the system is suspended. The PCI core checks return value of pci_has_subordinate() in pci_pm_suspend_noirq() to skip all devices where it is non-zero (which means PCI bridges and PCIe ports). Since PCIe ports are never suspended in the first place, there is no need to set d3cold_allowed for them. Tested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-13PCI / PM: Enforce type casting for pci_power_tAndy Shevchenko1-1/+1
When casting variables of type pci_power_t, a static analysis tool complains: include/linux/pci.h:119:37: warning: cast from restricted pci_power_t Enforce type casting to make the static analyzer happy. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10PCI: Make bus_attr_resource_alignment staticBen Dooks1-1/+1
The symbol bus_attr_resource_alignment is not exported or declared elsewhere, so make it static to fix the following warning: drivers/pci/pci.c:4900:1: warning: symbol 'bus_attr_resource_alignment' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10PCI: Mark Atheros AR9485 and QCA9882 to avoid bus resetChris Blake1-4/+6
Similar to the AR93xx series, the AR94xx and the Qualcomm QCA988x also have the same quirk for the Bus Reset. Fixes: c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset") Signed-off-by: Chris Blake <chrisrblake93@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.14+
2016-06-10PCI/ASPM: Remove redundant check of pcie_set_clkpmShawn Lin1-1/+1
Without supporting clock PM capable, if we want to disable clkpm, we don't need this extra check as it must already be zero for the enable argument. And it's the same for enabling clkpm here. So let's remove this check. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10MAINTAINERS: Add file patterns for PCI device tree bindingsGeert Uytterhoeven1-0/+1
Submitters of device tree binding documentation may forget to CC the subsystem maintainer if this is missing. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10PCI: Fix comment typoAndrea Gelmini1-1/+1
Fix typo. Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10PCI: Add function 1 DMA alias quirk for Marvell 88SE9182Aaron Sierra1-0/+3
Add function 1 DMA alias quirk for Marvell 88SE9182. We found this quirk reported in the same thread as other Marvell devices, but no patch resulted: https://bugzilla.kernel.org/show_bug.cgi?id=42679#c78 Signed-off-by: Steven Graham <sgraham@xes-inc.com> Signed-off-by: Aaron Sierra <asierra@xes-inc.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10ARM64: PCI: Support ACPI-based PCI host controllerTomasz Nowicki2-3/+115
Implement pci_acpi_scan_root() and other arch-specific calls so ARM64 can use ACPI to setup and enumerate PCI buses. Use memory-mapped configuration space information from either the ACPI _CBA method or the MCFG table and the ECAM library and generic ECAM config accessor ops. Implement acpi_pci_bus_find_domain_nr() to retrieve the domain number from the acpi_pci_root structure. Implement pcibios_add_bus() and pcibios_remove_bus() to call acpi_pci_add_bus() and acpi_pci_remove_bus() for ACPI slot management and other configuration. Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-10ARM64: PCI: Implement AML accessors for PCI_Config regionTomasz Nowicki1-2/+10
On ACPI systems, the PCI_Config OperationRegion allows AML to access PCI configuration space. The ACPI CA AML interpreter uses performs config space accesses with acpi_os_read_pci_configuration() and acpi_os_write_pci_configuration(), which are OS-dependent functions supplied by acpi/osl.c. Implement the arch-specific raw_pci_read() and raw_pci_write() interfaces used by acpi/osl.c for PCI_Config accesses. N.B. PCI_Config accesses are not supported before PCI bus enumeration. [bhelgaas: changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-10ARM64: PCI: ACPI support for legacy IRQs parsing and consolidation with DT codeTomasz Nowicki1-3/+8
To enable PCI legacy IRQs on platforms booting with ACPI, arch code should include ACPI-specific callbacks that parse and set-up the device IRQ number, equivalent to the DT boot path. Owing to the current ACPI core scan handlers implementation, ACPI PCI legacy IRQs bindings cannot be parsed at device add time, since that would trigger ACPI scan handlers ordering issues depending on how the ACPI tables are defined. To solve this problem and consolidate FW PCI legacy IRQs parsing in one single pcibios callback (pending final removal), this patch moves DT PCI IRQ parsing to the pcibios_alloc_irq() callback (called by PCI core code at driver probe time) and adds ACPI PCI legacy IRQs parsing to the same callback too, so that FW PCI legacy IRQs parsing is confined in one single arch callback that can be easily removed when code parsing PCI legacy IRQs is consolidated and moved to core PCI code. Suggested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10ARM64: PCI: Add acpi_pci_bus_find_domain_nr()Tomasz Nowicki3-1/+16
Extend pci_bus_find_domain_nr() so it can find the domain from either: - ACPI, via the new acpi_pci_bus_find_domain_nr() interface, or - DT, via of_pci_bus_find_domain_nr() Note that this is only used for CONFIG_PCI_DOMAINS_GENERIC=y, so it does not affect x86 or ia64. [bhelgaas: changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-10PCI: Factor DT-specific pci_bus_find_domain_nr() code outTomasz Nowicki1-1/+6
pci_bus_find_domain_nr() retrieves the host bridge domain number in a DT-specific way. Rename it to of_pci_bus_find_domain_nr() to reflect that, so we can add a corresponding function for ACPI. [bhelgaas: changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-10PCI: Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERICTomasz Nowicki3-9/+6
Instead of assigning bus->domain_nr inside pci_bus_assign_domain_nr(), return the domain and let the caller do the assignment. Rename pci_bus_assign_domain_nr() to pci_bus_find_domain_nr() to reflect this. No functional change intended. [bhelgaas: changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-10PCI/ACPI: Add generic MCFG table handlingTomasz Nowicki5-1/+99
On ACPI systems that support memory-mapped config space access, i.e., ECAM, the PCI Firmware Specification says the OS can learn where the ECAM space is from either: - the static MCFG table (for non-hotpluggable bridges), or - the _CBA method (for hotpluggable bridges) The current MCFG table handling code cannot be easily generalized owing to x86-specific quirks, which makes it hard to reuse on other architectures. Implement generic MCFG handling from scratch, including: - Simple MCFG table parsing (via pci_mmcfg_late_init() as in current x86) - MCFG region lookup for a (domain, bus_start, bus_end) tuple [bhelgaas: changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-10PCI/ACPI: Support I/O resources when parsing host bridge resourcesJayachandran C1-0/+35
On platforms with memory-mapped I/O ports, such as ia64 and ARM64, we have to map the memory region and coordinate it with the arch's I/O port accessors. For ia64, we do this in arch code because it supports both dense (1 byte per I/O port) and sparse (1024 bytes per I/O port) memory mapping. For arm64, we only support dense mappings, which we can do in the generic code with pci_register_io_range() and pci_remap_iospace(). Add acpi_pci_root_remap_iospace() to remap dense memory-mapped I/O port space when adding a bridge, and call pci_unmap_iospace() to release the space when removing the bridge. [bhelgaas: changelog, move #ifdef inside acpi_pci_root_remap_iospace()] Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Sinan Kaya <okaya@codeaurora.org> [Tomasz: merged in Sinan's patch to unmap IO resources properly, updated changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-10PCI: Add pci_unmap_iospace() to unmap I/O resourcesSinan Kaya2-0/+19
Add pci_unmap_iospace() to undo what pci_remap_iospace() did. This is needed to support hotplug removal of host bridges that use pci_remap_iospace(). [bhelgaas: changelog] Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-10PCI: Add parent device field to ECAM struct pci_config_windowJayachandran C3-4/+6
Add a parent device field to struct pci_config_window. The parent is not saved now, but will be useful to save it in some cases. For ACPI on ARM64, it can be used to setup ACPI companion and domain. Since the parent dev is in struct pci_config_window now, we need not pass it to the init function as a separate argument. Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-10PCI: Move ecam.h to linux/include/pci-ecam.hJayachandran C6-10/+5
This header will be used from arch/arm64 for ACPI PCI implementation so it needs to be moved out of drivers/pci. Update users of the header file to use the new name. No functional changes. Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2016-06-05Linux 4.7-rc2Linus Torvalds1-1/+1
2016-06-05devpts: Make each mount of devpts an independent filesystem.Eric W. Biederman7-296/+126
The /dev/ptmx device node is changed to lookup the directory entry "pts" in the same directory as the /dev/ptmx device node was opened in. If there is a "pts" entry and that entry is a devpts filesystem /dev/ptmx uses that filesystem. Otherwise the open of /dev/ptmx fails. The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that userspace can now safely depend on each mount of devpts creating a new instance of the filesystem. Each mount of devpts is now a separate and equal filesystem. Reserved ttys are now available to all instances of devpts where the mounter is in the initial mount namespace. A new vfs helper path_pts is introduced that finds a directory entry named "pts" in the directory of the passed in path, and changes the passed in path to point to it. The helper path_pts uses a function path_parent_directory that was factored out of follow_dotdot. In the implementation of devpts: - devpts_mnt is killed as it is no longer meaningful if all mounts of devpts are equal. - pts_sb_from_inode is replaced by just inode->i_sb as all cached inodes in the tty layer are now from the devpts filesystem. - devpts_add_ref is rolled into the new function devpts_ptmx. And the unnecessary inode hold is removed. - devpts_del_ref is renamed devpts_release and reduced to just a deacrivate_super. - The newinstance mount option continues to be accepted but is now ignored. In devpts_fs.h definitions for when !CONFIG_UNIX98_PTYS are removed as they are never used. Documentation/filesystems/devices.txt is updated to describe the current situation. This has been verified to work properly on openwrt-15.05, centos5, centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3, ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1, slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01. With the caveat that on centos6 and on slackware-14.1 that there wind up being two instances of the devpts filesystem mounted on /dev/pts, the lower copy does not end up getting used. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg KH <greg@kroah.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-05parisc: Move die_if_kernel() prototype into traps.h headerHelge Deller2-2/+3
Signed-off-by: Helge Deller <deller@gmx.de>
2016-06-05parisc: Fix pagefault crash in unaligned __get_user() callHelge Deller1-1/+9
One of the debian buildd servers had this crash in the syslog without any other information: Unaligned handler failed, ret = -2 clock_adjtime (pid 22578): Unaligned data reference (code 28) CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G E 4.5.0-2-parisc64-smp #1 Debian 4.5.4-1 task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001001111100000001111 Tainted: G E r00-03 000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0 r04-07 00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff r08-11 0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4 r12-15 000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b r16-19 0000000000028800 0000000000000001 0000000000000070 00000001bde7c218 r20-23 0000000000000000 00000001bde7c210 0000000000000002 0000000000000000 r24-27 0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0 r28-31 0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218 sr00-03 0000000001200000 0000000001200000 0000000000000000 0000000001200000 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88 IIR: 0ca0d089 ISR: 0000000001200000 IOR: 00000000fa6f7fff CPU: 1 CR30: 00000001bde7c000 CR31: ffffffffffffffff ORIG_R28: 00000002369fe628 IAOQ[0]: compat_get_timex+0x2dc/0x3c0 IAOQ[1]: compat_get_timex+0x2e0/0x3c0 RP(r2): compat_get_timex+0x40/0x3c0 Backtrace: [<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0 [<0000000040205024>] syscall_exit+0x0/0x14 This means the userspace program clock_adjtime called the clock_adjtime() syscall and then crashed inside the compat_get_timex() function. Syscalls should never crash programs, but instead return EFAULT. The IIR register contains the executed instruction, which disassebles into "ldw 0(sr3,r5),r9". This load-word instruction is part of __get_user() which tried to read the word at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in. The unaligned handler is able to emulate all ldw instructions, but it fails if it fails to read the source e.g. because of page fault. The following program reproduces the problem: #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <sys/mman.h> int main(void) { /* allocate 8k */ char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); /* free second half (upper 4k) and make it invalid. */ munmap(ptr+4096, 4096); /* syscall where first int is unaligned and clobbers into invalid memory region */ /* syscall should return EFAULT */ return syscall(__NR_clock_adjtime, 0, ptr+4095); } To fix this issue we simply need to check if the faulting instruction address is in the exception fixup table when the unaligned handler failed. If it is, call the fixup routine instead of crashing. While looking at the unaligned handler I found another issue as well: The target register should not be modified if the handler was unsuccessful. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org
2016-06-05parisc: Fix printk time during bootHelge Deller2-7/+3
Avoid showing invalid printk time stamps during boot. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Aaro Koskinen <aaro.koskinen@iki.fi>
2016-06-04parisc: Fix backtrace on PA-RISCMikulas Patocka1-8/+14
This patch fixes backtrace on PA-RISC There were several problems: 1) The code that decodes instructions handles instructions that subtract from the stack pointer incorrectly. If the instruction subtracts the number X from the stack pointer the code increases the frame size by (0x100000000-X). This results in invalid accesses to memory and recursive page faults. 2) Because gcc reorders blocks, handling instructions that subtract from the frame pointer is incorrect. For example, this function int f(int a) { if (__builtin_expect(a, 1)) return a; g(); return a; } is compiled in such a way, that the code that decreases the stack pointer for the first "return a" is placed before the code for "g" call. If we recognize this decrement, we mistakenly believe that the frame size for the "g" call is zero. To fix problems 1) and 2), the patch doesn't recognize instructions that decrease the stack pointer at all. To further safeguard the unwind code against nonsense values, we don't allow frame size larger than Total_frame_size. 3) The backtrace is not locked. If stack dump races with module unload, invalid table can be accessed. This patch adds a spinlock when processing module tables. Note, that for correct backtrace, you need recent binutils. Binutils 2.18 from Debian 5 produce garbage unwind tables. Binutils 2.21 work better (it sometimes forgets function frames, but at least it doesn't generate garbage). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Helge Deller <deller@gmx.de>
2016-06-03mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policiesMel Gorman1-7/+16
The optimistic fast path may use cpuset_current_mems_allowed instead of of a NULL nodemask supplied by the caller for cpuset allocations. The preferred zone is calculated on this basis for statistic purposes and as a starting point in the zonelist iterator. However, if the context can ignore memory policies due to being atomic or being able to ignore watermarks then the starting point in the zonelist iterator is no longer correct. This patch resets the zonelist iterator in the allocator slowpath if the context can ignore memory policies. This will alter the zone used for statistics but only after it is known that it makes sense for that context. Resetting it before entering the slowpath would potentially allow an ALLOC_CPUSET allocation to be accounted for against the wrong zone. Note that while nodemask is not explicitly set to the original nodemask, it would only have been overwritten if cpuset_enabled() and it was reset before the slowpath was entered. Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net Fixes: c33d6c06f60f710 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policyMel Gorman1-0/+1
Geert Uytterhoeven reported the following problem that bisected to commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") on m68k/ARAnyM BUG: scheduling while atomic: cron/668/0x10c9a0c0 Modules linked in: CPU: 0 PID: 668 Comm: cron Not tainted 4.6.0-atari-05133-gc33d6c06f60f710f #364 Call Trace: [<0003d7d0>] __schedule_bug+0x40/0x54 __schedule+0x312/0x388 __schedule+0x0/0x388 prepare_to_wait+0x0/0x52 schedule+0x64/0x82 schedule_timeout+0xda/0x104 set_next_entity+0x18/0x40 pick_next_task_fair+0x78/0xda io_schedule_timeout+0x36/0x4a bit_wait_io+0x0/0x40 bit_wait_io+0x12/0x40 __wait_on_bit+0x46/0x76 wait_on_page_bit_killable+0x64/0x6c bit_wait_io+0x0/0x40 wake_bit_function+0x0/0x4e __lock_page_or_retry+0xde/0x124 do_scan_async+0x114/0x17c lookup_swap_cache+0x24/0x4e handle_mm_fault+0x626/0x7de find_vma+0x0/0x66 down_read+0x0/0xe wait_on_page_bit_killable_timeout+0x77/0x7c find_vma+0x16/0x66 do_page_fault+0xe6/0x23a res_func+0xa3c/0x141a buserr_c+0x190/0x6d4 res_func+0xa3c/0x141a buserr+0x20/0x28 res_func+0xa3c/0x141a buserr+0x20/0x28 The relationship is not obvious but it's due to a failure to rescan the full zonelist after the fair zone allocation policy exhausts the batch count. While this is a functional problem, it's also a performance issue. A page allocator microbenchmark showed the following 4.7.0-rc1 4.7.0-rc1 vanilla reset-v1r2 Min alloc-odr0-1 327.00 ( 0.00%) 326.00 ( 0.31%) Min alloc-odr0-2 235.00 ( 0.00%) 235.00 ( 0.00%) Min alloc-odr0-4 198.00 ( 0.00%) 198.00 ( 0.00%) Min alloc-odr0-8 170.00 ( 0.00%) 170.00 ( 0.00%) Min alloc-odr0-16 156.00 ( 0.00%) 156.00 ( 0.00%) Min alloc-odr0-32 150.00 ( 0.00%) 150.00 ( 0.00%) Min alloc-odr0-64 146.00 ( 0.00%) 146.00 ( 0.00%) Min alloc-odr0-128 145.00 ( 0.00%) 145.00 ( 0.00%) Min alloc-odr0-256 155.00 ( 0.00%) 155.00 ( 0.00%) Min alloc-odr0-512 168.00 ( 0.00%) 165.00 ( 1.79%) Min alloc-odr0-1024 175.00 ( 0.00%) 174.00 ( 0.57%) Min alloc-odr0-2048 180.00 ( 0.00%) 180.00 ( 0.00%) Min alloc-odr0-4096 187.00 ( 0.00%) 186.00 ( 0.53%) Min alloc-odr0-8192 190.00 ( 0.00%) 190.00 ( 0.00%) Min alloc-odr0-16384 191.00 ( 0.00%) 191.00 ( 0.00%) Min alloc-odr1-1 736.00 ( 0.00%) 445.00 ( 39.54%) Min alloc-odr1-2 343.00 ( 0.00%) 335.00 ( 2.33%) Min alloc-odr1-4 277.00 ( 0.00%) 270.00 ( 2.53%) Min alloc-odr1-8 238.00 ( 0.00%) 233.00 ( 2.10%) Min alloc-odr1-16 224.00 ( 0.00%) 218.00 ( 2.68%) Min alloc-odr1-32 210.00 ( 0.00%) 208.00 ( 0.95%) Min alloc-odr1-64 207.00 ( 0.00%) 203.00 ( 1.93%) Min alloc-odr1-128 276.00 ( 0.00%) 202.00 ( 26.81%) Min alloc-odr1-256 206.00 ( 0.00%) 202.00 ( 1.94%) Min alloc-odr1-512 207.00 ( 0.00%) 202.00 ( 2.42%) Min alloc-odr1-1024 208.00 ( 0.00%) 205.00 ( 1.44%) Min alloc-odr1-2048 213.00 ( 0.00%) 212.00 ( 0.47%) Min alloc-odr1-4096 218.00 ( 0.00%) 216.00 ( 0.92%) Min alloc-odr1-8192 341.00 ( 0.00%) 219.00 ( 35.78%) Note that order-0 allocations are unaffected but higher orders get a small boost from this patch and a large reduction in system CPU usage overall as can be seen here: 4.7.0-rc1 4.7.0-rc1 vanilla reset-v1r2 User 85.32 86.31 System 2221.39 2053.36 Elapsed 2368.89 2202.47 Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Link: http://lkml.kernel.org/r/20160531100848.GR2527@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03mm, oom_reaper: do not use siglock in try_oom_reaper()Michal Hocko1-6/+1
Oleg has noted that siglock usage in try_oom_reaper is both pointless and dangerous. signal_group_exit can be checked lockless. The problem is that sighand becomes NULL in __exit_signal so we can crash. Fixes: 3ef22dfff239 ("oom, oom_reaper: try to reap tasks which skip regular OOM killer path") Link: http://lkml.kernel.org/r/1464679423-30218-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03mm, page_alloc: prevent infinite loop in buffered_rmqueue()Vlastimil Babka1-4/+5
In DEBUG_VM kernel, we can hit infinite loop for order == 0 in buffered_rmqueue() when check_new_pcp() returns 1, because the bad page is never removed from the pcp list. Fix this by removing the page before retrying. Also we don't need to check if page is non-NULL, because we simply grab it from the list which was just tested for being non-empty. Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") Link: http://lkml.kernel.org/r/20160530090154.GM2527@techsingularity.net Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03checkpatch: reduce git commit description style false positivesJoe Perches1-0/+1
Some lines in a commit log appear to be commit SHA1 ids like: ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")' Link: http://lkml.kernel.org/r/40e03fd7aaf1f55c75d787128d6d17c5a71226c2.1464358556.git.vdavydov@virtuozzo.com Reduce the false positives. Link: http://lkml.kernel.org/r/eda977eaa8328fef42bb3c87935d97e10ea8ff67.1464384023.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03mm/z3fold.c: avoid modifying HEADLESS page and minor cleanupVitaly Wool1-10/+14
Fix erroneous z3fold header access in a HEADLESS page in reclaim function, and change one remaining direct handle-to-buddy conversion to use the appropriate helper. Link: http://lkml.kernel.org/r/5748706F.9020208@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjenning@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()Tejun Heo1-0/+3
memcg_offline_kmem() may be called from memcg_free_kmem() after a css init failure. memcg_free_kmem() is a ->css_free callback which is called without cgroup_mutex and memcg_offline_kmem() ends up using css_for_each_descendant_pre() without any locking. Fix it by adding rcu read locking around it. mkdir: cannot create directory `65530': No space left on device =============================== [ INFO: suspicious RCU usage. ] 4.6.0-work+ #321 Not tainted ------------------------------- kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required! [ 527.243970] other info that might help us debug this: [ 527.244715] rcu_scheduler_active = 1, debug_locks = 0 2 locks held by kworker/0:5/1664: #0: ("cgroup_destroy"){.+.+..}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0 #1: ((&css->destroy_work)#3){+.+...}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0 [ 527.248098] stack backtrace: CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ #321 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 Workqueue: cgroup_destroy css_free_work_fn Call Trace: dump_stack+0x68/0xa1 lockdep_rcu_suspicious+0xd7/0x110 css_next_descendant_pre+0x7d/0xb0 memcg_offline_kmem.part.44+0x4a/0xc0 mem_cgroup_css_free+0x1ec/0x200 css_free_work_fn+0x49/0x5e0 process_one_work+0x1c5/0x4a0 worker_thread+0x49/0x490 kthread+0xea/0x100 ret_from_fork+0x1f/0x40 Link: http://lkml.kernel.org/r/20160526203018.GG23194@mtj.duckdns.org Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03mm: check the return value of lookup_page_ext for all call sitesYang Shi5-8/+77
Per the discussion with Joonsoo Kim [1], we need check the return value of lookup_page_ext() for all call sites since it might return NULL in some cases, although it is unlikely, i.e. memory hotplug. Tested with ltp with "page_owner=0". [1] http://lkml.kernel.org/r/20160519002809.GA10245@js1304-P5Q-DELUXE [akpm@linux-foundation.org: fix build-breaking typos] [arnd@arndb.de: fix build problems from lookup_page_ext] Link: http://lkml.kernel.org/r/6285269.2CksypHdYp@wuerfel [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1464023768-31025-1-git-send-email-yang.shi@linaro.org Signed-off-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03kdump: fix dmesg gdbmacro to work with record based printkCorey Minyard1-11/+82
Commit 7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer") introduced a record based printk buffer. Modify gdbmacros.txt to parse this new structure so dmesg will work properly. Link: http://lkml.kernel.org/r/1463515794-1599-1-git-send-email-minyard@acm.org Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03mm: fix overflow in vm_map_ram()Guillermo Julián Moreno1-4/+5
When remapping pages accounting for 4G or more memory space, the operation 'count << PAGE_SHIFT' overflows as it is performed on an integer. Solution: cast before doing the bitshift. [akpm@linux-foundation.org: fix vm_unmap_ram() also] [akpm@linux-foundation.org: fix vmap() as well, per Guillermo] Link: http://lkml.kernel.org/r/etPan.57175fb3.7a271c6b.2bd@naudit.es Signed-off-by: Guillermo Julián Moreno <guillermo.julian@naudit.es> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extentChris Mason1-1/+12
When dealing with inline extents, btrfs_get_extent will incorrectly try to insert a duplicate extent_map. The dup hits -EEXIST from add_extent_map, but then we try to merge with the existing one and end up trying to insert a zero length extent_map. This actually works most of the time, except when there are extent maps past the end of the inline extent. rocksdb will trigger this sometimes because it preallocates an extent and then truncates down. Josef made a script to trigger with xfs_io: #!/bin/bash xfs_io -f -c "pwrite 0 1000" inline xfs_io -c "falloc -k 4k 1M" inline xfs_io -c "pread 0 1000" -c "fadvise -d 0 1000" -c "pread 0 1000" inline xfs_io -c "fadvise -d 0 1000" inline cat inline You'll get EIOs trying to read inline after this because add_extent_map is returning EEXIST Signed-off-by: Chris Mason <clm@fb.com>