Age | Commit message (Collapse) | Author | Files | Lines |
|
The commit referenced below introduced device locking around save and
restore of state for each device during a PCI bus "try" reset, making it
decidely non-"try" and prone to deadlock in the event that a device is
already locked. Restore __pci_reset_bus() and __pci_reset_slot() to their
advertised locking semantics by pushing the save and restore functions into
the branch where the entire tree is already locked. Extend the helper
function names with "_locked" and update the comment to reflect this
calling requirement.
Fixes: b014e96d1abb ("PCI: Protect pci_error_handlers->reset_notify() usage with device_lock()")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
|
|
A warning is generated when a PCIe device is probed with a degraded link,
but there was no similar mechanism to warn when the link becomes degraded
after probing. The Link Bandwidth Notification provides this mechanism.
Use the Link Bandwidth Management Interrupt to detect bandwidth changes,
and rescan the bandwidth, looking for the weakest point. This is the same
logic used in probe().
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
|
|
Add support for altr,pcie-root-port-2.0.
Signed-off-by: Ley Foon Tan <ley.foon.tan@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
|
|
Enable PCIE_ALTERA on ARM64 platform.
Signed-off-by: Ley Foon Tan <ley.foon.tan@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
|
|
Add PCIe Root Port support for Stratix 10 device.
Main differences compared to the PCIe Root Port IP on Cyclone V
and Arria 10 devices:
- HIP interface to access Root Port configuration register
- TLP programming flow:
- One REG0 register
- Don't need to check alignment
Signed-off-by: Ley Foon Tan <ley.foon.tan@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
|
|
In remove(), ensure that the PME work cannot run after kfree() is called.
Otherwise, this could result in a use-after-free.
This issue was detected with the help of Coccinelle.
Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Frederick Lawler <fred@fredlawl.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Dongdong reported a deadlock triggered by a hotplug event during a sysfs
"remove" operation:
pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
# echo 1 > 0000:00:0c.0/remove
PME and hotplug share an MSI/MSI-X vector. The sysfs "remove" side is:
remove_store
pci_stop_and_remove_bus_device_locked
pci_lock_rescan_remove
pci_stop_and_remove_bus_device
...
pcie_pme_remove
pcie_pme_suspend
synchronize_irq # wait for hotplug IRQ handler
pci_unlock_rescan_remove
The hotplug side is:
pciehp_ist
pciehp_handle_presence_or_link_change
pciehp_configure_device
pci_lock_rescan_remove # wait for pci_unlock_rescan_remove()
INFO: task bash:10913 blocked for more than 120 seconds.
# ps -ax |grep D
PID TTY STAT TIME COMMAND
10913 ttyAMA0 Ds+ 0:00 -bash
14022 ? D 0:00 [irq/745-pciehp]
# cat /proc/14022/stack
__switch_to+0x94/0xd8
pci_lock_rescan_remove+0x20/0x28
pciehp_configure_device+0x30/0x140
pciehp_handle_presence_or_link_change+0x324/0x458
pciehp_ist+0x1dc/0x1e0
# cat /proc/10913/stack
__switch_to+0x94/0xd8
synchronize_irq+0x8c/0xc0
pcie_pme_suspend+0xa4/0x118
pcie_pme_remove+0x20/0x40
pcie_port_remove_service+0x3c/0x58
...
pcie_port_device_remove+0x2c/0x48
pcie_portdrv_remove+0x68/0x78
pci_device_remove+0x48/0x120
...
pci_stop_bus_device+0x84/0xc0
pci_stop_and_remove_bus_device_locked+0x24/0x40
remove_store+0xa4/0xb8
dev_attr_store+0x44/0x60
sysfs_kf_write+0x58/0x80
It is incorrect to call pcie_pme_suspend() from pcie_pme_remove() for two
reasons.
First, pcie_pme_suspend() calls synchronize_irq(), which will wait for the
native hotplug interrupt handler as well as for the PME one, because they
share one IRQ (as per the spec). That may deadlock if hotplug is signaled
while pcie_pme_remove() is running and the latter calls
pci_lock_rescan_remove() before the former.
Second, if pcie_pme_suspend() figures out that wakeup needs to be enabled
for the port, it will return without disabling the interrupt as expected by
pcie_pme_remove() which was overlooked by commit c7b5a4e6e8fb ("PCI / PM:
Fix native PME handling during system suspend/resume").
To fix that, rework pcie_pme_remove() to disable the PME interrupt, clear
its status and prevent the PME worker function from re-enabling it before
calling free_irq() on it, which should be sufficient.
Fixes: c7b5a4e6e8fb ("PCI / PM: Fix native PME handling during system suspend/resume")
Link: https://lore.kernel.org/linux-pci/c7697e7c-e1af-13e4-8491-0a3996e6ab5d@huawei.com
Reported-by: Dongdong Liu <liudongdong3@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bhelgaas: add URL and deadlock details from Dongdong]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Previously dpc_handler() called aer_get_device_error_info() without
initializing info->severity, so aer_get_device_error_info() relied on
uninitialized data.
Add dpc_get_aer_uncorrect_severity() to read the port's AER status, mask,
and severity registers and set info->severity.
Also, clear the port's AER fatal error status bits.
Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling")
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org # v4.19+
|
|
Replace Alan Douglas with Tom Joseph as the current PCI
Cadence host/endpoint controller maintainer.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Tom Joseph <tjoseph@cadence.com>
|
|
Commit 0e157e528604 ("PCI/PME: Implement runtime PM callbacks") tried to
solve an issue where the hierarchy immediately wakes up when it is
transitioned into D3cold. However, it turns out to prevent PME
propagation on some systems that do not support D3cold.
I looked more closely at what might cause the immediate wakeup. It happens
when the ACPI power resource of the root port is turned off. The AML code
associated with the _OFF() method of the ACPI power resource starts a PCIe
L2/L3 Ready transition and waits for it to complete. Right after the L2/L3
Ready transition is started the root port receives a PME from the
downstream port.
The simplest hierarchy where this happens looks like this:
00:1d.0 PCIe Root Port
^
|
v
05:00.0 PCIe switch #1 upstream port
06:01.0 PCIe switch #1 downstream hotplug port
^
|
v
08:00.0 PCIe switch #2 upstream port
It seems that the PCIe link between the two switches, before
PME_Turn_Off/PME_TO_Ack is complete for the whole hierarchy, goes
inactive and triggers PME towards the root port bringing it back to D0.
The L2/L3 Ready sequence is described in PCIe r4.0 spec sections 5.2 and
5.3.3 but unfortunately they do not state what happens if DLLSCE is
enabled during the sequence.
Disabling Data Link Layer State Changed event (DLLSCE) seems to prevent
the issue and still allows the downstream hotplug port to notice when a
device is plugged/unplugged.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202593
Fixes: 0e157e528604 ("PCI/PME: Implement runtime PM callbacks")
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: stable@vger.kernel.org # v4.20+
|
|
The Class Code for subtractive decode PCI-to-PCI bridge is 060401h; add an
entry to make portdrv support this type of bridge. This allows use of PCIe
services on subtractive decode ports.
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
[bhelgaas: add braces surrounding entry]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The pci_device_id table was technically correct, but unusually formatted,
which made adding entries error-prone. Change the format so it's obvious
how to add entries.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
This reverts commit 0e157e52860441cb26051f131dd0b5ae3187a07b.
Heiner reported that the commit in question prevents his network adapter
from triggering PME and waking up when network cable is plugged.
The commit tried to prevent root port waking up from D3cold immediately but
looks like disabing root port PME interrupt is not the right way to fix
that issue so revert it now. The patch following proposes an alternative
solution to that issue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202103
Fixes: 0e157e528604 ("PCI/PME: Implement runtime PM callbacks")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: stable@vger.kernel.org # v4.20+
|
|
Latency Tolerance Reporting (LTR) allows Endpoints and Switch Upstream
Ports to report their latency requirements to upstream components. If ASPM
L1 PM substates are enabled, the LTR information helps determine when a
Link enters L1.2 [1].
Software must set the maximum latency values in the LTR Capability based on
characteristics of the platform, then set LTR Mechanism Enable in the
Device Control 2 register in the PCIe Capability. The device can then use
LTR to report its latency tolerance.
If the device reports a maximum latency value of zero, that means the
device requires the highest possible performance and the ASPM L1.2 substate
is effectively disabled.
We put devices in D3 for suspend, and we assume their internal state is
lost. On resume, previously we did not restore the LTR Capability, but we
did restore the LTR Mechanism Enable bit, so devices would request the
highest possible performance and ASPM L1.2 wouldn't be used.
[1] PCIe r4.0, sec 5.5.1
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201469
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Gigabyte X299 DESIGNARE EX motherboard has one PCIe root port that is
connected to an Alpine Ridge Thunderbolt controller. This port has slot
implemented bit set in the config space but other than that it is not
hotplug capable in the sense we are expecting in Linux (it has
dev->is_hotplug_bridge set to 0):
00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5
Bus: primary=00, secondary=05, subordinate=46, sec-latency=0
Memory behind bridge: 78000000-8fffffff [size=384M]
Prefetchable memory behind bridge: 00003800f8000000-00003800ffffffff [size=128M]
...
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
...
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #8, PowerLimit 25.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
Changed: MRL- PresDet+ LinkState+
This system is using ACPI based hotplug to notify the OS that it needs to
rescan the PCI bus (ACPI hotplug).
If there is nothing connected in any of the Thunderbolt ports the root port
will not have any runtime PM active children and is thus automatically
runtime suspended pretty soon after boot by PCI PM core. Now, when a
device is connected the BIOS SMI handler responsible for enumerating newly
added devices is not able to find anything because the port is in D3.
Prevent this from happening by blacklisting PCI power management of this
particular Gigabyte system.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202031
Reported-by: Kedar A Dongre <kedar.a.dongre@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
RussianNeuroMancer reported that the Intel 7265 wifi on a Dell Venue 11 Pro
7140 table stopped working after wakeup from suspend and bisected the
problem to 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't
have LTR"). David Ward reported the same problem on a Dell Latitude 7350.
After af8bb9f89838 ("PCI/ACPI: Request LTR control from platform before
using it"), we don't enable LTR unless the platform has granted LTR control
to us. In addition, we don't notice if the platform had already enabled
LTR itself.
After 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't have
LTR"), we avoid using LTR if we don't think the path to the device has LTR
enabled.
The combination means that if the platform itself enables LTR but declines
to give the OS control over LTR, we unnecessarily avoided using ASPM L1.2.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201469
Fixes: 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't have LTR")
Fixes: af8bb9f89838 ("PCI/ACPI: Request LTR control from platform before using it")
Reported-by: RussianNeuroMancer <russianneuromancer@ya.ru>
Reported-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v4.18+
|
|
The double underscore types are meant for compatibility in userspace
headers which does not apply here. Therefore, change to use the standard
no-underscore types.
The origin of the double underscore types dates back to before the git era
so I was not able to find a commit to see the original justification.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The double underscore types are meant for compatibility in userspace
headers which does not apply here. Therefore, change to use the standard
no-underscore types.
The origin of the double underscore types dates back to before the git era
so I was not able to find a commit to see the original justification.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
On Denverton's integration of the Intel(R) Trace Hub (for a reference and
overview see Documentation/trace/intel_th.rst) the reported size of one of
its resources (RTIT_BAR) doesn't match its actual size, which leads to
overlaps with other devices' resources.
In practice, it overlaps with XHCI MMIO space, which results in the xhci
driver bailing out after seeing its registers as 0xffffffff, and perceived
disappearance of all USB devices:
intel_th_pci 0000:00:1f.7: enabling device (0004 -> 0006)
xhci_hcd 0000:00:15.0: xHCI host controller not responding, assume dead
xhci_hcd 0000:00:15.0: xHC not responding in xhci_irq, assume controller is dead
xhci_hcd 0000:00:15.0: HC died; cleaning up
usb 1-1: USB disconnect, device number 2
For this reason, we need to resize the RTIT_BAR on Denverton to its actual
size, which in this case is 4MB. The corresponding erratum is DNV36 at the
link below:
DNV36. Processor Host Root Complex May Incorrectly Route Memory
Accesses to Intel® Trace Hub
Problem: The Intel® Trace Hub RTIT_BAR (B0:D31:F7 offset 20h) is
reported as a 2KB memory range. Due to this erratum, the
processor Host Root Complex will forward addresses from
RTIT_BAR to RTIT_BAR + 4MB -1 to Intel® Trace Hub.
Implication: Devices assigned within the RTIT_BAR to RTIT_BAR + 4MB -1
space may not function correctly.
Workaround: A BIOS code change has been identified and may be
implemented as a workaround for this erratum.
Status: No Fix.
Note that 5118ccd34780 ("intel_th: pci: Add Denverton SOC support") updates
the Trace Hub driver so it claims the Denverton device, but the resource
overlap exists regardless of whether that driver is loaded or that commit
is included.
Link: https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-c3000-family-spec-update.pdf
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
[bhelgaas: include erratum text, clarify relationship with 5118ccd34780]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
|
|
Add PCIe support for the RZ/G2E (a.k.a. R8A774C0).
Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
|
|
As per Figure 6-3 in PCIe r4.0, sec 6.2.6, ERR_ messages will be forwarded
from the secondary interface to the primary interface, if the SERR# Enable
bit in the Bridge Control register is set.
It seems clear that an ACPI hotplug parameter method (_HPP or _HPX) that
tells us to "enable SERR in the command register" (ACPI v6.2, sec 6.2.8,
6.2.9.1) refers to PCI_COMMAND_SERR, which enables reporting of errors by
the function itself.
For bridges, we also interpreted that to mean we should enable
PCI_BRIDGE_CTL_SERR, which enables *forwarding* of errors by the bridge.
But we didn't enable PCI_BRIDGE_CTL_SERR anywhere else, which means we
never enabled it for non-ACPI systems or ACPI systems that didn't supply
hotplug parameters.
That means errors reported below bridges were often never forwarded up to a
Root Port where they could be signaled via AER.
Enable PCI_BRIDGE_CTL_SERR for all bridges so we can get better error
reporting for downstream devices.
Signed-off-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Move the Rohm Vendor ID to pci_ids.h instead of defining it in several
drivers.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
|
|
The HXT SD4800 PCI controller does not set the Command Completed bit unless
writes to the Slot Command register change "Control" bits.
Add SD4800 to the quirk.
Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Joey Zheng <yu.zheng@hxt-semitech.com>
|
|
The design of HXT SD4800 ACS feature is the same as QCOM QDF2xxx. Add an
ACS quirk for the SD4800.
Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
CC: Joey Zheng <yu.zheng@hxt-semitech.com>
|
|
Add the HXT vendor ID to pci_ids.h.
Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
CC: Joey Zheng <yu.zheng@hxt-semitech.com>
|
|
The PCI configuration space header type tells us whether the device is a
bridge, a CardBus bridge, or a normal device, and defines the layout of the
rest of the header (PCI r3.0 sec 6.1, PCIe r4.0 sec 7.5.1.1.9).
When we rely on the header format, e.g., when we're dealing with bridge
windows, we should check the header type, not the class code. The class
code is loosely related to the header type, but is often incorrect and the
spec doesn't actually require it to be related to the header format.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
[bhelgaas: changelog, keep the PCI_CLASS_BRIDGE_HOST check]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Currently, the pci_size() function actually returns 'size-1'. Make it
return real size to avoid confusion.
Signed-off-by: Du Changbin <changbin.du@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The sem_exit variable is conceptually a completion, so it should be called
that.
Similarly, the semOperations semaphore is a simple mutex, and can be
changed into that, respectively.
With both converted, the ibmphp_hpc_initvars() function is no longer used
and can be removed.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
match_string() returns the array index of a matching string. Use it
instead of the open-coded implementation.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Convert string compares of DT node names to use of_node_name_eq() helper
instead. This removes direct access to the node name pointer.
Signed-off-by: Rob Herring <robh@kernel.org>
[bhelgaas: drop similar rpaphp_core.c change to avoid merge conflict]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
pci_bridge_check_ranges() determines whether a bridge supports the optional
I/O and prefetchable memory windows and sets the flag bits in the bridge
resources. This *could* be done once during enumeration except that the
resource allocation code completely clears the flag bits, e.g., in the
pci_assign_unassigned_bridge_resources() path.
The problem with pci_bridge_check_ranges() in the resource allocation path
is that we may allocate resources after devices have been claimed by
drivers, and pci_bridge_check_ranges() *changes* the window registers to
determine whether they're writable. This may break concurrent accesses to
devices behind the bridge.
Add a new pci_read_bridge_windows() to determine whether a bridge supports
the optional windows, call it once during enumeration, remember the
results, and change pci_bridge_check_ranges() so it doesn't touch the
bridge windows but sets the flag bits based on those remembered results.
Link: https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-wangzhou1@hisilicon.com
Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
Reported-by: Yandong Xu <xuyandong2@huawei.com>
Tested-by: Yandong Xu <xuyandong2@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Ofer Hayut <ofer@lightbitslabs.com>
Cc: Roy Shterman <roys@lightbitslabs.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Zhou Wang <wangzhou1@hisilicon.com>
|
|
Shameerali reported that running v4.20-rc1 as QEMU guest, the PCIe hotplug
port times out during boot:
pciehp 0000:00:01.0:pcie004: Timeout on hotplug command 0x03f1 (issued 1016 msec ago)
pciehp 0000:00:01.0:pcie004: Timeout on hotplug command 0x03f1 (issued 1024 msec ago)
pciehp 0000:00:01.0:pcie004: Failed to check link status
pciehp 0000:00:01.0:pcie004: Timeout on hotplug command 0x02f1 (issued 2520 msec ago)
The issue was bisected down to commit 720d6a671a6e ("PCI: pciehp: Do not
handle events if interrupts are masked") and was further analyzed by the
reporter to be caused by the fact that pciehp first updates the hardware
and only then cache the ctrl->slot_ctrl in pcie_do_write_cmd(). If the
interrupt happens before we cache the value, pciehp_isr() reads value 0 and
decides that the interrupt was not meant for it causing the above timeout
to trigger.
Fix by moving ctrl->slot_ctrl assignment to happen before it is written to
the hardware.
Fixes: 720d6a671a6e ("PCI: pciehp: Do not handle events if interrupts are masked")
Link: https://lore.kernel.org/linux-pci/5FC3163CFD30C246ABAA99954A238FA8387DD344@FRAEML521-MBX.china.huawei.com
Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
There is a plan to build the kernel with -Wimplicit-fallthrough and
these places in the code produced warnings (W=1). Fix them up.
Signed-off-by: Mathieu Malaterre <malat@debian.org>
[bhelgaas: squash into one patch, drop extra changelog detail]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
|
|
The semantics of what "in core" means for the mincore() system call are
somewhat unclear, but Linux has always (since 2.3.52, which is when
mincore() was initially done) treated it as "page is available in page
cache" rather than "page is mapped in the mapping".
The problem with that traditional semantic is that it exposes a lot of
system cache state that it really probably shouldn't, and that users
shouldn't really even care about.
So let's try to avoid that information leak by simply changing the
semantics to be that mincore() counts actual mapped pages, not pages
that might be cheaply mapped if they were faulted (note the "might be"
part of the old semantics: being in the cache doesn't actually guarantee
that you can access them without IO anyway, since things like network
filesystems may have to revalidate the cache before use).
In many ways the old semantics were somewhat insane even aside from the
information leak issue. From the very beginning (and that beginning is
a long time ago: 2.3.52 was released in March 2000, I think), the code
had a comment saying
Later we can get more picky about what "in core" means precisely.
and this is that "later". Admittedly it is much later than is really
comfortable.
NOTE! This is a real semantic change, and it is for example known to
change the output of "fincore", since that program literally does a
mmmap without populating it, and then doing "mincore()" on that mapping
that doesn't actually have any pages in it.
I'm hoping that nobody actually has any workflow that cares, and the
info leak is real.
We may have to do something different if it turns out that people have
valid reasons to want the old semantics, and if we can limit the
information leak sanely.
Cc: Kevin Easton <kevin@guarana.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Masatake YAMATO <yamato@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
broke both alpha and SH booting in qemu, as noticed by Guenter Roeck.
It turns out that the bug wasn't actually in that commit itself (which
would have been surprising: it was mostly a no-op), but in how the
addition of access_ok() to the strncpy_from_user() and strnlen_user()
functions now triggered the case where those functions would test the
access of the very last byte of the user address space.
The string functions actually did that user range test before too, but
they did it manually by just comparing against user_addr_max(). But
with user_access_begin() doing the check (using "access_ok()"), it now
exposed problems in the architecture implementations of that function.
For example, on alpha, the access_ok() helper macro looked like this:
#define __access_ok(addr, size) \
((get_fs().seg & (addr | size | (addr+size))) == 0)
and what it basically tests is of any of the high bits get set (the
USER_DS masking value is 0xfffffc0000000000).
And that's completely wrong for the "addr+size" check. Because it's
off-by-one for the case where we check to the very end of the user
address space, which is exactly what the strn*_user() functions do.
Why? Because "addr+size" will be exactly the size of the address space,
so trying to access the last byte of the user address space will fail
the __access_ok() check, even though it shouldn't. As a result, the
user string accessor functions failed consistently - because they
literally don't know how long the string is going to be, and the max
access is going to be that last byte of the user address space.
Side note: that alpha macro is buggy for another reason too - it re-uses
the arguments twice.
And SH has another version of almost the exact same bug:
#define __addr_ok(addr) \
((unsigned long __force)(addr) < current_thread_info()->addr_limit.seg)
so far so good: yes, a user address must be below the limit. But then:
#define __access_ok(addr, size) \
(__addr_ok((addr) + (size)))
is wrong with the exact same off-by-one case: the case when "addr+size"
is exactly _equal_ to the limit is actually perfectly fine (think "one
byte access at the last address of the user address space")
The SH version is actually seriously buggy in another way: it doesn't
actually check for overflow, even though it did copy the _comment_ that
talks about overflow.
So it turns out that both SH and alpha actually have completely buggy
implementations of access_ok(), but they happened to work in practice
(although the SH overflow one is a serious serious security bug, not
that anybody likely cares about SH security).
This fixes the problems by using a similar macro on both alpha and SH.
It isn't trying to be clever, the end address is based on this logic:
unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;
which basically says "add start and length, and then subtract one unless
the length was zero". We can't subtract one for a zero length, or we'd
just hit an underflow instead.
For a lot of access_ok() users the length is a constant, so this isn't
actually as expensive as it initially looks.
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add support for the Adiantum encryption mode to fscrypt. Adiantum is a
tweakable, length-preserving encryption mode with security provably
reducible to that of XChaCha12 and AES-256, subject to a security bound.
It's also a true wide-block mode, unlike XTS. See the paper
"Adiantum: length-preserving encryption for entry-level processors"
(https://eprint.iacr.org/2018/720.pdf) for more details. Also see
commit 059c2a4d8e16 ("crypto: adiantum - add Adiantum support").
On sufficiently long messages, Adiantum's bottlenecks are XChaCha12 and
the NH hash function. These algorithms are fast even on processors
without dedicated crypto instructions. Adiantum makes it feasible to
enable storage encryption on low-end mobile devices that lack AES
instructions; currently such devices are unencrypted. On ARM Cortex-A7,
on 4096-byte messages Adiantum encryption is about 4 times faster than
AES-256-XTS encryption; decryption is about 5 times faster.
In fscrypt, Adiantum is suitable for encrypting both file contents and
names. With filenames, it fixes a known weakness: when two filenames in
a directory share a common prefix of >= 16 bytes, with CTS-CBC their
encrypted filenames share a common prefix too, leaking information.
Adiantum does not have this problem.
Since Adiantum also accepts long tweaks (IVs), it's also safe to use the
master key directly for Adiantum encryption rather than deriving
per-file keys, provided that the per-file nonce is included in the IVs
and the master key isn't used for any other encryption mode. This
configuration saves memory and improves performance. A new fscrypt
policy flag is added to allow users to opt-in to this configuration.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Remove the dot-prefixing since it is just a matter of the
.gitignore file.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Make simply skips a missing rule when it is marked as .PHONY.
Remove the dummy targets.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
You do not have to use define ... endef for filechk_* rules.
For simple cases, the use of assignment looks cleaner, IMHO.
I updated the usage for scripts/Kbuild.include in case somebody
misunderstands the 'define ... endif' is the requirement.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Now that Kbuild automatically creates asm-generic wrappers for missing
mandatory headers, it is redundant to list the same headers in
generic-y and mandatory-y.
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
|
|
Some time ago, Sam pointed out a certain degree of overwrap between
generic-y and mandatory-y. (https://lkml.org/lkml/2017/7/10/121)
I tweaked the meaning of mandatory-y a little bit; now it defines the
minimum set of ASM headers that all architectures must have.
If arch does not have specific implementation of a mandatory header,
Kbuild will let it fallback to the asm-generic one by automatically
generating a wrapper. This will allow to drop lots of redundant
generic-y defines.
Previously, "mandatory" was used in the context of UAPI, but I guess
this can be extended to kernel space ASM headers.
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
|
|
These comments are leftovers of commit fcc8487d477a ("uapi: export all
headers under uapi directories").
Prior to that commit, exported headers must be explicitly added to
header-y. Now, all headers under the uapi/ directories are exported.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This commit removes redundant generic-y defines in
arch/riscv/include/asm/Kbuild.
[1] It is redundant to define the same generic-y in both
arch/$(ARCH)/include/asm/Kbuild and
arch/$(ARCH)/include/uapi/asm/Kbuild.
Remove the following generic-y:
errno.h
fcntl.h
ioctl.h
ioctls.h
ipcbuf.h
mman.h
msgbuf.h
param.h
poll.h
posix_types.h
resource.h
sembuf.h
setup.h
shmbuf.h
signal.h
socket.h
sockios.h
stat.h
statfs.h
swab.h
termbits.h
termios.h
types.h
[2] It is redundant to define generic-y when arch-specific
implementation exists in arch/$(ARCH)/include/asm/*.h
Remove the following generic-y:
cacheflush.h
module.h
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
filechk_* rules often consist of multiple 'echo' lines. They must be
surrounded with { } or ( ) to work correctly. Otherwise, only the
string from the last 'echo' would be written into the target.
Let's take care of that in the 'filechk' in scripts/Kbuild.include
to clean up filechk_* rules.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Since commit 9c2af1c7377a ("kbuild: add .DELETE_ON_ERROR special
target"), the target file is automatically deleted on failure.
The boilerplate code
... || { rm -f $@; false; }
is unneeded.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Commit 3a2429e1faf4 ("kbuild: change if_changed_rule for multi-line
recipe") and commit 4f0e3a57d6eb ("kbuild: Add support for DT binding
schema checks") came in via different sub-systems.
This is a follow-up cleanup.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
The only/last user of UIMAGE_IN/OUT was removed by commit 4722a3e6b716
("microblaze: fix multiple bugs in arch/microblaze/boot/Makefile").
The input and output should always be $< and $@.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
# define HAVE_JUMP_LABEL
#endif
We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
|
|
As mentioned in the info pages of gas, the '.align' pseudo op's
interpretation of the alignment value is architecture specific.
It might either be a byte value or taken to the power of two.
On ARM it's actually the latter which leads to unnecessary large
alignments of 16 bytes for 32 bit builds or 256 bytes for 64 bit
builds.
Fix this by switching to '.balign' instead which is consistent
across all architectures.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|