commit 99e5cde5eae78bef95bfe7c16ccda87fb070149b upstream.
Make sure to drop any device reference taken by vio_find_node() when
adding and removing virtual I/O slots.
Fixes: 5eeb8c63a3 ("[PATCH] PCI Hotplug: rpaphp: Move VIO registration")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 06cf35f903aa6da0cc8d9f81e9bcd1f7e1b534bb upstream.
Some AMD CS553x devices have read-only BARs because of a firmware or
hardware defect. There's a workaround in quirk_cs5536_vsa(), but it no
longer works after 36e8164882ca ("PCI: Restore detection of read-only
BARs"). Prior to 36e8164882ca, we filled in res->start; afterwards we
leave it zeroed out. The quirk only updated the size, so the driver tried
to use a region starting at zero, which didn't work.
Expand quirk_cs5536_vsa() to read the base addresses from the BARs and
hard-code the sizes.
On Nix's system BAR 2's read-only value is 0x6200. Prior to 36e8164882ca,
we interpret that as a 512-byte BAR based on the lowest-order bit set. Per
datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to
avoid clearing any address bits if a platform uses only 64-byte alignment.
[js] pcibios_bus_to_resource takes pdev, not bus in 3.12
[bhelgaas: changelog, reduce BAR 2 size to 64]
Fixes: 36e8164882ca ("PCI: Restore detection of read-only BARs")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4
Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf
Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdf
Reported-and-tested-by: Nix <nix@esperi.org.uk>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit c2e771b02792d222cbcd9617fe71482a64f52647 upstream.
Like the NFP6000, the NFP4000 as an erratum where reading/writing to PCI
config space addresses above 0x600 can cause the NFP to generate PCIe
completion timeouts.
Limit the NFP4000's PF's config space size to 0x600 bytes as is already
done for the NFP6000.
The NFP4000's VF is 0x6004 (PCI_DEVICE_ID_NETRONOME_NFP6000_VF), the same
device ID as the NFP6000's VF. Thus, its config space is already limited
by the existing use of quirk_nfp6000().
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 9f33a2ae59f24452c1076749deb615bccd435ca9 upstream.
The NFP6000 has an erratum where reading/writing to PCI config space
addresses above 0x600 can cause the NFP to generate PCIe completion
timeouts.
Limit the NFP6000's config space size to 0x600 bytes.
Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com>
[simon: edited changelog]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit c20aecf6963d1273d8f6d61c042b4845441ca592 upstream.
If a device quirk modifies the pci_dev->cfg_size to be less than
PCI_CFG_SPACE_EXP_SIZE (4096), but greater than PCI_CFG_SPACE_SIZE (256),
the PCI sysfs interface truncates the readable size to PCI_CFG_SPACE_SIZE.
Allow sysfs access to config space up to cfg_size, even if the device
doesn't support the entire 4096-byte PCIe config space.
Note that pci_read_config() and pci_write_config() limit access to
dev->cfg_size even though pcie_config_attr contains 4096 (the maximum
size).
Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com>
[simon: edited changelog]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
[bhelgaas: more changelog edits]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit ad67b437f187ea818b2860524d10f878fadfdd99 upstream.
b84106b4e229 ("PCI: Disable IO/MEM decoding for devices with non-compliant
BARs") disabled BAR sizing for BARs 0-5 of devices that don't comply with
the PCI spec. But it didn't do anything for expansion ROM BARs, so we
still try to size them, resulting in warnings like this on Broadwell-EP:
pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
Move the non-compliant BAR check from __pci_read_base() up to
pci_read_bases() so it applies to the expansion ROM BAR as well as
to BARs 0-5.
Note that direct callers of __pci_read_base(), like sriov_init(), will now
bypass this check. We haven't had reports of devices with broken SR-IOV
BARs yet.
[bhelgaas: changelog]
Fixes: b84106b4e229 ("PCI: Disable IO/MEM decoding for devices with non-compliant BARs")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit b84106b4e2290c081cdab521fa832596cdfea246 upstream.
The PCI config header (first 64 bytes of each device's config space) is
defined by the PCI spec so generic software can identify the device and
manage its usage of I/O, memory, and IRQ resources.
Some non-spec-compliant devices put registers other than BARs where the
BARs should be. When the PCI core sizes these "BARs", the reads and writes
it does may have unwanted side effects, and the "BAR" may appear to
describe non-sensical address space.
Add a flag bit to mark non-compliant devices so we don't touch their BARs.
Turn off IO/MEM decoding to prevent the devices from consuming address
space, since we can't read the BARs to find out what that address space
would be.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 4d8c8bd6f2062c9988817183a91fe2e623c8aa5e upstream.
Occasionaly PV guests would crash with:
pciback 0000:00:00.1: Xen PCI mapped GSI0 to IRQ16
BUG: unable to handle kernel paging request at 0000000d1a8c0be0
.. snip..
<ffffffff8139ce1b>] find_next_bit+0xb/0x10
[<ffffffff81387f22>] cpumask_next_and+0x22/0x40
[<ffffffff813c1ef8>] pci_device_probe+0xb8/0x120
[<ffffffff81529097>] ? driver_sysfs_add+0x77/0xa0
[<ffffffff815293e4>] driver_probe_device+0x1a4/0x2d0
[<ffffffff813c1ddd>] ? pci_match_device+0xdd/0x110
[<ffffffff81529657>] __device_attach_driver+0xa7/0xb0
[<ffffffff815295b0>] ? __driver_attach+0xa0/0xa0
[<ffffffff81527622>] bus_for_each_drv+0x62/0x90
[<ffffffff8152978d>] __device_attach+0xbd/0x110
[<ffffffff815297fb>] device_attach+0xb/0x10
[<ffffffff813b75ac>] pci_bus_add_device+0x3c/0x70
[<ffffffff813b7618>] pci_bus_add_devices+0x38/0x80
[<ffffffff813dc34e>] pcifront_scan_root+0x13e/0x1a0
[<ffffffff817a0692>] pcifront_backend_changed+0x262/0x60b
[<ffffffff814644c6>] ? xenbus_gather+0xd6/0x160
[<ffffffff8120900f>] ? put_object+0x2f/0x50
[<ffffffff81465c1d>] xenbus_otherend_changed+0x9d/0xa0
[<ffffffff814678ee>] backend_changed+0xe/0x10
[<ffffffff81463a28>] xenwatch_thread+0xc8/0x190
[<ffffffff810f22f0>] ? woken_wake_function+0x10/0x10
which was the result of two things:
When we call pci_scan_root_bus we would pass in 'sd' (sysdata)
pointer which was an 'pcifront_sd' structure. However in the
pci_device_add it expects that the 'sd' is 'struct sysdata' and
sets the dev->node to what is in sd->node (offset 4):
set_dev_node(&dev->dev, pcibus_to_node(bus));
__pcibus_to_node(const struct pci_bus *bus)
{
const struct pci_sysdata *sd = bus->sysdata;
return sd->node;
}
However our structure was pcifront_sd which had nothing at that
offset:
struct pcifront_sd {
int domain; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
struct pcifront_device * pdev; /* 8 8 */
}
That is an hole - filled with garbage as we used kmalloc instead of
kzalloc (the second problem).
This patch fixes the issue by:
1) Use kzalloc to initialize to a well known state.
2) Put 'struct pci_sysdata' at the start of 'pcifront_sd'. That
way access to the 'node' will access the right offset.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ae2182b1e3407de369f8c5d799543b7db74221b upstream.
A Root Port's AER structure (rpc) contains a queue of events. aer_irq()
enqueues AER status information and schedules aer_isr() to dequeue and
process it. When we remove a device, aer_remove() waits for the queue to
be empty, then frees the rpc struct.
But aer_isr() references the rpc struct after dequeueing and possibly
emptying the queue, which can cause a use-after-free error as in the
following scenario with two threads, aer_isr() on the left and a
concurrent aer_remove() on the right:
Thread A Thread B
-------- --------
aer_irq():
rpc->prod_idx++
aer_remove():
wait_event(rpc->prod_idx == rpc->cons_idx)
# now blocked until queue becomes empty
aer_isr(): # ...
rpc->cons_idx++ # unblocked because queue is now empty
... kfree(rpc)
mutex_unlock(&rpc->rpc_mutex)
To prevent this problem, use flush_work() to wait until the last scheduled
instance of aer_isr() has completed before freeing the rpc struct in
aer_remove().
I reproduced this use-after-free by flashing a device FPGA and
re-enumerating the bus to find the new device. With SLUB debug, this
crashes with 0x6b bytes (POISON_FREE, the use-after-free magic number) in
GPR25:
pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
Unable to handle kernel paging request for data at address 0x27ef9e3e
Workqueue: events aer_isr
GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
NIP [602f5328] pci_walk_bus+0xd4/0x104
[bhelgaas: changelog, stable tag]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1541dc977d376406f4584d8eb055488655c98ec upstream.
In fixup_ti816x_class(), we assigned "class = PCI_CLASS_MULTIMEDIA_VIDEO".
But PCI_CLASS_MULTIMEDIA_VIDEO is only the two-byte base class/sub-class
and needs to be shifted to make space for the low-order interface byte.
Shift PCI_CLASS_MULTIMEDIA_VIDEO to set the correct class code.
Fixes: 63c4408074 ("PCI: Add quirk for setting valid class for TI816X Endpoint")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Hemant Pedanekar <hemantp@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 16b036af31e1456cb69243a5a0c9ef801ecd1f17 upstream.
If the image size would ever read as 0, pci_get_rom_size() could keep
processing the same image over and over again. Exit the loop if we ever
read a length of zero.
This fixes a soft lockup on boot when the radeon driver calls
pci_get_rom_size() on an AMD Radeon R7 250X PCIe discrete graphics card.
[bhelgaas: changelog, reference]
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386973
Reported-by: Federico <federicotg@gmail.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 145b3fe579db66fbe999a2bc3fd5b63dffe9636d upstream.
Some implementations of modprobe fail to load the driver for a PCI device
automatically because the "interface" part of the modalias from the kernel
is lowercase, and the modalias from file2alias is uppercase.
The "interface" is the low-order byte of the Class Code, defined in PCI
r3.0, Appendix D. Most interface types defined in the spec do not use
alpha characters, so they won't be affected. For example, 00h, 01h, 10h,
20h, etc. are unaffected.
Print the "interface" byte of the Class Code in uppercase hex, as we
already do for the Vendor ID, Device ID, Class, etc.
Commit 89ec3dcf17fd ("PCI: Generate uppercase hex for modalias interface
class") fixed only half of the problem. Some udev implementations rely on
the uevent file and not the modalias file.
Fixes: d1ded203ad ("PCI: add MODALIAS to hotplug event for pci devices")
Fixes: 89ec3dcf17fd ("PCI: Generate uppercase hex for modalias interface class")
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 36e8164882ca6d3c41cb91e6f09a3ed236841f80 upstream.
Commit 6ac665c63d ("PCI: rewrite PCI BAR reading code") masked off
low-order bits from 'l', but not from 'sz'. Both are passed to pci_size(),
which compares 'base == maxbase' to check for read-only BARs. The masking
of 'l' means that comparison will never be 'true', so the check for
read-only BARs no longer works.
Resolve this by also masking off the low-order bits of 'sz' before passing
it into pci_size() as 'maxbase'. With this change, pci_size() will once
again catch the problems that have been encountered to date:
- AGP aperture BAR of AMD-7xx host bridges: if the AGP window is
disabled, this BAR is read-only and read as 0x00000008 [1]
- BARs 0-4 of ALi IDE controllers can be non-zero and read-only [1]
- Intel Sandy Bridge - Thermal Management Controller [8086:0103];
BAR 0 returning 0xfed98004 [2]
- Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
Bar 0 returning 0x00001a [3]
Link: [1] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9 ("PCI: probing read-only BARs" (pre-git))
Link: [2] https://bugzilla.kernel.org/show_bug.cgi?id=43331
Link: [3] https://bugzilla.kernel.org/show_bug.cgi?id=85991
Reported-by: William Unruh <unruh@physics.ubc.ca>
Reported-by: Martin Lucina <martin@lucina.net>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f144d1496b47e7450f41b767d0d91c724c2198bc upstream.
This can be set by quirks/drivers to be used by the architecture code
that assigns the MSI addresses.
We additionally add verification in the core MSI code that the values
assigned by the architecture do satisfy the limitation in order to fail
gracefully if they don't (ie. the arch hasn't been updated to deal with
that quirk yet).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89ec3dcf17fd3fa009ecf8faaba36828dd6bc416 upstream.
Some implementations of modprobe fail to load the driver for a PCI device
automatically because the "interface" part of the modalias from the kernel
is lowercase, and the modalias from file2alias is uppercase.
The "interface" is the low-order byte of the Class Code, defined in PCI
r3.0, Appendix D. Most interface types defined in the spec do not use
alpha characters, so they won't be affected. For example, 00h, 01h, 10h,
20h, etc. are unaffected.
Print the "interface" byte of the Class Code in uppercase hex, as we
already do for the Vendor ID, Device ID, Class, etc.
[bhelgaas: changelog]
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9fe373f9997b48fcd6222b95baf4a20c134b587a upstream.
The Crocodile chip occasionally comes up with 4k and 8k BAR sizes. Due to
an erratum, setting the SR-IOV page size causes the physical function BARs
to expand to the system page size. Since ppc64 uses 64k pages, when Linux
tries to assign the smaller resource sizes to the now 64k BARs the address
will be truncated and the BARs will overlap.
Force Linux to allocate the resource as a full page, which avoids the
overlap.
[bhelgaas: print expanded resource, too]
Signed-off-by: Douglas Lehr <dllehr@us.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67ebd8140dc8923c65451fa0f6a8eee003c4dcd3 upstream.
3448a19da4 "vgaarb: use bridges to control VGA routing where possible"
added the "flags & PCI_VGA_STATE_CHANGE_DECODES" condition to an existing
WARN_ON(), but used bitwise AND (&) instead of logical AND (&&), so the
condition is never true. Replace with logical AND.
Found by Coverity (CID 142811).
Fixes: 3448a19da4 "vgaarb: use bridges to control VGA routing where possible"
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c82126a94e69bbbac586f0249e7ef11e681246c upstream.
After a CPU upgrade while keeping the same mainboard, we faced "spurious
interrupt" problems again.
It turned out that the new CPU also featured a new GPU with a different PCI
ID.
Add this PCI ID to the quirk table. Probably all other Intel GPU PCI IDs
are affected, too, but I don't want to add them without a test system.
See f67fd55fa9 ("PCI: Add quirk for still enabled interrupts on Intel
Sandy Bridge GPUs") for some history.
[bhelgaas: add f67fd55fa9 reference, stable tag]
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 93fa9d32670f5592c8e56abc9928fc194e1e72fc upstream.
When a new device is added below a hotplug bridge, the bridge's secondary
bus speed and the device's bus speed must match. The shpchp driver
previously checked the bridge's *primary* bus speed, not the secondary bus
speed.
This caused hot-add errors like:
shpchp 0000:00:03.0: Speed of bus ff and adapter 0 mismatch
Check the secondary bus speed instead.
[bhelgaas: changelog]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=75251
Fixes: 3749c51ac6 ("PCI: Make current and maximum bus speeds part of the PCI core")
Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3cdeb713dc66057b50682048c151eae07b186c42 upstream.
Andreas reported that after 1f42db786b14 ("PCI: Enable INTx if BIOS left
them disabled"), pciehp surprise removal stopped working.
This happens because pci_reenable_device() on the hotplug bridge (used in
the pciehp_configure_device() path) clears the Interrupt Disable bit, which
apparently breaks the bridge's MSI hotplug event reporting.
Previously we cleared the Interrupt Disable bit in do_pci_enable_device(),
which is used by both pci_enable_device() and pci_reenable_device(). But
we use pci_reenable_device() after the driver may have enabled MSI or
MSI-X, and we *set* Interrupt Disable as part of enabling MSI/MSI-X.
This patch clears Interrupt Disable only when MSI/MSI-X has not been
enabled.
Fixes: 1f42db786b14 PCI: Enable INTx if BIOS left them disabled
Link: https://bugzilla.kernel.org/show_bug.cgi?id=71691
Reported-and-tested-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fc9bbf98fd66f879e628d8537ba7c240be2b58e upstream.
Add a flag to tell the PCI subsystem that kernel is shutting down in
preparation to kexec a kernel. Add code in PCI subsystem to use this flag
to clear Bus Master bit on PCI devices only in case of kexec reboot.
This fixes a power-off problem on Acer Aspire V5-573G and likely other
machines and avoids any other issues caused by clearing Bus Master bit on
PCI devices in normal shutdown path. The problem was introduced by
b566a22c23 ("PCI: disable Bus Master on PCI device shutdown").
This patch is based on discussion at
http://marc.info/?l=linux-pci&m=138425645204355&w=2
Link: https://bugzilla.kernel.org/show_bug.cgi?id=63861
Reported-by: Chang Liu <cl91tp@gmail.com>
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7cc5cf74544d97d7b69e2701595037474db1f96 upstream.
The pcie_portdrv .probe() method calls pci_enable_device() once, in
pcie_port_device_register(), but the .remove() method calls
pci_disable_device() twice, in pcie_port_device_remove() and in
pcie_portdrv_remove().
That causes a "disabling already-disabled device" warning when removing a
PCIe port device. This happens all the time when removing Thunderbolt
devices, but is also easy to reproduce with, e.g.,
"echo 0000:00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind"
This patch removes the disable from pcie_portdrv_remove().
[bhelgaas: changelog, tag for stable]
Reported-by: David Bulkow <David.Bulkow@stratus.com>
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d3a1741f1e648cfbd5a0cc94477a0d5004c6f5e upstream.
Previously we allowed callers to access Slot Capabilities, Status, and
Control for Root Ports even if the Root Port did not implement a slot.
This seems dubious because the spec only requires these registers if a
slot is implemented.
It's true that even Root Ports without slots must have *space* for these
slot registers, because the Root Capabilities, Status, and Control
registers are after the slot registers in the capability. However,
for a v1 PCIe Capability, the *semantics* of the slot registers are
undefined unless a slot is implemented.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-By: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8b303d0206b28c4ff3aecada47108d1655ae00f upstream.
Previously we relied on the PCIe r3.0, sec 7.8, spec language that says
"For Functions that do not implement the [Link, Slot, Root] registers,
these spaces must be hardwired to 0b," which means that for v2 PCIe
capabilities, we don't need to check the device type at all.
But it's simpler if we don't need to check the capability version at all,
and I think the spec is explicit enough about which registers are required
for which types that we can remove the version checks.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-By: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 834145156bedadfb50121f0bc5e9d9f9f942bcca upstream.
Commit 448bd85 (PCI/PM: add PCIe runtime D3cold support) added a
piece of code to pci_acpi_wake_dev() causing that function to behave
in a special way for devices in D3cold (so that their configuration
registers are not accessed before those devices are resumed).
However, it didn't take the clearing of the pme_poll flag into
account. That has to be done for all devices, even if they are in
D3cold, or pci_pme_list_scan() will not know that wakeup has been
signaled for the device and will poll its PME Status bit
unnecessarily.
Fix the problem by moving the clearing of the pme_poll flag in
pci_acpi_wake_dev() before the code introduced by commit 448bd85.
Reported-and-tested-by: David E. Box <david.e.box@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 60f75b8e97daf4a39790a20d962cb861b9220af5 upstream.
In theory, under a given ACPI namespace node there should be only
one child device object with _ADR whose value matches a given bus
address exactly. In practice, however, there are systems in which
multiple child device objects under a given parent have _ADR matching
exactly the same address. In those cases we use _STA to determine
which of the multiple matching devices is enabled, since some systems
are known to indicate which ACPI device object to associate with the
given physical (usually PCI) device this way.
Unfortunately, as it turns out, there are systems in which many
device objects under the same parent have _ADR matching exactly the
same bus address and none of them has _STA, in which case they all
should be regarded as enabled according to the spec. Still, if
those device objects are supposed to represent bridges (e.g. this
is the case for device objects corresponding to PCIe ports), we can
try harder and skip the ones that have no child device objects in the
ACPI namespace. With luck, we can avoid using device objects that we
are not expected to use this way.
Although this only works for bridges whose children also have ACPI
namespace representation, it is sufficient to address graphics
adapter detection issues on some systems, so rework the code finding
a matching device ACPI handle for a given bus address to implement
this idea.
Introduce a new function, acpi_find_child(), taking three arguments:
the ACPI handle of the device's parent, a bus address suitable for
the device's bus type and a bool indicating if the device is a
bridge and make it work as outlined above. Reimplement the function
currently used for this purpose, acpi_get_child(), as a call to
acpi_find_child() with the last argument set to 'false' and make
the PCI subsystem use acpi_find_child() with the bridge information
passed as the last argument to it. [Lan Tianyu notices that it is
not sufficient to use pci_is_bridge() for that, because the device's
subordinate pointer hasn't been set yet at this point, so use
hdr_type instead.]
This change fixes a regression introduced inadvertently by commit
33f767d (ACPI: Rework acpi_get_child() to be more efficient) which
overlooked the fact that for acpi_walk_namespace() "post-order" means
"after all children have been visited" rather than "on the way back",
so for device objects without children and for namespace walks of
depth 1, as in the acpi_get_child() case, the "post-order" callbacks
ordering is actually the same as the ordering of "pre-order" ones.
Since that commit changed the namespace walk in acpi_get_child() to
terminate after finding the first matching object instead of going
through all of them and returning the last one, it effectively
changed the result returned by that function in some rare cases and
that led to problems (the switch from a "pre-order" to a "post-order"
callback was supposed to prevent that from happening, but it was
ineffective).
As it turns out, the systems where the change made by commit
33f767d actually matters are those where there are multiple ACPI
device objects representing the same PCIe port (which effectively
is a bridge). Moreover, only one of them, and the one we are
expected to use, has child device objects in the ACPI namespace,
so the regression can be addressed as described above.
References: https://bugzilla.kernel.org/show_bug.cgi?id=60561
Reported-by: Peter Wu <lekensteyn@gmail.com>
Tested-by: Vladimir Lalov <mail@vlalov.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Peter Wu <lekensteyn@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa914f5ec25e4371ba18b312971314be1b9b1076 upstream.
Ben Herrenschmidt reported the following problem:
- The bus has space for all desired MMIO resources, including optional
space for SR-IOV devices
- We attempt to allocate I/O port space, but it fails because the bus
has no I/O space
- Because of the I/O allocation failure, we retry MMIO allocation,
requesting only the required space, without the optional SR-IOV space
This means we don't allocate the optional SR-IOV space, even though we
could.
This is related to 0c5be0cb0e ("PCI: Retry on IORESOURCE_IO type
allocations").
This patch changes how we handle allocation failures. We will now retry
allocation of only the resource type that failed. If MMIO allocation
fails, we'll retry only MMIO allocation. If I/O port allocation fails,
we'll retry only I/O port allocation.
[bhelgaas: changelog]
Reference: https://lkml.kernel.org/r/1367712653.11982.19.camel@pasglop
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29ed1f29b68a8395d5679b3c4e38352b617b3236 upstream.
Hot-removing a device with SR-IOV enabled causes a null pointer dereference
in v3.9 and v3.10.
This is a regression caused by ba518e3c17 ("PCI: pciehp: Iterate over all
devices in slot, not functions 0-7"). When we iterate over the
bus->devices list, we first remove the PF, which also removes all the VFs
from the list. Then the list iterator blows up because more than just the
current entry was removed from the list.
ac205b7bb7 ("PCI: make sriov work with hotplug remove") works around a
similar problem in pci_stop_bus_devices() by iterating over the list in
reverse, so the VFs are stopped and removed from the list first, before the
PF.
This patch changes pciehp_unconfigure_device() to iterate over the list in
reverse, too.
[bhelgaas: bugzilla, changelog]
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=60604
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 343df771e671d821478dd3ef525a0610b808dbf8 upstream.
After calling device_register(&bridge->dev), the bridge is reference-
counted, and it is illegal to call kfree() on it except in the release
function.
[bhelgaas: changelog, use put_device() after device_register() failure]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fbf33f516bdbcc2ab1ba1e54dfb720b0cfaa6874 upstream.
Commit 4f535093cf "PCI: Put pci_dev in device tree as early as possible"
moves device registering from pci_bus_add_devices() to pci_device_add().
That causes problems for virtual functions because device_add(&virtfn->dev)
is called before setting the virtfn->is_virtfn flag, which then causes Xen
to report PCI virtual functions as PCI physical functions.
Fix it by setting virtfn->is_virtfn before calling pci_device_add().
[Jiang Liu]: Move the setting of virtfn->is_virtfn ahead further for better
readability and modify changelog.
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 098b1aeaf4d6149953b8f1f8d55c21d85536fbff upstream.
There are two tool-stack that can instruct the Xen PCI frontend
and backend to change states: 'xm' (Python code with a daemon),
and 'xl' (C library - does not keep state changes).
With the 'xm', the path to disconnect a single PCI device (xm pci-detach
<guest> <BDF>) is:
4(Connected)->7(Reconfiguring*)-> 8(Reconfigured)-> 4(Connected)->5(Closing*).
The * is for states that the tool-stack sets. For 'xl', it is similar:
4(Connected)->7(Reconfiguring*)-> 8(Reconfigured)-> 4(Connected)
Both of them also tear down the XenBus structure, so the backend
state ends up going in the 3(Initialised) and calls pcifront_xenbus_remove.
When a PCI device is plugged back in (xm pci-attach <guest> <BDF>)
both of them follow the same pattern:
2(InitWait*), 3(Initialized*), 4(Connected*)->4(Connected).
[xen-pcifront ignores the 2,3 state changes and only acts when
4 (Connected) has been reached]
Note that this is for a _single_ PCI device. If there were two
PCI devices and only one was disconnected 'xm' would show the same
state changes.
The problem is that git commit 3d925320e9
("xen/pcifront: Use Xen-SWIOTLB when initting if required") introduced
a mechanism to initialize the SWIOTLB when the Xen PCI front moves to
Connected state. It also had some aggressive seatbelt code check that
would warn the user if one tried to change to Connected state without
hitting first the Closing state:
pcifront pci-0: PCI frontend already installed!
However, that code can be relaxed and we can continue on working
even if the frontend is instructed to be the 'Connected' state with
no devices and then gets tickled to be in 'Connected' state again.
In other words, this 4(Connected)->5(Closing)->4(Connected) state
was expected, while 4(Connected)->.... anything but 5(Closing)->4(Connected)
was not. This patch removes that aggressive check and allows
Xen pcifront to work with the 'xl' toolstack (for one or more
PCI devices) and with 'xm' toolstack (for more than two PCI
devices).
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
[v2: Added in the description about two PCI devices]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The interactions between the ACPI dock driver and the ACPI-based PCI
hotplug (acpiphp) are currently problematic because of ordering
issues during hot-remove operations.
First of all, the current ACPI glue code expects that physical
devices will always be deleted before deleting the companion ACPI
device objects. Otherwise, acpi_unbind_one() will fail with a
warning message printed to the kernel log, for example:
[ 185.026073] usb usb5: Oops, 'acpi_handle' corrupt
[ 185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
[ 185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
[ 180.013656] port1: Oops, 'acpi_handle' corrupt
This means, in particular, that struct pci_dev objects have to
be deleted before the struct acpi_device objects they are "glued"
with.
Now, the following happens the during the undocking of an ACPI-based
dock station:
1) hotplug_dock_devices() invokes registered hotplug callbacks to
destroy physical devices associated with the ACPI device objects
depending on the dock station. It calls dd->ops->handler() for
each of those device objects.
2) For PCI devices dd->ops->handler() points to
handle_hotplug_event_func() that queues up a separate work item
to execute _handle_hotplug_event_func() for the given device and
returns immediately. That work item will be executed later.
3) hotplug_dock_devices() calls dock_remove_acpi_device() for each
device depending on the dock station. This runs acpi_bus_trim()
for each of them, which causes the underlying ACPI device object
to be destroyed, but the work items queued up by
handle_hotplug_event_func() haven't been started yet.
4) _handle_hotplug_event_func() queued up in step 2) are executed
and cause the above failure to happen, because the PCI devices
they handle do not have the companion ACPI device objects any
more (those objects have been deleted in step 3).
The possible breakage doesn't end here, though, because
hotplug_dock_devices() may return before at least some of the
_handle_hotplug_event_func() work items spawned by it have a
chance to complete and then undock() will cause _DCK to be
evaluated and that will cause the devices handled by the
_handle_hotplug_event_func() to go away possibly while they are
being accessed.
This means that dd->ops->handler() for PCI devices should not point
to handle_hotplug_event_func(). Instead, it should point to a
function that will do the work of _handle_hotplug_event_func()
synchronously. For this reason, introduce such a function,
hotplug_event_func(), and modity acpiphp_dock_ops to point to
it as the handler.
Unfortunately, however, this is not sufficient, because if the dock
code were not changed further, hotplug_event_func() would now
deadlock with hotplug_dock_devices() that called it, since it would
run unregister_hotplug_dock_device() which in turn would attempt to
acquire the dock station's hp_lock mutex already acquired by
hotplug_dock_devices().
To resolve that deadlock use the observation that
unregister_hotplug_dock_device() won't need to acquire hp_lock
if PCI bridges the devices on the dock station depend on are
prevented from being removed prematurely while the first loop in
hotplug_dock_devices() is in progress.
To make that possible, introduce a mechanism by which the callers of
register_hotplug_dock_device() can provide "init" and "release"
routines that will be executed, respectively, during the addition
and removal of the physical device object associated with the
given ACPI device handle. Make acpiphp use two new functions,
acpiphp_dock_init() and acpiphp_dock_release(), that call
get_bridge() and put_bridge(), respectively, on the acpiphp bridge
holding the given device, for this purpose.
In addition to that, remove the dock station's list of
"hotplug devices" and make the dock code always walk the whole list
of "dependent devices" instead in such a way that the loops in
hotplug_dock_devices() and dock_event() (replacing the loops over
"hotplug devices") will take references to the list entries that
register_hotplug_dock_device() has been called for. That prevents
the "release" routines associated with those entries from being
called while the given entry is being processed and for PCI
devices this means that their bridges won't be removed (by a
concurrent thread) while hotplug_event_func() handling them is
being executed.
This change is based on two earlier patches from Jiang Liu.
References: https://bugzilla.kernel.org/show_bug.cgi?id=59501
Reported-and-tested-by: Alexander E. Patrakov <patrakov@gmail.com>
Tracked-down-by: Jiang Liu <jiang.liu@huawei.com>
Tested-by: Illya Klymov <xanf@xanf.me>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: 3.9+ <stable@vger.kernel.org>
On x86 platforms, the kernel respects PCI resource assignments from
the BIOS and only reassigns resources for unassigned BARs at boot
time. However, with the ACPI-based hotplug (acpiphp), it ignores the
BIOS' PCI resource assignments completely and reassigns all resources
by itself. This causes differences in PCI resource allocation
between boot time and runtime hotplug to occur, which is generally
undesirable and sometimes actively breaks things.
Namely, if there are enough resources, reassigning all PCI resources
during runtime hotplug should work, but it may fail if the resources
are constrained. This may happen, for instance, when some PCI
devices with huge MMIO BARs are involved in the runtime hotplug
operations, because the current PCI MMIO alignment algorithm may
waste huge chunks of MMIO address space in those cases.
On the Alexander's Sony VAIO VPCZ23A4R the BIOS allocates limited
MMIO resources for the dock station which contains a device
(graphics adapter) with a 256MB MMIO BAR. An attempt to reassign
that during runtime hotplug causes the dock station MMIO window to be
exhausted and acpiphp fails to allocate resources for the majority
of devices on the dock station as a result.
To prevent that from happening, modify acpiphp to follow the boot
time resources allocation behavior so that the BIOS' resource
assignments are respected during runtime hotplug too.
[rjw: Changelog]
References: https://bugzilla.kernel.org/show_bug.cgi?id=56531
Reported-and-tested-by: Alexander E. Patrakov <patrakov@gmail.com>
Tested-by: Illya Klymov <xanf@xanf.me>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: 3.9+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The following warning was seen on 3.9 when a corrected PCIe error was being
handled by the AER subsystem.
WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90()
This occurred because a call to pci_get_domain_bus_and_slot() was added to
cper_print_pcie() to setup for the call to cper_print_aer(). The warning
showed up because cper_print_pcie() is called in an interrupt context and
pci_get* functions are not supposed to be called in that context.
The solution is to move the cper_print_aer() call out of the interrupt
context and into aer_recover_work_func() to avoid any warnings when calling
pci_get* functions.
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When a PCI host bridge device receives a Bus Check notification, we
must re-enumerate starting with the bridge to discover changes (devices
that have been added or removed).
Prior to 668192b678 ("PCI: acpiphp: Move host bridge hotplug to
pci_root.c"), this happened in _handle_hotplug_event_bridge(). After that
commit, _handle_hotplug_event_bridge() is not installed for host bridges,
and the host bridge notify handler, _handle_hotplug_event_root() did not
re-enumerate.
This patch adds re-enumeration to _handle_hotplug_event_root().
This fixes cases where we don't notice the addition or removal of
PCI devices, e.g., the PCI-to-USB ExpressCard in the bugzilla below.
[bhelgaas: changelog, references]
Reference: https://lkml.kernel.org/r/CAAh6nkmbKR3HTqm5ommevsBwhL_u0N8Rk7Wsms_LfP=nBgKNew@mail.gmail.com
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=57961
Reported-by: Gavin Guo <tuffkidtt@gmail.com>
Tested-by: Gavin Guo <tuffkidtt@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: stable@vger.kernel.org # v3.9+
Pull PCI updates from Bjorn Helgaas:
"MSI:
PCI: Set ->mask_pos correctly
Hotplug:
PCI: Delay final fixups until resources are assigned
Moorestown:
x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0"
* tag 'pci-v3.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Delay final fixups until resources are assigned
x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0
PCI: Set ->mask_pos correctly
Commit 4f535093cf "PCI: Put pci_dev in device tree as early as possible"
moved final fixups from pci_bus_add_device() to pci_device_add(). But
pci_device_add() happens before resource assignment, so BARs may not be
valid yet.
Typical flow for hot-add:
pciehp_configure_device
pci_scan_slot
pci_scan_single_device
pci_device_add
pci_fixup_device(pci_fixup_final, dev) # previous location
# resource assignment happens here
pci_bus_add_devices
pci_bus_add_device
pci_fixup_device(pci_fixup_final, dev) # new location
[bhelgaas: changelog, move fixups to pci_bus_add_device()]
Reference: https://lkml.kernel.org/r/20130415182614.GB9224@xanatos
Reported-by: David Bulkow <David.Bulkow@stratus.com>
Tested-by: David Bulkow <David.Bulkow@stratus.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.9+
Pull powerpc update from Benjamin Herrenschmidt:
"The main highlights this time around are:
- A pile of addition POWER8 bits and nits, such as updated
performance counter support (Michael Ellerman), new branch history
buffer support (Anshuman Khandual), base support for the new PCI
host bridge when not using the hypervisor (Gavin Shan) and other
random related bits and fixes from various contributors.
- Some rework of our page table format by Aneesh Kumar which fixes a
thing or two and paves the way for THP support. THP itself will
not make it this time around however.
- More Freescale updates, including Altivec support on the new e6500
cores, new PCI controller support, and a pile of new boards support
and updates.
- The usual batch of trivial cleanups & fixes"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
powerpc: Fix build error for book3e
powerpc: Context switch the new EBB SPRs
powerpc: Turn on the EBB H/FSCR bits
powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
powerpc: Setup BHRB instructions facility in HFSCR for POWER8
powerpc: Fix interrupt range check on debug exception
powerpc: Update tlbie/tlbiel as per ISA doc
powerpc: Print page size info during boot
powerpc: print both base and actual page size on hash failure
powerpc: Fix hpte_decode to use the correct decoding for page sizes
powerpc: Decode the pte-lp-encoding bits correctly.
powerpc: Use encode avpn where we need only avpn values
powerpc: Reduce PTE table memory wastage
powerpc: Move the pte free routines from common header
powerpc: Reduce the PTE_INDEX_SIZE
powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
powerpc: New hugepage directory format
powerpc: Don't truncate pgd_index wrongly
powerpc: Don't hard code the size of pte page
powerpc: Save DAR and DSISR in pt_regs on MCE
...
Pull VFS updates from Al Viro,
Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).
7kloc removed.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...
Pull networking updates from David Miller:
"Highlights (1721 non-merge commits, this has to be a record of some
sort):
1) Add 'random' mode to team driver, from Jiri Pirko and Eric
Dumazet.
2) Make it so that any driver that supports configuration of multiple
MAC addresses can provide the forwarding database add and del
calls by providing a default implementation and hooking that up if
the driver doesn't have an explicit set of handlers. From Vlad
Yasevich.
3) Support GSO segmentation over tunnels and other encapsulating
devices such as VXLAN, from Pravin B Shelar.
4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.
5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
Dukkipati.
6) In the PHY layer, allow supporting wake-on-lan in situations where
the PHY registers have to be written for it to be configured.
Use it to support wake-on-lan in mv643xx_eth.
From Michael Stapelberg.
7) Significantly improve firewire IPV6 support, from YOSHIFUJI
Hideaki.
8) Allow multiple packets to be sent in a single transmission using
network coding in batman-adv, from Martin Hundebøll.
9) Add support for T5 cxgb4 chips, from Santosh Rastapur.
10) Generalize the VXLAN forwarding tables so that there is more
flexibility in configurating various aspects of the endpoints.
From David Stevens.
11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
from Dmitry Kravkov.
12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
Neira Ayuso.
13) Start adding networking selftests.
14) In situations of overload on the same AF_PACKET fanout socket, or
per-cpu packet receive queue, minimize drop by distributing the
load to other cpus/fanouts. From Willem de Bruijn and Eric
Dumazet.
15) Add support for new payload offset BPF instruction, from Daniel
Borkmann.
16) Convert several drivers over to mdoule_platform_driver(), from
Sachin Kamat.
17) Provide a minimal BPF JIT image disassembler userspace tool, from
Daniel Borkmann.
18) Rewrite F-RTO implementation in TCP to match the final
specification of it in RFC4138 and RFC5682. From Yuchung Cheng.
19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
you like netlink, so I implemented netlink dumping of netlink
sockets.") From Andrey Vagin.
20) Remove ugly passing of rtnetlink attributes into rtnl_doit
functions, from Thomas Graf.
21) Allow userspace to be able to see if a configuration change occurs
in the middle of an address or device list dump, from Nicolas
Dichtel.
22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
Frederic Sowa.
23) Increase accuracy of packet length used by packet scheduler, from
Jason Wang.
24) Beginning set of changes to make ipv4/ipv6 fragment handling more
scalable and less susceptible to overload and locking contention,
from Jesper Dangaard Brouer.
25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
instead. From Hong Zhiguo.
26) Optimize route usage in IPVS by avoiding reference counting where
possible, from Julian Anastasov.
27) Convert IPVS schedulers to RCU, also from Julian Anastasov.
28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
Eitzenberger.
29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
nfnetlink_log, and nfnetlink_queue. From Gao feng.
30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.
31) Support several new r8169 chips, from Hayes Wang.
32) Support tokenized interface identifiers in ipv6, from Daniel
Borkmann.
33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.
34) Add 802.1ad vlan offload support, from Patrick McHardy.
35) Support mmap() based netlink communication, also from Patrick
McHardy.
36) Support HW timestamping in mlx4 driver, from Amir Vadai.
37) Rationalize AF_PACKET packet timestamping when transmitting, from
Willem de Bruijn and Daniel Borkmann.
38) Bring parity to what's provided by /proc/net/packet socket dumping
and the info provided by netlink socket dumping of AF_PACKET
sockets. From Nicolas Dichtel.
39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
Poirier"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
filter: fix va_list build error
af_unix: fix a fatal race with bit fields
bnx2x: Prevent memory leak when cnic is absent
bnx2x: correct reading of speed capabilities
net: sctp: attribute printl with __printf for gcc fmt checks
netlink: kconfig: move mmap i/o into netlink kconfig
netpoll: convert mutex into a semaphore
netlink: Fix skb ref counting.
net_sched: act_ipt forward compat with xtables
mlx4_en: fix a build error on 32bit arches
Revert "bnx2x: allow nvram test to run when device is down"
bridge: avoid OOPS if root port not found
drivers: net: cpsw: fix kernel warn on cpsw irq enable
sh_eth: use random MAC address if no valid one supplied
3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
tg3: fix to append hardware time stamping flags
unix/stream: fix peeking with an offset larger than data in queue
unix/dgram: fix peeking with an offset larger than data in queue
unix/dgram: peek beyond 0-sized skbs
openvswitch: Remove unneeded ovs_netdev_get_ifindex()
...
The "+" operation has higher precedence than "?:" and ->msi_cap is
always non-zero here so the original statement is equivalent to:
entry->mask_pos = PCI_MSI_MASK_64;
Which wasn't the intent.
[bhelgaas: my fault from 78b5a310ce]
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Pull PCI updates from Bjorn Helgaas:
"PCI changes for the v3.10 merge window:
PCI device hotplug
- Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe)
- Make acpiphp builtin only, not modular (Jiang Liu)
- Add acpiphp mutual exclusion (Jiang Liu)
Power management
- Skip "PME enabled/disabled" messages when not supported (Rafael
Wysocki)
- Fix fallback to PCI_D0 (Rafael Wysocki)
Miscellaneous
- Factor quirk_io_region (Yinghai Lu)
- Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas)
- Clean up EISA resource initialization and logging (Bjorn Helgaas)
- Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas)
- MIPS: Initialize of_node before scanning bus (Gabor Juhos)
- Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor
Juhos)
- Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang)
- Fix aer_inject return values (Prarit Bhargava)
- Remove PME/ACPI dependency (Andrew Murray)
- Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)"
* tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (63 commits)
vfio-pci: Use cached MSI/MSI-X capabilities
vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
PCI: Remove "extern" from function declarations
PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h
PCI: Use msix_table_size() directly, drop multi_msix_capable()
PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros
PCI: Drop is_64bit_address() and is_mask_bit_support() macros
PCI: Drop msi_data_reg() macro
PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros
PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly
PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc
PCI: Clean up MSI/MSI-X capability #defines
PCI: Use cached MSI-X cap while enabling MSI-X
PCI: Use cached MSI cap while enabling MSI interrupts
PCI: Remove MSI/MSI-X cap check in pci_msi_check_device()
PCI: Cache MSI/MSI-X capability offsets in struct pci_dev
PCI: Use u8, not int, for PM capability offset
[SCSI] megaraid_sas: Use correct #define for MSI-X capability
PCI: Remove "extern" from function declarations
...
This function is meant to add a helper function that will determine if a PF
has any VFs that are currently assigned to a guest. We currently have been
implementing this function per driver, and going forward I would like to avoid
that by making this function generic and using this helper.
v2: Removed extern from declaration of pci_vfs_assigned in pci.h and return
0 if SR-IOV is disabled with is inline with other PCI SRIOV functions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* pci/gavin-msi-cleanup:
vfio-pci: Use cached MSI/MSI-X capabilities
vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
PCI: Remove "extern" from function declarations
PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h
PCI: Use msix_table_size() directly, drop multi_msix_capable()
PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros
PCI: Drop is_64bit_address() and is_mask_bit_support() macros
PCI: Drop msi_data_reg() macro
PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros
PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly
PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc
PCI: Clean up MSI/MSI-X capability #defines
PCI: Use cached MSI-X cap while enabling MSI-X
PCI: Use cached MSI cap while enabling MSI interrupts
PCI: Remove MSI/MSI-X cap check in pci_msi_check_device()
PCI: Cache MSI/MSI-X capability offsets in struct pci_dev
PCI: Use u8, not int, for PM capability offset
[SCSI] megaraid_sas: Use correct #define for MSI-X capability
PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the
Table Offset register, not the flags ("Message Control" per spec)
register.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>