commit 90db10434b163e46da413d34db8d0e77404cc645 upstream.
No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.
There is nothing the callers could do, except retrying over and over
again.
So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).
Fixes: e93f8a0f82 ("KVM: convert io_bus to SRCU")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[wt: no kvm_io_bus_read_cookie in 3.10, slightly different constructs]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 6ea34c9b78c10289846db0abeebd6b84d5aca084 upstream.
We can easily reach the 1000 limit by start VM with a couple
hundred I/O devices (multifunction=on). The hardcode limit
already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000).
In userspace, we already have maximum file descriptor to
limit ioeventfd count. But kvm_io_bus devices also are used
for pit, pic, ioapic, coalesced_mmio. They couldn't be limited
by maximum file descriptor.
Currently only ioeventfds take too much kvm_io_bus devices,
so just exclude it from counting kvm_io_range limit.
Also fixed one indent issue in kvm_host.h
Signed-off-by: Amos Kong <akong@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
[wt: next patch depends on this one]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit df630b8c1e851b5e265dc2ca9c87222e342c093b upstream.
When releasing the bus, let's clear the bus pointers to mark it out. If
any further device unregister happens on this bus, we know that we're
done if we found the bus being released already.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim KrÄmáŠ<rkrcmar@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit dfcb9f4f99f1e9a49e43398a7bfbf56927544af1 upstream.
commit 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
attempted to avoid a BUG_ON call when the association being used for a
sendmsg() is blocked waiting for more sndbuf and another thread did a
peeloff operation on such asoc, moving it to another socket.
As Ben Hutchings noticed, then in such case it would return without
locking back the socket and would cause two unlocks in a row.
Further analysis also revealed that it could allow a double free if the
application managed to peeloff the asoc that is created during the
sendmsg call, because then sctp_sendmsg() would try to free the asoc
that was created only for that call.
This patch takes another approach. It will deny the peeloff operation
if there is a thread sleeping on the asoc, so this situation doesn't
exist anymore. This avoids the issues described above and also honors
the syscalls that are already being handled (it can be multiple sendmsg
calls).
Joint work with Xin Long.
Fixes: 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 2dcab598484185dea7ec22219c76dcdd59e3cb90 upstream.
Alexander Popov reported that an application may trigger a BUG_ON in
sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
waiting on it to queue more data and meanwhile another thread peels off
the association being used by the first thread.
This patch replaces the BUG_ON call with a proper error handling. It
will return -EPIPE to the original sendmsg call, similarly to what would
have been done if the association wasn't found in the first place.
Acked-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit abb6013cca147ad940b0e9fee260d2d9e93b7018 upstream.
when read/write the 64bit data, the correct lock should be hold.
Fixes: 87b6d218f3 ("tunnel: implement 64 bits statistics")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 63117f09c768be05a0bf465911297dc76394f686 upstream.
Casting is a high precedence operation but "off" and "i" are in terms of
bytes so we need to have some parenthesis here.
Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit fbfa743a9d2a0ffa24251764f10afc13eb21e739 upstream.
This function suffers from multiple issues.
First one is that pskb_may_pull() may reallocate skb->head,
so the 'raw' pointer needs either to be reloaded or not used at all.
Second issue is that NEXTHDR_DEST handling does not validate
that the options are present in skb->data, so we might read
garbage or access non existent memory.
With help from Willem de Bruijn.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 22a1e7783e173ab3d86018eb590107d68df46c11 upstream.
The commit 8dfbcc4351a0 ("[media] xc2028: avoid use after free") tried
to address the reported use-after-free by clearing the reference.
However, it's clearing the wrong pointer; it sets NULL to
priv->ctrl.fname, but it's anyway overwritten by the next line
memcpy(&priv->ctrl, p, sizeof(priv->ctrl)).
OTOH, the actual code accessing the freed string is the strcmp() call
with priv->fname:
if (!firmware_name[0] && p->fname &&
priv->fname && strcmp(p->fname, priv->fname))
free_firmware(priv);
where priv->fname points to the previous file name, and this was
already freed by kfree().
For fixing the bug properly, this patch does the following:
- Keep the copy of firmware file name in only priv->fname,
priv->ctrl.fname isn't changed;
- The allocation is done only when the firmware gets loaded;
- The kfree() is called in free_firmware() commonly
Fixes: commit 8dfbcc4351a0 ('[media] xc2028: avoid use after free')
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 210bd104c6acd31c3c6b8b075b3f12d4a9f6b60d upstream.
We have to unlock before returning -ENOMEM.
Fixes: 8dfbcc4351a0 ('[media] xc2028: avoid use after free')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit a9f61ca793becabdefab03b77568d6c6f8c1bc79 upstream.
When we crash from NMI context (e.g. after NMI injection from host when
'sysctl -w kernel.unknown_nmi_panic=1' is set) we hit
kernel BUG at mm/vmalloc.c:1530!
as vfree() is denied. While the issue could be solved with in_nmi() check
instead I opted for skipping vfree on all sorts of crashes to reduce the
amount of work which can cause consequent crashes. We don't really need to
free anything on crash.
[js] no tsc and kexec in 3.12 yet
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit f1712c73714088a7252d276a57126d56c7d37e64 upstream.
Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister()
The main problem seems that the sockets themselves are not RCU
protected.
If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.
Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.
[1]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0
Call Trace:
<IRQ>
[<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60
[<ffffffff81d55771>] sk_filter+0x41/0x210
[<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0
[<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0
[<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370
[<ffffffff81f07af9>] can_receive+0xd9/0x120
[<ffffffff81f07beb>] can_rcv+0xab/0x100
[<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0
[<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0
[<ffffffff81d37f67>] process_backlog+0x127/0x280
[<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0
[<ffffffff810c88d4>] __do_softirq+0x184/0x440
[<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30
<EOI>
[<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40
[<ffffffff810c8bed>] do_softirq+0x1d/0x20
[<ffffffff81d30085>] netif_rx_ni+0xe5/0x110
[<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520
[<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230
[<ffffffff810e3baf>] process_one_work+0x24f/0x670
[<ffffffff810e44ed>] worker_thread+0x9d/0x6f0
[<ffffffff810e4450>] ? rescuer_thread+0x480/0x480
[<ffffffff810ebafc>] kthread+0x12c/0x150
[<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70
Reported-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 90cae1fe1c3540f791d5b8e025985fa5e699b2bb upstream.
As a part of memory initialisation the architecture passes an array to
free_area_init_nodes() which specifies the max PFN of each memory zone.
This array is not necessarily monotonic (due to unused zones) so this
array is parsed to build monotonic lists of the min and max PFN for each
zone. ZONE_MOVABLE is special cased here as its limits are managed by
the mm subsystem rather than the architecture. Unfortunately, this
special casing is broken when ZONE_MOVABLE is the not the last zone in
the zone list. The core of the issue is:
if (i == ZONE_MOVABLE)
continue;
arch_zone_lowest_possible_pfn[i] =
arch_zone_highest_possible_pfn[i-1];
As ZONE_MOVABLE is skipped the lowest_possible_pfn of the next zone will
be set to zero. This patch fixes this bug by adding explicitly tracking
where the next zone should start rather than relying on the contents
arch_zone_highest_possible_pfn[].
Thie is low priority. To get bitten by this you need to enable a zone
that appears after ZONE_MOVABLE in the zone_type enum. As far as I can
tell this means running a kernel with ZONE_DEVICE or ZONE_CMA enabled,
so I can't see this affecting too many people.
I only noticed this because I've been fiddling with ZONE_DEVICE on
powerpc and 4.6 broke my test kernel. This bug, in conjunction with the
changes in Taku Izumi's kernelcore=mirror patch (d91749c1dda71) and
powerpc being the odd architecture which initialises max_zone_pfn[] to
~0ul instead of 0 caused all of system memory to be placed into
ZONE_DEVICE at boot, followed a panic since device memory cannot be used
for kernel allocations. I've already submitted a patch to fix the
powerpc specific bits, but I figured this should be fixed too.
Link: http://lkml.kernel.org/r/1462435033-15601-1-git-send-email-oohall@gmail.com
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit bcdbeb844773333d2d1c08004f3b3e25921040e5 upstream.
The stop_activity() routine in dummy-hcd is supposed to unlink all
active requests for every endpoint, among other things. But it
doesn't handle ep0. As a result, fuzz testing can generate a WARNING
like the following:
WARNING: CPU: 0 PID: 4410 at drivers/usb/gadget/udc/dummy_hcd.c:672 dummy_free_request+0x153/0x170
Modules linked in:
CPU: 0 PID: 4410 Comm: syz-executor Not tainted 4.9.0-rc7+ #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff88006a64ed10 ffffffff81f96b8a ffffffff41b58ab3 1ffff1000d4c9d35
ffffed000d4c9d2d ffff880065f8ac00 0000000041b58ab3 ffffffff8598b510
ffffffff81f968f8 0000000041b58ab3 ffffffff859410e0 ffffffff813f0590
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51
[<ffffffff812b808f>] __warn+0x19f/0x1e0 kernel/panic.c:550
[<ffffffff812b831c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
[<ffffffff830fcb13>] dummy_free_request+0x153/0x170 drivers/usb/gadget/udc/dummy_hcd.c:672
[<ffffffff830ed1b0>] usb_ep_free_request+0xc0/0x420 drivers/usb/gadget/udc/core.c:195
[<ffffffff83225031>] gadgetfs_unbind+0x131/0x190 drivers/usb/gadget/legacy/inode.c:1612
[<ffffffff830ebd8f>] usb_gadget_remove_driver+0x10f/0x2b0 drivers/usb/gadget/udc/core.c:1228
[<ffffffff830ec084>] usb_gadget_unregister_driver+0x154/0x240 drivers/usb/gadget/udc/core.c:1357
This patch fixes the problem by iterating over all the endpoints in
the driver's ep array instead of iterating over the gadget's ep_list,
which explicitly leaves out ep0.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 43a6684519ab0a6c52024b5e25322476cabad893 upstream.
We got a report of yet another bug in ping
http://www.openwall.com/lists/oss-security/2017/03/24/6
->disconnect() is not called with socket lock held.
Fix this by acquiring ping rwlock earlier.
Thanks to Daniel, Alexander and Andrey for letting us know this problem.
Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Jiang <danieljiang0415@gmail.com>
Reported-by: Solar Designer <solar@openwall.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[wt: the function is ping_v4_unhash() in 3.10]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 687e0687f71ec00e0132a21fef802dee88c2f1ad upstream.
USBTMC devices are required to have a bulk-in and a bulk-out endpoint,
but the driver failed to verify this, something which could lead to the
endpoint addresses being taken from uninitialised memory.
Make sure to zero all private data as part of allocation, and add the
missing endpoint sanity check.
Note that this also addresses a more recently introduced issue, where
the interrupt-in-presence flag would also be uninitialised whenever the
optional interrupt-in endpoint is not present. This in turn could lead
to an interrupt urb being allocated, initialised and submitted based on
uninitialised values.
Fixes: dbf3e7f654c0 ("Implement an ioctl to support the USMTMC-USB488 READ_STATUS_BYTE operation.")
Fixes: 5b775f672c ("USB: add USB test and measurement class driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
[ johan: backport to v4.4 ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit ecf1e2253ea79c6204f4d6a5e756e8fb4aed5a7e upstream.
Instead of the one when another syscall takes place while another is being
processed (in another CPU, but we show it serialized, so need to "interrupt"
the other), and also when finally showing the sys_enter + sys_exit + duration,
where we were showing the sample->time for the sys_exit, duh.
Before:
# perf trace sleep 1
<SNIP>
0.373 ( 0.001 ms): close(fd: 3 ) = 0
1000.626 (1000.211 ms): nanosleep(rqtp: 0x7ffd6ddddfb0) = 0
1000.653 ( 0.003 ms): close(fd: 1 ) = 0
1000.657 ( 0.002 ms): close(fd: 2 ) = 0
1000.667 ( 0.000 ms): exit_group( )
#
After:
# perf trace sleep 1
<SNIP>
0.336 ( 0.001 ms): close(fd: 3 ) = 0
0.373 (1000.086 ms): nanosleep(rqtp: 0x7ffe303e9550) = 0
1000.481 ( 0.002 ms): close(fd: 1 ) = 0
1000.485 ( 0.001 ms): close(fd: 2 ) = 0
1000.494 ( 0.000 ms): exit_group( )
[root@jouet linux]#
[js] no trace__printf_interrupted_entry in 3.12 yet
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ecbzgmu2ni6glc6zkw8p1zmx@git.kernel.org
Fixes: 752fde44fd ("perf trace: Support interrupted syscalls")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
[wt: 3.10 uses stdout instead of trace->output ;
no trace__printf_interrupted_entry() function ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 4c47af4d5eb2c2f78f886079a3920a7078a6f0a0 upstream.
Problem statement: 1) both paths (primary path1 and alternate
path2) are up after the association has been established i.e.,
HB packets are normally exchanged, 2) path2 gets inactive after
path_max_retrans * max_rto timed out (i.e. path2 is down completely),
3) now, if a transmission times out on the only surviving/active
path1 (any ~1sec network service impact could cause this like
a channel bonding failover), then the retransmitted packets are
sent over the inactive path2; this happens with partial failover
and without it.
Besides not being optimal in the above scenario, a small failure
or timeout in the only existing path has the potential to cause
long delays in the retransmission (depending on RTO_MAX) until
the still active path is reselected. Further, when the T3-timeout
occurs, we have active_patch == retrans_path, and even though the
timeout occurred on the initial transmission of data, not a
retransmit, we end up updating retransmit path.
RFC4960, section 6.4. "Multi-Homed SCTP Endpoints" states under
6.4.1. "Failover from an Inactive Destination Address" the
following:
Some of the transport addresses of a multi-homed SCTP endpoint
may become inactive due to either the occurrence of certain
error conditions (see Section 8.2) or adjustments from the
SCTP user.
When there is outbound data to send and the primary path
becomes inactive (e.g., due to failures), or where the SCTP
user explicitly requests to send data to an inactive
destination transport address, before reporting an error to
its ULP, the SCTP endpoint should try to send the data to an
alternate __active__ destination transport address if one
exists.
When retransmitting data that timed out, if the endpoint is
multihomed, it should consider each source-destination address
pair in its retransmission selection policy. When retransmitting
timed-out data, the endpoint should attempt to pick the most
divergent source-destination pair from the original
source-destination pair to which the packet was transmitted.
Note: Rules for picking the most divergent source-destination
pair are an implementation decision and are not specified
within this document.
So, we should first reconsider to take the current active
retransmission transport if we cannot find an alternative
active one. If all of that fails, we can still round robin
through unkown, partial failover, and inactive ones in the
hope to find something still suitable.
Commit 4141ddc02a ("sctp: retran_path update bug fix") broke
that behaviour by selecting the next inactive transport when
no other active transport was found besides the current assoc's
peer.retran_path. Before commit 4141ddc02a, we would have
traversed through the list until we reach our peer.retran_path
again, and in case that is still in state SCTP_ACTIVE, we would
take it and return. Only if that is not the case either, we
take the next inactive transport.
Besides all that, another issue is that transports in state
SCTP_UNKNOWN could be preferred over transports in state
SCTP_ACTIVE in case a SCTP_ACTIVE transport appears after
SCTP_UNKNOWN in the transport list yielding a weaker transport
state to be used in retransmission.
This patch mostly reverts 4141ddc02a, but also rewrites
this function to introduce more clarity and strictness into
the code. A strict priority of transport states is enforced
in this patch, hence selection is active > unkown > partial
failover > inactive.
Fixes: 4141ddc02a ("sctp: retran_path update bug fix")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vlad Yasevich <yasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[wt: picked updated function from 3.12 except the debug statement]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit cb4855b49deb1acce27706ad9509d63c4fe8e988 upstream.
We fixed this to use free_netdev() instead of kfree() but unfortunately
free_netdev() doesn't accept NULL pointers. Smatch complains about
this, it's not something I discovered through testing.
Fixes: 3030d40b5036 ('staging: vt6655: use free_netdev instead of kfree')
Fixes: 0a438d5b38 ('staging: vt6656: use free_netdev instead of kfree')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[wt: only vt6656 was converted to free_netdev in 3.10]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 2eb783c43e7cf807a45899c10ed556b6dc116625 upstream.
We set the flag TUN_PKT_STRIP if the user buffer provided is too
small to contain the entire packet plus meta-data. However, this
has been broken ever since we added GSO meta-data. VLAN acceleration
also has the same problem.
This patch fixes this by taking both into account when setting the
TUN_PKT_STRIP flag.
The fact that this has been broken for six years without anyone
realising means that nobody actually uses this flag.
Fixes: f43798c276 ("tun: Allow GSO using virtio_net_hdr")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[wt: no tuntap VLAN offloading in 3.10]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit af92305e567b7f4c9cf48b9e46c1f48ec9ffb1fb upstream.
On i.MX31 AVIC interrupt controller base address is at 0x68000000.
The problem was shadowed by the AVIC driver, which takes the correct
base address from a SoC specific header file.
Fixes: d2a37b3d91 ("ARM i.MX31: Add devicetree support")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 1f87aee6a2e55eda466a43ba6248a8b75eede153 upstream.
i.MX31 Clock Control Module controller is found on AIPS2 bus, move it
there from SPBA bus to avoid a conflict of device IO space mismatch.
Fixes: ef0e4a606f ("ARM: mx31: Replace clk_register_clkdev with clock DT lookup")
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 162b270c664dca2e0944308e92f9fcc887151a72 upstream.
KGDB is a kernel debug stub and it can't be used to debug userland as it
can only safely access kernel memory.
On MIPS however KGDB has always got the register state of sleeping
processes from the userland register context at the beginning of the
kernel stack. This is meaningless for kernel threads (which never enter
userland), and for user threads it prevents the user seeing what it is
doing while in the kernel:
(gdb) info threads
Id Target Id Frame
...
3 Thread 2 (kthreadd) 0x0000000000000000 in ?? ()
2 Thread 1 (init) 0x000000007705c4b4 in ?? ()
1 Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201
Get the register state instead from the (partial) kernel register
context stored in the task's thread_struct for resume() to restore. All
threads now correctly appear to be in context_switch():
(gdb) info threads
Id Target Id Frame
...
3 Thread 2 (kthreadd) context_switch (rq=<optimized out>, cookie=..., next=<optimized out>, prev=0x0) at kernel/sched/core.c:2903
2 Thread 1 (init) context_switch (rq=<optimized out>, cookie=..., next=<optimized out>, prev=0x0) at kernel/sched/core.c:2903
1 Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201
Call clobbered registers which aren't saved and exception registers
(BadVAddr & Cause) which can't be easily determined without stack
unwinding are reported as 0. The PC is taken from the return address,
such that the state presented matches that found immediately after
returning from resume().
Fixes: 8854700115 ("[MIPS] kgdb: add arch support for the kernel's kgdb core")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15829/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit e08293a4ccbcc993ded0fdc46f1e57926b833d63 upstream.
Take a reference on the sessions returned by l2tp_session_find_nth()
(and rename it l2tp_session_get_nth() to reflect this change), so that
caller is assured that the session isn't going to disappear while
processing it.
For procfs and debugfs handlers, the session is held in the .start()
callback and dropped in .show(). Given that pppol2tp_seq_session_show()
dereferences the associated PPPoL2TP socket and that
l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to
call the session's .ref() callback to prevent the socket from going
away from under us.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Fixes: 0ad6614048 ("l2tp: Add debugfs files for dumping l2tp debug info")
Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 49d52e8108a21749dc2114b924c907db43358984 upstream.
If the PHY is halted on stop, then do not set the state to PHY_UP. This
ensures the phy will be restarted later in phy_start when the machine is
started again.
Fixes: 00db8189d9 ("This patch adds a PHY Abstraction Layer to the Linux Kernel, enabling ethernet drivers to remain as ignorant as is reasonable of the connected PHY's design and operation details.")
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Acked-by: Xander Huff <xander.huff@ni.com>
Acked-by: Kyle Roeschley <kyle.roeschley@ni.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 17a49cd549d9dc8707dc9262210166455c612dde upstream.
Since 09d9686047db ("netfilter: x_tables: do compat validation via
translate_table"), it used compatr structure to assign newinfo
structure. In translate_compat_table of ip_tables.c and ip6_tables.c,
it used compatr->hook_entry to replace info->hook_entry and
compatr->underflow to replace info->underflow, but not do the same
replacement in arp_tables.c.
It caused invoking 32-bit "arptbale -P INPUT ACCEPT" failed in 64bit
kernel.
--------------------------------------
root@qemux86-64:~# arptables -P INPUT ACCEPT
root@qemux86-64:~# arptables -P INPUT ACCEPT
ERROR: Policy for `INPUT' offset 448 != underflow 0
arptables: Incompatible with this kernel
--------------------------------------
Fixes: 09d9686047db ("netfilter: x_tables: do compat validation via translate_table")
Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 78f7a45dac2a2d2002f98a3a95f7979867868d73 upstream.
I noticed that reading the snapshot file when it is empty no longer gives a
status. It suppose to show the status of the snapshot buffer as well as how
to allocate and use it. For example:
># cat snapshot
# tracer: nop
#
#
# * Snapshot is allocated *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')
But instead it just showed an empty buffer:
># cat snapshot
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
What happened was that it was using the ring_buffer_iter_empty() function to
see if it was empty, and if it was, it showed the status. But that function
was returning false when it was empty. The reason was that the iter header
page was on the reader page, and the reader page was empty, but so was the
buffer itself. The check only tested to see if the iter was on the commit
page, but the commit page was no longer pointing to the reader page, but as
all pages were empty, the buffer is also.
Fixes: 651e22f2701b ("ring-buffer: Always reset iterator to reader page")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit df62db5be2e5f070ecd1a5ece5945b590ee112e0 upstream.
Currently the snapshot trigger enables the probe and then allocates the
snapshot. If the probe triggers before the allocation, it could cause the
snapshot to fail and turn tracing off. It's best to allocate the snapshot
buffer first, and then enable the trigger. If something goes wrong in the
enabling of the trigger, the snapshot buffer is still allocated, but it can
also be freed by the user by writting zero into the snapshot buffer file.
Also add a check of the return status of alloc_snapshot().
Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7926aff5c57b577ab0f43364ff0c59d968f6a414 upstream.
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7ed23e1bae8bf7e37fd555066550a00b95a3a98b upstream.
On Power8 & Power9 the early CPU inititialisation in __init_HFSCR()
turns on HFSCR[TM] (Hypervisor Facility Status and Control Register
[Transactional Memory]), but that doesn't take into account that TM
might be disabled by CPU features, or disabled by the kernel being built
with CONFIG_PPC_TRANSACTIONAL_MEM=n.
So later in boot, when we have setup the CPU features, clear HSCR[TM] if
the TM CPU feature has been disabled. We use CPU_FTR_TM_COMP to account
for the CONFIG_PPC_TRANSACTIONAL_MEM=n case.
Without this a KVM guest might try use TM, even if told not to, and
cause an oops in the host kernel. Typically the oops is seen in
__kvmppc_vcore_entry() and may or may not be fatal to the host, but is
always bad news.
In practice all shipping CPU revisions do support TM, and all host
kernels we are aware of build with TM support enabled, so no one should
actually be able to hit this in the wild.
Fixes: 2a3563b023 ("powerpc: Setup in HFSCR for POWER8")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[mpe: Rewrite change log with input from Sam, add Fixes/stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[sb: Backported to linux-4.4.y: adjusted context]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 309124e2648d668a0c23539c5078815660a4a850 upstream.
According to full-history-linux commit d3794f4fa7c3edc3 ("[PATCH] M68k
update (part 25)"), port operations are allowed on m68k if CONFIG_ISA is
defined.
However, commit 153dcc54df ("[PATCH] mem driver: fix conditional
on isa i/o support") accidentally changed an "||" into an "&&",
disabling it completely on m68k. This logic was retained when
introducing the DEVPORT symbol in commit 4f911d64e0 ("Make
/dev/port conditional on config symbol").
Drop the bogus dependency on !M68K to fix this.
Fixes: 153dcc54df ("[PATCH] mem driver: fix conditional on isa i/o support")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Al Stone <ahs3@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 291c566a28910614ce42d0ffe82196eddd6346f4 upstream.
In function mlx4_cq_completion() and mlx4_cq_event(), the
radix_tree_lookup requires a rcu_read_lock.
This is mandatory: if another core frees the CQ, it could
run the radix_tree_node_rcu_free() call_rcu() callback while
its being used by the radix tree lookup function.
Additionally, in function mlx4_cq_event(), since we are adding
the rcu lock around the radix-tree lookup, we no longer need to take
the spinlock. Also, the synchronize_irq() call for the async event
eliminates the need for incrementing the cq reference count in
mlx4_cq_event().
Other changes:
1. In function mlx4_cq_free(), replace spin_lock_irq with spin_lock:
we no longer take this spinlock in the interrupt context.
The spinlock here, therefore, simply protects against different
threads simultaneously invoking mlx4_cq_free() for different cq's.
2. In function mlx4_cq_free(), we move the radix tree delete to before
the synchronize_irq() calls. This guarantees that we will not
access this cq during any subsequent interrupts, and therefore can
safely free the CQ after the synchronize_irq calls. The rcu_read_lock
in the interrupt handlers only needs to protect against corrupting the
radix tree; the interrupt handlers may access the cq outside the
rcu_read_lock due to the synchronize_irq calls which protect against
premature freeing of the cq.
3. In function mlx4_cq_event(), we change the mlx_warn message to mlx4_dbg.
4. We leave the cq reference count mechanism in place, because it is
still needed for the cq completion tasklet mechanism.
Fixes: 6d90aa5cf17b ("net/mlx4_core: Make sure there are no pending async events when freeing CQ")
Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 6496bbf0ec481966ef9ffe5b6660d8d1b55c60cc upstream.
Single send WQE in RX buffer should be stamped with software
ownership in order to prevent the flow of QP in error in FW
once UPDATE_QP is called.
Fixes: 9f519f68cf ('mlx4_en: Not using Shared Receive Queues')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit d82c0d12c92705ef468683c9b7a8298dd61ed191 upstream.
Reorder the operations in decompress_kernel() to ensure initrd is moved
to a safe location before the bss section is zeroed.
During decompression bss can overlap with the initrd and this can
corrupt the initrd contents depending on the size of the compressed
kernel (which affects where the initrd is placed by the bootloader) and
the size of the bss section of the decompressor.
Also use the correct initrd size when checking for overlaps with
parmblock.
Fixes: 06c0dd72ae ([S390] fix boot failures with compressed kernels)
Reviewed-by: Joy Latten <joy.latten@canonical.com>
Reviewed-by: Vineetha HariPai <vineetha.hari.pai@canonical.com>
Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit b884a190afcecdbef34ca508ea5ee88bb7c77861 upstream.
The rapf copy loops in the Meta usercopy code is missing some extable
entries for HTP cores with unaligned access checking enabled, where
faults occur on the instruction immediately after the faulting access.
Add the fixup labels and extable entries for these cases so that corner
case user copy failures don't cause kernel crashes.
Fixes: 373cd784d0 ("metag: Memory handling")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 2c0b1df88b987a12d95ea1d6beaf01894f3cc725 upstream.
The fixup code to rewind the source pointer in
__asm_copy_from_user_{32,64}bit_rapf_loop() always rewound the source by
a single unit (4 or 8 bytes), however this is insufficient if the fault
didn't occur on the first load in the loop, as the source pointer will
have been incremented but nothing will have been stored until all 4
register [pairs] are loaded.
Read the LSM_STEP field of TXSTATUS (which is already loaded into a
register), a bit like the copy_to_user versions, to determine how many
iterations of MGET[DL] have taken place, all of which need rewinding.
Fixes: 373cd784d0 ("metag: Memory handling")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit fd40eee1290ad7add7aa665e3ce6b0f9fe9734b4 upstream.
The fixup code for the copy_to_user rapf loops reads TXStatus.LSM_STEP
to decide how far to rewind the source pointer. There is a special case
for the last execution of an MGETL/MGETD, since it leaves LSM_STEP=0
even though the number of MGETLs/MGETDs attempted was 4. This uses ADDZ
which is conditional upon the Z condition flag, but the AND instruction
which masked the TXStatus.LSM_STEP field didn't set the condition flags
based on the result.
Fix that now by using ANDS which does set the flags, and also marking
the condition codes as clobbered by the inline assembly.
Fixes: 373cd784d0 ("metag: Memory handling")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit fb8ea062a8f2e85256e13f55696c5c5f0dfdcc8b upstream.
When copying to userland on Meta, if any faults are encountered
immediately abort the copy instead of continuing on and repeatedly
faulting, and worse potentially copying further bytes successfully to
subsequent valid pages.
Fixes: 373cd784d0 ("metag: Memory handling")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 2257211942bbbf6c798ab70b487d7e62f7835a1a upstream.
Fix the error checking of the alignment adjustment code in
raw_copy_from_user(), which mistakenly considers it safe to skip the
error check when aligning the source buffer on a 2 or 4 byte boundary.
If the destination buffer was unaligned it may have started to copy
using byte or word accesses, which could well be at the start of a new
(valid) source page. This would result in it appearing to have copied 1
or 2 bytes at the end of the first (invalid) page rather than none at
all.
Fixes: 373cd784d0 ("metag: Memory handling")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 5402e97af667e35e54177af8f6575518bf251d51 upstream.
In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
__TASK_TRACED. If this races with the ptrace_unfreeze_traced at the end
of a PTRACE_LISTEN, this can wake the task /after/ the check against
__TASK_TRACED, but before the reset of state to TASK_TRACED. This
causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
against TRACED while the task is still on the rq wake_list, corrupting
it.
Oleg said:
"The kernel can crash or this can lead to other hard-to-debug problems.
In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
contract. Obviusly it is very wrong to manipulate task->state if this
task is already running, or WAKING, or it sleeps again"
[akpm@linux-foundation.org: coding-style fixes]
Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL")
Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com
Signed-off-by: Ben Segall <bsegall@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 4eba7bb1d72d9bde67d810d09bf62dc207b63c5c upstream.
When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the sockets mc_list containing information about the
joined group.
If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a5344f ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing a
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join and more groups.
The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 52ad353a5344f "(igmp: fix the problem when mc leave group")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit a9bed6b10bd117a300cceb9062003f7a2761ef99 upstream.
In some cases, we could start a new i2c transfer with the RXRDY flag
set. It is not a clean state and it leads to print annoying error
messages even if there no real issue. The cause is only having garbage
data in the Receive Holding Register because of a weird behavior of the
RXRDY flag.
Reported-by: Peter Rosin <peda@lysator.liu.se>
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Tested-by: Peter Rosin <peda@lysator.liu.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Fixes: 93563a6a71bb ("i2c: at91: fix a race condition when using the DMA controller")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7d8021c967648accd1b78e5e1ddaad655cd2c61f upstream.
This patch fixes a bug introduced by commit 977dcfdc6031 ("USB: OHCI:
don't lose track of EDs when a controller dies"). The commit changed
ed_state from ED_UNLINK to ED_IDLE too early, before finish_urb() had
been called. The user-visible consequence is that the driver
occasionally crashes or locks up when an URB is submitted while
another URB for the same endpoint is being unlinked.
This patch moves the ED state change later, to the right place. The
drawback is that now we may unnecessarily execute some instructions
multiple times when a controller dies. Since controllers dying is an
exceptional occurrence, a little wasted time won't matter.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Heiko Przybyl <lil_tux@web.de>
Tested-by: Heiko Przybyl <lil_tux@web.de>
Fixes: 977dcfdc60311e7aa571cabf6f39c36dde13339e
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 0294112ee3135fbd15eaa70015af8283642dd970 upstream.
This effectively reverts the following three commits:
7bc10388ccdd ACPI / resources: free memory on error in add_region_before()
0f1b414d1907 ACPI / PNP: Avoid conflicting resource reservations
b9a5e5e18fbf ACPI / init: Fix the ordering of acpi_reserve_resources()
(commit b9a5e5e18fbf introduced regressions some of which, but not
all, were addressed by commit 0f1b414d1907 and commit 7bc10388ccdd
was a fixup on top of the latter) and causes ACPI fixed hardware
resources to be reserved at the fs_initcall_sync stage of system
initialization.
The story is as follows. First, a boot regression was reported due
to an apparent resource reservation ordering change after a commit
that shouldn't lead to such changes. Investigation led to the
conclusion that the problem happened because acpi_reserve_resources()
was executed at the device_initcall() stage of system initialization
which wasn't strictly ordered with respect to driver initialization
(and with respect to the initialization of the pcieport driver in
particular), so a random change causing the device initcalls to be
run in a different order might break things.
The response to that was to attempt to run acpi_reserve_resources()
as soon as we knew that ACPI would be in use (commit b9a5e5e18fbf).
However, that turned out to be too early, because it caused resource
reservations made by the PNP system driver to fail on at least one
system and that failure was addressed by commit 0f1b414d1907.
That fix still turned out to be insufficient, though, because
calling acpi_reserve_resources() before the fs_initcall stage of
system initialization caused a boot regression to happen on the
eCAFE EC-800-H20G/S netbook. That meant that we only could call
acpi_reserve_resources() at the fs_initcall initialization stage
or later, but then we might just as well call it after the PNP
initalization in which case commit 0f1b414d1907 wouldn't be
necessary any more.
For this reason, the changes made by commit 0f1b414d1907 are reverted
(along with a memory leak fixup on top of that commit), the changes
made by commit b9a5e5e18fbf that went too far are reverted too and
acpi_reserve_resources() is changed into fs_initcall_sync, which
will cause it to be executed after the PNP subsystem initialization
(which is an fs_initcall) and before device initcalls (including
the pcieport driver initialization) which should avoid the initial
issue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
Link: http://marc.info/?t=143092384600002&r=1&w=2
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
Link: http://marc.info/?t=143389402600001&r=1&w=2
Fixes: b9a5e5e18fbf "ACPI / init: Fix the ordering of acpi_reserve_resources()"
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7bc10388ccdd79b3d20463151a1f8e7a590a775b upstream.
There is a small memory leak on error.
Fixes: 0f1b414d1907 (ACPI / PNP: Avoid conflicting resource reservations)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 0f1b414d190724617eb1cdd615592fa8cd9d0b50 upstream.
Commit b9a5e5e18fbf "ACPI / init: Fix the ordering of
acpi_reserve_resources()" overlooked the fact that the memory
and/or I/O regions reserved by acpi_reserve_resources() may
conflict with those reserved by the PNP "system" driver.
If that conflict actually takes place, it causes the reservations
made by the "system" driver to fail while before commit b9a5e5e18fbf
all reservations made by it and by acpi_reserve_resources() would be
successful. In turn, that allows the resources that haven't been
reserved by the "system" driver to be used by others (e.g. PCI) which
sometimes leads to functional problems (up to and including boot
failures).
To fix that issue, introduce a common resource reservation routine,
acpi_reserve_region(), to be used by both acpi_reserve_resources()
and the "system" driver, that will track all resources reserved by
it and avoid making conflicting requests.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
Link: http://marc.info/?t=143389402600001&r=1&w=2
Fixes: b9a5e5e18fbf "ACPI / init: Fix the ordering of acpi_reserve_resources()"
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>