commit 2400fd822f467cb4c886c879d8ad99feac9cf319 upstream.
The workaround for the CELL timebase bug does not correctly mark cr0 as
being clobbered. This means GCC doesn't know that the asm block changes cr0 and
might leave the result of an unrelated comparison in cr0 across the block, which
we then trash, leading to basically random behaviour.
Fixes: 859deea949 ("[POWERPC] Cell timebase bug workaround")
Cc: stable@vger.kernel.org # v2.6.19+
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Tweak change log and flag for stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 64e756c55aa46fc18fd53e8f3598b73b528d8637 upstream.
From POWER4 onwards, mfocrf() only places the specified CR field into
the destination GPR, and the rest of it is set to 0. The PowerPC AS
from version 3.0 now requires this behaviour.
The emulation code currently puts the entire CR into the destination GPR.
Fix it.
Fixes: 6888199f7f ("[POWERPC] Emulate more instructions in software")
Cc: stable@vger.kernel.org # v2.6.22+
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 01e6a61aceb82e13bec29502a8eb70d9574f97ad upstream.
Although it's not documented anywhere, there is an expectation that
atomic64_inc_not_zero() returns a result which fits in an int. This is
the behaviour implemented on all arches except powerpc.
This has caused at least one bug in practice, in the percpu-refcount
code, where the long result from our atomic64_inc_not_zero() was
truncated to an int leading to lost references and stuck systems. That
was worked around in that code in commit 966d2b04e070 ("percpu-refcount:
fix reference leak during percpu-atomic transition").
To the best of my grepping abilities there are no other callers
in-tree which truncate the value, but we should fix it anyway. Because
the breakage is subtle and potentially very harmful I'm also tagging
it for stable.
Code generation is largely unaffected because in most cases the
callers are just using the result for a test anyway. In particular the
case of fget() that was mentioned in commit a6cf7ed511
("powerpc/atomic: Implement atomic*_inc_not_zero") generates exactly
the same code.
Fixes: a6cf7ed511 ("powerpc/atomic: Implement atomic*_inc_not_zero")
Cc: stable@vger.kernel.org # v3.4
Noticed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit a9f8553e935f26cb5447f67e280946b0923cd2dc upstream.
This fixes a crash when function_graph and jprobes are used together.
This is essentially commit 237d28db036e ("ftrace/jprobes/x86: Fix
conflict between jprobes and function graph tracing"), but for powerpc.
Jprobes breaks function_graph tracing since the jprobe hook needs to use
jprobe_return(), which never returns back to the hook, but instead to
the original jprobe'd function. The solution is to momentarily pause
function_graph tracing before invoking the jprobe hook and re-enable it
when returning back to the original jprobe'd function.
Fixes: 6794c78243 ("powerpc64: port of the function graph tracer")
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
[wt: backport to 4.11: adjust context]
[wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide]
[wt: backport to 4.4: adjust context ; drop ppc hugetlb_radix changes]
[wt: backport to 3.18: adjust context ; no FOLL_POPULATE ;
s390 uses generic arch_get_unmapped_area()]
[wt: backport to 3.16: adjust context]
[wt: backport to 3.10: adjust context ; code logic in PARISC's
arch_get_unmapped_area() wasn't found ; code inserted into
expand_upwards() and expand_downwards() runs under anon_vma lock;
changes for gup.c:faultin_page go to memory.c:__get_user_pages();
included Hugh Dickins' fixes]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 60e065f70bdb0b0e916389024922ad40f3270c96 upstream.
There is a bug in binutils 2.24 which causes miscompilation if we're
building little endian and using weak symbols (which the kernel does).
It is fixed in binutils commit 57fa7b8c7e59 "Correct elf_merge_st_other
arguments for weak symbols", which is in binutils 2.25 and has been
backported to the binutils 2.24 branch and has been picked up by most
distros it seems.
However if we're running stock 2.24 (no extra version) then the bug is
present, so check for that and bail.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 48fe9e9488743eec9b7c1addd3c93f12f2123d54 upstream.
In the past, there was only one load-with-reservation instruction,
lwarx, and if a program attempted a lwarx on a misaligned address, it
would take an alignment interrupt and the kernel handler would emulate
it as though it was lwzx, which was not really correct, but benign since
it is loading the right amount of data, and the lwarx should be paired
with a stwcx. to the same address, which would also cause an alignment
interrupt which would result in a SIGBUS being delivered to the process.
We now have 5 different sizes of load-with-reservation instruction. Of
those, lharx and ldarx cause an immediate SIGBUS by luck since their
entries in aligninfo[] overlap instructions which were not fixed up, but
lqarx overlaps with lhz and will be emulated as such. lbarx can never
generate an alignment interrupt since it only operates on 1 byte.
To straighten this out and fix the lqarx case, this adds code to detect
the l[hwdq]arx instructions and return without fixing them up, resulting
in a SIGBUS being delivered to the process.
[js] include disassemble.h in 3.12
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 708e75a3ee750dce1072134e630d66c4e6eaf63c upstream.
If kvmppc_handle_exit_pr() calls kvmppc_emulate_instruction() to emulate
one instruction (in the BOOK3S_INTERRUPT_H_EMUL_ASSIST case), it calls
kvmppc_core_queue_program() afterwards if kvmppc_emulate_instruction()
returned EMULATE_FAIL, so the guest gets an program interrupt for the
illegal opcode.
However, the kvmppc_emulate_instruction() also tried to inject a
program exception for this already, so the program interrupt gets
injected twice and the return address in srr0 gets destroyed.
All other callers of kvmppc_emulate_instruction() are also injecting
a program interrupt, and since the callers have the right knowledge
about the srr1 flags that should be used, it is the function
kvmppc_emulate_instruction() that should _not_ inject program
interrupts, so remove the kvmppc_core_queue_program() here.
This fixes the issue discovered by Laurent Vivier with kvm-unit-tests
where the logs are filled with these messages when the test tries
to execute an illegal instruction:
Couldn't emulate instruction 0x00000000 (op 0 xop 0)
kvmppc_handle_exit_pr: emulation at 700 failed (00000000)
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit c21a493a2b44650707d06741601894329486f2ad upstream.
Currently xmon data-breakpoint feature is broken.
Whenever there is a watchpoint match occurs, hw_breakpoint_handler will
be called by do_break via notifier chains mechanism. If watchpoint is
registered by xmon, hw_breakpoint_handler won't find any associated
perf_event and returns immediately with NOTIFY_STOP. Similarly, do_break
also returns without notifying to xmon.
Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not
find any perf_event associated with matched watchpoint, rather than
NOTIFY_STOP, which tells the core code to continue calling the other
breakpoint handlers including the xmon one.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 6dff5b67054e17c91bd630bcdda17cfca5aa4215 upstream.
GCC 5 generates different code for this bootwrapper null check that
causes the PS3 to hang very early in its bootup. This check is of
limited value, so just get rid of it.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7ed23e1bae8bf7e37fd555066550a00b95a3a98b upstream.
On Power8 & Power9 the early CPU inititialisation in __init_HFSCR()
turns on HFSCR[TM] (Hypervisor Facility Status and Control Register
[Transactional Memory]), but that doesn't take into account that TM
might be disabled by CPU features, or disabled by the kernel being built
with CONFIG_PPC_TRANSACTIONAL_MEM=n.
So later in boot, when we have setup the CPU features, clear HSCR[TM] if
the TM CPU feature has been disabled. We use CPU_FTR_TM_COMP to account
for the CONFIG_PPC_TRANSACTIONAL_MEM=n case.
Without this a KVM guest might try use TM, even if told not to, and
cause an oops in the host kernel. Typically the oops is seen in
__kvmppc_vcore_entry() and may or may not be fatal to the host, but is
always bad news.
In practice all shipping CPU revisions do support TM, and all host
kernels we are aware of build with TM support enabled, so no one should
actually be able to hit this in the wild.
Fixes: 2a3563b023 ("powerpc: Setup in HFSCR for POWER8")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[mpe: Rewrite change log with input from Sam, add Fixes/stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[sb: Backported to linux-4.4.y: adjusted context]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit fe0f3168169f7c34c29b0cf0c489f126a7f29643 upstream.
Make sure to drop any reference taken by bus_find_device() in the sysfs
callbacks that are used to create and destroy devices based on
device-tree entries.
Fixes: 6bccf755ff ("[POWERPC] ibmebus: dynamic addition/removal of adapters, some code cleanup")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 815a7141c4d1b11610dccb7fcbb38633759824f2 upstream.
Make sure to drop any reference taken by bus_find_device() when creating
devices during init and driver registration.
Fixes: 55347cc996 ("[POWERPC] ibmebus: Add device creation and bus probing based on of_device")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 8ae679c4bc2ea2d16d92620da8e3e9332fa4039f upstream.
I am getting the following warning when I build kernel 4.9-git on my
PowerBook G4 with a 32-bit PPC processor:
AS arch/powerpc/kernel/misc_32.o
arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
This problem is evident after commit 989cea5c14be ("kbuild: prevent
lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
error that has been in the code since 2005 when this source file was
created. That was with commit 9994a33865 ("powerpc: Introduce
entry_{32,64}.S, misc_{32,64}.S, systbl.S").
The offending line does not make a lot of sense. This error does not
seem to cause any errors in the executable, thus I am not recommending
that it be applied to any stable versions.
Thanks to Nicholas Piggin for suggesting this solution.
Fixes: 9994a33865 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 80f23935cadb1c654e81951f5a8b7ceae0acc1b4 upstream.
PowerPC's "cmp" instruction has four operands. Normally people write
"cmpw" or "cmpd" for the second cmp operand 0 or 1. But, frequently
people forget, and write "cmp" with just three operands.
With older binutils this is silently accepted as if this was "cmpw",
while often "cmpd" is wanted. With newer binutils GAS will complain
about this for 64-bit code. For 32-bit code it still silently assumes
"cmpw" is what is meant.
In this instance the code comes directly from ISA v2.07, including the
cmp, but cmpd is correct. Backport to stable so that new toolchains can
build old kernels.
Fixes: 948cf67c47 ("powerpc: Add NAP mode support on Power7 in HV mode")
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 224264657b8b228f949b42346e09ed8c90136a8e upstream.
should clear on access_ok() failures. Also remove the useless
range truncation logics.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 11b7e154b132232535befe51c55db048069c8461 upstream.
When we merge two contiguous partitions whose signatures are marked
NVRAM_SIG_FREE, We need update prev's length and checksum, then write it
to nvram, not cur's. So lets fix this mistake now.
Also use memset instead of strncpy to set the partition's name. It's
more readable if we want to fill up with duplicate chars .
Fixes: fa2b4e54d4 ("powerpc/nvram: Improve partition removal")
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 1a34439e5a0b2235e43f96816dbb15ee1154f656 upstream.
Debugging a data corruption issue with virtio-net/vhost-net led to
the observation that __copy_tofrom_user was occasionally returning
a value 16 larger than it should. Since the return value from
__copy_tofrom_user is the number of bytes not copied, this means
that __copy_tofrom_user can occasionally return a value larger
than the number of bytes it was asked to copy. In turn this can
cause higher-level copy functions such as copy_page_to_iter_iovec
to corrupt memory by copying data into the wrong memory locations.
It turns out that the failing case involves a fault on the store
at label 79, and at that point the first unmodified byte of the
destination is at R3 + 16. Consequently the exception handler
for that store needs to add 16 to R3 before using it to work out
how many bytes were not copied, but in this one case it was not
adding the offset to R3. To fix it, this moves the label 179 to
the point where we add 16 to R3. I have checked manually all the
exception handlers for the loads and stores in this code and the
rest of them are correct (it would be excellent to have an
automated test of all the exception cases).
This bug has been present since this code was initially
committed in May 2002 to Linux version 2.5.20.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 5045ea37377ce8cca6890d32b127ad6770e6dce5 upstream.
__kernel_get_syscall_map() and __kernel_clock_getres() use cmpli to
check if the passed in pointer is non zero. cmpli maps to a 32 bit
compare on binutils, so we ignore the top 32 bits.
A simple test case can be created by passing in a bogus pointer with
the bottom 32 bits clear. Using a clk_id that is handled by the VDSO,
then one that is handled by the kernel shows the problem:
printf("%d\n", clock_getres(CLOCK_REALTIME, (void *)0x100000000));
printf("%d\n", clock_getres(CLOCK_BOOTTIME, (void *)0x100000000));
And we get:
0
-1
The bigger issue is if we pass a valid pointer with the bottom 32 bits
clear, in this case we will return success but won't write any data
to the pointer.
I stumbled across this issue because the LLVM integrated assembler
doesn't accept cmpli with 3 arguments. Fix this by converting them to
cmpldi.
Fixes: a7f290dad3 ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit f077aaf0754bcba0fffdbd925bc12f09cd1e38aa upstream.
In commit c60ac5693c ("powerpc: Update kernel VSID range", 2013-03-13)
we lost a check on the region number (the top four bits of the effective
address) for addresses below PAGE_OFFSET. That commit replaced a check
that the top 18 bits were all zero with a check that bits 46 - 59 were
zero (performed for all addresses, not just user addresses).
This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
and we will insert a valid SLB entry for it. The VSID used will be the
same as if the top 4 bits were 0, but the page size will be some random
value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
array in the paca. If that page size is the same as would be used for
region 0, then userspace just has an alias of the region 0 space. If the
page size is different, then no HPTE will be found for the access, and
the process will get a SIGSEGV (since hash_page_mm() will refuse to create
a HPTE for the bogus address).
The access beyond the end of the mm_ctx_high_slices_psize can be at most
5.5MB past the array, and so will be in RAM somewhere. Since the access
is a load performed in real mode, it won't fault or crash the kernel.
At most this bug could perhaps leak a little bit of information about
blocks of 32 bytes of memory located at offsets of i * 512kB past the
paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.
Fixes: c60ac5693c ("powerpc: Update kernel VSID range")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
This reverts commit 8110080dc5.
Guenter noticed that this breaks PPC build when CONFIG_PPC_TRANSACTIONAL_MEM
is set, because this patch was not for 3.10.
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>