2091 Commits

Author SHA1 Message Date
Stricted bdecc6d184 Merge tag 'v3.10.108' into update
This is the 3.10.108 stable release
2018-03-21 23:07:40 +01:00
Stricted 073b9047a0 Merge tag 'v3.10.107' into update
This is the 3.10.107 stable release
2018-03-21 23:07:35 +01:00
Stricted 47e5ca72da Merge tag 'v3.10.106' into update
This is the 3.10.106 stable release
2018-03-21 23:06:23 +01:00
Stricted ad957d335c Merge tag 'v3.10.105' into update
This is the 3.10.105 stable release
2018-03-21 23:00:38 +01:00
Stricted b9e7bc93d6 Merge tag 'v3.10.103' into update
This is the 3.10.103 stable release
2018-03-21 22:58:21 +01:00
Stricted a8732f92e3 Merge tag 'v3.10.102' into update
This is the 3.10.102 stable release
2018-03-21 22:54:09 +01:00
Stricted 647f2da1e2 Merge tag 'v3.10.98' into update
This is the 3.10.98 stable release
2018-03-21 22:51:37 +01:00
Stricted dd388bd4cd Merge tag 'v3.10.97' into update
This is the 3.10.97 stable release
2018-03-21 22:51:04 +01:00
Stricted f3d34b554f Merge tag 'v3.10.95' into update
This is the 3.10.95 stable release
2018-03-21 22:50:56 +01:00
Stricted 38b8911896 Merge tag 'v3.10.85' into update
This is the 3.10.85 stable release
2018-03-21 22:46:39 +01:00
Stricted 5eab702925 Merge tag 'v3.10.80' into update
This is the 3.10.80 stable release
2018-03-21 22:45:22 +01:00
Stricted 9d35d890f3 Merge tag 'v3.10.78' into update
This is the 3.10.78 stable release
2018-03-21 22:44:38 +01:00
Stricted 9b13083065 Merge tag 'v3.10.77' into update
This is the 3.10.77 stable release
2018-03-21 22:44:34 +01:00
Stricted 4a2455f795 Merge tag 'v3.10.69' into update
This is the 3.10.69 stable release
2018-03-21 22:39:46 +01:00
Stricted b2d402e5a4 Merge tag 'v3.10.67' into update
This is the 3.10.67 stable release
2018-03-21 22:36:30 +01:00
Stricted 6f56b75961 Merge tag 'v3.10.60' into update
This is the 3.10.60 stable release
2018-03-21 22:31:34 +01:00
Stricted 4b9e97964e import PULS_20180308 2018-03-13 20:30:12 +01:00
Stricted 6fa3eb70c0 import PULS_20160108 2018-03-13 20:29:02 +01:00
Jan Kara 9f75306bea ext4: avoid deadlock when expanding inode size
commit 2e81a4eeedcaa66e35f58b81e0755b87057ce392 upstream.

When we need to move xattrs into external xattr block, we call
ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end
up calling ext4_mark_inode_dirty() again which will recurse back into
the inode expansion code leading to deadlocks.

Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move
its management into ext4_expand_extra_isize_ea() since its manipulation
is safe there (due to xattr_sem) from possible races with
ext4_xattr_set_handle() which plays with it as well.

CC: stable@vger.kernel.org   # 4.4.x
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-11-02 07:16:21 +01:00
Darrick J. Wong ebb33aff58 ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets
commit 1bd8d6cd3e413d64e543ec3e69ff43e75a1cf1ea upstream.

In the ext4 implementations of SEEK_HOLE and SEEK_DATA, make sure we
return -ENXIO for negative offsets instead of banging around inside
the extent code and returning -EFSCORRUPTED.

Reported-by: Mateusz S <muttdini@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # 4.6
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-11-02 07:16:21 +01:00
Jan Kara 2f34aab1a8 ext4: fix SEEK_HOLE
commit 7d95eddf313c88b24f99d4ca9c2411a4b82fef33 upstream.

Currently, SEEK_HOLE implementation in ext4 may both return that there's
a hole at some offset although that offset already has data and skip
some holes during a search for the next hole. The first problem is
demostrated by:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file
wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec)
Whence	Result
HOLE	0

Where we can see that SEEK_HOLE wrongly returned offset 0 as containing
a hole although we have written data there. The second problem can be
demonstrated by:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
       -c "seek -h 0" file

wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec)
wrote 8192/8192 bytes at offset 131072
8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec)
Whence	Result
HOLE	139264

Where we can see that hole at offsets 56k..128k has been ignored by the
SEEK_HOLE call.

The underlying problem is in the ext4_find_unwritten_pgoff() which is
just buggy. In some cases it fails to update returned offset when it
finds a hole (when no pages are found or when the first found page has
higher index than expected), in some cases conditions for detecting hole
are just missing (we fail to detect a situation where indices of
returned pages are not contiguous).

Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page
indices and also handle all cases where we got less pages then expected
in one place and handle it properly there.

Fixes: c8c0df241c
CC: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-11-02 07:16:20 +01:00
Konstantin Khlebnikov 9c66c82c5f ext4: keep existing extra fields when inode expands
commit 887a9730614727c4fff7cb756711b190593fc1df upstream.

ext4_expand_extra_isize() should clear only space between old and new
size.

Fixes: 6dd4ee7cab # v2.6.23
Cc: stable@vger.kernel.org
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-11-02 07:16:17 +01:00
Jerry Lee c39dcbde38 ext4: fix overflow caused by missing cast in ext4_resize_fs()
commit aec51758ce10a9c847a62a48a168f8c804c6e053 upstream.

On a 32-bit platform, the value of n_blcoks_count may be wrong during
the file system is resized to size larger than 2^32 blocks.  This may
caused the superblock being corrupted with zero blocks count.

Fixes: 1c6bd7173d
Signed-off-by: Jerry Lee <jerrylee@qnap.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # 3.7+
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-11-01 22:12:44 +01:00
Jan Kara bdc5fbb365 ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
commit fcf5ea10992fbac3c7473a1db33d56a139333cd1 upstream.

ext4_find_unwritten_pgoff() does not properly handle a situation when
starting index is in the middle of a page and blocksize < pagesize. The
following command shows the bug on filesystem with 1k blocksize:

  xfs_io -f -c "falloc 0 4k" \
            -c "pwrite 1k 1k" \
            -c "pwrite 3k 1k" \
            -c "seek -a -r 0" foo

In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.

Fix the problem by neglecting buffers in a page before starting offset.

Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
CC: stable@vger.kernel.org # 3.8+
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-11-01 22:12:44 +01:00
Daeho Jeong cd0d925440 ext4: fix inode checksum calculation problem if i_extra_size is small
commit 05ac5aa18abd7db341e54df4ae2b4c98ea0e43b7 upstream.

We've fixed the race condition problem in calculating ext4 checksum
value in commit b47820edd163 ("ext4: avoid modifying checksum fields
directly during checksum veficationon"). However, by this change,
when calculating the checksum value of inode whose i_extra_size is
less than 4, we couldn't calculate the checksum value in a proper way.
This problem was found and reported by Nix, Thank you.

Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-20 08:02:37 +02:00
Theodore Ts'o 48a5889bfd ext4: return EROFS if device is r/o and journal replay is needed
commit 4753d8a24d4588657bc0a4cd66d4e282dff15c8c upstream.

If the file system requires journal recovery, and the device is
read-ony, return EROFS to the mount system call.  This allows xfstests
generic/050 to pass.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-20 08:02:36 +02:00
Theodore Ts'o 399562b677 ext4: preserve the needs_recovery flag when the journal is aborted
commit 97abd7d4b5d9c48ec15c425485f054e1c15e591b upstream.

If the journal is aborted, the needs_recovery feature flag should not
be removed.  Otherwise, it's the journal might not get replayed and
this could lead to more data getting lost.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-20 08:02:36 +02:00
Jan Kara 98f58e0523 ext4: trim allocation requests to group size
commit cd648b8a8fd5071d232242d5ee7ee3c0815776af upstream.

If filesystem groups are artifically small (using parameter -g to
mkfs.ext4), ext4_mb_normalize_request() can result in a request that is
larger than a block group. Trim the request size to not confuse
allocation code.

Reported-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-20 08:02:36 +02:00
Theodore Ts'o 77bd57e66e ext4: fix fencepost in s_first_meta_bg validation
commit 2ba3e6e8afc9b6188b471f27cf2b5e3cf34e7af2 upstream.

It is OK for s_first_meta_bg to be equal to the number of block group
descriptor blocks.  (It rarely happens, but it shouldn't cause any
problems.)

https://bugzilla.kernel.org/show_bug.cgi?id=194567

Fixes: 3a4b77cd47bb837b8557595ec7425f281f2ca1fe
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-20 08:02:36 +02:00
Eryu Guan 188b2ebb36 ext4: validate s_first_meta_bg at mount time
commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe upstream.

Ralf Spenneberg reported that he hit a kernel crash when mounting a
modified ext4 image. And it turns out that kernel crashed when
calculating fs overhead (ext4_calculate_overhead()), this is because
the image has very large s_first_meta_bg (debug code shows it's
842150400), and ext4 overruns the memory in count_overhead() when
setting bitmap buffer, which is PAGE_SIZE.

ext4_calculate_overhead():
  buf = get_zeroed_page(GFP_NOFS);  <=== PAGE_SIZE buffer
  blks = count_overhead(sb, i, buf);

count_overhead():
  for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400
          ext4_set_bit(EXT4_B2C(sbi, s++), buf);   <=== buffer overrun
          count++;
  }

This can be reproduced easily for me by this script:

  #!/bin/bash
  rm -f fs.img
  mkdir -p /mnt/ext4
  fallocate -l 16M fs.img
  mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img
  debugfs -w -R "ssv first_meta_bg 842150400" fs.img
  mount -o loop fs.img /mnt/ext4

Fix it by validating s_first_meta_bg first at mount time, and
refusing to mount if its value exceeds the largest possible meta_bg
number.

[js] use EXT4_HAS_INCOMPAT_FEATURE instead of new
     ext4_has_feature_meta_bg

Reported-by: Ralf Spenneberg <ralf@os-t.de>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-20 08:02:11 +02:00
Theodore Ts'o d61f4e22b4 ext4: add sanity checking to count_overhead()
commit c48ae41bafe31e9a66d8be2ced4e42a6b57fa814 upstream.

The commit "ext4: sanity check the block and cluster size at mount
time" should prevent any problems, but in case the superblock is
modified while the file system is mounted, add an extra safety check
to make sure we won't overrun the allocated buffer.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-19 20:28:15 +02:00
Theodore Ts'o bd652ad14e ext4: fix in-superblock mount options processing
commit 5aee0f8a3f42c94c5012f1673420aee96315925a upstream.

Fix a large number of problems with how we handle mount options in the
superblock.  For one, if the string in the superblock is long enough
that it is not null terminated, we could run off the end of the string
and try to interpret superblocks fields as characters.  It's unlikely
this will cause a security problem, but it could result in an invalid
parse.  Also, parse_options is destructive to the string, so in some
cases if there is a comma-separated string, it would be modified in
the superblock.  (Fortunately it only happens on file systems with a
1k block size.)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-19 20:28:15 +02:00
Theodore Ts'o 408d8245b8 ext4: use more strict checks for inodes_per_block on mount
commit cd6bb35bf7f6d7d922509bf50265383a0ceabe96 upstream.

Centralize the checks for inodes_per_block and be more strict to make
sure the inodes_per_block_group can't end up being zero.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-19 20:28:15 +02:00
Eric Biggers 3f102dc505 ext4: mark inode dirty after converting inline directory
commit b9cf625d6ecde0d372e23ae022feead72b4228a6 upstream.

If ext4_convert_inline_data() was called on a directory with inline
data, the filesystem was left in an inconsistent state (as considered by
e2fsck) because the file size was not increased to cover the new block.
This happened because the inode was not marked dirty after i_disksize
was updated.  Fix this by marking the inode dirty at the end of
ext4_finish_convert_inline_dir().

This bug was probably not noticed before because most users mark the
inode dirty afterwards for other reasons.  But if userspace executed
FS_IOC_SET_ENCRYPTION_POLICY with invalid parameters, as exercised by
'kvm-xfstests -c adv generic/396', then the inode was never marked dirty
after updating i_disksize.

Fixes: 3c47d54170
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:47:05 +02:00
Dan Carpenter e74af998c0 ext4: return -ENOMEM instead of success
commit 578620f451f836389424833f1454eeeb2ffc9e9f upstream.

We should set the error code if kzalloc() fails.

Fixes: 67cf5b09a4 ("ext4: add the basic function for inline data support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:46:49 +02:00
Darrick J. Wong b70876f0d7 ext4: reject inodes with negative size
commit 7e6e1ef48fc02f3ac5d0edecbb0c6087cd758d58 upstream.

Don't load an inode with a negative size; this causes integer overflow
problems in the VFS.

[ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT]

js: use EIO for 3.12 instead of EFSCORRUPTED.

Fixes: a48380f769 (ext4: rename i_dir_acl to i_size_high)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:46:49 +02:00
Chandan Rajendra ac3d8fba06 ext4: fix stack memory corruption with 64k block size
commit 30a9d7afe70ed6bd9191d3000e2ef1a34fb58493 upstream.

The number of 'counters' elements needed in 'struct sg' is
super_block->s_blocksize_bits + 2. Presently we have 16 'counters'
elements in the array. This is insufficient for block sizes >= 32k. In
such cases the memcpy operation performed in ext4_mb_seq_groups_show()
would cause stack memory corruption.

Fixes: c9de560ded
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:46:49 +02:00
Chandan Rajendra cafd2159d1 ext4: fix mballoc breakage with 64k block size
commit 69e43e8cc971a79dd1ee5d4343d8e63f82725123 upstream.

'border' variable is set to a value of 2 times the block size of the
underlying filesystem. With 64k block size, the resulting value won't
fit into a 16-bit variable. Hence this commit changes the data type of
'border' to 'unsigned int'.

Fixes: c9de560ded
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:46:49 +02:00
Jan Kara 9488a47761 ext4: fix data exposure after a crash
commit 06bd3c36a733ac27962fea7d6f47168841376824 upstream.

Huang has reported that in his powerfail testing he is seeing stale
block contents in some of recently allocated blocks although he mounts
ext4 in data=ordered mode. After some investigation I have found out
that indeed when delayed allocation is used, we don't add inode to
transaction's list of inodes needing flushing before commit. Originally
we were doing that but commit f3b59291a6 removed the logic with a
flawed argument that it is not needed.

The problem is that although for delayed allocated blocks we write their
contents immediately after allocating them, there is no guarantee that
the IO scheduler or device doesn't reorder things and thus transaction
allocating blocks and attaching them to inode can reach stable storage
before actual block contents. Actually whenever we attach freshly
allocated blocks to inode using a written extent, we should add inode to
transaction's ordered inode list to make sure we properly wait for block
contents to be written before committing the transaction. So that is
what we do in this patch. This also handles other cases where stale data
exposure was possible - like filling hole via mmap in
data=ordered,nodelalloc mode.

The only exception to the above rule are extending direct IO writes where
blkdev_direct_IO() waits for IO to complete before increasing i_size and
thus stale data exposure is not possible. For now we don't complicate
the code with optimizing this special case since the overhead is pretty
low. In case this is observed to be a performance problem we can always
handle it using a special flag to ext4_map_blocks().

Fixes: f3b59291a6
Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:46:48 +02:00
Jan Kara dd2421b5ed posix_acl: Clear SGID bit when setting file permissions
commit 073931017b49d9458aa351605b43a7e34598caef upstream.

When file permissions are modified via chmod(2) and the user is not in
the owning group or capable of CAP_FSETID, the setgid bit is cleared in
inode_change_ok().  Setting a POSIX ACL via setxattr(2) sets the file
permissions as well as the new ACL, but doesn't clear the setgid bit in
a similar way; this allows to bypass the check in chmod(2).  Fix that.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[wt: dropped hfsplus changes : no xattr in 3.10]
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:46:47 +02:00
Theodore Ts'o a7ee70f64f ext4: sanity check the block and cluster size at mount time
commit 8cdf3372fe8368f56315e66bea9f35053c418093 upstream.

If the block size or cluster size is insane, reject the mount.  This
is important for security reasons (although we shouldn't be just
depending on this check).

Ref: http://www.securityfocus.com/archive/1/539661
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:24 +01:00
Ross Zwisler 7a97321f5c ext4: allow DAX writeback for hole punch
commit cca32b7eeb4ea24fa6596650e06279ad9130af98 upstream.

Currently when doing a DAX hole punch with ext4 we fail to do a writeback.
This is because the logic around filemap_write_and_wait_range() in
ext4_punch_hole() only looks for dirty page cache pages in the radix tree,
not for dirty DAX exceptional entries.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:23 +01:00
Daeho Jeong 1326ba8707 ext4: reinforce check of i_dtime when clearing high fields of uid and gid
commit 93e3b4e6631d2a74a8cf7429138096862ff9f452 upstream.

Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid
of deleted and evicted inode to fix up interoperability with old
kernels. However, it checks only i_dtime of an inode to determine
whether the inode was deleted and evicted, and this is very risky,
because i_dtime can be used for the pointer maintaining orphan inode
list, too. We need to further check whether the i_dtime is being
used for the orphan inode list even if the i_dtime is not NULL.

We found that high 16-bit fields of uid/gid of inode are unintentionally
and permanently cleared when the inode truncation is just triggered,
but not finished, and the inode metadata, whose high uid/gid bits are
cleared, is written on disk, and the sudden power-off follows that
in order.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Hobin Woo <hobin.woo@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:23 +01:00
Konstantin Khlebnikov fa8a01a81a ext4: use __GFP_NOFAIL in ext4_free_blocks()
commit adb7ef600cc9d9d15ecc934cc26af5c1379777df upstream.

This might be unexpected but pages allocated for sbi->s_buddy_cache are
charged to current memory cgroup. So, GFP_NOFS allocation could fail if
current task has been killed by OOM or if current memory cgroup has no
free memory left. Block allocator cannot handle such failures here yet.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:23 +01:00
Daeho Jeong 3a45bbb2f9 ext4: avoid modifying checksum fields directly during checksum verification
commit b47820edd1634dc1208f9212b7ecfb4230610a23 upstream.

We temporally change checksum fields in buffers of some types of
metadata into '0' for verifying the checksum values. By doing this
without locking the buffer, some metadata's checksums, which are
being committed or written back to the storage, could be damaged.
In our test, several metadata blocks were found with damaged metadata
checksum value during recovery process. When we only verify the
checksum value, we have to avoid modifying checksum fields directly.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:23 +01:00
Theodore Ts'o 7d50661372 ext4: validate that metadata blocks do not overlap superblock
commit 829fa70dddadf9dd041d62b82cd7cea63943899d upstream.

A number of fuzzing failures seem to be caused by allocation bitmaps
or other metadata blocks being pointed at the superblock.

This can cause kernel BUG or WARNings once the superblock is
overwritten, so validate the group descriptor blocks to make sure this
doesn't happen.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:22 +01:00
Vegard Nossum e7dcdba7d6 ext4: fix reference counting bug on block allocation error
commit 554a5ccc4e4a20c5f3ec859de0842db4b4b9c77e upstream.

If we hit this error when mounted with errors=continue or
errors=remount-ro:

    EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata

then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
continue. However, ext4_mb_release_context() is the wrong thing to call
here since we are still actually using the allocation context.

Instead, just error out. We could retry the allocation, but there is a
possibility of getting stuck in an infinite loop instead, so this seems
safer.

[ Fixed up so we don't return EAGAIN to userspace. --tytso ]

Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
[wt: 3.10 doesn't have EFSCORRUPTED, but XFS uses EUCLEAN as does 3.14
     on this patch so use this instead]

Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:49 +02:00
Vegard Nossum bf86199bfa ext4: short-cut orphan cleanup on error
commit c65d5c6c81a1f27dec5f627f67840726fcd146de upstream.

If we encounter a filesystem error during orphan cleanup, we should stop.
Otherwise, we may end up in an infinite loop where the same inode is
processed again and again.

    EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
    EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
    Aborting journal on device loop0-8.
    EXT4-fs (loop0): Remounting filesystem read-only
    EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
    EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
    EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
    EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
    EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
    EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
    EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
    EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
    [...]
    EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
    [...]
    EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
    [...]

See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:49 +02:00
Vegard Nossum 8b6ab35cbc ext4: don't call ext4_should_journal_data() on the journal inode
commit 6a7fd522a7c94cdef0a3b08acf8e6702056e635c upstream.

If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
to call ext4_should_journal_data() before superblock options and flags
are fully set up.  In that case, the iput() on the journal inode can
end up causing a BUG().

Work around this problem by reordering the tests so we only call
ext4_should_journal_data() after we know it's not the journal inode.

Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data")
Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang")
Cc: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:49 +02:00
Vegard Nossum 6dc68acbb6 ext4: check for extents that wrap around
commit f70749ca42943faa4d4dcce46dfdcaadb1d0c4b6 upstream.

An extent with lblock = 4294967295 and len = 1 will pass the
ext4_valid_extent() test:

	ext4_lblk_t last = lblock + len - 1;

	if (len == 0 || lblock > last)
		return 0;

since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().

We can simplify it by removing the - 1 altogether and changing the test
to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
lblock and it fails, and if len > 0 then lblock + len > lblock in order
to pass (i.e. it doesn't overflow).

Fixes: 5946d0893 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Fixes: 2f974865f ("ext4: check for zero length extent explicitly")
Cc: Eryu Guan <guaneryu@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:49 +02:00