fs issues on software raid0 (PG_VERSION does not contain valid data) - Mailing list pgsql-hackers

From Tomas Vondra
Subject fs issues on software raid0 (PG_VERSION does not contain valid data)
Date
Msg-id 5623E419.5030109@2ndquadrant.com
Responses Re: fs issues on software raid0 (PG_VERSION does not contain valid data)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi there,

I've been doing a lot of filesystem testing / benchmarking recently, and
today I ran into a really strange issue with ext4 on two SSD devices
in a RAID-0 configuration (Linux software raid).

I'm currently re-running the test to see whether it's reproducible, but
maybe someone has an idea of what the problem might be.

The issue manifests like this:

   FATAL:  "base/12140" is not a valid data directory
   DETAIL:  File "base/12140/PG_VERSION" does not contain valid data.
   HINT:  You might need to initdb.

The paths are obviously nonsense. But it gets funnier - the database
continues to run seemingly just fine (doing checkpoints, serving
queries, ...), until this happens:

   ERROR:  index "pg_type_oid_index" contains unexpected zero page
           at block 3 at character 61

This happens when the benchmarking script runs vacuumdb:

   vacuumdb: query failed: ERROR:  index "pg_type_oid_index" contains
              unexpected zero page at block 3
   LINE 1: ...LECT datname FROM pg_database WHERE datallowconn ORDER BY 1
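To check which blocks of a relation file are entirely zeroed, the file can be scanned in block-sized chunks. This is a minimal sketch, assuming the default 8 kB BLCKSZ and a relation small enough to fit in a single segment file; the function names are hypothetical. The file for an index can be located with `SELECT pg_relation_filepath('pg_type_oid_index');` run in the affected database (the result is relative to the data directory). Ideally run it on a stopped cluster or a copy, since a running server may not have flushed every page yet.

```python
import sys
from pathlib import Path

BLCKSZ = 8192  # assumption: cluster built with the default 8 kB block size


def zero_block_numbers(data, blcksz=BLCKSZ):
    """Return the numbers of blocks in `data` that consist only of zero bytes."""
    return [
        offset // blcksz
        for offset in range(0, len(data), blcksz)
        if not any(data[offset:offset + blcksz])  # any() over bytes iterates ints
    ]


def scan_relation_file(path, blcksz=BLCKSZ):
    """Read one relation segment file and report its all-zero blocks.

    Only the given file is scanned; relations over 1 GB are split into
    additional segment files (path.1, path.2, ...) not handled here.
    """
    return zero_block_numbers(Path(path).read_bytes(), blcksz)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python zero_blocks.py $PGDATA/base/<dboid>/<relfilenode>
    for blkno in scan_relation_file(sys.argv[1]):
        print(f"block {blkno} is all zeros")
```

A block that reads back as all zeros, like block 3 of pg_type_oid_index here, suggests the write never reached the device, rather than a page that was written with bad contents.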

Attached are the PostgreSQL log for the whole benchmark run, a log
tracking the benchmark script (useful for mapping pg.log entries to the
steps of the benchmark), and a log with mdadm info.

The benchmark script initializes a new cluster and then does this:

   1) run on small dataset (scale=10)
      - pgbench init
      - vacuumdb
      - warmup
      - pgbench runs with various client counts (with explicit checkpoints)

   2) run on large data set (scale=1100)
      - ... same as for (1)

   3) run on medium data set (scale=140)
      - ... same as for (1)

(The data set sizes are for a machine with 8GB of RAM.)

Anyway, (1) completes without any errors, then during the warmup for
(2) the "not a valid data directory" errors start to pop up, and
finally, when (3) runs vacuumdb, it fails because of the zero page in
pg_type_oid_index.
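Roughly speaking, when a backend connects to a database the server reads `base/<dboid>/PG_VERSION` and expects a `major.minor` version string (`9.4` on 9.4.x); if it cannot parse two numbers from the file, it reports the "does not contain valid data" FATAL. That would explain why the running server keeps working while new connections fail. The sketch below is a simplified Python approximation of that parse, assuming the 9.x two-part format; the function names are hypothetical and this is not the server's actual code.

```python
import re
import sys
from pathlib import Path

# Approximation of the server-side parse: "<major>.<minor>" at the start of
# the file, optionally preceded by whitespace (e.g. b"9.4\n" for 9.4.x).
_VERSION_RE = re.compile(rb"\s*(\d+)\.(\d+)")


def parse_pg_version(data):
    """Return (major, minor) parsed from PG_VERSION contents, or None."""
    match = _VERSION_RE.match(data)
    if match is None:
        return None  # empty, zero-filled or otherwise garbled file
    return int(match.group(1)), int(match.group(2))


def invalid_pg_version_files(datadir):
    """List base/<oid>/PG_VERSION files whose contents do not parse."""
    return [
        path
        for path in sorted(Path(datadir).glob("base/*/PG_VERSION"))
        if parse_pg_version(path.read_bytes()) is None
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_pg_version.py $PGDATA
    for path in invalid_pg_version_files(sys.argv[1]):
        print(f'File "{path}" does not contain valid data.')
```

An empty or zero-filled PG_VERSION fails this check the same way a truncated write would.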

All this happens on an ext4 filesystem, created on a sw raid0 array managed
by kernel 4.0.4. The filesystem is created like this:

   mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1
   mkfs.ext4 -E stride=128,stripe-width=256 /dev/md0

and mounted like this:

   /dev/md0 on /mnt/data type ext4 (rw,noatime,nobarrier,discard)
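The `stride` and `stripe-width` values follow the usual ext4 RAID formula: stride is the RAID chunk size expressed in filesystem blocks, and stripe-width is stride times the number of data-bearing disks. The chunk size isn't stated above; assuming 4 KiB ext4 blocks, stride=128 corresponds to a 512 KiB chunk (mdadm's default in recent versions), and with two data disks in RAID-0 that gives a stripe width of 256. A small sketch of the arithmetic (function names hypothetical):

```python
def ext4_raid_options(chunk_kib, block_kib, data_disks):
    """Compute mkfs.ext4 stride/stripe-width for a striped array.

    stride       = RAID chunk size / filesystem block size (in fs blocks)
    stripe-width = stride * number of data-bearing disks
    """
    if chunk_kib % block_kib != 0:
        raise ValueError("chunk size must be a multiple of the filesystem block size")
    stride = chunk_kib // block_kib
    return stride, stride * data_disks


def mkfs_extended_opts(chunk_kib, block_kib, data_disks):
    """Format the values as an mkfs.ext4 -E argument."""
    stride, width = ext4_raid_options(chunk_kib, block_kib, data_disks)
    return f"-E stride={stride},stripe-width={width}"


if __name__ == "__main__":
    # Assumed geometry: 512 KiB md chunk, 4 KiB ext4 blocks, 2-disk RAID-0.
    print(mkfs_extended_opts(512, 4, 2))  # prints "-E stride=128,stripe-width=256"
```

Both disks hold data in RAID-0, so the multiplier is the full device count; for RAID-5 or RAID-6 it would be the device count minus the parity disks.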

Neither the array nor the filesystem is corrupted in any way, and
there are no signs of kernel issues in any of the logs (/var/log/messages
or dmesg, for example).

Also, I've done a number of tests with ext4 with exactly the same mount
options, but placed directly on a single device (thus not going through
the sw raid layers), and none of those had this issue.

So it seems to me that the sw raid somehow breaks the guarantees we
expect from ext4, or something like that. Another possibility is that
using two devices introduces some sort of race condition somewhere in
the stack. Or maybe nobarrier is not safe in this configuration; I
don't know.

Now, I don't really think people should use software raid where data
durability matters, but I'm not sure that's where the problem is.

I've found two threads that might be somewhat related:

1) http://www.postgresql.org/message-id/201002200230.16951.andres@anarazel.de

   - Same error message, but I don't see any conclusion except for
     "cannot happen" from Greg.

2) http://www.postgresql.org/message-id/48331F9F.9030508@demabg.com

   - Essentially talks about a failed RAID5 array, but that does not
     seem to be the case here (no RAID failures occurred).


BTW, this was done on PostgreSQL 9.4.x.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
