Directory fsync and other fun - Mailing list pgsql-hackers

From Andres Freund
Subject Directory fsync and other fun
Msg-id 201002200230.16951.andres@anarazel.de
List pgsql-hackers
Hi all,

I started setting up a half-automated method of simulating hard crashes,
and even while setting it up I found some pretty unsettling results...
Now it's not unlikely that my testing is flawed, but unfortunately I don't see
where right now (it's 3am and I have an 8h train ride behind me, so ...)

The simple test setup I have so far:
Server script:
* setup disk
* start pg
* wait to be killed
* setup disk
* start pg

Client side:
* CREATE DATABASE ... TEMPLATE crashtemplate
* CHECKPOINT
* make device readonly, not allowing any cache flushes or such (using
  devicemapper)
* kill server
* connect to database (some of the time it errors here)
* select * from $every_table (sometimes it errors here)
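
The client-side steps above could be scripted roughly as follows. This is only a sketch, not the attached scripts: `psql_cmd`, `client_steps` and `run_cycle` are made-up names, the maintenance database `postgres` is an assumption, and the device freeze, the server kill and the wait for restart are left as caller-supplied callables because the exact devicemapper commands are not given here. The final pass over every table is reduced to a single connection check.

```python
import subprocess
from typing import Callable, List


def psql_cmd(dbname: str, sql: str) -> List[str]:
    """Build a psql invocation that runs one statement and stops on error."""
    return ["psql", "-X", "-v", "ON_ERROR_STOP=1", "-d", dbname, "-c", sql]


def client_steps(dbname: str, template: str = "crashtemplate",
                 maintenance_db: str = "postgres") -> List[List[str]]:
    """Commands issued before the crash: create the database, then checkpoint."""
    return [
        psql_cmd(maintenance_db,
                 f'CREATE DATABASE "{dbname}" TEMPLATE "{template}"'),
        psql_cmd(maintenance_db, "CHECKPOINT"),
    ]


def run_cycle(dbname: str,
              freeze_device: Callable[[], None],     # hypothetical: make device readonly
              kill_server: Callable[[], None],       # hypothetical: kill the postmaster
              wait_for_restart: Callable[[], None],  # hypothetical: wait for server script
              run: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """One crash cycle in the order listed above."""
    for cmd in client_steps(dbname):
        run(cmd)
    freeze_device()
    kill_server()
    wait_for_restart()
    # Connecting at all is the first thing that failed in the tests.
    run(psql_cmd(dbname, "SELECT 1"))
```

Passing `run` in as a parameter makes it possible to dry-run the sequence (for example with `run=print`) before pointing it at a real device.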

At first pg survived that nicely without any problems. Then I came to my senses
and started adding some background I/O. Like:
dd if=/dev/zero of=/mnt/test/foobar bs=10M count=1000

That's where things started failing. All of the following logs are from after the crash:

1: 
FATAL:  could not read relation mapping file "base/140883/pg_filenode.map": Interrupted system call
DEBUG:  autovacuum: processing database "postgres"
FATAL:  could not read relation mapping file "base/140883/pg_filenode.map": Success
DEBUG:  autovacuum: processing database "postgres"
...
FATAL:  could not read relation mapping file "base/58963/pg_filenode.map": No such file or directory

2:
FATAL:  "base/165459" is not a valid data directory
DETAIL:  File "base/165459/PG_VERSION" does not contain valid data.
HINT:  You might need to initdb.

3:
You are now connected to database "test".
test=# SELECT execute('SELECT * FROM table_'||g.i) FROM generate_series(1, 3000) g(i);
ERROR:  XX001: could not read block 0 in file "base/124499/11652": read only 0 of 8192 bytes
LOCATION:  mdread, md.c:656
(that one I did not see with -o data=ordered,barrier=1,commit=300)
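
One possible reading of the "Success" suffix in log 1: in C, a read() that returns fewer bytes than requested is not an error and does not set errno, so a message built from the errno text can show "Success" or a stale value. Whether that is what happens in the server here is an assumption, not something verified in this message. The sketch below only illustrates the short-read case on a truncated file; `read_exact` and `ShortRead` are made-up names.

```python
import os


class ShortRead(Exception):
    """Raised when fewer bytes than expected were available."""

    def __init__(self, got: int, expected: int):
        super().__init__(f"read only {got} of {expected} bytes")
        self.got = got
        self.expected = expected


def read_exact(path: str, expected: int) -> bytes:
    """Read `expected` bytes from the start of `path` in a single read call."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # On a truncated or empty file this returns fewer bytes (possibly
        # none) without raising OSError: the OS does not report an error.
        data = os.read(fd, expected)
    finally:
        os.close(fd)
    if len(data) != expected:
        # Report the byte counts instead of an OS error string.
        raise ShortRead(len(data), expected)
    return data
```

The message in log 3 ("read only 0 of 8192 bytes") has the same shape as the one raised here: it reports the byte counts rather than an OS error string.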


I tried the following mount options/filesystems so far:
-t ext4 -o data=writeback,barrier=1,commit=300,noauto_da_alloc
-t ext4 -o data=writeback,barrier=1,commit=300
-t ext4 -o data=writeback,barrier=0,commit=300
-t ext4 -o data=ordered,barrier=0,commit=300,noauto_da_alloc
-t ext4 -o data=ordered,barrier=1,commit=300,noauto_da_alloc
-t ext4 -o data=ordered,barrier=1,commit=300

I tried the same with s/ext4/ext3/ and with commit=5. With the latter the
errors were much harder to reproduce (not that surprising), but they still
occurred.
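
For anyone wanting to rerun the whole set, the combinations can be enumerated mechanically. A small sketch, assuming that every listed option string was tried on both filesystems and with both commit intervals (the text above does not spell out the exact cross product); `mount_variants` and `mount_args` are made-up names, and whether ext3 accepts every option is not checked here.

```python
from typing import List, Sequence, Tuple

# The six option strings listed above, without the "-t ext4 -o" prefix.
BASE_OPTIONS = [
    "data=writeback,barrier=1,commit=300,noauto_da_alloc",
    "data=writeback,barrier=1,commit=300",
    "data=writeback,barrier=0,commit=300",
    "data=ordered,barrier=0,commit=300,noauto_da_alloc",
    "data=ordered,barrier=1,commit=300,noauto_da_alloc",
    "data=ordered,barrier=1,commit=300",
]


def mount_variants(options: Sequence[str] = BASE_OPTIONS,
                   filesystems: Sequence[str] = ("ext4", "ext3"),
                   commits: Sequence[int] = (300, 5)) -> List[Tuple[str, str]]:
    """Return (filesystem, option string) pairs for every combination."""
    variants = []
    for fs in filesystems:
        for opts in options:
            for commit in commits:
                variants.append(
                    (fs, opts.replace("commit=300", f"commit={commit}")))
    return variants


def mount_args(fs: str, opts: str) -> List[str]:
    """Render one variant as the arguments shown in the list above."""
    return ["-t", fs, "-o", opts]
```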

I attached my preliminary scripts/hacks... They even contain a comment or two. 
Note though that they are a bit of a loaded gun...

I guess it would be sensible to do some more extensive tests on a setup
like that... All I have tested so far is CREATE DATABASE :-(
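
Since the subject line mentions directory fsync: on POSIX systems, fsyncing a newly created file does not necessarily make its directory entry durable, so the usual pattern is to fsync the containing directory as well. The sketch below shows that pattern in isolation. It is not PostgreSQL code, `write_durably` is a made-up name, and whether a missing directory fsync explains the failures above is not established in this message.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write `data` to `path`, then fsync the file and its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush the file contents and inode
    finally:
        os.close(fd)

    # Flush the directory entry that names the file (POSIX; opening a
    # directory like this does not work on Windows).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```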

Andres

