Re: replay of CREATE TABLESPACE eats data at wal_level=minimal - Mailing list pgsql-hackers

From Noah Misch
Subject Re: replay of CREATE TABLESPACE eats data at wal_level=minimal
Date
Msg-id 20210810012307.GB2693@rfd.leadboat.com
In response to replay of CREATE TABLESPACE eats data at wal_level=minimal  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: replay of CREATE TABLESPACE eats data at wal_level=minimal  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Aug 09, 2021 at 01:08:42PM -0400, Robert Haas wrote:
> To reproduce, initialize a cluster with wal_level=minimal and
> max_wal_senders=0. Then from psql:
> 
> \! mkdir /tmp/goose
> 
> CHECKPOINT;
> CREATE TABLESPACE goose LOCATION '/tmp/goose';
> SET wal_skip_threshold=0;
> BEGIN;
> CREATE TABLE wild (a int, b text) TABLESPACE goose;
> INSERT INTO wild VALUES (1, 'chase');
> COMMIT;
> SELECT * FROM wild;
> 
> As expected, you will see one row in table 'wild'. Now perform an
> immediate shutdown. Restart the server. Table 'wild' is now empty.

Thanks for finding the problem.  It's a bad problem.

> The problem appears to be that tblspc_redo() calls
> create_tablespace_directories(), which says:
> 
>         /*
>          * Our theory for replaying a CREATE is to forcibly drop the target
>          * subdirectory if present, and then recreate it. This may be more
>          * work than needed, but it is simple to implement.
>          */
> 
> Unfortunately, this theory (which dates to
> c86f467d18aa58e18fd85b560b46d8de014e6017, vintage 2010, by Bruce) is
> correct only with wal_level>minimal. At wal_level='minimal', we can
> replay the record to recreate the relfilenode, but not the records
> that would have created the contents. However, note that if the table
> is smaller than wal_skip_threshold, then we'll log full-page images of
> the contents at commit time even at wal_level='minimal' after which we
> have no problem. As far as I can see, this bug has "always" existed,
> but before c6b92041d38512a4176ed76ad06f713d2e6c01a8 (2020, Noah) you
> would have needed a different test case. Specifically, you would have
> needed to use COPY to put the row in the table, and you would have
> needed to omit setting wal_skip_threshold since it didn't exist yet.

Agreed.

> I don't presently have a specific idea about how to fix this.

Can't recovery just not delete the directory, create it if it doesn't exist,
and be happy if it does exist?  Like the attached WIP.  If we think it's
possible for a crash during mkdir to leave a directory having the wrong
permissions, adding a chmod would be in order.
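
To make the idea concrete, here is a rough standalone sketch of "create it if
missing, accept it if present, optionally chmod".  This is not the attached
patch and not PostgreSQL code: ensure_directory() and its fix_mode flag are
hypothetical names for illustration, and a real fix would report failures
through ereport() rather than return codes.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (illustration only): make sure "path" exists as a
 * directory without removing anything that is already inside it.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
ensure_directory(const char *path, mode_t mode, bool fix_mode)
{
    struct stat st;

    if (mkdir(path, mode) == 0)
        return 0;               /* freshly created */

    if (errno != EEXIST)
        return -1;              /* real failure, e.g. missing parent */

    /* Something is already there: accept it only if it is a directory. */
    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISDIR(st.st_mode))
    {
        errno = ENOTDIR;
        return -1;
    }

    /* Optional: repair permissions a crash during mkdir may have left wrong. */
    if (fix_mode && chmod(path, mode) != 0)
        return -1;

    return 0;
}
```

The point of the sketch is only that an existing directory, and whatever
relation files it already holds, survives the call; replay would no longer
drop data that was never WAL-logged at wal_level=minimal.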

Attachment
