Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption - Mailing list pgsql-bugs

From TAKATSUKA Haruka
Subject Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption
Msg-id 20191220101952.1e07a9d6113896d3be1a31ea@sraoss.co.jp
In response to BUG #16172: failure of vacuum file truncation can cause permanent data corruption  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption  (TAKATSUKA Haruka <harukat@sraoss.co.jp>)
List pgsql-bugs
I also confirmed in testing that PostgreSQL built with the attached patch
avoids this data corruption. The patch just removes
DropRelFileNodeBuffers() from smgrtruncate().
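
To illustrate why the order of "drop buffers" and "truncate file" matters, here is a toy model in Python. It is only a sketch, not PostgreSQL code: ToyRelation, its methods, and the 120-rows-per-page figure are assumptions made for illustration. It models dirty pages that exist only in shared buffers, and a truncation that fails after those buffers may or may not have been dropped.

```python
# Toy model only: class and method names are illustrative, not PostgreSQL internals.
ROWS_PER_PAGE = 120  # assumed page capacity, chosen for illustration


class ToyRelation:
    def __init__(self, nrows):
        ids = list(range(1, nrows + 1))
        # "disk" is the state as of the last checkpoint; "buffers" holds newer dirty pages.
        self.disk = [ids[i:i + ROWS_PER_PAGE] for i in range(0, nrows, ROWS_PER_PAGE)]
        self.buffers = {}

    def read(self, blk):
        # A buffered copy wins over the on-disk copy.
        return self.buffers.get(blk, self.disk[blk])

    def delete_and_vacuum(self, max_id):
        # DELETE plus VACUUM collapsed into one step; only the buffered copies change.
        for blk in range(len(self.disk)):
            self.buffers[blk] = [i for i in self.read(blk) if i <= max_id]

    def truncate(self, nblocks, drop_buffers_first, fail):
        if drop_buffers_first:
            self._drop_buffers(nblocks)
        if fail:
            raise OSError("could not truncate file")
        del self.disk[nblocks:]
        self._drop_buffers(nblocks)

    def _drop_buffers(self, nblocks):
        for blk in [b for b in self.buffers if b >= nblocks]:
            del self.buffers[blk]

    def count(self):
        return sum(len(self.read(blk)) for blk in range(len(self.disk)))


def run(drop_buffers_first):
    rel = ToyRelation(10000)
    rel.delete_and_vacuum(50)
    try:
        rel.truncate(1, drop_buffers_first, fail=True)
    except OSError:
        pass  # the session only sees an ERROR and carries on
    return rel.count()


print(run(drop_buffers_first=True))   # 9930: deleted rows reappear from disk
print(run(drop_buffers_first=False))  # 50: buffered pages still hide them
```

Under the assumed page capacity, the first case gives the same count as the report quoted below; the real server has many more moving parts (WAL, visibility map, locking) that this sketch ignores.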


On Thu, 19 Dec 2019 07:14:42 +0000
PG Bug reporting form <noreply@postgresql.org> wrote:

> The following bug has been logged on the website:
> 
> Bug reference:      16172
> Logged by:          TAKATSUKA Haruka
> Email address:      harukat@sraoss.co.jp
> PostgreSQL version: 12.1
> Operating system:   Windows/Linux
> Description:        
> 
> Hello, pgsql hackers,
> 
> I found that failure of vacuum file truncation can cause permanent data
> corruption.
> The steps to reproduce it are given below.
> 
> In Windows installations, the truncation sometimes fails with a permission
> denied error because of anti-virus software. It has caused just an ERROR,
> and people have often dismissed it.
> 
> Truncation failure can also make the standby panic with the following
> messages when replaying Heap2/VISIBLE or Heap2/CLEAN, because the truncation
> WAL record is emitted even if the truncation does not actually complete on
> the primary.
> 
>  WARNING:  page .. of relation base/..../.... does not exist
>  CONTEXT:  WAL redo at ..... for ....: cutoff xid ... flags ...
>  PANIC:  WAL contains references to invalid pages
> 
> I think truncation failure should be handled at a more severe level.
> Any thoughts?
> 
> with best regards,
> Haruka Takatsuka / SRA OSS, Inc. Japan
> 
> 
> Reproduction steps (PG12)
> =========================
> 
> $ psql -U postgres -d db1
> Pager usage is off.
> psql (12.1)
> Type "help" for help.
> 
> db1=# 
> 
>   $ gdb -p {its backend process}
> 
>   (gdb) b FileTruncate
>   Breakpoint 1 at 0x73d320: file fd.c, line 2057.
>   (gdb) c
>   Continuing.
> 
> db1=# SHOW autovacuum;
>  autovacuum
> ------------
>  off
> (1 row)
> 
> db1=# CREATE TABLE t1 (id int primary key, v text);
> CREATE TABLE
> 
> db1=# INSERT INTO t1 SELECT g, md5(g::text) FROM generate_series(1, 10000) as g;
> INSERT 0 10000
> 
> db1=# CHECKPOINT;
> 
>   Program received signal SIGUSR1, User defined signal 1.
>   0x00000036caae91a3 in __epoll_wait_nocancel () from /lib64/libc.so.6
>   (gdb) c
>   Continuing.
> 
> CHECKPOINT
> 
> db1=# DELETE FROM t1 WHERE id > 50;
> DELETE 9950
> 
> db1=# VACUUM t1;
> 
>   Breakpoint 1, FileTruncate (file=59, offset=8192, wait_event_info=167772175)
>       at fd.c:2057
>   2057    {
>   (gdb) n
>   2065            returnCode = FileAccess(file);
>   (gdb) n
>   2066            if (returnCode < 0)
>   (gdb) p returnCode = -100
>   $6 = -100
>   (gdb) c
>   Continuing.
> 
> ERROR:  could not truncate file "base/16384/16645" to 1 blocks: Success
> 
> db1=# SELECT count(*) FROM t1;
>  count
> -------
>   9930
> (1 row)
> 
(snip)
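
The standby PANIC described in the quoted report can be sketched the same way. This is a toy replay loop, not PostgreSQL's recovery code: the record kinds, the tuple layout, and the block counts are hypothetical. It shows only that once a truncation record has been replayed, any later record pointing past the new end of the relation has nothing to apply to.

```python
# Toy model only: record kinds and fields are hypothetical, not PostgreSQL's WAL format.
class InvalidPageReference(Exception):
    pass


def replay(nblocks, wal):
    """Apply (kind, block) records to a relation that starts with nblocks blocks."""
    for kind, blk in wal:
        if kind == "truncate":
            nblocks = min(nblocks, blk)  # blk is the new length in blocks
        elif blk >= nblocks:
            raise InvalidPageReference(f"page {blk} does not exist")
    return nblocks


# The primary logged the truncation, failed to perform it, and a later
# operation touched block 5, which still exists there.
wal = [("truncate", 1), ("visible", 5)]

try:
    replay(84, wal)
except InvalidPageReference as exc:
    print("standby:", exc)  # standby: page 5 does not exist
```

The primary never notices the mismatch because its file was never shortened; only a server replaying the WAL sees the truncation as having happened.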

