Fix checkpointer PANIC due to missing fsync cancel in mdunlinkfork() - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject Fix checkpointer PANIC due to missing fsync cancel in mdunlinkfork()
Date
Msg-id CAHg+QDdH+uBc0deP7uPE5rBkn0EjdkVC2_8ykK3KV90V4z6_5Q@mail.gmail.com
Whole thread Raw
List pgsql-hackers
Hi hackers,

I think I found a bug in mdunlinkfork() that causes the checkpointer 
to PANIC with "could not fsync file ... No such file or directory" when 
a relation is moved between tablespaces and the old tablespace is dropped.

mdunlinkfork() has two code paths for unlinking a relation segment file:
(1) The 'if' branch (isRedo, binary upgrade, non-MAIN forknum, or temp
relations) correctly calls register_forget_request() to cancel any pending 
fsync before unlinking. (2) The 'else' branch that handles normal 
MAIN_FORKNUM unlink is missing register_forget_request(). 
This leaves a stale SYNC_REQUEST entry in the checkpointer's 
pendingOps hash table.

The comment in ProcessSyncRequests() states below:

/*
* The fsync table could contain requests to fsync segments that
* have been deleted (unlinked) by the time we get to them. Rather
* than just hoping an ENOENT (or EACCES on Windows) error can be
* ignored, what we do on error is absorb pending requests and
* then retry. Since mdunlink() queues a "cancel" message before
* actually unlinking, the fsync request is guaranteed to be
* marked canceled after the absorb if it really was this case.
* DROP DATABASE likewise has to tell us to forget fsync requests
* before it starts deletions.
*/

but this guarantee is not provided by the else branch.

Repro:
Reproducing this is not easy as it is a race but the test below
when repeated sufficient number of times you can see ocasionally
checkpointer PANIC. Though I used TABLESPACE for somewhat
consistent repro, this race exists for operations like DROP table,
TRUNCATE table as well I believe. 

CREATE TABLESPACE ts LOCATION '/tmp/ts';
CREATE TABLE t (id int, pad text) TABLESPACE ts;
INSERT INTO t SELECT g, repeat('x', 500) FROM generate_series(1,50000) g;

-- Concurrent updates to dirty shared buffers (background sessions)
UPDATE t SET pad = repeat('y', 500) WHERE id <= 25000;

-- Move table away; FlushRelationBuffers sends SYNC_REQUESTs for old path
ALTER TABLE t SET TABLESPACE pg_default;

-- Drop tablespace forces checkpoint + removes directory
DROP TABLESPACE ts;

-- Next checkpoint hits stale entry -> PANIC
CHECKPOINT;

PANIC:  could not fsync file "pg_tblspc/41343/PG_19_202604061/5/41348": No such file or directory
LOG:  checkpointer process (PID 166749) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing

The fix is to add register_forget_request() in the else branch of
mdunlinkfork(), before register_unlink_segment(), matching what the
'if' branch already does.

Thoughts?

Thanks,
Satya
Attachment

pgsql-hackers by date:

Previous
From: Chengpeng Yan
Date:
Subject: Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types
Next
From: Chao Li
Date:
Subject: Re: Bug: Rule actions see wrong values for generated columns (NEW.gen reads OLD value)