Re: Relation bulk write facility - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Relation bulk write facility
Date
Msg-id 616dd326-c8a7-4ca0-84d5-c2f82f64de91@iki.fi
Whole thread Raw
In response to Re: Relation bulk write facility  (Noah Misch <noah@leadboat.com>)
Responses Re: Relation bulk write facility
List pgsql-hackers
Thanks for poking at this!

On 01/07/2024 23:52, Noah Misch wrote:
> Commit 8af2565 wrote:
>> --- /dev/null
>> +++ b/src/backend/storage/smgr/bulk_write.c
> 
>> +/*
>> + * Finish bulk write operation.
>> + *
>> + * This WAL-logs and flushes any remaining pending writes to disk, and fsyncs
>> + * the relation if needed.
>> + */
>> +void
>> +smgr_bulk_finish(BulkWriteState *bulkstate)
>> +{
>> +    /* WAL-log and flush any remaining pages */
>> +    smgr_bulk_flush(bulkstate);
>> +
>> +    /*
>> +     * When we wrote out the pages, we passed skipFsync=true to avoid the
>> +     * overhead of registering all the writes with the checkpointer.  Register
>> +     * the whole relation now.
>> +     *
>> +     * There is one hole in that idea: If a checkpoint occurred while we were
>> +     * writing the pages, it already missed fsyncing the pages we had written
>> +     * before the checkpoint started.  A crash later on would replay the WAL
>> +     * starting from the checkpoint, therefore it wouldn't replay our earlier
>> +     * WAL records.  So if a checkpoint started after the bulk write, fsync
>> +     * the files now.
>> +     */
>> +    if (!SmgrIsTemp(bulkstate->smgr))
>> +    {
> 
> Shouldn't this be "if (bulkstate->use_wal)"?  The GetRedoRecPtr()-based
> decision is irrelevant to the !wal case.  Either we don't need fsync at all
> (TEMP or UNLOGGED) or smgrDoPendingSyncs() will do it (wal_level=minimal).

The point of GetRedoRecPtr() is to detect if a checkpoint has started 
concurrently. It works for that purpose whether or not the bulk load is 
WAL-logged. It is not compared with the LSNs of WAL records written by 
the bulk load.

Unlogged tables do need to be fsync'd. The scenario is:

1. Bulk load an unlogged table.
2. Shut down Postgres cleanly
3. Pull power plug from server, and restart.

We talked about this earlier in the "Unlogged relation copy is not 
fsync'd" thread [1]. I had already forgotten about that; that bug 
actually still exists in back branches, and we should fix it..

[1] 
https://www.postgresql.org/message-id/flat/65e94fc8-ce1d-dd02-3be3-fda0fe8f2965%40iki.fi

> I don't see any functional problem, but this likely arranges for an
> unnecessary sync when a checkpoint starts between mdcreate() and
> here.  (The mdcreate() sync may also be unnecessary, but that's
> longstanding.)
Hmm, yes we might do two fsyncs() with wal_level=minimal, unnecessarily. 
It seems hard to eliminate the redundancy. smgr_bulk_finish() could skip 
the fsync, if it knew that smgrDoPendingSyncs() will do it later. 
However, smgrDoPendingSyncs() might also decide to WAL-log the relation 
instead of fsyncing it, and in that case we do still need the fsync.

Fortunately, fsync() on a file that's already flushed to disk is pretty 
cheap.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Relation bulk write facility
Next
From: Dean Rasheed
Date:
Subject: Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.