Re: Relation bulk write facility - Mailing list pgsql-hackers
| From | Noah Misch |
|---|---|
| Subject | Re: Relation bulk write facility |
| Date | |
| Msg-id | 20240701205250.62.nmisch@google.com Whole thread Raw |
| In response to | Re: Relation bulk write facility (Heikki Linnakangas <hlinnaka@iki.fi>) |
| Responses |
Re: Relation bulk write facility
|
| List | pgsql-hackers |
On Fri, Feb 23, 2024 at 04:27:34PM +0200, Heikki Linnakangas wrote:
> Committed this. Thanks everyone!
Commit 8af2565 wrote:
> --- /dev/null
> +++ b/src/backend/storage/smgr/bulk_write.c
> +/*
> + * Finish bulk write operation.
> + *
> + * This WAL-logs and flushes any remaining pending writes to disk, and fsyncs
> + * the relation if needed.
> + */
> +void
> +smgr_bulk_finish(BulkWriteState *bulkstate)
> +{
> + /* WAL-log and flush any remaining pages */
> + smgr_bulk_flush(bulkstate);
> +
> + /*
> + * When we wrote out the pages, we passed skipFsync=true to avoid the
> + * overhead of registering all the writes with the checkpointer. Register
> + * the whole relation now.
> + *
> + * There is one hole in that idea: If a checkpoint occurred while we were
> + * writing the pages, it already missed fsyncing the pages we had written
> + * before the checkpoint started. A crash later on would replay the WAL
> + * starting from the checkpoint, therefore it wouldn't replay our earlier
> + * WAL records. So if a checkpoint started after the bulk write, fsync
> + * the files now.
> + */
> + if (!SmgrIsTemp(bulkstate->smgr))
> + {
Shouldn't this be "if (bulkstate->use_wal)"? The GetRedoRecPtr()-based
decision is irrelevant to the !wal case. Either we don't need fsync at all
(TEMP or UNLOGGED) or smgrDoPendingSyncs() will do it (wal_level=minimal). I
don't see any functional problem, but this likely arranges for an unnecessary
sync when a checkpoint starts between mdcreate() and here. (The mdcreate()
sync may also be unnecessary, but that's longstanding.)
> + /*
> + * Prevent a checkpoint from starting between the GetRedoRecPtr() and
> + * smgrregistersync() calls.
> + */
> + Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
> + MyProc->delayChkptFlags |= DELAY_CHKPT_START;
> +
> + if (bulkstate->start_RedoRecPtr != GetRedoRecPtr())
> + {
> + /*
> + * A checkpoint occurred and it didn't know about our writes, so
> + * fsync() the relation ourselves.
> + */
> + MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
> + smgrimmedsync(bulkstate->smgr, bulkstate->forknum);
> + elog(DEBUG1, "flushed relation because a checkpoint occurred concurrently");
> + }
> + else
> + {
> + smgrregistersync(bulkstate->smgr, bulkstate->forknum);
> + MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
> + }
> + }
> +}
This is an elegant optimization.
pgsql-hackers by date: