Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations - Mailing list pgsql-hackers

From: Sébastien
Subject: Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Date:
Msg-id: CANtq+vSy1VY9hDo+epGHX4kz_Ot0QFb293mV03smSLysZ0t8mQ@mail.gmail.com
In response to: Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
List: pgsql-hackers
Thank you for your answer, Tom.
I'd like to have this option only for the initial loading of huge amounts of data, where atomicity and consistency are not needed at all. Maybe an option on the startup command, just for an initial import mode, would be nice.

I had huge problems on the server three weeks after a 6 TB migration from another DB. I think it's a shame to rewrite all the data twice.

On Thu, Feb 13, 2025 at 16:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Sébastien <bokanist@gmail.com> writes:
> Implementation details:

>    - A new INSERT FROZEN option could be introduced, similar to COPY FREEZE,
>    allowing direct insertion of tuples in a frozen state.
>    - This would likely require changes in heap storage logic to ensure
>    tuples are written with a frozen XID at insert time.
>    - Consideration should be given to transaction semantics and WAL logging
>    to ensure consistency and crash recovery integrity.

That last is exactly why this won't happen.  A frozen tuple would be
considered committed and visible the instant it appears in the table,
thus completely breaking both atomicity and integrity of the
transaction.

There has been work going on recently to reduce the impact of freezing
massive amounts of data by spreading the work more effectively [1].
I don't say that that particular commit has completely solved the
problem, but I think that continued effort in that direction is more
likely to yield usable results than what you're suggesting.

BTW, this might or might not be usable in your particular workflow,
but: there have long been some optimizations for data load into a
table created in the same transaction.  The idea there is that if the
transaction rolls back, the table will never have been visible to any
other transaction at all, so that maintaining atomicity/integrity of
its contents is moot.
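
For illustration, the existing COPY ... FREEZE path relies on exactly that pattern: the target table must have been created (or truncated) in the same transaction as the load, so if the transaction aborts the table itself vanishes and the already-frozen rows can never be seen. A minimal sketch (the table name, columns, and file path are made up for the example):

    BEGIN;
    -- The table must be created (or truncated) in this same transaction
    -- for COPY ... FREEZE to be accepted.
    CREATE TABLE cold_import (id bigint, payload text);
    -- Rows are written already frozen, avoiding a later freeze rewrite;
    -- on rollback the whole table disappears, so atomicity is preserved.
    COPY cold_import FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE true);
    COMMIT;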

                        regards, tom lane

[1] https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=052026c9b


--
Sébastien Caunes
+33 6 7 229 229 7
