Thread: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
[Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
One-line Summary:
Introduce an INSERT FROZEN feature to bypass vacuum processing for large-scale cold data imports, reducing the impact on system performance after large imports, migrations, and major version upgrades.
Business Use-case:
When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a heavily loaded server, we observed that the system struggled significantly weeks later when the autovacuum process had to freeze all the imported data pages. This led to severe performance degradation, requiring manual intervention to prioritize vacuum jobs to complete them as quickly as possible.
This issue is particularly critical during database migrations or version upgrades, where a full data reload is often necessary. Each time a major PostgreSQL upgrade occurs, users must reimport large datasets, leading to the same problem of vacuum storms post-import. An INSERT FROZEN feature would allow importing data that is known to be immutable, preventing unnecessary vacuum overhead and reducing system strain.
User impact with the change:
- Users importing large, cold datasets (initial loads, migrations, version upgrades) can mark them as "frozen" during insertion, so pages are directly marked as frozen.
- Reduced risk of autovacuum storms weeks after large imports.
- More predictable system performance post-import and post-upgrade.
- Avoids unnecessarily rewriting all pages after the import.
- Significant improvement for users who perform regular major version upgrades and need to reload data.
Implementation details:
- A new INSERT FROZEN option could be introduced, similar to COPY FREEZE, allowing direct insertion of tuples in a frozen state (a sketch of possible syntax follows below the list).
- This would likely require changes in heap storage logic to ensure tuples are written with a frozen XID at insert time.
- Consideration should be given to transaction semantics and WAL logging to ensure consistency and crash recovery integrity.
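A rough sketch of what the requested syntax might look like (purely hypothetical; no such option exists in PostgreSQL today, and the table names are placeholders):

    -- Hypothetical, by analogy with COPY ... FREEZE (not implemented):
    INSERT FROZEN INTO my_big_table
    SELECT * FROM staging_table;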
Estimated Development Time:
Unknown (would require input from core developers to assess complexity). But I think it is not that much and would be pretty straightforward to implement experimentally.
Opportunity Window Period:
...
Budget Money:
...
Contact Information:
Sébastien Caunes bokanist@gmail.com
Thank you for your attention.
+33 6 7 229 229 7
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Sébastien <bokanist@gmail.com> writes:
> Implementation details:
> - A new INSERT FROZEN option could be introduced, similar to COPY FREEZE,
>   allowing direct insertion of tuples in a frozen state.
> - This would likely require changes in heap storage logic to ensure
>   tuples are written with a frozen XID at insert time.
> - Consideration should be given to transaction semantics and WAL logging
>   to ensure consistency and crash recovery integrity.

That last is exactly why this won't happen. A frozen tuple would be
considered committed and visible the instant it appears in the table,
thus completely breaking both atomicity and integrity of the
transaction.

There has been work going on recently to reduce the impact of freezing
massive amounts of data by spreading the work more effectively [1].
I don't say that that particular commit has completely solved the
problem, but I think that continued effort in that direction is more
likely to yield usable results than what you're suggesting.

BTW, this might or might not be usable in your particular workflow,
but: there have long been some optimizations for data load into a
table created in the same transaction. The idea there is that if the
transaction rolls back, the table will never have been visible to any
other transaction at all, so that maintaining atomicity/integrity of
its contents is moot.

regards, tom lane

[1] https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=052026c9b
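For illustration, the same-transaction load path mentioned above can already be combined with COPY's FREEZE option (a minimal sketch; table and file names are placeholders, and a server-side COPY FROM a file needs the appropriate privileges, otherwise psql's \copy can be used instead):

    BEGIN;
    CREATE TABLE measurements_import (LIKE measurements INCLUDING ALL);
    -- FREEZE is allowed because the table was created in this transaction;
    -- if the transaction aborts, the table was never visible to anyone else.
    COPY measurements_import FROM '/data/measurements.csv' WITH (FORMAT csv, FREEZE);
    COMMIT;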
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
I'd like to have this option only for the initial loading of huge amounts of data, where no atomicity or consistency is needed at all. Maybe an option on the startup command, just for an initial import mode, would be nice.
I had huge problems on a server 3 weeks after a 6 TB migration from another DB. I think it's sad to rewrite all the data twice.
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Hi,
On 2025-02-13 10:52:31 +0100, Sébastien wrote:
> Introduce an INSERT FROZEN feature to bypass vacuum processing for
> large-scale cold data imports, reducing the impact on system performance
> post-import. For large imports, migrations and major version upgrades.
> Business Use-case:
>
> When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a heavily
> loaded server, we observed that the system struggled significantly weeks
> later when the autovacuum process had to freeze all the imported data
> pages. This led to severe performance degradation, requiring manual
> intervention to prioritize vacuum jobs to complete them as quickly as
> possible.
What version of postgres was this? What batch sizes do you need to support?
I.e. is all of this data inserted at once, or in steps?
As already discussed, it seems unlikely that we'll ever support INSERT FROZEN,
due to the potential of causing concurrent queries to give bogus answers. But
there's actually a lot we can do to improve this short of INSERT FROZEN.
The reason I asked for the version is that the behaviour would e.g. likely be
worse before autovacuum_vacuum_insert_scale_factor existed. We are working on
improvements around that in 18 too, ensuring that the gap between insert
triggered vacuums does not grow forever.
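(For reference, these insert-vacuum thresholds can also be tuned per table so that insert-triggered vacuums fire sooner on large append-only tables; a sketch with illustrative values, assuming a table named measurements:)

    ALTER TABLE measurements
      SET (autovacuum_vacuum_insert_threshold = 100000,
           autovacuum_vacuum_insert_scale_factor = 0.01);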
Several recent releases have also improved the situation around this in other
ways, e.g. by just making vacuuming faster and by avoiding doing redundant
work in more cases (by increasing relfrozenxid more aggressively).
We've also been talking about performing freezing during e.g. checkpoints, if
possible.
If you're inserting all the data in a single transaction however, it'll be
hard to improve most of this, because while that long-running transaction runs,
we can't do anything that needs to know the transaction has finished. OTOH,
if it were a single transaction, you could already use COPY FREEZE.
A somewhat related issue is that bulk INSERTs, in contrast to COPY, currently
do not use the bulk-insert logic, leading the INSERT to emit a lot more WAL
than loading the same data via COPY.
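(For reference, one way to roughly compare the WAL volume of the two load paths from psql, on an otherwise idle system; the table and file names are illustrative:)

    -- Record the current WAL position, run one load method, then measure the delta.
    SELECT pg_current_wal_lsn() AS wal_start \gset
    \copy load_target FROM '/data/rows.csv' WITH (FORMAT csv)
    SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), :'wal_start') AS wal_bytes;
    -- Repeat with the equivalent multi-row INSERT (or INSERT ... SELECT)
    -- and compare the two wal_bytes figures.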
> This issue is particularly critical during database *migrations* or *version
> upgrades*, where a full data reload is often necessary. Each time a major
> PostgreSQL upgrade occurs, users must reimport large datasets, leading to
> the same problem of vacuum storms post-import. An INSERT FROZEN feature
> would allow importing data that is known to be immutable, preventing
> unnecessary vacuum overhead and reducing system strain.
What are you using for such upgrades or migrations? I'd not expect INSERT to
be used, due to the overhead that it has compared to COPY.
Greetings,
Andres Freund
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Hello wenhui,

It's not a problem with heavy-insert tables but with heavy deletes. Also, triggering more frequent autovacuums will not help, because autovacuum does not remove recently dead tuples while a large and slow vacuum freeze operation older than their delete is still running in parallel. The solution was to increase the priority and speed of the vacuum freeze operation.
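(For reference, the cost-based throttling that slows autovacuum's freeze pass can be relaxed per table, or the freeze can be run by hand; a sketch with illustrative values, assuming a table named my_table:)

    -- Let autovacuum process this table at full speed
    ALTER TABLE my_table
      SET (autovacuum_vacuum_cost_delay = 0, autovacuum_vacuum_cost_limit = 10000);
    -- Or run the freeze manually; manual VACUUM is not cost-throttled by default
    VACUUM (FREEZE, VERBOSE) my_table;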
Anyway, there should be a way to insert frozen data other than COPY, which does not work with foreign tables (INSERT INTO my_table SELECT * FROM foreign_table); a possible workaround is sketched after the quoted message below.

On Mon, Feb 17, 2025 at 9:46 AM, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
> Hi Sébastien,
> You can check out the email with the subject "Trigger more frequent autovacuums of heavy insert tables"; I think it can alleviate the problem.
> Thanks
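Regarding the foreign-table limitation above: one possible workaround (a sketch, assuming my_table matches the foreign table's column layout and using psql) is to stage the foreign data locally and reload it with COPY ... FREEZE:

    -- Export the foreign table's rows to a local file through the FDW
    \copy (SELECT * FROM foreign_table) TO '/tmp/ft.csv' WITH (FORMAT csv)
    -- Reload frozen; FREEZE requires that the target table was created or
    -- truncated in the same transaction
    BEGIN;
    TRUNCATE my_table;
    \copy my_table FROM '/tmp/ft.csv' WITH (FORMAT csv, FREEZE)
    COMMIT;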
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Hello Sébastien,

This case can be solved by xid64, but it seems like very few people are interested. It seems to me that xid64 should be implemented as soon as possible.

Thanks
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Sorry, it won't work. It just delays the problem. The freeze procedure must still rewrite all the pages.
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
On Tue, Feb 18, 2025 at 9:17 AM Sébastien <bokanist@gmail.com> wrote:
> Sorry, it won't work. It just delays the problem. The freeze procedure must still rewrite all the pages.

Actually, a 64-bit transaction ID allows for quite a "delay" - like hundreds of millions of years at your current rate. :)

(Yes, there are other reasons to vacuum, and other limits and problems would arise. You'd have 99 problems, but a vacuum freeze ain't one.)

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
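For scale, a back-of-the-envelope figure (assuming roughly 1,000 write transactions per second): 2^64 is about 1.8 x 10^19 XIDs, and 1.8 x 10^19 / 1,000 per second / ~3.15 x 10^7 seconds per year is roughly 5.8 x 10^8, i.e. several hundred million years before a 64-bit XID space would be exhausted.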
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
OK, you mean that xid64 would remove the need for freezing... that's one way of looking at it.

On Tue, Feb 18, 2025 at 3:57 PM, Greg Sabino Mullane <htamfids@gmail.com> wrote:
> Actually, a 64-bit transaction ID allows for quite a "delay" - like hundreds of millions of years at your current rate. :)

--
Sébastien Caunes
+33 6 7 229 229 7