Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations - Mailing list pgsql-hackers

From: wenhui qiu
Subject: Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Msg-id: CAGjGUAJNCbKz28PcYcpwVqS3SMFjR+8-LCCJz0K=XsMQkE+weg@mail.gmail.com
In response to: Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations (Andres Freund <andres@anarazel.de>)
Responses: Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
List: pgsql-hackers
Hi Sébastien,

You can check out the thread with the subject "Trigger more frequent autovacuums of heavy insert tables"; I think it can alleviate the problem. A sketch of the relevant per-table settings follows.
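For example, the insert-driven autovacuum behaviour can be tuned per table since PostgreSQL 13. This is a minimal sketch, assuming a hypothetical table name, and the values are illustrative rather than recommendations:

    -- Make autovacuum visit an insert-mostly table sooner, so freezing
    -- work is spread out instead of arriving as one large vacuum storm.
    -- Table name and values are made-up examples.
    ALTER TABLE big_import_table SET (
        autovacuum_vacuum_insert_threshold = 100000,
        autovacuum_vacuum_insert_scale_factor = 0.05
    );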
Thanks
On Sat, Feb 15, 2025 at 3:13 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2025-02-13 10:52:31 +0100, Sébastien wrote:
> Introduce an INSERT FROZEN feature to bypass vacuum processing for
> large-scale cold data imports, reducing the impact on system performance
> post-import, for large imports, migrations, and major version upgrades.
> Business Use-case:
>
> When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a heavily
> loaded server, we observed that the system struggled significantly weeks
> later when the autovacuum process had to freeze all the imported data
> pages. This led to severe performance degradation, requiring manual
> intervention to prioritize vacuum jobs to complete them as quickly as
> possible.
What version of postgres was this? What batch sizes do you need to support?
I.e. is all of this data inserted at once, or in steps?
As already discussed, it seems unlikely that we'll ever support INSERT FROZEN,
due to the potential for causing concurrent queries to give bogus answers. But
there's actually a lot we can do to improve this short of INSERT FROZEN.
The reason I asked for the version is that the behaviour would likely be worse
before autovacuum_vacuum_insert_scale_factor existed, for example. We are
working on improvements around that in 18 too, ensuring that the gap between
insert-triggered vacuums does not grow forever.
Several recent releases have also improved the situation around this in other
ways, e.g. by just making vacuuming faster and by avoiding redundant work in
more cases (by increasing relfrozenxid more aggressively).
We've also been talking about performing freezing during e.g. checkpoints, if
possible.
If you're inserting all the data in a single transaction, however, it'll be
hard to improve most of this, because while that long-running transaction runs,
we can't do anything that needs to know the transaction has finished. OTOH,
if it were a single transaction, you could already use COPY FREEZE.
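For reference, a minimal sketch of the COPY FREEZE pattern; the table name and file path are placeholders, and FREEZE is only accepted because the table is truncated in the same transaction:

    BEGIN;
    -- FREEZE requires the table to have been created or truncated in
    -- the current (sub)transaction, so no other session can see the
    -- rows before commit.
    TRUNCATE big_import_table;
    COPY big_import_table FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE true);
    COMMIT;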
A somewhat related issue is that bulk INSERTs, in contrast to COPY, currently
do not use the bulk-insert logic, leading the INSERT to emit a lot more WAL
than inserting the same data via COPY.
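To illustrate the two load paths (the table and columns here are hypothetical):

    -- Multi-row INSERT: per the above, this currently does not go
    -- through the bulk-insert logic, so it emits more WAL for the
    -- same data.
    INSERT INTO big_import_table (id, payload)
    VALUES (1, 'a'), (2, 'b'), (3, 'c');

    -- COPY: uses the bulk-insert path; preferred for large loads.
    COPY big_import_table (id, payload) FROM STDIN WITH (FORMAT csv);
    4,d
    5,e
    \.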
> This issue is particularly critical during database *migrations* or *version
> upgrades*, where a full data reload is often necessary. Each time a major
> PostgreSQL upgrade occurs, users must reimport large datasets, leading to
> the same problem of vacuum storms post-import. An INSERT FROZEN feature
> would allow importing data that is known to be immutable, preventing
> unnecessary vacuum overhead and reducing system strain.
What are you using for such upgrades or migrations? I'd not expect INSERT to
be used, due to the overhead it has compared to COPY.
Greetings,
Andres Freund