Re: OOM killer while pg_restore - Mailing list pgsql-performance

From Ranier Vilela
Subject Re: OOM killer while pg_restore
Date
Msg-id CAEudQArWN1DdndHCpLHqw-SDqFfu4M_fQas9Z1UtaE5oxR9ffQ@mail.gmail.com
Whole thread Raw
In response to Re: OOM killer while pg_restore  (Marc Rechté <marc4@rechte.fr>)
List pgsql-performance
Em qui., 3 de mar. de 2022 às 15:32, Marc Rechté <marc4@rechte.fr> escreveu:
Le 03/03/2022 à 16:31, Tom Lane a écrit :
> Marc Rechté <marc4@rechte.fr> writes:
>> We have a pg_restore which fails due to RAM over-consumption of the
>> corresponding PG backend, which ends-up with OOM killer.
>> The table has one PK, one index, and 3 FK constraints, active while
>> restoring.
>> The dump contains over 200M rows for that table and is in custom format,
>> which corresponds to 37 GB of total relation size in the original DB.
> The FKs would result in queueing row trigger events, which would occupy
> some memory.  But those should only need ~12 bytes per FK per row,
> which works out to less than 10GB for this number of rows, so it may
> be that you've hit something else that we would consider a leak.
>
> Does memory consumption hold steady if you drop the FK constraints?
>
> If not, as others have noted, we'd need more info to investigate
> this.  The leak is probably independent of the specific data in
> the table, so maybe you could make a small self-contained example
> using a script to generate dummy data.
>
>                       regards, tom lane
>
>
Actually the number of rows is 232735712.

Accordingly the RAM consumption would be x12 x3 = 7.8 GiB.

This is close to the 8,1g I reported earlier (actually it was closer to
7.8 GB, due to GiB vs. GB confusion).

So there is no memory leak.

It took 16 hours on my box to reach that RAM consumption, and then the
COPY failed when checking the first FK (as the referenced table was empty).

I dropped the FK, index, and 3 FK constraints and started over the
pg_restore:

11 minutes to load the table (I did not have time to note RAM consumption)

I then created the PK and index:

24 minutes

For FK, I don't know because the referenced table are empty (but I'll be
able to test next week, if deemed necessary).

16 hours vs. 35 minutes to reach the same state.
Maybe it's out of reach, but one way to help Postgres developers fix this
is to provide Flame Graphs [1] based on these slow operations.
For confidentiality and privacy reasons, the data is out of reach.

My 2c here.

regards,
Ranier Vilela

pgsql-performance by date:

Previous
From: Marc Rechté
Date:
Subject: Re: OOM killer while pg_restore
Next
From: Luiz Felipph
Date:
Subject: Optimal configuration for server