Thread: Performance issues during pg_restore -j with big partitioned table

Performance issues during pg_restore -j with big partitioned table

From
Dimitrios Apostolou
Date:
Hello list.

My database includes one table with 1000 partitions, all of them rather
sizeable. I run:

   pg_restore -j12 --no-tablespaces --disable-triggers --exit-on-error --no-owner --no-privileges -n public -d newdb
custom_format_dump.pgdump

Right now after 24h of restore, I notice weird behaviour, so I have
several questions about it:

+ 11 postgres backend processes are sleeping as "TRUNCATE TABLE waiting".
   I see that they are waiting to issue a TRUNCATE for one of the
   partitions and then COPY data to it.  Checking the log I see that
   several partitions have already been copied finished, but many more
   are left to start.

   Why is a TRUNCATE needed at the start of a partition's COPY phase? I
   didn't issue a --clean on the command line (I don't need it as my
   database is newly created), and I don't see a mention of related
   TRUNCATE in the pg_restore manual.

+ 1 postgres backend process is doing:

   ALTER TABLE the_master_partitioned_table
     ADD CONSTRAINT ...
       FOREIGN KEY (columnX) REFERENCES another_table(columnX)

   According to my logs this started right after COPY DATA for
   another_table was finished. And apparently it has a lock on
   the_master_partitioned_table that all other TRUNCATE have to wait for.

   Is this a bug in the dependency resolution? Wouldn't it make sense for
   this to wait until all 1000 partitions have finished their COPY DATA
   phase?

+ Investigating why the above ALTER TABLE takes so long, I notice that it
   is issuing a lot of writes to the WAL.  Digging deeper shows a lot of
   time spent in SetHintBits().  Is there a way to avoid that in a clean
   pg_restore?


Thanks in advance,
Dimitris




On 4/2/25 10:32 AM, Dimitrios Apostolou wrote:
> Hello list.
> 
> My database includes one table with 1000 partitions, all of them rather
> sizeable. I run:
> 
>    pg_restore -j12 --no-tablespaces --disable-triggers --exit-on-error 
> --no-owner --no-privileges -n public -d newdb custom_format_dump.pgdump
> 
> Right now after 24h of restore, I notice weird behaviour, so I have
> several questions about it:
> 
> + 11 postgres backend processes are sleeping as "TRUNCATE TABLE waiting".
>    I see that they are waiting to issue a TRUNCATE for one of the
>    partitions and then COPY data to it.  Checking the log I see that
>    several partitions have already been copied finished, but many more
>    are left to start.
> 
>    Why is a TRUNCATE needed at the start of a partition's COPY phase? I
>    didn't issue a --clean on the command line (I don't need it as my
>    database is newly created), and I don't see a mention of related
>    TRUNCATE in the pg_restore manual.

--clean will drop the object entirely not TRUNCATE.

I'm guessing that this is being done by you per:

https://www.postgresql.org/message-id/53760c70-4a87-a453-9e02-57abc9cb2e54%40gmx.net

"After each failed attempt, I need to issue a TRUNCATE table1,table2,...
before I try again. "

> 

> 
> Thanks in advance,
> Dimitris
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




On 4/2/25 10:39 AM, Adrian Klaver wrote:
> 

> --clean will drop the object entirely not TRUNCATE.
> 
> I'm guessing that this is being done by you per:
> 
> https://www.postgresql.org/message-id/53760c70-4a87-a453-9e02-57abc9cb2e54%40gmx.net
> 
> "After each failed attempt, I need to issue a TRUNCATE table1,table2,...
> before I try again. "

Oops, forgot to engage brain.

 From pg_backup_archiver.c:

* In parallel restore, if we created the table earlier in
* this run (so that we know it is empty) and we are not
* restoring a load-via-partition-root data item then we
* wrap the COPY in a transaction and precede it with a
* TRUNCATE.  If wal_level is set to minimal this prevents
* WAL-logging the COPY.  This obtains a speedup similar
* to that from using single_txn mode in non-parallel
* restores.
*
* We mustn't do this for load-via-partition-root cases
* because some data might get moved across partition
* boundaries, risking deadlock and/or loss of previously
* loaded data.  (We assume that all partitions of a
* partitioned table will be treated the same way.)

> 
>>
> 
>>
>> Thanks in advance,
>> Dimitris
>>
>>
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Performance issues during pg_restore -j with big partitioned table

From
Dimitrios Apostolou
Date:
On Wed, 2 Apr 2025, Adrian Klaver wrote:

>
>
> On 4/2/25 10:32 AM, Dimitrios Apostolou wrote:
>>  Hello list.
>>
>>  My database includes one table with 1000 partitions, all of them rather
>>  sizeable. I run:
>>
>>    pg_restore -j12 --no-tablespaces --disable-triggers --exit-on-error
>>  --no-owner --no-privileges -n public -d newdb custom_format_dump.pgdump
>>
>>  Right now after 24h of restore, I notice weird behaviour, so I have
>>  several questions about it:
>>
>>  + 11 postgres backend processes are sleeping as "TRUNCATE TABLE waiting".
>>     I see that they are waiting to issue a TRUNCATE for one of the
>>     partitions and then COPY data to it.  Checking the log I see that
>>     several partitions have already been copied finished, but many more
>>     are left to start.
>>
>>     Why is a TRUNCATE needed at the start of a partition's COPY phase? I
>>     didn't issue a --clean on the command line (I don't need it as my
>>     database is newly created), and I don't see a mention of related
>>     TRUNCATE in the pg_restore manual.
>
> --clean will drop the object entirely not TRUNCATE.
>
> I'm guessing that this is being done by you per:
>
> https://www.postgresql.org/message-id/53760c70-4a87-a453-9e02-57abc9cb2e54%40gmx.net
>
> "After each failed attempt, I need to issue a TRUNCATE table1,table2,...
> before I try again. "

Thanks Adrian. I'm now testing restore without --data-only. All I'm doing
prior to the above pg_restore command is "createdb -T template0 newdb".
It's possible though that I'm missing something here, the whole thing is
way more complicated than I expected...


Dimitris

Re: Performance issues during pg_restore -j with big partitioned table

From
Dimitrios Apostolou
Date:
On Wed, 2 Apr 2025, Adrian Klaver wrote:
>
>
> On 4/2/25 10:39 AM, Adrian Klaver wrote:
>>
>
>>  --clean will drop the object entirely not TRUNCATE.
>>
>>  I'm guessing that this is being done by you per:
>>
>>  https://www.postgresql.org/message-id/53760c70-4a87-a453-9e02-57abc9cb2e54%40gmx.net
>>
>>  "After each failed attempt, I need to issue a TRUNCATE table1,table2,...
>>  before I try again. "
>
> Oops, forgot to engage brain.
>
> From pg_backup_archiver.c:
>
> * In parallel restore, if we created the table earlier in
> * this run (so that we know it is empty) and we are not
> * restoring a load-via-partition-root data item then we
> * wrap the COPY in a transaction and precede it with a
> * TRUNCATE.  If wal_level is set to minimal this prevents
> * WAL-logging the COPY.  This obtains a speedup similar
> * to that from using single_txn mode in non-parallel
> * restores.

This makes sense. It creates the table earlier, and then truncates it just
before copying data into it. I wonder if this really circumvents the WAL
since I don't have --single-transaction  (incompatible to -j).


Dimitris



Re: Performance issues during pg_restore -j with big partitioned table

From
Dimitrios Apostolou
Date:
On Wed, 2 Apr 2025, Dimitrios Apostolou wrote:

> Hello list.
>
> My database includes one table with 1000 partitions, all of them rather

I was not clear here: my database dump has all that, and the database is
brand new and empty.

> sizeable. I run:
>
>  pg_restore -j12 --no-tablespaces --disable-triggers --exit-on-error
> --no-owner --no-privileges -n public -d newdb custom_format_dump.pgdump
>
> Right now after 24h of restore, I notice weird behaviour, so I have several
> questions about it:
>
> + 11 postgres backend processes are sleeping as "TRUNCATE TABLE waiting".
>   I see that they are waiting to issue a TRUNCATE for one of the
>   partitions and then COPY data to it.  Checking the log I see that
>   several partitions have already been copied finished, but many more
>   are left to start.
>
>   Why is a TRUNCATE needed at the start of a partition's COPY phase? I
>   didn't issue a --clean on the command line (I don't need it as my
>   database is newly created), and I don't see a mention of related
>   TRUNCATE in the pg_restore manual.
>
> + 1 postgres backend process is doing:
>
>   ALTER TABLE the_master_partitioned_table
>     ADD CONSTRAINT ...
>       FOREIGN KEY (columnX) REFERENCES another_table(columnX)
>
>   According to my logs this started right after COPY DATA for
>   another_table was finished. And apparently it has a lock on
>   the_master_partitioned_table that all other TRUNCATE have to wait for.
>
>   Is this a bug in the dependency resolution? Wouldn't it make sense for
>   this to wait until all 1000 partitions have finished their COPY DATA
>   phase?

Trying again, pg_restore exited with error after almost 24h:

pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 8904; 2606 16529 CONSTRAINT test_runs_raw test_runs_raw_partitioned_pkey
pg_restore: error: could not execute query: ERROR:  deadlock detected
DETAIL:  Process 465409 waits for AccessExclusiveLock on relation 44437 of database 44090; blocked by process 465408.
Process 465408 waits for AccessShareLock on relation 44383 of database 44090; blocked by process 465409.
HINT:  See server log for query details.

From the logs I see that:

+ Process 465409 waits for AccessExclusiveLock on relation 44437 of database 44090; blocked by process 465408.
   --> 44437 is test_runs_raw__part_max10120k (a single partition)
+ Process 465408 waits for AccessShareLock     on relation 44383 of database 44090; blocked by process 465409.
   --> 44383 is test_runs_raw  (the master partitioned table)

Process 465409:
   ALTER TABLE ONLY public.test_runs_raw
     ADD CONSTRAINT test_runs_raw_partitioned_pkey PRIMARY KEY (workitem_n, run_n);

Process 465408:
   COPY public.test_runs_raw__part_max10120k(...) FROM stdin;


Bug? This happened on a postgres compiled from last week's master branch.
The dump I'm trying to restore is from postgres 17.4.


Thanks
Dimitris




On 4/4/25 06:13, Dimitrios Apostolou wrote:
> On Wed, 2 Apr 2025, Dimitrios Apostolou wrote:
> 

> Bug? This happened on a postgres compiled from last week's master branch.

Are you talking about the dev version?

> The dump I'm trying to restore is from postgres 17.4.
> 
> 
> Thanks
> Dimitris
> 
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Performance issues during pg_restore -j with big partitioned table

From
Dimitrios Apostolou
Date:
On Fri, 4 Apr 2025, Adrian Klaver wrote:

> On 4/4/25 06:13, Dimitrios Apostolou wrote:
>>  On Wed, 2 Apr 2025, Dimitrios Apostolou wrote:
>>
>
>>  Bug? This happened on a postgres compiled from last week's master branch.
>
> Are you talking about the dev version?

In this thread, yes. My mistake I didn't mention it from the start. I
compiled it in order to see if the problems I mentioned in my previous
thread were fixed.


Dimitris




On 4/4/25 08:09, Dimitrios Apostolou wrote:
> On Fri, 4 Apr 2025, Adrian Klaver wrote:
> 
>> On 4/4/25 06:13, Dimitrios Apostolou wrote:
>>>  On Wed, 2 Apr 2025, Dimitrios Apostolou wrote:
>>>
>>
>>>  Bug? This happened on a postgres compiled from last week's master 
>>> branch.
>>
>> Are you talking about the dev version?
> 
> In this thread, yes. My mistake I didn't mention it from the start. I
> compiled it in order to see if the problems I mentioned in my previous
> thread were fixed.

Since it still has not established that an actual problem exists in 
previous versions, this is premature. Especially as dev has not even 
reached alpha AFAIK.

To get at a solution more detailed information is needed. I would also 
suggest testing using something less the 1000 large partitions.

Information needed for any given report:

1) Postgres version being dumped from.

2) The complete pg_dump command.

3) The Postgres version being restored to.

4) The complete pg_restore command.

5) The error messages.

Also have you tried a schema only dump/restore?

> 
> 
> Dimitris
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




On Wed, Apr 2, 2025 at 1:32 PM Dimitrios Apostolou <jimis@gmx.net> wrote:
Hello list.

My database includes one table with 1000 partitions, all of them rather
sizeable. I run:

   pg_restore -j12 --no-tablespaces --disable-triggers --exit-on-error --no-owner --no-privileges -n public -d newdb custom_format_dump.pgdump

Right now after 24h of restore, I notice weird behaviour, so I have
several questions about it:

+ 11 postgres backend processes are sleeping as "TRUNCATE TABLE waiting".
   I see that they are waiting to issue a TRUNCATE for one of the
   partitions and then COPY data to it.  Checking the log I see that
   several partitions have already been copied finished, but many more
   are left to start.

   Why is a TRUNCATE needed at the start of a partition's COPY phase? I
   didn't issue a --clean on the command line (I don't need it as my
   database is newly created), and I don't see a mention of related
   TRUNCATE in the pg_restore manual.

TRUNCATE statements inside of "toc.dat" files?  I'm skeptical.

Are you maybe doing something else in that database besides pg_restore?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

On 4/4/25 9:45 AM, Ron Johnson wrote:

> 
> TRUNCATE statements inside of "toc.dat" files?  I'm skeptical.

See my post here:

https://www.postgresql.org/message-id/7be2dcc6-3ba4-4e3f-a154-8d13d816aa9b%40aklaver.com



-- 
Adrian Klaver
adrian.klaver@aklaver.com