Thread: Making pg_dump and pg_restore parallel when processing a limited number of LOs

Hi team,

Hope you are all doing well.

Recently I encountered a scenario where a user needs to dump/restore a database with 1k ~ 2k large objects. Parallel mode with 10 processes was used for both the dump and the restore, but in both cases it seems only 1 process was actually working on the LOs.

Looking into this further, it seems that starting with PostgreSQL 17 the function getLOs changed: blob metadata is now ordered by owner/ACL, and for each distinct owner/ACL pair the blobs are grouped into one ArchiveEntry per MAX_BLOBS_PER_ARCHIVE_ENTRY entries. This brings parallelism to dump/restore when a user has millions of LOs.
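
To illustrate the current behaviour, here is a minimal standalone model (not the real getLOs code; the struct layout is invented and MAX_BLOBS_PER_ARCHIVE_ENTRY is assumed to be 1000): blobs sorted by owner/ACL are cut into chunks of at most MAX_BLOBS_PER_ARCHIVE_ENTRY, and each chunk becomes one ArchiveEntry. So 1500 LOs under a single owner/ACL produce only two entries, which at most two workers can process:

#include <stdio.h>
#include <string.h>

#define MAX_BLOBS_PER_ARCHIVE_ENTRY 1000	/* assumed chunk size */

typedef struct
{
	unsigned int oid;
	const char *owner;
	const char *acl;			/* NULL stands for the default ACL */
} BlobRow;

/* treat two NULL ACLs as equal, otherwise compare the strings */
static int
acl_equal(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}

/* rows must already be sorted by (owner, acl), as the catalog query does */
static void
group_blobs(const BlobRow *rows, int nrows)
{
	int			i = 0;
	int			entryno = 0;

	while (i < nrows)
	{
		int			start = i;
		int			count = 0;

		/* extend the chunk while owner/ACL match and there is room left */
		while (i < nrows &&
			   count < MAX_BLOBS_PER_ARCHIVE_ENTRY &&
			   strcmp(rows[i].owner, rows[start].owner) == 0 &&
			   acl_equal(rows[i].acl, rows[start].acl))
		{
			i++;
			count++;
		}

		/* in pg_dump, this chunk would become one ArchiveEntry */
		printf("ArchiveEntry %d: %d blobs, owner=%s\n",
			   entryno++, count, rows[start].owner);
	}
}

int
main(void)
{
	/* 1500 blobs with a single owner/ACL -> only 2 archive entries */
	static BlobRow rows[1500];

	for (int j = 0; j < 1500; j++)
	{
		rows[j].oid = 16384 + j;
		rows[j].owner = "app_user";
		rows[j].acl = NULL;
	}
	group_blobs(rows, 1500);
	return 0;
}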

However, if a user has only a limited number of LOs, say around 1k (which seems sensible, since LOs are supposed to be large), they typically all share one owner/ACL pair and end up in a single ArchiveEntry, so only one process does any work. Therefore, I'm proposing a change: instead of grouping LOs of the same owner/ACL pair into fixed-size chunks, we can use a SQL query to distribute each pair into a fixed number of batches, and assign each batch its own ArchiveEntry. That way, the workload for each pair is spread across processes even when there are only a few LOs.

For the fixed batch number, 50 seems a sensible value, since a typical user would not run a dump/restore with more than 50 worker processes.
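
To make the idea concrete, below is a rough standalone libpq sketch of the kind of catalog query I have in mind (this is not the actual patch; the batch count of 50, the lomacl::text cast used for partitioning, and the connection string are illustrative assumptions). Each (owner, ACL, batch) group produced by ntile(50) would get its own ArchiveEntry, so the LOs of a single owner/ACL pair can be spread over up to 50 workers:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
	/* connection string is illustrative; adjust for your environment */
	PGconn	   *conn = PQconnectdb("dbname=test");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/*
	 * ntile(50) assigns each LO of a given owner/ACL pair to one of up to
	 * 50 batches of roughly equal size; each (owner, ACL, batch) group
	 * would then become one ArchiveEntry.
	 */
	res = PQexec(conn,
				 "SELECT lomowner::regrole, batch, count(*) AS n_blobs "
				 "FROM (SELECT oid, lomowner, lomacl::text AS acl, "
				 "             ntile(50) OVER (PARTITION BY lomowner, lomacl::text "
				 "                             ORDER BY oid) AS batch "
				 "      FROM pg_largeobject_metadata) ss "
				 "GROUP BY lomowner, acl, batch "
				 "ORDER BY lomowner, acl, batch");

	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
		PQclear(res);
		PQfinish(conn);
		return 1;
	}

	for (int i = 0; i < PQntuples(res); i++)
		printf("owner=%s batch=%s blobs=%s\n",
			   PQgetvalue(res, i, 0),
			   PQgetvalue(res, i, 1),
			   PQgetvalue(res, i, 2));

	PQclear(res);
	PQfinish(conn);
	return 0;
}

Even with only a few hundred LOs per owner/ACL pair, ntile still produces up to 50 batches, which is what keeps all 10 -j workers busy in the test below.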

I have tested this on a cloud VM. For a database with 600 LOs (24 GB), before the patch it would take
    kaifan@kaifanvm:~/test$ time pg_dump -Fd -d test -f dump2 -j 10
    real    23m38.274s
    user    12m24.477s
    sys     0m50.456s

After the patch, it would take 
    kaifan@kaifanvm:~/test$ time pg_dump -Fd -d test -f dump3 -j 10
    real    7m50.295s
    user    16m55.940s
    sys     1m12.640s

The speed-up comes from multiple processes actually being used. May I know if you have any thoughts on this?

Kai Fan

fkfk000 <fkfk000@126.com> writes:
> However, if a user has only a limited number of LOs, say around 1k (which seems sensible, since LOs are supposed to
> be large), they typically all share one owner/ACL pair and end up in a single ArchiveEntry, so only one process does
> any work. Therefore, I'm proposing a change: instead of grouping LOs of the same owner/ACL pair into fixed-size
> chunks, we can use a SQL query to distribute each pair into a fixed number of batches, and assign each batch its own
> ArchiveEntry. That way, the workload for each pair is spread across processes even when there are only a few LOs.

I do not care for this idea.  I think this behavior will seem
entirely random to most users.  Also, you appear not to be thinking
at all about what will happen with huge numbers (millions) of blobs.
Forcing them all into a relatively small number of TOC entries will
break exactly the same cases that we intended to fix by breaking them
up into multiple TOC entries.

I'd rather do what's speculated in the existing comment:

 * At some point we might want to make this user-controllable, but for now
 * a hard-wired setting will suffice.

That would in particular allow people to split things up as finely
as one blob per TOC entry, which would be useful for selective-restore
purposes.

            regards, tom lane