Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > In a non-data_only dump, the order of the tables doesn't matter, because
> > the foreign keys are added at the very end. In parallel dumps, the
> > tables are in addition sorted by size, so the resultant order is
> > different from a single-threaded dump. This can be seen by comparing
> > the dumped TOCs of the defaults_dir_format and defaults_parallel cases.
> > But it all happens to pass the tests right now.
>
> I noticed that business about sorting the TOC by size yesterday.
> I think that's a completely bletcherous hack, and we ought to get
> rid of it in favor of keeping the TOC order the same between parallel
> and non-parallel cases, and instead doing size comparisons during
> parallel worker dispatch.

So instead of dumping things in TOC order, we'd do the sorting later
on, just before handing out jobs to workers? That seems alright to me
for the most part. One thing I do wonder about is whether we should
also be sorting by tablespace and not just size, to try and maximize
throughput. That is, assign a parallel worker to each tablespace, each
going after the largest table in that tablespace, before coming back
around and assigning the next-largest table to a second worker on a
given tablespace (presuming we have more workers than tablespaces).
That's what we've seen work rather well in pgbackrest.
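
To make the ordering I have in mind concrete, here's a rough sketch in
Python. This is not pg_dump code and not how pgbackrest actually
implements it; the function name dispatch_order and the
(name, tablespace, size) tuple layout are made up for illustration. It
only computes a static hand-out order, whereas a real dispatcher would
presumably pick the next item whenever a worker goes idle.

```python
from collections import defaultdict
from itertools import zip_longest


def dispatch_order(tables):
    """Return table names in a tablespace-aware, largest-first order.

    tables: iterable of (name, tablespace, size_bytes) tuples.
    (Hypothetical input shape, chosen only for this sketch.)
    """
    # Group tables by tablespace.
    by_tablespace = defaultdict(list)
    for name, tablespace, size in tables:
        by_tablespace[tablespace].append((size, name))

    # Within each tablespace, largest table first (name breaks ties).
    queues = []
    for entries in by_tablespace.values():
        entries.sort(key=lambda e: (-e[0], e[1]))
        queues.append(entries)

    # Visit the tablespace holding the largest table first.
    queues.sort(key=lambda q: (-q[0][0], q[0][1]))

    # Round-robin: take the largest remaining table from every
    # tablespace before taking a second table from any of them.
    order = []
    for rank in zip_longest(*queues):
        for entry in rank:
            if entry is not None:
                order.append(entry[1])
    return order
```

With two tablespaces holding tables a (100), b (50), e (5) and c (80),
d (10) respectively, the first two jobs handed out are a and c, so the
first two workers land on different tablespaces.
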
> However, at least for the directory-format case (which I think is the
> only one supported for parallel restore), we could make it compare the
> file sizes of the TABLE DATA items. That'd work pretty well as a proxy
> for both the amount of effort needed for table restore, and the amount
> of effort needed to build indexes on the tables afterwards.
Parallel restore also works w/ custom-format dumps.
Thanks!
Stephen