Re: Any work on better parallelization of pg_dump? - Mailing list pgsql-general

From Jehan-Guillaume de Rorthais
Subject Re: Any work on better parallelization of pg_dump?
Date
Msg-id 20160829183056.022f4c1d@firost
Whole thread Raw
In response to Re: Any work on better parallelization of pg_dump?  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Any work on better parallelization of pg_dump?
List pgsql-general
On Mon, 29 Aug 2016 13:13:17 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> Jehan-Guillaume de Rorthais wrote:
> > On Mon, 29 Aug 2016 13:38:03 +0200
> > hubert depesz lubaczewski <depesz@depesz.com> wrote:
> >
> > > Hi,
> > > we have rather uncommon case - DB with ~ 50GB of data, but this is
> > > spread across ~ 80000 tables.
> > >
> > > Running pg_dump -Fd -jxx dumps in parallel, but only data, and MOST of
> > > the time is spent on queries that run sequentially, and as far as I can
> > > tell, get schema of tables, and sequence values.
> > >
> > > This happens on Pg 9.5. Are there any plans to make getting schema
> > > faster for such cases? Either by parallelization, or at least by getting
> > > schema for all tables "at once", and having pg_dump "sort it out",
> > > instead of getting schema for each table separately?
>
> Depesz: I suggest you start coding ASAP.
>
> > Another issue I found in current implementation is how pg_restore deal with
> > PK. As it takes an exclusif lock on the table, it is executed alone before
> > indexes creation.
> >
> > Splitting the PK in unique index creation then the constraint creation might
> > save a lot of time as other index can be built during the PK creation.
>
> Yeah.  I recall there being some stupid limitation in ALTER TABLE .. ADD
> CONSTRAINT USING INDEX to create a primary key from a previously
> existing unique index, which would be very good to fix (I don't recall
> what it was, but it was something infuriatingly silly).

Could you elaborate? I already had to implement some custom scripts to
restore some tables using this method. The scripts were using psql and
"xargs -P" to restore the indexes and the PK outside of pg_restore.

> I suggest you start coding that ASAP.

I did start, but with no success so far as the code in pg_dump/pg_restore is
quite obscure at a first look. The few attempt I did to catch the PK creation
statement and split it in 2 distincts statements failed to be processed in
parallel IIRC. I probably dropped the patch in the meantime though.


pgsql-general by date:

Previous
From: hubert depesz lubaczewski
Date:
Subject: Re: Any work on better parallelization of pg_dump?
Next
From: Merlin Moncure
Date:
Subject: Re: a column definition list is required for functions returning "record"