Thread: Improvements in pg_dump/pg_restore toc format and performances

Improvements in pg_dump/pg_restore toc format and performances

From
Pierre Ducroquet
Date:
Hi

Following the thread "Inefficiency in parallel pg_restore with many tables", I 
started digging into why the toc.dat files are that big and where time is spent 
when parsing them.

I ended up writing several patches that shaved some time for pg_restore -l, 
and reduced the toc.dat size.

First patch is "finishing" the job of removing has oids support. When this 
support was removed, instead of dropping the field from the dumps and 
increasing the dump versions, the field was kept as is. This field stores a 
boolean as a string, "true" or "false". This is not free, and requires 10 
bytes per toc entry.

The second patch removes calls to sscanf and replaces them with strtoul. This 
was the biggest speedup for pg_restore -l.

The third patch changes the dump format further to remove these strtoul calls 
and store the integers as is instead.

The fourth patch is dirtier and does more changes to the dump format. Instead 
of storing the owner, tablespace, table access method and schema of each 
object as a string, pg_dump builds an array of these, stores them at the 
beginning of the file and replaces the strings with integer fields in the dump. 
This reduces the file size further, and removes a lot of calls to ReadStr, thus 
saving quite some time.

Toc has 453999 entries.

Patch    Toc size    Dump -s duration    pg_restore -l duration
HEAD    214M    23.1s    1.27s
#1 (has oid)    210M    22.9s    1.26s
#2 (scanf)    210M    22.9s    1.07s
#3 (no strtoul)    202M    22.8s    0.94s
#4 (string list)    181M    23.1s    0.87s

Patch four is likely to require more changes. I don't know PostgreSQL code 
enough to do better than calling pgmalloc/pgrealloc and maintaining a char** 
manually, I guess there are structs and functions that do that in a better 
way. And the location of string tables in the file and in the structures is 
probably not acceptable, I suppose these should go to the toc header instead.

I still submit these for comments and first review.

Best regards

 Pierre Ducroquet

Attachment

Re: Improvements in pg_dump/pg_restore toc format and performances

From
Nathan Bossart
Date:
On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> I ended up writing several patches that shaved some time for pg_restore -l, 
> and reduced the toc.dat size.

I've only just started taking a look at these patches, and I intend to do a
more thorough review in the hopefully-not-too-distant future.

> First patch is "finishing" the job of removing has oids support. When this 
> support was removed, instead of dropping the field from the dumps and 
> increasing the dump versions, the field was kept as is. This field stores a 
> boolean as a string, "true" or "false". This is not free, and requires 10 
> bytes per toc entry.

This sounds reasonable to me.  I wonder why this wasn't done when WITH OIDS
was removed in v12.

> The second patch removes calls to sscanf and replaces them with strtoul. This 
> was the biggest speedup for pg_restore -l.

Nice.

> The third patch changes the dump format further to remove these strtoul calls 
> and store the integers as is instead.

Do we need to worry about endianness here?

> The fourth patch is dirtier and does more changes to the dump format. Instead 
> of storing the owner, tablespace, table access method and schema of each 
> object as a string, pg_dump builds an array of these, stores them at the 
> beginning of the file and replaces the strings with integer fields in the dump. 
> This reduces the file size further, and removes a lot of calls to ReadStr, thus 
> saving quite some time.

This sounds promising.

> Patch    Toc size    Dump -s duration    pg_restore -l duration
> HEAD    214M    23.1s    1.27s
> #1 (has oid)    210M    22.9s    1.26s
> #2 (scanf)    210M    22.9s    1.07s
> #3 (no strtoul)    202M    22.8s    0.94s
> #4 (string list)    181M    23.1s    0.87s

At a glance, the size improvements in 0004 look the most interesting to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Improvements in pg_dump/pg_restore toc format and performances

From
Nathan Bossart
Date:
On Mon, Sep 18, 2023 at 02:52:47PM -0700, Nathan Bossart wrote:
> On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
>> I ended up writing several patches that shaved some time for pg_restore -l, 
>> and reduced the toc.dat size.
> 
> I've only just started taking a look at these patches, and I intend to do a
> more thorough review in the hopefully-not-too-distant future.

Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this
to waiting-on-author.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Improvements in pg_dump/pg_restore toc format and performances

From
Pierre Ducroquet
Date:
On Monday, September 18, 2023 11:52:47 PM CEST Nathan Bossart wrote:
> On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> > I ended up writing several patches that shaved some time for pg_restore
> > -l,
> > and reduced the toc.dat size.
> 
> I've only just started taking a look at these patches, and I intend to do a
> more thorough review in the hopefully-not-too-distant future.

Thank you very much.

> Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this
> to waiting-on-author.

I did not notice anything running meson test -v, I'll look further into it in 
the next days.

> > First patch is "finishing" the job of removing has oids support. When this
> > support was removed, instead of dropping the field from the dumps and
> > increasing the dump versions, the field was kept as is. This field stores
> > a
> > boolean as a string, "true" or "false". This is not free, and requires 10
> > bytes per toc entry.
> 
> This sounds reasonable to me.  I wonder why this wasn't done when WITH OIDS
> was removed in v12.

I suppose it is an oversight, or not wanting to increase the dump version for 
no reason.

> > The second patch removes calls to sscanf and replaces them with strtoul.
> > This was the biggest speedup for pg_restore -l.
> 
> Nice.
> 
> > The third patch changes the dump format further to remove these strtoul
> > calls and store the integers as is instead.
> 
> Do we need to worry about endianness here?

I used the ReadInt/WriteInt functions already defined in pg_dump that take care 
of this issue, so there should be no need to worry.

> > The fourth patch is dirtier and does more changes to the dump format.
> > Instead of storing the owner, tablespace, table access method and schema
> > of each object as a string, pg_dump builds an array of these, stores them
> > at the beginning of the file and replaces the strings with integer fields
> > in the dump. This reduces the file size further, and removes a lot of
> > calls to ReadStr, thus saving quite some time.
> 
> This sounds promising.
> 
> > Patch    Toc size    Dump -s duration    pg_restore -l duration
> > HEAD    214M    23.1s    1.27s
> > #1 (has oid)    210M    22.9s    1.26s
> > #2 (scanf)    210M    22.9s    1.07s
> > #3 (no strtoul)    202M    22.8s    0.94s
> > #4 (string list)    181M    23.1s    0.87s
> 
> At a glance, the size improvements in 0004 look the most interesting to me.

Yes it is, and the speed benefits are interesting too (at least for my usecase)







Re: Improvements in pg_dump/pg_restore toc format and performances

From
Pierre Ducroquet
Date:
On Monday, September 18, 2023 11:54:42 PM CEST Nathan Bossart wrote:
> On Mon, Sep 18, 2023 at 02:52:47PM -0700, Nathan Bossart wrote:
> > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> >> I ended up writing several patches that shaved some time for pg_restore
> >> -l,
> >> and reduced the toc.dat size.
> > 
> > I've only just started taking a look at these patches, and I intend to do
> > a
> > more thorough review in the hopefully-not-too-distant future.
> 
> Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this
> to waiting-on-author.

FYI, the failures are related to patch 0004, I said it was dirty, but it was 
apparently an understatement. Patches 0001 to 0003 don't exhibit any 
regression.





Re: Improvements in pg_dump/pg_restore toc format and performances

From
Pierre Ducroquet
Date:
On Monday, September 18, 2023 11:54:42 PM CEST Nathan Bossart wrote:
> On Mon, Sep 18, 2023 at 02:52:47PM -0700, Nathan Bossart wrote:
> > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> >> I ended up writing several patches that shaved some time for pg_restore
> >> -l,
> >> and reduced the toc.dat size.
> > 
> > I've only just started taking a look at these patches, and I intend to do
> > a
> > more thorough review in the hopefully-not-too-distant future.
> 
> Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this
> to waiting-on-author.

Attached updated patches fix this regression, I'm sorry I missed that.


Attachment

Re: Improvements in pg_dump/pg_restore toc format and performances

From
Nathan Bossart
Date:
On Tue, Sep 19, 2023 at 12:15:55PM +0200, Pierre Ducroquet wrote:
> Attached updated patches fix this regression, I'm sorry I missed that.

Thanks for the new patches.  0001 and 0002 look reasonable to me.  This is
a nitpick, but we might want to use atooid() in 0002, which is just
shorthand for the strtoul() call you are using.

> -        WriteStr(AH, NULL);        /* Terminate List */
> +        WriteInt(AH, -1);        /* Terminate List */

I think we need to be cautious about using WriteInt and ReadInt here.  OIDs
are unsigned, so we probably want to use InvalidOid (0) instead.

> +                if (AH->version >= K_VERS_1_16)
>                  {
> -                    depSize *= 2;
> -                    deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize);
> +                    depId = ReadInt(AH);
> +                    if (depId == -1)
> +                        break;        /* end of list */
> +                    if (depIdx >= depSize)
> +                    {
> +                        depSize *= 2;
> +                        deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize);
> +                    }
> +                    deps[depIdx] = depId;
> +                }
> +                else
> +                {
> +                    tmp = ReadStr(AH);
> +                    if (!tmp)
> +                        break;        /* end of list */
> +                    if (depIdx >= depSize)
> +                    {
> +                        depSize *= 2;
> +                        deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize);
> +                    }
> +                    deps[depIdx] = strtoul(tmp, NULL, 10);
> +                    free(tmp);
>                  }

I would suggest making the resizing logic common.

    if (AH->version >= K_VERS_1_16)
    {
        depId = ReadInt(AH);
        if (depId == InvalidOid)
            break;        /* end of list */
    }
    else
    {
        tmp = ReadStr(AH);
        if (!tmp)
            break;        /* end of list */
        depId = strtoul(tmp, NULL, 10);
        free(tmp);
    }

    if (depIdx >= depSize)
    {
        depSize *= 2;
        deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize);
    }
    deps[depIdx] = depId;

Also, can we make depId more locally scoped?

I have yet to look into 0004 still.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Improvements in pg_dump/pg_restore toc format and performances

From
vignesh C
Date:
On Tue, 19 Sept 2023 at 15:46, Pierre Ducroquet <p.psql@pinaraf.info> wrote:
>
> On Monday, September 18, 2023 11:54:42 PM CEST Nathan Bossart wrote:
> > On Mon, Sep 18, 2023 at 02:52:47PM -0700, Nathan Bossart wrote:
> > > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> > >> I ended up writing several patches that shaved some time for pg_restore
> > >> -l,
> > >> and reduced the toc.dat size.
> > >
> > > I've only just started taking a look at these patches, and I intend to do
> > > a
> > > more thorough review in the hopefully-not-too-distant future.
> >
> > Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this
> > to waiting-on-author.
>
> Attached updated patches fix this regression, I'm sorry I missed that.

Few comments:
1) These printf statements are not required:
+       /* write the list of am */
+       printf("%d tableams to save\n", tableam_count);
+       WriteInt(AH, tableam_count);
+       for (i = 0 ; i < tableam_count ; i++)
+       {
+               printf("%d is %s\n", i, tableams[i]);
+               WriteStr(AH, tableams[i]);
+       }
+
+       /* write the list of namespaces */
+       printf("%d namespaces to save\n", namespace_count);
+       WriteInt(AH, namespace_count);
+       for (i = 0 ; i < namespace_count ; i++)
+       {
+               printf("%d is %s\n", i, namespaces[i]);
+               WriteStr(AH, namespaces[i]);
+       }
+
+       /* write the list of owners */
+       printf("%d owners to save\n", owner_count);

2) We generally use pg_malloc in client tools, we should change palloc
to pg_malloc:
+       /* prepare dynamic arrays */
+       tableams = palloc(sizeof(char*) * 1);
+       tableams[0] = NULL;
+       tableam_count = 0;
+       namespaces = palloc(sizeof(char*) * 1);
+       namespaces[0] = NULL;
+       namespace_count = 0;
+       owners = palloc(sizeof(char*) * 1);
+       owners[0] = NULL;
+       owner_count = 0;
+       tablespaces = palloc(sizeof(char*) * 1);

3) This similar code is repeated few times, will it be possible to do
it in a common function:
+                       if (namespace_count < 128)
+                       {
+                               te->namespace_idx = AH->ReadBytePtr(AH);
+                               invalid_entry = 255;
+                       }
+                       else
+                       {
+                               te->namespace_idx = ReadInt(AH);
+                               invalid_entry = -1;
+                       }
+                       if (te->namespace_idx == invalid_entry)
+                               te->namespace = "";
+                       else
+                               te->namespace = namespaces[te->namespace_idx];

4) Can the initialization of tableam_count, namespace_count,
owner_count and tablespace_count be done at declaration and these
initialization code can be removed:
+       /* prepare dynamic arrays */
+       tableams = palloc(sizeof(char*) * 1);
+       tableams[0] = NULL;
+       tableam_count = 0;
+       namespaces = palloc(sizeof(char*) * 1);
+       namespaces[0] = NULL;
+       namespace_count = 0;
+       owners = palloc(sizeof(char*) * 1);
+       owners[0] = NULL;
+       owner_count = 0;
+       tablespaces = palloc(sizeof(char*) * 1);
+       tablespaces[0] = NULL;
+       tablespace_count = 0;

4) There are some whitespace issues in the patch:
Applying: move static strings to arrays at beginning
.git/rebase-apply/patch:24: trailing whitespace.
.git/rebase-apply/patch:128: trailing whitespace.
.git/rebase-apply/patch:226: trailing whitespace.
.git/rebase-apply/patch:232: trailing whitespace.
warning: 4 lines add whitespace errors.

Regards,
Vignesh



Re: Improvements in pg_dump/pg_restore toc format and performances

From
Nathan Bossart
Date:
On Tue, Oct 03, 2023 at 03:17:57PM +0530, vignesh C wrote:
> Few comments:

Pierre, do you plan to submit a new revision of this patch set for the
November commitfest?  If not, the commitfest entry may be marked as
returned-with-feedback soon.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Improvements in pg_dump/pg_restore toc format and performances

From
vignesh C
Date:
On Fri, 10 Nov 2023 at 23:20, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Oct 03, 2023 at 03:17:57PM +0530, vignesh C wrote:
> > Few comments:
>
> Pierre, do you plan to submit a new revision of this patch set for the
> November commitfest?  If not, the commitfest entry may be marked as
> returned-with-feedback soon.

I have changed the status of commitfest entry to "Returned with
Feedback" as the comments have not yet been resolved. Please handle
the comments and update the commitfest entry accordingly.

Regards,
Vignesh