Re: improve performance of pg_dump with many sequences - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: improve performance of pg_dump with many sequences
Date
Msg-id Zo8Fl53vBVS1Cnmt@nathan
Whole thread Raw
In response to Re: improve performance of pg_dump with many sequences  ("Euler Taveira" <euler@eulerto.com>)
Responses Re: improve performance of pg_dump with many sequences
List pgsql-hackers
On Wed, Jul 10, 2024 at 05:08:56PM -0300, Euler Taveira wrote:
> Nice improvement. The numbers for a realistic scenario (10k sequences) are

Thanks for taking a look!

> You are changing internal representation from char to int64. Is the main goal to
> validate catalog data? What if there is a new sequence data type whose
> representation is not an integer?

IIRC 0001 was primarily intended to reduce the amount of memory needed for
the sorted table.  Regarding a new sequence data type, I'm assuming we'll
have much bigger fish to fry if we do that (e.g., pg_sequence uses int8 for
the values), and I'd hope that adjusting this code wouldn't be too
difficult, anyway.

> This code path is adding zero byte to the last position of the fixed string. I
> suggest that the zero byte is added to the position after the string length.

I'm not following why that would be a better approach.  strncpy() will add
a NUL to the end of the string unless it doesn't fit in the buffer, in
which case we'll add our own via "seqtype[sizeof(seqtype) - 1] = '\0'".
Furthermore, the compiler can determine the position where the NUL should
be placed, whereas placing it at the end of the copied string requires a
runtime strlen().

> l = strlen(PQgetvalue(res, 0, 0));
> Assert(l < sizeof(seqtype));
> strncpy(seqtype, PQgetvalue(res, 0, 0), l);
> seqtype[l] = '\0';

I think the strncpy() should really be limited to the size of the seqtype
buffer.  IMHO an Assert is not sufficient.

> If you are not planning to apply 0003, make sure you fix collectSequences() to
> avoid versions less than 10. Move this part to 0002.

Yeah, no need to create the table if we aren't going to use it.

> Since you apply a fix for pg_sequence_last_value function, you can simplify the
> query in 0003. CASE is not required.

Unfortunately, I think we have to keep this workaround since older minor
releases of PostgreSQL don't have the fix.

> patched (0001 and 0002):
> real 0m0,290s
> user 0m0,038s
> sys 0m0,104s
> 
> I'm not sure if 0003 is worth. Maybe if you have another table like you
> suggested.

What pg_dump command did you test here?  Did you dump the sequence data, or
was this --schema-only?

-- 
nathan



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: improve predefined roles documentation
Next
From: Alena Rybakina
Date:
Subject: Re: POC, WIP: OR-clause support for indexes