On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> I ended up writing several patches that shaved some time for pg_restore -l,
> and reduced the toc.dat size.
I've only just started taking a look at these patches, and I intend to do a
more thorough review in the hopefully-not-too-distant future.
> First patch is "finishing" the job of removing has oids support. When this
> support was removed, instead of dropping the field from the dumps and
> increasing the dump versions, the field was kept as is. This field stores a
> boolean as a string, "true" or "false". This is not free, and requires 10
> bytes per toc entry.
This sounds reasonable to me. I wonder why this wasn't done when WITH OIDS
was removed in v12.
> The second patch removes calls to sscanf and replaces them with strtoul. This
> was the biggest speedup for pg_restore -l.
Nice.
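
For anyone following along, the replacement described above might look roughly like the sketch below. parse_uint_field is a hypothetical name and is not taken from the patch. It only illustrates parsing a decimal field with strtoul() and explicit error checks, where sscanf() would have to interpret a format string on every call.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): parse an unsigned decimal
 * field with strtoul() instead of sscanf(s, "%lu", &val).
 *
 * Returns true only if the string starts with a digit, consists of
 * nothing but the number, and the value fits in an unsigned long.
 */
static bool
parse_uint_field(const char *s, unsigned long *result)
{
	char	   *end;

	/* strtoul() would accept leading whitespace and a sign; we do not. */
	if (s == NULL || !isdigit((unsigned char) *s))
		return false;

	errno = 0;
	*result = strtoul(s, &end, 10);

	/* Reject out-of-range values and trailing garbage. */
	return errno == 0 && *end == '\0';
}
```
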
> The third patch changes the dump format further to remove these strtoul calls
> and store the integers as is instead.
Do we need to worry about endianness here?
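
To make the concern concrete: if the integers were written with a plain fwrite() of the in-memory value, a dump taken on a little-endian machine would be misread on a big-endian one. One way to avoid that is to fix the byte order in the file format. The sketch below assumes 32-bit fields and uses hypothetical helper names; it is not what the patch does, just an illustration of the technique.

```c
#include <stdint.h>

/*
 * Hypothetical helpers: store and load a 32-bit value with the least
 * significant byte first, regardless of the host's native byte order.
 */
static void
put_uint32_le(unsigned char *buf, uint32_t val)
{
	buf[0] = (unsigned char) (val & 0xFF);
	buf[1] = (unsigned char) ((val >> 8) & 0xFF);
	buf[2] = (unsigned char) ((val >> 16) & 0xFF);
	buf[3] = (unsigned char) ((val >> 24) & 0xFF);
}

static uint32_t
get_uint32_le(const unsigned char *buf)
{
	return (uint32_t) buf[0] |
		((uint32_t) buf[1] << 8) |
		((uint32_t) buf[2] << 16) |
		((uint32_t) buf[3] << 24);
}
```

Because only shifts and masks are used, the same bytes end up in the file on any host, at the cost of a few extra instructions per field.
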
> The fourth patch is dirtier and does more changes to the dump format. Instead
> of storing the owner, tablespace, table access method and schema of each
> object as a string, pg_dump builds an array of these, stores them at the
> beginning of the file and replaces the strings with integer fields in the dump.
> This reduces the file size further, and removes a lot of calls to ReadStr, thus
> saving quite some time.
This sounds promising.
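
As I understand the description, this amounts to a string table: each distinct owner, tablespace, access method, or schema name is stored once, and TOC entries refer to it by index. A minimal sketch of that idea follows. StringTable and string_table_intern are hypothetical names, and the linear lookup is a simplification that a real implementation would probably replace with a hash table.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure: a growable array of distinct strings. */
typedef struct StringTable
{
	char	  **items;
	int			count;
	int			capacity;
} StringTable;

/*
 * Return the index of s in the table, adding a copy if it is not
 * already present.  Returns -1 if memory allocation fails.
 */
static int
string_table_intern(StringTable *tab, const char *s)
{
	char	   *copy;
	int			i;

	/* Linear search; fine for a sketch, slow for large tables. */
	for (i = 0; i < tab->count; i++)
	{
		if (strcmp(tab->items[i], s) == 0)
			return i;
	}

	/* Grow the array when it is full (realloc(NULL, n) acts as malloc). */
	if (tab->count == tab->capacity)
	{
		int			newcap = tab->capacity ? tab->capacity * 2 : 8;
		char	  **newitems = realloc(tab->items, newcap * sizeof(char *));

		if (newitems == NULL)
			return -1;
		tab->items = newitems;
		tab->capacity = newcap;
	}

	copy = malloc(strlen(s) + 1);
	if (copy == NULL)
		return -1;
	strcpy(copy, s);

	tab->items[tab->count] = copy;
	return tab->count++;
}
```

With a zero-initialized table, interning "postgres", "public", and "postgres" again yields indexes 0, 1, and 0, so a repeated owner name costs one integer per entry instead of a full string.
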
> Patch             Toc size   Dump -s duration   pg_restore -l duration
> HEAD              214M       23.1s              1.27s
> #1 (has oid)      210M       22.9s              1.26s
> #2 (scanf)        210M       22.9s              1.07s
> #3 (no strtoul)   202M       22.8s              0.94s
> #4 (string list)  181M       23.1s              0.87s
At a glance, the size improvements in 0004 look the most interesting to me.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com