Re: Experimenting with hash tables inside pg_dump - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Experimenting with hash tables inside pg_dump
Date
Msg-id 6D552E76-EA56-421D-961C-F8781523958A@anarazel.de
In response to Re: Experimenting with hash tables inside pg_dump  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Experimenting with hash tables inside pg_dump  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On October 22, 2021 8:54:13 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@anarazel.de> writes:
>> On 2021-10-22 10:53:31 -0400, Tom Lane wrote:
>>> I'm skeptical of that, mainly because it doesn't work in old servers,
>
>> I think we can address that, if we think it's overall a promising approach to
>> pursue. E.g. if we don't need the indexes, we can make it = ANY().
>
>Hmm ... yeah, I guess we could get away with that.  It might not scale
>as nicely to a huge database, but probably dumping a huge database
>from an ancient server isn't all that interesting.

I think that, compared to the overhead of locking that many tables and sending O(N) queries, it shouldn't be a huge factor.
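The batching being discussed would collapse the per-table catalog queries into one `= ANY()` query. A minimal sketch of building such a query string from a list of OIDs — the catalog/column names and buffer handling here are illustrative, not pg_dump's actual code:

```c
#include <stdio.h>
#include <string.h>

/* Build one batched catalog query of the form
 *   SELECT ... WHERE attrelid = ANY('{oid,oid,...}'::pg_catalog.oid[])
 * instead of issuing one query per table.  The query text is
 * illustrative only; pg_dump's real queries are more involved.
 */
static void
build_batched_query(char *buf, size_t buflen,
                    const unsigned int *oids, int noids)
{
    size_t      off;

    off = snprintf(buf, buflen,
                   "SELECT * FROM pg_catalog.pg_attribute "
                   "WHERE attrelid = ANY('{");
    for (int i = 0; i < noids; i++)
        off += snprintf(buf + off, buflen - off, "%s%u",
                        i > 0 ? "," : "", oids[i]);
    snprintf(buf + off, buflen - off, "}'::pg_catalog.oid[])");
}
```

For example, three table OIDs produce a single query ending in `ANY('{16384,16385,16390}'::pg_catalog.oid[])`, which also works on old servers that lack whatever newer construct would otherwise be used.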

One thing that looks like it might be worth doing, and not hard, is to use single-row mode. No need to materialize all that data twice in memory.
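The client-side shape of that is libpq's PQsetSingleRowMode(): after sending the query, each PQgetResult() hands back one row, so the full result set is never held in client memory. A sketch of the loop, with a stub standing in for the server so the pattern is self-contained — the stub functions are not real libpq calls:

```c
#include <stdio.h>

#define TOTAL_ROWS 5

/* stub: returns the next row id, or -1 when the set is drained,
 * mimicking PQgetResult() returning NULL at end of results */
static int
stub_get_result(void)
{
    static int  next = 0;

    return next < TOTAL_ROWS ? next++ : -1;
}

/* Consume the result one row at a time, as a caller would after
 * PQsetSingleRowMode(conn): process one row, free it, repeat.
 * Memory use stays O(1) in the number of rows. */
static int
consume_single_rows(void)
{
    int         processed = 0;
    int         row;

    while ((row = stub_get_result()) != -1)
    {
        /* handle one row; in pg_dump this would copy out the
         * fields it needs, then PQclear() the single-row result */
        processed++;
    }
    return processed;
}
```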


At a later stage it might be worth sending the array separately as a parameter. Perhaps even binary encoded.
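Passing the list as a parameter would mean building one text-format array literal and letting the server parse it; binary encoding would be a further refinement. A sketch, with the PQexecParams() call shown only in a comment since it needs a live connection, and the query text illustrative:

```c
#include <stdio.h>

/* Build a text-format array literal ("{1,2,3}") so the OID list can
 * be sent as a single query parameter, roughly:
 *   const char *params[1] = { buf };
 *   PQexecParams(conn,
 *                "SELECT ... WHERE attrelid = ANY($1::pg_catalog.oid[])",
 *                1, NULL, params, NULL, NULL, 0);
 * Sending the array in binary format instead would avoid the
 * text round-trip entirely.
 */
static void
build_oid_array_literal(char *buf, size_t buflen,
                        const unsigned int *oids, int noids)
{
    size_t      off = snprintf(buf, buflen, "{");

    for (int i = 0; i < noids; i++)
        off += snprintf(buf + off, buflen - off, "%s%u",
                        i > 0 ? "," : "", oids[i]);
    snprintf(buf + off, buflen - off, "}");
}
```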


>I'm inclined to think that it could be sane to make getTableAttrs
>and getIndexes use this style, but we probably still want functions
>and such to use per-object queries.  In those other catalogs there
>are many built-in objects that we don't really care about.  The
>prepared-queries hack I was working on last night is probably plenty
>good enough there, and it's a much less invasive patch.

Yes, that seems reasonable. I think the triggers query would benefit from the batch approach though - I see that taking a long time in aggregate on a test database with many tables I had around (partially due to the self-join), and we already materialize it.


>Were you planning to pursue this further, or did you want me to?

It seems too nice an improvement to drop on the floor. That said, I don't really have the mental bandwidth to pursue this beyond the POC stage - it seemed complicated enough that a suggestion accompanied by a prototype was a good idea. So I'd be happy for you to incorporate this into your other changes.


>I'd want to layer it on top of the work I did at [1], else there's
>going to be lots of merge conflicts.

Makes sense. Even if nobody else were doing anything in the area, I'd probably want to split it into one commit creating the query once, and then a separate one implementing the batching.

Regards,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


