Re: Experimenting with hash tables inside pg_dump - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Experimenting with hash tables inside pg_dump
Msg-id 2709766.1634914411@sss.pgh.pa.us
In response to Re: Experimenting with hash tables inside pg_dump  (Andres Freund <andres@anarazel.de>)
Responses Re: Experimenting with hash tables inside pg_dump  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2021-10-21 22:13:22 -0400, Tom Lane wrote:
>> I've thought about doing something like
>> SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...)
>> but in cases with tens of thousands of tables, it seems unlikely that
>> that's going to behave all that nicely.

> That's kinda what I'm doing in the quick hack. But instead of using IN(...) I
> made it unnest('{oid, oid, ...}'), that scales much better.

I'm skeptical of that, mainly because it doesn't work in old servers,
and I really don't want to maintain two fundamentally different
versions of getTableAttrs().  I don't think you actually need the
multi-array form of unnest() here --- we know the TableInfo array
is in OID order --- but even the single-array form only works
back to 8.4.
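
For reference, the single-array form under discussion passes every table OID in one array literal, along the lines of FROM unnest('{16384,16390}'::oid[]). Below is a minimal C sketch of assembling such a literal on the client side. The function name build_oid_array_literal and the local Oid typedef are hypothetical stand-ins for illustration, not existing pg_dump code.

```c
#include <stdio.h>
#include <stdlib.h>

typedef unsigned int Oid;		/* local stand-in for the real Oid type */

/*
 * Build a PostgreSQL array literal such as "{16384,16390}" from n OIDs.
 * Returns a malloc'd string the caller must free, or NULL on out-of-memory.
 * (Hypothetical helper, for illustration only.)
 */
char *
build_oid_array_literal(const Oid *oids, size_t n)
{
	/* each OID needs at most 10 digits plus a comma; add braces and NUL */
	size_t		cap = n * 11 + 3;
	size_t		len = 0;
	size_t		i;
	char	   *buf = malloc(cap);

	if (buf == NULL)
		return NULL;

	buf[len++] = '{';
	for (i = 0; i < n; i++)
		len += (size_t) snprintf(buf + len, cap - len, "%s%u",
								 i > 0 ? "," : "", oids[i]);
	buf[len++] = '}';
	buf[len] = '\0';
	return buf;
}
```

The resulting string would be spliced into the query text as a quoted literal cast to oid[]. Since the array holds only digits, commas and braces, no quoting of individual elements is needed.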

However ... looking through getTableAttrs' main query, it seems
like the only thing there that's potentially unsafe is the
"format_type(t.oid, a.atttypmod)" call.  I wonder if it could be
sane to convert it into a single query that just scans all of
pg_attribute, and then deal with creating the formatted type names
separately, perhaps with an improved version of getFormattedTypeName
that could cache the results for non-default typmods.  The main
knock on this approach is the temptation for somebody to stick some
unsafe function into the query in future.  We could stick a big fat
warning comment into the code, but lately I despair of people reading
comments.
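
The caching idea could look roughly like the sketch below: a small chained hash table keyed by (type OID, typmod), so that each distinct pair costs at most one formatting call. This is only an illustration under stated assumptions. The names TypeNameEntry, cached_format_type and demo_fetch are hypothetical, and demo_fetch merely stands in for whatever would actually ask the server for format_type(); none of this is the existing getFormattedTypeName code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;		/* local stand-in for the real Oid type */

/* One cached result: the formatted name for a (type OID, typmod) pair. */
typedef struct TypeNameEntry
{
	Oid			typid;
	int			typmod;
	char	   *name;
	struct TypeNameEntry *next; /* next entry in the same bucket */
} TypeNameEntry;

#define TYPE_CACHE_BUCKETS 1024
TypeNameEntry *type_cache[TYPE_CACHE_BUCKETS];

/* Callback that produces the formatted name on a cache miss (assumption). */
typedef char *(*format_fn) (Oid typid, int typmod);

int			demo_fetch_calls = 0;	/* counts simulated round trips */

/* Demo callback: fabricates a name locally instead of querying a server. */
char *
demo_fetch(Oid typid, int typmod)
{
	char		tmp[64];
	char	   *res;

	demo_fetch_calls++;
	snprintf(tmp, sizeof(tmp), "type %u, typmod %d", typid, typmod);
	res = malloc(strlen(tmp) + 1);
	if (res == NULL)
		exit(1);
	strcpy(res, tmp);
	return res;
}

/* Return the cached name for (typid, typmod), calling fetch only on a miss. */
const char *
cached_format_type(Oid typid, int typmod, format_fn fetch)
{
	unsigned int b = (typid * 2654435761u) ^ ((unsigned int) typmod * 40503u);
	TypeNameEntry *e;

	b %= TYPE_CACHE_BUCKETS;
	for (e = type_cache[b]; e != NULL; e = e->next)
	{
		if (e->typid == typid && e->typmod == typmod)
			return e->name;		/* hit: no further work */
	}

	e = malloc(sizeof(TypeNameEntry));
	if (e == NULL)
		exit(1);
	e->typid = typid;
	e->typmod = typmod;
	e->name = fetch(typid, typmod);
	e->next = type_cache[b];	/* push onto the bucket's chain */
	type_cache[b] = e;
	return e->name;
}
```

With a cache like this, a schema holding thousands of varchar(32) columns would ask for that formatted name once and reuse the stored string for every later column with the same type and typmod.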

> To see where it's worth putting in time it'd be useful if getSchemaData() in
> verbose mode printed timing information...

I've been running test cases with log_min_duration_statement = 0,
which serves well enough.

            regards, tom lane
