Home > mailing lists

Re: Statistics Import and Export - Mailing list pgsql-hackers

From	Corey Huinker
Subject	Re: Statistics Import and Export
Date	February 24 20:57:17
Msg-id	CADkLM=d3H8xghP8QmOL6f_mBgP_X23-=2hOnYx7mqYiNSHimiQ@mail.gmail.com Whole thread Raw
In response to	Re: Statistics Import and Export (Andres Freund <andres@anarazel.de>)
Responses	Re: Statistics Import and Export
List	pgsql-hackers

Tree view

On Mon, Feb 24, 2025 at 9:54 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2025-02-24 05:11:48 -0500, Corey Huinker wrote:
> Incorporating most of the feedback (I kept a few of
> the appendNamedArgument() calls) presented over the weekend.
>
> * removeVersionNumStr is gone
> * relpages/reltuples/relallvisible are now char[32] buffers in RelStatsInfo
> and nowhere else (existing relpages conversion remains, however)

I don't see the point. This will use more memory and if we can't get
conversions between integers and strings right we have much bigger
problems. The same code was used in the backend too!

As I see it, the point is that we're getting an input that is a string representation from the query, and the end-goal is to convey that value with fidelity to the destination database, so there's nothing we can do to get us closer to the string that we already have.

I don't have benchmark numbers beyond the instinct that doing something takes more time than doing nothing. Granted, "nothing" here means 96 bytes of memory and 3 strncpy()s, and "something" is 24 bytes of memory, 2 atoi()s, 1 strtof() plus whatever memory and processing we do back in converting back to strings.

And it leads to storing relpages in two places, with different
transformations, which doesn't seem great.

I didn't like that either, but balanced the ugliness of that vs the cost of grinding the values back to where we started.

> @@ -6921,6 +6927,7 @@ getTables(Archive *fout, int *numTables)
> appendPQExpBufferStr(query,
> "SELECT c.tableoid, c.oid, c.relname, "
> "c.relnamespace, c.relkind, c.reltype, "
> + "c.relpages, c.reltuples, c.relallvisible, "
> "c.relowner, "
> "c.relchecks, "
> "c.relhasindex, c.relhasrules, c.relpages, "

That query is already querying relpages a bit later in the query, so we'd
query the column twice.

+1, must eliminate that duplicate.

> + printfPQExpBuffer(query, "EXECUTE getAttributeStats(");
> + appendStringLiteralAH(query, dobj->namespace->dobj.name, fout);
> + appendPQExpBufferStr(query, ", ");
> + appendStringLiteralAH(query, dobj->name, fout);
> + appendPQExpBufferStr(query, "); ");
> res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);

It seems somewhat ugly that we're building an SQL string with non-trivial
constants. It'd be better to use PQexecParams() - but I guess we don't have
any uses of it yet in pg_dump.

+1, I would like to see that change.

ISTM that we ought to expose the relation oid in pg_stats. This query would be
simpler and faster if we could just use the oid as the predicate. Will take a
while till we can rely on that, but still.

+1, but we will need to support this until v18 is as old as v9.2 is now...approx 2038.

Have you compared performance of with/without stats after these optimizations?

I have not.

pgsql-hackers by date:

From: Tom Lane
Date: 24 February, 20:50:58
Subject: Re: Statistics Import and Export

From: Andrew Jackson
Date: 24 February, 21:06:59
Subject: Re: Add Option To Check All Addresses For Matching target_session_attr

Re: Statistics Import and Export - Mailing list pgsql-hackers

Previous

Next