Re: Statistics Import and Export commit related issue. - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Statistics Import and Export commit related issue.
Date
Msg-id CADkLM=dkRHCnPaOiGctC6YPQ4ztuab8bKZ0c4=rmA75e7tVbjw@mail.gmail.com
Whole thread Raw
In response to Statistics Import and Export commit related issue.  (jian he <jian.universality@gmail.com>)
List pgsql-hackers
On Tue, Feb 25, 2025 at 1:31 AM jian he <jian.universality@gmail.com> wrote:
hi.
the thread "Statistics Import and Export" is quite hot.
so a new thread for some minor issue i found.


static int
_tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
{

    if (strcmp(te->desc, "STATISTICS DATA") == 0)
    {
        if (!ropt->dumpStatistics)
            return 0;
        else
            res = REQ_STATS;
    }

    /* If it's statistics and we don't want statistics, maybe ignore it */
    if (!ropt->dumpStatistics && strcmp(te->desc, "STATISTICS DATA") == 0)
        return 0;
}
As you can see, there is some duplicate code there.
in _tocEntryRequired, we can move `` if (strcmp(te->desc, "STATISTICS
DATA") == 0)`` after switch (curSection).
so pg_dump --section option does not influence statistics dump.
attached is the proposed change.

+1, and I think the patch provided is good as-is.
 


in dumpTableData_copy, we have:
    pg_log_info("dumping contents of table \"%s.%s\"",
                tbinfo->dobj.namespace->dobj.name, classname);
dumpRelationStats add a pg_log_info would be a good thing.
commit: https://git.postgresql.org/cgit/postgresql.git/commit/?id=1fd1bd871012732e3c6c482667d2f2c56f1a9395
didn't have any pg_log_info.

I don't have a strong opinion on this one, but I lean against doing it. I think table data got a log message because dumping table data is an operation that takes up a major % of the dump's runtime, whereas statistics are just some catalog operations. There are log messages for the global catalog operations (ex. read in all user defined functions, read in all inheritance relationships), but not per-table catalog operations. If we end up going to a global export of attribute statistics then I'd definitely want a log message for that operation.
 


https://www.postgresql.org/docs/devel/app-pgrestore.html
"""
--jobs=number-of-jobs
Run the most time-consuming steps of pg_restore — those that load
data, create indexes, or create constraints — concurrently, using up
to number-of-jobs concurrent sessions.
"""
we may need to change the above sentence?
pg_restore restore STATISTICS DATA also doing it concurrently if
--jobs is specified.

I also have no strong opinion here, but I think the sentence is fine as is because it specifies "most time-consuming" and a single catalog row operation is much smaller than a COPY load, or the sequential scans needed for index builds and constraint verification.

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: SQLFunctionCache and generic plans
Next
From: Robert Haas
Date:
Subject: Re: moving some code out of explain.c