Thread: Duplicate Extended Statistics

Duplicate Extended Statistics

From: Ilia Evdokimov
Date: 03 September, 19:09:50

Hello everyone,

I have a question regarding extended statistics in PostgreSQL. Why is it possible to create duplicate extended statistics? To make it clearer, here’s an example:

CREATE TABLE t(a int, b int);
INSERT INTO t(a, b) VALUES (...);
CREATE STATISTICS ON a, b FROM t;
ANALYZE t;
....
CREATE STATISTICS ON a, b FROM t;
ANALYZE t;

After executing these queries, the following issues might arise:

1. ANALYZE will take longer to run because, for example, MCV extended statistics would need to be gathered twice.
2. Duplicate information will be stored.
3. The planner might take longer to find the relevant statistics since it has to search through them in a loop.

Or do duplicate extended statistics rarely occur in practice?

Thanks in advance for your response.

-- 
Regards,
Ilia Evdokimov,
Tantor Labs LCC.

Re: Duplicate Extended Statistics

From: Tom Lane
Date: 03 September, 20:11:59

Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
> I have a question regarding extended statistics in PostgreSQL. Why is it 
> possible to create duplicate extended statistics?

[ shrug... ] We don't prevent duplicate indexes, either.
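
For comparison, the index case can be sketched as follows. This is a minimal illustration only; the table name t_idx_demo is hypothetical and does not come from the thread. Both CREATE INDEX statements are accepted, and PostgreSQL generates a distinct name for each index.

```sql
-- Hypothetical demo table, mirroring the shape of the example in the first message.
CREATE TABLE t_idx_demo(a int, b int);

-- Two identical index definitions; neither statement is rejected.
CREATE INDEX ON t_idx_demo (a, b);
CREATE INDEX ON t_idx_demo (a, b);

-- List the resulting indexes; two rows with the same column list are expected.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 't_idx_demo';
```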

            regards, tom lane

Re: Duplicate Extended Statistics

From: Asad Ali
Date: 04 September, 14:28:45

Hi Ilia,

In PostgreSQL, it is possible to create duplicate extended statistics because the system does not enforce uniqueness on statistics definitions. However, duplicates are generally not recommended: they lengthen ANALYZE, increase storage usage, may slow down planning, and add unnecessary complexity. In practice, duplicates are rare because users and tools usually avoid redundancy; there is no benefit to having multiple identical sets of statistics on the same columns.
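
If duplicates are a concern, one way to look for them is to group the pg_statistic_ext catalog by table, column list, and statistics kinds. The query below is a rough sketch rather than an official tool: it assumes PostgreSQL 14 or later (the stxexprs column does not exist in older releases), and the output column aliases are chosen here for illustration.

```sql
-- Sketch: report extended statistics objects that share the same table,
-- the same column numbers, the same expressions, and the same kinds.
-- Casting to text keeps the grouping independent of per-type operators.
SELECT stxrelid::regclass                  AS table_name,
       stxkeys::text                       AS column_numbers,
       stxkind::text                       AS kinds,
       array_agg(stxname ORDER BY stxname) AS statistics_names
FROM pg_statistic_ext
GROUP BY stxrelid,
         stxkeys::text,
         stxexprs::text,   -- assumption: PostgreSQL 14+; remove on older versions
         stxkind::text
HAVING count(*) > 1;
```

Any row returned lists the names of statistics objects that look identical by this definition; all but one could then be removed with DROP STATISTICS. Statistics that cover the same columns but different kinds are not reported, since they are not exact duplicates.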

Regards,

Asad Ali

On Tue, Sep 3, 2024 at 6:10 PM Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:

> Hello everyone,
> I have a question regarding extended statistics in PostgreSQL. Why is it possible to create duplicate extended statistics? To make it clearer, here’s an example:
> CREATE TABLE t(a int, b int);
> INSERT INTO t(a, b) VALUES (...);
> CREATE STATISTICS ON a, b FROM t;
> ANALYZE t;
> ....
> CREATE STATISTICS ON a, b FROM t;
> ANALYZE t;
> After executing these queries, the following issues might arise:
> ANALYZE will take longer to run because, for example, MCV extended statistics would need to be gathered twice.
> Duplicate information will be stored.
> The planner might take longer to find the relevant statistics since it has to search through them in a loop.
> Or do duplicate extended statistics practically never occur in practice?
> Thanks in advance for your response.
> -- 
> Regards,
> Ilia Evdokimov,
> Tantor Labs LCC.