Re: Duplicate Extended Statistics - Mailing list pgsql-admin

From Asad Ali
Subject Re: Duplicate Extended Statistics
Date
Msg-id CAJ9xe=u0FykfAi-ihsjVRkZ3XiJLbW97+BYhEnFo1_3wcps8AA@mail.gmail.com
Whole thread Raw
In response to Duplicate Extended Statistics  (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>)
List pgsql-admin
Hi Ilia,

In PostgreSQL, it is possible to create duplicate extended statistics because the system does not enforce uniqueness on statistics definitions. However, this is generally not recommended, as it leads to longer ANALYZE times, increased storage usage, potential planner performance impact, and unnecessary complexity. In practice, duplicates are rare because users and tools usually avoid redundancy, as there is no added benefit to having multiple identical sets of statistics on the same columns.

Regards,
Asad Ali

On Tue, Sep 3, 2024 at 6:10 PM Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:

Hello everyone,

I have a question regarding extended statistics in PostgreSQL. Why is it possible to create duplicate extended statistics? To make it clearer, here’s an example:

CREATE TABLE t(a int, b int);
INSERT INTO t(a, b) VALUES (...);
CREATE STATISTICS ON a, b FROM t;
ANALYZE t;
....
CREATE STATISTICS ON a, b FROM t;
ANALYZE t;

After executing these queries, the following issues might arise:

  1. ANALYZE will take longer to run because, for example, MCV extended statistics would need to be gathered twice.
  2. Duplicate information will be stored.
  3. The planner might take longer to find the relevant statistics since it has to search through them in a loop.

Or do duplicate extended statistics practically never occur in practice?

Thanks in advance for your response.

-- 
Regards,
Ilia Evdokimov,
Tantor Labs LCC.

pgsql-admin by date:

Previous
From: Asad Ali
Date:
Subject: Re: Basebackup
Next
From: Asad Ali
Date:
Subject: Re: User Privileges Issue