Home > mailing lists

Re: n_distinct off by a factor of 1000 - Mailing list pgsql-general

From	Adrian Klaver
Subject	Re: n_distinct off by a factor of 1000
Date	June 23, 2020 17:14:14
Msg-id	3c310268-52f2-b960-8efd-bc15b08da7a7@aklaver.com Whole thread Raw
In response to	Re: n_distinct off by a factor of 1000 (Fabio Pardi <f.pardi@portavita.eu>)
Responses	Re: n_distinct off by a factor of 1000 (Klaudie Willis <Klaudie.Willis@protonmail.com>)
List	pgsql-general

Tree view

On 6/23/20 7:05 AM, Fabio Pardi wrote:
> 
> On 23/06/2020 14:42, Klaudie Willis wrote:
>> I got my first hint of why this problem occurs when I looked at the 
>> statistics.  For the column in question, "instrument_ref" the 
>> statistics claimed it to be:
>>
>> The default_statistics_target=500, and analyze has been run.
>> select * from pg_stats where attname like 'instr%_ref'; -- Result: 
>> *40.000*
>> select count(distinct instrumentid_ref) from bigtable -- Result: *33 
>> 385 922 (!!)*
>>
>> That is an astonishing difference of almost a 1000X.
>>
> 
> I think you are counting 2 different things here.
> 
> The first query returns all the columns "like 'instr%_ref'" present in 
> the statistics (so in the whole cluster), while the second is counting 
> the actual number of different rows in bigtable.

I believe the OP actually meant the query to be:

select n_distinct from pg_stats where attname like 'instr%_ref';

> 
> 
> regards,
> 
> fabio pardi


-- 
Adrian Klaver
adrian.klaver@aklaver.com

pgsql-general by date:

From: Fabio Pardi
Date: 23 June 2020, 17:05:49
Subject: Re: n_distinct off by a factor of 1000

From: Adrian Klaver
Date: 23 June 2020, 17:28:34
Subject: Re: pg_dump empty tables

Re: n_distinct off by a factor of 1000 - Mailing list pgsql-general

Previous

Next