Home > mailing lists

Re: n_distinct off by a factor of 1000 - Mailing list pgsql-general

From	Klaudie Willis
Subject	Re: n_distinct off by a factor of 1000
Date	June 23, 2020 15:34:45
Msg-id	g7Da1HpeuexOOidPRg1xpEYwJDsaESRQYxKEFshGZSA08PhacxJ4EuXnutL7QSgah0pSUAeuvPj-v7m-YKm5eKZRrDj1DrAv_4FqHulMOK4=@protonmail.com Whole thread
In response to	Re: n_distinct off by a factor of 1000 (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses	Re: n_distinct off by a factor of 1000
List	pgsql-general

Tree view

Adrian, you are correct.  My mistanke.

K

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, June 23, 2020 4:14 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> On 6/23/20 7:05 AM, Fabio Pardi wrote:
>
> > On 23/06/2020 14:42, Klaudie Willis wrote:
> >
> > > I got my first hint of why this problem occurs when I looked at the
> > > statistics.  For the column in question, "instrument_ref" the
> > > statistics claimed it to be:
> > > The default_statistics_target=500, and analyze has been run.
> > > select * from pg_stats where attname like 'instr%_ref'; -- Result:
> > > 40.000
> > > select count(distinct instrumentid_ref) from bigtable -- Result: 33
> > > 385 922 (!!)That is an astonishing difference of almost a 1000X.
> >
> > I think you are counting 2 different things here.
> > The first query returns all the columns "like 'instr%_ref'" present in
> > the statistics (so in the whole cluster), while the second is counting
> > the actual number of different rows in bigtable.
>
> I believe the OP actually meant the query to be:
>
> select n_distinct from pg_stats where attname like 'instr%_ref';
>
> > regards,
> > fabio pardi
>
> --
>
> Adrian Klaver
> adrian.klaver@aklaver.com

pgsql-general by date:

From: Jason Ralph
Date: 23 June 2020, 15:17:01
Subject: RE: UPDATE on 20 Million Records Transaction or not?

From: Ron
Date: 23 June 2020, 15:36:31
Subject: Re: UPDATE on 20 Million Records Transaction or not?

Re: n_distinct off by a factor of 1000 - Mailing list pgsql-general

Previous

Next