Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate. - Mailing list pgsql-hackers

From David Gould
Subject Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date
Msg-id 20180302185752.46b82671@engels
In response to Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, 02 Mar 2018 17:17:29 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:


> But by the same token, analyze only looked at 0.0006 of the pages.  It's
> nice that for you, that's enough to get a robust estimate of the density
> everywhere; but I have a nasty feeling that that won't hold good for
> everybody.

My grasp of statistics is somewhat weak, so please inform me if I've got
this wrong, but every time I've looked into it I've found that one can get
pretty good accuracy and confidence with fairly small samples. Typically 1000
samples will serve no matter the population size if the desired margin of
error is 5%. Even with 99% confidence and a 1% margin of error it takes less
than 20,000 samples. See the table at:

http://www.research-advisors.com/tools/SampleSize.htm
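As a rough check on those numbers, here is a minimal sketch of the usual sample-size arithmetic (Cochran's formula with an optional finite population correction). It assumes we are estimating a proportion with worst-case variance p = 0.5, which is what tables like the one above are typically built on; the function name and the z-score table are mine, not from any library.

```python
import math

# Two-sided standard normal quantiles for common confidence levels.
Z_SCORES = {0.95: 1.959964, 0.99: 2.575829}

def required_sample_size(margin, confidence=0.95, population=None, p=0.5):
    """Samples needed to estimate a proportion within +/- margin.

    Assumes simple random sampling; p=0.5 is the worst case (largest variance).
    """
    z = Z_SCORES[confidence]
    # Cochran's formula for an effectively infinite population.
    n = z * z * p * (1.0 - p) / (margin * margin)
    if population is not None:
        # Finite population correction: a small population needs fewer samples.
        n = n / (1.0 + (n - 1.0) / population)
    return math.ceil(n)

n_95_5 = required_sample_size(0.05, 0.95)   # 5% margin, 95% confidence
n_99_5 = required_sample_size(0.05, 0.99)   # 5% margin, 99% confidence
n_99_1 = required_sample_size(0.01, 0.99)   # 1% margin, 99% confidence
print(n_95_5, n_99_5, n_99_1)
```

The required size depends on the margin and the confidence level, and only very weakly on the population size, which is why a fixed-size sample can serve for tables of any size. Estimating an average tuple density is not exactly the same problem as estimating a proportion, so treat this as an analogy rather than a proof.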

Since we have by default 30000 sample pages, and since ANALYZE takes some
trouble to get a random sample, I think we really can rely on the results of
extrapolating reltuples from analyze.
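To illustrate the extrapolation, here is a toy simulation, not PostgreSQL's actual code. It assumes a synthetic table whose first half is densely packed and whose second half is sparse, samples 30000 pages uniformly at random, and scales the sampled density up to the whole table. The page counts, tuple ranges, and function name are all made up for the example.

```python
import random

def estimate_reltuples(page_tuple_counts, sample_pages, rng):
    """Extrapolate total tuples from a uniform random sample of pages."""
    total_pages = len(page_tuple_counts)
    n = min(sample_pages, total_pages)
    # Choose n distinct page numbers uniformly at random.
    sampled = rng.sample(range(total_pages), n)
    density = sum(page_tuple_counts[i] for i in sampled) / n
    return density * total_pages

rng = random.Random(15005)  # fixed seed so the run is repeatable

# Hypothetical table: 100,000 dense pages followed by 100,000 sparse pages.
pages = [rng.randint(90, 110) for _ in range(100_000)]
pages += [rng.randint(0, 20) for _ in range(100_000)]

actual = sum(pages)
estimate = estimate_reltuples(pages, 30_000, rng)
relative_error = abs(estimate - actual) / actual
print(f"actual={actual} estimate={estimate:.0f} error={relative_error:.4%}")
```

Even with the density varying by a factor of ten between the two halves, the estimate should land close to the true count, because the sample is drawn at random across the whole table rather than from one region. A sample that was not random would not have this property.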

-dg

-- 
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.

