Re: Yet another abort-early plan disaster on 9.3 - Mailing list pgsql-performance

From	Greg Stark
Subject	Re: Yet another abort-early plan disaster on 9.3
Date	October 10, 2014 14:16:57
Msg-id	CAM-w4HNfZymQmTu3+TxQQD-e6_410-sDnZBaW8neurmTFh4GbA@mail.gmail.com
In response to	Re: Yet another abort-early plan disaster on 9.3 (Josh Berkus <josh@agliodbs.com>)
Responses	Re: Yet another abort-early plan disaster on 9.3 ("Tomas Vondra" <tv@fuzzy.cz>)
List	pgsql-performance

On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Yes, it's only intractable if you're wedded to the idea of a tiny,
> fixed-size sample.  If we're allowed to sample, say, 1% of the table, we
> can get a MUCH more accurate n_distinct estimate using multiple
> algorithms, of which HLL is one.  While n_distinct will still have some
> variance, it'll be over a much smaller range.

I've gone looking for papers on this topic, but from what I've read this
isn't so. To get any noticeable improvement you need to read 10-50% of
the table, which is effectively the same as reading the entire table,
and even then the results were pretty poor. All the research I could
find focused on how to analyze the whole table using a reasonable
amount of scratch space, and how to do so incrementally.
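To make the sampling argument concrete, here is a minimal Python sketch of how a sample-based n_distinct estimate is formed. It assumes the Haas-Stokes "Duj1" estimator, n*d / (n - f1 + f1*n/N), which I believe is the formula PostgreSQL's ANALYZE applies. Here n is the sample size, d the number of distinct values in the sample, f1 the number of values seen exactly once, and N the table's row count. The skewed_table() generator and its Zipf-like distribution are hypothetical test data, not anything from this thread. Comparing the printed estimates at 1%, 10%, and 50% sample fractions is one way to check the claim above on a skewed column. No particular numbers are promised.

```python
import random
from collections import Counter


def duj1_estimate(sample, total_rows):
    """Estimate distinct values in a table from a row sample (Haas-Stokes Duj1).

    n  = sample size, d = distinct values in the sample,
    f1 = values seen exactly once in the sample, N = total rows in the table.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    # When n == N (full scan) the denominator reduces to n and the result is d.
    return n * d / (n - f1 + f1 * n / total_rows)


def skewed_table(num_rows, num_values, seed=0):
    """Hypothetical column: value k (1..num_values) drawn with weight 1/k."""
    rng = random.Random(seed)
    values = range(1, num_values + 1)
    weights = [1.0 / k for k in values]
    return rng.choices(values, weights=weights, k=num_rows)


if __name__ == "__main__":
    table = skewed_table(200_000, 50_000)
    true_distinct = len(set(table))
    rng = random.Random(1)
    for fraction in (0.01, 0.1, 0.5):
        sample = rng.sample(table, int(len(table) * fraction))
        est = duj1_estimate(sample, len(table))
        print(f"{fraction:>5.0%} sample: estimate {est:,.0f} vs actual {true_distinct:,}")
```

With a full scan (n equal to N) the formula returns exactly the sample's distinct count. The interesting behaviour is how far the small-sample estimates drift on a long-tailed column, where most rare values never appear in the sample at all.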

--
greg
