Re: Improving N-Distinct estimation by ANALYZE - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: Improving N-Distinct estimation by ANALYZE
Date	January 6, 2006 19:37:02
Msg-id	87psn52ajv.fsf@stark.xeocode.com
In response to	Re: Improving N-Distinct estimation by ANALYZE (Josh Berkus <josh@agliodbs.com>)
Responses	Re: Improving N-Distinct estimation by ANALYZE
List	pgsql-hackers

Josh Berkus <josh@agliodbs.com> writes:

> > These numbers don't make much sense to me. It seems like 5% is about as
> > slow as reading the whole file which is even worse than I expected. I
> > thought I was being a bit pessimistic to think reading 5% would be as
> > slow as reading 20% of the table.
> 
> It's about what *I* expected.  Disk seeking is the bane of many access 
> methods.

Sure, but that bad? That means realistic random_page_cost values should be
something more like 20 rather than 4. And that's with seeks only going to
subsequent blocks in a single file, which one would expect to average less
than the half rotation that a random seek would average. That seems worse than
anyone expects.
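
The arithmetic behind the "20 rather than 4" figure can be sketched as below. This is only a rough illustration under a simple linear cost model (time proportional to pages read, no caching or read-ahead effects); the function name and its parameters are hypothetical and are not part of PostgreSQL.

```python
def implied_random_page_cost(sample_fraction, sample_time, full_scan_time):
    """Cost of one sampled page read relative to one sequential page read.

    Assumes time scales linearly with pages read (illustrative model only).
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    if sample_time <= 0 or full_scan_time <= 0:
        raise ValueError("times must be positive")
    # Time per unit of table read, for each access pattern.
    sampled_per_page = sample_time / sample_fraction
    sequential_per_page = full_scan_time  # the full scan reads fraction 1.0
    return sampled_per_page / sequential_per_page


# If reading 5% of the table takes as long as reading all of it:
ratio = implied_random_page_cost(0.05, sample_time=1.0, full_scan_time=1.0)
print(round(ratio, 2))
```

Under these assumptions the ratio comes out at about 20, which is where the comparison with the default of 4 comes from.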

> Anyway, since the proof is in the pudding, Simon and I will be working on 
> some demo code for different sampling methods so that we can debate 
> results rather than theory.

Note that if these numbers are realistic then there's no I/O benefit to any
sampling method that requires anything like 5% of the entire table and is
still unreliable. Instead it makes more sense to implement an algorithm that
requires a full table scan and can produce good results more reliably.
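
To make the break-even point concrete, here is a minimal sketch using the same simplified model: a sampled page costs random_page_cost units and a sequentially scanned page costs seq_page_cost units. The helper names are hypothetical, and real behaviour will depend on caching, read-ahead, and how far apart the sampled blocks are.

```python
def break_even_fraction(random_page_cost, seq_page_cost=1.0):
    """Largest sample fraction that is still no costlier than a full scan."""
    if random_page_cost <= 0 or seq_page_cost <= 0:
        raise ValueError("costs must be positive")
    return min(1.0, seq_page_cost / random_page_cost)


def sampling_cost_vs_full_scan(sample_fraction, random_page_cost, seq_page_cost=1.0):
    """I/O cost of sampling, expressed as a multiple of a full sequential scan."""
    if not 0 <= sample_fraction <= 1:
        raise ValueError("sample_fraction must be in [0, 1]")
    if random_page_cost <= 0 or seq_page_cost <= 0:
        raise ValueError("costs must be positive")
    return sample_fraction * random_page_cost / seq_page_cost


for rpc in (4, 20):
    print(rpc, break_even_fraction(rpc), sampling_cost_vs_full_scan(0.05, rpc))
```

With a cost ratio of 4, a 5% sample costs about as much as reading 20% of the table sequentially, and sampling stays cheaper up to 25%. With a ratio of 20, the 5% sample already costs as much as the full scan, so in this model it saves no I/O.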

-- 
greg
