Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

From: Mischa Sandberg
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date:
Msg-id: 1114580284.426f253cc0087@webmail.telus.net
In response to: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan)
Responses: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan)
List: pgsql-hackers

Quoting Andrew Dunstan:

> After some more experimentation, I'm wondering about some sort of
> adaptive algorithm, a bit along the lines suggested by Marko
> Ristola, but limited to 2 rounds.
>
> The idea would be that we take a sample (either of fixed size, or
> some small proportion of the table), see how well it fits a larger
> sample (say a few times the size of the first sample), and then adjust
> the formula accordingly to project from the larger sample the
> estimate for the full population. Math not worked out yet - I think we
> want to ensure that the result remains bounded by [d,N].

Perhaps I can save you some time (yes, I have a degree in Math). If I
understand correctly, you're trying to extrapolate from the correlation
between a tiny sample and a larger sample. Introducing the tiny sample
into any decision can only produce a less accurate result than just
taking the larger sample on its own; GIGO. Whether they are consistent
with one another has no relationship to whether the larger sample
correlates with the whole population. You can think of the tiny sample
like "anecdotal" evidence for wonderdrugs.
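
To make that concrete, here is a minimal Python sketch. It is not from
the thread; the function names, the table size and the sample sizes are
illustrative assumptions, and it uses plain random row sampling rather
than anything ANALYZE actually does. It builds two tables whose true
n_distinct differs by a factor of ten, and draws a small and a larger
sample from each.

```python
import random

def distinct_fraction(sample):
    """Fraction of sampled rows that carry a distinct value."""
    return len(set(sample)) / len(sample)

def compare_samples(population, small_n, large_n, seed=0):
    """Draw a small and a larger random sample (without replacement)
    and return the distinct fraction observed in each."""
    rng = random.Random(seed)
    small = rng.sample(population, small_n)
    large = rng.sample(population, large_n)
    return distinct_fraction(small), distinct_fraction(large)

N = 1_000_000                               # hypothetical table size
all_unique = list(range(N))                 # true n_distinct = N
ten_dupes = [i // 10 for i in range(N)]     # true n_distinct = N / 10

for name, pop in [("all unique", all_unique), ("10 copies each", ten_dupes)]:
    small_f, large_f = compare_samples(pop, 300, 3000)
    print(f"{name}: true n_distinct={len(set(pop))}, "
          f"small sample={small_f:.3f}, large sample={large_f:.3f}")
```

In both tables the two samples should show a distinct fraction close to
1.0, so they agree with each other about equally well, yet the true
n_distinct is N in one case and N/10 in the other. The agreement
between the samples does not tell the two cases apart.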
--
"Dreams come true, not free." -- S.Sondheim, ITW


