Re: Gsoc2012 idea, tablesample - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Gsoc2012 idea, tablesample
Date	May 11, 2012 11:28:21
Msg-id	3891.1336746464@sss.pgh.pa.us
In response to	Re: Gsoc2012 idea, tablesample ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses	Re: Gsoc2012 idea, tablesample
List	pgsql-hackers

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Florian Pflug <fgp@phlo.org> wrote:
>> Maybe one can get rid of these sorts of problems by factoring in
>> the expected density of the table beforehand and simply accepting
>> that the results will be inaccurate if the statistics are
>> outdated?
> Unless I'm missing something, I think that works for percentage
> selection, which is what the standard talks about, without any need
> to iterate through additional samples.  Good idea!  We don't need to
> do any second pass to pare down initial results, either.  This
> greatly simplifies coding while providing exactly what the standard
> requires.
>> I'm not totally sure whether this approach is sensitive to
>> non-uniformity in the tuple to line-pointer assignment, though.

If you're willing to accept that the quality of the results depends on
having up-to-date stats, then I'd suggest (1) use the planner's existing
technology to estimate the number of rows in the table; (2) multiply
by the sampling factor you want to get a desired number of sample rows;
(3) use ANALYZE's existing technology to acquire that many sample rows.
While the ANALYZE code isn't perfect with respect to the problem of
nonuniform TID density, it certainly will be a lot better than
pretending that that problem doesn't exist.
        regards, tom lane
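
The three steps above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the function names are hypothetical, the row estimate assumes the stored tuple density (reltuples / relpages) is scaled to the table's current page count, and plain reservoir sampling (Algorithm R) over a row stream stands in for ANALYZE's sampling, which works on blocks and handles nonuniform TID density better than this does.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def estimate_row_count(reltuples: float, relpages: int, current_pages: int) -> float:
    """Step 1 (assumed form): scale the last recorded tuple density
    to the table's current size in pages."""
    if relpages <= 0:
        return 0.0
    density = reltuples / relpages
    return density * current_pages


def target_sample_size(estimated_rows: float, percent: float) -> int:
    """Step 2: multiply the estimate by the requested sampling percentage."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    return int(round(estimated_rows * percent / 100.0))


def reservoir_sample(rows: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Step 3 (stand-in): Algorithm R, a single pass that keeps k rows,
    each row seen having an equal chance of ending up in the result."""
    reservoir: List[T] = []
    if k <= 0:
        return reservoir
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


def tablesample(rows: Iterable[T], percent: float, reltuples: float,
                relpages: int, current_pages: int,
                seed: Optional[int] = None) -> List[T]:
    """Hypothetical driver tying the three steps together."""
    rng = random.Random(seed)
    estimated = estimate_row_count(reltuples, relpages, current_pages)
    k = target_sample_size(estimated, percent)
    return reservoir_sample(rows, k, rng)
```

Note that the sample size is fixed from the statistics before any row is read, so stale statistics change how many rows come back: if the table has shrunk since the last ANALYZE, the sketch simply returns fewer rows than the target.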
