
From Greg Stark
Subject Re: More thoughts about planner's cost estimates
Date
Msg-id 87k67zdymk.fsf@stark.xeocode.com
In response to Re: More thoughts about planner's cost estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: More thoughts about planner's cost estimates  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > And a 5% sample is pretty big. In fact, my tests earlier showed the I/O from
> > 5% block sampling took just as long as reading all the blocks. Even if we
> > figure out what's causing that (IMHO surprising) result and improve matters, I
> > would only expect it to be 3-4x faster than a full scan.
> 
> One way to reduce the I/O pain from extensive sampling would be to turn
> VACUUM ANALYZE into a genuine combined operation instead of a mere
> notational shorthand for two separate scans.

Except that we're also looking for every way we can to avoid having vacuum
touch every page, so that large tables with narrow hot spots can still afford
to be vacuumed regularly during peak hours.

But for most users, analyze doesn't really have to run as often as vacuum. One
sequential scan per night doesn't seem like that big a deal to me.

> I'd still be worried about the CPU pain though.  ANALYZE can afford to
> expend a pretty fair number of cycles per sampled tuple, but with a
> whole-table sample that's going to add up.

That is a concern. The algorithm is pretty efficient, though: it basically
amounts to hashing all the tuples, keeping only a limited number of hash
buckets, and throwing the rest of the data away.
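
To make that concrete, here's a toy sketch of that style of estimator
(Flajolet/Wegman-style adaptive sampling). This is illustrative only, not the
actual ANALYZE code, and the names and constants are made up: hash every
value, retain only hashes that fall in a shrinking fraction of the hash
space, and scale the retained count back up.

#include <stdint.h>
#include <stdio.h>

#define CAPACITY 64             /* max hashes retained at any time */

static uint64_t sample[CAPACITY];
static int nsample = 0;
static int level = 0;           /* keep hashes whose top `level` bits are zero */

/* 64-bit finalizer from MurmurHash3; any decent mixing function would do */
static uint64_t
hash64(uint64_t x)
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

static int
kept_at_level(uint64_t h, int lvl)
{
    return lvl == 0 || (h >> (64 - lvl)) == 0;
}

static void
add_value(uint64_t key)
{
    uint64_t h = hash64(key);
    int      i;

    if (!kept_at_level(h, level))
        return;                 /* most tuples are thrown away right here */

    for (i = 0; i < nsample; i++)
        if (sample[i] == h)
            return;             /* duplicate value, already counted */

    sample[nsample++] = h;

    /* overflow: tighten the filter and evict hashes that no longer pass */
    while (nsample >= CAPACITY)
    {
        int j = 0;

        level++;
        for (i = 0; i < nsample; i++)
            if (kept_at_level(sample[i], level))
                sample[j++] = sample[i];
        nsample = j;
    }
}

static double
estimate_ndistinct(void)
{
    /* each retained hash stands for about 2^level distinct values */
    return (double) nsample * (double) ((uint64_t) 1 << level);
}

int
main(void)
{
    uint64_t i;

    for (i = 0; i < 100000; i++)
        add_value(i % 5000);    /* 100000 rows, only 5000 distinct keys */

    printf("estimated n_distinct = %.0f (true value 5000)\n",
           estimate_ndistinct());
    return 0;
}

The per-tuple work is just a hash, a filter test, and occasionally a scan of
a small fixed-size array, but over a whole-table scan even that adds up.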

Things could get more complicated if we move beyond simple n_distinct
estimates to more complex stats, such as performing dependence tests on the
data for multi-column statistics.
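
For instance, a dependence test for a pair of columns could be as simple as a
chi-square test over a contingency table built from the sample. Again, this
is purely an illustrative sketch, not anything that exists today:

#include <stdio.h>

#define NA 3                    /* distinct values seen in column a */
#define NB 3                    /* distinct values seen in column b */

/*
 * Pearson chi-square statistic for a contingency table; a large value
 * relative to the critical value at (NA-1)*(NB-1) degrees of freedom
 * suggests the two columns are dependent.
 */
static double
chi_square(int counts[NA][NB])
{
    double row[NA] = {0}, col[NB] = {0}, total = 0, stat = 0;
    int    i, j;

    for (i = 0; i < NA; i++)
        for (j = 0; j < NB; j++)
        {
            row[i] += counts[i][j];
            col[j] += counts[i][j];
            total += counts[i][j];
        }

    for (i = 0; i < NA; i++)
        for (j = 0; j < NB; j++)
        {
            double expected = row[i] * col[j] / total;
            double diff = counts[i][j] - expected;

            if (expected > 0)
                stat += diff * diff / expected;
        }
    return stat;
}

int
main(void)
{
    /* counts[i][j] = sampled rows with a = i and b = j */
    int counts[NA][NB] = {
        {30, 2, 1},
        {3, 40, 2},
        {1, 4, 17},
    };

    printf("chi-square = %.2f (df = %d)\n",
           chi_square(counts), (NA - 1) * (NB - 1));
    return 0;
}

Note that's quadratic in the number of column pairs you test, which is
another reason the CPU cost of a whole-table sample worries me.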

-- 
greg


