Re: Stopgap solution for table-size-estimate updating - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Stopgap solution for table-size-estimate updating
Date
Msg-id 1101679551.2963.94.camel@localhost.localdomain
Whole thread Raw
In response to Re: Stopgap solution for table-size-estimate updating problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Stopgap solution for table-size-estimate updating problem  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, 2004-11-28 at 18:52, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On the topic of accuracy of the estimate: Updates cause additional data
> > to be written to the table, so tables get bigger until vacuumed. Tables
> > with many Inserts are also regularly trimmed with Deletes. With a
> > relatively static workload and a regular vacuum cycle, the table size
> > for many major tables eventually levels off, remaining roughly constant
> > but the number of non-zero pages will vary over time in a saw-tooth
> > curve. Estimating the cardinality by using the number of blocks would
> > ignore the fact that many of them are empty for much of the time. That
> > would then lead to a systematic over-estimate of the cardinality of the
> > regularly updated tables.
> 
> You mean underestimate.  After a VACUUM, the tuples-per-page figure
> would be set to a relatively low value, and then subsequent inserts
> would fill in the free space, causing the actual density to rise
> while the physical number of blocks stays more or less constant.
> So the proposed method would always give an accurate number of blocks,
> but it would tend to underestimate the number of tuples in a dynamic
> situation.

OK, just *wrong* then (the estimate, as well as myself) :-)

> Still, it's better than the current method, which is likely to
> underestimate both parameters.  I believe that having an accurate block
> count and an underestimated tuple count would tend to favor choosing
> indexscans over seqscans, which is probably a good thing --- when was
> the last time you saw someone complaining that the planner had
> improperly chosen an indexscan over a seqscan?

Well, yes, but user perception is not always right.

Given we expect an underestimate, can we put in a correction factor
should the estimate get really low...sounds like we could end up
choosing nested joins more often when we should have chosen merge joins.
That is something that people do regularly complain about (indirectly).

> > How easy would it be to take into account the length of the FSM for the
> > relation also?
> 
> Don't think this would help; the FSM doesn't really track number of
> tuples.  Free space isn't a good guide to number of tuples because you
> can't distinguish inserts from updates at that level.  (I'm also a bit
> concerned about turning the FSM into a source of contention.)

Agreed.

-- 
Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: SQL:2003 TODO items
Next
From: Tom Lane
Date:
Subject: Re: [JDBC] Strange server error with current 8.0beta driver