Re: Hash Index Build Patch - Mailing list pgsql-patches

From Tom Lane
Subject Re: Hash Index Build Patch
Msg-id 27185.1190837188@sss.pgh.pa.us
In response to Re: Hash Index Build Patch  (Tom Raney <twraney@comcast.net>)
Responses Re: Hash Index Build Patch  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-patches
Tom Raney <twraney@comcast.net> writes:
> Alvaro Herrera wrote:
>> Just wondering, wouldn't it be enough to obtain a tuple count estimate
>> by using reltuples / relpages * RelationGetNumberOfBlocks, like the
>> planner does?

> We thought of that, and the jury is still out on whether it is more
> costly to scan the entire relation to get an accurate count or to use
> the estimate and hope for the best, accepting that splits may occur
> during the build.  If we use the estimate and it is badly wrong (with
> the actual tuple count much higher), the sort will provide no benefit
> and the build will behave as the original code did.

I think this argument is *far* too weak to justify an extra pass over
the relation.  The planner-style calculation is quite unlikely to give a
major underestimate of the rowcount.  It might overestimate, e.g. if the
relation is bloated by dead tuples, but an error in that direction won't
kill you.
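A minimal sketch of the planner-style estimate Alvaro suggests, written in C since PostgreSQL is. The function name `estimate_tuple_count` and the fallback density for a relation that has never been vacuumed or analyzed are hypothetical; the parameters are plain stand-ins for `reltuples` and `relpages` from `pg_class` and for the value `RelationGetNumberOfBlocks` would return. This is not PostgreSQL's actual planner code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fallback density (tuples per page) for a relation whose
 * pg_class statistics were never filled in; an arbitrary assumption. */
#define HYPOTHETICAL_DEFAULT_DENSITY 10.0

/*
 * Hypothetical helper: estimate the current tuple count without scanning
 * the heap, by scaling the last recorded density (reltuples / relpages)
 * to the relation's current size in blocks.
 */
static double
estimate_tuple_count(double reltuples, uint32_t relpages, uint32_t cur_blocks)
{
    double density;

    assert(reltuples >= 0.0);   /* stored statistics should be non-negative */

    if (cur_blocks == 0)
        return 0.0;             /* empty relation: nothing to index */

    if (relpages > 0)
        density = reltuples / (double) relpages;   /* tuples per page last time */
    else
        density = HYPOTHETICAL_DEFAULT_DENSITY;    /* no stats yet: guess */

    /* Same density, applied to however many blocks the relation has now. */
    return density * (double) cur_blocks;
}
```

For example, a relation recorded as 1000 tuples on 10 pages that has since grown to 20 blocks yields an estimate of 2000 tuples. The estimate costs no extra pass over the heap, which is the trade-off being weighed against an exact count.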

            regards, tom lane
