Home > mailing lists

Re: Hash Index Build Patch - Mailing list pgsql-patches

From	Tom Raney
Subject	Re: Hash Index Build Patch
Date	September 26, 2007 16:42:57
Msg-id	46FAB647.5040906@comcast.net Whole thread Raw
In response to	Re: Hash Index Build Patch (Alvaro Herrera <alvherre@commandprompt.com>)
Responses	Re: Hash Index Build Patch
List	pgsql-patches

Tree view

Alvaro Herrera wrote:
> Hi Tom,
>
> Tom Raney wrote:
>
>
>> We used spool functions from the BTree code to sort the index
>> tuples. Sorting is done on the hash value of the tuples.  The hash
>> value depends on the number of primary bucket pages (henceforth
>> just bucket pages) that will be required to fit all the index
>> tuples. So, before sorting, the base relation is scanned to get
>> the total number of tuples.
>>
>
> Just wondering, wouldn't it be enough to obtain a tuple count estimate
> by using reltuples / relpages * RelationGetNumberOfBlocks, like the
> planner does?
>
>
Hello Alvaro,

We thought of that and the verdict is still out whether it is more
costly to scan the entire relation to get the accurate count or use the
estimate and hope for the best with the possibility of splits occurring
during the build.   If we use the estimate and it is completely wrong
(with the actual tuple count being much higher) the sort will provide no
benefit and it will behave as did the original code.

But, to be honest, I don't know exactly when the catalog is updated and
how accurate the estimate is.  If you have any information there (or
anyone else) please let me know.  It would be great to eliminate that
extra pass!

Sincerely,
Tom Raney

pgsql-patches by date:

From: Tom Lane
Date: 26 September 2007, 15:54:43
Subject: Re: Optimizer hook

From: Tom Lane
Date: 26 September 2007, 17:06:39
Subject: Re: Hash Index Build Patch

Re: Hash Index Build Patch - Mailing list pgsql-patches

Previous

Next