On Tue, Jul 4, 2017 at 4:27 PM, AP <ap@zip.com.au> wrote:
> Hi,
>
> As I am actively working on a big project I figured I'd give PGSQL 10 a
> go, primarily because of hash indexes.
>
> PostgreSQL 10 version in use is: 10~beta2~20170620.2224-1~491.gitd412f79.pgdg+1
>
> Things are mostly well with hash indexes (plain, non-unique) giving me
> a rather lovely saving in index size: they are roughly 1/4 to 1/2 the
> size of btree. This is of use as the btree indexes can be between 25GB
> and 350GB in size.
>
> There is one index that caused an issue. Towards the end of an import
> I got the following error:
>
> out of overflow pages in hash index
>
> The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
> The table has over 2 billion entries. The data is not unique. There's
> an average of 10 duplicates for every unique value.
>
> Is this a valid error message or have I spotted a bug?
>
It is difficult to say at this stage, but I think we can figure it out.
We can get such a message if we consume the maximum number of overflow
pages that a hash index can support, which is limited by the number of
bitmap pages. Can you try using the pgstattuple extension and send us
the results of SELECT * FROM pgstathashindex('index_name');? If the
number of bitmap pages is 128 and the total number of overflow pages is
128 * 4096, that would mean all the pages are used. Then maybe we can
try to see whether all the pages are full with the help of pageinspect,
but I don't think we need to go there at this stage.
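For example, something along these lines (with 'index_name' replaced by
the actual name of the hash index) should show the relevant numbers:

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    SELECT bucket_pages, overflow_pages, bitmap_pages, free_percent
      FROM pgstathashindex('index_name');

    -- If bitmap_pages = 128 and overflow_pages = 128 * 4096 = 524288,
    -- then all the overflow pages the index can address are in use.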
> I tried to duplicate the error with a reindex of the data set that I
> uploaded using btree for that index but I couldn't. The difference, as
> near as I can remember, is different (quasi)random values being indexed
> (still same level of duplication).
>
One probable theory is that in the original index where you hit the
error, there are some unused overflow pages in some of the buckets
which could be squeezed. The reason they didn't get squeezed is that
vacuum wouldn't have been kicked off for this workload.
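If that theory is right, running a manual vacuum on the table and then
rechecking pgstathashindex might show those unused pages being squeezed
back. A rough sketch (with 'table_name' and 'index_name' as placeholders
for the actual table and hash index):

    VACUUM VERBOSE table_name;

    SELECT overflow_pages, bitmap_pages
      FROM pgstathashindex('index_name');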
> I'll see if I can get the table reloaded afresh to see if that's what
> it takes to trigger the error.
>
Thanks. I suggest that when the error occurs, you don't throw away that
index, because we can get some useful information from it to diagnose
the reason for the error.
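For instance, with the pageinspect extension the hash index metapage can
be examined in place (a rough sketch; 'index_name' is a placeholder):

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    SELECT maxbucket, ovflpoint, firstfree, nmaps
      FROM hash_metapage_info(get_raw_page('index_name', 0));

That shows, among other things, the number of bitmap pages (nmaps) and
the lowest overflow page bit that might be free (firstfree).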
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com