[HACKERS] pgsql 10: hash indexes testing

From: AP
Subject: [HACKERS] pgsql 10: hash indexes testing
Msg-id: 20170704105728.mwb72jebfmok2nm2@zip.com.au
List: pgsql-hackers

Hi,

As I am actively working on a big project, I figured I'd give PostgreSQL 10
a go, primarily because of hash indexes.

The PostgreSQL 10 version in use is: 10~beta2~20170620.2224-1~491.gitd412f79.pgdg+1

Things are going mostly well with hash indexes (plain, non-unique), which
are giving me a rather lovely saving in index size: they are roughly 1/4
to 1/2 the size of the equivalent btree indexes. This matters because the
btree indexes can be anywhere between 25GB and 350GB in size.
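
For reference, the size comparison is nothing fancier than building both
index types and asking the catalog; something like the following (table
and column names here are made up, the real schema is a lot bigger):

    -- Hypothetical table; the real ones are project-specific.
    CREATE TABLE some_table (val bytea NOT NULL);

    CREATE INDEX some_table_val_btree ON some_table USING btree (val);
    CREATE INDEX some_table_val_hash  ON some_table USING hash  (val);

    -- Compare on-disk sizes of the two indexes.
    SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
    FROM   pg_class
    WHERE  relname IN ('some_table_val_btree', 'some_table_val_hash');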

There is one index that caused an issue. Towards the end of an import
I got the following error:

    out of overflow pages in hash index

The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
The table has over 2 billion entries. The data is not unique. There's
an average of 10 duplicates for every unique value.
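
For a concrete picture, a stripped-down sketch of the affected table would
look something like this (the names are made up and the real table has
more columns):

    -- Simplified, hypothetical version of the affected table.
    CREATE TABLE datum (
        val bytea NOT NULL  -- 64 bytes of (quasi)random data, ~10 dups per value
        -- other columns omitted
    );

    -- The index that hit "out of overflow pages in hash index":
    CREATE INDEX datum_val_hash ON datum USING hash (val);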

Is this a valid error message or have I spotted a bug?

I tried to duplicate the error by reindexing a data set that I had
uploaded with a btree for that index, but I couldn't reproduce it. The
difference, as near as I can remember, is that different (quasi)random
values were being indexed (still with the same level of duplication).

I'll see if I can get the table reloaded afresh to see whether that's
what it takes to trigger the error. The upload is done in a single
transaction with COPY BINARY into 3-4 tables, one after the other in a
loop until the data is exhausted (i.e. COPY into tables A, B, C and D,
then back to A and repeat; there is data processing happening along the
way, and looping like this helps keep memory usage in check).

A single transaction can COPY millions of rows (the average is about 3.7
million rows, with the table in question getting around 3 million). There
are 33 transactions in play at any one time and they all upload to the
same tables. The last things done in a transaction are a couple of small,
single-row INSERTs into a couple of tables, followed by COMMIT. There is
one transaction per connection. The whole process can take 12-15 hours
and involves 1000 transactions.
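
In SQL terms the per-connection shape of it is roughly the following
(table names are placeholders; the binary data itself is streamed from
the client):

    BEGIN;

    -- Looped until this connection's share of the data is exhausted:
    COPY table_a FROM STDIN (FORMAT binary);
    COPY table_b FROM STDIN (FORMAT binary);
    COPY table_c FROM STDIN (FORMAT binary);
    COPY table_d FROM STDIN (FORMAT binary);
    -- ...then back to table_a and repeat...

    -- Finally a couple of small, single-row INSERTs into bookkeeping
    -- tables (details omitted), then:
    COMMIT;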

Hopefully it's not a specific set of random values that triggers the
error, because duplicating THAT would be outpaced by the death of the
universe. :)

AP


