Re: Indexes for hashes - Mailing list pgsql-performance

From ktm@rice.edu
Subject Re: Indexes for hashes
Date
Msg-id 20160615130307.GL25300@aart.rice.edu
Whole thread Raw
In response to Indexes for hashes  (Ivan Voras <ivoras@gmail.com>)
Responses Re: Indexes for hashes  (Ivan Voras <ivoras@gmail.com>)
List pgsql-performance
On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
> Hi,
>
> I have an application which stores a large amounts of hex-encoded hash
> strings (nearly 100 GB of them), which means:
>
>    - The number of distinct characters (alphabet) is limited to 16
>    - Each string is of the same length, 64 characters
>    - The strings are essentially random
>
> Creating a B-Tree index on this results in the index size being larger than
> the table itself, and there are disk space constraints.
>
> I've found the SP-GIST radix tree index, and thought it could be a good
> match for the data because of the above constraints. An attempt to create
> it (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more
> than 12 hours (while a similar B-tree index takes a few hours at most), so
> I've interrupted it because "it probably is not going to finish in a
> reasonable time". Some slides I found on the spgist index allude that both
> build time and size are not really suitable for this purpose.
>
> My question is: what would be the most size-efficient index for this
> situation?

Hi Ivan,

If the strings are really random, then maybe a function index on the first
4, 8, or 16 characters could be used to narrow the search space and not need
to index all 64. If they are not "good" random numbers, you could use a hash
index on the strings. It will be much smaller since it currently uses a 32-bit
hash. It has a number of caveats and is not currently crash-safe, but it seems
like it might work in your environment. You can also use a functional index on
a hash-function applied to your values with a btree to give you crash safety.

Regards,
Ken


pgsql-performance by date:

Previous
From: Torsten Zuehlsdorff
Date:
Subject: Re: Indexes for hashes
Next
From: Ivan Voras
Date:
Subject: Re: Indexes for hashes