Indexes for hashes - Mailing list pgsql-performance

From Ivan Voras
Subject Indexes for hashes
Date
Msg-id CAF-QHFULmVOdrqwtR7AKRnnx6=GbAW7S6v6f4jACEOVENef7NA@mail.gmail.com
Responses Re: Indexes for hashes  (Torsten Zuehlsdorff <mailinglists@toco-domains.de>)
Re: Indexes for hashes  ("ktm@rice.edu" <ktm@rice.edu>)
Re: Indexes for hashes  (hubert depesz lubaczewski <depesz@depesz.com>)
Re: Indexes for hashes  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-performance
Hi,

I have an application which stores a large amount of hex-encoded hash strings (nearly 100 GB of them), which means:
  • The number of distinct characters (alphabet) is limited to 16
  • Each string is of the same length, 64 characters
  • The strings are essentially random
Creating a B-Tree index on this results in the index size being larger than the table itself, and there are disk space constraints.

I've found the SP-GiST radix tree index and thought it could be a good match for the data because of the properties above. An attempt to create it (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more than 12 hours (while a similar B-tree index takes a few hours at most), so I interrupted it on the assumption that it was not going to finish in a reasonable time. Some slides I found on the SP-GiST index suggest that neither its build time nor its size is really suitable for this purpose.
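
For concreteness, the comparison above can be sketched roughly as follows. This is only an illustration: t and field_name are the placeholders from the statement above, the index names are made up for the example, and the column is assumed to be of type text.

```sql
-- Sketch only: t and field_name are placeholders, index names are hypothetical.
-- B-tree build (finishes in a few hours at most on this data set).
CREATE INDEX t_field_btree ON t USING btree (field_name);

-- SP-GiST (radix tree) build; this is the one interrupted after 12+ hours.
CREATE INDEX t_field_spgist ON t USING spgist (field_name);

-- Compare the on-disk size of the table with that of each index.
SELECT pg_size_pretty(pg_relation_size('t'))              AS table_size,
       pg_size_pretty(pg_relation_size('t_field_btree'))  AS btree_size,
       pg_size_pretty(pg_relation_size('t_field_spgist')) AS spgist_size;
```

The size query only returns a result for indexes that were built to completion, so in practice it may be easier to try it on a small sample of the table first.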

My question is: what would be the most size-efficient index for this situation?
