Re: Indexes for hashes - Mailing list pgsql-performance

From julyanto SUTANDANG
Subject Re: Indexes for hashes
Date
Msg-id CAGu3fESMbee4ZXDTKwWgN0U7j99rYy3f6vRcjp44Qr88Mp+=TA@mail.gmail.com
Whole thread Raw
In response to Re: Indexes for hashes  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: Indexes for hashes
List pgsql-performance
This way is doing faster using crc32(data) than hashtext since crc32 is hardware accelerated in intel (and others perhaps) 
this way (crc32)  is no way the same as hash, much way faster than others... 

Regards, 


On Fri, Jun 17, 2016 at 10:51 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Wed, Jun 15, 2016 at 6:34 AM, Ivan Voras <ivoras@gmail.com> wrote:
>
> I have an application which stores a large amounts of hex-encoded hash
> strings (nearly 100 GB of them), which means:
>
> The number of distinct characters (alphabet) is limited to 16
> Each string is of the same length, 64 characters
> The strings are essentially random
>
> Creating a B-Tree index on this results in the index size being larger than
> the table itself, and there are disk space constraints.
>
> I've found the SP-GIST radix tree index, and thought it could be a good
> match for the data because of the above constraints. An attempt to create it
> (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more
> than 12 hours (while a similar B-tree index takes a few hours at most), so
> I've interrupted it because "it probably is not going to finish in a
> reasonable time". Some slides I found on the spgist index allude that both
> build time and size are not really suitable for this purpose.


I've found that hash btree indexes tend to perform well in these situations:

CREATE INDEX ON t USING btree (hashtext(fieldname));

However, you'll have to modify your queries to query for both, the
hashtext and the text itself:

SELECT * FROM t WHERE hashtext(fieldname) = hashtext('blabla') AND
fieldname = 'blabla';


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



--


Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source an Open Mind Company)

Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta Pusat
T: +6221 22866662 F: +62216315281 M: +628164858028


Caution: The information enclosed in this email (and any attachments) may be legally privileged and/or confidential and is intended only for the use of the addressee(s). No addressee should forward, print, copy, or otherwise reproduce this message in any manner that would allow it to be viewed by any individual not originally listed as a recipient. If the reader of this message is not the intended recipient, you are hereby notified that any unauthorized disclosure, dissemination, distribution, copying or the taking of any action in reliance on the information herein is strictly prohibited. If you have received this communication in error, please immediately notify the sender and delete this message.Unless it is made by the authorized person, any views expressed in this message are those of the individual sender and may not necessarily reflect the views of PT Equnix Business Solutions.

pgsql-performance by date:

Previous
From: Adam Brusselback
Date:
Subject: Re: 9.6 query slower than 9.5.3
Next
From: Claudio Freire
Date:
Subject: Re: Indexes for hashes