Re: Hash Indexes - Mailing list pgsql-hackers

From Mark Kirkwood
Subject Re: Hash Indexes
Date
Msg-id 0e0c6adb-2fd8-d0d5-6b87-71c7478bc753@catalyst.net.nz
Whole thread Raw
In response to Re: Hash Indexes  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 16/09/16 18:35, Amit Kapila wrote:

> On Thu, Sep 15, 2016 at 7:53 PM, Andres Freund <andres@anarazel.de> wrote:
>> Hi,
>>
>> On 2016-05-10 17:39:22 +0530, Amit Kapila wrote:
>>> For making hash indexes usable in production systems, we need to improve
>>> its concurrency and make them crash-safe by WAL logging them.
>> One earlier question about this is whether that is actually a worthwhile
>> goal.  Are the speed and space benefits big enough in the general case?
>>
> I think there will surely by speed benefits w.r.t reads for larger
> indexes.  All our measurements till now have shown that there is a
> benefit varying from 30~60% (for reads) with hash index as compare to
> btree, and I think it could be even more if we further increase the
> size of index.  On space front, I have not done any detailed study, so
> it is not right to conclude anything, but it appears to me that if the
> index is on char/varchar column where size of key is 10 or 20 bytes,
> hash indexes should be beneficial as they store just hash-key.
>
>> Could those benefits not be achieved in a more maintainable manner by
>> adding a layer that uses a btree over hash(columns), and adds
>> appropriate rechecks after heap scans?
>>
> I don't think it can be faster for reads than using real hash index,
> but surely one can have that as a workaround.
>
>> Note that I'm not saying that hash indexes are not worthwhile, I'm just
>> doubtful that question has been explored sufficiently.
>>
> I think theoretically hash indexes are faster than btree considering
> logarithmic complexity (O(1) vs. O(logn)), also the results after
> recent optimizations indicate that hash indexes are faster than btree
> for equal to searches.  I am not saying after the recent set of
> patches proposed for hash indexes they will be better in all kind of
> cases.  It could be beneficial for cases where indexed columns are not
> updated heavily.
>
> I think one can definitely argue that we can some optimizations in
> btree and make them equivalent or better than hash indexes, but I am
> not sure if it is possible for all-kind of use-cases.
>


I think having the choice for a more equality optimized index design is 
desirable. Now that they are wal logged they are first class citizens so 
to speak. I suspect that there are a lot of further speed optimizations 
that can be considered to tease out the best performance - now that the 
basics of reliability have been sorted. I think this patch/set of 
patches is/are important!

regards

Mark




pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Write Ahead Logging for Hash Indexes
Next
From: Vladimir Gordiychuk
Date:
Subject: Re: Stopping logical replication protocol