Re: Hash Indexes - Mailing list pgsql-hackers

From Kenneth Marshall
Subject Re: Hash Indexes
Date
Msg-id 20160919151621.GB12277@aart.rice.edu
Whole thread Raw
In response to Re: Hash Indexes  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Sep 19, 2016 at 12:14:26PM +0530, Amit Kapila wrote:
> On Mon, Sep 19, 2016 at 11:20 AM, Mark Kirkwood
> <mark.kirkwood@catalyst.net.nz> wrote:
> >
> >
> > On 17/09/16 06:38, Andres Freund wrote:
> >>
> >> On 2016-09-16 09:12:22 -0700, Jeff Janes wrote:
> >>>
> >>> On Thu, Sep 15, 2016 at 7:23 AM, Andres Freund <andres@anarazel.de>
> >>> wrote:
> >>>>
> >>>> One earlier question about this is whether that is actually a worthwhile
> >>>> goal.  Are the speed and space benefits big enough in the general case?
> >>>> Could those benefits not be achieved in a more maintainable manner by
> >>>> adding a layer that uses a btree over hash(columns), and adds
> >>>> appropriate rechecks after heap scans?
> >>>>
> >>>> Note that I'm not saying that hash indexes are not worthwhile, I'm just
> >>>> doubtful that question has been explored sufficiently.
> >>>
> >>> I think that exploring it well requires good code.  If the code is good,
> >>> why not commit it?
> >>
> >> Because getting there requires a lot of effort, debugging it afterwards
> >> would take effort, and maintaining it would also takes a fair amount?
> >> Adding code isn't free.
> >>
> >> I'm rather unenthused about having a hash index implementation that's
> >> mildly better in some corner cases, but otherwise doesn't have much
> >> benefit. That'll mean we'll have to step up our user education a lot,
> >> and we'll have to maintain something for little benefit.
> >>
> >
> > While I see the point of what you are saying here, I recall all previous
> > discussions about has indexes tended to go a bit like this:
> >
> > - until WAL logging of hash indexes is written it is not worthwhile trying
> > to make improvements to them
> > - WAL logging will be a lot of work, patches 1st please
> >
> > Now someone has done that work, and we seem to be objecting that because
> > they are not improved then the patches are (maybe) not worthwhile.
> >
> 
> I think saying hash indexes are not improved after proposed set of
> patches is an understatement.  The read performance has improved by
> more than 80% as compare to HEAD [1] (refer data in Mithun's mail).
> Also, tests by Mithun and Jesper has indicated that in multiple
> workloads, they are better than BTREE by 30~60% (in fact Jesper
> mentioned that he is seeing 40~60% benefit on production database,
> Jesper correct me if I am wrong.).  I agree that when index column is
> updated they are much worse than btree as of now, but no work has been
> done improve it and I am sure that it can be improved for those cases
> as well.
> 
> In general, I thought the tests done till now are sufficient to prove
> the importance of work, but if still Andres and others have doubt and
> they want to test some specific cases, then sure we can do more
> performance benchmarking.
> 
> Mark, thanks for supporting the case for improving Hash Indexes.
> 
> 
> [1] - https://www.postgresql.org/message-id/CAD__OugX0aOa7qopz3d-nbBAoVmvSmdFJOX4mv5tFRpijqH47A%40mail.gmail.com
> -- 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
> 

+1 Throughout the years, I have seen benchmarks that demonstrated the
performance advantages of even the initial hash index (without WAL)
over the btree of a hash variant. It is pretty hard to dismiss the
O(1) versus O(log(n)) difference. There are classes of problems for
which a hash index is the best solution. Lack of WAL has hamstrung
development in those areas for years.

Regards,
Ken



pgsql-hackers by date:

Previous
From: Mithun Cy
Date:
Subject: Re: "Some tests to cover hash_index"
Next
From: Jesper Pedersen
Date:
Subject: Re: pageinspect: Hash index support