Re: Hash Indexes - Mailing list pgsql-hackers
From | Kenneth Marshall |
---|---|
Subject | Re: Hash Indexes |
Date | |
Msg-id | 20160919151621.GB12277@aart.rice.edu Whole thread Raw |
In response to | Re: Hash Indexes (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Mon, Sep 19, 2016 at 12:14:26PM +0530, Amit Kapila wrote: > On Mon, Sep 19, 2016 at 11:20 AM, Mark Kirkwood > <mark.kirkwood@catalyst.net.nz> wrote: > > > > > > On 17/09/16 06:38, Andres Freund wrote: > >> > >> On 2016-09-16 09:12:22 -0700, Jeff Janes wrote: > >>> > >>> On Thu, Sep 15, 2016 at 7:23 AM, Andres Freund <andres@anarazel.de> > >>> wrote: > >>>> > >>>> One earlier question about this is whether that is actually a worthwhile > >>>> goal. Are the speed and space benefits big enough in the general case? > >>>> Could those benefits not be achieved in a more maintainable manner by > >>>> adding a layer that uses a btree over hash(columns), and adds > >>>> appropriate rechecks after heap scans? > >>>> > >>>> Note that I'm not saying that hash indexes are not worthwhile, I'm just > >>>> doubtful that question has been explored sufficiently. > >>> > >>> I think that exploring it well requires good code. If the code is good, > >>> why not commit it? > >> > >> Because getting there requires a lot of effort, debugging it afterwards > >> would take effort, and maintaining it would also takes a fair amount? > >> Adding code isn't free. > >> > >> I'm rather unenthused about having a hash index implementation that's > >> mildly better in some corner cases, but otherwise doesn't have much > >> benefit. That'll mean we'll have to step up our user education a lot, > >> and we'll have to maintain something for little benefit. > >> > > > > While I see the point of what you are saying here, I recall all previous > > discussions about has indexes tended to go a bit like this: > > > > - until WAL logging of hash indexes is written it is not worthwhile trying > > to make improvements to them > > - WAL logging will be a lot of work, patches 1st please > > > > Now someone has done that work, and we seem to be objecting that because > > they are not improved then the patches are (maybe) not worthwhile. > > > > I think saying hash indexes are not improved after proposed set of > patches is an understatement. The read performance has improved by > more than 80% as compare to HEAD [1] (refer data in Mithun's mail). > Also, tests by Mithun and Jesper has indicated that in multiple > workloads, they are better than BTREE by 30~60% (in fact Jesper > mentioned that he is seeing 40~60% benefit on production database, > Jesper correct me if I am wrong.). I agree that when index column is > updated they are much worse than btree as of now, but no work has been > done improve it and I am sure that it can be improved for those cases > as well. > > In general, I thought the tests done till now are sufficient to prove > the importance of work, but if still Andres and others have doubt and > they want to test some specific cases, then sure we can do more > performance benchmarking. > > Mark, thanks for supporting the case for improving Hash Indexes. > > > [1] - https://www.postgresql.org/message-id/CAD__OugX0aOa7qopz3d-nbBAoVmvSmdFJOX4mv5tFRpijqH47A%40mail.gmail.com > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com > +1 Throughout the years, I have seen benchmarks that demonstrated the performance advantages of even the initial hash index (without WAL) over the btree of a hash variant. It is pretty hard to dismiss the O(1) versus O(log(n)) difference. There are classes of problems for which a hash index is the best solution. Lack of WAL has hamstrung development in those areas for years. Regards, Ken
pgsql-hackers by date: