Re: Hstore: Query speedups with Gin index - Mailing list pgsql-hackers

From David Fetter
Subject Re: Hstore: Query speedups with Gin index
Date
Msg-id 20130916053710.GA9493@fetter.org
Whole thread Raw
In response to Re: Hstore: Query speedups with Gin index  (Blake Smith <blakesmith0@gmail.com>)
List pgsql-hackers
Blake,

I've taken the liberty of adding this patch to the current Commitfest.

In future, please continue to send patches both to this thread and to
the commitfest application when you have a message ID for them :)

Cheers,
David.
On Mon, Sep 09, 2013 at 09:55:01AM -0500, Blake Smith wrote:
> Thanks for getting back to me about this change Oleg. I took your advice
> and reworked the patch by adding a new hstore gin opclass
> (gin_hstore_combined_ops) and leaving the functionality of the default
> hstore gin opclass the same. This should prevent the on-disk compatibility
> issues from the first patch, and allow users to select the different
> indexing method when they build the index. The hstore regression suite is
> passing for me locally with the --enable-cassert configure flag. Please let
> me know what you think and if there is any other work that would need to be
> done (style cleanups, updating documentation, etc) to get this merged.
> 
> Thanks!
> 
> Blake
> 
> 
> 
> 
> 
> 
> On Fri, Sep 6, 2013 at 1:47 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
> 
> > Blake,
> >
> > I think it's better to implement this patch as a separate opclass, so
> > users will have option to choose indexing.
> >
> > Oleg
> >
> >
> > On Tue, Sep 3, 2013 at 6:24 PM, Blake Smith <blakesmith0@gmail.com> wrote:
> >
> >> Thanks for the feedback everyone. I've attached the patch that we are now
> >> running in production to service our hstore include queries. We rebuilt the
> >> index to account for the on-disk incompatibility. I've submitted the patch
> >> to commitfest here:
> >> https://commitfest.postgresql.org/action/patch_view?id=1203
> >>
> >> Michael: I don't have a formal benchmark, but several of our worst
> >> queries went from 10-20 seconds per query down to 50-400 ms. These are
> >> numbers we've seen when testing real production queries against our
> >> production dataset with real world access patterns.
> >> Oleg: Thanks for your thoughts on this change. As for the spgist / gin
> >> work you're doing, is there anything you need help with or are you still in
> >> the research phase? I'd love to help get something more robust merged into
> >> mainline if you think there's collaborative work to be done (even if it's
> >> only user testing).
> >>
> >> Thanks,
> >>
> >> Blake
> >>
> >>
> >>
> >>
> >> On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund <andres@2ndquadrant.com>wrote:
> >>
> >>> On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
> >>> > On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
> >>> > > Michael Paquier <michael.paquier@gmail.com> writes:
> >>> > > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <
> >>> blakesmith0@gmail.com> wrote:
> >>> > > >> The combined entry is used to support "contains (@>)" queries,
> >>> and the key
> >>> > > >> only item is used to support "key contains (?)" queries. This
> >>> change seems
> >>> > > >> to help especially with hstore keys that have high cardinalities.
> >>> Downsides
> >>> > > >> of this change is that it requires an index rebuild, and the
> >>> index will be
> >>> > > >> larger in size.
> >>> > >
> >>> > > > Index rebuild would be a problem only for minor releases,
> >>> > >
> >>> > > That's completely false; people have expected major releases to be
> >>> > > on-disk-compatible for several years now.  While there probably will
> >>> be
> >>> > > future releases in which we are willing to break storage
> >>> compatibility,
> >>> > > a contrib module doesn't get to dictate that.
> >>> > >
> >>> > > What might be a practical solution, especially if this isn't always a
> >>> > > win (which seems likely given the index-bloat risk), is to make
> >>> hstore
> >>> > > offer two different GIN index opclasses, one that works the
> >>> traditional
> >>> > > way and one that works this way.
> >>> > >
> >>> > > Another thing that needs to be taken into account here is Oleg and
> >>> > > Teodor's in-progress work on extending hstore:
> >>> > > https://www.pgcon.org/2013/schedule/events/518.en.html
> >>> > > I'm not sure if this patch would conflict with that at all, but it
> >>> > > needs to be considered.
> >>> >
> >>> > We can disallow in-place upgrades for clusters that use certain contrib
> >>> > modules --- we have done that in the past.
> >>>
> >>> But that really cannot be acceptable for hstore. The probably most
> >>> widely used extension there is.
> >>>
> >>> Greetings,
> >>>
> >>> Andres Freund
> >>>
> >>> --
> >>>  Andres Freund                     http://www.2ndQuadrant.com/
> >>>  PostgreSQL Development, 24x7 Support, Training & Services
> >>>
> >>
> >>
> >>
> >> --
> >> Blake Smith
> >> http://blakesmith.me
> >> @blakesmith
> >>
> >>
> >> --
> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-hackers
> >>
> >>
> >
> 
> 
> -- 
> Blake Smith
> http://blakesmith.me
> @blakesmith


> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



pgsql-hackers by date:

Previous
From: Satoshi Nagayasu
Date:
Subject: Re: pg_system_identifier()
Next
From: Ian Lawrence Barwick
Date:
Subject: Re: plpgsql.print_strict_params