Re: Small GIN optimizations (after 9.4) - Mailing list pgsql-hackers

From PostgreSQL - Hans-Jürgen Schönig
Subject Re: Small GIN optimizations (after 9.4)
Date
Msg-id BA395657-838D-406B-896B-A5936C147853@cybertec.at
Whole thread Raw
In response to Small GIN optimizations (after 9.4)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Small GIN optimizations (after 9.4)
List pgsql-hackers
i think there is one more thing which would be really good in GIN and which would solve a ton of issues.
atm GIN entries are sorted by item pointer.
if we could sort them by a "column" it would fix a couple of real work issues such as ...
SELECT ... FROM foo WHERE "tsearch_query" ORDER BY price DESC LIMIT 10

... or so.
it many cases you want to search for a, say, product and find the cheapest / most expensive one.
if the tsearch_query yields a high number of rows (which it often does) the subsequent sort will kill you.
many thanks,
    hans


On Feb 6, 2014, at 12:36 PM, Heikki Linnakangas wrote:

> While hacking on the GIN patches, I've come up with a few different ideas for improving performance. It's too late
for9.4, but I'll list them here if someone wants to work on them later: 
>
> * Represent ItemPointers as uint64's, to speed up comparisons. ginCompareItemPointers is inlined into only a few
instructions,but it's still more expensive than a single-instruction 64-bit comparison. ginCompareItemPointers is
calledvery heavily in a GIN scan, so even a small improvement there would make for a noticeable speedup. It might be an
improvementin code clarity, too. 
>
> * Keep the entry streams of a GinScanKey in a binary heap, to quickly find the minimum curItem among them.
>
> I did this in various versions of the fast scan patch, but then I realized that the straightforward way of doing it
iswrong, because a single GinScanEntry can be part of multiple GinScanKeys. If an entry's curItem is updated as part of
advancingone key, and the entry is in a heap of another key, updating the curItem can violate the heap property of the
otherentry's heap. 
>
> * Build a truth table (or cache) of consistent-function's results, and use that instead of calling consistent for
everyitem. 
>
> * Deduce AND or OR logic from the consistent function. Or have the opclass provide a tree of AND/OR/NOT nodes
directly,instead of a consistent function. For example, if the query is "foo & bar", we could avoid calling consistent
functionaltogether, and only return items that match both. 
>
> * Delay decoding segments during a scan. Currently, we decode all segments of a posting tree page into a single array
atonce. But with "fast scan", we might be able to skip over all entries in some of the segments. So it would be better
tocopy the segments into backend-private memory in compressed format, and decode them one segment at a time (or maybe
evenone item at a time), when needed. That would avoid the unnecessary decoding of segments that can be skipped over,
andwould also reduce memory usage of a scan. 
>
> I'll add these to the TODO.
>
> - Heikki
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: adt Makefile, was Re: jsonb and nested hstore
Next
From: Tom Lane
Date:
Subject: Re: adt Makefile, was Re: jsonb and nested hstore