Small GIN optimizations (after 9.4) - Mailing list pgsql-hackers

From: Heikki Linnakangas
Subject: Small GIN optimizations (after 9.4)
Msg-id: 52F373CC.4050800@vmware.com
Responses: Re: Small GIN optimizations (after 9.4) (PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>)
           Re: Small GIN optimizations (after 9.4) (Alexander Korotkov <aekorotkov@gmail.com>)
List: pgsql-hackers
While hacking on the GIN patches, I've come up with a few different 
ideas for improving performance. It's too late for 9.4, but I'll list 
them here in case someone wants to work on them later:

* Represent ItemPointers as uint64's, to speed up comparisons. 
ginCompareItemPointers is inlined into only a few instructions, but it's 
still more expensive than a single-instruction 64-bit comparison. 
ginCompareItemPointers is called very heavily in a GIN scan, so even a 
small improvement there would make for a noticeable speedup. It might be 
an improvement in code clarity, too.
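
Roughly like this (just a sketch, none of these names exist in the tree; 
the real ItemPointerData stores the block number as two uint16s plus a 
uint16 offset, which is why the comparison currently takes several 
instructions):

#include <stdint.h>

/*
 * Sketch: pack the block number (32 bits) and offset number (16 bits)
 * into one uint64, block number in the upper bits, so that item pointer
 * order is preserved and a comparison is a single unsigned compare.
 */
typedef uint64_t PackedItemPointer;

static inline PackedItemPointer
pack_item_pointer(uint32_t blkno, uint16_t offnum)
{
    return ((uint64_t) blkno << 16) | offnum;
}

static inline int
packed_item_pointer_compare(PackedItemPointer a, PackedItemPointer b)
{
    /* -1 / 0 / +1, like ginCompareItemPointers */
    return (a > b) - (a < b);
}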

* Keep the entry streams of a GinScanKey in a binary heap, to quickly 
find the minimum curItem among them.

I did this in various versions of the fast scan patch, but then I 
realized that the straightforward way of doing it is wrong, because a 
single GinScanEntry can be part of multiple GinScanKeys. If an entry's 
curItem is updated as part of advancing one key, and the entry is in a 
heap of another key, updating the curItem can violate the heap property 
of that other key's heap.
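
For the record, here's roughly what I had in mind; just a sketch with 
made-up names, and it glosses over the shared-entry problem described 
above (a shared entry would have to be re-sifted in every heap it 
belongs to whenever its curItem moves):

#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint64_t    curItem;        /* current item, packed as above */
    /* ... rest of the per-entry scan state ... */
} EntryStream;

typedef struct
{
    EntryStream **entries;      /* entries[0] has the minimum curItem */
    int         nentries;
} EntryHeap;

/* Restore the heap property after entries[i]'s curItem has advanced. */
static void
entry_heap_sift_down(EntryHeap *heap, int i)
{
    for (;;)
    {
        int     left = 2 * i + 1;
        int     right = 2 * i + 2;
        int     smallest = i;

        if (left < heap->nentries &&
            heap->entries[left]->curItem < heap->entries[smallest]->curItem)
            smallest = left;
        if (right < heap->nentries &&
            heap->entries[right]->curItem < heap->entries[smallest]->curItem)
            smallest = right;
        if (smallest == i)
            break;

        EntryStream *tmp = heap->entries[i];

        heap->entries[i] = heap->entries[smallest];
        heap->entries[smallest] = tmp;
        i = smallest;
    }
}

/* The minimum curItem among the key's entry streams is at the root. */
static inline EntryStream *
entry_heap_min(EntryHeap *heap)
{
    return heap->nentries > 0 ? heap->entries[0] : NULL;
}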

* Build a truth table (or cache) of consistent-function's results, and 
use that instead of calling consistent for every item.
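
For example, when a key has only a few entries, the check[] array can be 
folded into a bitmask and used to index a small result cache, so 
consistent is called at most once per distinct combination. A sketch 
(made-up names, and it ignores recheck and the other arguments the real 
consistent function gets):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_CACHED_ENTRIES  8   /* only cache for keys with few entries */

typedef struct
{
    int         nentries;
    int8_t      results[1 << MAX_CACHED_ENTRIES];  /* -1 = not computed */
} ConsistentCache;

static void
consistent_cache_init(ConsistentCache *cache, int nentries)
{
    cache->nentries = nentries;
    memset(cache->results, -1, sizeof(cache->results));
}

static bool
cached_consistent(ConsistentCache *cache, const bool *check,
                  bool (*consistent_fn) (const bool *check, int nentries))
{
    uint32_t    mask = 0;

    for (int i = 0; i < cache->nentries; i++)
    {
        if (check[i])
            mask |= (uint32_t) 1 << i;
    }

    if (cache->results[mask] < 0)
        cache->results[mask] = consistent_fn(check, cache->nentries) ? 1 : 0;

    return cache->results[mask] != 0;
}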

* Deduce AND or OR logic from the consistent function. Or have the 
opclass provide a tree of AND/OR/NOT nodes directly, instead of a 
consistent function. For example, if the query is "foo & bar", we could 
avoid calling the consistent function altogether, and only return items that 
match both.
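
Such a tree could be as simple as this (all names invented); for 
"foo & bar" the root would be an AND node over the two entry leaves, and 
the scan could see that and only return items present in both streams:

#include <stdbool.h>

typedef enum
{
    NODE_ENTRY,                 /* leaf: refers to entry stream #entryIndex */
    NODE_AND,
    NODE_OR,
    NODE_NOT
} BoolNodeType;

typedef struct BoolNode
{
    BoolNodeType type;
    int         entryIndex;     /* valid for NODE_ENTRY */
    struct BoolNode *left;      /* NOT uses only 'left' */
    struct BoolNode *right;
} BoolNode;

/* Evaluate the tree for one item, given which entries matched it. */
static bool
eval_bool_tree(const BoolNode *node, const bool *entryMatches)
{
    switch (node->type)
    {
        case NODE_ENTRY:
            return entryMatches[node->entryIndex];
        case NODE_AND:
            return eval_bool_tree(node->left, entryMatches) &&
                   eval_bool_tree(node->right, entryMatches);
        case NODE_OR:
            return eval_bool_tree(node->left, entryMatches) ||
                   eval_bool_tree(node->right, entryMatches);
        case NODE_NOT:
            return !eval_bool_tree(node->left, entryMatches);
    }
    return false;               /* unreachable */
}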

* Delay decoding segments during a scan. Currently, we decode all 
segments of a posting tree page into a single array at once. But with 
"fast scan", we might be able to skip over all entries in some of the 
segments. So it would be better to copy the segments into 
backend-private memory in compressed format, and decode them one segment 
at a time (or maybe even one item at a time), when needed. That would 
avoid the unnecessary decoding of segments that can be skipped over, and 
would also reduce memory usage of a scan.
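
Roughly, each segment would be kept around like this, with the decoder 
callback standing in for the real varbyte decoding (names made up):

#include <stdint.h>

typedef int (*SegmentDecoder) (const char *data, int size, uint64_t **items);

typedef struct
{
    const char *compressed;     /* copy of the on-page segment, still packed */
    int         compressedSize;
    uint64_t   *items;          /* decoded items, or NULL if not decoded yet */
    int         nitems;
} LazySegment;

/*
 * Return the decoded items of a segment, decoding it on first use.  With
 * "fast scan", segments that are skipped over entirely never get here,
 * so they are never decoded at all.
 */
static uint64_t *
lazy_segment_items(LazySegment *seg, SegmentDecoder decode, int *nitems)
{
    if (seg->items == NULL)
        seg->nitems = decode(seg->compressed, seg->compressedSize,
                             &seg->items);
    *nitems = seg->nitems;
    return seg->items;
}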

I'll add these to the TODO.

- Heikki


