Re: GiST kNN search queue (Re: KNN-GiST with recheck) - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: GiST kNN search queue (Re: KNN-GiST with recheck) |
Date | |
Msg-id | 548EDD4C.3030600@vmware.com Whole thread Raw |
In response to | Re: GiST kNN search queue (Re: KNN-GiST with recheck) (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: GiST kNN search queue (Re: KNN-GiST with recheck)
Re: GiST kNN search queue (Re: KNN-GiST with recheck) |
List | pgsql-hackers |
On 12/15/2014 03:49 AM, Michael Paquier wrote: > On Thu, Dec 11, 2014 at 12:50 AM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> On 01/28/2014 04:12 PM, Alexander Korotkov wrote: >>>> >>>>> 3. A binary heap would be a better data structure to buffer the >>>>> rechecked >>>>> values. A Red-Black tree allows random insertions and deletions, but in >>>>> this case you need to insert arbitrary values but only remove the >>>>> minimum >>>>> item. That's exactly what a binary heap excels at. We have a nice binary >>>>> heap implementation in the backend that you can use, see >>>>> src/backend/lib/binaryheap.c. >>>>> >>> >>> Hmm. For me binary heap would be a better data structure for KNN-GiST at >>> all :-) >> >> >> I decided to give this a shot, replacing the red-black tree in GiST with the >> binary heap we have in lib/binaryheap.c. It made the GiST code somewhat >> simpler, as the binaryheap interface is simpler than the red-black tree one. >> Unfortunately, performance was somewhat worse. That was quite surprising, as >> insertions and deletions are both O(log N) in both data structures, but the >> red-black tree implementation is more complicated. >> >> I implemented another data structure called a Pairing Heap. It's also a >> fairly simple data structure, but insertions are O(1) instead of O(log N). >> It also performs fairly well in practice. >> >> With that, I got a small but measurable improvement. To test, I created a >> table like this: >> >> create table gisttest (id integer, p point); >> insert into gisttest select id, point(random(), random()) from >> generate_series(1, 1000000) id; >> create index i_gisttest on gisttest using gist (p); >> >> And I ran this query with pgbench: >> >> select id from gisttest order by p <-> '(0,0)' limit 1000; >> >> With unpatched master, I got about 650 TPS, and with the patch 720 TPS. >> That's a nice little improvement, but perhaps more importantly, the pairing >> heap implementation consumes less memory. To measure that, I put a >> MemoryContextStats(so->queueCtx) call into gistendscan. With the above >> query, but without the "limit" clause, on master I got: >> >> GiST scan context: 2109752 total in 10 blocks; 2088456 free (24998 chunks); >> 21296 used >> >> And with the patch: >> >> GiST scan context: 1061160 total in 9 blocks; 1040088 free (12502 chunks); >> 21072 used >> >> That's 2MB vs 1MB. While that's not much in absolute terms, it'd be nice to >> reduce that memory consumption, as there is no hard upper bound on how much >> might be needed. If the GiST tree is really disorganized for some reason, a >> query might need a lot more. >> >> >> So all in all, I quite like this patch, even though it doesn't do anything >> too phenomenal. It adds a some code, in the form of the new pairing heap >> implementation, but it makes the GiST code a little bit simpler. And it >> gives a small performance gain, and reduces memory usage a bit. > Hum. It looks that this patch using binary heap is intended to be a > replacement red-black tree method. Right. Here's a new version of the patch. It now uses the same pairing heap code that I posted in the other thread ("advance local xmin more aggressivley", http://www.postgresql.org/message-id/5488ACF0.8050901@vmware.com). The pairingheap_remove() function is unused in this patch, but it is needed by that other patch. > Any reason why it isn't added to the CF to track it? No. Will add. - Heikki
Attachment
pgsql-hackers by date: