Re: GiST kNN search queue (Re: KNN-GiST with recheck) - Mailing list pgsql-hackers

From Arthur Silva
Subject Re: GiST kNN search queue (Re: KNN-GiST with recheck)
Date
Msg-id CAO_YK0UWipBifVgNY1zqha8DJi84Kr=WEhDFLvneKo9Wzj2fBw@mail.gmail.com
Whole thread Raw
In response to GiST kNN search queue (Re: KNN-GiST with recheck)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: GiST kNN search queue (Re: KNN-GiST with recheck)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Wed, Dec 10, 2014 at 1:50 PM, Heikki
Linnakangas<span dir="ltr"><<a href="mailto:hlinnakangas@vmware.com"
target="_blank">hlinnakangas@vmware.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex">On 01/28/2014 04:12 PM, Alexander Korotkov wrote:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> >3. A binary heap would be
abetter data structure to buffer the rechecked<br /> >values. A Red-Black tree allows random insertions and
deletions,but in<br /> >this case you need to insert arbitrary values but only remove the minimum<br /> >item.
That'sexactly what a binary heap excels at. We have a nice binary<br /> >heap implementation in the backend that you
canuse, see<br /> >src/backend/lib/binaryheap.c.<br /> ><br /></blockquote> Hmm. For me binary heap would be a
betterdata structure for KNN-GiST at<br /> all :-)<br /></blockquote><br /> I decided to give this a shot, replacing
thered-black tree in GiST with the binary heap we have in lib/binaryheap.c. It made the GiST code somewhat simpler, as
thebinaryheap interface is simpler than the red-black tree one. Unfortunately, performance was somewhat worse. That was
quitesurprising, as insertions and deletions are both O(log N) in both data structures, but the red-black tree
implementationis more complicated.<br /><br /> I implemented another data structure called a Pairing Heap. It's also a
fairlysimple data structure, but insertions are O(1) instead of O(log N). It also performs fairly well in practice.<br
/><br/> With that, I got a small but measurable improvement. To test, I created a table like this:<br /><br /> create
tablegisttest (id integer, p point);<br /> insert into gisttest select id, point(random(), random()) from
generate_series(1,1000000) id;<br /> create index i_gisttest on gisttest using gist (p);<br /><br /> And I ran this
querywith pgbench:<br /><br /> select id from gisttest order by p <-> '(0,0)' limit 1000;<br /><br /> With
unpatchedmaster, I got about 650 TPS, and with the patch 720 TPS. That's a nice little improvement, but perhaps more
importantly,the pairing heap implementation consumes less memory. To measure that, I put a
MemoryContextStats(so-><u></u>queueCtx)call into gistendscan. With the above query, but without the "limit" clause,
onmaster I got:<br /><br /> GiST scan context: 2109752 total in 10 blocks; 2088456 free (24998 chunks); 21296 used<br
/><br/> And with the patch:<br /><br /> GiST scan context: 1061160 total in 9 blocks; 1040088 free (12502 chunks);
21072used<br /><br /> That's 2MB vs 1MB. While that's not much in absolute terms, it'd be nice to reduce that memory
consumption,as there is no hard upper bound on how much might be needed. If the GiST tree is really disorganized for
somereason, a query might need a lot more.<br /><br /><br /> So all in all, I quite like this patch, even though it
doesn'tdo anything too phenomenal. It adds a some code, in the form of the new pairing heap implementation, but it
makesthe GiST code a little bit simpler. And it gives a small performance gain, and reduces memory usage a
bit.<span><fontcolor="#888888"><br /><br /> - Heikki<br /><br /></font></span><br /><br /> --<br /> Sent via
pgsql-hackersmailing list (<a href="mailto:pgsql-hackers@postgresql.org"
target="_blank">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a
href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/><br /></blockquote></div><br /></div><div
class="gmail_extra">Itmay be better to replace the lib/binaryheap altogether as it offers comparable/better
performance.<br/></div></div> 

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: advance local xmin more aggressively
Next
From: Matt Newell
Date:
Subject: Re: libpq pipelining