Re: GiST kNN search queue (Re: KNN-GiST with recheck) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: GiST kNN search queue (Re: KNN-GiST with recheck)
Msg-id 548EDD4C.3030600@vmware.com
In response to Re: GiST kNN search queue (Re: KNN-GiST with recheck)  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 12/15/2014 03:49 AM, Michael Paquier wrote:
> On Thu, Dec 11, 2014 at 12:50 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 01/28/2014 04:12 PM, Alexander Korotkov wrote:
>>>>
>>>>> 3. A binary heap would be a better data structure to buffer the
>>>>> rechecked
>>>>> values. A Red-Black tree allows random insertions and deletions, but in
>>>>> this case you need to insert arbitrary values but only remove the
>>>>> minimum
>>>>> item. That's exactly what a binary heap excels at. We have a nice binary
>>>>> heap implementation in the backend that you can use, see
>>>>> src/backend/lib/binaryheap.c.
>>>>>
>>>
>>> Hmm. For me binary heap would be a better data structure for KNN-GiST at
>>> all :-)
>>
>>
>> I decided to give this a shot, replacing the red-black tree in GiST with the
>> binary heap we have in lib/binaryheap.c. It made the GiST code somewhat
>> simpler, as the binaryheap interface is simpler than the red-black tree one.
>> Unfortunately, performance was somewhat worse. That was quite surprising, as
>> insertions and deletions are both O(log N) in both data structures, but the
>> red-black tree implementation is more complicated.
>>
>> I implemented another data structure called a Pairing Heap. It's also a
>> fairly simple data structure, but insertions are O(1) instead of O(log N).
>> It also performs fairly well in practice.
>>
>> With that, I got a small but measurable improvement. To test, I created a
>> table like this:
>>
>> create table gisttest (id integer, p point);
>> insert into gisttest select id, point(random(), random()) from
>> generate_series(1, 1000000) id;
>> create index i_gisttest on gisttest using gist (p);
>>
>> And I ran this query with pgbench:
>>
>> select id from gisttest order by p <-> '(0,0)' limit 1000;
>>
>> With unpatched master, I got about 650 TPS, and with the patch 720 TPS.
>> That's a nice little improvement, but perhaps more importantly, the pairing
>> heap implementation consumes less memory. To measure that, I put a
>> MemoryContextStats(so->queueCtx) call into gistendscan. With the above
>> query, but without the "limit" clause, on master I got:
>>
>> GiST scan context: 2109752 total in 10 blocks; 2088456 free (24998 chunks);
>> 21296 used
>>
>> And with the patch:
>>
>> GiST scan context: 1061160 total in 9 blocks; 1040088 free (12502 chunks);
>> 21072 used
>>
>> That's 2MB vs 1MB. While that's not much in absolute terms, it'd be nice to
>> reduce that memory consumption, as there is no hard upper bound on how much
>> might be needed. If the GiST tree is really disorganized for some reason, a
>> query might need a lot more.
>>
>>
>> So all in all, I quite like this patch, even though it doesn't do anything
>> too phenomenal. It adds some code, in the form of the new pairing heap
>> implementation, but it makes the GiST code a little bit simpler. And it
>> gives a small performance gain, and reduces memory usage a bit.
> Hmm. It looks like this patch using a binary heap is intended as a
> replacement for the red-black tree method.

Right.

Here's a new version of the patch. It now uses the same pairing heap
code that I posted in the other thread ("advance local xmin more
aggressivley",
http://www.postgresql.org/message-id/5488ACF0.8050901@vmware.com). The
pairingheap_remove() function is unused in this patch, but it is needed
by that other patch.

> Any reason why it isn't added to the CF to track it?

No. Will add.

- Heikki


