Re: Todo item: Support amgettuple() in GIN - Mailing list pgsql-hackers

From Andreas Karlsson
Subject Re: Todo item: Support amgettuple() in GIN
Date
Msg-id 52988F34.2020407@proxel.se
In response to Re: Todo item: Support amgettuple() in GIN  (Antonin Houska <antonin.houska@gmail.com>)
Responses Re: Todo item: Support amgettuple() in GIN  (Antonin Houska <antonin.houska@gmail.com>)
List pgsql-hackers
On 11/29/2013 09:54 AM, Antonin Houska wrote:
> On 11/29/2013 01:13 AM, Andreas Karlsson wrote:
>
>> When doing partial matching the code needs to be able to return the union
>> of all TIDs in all the matching posting trees in TID order (to be able
>> to do AND and OR operations with multiple search keys later). It does
>> this by traversing them posting tree after posting tree and collecting
>> them all in a TIDBitmap which is later iterated over.
>
> I think it's not a plain union. My understanding is that - to evaluate a
> single key (typically array) - you first need to get all the TID streams
> for that key (i.e. one posting list/tree per element of the key array)
> and then iterate all these streams in parallel and 'merge' them using
> consistent() function. That's how I understand ginget.c:keyGetItem().

For partial matches the merging is done in two steps: first, a simple
union of all the streams for each key; second, a merge of those per-key
union streams using the consistent() function.

It is the first step that can be lossy, since the union is collected in
a TIDBitmap.
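
To make the two steps concrete, here is a rough sketch in Python. It is
not the ginget.c code: the names union_streams, merge_keys and max_exact
are made up for illustration, TIDs are plain (block, offset) tuples, and
the lossy behaviour is simplified so that the whole union falls back to
page numbers once it grows past a fixed budget. Step 2 is shown for exact
streams only; how lossy pages are combined there is left out.

```python
from typing import Callable, Iterable, List, Sequence, Set, Tuple

TID = Tuple[int, int]  # (block, offset): simplified stand-in for a heap item pointer


def union_streams(streams: Iterable[Iterable[TID]],
                  max_exact: int) -> Tuple[List[TID], List[int]]:
    """Step 1: union every posting stream matched by one partial-match key.

    Returns (exact_tids, lossy_pages). If the union holds more than
    max_exact TIDs (a hypothetical budget), offsets are dropped and only
    page numbers are kept, which is what makes this step lossy.
    """
    exact: Set[TID] = set()
    for stream in streams:
        exact.update(stream)
    if len(exact) <= max_exact:
        return sorted(exact), []
    return [], sorted({block for block, _ in exact})


def merge_keys(key_tids: Sequence[Sequence[TID]],
               consistent: Callable[[List[bool]], bool]) -> List[TID]:
    """Step 2: walk the per-key unions in TID order and keep each TID
    for which consistent() accepts the per-key presence flags."""
    key_sets = [set(tids) for tids in key_tids]
    candidates = sorted(set().union(*key_sets))
    return [tid for tid in candidates
            if consistent([tid in s for s in key_sets])]


# Key A is a partial match that hit two posting streams; key B is a plain key.
key_a, _ = union_streams([[(1, 1), (2, 3)], [(1, 2), (2, 3)]], max_exact=10)
key_b = [(1, 2), (2, 3), (4, 1)]
print(merge_keys([key_a, key_b], all))  # [(1, 2), (2, 3)]

# With a budget of 2 the same union degrades to page numbers only.
print(union_streams([[(1, 1), (2, 3)], [(1, 2), (2, 3)]], max_exact=2))  # ([], [1, 2])
```

Passing all as consistent() gives AND semantics across keys and any gives
OR; a real operator class would supply its own function.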

> So the problem of partial match is (IMO) that there can be too many TID
> streams to merge - much more than the number of elements of the key array.

Agreed.

-- 
Andreas Karlsson


