Home > mailing lists

Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: GIN improvements part2: fast scan
Date	June 19, 2013 08:57:02
Msg-id	CAPpHfdvOcjQZwBeT8on1yztt9e4VN_7orDXVbtE26rctS-nP_g@mail.gmail.com Whole thread Raw
In response to	Re: GIN improvements part2: fast scan (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses	Re: GIN improvements part2: fast scan
List	pgsql-hackers

Tree view

On Wed, Jun 19, 2013 at 12:49 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 19.06.2013 11:30, Alexander Korotkov wrote:
On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:

On 18.06.2013 23:59, Alexander Korotkov wrote:

I would like to illustrate that on example. Imagine you have fulltext
query
"rare_term& frequent_term". Frequent term has large posting tree while

rare term has only small posting list containing iptr1, iptr2 and iptr3.
At
first we get iptr1 from posting list of rare term, then we would like to
check whether we have to scan part of frequent term posting tree where
iptr
< iptr1. So we call pre_consistent([false, true]), because we know that
rare term is not present for iptr< iptr2. pre_consistent returns false.
So
we can start scanning frequent term posting tree from iptr1. Similarly we
can skip lags between iptr1 and iptr2, iptr2 and iptr3, from iptr3 to
maximum possible pointer.

Thanks, now I understand the rare-term& frequent-term problem. Couldn't

you do that with the existing consistent function? I don't see why you need
the new pre-consistent function for this.

In the case of two entries I can. But in the case of n entries things
becomes more complicated. Imagine you have "term_1& term_2& ...& term_n"

query. When you get some item pointer from term_1 you can skip all the
lesser item pointers from term_2, term_3 ... term_n. But if all you have
for it is consistent function you have to call it with following check
arguments:
1) [false, false, false, ... , false]
2) [false, true, false, ... , false]
3) [false, false, true, ... , false]
4) [false, true, true, ..., false]
......
i.e. you have to call it 2^(n-1) times. But if you know the query specific
(i.e. in opclass) it's typically easy to calculate exactly what we need in
single pass. That's why I introduced pre_consistent.

Hmm. So how does that work with the pre-consistent function? Don't you need to call that 2^(n-1)-1 times as well?

I call pre-consistent once with [false, true, true, ..., true]. Pre-consistent knows that each true passed to it could be false positive. So, if it returns false it guarantees that consistent will be false for all possible combinations.

------
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

From: Heikki Linnakangas
Date: 19 June 2013, 08:49:54
Subject: Re: GIN improvements part2: fast scan

From: Simon Riggs
Date: 19 June 2013, 09:03:53
Subject: Re: Add visibility map information to pg_freespace.

Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

Previous

Next