Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: GIN improvements part2: fast scan
Date
Msg-id 368f0edae6c56767e78da2e45eb43a93.squirrel@sq.gransy.com
In response to Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On 3 February 2014, 17:08, Alexander Korotkov wrote:
> On Mon, Feb 3, 2014 at 7:24 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>
>> On 3 February 2014, 15:31, Alexander Korotkov wrote:
>> >
>> > I found my patch "0005-Ternary-consistent-implementation.patch" to be
>> > completely wrong. It introduces a ternary consistent function to the
>> > opclass, but doesn't use it, because I forgot to include the ginlogic.c
>> > change in the patch. So it shouldn't have any impact on performance.
>> > However, the test results with that patch differ significantly. That
>> > makes me very uneasy. Can we reproduce exactly the same results now?
>>
>> Do I understand correctly that the 0005 patch should give exactly the
>> same performance as the 9.4-heikki branch (as it was applied on top of
>> it and effectively changed nothing)? That isn't exactly what I measured,
>> although the differences were not that significant.
>>
>
> Do I understand correctly that it's 9.4-heikki and 9.4-alex-1 here:
> http://www.fuzzy.cz/tmp/gin/#

Yes.

> For some queries the timings differ by several times. I wonder why.

Not sure.

>> I can rerun the tests, if that's what you're asking for. I'll improve the
>> test a bit - e.g. I plan to average multiple runs, to filter out random
>> noise (which might be significant for such short queries).
>>
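For illustration only (this is not the script used for these tests): a minimal Python sketch of the averaging of multiple runs mentioned above. The names summarize_samples, time_query and run_query are hypothetical, and run_query is assumed to be a zero-argument callable that executes one query.

```python
import statistics
import time


def summarize_samples(samples_ms):
    """Reduce per-run timings (in ms) to a few summary numbers."""
    if not samples_ms:
        raise ValueError("need at least one timing sample")
    return {
        "runs": len(samples_ms),
        "mean_ms": statistics.mean(samples_ms),
        # The median is less sensitive to a single slow outlier run.
        "median_ms": statistics.median(samples_ms),
        # statistics.stdev needs at least two samples.
        "stdev_ms": statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0,
    }


def time_query(run_query, repeats=10, warmup=2):
    """Call run_query() repeatedly and summarize the wall-clock timings.

    run_query is a hypothetical zero-argument callable that executes one
    query (for example through a database driver); it is an assumption.
    """
    # Warm-up runs are executed but not measured.
    for _ in range(warmup):
        run_query()
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize_samples(samples_ms)
```

Reporting the median and standard deviation next to the mean makes it easier to tell whether a difference between two branches is larger than the run-to-run noise.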
>> > The correct version of these two patches, combined into one against
>> > current head, is attached. I've rerun the tests with it; the results
>> > are in /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you
>> > rerun the postprocessing, including graph drawing?
>>
>> Yes, I'll do that. However I'll have to rerun the other tests too,
>> because the previous runs were done on a different machine.
>>
>> I'm a bit confused right now. The previous patches (0005 + 0007) were
>> supposed to be applied on top of the 4 from Heikki (0001-0004), right?
>> AFAIK those were not committed yet, so why is this version against HEAD?
>>
>> To summarize, I know of these patch sets:
>>
>> 9.4-heikki (old version)
>>     0001-Optimize-GIN-multi-key-queries.patch
>>     0002-Further-optimize-the-multi-key-GIN-searches.patch
>>     0003-Further-optimize-GIN-multi-key-searches.patch
>>     0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch
>>
>> 9.4-alex-1 (based on 9.4-heikki)
>>     0005-Ternary-consistent-implementation.patch
>>
>> 9.4-alex-1 (based on 9.4-alex-1)
>>     0006-Sort-entries.patch
>>
>
> Of these patches, I only need to compare 9.4-heikki (old version) and
> 9.4-alex-1 to resolve my doubts.

OK, understood.

>
>> 9.4-heikki (new version)
>>     gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch
>>
>> 9.4-alex-2 (new version)
>>     gin-fast-scan.10.patch.gz
>>
>> Or did I get that wrong?
>>
>
> Only, you mentioned 9.4-alex-1 twice. I'm afraid there is some mess in
> the numbering.

You're right. It should have been like this:

9.4-alex-1 (based on 9.4-heikki)   0005-Ternary-consistent-implementation.patch

9.4-alex-2 (based on 9.4-alex-1)   0006-Sort-entries.patch

9.4-alex-3 (new version, not yet tested)   gin-fast-scan.10.patch.gz

>
>> > Sometimes test cases are not what we expect. For example:
>> >
>> > =# explain SELECT id FROM messages WHERE body_tsvector @@
>> > to_tsquery('english','(5alpha1-initdb''d)');
>> >                                    QUERY PLAN
>> > ────────────────────────────────────────────────────────────────────────────────
>> >  Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
>> >    Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
>> >    ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
>> >          Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
>> >  Planning time: 0.257 ms
>> > (5 rows)
>> >
>> > 5alpha1-initdb'd is 3 gin entries with different frequencies.
>>
>> Why do you find that strange? The way the query is formed or the way
>> it's evaluated?
>>
>> The query generator certainly is not perfect, so it may produce some
>> strange queries.
>>
>
> I just mean that in this case 3 words don't mean 3 gin entries.

Isn't that expected? I mean, that's what to_tsquery may do, right?

Tomas



