Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: GIN improvements part2: fast scan
Date
Msg-id CAPpHfdv0vT8A=krseEdya7btrbPCjyf0e=kMT39xXZuY5E-ibQ@mail.gmail.com
In response to Re: GIN improvements part2: fast scan  ("Tomas Vondra" <tv@fuzzy.cz>)
Responses Re: GIN improvements part2: fast scan  ("Tomas Vondra" <tv@fuzzy.cz>)
List pgsql-hackers
On Mon, Feb 3, 2014 at 7:24 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 3 February 2014, 15:31, Alexander Korotkov wrote:
>
> I found my patch "0005-Ternary-consistent-implementation.patch" to be
> completely wrong. It introduces a ternary consistent function to the
> opclass, but doesn't use it, because I forgot to include the ginlogic.c
> change in the patch. So it shouldn't have any impact on performance.
> However, the testing results with that patch differ significantly. That
> makes me very uneasy. Can we now reproduce exactly the same results?
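For readers unfamiliar with the term: a ternary consistent check evaluates a query when some entries are known to be present, some known to be absent, and some still unknown ("maybe"). The sketch below illustrates the idea for a query that ANDs all its entries, such as `'5alpha1' & 'initdb' & 'd'`. It is a minimal illustration, not the opclass code: the `Tri` enum and `and_consistent` are hypothetical names, and PostgreSQL's own C interface expresses the same three states as `GIN_FALSE`, `GIN_TRUE` and `GIN_MAYBE`.

```python
from enum import Enum


class Tri(Enum):
    """Three-valued result, analogous in spirit to GIN's FALSE/TRUE/MAYBE."""
    FALSE = 0
    TRUE = 1
    MAYBE = 2


def and_consistent(entries):
    """Ternary consistent check for a query that ANDs all entries together.

    entries: iterable of Tri values, one per GIN entry in the query.
    Returns FALSE as soon as any entry is definitely absent, TRUE only if
    every entry is definitely present, and MAYBE otherwise.
    """
    result = Tri.TRUE
    for value in entries:
        if value is Tri.FALSE:
            return Tri.FALSE  # one missing entry rules the item out
        if value is Tri.MAYBE:
            result = Tri.MAYBE  # undecided so far; keep looking for a FALSE
    return result


# If one entry is known to be absent, the item can be rejected even though
# the other entries have not been checked yet.
print(and_consistent([Tri.MAYBE, Tri.FALSE, Tri.MAYBE]))
print(and_consistent([Tri.TRUE, Tri.MAYBE]))
print(and_consistent([Tri.TRUE, Tri.TRUE]))
```

Being able to answer FALSE from partial information is the kind of shortcut a fast scan relies on: when a rare entry is missing for an item, the postings of the frequent entries need not be examined for it.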

Do I understand correctly that the 0005 patch should give exactly the
same performance as the 9.4-heikki branch (as it was applied on top of it,
and effectively made no change)? That isn't exactly what I measured,
although the differences were not that significant.

Do I understand correctly that it's 9.4-heikki and 9.4-alex-1 here:
Some queries differ in timing. I wonder why.
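One way to list exactly which queries differ between two builds is to compare per-query times and flag those outside a relative tolerance. A minimal sketch, assuming the timings are already collected into dicts keyed by a query id; `differing_queries`, the 10% threshold and the sample numbers are hypothetical and invented for illustration, not measurements from these runs.

```python
def differing_queries(base, patched, threshold=0.10):
    """Return (query_id, base_ms, patched_ms, ratio) for every query whose
    time differs by more than `threshold` (relative to base) between builds.

    base, patched: dicts mapping a query id to its execution time in ms.
    Only queries present in both dicts are compared.
    """
    result = []
    for qid in sorted(base.keys() & patched.keys()):
        b, p = base[qid], patched[qid]
        if b <= 0:
            continue  # skip degenerate timings to avoid dividing by zero
        ratio = p / b
        if abs(ratio - 1.0) > threshold:
            result.append((qid, b, p, ratio))
    return result


# Invented example data, not real results.
heikki = {"q1": 1.20, "q2": 0.80, "q3": 5.00}
alex_1 = {"q1": 1.25, "q2": 1.10, "q3": 4.00}
for qid, b, p, r in differing_queries(heikki, alex_1):
    print(f"{qid}: {b:.2f} ms -> {p:.2f} ms (x{r:.2f})")
```

A relative threshold rather than an absolute one keeps short and long queries comparable; with very short queries, though, it should be applied to averaged timings so that noise is not mistaken for a regression.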

I can rerun the tests, if that's what you're asking for. I'll improve the
test a bit - e.g. I plan to average multiple runs, to filter out random
noise (which might be significant for such short queries).
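Averaging several runs per query, as proposed, could look roughly like this. It assumes each query can be wrapped in a zero-argument callable (for example around a database driver call); `time_runs`, `summarize` and the single warm-up run are illustrative choices, not part of the existing test harness. Reporting the median alongside the mean is an added suggestion: for queries lasting a few milliseconds, one slow run can dominate the mean.

```python
import statistics
import time


def time_runs(run_query, runs=10, warmup=1):
    """Execute `run_query` (a zero-argument callable) several times and
    return the wall-clock time of each measured run in milliseconds.

    The first `warmup` executions are not measured, on the assumption that
    they mostly reflect cache warm-up rather than the scan itself.
    """
    for _ in range(warmup):
        run_query()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(timings_ms):
    """Mean, median and sample standard deviation of repeated timings."""
    return {
        "mean": statistics.mean(timings_ms),
        "median": statistics.median(timings_ms),
        "stdev": statistics.stdev(timings_ms) if len(timings_ms) > 1 else 0.0,
    }
```

For example, with timings of 1.2, 1.1, 1.3 and 5.0 ms, the single outlier pulls the mean up to about 2.15 ms, while the median stays at 1.25 ms.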

> The correct version of these two patches, combined into one against
> current HEAD, is attached.
> I've rerun the tests with it; the results are in
> /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun the
> postprocessing, including the graph drawing?

Yes, I'll do that. However, I'll have to rerun the other tests too, because
the previous runs were done on a different machine.

I'm a bit confused right now. The previous patches (0005 + 0007) were
supposed to be applied on top of the 4 from Heikki (0001-0004), right?
AFAIK those were not committed yet, so why is this version against HEAD?

To summarize, I know of these patch sets:

9.4-heikki (old version)
    0001-Optimize-GIN-multi-key-queries.patch
    0002-Further-optimize-the-multi-key-GIN-searches.patch
    0003-Further-optimize-GIN-multi-key-searches.patch
    0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch

9.4-alex-1 (based on 9.4-heikki)
    0005-Ternary-consistent-implementation.patch

9.4-alex-1 (based on 9.4-alex-1)
    0006-Sort-entries.patch

From these patches, I only need to compare 9.4-heikki (old version) and 9.4-alex-1 to resolve my doubts.

9.4-heikki (new version)
    gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch

9.4-alex-2 (new version)
    gin-fast-scan.10.patch.gz

Or did I get that wrong?

Right, except that you mentioned 9.4-alex-1 twice. I'm afraid the numbering could get mixed up.


> Sometimes test cases are not what we expect. For example:
>
> =# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
>                                                QUERY PLAN
> ────────────────────────────────────────────────────────────────────────────────
>  Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
>    Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
>    ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
>          Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
>  Planning time: 0.257 ms
> (5 rows)
>
> 5alpha1-initdb'd is 3 gin entries with different frequencies.

Why do you find that strange? The way the query is formed or the way it's
evaluated?

The query generator certainly is not perfect, so it may produce some
strange queries.
 
I just mean that in this case 3 words don't mean 3 GIN entries.

> Also, these patches are not intended to change the speed of relevance
> ordering. When the number of results is high, most of the time is spent
> calculating relevance and sorting. I propose removing the ORDER BY clause
> from the test cases to see the scan speed more clearly.

Sure, I can do that. Or maybe one set of queries with ORDER BY, the other
one without it.

Good. 
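Producing one query set with ORDER BY and one without can be automated from the same list of tsqueries. A sketch, assuming queries shaped like the one in the EXPLAIN output (table `messages`, column `body_tsvector`) and, for the ranked variant, ordering by PostgreSQL's `ts_rank`; `query_variants`, `sql_literal` and the optional `LIMIT` are hypothetical helpers, not the actual query generator.

```python
def sql_literal(text):
    """Quote a string as a standard SQL literal by doubling single quotes
    (assumes standard_conforming_strings = on, the default since 9.1)."""
    return "'" + text.replace("'", "''") + "'"


def query_variants(tsquery_text, limit=None):
    """Return (scan_only, ranked) SQL strings for one tsquery.

    scan_only measures just the index scan plus heap fetches; ranked also
    computes ts_rank for every match and sorts, which dominates the run time
    when a query returns many rows.
    """
    q = f"to_tsquery('english', {sql_literal(tsquery_text)})"
    scan_only = f"SELECT id FROM messages WHERE body_tsvector @@ {q}"
    ranked = f"{scan_only} ORDER BY ts_rank(body_tsvector, {q}) DESC"
    if limit is not None:
        ranked += f" LIMIT {int(limit)}"
    return scan_only + ";", ranked + ";"


scan, ranked = query_variants("(5alpha1-initdb'd)")
print(scan)
print(ranked)
```

Doubling quotes keeps the sketch self-contained; a real harness would more likely pass the tsquery text as a bind parameter through its database driver.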

> I've got a dump of postgresql.org search queries from Magnus. We can add
> them to our test cases.

You mean search queries from the mailing list archive search? Sure, we can
add those.

Yes. I'll transform them into tsqueries and send them to you privately.

------
With best regards,
Alexander Korotkov.
