Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: GIN improvements part2: fast scan
Date
Msg-id 07ed535f7ee700a6c5b36e2d9d2d27b6.squirrel@sq.gransy.com
Whole thread Raw
In response to Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Hi Alexander,

On 3 Únor 2014, 15:31, Alexander Korotkov wrote:
>
> I found my patch "0005-Ternary-consistent-implementation.patch" to be
> completely wrong. It introduces ternary consistent function to opclass,
> but
> don't uses it, because I forgot to include ginlogic.c change into patch.
> So, it shouldn't make any impact on performance. However, testing results
> with that patch significantly differs. That makes me very uneasy. Can we
> now reproduce exact same?

Do I understand it correctly that the 0005 patch should give exactly the
same performance as the 9.4-heikki branch (as it was applied on it, and
effectively did no change). This wasn't exactly what I measured, although
the differences were not that significant.

I can rerun the tests, if that's what you're asking for. I'll improve the
test a bit - e.g. I plan to average multiple runs, to filter out random
noise (which might be significant for such short queries).

> Right version of these two patches in one against current head is
> attached.
> I've rerun tests with it, results are
> /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun
> postprocessing including graph drawing?

Yes, I'll do that. However I'll have to rerun the other tests too, because
the
previous runs were done on a different machine.

I'm a bit confused right now. The previous patches (0005 + 0007) were
supposed
to be applied on top of the 4 from Heikki (0001-0004), right? AFAIK those
were
not commited yet, so why is this version against HEAD?

To summarize, I know of these patch sets:

9.4-heikki (old version)   0001-Optimize-GIN-multi-key-queries.patch
0002-Further-optimize-the-multi-key-GIN-searches.patch  0003-Further-optimize-GIN-multi-key-searches.patch
0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch

9.4-alex-1 (based on 9.4-heikki)   0005-Ternary-consistent-implementation.patch

9.4-alex-1 (based on 9.4-alex-1)   0006-Sort-entries.patch

9.4-heikki (new version)   gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch

9.4-alex-2 (new version)   gin-fast-scan.10.patch.gz

Or did I get that wrong?

> Sometimes test cases are not what we expect. For example:
>
> =# explain SELECT id FROM messages WHERE body_tsvector @@
> to_tsquery('english','(5alpha1-initdb''d)');
>                                                QUERY PLAN
>
> ────────────────────────────────────────────────────────────────────────────────
>  Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
>    Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' &
> ''initdb'' & ''d'''::tsquery)
>    ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00
> rows=1 width=0)
>          Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' &
> ''initdb'' & ''d'''::tsquery)
>  Planning time: 0.257 ms
> (5 rows)
>
> 5alpha1-initdb'd is 3 gin entries with different frequencies.

Why do you find that strange? The way the query is formed or the way it's
evaluated?

The query generator certainly is not perfect, so it may produce some
strange queries.

> Also, these patches are not intended to change relevance ordering speed.
> When number of results are high, most of time is relevance calculating and
> sorting. I propose to remove ORDER BY clause from test cases to see scan
> speed more clear.

Sure, I can do that. Or maybe one set of queries with ORDER BY, the other
one without it.

> I've dump of postgresql.org search queries from Magnus. We can add them to
> our test case.

You mean search queries from the search for mailing list archives? Sure,
we add that.

Tomas




pgsql-hackers by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: jsonb and nested hstore
Next
From: Andres Freund
Date:
Subject: Re: jsonb and nested hstore