Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: GIN improvements part2: fast scan |
Date | |
Msg-id | CAPpHfdv0vT8A=krseEdya7btrbPCjyf0e=kMT39xXZuY5E-ibQ@mail.gmail.com |
In response to | Re: GIN improvements part2: fast scan ("Tomas Vondra" <tv@fuzzy.cz>) |
Responses | Re: GIN improvements part2: fast scan ("Tomas Vondra" <tv@fuzzy.cz>) |
List | pgsql-hackers |
On Mon, Feb 3, 2014 at 7:24 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 3 February 2014, 15:31, Alexander Korotkov wrote:
> >
> > I found my patch "0005-Ternary-consistent-implementation.patch" to be
> > completely wrong. It introduces a ternary consistent function to the
> > opclass, but doesn't use it, because I forgot to include the ginlogic.c
> > change in the patch. So it shouldn't have any impact on performance.
> > However, the testing results with that patch differ significantly. That
> > makes me very uneasy. Can we now reproduce exactly the same results?
>
> Do I understand it correctly that the 0005 patch should give exactly the
> same performance as the 9.4-heikki branch (as it was applied on top of it,
> and effectively made no change)? This wasn't exactly what I measured,
> although the differences were not that significant.

Do I understand correctly that it's 9.4-heikki and 9.4-alex-1 here:
In some queries the times differ. I wonder why.
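
A ternary consistent function, as discussed for 0005, evaluates the query over entry states that may be present, absent, or not yet known. Below is a minimal Python sketch. It assumes a query that is a plain AND of its entries. `Tri` and `and_tri_consistent` are hypothetical names. The numeric values only follow the GIN_FALSE / GIN_TRUE / GIN_MAYBE convention of PostgreSQL's GIN headers; this is not the opclass API.

```python
from enum import IntEnum


# Hypothetical stand-in for GIN's three-valued check result; the values
# follow the GIN_FALSE / GIN_TRUE / GIN_MAYBE convention (0 / 1 / 2).
class Tri(IntEnum):
    FALSE = 0
    TRUE = 1
    MAYBE = 2


def and_tri_consistent(check):
    """Ternary consistent check for a query that ANDs all of its entries.

    check[i] says whether entry i is known present (TRUE), known absent
    (FALSE) or not yet loaded (MAYBE) for the current heap item.
    """
    result = Tri.TRUE
    for state in check:
        if state == Tri.FALSE:
            return Tri.FALSE  # one absent entry rules the item out for sure
        if state == Tri.MAYBE:
            result = Tri.MAYBE  # undecided until that entry is checked
    return result


print(and_tri_consistent([Tri.TRUE, Tri.MAYBE, Tri.TRUE]).name)    # MAYBE
print(and_tri_consistent([Tri.MAYBE, Tri.FALSE, Tri.MAYBE]).name)  # FALSE
```

Such a check only pays off if the scan code actually calls it. An opclass function that is defined but never invoked, as described for 0005, should not change timings at all.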

> I can rerun the tests, if that's what you're asking for. I'll improve the
> test a bit, e.g. I plan to average multiple runs to filter out random
> noise (which might be significant for such short queries).
>
> > The right version of these two patches, combined into one against current
> > HEAD, is attached.
> > I've rerun the tests with it; the results are in
> > /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun the
> > postprocessing, including the graph drawing?
>
> Yes, I'll do that. However, I'll have to rerun the other tests too, because
> the previous runs were done on a different machine.
>
> I'm a bit confused right now. The previous patches (0005 + 0007) were
> supposed to be applied on top of the four from Heikki (0001-0004), right?
> AFAIK those were not committed yet, so why is this version against HEAD?
>
> To summarize, I know of these patch sets:
>
> 9.4-heikki (old version)
> 0001-Optimize-GIN-multi-key-queries.patch
> 0002-Further-optimize-the-multi-key-GIN-searches.patch
> 0003-Further-optimize-GIN-multi-key-searches.patch
> 0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch
>
> 9.4-alex-1 (based on 9.4-heikki)
> 0005-Ternary-consistent-implementation.patch
>
> 9.4-alex-1 (based on 9.4-alex-1)
> 0006-Sort-entries.patch

Of these patch sets, I only need to compare 9.4-heikki (old version) with
9.4-alex-1 to resolve my doubts.

> 9.4-heikki (new version)
> gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch
>
> 9.4-alex-2 (new version)
> gin-fast-scan.10.patch.gz
>
> Or did I get that wrong?

Except that you mentioned 9.4-alex-1 twice. I'm afraid the numbering got mixed up.
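
The question here is whether 9.4-heikki (old version) and 9.4-alex-1 really differ, once the random noise Tomas mentions is averaged out. Below is a minimal Python sketch of such a comparison. It assumes per-query timings are collected as non-empty lists of milliseconds for each branch. Dropping the fastest and slowest run and the 10% threshold are arbitrary choices of this sketch, not part of the test suite described in the thread.

```python
from statistics import mean


def trimmed_mean(timings_ms, trim=1):
    """Average repeated runs after dropping the `trim` fastest and slowest.

    Falls back to a plain mean when there are too few runs to trim.
    """
    ordered = sorted(timings_ms)
    if len(ordered) > 2 * trim:
        ordered = ordered[trim:len(ordered) - trim]
    return mean(ordered)


def compare_branches(base, patched, threshold=0.10):
    """Flag queries whose averaged time differs by more than `threshold`.

    base / patched map a query id to its list of timings (ms) on each
    branch, e.g. 9.4-heikki vs 9.4-alex-1.  Returns {query_id: ratio}
    for queries where patched/base lies outside 1 +/- threshold.
    """
    flagged = {}
    for qid in base.keys() & patched.keys():
        base_avg = trimmed_mean(base[qid])
        if base_avg == 0:
            continue  # no meaningful ratio; skip the query
        ratio = trimmed_mean(patched[qid]) / base_avg
        if abs(ratio - 1.0) > threshold:
            flagged[qid] = ratio
    return flagged
```

If 0005 really made no change, `compare_branches` should flag nothing beyond what remains of the noise. Any query it does flag is worth rerunning.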

> > Sometimes test cases are not what we expect. For example:
> >
> > =# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
> > QUERY PLAN
> > ────────────────────────────────────────────────────────────────────────────────
> > Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4)
> >   Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
> >   -> Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00 rows=1 width=0)
> >        Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
> > Planning time: 0.257 ms
> > (5 rows)
> >
> > 5alpha1-initdb'd is 3 GIN entries with different frequencies.
>
> Why do you find that strange? The way the query is formed or the way it's
> evaluated?
>
> The query generator certainly is not perfect, so it may produce some
> strange queries.

I just mean that in this case 3 words don't mean 3 GIN entries.
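
Entries with very different frequencies are the case where skipping helps. For an AND query, the rarest entry bounds the result, so the frequent entries only need to be probed at its item pointers rather than read in full. Below is a simplified Python model of that idea. Sorted integer lists stand in for posting lists, and `bisect_left` does the skipping. It is an illustration under those assumptions, not the code in gin-fast-scan.10.patch.gz; real GIN posting lists hold compressed item pointers, often in a posting tree.

```python
from bisect import bisect_left


def and_scan(posting_lists):
    """Intersect sorted posting lists for an AND query, driven by the rarest.

    Each candidate from the shortest list is looked up in the longer
    lists with a binary search that resumes from the previous position,
    so frequent lists are skipped through rather than read sequentially.
    """
    if not posting_lists:
        return []
    lists = sorted(posting_lists, key=len)
    rare, others = lists[0], lists[1:]
    positions = [0] * len(others)
    result = []
    for item in rare:
        for i, plist in enumerate(others):
            pos = bisect_left(plist, item, positions[i])
            positions[i] = pos
            if pos == len(plist):
                return result  # a frequent list is exhausted: no more matches
            if plist[pos] != item:
                break  # item missing from this entry: skip it
        else:
            result.append(item)  # present in every entry
    return result


print(and_scan([list(range(1, 11)), [3, 7], [2, 3, 5, 7, 11]]))  # [3, 7]
```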

> > Also, these patches are not intended to change the speed of relevance
> > ordering. When the number of results is high, most of the time is spent
> > calculating relevance and sorting. I propose to remove the ORDER BY clause
> > from the test cases to see the scan speed more clearly.
>
> Sure, I can do that. Or maybe one set of queries with ORDER BY, the other
> one without it.

Good.

> You mean search queries from the search for mailing list archives? Sure,
> we can add that.

Yes. I'll transform it into tsqueries and send it to you privately.
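
Below is a minimal Python sketch of one way to do such a transformation client-side. It assumes every word of a raw search string should be ANDed. `to_and_tsquery_text` is a hypothetical helper. It keeps runs of letters and digits as words and joins them with `&`, giving text that `to_tsquery()` accepts. PostgreSQL's built-in `plainto_tsquery()` does similar AND-ing on the server and may be simpler.

```python
import re


def to_and_tsquery_text(raw_query):
    """Turn a raw search string into to_tsquery() input (hypothetical helper).

    Keeps runs of letters/digits as words and joins them with '&'.
    Returns '' when the string contains no usable words.
    """
    words = re.findall(r"[^\W_]+", raw_query)
    return " & ".join(words)


print(to_and_tsquery_text("5alpha1-initdb'd"))  # 5alpha1 & initdb & d
```

This splitting yields three words for `5alpha1-initdb'd`, while the server-side parser produced four GIN entries in the plan above. So word counts from such a transformation are not entry counts.
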
------
With best regards,
Alexander Korotkov.