Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: GIN improvements part2: fast scan
Date
Msg-id cdbb1c8d4b3996f3ddd48dd01162d51b.squirrel@sq.gransy.com
In response to Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On 3 February 2014, 19:18, Alexander Korotkov wrote:
> On Mon, Feb 3, 2014 at 8:19 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>
>> >> > Sometimes test cases are not what we expect. For example:
>> >> >
>> >> > =# explain SELECT id FROM messages WHERE body_tsvector @@
>> >> > to_tsquery('english','(5alpha1-initdb''d)');
>> >> >                                                QUERY PLAN
>> >> > ────────────────────────────────────────────────────────────────────────────────
>> >> >  Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
>> >> >    Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
>> >> >    ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
>> >> >          Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
>> >> >  Planning time: 0.257 ms
>> >> > (5 rows)
>> >> >
>> >> > 5alpha1-initdb'd is 3 gin entries with different frequencies.
>> >>
>> >> Why do you find that strange? The way the query is formed or the way
>> >> it's evaluated?
>> >>
>> >> The query generator certainly is not perfect, so it may produce some
>> >> strange queries.
>> >>
>> >
>> > I just mean that in this case 3 words doesn't mean 3 gin entries.
>>
>> Isn't that expected? I mean, that's what to_tsquery may do, right?
>>
>
> Everything is absolutely correct. :-) It just may be not what do you expect
> if you aren't getting into details.

Well, that's not how I designed the benchmark. I based it not on GIN
entries but on 'natural' words, to simulate real queries. I understand
that using GIN terms might give "more consistent" results (e.g. 3 GIN
terms with a given frequency) than the current approach.

However, this was partly intentional: the goal was to cover a wider range
of cases. That's also why the benchmark works with relative speedup,
comparing the query duration with and without the patch.
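
To make "relative speedup" concrete, here is a minimal sketch of the
comparison for a single query. The function name and the sample timings
are made up for illustration; they are not taken from the benchmark
scripts or from its results.

```python
def relative_speedup(unpatched_ms, patched_ms):
    """Ratio of query duration without the patch to duration with it.

    A value above 1.0 means the patched build ran the query faster;
    a value below 1.0 means it ran slower.
    """
    if unpatched_ms <= 0 or patched_ms <= 0:
        raise ValueError("durations must be positive")
    return unpatched_ms / patched_ms


# Hypothetical durations (milliseconds) for one query, not measured data.
unpatched_ms = 12.0
patched_ms = 3.0
print(f"{relative_speedup(unpatched_ms, patched_ms):.2f}x")
```

Because only the ratio is reported, queries that expand into a different
number of GIN entries can still be compared on the same scale.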

Tomas



