Thread: Bitmap and-ing between btree and gin?

Bitmap and-ing between btree and gin?

From
Jordi
Date:
Hello all,


I've been trying to get a query use indexes and it has raised a doubt whether pgsql supports bitmap and-ing between a multi-column btree index and a gin index.

The idea is to do a full-text search on a tsvector that is indexed with gin. Then there are typical extra filters like is_active that you would put in a btree index. Instead of using OFFSET I use a > operation on the id. Finally, to make sure the results listing always appear in the same order, I do an ORDER BY the id of the row. So something like this:

CREATE INDEX idx_gin_page ON page USING gin(search_vector);
CREATE INDEX idx_btree_active_iddesc ON page USING btree(is_active, id DESC);
SELECT * FROM page WHERE (( (page.search_vector) @@ (plainto_tsquery('pg_catalog.english', 'myquery'))) AND page.is_active = 1 AND page.id > 100) ORDER BY page.id DESC LIMIT 100;

Some options I considered:
- One big multi-column index with the btree_gin module, but that didn't work. I suppose it's because just like gin, it doesn't support sorting.
- Seperate indexes as above, but that didn't work. The planner would always choose the btree index to do the is_active=1 and id>100 filter and the sorting, and within those results do a manual filter on the tsvector, being extremely slow.

BUT: when I remove the ORDER BY statement, the query runs really fast. It uses the 2 indexes seperately and bitmap-ands them together, resulting in a fast executing query.

So my question is whether there is something wrong with my query or indexes, or does pgsql not support sorting and bitmap and-ing?


Thanks and have a nice day
Jordi

Re: Bitmap and-ing between btree and gin?

From
Tom Lane
Date:
Jordi <jmaillists@promani.be> writes:
> I've been trying to get a query use indexes and it has raised a doubt
> whether pgsql supports bitmap and-ing between a multi-column btree index
> and a gin index.

Sure.  But such a plan would give an unordered result, meaning that we'd
have to process the whole table before doing the ORDER BY/LIMIT.  The
planner evidently thinks that it's better to try to process the rows in
ID order so it can stop as soon as it's got 100.  If it's wrong about
that, that's likely because it's got a bad estimate of the selectivity of
the other WHERE conditions.  You might see if you can improve the
statistics for the search_vector column.

            regards, tom lane


Re: Bitmap and-ing between btree and gin?

From
Jordi
Date:
Hi Tom, thanks for your reply, much appreciated.

So basically you're saying it's hard to do sorting in any way when a gin
index is involved? Neither with a complete multi-column btree_gin index
because it doesn't support sorting per definition, nor with a seperate
gin and btree because there would be an extra post-sorting step involved
over the FULL resultset (because of the LIMIT).

Then would you have any hint on how to implement pagination when doing
full text search?
Cause in theory, if I gave it a id>100 LIMIT 100, it might just as well
return me results 150 to 250, instead of 100 to 200...

PS: I already tried maxing the statistics target setting and running
ANALYSE after, with no change.

Regards,
Jordi


On 04-02-16 17:08, Tom Lane wrote:
> Jordi <jmaillists@promani.be> writes:
>> I've been trying to get a query use indexes and it has raised a doubt
>> whether pgsql supports bitmap and-ing between a multi-column btree index
>> and a gin index.
> Sure.  But such a plan would give an unordered result, meaning that we'd
> have to process the whole table before doing the ORDER BY/LIMIT.  The
> planner evidently thinks that it's better to try to process the rows in
> ID order so it can stop as soon as it's got 100.  If it's wrong about
> that, that's likely because it's got a bad estimate of the selectivity of
> the other WHERE conditions.  You might see if you can improve the
> statistics for the search_vector column.
>
>             regards, tom lane
>
>



Re: Bitmap and-ing between btree and gin?

From
Jeff Janes
Date:
On Thu, Feb 4, 2016 at 9:19 AM, Jordi <jmaillists@promani.be> wrote:

The custom here is to respond in line, not to top-post.  Thanks.

>
> So basically you're saying it's hard to do sorting in any way when a gin
> index is involved? Neither with a complete multi-column btree_gin index
> because it doesn't support sorting per definition, nor with a seperate gin
> and btree because there would be an extra post-sorting step involved over
> the FULL resultset (because of the LIMIT).

In principle there is no reason (that I can think of) that a normal
btree index range scan couldn't accept a bitmap as an optional input,
and then use that as a filter which would allow it to walk the index
in order while throwing out tuples that can't match the other
conditions.  You are not the first person who would benefit from such
a feature.  But it would certainly not be trivial to implement.  It is
not on anyone's to-do list as far as I know.

From your earlier email:

> BUT: when I remove the ORDER BY statement, the query runs really fast. It uses the 2 indexes seperately and
bitmap-andsthem together, resulting in a fast executing query. 

When you removed the ORDER BY, did you also remove the LIMIT?  If you
removed the ORDER BY and kept the LIMIT, that is pretty much a
meaningless comparison.  You are asking a much easier question at that
point.

> Then would you have any hint on how to implement pagination when doing full
> text search?
> Cause in theory, if I gave it a id>100 LIMIT 100, it might just as well
> return me results 150 to 250, instead of 100 to 200...

Can you use a method that maintains state (cursor with fetching, or
temporary storage) so that it doesn't have to recalculate the query
for each page?

Cheers,

Jeff