Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction - Mailing list pgsql-performance

From Simon Riggs
Subject Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction
Date
Msg-id CANP8+jKeswEqRUOdRJAhubL=FFxVRq9zmhwt3gPg2_7EFAP3xw@mail.gmail.com
Whole thread Raw
In response to Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction  (Michael Lewis <mlewis@entrata.com>)
List pgsql-performance
On Thu, 20 Jun 2019 at 17:30, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Jun 20, 2019 at 05:18:33PM +0100, Simon Riggs wrote:
> On Thu, 20 Jun 2019 at 17:01, Chris Wilson <chris.wilson@cantabcapital.com>
> wrote:
>
>
> > I deliberately included r in the index, to demonstrate the issue that I’m
> > seeing. I know that there is very little locality in this particular,
> > dummy, arbitrary test case. I can try to produce a test case that has some
> > locality, but I expect it to show exactly the same results, i.e. that the
> > BRIN index performs much worse when we try to query on this column as well.
> >
>
> I'm suggesting that adding the second column to the index is the source of
> your problem, not adding the column to the query.

But it *is* odd that the index returns more rows with a strictly tighter
conditions, right ?

Oh, very. I was seeing this as an optimization issue rather than a bug report.
 
Note, it's not an issue of rowcount estimate being confused by redundant
conditions, but real rowcount, and it returns more rows even when the
conditions are duplicative.  Compare:

postgres=# explain analyze select * from brin_test where id >= 90000 and r in (1);
...
   ->  Bitmap Index Scan on brin_test_id_r_idx  (cost=0.00..12.03 rows=28125 width=0) (actual time=0.136..0.137 rows=37120 loops=1)
         Index Cond: ((id >= 90000) AND (r = 1))

postgres=# explain analyze select * from brin_test where id >= 90000 and r in (1,1);
...
   ->  Bitmap Index Scan on brin_test_id_r_idx  (cost=0.00..12.03 rows=28125 width=0) (actual time=0.263..0.263 rows=74240 loops=1)
         Index Cond: ((id >= 90000) AND (r = ANY ('{1,1}'::integer[])))

postgres=# explain analyze select * from brin_test where id >= 90000 and r in (1,1,1);
...
   ->  Bitmap Index Scan on brin_test_id_r_idx  (cost=0.00..12.03 rows=28125 width=0) (actual time=0.387..0.387 rows=111360 loops=1)
         Index Cond: ((id >= 90000) AND (r = ANY ('{1,1,1}'::integer[])))

Note, the docs say:
https://www.postgresql.org/docs/devel/indexes-multicolumn.html
|A multicolumn BRIN index can be used with query conditions that involve any
|subset of the index's columns. Like GIN and unlike B-tree or GiST, index search
|effectiveness is the same regardless of which index column(s) the query
|conditions use. The only reason to have multiple BRIN indexes instead of one
|multicolumn BRIN index on a single table is to have a different pages_per_range
|storage parameter.

The min/max values of each column are held for each block range.

If it scans using the "r" column it will identify more block ranges to scan than if it used the id column and hence would scan more real rows, so that part is understandable.

The only question is why it chooses to scan on "r" and not "id", which needs some investigation.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Solutions for the Enterprise

pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction
Next
From: Michael Lewis
Date:
Subject: Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction