Thread: Optimising a two column OR check

Optimising a two column OR check

From
Ivan Voras
Date:
Hello,

There's a "users" table with the following structure:

CREATE TABLE "user" (
  id SERIAL PRIMARY KEY,
 -- other fields
);

and there's a "friends" table with the following structure:

CREATE TABLE friend (
  user1_id INTEGER NOT NULL REFERENCES "user"(id),
  user2_id INTEGER NOT NULL REFERENCES "user"(id),
 -- other fields
  CHECK (user1_id < user2_id),
  PRIMARY KEY (user1_id, user2_id)
);

And I'm running this query:

SELECT user1_id,user2_id FROM friend WHERE user1_id=42 OR user2_id=42;

With seqscan disabled, I get this plan on 9.6:

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on friend  (cost=8.42..19.01 rows=14 width=8)
   Recheck Cond: ((user1_id = 1) OR (user2_id = 2))
   ->  BitmapOr  (cost=8.42..8.42 rows=14 width=0)
         ->  Bitmap Index Scan on friend_pkey  (cost=0.00..4.21 rows=7 width=0)
               Index Cond: (user1_id = 1)
         ->  Bitmap Index Scan on friend_user2_id_user1_id_idx  (cost=0.00..4.21 rows=7 width=0)
               Index Cond: (user2_id = 2)
(7 rows)

I expected to get an index-only scan in this situation, as that would be a very common query. Is there a way to actually make this sort of query resolvable with an index-only scan? Maybe a different table structure would help?

Re: Optimising a two column OR check

From
Justin Pryzby
Date:
On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
>  Bitmap Heap Scan on friend  (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?

The v11 release notes have this relevant item:

https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)

Justin



Re: Optimising a two column OR check

From
Andrew Gierth
Date:
>>>>> "Ivan" == Ivan Voras <ivoras@gmail.com> writes:

 Ivan> Hello,
 Ivan> There's a "users" table with the following structure:

 Ivan> CREATE TABLE "user" (
 Ivan>   id SERIAL PRIMARY KEY,
 Ivan>  -- other fields
 Ivan> );

 Ivan> and there's a "friends" table with the following structure:

 Ivan> CREATE TABLE friend (
 Ivan>   user1_id INTEGER NOT NULL REFERENCES "user"(id),
 Ivan>   user2_id INTEGER NOT NULL REFERENCES "user"(id),
 Ivan>  -- other fields
 Ivan>   CHECK (user1_id < user2_id),
 Ivan>   PRIMARY KEY (user1_id, user2_id)
 Ivan> );

 Ivan> And I'm running this query:

 Ivan> SELECT user1_id,user2_id FROM friend WHERE user1_id=42 OR user2_id=42;

To get friends of user 42:

SELECT user1_id FROM friend WHERE user2_id=42
UNION ALL
SELECT user2_id FROM friend WHERE user1_id=42;

assuming you create the (user2_id,user1_id) index, this should get you
an Append of two index-only scans. We can use UNION ALL here rather than
UNION because the table constraints ensure there are no duplicates.

-- 
Andrew (irc:RhodiumToad)



Re: Optimising a two column OR check

From
Jeff Janes
Date:
On Sat, Oct 12, 2019 at 10:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
>  Bitmap Heap Scan on friend  (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?

It would have to scan the entire index to find the cases where   user2_id=42 but user1_id is not constrained.  Technically User1_id would be constrained to be less than 42, but I don't think the planner will take that into account.


The v11 release notes have this relevant item:

https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)


But this is not one of those cases.  It is only possible when the only data needed is whether the row exists or not.

Cheers,

Jeff

Re: Optimising a two column OR check

From
MichaelDBA
Date:
Another thing to consider is the visibility map.  From what I understand, index only scans are preferred for heavily updated tables, not infrequently updated ones.  Even though index only scans imply ONLY they really aren't in the sense that they may need to visit the Visibility Map for the heap. This can be costly and the planner may remove index only scan consideration if the VM has tuples that are not visible.

BTW, to Andrew, the UNION ALL alternative still results in bitmap index scans from my testing.

Regards,
Michael Vitale



Jeff Janes wrote on 10/12/2019 11:17 AM:
On Sat, Oct 12, 2019 at 10:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
>  Bitmap Heap Scan on friend  (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?

It would have to scan the entire index to find the cases where   user2_id=42 but user1_id is not constrained.  Technically User1_id would be constrained to be less than 42, but I don't think the planner will take that into account.


The v11 release notes have this relevant item:

https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)


But this is not one of those cases.  It is only possible when the only data needed is whether the row exists or not.

Cheers,

Jeff

Re: Optimising a two column OR check

From
Andrew Gierth
Date:
>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:

 MichaelDBA> BTW, to Andrew, the UNION ALL alternative still results in
 MichaelDBA> bitmap index scans from my testing.

You probably forgot to vacuum the table.

-- 
Andrew (irc:RhodiumToad)



Re: Optimising a two column OR check

From
MichaelDBA
Date:
Yikes, apologies to all, my wording is the opposite of what I meant!

Index only scans are preferred for infrequently updated ones, not heavily updated ones where the visibility map is updated often.

Regards,
Michael Vitale


MichaelDBA wrote on 10/12/2019 11:27 AM:
Another thing to consider is the visibility map.  From what I understand, index only scans are preferred for heavily updated tables, not infrequently updated ones.  Even though index only scans imply ONLY they really aren't in the sense that they may need to visit the Visibility Map for the heap. This can be costly and the planner may remove index only scan consideration if the VM has tuples that are not visible.

BTW, to Andrew, the UNION ALL alternative still results in bitmap index scans from my testing.

Regards,
Michael Vitale



Jeff Janes wrote on 10/12/2019 11:17 AM:
On Sat, Oct 12, 2019 at 10:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
>  Bitmap Heap Scan on friend  (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?

It would have to scan the entire index to find the cases where   user2_id=42 but user1_id is not constrained.  Technically User1_id would be constrained to be less than 42, but I don't think the planner will take that into account.


The v11 release notes have this relevant item:

https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)


But this is not one of those cases.  It is only possible when the only data needed is whether the row exists or not.

Cheers,

Jeff


Re: Optimising a two column OR check

From
MichaelDBA
Date:
Nope, vacuumed it and still got the bitmap index scans.

Andrew Gierth wrote on 10/12/2019 11:33 AM:
>>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:
>   MichaelDBA> BTW, to Andrew, the UNION ALL alternative still results in
>   MichaelDBA> bitmap index scans from my testing.
>
> You probably forgot to vacuum the table.
>




Re: Optimising a two column OR check

From
MichaelDBA
Date:
Perhaps the fix by Alexander Kuzmenkov in V11 added this VM consideration for having a preference of bitmap index scan over an index only scan.  Looks like I'm goin' down the rabbit hole...

Regards,
Michael Vitale

MichaelDBA wrote on 10/12/2019 11:35 AM:
Nope, vacuumed it and still got the bitmap index scans.

Andrew Gierth wrote on 10/12/2019 11:33 AM:
"MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:
  MichaelDBA> BTW, to Andrew, the UNION ALL alternative still results in
  MichaelDBA> bitmap index scans from my testing.

You probably forgot to vacuum the table.





Re: Optimising a two column OR check

From
Andrew Gierth
Date:
>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:

 MichaelDBA> Nope, vacuumed it and still got the bitmap index scans.

Let's see your explains. Here's mine:

# set enable_seqscan=false;  -- because I only have a few rows
SET
# insert into friend values (1,2),(2,5);
INSERT 0 2
# vacuum analyze friend;
VACUUM
# explain analyze SELECT user1_id FROM friend WHERE user2_id=2 UNION ALL select user2_id FROM friend WHERE user1_id=2;
                                                                   QUERY PLAN
                       
 

------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.13..8.32 rows=2 width=4) (actual time=0.009..0.014 rows=2 loops=1)
   ->  Index Only Scan using friend_user2_id_user1_id_idx on friend  (cost=0.13..4.15 rows=1 width=4) (actual
time=0.009..0.009rows=1 loops=1)
 
         Index Cond: (user2_id = 2)
         Heap Fetches: 0
   ->  Index Only Scan using friend_pkey on friend friend_1  (cost=0.13..4.15 rows=1 width=4) (actual time=0.003..0.004
rows=1loops=1)
 
         Index Cond: (user1_id = 2)
         Heap Fetches: 0
 Planning Time: 0.271 ms
 Execution Time: 0.045 ms
(9 rows)

Note that you have to put some actual rows in the table; if it is
completely empty, you'll not get a representative result.

-- 
Andrew (irc:RhodiumToad)



Re: Optimising a two column OR check

From
MichaelDBA
Date:
Yep, you're right, Andrew, adding a couple rows made it do the index 
only scan.  I reckon I got misled by turning off sequential scans, 
thinking that actual rows were not important anymore.  Overly simplistic 
reasonings can get one into trouble, lol.

Regards,
Michael Vitale


Andrew Gierth wrote on 10/12/2019 11:46 AM:
>>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:
>   MichaelDBA> Nope, vacuumed it and still got the bitmap index scans.
>
> Let's see your explains. Here's mine:
>
> # set enable_seqscan=false;  -- because I only have a few rows
> SET
> # insert into friend values (1,2),(2,5);
> INSERT 0 2
> # vacuum analyze friend;
> VACUUM
> # explain analyze SELECT user1_id FROM friend WHERE user2_id=2 UNION ALL select user2_id FROM friend WHERE
user1_id=2;
>                                                                     QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------
>   Append  (cost=0.13..8.32 rows=2 width=4) (actual time=0.009..0.014 rows=2 loops=1)
>     ->  Index Only Scan using friend_user2_id_user1_id_idx on friend  (cost=0.13..4.15 rows=1 width=4) (actual
time=0.009..0.009rows=1 loops=1)
 
>           Index Cond: (user2_id = 2)
>           Heap Fetches: 0
>     ->  Index Only Scan using friend_pkey on friend friend_1  (cost=0.13..4.15 rows=1 width=4) (actual
time=0.003..0.004rows=1 loops=1)
 
>           Index Cond: (user1_id = 2)
>           Heap Fetches: 0
>   Planning Time: 0.271 ms
>   Execution Time: 0.045 ms
> (9 rows)
>
> Note that you have to put some actual rows in the table; if it is
> completely empty, you'll not get a representative result.
>




Re: Optimising a two column OR check

From
Andrew Gierth
Date:
>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:

 MichaelDBA> Yep, you're right, Andrew, adding a couple rows made it do
 MichaelDBA> the index only scan.  I reckon I got misled by turning off
 MichaelDBA> sequential scans, thinking that actual rows were not
 MichaelDBA> important anymore.  Overly simplistic reasonings can get
 MichaelDBA> one into trouble, lol.

We do some odd stuff with the statistics estimates for completely empty
tables because (a) it's not common in practice for a table to be always
empty (i.e. the emptiness is usually transient) and (b) if you take the
emptiness of a table at face value, you end up generating insanely bad
plans for certain FK check queries that may not get replanned quickly
enough to mitigate the performance impact.

--
Andrew (irc:RhodiumToad)



Re: Optimising a two column OR check

From
Ivan Voras
Date:


On Sat, 12 Oct 2019 at 17:16, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Ivan" == Ivan Voras <ivoras@gmail.com> writes:
 
 Ivan> SELECT user1_id,user2_id FROM friend WHERE user1_id=42 OR user2_id=42;

To get friends of user 42:

SELECT user1_id FROM friend WHERE user2_id=42
UNION ALL
SELECT user2_id FROM friend WHERE user1_id=42;

assuming you create the (user2_id,user1_id) index, this should get you
an Append of two index-only scans. We can use UNION ALL here rather than
UNION because the table constraints ensure there are no duplicates.
 
Thanks! That's a more elegant solution for my query than what I had in mind!

Re: Optimising a two column OR check

From
Ivan Voras
Date:
On Sat, 12 Oct 2019 at 17:46, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:

 MichaelDBA> Nope, vacuumed it and still got the bitmap index scans.

Let's see your explains. Here's mine:

# set enable_seqscan=false;  -- because I only have a few rows
SET
# insert into friend values (1,2),(2,5);
INSERT 0 2
# vacuum analyze friend;
VACUUM
# explain analyze SELECT user1_id FROM friend WHERE user2_id=2 UNION ALL select user2_id FROM friend WHERE user1_id=2;
                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.13..8.32 rows=2 width=4) (actual time=0.009..0.014 rows=2 loops=1)
   ->  Index Only Scan using friend_user2_id_user1_id_idx on friend  (cost=0.13..4.15 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
         Index Cond: (user2_id = 2)
         Heap Fetches: 0
   ->  Index Only Scan using friend_pkey on friend friend_1  (cost=0.13..4.15 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=1)
         Index Cond: (user1_id = 2)
         Heap Fetches: 0
 Planning Time: 0.271 ms
 Execution Time: 0.045 ms
(9 rows)

Note that you have to put some actual rows in the table; if it is
completely empty, you'll not get a representative result.

Confirming what's been said - the whole thing works fine on 10. I can't get index only scans on 9.6, but that's a dev machine anyway.

Now if only hash indexes supported multiple column, that'd probably result in all my data being returned from a single read of a hash bucket, but that's going into microoptimisation territory :)

Thanks!