Home > mailing lists

Thread: Optimising a two column OR check

Optimising a two column OR check

From

Ivan Voras

Date:

12 October 2019, 14:39:56

Hello,

There's a "users" table with the following structure:

CREATE TABLE "user" (

id SERIAL PRIMARY KEY,

-- other fields

);

and there's a "friends" table with the following structure:

CREATE TABLE friend (

user1_id INTEGER NOT NULL REFERENCES "user"(id),

user2_id INTEGER NOT NULL REFERENCES "user"(id),

-- other fields

CHECK (user1_id < user2_id),

PRIMARY KEY (user1_id, user2_id)

);

And I'm running this query:

SELECT user1_id,user2_id FROM friend WHERE user1_id=42 OR user2_id=42;

With seqscan disabled, I get this plan on 9.6:

QUERY PLAN
-------------------------------------------------------------------------------------------------
Bitmap Heap Scan on friend (cost=8.42..19.01 rows=14 width=8)
Recheck Cond: ((user1_id = 1) OR (user2_id = 2))
-> BitmapOr (cost=8.42..8.42 rows=14 width=0)
-> Bitmap Index Scan on friend_pkey (cost=0.00..4.21 rows=7 width=0)
Index Cond: (user1_id = 1)
-> Bitmap Index Scan on friend_user2_id_user1_id_idx (cost=0.00..4.21 rows=7 width=0)
Index Cond: (user2_id = 2)
(7 rows)

I expected to get an index-only scan in this situation, as that would be a very common query. Is there a way to actually make this sort of query resolvable with an index-only scan? Maybe a different table structure would help?

Re: Optimising a two column OR check

From

Justin Pryzby

Date:

12 October 2019, 14:43:36

On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
>  Bitmap Heap Scan on friend  (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?

The v11 release notes have this relevant item:

https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)

Justin

Re: Optimising a two column OR check

From

Andrew Gierth

Date:

12 October 2019, 15:16:50

>>>>> "Ivan" == Ivan Voras <ivoras@gmail.com> writes:

 Ivan> Hello,
 Ivan> There's a "users" table with the following structure:

 Ivan> CREATE TABLE "user" (
 Ivan>   id SERIAL PRIMARY KEY,
 Ivan>  -- other fields
 Ivan> );

 Ivan> and there's a "friends" table with the following structure:

 Ivan> CREATE TABLE friend (
 Ivan>   user1_id INTEGER NOT NULL REFERENCES "user"(id),
 Ivan>   user2_id INTEGER NOT NULL REFERENCES "user"(id),
 Ivan>  -- other fields
 Ivan>   CHECK (user1_id < user2_id),
 Ivan>   PRIMARY KEY (user1_id, user2_id)
 Ivan> );

 Ivan> And I'm running this query:

 Ivan> SELECT user1_id,user2_id FROM friend WHERE user1_id=42 OR user2_id=42;

To get friends of user 42:

SELECT user1_id FROM friend WHERE user2_id=42
UNION ALL
SELECT user2_id FROM friend WHERE user1_id=42;

assuming you create the (user2_id,user1_id) index, this should get you
an Append of two index-only scans. We can use UNION ALL here rather than
UNION because the table constraints ensure there are no duplicates.

-- 
Andrew (irc:RhodiumToad)

Re: Optimising a two column OR check

From

Jeff Janes

Date:

12 October 2019, 15:17:58

On Sat, Oct 12, 2019 at 10:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
> Bitmap Heap Scan on friend (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?

It would have to scan the entire index to find the cases where user2_id=42 but user1_id is not constrained. Technically User1_id would be constrained to be less than 42, but I don't think the planner will take that into account.

The v11 release notes have this relevant item:

https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)

But this is not one of those cases. It is only possible when the only data needed is whether the row exists or not.

Cheers,

Jeff

Re: Optimising a two column OR check

From

MichaelDBA

Date:

12 October 2019, 15:27:55

Another thing to consider is the visibility map. From what I understand, index only scans are preferred for heavily updated tables, not infrequently updated ones. Even though index only scans imply ONLY they really aren't in the sense that they may need to visit the Visibility Map for the heap. This can be costly and the planner may remove index only scan consideration if the VM has tuples that are not visible.

BTW, to Andrew, the UNION ALL alternative still results in bitmap index scans from my testing.

Regards,
Michael Vitale

Jeff Janes wrote on 10/12/2019 11:17 AM:

On Sat, Oct 12, 2019 at 10:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
> Bitmap Heap Scan on friend (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?

It would have to scan the entire index to find the cases where user2_id=42 but user1_id is not constrained. Technically User1_id would be constrained to be less than 42, but I don't think the planner will take that into account.

The v11 release notes have this relevant item:

https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)

But this is not one of those cases. It is only possible when the only data needed is whether the row exists or not.

Cheers,

Jeff

Re: Optimising a two column OR check

From

Andrew Gierth

Date:

12 October 2019, 15:33:37

>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:

 MichaelDBA> BTW, to Andrew, the UNION ALL alternative still results in
 MichaelDBA> bitmap index scans from my testing.

You probably forgot to vacuum the table.

-- 
Andrew (irc:RhodiumToad)

Re: Optimising a two column OR check

From

MichaelDBA

Date:

12 October 2019, 15:33:53

Yikes, apologies to all, my wording is the opposite of what I meant!

Index only scans are preferred for infrequently updated ones, not heavily updated ones where the visibility map is updated often.

Regards,
Michael Vitale

MichaelDBA wrote on 10/12/2019 11:27 AM:

Another thing to consider is the visibility map. From what I understand, index only scans are preferred for heavily updated tables, not infrequently updated ones. Even though index only scans imply ONLY they really aren't in the sense that they may need to visit the Visibility Map for the heap. This can be costly and the planner may remove index only scan consideration if the VM has tuples that are not visible.

BTW, to Andrew, the UNION ALL alternative still results in bitmap index scans from my testing.

Regards,
Michael Vitale

Jeff Janes wrote on 10/12/2019 11:17 AM:
On Sat, Oct 12, 2019 at 10:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
> Bitmap Heap Scan on friend (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?

It would have to scan the entire index to find the cases where user2_id=42 but user1_id is not constrained. Technically User1_id would be constrained to be less than 42, but I don't think the planner will take that into account.

The v11 release notes have this relevant item:

https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)

But this is not one of those cases. It is only possible when the only data needed is whether the row exists or not.

Cheers,

Jeff

Re: Optimising a two column OR check

From

MichaelDBA

Date:

12 October 2019, 15:35:22

Nope, vacuumed it and still got the bitmap index scans.

Andrew Gierth wrote on 10/12/2019 11:33 AM:
>>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:
>   MichaelDBA> BTW, to Andrew, the UNION ALL alternative still results in
>   MichaelDBA> bitmap index scans from my testing.
>
> You probably forgot to vacuum the table.
>

Re: Optimising a two column OR check

From

MichaelDBA

Date:

12 October 2019, 15:45:45

Perhaps the fix by Alexander Kuzmenkov in V11 added this VM consideration for having a preference of bitmap index scan over an index only scan. Looks like I'm goin' down the rabbit hole...

Regards,
Michael Vitale

MichaelDBA wrote on 10/12/2019 11:35 AM:

Nope, vacuumed it and still got the bitmap index scans.

Andrew Gierth wrote on 10/12/2019 11:33 AM:
"MichaelDBA" == MichaelDBA <MichaelDBA@sqlexec.com> writes:
MichaelDBA> BTW, to Andrew, the UNION ALL alternative still results in
MichaelDBA> bitmap index scans from my testing.

You probably forgot to vacuum the table.

Re: Optimising a two column OR check

From

Andrew Gierth

Date:

12 October 2019, 15:46:44

>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:

 MichaelDBA> Nope, vacuumed it and still got the bitmap index scans.

Let's see your explains. Here's mine:

# set enable_seqscan=false;  -- because I only have a few rows
SET
# insert into friend values (1,2),(2,5);
INSERT 0 2
# vacuum analyze friend;
VACUUM
# explain analyze SELECT user1_id FROM friend WHERE user2_id=2 UNION ALL select user2_id FROM friend WHERE user1_id=2;
                                                                   QUERY PLAN
                       
 

------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.13..8.32 rows=2 width=4) (actual time=0.009..0.014 rows=2 loops=1)
   ->  Index Only Scan using friend_user2_id_user1_id_idx on friend  (cost=0.13..4.15 rows=1 width=4) (actual
time=0.009..0.009rows=1 loops=1)
 
         Index Cond: (user2_id = 2)
         Heap Fetches: 0
   ->  Index Only Scan using friend_pkey on friend friend_1  (cost=0.13..4.15 rows=1 width=4) (actual time=0.003..0.004
rows=1loops=1)
 
         Index Cond: (user1_id = 2)
         Heap Fetches: 0
 Planning Time: 0.271 ms
 Execution Time: 0.045 ms
(9 rows)

Note that you have to put some actual rows in the table; if it is
completely empty, you'll not get a representative result.

-- 
Andrew (irc:RhodiumToad)

Re: Optimising a two column OR check

From

MichaelDBA

Date:

12 October 2019, 15:54:07

Yep, you're right, Andrew, adding a couple rows made it do the index 
only scan.  I reckon I got misled by turning off sequential scans, 
thinking that actual rows were not important anymore.  Overly simplistic 
reasonings can get one into trouble, lol.

Regards,
Michael Vitale


Andrew Gierth wrote on 10/12/2019 11:46 AM:
>>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:
>   MichaelDBA> Nope, vacuumed it and still got the bitmap index scans.
>
> Let's see your explains. Here's mine:
>
> # set enable_seqscan=false;  -- because I only have a few rows
> SET
> # insert into friend values (1,2),(2,5);
> INSERT 0 2
> # vacuum analyze friend;
> VACUUM
> # explain analyze SELECT user1_id FROM friend WHERE user2_id=2 UNION ALL select user2_id FROM friend WHERE
user1_id=2;
>                                                                     QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------
>   Append  (cost=0.13..8.32 rows=2 width=4) (actual time=0.009..0.014 rows=2 loops=1)
>     ->  Index Only Scan using friend_user2_id_user1_id_idx on friend  (cost=0.13..4.15 rows=1 width=4) (actual
time=0.009..0.009rows=1 loops=1)
 
>           Index Cond: (user2_id = 2)
>           Heap Fetches: 0
>     ->  Index Only Scan using friend_pkey on friend friend_1  (cost=0.13..4.15 rows=1 width=4) (actual
time=0.003..0.004rows=1 loops=1)
 
>           Index Cond: (user1_id = 2)
>           Heap Fetches: 0
>   Planning Time: 0.271 ms
>   Execution Time: 0.045 ms
> (9 rows)
>
> Note that you have to put some actual rows in the table; if it is
> completely empty, you'll not get a representative result.
>

Re: Optimising a two column OR check

From

Andrew Gierth

Date:

12 October 2019, 16:58:22

>>>>> "MichaelDBA" == MichaelDBA  <MichaelDBA@sqlexec.com> writes:

 MichaelDBA> Yep, you're right, Andrew, adding a couple rows made it do
 MichaelDBA> the index only scan.  I reckon I got misled by turning off
 MichaelDBA> sequential scans, thinking that actual rows were not
 MichaelDBA> important anymore.  Overly simplistic reasonings can get
 MichaelDBA> one into trouble, lol.

We do some odd stuff with the statistics estimates for completely empty
tables because (a) it's not common in practice for a table to be always
empty (i.e. the emptiness is usually transient) and (b) if you take the
emptiness of a table at face value, you end up generating insanely bad
plans for certain FK check queries that may not get replanned quickly
enough to mitigate the performance impact.

--
Andrew (irc:RhodiumToad)

Re: Optimising a two column OR check

From

Ivan Voras

Date:

13 October 2019, 13:29:19

On Sat, 12 Oct 2019 at 17:16, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

>>>>> "Ivan" == Ivan Voras <ivoras@gmail.com> writes:

Ivan> SELECT user1_id,user2_id FROM friend WHERE user1_id=42 OR user2_id=42;

To get friends of user 42:

SELECT user1_id FROM friend WHERE user2_id=42
UNION ALL
SELECT user2_id FROM friend WHERE user1_id=42;

assuming you create the (user2_id,user1_id) index, this should get you
an Append of two index-only scans. We can use UNION ALL here rather than
UNION because the table constraints ensure there are no duplicates.

Thanks! That's a more elegant solution for my query than what I had in mind!

Re: Optimising a two column OR check

From

Ivan Voras

Date:

13 October 2019, 13:38:49

On Sat, 12 Oct 2019 at 17:46, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

>>>>> "MichaelDBA" == MichaelDBA <MichaelDBA@sqlexec.com> writes:

MichaelDBA> Nope, vacuumed it and still got the bitmap index scans.

Let's see your explains. Here's mine:

# set enable_seqscan=false; -- because I only have a few rows
SET
# insert into friend values (1,2),(2,5);
INSERT 0 2
# vacuum analyze friend;
VACUUM
# explain analyze SELECT user1_id FROM friend WHERE user2_id=2 UNION ALL select user2_id FROM friend WHERE user1_id=2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.13..8.32 rows=2 width=4) (actual time=0.009..0.014 rows=2 loops=1)
-> Index Only Scan using friend_user2_id_user1_id_idx on friend (cost=0.13..4.15 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
Index Cond: (user2_id = 2)
Heap Fetches: 0
-> Index Only Scan using friend_pkey on friend friend_1 (cost=0.13..4.15 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=1)
Index Cond: (user1_id = 2)
Heap Fetches: 0
Planning Time: 0.271 ms
Execution Time: 0.045 ms
(9 rows)

Note that you have to put some actual rows in the table; if it is
completely empty, you'll not get a representative result.

Confirming what's been said - the whole thing works fine on 10. I can't get index only scans on 9.6, but that's a dev machine anyway.

Now if only hash indexes supported multiple column, that'd probably result in all my data being returned from a single read of a hash bucket, but that's going into microoptimisation territory :)

Thanks!