Thread: Optimising a two column OR check
Hello,
There's a "users" table with the following structure:
CREATE TABLE "user" (
id SERIAL PRIMARY KEY,
-- other fields
);
and there's a "friends" table with the following structure:
CREATE TABLE friend (
user1_id INTEGER NOT NULL REFERENCES "user"(id),
user2_id INTEGER NOT NULL REFERENCES "user"(id),
-- other fields
CHECK (user1_id < user2_id),
PRIMARY KEY (user1_id, user2_id)
);
And I'm running this query:
SELECT user1_id,user2_id FROM friend WHERE user1_id=42 OR user2_id=42;
With seqscan disabled, I get this plan on 9.6:
QUERY PLAN
-------------------------------------------------------------------------------------------------
Bitmap Heap Scan on friend (cost=8.42..19.01 rows=14 width=8)
Recheck Cond: ((user1_id = 1) OR (user2_id = 2))
-> BitmapOr (cost=8.42..8.42 rows=14 width=0)
-> Bitmap Index Scan on friend_pkey (cost=0.00..4.21 rows=7 width=0)
Index Cond: (user1_id = 1)
-> Bitmap Index Scan on friend_user2_id_user1_id_idx (cost=0.00..4.21 rows=7 width=0)
Index Cond: (user2_id = 2)
(7 rows)
-------------------------------------------------------------------------------------------------
Bitmap Heap Scan on friend (cost=8.42..19.01 rows=14 width=8)
Recheck Cond: ((user1_id = 1) OR (user2_id = 2))
-> BitmapOr (cost=8.42..8.42 rows=14 width=0)
-> Bitmap Index Scan on friend_pkey (cost=0.00..4.21 rows=7 width=0)
Index Cond: (user1_id = 1)
-> Bitmap Index Scan on friend_user2_id_user1_id_idx (cost=0.00..4.21 rows=7 width=0)
Index Cond: (user2_id = 2)
(7 rows)
I expected to get an index-only scan in this situation, as that would be a very common query. Is there a way to actually make this sort of query resolvable with an index-only scan? Maybe a different table structure would help?
On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote: > With seqscan disabled, I get this plan on 9.6: > Bitmap Heap Scan on friend (cost=8.42..19.01 rows=14 width=8) ... > I expected to get an index-only scan in this situation, as that would be a > very common query. Is there a way to actually make this sort of query > resolvable with an index-only scan? Maybe a different table structure would > help? The v11 release notes have this relevant item: https://www.postgresql.org/docs/11/release-11.html |Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov) Justin
>>>>> "Ivan" == Ivan Voras <ivoras@gmail.com> writes: Ivan> Hello, Ivan> There's a "users" table with the following structure: Ivan> CREATE TABLE "user" ( Ivan> id SERIAL PRIMARY KEY, Ivan> -- other fields Ivan> ); Ivan> and there's a "friends" table with the following structure: Ivan> CREATE TABLE friend ( Ivan> user1_id INTEGER NOT NULL REFERENCES "user"(id), Ivan> user2_id INTEGER NOT NULL REFERENCES "user"(id), Ivan> -- other fields Ivan> CHECK (user1_id < user2_id), Ivan> PRIMARY KEY (user1_id, user2_id) Ivan> ); Ivan> And I'm running this query: Ivan> SELECT user1_id,user2_id FROM friend WHERE user1_id=42 OR user2_id=42; To get friends of user 42: SELECT user1_id FROM friend WHERE user2_id=42 UNION ALL SELECT user2_id FROM friend WHERE user1_id=42; assuming you create the (user2_id,user1_id) index, this should get you an Append of two index-only scans. We can use UNION ALL here rather than UNION because the table constraints ensure there are no duplicates. -- Andrew (irc:RhodiumToad)
On Sat, Oct 12, 2019 at 10:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
> Bitmap Heap Scan on friend (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?
It would have to scan the entire index to find the cases where user2_id=42 but user1_id is not constrained. Technically User1_id would be constrained to be less than 42, but I don't think the planner will take that into account.
The v11 release notes have this relevant item:
https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)
But this is not one of those cases. It is only possible when the only data needed is whether the row exists or not.
Cheers,
Jeff
Another thing to consider is the visibility map. From what I understand, index only scans are preferred for heavily updated tables, not infrequently updated ones. Even though index only scans imply ONLY they really aren't in the sense that they may need to visit the Visibility Map for the heap. This can be costly and the planner may remove index only scan consideration if the VM has tuples that are not visible.
BTW, to Andrew, the UNION ALL alternative still results in bitmap index scans from my testing.
Regards,
Michael Vitale
Jeff Janes wrote on 10/12/2019 11:17 AM:
BTW, to Andrew, the UNION ALL alternative still results in bitmap index scans from my testing.
Regards,
Michael Vitale
Jeff Janes wrote on 10/12/2019 11:17 AM:
On Sat, Oct 12, 2019 at 10:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
> Bitmap Heap Scan on friend (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?It would have to scan the entire index to find the cases where user2_id=42 but user1_id is not constrained. Technically User1_id would be constrained to be less than 42, but I don't think the planner will take that into account.
The v11 release notes have this relevant item:
https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)But this is not one of those cases. It is only possible when the only data needed is whether the row exists or not.Cheers,Jeff
>>>>> "MichaelDBA" == MichaelDBA <MichaelDBA@sqlexec.com> writes: MichaelDBA> BTW, to Andrew, the UNION ALL alternative still results in MichaelDBA> bitmap index scans from my testing. You probably forgot to vacuum the table. -- Andrew (irc:RhodiumToad)
Yikes, apologies to all, my wording is the opposite of what I meant!
Index only scans are preferred for infrequently updated ones, not heavily updated ones where the visibility map is updated often.
Regards,
Michael Vitale
MichaelDBA wrote on 10/12/2019 11:27 AM:
Index only scans are preferred for infrequently updated ones, not heavily updated ones where the visibility map is updated often.
Regards,
Michael Vitale
MichaelDBA wrote on 10/12/2019 11:27 AM:
Another thing to consider is the visibility map. From what I understand, index only scans are preferred for heavily updated tables, not infrequently updated ones. Even though index only scans imply ONLY they really aren't in the sense that they may need to visit the Visibility Map for the heap. This can be costly and the planner may remove index only scan consideration if the VM has tuples that are not visible.
BTW, to Andrew, the UNION ALL alternative still results in bitmap index scans from my testing.
Regards,
Michael Vitale
Jeff Janes wrote on 10/12/2019 11:17 AM:On Sat, Oct 12, 2019 at 10:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:On Sat, Oct 12, 2019 at 04:39:56PM +0200, Ivan Voras wrote:
> With seqscan disabled, I get this plan on 9.6:
> Bitmap Heap Scan on friend (cost=8.42..19.01 rows=14 width=8)
...
> I expected to get an index-only scan in this situation, as that would be a
> very common query. Is there a way to actually make this sort of query
> resolvable with an index-only scan? Maybe a different table structure would
> help?It would have to scan the entire index to find the cases where user2_id=42 but user1_id is not constrained. Technically User1_id would be constrained to be less than 42, but I don't think the planner will take that into account.
The v11 release notes have this relevant item:
https://www.postgresql.org/docs/11/release-11.html
|Allow bitmap scans to perform index-only scans when possible (Alexander Kuzmenkov)But this is not one of those cases. It is only possible when the only data needed is whether the row exists or not.Cheers,Jeff
Nope, vacuumed it and still got the bitmap index scans. Andrew Gierth wrote on 10/12/2019 11:33 AM: >>>>>> "MichaelDBA" == MichaelDBA <MichaelDBA@sqlexec.com> writes: > MichaelDBA> BTW, to Andrew, the UNION ALL alternative still results in > MichaelDBA> bitmap index scans from my testing. > > You probably forgot to vacuum the table. >
Perhaps the fix by Alexander Kuzmenkov in V11 added this VM consideration for having a preference of bitmap index scan over an index only scan. Looks like I'm goin' down the rabbit hole...
Regards,
Michael Vitale
MichaelDBA wrote on 10/12/2019 11:35 AM:
Regards,
Michael Vitale
MichaelDBA wrote on 10/12/2019 11:35 AM:
Nope, vacuumed it and still got the bitmap index scans.
Andrew Gierth wrote on 10/12/2019 11:33 AM:MichaelDBA> BTW, to Andrew, the UNION ALL alternative still results in"MichaelDBA" == MichaelDBA <MichaelDBA@sqlexec.com> writes:
MichaelDBA> bitmap index scans from my testing.
You probably forgot to vacuum the table.
>>>>> "MichaelDBA" == MichaelDBA <MichaelDBA@sqlexec.com> writes: MichaelDBA> Nope, vacuumed it and still got the bitmap index scans. Let's see your explains. Here's mine: # set enable_seqscan=false; -- because I only have a few rows SET # insert into friend values (1,2),(2,5); INSERT 0 2 # vacuum analyze friend; VACUUM # explain analyze SELECT user1_id FROM friend WHERE user2_id=2 UNION ALL select user2_id FROM friend WHERE user1_id=2; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Append (cost=0.13..8.32 rows=2 width=4) (actual time=0.009..0.014 rows=2 loops=1) -> Index Only Scan using friend_user2_id_user1_id_idx on friend (cost=0.13..4.15 rows=1 width=4) (actual time=0.009..0.009rows=1 loops=1) Index Cond: (user2_id = 2) Heap Fetches: 0 -> Index Only Scan using friend_pkey on friend friend_1 (cost=0.13..4.15 rows=1 width=4) (actual time=0.003..0.004 rows=1loops=1) Index Cond: (user1_id = 2) Heap Fetches: 0 Planning Time: 0.271 ms Execution Time: 0.045 ms (9 rows) Note that you have to put some actual rows in the table; if it is completely empty, you'll not get a representative result. -- Andrew (irc:RhodiumToad)
Yep, you're right, Andrew, adding a couple rows made it do the index only scan. I reckon I got misled by turning off sequential scans, thinking that actual rows were not important anymore. Overly simplistic reasonings can get one into trouble, lol. Regards, Michael Vitale Andrew Gierth wrote on 10/12/2019 11:46 AM: >>>>>> "MichaelDBA" == MichaelDBA <MichaelDBA@sqlexec.com> writes: > MichaelDBA> Nope, vacuumed it and still got the bitmap index scans. > > Let's see your explains. Here's mine: > > # set enable_seqscan=false; -- because I only have a few rows > SET > # insert into friend values (1,2),(2,5); > INSERT 0 2 > # vacuum analyze friend; > VACUUM > # explain analyze SELECT user1_id FROM friend WHERE user2_id=2 UNION ALL select user2_id FROM friend WHERE user1_id=2; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------ > Append (cost=0.13..8.32 rows=2 width=4) (actual time=0.009..0.014 rows=2 loops=1) > -> Index Only Scan using friend_user2_id_user1_id_idx on friend (cost=0.13..4.15 rows=1 width=4) (actual time=0.009..0.009rows=1 loops=1) > Index Cond: (user2_id = 2) > Heap Fetches: 0 > -> Index Only Scan using friend_pkey on friend friend_1 (cost=0.13..4.15 rows=1 width=4) (actual time=0.003..0.004rows=1 loops=1) > Index Cond: (user1_id = 2) > Heap Fetches: 0 > Planning Time: 0.271 ms > Execution Time: 0.045 ms > (9 rows) > > Note that you have to put some actual rows in the table; if it is > completely empty, you'll not get a representative result. >
>>>>> "MichaelDBA" == MichaelDBA <MichaelDBA@sqlexec.com> writes: MichaelDBA> Yep, you're right, Andrew, adding a couple rows made it do MichaelDBA> the index only scan. I reckon I got misled by turning off MichaelDBA> sequential scans, thinking that actual rows were not MichaelDBA> important anymore. Overly simplistic reasonings can get MichaelDBA> one into trouble, lol. We do some odd stuff with the statistics estimates for completely empty tables because (a) it's not common in practice for a table to be always empty (i.e. the emptiness is usually transient) and (b) if you take the emptiness of a table at face value, you end up generating insanely bad plans for certain FK check queries that may not get replanned quickly enough to mitigate the performance impact. -- Andrew (irc:RhodiumToad)
On Sat, 12 Oct 2019 at 17:16, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Ivan" == Ivan Voras <ivoras@gmail.com> writes:
Ivan> SELECT user1_id,user2_id FROM friend WHERE user1_id=42 OR user2_id=42;
To get friends of user 42:
SELECT user1_id FROM friend WHERE user2_id=42
UNION ALL
SELECT user2_id FROM friend WHERE user1_id=42;
assuming you create the (user2_id,user1_id) index, this should get you
an Append of two index-only scans. We can use UNION ALL here rather than
UNION because the table constraints ensure there are no duplicates.
Thanks! That's a more elegant solution for my query than what I had in mind!
On Sat, 12 Oct 2019 at 17:46, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "MichaelDBA" == MichaelDBA <MichaelDBA@sqlexec.com> writes:
MichaelDBA> Nope, vacuumed it and still got the bitmap index scans.
Let's see your explains. Here's mine:
# set enable_seqscan=false; -- because I only have a few rows
SET
# insert into friend values (1,2),(2,5);
INSERT 0 2
# vacuum analyze friend;
VACUUM
# explain analyze SELECT user1_id FROM friend WHERE user2_id=2 UNION ALL select user2_id FROM friend WHERE user1_id=2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.13..8.32 rows=2 width=4) (actual time=0.009..0.014 rows=2 loops=1)
-> Index Only Scan using friend_user2_id_user1_id_idx on friend (cost=0.13..4.15 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
Index Cond: (user2_id = 2)
Heap Fetches: 0
-> Index Only Scan using friend_pkey on friend friend_1 (cost=0.13..4.15 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=1)
Index Cond: (user1_id = 2)
Heap Fetches: 0
Planning Time: 0.271 ms
Execution Time: 0.045 ms
(9 rows)
Note that you have to put some actual rows in the table; if it is
completely empty, you'll not get a representative result.
Confirming what's been said - the whole thing works fine on 10. I can't get index only scans on 9.6, but that's a dev machine anyway.
Now if only hash indexes supported multiple column, that'd probably result in all my data being returned from a single read of a hash bucket, but that's going into microoptimisation territory :)
Thanks!