Thread: SQL Query Performance - what gives?
Let's take the following EXPLAIN results: ticker=# explain select * from post, forum where forum.name = post.forum and invisible <> 1 and to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc limit 100; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Limit (cost=5951.85..5952.10 rows=100 width=706) -> Sort (cost=5951.85..5955.37 rows=1408 width=706) Sort Key: post.modified -> Hash Join (cost=613.80..5898.04 rows=1408 width=706) Hash Cond: (post.forum = forum.name) -> Bitmap Heap Scan on post (cost=370.93..5635.71 rows=1435 width=435) Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text)) Filter: (invisible <> 1) -> Bitmap Index Scan on idx_message (cost=0.00..370.57 rows=1435 width=0) Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text)) -> Hash (cost=242.07..242.07 rows=64 width=271) -> Index Scan using forum_name on forum (cost=0.00..242.07 rows=64 width=271) (12 rows) ticker=# And ticker=# explain select * from post, forum where forum.name = post.forum and invisible <> 1 and ((permission & '127') = permission) and (contrib is null or contrib = ' ' or contrib like '%b%') and to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc limit 100; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=1329.81..1329.87 rows=22 width=706) -> Sort (cost=1329.81..1329.87 rows=22 width=706) Sort Key: post.modified -> Nested Loop (cost=978.96..1329.32 rows=22 width=706) -> Index Scan using forum_name on forum (cost=0.00..242.71 rows=1 width=271) Filter: (((contrib IS NULL) OR (contrib = ' '::text) OR (contrib ~~ '%b%'::text)) AND ((permission & 127) = permission)) -> Bitmap Heap Scan on post (cost=978.96..1086.28 rows=27 width=435) Recheck Cond: ((to_tsvector('english'::text, post.message) @@ to_tsquery('violence'::text)) AND (post.forum = forum.name)) Filter: (post.invisible <> 1) -> BitmapAnd (cost=978.96..978.96 rows=27 width=0) -> Bitmap Index Scan on idx_message (cost=0.00..370.57 rows=1435 width=0) Index Cond: (to_tsvector('english'::text, post.message) @@ to_tsquery('violence'::text)) -> Bitmap Index Scan on post_forum (cost=0.00..607.78 rows=26575 width=0) Index Cond: (post.forum = forum.name) (14 rows) ticker=# The difference in these two queries is that the second qualifies the returned search to check two permission blocks - one related to the user's permission bit mask, and the second a mask of single-character "flags" (the user's classification must be in the list of permitted classifications) Ok. Notice that the top-line cost of the first query is HIGHER. The first query runs almost instantly - average execution latency is frequently in the few-hundred millisecond range. The second query can take upward of 30 seconds (!) to run. Neither hits the disk, the machine in question has scads of free RAM available, and while busy is not particularly constrained. Other simultaneous users on the database are getting queries back immediately (no unreasonable delays). If I remove parts of the permission tests it does not matter. If ANY of those tests qualifies the returned values the performance goes in the toilet. If I re-order when the permission tests appear (e.g. at the end of the search command) it makes no difference in the response time either (it does, however, change the EXPLAIN output somewhat, and thereby appears to change the query plan. What's going on here? I can usually figure out what's causing bad performance and fix it with the judicious addition of an index or other similar thing - this one has me completely mystified. -- Karl
Attachment
Karl Denninger <karl@denninger.net> wrote: > Let's take the following EXPLAIN results: We could tell a lot more from EXPLAIN ANALYZE results. The table definitions (with index information) would help, too. -Kevin
First query:
ticker=# explain analyze select * from post, forum where forum.name = post.forum and invisible <> 1 and to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5959.78..5960.03 rows=100 width=706) (actual time=49.847..50.264 rows=100 loops=1)
-> Sort (cost=5959.78..5963.30 rows=1408 width=706) (actual time=49.843..49.982 rows=100 loops=1)
Sort Key: post.modified
Sort Method: top-N heapsort Memory: 168kB
-> Hash Join (cost=621.72..5905.96 rows=1408 width=706) (actual time=4.050..41.238 rows=2055 loops=1)
Hash Cond: (post.forum = forum.name)
-> Bitmap Heap Scan on post (cost=370.93..5635.71 rows=1435 width=435) (actual time=3.409..32.648 rows=2055 loops=1)
Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
Filter: (invisible <> 1)
-> Bitmap Index Scan on idx_message (cost=0.00..370.57 rows=1435 width=0) (actual time=2.984..2.984 rows=2085 loops=1)
Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
-> Hash (cost=249.97..249.97 rows=66 width=271) (actual time=0.596..0.596 rows=64 loops=1)
-> Index Scan using forum_name on forum (cost=0.00..249.97 rows=66 width=271) (actual time=0.093..0.441 rows=64 loops=1)
Total runtime: 50.625 ms
(14 rows)
ticker=#
Second query:
ticker=# explain analyze select * from post, forum where forum.name = post.forum and invisible <> 1 and ((permission & '127') = permission) and (contrib is null or contrib = ' ' or contrib like '%b%') and to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc limit 100;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1337.71..1337.76 rows=21 width=706) (actual time=31121.317..31121.736 rows=100 loops=1)
-> Sort (cost=1337.71..1337.76 rows=21 width=706) (actual time=31121.313..31121.452 rows=100 loops=1)
Sort Key: post.modified
Sort Method: top-N heapsort Memory: 168kB
-> Nested Loop (cost=978.97..1337.25 rows=21 width=706) (actual time=2.841..31108.926 rows=2055 loops=1)
-> Index Scan using forum_name on forum (cost=0.00..250.63 rows=1 width=271) (actual time=0.013..0.408 rows=63 loops=1)
Filter: (((contrib IS NULL) OR (contrib = ' '::text) OR (contrib ~~ '%b%'::text)) AND ((permission & 127) = permission))
-> Bitmap Heap Scan on post (cost=978.97..1086.28 rows=27 width=435) (actual time=109.832..493.648 rows=33 loops=63)
Recheck Cond: ((to_tsvector('english'::text, post.message) @@ to_tsquery('violence'::text)) AND (post.forum = forum.name))
Filter: (post.invisible <> 1)
-> BitmapAnd (cost=978.97..978.97 rows=27 width=0) (actual time=98.832..98.832 rows=0 loops=63)
-> Bitmap Index Scan on idx_message (cost=0.00..370.57 rows=1435 width=0) (actual time=0.682..0.682 rows=2085 loops=63)
Index Cond: (to_tsvector('english'::text, post.message) @@ to_tsquery('violence'::text))
-> Bitmap Index Scan on post_forum (cost=0.00..607.78 rows=26575 width=0) (actual time=97.625..97.625 rows=22616 loops=63)
Index Cond: (post.forum = forum.name)
Total runtime: 31122.781 ms
(16 rows)
ticker=#
ticker=# \d post
Table "public.post"
Column | Type | Modifiers
-----------+--------------------------+--------------------------------------------------------
forum | text |
number | integer |
toppost | integer |
views | integer | default 0
login | text |
subject | text |
message | text |
inserted | timestamp with time zone |
modified | timestamp with time zone |
replied | timestamp with time zone |
who | text |
reason | text |
ordinal | integer | not null default nextval('post_ordinal_seq'::regclass)
replies | integer | default 0
invisible | integer |
sticky | integer |
ip | inet |
lock | integer | default 0
pinned | integer | default 0
marked | boolean |
Indexes:
"post_pkey" PRIMARY KEY, btree (ordinal)
"idx_message" gin (to_tsvector('english'::text, message))
"idx_subject" gin (to_tsvector('english'::text, subject))
"post_forum" btree (forum)
"post_getlastpost" btree (forum, modified)
"post_inserted" btree (inserted)
"post_login" btree (login)
"post_modified" btree (modified)
"post_number" btree (number)
"post_order" btree (number, inserted)
"post_ordinal" btree (ordinal)
"post_top" btree (toppost)
"post_toppost" btree (forum, toppost, inserted)
Foreign-key constraints:
"forum_fk" FOREIGN KEY (forum) REFERENCES forum(name) ON UPDATE CASCADE ON DELETE CASCADE
"login_fk" FOREIGN KEY (login) REFERENCES usertable(login) ON UPDATE CASCADE ON DELETE CASCADE
Triggers:
_tickerforum_logtrigger AFTER INSERT OR DELETE OR UPDATE ON post FOR EACH ROW EXECUTE PROCEDURE _tickerforum.logtrigger('_tickerforum', '20', 'vvvvvvvvvvvvk')
Disabled triggers:
_tickerforum_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON post FOR EACH ROW EXECUTE PROCEDURE _tickerforum.denyaccess('_tickerforum')
ticker=# \d forum
Table "public.forum"
Column | Type | Modifiers
-------------+--------------------------+-----------
name | text | not null
description | text |
long_desc | text |
forum_type | integer |
forum_order | integer |
lastpost | timestamp with time zone |
lastperson | text |
permission | integer | default 0
modtime | integer |
numposts | integer | default 0
type | integer | default 0
readonly | integer | default 0
moderated | integer | default 0
flags | integer |
rsslength | text |
contrib | text |
autolock | text |
autodest | text |
open | text |
Indexes:
"forum_pkey" PRIMARY KEY, btree (name)
"forum_name" UNIQUE, btree (name)
"forum_order" UNIQUE, btree (forum_order)
Triggers:
_tickerforum_logtrigger AFTER INSERT OR DELETE OR UPDATE ON forum FOR EACH ROW EXECUTE PROCEDURE _tickerforum.logtrigger('_tickerforum', '7', 'k')
Disabled triggers:
_tickerforum_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON forum FOR EACH ROW EXECUTE PROCEDURE _tickerforum.denyaccess('_tickerforum')
(The triggers exist due to replication via Slony)
Kevin Grittner wrote:
ticker=# explain analyze select * from post, forum where forum.name = post.forum and invisible <> 1 and to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5959.78..5960.03 rows=100 width=706) (actual time=49.847..50.264 rows=100 loops=1)
-> Sort (cost=5959.78..5963.30 rows=1408 width=706) (actual time=49.843..49.982 rows=100 loops=1)
Sort Key: post.modified
Sort Method: top-N heapsort Memory: 168kB
-> Hash Join (cost=621.72..5905.96 rows=1408 width=706) (actual time=4.050..41.238 rows=2055 loops=1)
Hash Cond: (post.forum = forum.name)
-> Bitmap Heap Scan on post (cost=370.93..5635.71 rows=1435 width=435) (actual time=3.409..32.648 rows=2055 loops=1)
Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
Filter: (invisible <> 1)
-> Bitmap Index Scan on idx_message (cost=0.00..370.57 rows=1435 width=0) (actual time=2.984..2.984 rows=2085 loops=1)
Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
-> Hash (cost=249.97..249.97 rows=66 width=271) (actual time=0.596..0.596 rows=64 loops=1)
-> Index Scan using forum_name on forum (cost=0.00..249.97 rows=66 width=271) (actual time=0.093..0.441 rows=64 loops=1)
Total runtime: 50.625 ms
(14 rows)
ticker=#
Second query:
ticker=# explain analyze select * from post, forum where forum.name = post.forum and invisible <> 1 and ((permission & '127') = permission) and (contrib is null or contrib = ' ' or contrib like '%b%') and to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc limit 100;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1337.71..1337.76 rows=21 width=706) (actual time=31121.317..31121.736 rows=100 loops=1)
-> Sort (cost=1337.71..1337.76 rows=21 width=706) (actual time=31121.313..31121.452 rows=100 loops=1)
Sort Key: post.modified
Sort Method: top-N heapsort Memory: 168kB
-> Nested Loop (cost=978.97..1337.25 rows=21 width=706) (actual time=2.841..31108.926 rows=2055 loops=1)
-> Index Scan using forum_name on forum (cost=0.00..250.63 rows=1 width=271) (actual time=0.013..0.408 rows=63 loops=1)
Filter: (((contrib IS NULL) OR (contrib = ' '::text) OR (contrib ~~ '%b%'::text)) AND ((permission & 127) = permission))
-> Bitmap Heap Scan on post (cost=978.97..1086.28 rows=27 width=435) (actual time=109.832..493.648 rows=33 loops=63)
Recheck Cond: ((to_tsvector('english'::text, post.message) @@ to_tsquery('violence'::text)) AND (post.forum = forum.name))
Filter: (post.invisible <> 1)
-> BitmapAnd (cost=978.97..978.97 rows=27 width=0) (actual time=98.832..98.832 rows=0 loops=63)
-> Bitmap Index Scan on idx_message (cost=0.00..370.57 rows=1435 width=0) (actual time=0.682..0.682 rows=2085 loops=63)
Index Cond: (to_tsvector('english'::text, post.message) @@ to_tsquery('violence'::text))
-> Bitmap Index Scan on post_forum (cost=0.00..607.78 rows=26575 width=0) (actual time=97.625..97.625 rows=22616 loops=63)
Index Cond: (post.forum = forum.name)
Total runtime: 31122.781 ms
(16 rows)
ticker=#
ticker=# \d post
Table "public.post"
Column | Type | Modifiers
-----------+--------------------------+--------------------------------------------------------
forum | text |
number | integer |
toppost | integer |
views | integer | default 0
login | text |
subject | text |
message | text |
inserted | timestamp with time zone |
modified | timestamp with time zone |
replied | timestamp with time zone |
who | text |
reason | text |
ordinal | integer | not null default nextval('post_ordinal_seq'::regclass)
replies | integer | default 0
invisible | integer |
sticky | integer |
ip | inet |
lock | integer | default 0
pinned | integer | default 0
marked | boolean |
Indexes:
"post_pkey" PRIMARY KEY, btree (ordinal)
"idx_message" gin (to_tsvector('english'::text, message))
"idx_subject" gin (to_tsvector('english'::text, subject))
"post_forum" btree (forum)
"post_getlastpost" btree (forum, modified)
"post_inserted" btree (inserted)
"post_login" btree (login)
"post_modified" btree (modified)
"post_number" btree (number)
"post_order" btree (number, inserted)
"post_ordinal" btree (ordinal)
"post_top" btree (toppost)
"post_toppost" btree (forum, toppost, inserted)
Foreign-key constraints:
"forum_fk" FOREIGN KEY (forum) REFERENCES forum(name) ON UPDATE CASCADE ON DELETE CASCADE
"login_fk" FOREIGN KEY (login) REFERENCES usertable(login) ON UPDATE CASCADE ON DELETE CASCADE
Triggers:
_tickerforum_logtrigger AFTER INSERT OR DELETE OR UPDATE ON post FOR EACH ROW EXECUTE PROCEDURE _tickerforum.logtrigger('_tickerforum', '20', 'vvvvvvvvvvvvk')
Disabled triggers:
_tickerforum_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON post FOR EACH ROW EXECUTE PROCEDURE _tickerforum.denyaccess('_tickerforum')
ticker=# \d forum
Table "public.forum"
Column | Type | Modifiers
-------------+--------------------------+-----------
name | text | not null
description | text |
long_desc | text |
forum_type | integer |
forum_order | integer |
lastpost | timestamp with time zone |
lastperson | text |
permission | integer | default 0
modtime | integer |
numposts | integer | default 0
type | integer | default 0
readonly | integer | default 0
moderated | integer | default 0
flags | integer |
rsslength | text |
contrib | text |
autolock | text |
autodest | text |
open | text |
Indexes:
"forum_pkey" PRIMARY KEY, btree (name)
"forum_name" UNIQUE, btree (name)
"forum_order" UNIQUE, btree (forum_order)
Triggers:
_tickerforum_logtrigger AFTER INSERT OR DELETE OR UPDATE ON forum FOR EACH ROW EXECUTE PROCEDURE _tickerforum.logtrigger('_tickerforum', '7', 'k')
Disabled triggers:
_tickerforum_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON forum FOR EACH ROW EXECUTE PROCEDURE _tickerforum.denyaccess('_tickerforum')
(The triggers exist due to replication via Slony)
Kevin Grittner wrote:
Karl Denninger <karl@denninger.net> wrote:Let's take the following EXPLAIN results:We could tell a lot more from EXPLAIN ANALYZE results. The table definitions (with index information) would help, too. -Kevin
Attachment
Karl Denninger <karl@denninger.net> wrote: > -> Index Scan using forum_name on forum > (cost=0.00..250.63 rows=1 width=271) (actual time=0.013..0.408 > rows=63 loops=1) > Filter: (((contrib IS NULL) OR (contrib = ' > '::text) OR (contrib ~~ '%b%'::text)) AND ((permission & 127) = > permission)) The biggest issue, as far as I can see, is that it thinks that the selection criteria on forum will limit to one row, while it really matches 63 rows. You might be able to coerce it into a faster plan with something like this (untested): select * from (select * from post where invisible <> 1 and to_tsvector('english', message) @@ to_tsquery('violence') ) p, forum where forum.name = p.forum and (permission & '127') = permission and (contrib is null or contrib = ' ' or contrib like '%b%') order by modified desc limit 100 ; -Kevin
Kevin Grittner wrote:
That didn't help.
The FTS alone returns 2,000 records on that table, and does so VERY quickly:
ticker=# explain analyze select count(ordinal) from post, forum where post.forum=forum.name and invisible <> 1
and to_tsvector('english', message)
@@ to_tsquery('violence');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=5901.57..5901.58 rows=1 width=4) (actual time=17.492..17.494 rows=1 loops=1)
-> Hash Join (cost=613.80..5898.04 rows=1408 width=4) (actual time=1.436..14.620 rows=2056 loops=1)
Hash Cond: (post.forum = forum.name)
-> Bitmap Heap Scan on post (cost=370.93..5635.71 rows=1435 width=14) (actual time=1.123..7.944 rows=2056 loops=1)
Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
Filter: (invisible <> 1)
-> Bitmap Index Scan on idx_message (cost=0.00..370.57 rows=1435 width=0) (actual time=0.738..0.738 rows=2099 loops=1)
Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
-> Hash (cost=242.07..242.07 rows=64 width=9) (actual time=0.300..0.300 rows=64 loops=1)
-> Index Scan using forum_name on forum (cost=0.00..242.07 rows=64 width=9) (actual time=0.011..0.182 rows=64 loops=1)
Total runtime: 17.559 ms
(11 rows)
ticker=#
Ok, but now when we check the permission mask....
ticker=# explain analyze select count(ordinal) from post, forum where post.forum=forum.name and invisible <> 1
and to_tsvector('english', message)
@@ to_tsquery('violence') and (permission & 4 = permission);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1329.07..1329.08 rows=1 width=4) (actual time=29819.293..29819.295 rows=1 loops=1)
-> Nested Loop (cost=978.97..1329.01 rows=22 width=4) (actual time=2.575..29815.530 rows=2056 loops=1)
-> Index Scan using forum_name on forum (cost=0.00..242.39 rows=1 width=13) (actual time=0.016..0.355 rows=62 loops=1)
Filter: ((permission & 4) = permission)
-> Bitmap Heap Scan on post (cost=978.97..1086.28 rows=27 width=14) (actual time=97.997..480.746 rows=33 loops=62)
Recheck Cond: ((to_tsvector('english'::text, post.message) @@ to_tsquery('violence'::text)) AND (post.forum = forum.name))
Filter: (post.invisible <> 1)
-> BitmapAnd (cost=978.97..978.97 rows=27 width=0) (actual time=91.106..91.106 rows=0 loops=62)
-> Bitmap Index Scan on idx_message (cost=0.00..370.57 rows=1435 width=0) (actual time=0.680..0.680 rows=2099 loops=62)
Index Cond: (to_tsvector('english'::text, post.message) @@ to_tsquery('violence'::text))
-> Bitmap Index Scan on post_forum (cost=0.00..607.78 rows=26575 width=0) (actual time=89.927..89.927 rows=22980 loops=62)
Index Cond: (post.forum = forum.name)
Total runtime: 29819.376 ms
(13 rows)
ticker=#
The problem appearsa to lie in the "nested loop", and I don't understand why that's happening. Isn't a **LINEAR** check on each returned value (since we do the aggregate first?) sufficient? Why is the query planner creating a nested loop - the aggregate contains the tested field and it is not subject to change once aggregated?!
Karl Denninger <karl@denninger.net> wrote:-> Index Scan using forum_name on forum (cost=0.00..250.63 rows=1 width=271) (actual time=0.013..0.408 rows=63 loops=1) Filter: (((contrib IS NULL) OR (contrib = ' '::text) OR (contrib ~~ '%b%'::text)) AND ((permission & 127) = permission))The biggest issue, as far as I can see, is that it thinks that the selection criteria on forum will limit to one row, while it really matches 63 rows. You might be able to coerce it into a faster plan with something like this (untested): select * from (select * from post where invisible <> 1 and to_tsvector('english', message) @@ to_tsquery('violence') ) p, forum where forum.name = p.forum and (permission & '127') = permission and (contrib is null or contrib = ' ' or contrib like '%b%') order by modified desc limit 100 ; -Kevin
That didn't help.
The FTS alone returns 2,000 records on that table, and does so VERY quickly:
ticker=# explain analyze select count(ordinal) from post, forum where post.forum=forum.name and invisible <> 1
and to_tsvector('english', message)
@@ to_tsquery('violence');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=5901.57..5901.58 rows=1 width=4) (actual time=17.492..17.494 rows=1 loops=1)
-> Hash Join (cost=613.80..5898.04 rows=1408 width=4) (actual time=1.436..14.620 rows=2056 loops=1)
Hash Cond: (post.forum = forum.name)
-> Bitmap Heap Scan on post (cost=370.93..5635.71 rows=1435 width=14) (actual time=1.123..7.944 rows=2056 loops=1)
Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
Filter: (invisible <> 1)
-> Bitmap Index Scan on idx_message (cost=0.00..370.57 rows=1435 width=0) (actual time=0.738..0.738 rows=2099 loops=1)
Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
-> Hash (cost=242.07..242.07 rows=64 width=9) (actual time=0.300..0.300 rows=64 loops=1)
-> Index Scan using forum_name on forum (cost=0.00..242.07 rows=64 width=9) (actual time=0.011..0.182 rows=64 loops=1)
Total runtime: 17.559 ms
(11 rows)
ticker=#
Ok, but now when we check the permission mask....
ticker=# explain analyze select count(ordinal) from post, forum where post.forum=forum.name and invisible <> 1
and to_tsvector('english', message)
@@ to_tsquery('violence') and (permission & 4 = permission);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1329.07..1329.08 rows=1 width=4) (actual time=29819.293..29819.295 rows=1 loops=1)
-> Nested Loop (cost=978.97..1329.01 rows=22 width=4) (actual time=2.575..29815.530 rows=2056 loops=1)
-> Index Scan using forum_name on forum (cost=0.00..242.39 rows=1 width=13) (actual time=0.016..0.355 rows=62 loops=1)
Filter: ((permission & 4) = permission)
-> Bitmap Heap Scan on post (cost=978.97..1086.28 rows=27 width=14) (actual time=97.997..480.746 rows=33 loops=62)
Recheck Cond: ((to_tsvector('english'::text, post.message) @@ to_tsquery('violence'::text)) AND (post.forum = forum.name))
Filter: (post.invisible <> 1)
-> BitmapAnd (cost=978.97..978.97 rows=27 width=0) (actual time=91.106..91.106 rows=0 loops=62)
-> Bitmap Index Scan on idx_message (cost=0.00..370.57 rows=1435 width=0) (actual time=0.680..0.680 rows=2099 loops=62)
Index Cond: (to_tsvector('english'::text, post.message) @@ to_tsquery('violence'::text))
-> Bitmap Index Scan on post_forum (cost=0.00..607.78 rows=26575 width=0) (actual time=89.927..89.927 rows=22980 loops=62)
Index Cond: (post.forum = forum.name)
Total runtime: 29819.376 ms
(13 rows)
ticker=#
The problem appearsa to lie in the "nested loop", and I don't understand why that's happening. Isn't a **LINEAR** check on each returned value (since we do the aggregate first?) sufficient? Why is the query planner creating a nested loop - the aggregate contains the tested field and it is not subject to change once aggregated?!
Attachment
Karl Denninger <karl@denninger.net> writes: > The problem appearsa to lie in the "nested loop", and I don't understand > why that's happening. It looks to me like there are several issues here. One is the drastic underestimate of the number of rows satisfying the permission condition. That leads the planner to think that a nestloop join with the other table will be fast, which is only right if there are just one or a few rows coming out of "forum". With sixty-some rows you get sixty-some repetitions of the scan of the other table, which loses. Problem number two is the overeager use of a BitmapAnd to add on another index that isn't really very selective. That might be a correct decision but it looks fishy here. We rewrote choose_bitmap_and a couple of times to try to fix that problem ... what PG version is this exactly? The third thing that looks fishy is that it's using unqualified index scans for no apparent reason. Have you got enable_seqscan turned off, and if so what happens when you fix that? What other nondefault planner settings are you using? But anyway, the big problem seems to be poor selectivity estimates for conditions like "(permission & 127) = permission". I have bad news for you: there is simply no way in the world that Postgres is not going to suck at estimating that, because the planner has no knowledge whatsoever of the behavior of "&". You could consider writing and submitting a patch that would teach it something about that, but in the near term it would be a lot easier to reconsider your representation of permissions. You'd be likely to get significantly better results, not to mention have more-readable queries, if you stored them as a group of simple boolean columns. regards, tom lane
Tom Lane wrote: > Karl Denninger <karl@denninger.net> writes: > >> The problem appearsa to lie in the "nested loop", and I don't understand >> why that's happening. >> > It looks to me like there are several issues here. > > One is the drastic underestimate of the number of rows satisfying the > permission condition. That leads the planner to think that a nestloop > join with the other table will be fast, which is only right if there are > just one or a few rows coming out of "forum". With sixty-some rows you > get sixty-some repetitions of the scan of the other table, which loses. > "Loses" isn't quite the right word... :) > Problem number two is the overeager use of a BitmapAnd to add on another > index that isn't really very selective. That might be a correct > decision but it looks fishy here. We rewrote choose_bitmap_and a couple > of times to try to fix that problem ... what PG version is this exactly? > $ psql ticker Welcome to psql 8.3.6, the PostgreSQL interactive terminal. > The third thing that looks fishy is that it's using unqualified index > scans for no apparent reason. Have you got enable_seqscan turned off, > and if so what happens when you fix that? What other nondefault planner > settings are you using? > None; here is the relevant section of the postgresql.conf file: # - Planner Method Configuration - #enable_bitmapscan = on #enable_hashagg = on #enable_hashjoin = on #enable_indexscan = on #enable_mergejoin = on #enable_nestloop = on #enable_seqscan = on #enable_sort = on #enable_tidscan = on # - Planner Cost Constants - #seq_page_cost = 1.0 # measured on an arbitrary scale #random_page_cost = 4.0 # same scale as above #cpu_tuple_cost = 0.01 # same scale as above #cpu_index_tuple_cost = 0.005 # same scale as above #cpu_operator_cost = 0.0025 # same scale as above #effective_cache_size = 128MB # - Genetic Query Optimizer - #geqo = on #geqo_threshold = 12 #geqo_effort = 5 # range 1-10 #geqo_pool_size = 0 # selects default based on effort #geqo_generations = 0 # selects default based on effort #geqo_selection_bias = 2.0 # range 1.5-2.0 # - Other Planner Options - default_statistics_target = 100 # range 1-1000 #constraint_exclusion = off #from_collapse_limit = 8 #join_collapse_limit = 8 # 1 disables collapsing of explicit # JOIN clauses All commented out - nothing set to non-defaults, other than the default statistics target. > But anyway, the big problem seems to be poor selectivity estimates for > conditions like "(permission & 127) = permission". I have bad news for > you: there is simply no way in the world that Postgres is not going to > suck at estimating that, because the planner has no knowledge whatsoever > of the behavior of "&". You could consider writing and submitting a > patch that would teach it something about that, but in the near term > it would be a lot easier to reconsider your representation of > permissions. You'd be likely to get significantly better results, > not to mention have more-readable queries, if you stored them as a group > of simple boolean columns. > > regards, tom lane > Ugh. The issue here is that the permission structure is quite extensible by the users of the code; there are defined bits (Bit 4, for example, means that the user is an "ordinary user" and has a login account) but the upper bits are entirely administrator-defined and may vary from one installation to another (and do) The bitmask allows the setting of multiple permissions but the table definition doesn't have to change (well, so long as the bits fit into a word!) Finally, this is a message forum - the actual code itself is template-driven and the bitmask permission structure is ALL OVER the templates; getting that out of there would be a really nasty rewrite, not to mention breaking the user (non-developer, but owner) extensibility of the current structure. Is there a way to TELL the planner how to deal with this, even if it makes the SQL non-portable or is a hack on the source mandatory? For the particular instance where this came up it won't be murderous to omit the bitmask check from the query, as there are no "owner/moderator only" sub-forums (the one place where not checking that would bite HARD as it would allow searches of "hidden" content by ordinary users.) However, there are other installations where this will be a bigger deal; I can in the immediate term put that query into the config file (instead of hard-coding it) so for people who can't live with the performance they can make the tradeoff decision. -- Karl
Attachment
> The bitmask allows the setting of multiple permissions but the table > definition doesn't have to change (well, so long as the bits fit into a > word!) Finally, this is a message forum - the actual code itself is > template-driven and the bitmask permission structure is ALL OVER the > templates; getting that out of there would be a really nasty rewrite, > not to mention breaking the user (non-developer, but owner) > extensibility of the current structure. > > Is there a way to TELL the planner how to deal with this, even if it > makes the SQL non-portable or is a hack on the source mandatory? You could use an integer array instead of a bit mask, make a gist index on it, and instead of doing "mask & xxx" do "array contains xxx", which is indexable with gist. The idea is that it can get much better row estimation. Instead of 1,2,3, you can use 1,2,4,8, etc if you like. you'd probably need a function to convert a bitmask into ints and another to do the conversion back, so the rest of your app gets the expected bitmasks. Or add a bitmask type to postgres with ptoper statistics...
Karl Denninger wrote: > The bitmask allows the setting of multiple permissions but the table > definition doesn't have to change (well, so long as the bits fit into a > word!) Finally, this is a message forum - the actual code itself is > template-driven and the bitmask permission structure is ALL OVER the > templates; getting that out of there would be a really nasty rewrite, > not to mention breaking the user (non-developer, but owner) > extensibility of the current structure. > > Is there a way to TELL the planner how to deal with this, even if it > makes the SQL non-portable or is a hack on the source mandatory? You could maybe create function indexes for common bitmap operations; for example if it's common to check a single bit you could create 32 indexes, on (field & 1), (field & 2), (field & 4), etc. You could also maybe extend this so if you need to query multiple bits you decompose them into individual single-bit queries, e.g. instead of (field & 3) you do ((field & 1) and (field & 2)). I suppose there will be a break-even point in complexity before which the above approach will be very slow but after it it should scale better then the alternative.