Thread: Weird query plans for my queries, causing terrible performance.
To create a list of recent topics in a forum I use these queries in mysql: SELECT TopicID, StatusID, UserID FROM F_Topics WHERE ForumID = 15 AND Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false ORDER BY Lastmessage DESC LIMIT 1000 and SELECT TopicID, StatusID, UserID FROM F_Topics WHERE StatusID IN(1) AND Deleted = false AND ForumID = 15 Where in the client-code some simple checks are done on the queryresults (for the rights a user has), which would make the queries quite slow on mysql. And after that it simply selects the needed info from the database using a in list: SELECT F_Topics.Name, F_Topics.TopicID, etc FROM F_Topics LEFT JOIN F_Users AS L_Users ON (L_Users.UserID = F_Topics.LastposterID) , F_Users WHERE TopicID IN (list of topicids gathered from the above queries) AND F_Users.UserID = F_Topics.UserID AND !F_Topics.Deleted AND F_Topics.ForumID=15 Together it takes around 60ms to finish in mysql. That isn't the nicest query setup. So I tried to create a single query for this in postgresql. The F_Topics table has some fields, including TopicID (primary key), UserID (foreign key to F_Users.UserID), LastPosterID (nullable foreign to F_Users.UserID), LastMessage (timestamp), ForumID, StatusID (being a integer with just 3 different values) and deleted (boolean) There are three indices: f_topics_pkey (on the topicid) f_topics_forum_lastmessage_deleted (forumid, lastmessage, deleted) f_topics_forum_status_deleted (forumid, statusid, deleted) and the only relevant index on the F_Users table is its primary key on UserID. The query: SELECT F_Topics.TopicID,F_Topics.StatusID, UserID FROM F_Topics WHERE (StatusID IN(1) OR F_Topics.Lastmessage > '2002-08-01 23:27:03+02') AND F_Topics.Deleted = false AND ForumID = 15 Results in the plan: Seq Scan on f_topics (cost=0.00..20347.58 rows=59 width=12) Filter: (((statusid = 1) OR (lastmessage > '2002-08-01 23:27:03'::timestamp without time zone)) AND (deleted = false) AND (forumid = 15)) Which would take around 2 seconds to complete. While the query: SELECT TopicID, StatusID, UserID FROM F_Topics WHERE (StatusID IN(1) AND Deleted = false AND ForumID = 15) OR (Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false AND ForumID = 15) Results in the plan: Index Scan using f_topics_forum_status_deleted, f_topics_forum_lastmessage_deleted on f_topics (cost=0.00..241.87 rows=60 width=12) Index Cond: (((forumid = 15) AND (statusid = 1) AND (deleted = false)) OR ((forumid = 15) AND (lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) AND (deleted = false))) Filter: (((statusid = 1) AND (deleted = false) AND (forumid = 15)) OR ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) AND (deleted = false) AND (forumid = 15))) (3 rows) Executing in around 9ms and exactly what I need. Of coarse the query needs to be joined with the F_Users table. My tries were: SELECT TopicID, StatusID as StatusID, F_Users.UserID FROM F_Topics, F_Users WHERE (StatusID IN(1) AND Deleted = false AND ForumID = 15 AND F_Topics.UserID = F_Users.UserID) OR (Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false AND ForumID = 15 AND F_Topics.UserID = F_Users.UserID) and SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID FROM F_Topics, F_Users WHERE ( (StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15) OR (F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted = false AND ForumID = 15)) AND F_Topics.UserID = F_Users.UserID (and a inner join'ed version of the above) All three took 500-600ms to complete and changed the query plan to something similar to: Nested Loop (cost=0.00..23788.87 rows=1 width=16) -> Index Scan using f_topics_forum_status_deleted on f_topics (cost=0.00..23785.84 rows=1 width=12) Index Cond: (forumid = 15) Filter: (((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) OR (statusid = 1)) AND ((deleted = false) OR (statusid = 1)) AND ((forumid = 15) OR (statusid = 1)) AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) OR (deleted = false)) AND (deleted = false) AND ((forumid = 15) OR (deleted = false)) AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) OR (forumid = 15)) AND ((deleted = false) OR (forumid = 15))) -> Index Scan using f_users_pkey on f_users (cost=0.00..3.01 rows=1 width=4) Index Cond: ("outer".userid = f_users.userid) (6 rows) I.e. It creates a much harder query plan to execute and terribly slows down what should be much simpler and much faster. By the way, I did do 'vacuum full analyze' before gathering these plans. The result of the above queries is 134 rows from 549679 topicrows in total. 3 rows being selected due to the status = 1 and 131 due to the lastmessage time restriction. In total there are 15943 topics in the 15th forum, in total 102 having a statusid = 1, 5321 meeting the last message requirement and 15 topics have the deleted boolean set to true. I hope someone can shine a bright light on this, since I'm out of clues. I also tries using subselects for the separate parts either union'ed together or OR'ed together. Resulting in even worse plans (with seq-scans for the f_users.userid for instance, or the f_topics.topicid) of over 60 seconds execution time. Best regards, Arjen van der Meijden
Arjen van der Meijden <acm@tweakers.net> writes: > Of coarse the query needs to be joined with the F_Users table. > My tries were: > ... > SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID > FROM F_Topics, F_Users > WHERE ( > (StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15) > OR > (F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted = > false AND ForumID = 15)) > AND F_Topics.UserID = F_Users.UserID > (and a inner join'ed version of the above) You sure you tried the inner-join case? I did explain SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID FROM F_Topics JOIN F_Users USING (userid) WHERE (StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15) OR (F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted = false AND ForumID = 15) and got the plan you wanted: Nested Loop (cost=0.00..14.51 rows=1 width=16) -> Index Scan using f_topics_forum_status_deleted, f_topics_forum_lastmessage_deleted on f_topics (cost=0.00..9.67 rows=1width=12) Index Cond: (((forumid = 15) AND (statusid = 1) AND (deleted = false)) OR ((forumid = 15) AND (lastmessage > '2002-08-0117:27:03-04'::timestamp with time zone) AND (deleted = false))) Filter: (((statusid = 1) AND (deleted = false) AND (forumid = 15)) OR ((lastmessage > '2002-08-01 17:27:03-04'::timestampwith time zone) AND (deleted = false) AND (forumid = 15))) -> Index Scan using f_users_pkey on f_users (cost=0.00..4.82 rows=1 width=4) Index Cond: ("outer".userid = f_users.userid) Of course the cost estimates are bogus because I have no data in the test tables, but the thing is capable of producing the plan you want. I believe the reason the query you give above doesn't work like you want is that the planner first converts the whole WHERE clause to CNF form (canonical AND-of-OR layout), and then is unable to separate out the join clause from the spaghetti-like restriction clause. The CNF-conversion heuristic is usually a good one, but not in this case. Writing the join clause as a JOIN clause keeps it separate from WHERE, preventing this mistake from being made. Then the WHERE clause is already in DNF (canonical OR-of-ANDs) form, which the planner also likes, so it doesn't rearrange it. It might be that we could improve the qual-rearrangement heuristics by trying to keep join clauses separate from single-relation restrictions. Anyone care to take a look at it? The gold is all hidden in src/backend/optimizer/prep/prepqual.c ... regards, tom lane
> Van: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Verzonden: vrijdag 31 januari 2003 2:35 > > You sure you tried the inner-join case? I did > > explain SELECT F_Topics.TopicID,F_Topics.StatusID as > StatusID, F_Users.UserID FROM F_Topics JOIN F_Users USING > (userid) WHERE (StatusID IN(1) AND F_Topics.Deleted = false > AND ForumID = 15) OR (F_Topics.Lastmessage > '2002-08-01 > 23:27:03+02' AND F_Topics.Deleted = > false AND ForumID = 15) > > and got the plan you wanted: > > Nested Loop (cost=0.00..14.51 rows=1 width=16) > -> Index Scan using f_topics_forum_status_deleted, > f_topics_forum_lastmessage_deleted on f_topics > (cost=0.00..9.67 rows=1 width=12) > Index Cond: (((forumid = 15) AND (statusid = 1) AND > (deleted = false)) OR ((forumid = 15) AND (lastmessage > > '2002-08-01 17:27:03-04'::timestamp with time zone) AND > (deleted = false))) > Filter: (((statusid = 1) AND (deleted = false) AND > (forumid = 15)) OR ((lastmessage > '2002-08-01 > 17:27:03-04'::timestamp with time zone) AND (deleted = false) > AND (forumid = 15))) > -> Index Scan using f_users_pkey on f_users > (cost=0.00..4.82 rows=1 width=4) > Index Cond: ("outer".userid = f_users.userid) > That is weird, a copy&paste of your command into my psql results in: Nested Loop (cost=0.00..23788.87 rows=1 width=16) -> Index Scan using f_topics_forum_status_deleted on f_topics (cost=0.00..23785.84 rows=1 width=12) Index Cond: (forumid = 15) Filter: (((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) OR (statusid = 1)) AND ((deleted = false) OR (statusid = 1)) AND ((forumid = 15) OR (statusid = 1)) AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) OR (deleted = false)) AND (deleted = false) AND ((forumid = 15) OR (deleted = false)) AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) OR (forumid = 15)) AND ((deleted = false) OR (forumid = 15))) -> Index Scan using f_users_pkey on f_users (cost=0.00..3.01 rows=1 width=4) Index Cond: ("outer".userid = f_users.userid) Which is the same as the one as when using a normal join in the where part. > Of course the cost estimates are bogus because I have no data > in the test tables, but the thing is capable of producing the > plan you want. I didn't doubt it could, I was even counting on it :) Actually, when I tried *not* to have postgres use any of the dataknowledge using this: prepare the_query(integer, integer, timestamp) AS SELECT TopicID, StatusID as StatusID, F_Users.UserID FROM F_Topics INNER JOIN F_Users USING (UserID) WHERE (StatusID IN ( $1 ) AND F_Topics.Deleted = false AND ForumID = $2 ) OR (Lastmessage > $3 AND F_Topics.Deleted = false AND ForumID = $2 ) execute the_query(1, 15, '2002-08-01 23:27:03+02') The execution time is 2 seconds, ie it is doing a sequential scan somewhere along the path. > I believe the reason the query you give above doesn't work > like you want is that the planner first converts the whole > WHERE clause to CNF form (canonical AND-of-OR layout), and > then is unable to separate out the join clause from the > spaghetti-like restriction clause. The CNF-conversion > heuristic is usually a good one, but not in this case. > > Writing the join clause as a JOIN clause keeps it separate > from WHERE, preventing this mistake from being made. Then > the WHERE clause is already in DNF (canonical OR-of-ANDs) > form, which the planner also likes, so it doesn't rearrange it. That would be nice, but in my case postgres 7.3 decides to rearrange it apparently. For all variants, I know, that I tried the same explain output resulted. Regards, Arjen
Arjen van der Meijden <acm@tweakers.net> writes: > That is weird, a copy&paste of your command into my psql results in: > [different results] Hm, there must be some difference on this query between 7.3 and CVS tip then; I was not expecting that. [ ... a debugger is fired up ... time passes ... ] Ah hah. There's a rather shaky heuristic in canonicalize_qual that prefers DNF if certain things are true, one of them being that the qual condition mentions only one relation. This test is being fooled because you wrote some of the variables with "F_Topics." and some without (which means that, according to the letter of the SQL spec, they refer to the join relation's result and not the original table). I find that 7.3 will produce the desired plan if I write all the variables in the WHERE clause the same way, either with or without "F_Topics.". CVS tip doesn't show this effect because it handles join variables differently. So that's your workaround for the moment. As I was saying, this code could use some fresh ideas... regards, tom lane