Weird query plans for my queries, causing terrible performance. - Mailing list pgsql-general
From | Arjen van der Meijden |
---|---|
Subject | Weird query plans for my queries, causing terrible performance. |
Date | |
Msg-id | b1celn$37k$1@news.tudelft.nl Whole thread Raw |
Responses |
Re: Weird query plans for my queries, causing terrible performance.
|
List | pgsql-general |
To create a list of recent topics in a forum I use these queries in mysql: SELECT TopicID, StatusID, UserID FROM F_Topics WHERE ForumID = 15 AND Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false ORDER BY Lastmessage DESC LIMIT 1000 and SELECT TopicID, StatusID, UserID FROM F_Topics WHERE StatusID IN(1) AND Deleted = false AND ForumID = 15 Where in the client-code some simple checks are done on the queryresults (for the rights a user has), which would make the queries quite slow on mysql. And after that it simply selects the needed info from the database using a in list: SELECT F_Topics.Name, F_Topics.TopicID, etc FROM F_Topics LEFT JOIN F_Users AS L_Users ON (L_Users.UserID = F_Topics.LastposterID) , F_Users WHERE TopicID IN (list of topicids gathered from the above queries) AND F_Users.UserID = F_Topics.UserID AND !F_Topics.Deleted AND F_Topics.ForumID=15 Together it takes around 60ms to finish in mysql. That isn't the nicest query setup. So I tried to create a single query for this in postgresql. The F_Topics table has some fields, including TopicID (primary key), UserID (foreign key to F_Users.UserID), LastPosterID (nullable foreign to F_Users.UserID), LastMessage (timestamp), ForumID, StatusID (being a integer with just 3 different values) and deleted (boolean) There are three indices: f_topics_pkey (on the topicid) f_topics_forum_lastmessage_deleted (forumid, lastmessage, deleted) f_topics_forum_status_deleted (forumid, statusid, deleted) and the only relevant index on the F_Users table is its primary key on UserID. The query: SELECT F_Topics.TopicID,F_Topics.StatusID, UserID FROM F_Topics WHERE (StatusID IN(1) OR F_Topics.Lastmessage > '2002-08-01 23:27:03+02') AND F_Topics.Deleted = false AND ForumID = 15 Results in the plan: Seq Scan on f_topics (cost=0.00..20347.58 rows=59 width=12) Filter: (((statusid = 1) OR (lastmessage > '2002-08-01 23:27:03'::timestamp without time zone)) AND (deleted = false) AND (forumid = 15)) Which would take around 2 seconds to complete. While the query: SELECT TopicID, StatusID, UserID FROM F_Topics WHERE (StatusID IN(1) AND Deleted = false AND ForumID = 15) OR (Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false AND ForumID = 15) Results in the plan: Index Scan using f_topics_forum_status_deleted, f_topics_forum_lastmessage_deleted on f_topics (cost=0.00..241.87 rows=60 width=12) Index Cond: (((forumid = 15) AND (statusid = 1) AND (deleted = false)) OR ((forumid = 15) AND (lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) AND (deleted = false))) Filter: (((statusid = 1) AND (deleted = false) AND (forumid = 15)) OR ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) AND (deleted = false) AND (forumid = 15))) (3 rows) Executing in around 9ms and exactly what I need. Of coarse the query needs to be joined with the F_Users table. My tries were: SELECT TopicID, StatusID as StatusID, F_Users.UserID FROM F_Topics, F_Users WHERE (StatusID IN(1) AND Deleted = false AND ForumID = 15 AND F_Topics.UserID = F_Users.UserID) OR (Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false AND ForumID = 15 AND F_Topics.UserID = F_Users.UserID) and SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID FROM F_Topics, F_Users WHERE ( (StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15) OR (F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted = false AND ForumID = 15)) AND F_Topics.UserID = F_Users.UserID (and a inner join'ed version of the above) All three took 500-600ms to complete and changed the query plan to something similar to: Nested Loop (cost=0.00..23788.87 rows=1 width=16) -> Index Scan using f_topics_forum_status_deleted on f_topics (cost=0.00..23785.84 rows=1 width=12) Index Cond: (forumid = 15) Filter: (((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) OR (statusid = 1)) AND ((deleted = false) OR (statusid = 1)) AND ((forumid = 15) OR (statusid = 1)) AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) OR (deleted = false)) AND (deleted = false) AND ((forumid = 15) OR (deleted = false)) AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone) OR (forumid = 15)) AND ((deleted = false) OR (forumid = 15))) -> Index Scan using f_users_pkey on f_users (cost=0.00..3.01 rows=1 width=4) Index Cond: ("outer".userid = f_users.userid) (6 rows) I.e. It creates a much harder query plan to execute and terribly slows down what should be much simpler and much faster. By the way, I did do 'vacuum full analyze' before gathering these plans. The result of the above queries is 134 rows from 549679 topicrows in total. 3 rows being selected due to the status = 1 and 131 due to the lastmessage time restriction. In total there are 15943 topics in the 15th forum, in total 102 having a statusid = 1, 5321 meeting the last message requirement and 15 topics have the deleted boolean set to true. I hope someone can shine a bright light on this, since I'm out of clues. I also tries using subselects for the separate parts either union'ed together or OR'ed together. Resulting in even worse plans (with seq-scans for the f_users.userid for instance, or the f_topics.topicid) of over 60 seconds execution time. Best regards, Arjen van der Meijden
pgsql-general by date: