Thread: Weird query plans for my queries, causing terrible performance.

Weird query plans for my queries, causing terrible performance.

From
Arjen van der Meijden
Date:
To create a list of recent topics in a forum I use these queries in mysql:
SELECT TopicID, StatusID, UserID
FROM F_Topics
WHERE ForumID = 15 AND Lastmessage > '2002-08-01 23:27:03+02'
AND Deleted = false
ORDER BY Lastmessage DESC
LIMIT 1000

and

SELECT TopicID, StatusID, UserID
FROM F_Topics
WHERE StatusID IN(1) AND Deleted = false AND ForumID = 15

Where in the client-code some simple checks are done on the queryresults
(for the rights a user has), which  would make the queries quite slow on
mysql.

And after that it simply selects the needed info from the database using
a in list:

SELECT F_Topics.Name, F_Topics.TopicID, etc
FROM
F_Topics
LEFT JOIN
F_Users AS L_Users ON (L_Users.UserID = F_Topics.LastposterID)
, F_Users
WHERE
TopicID IN (list of topicids gathered from the above queries)
AND
F_Users.UserID = F_Topics.UserID
AND
!F_Topics.Deleted
AND
F_Topics.ForumID=15

Together it takes around 60ms to finish in mysql. That isn't the nicest
query setup. So I tried to create a single query for this in postgresql.

The F_Topics table has some fields, including TopicID (primary key),
UserID (foreign key to F_Users.UserID), LastPosterID (nullable foreign
to F_Users.UserID), LastMessage (timestamp), ForumID, StatusID (being a
integer with just 3 different values) and deleted (boolean)

There are three indices:
f_topics_pkey (on the topicid)
f_topics_forum_lastmessage_deleted (forumid, lastmessage, deleted)
f_topics_forum_status_deleted (forumid, statusid, deleted)

and the only relevant index on the F_Users table is its primary key on
UserID.

The query:
SELECT F_Topics.TopicID,F_Topics.StatusID, UserID
FROM F_Topics
WHERE (StatusID IN(1) OR F_Topics.Lastmessage > '2002-08-01 23:27:03+02')
AND F_Topics.Deleted = false AND ForumID = 15

Results in the plan:

  Seq Scan on f_topics  (cost=0.00..20347.58 rows=59 width=12)
    Filter: (((statusid = 1) OR (lastmessage > '2002-08-01
23:27:03'::timestamp without time zone)) AND (deleted = false) AND
(forumid = 15))
Which would take around 2 seconds to complete.

While the query:
SELECT TopicID, StatusID, UserID
FROM F_Topics
WHERE (StatusID IN(1) AND Deleted = false AND ForumID = 15)
OR
(Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false AND ForumID
= 15)

Results in the plan:

                    Index Scan using f_topics_forum_status_deleted,
f_topics_forum_lastmessage_deleted on f_topics  (cost=0.00..241.87
rows=60 width=12)
    Index Cond: (((forumid = 15) AND (statusid = 1) AND (deleted =
false)) OR ((forumid = 15) AND (lastmessage > '2002-08-01
23:27:03'::timestamp without time zone) AND (deleted = false)))
    Filter: (((statusid = 1) AND (deleted = false) AND (forumid = 15))
OR ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone)
AND (deleted = false) AND (forumid = 15)))
(3 rows)

Executing in around 9ms and exactly what I need.

Of coarse the query needs to be joined with the F_Users table.
My tries were:
SELECT TopicID, StatusID as StatusID, F_Users.UserID
FROM F_Topics, F_Users
WHERE (StatusID IN(1) AND Deleted = false AND ForumID = 15 AND
F_Topics.UserID = F_Users.UserID)
OR
(Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false AND ForumID
= 15  AND F_Topics.UserID = F_Users.UserID)

and

SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID
FROM F_Topics, F_Users
WHERE (
(StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15)
OR
(F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted =
false AND ForumID = 15))
AND F_Topics.UserID = F_Users.UserID

(and a inner join'ed version of the above)

All three took 500-600ms to complete and changed the query plan to
something similar to:
  Nested Loop  (cost=0.00..23788.87 rows=1 width=16)
    ->  Index Scan using f_topics_forum_status_deleted on f_topics
(cost=0.00..23785.84 rows=1 width=12)
          Index Cond: (forumid = 15)
          Filter: (((lastmessage > '2002-08-01 23:27:03'::timestamp
without time zone) OR (statusid = 1)) AND ((deleted = false) OR
(statusid = 1)) AND ((forumid = 15) OR (statusid = 1)) AND ((lastmessage
 > '2002-08-01 23:27:03'::timestamp without time zone) OR (deleted =
false)) AND (deleted = false) AND ((forumid = 15) OR (deleted = false))
AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone)
OR (forumid = 15)) AND ((deleted = false) OR (forumid = 15)))
    ->  Index Scan using f_users_pkey on f_users  (cost=0.00..3.01
rows=1 width=4)
          Index Cond: ("outer".userid = f_users.userid)
(6 rows)

I.e. It creates a much harder query plan to execute and terribly slows
down what should be much simpler and much faster.


By the way, I did do 'vacuum full analyze' before gathering these plans.
The result of the above queries is 134 rows from 549679 topicrows in
total. 3 rows being selected due to the status = 1 and 131 due to the
lastmessage time restriction.
In total there are 15943 topics in the 15th forum, in total 102 having a
statusid = 1, 5321 meeting the last message requirement and 15 topics
have the deleted boolean set to true.

I hope someone can shine a bright light on this, since I'm out of clues.
I also tries using subselects for the separate parts either union'ed
together or OR'ed together. Resulting in even worse plans (with
seq-scans for the f_users.userid for instance, or the f_topics.topicid)
of over 60 seconds execution time.

Best regards,

Arjen van der Meijden


Re: Weird query plans for my queries, causing terrible performance.

From
Tom Lane
Date:
Arjen van der Meijden <acm@tweakers.net> writes:
> Of coarse the query needs to be joined with the F_Users table.
> My tries were:
> ...
> SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID
> FROM F_Topics, F_Users
> WHERE (
> (StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15)
> OR
> (F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted =
> false AND ForumID = 15))
> AND F_Topics.UserID = F_Users.UserID

> (and a inner join'ed version of the above)

You sure you tried the inner-join case?  I did

explain SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID
FROM F_Topics  JOIN F_Users USING (userid)
WHERE
(StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15)
OR
(F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted =
false AND ForumID = 15)

and got the plan you wanted:

 Nested Loop  (cost=0.00..14.51 rows=1 width=16)
   ->  Index Scan using f_topics_forum_status_deleted, f_topics_forum_lastmessage_deleted on f_topics  (cost=0.00..9.67
rows=1width=12) 
         Index Cond: (((forumid = 15) AND (statusid = 1) AND (deleted = false)) OR ((forumid = 15) AND (lastmessage >
'2002-08-0117:27:03-04'::timestamp with time zone) AND (deleted = false))) 
         Filter: (((statusid = 1) AND (deleted = false) AND (forumid = 15)) OR ((lastmessage > '2002-08-01
17:27:03-04'::timestampwith time zone) AND (deleted = false) AND (forumid = 15))) 
   ->  Index Scan using f_users_pkey on f_users  (cost=0.00..4.82 rows=1 width=4)
         Index Cond: ("outer".userid = f_users.userid)

Of course the cost estimates are bogus because I have no data in the
test tables, but the thing is capable of producing the plan you want.

I believe the reason the query you give above doesn't work like you want
is that the planner first converts the whole WHERE clause to CNF form
(canonical AND-of-OR layout), and then is unable to separate out the
join clause from the spaghetti-like restriction clause.  The
CNF-conversion heuristic is usually a good one, but not in this case.

Writing the join clause as a JOIN clause keeps it separate from WHERE,
preventing this mistake from being made.  Then the WHERE clause is
already in DNF (canonical OR-of-ANDs) form, which the planner also
likes, so it doesn't rearrange it.

It might be that we could improve the qual-rearrangement heuristics
by trying to keep join clauses separate from single-relation
restrictions.  Anyone care to take a look at it?  The gold is all hidden
in src/backend/optimizer/prep/prepqual.c ...

            regards, tom lane

Re: Weird query plans for my queries,

From
Arjen van der Meijden
Date:
> Van: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Verzonden: vrijdag 31 januari 2003 2:35
>
> You sure you tried the inner-join case?  I did
>
> explain SELECT F_Topics.TopicID,F_Topics.StatusID as
> StatusID, F_Users.UserID FROM F_Topics  JOIN F_Users USING
> (userid) WHERE (StatusID IN(1) AND F_Topics.Deleted = false
> AND ForumID = 15) OR (F_Topics.Lastmessage > '2002-08-01
> 23:27:03+02' AND F_Topics.Deleted =
> false AND ForumID = 15)
>
> and got the plan you wanted:
>
>  Nested Loop  (cost=0.00..14.51 rows=1 width=16)
>    ->  Index Scan using f_topics_forum_status_deleted,
> f_topics_forum_lastmessage_deleted on f_topics
> (cost=0.00..9.67 rows=1 width=12)
>          Index Cond: (((forumid = 15) AND (statusid = 1) AND
> (deleted = false)) OR ((forumid = 15) AND (lastmessage >
> '2002-08-01 17:27:03-04'::timestamp with time zone) AND
> (deleted = false)))
>          Filter: (((statusid = 1) AND (deleted = false) AND
> (forumid = 15)) OR ((lastmessage > '2002-08-01
> 17:27:03-04'::timestamp with time zone) AND (deleted = false)
> AND (forumid = 15)))
>    ->  Index Scan using f_users_pkey on f_users
> (cost=0.00..4.82 rows=1 width=4)
>          Index Cond: ("outer".userid = f_users.userid)
>

That is weird, a copy&paste of your command into my psql results in:

 Nested Loop  (cost=0.00..23788.87 rows=1 width=16)
   ->  Index Scan using f_topics_forum_status_deleted on f_topics
(cost=0.00..23785.84 rows=1 width=12)
         Index Cond: (forumid = 15)
         Filter: (((lastmessage > '2002-08-01 23:27:03'::timestamp
without time zone) OR (statusid = 1)) AND ((deleted = false) OR
(statusid = 1)) AND ((forumid = 15) OR (statusid = 1)) AND ((lastmessage
> '2002-08-01 23:27:03'::timestamp without time zone) OR (deleted =
false)) AND (deleted = false) AND ((forumid = 15) OR (deleted = false))
AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone)
OR (forumid = 15)) AND ((deleted = false) OR (forumid = 15)))
   ->  Index Scan using f_users_pkey on f_users  (cost=0.00..3.01 rows=1
width=4)
         Index Cond: ("outer".userid = f_users.userid)

Which is the same as the one as when using a normal join in the where
part.

> Of course the cost estimates are bogus because I have no data
> in the test tables, but the thing is capable of producing the
> plan you want.
I didn't doubt it could, I was even counting on it :)

Actually, when I tried *not* to have postgres use any of the
dataknowledge using this:

prepare the_query(integer, integer, timestamp) AS
SELECT TopicID, StatusID as StatusID, F_Users.UserID
FROM F_Topics INNER JOIN F_Users USING (UserID)
WHERE (StatusID IN ( $1 ) AND F_Topics.Deleted = false AND ForumID = $2
)
OR
(Lastmessage > $3 AND F_Topics.Deleted = false AND ForumID = $2 )

execute the_query(1, 15, '2002-08-01 23:27:03+02')

The execution time is 2 seconds, ie it is doing a sequential scan
somewhere along the path.

> I believe the reason the query you give above doesn't work
> like you want is that the planner first converts the whole
> WHERE clause to CNF form (canonical AND-of-OR layout), and
> then is unable to separate out the join clause from the
> spaghetti-like restriction clause.  The CNF-conversion
> heuristic is usually a good one, but not in this case.
>
> Writing the join clause as a JOIN clause keeps it separate
> from WHERE, preventing this mistake from being made.  Then
> the WHERE clause is already in DNF (canonical OR-of-ANDs)
> form, which the planner also likes, so it doesn't rearrange it.
That would be nice, but in my case postgres 7.3 decides to rearrange it
apparently.
For all variants, I know, that I tried the same explain output resulted.

Regards,

Arjen



Re: Weird query plans for my queries, causing terrible performance.

From
Tom Lane
Date:
Arjen van der Meijden <acm@tweakers.net> writes:
> That is weird, a copy&paste of your command into my psql results in:
> [different results]

Hm, there must be some difference on this query between 7.3 and CVS tip
then; I was not expecting that.  [ ... a debugger is fired up ... time
passes ... ]

Ah hah.  There's a rather shaky heuristic in canonicalize_qual that
prefers DNF if certain things are true, one of them being that the
qual condition mentions only one relation.  This test is being fooled
because you wrote some of the variables with "F_Topics." and some
without (which means that, according to the letter of the SQL spec,
they refer to the join relation's result and not the original table).
I find that 7.3 will produce the desired plan if I write all the
variables in the WHERE clause the same way, either with or without
"F_Topics.".  CVS tip doesn't show this effect because it handles join
variables differently.

So that's your workaround for the moment.  As I was saying, this code
could use some fresh ideas...

            regards, tom lane