poor execution plan because column dependence - Mailing list pgsql-performance

From Václav Ovsík
Subject poor execution plan because column dependence
Date
Msg-id 20110412232343.GA15966@bobek.localdomain
Whole thread Raw
Responses Re: poor execution plan because column dependence
Re: poor execution plan because column dependence
List pgsql-performance
Hi,
I have done migration of the Request Tracker 3.8.9
(http://requesttracker.wikia.com/wiki/HomePage) from Mysql to
PostgreSQL in testing environment.
The RT schema used can be viewed at
https://github.com/bestpractical/rt/blob/3.8-trunk/etc/schema.Pg.
I have added full text search on table Attachments based on trigrams
(and still experimenting with it), but is is not interesting for the
problem (the problem is not caused by it directly).
The full text search alone works quite good. A user testing a new RT instance
reported a poor performance problem with a bit more complex query (more
conditions resulting in table joins).
Queries are constructed by module DBIx::SearchBuilder.
The problematic query logged:

rt=# EXPLAIN ANALYZE SELECT DISTINCT  main.* FROM Tickets main JOIN Transactions Transactions_1  ON (
Transactions_1.ObjectId= main.id ) JOIN Attachments Attachments_2  ON ( Attachments_2.TransactionId = Transactions_1.id
) WHERE (Transactions_1.ObjectType = 'RT::Ticket') AND (main.Status != 'deleted') AND (main.Status = 'resolved' AND
main.LastUpdated> '2008-12-31 23:00:00' AND main.Created > '2005-12-31 23:00:00' AND main.Queue = '15' AND  (
Attachments_2.trigrams@@ text_to_trgm_tsquery('uir') AND Attachments_2.Content ILIKE '%uir%' ) ) AND (main.Type =
'ticket')AND (main.EffectiveId = main.id)  ORDER BY main.id ASC; 

                                                  QUERY PLAN
                                                                                                                

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=23928.60..23928.67 rows=1 width=162) (actual time=5201.139..5207.965 rows=649 loops=1)
   ->  Sort  (cost=23928.60..23928.61 rows=1 width=162) (actual time=5201.137..5201.983 rows=5280 loops=1)
         Sort Key: main.effectiveid, main.issuestatement, main.resolution, main.owner, main.subject,
main.initialpriority,main.finalpriority, main.priority, main.timeestimated, main.timeworked, main.timeleft, main.told,
main.starts,main.started, main.due, main.resolved, main.lastupdatedby, main.lastupdated, main.creator, main.created,
main.disabled
         Sort Method:  quicksort  Memory: 1598kB
         ->  Nested Loop  (cost=0.00..23928.59 rows=1 width=162) (actual time=10.060..5120.834 rows=5280 loops=1)
               ->  Nested Loop  (cost=0.00..10222.38 rows=1734 width=166) (actual time=8.702..1328.970 rows=417711
loops=1)
                     ->  Seq Scan on tickets main  (cost=0.00..5687.88 rows=85 width=162) (actual time=8.258..94.012
rows=25410loops=1) 
                           Filter: (((status)::text <> 'deleted'::text) AND (lastupdated > '2008-12-31
23:00:00'::timestampwithout time zone) AND (created > '2005-12-31 23:00:00'::timestamp without time zone) AND
(effectiveid= id) AND (queue = 15) AND ((type)::text = 'ticket'::text) AND ((status)::text = 'resolved'::text)) 
                     ->  Index Scan using transactions1 on transactions transactions_1  (cost=0.00..53.01 rows=27
width=8)(actual time=0.030..0.039 rows=16 loops=25410) 
                           Index Cond: (((transactions_1.objecttype)::text = 'RT::Ticket'::text) AND
(transactions_1.objectid= main.effectiveid)) 
               ->  Index Scan using attachments2 on attachments attachments_2  (cost=0.00..7.89 rows=1 width=4) (actual
time=0.008..0.009rows=0 loops=417711) 
                     Index Cond: (attachments_2.transactionid = transactions_1.id)
                     Filter: ((attachments_2.trigrams @@ '''uir'''::tsquery) AND (attachments_2.content ~~*
'%uir%'::text))
 Total runtime: 5208.149 ms
(14 rows)

The above times are for already cached data (repeated query).
I think the execution plan is poor. Better would be to filter table attachments
at first and then join the rest. The reason is a bad estimate on number of rows
returned from table tickets (85 estimated -> 25410 in the reality).
Eliminating sub-condition...


rt=# explain analyze select * from tickets where effectiveid = id;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on tickets  (cost=0.00..4097.40 rows=530 width=162) (actual time=0.019..38.130 rows=101869 loops=1)
   Filter: (effectiveid = id)
 Total runtime: 54.318 ms
(3 rows)

Estimated 530 rows, but reality is 101869 rows.

The problem is the strong dependance between id and effectiveid. The RT
documentation says:

    EffectiveId:
    By default, a ticket's EffectiveId is the same as its ID. RT supports the
    ability to merge tickets together. When you merge a ticket into
    another one, RT sets the first ticket's EffectiveId to the second
    ticket's ID. RT uses this data to quickly look up which ticket
    you're really talking about when you reference a merged ticket.


I googled the page http://wiki.postgresql.org/wiki/Cross_Columns_Stats

Maybe I identified the already documented problem. What I can do with this
situation? Some workaround?

Thanks in advance for any suggestions.
Best Regards
--
Zito

pgsql-performance by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Performance
Next
From: Bob Lunney
Date:
Subject: Re: poor execution plan because column dependence