poor execution plan because column dependence - Mailing list pgsql-performance
From | Václav Ovsík |
---|---|
Subject | poor execution plan because column dependence |
Date | |
Msg-id | 20110412232343.GA15966@bobek.localdomain |
Responses | Re: poor execution plan because column dependence |
| Re: poor execution plan because column dependence |
List | pgsql-performance |
Hi,

I have migrated Request Tracker 3.8.9 (http://requesttracker.wikia.com/wiki/HomePage) from MySQL to PostgreSQL in a testing environment. The RT schema used can be viewed at https://github.com/bestpractical/rt/blob/3.8-trunk/etc/schema.Pg. I have added full-text search on table Attachments based on trigrams (and am still experimenting with it), but it is not relevant to the problem (the problem is not caused by it directly). The full-text search alone works quite well.

A user testing the new RT instance reported a poor-performance problem with a somewhat more complex query (more conditions resulting in table joins). Queries are constructed by the module DBIx::SearchBuilder. The problematic query logged:

```sql
rt=# EXPLAIN ANALYZE SELECT DISTINCT main.* FROM Tickets main
JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id )
JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id )
WHERE (Transactions_1.ObjectType = 'RT::Ticket')
  AND (main.Status != 'deleted')
  AND (main.Status = 'resolved'
       AND main.LastUpdated > '2008-12-31 23:00:00'
       AND main.Created > '2005-12-31 23:00:00'
       AND main.Queue = '15'
       AND ( Attachments_2.trigrams @@ text_to_trgm_tsquery('uir')
             AND Attachments_2.Content ILIKE '%uir%' ))
  AND (main.Type = 'ticket')
  AND (main.EffectiveId = main.id)
ORDER BY main.id ASC;
```

```
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Unique  (cost=23928.60..23928.67 rows=1 width=162) (actual time=5201.139..5207.965 rows=649 loops=1)
   ->  Sort  (cost=23928.60..23928.61 rows=1 width=162) (actual time=5201.137..5201.983 rows=5280 loops=1)
         Sort Key: main.effectiveid, main.issuestatement, main.resolution, main.owner, main.subject,
             main.initialpriority, main.finalpriority, main.priority, main.timeestimated, main.timeworked,
             main.timeleft, main.told, main.starts, main.started, main.due, main.resolved,
             main.lastupdatedby, main.lastupdated, main.creator, main.created, main.disabled
         Sort Method:  quicksort  Memory: 1598kB
         ->  Nested Loop  (cost=0.00..23928.59 rows=1 width=162) (actual time=10.060..5120.834 rows=5280 loops=1)
               ->  Nested Loop  (cost=0.00..10222.38 rows=1734 width=166) (actual time=8.702..1328.970 rows=417711 loops=1)
                     ->  Seq Scan on tickets main  (cost=0.00..5687.88 rows=85 width=162) (actual time=8.258..94.012 rows=25410 loops=1)
                           Filter: (((status)::text <> 'deleted'::text) AND (lastupdated > '2008-12-31 23:00:00'::timestamp without time zone) AND (created > '2005-12-31 23:00:00'::timestamp without time zone) AND (effectiveid = id) AND (queue = 15) AND ((type)::text = 'ticket'::text) AND ((status)::text = 'resolved'::text))
                     ->  Index Scan using transactions1 on transactions transactions_1  (cost=0.00..53.01 rows=27 width=8) (actual time=0.030..0.039 rows=16 loops=25410)
                           Index Cond: (((transactions_1.objecttype)::text = 'RT::Ticket'::text) AND (transactions_1.objectid = main.effectiveid))
               ->  Index Scan using attachments2 on attachments attachments_2  (cost=0.00..7.89 rows=1 width=4) (actual time=0.008..0.009 rows=0 loops=417711)
                     Index Cond: (attachments_2.transactionid = transactions_1.id)
                     Filter: ((attachments_2.trigrams @@ '''uir'''::tsquery) AND (attachments_2.content ~~* '%uir%'::text))
 Total runtime: 5208.149 ms
(14 rows)
```

The above times are for already-cached data (a repeated query). I think the execution plan is poor: it would be better to filter table Attachments first and then join the rest. The reason is a bad estimate of the number of rows returned from table Tickets (85 estimated vs. 25410 in reality). Isolating one sub-condition...
```
rt=# explain analyze select * from tickets where effectiveid = id;
                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on tickets  (cost=0.00..4097.40 rows=530 width=162) (actual time=0.019..38.130 rows=101869 loops=1)
   Filter: (effectiveid = id)
 Total runtime: 54.318 ms
(3 rows)
```

530 rows are estimated, but the reality is 101869 rows. The problem is the strong dependence between id and effectiveid. The RT documentation says:

> EffectiveId: By default, a ticket's EffectiveId is the same as its ID. RT supports the ability to merge tickets together. When you merge a ticket into another one, RT sets the first ticket's EffectiveId to the second ticket's ID. RT uses this data to quickly look up which ticket you're really talking about when you reference a merged ticket.

I found the page http://wiki.postgresql.org/wiki/Cross_Columns_Stats, so maybe I have run into this already-documented problem. What can I do in this situation? Is there some workaround?

Thanks in advance for any suggestions.

Best Regards
--
Zito
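To illustrate where the 530-row estimate comes from (a sketch; the table size of roughly 106,000 rows is my assumption, back-computed from the estimate, and the 0.005 figure is PostgreSQL's default selectivity for a column = column restriction when it has no applicable statistics):

```python
# Sketch: why the planner's estimate collapses for the correlated
# condition "effectiveid = id". The 106,000-row table size is an
# assumption back-computed from the plan; 0.005 is PostgreSQL's
# DEFAULT_EQ_SEL, used when no statistics apply to the clause.

total_rows = 106_000       # approximate size of Tickets (assumption)
DEFAULT_EQ_SEL = 0.005     # planner default for col = col

estimated = total_rows * DEFAULT_EQ_SEL
actual = 101_869           # rows really satisfying effectiveid = id

print(f"estimated: {estimated:.0f} rows")  # ~530, matching the plan
print(f"actual:    {actual} rows")
print(f"off by a factor of ~{actual / estimated:.0f}x")
```

So the planner assumes almost no rows survive the condition, while in reality nearly every ticket is unmerged and satisfies it.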
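One workaround I can think of (a sketch, untested; the index name is invented, and I am not certain the planner will match the expression in every clause form) is a boolean expression index, so that ANALYZE gathers statistics on the expression itself:

```sql
-- Sketch (assumptions: ANALYZE collects statistics for expressions in an
-- expression index, and the query clause must match the indexed expression;
-- the index name tickets_unmerged_idx is made up for illustration).
CREATE INDEX tickets_unmerged_idx ON tickets (((effectiveid = id)));
ANALYZE tickets;

-- The query would then need to reference the same expression, e.g.:
-- ... AND (main.EffectiveId = main.id) IS TRUE ...
```

Whether the planner actually uses those statistics depends on the clause matching the indexed expression, so this is only an idea to test, not a confirmed fix.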