Re: poor execution plan because column dependence - Mailing list pgsql-performance

From Bob Lunney
Subject Re: poor execution plan because column dependence
Date
Msg-id 867169.80414.qm@web39703.mail.mud.yahoo.com
Whole thread Raw
In response to poor execution plan because column dependence  (Václav Ovsík <vaclav.ovsik@i.cz>)
Responses Re: poor execution plan because column dependence  (Václav Ovsík <vaclav.ovsik@i.cz>)
List pgsql-performance
Zito,

Using psql log in as the database owner and run "analyze verbose".  Happiness will ensue.

Also, when requesting help with a query its important to state the database version ("select version();") and what, if
any,configuration changes you have made in postgresql.conf.   Listing ony the ones that have changed is sufficient. 

Finally, the wiki has some good information on the care and feeding of a PostgreSQL database:

http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT



Bob Lunney

--- On Tue, 4/12/11, Václav Ovsík <vaclav.ovsik@i.cz> wrote:

> From: Václav Ovsík <vaclav.ovsik@i.cz>
> Subject: [PERFORM] poor execution plan because column dependence
> To: pgsql-performance@postgresql.org
> Date: Tuesday, April 12, 2011, 7:23 PM
> Hi,
> I have done migration of the Request Tracker 3.8.9
> (http://requesttracker.wikia.com/wiki/HomePage) from
> Mysql to
> PostgreSQL in testing environment.
> The RT schema used can be viewed at
> https://github.com/bestpractical/rt/blob/3.8-trunk/etc/schema.Pg.
> I have added full text search on table Attachments based on
> trigrams
> (and still experimenting with it), but is is not
> interesting for the
> problem (the problem is not caused by it directly).
> The full text search alone works quite good. A user testing
> a new RT instance
> reported a poor performance problem with a bit more complex
> query (more
> conditions resulting in table joins).
> Queries are constructed by module DBIx::SearchBuilder.
> The problematic query logged:
>
> rt=# EXPLAIN ANALYZE SELECT DISTINCT  main.* FROM
> Tickets main JOIN Transactions Transactions_1  ON (
> Transactions_1.ObjectId = main.id ) JOIN Attachments
> Attachments_2  ON ( Attachments_2.TransactionId =
> Transactions_1.id )  WHERE (Transactions_1.ObjectType =
> 'RT::Ticket') AND (main.Status != 'deleted') AND
> (main.Status = 'resolved' AND main.LastUpdated >
> '2008-12-31 23:00:00' AND main.Created > '2005-12-31
> 23:00:00' AND main.Queue = '15' AND  (
> Attachments_2.trigrams @@ text_to_trgm_tsquery('uir') AND
> Attachments_2.Content ILIKE '%uir%' ) ) AND (main.Type =
> 'ticket') AND (main.EffectiveId = main.id)  ORDER BY
> main.id ASC;
>                
>                
>                
>                
>                
>                
>                
>                
>                
>                
>            QUERY
> PLAN               
>                
>                
>                
>                
>                
>                
>                
>                
>                
>            
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Unique  (cost=23928.60..23928.67 rows=1 width=162)
> (actual time=5201.139..5207.965 rows=649 loops=1)
>    ->  Sort 
> (cost=23928.60..23928.61 rows=1 width=162) (actual
> time=5201.137..5201.983 rows=5280 loops=1)
>          Sort Key:
> main.effectiveid, main.issuestatement, main.resolution,
> main.owner, main.subject, main.initialpriority,
> main.finalpriority, main.priority, main.timeestimated,
> main.timeworked, main.timeleft, main.told, main.starts,
> main.started, main.due, main.resolved, main.lastupdatedby,
> main.lastupdated, main.creator, main.created, main.disabled
>          Sort Method: 
> quicksort  Memory: 1598kB
>          ->  Nested
> Loop  (cost=0.00..23928.59 rows=1 width=162) (actual
> time=10.060..5120.834 rows=5280 loops=1)
>            
>    ->  Nested Loop 
> (cost=0.00..10222.38 rows=1734 width=166) (actual
> time=8.702..1328.970 rows=417711 loops=1)
>                
>      ->  Seq Scan on tickets
> main  (cost=0.00..5687.88 rows=85 width=162) (actual
> time=8.258..94.012 rows=25410 loops=1)
>                
>            Filter:
> (((status)::text <> 'deleted'::text) AND (lastupdated
> > '2008-12-31 23:00:00'::timestamp without time zone) AND
> (created > '2005-12-31 23:00:00'::timestamp without time
> zone) AND (effectiveid = id) AND (queue = 15) AND
> ((type)::text = 'ticket'::text) AND ((status)::text =
> 'resolved'::text))
>                
>      ->  Index Scan using
> transactions1 on transactions transactions_1 
> (cost=0.00..53.01 rows=27 width=8) (actual time=0.030..0.039
> rows=16 loops=25410)
>                
>            Index Cond:
> (((transactions_1.objecttype)::text = 'RT::Ticket'::text)
> AND (transactions_1.objectid = main.effectiveid))
>            
>    ->  Index Scan using attachments2
> on attachments attachments_2  (cost=0.00..7.89 rows=1
> width=4) (actual time=0.008..0.009 rows=0 loops=417711)
>                
>      Index Cond:
> (attachments_2.transactionid = transactions_1.id)
>                
>      Filter: ((attachments_2.trigrams @@
> '''uir'''::tsquery) AND (attachments_2.content ~~*
> '%uir%'::text))
>  Total runtime: 5208.149 ms
> (14 rows)
>
> The above times are for already cached data (repeated
> query).
> I think the execution plan is poor. Better would be to
> filter table attachments
> at first and then join the rest. The reason is a bad
> estimate on number of rows
> returned from table tickets (85 estimated -> 25410 in
> the reality).
> Eliminating sub-condition...
>
>
> rt=# explain analyze select * from tickets where
> effectiveid = id;
>                
>                
>                
>   QUERY PLAN           
>                
>                
>      
> --------------------------------------------------------------------------------------------------------------
>  Seq Scan on tickets  (cost=0.00..4097.40 rows=530
> width=162) (actual time=0.019..38.130 rows=101869 loops=1)
>    Filter: (effectiveid = id)
>  Total runtime: 54.318 ms
> (3 rows)
>
> Estimated 530 rows, but reality is 101869 rows.
>
> The problem is the strong dependance between id and
> effectiveid. The RT
> documentation says:
>
>     EffectiveId:
>     By default, a ticket's EffectiveId is the
> same as its ID. RT supports the
>     ability to merge tickets together. When you
> merge a ticket into
>     another one, RT sets the first ticket's
> EffectiveId to the second
>     ticket's ID. RT uses this data to quickly
> look up which ticket
>     you're really talking about when you
> reference a merged ticket.
>
>
> I googled the page http://wiki.postgresql.org/wiki/Cross_Columns_Stats
>
> Maybe I identified the already documented problem. What I
> can do with this
> situation? Some workaround?
>
> Thanks in advance for any suggestions.
> Best Regards
> --
> Zito
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>




pgsql-performance by date:

Previous
From: Václav Ovsík
Date:
Subject: poor execution plan because column dependence
Next
From: Tom Lane
Date:
Subject: Re: poor execution plan because column dependence