Re: Query planner question - Mailing list pgsql-general

From Soni M
Subject Re: Query planner question
Date
Msg-id CAAMgDXkttXOtsDkpastMfQ3MQ6UaBH_vRYJ_7vxC_XmV4XAutQ@mail.gmail.com
Whole thread Raw
In response to Re: Query planner question  (David G Johnston <david.g.johnston@gmail.com>)
Responses Re: Query planner question  (Alban Hertroys <haramrae@gmail.com>)
List pgsql-general



On Thu, Aug 21, 2014 at 9:26 AM, David G Johnston <david.g.johnston@gmail.com> wrote:
Soni M wrote
> Hi Everyone,
>
> I have this query :
>
> select t.ticket_id ,
> tb.transmission_id
> from ticket t,
> transmission_base tb
> where t.latest_transmission_id = tb.transmission_id
> and t.ticket_number = tb.ticket_number
> and tb.parse_date > ('2014-07-31');
>
> Execution plan: http://explain.depesz.com/s/YAak
>
> Indexes on ticket :
>     "ticket_pkey" PRIMARY KEY, btree (ticket_id) CLUSTER
>     "ticket_by_latest_transmission" btree (latest_transmission_id)
>     "ticket_by_ticket_number" btree (ticket_number)
>
> This query only returns some portions of rows from ticket table.
> The question is, Why does postgres need to get all the rows from ticket
> table in order to complete this query?
> Can't postgres use indexes to get only needed rows on ticket table?
>
> I try set seqscan to off, but still index scan try to get all rows on
> ticket table.
> Here's the execution plan : http://explain.depesz.com/s/abH2

Short answer: you haven't defined "(latest_transmission_id, ticket_number)"
as being a foreign key onto the transmission_base table yet you seem to want
it to act like one.

Currently we have only latest_transmission_id as FK, described here :
TABLE "ticket" CONSTRAINT "fkcbe86b0c6ddac9e" FOREIGN KEY (latest_transmission_id) REFERENCES transmission_base(transmission_id)

Change the query to include only FK still result the same:
explain select t.ticket_id ,
tb.transmission_id
from ticket t,
transmission_base tb
where t.latest_transmission_id = tb.transmission_id
and tb.parse_date > ('2014-07-31');
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=113928.06..2583606.96 rows=200338 width=8)
   Hash Cond: (t.latest_transmission_id = tb.transmission_id)
   ->  Seq Scan on ticket t  (cost=0.00..1767767.26 rows=69990826 width=8)
   ->  Hash  (cost=108923.38..108923.38 rows=400374 width=4)
         ->  Index Scan using transmission_base_by_parse_date on transmission_base tb  (cost=0.00..108923.38 rows=400374 width=4)
               Index Cond: (parse_date > '2014-07-31 00:00:00'::timestamp without time zone)
(6 rows)

I've googling this one, it seems that's how hash join works. For hash join operation, the join predicate cannot be used for the index scan, only independent predicate can be used in index scan. http://use-the-index-luke.com/sql/join/hash-join-partial-objects

 
Because of this failure the planner considers the following:

Nested Looping over 380,000 records is going to suck so it tries some
advanced "merge/join" techniques to try and speed things up.  In any such
alternative the entire ticket table needs to be considered since there is no
constraint provided for that table - the only constraint in on
transmission_base and it rightly is using an index to find records matching
the where clause.

Since ticket_number and latest_transmission_id are found in separate indexes
I do not believe the planner can make use of an Index Only scan to fulfill
the join so each index lookup would require a corresponding heap lookup
which means extra work compared to just sequentially scanning the heap in
the first place.  Since it is going to hit the entire thing in either case
the sequential scan is the logical choice for it to make.

Others will correct any factual mistakes I may have made - I am theorizing
here and do not understand the planner sufficient well to be 100% certain
that an FK definition will solve the problem.

David J.





--
View this message in context: http://postgresql.1045698.n5.nabble.com/Query-planner-question-tp5815659p5815661.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Regards,

Soni Maula Harriz

pgsql-general by date:

Previous
From: Birta Levente
Date:
Subject: Re: POWA tool
Next
From: Andy Lau
Date:
Subject: WAL log level compatibility