Thread: BUG #2730: strange query performance !

BUG #2730: strange query performance !

From
"Baudracco Pierre"
Date:
The following bug has been logged online:

Bug reference:      2730
Logged by:          Baudracco Pierre
Email address:      pierre.baudracco@aliasource.fr
PostgreSQL version: 8.1.5
Operating system:   Linux Debian unstable (SID)
Description:        strange query performance !
Details:

Debian : ii  postgresql-8.1                   8.1.5-1

on a simple data model (a contact, linked to a company and categories
categorizing a contact) this query takes more than 30 seconds !

SELECT distinct contact_id
FROM Contact
  LEFT JOIN Company ON contact_company_id=company_id
  LEFT JOIN CategoryLink AS cl ON
     contact_id=cl.categorylink_entity_id
     AND cl.categorylink_entity='contact'
     AND cl.categorylink_category='contactcategory2'
WHERE cl.categorylink_category_id='268' and contact_archive=0;

If I invert (only change) the LEFT JOIN clause the queries returns
immediately

SELECT distinct contact_id
FROM Contact
  LEFT JOIN CategoryLink AS cl ON
     contact_id=cl.categorylink_entity_id
     AND cl.categorylink_entity='contact'
     AND cl.categorylink_category='contactcategory2'
  LEFT JOIN Company ON contact_company_id=company_id     WHERE
cl.categorylink_category_id='268' and contact_archive=0;

if I supress one of the where clause, it returns immediately too... very
strange

here are the queries plan (explain):
first and slow one :

************************
obm=> explain  SELECT distinct contact_id FROM Contact LEFT JOIN Company ON
contact_company_id=company_id LEFT JOIN CategoryLink AS cl ON
contact_id=cl.categorylink_entity_id AND cl.categorylink_entity='contact'
AND cl.categorylink_category='contactcategory2' WHERE
cl.categorylink_category_id='268' and contact_archive=0;

      QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------------------------
 Unique  (cost=0.00..1423.38 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..1423.37 rows=1 width=4)
         Join Filter: ("inner".contact_id = "outer".categorylink_entity_id)
         ->  Index Scan using categorylink_pkey on categorylink cl
(cost=0.00..4.87 rows=1 width=4)
               Index Cond: ((categorylink_category_id = 268) AND
((categorylink_category)::text = 'contactcategory2'::text) AND
((categorylink_entity)::text = 'contact'::text))
         ->  Nested Loop Left Join  (cost=0.00..1417.30 rows=96 width=4)
               ->  Seq Scan on contact  (cost=0.00..840.51 rows=96 width=8)
                     Filter: ((contact_archive)::text = '0'::text)
               ->  Index Scan using company_pkey on company
(cost=0.00..6.00 rows=1 width=4)
                     Index Cond: ("outer".contact_company_id =
company.company_id)


I can see a strange seq scan on contact but why ???

*****************************************************
second one, really fast

obm=> explain  SELECT distinct contact_id FROM Contact LEFT JOIN
CategoryLink AS cl ON contact_id=cl.categorylink_entity_id AND
cl.categorylink_entity='contact' AND
cl.categorylink_category='contactcategory2' LEFT JOIN Company on
contact_company_id=company_id WHERE cl.categorylink_category_id='268' and
contact_archive=0;

         QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------------------------------
 Unique  (cost=0.00..16.91 rows=1 width=4)
   ->  Nested Loop Left Join  (cost=0.00..16.90 rows=1 width=4)
         ->  Nested Loop  (cost=0.00..10.90 rows=1 width=8)
               ->  Index Scan using categorylink_pkey on categorylink cl
(cost=0.00..4.87 rows=1 width=4)
                     Index Cond: ((categorylink_category_id = 268) AND
((categorylink_category)::text = 'contactcategory2'::text) AND
((categorylink_entity)::text = 'contact'::text))
               ->  Index Scan using contact_pkey on contact
(cost=0.00..6.01 rows=1 width=8)
                     Index Cond: (contact.contact_id =
"outer".categorylink_entity_id)
                     Filter: ((contact_archive)::text = '0'::text)
         ->  Index Scan using company_pkey on company  (cost=0.00..6.00
rows=1 width=4)
               Index Cond: ("outer".contact_company_id =
company.company_id)


Is it a bug or may I have missed something ??

Re: BUG #2730: strange query performance !

From
Tom Lane
Date:
"Baudracco Pierre" <pierre.baudracco@aliasource.fr> writes:
> on a simple data model (a contact, linked to a company and categories
> categorizing a contact) this query takes more than 30 seconds !
> ...
> If I invert (only change) the LEFT JOIN clause the queries returns
> immediately

Existing Postgres releases do not consider rearranging the order of
outer joins, because they don't have the logic needed to determine
whether it's safe (will give the same answer) or not to change the
join order.  8.2 will be smarter.

            regards, tom lane