Re: bad query performance - Mailing list pgsql-sql

From Luis Sousa
Subject Re: bad query performance
Date
Msg-id 3EC3531E.1040805@ualg.pt
Whole thread Raw
In response to Re: bad query performance  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
List pgsql-sql
Didrik Pinte wrote:

> At 18:54 5/14/2003, Luis Sousa wrote:
>
>> Stephan Szabo wrote:
>>
>>> On Mon, 12 May 2003, Didrik Pinte wrote:
>>>
>>>
>>>
>>>> Here is the query :
>>>> ------------------------------------------------------------------------------------
>>>> SELECT profiles_et.username, profiles_et.name, profiles_et.firstname,
>>>> profiles_et.email, company_et.name AS company, count(logs_et.dt) AS 
>>>> cnt,
>>>> pro_invoice_addr_at.id
>>>>
>>>>
>>>> FROM
>>>> (
>>>> (
>>>> (
>>>> secu.company_et JOIN secu.pro_invoice_addr_at
>>>> ON (pro_invoice_addr_at.company = company_et.id)
>>>> )
>>>>
>>>> JOIN secu.profiles_et ON (pro_invoice_addr_at.profile =
>>>> profiles_et.id)
>>>> )
>>>> JOIN secu.logs_et ON (logs_et.invaddr = pro_invoice_addr_at.id)
>>>> )
>>>>
>>>>
>>>> GROUP BY profiles_et.username, profiles_et.name, 
>>>> profiles_et.firstname,
>>>> profiles_et.email, company_et.name, pro_invoice_addr_at.id
>>>>
>>>> ORDER BY count(logs_et.dt) DESC;
>>>> ------------------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>> The secu_company, secu_prov_invoice_addr_at and secu_profiles 
>>>> contains only
>>>> informations about the users.
>>>>
>>>> The query is executing in 8,6 minutes.... Indexes are defined on 
>>>> all the
>>>> field used in the different joins.
>>>>
>>>
>>> Have you recently analyzed the tables involved?
>>> I'd guess that the two outer sorts and the sort on logs_et are probably
>>> the killers, but can you send explain analyze output? Looking at the
>>> estimated row numbers and width, it seems to me that with 16Mb of 
>>> sort_mem
>>> it's going to have to sort on disk (although you don't have enough 
>>> memory
>>> to up it far enough probably if the estimates are right)
>>>
>>>
>>>
>>>
>>>> Here is the query plan :
>>>>
>>>> ------------------------------------------------------------------------------------
>>>> Sort (cost=895649.54..896073.23 rows=169474 width=145)
>>>> Sort Key: count(logs_et.dt)
>>>> -> Aggregate (cost=831240.24..865135.10 rows=169474 width=145)
>>>> -> Group (cost=831240.24..860898.24 rows=1694743 width=145)
>>>> -> Sort (cost=831240.24..835477.10 rows=1694743 width=145)
>>>> Sort Key: profiles_et.username, profiles_et.name,
>>>> profiles_et.firstname, profiles_et.email, company_et.name,
>>>> pro_invoice_addr_at.id
>>>> -> Merge Join (cost=274406.73..304066.75
>>>> rows=1694743 width=145)
>>>> Merge Cond: ("outer".id = "inner".invaddr)
>>>> -> Sort (cost=143.81..144.92 rows=446 width=126)
>>>> Sort Key: pro_invoice_addr_at.id
>>>> -> Merge Join (cost=90.27..124.18
>>>> rows=446 width=126)
>>>> Merge Cond: ("outer".id =
>>>> "inner".profile)
>>>> -> Index Scan using profiles_pk on
>>>> profiles_et (cost=0.00..24.98 rows=449 width=66)
>>>> -> Sort (cost=90.27..91.39
>>>> rows=446 width=60)
>>>> Sort Key:
>>>> pro_invoice_addr_at.profile
>>>> -> Merge
>>>> Join (cost=37.82..70.65 rows=446 width=60)
>>>> Merge Cond:
>>>> ("outer".company = "inner".id)
>>>> -> Index Scan using
>>>> invaddr_at_company_idx on pro_invoice_addr_at (cost=0.00..24.68 
>>>> rows=446
>>>> width=33)
>>>> -> Sort
>>>> (cost=37.82..38.55 rows=291 width=27)
>>>> Sort Key:
>>>> company_et.id
>>>> -> Seq Scan on
>>>> company_et (cost=0.00..25.91 rows=291 width=27)
>>>> -> Sort (cost=274262.92..278499.78
>>>> rows=1694743 width=19)
>>>> Sort Key: logs_et.invaddr
>>>> -> Seq Scan on
>>>> logs_et (cost=0.00..55404.43 rows=1694743 width=19)
>>>> ------------------------------------------------------------------------------------
>>>>
>>>> The computer is a Pentium III 850 Mhz with 256 mb RAM (Sort memory for
>>>> postres is 16 mb, and 64 mb of shared memory).
>>>>
>>>
>>>
>>>
>>> ---------------------------(end of 
>>> broadcast)---------------------------
>>> TIP 3: if posting/reading through Usenet, please send an appropriate
>>> subscribe-nomail command to majordomo@postgresql.org so that your
>>> message can get through to the mailing list cleanly
>>>
>>>
>>>
>> -> Seq Scan on
>> company_et (cost=0.00..25.91 rows=291 width=27)
>> -> Sort (cost=274262.92..278499.78
>> rows=1694743 width=19)
>> Sort Key: logs_et.invaddr
>>
>>
>> Do you have any indexes defined for logs_et.invaddr and for 
>> pro_invoice_addr_at.id ?
>>
>> Luis Sousa
>
>
>
> pro_invoice_add_at.id had not index but was a primary key. I've 
> created an index on it. Here is the explain analyse for the query :
>
> Sort (cost=895649.54..896073.23 rows=169474 width=145) (actual 
> time=553423.57..553423.76 rows=259 loops=1)
> Sort Key: count(logs_et.dt)
> -> Aggregate (cost=831240.24..865135.10 rows=169474 width=145) (actual 
> time=463611.22..553421.28 rows=259 loops=1)
> -> Group (cost=831240.24..860898.24 rows=1694743 width=145) (actual 
> time=463610.39..550481.54 rows=1698665 loops=1)
> -> Sort (cost=831240.24..835477.10 rows=1694743 width=145) (actual 
> time=463610.36..468962.97 rows=1698665 loops=1)
> Sort Key: profiles_et.username, profiles_et.name, 
> profiles_et.firstname, profiles_et.email, company_et.name, 
> pro_invoice_addr_at.id
> -> Merge Join (cost=274406.73..304066.75 rows=1694743 width=145) 
> (actual time=76531.90..97594.64 rows=1698665 loops=1)
> Merge Cond: ("outer".id = "inner".invaddr)
> -> Sort (cost=143.81..144.92 rows=446 width=126) (actual 
> time=147.27..147.67 rows=446 loops=1)
> Sort Key: pro_invoice_addr_at.id
> -> Merge Join (cost=90.27..124.18 rows=446 width=126) (actual 
> time=105.69..143.63 rows=446 loops=1)
> Merge Cond: ("outer".id = "inner".profile)
> -> Index Scan using profiles_pk on profiles_et (cost=0.00..24.98 
> rows=449 width=66) (actual time=50.53..79.70 rows=449 loops=1)
> -> Sort (cost=90.27..91.39 rows=446 width=60) (actual 
> time=55.13..55.54 rows=446 loops=1)
> Sort Key: pro_invoice_addr_at.profile
> -> Merge Join (cost=37.82..70.65 rows=446 width=60) (actual 
> time=40.11..49.69 rows=446 loops=1)
> Merge Cond: ("outer".company = "inner".id)
> -> Index Scan using invaddr_at_company_idx on pro_invoice_addr_at 
> (cost=0.00..24.68 rows=446 width=33) (actual time=9.91..13.54 rows=446 
> loops=1)
> -> Sort (cost=37.82..38.55 rows=291 width=27) (actual 
> time=30.11..30.49 rows=447 loops=1)
> Sort Key: company_et.id
> -> Seq Scan on company_et (cost=0.00..25.91 rows=291 width=27) (actual 
> time=7.24..27.73 rows=291 loops=1)
> -> Sort (cost=274262.92..278499.78 rows=1694743 width=19) (actual 
> time=76384.57..80077.80 rows=1698665 loops=1)
> Sort Key: logs_et.invaddr
> -> Seq Scan on logs_et (cost=0.00..55404.43 rows=1694743 width=19) 
> (actual time=11.47..13274.68 rows=1698665 loops=1)
> Total runtime: 553673.02 msec
>
> ,"**** DISCLAIMER ****
> This e-mail and any attachments thereto may contain information
> which is confidential and/or protected by intellectual property
> rights and are intended for the sole use of the recipient(s) named above.
> Any use of the information contained herein (including, but not 
> limited to,
> total or partial reproduction, communication or distribution in any form)
> by persons other than the designated recipient(s) is prohibited.
> If you have received this e-mail in error, please notify the sender
> and delete the material from any computer.
> Thank you for your cooperation.
>
>
>
You can make some experiences trying to change the sequence of JOIN, 
like this (without parenthesis the sequence is different)

SELECT profiles_et.username, profiles_et.name, profiles_et.firstname, 
profiles_et.email, company_et.name AS company, count(logs_et.dt) AS cnt, 
pro_invoice_addr_at.id
FROM secu.company_et JOIN secu.pro_invoice_addr_at ON 
(pro_invoice_addr_at.company = company_et.id)
JOIN secu.profiles_et ON (pro_invoice_addr_at.profile = profiles_et.id)
JOIN secu.logs_et ON (logs_et.invaddr = pro_invoice_addr_at.id)
GROUP BY profiles_et.username, profiles_et.name, profiles_et.firstname, 
profiles_et.email, company_et.name, pro_invoice_addr_at.id
ORDER BY count(logs_et.dt) DESC;

In the first JOIN, put the table with less records, and continue that order.

Luis Sousa



pgsql-sql by date:

Previous
From: Matt Mello
Date:
Subject: Re: "deadlock detected" documentation
Next
From: Luis Sousa
Date:
Subject: Re: I need help with a rule to relocate duplicate records