Re: Slow SQL query (14-15 seconds) - Mailing list pgsql-performance

From Bruno Baguette
Subject Re: Slow SQL query (14-15 seconds)
Date
Msg-id 491C377E.3070904@gmail.com
Whole thread Raw
In response to Re: Slow SQL query (14-15 seconds)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Slow SQL query (14-15 seconds)
List pgsql-performance
Le 13/11/08 14:31, Tom Lane a écrit :
> It's the repeatedly executed EXISTS subplan that's hurting you:
>
>>                             SubPlan
>>                               ->  Nested Loop  (cost=35.56..378.16
>> rows=2 width=0) (actual time=16.511..16.511 rows=0 loops=818)
>
> 16.511 * 818 = 13505.998, so this is all but about 100 msec of the
> runtime.  Can't tell if there's any easy way to improve it.  In
> pre-8.4 releases trying to convert the EXISTS into an IN might help.

Hello Tom !

If I replace the EXISTS by a IN subquery, it falls from 14-15 seconds to
5 seconds !

####################################
AND EXISTS (
              SELECT 1 FROM commandes
              INNER JOIN clients ON commandes.fk_client_id =
clients.pk_client_id
              INNER JOIN societes AS societe_client ON
clients.fk_societe_id = societe_client.pk_societe_id
              WHERE delivery_date_livraison BETWEEN (NOW() - '1
year'::interval) AND NOW() AND societe_client.pk_societe_id =
societes.pk_societe_id
            )
####################################

replaced by a IN subquery

####################################
AND societes.pk_societe_id IN (
                                 SELECT societes.pk_societe_id
                                 FROM commandes
                                 INNER JOIN clients ON
commandes.fk_client_id = clients.pk_client_id
                                 INNER JOIN societes AS societe_client
ON clients.fk_societe_id = societe_client.pk_societe_id
                                 WHERE delivery_date_livraison BETWEEN
(NOW() - '1 year'::interval) AND NOW()
                               )
####################################

Heres's the EXPLAIN ANALYZE of the new SQL query :


####################################

            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=280995.27..280996.30 rows=414 width=147) (actual
time=5164.297..5165.638 rows=818 loops=1)
    Sort Key: lower((societes.denomination_commerciale)::text)
    ->  Hash Left Join  (cost=697.38..280977.27 rows=414 width=147)
(actual time=110.093..5156.853 rows=818 loops=1)
          Hash Cond: ("outer".pk_societe_id = "inner".societe_id)
          ->  Merge Left Join  (cost=642.64..280817.00 rows=414
width=139) (actual time=98.886..5141.305 rows=818 loops=1)
                Merge Cond: ("outer".pk_societe_id = "inner".societe_id)
                ->  Merge Left Join  (cost=551.92..280717.18 rows=414
width=131) (actual time=87.278..5123.133 rows=818 loops=1)
                      Merge Cond: ("outer".pk_societe_id =
"inner".societe_id)
                      ->  Index Scan using pkey_societe_id on societes
(cost=0.00..280155.54 rows=414 width=123) (actual time=21.748..5051.976
rows=818 loops=1)
                            Filter: ((NOT is_deleted) AND (subplan))
                            SubPlan
                              ->  Hash Join  (cost=170.88..438.17
rows=2298 width=0) (actual time=6.165..6.165 rows=1 loops=818)
                                    Hash Cond: ("outer".fk_client_id =
"inner".pk_client_id)
                                    ->  Bitmap Heap Scan on commandes
(cost=35.66..266.10 rows=2775 width=8) (actual time=6.144..6.144 rows=1
loops=818)
                                          Recheck Cond:
((delivery_date_livraison >= (now() - '1 year'::interval)) AND
(delivery_date_livraison <= now()))
                                          ->  Bitmap Index Scan on
idx_date_livraison  (cost=0.00..35.66 rows=2775 width=0) (actual
time=6.121..6.121 rows=3109 loops=818)
                                                Index Cond:
((delivery_date_livraison >= (now() - '1 year'::interval)) AND
(delivery_date_livraison <= now()))
                                    ->  Hash  (cost=132.46..132.46
rows=1105 width=8) (actual time=13.573..13.573 rows=1082 loops=1)
                                          ->  Hash Join
(cost=48.39..132.46 rows=1105 width=8) (actual time=3.933..11.246
rows=1082 loops=1)
                                                Hash Cond:
("outer".fk_societe_id = "inner".pk_societe_id)
                                                ->  Seq Scan on clients
  (cost=0.00..66.35 rows=1335 width=16) (actual time=0.004..2.623
rows=1308 loops=1)
                                                ->  Hash
(cost=46.11..46.11 rows=911 width=8) (actual time=3.900..3.900 rows=903
loops=1)
                                                      ->  Seq Scan on
societes societe_client  (cost=0.00..46.11 rows=911 width=8) (actual
time=0.004..1.947 rows=903 loops=1)
                      ->  Sort  (cost=551.92..554.20 rows=911 width=16)
(actual time=65.518..66.453 rows=563 loops=1)
                            Sort Key: stats_commandes.societe_id
                            ->  Subquery Scan stats_commandes
(cost=486.64..507.14 rows=911 width=16) (actual time=61.034..64.117
rows=563 loops=1)
                                  ->  HashAggregate
(cost=486.64..498.03 rows=911 width=16) (actual time=61.028..62.177
rows=563 loops=1)
                                        ->  Hash Join
(cost=135.22..458.94 rows=5539 width=16) (actual time=13.517..48.643
rows=5971 loops=1)
                                              Hash Cond:
("outer".fk_client_id = "inner".pk_client_id)
                                              ->  Seq Scan on commandes
  (cost=0.00..234.90 rows=6690 width=16) (actual time=0.004..11.951
rows=5971 loops=1)
                                              ->  Hash
(cost=132.46..132.46 rows=1105 width=16) (actual time=13.486..13.486
rows=1082 loops=1)
                                                    ->  Hash Join
(cost=48.39..132.46 rows=1105 width=16) (actual time=3.827..11.123
rows=1082 loops=1)
                                                          Hash Cond:
("outer".fk_societe_id = "inner".pk_societe_id)
                                                          ->  Seq Scan
on clients  (cost=0.00..66.35 rows=1335 width=16) (actual
time=0.003..2.566 rows=1308 loops=1)
                                                          ->  Hash
(cost=46.11..46.11 rows=911 width=8) (actual time=3.802..3.802 rows=903
loops=1)
                                                                ->  Seq
Scan on societes  (cost=0.00..46.11 rows=911 width=8) (actual
time=0.004..1.906 rows=903 loops=1)
                ->  Sort  (cost=90.72..92.83 rows=844 width=16) (actual
time=11.566..13.070 rows=885 loops=1)
                      Sort Key: stats_adresses_livraison.societe_id
                      ->  Subquery Scan stats_adresses_livraison
(cost=30.71..49.70 rows=844 width=16) (actual time=4.504..9.357 rows=885
loops=1)
                            ->  HashAggregate  (cost=30.71..41.26
rows=844 width=16) (actual time=4.499..6.304 rows=885 loops=1)
                                  ->  Seq Scan on
societes_adresses_livraison  (cost=0.00..25.90 rows=962 width=16)
(actual time=0.005..2.221 rows=991 loops=1)
                                        Filter: (NOT is_deleted)
          ->  Hash  (cost=52.48..52.48 rows=903 width=16) (actual
time=11.164..11.164 rows=903 loops=1)
                ->  Subquery Scan stats_adresses_facturation
(cost=32.16..52.48 rows=903 width=16) (actual time=4.339..9.220 rows=903
loops=1)
                      ->  HashAggregate  (cost=32.16..43.45 rows=903
width=16) (actual time=4.334..6.116 rows=903 loops=1)
                            ->  Seq Scan on
societes_adresses_facturation  (cost=0.00..27.25 rows=983 width=16)
(actual time=0.006..2.128 rows=943 loops=1)
                                  Filter: (NOT is_deleted)
  Total runtime: 5167.896 ms
(48 lignes)

####################################


Many thanks for the help, that's already better (3x time faster) !

Can you explain why a IN is fastest than an EXISTS subquery ? Until now,
I was thinking that IN would require PostgreSQL to scan all the table
(from the beginning to the end) and that EXISTS would require to scan
all the table (from the beginning until getting one match).

Do you think I can improve again the performance of that query ? I
expected more speed since theses are little tables

delivery=> SELECT COUNT(*) FROM societes;
  count
-------
    903
(1 ligne)

delivery=> SELECT COUNT(*) FROM clients;
  count
-------
   1308
(1 ligne)

delivery=> SELECT COUNT(*) FROM commandes;
  count
-------
   5972
(1 ligne)


One reader told me Gmail was guilty for cutting the lines, so I've put a
copy of the query plan on pastebin.com to keep it readable :
<http://pastebin.com/m6434f639>

Thanks in advance for any tips !

Regards,

--
Bruno Baguette

pgsql-performance by date:

Previous
From: Tomasz Myrta
Date:
Subject: Re: Slow SQL query (14-15 seconds)
Next
From: "Vladimir Sitnikov"
Date:
Subject: Re: Slow SQL query (14-15 seconds)