Re: Help with tuning this query (with explain analyze finally) - Mailing list pgsql-performance

From Ken Egervari
Subject Re: Help with tuning this query (with explain analyze finally)
Date
Msg-id 004601c51fbe$828bbb50$6a01a8c0@KEN
Whole thread Raw
In response to Help with tuning this query  ("Ken Egervari" <ken@upfactor.com>)
Responses Re: Help with tuning this query (with explain analyze finally)  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Josh,

>1) To determine your query order ala Dan Tow and drive off of person,
>please
>SET JOIN_COLLAPSE_LIMIT = 1 and then run Mark Kirkwood's version of the
>query.  (Not that I believe in Dan Tow ... see previous message ... but it
>would be interesting to see the results.

Unfortunately, the query still takes 250 milliseconds.  I tried it with
other queries and the results are the same as before.  Here is the explain
analayze anyway:

Sort  (cost=2036.83..2036.87 rows=16 width=103) (actual
time=328.000..328.000 rows=39 loops=1)
  Sort Key: cs.date
  ->  Nested Loop Left Join  (cost=620.61..2036.51 rows=16 width=103)
(actual time=250.000..328.000 rows=39 loops=1)
        ->  Hash Join  (cost=620.61..1984.90 rows=16 width=78) (actual
time=250.000..328.000 rows=39 loops=1)
              Hash Cond: ("outer".carrier_code_id = "inner".id)
              ->  Merge Join  (cost=606.11..1965.99 rows=825 width=74)
(actual time=250.000..328.000 rows=310 loops=1)
                    Merge Cond: ("outer".current_status_id = "inner".id)
                    ->  Index Scan using shipment_current_status_id_idx on
shipment s  (cost=0.00..2701.26 rows=60307 width=66) (actual
time=0.000..77.000 rows=27711 loops=1)
                          Filter: (is_purged = false)
                    ->  Sort  (cost=606.11..610.50 rows=1756 width=12)
(actual time=141.000..141.000 rows=6902 loops=1)
                          Sort Key: cs.id
                          ->  Hash Join  (cost=1.11..511.48 rows=1756
width=12) (actual time=0.000..109.000 rows=6902 loops=1)
                                Hash Cond: ("outer".release_code_id =
"inner".id)
                                ->  Index Scan Backward using
current_status_date_idx on shipment_status cs  (cost=0.01..422.58 rows=14047
width=16) (actual time=0.000..78.000 rows=14925 loops=1)
                                      Index Cond: ((date >=
(('now'::text)::date - 31)) AND (date <= ('now'::text)::date))
                                      Filter: (cs.* IS NOT NULL)
                                ->  Hash  (cost=1.10..1.10 rows=1 width=4)
(actual time=0.000..0.000 rows=0 loops=1)
                                      ->  Seq Scan on release_code rc
(cost=0.00..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                                            Filter: ((number)::text =
'9'::text)
              ->  Hash  (cost=14.49..14.49 rows=2 width=8) (actual
time=0.000..0.000 rows=0 loops=1)
                    ->  Nested Loop  (cost=6.87..14.49 rows=2 width=8)
(actual time=0.000..0.000 rows=2 loops=1)
                          ->  Index Scan using person_pkey on person p
(cost=0.00..5.73 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                                Index Cond: (id = 355)
                          ->  Hash Join  (cost=6.87..8.74 rows=2 width=8)
(actual time=0.000..0.000 rows=2 loops=1)
                                Hash Cond: ("outer".carrier_id =
"inner".carrier_id)
                                ->  Seq Scan on carrier_code cc
(cost=0.00..1.57 rows=57 width=8) (actual time=0.000..0.000 rows=57 loops=1)
                                ->  Hash  (cost=6.86..6.86 rows=1 width=12)
(actual time=0.000..0.000 rows=0 loops=1)
                                      ->  Hash Join  (cost=3.04..6.86 rows=1
width=12) (actual time=0.000..0.000 rows=1 loops=1)
                                            Hash Cond: ("outer".id =
"inner".carrier_id)
                                            ->  Seq Scan on carrier c
(cost=0.00..3.54 rows=54 width=4) (actual time=0.000..0.000 rows=54 loops=1)
                                            ->  Hash  (cost=3.04..3.04
rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1)
                                                  ->  Index Scan using
carrier_to_person_person_id_idx on carrier_to_person ctp  (cost=0.00..3.04
rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
                                                        Index Cond: (355 =
person_id)
        ->  Index Scan using shipment_status_shipment_id_idx on
shipment_status ss  (cost=0.00..3.20 rows=2 width=25) (actual
time=0.000..0.000 rows=1 loops=39)
              Index Cond: (ss.shipment_id = "outer".id)
Total runtime: 328.000 ms

>2) Force PG to drop the merge join via SET ENABLE_MERGEJOIN = FALSE;

Setting this option had no effect either  In fact, the query is a bit slower
(266 milliseconds but 250 came up once in 20 executions).

>Also, please let us know some about the server you're using and your
>configuration parameters, particularly:
>shared_buffers
>work_mem
>effective_cache_size
>random_page_cost

Well, I'm on a test machine so the settings haven't changed one bit from the
defaults.  This may sound embarrassing, but I bet the production server is
not custom configured either.  The computer I'm running these queries on is
just a simple Athon XP 2100+ on WinXP with 1GB of RAM.  The production
server is a faster P4, but the rest is the same.  Here are the 4 values in
my configuration, but 3 of them were commented:

shared_buffers = 1000
#work_mem = 1024
#effective_cache_size = 1000
#random_page_cost = 4

I'm not sure what these do, but I'm guessing the last 2 affect the planner
to do different things with the statistics.  Should I increase the first
two?

Regards,
Ken


pgsql-performance by date:

Previous
From: "Ken Egervari"
Date:
Subject: Re: Help with tuning this query (more musings)
Next
From: Richard Huxton
Date:
Subject: Re: Help with tuning this query (more musings)