Re: query optimization - Mailing list pgsql-performance

From Thomas Kellerer
Subject Re: query optimization
Date
Msg-id jnc860$q30$1@dough.gmane.org
Whole thread Raw
In response to Re: query optimization  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: query optimization
List pgsql-performance
Tom Lane wrote on 26.04.2012 21:17:
> Richard Kojedzinszky<krichy@tvnetwork.hu>  writes:
>> Dear list,
>> We have a database schema, which looks the same as the attached script.
>
>> When filling the tables with data, and skipping analyze on the table (so
>> pg_stats contains no records for table 'a'), the first select in the
>> script runs fast, but after an analyze the planner decides to sequence
>> scan tables b and c, thus making the query much slower. Can somebody help
>> me solving this issue, or tuning our installation to not to use sequence
>> scans in this case?
>
> Um ... did you analyze all the tables, or just some of them?  I get
> sub-millisecond runtimes if all four tables have been analyzed, but it
> does seem to pick lousy plans if, say, only a and b have been analyzed.
>

Here it's similar to Richard's experience:

Before analyzing the four tables, the first statement yields this plan:

Merge Left Join  (cost=504.89..2634509.91 rows=125000000 width=16) (actual time=0.103..0.108 rows=1 loops=1)
   Merge Cond: (a.b = b.id)
   ->  Sort  (cost=504.89..506.14 rows=500 width=8) (actual time=0.043..0.043 rows=1 loops=1)
         Sort Key: a.b
         Sort Method: quicksort  Memory: 17kB
         ->  Bitmap Heap Scan on a  (cost=12.14..482.47 rows=500 width=8) (actual time=0.028..0.029 rows=1 loops=1)
               Recheck Cond: (id = 4)
               ->  Bitmap Index Scan on a_idx1  (cost=0.00..12.01 rows=500 width=0) (actual time=0.021..0.021 rows=1
loops=1)
                     Index Cond: (id = 4)
   ->  Materialize  (cost=0.00..884002.52 rows=50000000 width=8) (actual time=0.041..0.057 rows=5 loops=1)
         ->  Merge Join  (cost=0.00..759002.52 rows=50000000 width=8) (actual time=0.037..0.051 rows=5 loops=1)
               Merge Cond: (b.id = c.id)
               ->  Index Scan using b_idx1 on b  (cost=0.00..4376.26 rows=100000 width=4) (actual time=0.016..0.018
rows=5loops=1) 
               ->  Materialize  (cost=0.00..4626.26 rows=100000 width=4) (actual time=0.017..0.022 rows=5 loops=1)
                     ->  Index Scan using c_idx1 on c  (cost=0.00..4376.26 rows=100000 width=4) (actual
time=0.014..0.017rows=5 loops=1) 
Total runtime: 0.209 ms

This continues to stay the plan for about 10-15 repetitions, then it turns to this plan

Hash Right Join  (cost=2701.29..6519.30 rows=1 width=16) (actual time=79.604..299.227 rows=1 loops=1)
   Hash Cond: (b.id = a.b)
   ->  Hash Join  (cost=2693.00..6136.00 rows=100000 width=8) (actual time=79.550..265.251 rows=100000 loops=1)
         Hash Cond: (b.id = c.id)
         ->  Seq Scan on b  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.011..36.158 rows=100000 loops=1)
         ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4) (actual time=79.461..79.461 rows=100000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 2735kB
               ->  Seq Scan on c  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.010..32.930 rows=100000
loops=1)
   ->  Hash  (cost=8.28..8.28 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Index Scan using a_idx1 on a  (cost=0.00..8.28 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
               Index Cond: (id = 4)
Total runtime: 299.564 ms

(I guess autovacuum kicked in, because that the same plan I get when running analyze on all four tables right after
populatingthem) 

And the second one yields this one here (Regardless of analyze or not):

QUERY PLAN
Nested Loop Left Join  (cost=0.00..16.89 rows=1 width=16) (actual time=0.027..0.031 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..16.57 rows=1 width=12) (actual time=0.020..0.022 rows=1 loops=1)
         ->  Index Scan using a_idx1 on a  (cost=0.00..8.28 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)
               Index Cond: (id = 4)
         ->  Index Scan using b_idx1 on b  (cost=0.00..8.28 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
               Index Cond: (a.b = id)
   ->  Index Scan using c_idx1 on c  (cost=0.00..0.31 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
         Index Cond: (b.id = id)
Total runtime: 0.104 ms


My version:
PostgreSQL 9.1.3, compiled by Visual C++ build 1500, 32-bit
Running on Windows XP SP3

shared_buffers = 768MB
work_mem = 24MB
effective_cache_size = 1024MB

All other (relevant) settings are on defaults

Regards
Thomas




pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: query optimization
Next
From: Craig James
Date:
Subject: auto-vacuum vs. full table update