Re: query optimization - Mailing list pgsql-performance

From Richard Kojedzinszky
Subject Re: query optimization
Date
Msg-id alpine.DEB.2.02.1204270653110.22041@krichy.tvnetwork.hu
Whole thread Raw
In response to Re: query optimization  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Dear Tom,

Thanks for your response. I my test cases, until an analyze have been run,
the queries run fast. After only a have been analyzed, the query plan
changes so that a sequence scan for b and c tables is done, and joining
them with 'a' is done within memory.

So, my tests are here, with autovacuum turned off:

krichy=> \i test.sql
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 100000
INSERT 0 100000
INSERT 0 100000
INSERT 0 100000
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
krichy=> explain analyze select * from a left join (b inner join c on b.id
= c.id) on a.b = b.id where a.id = 1;
                                                                 QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
  Merge Left Join  (cost=504.88..2633669.90 rows=125000000 width=16)
(actual time=0.183..0.189 rows=1 loops=1)
    Merge Cond: (a.b = b.id)
    ->  Sort  (cost=504.88..506.13 rows=500 width=8) (actual
time=0.073..0.074 rows=1 loops=1)
          Sort Key: a.b
          Sort Method: quicksort  Memory: 17kB
          ->  Bitmap Heap Scan on a  (cost=12.13..482.47 rows=500 width=8)
(actual time=0.048..0.048 rows=1 loops=1)
                Recheck Cond: (id = 1)
                ->  Bitmap Index Scan on a_idx1  (cost=0.00..12.01 rows=500
width=0) (actual time=0.044..0.044 rows=1 loops=1)
                      Index Cond: (id = 1)
    ->  Materialize  (cost=0.00..883162.52 rows=50000000 width=8) (actual
time=0.103..0.108 rows=2 loops=1)
          ->  Merge Join  (cost=0.00..758162.52 rows=50000000 width=8)
(actual time=0.100..0.104 rows=2 loops=1)
                Merge Cond: (b.id = c.id)
                ->  Index Scan using b_idx1 on b  (cost=0.00..3956.26
rows=100000 width=4) (actual time=0.050..0.050 rows=2 loops=1)
                ->  Materialize  (cost=0.00..4206.26 rows=100000 width=4)
(actual time=0.048..0.049 rows=2 loops=1)
                      ->  Index Scan using c_idx1 on c  (cost=0.00..3956.26
rows=100000 width=4) (actual time=0.046..0.047 rows=2 loops=1)
  Total runtime: 0.276 ms
(16 rows)

krichy=> ANALYZE a;
ANALYZE
krichy=> explain analyze select * from a left join (b inner join c on b.id
= c.id) on a.b = b.id where a.id = 1;
                                                              QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Merge Right Join  (cost=8.29..883178.31 rows=500 width=16) (actual
time=0.050..0.056 rows=1 loops=1)
    Merge Cond: (b.id = a.b)
    ->  Merge Join  (cost=0.00..758162.52 rows=50000000 width=8) (actual
time=0.034..0.038 rows=2 loops=1)
          Merge Cond: (b.id = c.id)
          ->  Index Scan using b_idx1 on b  (cost=0.00..3956.26 rows=100000
width=4) (actual time=0.015..0.016 rows=2 loops=1)
          ->  Materialize  (cost=0.00..4206.26 rows=100000 width=4) (actual
time=0.015..0.017 rows=2 loops=1)
                ->  Index Scan using c_idx1 on c  (cost=0.00..3956.26
rows=100000 width=4) (actual time=0.012..0.013 rows=2 loops=1)
    ->  Materialize  (cost=8.29..8.29 rows=1 width=8) (actual
time=0.015..0.016 rows=1 loops=1)
          ->  Sort  (cost=8.29..8.29 rows=1 width=8) (actual
time=0.013..0.013 rows=1 loops=1)
                Sort Key: a.b
                Sort Method: quicksort  Memory: 17kB
                ->  Index Scan using a_idx1 on a  (cost=0.00..8.28 rows=1
width=8) (actual time=0.007..0.008 rows=1 loops=1)
                      Index Cond: (id = 1)
  Total runtime: 0.101 ms
(14 rows)

krichy=> ANALYZE b;
ANALYZE
krichy=> explain analyze select * from a left join (b inner join c on b.id
= c.id) on a.b = b.id where a.id = 1;
                                                         QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
  Hash Right Join  (cost=2651.29..6419.30 rows=1 width=16) (actual
time=83.823..257.890 rows=1 loops=1)
    Hash Cond: (b.id = a.b)
    ->  Hash Join  (cost=2643.00..6036.00 rows=100000 width=8) (actual
time=83.790..224.552 rows=100000 loops=1)
          Hash Cond: (c.id = b.id)
          ->  Seq Scan on c  (cost=0.00..1393.00 rows=100000 width=4)
(actual time=0.010..35.925 rows=100000 loops=1)
          ->  Hash  (cost=1393.00..1393.00 rows=100000 width=4) (actual
time=83.752..83.752 rows=100000 loops=1)
                Buckets: 16384  Batches: 1  Memory Usage: 2344kB
                ->  Seq Scan on b  (cost=0.00..1393.00 rows=100000 width=4)
(actual time=0.009..36.302 rows=100000 loops=1)
    ->  Hash  (cost=8.28..8.28 rows=1 width=8) (actual time=0.012..0.012
rows=1 loops=1)
          Buckets: 1024  Batches: 1  Memory Usage: 1kB
          ->  Index Scan using a_idx1 on a  (cost=0.00..8.28 rows=1
width=8) (actual time=0.007..0.008 rows=1 loops=1)
                Index Cond: (id = 1)
  Total runtime: 258.245 ms
(13 rows)

krichy=> ANALYZE c;
ANALYZE
krichy=> explain analyze select * from a left join (b inner join c on b.id
= c.id) on a.b = b.id where a.id = 1;
                                                         QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
  Hash Right Join  (cost=2651.29..6419.30 rows=1 width=16) (actual
time=83.295..255.653 rows=1 loops=1)
    Hash Cond: (b.id = a.b)
    ->  Hash Join  (cost=2643.00..6036.00 rows=100000 width=8) (actual
time=83.275..222.373 rows=100000 loops=1)
          Hash Cond: (b.id = c.id)
          ->  Seq Scan on b  (cost=0.00..1393.00 rows=100000 width=4)
(actual time=0.010..35.726 rows=100000 loops=1)
          ->  Hash  (cost=1393.00..1393.00 rows=100000 width=4) (actual
time=83.238..83.238 rows=100000 loops=1)
                Buckets: 16384  Batches: 1  Memory Usage: 2344kB
                ->  Seq Scan on c  (cost=0.00..1393.00 rows=100000 width=4)
(actual time=0.009..36.243 rows=100000 loops=1)
    ->  Hash  (cost=8.28..8.28 rows=1 width=8) (actual time=0.011..0.011
rows=1 loops=1)
          Buckets: 1024  Batches: 1  Memory Usage: 1kB
          ->  Index Scan using a_idx1 on a  (cost=0.00..8.28 rows=1
width=8) (actual time=0.008..0.009 rows=1 loops=1)
                Index Cond: (id = 1)
  Total runtime: 256.008 ms
(13 rows)

So after analyzing all the tables, the result changed much, psql uses
other plans to do the query, and in my case it is effectively much slower.

My configuration file has:
work_mem=16MB

with this removed, the query goes fast again, but I dont know why the more
memory makes psql chose a worse plan.

Thanks in advance,

Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Thu, 26 Apr 2012, Tom Lane wrote:

> Date: Thu, 26 Apr 2012 15:17:18 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Richard Kojedzinszky <krichy@tvnetwork.hu>
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] query optimization
>
> Richard Kojedzinszky <krichy@tvnetwork.hu> writes:
>> Dear list,
>> We have a database schema, which looks the same as the attached script.
>
>> When filling the tables with data, and skipping analyze on the table (so
>> pg_stats contains no records for table 'a'), the first select in the
>> script runs fast, but after an analyze the planner decides to sequence
>> scan tables b and c, thus making the query much slower. Can somebody help
>> me solving this issue, or tuning our installation to not to use sequence
>> scans in this case?
>
> Um ... did you analyze all the tables, or just some of them?  I get
> sub-millisecond runtimes if all four tables have been analyzed, but it
> does seem to pick lousy plans if, say, only a and b have been analyzed.
>
> What you really need for this query structure is the parameterized-path
> work I've been doing for 9.2; but at least on the exact example given,
> I'm not seeing that 9.1 is that much worse.
>
>             regards, tom lane
>

pgsql-performance by date:

Previous
From: Jan Nielsen
Date:
Subject: Re: Configuration Recommendations
Next
From: Eyal Wilde
Date:
Subject: Re: scale up (postgresql vs mssql)