Thread: Confusing Query Performance

Confusing Query Performance

From

"Gauri Kanekar"

Date:

01 October 2008, 07:34:44

Hi,

We have a table called "table1" which contains around 638725448 records.
We created a subset of this table and named it as "new_table1" which has around 120107519 records.

"new_table1" is 18% of the the whole "table1".

If we fire the below queries we are not finding any drastic performance gain.

Query 1 :
SELECT SUM(table1.idlv), SUM(table1.cdlv)
       FROM table1, table2 CROSS JOIN table3
       WHERE table1.dk = table2.k
             AND table2.dt BETWEEN '2008.08.01' AND '2008.08.20'
             AND table1.nk = table3.k
             AND table3.id = 999 ;

Time taken :
9967.051 ms
9980.021 ms

                                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=647373.04..647373.05 rows=1 width=16) (actual time=9918.010..9918.010 rows=1 loops=1)
   -> Nested Loop (cost=186.26..647160.32 rows=42543 width=16) (actual time=655.832..6622.011 rows=5120582 loops=1)
         -> Nested Loop (cost=0.00..17.42 rows=30 width=8) (actual time=0.024..0.164 rows=31 loops=1)
               -> Index Scan using ridx on table3 (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.014 rows=1 loops=1)
                     Index Cond: (id = 999)
               -> Index Scan using rdtidx on table2 (cost=0.00..8.85 rows=30 width=4) (actual time=0.008..0.110 rows=31 loops=1)
                     Index Cond: ((table2.dt >= '2008-08-01 00:00:00'::timestamp without time zone) AND (table2.dt <= '2008-08-20 00:00:00'::timestamp without time zone))
         -> Bitmap Heap Scan on table1 (cost=186.26..21489.55 rows=5459 width=24) (actual time=57.053..170.657 rows=165180 loops=31)
               Recheck Cond: ((table1.nk = table3.k) AND (table1.dk = table2.k))
               -> Bitmap Index Scan on rndtidx (cost=0.00..184.89 rows=5459 width=0) (actual time=47.855..47.855 rows=165180 loops=31)
                     Index Cond: ((table1.nk = table3.k) AND (table1.dk = table2.k))
Total runtime: 9918.118 ms
(12 rows)

Time: 9967.051 ms

Query 2 :

SELECT SUM(new_table1.idlv) , SUM(new_table1.cdlv)
       FROM new_table1, table2 CROSS JOIN table3
       WHERE new_table1.dk = table2.k
             AND table2.dt BETWEEN '2008.08.01' AND '2008.08.20'
             AND new_table1.nk = table3.k
             AND table3.id = 999 ;

Time taken :
8225.308 ms
8500.728 ms

                                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=414372.59..414372.60 rows=1 width=16) (actual time=8224.300..8224.300 rows=1 loops=1)
   -> Nested Loop (cost=0.00..414246.81 rows=25155 width=16) (actual time=19.578..4922.680 rows=5120582 loops=1)
         -> Nested Loop (cost=0.00..17.42 rows=30 width=8) (actual time=0.034..0.125 rows=31 loops=1)
               -> Index Scan using ridx on table3 (cost=0.00..8.27 rows=1 width=4) (actual time=0.020..0.022 rows=1 loops=1)
                     Index Cond: (id = 999)
               -> Index Scan using rdtidx on table2 (cost=0.00..8.85 rows=30 width=4) (actual time=0.010..0.064 rows=31 loops=1)
                     Index Cond: ((table2.dt >= '2008-08-01 00:00:00'::timestamp without time zone) AND (table2.dt <= '2008-08-20 00:00:00'::timestamp without time zone))
         -> Index Scan using rndtidx on new_table1 (cost=0.00..13685.26 rows=8159 width=24) (actual time=0.648..117.415 rows=165180 loops=31)
               Index Cond: ((new_table1.nk = table3.k) AND (new_table1.dk = table2.k))
Total runtime: 8224.386 ms
(10 rows)

Time: 8225.308 ms

We have set join_collapse_limit = 8, from_collapse_limit = 1.

--
Regards
Gauri

Re: Confusing Query Performance

From

Matthew Wakeling

Date:

01 October 2008, 07:50:03

On Wed, 1 Oct 2008, Gauri Kanekar wrote:
> "new_table1" is 18% of the the whole "table1".

>    ->  Nested Loop  (cost=186.26..647160.32 rows=42543 width=16) (actual time=655.832..6622.011 rows=5120582 loops=1)

>    ->  Nested Loop  (cost=0.00..414246.81 rows=25155 width=16) (actual time=19.578..4922.680 rows=5120582 loops=1)

The new table may be that much smaller than the old table, but you're
selecting exactly the same amount of data from it. The data is fetched by
indexes, which means random access, so the overall size of the data that
you don't fetch doesn't make any difference.

Matthew

--
Existence is a convenient concept to designate all of the files that an
executable program can potentially process.   -- Fortran77 standard

Re: Confusing Query Performance

From

Josh Berkus

Date:

01 October 2008, 17:05:39

On Wednesday 01 October 2008 03:34, Gauri Kanekar wrote:
>    ->  Nested Loop  (cost=186.26..647160.32 rows=42543 width=16) (actual
> time=655.832..6622.011 rows=5120582 loops=1)

That nested loop estimate is off by 100x, which is why the DB is using a
slow nested loop for a large amount of data.  I'd try increasing your
statistics collection, analyze, and re-run the query.

--
--Josh

Josh Berkus
PostgreSQL
San Francisco