The Two Towers - Mailing list pgsql-novice

From Mladen Gogala
Subject The Two Towers
Date
Msg-id 4C0DADF2.9000305@vmsinfo.com
Whole thread Raw
Responses Re: The Two Towers
Re: The Two Towers
List pgsql-novice
Battleground was a machine running CentOS 5.5 and the two towers were 2
databases. First, PostgreSQL 8.4.4:

scott=#  select count(*) from emp
e1,emp e2, emp e3, emp e4,emp e5, emp e6;
    count 
  ---------
   7529536
  (1 row)

  Time: 2127.345 ms
  scott=# select version();

                                                   

version                    


                               

  --------------------------------------------------------------------------------
  --------------------------------
   PostgreSQL 8.4.4 on i686-redhat-linux-gnu,
compiled by GCC gcc (GCC) 4.1.2 2008
  0704 (Red Hat 4.1.2-46), 32-bit
  (1 row)

  Time: 1.008 ms
  scott=# explain analyze
  scott-# select count(*) from emp e1,emp e2,
emp e3, emp e4,emp e5, emp e6;

                                                              
QUERY PLAN      

                                                        

  --------------------------------------------------------------------------------
  ---------------------------------------------------------
   Aggregate 
(cost=1052615959933232.62..1052615959933232.62 rows=1 width=0) (actu
  al time=13684.811..13684.811 rows=1 loops=1)
     ->  Nested Loop 
(cost=16.60..935975959933232.62 rows=46656000000000000 width
  =0) (actual time=0.038..10508.108
rows=7529536 loops=1)
           ->  Nested Loop 
(cost=0.00..2855959933216.00 rows=77760000000000 width
  =0) (actual time=0.025..820.829 rows=537824
loops=1)
                 ->  Nested Loop 
(cost=0.00..4759933216.00 rows=129600000000 widt
  h=0) (actual time=0.021..59.701 rows=38416
loops=1)

                    
-> Nested Loop  
(cost=0.00..7933216.00 rows=216000000 widt
  h=0) (actual time=0.018..4.424 rows=2744
loops=1)

                          
-> Nested 
Loop  (cost=0.00..13216.00 rows=360000 wid
  th=0) (actual time=0.015..0.333 rows=196
loops=1)

                                
-> Seq 
Scan on emp e1  (cost=0.00..16.00 rows=
  600 width=0) (actual time=0.009..0.018
rows=14 loops=1)

                                
-> Seq 
Scan on emp e6  (cost=0.00..16.00 rows=
  600 width=0) (actual time=0.002..0.010
rows=14 loops=14)

                          
-> Seq Scan 
on emp e5  (cost=0.00..16.00 rows=600 wi
  dth=0) (actual time=0.002..0.008 rows=14
loops=196)

                    
-> Seq Scan on emp 
e4  (cost=0.00..16.00 rows=600 width=0)
   (actual time=0.001..0.007 rows=14
loops=2744)
                 ->  Seq Scan on emp
e3 
(cost=0.00..16.00 rows=600 width=0) (actu
  al time=0.001..0.007 rows=14 loops=38416)
           ->  Materialize 
(cost=16.60..22.60 rows=600 width=0) (actual time=0.00
  0..0.006 rows=14 loops=537824)
                 ->  Seq Scan on emp
e2 
(cost=0.00..16.00 rows=600 width=0) (actu
  al time=0.003..0.012 rows=14 loops=1)
   Total runtime: 13684.977 ms
  (14 rows)
  Time: 13714.576 ms
  scott=# \d+ emp

                            
Table
"public.emp"
    Column  |           
Type            | 
Modifiers | Storage  | Description
  ----------+-----------------------------+-----------+----------+-------------
   empno    |
smallint                   
|
not null  | plain    |
   ename    | character varying(10)       |
not null  | extended |
   job      | character varying(9)       
|           | extended |
   mgr      |
smallint                   
|           | plain    |
   hiredate | timestamp without time zone
|           | plain    |
   sal      | double
precision           
|           | plain    |
   comm     | double
precision           
|           | plain    |
   deptno   |
smallint                   
|           | plain    |
  Indexes:
      "emp_pkey" PRIMARY KEY, btree (empno)
      "emp_deptno_i" btree (deptno)
      "emp_ename_id" btree (ename)
      "ind_emp_deptno" btree (deptno)
  Foreign-key constraints:
      "fk_deptno" FOREIGN KEY (deptno)
REFERENCES dept(deptno)
  Has OIDs: no


Now, Oracle 11.2.0.1 with the April patch:
SQL> set autotrace on
  SQL> select count(*) from emp e1,emp e2,
emp e3, emp e4,emp e5, emp e6;

    COUNT(*)
  ----------
     7529536

  Elapsed: 00:00:00.85

  Execution Plan
  ----------------------------------------------------------
  Plan hash value: 1049923164

  ------------------------------------------------------------------------------
  | Id  |
Operation                  | 
Name   | Rows  | Cost (%CPU)| Time     |
  ------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT           
|        |     1 |   157K  (1)| 00:31:25 |
  |   1 |  SORT AGGREGATE            
|        |     1
|           
|         | 
  |   2 |   MERGE JOIN CARTESIAN     
|        |  7529K|   157K  (1)| 00:31:25 |
  |   3 |    MERGE JOIN CARTESIAN    
|        |   537K| 11224   (1)| 00:02:15 |
  |   4 |     MERGE JOIN CARTESIAN   
|        | 38416 |   808   (1)| 00:00:10 |
  |   5 |      MERGE JOIN CARTESIAN  
|        |  2744 |    62   (0)| 00:00:01 |
  |   6 |       MERGE JOIN CARTESIAN 
|        |   196 |     7   (0)| 00:00:01 |
  |   7 |        INDEX FULL SCAN      |
PK_EMP |    14 |     1   (0)| 00:00:01 |
  |   8 |        BUFFER
SORT         
|        |    14 |     6   (0)|
00:00:01| 
  |   9 |         INDEX FAST FULL SCAN|
PK_EMP |    14 |     0   (0)| 00:00:01 |
  |  10 |       BUFFER
SORT          
|        |    14 |    62   (0)| 00:00:01 |
  |  11 |        INDEX FAST FULL SCAN |
PK_EMP |    14 |     0   (0)| 00:00:01 |
  |  12 |      BUFFER
SORT           
|        |    14 |   808   (1)| 00:00:10 |
  |  13 |       INDEX FAST FULL SCAN  |
PK_EMP |    14 |     0   (0)| 00:00:01 |
  |  14 |     BUFFER
SORT            
|        |    14 | 11224   (1)| 00:02:15 |
  |  15 |      INDEX FAST FULL SCAN   |
PK_EMP |    14 |     0   (0)| 00:00:01 |
  |  16 |    BUFFER
SORT             
|        |    14 |   157K  (1)| 00:31:25 |
  |  17 |     INDEX FAST FULL SCAN    |
PK_EMP |    14 |     0   (0)| 00:00:01 |
  ------------------------------------------------------------------------------


  Statistics
  ----------------------------------------------------------
            0  recursive calls
            0  db block gets
           21  consistent gets
            0  physical reads
            0  redo size
          425  bytes sent via SQL*Net to
client
          419  bytes received via SQL*Net
from client
            2  SQL*Net roundtrips to/from
client
            5  sorts (memory)
            0  sorts (disk)
            1  rows processed

  SQL>

  SQL> select * from v$version;

  BANNER
  --------------------------------------------------------------------------------
  Oracle Database 11g Enterprise Edition
Release 11.2.0.1.0 - Production
  PL/SQL Release 11.2.0.1.0 - Production
  CORE    11.2.0.1.0      Production
  TNS for Linux: Version 11.2.0.1.0 -
Production
  NLSRTL Version 11.2.0.1.0 - Production


Both databases have current statistics, the data is exactly the same,
as shown by the results. Oracle finished in 0.85 second (850
milliseconds) while PostgreSQL took 2127.345 ms, 2.5 times slower than
Oracle. The difference that is obvious is the access path: Postgres
chose nested loops, while Oracle chose merge join and utilized the
primary key for the table. Postgres, apparently cannot do "fast full
scan" of an index and doesn't know how to utilize the primary key in
cases like this. However, my gripe is that the optimizer should have
selected merge join, just like Oracle did. In this case, nested loops
are definitely the wrong choice. Rule based optimizers, the kind of the
optimizer that takes into consideration only the structure of the
table, usually ends up being dominated by the nested loops method.
Nested loops method usually dominates the OLTP type applications but
can really mess up large reports. I am under the impression that
Postgres query planner is geared toward the OLTP type of the database.
Maybe a new parameter is needed that would somehow shift gears to "data
warehouse use", on demand? I have to say, the advantage is still on the
side of Sauron.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com

pgsql-novice by date:

Previous
From: Mladen Gogala
Date:
Subject: Re: The escape clause in the SELECT statement in PostgreSQL 8.4
Next
From: Donn Washburn
Date:
Subject: Re: The Two Towers