The Two Towers - Mailing list pgsql-novice
From | Mladen Gogala |
---|---|
Subject | The Two Towers |
Date | |
Msg-id | 4C0DADF2.9000305@vmsinfo.com Whole thread Raw |
Responses |
Re: The Two Towers
Re: The Two Towers |
List | pgsql-novice |
Battleground was a machine running CentOS 5.5 and the two towers were 2 databases. First, PostgreSQL 8.4.4: scott=# select count(*) from emp e1,emp e2, emp e3, emp e4,emp e5, emp e6; count --------- 7529536 (1 row) Time: 2127.345 ms scott=# select version(); version -------------------------------------------------------------------------------- -------------------------------- PostgreSQL 8.4.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 2008 0704 (Red Hat 4.1.2-46), 32-bit (1 row) Time: 1.008 ms scott=# explain analyze scott-# select count(*) from emp e1,emp e2, emp e3, emp e4,emp e5, emp e6; QUERY PLAN -------------------------------------------------------------------------------- --------------------------------------------------------- Aggregate (cost=1052615959933232.62..1052615959933232.62 rows=1 width=0) (actu al time=13684.811..13684.811 rows=1 loops=1) -> Nested Loop (cost=16.60..935975959933232.62 rows=46656000000000000 width =0) (actual time=0.038..10508.108 rows=7529536 loops=1) -> Nested Loop (cost=0.00..2855959933216.00 rows=77760000000000 width =0) (actual time=0.025..820.829 rows=537824 loops=1) -> Nested Loop (cost=0.00..4759933216.00 rows=129600000000 widt h=0) (actual time=0.021..59.701 rows=38416 loops=1) -> Nested Loop (cost=0.00..7933216.00 rows=216000000 widt h=0) (actual time=0.018..4.424 rows=2744 loops=1) -> Nested Loop (cost=0.00..13216.00 rows=360000 wid th=0) (actual time=0.015..0.333 rows=196 loops=1) -> Seq Scan on emp e1 (cost=0.00..16.00 rows= 600 width=0) (actual time=0.009..0.018 rows=14 loops=1) -> Seq Scan on emp e6 (cost=0.00..16.00 rows= 600 width=0) (actual time=0.002..0.010 rows=14 loops=14) -> Seq Scan on emp e5 (cost=0.00..16.00 rows=600 wi dth=0) (actual time=0.002..0.008 rows=14 loops=196) -> Seq Scan on emp e4 (cost=0.00..16.00 rows=600 width=0) (actual time=0.001..0.007 rows=14 loops=2744) -> Seq Scan on emp e3 (cost=0.00..16.00 rows=600 width=0) (actu al time=0.001..0.007 rows=14 loops=38416) -> Materialize (cost=16.60..22.60 rows=600 width=0) (actual time=0.00 0..0.006 rows=14 loops=537824) -> Seq Scan on emp e2 (cost=0.00..16.00 rows=600 width=0) (actu al time=0.003..0.012 rows=14 loops=1) Total runtime: 13684.977 ms (14 rows) Time: 13714.576 ms scott=# \d+ emp Table "public.emp" Column | Type | Modifiers | Storage | Description ----------+-----------------------------+-----------+----------+------------- empno | smallint | not null | plain | ename | character varying(10) | not null | extended | job | character varying(9) | | extended | mgr | smallint | | plain | hiredate | timestamp without time zone | | plain | sal | double precision | | plain | comm | double precision | | plain | deptno | smallint | | plain | Indexes: "emp_pkey" PRIMARY KEY, btree (empno) "emp_deptno_i" btree (deptno) "emp_ename_id" btree (ename) "ind_emp_deptno" btree (deptno) Foreign-key constraints: "fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno) Has OIDs: no Now, Oracle 11.2.0.1 with the April patch: SQL> set autotrace on SQL> select count(*) from emp e1,emp e2, emp e3, emp e4,emp e5, emp e6; COUNT(*) ---------- 7529536 Elapsed: 00:00:00.85 Execution Plan ---------------------------------------------------------- Plan hash value: 1049923164 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Cost (%CPU)| Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 157K (1)| 00:31:25 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | MERGE JOIN CARTESIAN | | 7529K| 157K (1)| 00:31:25 | | 3 | MERGE JOIN CARTESIAN | | 537K| 11224 (1)| 00:02:15 | | 4 | MERGE JOIN CARTESIAN | | 38416 | 808 (1)| 00:00:10 | | 5 | MERGE JOIN CARTESIAN | | 2744 | 62 (0)| 00:00:01 | | 6 | MERGE JOIN CARTESIAN | | 196 | 7 (0)| 00:00:01 | | 7 | INDEX FULL SCAN | PK_EMP | 14 | 1 (0)| 00:00:01 | | 8 | BUFFER SORT | | 14 | 6 (0)| 00:00:01| | 9 | INDEX FAST FULL SCAN| PK_EMP | 14 | 0 (0)| 00:00:01 | | 10 | BUFFER SORT | | 14 | 62 (0)| 00:00:01 | | 11 | INDEX FAST FULL SCAN | PK_EMP | 14 | 0 (0)| 00:00:01 | | 12 | BUFFER SORT | | 14 | 808 (1)| 00:00:10 | | 13 | INDEX FAST FULL SCAN | PK_EMP | 14 | 0 (0)| 00:00:01 | | 14 | BUFFER SORT | | 14 | 11224 (1)| 00:02:15 | | 15 | INDEX FAST FULL SCAN | PK_EMP | 14 | 0 (0)| 00:00:01 | | 16 | BUFFER SORT | | 14 | 157K (1)| 00:31:25 | | 17 | INDEX FAST FULL SCAN | PK_EMP | 14 | 0 (0)| 00:00:01 | ------------------------------------------------------------------------------ Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 21 consistent gets 0 physical reads 0 redo size 425 bytes sent via SQL*Net to client 419 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 5 sorts (memory) 0 sorts (disk) 1 rows processed SQL> SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production PL/SQL Release 11.2.0.1.0 - Production CORE 11.2.0.1.0 Production TNS for Linux: Version 11.2.0.1.0 - Production NLSRTL Version 11.2.0.1.0 - Production Both databases have current statistics, the data is exactly the same, as shown by the results. Oracle finished in 0.85 second (850 milliseconds) while PostgreSQL took 2127.345 ms, 2.5 times slower than Oracle. The difference that is obvious is the access path: Postgres chose nested loops, while Oracle chose merge join and utilized the primary key for the table. Postgres, apparently cannot do "fast full scan" of an index and doesn't know how to utilize the primary key in cases like this. However, my gripe is that the optimizer should have selected merge join, just like Oracle did. In this case, nested loops are definitely the wrong choice. Rule based optimizers, the kind of the optimizer that takes into consideration only the structure of the table, usually ends up being dominated by the nested loops method. Nested loops method usually dominates the OLTP type applications but can really mess up large reports. I am under the impression that Postgres query planner is geared toward the OLTP type of the database. Maybe a new parameter is needed that would somehow shift gears to "data warehouse use", on demand? I have to say, the advantage is still on the side of Sauron. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
pgsql-novice by date: