Oracle v. Postgres 9.0 query performance - Mailing list pgsql-performance

From Tony Capobianco
Subject Oracle v. Postgres 9.0 query performance
Date
Msg-id 1307545892.1990.23.camel@tony1.localdomain
Whole thread Raw
Responses Re: Oracle v. Postgres 9.0 query performance
Re: Oracle v. Postgres 9.0 query performance
List pgsql-performance
We are thiiiiiis close to moving our datawarehouse from Oracle to
Postgres.  This query is identical on both systems, but runs much, much
faster on Oracle.  Our Postgres host has far superior hardware and
tuning parameters have been set via pgtune.  Most everything else runs
faster in Postgres, except for this query.  In Oracle, we get a hash
join that takes about 2 minutes:

SQL> set line 200
delete from plan_table;
explain plan for
CREATE TABLE ecr_opens
as
select o.emailcampaignid, count(memberid) opencnt
  from openactivity o,ecr_sents s
 where s.emailcampaignid = o.emailcampaignid
 group by o.emailcampaignid;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
SQL>
13 rows deleted.

SQL>   2    3    4    5    6    7
Explained.

SQL> SQL>
PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4034426201


---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name               | Rows  |
Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

---------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT        |                    |  5094 |
91692 |  9651  (24)| 00:02:16 |        |      |            |
|   1 |  LOAD AS SELECT               | ECR_OPENS          |       |
|            |          |        |      |            |
|   2 |   PX COORDINATOR              |                    |       |
|            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10002           |  5094 |
91692 |  2263 (100)| 00:00:32 |  Q1,02 | P->S | QC (RAND)  |
|   4 |     HASH GROUP BY             |                    |  5094 |
91692 |  2263 (100)| 00:00:32 |  Q1,02 | PCWP |            |
|   5 |      PX RECEIVE               |                    |  5094 |
91692 |  2263 (100)| 00:00:32 |  Q1,02 | PCWP |            |

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   6 |       PX SEND HASH            | :TQ10001           |  5094 |
91692 |  2263 (100)| 00:00:32 |  Q1,01 | P->P | HASH       |
|   7 |        HASH GROUP BY          |                    |  5094 |
91692 |  2263 (100)| 00:00:32 |  Q1,01 | PCWP |            |
|   8 |         NESTED LOOPS          |                    |    17M|
297M|   200  (98)| 00:00:03 |  Q1,01 | PCWP |            |
|   9 |          BUFFER SORT          |                    |       |
|            |          |  Q1,01 | PCWC |            |
|  10 |           PX RECEIVE          |                    |       |
|            |          |  Q1,01 | PCWP |            |
|  11 |            PX SEND ROUND-ROBIN| :TQ10000           |       |
|            |          |        | S->P | RND-ROBIN  |
|  12 |             TABLE ACCESS FULL | ECR_SENTS          |   476 |
6188 |     3   (0)| 00:00:01 |        |      |            |
|* 13 |          INDEX RANGE SCAN     | OPENACT_EMCAMP_IDX | 36355 |
177K|     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |

---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------

  13 - access("S"."EMAILCAMPAIGNID"="O"."EMAILCAMPAIGNID")

Note
-----
   - dynamic sampling used for this statement

29 rows selected.

SQL> desc openactivity
 Name                                      Null?    Type
 ----------------------------------------- --------
----------------------------
 EMAILCAMPAIGNID                           NOT NULL NUMBER
 MEMBERID                                  NOT NULL NUMBER
 OPENDATE                                           DATE
 IPADDRESS                                          VARCHAR2(25)
 DATE_ID                                            NUMBER

SQL> select count(*) from openactivity;

  COUNT(*)
----------
 192542480

SQL> desc ecr_sents
 Name                                      Null?    Type
 ----------------------------------------- --------
----------------------------
 EMAILCAMPAIGNID                                    NUMBER
 MEMCNT                                             NUMBER
 DATE_ID                                            NUMBER
 SENTDATE                                           DATE


SQL> select count(*) from ecr_sents;

  COUNT(*)
----------
       476

Our final result is the ecr_opens table which is 476 rows.

On Postgres, this same query takes about 58 minutes (could not run
explain analyze because it is in progress):

pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
                                                 QUERY
PLAN
-------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
   ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
         ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
(cost=0.00..38.59 rows=479 width=4)
         ->  Index Scan using openact_emcamp_idx on openactivity o
(cost=0.00..3395.49 rows=19372 width=12)
               Index Cond: (o.emailcampaignid = s.emailcampaignid)
(5 rows)


pg_dw=# \d openactivity
             Table "openactivity"
     Column      |         Type          | Modifiers
-----------------+-----------------------+-----------
 emailcampaignid | integer               | not null
 memberid        | bigint                | not null
 opendate        | date                  |
 ipaddress       | character varying(25) |
 date_id         | integer               |
Indexes:
    "openact_dateid_idx" btree (date_id), tablespace "pg_idx"
    "openact_emcamp_idx" btree (emailcampaignid), tablespace "pg_idx"

pg_dw=# select count(*) from openactivity;
   count
-----------
 192542480

pg_dw=# \d ecr_sents
       Table "staging.ecr_sents"
     Column      |  Type   | Modifiers
-----------------+---------+-----------
 emailcampaignid | integer |
 memcnt          | numeric |
 date_id         | integer |
 sentdate        | date    |
Indexes:
    "ecr_sents_ecid_idx" btree (emailcampaignid), tablespace
"staging_idx"

pg_dw=# select count(*) from ecr_sents;
 count
-------
   479

We added an index on ecr_sents to see if that improved performance, but
did not work.  Both tables have updated stats:


pg_dw=# select relname, last_vacuum, last_autovacuum, last_analyze,
last_autoanalyze from pg_stat_all_tables where relname in
('openactivity','ecr_sents');
   relname    |          last_vacuum          | last_autovacuum |
last_analyze          |       last_autoanalyze

--------------+-------------------------------+-----------------+-------------------------------+-------------------------------
 ecr_sents    |                               |                 |
2011-06-08 10:31:20.677172-04 | 2011-06-08 10:31:34.545504-04
 openactivity | 2011-06-02 16:34:47.129695-04 |                 |
2011-06-07 13:48:21.909546-04 | 2011-04-27 17:49:15.004551-04

Relevant info:
pg_dw=# SELECT
pg_dw-#    'version'::text AS "name",
pg_dw-#    version() AS "current_setting"
pg_dw-#  UNION ALL
pg_dw-#  SELECT
pg_dw-#    name,current_setting(name)
pg_dw-#  FROM pg_settings
pg_dw-#  WHERE NOT source='default' AND NOT name IN
pg_dw-#    ('config_file','data_directory','hba_file','ident_file',
pg_dw(#    'log_timezone','DateStyle','lc_messages','lc_monetary',
pg_dw(#    'lc_numeric','lc_time','timezone_abbreviations',
pg_dw(#    'default_text_search_config','application_name',
pg_dw(#    'transaction_deferrable','transaction_isolation',
pg_dw(#    'transaction_read_only');
             name             |
current_setting

------------------------------+-------------------------------------------------------------------------------------------------------------------
 version                      | PostgreSQL 9.0.3 on
x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red
Hat 4.1.2-48), 64-bit
 archive_command              | (disabled)
 archive_timeout              | 1h
 autovacuum_max_workers       | 10
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 64
 checkpoint_timeout           | 1h
 constraint_exclusion         | on
 default_statistics_target    | 100
 effective_cache_size         | 22GB
 effective_io_concurrency     | 5
 fsync                        | on
 lc_collate                   | en_US.UTF-8
 lc_ctype                     | en_US.UTF-8
 listen_addresses             | *
 log_checkpoints              | on
 log_destination              | stderr
 log_directory                | pg_log
 log_error_verbosity          | default
 log_filename                 | pg_dw.log
 log_line_prefix              | %m-%u-%p
 log_lock_waits               | on
 log_min_error_statement      | panic
 log_min_messages             | notice
 log_rotation_age             | 0
 log_rotation_size            | 0
 log_truncate_on_rotation     | off
 logging_collector            | on
 maintenance_work_mem         | 1GB
 max_connections              | 400
 max_stack_depth              | 2MB
 search_path                  | xxxxx
 server_encoding              | UTF8
 shared_buffers               | 7680MB
 TimeZone                     | US/Eastern
 wal_buffers                  | 32MB
 wal_level                    | archive
 work_mem                     | 768MB

Should this query be hashing the smaller table on Postgres rather than
using nested loops?

Thanks.
Tony


pgsql-performance by date:

Previous
From: Shaun Thomas
Date:
Subject: Re: Set of related slow queries
Next
From: tv@fuzzy.cz
Date:
Subject: Re: Oracle v. Postgres 9.0 query performance