Thread: Oracle v. Postgres 9.0 query performance
We are thiiiiiis close to moving our data warehouse from Oracle to Postgres. This query is identical on both systems, but runs much, much faster on Oracle. Our Postgres host has far superior hardware, and tuning parameters have been set via pgtune. Almost everything else runs faster in Postgres, except for this query. In Oracle, we get a hash join that takes about 2 minutes:

SQL> set line 200
SQL> delete from plan_table;

13 rows deleted.

SQL> explain plan for
  2  CREATE TABLE ecr_opens as
  3  select o.emailcampaignid, count(memberid) opencnt
  4  from openactivity o, ecr_sents s
  5  where s.emailcampaignid = o.emailcampaignid
  6  group by o.emailcampaignid;

Explained.

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4034426201

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name               | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT        |                    |  5094 | 91692 |  9651  (24)| 00:02:16 |        |      |            |
|   1 |  LOAD AS SELECT               | ECR_OPENS          |       |       |            |          |        |      |            |
|   2 |   PX COORDINATOR              |                    |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10002           |  5094 | 91692 |  2263 (100)| 00:00:32 |  Q1,02 | P->S | QC (RAND)  |
|   4 |     HASH GROUP BY             |                    |  5094 | 91692 |  2263 (100)| 00:00:32 |  Q1,02 | PCWP |            |
|   5 |      PX RECEIVE               |                    |  5094 | 91692 |  2263 (100)| 00:00:32 |  Q1,02 | PCWP |            |
|   6 |       PX SEND HASH            | :TQ10001           |  5094 | 91692 |  2263 (100)| 00:00:32 |  Q1,01 | P->P | HASH       |
|   7 |        HASH GROUP BY          |                    |  5094 | 91692 |  2263 (100)| 00:00:32 |  Q1,01 | PCWP |            |
|   8 |         NESTED LOOPS          |                    |    17M|   297M|   200  (98)| 00:00:03 |  Q1,01 | PCWP |            |
|   9 |          BUFFER SORT          |                    |       |       |            |          |  Q1,01 | PCWC |            |
|  10 |           PX RECEIVE          |                    |       |       |            |          |  Q1,01 | PCWP |            |
|  11 |            PX SEND ROUND-ROBIN| :TQ10000           |       |       |            |          |        | S->P | RND-ROBIN  |
|  12 |             TABLE ACCESS FULL | ECR_SENTS          |   476 |  6188 |     3   (0)| 00:00:01 |        |      |            |
|* 13 |          INDEX RANGE SCAN     | OPENACT_EMCAMP_IDX | 36355 |   177K|     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  13 - access("S"."EMAILCAMPAIGNID"="O"."EMAILCAMPAIGNID")

Note
-----
   - dynamic sampling used for this statement

29 rows selected.

SQL> desc openactivity
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMAILCAMPAIGNID                           NOT NULL NUMBER
 MEMBERID                                  NOT NULL NUMBER
 OPENDATE                                           DATE
 IPADDRESS                                          VARCHAR2(25)
 DATE_ID                                            NUMBER

SQL> select count(*) from openactivity;

  COUNT(*)
----------
 192542480

SQL> desc ecr_sents
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMAILCAMPAIGNID                                    NUMBER
 MEMCNT                                             NUMBER
 DATE_ID                                            NUMBER
 SENTDATE                                           DATE

SQL> select count(*) from ecr_sents;

  COUNT(*)
----------
       476

Our final result is the ecr_opens table, which is 476 rows.
On Postgres, this same query takes about 58 minutes (could not run explain analyze because it is still in progress):

pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-# from openactivity o,ecr_sents s
pg_dw-# where s.emailcampaignid = o.emailcampaignid
pg_dw-# group by o.emailcampaignid;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
   ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
         ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s  (cost=0.00..38.59 rows=479 width=4)
         ->  Index Scan using openact_emcamp_idx on openactivity o  (cost=0.00..3395.49 rows=19372 width=12)
               Index Cond: (o.emailcampaignid = s.emailcampaignid)
(5 rows)

pg_dw=# \d openactivity
        Table "openactivity"
     Column      |         Type          | Modifiers
-----------------+-----------------------+-----------
 emailcampaignid | integer               | not null
 memberid        | bigint                | not null
 opendate        | date                  |
 ipaddress       | character varying(25) |
 date_id         | integer               |
Indexes:
    "openact_dateid_idx" btree (date_id), tablespace "pg_idx"
    "openact_emcamp_idx" btree (emailcampaignid), tablespace "pg_idx"

pg_dw=# select count(*) from openactivity;
   count
-----------
 192542480

pg_dw=# \d ecr_sents
   Table "staging.ecr_sents"
     Column      |  Type   | Modifiers
-----------------+---------+-----------
 emailcampaignid | integer |
 memcnt          | numeric |
 date_id         | integer |
 sentdate        | date    |
Indexes:
    "ecr_sents_ecid_idx" btree (emailcampaignid), tablespace "staging_idx"

pg_dw=# select count(*) from ecr_sents;
 count
-------
   479

We added an index on ecr_sents to see if that improved performance, but it did not help.
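[Editor's note: a minimal sketch for timing the plan without materializing the table, assuming the table and column names above. EXPLAIN's ANALYZE and BUFFERS options (available as of 9.0) can be run on the bare SELECT, so the CTAS does not have to complete first, and BUFFERS shows how much of the work came from cache versus disk.]

```sql
-- Sketch: profile the aggregation on its own, without creating ecr_opens.
-- BUFFERS reports shared-buffer hits vs. reads, which helps tell whether
-- a slow run was I/O-bound or a genuinely bad plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.emailcampaignid, count(o.memberid) AS opencnt
FROM openactivity o
JOIN ecr_sents s ON s.emailcampaignid = o.emailcampaignid
GROUP BY o.emailcampaignid;
```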
Both tables have updated stats:

pg_dw=# select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
pg_dw-# from pg_stat_all_tables where relname in ('openactivity','ecr_sents');
   relname    |          last_vacuum          | last_autovacuum |         last_analyze          |       last_autoanalyze
--------------+-------------------------------+-----------------+-------------------------------+-------------------------------
 ecr_sents    |                               |                 | 2011-06-08 10:31:20.677172-04 | 2011-06-08 10:31:34.545504-04
 openactivity | 2011-06-02 16:34:47.129695-04 |                 | 2011-06-07 13:48:21.909546-04 | 2011-04-27 17:49:15.004551-04

Relevant info:

pg_dw=# SELECT
pg_dw-#   'version'::text AS "name",
pg_dw-#   version() AS "current_setting"
pg_dw-# UNION ALL
pg_dw-# SELECT
pg_dw-#   name, current_setting(name)
pg_dw-# FROM pg_settings
pg_dw-# WHERE NOT source='default' AND NOT name IN
pg_dw-#   ('config_file','data_directory','hba_file','ident_file',
pg_dw(#    'log_timezone','DateStyle','lc_messages','lc_monetary',
pg_dw(#    'lc_numeric','lc_time','timezone_abbreviations',
pg_dw(#    'default_text_search_config','application_name',
pg_dw(#    'transaction_deferrable','transaction_isolation',
pg_dw(#    'transaction_read_only');

             name             |                                                  current_setting
------------------------------+-------------------------------------------------------------------------------------------------------------------
 version                      | PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
 archive_command              | (disabled)
 archive_timeout              | 1h
 autovacuum_max_workers       | 10
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 64
 checkpoint_timeout           | 1h
 constraint_exclusion         | on
 default_statistics_target    | 100
 effective_cache_size         | 22GB
 effective_io_concurrency     | 5
 fsync                        | on
 lc_collate                   | en_US.UTF-8
 lc_ctype                     | en_US.UTF-8
 listen_addresses             | *
 log_checkpoints              | on
 log_destination              | stderr
 log_directory                | pg_log
 log_error_verbosity          | default
 log_filename                 | pg_dw.log
 log_line_prefix              | %m-%u-%p
 log_lock_waits               | on
 log_min_error_statement      | panic
 log_min_messages             | notice
 log_rotation_age             | 0
 log_rotation_size            | 0
 log_truncate_on_rotation     | off
 logging_collector            | on
 maintenance_work_mem         | 1GB
 max_connections              | 400
 max_stack_depth              | 2MB
 search_path                  | xxxxx
 server_encoding              | UTF8
 shared_buffers               | 7680MB
 TimeZone                     | US/Eastern
 wal_buffers                  | 32MB
 wal_level                    | archive
 work_mem                     | 768MB

Should this query be hashing the smaller table on Postgres rather than using nested loops?

Thanks.
Tony
> On Postgres, this same query takes about 58 minutes (could not run
> explain analyze because it is in progress):
>
> pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> pg_dw-# as
> pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> pg_dw-# from openactivity o,ecr_sents s
> pg_dw-# where s.emailcampaignid = o.emailcampaignid
> pg_dw-# group by o.emailcampaignid;
>                                                  QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
>    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
>              (cost=0.00..38.59 rows=479 width=4)
>          ->  Index Scan using openact_emcamp_idx on openactivity o
>              (cost=0.00..3395.49 rows=19372 width=12)
>                Index Cond: (o.emailcampaignid = s.emailcampaignid)
> (5 rows)

Please post EXPLAIN ANALYZE, not just EXPLAIN, preferably using explain.depesz.com.

regards
Tomas
Tony Capobianco <tcapobianco@prospectiv.com> writes:
> pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> pg_dw-# as
> pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> pg_dw-# from openactivity o,ecr_sents s
> pg_dw-# where s.emailcampaignid = o.emailcampaignid
> pg_dw-# group by o.emailcampaignid;
>                                                  QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
>    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
>              (cost=0.00..38.59 rows=479 width=4)
>          ->  Index Scan using openact_emcamp_idx on openactivity o
>              (cost=0.00..3395.49 rows=19372 width=12)
>                Index Cond: (o.emailcampaignid = s.emailcampaignid)
> (5 rows)

> Should this query be hashing the smaller table on Postgres rather than
> using nested loops?

Yeah, seems like it.  Just for testing purposes, do "set enable_nestloop = 0" and see what plan you get then.

			regards, tom lane
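[Editor's note: a minimal sketch of Tom's experiment, scoped so the override cannot leak into other sessions. SET LOCAL applies only until the enclosing transaction ends, which is safer than a session-wide SET on a shared host; the query is the one from the thread.]

```sql
-- Sketch: try the plan with nested loops disabled, then roll back.
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN
SELECT o.emailcampaignid, count(o.memberid) AS opencnt
FROM openactivity o
JOIN ecr_sents s ON s.emailcampaignid = o.emailcampaignid
GROUP BY o.emailcampaignid;
ROLLBACK;
```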
pg_dw=# set enable_nestloop = 0;
SET
Time: 0.165 ms
pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-# from openactivity o,ecr_sents s
pg_dw-# where s.emailcampaignid = o.emailcampaignid
pg_dw-# group by o.emailcampaignid;
                                        QUERY PLAN
-----------------------------------------------------------------------------------------
 HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
   ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
         Hash Cond: (o.emailcampaignid = s.emailcampaignid)
         ->  Seq Scan on openactivity o  (cost=0.00..3529930.67 rows=192540967 width=12)
         ->  Hash  (cost=8.79..8.79 rows=479 width=4)
               ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479 width=4)

Yikes. Two sequential scans.

On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote:
> Tony Capobianco <tcapobianco@prospectiv.com> writes:
> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> > pg_dw-# as
> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> > pg_dw-# from openactivity o,ecr_sents s
> > pg_dw-# where s.emailcampaignid = o.emailcampaignid
> > pg_dw-# group by o.emailcampaignid;
> >                                                  QUERY PLAN
> > -------------------------------------------------------------------------------------------------------------
> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
> >    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> >          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> >              (cost=0.00..38.59 rows=479 width=4)
> >          ->  Index Scan using openact_emcamp_idx on openactivity o
> >              (cost=0.00..3395.49 rows=19372 width=12)
> >                Index Cond: (o.emailcampaignid = s.emailcampaignid)
> > (5 rows)
>
> > Should this query be hashing the smaller table on Postgres rather than
> > using nested loops?
>
> Yeah, seems like it.  Just for testing purposes, do "set enable_nestloop
> = 0" and see what plan you get then.
>
> 			regards, tom lane
* Tony Capobianco (tcapobianco@prospectiv.com) wrote:
>  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
>    ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
>          Hash Cond: (o.emailcampaignid = s.emailcampaignid)
>          ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
>              rows=192540967 width=12)
>          ->  Hash  (cost=8.79..8.79 rows=479 width=4)
>                ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
>                    width=4)
>
> Yikes. Two sequential scans.

Err, isn't that more-or-less exactly what you want here?  The smaller table is going to be hashed and then you'll traverse the bigger table and bounce each row off the hash table.  Have you tried actually running this and seeing how long it takes?  The bigger table doesn't look to be *that* big; if your i/o subsystem is decent and you've got a lot of memory available for kernel caching, it should be quick.

Thanks,

Stephen
On 08.06.11 18:40, Tony Capobianco wrote:
> pg_dw=# set enable_nestloop = 0;
> SET
> Time: 0.165 ms
> pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> pg_dw-# as
> pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> pg_dw-# from openactivity o,ecr_sents s
> pg_dw-# where s.emailcampaignid = o.emailcampaignid
> pg_dw-# group by o.emailcampaignid;
>                                         QUERY PLAN
> -----------------------------------------------------------------------------------------
>  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
>    ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
>          Hash Cond: (o.emailcampaignid = s.emailcampaignid)
>          ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
>              rows=192540967 width=12)
>          ->  Hash  (cost=8.79..8.79 rows=479 width=4)
>                ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
>                    width=4)
>
> Yikes. Two sequential scans.

Yep. Do you see another option? Either you take each of the 479 records and look up its matches in the other table via the index (first plan), or you read both tables fully (seq scans) and join them (second plan).

The first plan is better if your large table is clustered well enough on the emailcampaignid field (479 index reads plus 479 mostly-sequential table reads). If it's not, those 479 table reads can turn into a great many random reads.

BTW: maybe you have different data clustering in PostgreSQL and Oracle? Or the data in Oracle may be "hot" in its caches?

Also, a sequential scan is not such a bad thing. It may be cheap enough to read millions of records if they are not too wide. Please show "select pg_size_pretty(pg_relation_size('openactivity'));". Have you tried EXPLAIN ANALYZE on the second plan?
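[Editor's note: a sketch combining Vitalii's two checks; the catalog views and functions are standard, but interpreting them as described is the editor's gloss. pg_relation_size measures the heap alone, pg_total_relation_size adds indexes and TOAST; pg_stats.correlation near +1 or -1 means the heap is physically well ordered on that column, near 0 means an index scan will hop around randomly.]

```sql
-- On-disk size of the big table, heap only and with indexes/TOAST
SELECT pg_size_pretty(pg_relation_size('openactivity'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('openactivity')) AS total_size;

-- Physical clustering of the join column: this is what decides whether
-- the nested-loop plan's 479 index probes read the heap sequentially
-- (correlation near +/-1) or randomly (near 0).
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'openactivity' AND attname = 'emailcampaignid';
```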
Best regards,
Vitalii Tymchyshyn

> On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote:
>> Tony Capobianco <tcapobianco@prospectiv.com> writes:
>>> pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
>>> pg_dw-# as
>>> pg_dw-# select o.emailcampaignid, count(memberid) opencnt
>>> pg_dw-# from openactivity o,ecr_sents s
>>> pg_dw-# where s.emailcampaignid = o.emailcampaignid
>>> pg_dw-# group by o.emailcampaignid;
>>>                                                  QUERY PLAN
>>> -------------------------------------------------------------------------------------------------------------
>>>  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
>>>    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>>>          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
>>>              (cost=0.00..38.59 rows=479 width=4)
>>>          ->  Index Scan using openact_emcamp_idx on openactivity o
>>>              (cost=0.00..3395.49 rows=19372 width=12)
>>>                Index Cond: (o.emailcampaignid = s.emailcampaignid)
>>> (5 rows)
>>> Should this query be hashing the smaller table on Postgres rather than
>>> using nested loops?
>> Yeah, seems like it.  Just for testing purposes, do "set enable_nestloop
>> = 0" and see what plan you get then.
Here's the explain analyze:

pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
as
select o.emailcampaignid, count(memberid) opencnt
from openactivity o,ecr_sents s
where s.emailcampaignid = o.emailcampaignid
group by o.emailcampaignid;
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual time=308630.967..2592279.526 rows=472 loops=1)
   ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12) (actual time=31.489..2589363.047 rows=8586466 loops=1)
         ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s  (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479 loops=1)
         ->  Index Scan using openact_emcamp_idx on openactivity o  (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139 rows=17926 loops=479)
               Index Cond: (o.emailcampaignid = s.emailcampaignid)
 Total runtime: 2592284.336 ms

On Wed, 2011-06-08 at 17:31 +0200, tv@fuzzy.cz wrote:
> > On Postgres, this same query takes about 58 minutes (could not run
> > explain analyze because it is in progress):
> >
> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> > pg_dw-# as
> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> > pg_dw-# from openactivity o,ecr_sents s
> > pg_dw-# where s.emailcampaignid = o.emailcampaignid
> > pg_dw-# group by o.emailcampaignid;
> >                                                  QUERY PLAN
> > -------------------------------------------------------------------------------------------------------------
> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
> >    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> >          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> >              (cost=0.00..38.59 rows=479 width=4)
> >          ->  Index Scan using openact_emcamp_idx on openactivity o
> >              (cost=0.00..3395.49 rows=19372 width=12)
> >                Index Cond: (o.emailcampaignid = s.emailcampaignid)
> > (5 rows)
>
> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
> explain.depesz.com.
>
> regards
> Tomas
Hello

What are your settings for random_page_cost, seq_page_cost, and work_mem?

Regards

Pavel Stehule

2011/6/8 Tony Capobianco <tcapobianco@prospectiv.com>:
> Here's the explain analyze:
>
> pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
> as
> select o.emailcampaignid, count(memberid) opencnt
> from openactivity o,ecr_sents s
> where s.emailcampaignid = o.emailcampaignid
> group by o.emailcampaignid;
>                                                                            QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
> time=308630.967..2592279.526 rows=472 loops=1)
>    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> (actual time=31.489..2589363.047 rows=8586466 loops=1)
>          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
> loops=1)
>          ->  Index Scan using openact_emcamp_idx on openactivity o
> (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
> rows=17926 loops=479)
>                Index Cond: (o.emailcampaignid = s.emailcampaignid)
>  Total runtime: 2592284.336 ms
>
> On Wed, 2011-06-08 at 17:31 +0200, tv@fuzzy.cz wrote:
>> > On Postgres, this same query takes about 58 minutes (could not run
>> > explain analyze because it is in progress):
>> >
>> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
>> > pg_dw-# as
>> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
>> > pg_dw-# from openactivity o,ecr_sents s
>> > pg_dw-# where s.emailcampaignid = o.emailcampaignid
>> > pg_dw-# group by o.emailcampaignid;
>> >                                                  QUERY PLAN
>> > -------------------------------------------------------------------------------------------------------------
>> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
>> >    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>> >          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
>> >              (cost=0.00..38.59 rows=479 width=4)
>> >          ->  Index Scan using openact_emcamp_idx on openactivity o
>> >              (cost=0.00..3395.49 rows=19372 width=12)
>> >                Index Cond: (o.emailcampaignid = s.emailcampaignid)
>> > (5 rows)
>> >
>>
>> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
>> explain.depesz.com.
>>
>> regards
>> Tomas
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
Well, this ran much better. However, I'm not sure if it's because of set enable_nestloop = 0, or because I'm executing the query twice in a row, where previous results may be cached. I will try this setting in my code for when this process runs later today and see what the result is.

Thanks!

pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-# from openactivity o,ecr_sents s
pg_dw-# where s.emailcampaignid = o.emailcampaignid
pg_dw-# group by o.emailcampaignid;
                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12) (actual time=167254.751..167254.937 rows=472 loops=1)
   ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12) (actual time=0.300..164577.131 rows=8586466 loops=1)
         Hash Cond: (o.emailcampaignid = s.emailcampaignid)
         ->  Seq Scan on openactivity o  (cost=0.00..3529930.67 rows=192540967 width=12) (actual time=0.011..124351.878 rows=192542480 loops=1)
         ->  Hash  (cost=8.79..8.79 rows=479 width=4) (actual time=0.253..0.253 rows=479 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 17kB
               ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479 width=4) (actual time=0.010..0.121 rows=479 loops=1)
 Total runtime: 167279.950 ms

On Wed, 2011-06-08 at 11:51 -0400, Stephen Frost wrote:
> * Tony Capobianco (tcapobianco@prospectiv.com) wrote:
> >  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
> >    ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
> >          Hash Cond: (o.emailcampaignid = s.emailcampaignid)
> >          ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
> >              rows=192540967 width=12)
> >          ->  Hash  (cost=8.79..8.79 rows=479 width=4)
> >                ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
> >                    width=4)
> >
> > Yikes. Two sequential scans.
>
> Err, isn't that more-or-less exactly what you want here?  The smaller
> table is going to be hashed and then you'll traverse the bigger table
> and bounce each row off the hash table.  Have you tried actually running
> this and seeing how long it takes?  The bigger table doesn't look to be
> *that* big, if your i/o subsystem is decent and you've got a lot of
> memory available for kernel cacheing, should be quick.
>
> Thanks,
>
> Stephen
pg_dw=# show random_page_cost ;
 random_page_cost
------------------
 4
(1 row)

Time: 0.299 ms
pg_dw=# show seq_page_cost ;
 seq_page_cost
---------------
 1
(1 row)

Time: 0.250 ms
pg_dw=# show work_mem ;
 work_mem
----------
 768MB
(1 row)

On Wed, 2011-06-08 at 18:27 +0200, Pavel Stehule wrote:
> Hello
>
> What are your settings for random_page_cost, seq_page_cost and work_mem?
>
> Regards
>
> Pavel Stehule
>
> 2011/6/8 Tony Capobianco <tcapobianco@prospectiv.com>:
> > Here's the explain analyze:
> >
> > pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
> > as
> > select o.emailcampaignid, count(memberid) opencnt
> > from openactivity o,ecr_sents s
> > where s.emailcampaignid = o.emailcampaignid
> > group by o.emailcampaignid;
> >                                                                            QUERY PLAN
> > ----------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
> > time=308630.967..2592279.526 rows=472 loops=1)
> >    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> > (actual time=31.489..2589363.047 rows=8586466 loops=1)
> >          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> > (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
> > loops=1)
> >          ->  Index Scan using openact_emcamp_idx on openactivity o
> > (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
> > rows=17926 loops=479)
> >                Index Cond: (o.emailcampaignid = s.emailcampaignid)
> >  Total runtime: 2592284.336 ms
> >
> > On Wed, 2011-06-08 at 17:31 +0200, tv@fuzzy.cz wrote:
> >> > On Postgres, this same query takes about 58 minutes (could not run
> >> > explain analyze because it is in progress):
> >> >
> >> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> >> > pg_dw-# as
> >> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> >> > pg_dw-# from openactivity o,ecr_sents s
> >> > pg_dw-# where s.emailcampaignid = o.emailcampaignid
> >> > pg_dw-# group by o.emailcampaignid;
> >> >                                                  QUERY PLAN
> >> > -------------------------------------------------------------------------------------------------------------
> >> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
> >> >    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> >> >          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> >> >              (cost=0.00..38.59 rows=479 width=4)
> >> >          ->  Index Scan using openact_emcamp_idx on openactivity o
> >> >              (cost=0.00..3395.49 rows=19372 width=12)
> >> >                Index Cond: (o.emailcampaignid = s.emailcampaignid)
> >> > (5 rows)
> >> >
> >>
> >> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
> >> explain.depesz.com.
> >>
> >> regards
> >> Tomas
Tony Capobianco <tcapobianco@prospectiv.com> writes:
> Well, this ran much better.  However, I'm not sure if it's because of
> set enable_nestloop = 0, or because I'm executing the query twice in a
> row, where previous results may be cached.  I will try this setting in
> my code for when this process runs later today and see what the result
> is.

If the performance differential holds up, you should look at adjusting your cost parameters so that the planner isn't so wrong about which one is faster.  Hacking enable_nestloop is a band-aid, not something you want to use in production.

Looking at the values you gave earlier, I wonder whether the effective_cache_size setting isn't unreasonably high.  That's reducing the estimated cost of accessing the large table via indexscans, and I'm thinking it reduced it too much.

			regards, tom lane
2011/6/8 Tony Capobianco <tcapobianco@prospectiv.com>:
> pg_dw=# show random_page_cost ;
>  random_page_cost
> ------------------
>  4
> (1 row)
>
> Time: 0.299 ms
> pg_dw=# show seq_page_cost ;
>  seq_page_cost
> ---------------
>  1
> (1 row)
>
> Time: 0.250 ms
> pg_dw=# show work_mem ;
>  work_mem
> ----------
>  768MB
> (1 row)

it is ok.

Pavel

> On Wed, 2011-06-08 at 18:27 +0200, Pavel Stehule wrote:
>> Hello
>>
>> What are your settings for random_page_cost, seq_page_cost and work_mem?
>>
>> Regards
>>
>> Pavel Stehule
>>
>> 2011/6/8 Tony Capobianco <tcapobianco@prospectiv.com>:
>> > Here's the explain analyze:
>> >
>> > pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
>> > as
>> > select o.emailcampaignid, count(memberid) opencnt
>> > from openactivity o,ecr_sents s
>> > where s.emailcampaignid = o.emailcampaignid
>> > group by o.emailcampaignid;
>> >                                                                            QUERY PLAN
>> > ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
>> > time=308630.967..2592279.526 rows=472 loops=1)
>> >    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>> > (actual time=31.489..2589363.047 rows=8586466 loops=1)
>> >          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
>> > (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
>> > loops=1)
>> >          ->  Index Scan using openact_emcamp_idx on openactivity o
>> > (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
>> > rows=17926 loops=479)
>> >                Index Cond: (o.emailcampaignid = s.emailcampaignid)
>> >  Total runtime: 2592284.336 ms
>> >
>> > On Wed, 2011-06-08 at 17:31 +0200, tv@fuzzy.cz wrote:
>> >> > On Postgres, this same query takes about 58 minutes (could not run
>> >> > explain analyze because it is in progress):
>> >> >
>> >> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
>> >> > pg_dw-# as
>> >> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
>> >> > pg_dw-# from openactivity o,ecr_sents s
>> >> > pg_dw-# where s.emailcampaignid = o.emailcampaignid
>> >> > pg_dw-# group by o.emailcampaignid;
>> >> >                                                  QUERY PLAN
>> >> > -------------------------------------------------------------------------------------------------------------
>> >> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
>> >> >    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>> >> >          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
>> >> >              (cost=0.00..38.59 rows=479 width=4)
>> >> >          ->  Index Scan using openact_emcamp_idx on openactivity o
>> >> >              (cost=0.00..3395.49 rows=19372 width=12)
>> >> >                Index Cond: (o.emailcampaignid = s.emailcampaignid)
>> >> > (5 rows)
>> >> >
>> >>
>> >> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
>> >> explain.depesz.com.
>> >>
>> >> regards
>> >> Tomas
My current setting is 22G. According to some documentation, I want to set effective_cache_size to my OS disk cache + shared_buffers. In this case, I have 4 quad-core processors with 512K cache (8G) and my shared_buffers is 7680M. Therefore my effective_cache_size should be approximately 16G? Most of our other etl processes are running fine, however I'm curious if I could see a significant performance boost by reducing the effective_cache_size.

On Wed, 2011-06-08 at 13:03 -0400, Tom Lane wrote:
> Tony Capobianco <tcapobianco@prospectiv.com> writes:
> > Well, this ran much better.  However, I'm not sure if it's because of
> > set enable_nestloop = 0, or because I'm executing the query twice in a
> > row, where previous results may be cached.  I will try this setting in
> > my code for when this process runs later today and see what the result
> > is.
>
> If the performance differential holds up, you should look at adjusting
> your cost parameters so that the planner isn't so wrong about which one
> is faster.  Hacking enable_nestloop is a band-aid, not something you
> want to use in production.
>
> Looking at the values you gave earlier, I wonder whether the
> effective_cache_size setting isn't unreasonably high.  That's reducing
> the estimated cost of accessing the large table via indexscans, and
> I'm thinking it reduced it too much.
>
> 			regards, tom lane
Tony Capobianco <tcapobianco@prospectiv.com> wrote:

> According to some documentation, I want to set
> effective_cache_size to my OS disk cache + shared_buffers.

That seems reasonable, and is what has worked well for me.

> In this case, I have 4 quad-core processors with 512K cache (8G)
> and my shared_buffers is 7680M.  Therefore my effective_cache_size
> should be approximately 16G?

I didn't follow that at all.  Can you run `free` or `vmstat`?  If so, go by what those say your cache size is.

> Most of our other etl processes are running fine, however I'm
> curious if I could see a significant performance boost by reducing
> the effective_cache_size.

Since it is an optimizer costing parameter and has no effect on memory allocation, you can set it on a connection and run a query on that connection to test the impact.  Why wonder about it when you can easily test it?

-Kevin
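[Editor's note: a minimal sketch of Kevin's suggestion, using the query from earlier in the thread; the 8GB figure is a hypothetical test value, not a recommendation. Because effective_cache_size only feeds the planner's cost model and allocates nothing, a plain SET takes effect immediately for that connection.]

```sql
-- Sketch: per-session experiment; only the cost estimates change,
-- so compare the resulting plan against the one seen with 22GB.
SET effective_cache_size = '8GB';   -- hypothetical lower value to test
EXPLAIN
SELECT o.emailcampaignid, count(o.memberid) AS opencnt
FROM openactivity o
JOIN ecr_sents s ON s.emailcampaignid = o.emailcampaignid
GROUP BY o.emailcampaignid;
RESET effective_cache_size;
```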
On Wed, Jun 8, 2011 at 12:03 PM, Tony Capobianco <tcapobianco@prospectiv.com> wrote:
> My current setting is 22G.  According to some documentation, I want to
> set effective_cache_size to my OS disk cache + shared_buffers.  In this
> case, I have 4 quad-core processors with 512K cache (8G) and my
> shared_buffers is 7680M.  Therefore my effective_cache_size should be
> approximately 16G?  Most of our other etl processes are running fine,
> however I'm curious if I could see a significant performance boost by
> reducing the effective_cache_size.
Disk cache, not CPU cache: effective_cache_size should reflect the OS page cache, which will be some significant fraction of total RAM on the host. Incidentally, 16 * 512K of CPU cache = 8MB, not 8GB.
Oooo...some bad math there. Thanks.

On Wed, 2011-06-08 at 12:38 -0700, Samuel Gendler wrote:
> On Wed, Jun 8, 2011 at 12:03 PM, Tony Capobianco
> <tcapobianco@prospectiv.com> wrote:
>     My current setting is 22G.  According to some documentation, I want to
>     set effective_cache_size to my OS disk cache + shared_buffers.  In this
>     case, I have 4 quad-core processors with 512K cache (8G) and my
>     shared_buffers is 7680M.  Therefore my effective_cache_size should be
>     approximately 16G?  Most of our other etl processes are running fine,
>     however I'm curious if I could see a significant performance boost by
>     reducing the effective_cache_size.
>
> disk cache, not CPU memory cache.  It will be some significant
> fraction of total RAM on the host.  Incidentally, 16 * 512K cache =
> 8MB, not 8GB.
>
> http://en.wikipedia.org/wiki/CPU_cache
> * Tony Capobianco (tcapobianco@prospectiv.com) wrote:
>>  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
>>    ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
>>          Hash Cond: (o.emailcampaignid = s.emailcampaignid)
>>          ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
>>              rows=192540967 width=12)
>>          ->  Hash  (cost=8.79..8.79 rows=479 width=4)
>>                ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
>>                    width=4)
>>
>> Yikes. Two sequential scans.
>
> Err, isn't that more-or-less exactly what you want here?  The smaller
> table is going to be hashed and then you'll traverse the bigger table
> and bounce each row off the hash table.  Have you tried actually running
> this and seeing how long it takes?  The bigger table doesn't look to be
> *that* big, if your i/o subsystem is decent and you've got a lot of
> memory available for kernel cacheing, should be quick.

Just out of curiosity, is there any chance that this kind of query is speeding up in 9.1 because of the following changes?

    * Allow FULL OUTER JOIN to be implemented as a hash join, and allow
      either side of a LEFT OUTER JOIN or RIGHT OUTER JOIN to be hashed
      (Tom Lane)

      Previously FULL OUTER JOIN could only be implemented as a merge
      join, and LEFT OUTER JOIN and RIGHT OUTER JOIN could hash only the
      nullable side of the join.  These changes provide additional query
      optimization possibilities.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes:
> Just out of curiosity, is there any chance that this kind of query is
> speeding up in 9.1 because of the following changes?

> * Allow FULL OUTER JOIN to be implemented as a hash join, and allow
>   either side of a LEFT OUTER JOIN or RIGHT OUTER JOIN to be hashed
>   (Tom Lane)

The given query wasn't an outer join, so this wouldn't affect it.

			regards, tom lane