Re: Oracle v. Postgres 9.0 query performance - Mailing list pgsql-performance

From Vitalii Tymchyshyn
Subject Re: Oracle v. Postgres 9.0 query performance
Msg-id 4DEF9AA5.2000803@gmail.com
In response to Re: Oracle v. Postgres 9.0 query performance  (Tony Capobianco <tcapobianco@prospectiv.com>)
List pgsql-performance
08.06.11 18:40, Tony Capobianco wrote:
> pg_dw=# set enable_nestloop =0;
> SET
> Time: 0.165 ms
> pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> pg_dw-# as
> pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> pg_dw-#   from openactivity o,ecr_sents s
> pg_dw-#  where s.emailcampaignid = o.emailcampaignid
> pg_dw-#  group by o.emailcampaignid;
>                                         QUERY
> PLAN
> -----------------------------------------------------------------------------------------
>   HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
>     ->   Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
>           Hash Cond: (o.emailcampaignid = s.emailcampaignid)
>           ->   Seq Scan on openactivity o  (cost=0.00..3529930.67
> rows=192540967 width=12)
>           ->   Hash  (cost=8.79..8.79 rows=479 width=4)
>                 ->   Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
> width=4)
>
> Yikes.  Two sequential scans.

Yep. Do you see any other option? Either you take each of the 479 records and
look up the matching rows in the other table through the index (the first plan),
or you read both tables in full (seq scan) and join them (the second plan).
The first plan is better if your large table is clustered well enough on the
emailcampaignid field (479 index reads plus 479 sequential table reads). If it's
not, those 479 table reads can turn into a lot of random reads.
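
For example, you can get a rough idea of the physical ordering from the planner
statistics (just a sketch, assuming the statistics on openactivity are reasonably
up to date):

  select attname, correlation
    from pg_stats
   where tablename = 'openactivity'
     and attname = 'emailcampaignid';

A correlation close to 1 (or -1) means the table is stored more or less in
emailcampaignid order, so the 479 index probes read mostly adjacent pages; a
value near 0 means they scatter into random reads.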
BTW: maybe the data is clustered differently in PostgreSQL and Oracle? Or the
Oracle data may simply be "hot" in its caches?
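
If you want to see how much of those tables PostgreSQL itself already has in
shared_buffers, the pg_buffercache contrib module can show it (a sketch only;
the module has to be installed first, and it says nothing about the OS page
cache):

  select c.relname,
         count(*) as buffers,
         pg_size_pretty(count(*) * 8192) as cached  -- assumes the default 8 kB block size
    from pg_buffercache b
    join pg_class c on b.relfilenode = pg_relation_filenode(c.oid)
   where b.reldatabase in (0, (select oid from pg_database
                               where datname = current_database()))
     and c.relname in ('openactivity', 'ecr_sents')
   group by c.relname;
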
Also, a sequential scan is not necessarily a bad thing. It can be cheap enough
to read millions of records if they are not too wide. Please show the output of
"select pg_size_pretty(pg_relation_size('openactivity'));". Have you tried to
EXPLAIN ANALYZE the second plan?
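
That is, something like this (running the plain select, so the table is not
actually created):

  select pg_size_pretty(pg_relation_size('openactivity'));

  set enable_nestloop = 0;
  explain analyze
  select o.emailcampaignid, count(memberid) opencnt
    from openactivity o, ecr_sents s
   where s.emailcampaignid = o.emailcampaignid
   group by o.emailcampaignid;

explain analyze reports the actual row counts and timings for the hash join,
not just the cost estimates, so we can see where the time really goes.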

Best regards, Vitalii Tymchyshyn


>
> On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote:
>> Tony Capobianco<tcapobianco@prospectiv.com>  writes:
>>> pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
>>> pg_dw-# as
>>> pg_dw-# select o.emailcampaignid, count(memberid) opencnt
>>> pg_dw-#   from openactivity o,ecr_sents s
>>> pg_dw-#  where s.emailcampaignid = o.emailcampaignid
>>> pg_dw-#  group by o.emailcampaignid;
>>>                                                   QUERY
>>> PLAN
>>> -------------------------------------------------------------------------------------------------------------
>>>   GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
>>>     ->   Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>>>           ->   Index Scan using ecr_sents_ecid_idx on ecr_sents s
>>> (cost=0.00..38.59 rows=479 width=4)
>>>           ->   Index Scan using openact_emcamp_idx on openactivity o
>>> (cost=0.00..3395.49 rows=19372 width=12)
>>>                 Index Cond: (o.emailcampaignid = s.emailcampaignid)
>>> (5 rows)
>>> Should this query be hashing the smaller table on Postgres rather than
>>> using nested loops?
>> Yeah, seems like it.  Just for testing purposes, do "set enable_nestloop
>> = 0" and see what plan you get then.

