Thread: postgres 9 query performance

postgres 9 query performance

From

yazan suleiman

Date:

28 January 2011, 19:09:38

I am evaluating postgres 9 to migrate away from Oracle. The following query runs too slow, also please find the explain plan:

****************************************************************

explain analyze select DISTINCT EVENT.ID, ORIGIN.ID AS ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN,

EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE, ORIGIN.DEPTH,ORIGIN.EVTYPE,

ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR, ORIGIN.CONTRIBUTOR OCONTRIBUTOR,MAGNITUDE.ID AS MAGID,

MAGNITUDE.MAGNITUDE,MAGNITUDE.TYPE AS MAGTYPE

from event.event left join event.origin on event.id=origin.eventid left join event.magnitude on origin.id=event.magnitude.origin_id

WHERE EXISTS(select origin_id from event.magnitude where magnitude.magnitude>=7.2 and origin.id=origin_id)

order by ORIGIN.TIME desc,MAGNITUDE.MAGNITUDE desc,EVENT.ID,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID

"Unique (cost=740549.86..741151.42 rows=15039 width=80) (actual time=17791.557..17799.092 rows=5517 loops=1)"

" -> Sort (cost=740549.86..740587.45 rows=15039 width=80) (actual time=17791.556..17792.220 rows=5517 loops=1)"

" Sort Key: origin."time", event.magnitude.magnitude, event.id, event.preferred_origin_id, origin.id, event.contributor, origin.latitude, origin.longitude, origin.depth, origin.evtype, origin.catalog, origin.author, origin.contributor, event.magnitude.id, event.magnitude.type"

" Sort Method: quicksort Memory: 968kB"

" -> Nested Loop Left Join (cost=34642.50..739506.42 rows=15039 width=80) (actual time=6.927..17769.788 rows=5517 loops=1)"

" -> Hash Semi Join (cost=34642.50..723750.23 rows=14382 width=62) (actual time=6.912..17744.858 rows=2246 loops=1)"

" Hash Cond: (origin.id = event.magnitude.origin_id)"

" -> Merge Left Join (cost=0.00..641544.72 rows=6133105 width=62) (actual time=0.036..16221.008 rows=6133105 loops=1)"

" Merge Cond: (event.id = origin.eventid)"

" -> Index Scan using event_key_index on event (cost=0.00..163046.53 rows=3272228 width=12) (actual time=0.017..1243.616 rows=3276192 loops=1)"

" -> Index Scan using origin_fk_index on origin (cost=0.00..393653.81 rows=6133105 width=54) (actual time=0.013..3033.657 rows=6133105 loops=1)"

" -> Hash (cost=34462.73..34462.73 rows=14382 width=4) (actual time=6.668..6.668 rows=3198 loops=1)"

" Buckets: 2048 Batches: 1 Memory Usage: 113kB"

" -> Bitmap Heap Scan on magnitude (cost=324.65..34462.73 rows=14382 width=4) (actual time=1.682..5.414 rows=3198 loops=1)"

" Recheck Cond: (magnitude >= 7.2)"

" -> Bitmap Index Scan on mag_index (cost=0.00..321.05 rows=14382 width=0) (actual time=1.331..1.331 rows=3198 loops=1)"

" Index Cond: (magnitude >= 7.2)"

" -> Index Scan using mag_fkey_index on magnitude (cost=0.00..1.06 rows=3 width=22) (actual time=0.007..0.009 rows=2 loops=2246)"

" Index Cond: (origin.id = event.magnitude.origin_id)"

"Total runtime: 17799.669 ms"

****************************************************************

This query runs in Oracle in 1 second while takes 16 seconds in postgres, The difference tells me that I am doing something wrong somewhere. This is a new installation on a local Mac machine with 12G of RAM.

I have:

effective_cache_size=4096MB

shared_buffer=2048MB

work_mem=100MB

Re: postgres 9 query performance

From

Scott Marlowe

Date:

28 January 2011, 19:34:07

On Fri, Jan 28, 2011 at 10:30 AM, yazan suleiman
<yazan.suleiman@gmail.com> wrote:
> I am evaluating postgres 9 to migrate away from Oracle.  The following query
> runs too slow, also please find the explain plan:
> ****************************************************************
> explain analyze select DISTINCT EVENT.ID, ORIGIN.ID AS
> ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN,
> EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE,
> ORIGIN.DEPTH,ORIGIN.EVTYPE,
> "Total runtime: 17799.669 ms"
> ****************************************************************
> This query runs in Oracle in 1 second while takes 16 seconds in postgres,
> The difference tells me that I am doing something wrong somewhere.  This is
> a new installation on a local Mac machine with 12G of RAM.

Try turning it into a group by instead of a distinct.  i.e.

select a,b,c,d from xyz group by a,b,c,d

and see if it's faster.  There is some poor performance on large data
sets for distinct.  Don't know if they got fixed in 9.0 or not, if not
then definitely try a group by and see.

Re: postgres 9 query performance

From

Kenneth Marshall

Date:

28 January 2011, 19:50:39

On Fri, Jan 28, 2011 at 09:30:19AM -0800, yazan suleiman wrote:
> I am evaluating postgres 9 to migrate away from Oracle.  The following query
> runs too slow, also please find the explain plan:
>
> ****************************************************************
> explain analyze select DISTINCT EVENT.ID, ORIGIN.ID AS
> ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN,
> EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE,
> ORIGIN.DEPTH,ORIGIN.EVTYPE,
> ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR, ORIGIN.CONTRIBUTOR OCONTRIBUTOR,
> MAGNITUDE.ID AS MAGID,
> MAGNITUDE.MAGNITUDE,MAGNITUDE.TYPE AS MAGTYPE
> from event.event left join event.origin on event.id=origin.eventid left join
> event.magnitude on origin.id=event.magnitude.origin_id
> WHERE EXISTS(select origin_id from event.magnitude where
>  magnitude.magnitude>=7.2 and origin.id=origin_id)
> order by ORIGIN.TIME desc,MAGNITUDE.MAGNITUDE desc,EVENT.ID
> ,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID
>
>
> "Unique  (cost=740549.86..741151.42 rows=15039 width=80) (actual
> time=17791.557..17799.092 rows=5517 loops=1)"
> "  ->  Sort  (cost=740549.86..740587.45 rows=15039 width=80) (actual
> time=17791.556..17792.220 rows=5517 loops=1)"
> "        Sort Key: origin."time", event.magnitude.magnitude, event.id,
> event.preferred_origin_id, origin.id, event.contributor, origin.latitude,
> origin.longitude, origin.depth, origin.evtype, origin.catalog,
> origin.author, origin.contributor, event.magnitude.id, event.magnitude.type"
> "        Sort Method:  quicksort  Memory: 968kB"
> "        ->  Nested Loop Left Join  (cost=34642.50..739506.42 rows=15039
> width=80) (actual time=6.927..17769.788 rows=5517 loops=1)"
> "              ->  Hash Semi Join  (cost=34642.50..723750.23 rows=14382
> width=62) (actual time=6.912..17744.858 rows=2246 loops=1)"
> "                    Hash Cond: (origin.id = event.magnitude.origin_id)"
> "                    ->  Merge Left Join  (cost=0.00..641544.72 rows=6133105
> width=62) (actual time=0.036..16221.008 rows=6133105 loops=1)"
> "                          Merge Cond: (event.id = origin.eventid)"
> "                          ->  Index Scan using event_key_index on event
>  (cost=0.00..163046.53 rows=3272228 width=12) (actual time=0.017..1243.616
> rows=3276192 loops=1)"
> "                          ->  Index Scan using origin_fk_index on origin
>  (cost=0.00..393653.81 rows=6133105 width=54) (actual time=0.013..3033.657
> rows=6133105 loops=1)"
> "                    ->  Hash  (cost=34462.73..34462.73 rows=14382 width=4)
> (actual time=6.668..6.668 rows=3198 loops=1)"
> "                          Buckets: 2048  Batches: 1  Memory Usage: 113kB"
> "                          ->  Bitmap Heap Scan on magnitude
>  (cost=324.65..34462.73 rows=14382 width=4) (actual time=1.682..5.414
> rows=3198 loops=1)"
> "                                Recheck Cond: (magnitude >= 7.2)"
> "                                ->  Bitmap Index Scan on mag_index
>  (cost=0.00..321.05 rows=14382 width=0) (actual time=1.331..1.331 rows=3198
> loops=1)"
> "                                      Index Cond: (magnitude >= 7.2)"
> "              ->  Index Scan using mag_fkey_index on magnitude
>  (cost=0.00..1.06 rows=3 width=22) (actual time=0.007..0.009 rows=2
> loops=2246)"
> "                    Index Cond: (origin.id = event.magnitude.origin_id)"
> "Total runtime: 17799.669 ms"
> ****************************************************************
>
> This query runs in Oracle in 1 second while takes 16 seconds in postgres,
> The difference tells me that I am doing something wrong somewhere.  This is
> a new installation on a local Mac machine with 12G of RAM.
>
> I have:
> effective_cache_size=4096MB
> shared_buffer=2048MB
> work_mem=100MB

It sounds like the queries are not doing the same thing. What is
the schema/index definition for Oracle versus PostgreSQL?

Ken

Re: postgres 9 query performance

From

Andres Freund

Date:

28 January 2011, 20:19:41

On Friday, January 28, 2011 06:30:19 PM yazan suleiman wrote:
> I am evaluating postgres 9 to migrate away from Oracle.  The following
> query runs too slow, also please find the explain plan:
First:

explain analyze
SELECT DISTINCT
    EVENT.ID
    ,ORIGIN.ID AS ORIGINID
    ,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN
    ,EVENT.CONTRIBUTOR
    ,ORIGIN.TIME
    ,ORIGIN.LATITUDE
    ,ORIGIN.LONGITUDE
    ,ORIGIN.DEPTH
    ,ORIGIN.EVTYPE
    ,ORIGIN.CATALOG
    ,ORIGIN.AUTHOR OAUTHOR
    ,ORIGIN.CONTRIBUTOR OCONTRIBUTOR
    ,MAGNITUDE.ID AS MAGID
    ,MAGNITUDE.MAGNITUDE
    ,MAGNITUDE.TYPE AS MAGTYPE
FROM
    event.event
    left join event.origin on event.id = origin.eventid
    left join event.magnitude on origin.id = event.magnitude.origin_id
WHERE
    EXISTS(
        select origin_id
        from event.magnitude
        where magnitude.magnitude >= 7.2 and origin.id = origin_id
    )
order by
    ORIGIN.TIME desc
    ,MAGNITUDE.MAGNITUDE desc
    ,EVENT.ID
    ,EVENT.PREFERRED_ORIGIN_ID
    ,ORIGIN.ID

I am honestly stumped if anybody can figure something sensible out of the
original formatting of the query...

What happens if you change the
    left join event.origin on event.id = origin.eventid
into
    join event.origin on event.id = origin.eventid
?

The EXISTS() requires that origin is not null anyway. (Not sure why the
planner doesn't recognize that though).

Andres

Re: postgres 9 query performance

From

yazan suleiman

Date:

29 January 2011, 08:03:14

They have the same indexes, foreign keys are indexed in addition to the search values like magnitude. Distinct does nothing to speed up the query. If I remove the select in the where clause the time goes down to 98 ms:

select DISTINCT EVENT.ID, ORIGIN.ID AS ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN, EVENT.CONTRIBUTOR,
ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE, ORIGIN.DEPTH,ORIGIN.EVTYPE, ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR,
ORIGIN.CONTRIBUTOR OCONTRIBUTOR,MAGNITUDE.ID AS MAGID,MAGNITUDE.MAGNITUDE,MAGNITUDE.TYPE AS MAGTYPE
from event.event left join event.origin on event.id=origin.eventid left join event.magnitude on origin.id=event.magnitude.origin_id
WHERE magnitude.magnitude>=7.2 order by ORIGIN.TIME desc,MAGNITUDE.MAGNITUDE desc,EVENT.ID,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID

The new query returns 4000 rows, so the result is still big. I am not sure if I am answering your question, but I don't have access to generate ddl from Oracle. Thanks for the reply.

On Fri, Jan 28, 2011 at 12:50 PM, Kenneth Marshall <ktm@rice.edu> wrote:

On Fri, Jan 28, 2011 at 09:30:19AM -0800, yazan suleiman wrote:
> I am evaluating postgres 9 to migrate away from Oracle. The following query
> runs too slow, also please find the explain plan:
>
> ****************************************************************
> explain analyze select DISTINCT EVENT.ID, ORIGIN.ID AS
> ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN,
> EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE,
> ORIGIN.DEPTH,ORIGIN.EVTYPE,
> ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR, ORIGIN.CONTRIBUTOR OCONTRIBUTOR,
> MAGNITUDE.ID AS MAGID,
> MAGNITUDE.MAGNITUDE,MAGNITUDE.TYPE AS MAGTYPE
> from event.event left join event.origin on event.id=origin.eventid left join
> event.magnitude on origin.id=event.magnitude.origin_id
> WHERE EXISTS(select origin_id from event.magnitude where
> magnitude.magnitude>=7.2 and origin.id=origin_id)
> order by ORIGIN.TIME desc,MAGNITUDE.MAGNITUDE desc,EVENT.ID
> ,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID
>
>
> "Unique (cost=740549.86..741151.42 rows=15039 width=80) (actual
> time=17791.557..17799.092 rows=5517 loops=1)"
> " -> Sort (cost=740549.86..740587.45 rows=15039 width=80) (actual
> time=17791.556..17792.220 rows=5517 loops=1)"
> " Sort Key: origin."time", event.magnitude.magnitude, event.id,
> event.preferred_origin_id, origin.id, event.contributor, origin.latitude,
> origin.longitude, origin.depth, origin.evtype, origin.catalog,
> origin.author, origin.contributor, event.magnitude.id, event.magnitude.type"
> " Sort Method: quicksort Memory: 968kB"
> " -> Nested Loop Left Join (cost=34642.50..739506.42 rows=15039
> width=80) (actual time=6.927..17769.788 rows=5517 loops=1)"
> " -> Hash Semi Join (cost=34642.50..723750.23 rows=14382
> width=62) (actual time=6.912..17744.858 rows=2246 loops=1)"
> " Hash Cond: (origin.id = event.magnitude.origin_id)"
> " -> Merge Left Join (cost=0.00..641544.72 rows=6133105
> width=62) (actual time=0.036..16221.008 rows=6133105 loops=1)"
> " Merge Cond: (event.id = origin.eventid)"
> " -> Index Scan using event_key_index on event
> (cost=0.00..163046.53 rows=3272228 width=12) (actual time=0.017..1243.616
> rows=3276192 loops=1)"
> " -> Index Scan using origin_fk_index on origin
> (cost=0.00..393653.81 rows=6133105 width=54) (actual time=0.013..3033.657
> rows=6133105 loops=1)"
> " -> Hash (cost=34462.73..34462.73 rows=14382 width=4)
> (actual time=6.668..6.668 rows=3198 loops=1)"
> " Buckets: 2048 Batches: 1 Memory Usage: 113kB"
> " -> Bitmap Heap Scan on magnitude
> (cost=324.65..34462.73 rows=14382 width=4) (actual time=1.682..5.414
> rows=3198 loops=1)"
> " Recheck Cond: (magnitude >= 7.2)"
> " -> Bitmap Index Scan on mag_index
> (cost=0.00..321.05 rows=14382 width=0) (actual time=1.331..1.331 rows=3198
> loops=1)"
> " Index Cond: (magnitude >= 7.2)"
> " -> Index Scan using mag_fkey_index on magnitude
> (cost=0.00..1.06 rows=3 width=22) (actual time=0.007..0.009 rows=2
> loops=2246)"
> " Index Cond: (origin.id = event.magnitude.origin_id)"
> "Total runtime: 17799.669 ms"
> ****************************************************************
>
> This query runs in Oracle in 1 second while takes 16 seconds in postgres,
> The difference tells me that I am doing something wrong somewhere. This is
> a new installation on a local Mac machine with 12G of RAM.
>
> I have:
> effective_cache_size=4096MB
> shared_buffer=2048MB
> work_mem=100MB

It sounds like the queries are not doing the same thing. What is
the schema/index definition for Oracle versus PostgreSQL?

Ken

Re: postgres 9 query performance

From

yazan suleiman

Date:

29 January 2011, 08:03:16

OK, that did it. Time is now 315 ms. I am so exited working with postgres. I really apologize for the format, my first time posting on the list. That does not justify it though. Really thanks.

On Fri, Jan 28, 2011 at 1:19 PM, Andres Freund <andres@anarazel.de> wrote:

On Friday, January 28, 2011 06:30:19 PM yazan suleiman wrote:
> I am evaluating postgres 9 to migrate away from Oracle. The following
> query runs too slow, also please find the explain plan:
First:

explain analyze
SELECT DISTINCT
EVENT.ID
,ORIGIN.ID AS ORIGINID
,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN
,EVENT.CONTRIBUTOR
,ORIGIN.TIME
,ORIGIN.LATITUDE
,ORIGIN.LONGITUDE
,ORIGIN.DEPTH
,ORIGIN.EVTYPE
,ORIGIN.CATALOG
,ORIGIN.AUTHOR OAUTHOR
,ORIGIN.CONTRIBUTOR OCONTRIBUTOR
,MAGNITUDE.ID AS MAGID
,MAGNITUDE.MAGNITUDE
,MAGNITUDE.TYPE AS MAGTYPE
FROM
event.event
left join event.origin on event.id = origin.eventid
left join event.magnitude on origin.id = event.magnitude.origin_id
WHERE
EXISTS(
select origin_id
from event.magnitude
where magnitude.magnitude >= 7.2 and origin.id = origin_id
)
order by
ORIGIN.TIME desc
,MAGNITUDE.MAGNITUDE desc
,EVENT.ID
,EVENT.PREFERRED_ORIGIN_ID
,ORIGIN.ID

I am honestly stumped if anybody can figure something sensible out of the
original formatting of the query...

What happens if you change the
left join event.origin on event.id = origin.eventid
into
join event.origin on event.id = origin.eventid
?

The EXISTS() requires that origin is not null anyway. (Not sure why the
planner doesn't recognize that though).

Andres

Re: postgres 9 query performance

From

Tom Lane

Date:

30 January 2011, 21:22:57

Andres Freund <andres@anarazel.de> writes:
> What happens if you change the
>     left join event.origin on event.id = origin.eventid
> into
>     join event.origin on event.id = origin.eventid
> ?

> The EXISTS() requires that origin is not null anyway. (Not sure why the
> planner doesn't recognize that though).

Sloppy thinking in reduce_outer_joins() is why.  Fixed now:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=1df57f63f3f60c684aa8918910ac410e9c780713

            regards, tom lane

Re: postgres 9 query performance

From

Andres Freund

Date:

30 January 2011, 21:43:12

On Sunday 30 January 2011 23:18:15 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > What happens if you change the
> >
> >     left join event.origin on event.id = origin.eventid
> >
> > into
> >
> >     join event.origin on event.id = origin.eventid
> >
> > ?
> >
> > The EXISTS() requires that origin is not null anyway. (Not sure why the
> > planner doesn't recognize that though).
>
> Sloppy thinking in reduce_outer_joins() is why.
Wow. Nice one, thanks.

Andres

Re: postgres 9 query performance

From

"Ross J. Reedstrom"

Date:

01 February 2011, 16:24:52

On Sun, Jan 30, 2011 at 05:18:15PM -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > What happens if you change the
> >     left join event.origin on event.id = origin.eventid
> > into
> >     join event.origin on event.id = origin.eventid
> > ?
>
> > The EXISTS() requires that origin is not null anyway. (Not sure why the
> > planner doesn't recognize that though).
>
> Sloppy thinking in reduce_outer_joins() is why.  Fixed now:
> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=1df57f63f3f60c684aa8918910ac410e9c780713
>
>             regards, tom lane

This is one of the reasons I love open source in general, and PostgreSQL
in particular: Tom has the bandwidth to notice these kinds of
workarounds being discussed on support lists, and turn them immediately
into improvements in the planner. Partly because (I assume, based on
the commit message) Andres's parenthetical comment red-flagged it for
him, since he knew he could trust Andres's opinion that there was
probably a planner improvement hiding here. Amazing!

Ross
--
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist        phone: 713-348-6166
Connexions                  http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE