Thread: postgres 9 query performance
I am evaluating postgres 9 to migrate away from Oracle. The following query runs too slow, also please find the explain plan:
****************************************************************
explain analyze select DISTINCT EVENT.ID, ORIGIN.ID AS ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN,
EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE, ORIGIN.DEPTH,ORIGIN.EVTYPE,
ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR, ORIGIN.CONTRIBUTOR OCONTRIBUTOR,MAGNITUDE.ID AS MAGID,
MAGNITUDE.MAGNITUDE,MAGNITUDE.TYPE AS MAGTYPE
from event.event left join event.origin on event.id=origin.eventid left join event.magnitude on origin.id=event.magnitude.origin_id
WHERE EXISTS(select origin_id from event.magnitude where magnitude.magnitude>=7.2 and origin.id=origin_id)
"Unique (cost=740549.86..741151.42 rows=15039 width=80) (actual time=17791.557..17799.092 rows=5517 loops=1)"
" -> Sort (cost=740549.86..740587.45 rows=15039 width=80) (actual time=17791.556..17792.220 rows=5517 loops=1)"
" Sort Key: origin."time", event.magnitude.magnitude, event.id, event.preferred_origin_id, origin.id, event.contributor, origin.latitude, origin.longitude, origin.depth, origin.evtype, origin.catalog, origin.author, origin.contributor, event.magnitude.id, event.magnitude.type"
" Sort Method: quicksort Memory: 968kB"
" -> Nested Loop Left Join (cost=34642.50..739506.42 rows=15039 width=80) (actual time=6.927..17769.788 rows=5517 loops=1)"
" -> Hash Semi Join (cost=34642.50..723750.23 rows=14382 width=62) (actual time=6.912..17744.858 rows=2246 loops=1)"
" Hash Cond: (origin.id = event.magnitude.origin_id)"
" -> Merge Left Join (cost=0.00..641544.72 rows=6133105 width=62) (actual time=0.036..16221.008 rows=6133105 loops=1)"
" Merge Cond: (event.id = origin.eventid)"
" -> Index Scan using event_key_index on event (cost=0.00..163046.53 rows=3272228 width=12) (actual time=0.017..1243.616 rows=3276192 loops=1)"
" -> Index Scan using origin_fk_index on origin (cost=0.00..393653.81 rows=6133105 width=54) (actual time=0.013..3033.657 rows=6133105 loops=1)"
" -> Hash (cost=34462.73..34462.73 rows=14382 width=4) (actual time=6.668..6.668 rows=3198 loops=1)"
" Buckets: 2048 Batches: 1 Memory Usage: 113kB"
" -> Bitmap Heap Scan on magnitude (cost=324.65..34462.73 rows=14382 width=4) (actual time=1.682..5.414 rows=3198 loops=1)"
" Recheck Cond: (magnitude >= 7.2)"
" -> Bitmap Index Scan on mag_index (cost=0.00..321.05 rows=14382 width=0) (actual time=1.331..1.331 rows=3198 loops=1)"
" Index Cond: (magnitude >= 7.2)"
" -> Index Scan using mag_fkey_index on magnitude (cost=0.00..1.06 rows=3 width=22) (actual time=0.007..0.009 rows=2 loops=2246)"
" Index Cond: (origin.id = event.magnitude.origin_id)"
"Total runtime: 17799.669 ms"
****************************************************************
This query runs in Oracle in 1 second while takes 16 seconds in postgres, The difference tells me that I am doing something wrong somewhere. This is a new installation on a local Mac machine with 12G of RAM.
I have:
effective_cache_size=4096MB
shared_buffer=2048MB
work_mem=100MB
On Fri, Jan 28, 2011 at 10:30 AM, yazan suleiman <yazan.suleiman@gmail.com> wrote: > I am evaluating postgres 9 to migrate away from Oracle. The following query > runs too slow, also please find the explain plan: > **************************************************************** > explain analyze select DISTINCT EVENT.ID, ORIGIN.ID AS > ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN, > EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE, > ORIGIN.DEPTH,ORIGIN.EVTYPE, > "Total runtime: 17799.669 ms" > **************************************************************** > This query runs in Oracle in 1 second while takes 16 seconds in postgres, > The difference tells me that I am doing something wrong somewhere. This is > a new installation on a local Mac machine with 12G of RAM. Try turning it into a group by instead of a distinct. i.e. select a,b,c,d from xyz group by a,b,c,d and see if it's faster. There is some poor performance on large data sets for distinct. Don't know if they got fixed in 9.0 or not, if not then definitely try a group by and see.
On Fri, Jan 28, 2011 at 09:30:19AM -0800, yazan suleiman wrote: > I am evaluating postgres 9 to migrate away from Oracle. The following query > runs too slow, also please find the explain plan: > > **************************************************************** > explain analyze select DISTINCT EVENT.ID, ORIGIN.ID AS > ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN, > EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE, > ORIGIN.DEPTH,ORIGIN.EVTYPE, > ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR, ORIGIN.CONTRIBUTOR OCONTRIBUTOR, > MAGNITUDE.ID AS MAGID, > MAGNITUDE.MAGNITUDE,MAGNITUDE.TYPE AS MAGTYPE > from event.event left join event.origin on event.id=origin.eventid left join > event.magnitude on origin.id=event.magnitude.origin_id > WHERE EXISTS(select origin_id from event.magnitude where > magnitude.magnitude>=7.2 and origin.id=origin_id) > order by ORIGIN.TIME desc,MAGNITUDE.MAGNITUDE desc,EVENT.ID > ,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID > > > "Unique (cost=740549.86..741151.42 rows=15039 width=80) (actual > time=17791.557..17799.092 rows=5517 loops=1)" > " -> Sort (cost=740549.86..740587.45 rows=15039 width=80) (actual > time=17791.556..17792.220 rows=5517 loops=1)" > " Sort Key: origin."time", event.magnitude.magnitude, event.id, > event.preferred_origin_id, origin.id, event.contributor, origin.latitude, > origin.longitude, origin.depth, origin.evtype, origin.catalog, > origin.author, origin.contributor, event.magnitude.id, event.magnitude.type" > " Sort Method: quicksort Memory: 968kB" > " -> Nested Loop Left Join (cost=34642.50..739506.42 rows=15039 > width=80) (actual time=6.927..17769.788 rows=5517 loops=1)" > " -> Hash Semi Join (cost=34642.50..723750.23 rows=14382 > width=62) (actual time=6.912..17744.858 rows=2246 loops=1)" > " Hash Cond: (origin.id = event.magnitude.origin_id)" > " -> Merge Left Join (cost=0.00..641544.72 rows=6133105 > width=62) (actual time=0.036..16221.008 rows=6133105 loops=1)" > " Merge Cond: (event.id = origin.eventid)" > " -> Index Scan using event_key_index on event > (cost=0.00..163046.53 rows=3272228 width=12) (actual time=0.017..1243.616 > rows=3276192 loops=1)" > " -> Index Scan using origin_fk_index on origin > (cost=0.00..393653.81 rows=6133105 width=54) (actual time=0.013..3033.657 > rows=6133105 loops=1)" > " -> Hash (cost=34462.73..34462.73 rows=14382 width=4) > (actual time=6.668..6.668 rows=3198 loops=1)" > " Buckets: 2048 Batches: 1 Memory Usage: 113kB" > " -> Bitmap Heap Scan on magnitude > (cost=324.65..34462.73 rows=14382 width=4) (actual time=1.682..5.414 > rows=3198 loops=1)" > " Recheck Cond: (magnitude >= 7.2)" > " -> Bitmap Index Scan on mag_index > (cost=0.00..321.05 rows=14382 width=0) (actual time=1.331..1.331 rows=3198 > loops=1)" > " Index Cond: (magnitude >= 7.2)" > " -> Index Scan using mag_fkey_index on magnitude > (cost=0.00..1.06 rows=3 width=22) (actual time=0.007..0.009 rows=2 > loops=2246)" > " Index Cond: (origin.id = event.magnitude.origin_id)" > "Total runtime: 17799.669 ms" > **************************************************************** > > This query runs in Oracle in 1 second while takes 16 seconds in postgres, > The difference tells me that I am doing something wrong somewhere. This is > a new installation on a local Mac machine with 12G of RAM. > > I have: > effective_cache_size=4096MB > shared_buffer=2048MB > work_mem=100MB It sounds like the queries are not doing the same thing. What is the schema/index definition for Oracle versus PostgreSQL? Ken
On Friday, January 28, 2011 06:30:19 PM yazan suleiman wrote: > I am evaluating postgres 9 to migrate away from Oracle. The following > query runs too slow, also please find the explain plan: First: explain analyze SELECT DISTINCT EVENT.ID ,ORIGIN.ID AS ORIGINID ,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN ,EVENT.CONTRIBUTOR ,ORIGIN.TIME ,ORIGIN.LATITUDE ,ORIGIN.LONGITUDE ,ORIGIN.DEPTH ,ORIGIN.EVTYPE ,ORIGIN.CATALOG ,ORIGIN.AUTHOR OAUTHOR ,ORIGIN.CONTRIBUTOR OCONTRIBUTOR ,MAGNITUDE.ID AS MAGID ,MAGNITUDE.MAGNITUDE ,MAGNITUDE.TYPE AS MAGTYPE FROM event.event left join event.origin on event.id = origin.eventid left join event.magnitude on origin.id = event.magnitude.origin_id WHERE EXISTS( select origin_id from event.magnitude where magnitude.magnitude >= 7.2 and origin.id = origin_id ) order by ORIGIN.TIME desc ,MAGNITUDE.MAGNITUDE desc ,EVENT.ID ,EVENT.PREFERRED_ORIGIN_ID ,ORIGIN.ID I am honestly stumped if anybody can figure something sensible out of the original formatting of the query... What happens if you change the left join event.origin on event.id = origin.eventid into join event.origin on event.id = origin.eventid ? The EXISTS() requires that origin is not null anyway. (Not sure why the planner doesn't recognize that though). Andres
They have the same indexes, foreign keys are indexed in addition to the search values like magnitude. Distinct does nothing to speed up the query. If I remove the select in the where clause the time goes down to 98 ms:
select DISTINCT EVENT.ID, ORIGIN.ID AS ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN, EVENT.CONTRIBUTOR,
ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE, ORIGIN.DEPTH,ORIGIN.EVTYPE, ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR,
ORIGIN.CONTRIBUTOR OCONTRIBUTOR,MAGNITUDE.ID AS MAGID,MAGNITUDE.MAGNITUDE,MAGNITUDE.TYPE AS MAGTYPE
from event.event left join event.origin on event.id=origin.eventid left join event.magnitude on origin.id=event.magnitude.origin_id
WHERE magnitude.magnitude>=7.2 order by ORIGIN.TIME desc,MAGNITUDE.MAGNITUDE desc,EVENT.ID,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID
The new query returns 4000 rows, so the result is still big. I am not sure if I am answering your question, but I don't have access to generate ddl from Oracle. Thanks for the reply.
select DISTINCT EVENT.ID, ORIGIN.ID AS ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN, EVENT.CONTRIBUTOR,
ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE, ORIGIN.DEPTH,ORIGIN.EVTYPE, ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR,
ORIGIN.CONTRIBUTOR OCONTRIBUTOR,MAGNITUDE.ID AS MAGID,MAGNITUDE.MAGNITUDE,MAGNITUDE.TYPE AS MAGTYPE
from event.event left join event.origin on event.id=origin.eventid left join event.magnitude on origin.id=event.magnitude.origin_id
WHERE magnitude.magnitude>=7.2 order by ORIGIN.TIME desc,MAGNITUDE.MAGNITUDE desc,EVENT.ID,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID
The new query returns 4000 rows, so the result is still big. I am not sure if I am answering your question, but I don't have access to generate ddl from Oracle. Thanks for the reply.
On Fri, Jan 28, 2011 at 12:50 PM, Kenneth Marshall <ktm@rice.edu> wrote:
It sounds like the queries are not doing the same thing. What isOn Fri, Jan 28, 2011 at 09:30:19AM -0800, yazan suleiman wrote:
> I am evaluating postgres 9 to migrate away from Oracle. The following query
> runs too slow, also please find the explain plan:
>
> ****************************************************************
> explain analyze select DISTINCT EVENT.ID, ORIGIN.ID AS
> ORIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN,
> EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE,
> ORIGIN.DEPTH,ORIGIN.EVTYPE,
> ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR, ORIGIN.CONTRIBUTOR OCONTRIBUTOR,
> MAGNITUDE.ID AS MAGID,
> MAGNITUDE.MAGNITUDE,MAGNITUDE.TYPE AS MAGTYPE
> from event.event left join event.origin on event.id=origin.eventid left join
> event.magnitude on origin.id=event.magnitude.origin_id
> WHERE EXISTS(select origin_id from event.magnitude where
> magnitude.magnitude>=7.2 and origin.id=origin_id)
> order by ORIGIN.TIME desc,MAGNITUDE.MAGNITUDE desc,EVENT.ID
> ,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID
>
>
> "Unique (cost=740549.86..741151.42 rows=15039 width=80) (actual
> time=17791.557..17799.092 rows=5517 loops=1)"
> " -> Sort (cost=740549.86..740587.45 rows=15039 width=80) (actual
> time=17791.556..17792.220 rows=5517 loops=1)"
> " Sort Key: origin."time", event.magnitude.magnitude, event.id,
> event.preferred_origin_id, origin.id, event.contributor, origin.latitude,
> origin.longitude, origin.depth, origin.evtype, origin.catalog,
> origin.author, origin.contributor, event.magnitude.id, event.magnitude.type"
> " Sort Method: quicksort Memory: 968kB"
> " -> Nested Loop Left Join (cost=34642.50..739506.42 rows=15039
> width=80) (actual time=6.927..17769.788 rows=5517 loops=1)"
> " -> Hash Semi Join (cost=34642.50..723750.23 rows=14382
> width=62) (actual time=6.912..17744.858 rows=2246 loops=1)"
> " Hash Cond: (origin.id = event.magnitude.origin_id)"
> " -> Merge Left Join (cost=0.00..641544.72 rows=6133105
> width=62) (actual time=0.036..16221.008 rows=6133105 loops=1)"
> " Merge Cond: (event.id = origin.eventid)"
> " -> Index Scan using event_key_index on event
> (cost=0.00..163046.53 rows=3272228 width=12) (actual time=0.017..1243.616
> rows=3276192 loops=1)"
> " -> Index Scan using origin_fk_index on origin
> (cost=0.00..393653.81 rows=6133105 width=54) (actual time=0.013..3033.657
> rows=6133105 loops=1)"
> " -> Hash (cost=34462.73..34462.73 rows=14382 width=4)
> (actual time=6.668..6.668 rows=3198 loops=1)"
> " Buckets: 2048 Batches: 1 Memory Usage: 113kB"
> " -> Bitmap Heap Scan on magnitude
> (cost=324.65..34462.73 rows=14382 width=4) (actual time=1.682..5.414
> rows=3198 loops=1)"
> " Recheck Cond: (magnitude >= 7.2)"
> " -> Bitmap Index Scan on mag_index
> (cost=0.00..321.05 rows=14382 width=0) (actual time=1.331..1.331 rows=3198
> loops=1)"
> " Index Cond: (magnitude >= 7.2)"
> " -> Index Scan using mag_fkey_index on magnitude
> (cost=0.00..1.06 rows=3 width=22) (actual time=0.007..0.009 rows=2
> loops=2246)"
> " Index Cond: (origin.id = event.magnitude.origin_id)"
> "Total runtime: 17799.669 ms"
> ****************************************************************
>
> This query runs in Oracle in 1 second while takes 16 seconds in postgres,
> The difference tells me that I am doing something wrong somewhere. This is
> a new installation on a local Mac machine with 12G of RAM.
>
> I have:
> effective_cache_size=4096MB
> shared_buffer=2048MB
> work_mem=100MB
the schema/index definition for Oracle versus PostgreSQL?
Ken
OK, that did it. Time is now 315 ms. I am so exited working with postgres. I really apologize for the format, my first time posting on the list. That does not justify it though. Really thanks.
On Fri, Jan 28, 2011 at 1:19 PM, Andres Freund <andres@anarazel.de> wrote:
On Friday, January 28, 2011 06:30:19 PM yazan suleiman wrote:First:
> I am evaluating postgres 9 to migrate away from Oracle. The following
> query runs too slow, also please find the explain plan:
explain analyze
SELECT DISTINCTEVENT.IDFROM
,ORIGIN.ID AS ORIGINID
,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN
,EVENT.CONTRIBUTOR
,ORIGIN.TIME
,ORIGIN.LATITUDE
,ORIGIN.LONGITUDE
,ORIGIN.DEPTH
,ORIGIN.EVTYPE
,ORIGIN.CATALOG
,ORIGIN.AUTHOR OAUTHOR
,ORIGIN.CONTRIBUTOR OCONTRIBUTOR
,MAGNITUDE.ID AS MAGID
,MAGNITUDE.MAGNITUDE
,MAGNITUDE.TYPE AS MAGTYPEevent.eventI am honestly stumped if anybody can figure something sensible out of the
left join event.origin on event.id = origin.eventid
left join event.magnitude on origin.id = event.magnitude.origin_id
WHERE
EXISTS(
select origin_id
from event.magnitude
where magnitude.magnitude >= 7.2 and origin.id = origin_id
)
order by
ORIGIN.TIME desc
,MAGNITUDE.MAGNITUDE desc
,EVENT.ID
,EVENT.PREFERRED_ORIGIN_ID
,ORIGIN.ID
original formatting of the query...
What happens if you change theleft join event.origin on event.id = origin.eventidintojoin event.origin on event.id = origin.eventid?
The EXISTS() requires that origin is not null anyway. (Not sure why the
planner doesn't recognize that though).
Andres
Andres Freund <andres@anarazel.de> writes: > What happens if you change the > left join event.origin on event.id = origin.eventid > into > join event.origin on event.id = origin.eventid > ? > The EXISTS() requires that origin is not null anyway. (Not sure why the > planner doesn't recognize that though). Sloppy thinking in reduce_outer_joins() is why. Fixed now: http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=1df57f63f3f60c684aa8918910ac410e9c780713 regards, tom lane
On Sunday 30 January 2011 23:18:15 Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > What happens if you change the > > > > left join event.origin on event.id = origin.eventid > > > > into > > > > join event.origin on event.id = origin.eventid > > > > ? > > > > The EXISTS() requires that origin is not null anyway. (Not sure why the > > planner doesn't recognize that though). > > Sloppy thinking in reduce_outer_joins() is why. Wow. Nice one, thanks. Andres
On Sun, Jan 30, 2011 at 05:18:15PM -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > What happens if you change the > > left join event.origin on event.id = origin.eventid > > into > > join event.origin on event.id = origin.eventid > > ? > > > The EXISTS() requires that origin is not null anyway. (Not sure why the > > planner doesn't recognize that though). > > Sloppy thinking in reduce_outer_joins() is why. Fixed now: > http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=1df57f63f3f60c684aa8918910ac410e9c780713 > > regards, tom lane This is one of the reasons I love open source in general, and PostgreSQL in particular: Tom has the bandwidth to notice these kinds of workarounds being discussed on support lists, and turn them immediately into improvements in the planner. Partly because (I assume, based on the commit message) Andres's parenthetical comment red-flagged it for him, since he knew he could trust Andres's opinion that there was probably a planner improvement hiding here. Amazing! Ross -- Ross Reedstrom, Ph.D. reedstrm@rice.edu Systems Engineer & Admin, Research Scientist phone: 713-348-6166 Connexions http://cnx.org fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE