Thread: Speed comparison to Oracle. Why was this query slower on pgsql?

Speed comparison to Oracle. Why was this query slower on pgsql?

From
Francisco Reyes
Date:
I have been instructed to "learn Oracle" by my boss.
among the firs things I am doing is trying to get a feeling for speed.
For instance I have a query like:



select field1, count(*) from mytable group by field1;

On postgresql it takes about 15 seconds while on Oracle it took about 10
seconds.

I thought maybe it was due to disks, but I did a couple of other queries
which would also do a table scan and postgresql actually seemed faster.
For instance I did another query like:

select field1, field2 from mytable where field1 between 1930 and
1940;

The both were very fast, but it seemed the PostgreSQL was slightly faster
every time.

Is Oracle better at aggregate functions?
Another point worth mentioning. The Oracle table has 40K records less.
Postgresql has 770K, Oracld has 730K.

I am pushing for using PostgreSQL for reporting and my boss seems to be
agreeing so far. So I hope we will continue using PostgreSQL no matter what, but I am trying
to learn what types of things each does better than the other.

Is there any resource which compares pgsql to other major DBMS in terms of
their strenghts and weaknesses?


Re: Speed comparison to Oracle. Why was this query slower on pgsql?

From
Martijn van Oosterhout
Date:
On Fri, Feb 22, 2002 at 01:53:40PM -0500, Francisco Reyes wrote:
> Is Oracle better at aggregate functions?
> Another point worth mentioning. The Oracle table has 40K records less.
> Postgresql has 770K, Oracld has 730K.

Posssibly. At the moment postgres will grab all the data from the table,
sort it, group it and then aggregate. Oracle is probably cleverer and just
aggregates directly. On that many rows it probably makes a difference.

HTH,
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> If the company that invents a cure for AIDS is expected to make their
> money back in 17 years, why can't we ask the same of the company that
> markets big-titted lip-syncing chicks and goddamn cartoon mice?

Re: Speed comparison to Oracle. Why was this query slower

From
Francisco Reyes
Date:
On Sat, 23 Feb 2002, Martijn van Oosterhout wrote:

> On Fri, Feb 22, 2002 at 01:53:40PM -0500, Francisco Reyes wrote:
> > Is Oracle better at aggregate functions?
> > Another point worth mentioning. The Oracle table has 40K records less.
> > Postgresql has 770K, Oracld has 730K.
>
> Posssibly. At the moment postgres will grab all the data from the table,
> sort it, group it and then aggregate. Oracle is probably cleverer and just
> aggregates directly. On that many rows it probably makes a difference.

How could it be done in a more clever fashion?
Or how would something be aggregated "directly"?


Re: Speed comparison to Oracle. Why was this query slower on pgsql?

From
Martijn van Oosterhout
Date:
On Fri, Feb 22, 2002 at 05:23:40PM -0500, Francisco Reyes wrote:
> On Sat, 23 Feb 2002, Martijn van Oosterhout wrote:
>
> > On Fri, Feb 22, 2002 at 01:53:40PM -0500, Francisco Reyes wrote:
> > > Is Oracle better at aggregate functions?
> > > Another point worth mentioning. The Oracle table has 40K records less.
> > > Postgresql has 770K, Oracld has 730K.
> >
> > Posssibly. At the moment postgres will grab all the data from the table,
> > sort it, group it and then aggregate. Oracle is probably cleverer and just
> > aggregates directly. On that many rows it probably makes a difference.
>
> How could it be done in a more clever fashion?
> Or how would something be aggregated "directly"?

By hashing. Get a hash table. For each row, hash the grouping rows to lookup
the intermediate aggregate stage to aggregate this row into. At the end, run
through your hash dumping the results.

The advantage is that instead of sorting all the data in memory or possibly
disk, you only have to do a sequential scan and store a structure
proportional to the number of *output* rows.

Where I work this would be cool because we routinely aggregate a million
rows into about 7 groups so this would really help.

I looked into this once but got stuck in the hashing step. The rest looked
ok.
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> If the company that invents a cure for AIDS is expected to make their
> money back in 17 years, why can't we ask the same of the company that
> markets big-titted lip-syncing chicks and goddamn cartoon mice?

Re: Speed comparison to Oracle. Why was this query slower on pgsql?

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Fri, Feb 22, 2002 at 05:23:40PM -0500, Francisco Reyes wrote:
>>> Is Oracle better at aggregate functions?
>> How could it be done in a more clever fashion?

> By hashing. Get a hash table. For each row, hash the grouping rows to lookup
> the intermediate aggregate stage to aggregate this row into. At the end, run
> through your hash dumping the results.

This is on our TODO list.  It'd be interesting to know whether that is
the source of Oracle's speed advantage in this particular scenario,
though.  What is PG's EXPLAIN output for this query, and what does
Oracle have to say about it?  (They don't call it EXPLAIN, but I know
they have an equivalent function to show the query plan for a query.)

            regards, tom lane

Re: Speed comparison to Oracle. Why was this query slower

From
Bruce Momjian
Date:
Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Fri, Feb 22, 2002 at 05:23:40PM -0500, Francisco Reyes wrote:
> >>> Is Oracle better at aggregate functions?
> >> How could it be done in a more clever fashion?
>
> > By hashing. Get a hash table. For each row, hash the grouping rows to lookup
> > the intermediate aggregate stage to aggregate this row into. At the end, run
> > through your hash dumping the results.
>
> This is on our TODO list.  It'd be interesting to know whether that is
> the source of Oracle's speed advantage in this particular scenario,
> though.  What is PG's EXPLAIN output for this query, and what does
> Oracle have to say about it?  (They don't call it EXPLAIN, but I know
> they have an equivalent function to show the query plan for a query.)

Was the original users doing GROUP BY with the aggregate?  I don't
remember.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Speed comparison to Oracle. Why was this query slower

From
Francisco Reyes
Date:
On Fri, 22 Feb 2002, Bruce Momjian wrote:

> Tom Lane wrote:
> > Martijn van Oosterhout <kleptog@svana.org> writes:
> > > On Fri, Feb 22, 2002 at 05:23:40PM -0500, Francisco Reyes wrote:
> > >>> Is Oracle better at aggregate functions?
> > >> How could it be done in a more clever fashion?
> >
> > > By hashing. Get a hash table. For each row, hash the grouping rows to lookup
> > > the intermediate aggregate stage to aggregate this row into. At the end, run
> > > through your hash dumping the results.
>
> Was the original users doing GROUP BY with the aggregate?  I don't
> remember.

I started the thread. Not sure if you refered to me when you wrote about
the "original user".

If so my query was something along the lines:
select field1, count(*) from mytable group by field1;

The table is 770K records and the result took 15 seconds on PostgreSQL. It
only took 10 seconds on Oracle. The Oracle table had 730K rows.

Other queries which would do also a sequential scan, but with a smaller
result set was faster on Postgresql. About 2 seconds on Postgresql and 3
to 4 seconds on Oracle. That other query was:
select field1, count(*) from mytable group by field1 where value between
val1 and val2;

The second query operated on a set of a couple of thousand records.