Thread: Oracle vs PostgreSQL in real life
Okay... I'm feeling very skeptical today.

I'm doing a study on replacing Oracle 8.0 on NT4 with RedHat 7.2 / PostgreSQL 7.2.

The customer gave me things to migrate, like Pro*C Oracle scripts that I migrated successfully with ECPG. Other things used CONNECT BY statements; thanks to the OpenACS guys, I migrated those CONNECT BY statements too.

But finally, I went through all the queries as carefully as I could and fixed everything I found; I hope everything has been done correctly.

The "test" is a big batch that computes things in the database. Here are the timings of both Oracle and PG (7.2):

Oracle on NT 4: 45 minutes to complete, 1200 tps (yes, one thousand two hundred tps).

Linux Red Hat 7.2 with PostgreSQL 7.2: hours to complete (by extrapolation, 45 hours), 80 tps (eighty tps).

Tests were made on the same machine, a Pentium III 600 MHz with 256 MB RAM and RAID 5. We formatted the server and installed the Linux stuff afterwards.

So what do you think of SUCH a difference between Oracle/NT and Linux/PG?

I feel very bad in front of the customer, having to tell him there is a 1:15 ratio between Oracle/NT and Linux/PostgreSQL, since I'm a real PG fan and a DBA in a French Open Source company...

Thanks a lot for your support.

:-(((

--
Jean-Paul ARGUDO
Jean-Paul ARGUDO wrote:
> The "test" is a big batch that computes things in the database. Here are the
> timings of both Oracle and PG (7.2):
>
> Oracle on NT 4: 45 minutes to complete, 1200 tps (yes, one thousand two hundred tps).
>
> Linux Red Hat 7.2 with PostgreSQL 7.2: hours to complete (by extrapolation, 45 hours), 80 tps (eighty tps).

Wow! That is huge. OK, let me ask some questions:

Did you do a "vacuum analyze" on the tables? If you did not analyze the tables, it may be using table scans instead of indexes. That would make a huge difference. Also, it may choose poorly between hash joins and merge joins.

Did you tune "buffers" in postgresql.conf? If you have too few buffers, you will get no caching effect on the queries.
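For instance, a minimal sketch of the first thing I would run, using the same PyGreSQL interface that shows up in the benchmark scripts later in this thread (the table name t01 is only illustrative, borrowed from the batch counters below):

--------------------------------
#!/usr/bin/python
# refresh planner statistics before (and periodically during) the batch,
# so the optimizer can choose index scans and sensible join methods
import pg

con = pg.connect()             # connect to the default database
con.query('VACUUM ANALYZE')    # add a table name to limit it to one table
# from psql, EXPLAIN on one of the batch's queries will then show
# whether an index scan or a sequential scan is being used
--------------------------------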
On Wed, 2002-02-27 at 16:46, Jean-Paul ARGUDO wrote:
> But finally, I went through all the queries as carefully as I could and fixed
> everything I found; I hope everything has been done correctly.

What was the postgresql.conf set to?

> The "test" is a big batch that computes things in the database.

Could you run this batch in smaller chunks to see if PG is slow from the start or whether it slows down as it goes?

> Here are the timings of both Oracle and PG (7.2):
>
> Oracle on NT 4: 45 minutes to complete, 1200 tps (yes, one thousand two hundred tps).
>
> Linux Red Hat 7.2 with PostgreSQL 7.2: hours to complete (by extrapolation, 45 hours), 80 tps (eighty tps).

What kind of tps are these? I.e. what does each t do?

-------------
Hannu
Hi Jean-Paul,

I know you've probably done this, but I'll ask just in case. Did you tune the memory settings of the PostgreSQL server configuration, i.e. the postgresql.conf file?

If so, what are the values you changed from the defaults?

:-)

Regards and best wishes,

Justin Clift

Jean-Paul ARGUDO wrote:
> Oracle on NT 4: 45 minutes to complete, 1200 tps (yes, one thousand two hundred tps).
>
> Linux Red Hat 7.2 with PostgreSQL 7.2: hours to complete (by extrapolation, 45 hours), 80 tps (eighty tps).
>
> Tests were made on the same machine, a Pentium III 600 MHz with 256 MB RAM and RAID 5.
> [...]

--
"My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there."
- Indira Gandhi
> The "test" is a big batch that computes things in the database. Here are the
> timings of both Oracle and PG (7.2):
>
> Oracle on NT 4: 45 minutes to complete, 1200 tps (yes, one thousand two hundred tps).
>
> Linux Red Hat 7.2 with PostgreSQL 7.2: hours to complete (by extrapolation, 45 hours), 80 tps (eighty tps).

Jean-Paul,

I think the problem here is not having Postgres configured properly. I am in a similar situation here where we are migrating data from Postgres into Oracle. Postgres has been as much as 40x faster than Oracle in many situations here. Note also that our Oracle instance is on a quad-processor Sun 280R, and our Postgres 'instance' is on a P3/1GHz. Iterating over 440,000 XML 'text' fields in Oracle takes about 4 days. In Postgres it takes 8 hours. Iterating over a 3.5M-row table is just inconceivable for Oracle, and I do it in Postgres all the time.

My suspicion is that our Oracle instance is not tuned very well, and the code that is manipulating the database (in this case Perl) is much smarter for Postgres (we have separate developers to do Perl-Oracle interfaces).

Postgres is a fantastic, fast database. But you really must configure it, and code intelligently, to use it.

-alex
There is probably an explanation, but "computes things" doesn't provide much information to go on. Do you think you could boil this down to a test case? Also, expand on what the batch does, the size of the database, and which interface you are using.

I'm sure people would like to help, but there simply isn't enough information to derive any conclusions here.

Cheers,
Marc

Jean-Paul ARGUDO wrote:
> The "test" is a big batch that computes things in the database. Here are the
> timings of both Oracle and PG (7.2):
>
> Oracle on NT 4: 45 minutes to complete, 1200 tps (yes, one thousand two hundred tps).
>
> Linux Red Hat 7.2 with PostgreSQL 7.2: hours to complete (by extrapolation, 45 hours), 80 tps (eighty tps).
> [...]
> What was the postgresql.conf set to?

I put the parameters in another mail, please watch for it.

> > The "test" is a big batch that computes things in the database.
> Could you run this batch in smaller chunks to see if PG is slow from the
> start or whether it slows down as it goes?

The batch starts really fast and, after 2 minutes, begins to slow down dramatically; it then keeps getting slower and slower.

> > Linux Red Hat 7.2 with PostgreSQL 7.2: hours to complete (by extrapolation, 45 hours), 80 tps (eighty tps).
> What kind of tps are these?

Here's what we have in the output.

This is the Windows NT4 / Oracle 8.0 output when the batch is totally finished:

Time                            : 00:47:50

Transaction                     : 25696
Item                            : 344341
Transaction (in milliseconds)   : 111
Item (in milliseconds)          : 8

Errors                          : 0
Warnings                        : 0
PLU not found                   : 0
NOM not found                   : 0
Alloc NOM                       : 739
Free NOM                        : 739
Error 1555                      : 0

Read                            : 2093582
Write                           : 1772364
Read/Write                      : 3865946

Free memory (RAM)               : 117396 Ko / 261548 Ko

PLU SELECT                      : 344341
NOM SELECT                      : 1377364
T04 SELECT                      : 1840
T01 INSERT                      : 593
T01 UPDATE                      : 1376771
T02 INSERT                      : 28810
T02 UPDATE                      : 315531
T03 INSERT                      : 41199
T13 INSERT                      : 9460
RJT INSERT                      : 0
RJT SELECT                      : 0

--------------------

Beware: "Transaction" does not mean a database transaction; a "transaction" here contains one or more "items", in the context of the application/database.

As for real DML statements: 3,865,946 queries done in 47 min 50 secs (the queries are spread across many tables; see the detailed counts below the "Free memory..." line; a table name is 3 letters long).

That's 1,347 queries per second... ouch!

This is the Linux Red Hat 7.2 / PostgreSQL 7.2 port of the Pro*C program producing the output.

As you'll understand, it is not the COMPLETE batch, we had to stop it:

Time                : 00:16:26

Transaction         : 750
Item                : 7391
Transaction (ms)    : 1314
Item (ms)           : 133

Errors              : 1
Warnings            : 0
PLU not found       : 0
NOM not found       : 0
Alloc NOM           : 739
Free NOM            : 0
Error 1555          : 0

Read                : 45127.000
Write               : 37849.000
Read/Write          : 82976.000

PLU SELECT          : 7391
NOM SELECT          : 29564
T04 SELECT          : 31
T01 INSERT          : 378
T01 UPDATE          : 29186
T02 INSERT          : 3385
T02 UPDATE          : 4006
T03 INSERT          : 613
T13 INSERT          : 281
RJT INSERT          : 0
RJT SELECT          : 0

----------------

You see, we have 82,976 queries in 16 min 26 seconds; that's 84 queries per second.

Definitely nothing to do with Oracle :-((

Very bad for us, since if this customer drops Oracle for PG it could be really fantastic; this customer has much influence on the business...

Thanks for helping me that much, to all of you.

--
Jean-Paul ARGUDO
Okay,

To answer the many replies (thanks!), I'll try to give more details:

* DELL server, P3 600 MHz, 256 MB RAM, RAID 5

* kernel: Linux int2412 2.4.9-21SGI_XFS_1.0.2 #1 Thu Feb 7 16:50:37 CET 2002 i686 unknown,
  with aacraid-cox because aacraid had poor performance with this server (at first we thought we had RAID 5 problems)

* postgresql.conf: here are _all_ the uncommented parameters:

tcpip_socket = true
max_connections = 16
port = 5432
shared_buffers = 19000          # 2*max_connections, min 16
max_fsm_relations = 200         # min 10, fsm is free space map
max_fsm_pages = 12000           # min 1000, fsm is free space map
max_locks_per_transaction = 256 # min 10
wal_buffers = 24                # min 4
sort_mem = 8192                 # min 32
vacuum_mem = 8192               # min 1024
wal_debug = 0                   # range 0-16
fsync = true
silent_mode = true
log_connections = false
log_timestamp = false
log_pid = false
debug_level = 0                 # range 0-16
debug_print_query = false
debug_print_parse = false
debug_print_rewritten = false
debug_print_plan = false
debug_pretty_print = false
show_parser_stats = false
show_planner_stats = false
show_executor_stats = false
show_query_stats = false
transform_null_equals = true

* /proc parameters:

/proc/sys/kernel/shmall => 184217728 (more than 130M)
/proc/sys/kernel/shmall => 184217728

* We made a bunch of vmstat logs too, and made graphs from them to understand what happens, all in a PostScript file, with GNU graph... This is very interesting, but as I don't know whether attachments are allowed here, please tell me if I can post it too. It shows swap in/out, memory, I/O, etc.

Thanks for your support!

--
Jean-Paul ARGUDO
Jean-Paul ARGUDO <jean-paul.argudo@idealx.com> writes:
> Okay... I'm feeling very skeptical today.

Did you adjust the shared buffers and other tuning settings for Postgres?

-Doug

--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863
The batch was originally written in Pro*C for Oracle under Windows NT. We translated it thanks to the fabulous ecpg client interface.

I posted the details that were asked for earlier; I hope this will help with some deeper analysis.

Thanks for your remarks.

--
Jean-Paul ARGUDO
-----Original Message-----
From: Marc Lavergne [mailto:mlavergne-pub@richlava.com]
Sent: Wednesday, February 27, 2002 9:41 AM
To: Jean-Paul ARGUDO
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Oracle vs PostgreSQL in real life

There is probably an explanation, but "computes things" doesn't provide much information to go on. Do you think you could boil this down to a test case? Also, expand on what the batch does, the size of the database, and which interface you are using. I'm sure people would like to help, but there simply isn't enough information to derive any conclusions here.

>>----------------------------------------------------------------------

This seems a very important test case. If possible, and if the client will allow it, perhaps the relevant pieces of the schema could be published to some ftp site along with the relevant C code. Then we could populate the tables with dummy data and run the same tests.

One of two things will happen (I predict):

1. Someone will find a way to make it run fast.

OR

2. Someone will offer an improvement to PostgreSQL so that it can do as well as or better than Oracle for this application.

Without understanding the problem, we end up guessing.

<<----------------------------------------------------------------------
Jean-Paul ARGUDO wrote:
> This is the Linux Red Hat 7.2 / PostgreSQL 7.2 port of the Pro*C program
> producing the output.
>
> As you'll understand, it is not the COMPLETE batch, we had to stop it:
>
> Time                : 00:16:26
>
> Transaction         : 750
> Item                : 7391
> Transaction (ms)    : 1314
> Item (ms)           : 133
>
> Errors              : 1
> Warnings            : 0
> PLU not found       : 0
> NOM not found       : 0
> Alloc NOM           : 739
> Free NOM            : 0
> Error 1555          : 0
>
> Read                : 45127.000
> Write               : 37849.000
> Read/Write          : 82976.000
>
> PLU SELECT          : 7391
> NOM SELECT          : 29564
> T04 SELECT          : 31
> T01 INSERT          : 378
> T01 UPDATE          : 29186

Are you updating 29186 records in a table here? If so, is this table used in the following queries?

> T02 INSERT          : 3385
> T02 UPDATE          : 4006

Ditto here: is T02 updated and then used in subsequent queries?

> T03 INSERT          : 613
> T13 INSERT          : 281
> RJT INSERT          : 0
> RJT SELECT          : 0

Are these queries run in this order, or are the inserts/updates/selects intermingled? A judicious vacuum on a couple of the tables may help.

Also, I noticed you had 19000 buffers. I did some experimentation with buffers and found that more is not always better. Depending on the nature of your database, 2048~4096 seems to be a sweet spot for some of the stuff that I do.

Again, have you "analyzed" the database? PostgreSQL will do badly if you have not analyzed. (Oracle also benefits from analyzing, depending on the nature of the data.)

Have you done an "explain" on the queries used in your batch? You may be able to see what's going on.
Hi all,

As I wrote before, it is an ECPG program that runs with bad performance. I put the trace/notice/debug mode back on on the server.

Here is an example of what the debug output keeps showing. The "c..." things are CURSORS; it seems that on every commit the cursors are closed:

[... snip ...]
NOTICE:  Closing pre-existing portal "csearcht04"
NOTICE:  Closing pre-existing portal "csearcht30"
NOTICE:  Closing pre-existing portal "cfindplu"
NOTICE:  Closing pre-existing portal "cfindplu"
NOTICE:  Closing pre-existing portal "cfindplu"
NOTICE:  Closing pre-existing portal "cfindplu"
NOTICE:  Closing pre-existing portal "cfindplu"
NOTICE:  Closing pre-existing portal "cfindplu"
NOTICE:  Closing pre-existing portal "cfindplu"
NOTICE:  Closing pre-existing portal "cfindplu"
NOTICE:  Closing pre-existing portal "csearcht04"
NOTICE:  Closing pre-existing portal "csearcht30"
NOTICE:  Closing pre-existing portal "cfindplu"
[... snip ...]

So it seems that on every commit the cursor is closed... and re-opened with new variable values.

By the way, as many asked me: the queries are VERY simple, there are only a few of them. Each query works on one table at a time; no joins, for example. Only massive bulk work with CURSORS.

Any way to avoid the closing/re-opening of cursors? Any tips on the best way to port cursors?

Thanks in advance.

PS: I am currently testing vacuums in between: pausing the data manipulation in the script, running a vacuum analyze, and continuing the processing.

--
Jean-Paul ARGUDO
> > It seems that on every commit, the cursor is closed... and re-opened with new
> > variable values.
>
> I think that currently the only way to reuse query plans would be migrating some
> of your logic to the backend and using SPI prepared statements.
>
> > By the way, as many asked me: the queries are VERY simple, there are only a few queries.
> > Each query works on one table at a time; no joins, for example. Only massive bulk
> > work with CURSORS.
>
> Again, can't some of it be moved to the backend, either using PL/pgSQL or C (or
> pltcl, plperl, plpython ;)

OK. We read all of "Chapter 21. Server Programming Interface", the SPI documentation. This seems _really_ interesting; it reminds me of stored outlines in Oracle. So:

1) How can we find some sample code? Can SPI statements be called from / into ecpg?

2) If prepared statements and stored execution plans exist, why can't those be used from any client interface or plain SQL?

3) You tell us we can "move to the backend" some queries: do you mean we would have better performance with stored functions in PL/pgSQL?

Thanks a lot Hannu, I promise to stop soon with the questions :-)

This is _so_ important for us.

Best regards & wishes.

--
Jean-Paul ARGUDO
On Thu, Feb 28, 2002 at 02:39:47PM +0100, Jean-Paul ARGUDO wrote:

> 2) If prepared statements and stored execution plans exist, why can't those be used
> from any client interface or plain SQL?

There is an "execute already parsed query plan" facility in the SPI layer only. PostgreSQL has no SQL interface for it -- except my experimental patch for 7.0 (I sometimes think about porting it to the latest PostgreSQL release, but I don't have a compelling motivation to do it...).

> 3) You tell us we can "move to the backend" some queries: do you mean we would
> have better performance with stored functions in PL/pgSQL?

You needn't use plpgsql only. You can use C/C++, Tcl, Perl, Python. IMHO the best performance comes from C + SPI + "stored execution plan". But don't forget that the path of a query through PostgreSQL is not just the query parser. Executing an already-parsed query plan pays off if you use some query really often and the query spends a long time in the parser...

Karel

--
Karel Zak  <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/
C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
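To make the "stored execution plan" idea concrete, here is a minimal, hypothetical sketch in PL/Python (one of the procedural languages mentioned above; it assumes the language has been installed with createlang plpython, and the table/column names t01, val and i are borrowed from the benchmark scripts later in this thread, not from Jean-Paul's schema):

--------------------------------
CREATE FUNCTION update_t01(integer, text) RETURNS integer AS '
    # SD is a per-function dictionary that survives between calls in a session,
    # so the plan is prepared only once and reused afterwards
    if not SD.has_key("plan"):
        SD["plan"] = plpy.prepare(
            "UPDATE t01 SET val = $2 WHERE i = $1",
            ["integer", "text"])
    plpy.execute(SD["plan"], [args[0], args[1]])
    return 1
' LANGUAGE 'plpython';
--------------------------------

The client then just issues SELECT update_t01(42, 'bum'); and the UPDATE is parsed and planned once per session instead of on every execution.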
> many situations here. Note also that our oracle instance is on a quad
> processor Sun 280R, and our postgres 'instance' is on a p3/1ghz. Iterating

A 280R is a 2-way system, not 4-way (hence the 2 in 280).

- Brandon

----------------------------------------------------------------------------
c: 646-456-5455                              h: 201-798-4983
b. palmer, bpalmer@crimelabs.net             pgp: crimelabs.net/bpalmer.pgp5
On Thu, 2002-02-28 at 15:58, Karel Zak wrote:
> There is an "execute already parsed query plan" facility in the SPI layer only.
> PostgreSQL has no SQL interface for it -- except my experimental patch for 7.0
> (I sometimes think about porting it to the latest PostgreSQL release, but I
> don't have a compelling motivation to do it...).

I did some testing.

5000*20 runs of update on a non-existing key
(send query + parse + optimise + update 0 rows):

[hannu@taru abench]$ time ./abench.py 2>/dev/null

real    0m38.992s
user    0m6.590s
sys     0m1.860s

5000*20 runs of update on a random existing key
(send query + parse + optimise + update 1 row):

[hannu@taru abench]$ time ./abench.py 2>/dev/null

real    1m48.380s
user    0m17.330s
sys     0m2.940s

The backend wallclock time for the first is 39.0 - 6.6 = 32.4,
the backend wallclock time for the second is 108.4 - 17.3 = 91.1,
so roughly 1/3 of the time is spent on communication + parse + optimize + locate,
and 2/3 on actually updating the tuples.

If we could save half of the parse/optimise time by saving query plans, then the
backend performance would go up from 1097 to 100000/(91.1-16.2) = 1335 updates/sec.

------------------

As an ad hoc test of parsing-planning-optimising costs I did the following.

Backend time for "explain update t01 set val='bum'" is 30.0 - 5.7 = 24.3:

[hannu@taru abench]$ time ./abench.py 2>/dev/null

real    0m30.038s
user    0m5.660s
sys     0m2.800s

Backend time for "explain update t01 set val='bum' where i = %s" is 39.8 - 8.0 = 31.8:

[hannu@taru abench]$ time ./abench.py 2>/dev/null

real    0m39.883s
user    0m8.000s
sys     0m2.620s

So adding "where i=n" to a query made (parse + plan + show plan) run 1.3 times slower. Some of it must be communication overhead, but surely some is parsing/planning/optimising time.

--------------
Hannu
Jean-Paul ARGUDO wrote:
> As I wrote before, it is an ECPG program that runs with bad performance.
> I put the trace/notice/debug mode back on on the server.
>
> Here is an example of what the debug output keeps showing. The "c..." things
> are CURSORS; it seems that on every commit the cursors are closed:
>
> [... snip ...]
> NOTICE:  Closing pre-existing portal "csearcht04"
> NOTICE:  Closing pre-existing portal "cfindplu"
> NOTICE:  Closing pre-existing portal "cfindplu"
> [... snip ...]
>
> So it seems that on every commit the cursor is closed... and re-opened with
> new variable values.

By default, Postgres executes statements in autocommit mode. This means that each statement is executed in its own transaction and a commit is performed at the end of the statement, which is much slower than executing all the statements inside a begin ... commit block.

To disable the autocommit mode you have to compile the ECPG program with the -t option.

I hope that helps.

Regards,
Antonio Sergio
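To illustrate the general autocommit point (independent of the ecpg -t flag, which is clarified further down in the thread), here is a minimal client-side sketch using the same PyGreSQL interface as the benchmark scripts below; the table name t01 is only illustrative:

--------------------------------
#!/usr/bin/python
# in autocommit mode every statement pays for its own COMMIT
# (and, with fsync = true, its own flush to disk);
# wrapping a batch of statements in one transaction amortises that cost
import pg

con = pg.connect()
con.query('begin')
for i in range(1, 101):
    con.query("update t01 set val='bum' where i = %d" % i)
con.query('commit')    # one commit for the whole batch of 100 updates
--------------------------------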
On Wed, Feb 27, 2002 at 06:44:53PM +0100, Jean-Paul ARGUDO wrote:
> To answer the many replies (thanks!), I'll try to give more details:
> ...
> Linux int2412 2.4.9-21SGI_XFS_1.0.2 #1 Thu Feb 7 16:50:37 CET 2002 i686 unknown

But you know that kernels up to 2.4.10 had huge problems with virtual memory, don't you? I'd recommend testing it either on 2.4.17 (which seems to run stably for me) or, if you want to be sure and do not need SMP, use 2.2.20.

Michael

--
Michael Meskes
Michael@Fam-Meskes.De
Go SF 49ers! Go Rhein Fire!
Use Debian GNU/Linux! Use PostgreSQL!
On Thu, Feb 28, 2002 at 10:32:48AM +0100, Jean-Paul ARGUDO wrote:
> As I wrote before, it is an ECPG program that runs with bad performance.
> ...
> It seems that on every commit, the cursor is closed.

Cursors shouldn't be closed, but prepared statements are deallocated on each commit. AFAIK this is what the standard says.

Michael

--
Michael Meskes
Michael@Fam-Meskes.De
Go SF 49ers! Go Rhein Fire!
Use Debian GNU/Linux! Use PostgreSQL!
On Thu, Feb 28, 2002 at 01:18:29PM -0500, Antonio Sergio de Mello e Souza wrote:
> By default, Postgres executes statements in autocommit mode.

That of course is true.

> To disable the autocommit mode you have to compile the ECPG program with
> the -t option.

That unfortunately is not. It's just the opposite way around. ecpg by default uses the Oracle way and issues a BEGIN after each commit automatically. Thus you only have to specify COMMIT every now and then to end the transaction. If you use "-t" or SET AUTOCOMMIT ON, then you run in the normal PostgreSQL environment and get each command inside its own transaction. To manually start and end transactions you have to use "-t" or, respectively, EXEC SQL SET AUTOCOMMIT ON, and then issue a BEGIN.

Michael

--
Michael Meskes
Michael@Fam-Meskes.De
Go SF 49ers! Go Rhein Fire!
Use Debian GNU/Linux! Use PostgreSQL!
On Thu, Feb 28, 2002 at 06:21:34PM +0200, Hannu Krosing wrote:
> I did some testing.
> [... benchmark details snipped ...]
>
> The backend wallclock time for the first is 39.0 - 6.6 = 32.4,
> the backend wallclock time for the second is 108.4 - 17.3 = 91.1,
> so roughly 1/3 of the time is spent on communication + parse + optimize + locate,
> and 2/3 on actually updating the tuples.
>
> If we could save half of the parse/optimise time by saving query plans, then the
> backend performance would go up from 1097 to 100000/(91.1-16.2) = 1335 updates/sec.

It depends on the proportion between time-in-parser and time-in-executor. If your query spends a lot of time in the parser and optimizer, a query plan cache is interesting for you. Because PostgreSQL has dynamic functions and operators, the time in the parser can be very significant for some queries.

We have a good notion of total query times now (for example from benchmark tests), but we have no real time statistics about the path of a query through the backend. How long does a query spend in the parser, how long in the optimizer or executor? (... maybe use profiling, but I am not sure about it). All my suggestions for memory management were based on the results of control messages I wrote into mmgr, and for example Tom was surprised by the frequent realloc usage. What I want to say is: we need more and more data from the code, or else we can't optimize it well ;-)

Suggestion: "TODO: solid path-of-query time profiling for developers" :-)

Karel

--
Karel Zak  <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/
C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
> > As I wrote before, it is an ECPG program that runs with bad performance.
> > ...
> > It seems that on every commit, the cursor is closed.
>
> Cursors shouldn't be closed, but prepared statements are deallocated on each
> commit. AFAIK this is what the standard says.

Wow, this sure sounds completely bogus to me. IMHO, CURSORS opened inside a transaction (after BEGIN WORK) are supposed to be closed (at least with autocommit=yes). I do not think COMMIT is supposed to do anything with a prepared statement. That is what EXEC SQL FREE :statementid is for. That would then match, e.g., Informix esql/c.

Andreas
The number 2.4.9-21 corresponds to the (Red Hat) kernel I'm running right now. Yes, 2.4.X as released from kernel.org had huge problems with virtual memory (for virtually all values of X), but many of these problems have been addressed by keeping the kernel relatively frozen and just working on the VM problems (which is one of the things we've been doing at Red Hat).

I'm not saying we've got it totally nailed just yet, but I want to present the view that some branches of the Linux kernel *have* been given the attention they need to avoid some of the well-known problems that linux.org kernels are (essentially, through Linus's law) designed to find.

M

Michael Meskes wrote:
> On Wed, Feb 27, 2002 at 06:44:53PM +0100, Jean-Paul ARGUDO wrote:
> > To answer the many replies (thanks!), I'll try to give more details:
> > ...
> > Linux int2412 2.4.9-21SGI_XFS_1.0.2 #1 Thu Feb 7 16:50:37 CET 2002 i686 unknown
>
> But you know that kernels up to 2.4.10 had huge problems with virtual
> memory, don't you? I'd recommend testing it either on 2.4.17 (which seems to
> run stably for me) or, if you want to be sure and do not need SMP, use
> 2.2.20.
>
> Michael
Michael Meskes wrote:
> ecpg by default uses the Oracle way and issues a BEGIN after each commit
> automatically. Thus you only have to specify COMMIT every now and then to end
> the transaction. If you use "-t" or SET AUTOCOMMIT ON, then you run in the
> normal PostgreSQL environment and get each command inside its own transaction.

Many thanks for the explanation! Sorry for the wrong advice... :-(

Regards,
Antonio Sergio
> > Oracle on NT 4: 45 minutes to complete, 1200 tps (yes, one thousand two hundred tps).
> >
> > Linux Red Hat 7.2 with PostgreSQL 7.2: hours to complete (by extrapolation, 45 hours), 80 tps (eighty tps).

Well... Where to start?

We work as a team of two. The other is a senior C/C++ coder. He mailed me a remark about the datatypes in the database. Here is what he sent me:

Our database has different datatypes; here is a count of the distinct datatypes across all tables:

  197  numeric(x)
   19  numeric(x,2)
    2  varchar(x)
   61  char(x)
   36  datetime

He asked me about numeric(x) and questioned me about how PG manages the NUMERIC types. I gave him a pointer to "numeric.c" in the PG sources.

I analyzed this source and found that NUMERIC types are much more expensive than a simple INTEGER.

I really fell on the floor... :-( I was sure that, with the good quality PG has, when NUMERIC(x) columns are declared they would be translated into INTEGER (int2, 4 or 8, whatever...).

So, I made a pg_dump of the current database and made some Perl replacements of NUMERIC(x,0) to INTEGER.

I loaded the database and launched the processing: the results are REALLY IMPRESSIVE. Here is what I have ((it is a port of Oracle/WinNT stuff to PostgreSQL/Red Hat 7.2)):

                               Oracle          PG72 with NUMERICs    PG72 with INTEGERs
--------------------------------------------------------------------------
sample connect by query
ported to PG (thanks to        350ms           750ms                 569ms
OpenACS code!)
--------------------------------------------------------------------------
sample "big" query with
connect bys and many           3 min 30s       8 min 40s             5 min 1s
sub-queries
--------------------------------------------------------------------------
Big Batch treatment            1300 queries/s  80 queries/s          250 queries/s
PRO*C to ECPG                  45 min to go    ~4 to 6 DAYS to go    not yet tested fully

Ratio                          1:1             1:21                  not yet...
                                               (21 times slower!)
--------------------------------------------------------------------------

((But this batch will also be rewritten in pure C + libpq + SPI, so we think we'll have better results again.))

So as you see, DATA TYPES are REALLY important, as I wrote in a techdocs article (I should have thought of this earlier).

What next? I'll keep you informed about this Oracle/WinNT to PG/Linux port :-))

And we thank you _very_ much for all the help you gave us.

Best regards and wishes,

--
Jean-Paul ARGUDO
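For anyone who wants to repeat the experiment, here is a minimal sketch of that dump rewrite (Jean-Paul used Perl; this is the same idea in Python, and it assumes the numeric(x) / numeric(x,0) values really fit into a 4-byte integer; use int8 where they do not):

--------------------------------
#!/usr/bin/python
# usage: ./numeric2int.py < dump.sql > dump_int.sql
# rewrites numeric(x) and numeric(x,0) column types to integer,
# leaving numeric(x,2) and the like untouched
import re, sys

pattern = re.compile(r'numeric\s*\(\s*\d+\s*(,\s*0\s*)?\)', re.IGNORECASE)
for line in sys.stdin.readlines():
    sys.stdout.write(pattern.sub('integer', line))
--------------------------------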
On Fri, Mar 01, 2002 at 10:14:34AM -0500, Antonio Sergio de Mello e Souza wrote: > Many thanks for the explanation! Sorry for the wrong advice... :-( No problem. I have to apologize for the lack of docs. :-) Michael -- Michael Meskes Michael@Fam-Meskes.De Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
On March 1, 2002 01:44 pm, Jean-Paul ARGUDO wrote:
> I analyzed this source and found that NUMERIC types are much more
> expensive than a simple INTEGER.
>
> I really fell on the floor... :-(
>
> So, I made a pg_dump of the current database and made some Perl
> replacements of NUMERIC(x,0) to INTEGER.
>
> I loaded the database and launched the processing: the results are REALLY
> IMPRESSIVE.

Any chance you can try it with the MONEY type? It does use integers to store the data. It isn't really designed for general numeric use, but it would be interesting to see how it fares.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > Any chance you can try it with the MONEY type? It does use integers to > store the data. It isn't really designed for general numeric use but it > would be interesting to see how it fares. I think the MONEY type is deprecated... -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
On March 2, 2002 09:02 am, Doug McNaught wrote: > "D'Arcy J.M. Cain" <darcy@druid.net> writes: > > Any chance you can try it with the MONEY type? It does use integers to > > store the data. It isn't really designed for general numeric use but it > > would be interesting to see how it fares. > > I think the MONEY type is deprecated... I keep hearing that but I use it heavily and I hope it never goes away. The NUMERIC type is nice but I still think that the MONEY type works well for certain things. I bet it will be shown to be more efficient. Certainly it has limitations but within those limitations it works well. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On Fri, Mar 01, 2002 at 07:44:10PM +0100, Jean-Paul ARGUDO wrote: > ((but this batch will be yet re-writen in pure C + libpq + SPI, > so we think we'll have better results again)) You mean instead of using ecpg? I'd really be interested in the results of this. Michael -- Michael Meskes Michael@Fam-Meskes.De Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
> shared_buffers = 19000          # 2*max_connections, min 16

This number sounds too high. If you only have 256 MB of RAM, this is using over 150 MB of it (19000 buffers x 8 kB each is roughly 155 MB of shared memory). Are you swapping a lot? What is the load on the server while it's running?
On Wed, 2002-02-27 at 23:21, Jean-Paul ARGUDO wrote:
> > Could you run this batch in smaller chunks to see if PG is slow from the
> > start or whether it slows down as it goes?
>
> The batch starts really fast and, after 2 minutes, begins to slow down dramatically

This usually means that it is a good time to pause the batch and do a "vacuum analyze" (or just "analyze" for 7.2). In 7.2 you can probably do the vacuum analyze in parallel, but it will likely run faster when the other backends are stopped.

> and it then keeps getting slower and slower.
>
> This is the Windows NT4 / Oracle 8.0 output when the batch is totally finished:
>
> Time                            : 00:47:50
>
> Transaction                     : 25696
> Item                            : 344341
> ...
> Read                            : 2093582
> Write                           : 1772364
> Read/Write                      : 3865946

Assuming these must be interpreted as TABLE COMMAND : COUNT:

> PLU SELECT                      : 344341
> NOM SELECT                      : 1377364
> T04 SELECT                      : 1840
> T01 INSERT                      : 593
> T01 UPDATE                      : 1376771

This means that for Postgres, with no vacuum in between, you will in fact have a 1.3M-row table to search for 593 actual rows. Running a (parallel) vacuum, or vacuum full, and possibly even a reindex will help a lot.

> T02 INSERT                      : 28810
> T02 UPDATE                      : 315531

Here we have only a 10:1 ratio of deleted to live records, all of which unfortunately have to be checked for visibility in Postgres.

> T03 INSERT                      : 41199
> T13 INSERT                      : 9460
> RJT INSERT                      : 0
> RJT SELECT                      : 0
>
> --------------------
> Beware: "Transaction" does not mean a database transaction; a "transaction" here
> contains one or more "items", in the context of the application/database.

I don't know ECPG very well, but are you sure that you are not running in autocommit mode, i.e. that each command is not run in its own transaction? On the other end of the spectrum: are you possibly running all the queries in one transaction?

> As for real DML statements: 3,865,946 queries done in 47 min 50 secs (the queries
> are spread across many tables; see the detailed counts below the "Free memory..."
> line; a table name is 3 letters long).
>
> That's 1,347 queries per second... ouch!

How complex are these queries? If much time is spent by the backend on optimizing (vs. executing), then you could win by rewriting some of these as PL/pgSQL or C procedures that do a prepare/execute using SPI and use a stored plan.

> This is the Linux Red Hat 7.2 / PostgreSQL 7.2 port of the Pro*C program
> producing the output.
>
> As you'll understand, it is not the COMPLETE batch, we had to stop it:

Can you run VACUUM ANALYZE and continue?

> Time                : 00:16:26
>
> Transaction         : 750
> Item                : 7391
> ...
> PLU SELECT          : 7391
> NOM SELECT          : 29564
> T04 SELECT          : 31
> T01 INSERT          : 378

Was the T01 table empty at the start (does it have 378 rows)?
> T01 UPDATE          : 29186

Could you get a plan for an update on T01 at this point? Does it look OK? Can you make it faster by manipulating the enable_xxx variables?

> T02 INSERT          : 3385
> T02 UPDATE          : 4006
> T03 INSERT          : 613
> T13 INSERT          : 281
> RJT INSERT          : 0
> RJT SELECT          : 0
>
> ----------------
>
> You see, we have 82,976 queries in 16 min 26 seconds; that's 84 queries per second.
>
> Definitely nothing to do with Oracle :-((

Was Oracle out-of-the-box or did you (or someone else) tune it too?

> Very bad for us, since if this customer drops Oracle for PG it could be really
> fantastic; this customer has much influence on the business...

--------------
Hannu
On Wed, 2002-02-27 at 23:21, Jean-Paul ARGUDO wrote:
> The batch starts really fast and, after 2 minutes, begins to slow down dramatically
> and it then keeps getting slower and slower.

I did a small test run on my home computer (Celeron 350, IDE disks, untuned 7.2 on RH 7.2).

I made a small table (int, text) with a primary key on the int column and filled it with the values 1-512.

Then I ran a Python script that updated 10000 random rows in batches of 10 updates.

The first run took 1:28 (112 tps), as it used seq scans.

Then I ran VACUUM ANALYZE, and the next runs were:

1. 24 sec - 416 tps
2. 43 sec - 232 tps
3. 71 sec - 140 tps

Then I tried the same script while running vacuum manually in another window every 5 sec; the result was similar to run 1: 24.5 sec.

Running vacuum every 10 sec slowed it to 25.1 sec, running it every 3 sec gave 24.3 sec. Running vacuum in a tight loop slowed the test down to 30.25 sec.

-------------------------
Hannu
----- Original Message -----
From: "Jean-Paul ARGUDO" <jean-paul.argudo@idealx.com>
> PS: I am currently testing vacuums in between: pausing the data manipulation
> in the script, running a vacuum analyze, and continuing the processing.

I ran a small test (Pg 7.2, RH 7.1, Athlon 850, 512 MB) that created a small table of 2 fields with a primary key on the first, filled it with 768 values, and then ran the following script:

--------------------------------
#!/usr/bin/python
import pg, random

con = pg.connect()
q = "update t01 set val='bum' where i = %s"

for trx in range(5000):
    con.query('begin')
    for cmd in range(20):
        rn = random.randint(1,768)
        con.query(q % rn)
    con.query('commit')
--------------------------------

When run as-is it made an average of 152 updates/sec:

[hannu@taru hannu]$ time ./abench.py

real    10m55.034s
user    0m27.270s
sys     0m4.700s

After doing a vacuum full, I ran it together with a parallel process that was a simple loop sleeping 5 sec and then doing a vacuum:

--------------------------------
#!/usr/bin/python
import time, pg

con = pg.connect()

for trx in range(5000):
    for cmd in range(20):
        time.sleep(5)
        print 'vacuum'
        con.query('vacuum')
print 'done!'
--------------------------------

The same script now runs at an average of 917 updates/sec:

[hannu@taru hannu]$ time ./abench.py

real    1m48.416s
user    0m16.840s
sys     0m3.300s

So here we have a case where the new vacuum can really save the day!

I also tried other vacuuming intervals, and it seems that ~4 sec is the best for this case. Here are the test results:

interval - time    - updates per sec
2 sec    - 1.53.5  - 881
3 sec    - 1.49.6  - 912
4 sec    - 1.48.0  - 925
5 sec    - 1.48.4  - 922
6 sec    - 1.49.7  - 911
10 sec   - 1.56.8  - 856
no vac   - 10.55.0 - 152

--------------
Hannu
----- Original Message -----
From: "Jean-Paul ARGUDO" <jean-paul.argudo@idealx.com>
> It seems that on every commit, the cursor is closed... and re-opened with new
> variable values.

I think that currently the only way to reuse query plans would be migrating some of your logic to the backend and using SPI prepared statements.

> By the way, as many asked me: the queries are VERY simple, there are only a few queries.
> Each query works on one table at a time; no joins, for example. Only massive bulk
> work with CURSORS.

Again, can't some of it be moved to the backend, either using PL/pgSQL or C (or pltcl, plperl, plpython ;)

> PS: I am currently testing vacuums in between: pausing the data manipulation
> in the script, running a vacuum analyze, and continuing the processing.

Starting with 7.2 you can also run both analyze and plain vacuum in parallel with the main app. You will most likely need to run analyze once after the tables are more or less filled, and then a parallel vacuum every 5-30 sec to keep the tables from growing too big. You could limit the vacuum to only those tables that see a lot of updating (or delete/insert).

-----------
Hannu
On Fri, 2002-03-01 at 23:44, Jean-Paul ARGUDO wrote:
> We work as a team of two. The other is a senior C/C++ coder. He mailed me a
> remark about the datatypes in the database. Here is what he sent me:
>
> Our database has different datatypes; here is a count of the distinct
> datatypes across all tables:
>
>   197  numeric(x)
>    19  numeric(x,2)
>     2  varchar(x)
>    61  char(x)
>    36  datetime
>
> He asked me about numeric(x) and questioned me about how PG manages
> the NUMERIC types.
>
> I gave him a pointer to "numeric.c" in the PG sources.
>
> I analyzed this source and found that NUMERIC types are much more
> expensive than a simple INTEGER.
>
> I really fell on the floor... :-( I was sure that, with the good quality PG has,
> when NUMERIC(x) columns are declared they would be translated into INTEGER
> (int2, 4 or 8, whatever...).

Postgres does not do any silent type replacement based on a data type's maximum length.

> So, I made a pg_dump of the current database and made some Perl
> replacements of NUMERIC(x,0) to INTEGER.
>
> I loaded the database and launched the processing: the results are REALLY
> IMPRESSIVE. Here is what I have
> ((it is a port of Oracle/WinNT stuff to PostgreSQL/Red Hat 7.2)):
>
> sample connect by query ported to PG (thanks to OpenACS code!):
> Oracle 350ms / PG72 with NUMERICs 750ms / PG72 with INTEGERs 569ms

Did you rewrite your CONNECT BY queries as recursive functions or did you use varbit tree position pointers?

> sample "big" query with connect bys and many sub-queries:
> Oracle 3 min 30s / PG72 with NUMERICs 8 min 40s / PG72 with INTEGERs 5 min 1s

Could you give more information on this query? I suspect this can be made at least as fast as Oracle :)

> Big Batch treatment:
> Oracle 1300 queries/s (45 min to go) / PG72 with NUMERICs 80 queries/s (~4 to 6 DAYS to go)
> / PG72 with INTEGERs 250 queries/s (not yet tested fully)
>
> Ratio: 1:1 / 1:21 (21 times slower!) / not yet...

Did you run a concurrent vacuum for both PG results? From my limited testing it seems that such a vacuum is absolutely needed for big batches of mostly updates.

And btw, 45 min * 21 = 15h45, not 4-6 days :)

> ((But this batch will also be rewritten in pure C + libpq + SPI,
> so we think we'll have better results again.))

You probably will get better results :)

I reran my test (5000 transactions of 20 updates each, on a random unique key between 1 and 768, with a concurrent vacuum running every 4 sec), moving the inner loop of 20 random updates to the server, first without SPI prepared statements and then using prepared statements.
Test hardware: Athlon 850, IDE, 512 MB RAM.

Update of a random row i=1..768, all queries sent from the client: 2:02 = 820 updates/sec

[hannu@taru abench]$ time ./abench.py

real    2m1.522s
user    0m20.260s
sys     0m3.320s

[hannu@taru abench]$ time ./abench.py

real    2m2.320s
user    0m19.830s
sys     0m3.490s

Using plpython without prepared statements: 1:35 = 1052 updates/sec

[hannu@taru abench]$ time ./abenchplpy2.py

real    1m34.587s
user    0m1.280s
sys     0m0.400s

[hannu@taru abench]$ time ./abenchplpy2.py

real    1m36.919s
user    0m1.350s
sys     0m0.450s

Using plpython with SPI prepared statements: 1:06.30 = 1503 updates/sec

[hannu@taru abench]$ time ./abenchplpy.py

real    1m6.134s
user    0m1.400s
sys     0m0.720s

[hannu@taru abench]$ time ./abenchplpy.py

real    1m7.186s
user    0m1.580s
sys     0m0.570s

plpython non-functional run with SPI prepared statements (update where i=1024, so no rows match): 0:17.65 = 5666 non-updates/sec

[hannu@taru abench]$ time ./abenchplpy.py

real    0m17.650s
user    0m0.990s
sys     0m0.290s

> So as you see, DATA TYPES are REALLY important, as I wrote in a
> techdocs article (I should have thought of this earlier).

Yes, they are. But running a concurrent vacuum is _much_ more important if the number of updates is much bigger than the number of records (thanks Tom!).

------------------
Hannu