Re: Oracle vs PostgreSQL in real life : NEWS!!! - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Oracle vs PostgreSQL in real life : NEWS!!!
Date
Msg-id 1015107716.3512.89.camel@rh72.home.ee
In response to Re: Oracle vs PostgreSQL in real life : NEWS!!!  (Jean-Paul ARGUDO <jean-paul.argudo@IDEALX.com>)
List pgsql-hackers
On Fri, 2002-03-01 at 23:44, Jean-Paul ARGUDO wrote:
> > > Oracle on NT 4 : 45 minutes to go , 1200 tps (yes one thousand and two hundred
> > > tps)
> > > 
> > > Linux Red Hat 7.2 with PostgreSQL 7.2 : hours to go (statistically, 45 hours),
> > > 80 tps (eighty tps).
> 
> Well... Where to start?
> 
> We work as a team of two. The other one is a senior C/C++ coder. He
> mailed me a remark about the datatypes in the database. Here is what he
> sent me:
> 
> Our database has different datatypes; here is a count of the distinct
> datatypes across all tables:
> 
> 197 numeric(x)
>  19 numeric(x,2)
>   2 varchar(x)
>  61 char(x)
>  36 datetime
> 
> He asked me about numeric(x) and questioned me about how PG manages
> the NUMERIC types.
> 
> I gave him a pointer to "numeric.c" in the PG sources.
> 
> I analyzed this source and found that NUMERIC types are much more
> expensive than a simple INTEGER.
> 
> I really fell on the floor.. :-( I was sure that, with the good quality
> PG has, when NUMERIC(x) columns are declared, it would be translated
> into INTEGER (int2, 4 or 8, whatever...).

Postgres does not do any silent type replacement based on a data type's
declared maximum length.
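
If the conversion has to be done by hand, a one-pass rewrite of the dump
is usually enough. A rough Python sketch of the idea (the file names and
the pattern are only illustrative, and it assumes every such column
really fits into an integer):

    # rewrite NUMERIC(x) / NUMERIC(x,0) column types to integer in a dump
    import re

    numeric_col = re.compile(r'numeric\(\d+(,0)?\)', re.IGNORECASE)

    with open('db.dump') as src, open('db_int.dump', 'w') as dst:
        for line in src:
            dst.write(numeric_col.sub('integer', line))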

> So, I made a pg_dump of the current database and made some Perl
> replacements of NUMERIC(x,0) to INTEGER.
> 
> I loaded the database and launched the treatments: the results are
> REALLY IMPRESSIVE; here is what I have:
> 
> ((it is a port of Oracle/WinNT stuff to PostgreSQL/Red Hat 7.2)):
> 
>                            Oracle       PG72 with NUMERICs   PG72 with INTEGERS
> --------------------------------------------------------------------------
> sample "connect by" query
> ported to PG (thanks to    350ms        750ms                569ms
> OpenACS code!)

Did you rewrite your CONNECT BY queries as recursive functions, or did
you use varbit tree position pointers?
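
By "recursive functions" I mean something along these lines - a rough
sketch only, assuming a table tree(id integer, parent_id integer) with
no cycles; the table, function and language names are illustrative, not
what you actually have:

    -- iterative equivalent of a CONNECT BY, walking the tree breadth-first
    CREATE FUNCTION connect_by(root integer) RETURNS text AS '
        # collect every id reachable from the given root
        frontier = [root]
        found = []
        plan = plpy.prepare(
            "SELECT id FROM tree WHERE parent_id = $1", ["integer"])
        while frontier:
            node = frontier.pop(0)
            found.append(node)
            for row in plpy.execute(plan, [node]):
                frontier.append(row["id"])
        return ",".join([str(i) for i in found])
    ' LANGUAGE plpython3u;

A query like SELECT connect_by(1) then stands in for the CONNECT BY part.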

> --------------------------------------------------------------------------
> sample "big" query with
> connect bys and many       3 min 30s    8 min 40s            5 min 1s
> sub-queries

Could you give more information on this query? I suspect it can be
made at least as fast as Oracle :)

> --------------------------------------------------------------------------
> Big Batch treatment
> queries                    1300 queries/s  80 queries/s      250 queries/s
> 
> PRO*C to ECPG              45 min to go    ~4 to 6 DAYS      not yet
>                                            to go             tested fully
> 
> Ratio                      1:1             1:21              not yet ..
>                                            (21 times slower!)

Did you run a concurrent vacuum for both PG results?

From my limited testing it seems that such a vacuum is absolutely needed
for big batches of mostly updates.

And btw 45min * 21 = 15h45 not 4-6 days :)
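
Such a concurrent vacuum does not need to be anything fancy - roughly
this, run from a second session (the database and table names below are
only placeholders):

    # vacuum the heavily-updated table every 4 seconds while the batch runs
    import os, time

    while True:
        os.system('psql -d test -c "VACUUM abench"')
        time.sleep(4)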

> --------------------------------------------------------------------------
> ((but this batch will also be re-written in pure C + libpq + SPI,
>   so we think we'll get even better results))

You probably will get better results :)

I re-ran my test (5000 transactions of 20 updates on a random unique key
between 1 and 768, with a concurrent vacuum running every 4 sec), moving
the inner loop of 20 random updates to the server, first without SPI
prepared statements and then with prepared statements.

Test hardware: Athlon 859, IDE, 512MB RAM
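
The client-side variant of the test is essentially this - a sketch, not
the exact abench.py; the psycopg2 driver and the table/column names are
assumptions:

    # 5000 transactions of 20 updates each on a random key between 1 and 768
    import random
    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    for tx in range(5000):
        for n in range(20):
            # every single UPDATE travels over the wire and is planned anew
            cur.execute("UPDATE abench SET n = n + 1 WHERE i = %s",
                        (random.randint(1, 768),))
        conn.commit()
    conn.close()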

update of random row i=1..768
all queries sent from client              2:02 = 820 updates/sec
[hannu@taru abench]$ time ./abench.py 
real    2m1.522s
user    0m20.260s
sys     0m3.320s
[hannu@taru abench]$ time ./abench.py 
real    2m2.320s
user    0m19.830s
sys     0m3.490s

using plpython without prepared statements 1:35 = 1052 updates/sec
[hannu@taru abench]$ time ./abenchplpy2.py 
real    1m34.587s
user    0m1.280s
sys     0m0.400s
[hannu@taru abench]$ time ./abenchplpy2.py 
real    1m36.919s
user    0m1.350s
sys     0m0.450s

using plpython with SPI prepared statements 1:06.30 = 1503 updates/sec
[hannu@taru abench]$ time ./abenchplpy.py 
real    1m6.134s
user    0m1.400s
sys     0m0.720s
[hannu@taru abench]$ time ./abenchplpy.py 
real    1m7.186s
user    0m1.580s
sys     0m0.570s

plpython non-functional with SPI prepared
statements - update where i=1024 (which
matches no row)                           0:17.65 = 5666 non-updates/sec
[hannu@taru abench]$ time ./abenchplpy.py 
real    0m17.650s
user    0m0.990s
sys     0m0.290s
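
For reference, the prepared-statement variant (the 1:06 runs above) is
structured roughly like this - not the actual abenchplpy.py, and the
table, column and function names as well as the language spelling are
assumptions:

    -- the inner loop of 20 random-key updates, moved to the server,
    -- with the SPI plan prepared once and cached in the SD dictionary
    CREATE FUNCTION abench_inner() RETURNS integer AS '
        import random
        if "plan" not in SD:
            SD["plan"] = plpy.prepare(
                "UPDATE abench SET n = n + 1 WHERE i = $1", ["integer"])
        for n in range(20):
            plpy.execute(SD["plan"], [random.randint(1, 768)])
        return 20
    ' LANGUAGE plpython3u;

The client side then shrinks to calling SELECT abench_inner() 5000 times,
one call per transaction.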


> So as you see, DATA TYPES are REALLY important, as I wrote in a
> techdocs article (I should have thought of this earlier)

Yes they are.

But running a concurrent vacuum is _much_ more important if the number of
updates is much bigger than the number of records (thanks Tom!)

------------------
Hannu


