Re: Oracle vs PostgreSQL in real life : NEWS!!! - Mailing list pgsql-hackers
From:            Hannu Krosing
Subject:         Re: Oracle vs PostgreSQL in real life : NEWS!!!
Date:
Msg-id:          1015107716.3512.89.camel@rh72.home.ee
In response to:  Re: Oracle vs PostgreSQL in real life : NEWS!!! (Jean-Paul ARGUDO <jean-paul.argudo@IDEALX.com>)
List:            pgsql-hackers
On Fri, 2002-03-01 at 23:44, Jean-Paul ARGUDO wrote:
> > > Oracle on NT 4 : 45 minutes to go, 1200 tps (yes, one thousand two
> > > hundred tps)
> > >
> > > Linux Red Hat 7.2 with PostgreSQL 7.2 : hours to go (statistically,
> > > 45 hours), 80 tps (eighty tps).
>
> Well... Where to start?
>
> We work in a team of two. The other one is a senior C/C++ coder. He
> mailed me a remark about the datatypes in the database. Here is what he
> sent me:
>
> Our database has different datatypes; here is a count of distinct
> datatypes across all tables:
>
>     197  numeric(x)
>      19  numeric(x,2)
>       2  varchar(x)
>      61  char(x)
>      36  datetime
>
> He asked me about numeric(x) and questioned me about how PG manages the
> NUMERIC type.
>
> I gave him a pointer to "numeric.c" in the PG sources.
>
> I analyzed this source and found that the NUMERIC type is much more
> expensive than a simple INTEGER.
>
> I nearly fell off my chair... :-( Given PG's quality, I was sure that a
> declared NUMERIC(x) column would be translated to an INTEGER (int2, 4
> or 8, whatever...).

Postgres does not do any silent type replacement based on a data type's
declared maximum length.

> So, I made a pg_dump of the current database and made some Perl
> replacements of NUMERIC(x,0) with INTEGER.
>
> I loaded the database and launched the treatments: the results are
> REALLY IMPRESSIVE. Here is what I have
> ((it is a port of Oracle/WinNT stuff to PostgreSQL/Red Hat 7.2)):
>
>                         Oracle     PG 7.2 with     PG 7.2 with
>                                    NUMERICs        INTEGERs
> ---------------------------------------------------------------------
> sample CONNECT BY       350 ms     750 ms          569 ms
> query ported to PG
> (thanks to OpenACS
> code!)

Did you rewrite your CONNECT BY queries as recursive functions, or did
you use varbit tree position pointers?

> ---------------------------------------------------------------------
> sample "big" query      3min 30s   8min 40s        5min 1s
> with CONNECT BYs and
> many sub-queries

Could you give more information on this query? I suspect it can be made
at least as fast as Oracle :)

> ---------------------------------------------------------------------
> big batch               1300       80              250
> treatment queries       queries/s  queries/s       queries/s
>
> PRO*C ported to ECPG    45 min     ~4 to 6 DAYS    not yet tested
>                         to go      to go           fully
>
> Ratio                   1:1        1:21 (21 times  not yet ..
>                                    slower!)

Did you run a concurrent vacuum for both PG results? From my limited
testing it seems that such a vacuum is absolutely needed for big batches
of mostly updates. And btw, 45 min * 21 = 15 h 45 min, not 4-6 days :)

> ---------------------------------------------------------------------
> ((but this batch will be rewritten in pure C + libpq + SPI, so we
> think we'll get better results again))

You probably will get better results :)

I reran my test (5000 transactions of 20 updates each on a random unique
key between 1 and 768, with a concurrent vacuum running every 4 sec),
moving the inner loop of 20 random updates to the server, first without
SPI prepared statements and then with them. (Sketches of the dump
rewrite, the vacuum loop, and such a server-side function follow below;
the timings come after them.)
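The Perl replacement itself isn't shown in the thread. A minimal sketch
of the same idea in Python - assuming every NUMERIC(x) and NUMERIC(x,0)
column really holds integer data, and picking the smallest integer type
that covers the declared precision - might look like this (hypothetical,
not Jean-Paul's actual script):

import re
import sys

def int_type(precision):
    # NUMERIC(p,0) holds up to p decimal digits; int2 covers 4 digits,
    # int4 covers 9, int8 covers 18. Precisions above 18 would still
    # need numeric and are not handled here.
    if precision <= 4:
        return "int2"
    if precision <= 9:
        return "int4"
    return "int8"

# matches numeric(x) and numeric(x,0), but not e.g. numeric(x,2)
pattern = re.compile(r"numeric\((\d+)(?:\s*,\s*0)?\)", re.IGNORECASE)

# filter a dump:  pg_dump mydb | python fixtypes.py > fixed.sql
for line in sys.stdin:
    sys.stdout.write(pattern.sub(lambda m: int_type(int(m.group(1))),
                                 line))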
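The concurrent vacuum setup isn't shown either. A sketch of a vacuum
loop running every 4 seconds alongside the batch - assuming a Python
DB-API driver such as psycopg2 and a table named abench, neither of
which is given in the thread - could be:

import time
import psycopg2  # assumed driver; any DB-API module would do

# VACUUM cannot run inside a transaction block, so use autocommit.
conn = psycopg2.connect(dbname="test")  # hypothetical connection params
conn.autocommit = True
cur = conn.cursor()

# vacuum the heavily-updated table every 4 seconds while the batch runs
while True:
    cur.execute("VACUUM abench")
    time.sleep(4)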
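The abenchplpy.py scripts aren't included in the mail either. A minimal
sketch of the server-side half - a PL/Python function that prepares the
UPDATE once per backend (cached in the SD dict) and runs the inner loop
of 20 random updates - could look like this; the table abench(i int4,
n int4), the function name, and the inline pseudo-random generator are
all made up for illustration:

CREATE FUNCTION batch_update(integer) RETURNS integer AS '
    # prepare the plan only once per backend and cache it in SD
    if "plan" not in SD:
        SD["plan"] = plpy.prepare(
            "UPDATE abench SET n = n + 1 WHERE i = $1", ["int4"])
        SD["seed"] = 12345
    # run args[0] updates on pseudo-random keys in 1..768
    for k in range(args[0]):
        SD["seed"] = (SD["seed"] * 1103515245 + 12345) % 2147483648
        plpy.execute(SD["plan"], [SD["seed"] % 768 + 1])
    return args[0]
' LANGUAGE 'plpython';

The client then makes only one round trip per transaction:

SELECT batch_update(20);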
Test hardware: Athlon 859, IDE, 512 MB RAM; update of a random row
i=1..768.

all queries sent from the client: 2:02 = 820 updates/sec

[hannu@taru abench]$ time ./abench.py
real    2m1.522s
user    0m20.260s
sys     0m3.320s
[hannu@taru abench]$ time ./abench.py
real    2m2.320s
user    0m19.830s
sys     0m3.490s

using plpython without prepared statements: 1:35 = 1052 updates/sec

[hannu@taru abench]$ time ./abenchplpy2.py
real    1m34.587s
user    0m1.280s
sys     0m0.400s
[hannu@taru abench]$ time ./abenchplpy2.py
real    1m36.919s
user    0m1.350s
sys     0m0.450s

using plpython with SPI prepared statements: 1:06.30 = 1503 updates/sec

[hannu@taru abench]$ time ./abenchplpy.py
real    1m6.134s
user    0m1.400s
sys     0m0.720s
[hannu@taru abench]$ time ./abenchplpy.py
real    1m7.186s
user    0m1.580s
sys     0m0.570s

plpython no-op run with SPI prepared statements (update where i=1024,
which matches no row): 0:17.65 = 5666 non-updates/sec

[hannu@taru abench]$ time ./abenchplpy.py
real    0m17.650s
user    0m0.990s
sys     0m0.290s

> So as you see, DATA TYPES are REALLY important, as I wrote in a
> techdocs article (I should have thought of this earlier)

Yes, they are. But running a concurrent vacuum is _much_ more important
when the number of updates is much bigger than the number of records
(thanks Tom!)

------------------
Hannu