Thread: SLOW pG performance

SLOW pG performance

From

Juan Francisco Diaz

Date:

04 September 2003, 18:20:03

Hi again, I just tried the first load in my brand new pg database...
And IT SUCKS! yeah! it took over an hour to load around 200 thousand
records, while the sqlserver2000 took about 5-7 minutes...
Im using the SAME program, some changes necessary but nothing really
important.
Im using only functions o load the records, and they all receive basically
varchar, so inside he funciont i have to cast some arguments in order to
compare char and varchar o to insert into integer columns.
I dont know if the CAST FUNCTION TAKES TOO LONG,   maybe it is the source of
the problem, maybe it is something else. Al li kno is that I really need to
have the 200 thou record loaded in 10 mins tops. I have some Fkm but
sqlserver did to so it shouldnt be a prob.
Ah the db and the program are running on the same one G4 processor Mac OSX
10.2.6 machine (that is im loading locally), when loading to sqlserver2000
it is done via the 100mbps intranet.

I really dont know, any ideas to improve performance are welcome!!!
Im using prepared statements, i dont know what else to tell you, ah yes im
runnig pgphpadmin 3, so i have to run the apache web server.

Thanks a lot

JuanF
(surprised and disappointed new pg fan)

Re: SLOW pG performance

From

Stephan Szabo

Date:

04 September 2003, 19:00:30

On Thu, 4 Sep 2003, Juan Francisco Diaz wrote:

> Hi again, I just tried the first load in my brand new pg database...
> And IT SUCKS! yeah! it took over an hour to load around 200 thousand
> records, while the sqlserver2000 took about 5-7 minutes...
> Im using the SAME program, some changes necessary but nothing really
> important.

How were you loading the data, one row per transaction or batched in some
fashion?

What do the schema(s) of the table(s) involved look like? What (if any)
constraints are involved (esp. foreign keys).

What do the specs of the relative machines look like?  Especially given
that in one case you're running multiple things on one machine and in the
other it sounds like they're on separate machines?

What do your function(s) look like?  Have you tried doing a subset not
using functions for comparison?

Re: SLOW pG performance

From

Juan Francisco Diaz

Date:

04 September 2003, 19:34:58

El 9/4/03 5:00 PM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> escribió:

> On Thu, 4 Sep 2003, Juan Francisco Diaz wrote:
>
>> Hi again, I just tried the first load in my brand new pg database...
>> And IT SUCKS! yeah! it took over an hour to load around 200 thousand
>> records, while the sqlserver2000 took about 5-7 minutes...
>> Im using the SAME program, some changes necessary but nothing really
>> important.
>
> How were you loading the data, one row per transaction or batched in some
> fashion?
One row per transaction, in both cases sqlserver and pgsql.
>
> What do the schema(s) of the table(s) involved look like? What (if any)
> constraints are involved (esp. foreign keys).
There are 50 tables, but the ones were the insertion is taking place are
only 10. Each of these 10 has one FK ((compoiste) and in a couple cases 2 FK
(composite).
The most used tables have more than 20 columns, and in two cases i have to
break the insertion into two, an insertion and an update due to the large
number of columns (39 and 45). Functions dont allow more than 32 parameters,
anyone know why?

I dont know if you would like to see ALL of the schemas, or if what ive told
you is enough or if maybe you would like to see the schemas of the top 5
tables.
>
> What do the specs of the relative machines look like?  Especially given
> that in one case you're running multiple things on one machine and in the
> other it sounds like they're on separate machines?
>
Yes sqlserver is in one server and locally im using postgresl.
My local machine is a PowerPc G4 533Mhz with 256Megs in RAM, Mac OSX 10.2.6
The DELL server, on which the sqlserver 2000 is running, is a P3 1.4gHz, 1
gig RAM. (What a difference uh? I didnt realize the specs were so
different... Until now)
> What do your function(s) look like?  Have you tried doing a subset not
> using functions for comparison?
>
Ok, so im a newbie and i dont understand exactly the question, but the
following is one of the most used functions (please tell me how and what is
a subset and for what is it useful?):

DECLARE
                Lista ALIAS FOR $1;
                Libro ALIAS FOR $2;
                RS ALIAS FOR $3;
                Dir ALIAS FOR $4;
                Tel ALIAS FOR $5;
                Ext ALIAS FOR $6;
                Ind ALIAS FOR $7;
                Nuni ALIAS FOR $8;
                DRS ALIAS FOR $9;
                DDir2 ALIAS FOR $10;
                DTel2 ALIAS FOR $11;
                DExt2 ALIAS FOR $12;
                DInd2 ALIAS FOR $13;
                DNuni ALIAS FOR $14;
                DAdi2 ALIAS FOR $15;
                Ser ALIAS FOR $16;
                Avi ALIAS FOR $17;
                PosAvi ALIAS FOR $18;
                Refe ALIAS FOR $19;
                CRS ALIAS FOR $20;
                CDir2 ALIAS FOR $21;
                CTel2 ALIAS FOR $22;
                CExt2 ALIAS FOR $23;
                CInd2 ALIAS FOR $24;
                CNuni ALIAS FOR $25;
                TipoT ALIAS FOR $26;
                MAScTel ALIAS FOR $27;
                pXTex ALIAS FOR $28;
                pYTex ALIAS FOR $29;
                pXDir ALIAS FOR $30;
                pYDir ALIAS FOR $31;
                pXTel ALIAS FOR $32;
                
        BEGIN
                IF CAST(Lista AS CHAR) not in (select IdLista from cabezalis
ta where IdLista = CAST(Lista AS CHAR) and IdLibro = CAST(Libro AS CHAR(6)))
 THEN
                        insert into cabezalista(IdLista, IdLibro, Rsoc, Dire
ccion, Telefono, Extension, Indicativo, NumUnico, DRsoc, DDir, Dtel, DExt, D
Adi, DInd, DNumUni, Serie, IdAviso, PosLogo, IdReferencia, CRsoc, CDir, CTel
, CExt, Cind, CNumUni, TipoTel, MAScaraTel, posXTex, posytex, posxdir, posyd
ir, posxtel, posytel, posxext, posyext, posxind, posyind, posxnum, posynum)
                                        values(Lista, Libro, RS, Dir, Tel, E
xt, Ind, Nuni, DRS, DDir2, DTel2, DExt2, DAdi2, DInd2, DNuni, Ser, Avi, PosA
vi, Refe, CRS, CDir2, CTel2, CExt2, CInd2, CNuni, TipoT, MAScTel, pXTex, pYT
ex, pXDir, pYDir, pXTel, 0, 0, 0, 0, 0, 0, 0);
                        return Lista;
                ELSE 
                        return Libro;
                END IF;
        END;


The above is in plpgsql language.

THANKS FOR YOUR HELP!!!!!!!!!!!
I'll be forever in debt with this mailing list!

JuanF
PG rulz!
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

Re: SLOW pG performance

From

Stephan Szabo

Date:

04 September 2003, 19:51:26

On Thu, 4 Sep 2003, Juan Francisco Diaz wrote:

> El 9/4/03 5:00 PM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> escribi�:
>
> > On Thu, 4 Sep 2003, Juan Francisco Diaz wrote:
> >
> >> Hi again, I just tried the first load in my brand new pg database...
> >> And IT SUCKS! yeah! it took over an hour to load around 200 thousand
> >> records, while the sqlserver2000 took about 5-7 minutes...
> >> Im using the SAME program, some changes necessary but nothing really
> >> important.
> >
> > How were you loading the data, one row per transaction or batched in some
> > fashion?
> One row per transaction, in both cases sqlserver and pgsql.

There is a speed difference generally right now on loading single rows or
multiple rows in a transaction.  Batching is often preferred if possible.
:)

> > What do the schema(s) of the table(s) involved look like? What (if any)
> > constraints are involved (esp. foreign keys).
> There are 50 tables, but the ones were the insertion is taking place are
> only 10. Each of these 10 has one FK ((compoiste) and in a couple cases 2 FK
> (composite).
>
> The most used tables have more than 20 columns, and in two cases i have to
> break the insertion into two, an insertion and an update due to the large
> number of columns (39 and 45). Functions dont allow more than 32 parameters,
> anyone know why?

You can change it at compile time.  Raising it offers a small amount of
overhead currently and so it's not raised for the default install.

> I dont know if you would like to see ALL of the schemas, or if what ive told
> you is enough or if maybe you would like to see the schemas of the top 5
> tables.

One or two representative ones are probably fine.  It's a question of data
types (do all the foreign key constraints reference columns of the same
type for example) and indexes mostly.  Note my examples of things to look
for down below as well.

> > What do the specs of the relative machines look like?  Especially given
> > that in one case you're running multiple things on one machine and in the
> > other it sounds like they're on separate machines?
> >
> Yes sqlserver is in one server and locally im using postgresl.
> My local machine is a PowerPc G4 533Mhz with 256Megs in RAM, Mac OSX 10.2.6
> The DELL server, on which the sqlserver 2000 is running, is a P3 1.4gHz, 1
> gig RAM. (What a difference uh? I didnt realize the specs were so
> different... Until now)

That's a pretty big difference, especially the ram (especially since it
seems like it's also got apache and the client program running on it).
Another one might be related to the disk systems.

> > What do your function(s) look like?  Have you tried doing a subset not
> > using functions for comparison?
> >
> Ok, so im a newbie and i dont understand exactly the question, but the
> following is one of the most used functions (please tell me how and what is
> a subset and for what is it useful?):

I meant a subset of the inserts directly rather than through functions to
figure out if things were related to insert performance or the function
overhead.  See below for a comment on this particular function,
however.

> [function bits snipped]
> ����������������IF�CAST(Lista�AS�CHAR)�not�in�(select�IdLista�from�cabezalis
> ta�where�IdLista�=�CAST(Lista�AS�CHAR)�and�IdLibro�=�CAST(Libro�AS�CHAR(6)))
> [function bits snipped]

I'd suggest IF NOT EXISTS (select ...) since I believe that's equivalent
and possibly faster since you're already doing a comparison of
IdLista=CAST(Lista AS CHAR) in the subselect.  Also, for this particular
one, an index on (IdLista,IdLibro) may help the search (although it'd need
to be maintained by the insert, so it may be a wash), and you should
probably be careful with types here as well (and I'm not sure why you
aren't just declaring lista and libro as the appropriate types and not
casting them).