Thread: SLOW pG performance
Hi again, I just tried the first load in my brand new pg database... And IT SUCKS! yeah! it took over an hour to load around 200 thousand records, while the sqlserver2000 took about 5-7 minutes... Im using the SAME program, some changes necessary but nothing really important. Im using only functions o load the records, and they all receive basically varchar, so inside he funciont i have to cast some arguments in order to compare char and varchar o to insert into integer columns. I dont know if the CAST FUNCTION TAKES TOO LONG, maybe it is the source of the problem, maybe it is something else. Al li kno is that I really need to have the 200 thou record loaded in 10 mins tops. I have some Fkm but sqlserver did to so it shouldnt be a prob. Ah the db and the program are running on the same one G4 processor Mac OSX 10.2.6 machine (that is im loading locally), when loading to sqlserver2000 it is done via the 100mbps intranet. I really dont know, any ideas to improve performance are welcome!!! Im using prepared statements, i dont know what else to tell you, ah yes im runnig pgphpadmin 3, so i have to run the apache web server. Thanks a lot JuanF (surprised and disappointed new pg fan)
On Thu, 4 Sep 2003, Juan Francisco Diaz wrote: > Hi again, I just tried the first load in my brand new pg database... > And IT SUCKS! yeah! it took over an hour to load around 200 thousand > records, while the sqlserver2000 took about 5-7 minutes... > Im using the SAME program, some changes necessary but nothing really > important. How were you loading the data, one row per transaction or batched in some fashion? What do the schema(s) of the table(s) involved look like? What (if any) constraints are involved (esp. foreign keys). What do the specs of the relative machines look like? Especially given that in one case you're running multiple things on one machine and in the other it sounds like they're on separate machines? What do your function(s) look like? Have you tried doing a subset not using functions for comparison?
El 9/4/03 5:00 PM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> escribió: > On Thu, 4 Sep 2003, Juan Francisco Diaz wrote: > >> Hi again, I just tried the first load in my brand new pg database... >> And IT SUCKS! yeah! it took over an hour to load around 200 thousand >> records, while the sqlserver2000 took about 5-7 minutes... >> Im using the SAME program, some changes necessary but nothing really >> important. > > How were you loading the data, one row per transaction or batched in some > fashion? One row per transaction, in both cases sqlserver and pgsql. > > What do the schema(s) of the table(s) involved look like? What (if any) > constraints are involved (esp. foreign keys). There are 50 tables, but the ones were the insertion is taking place are only 10. Each of these 10 has one FK ((compoiste) and in a couple cases 2 FK (composite). The most used tables have more than 20 columns, and in two cases i have to break the insertion into two, an insertion and an update due to the large number of columns (39 and 45). Functions dont allow more than 32 parameters, anyone know why? I dont know if you would like to see ALL of the schemas, or if what ive told you is enough or if maybe you would like to see the schemas of the top 5 tables. > > What do the specs of the relative machines look like? Especially given > that in one case you're running multiple things on one machine and in the > other it sounds like they're on separate machines? > Yes sqlserver is in one server and locally im using postgresl. My local machine is a PowerPc G4 533Mhz with 256Megs in RAM, Mac OSX 10.2.6 The DELL server, on which the sqlserver 2000 is running, is a P3 1.4gHz, 1 gig RAM. (What a difference uh? I didnt realize the specs were so different... Until now) > What do your function(s) look like? Have you tried doing a subset not > using functions for comparison? > Ok, so im a newbie and i dont understand exactly the question, but the following is one of the most used functions (please tell me how and what is a subset and for what is it useful?): DECLARE Lista ALIAS FOR $1; Libro ALIAS FOR $2; RS ALIAS FOR $3; Dir ALIAS FOR $4; Tel ALIAS FOR $5; Ext ALIAS FOR $6; Ind ALIAS FOR $7; Nuni ALIAS FOR $8; DRS ALIAS FOR $9; DDir2 ALIAS FOR $10; DTel2 ALIAS FOR $11; DExt2 ALIAS FOR $12; DInd2 ALIAS FOR $13; DNuni ALIAS FOR $14; DAdi2 ALIAS FOR $15; Ser ALIAS FOR $16; Avi ALIAS FOR $17; PosAvi ALIAS FOR $18; Refe ALIAS FOR $19; CRS ALIAS FOR $20; CDir2 ALIAS FOR $21; CTel2 ALIAS FOR $22; CExt2 ALIAS FOR $23; CInd2 ALIAS FOR $24; CNuni ALIAS FOR $25; TipoT ALIAS FOR $26; MAScTel ALIAS FOR $27; pXTex ALIAS FOR $28; pYTex ALIAS FOR $29; pXDir ALIAS FOR $30; pYDir ALIAS FOR $31; pXTel ALIAS FOR $32; BEGIN IF CAST(Lista AS CHAR) not in (select IdLista from cabezalis ta where IdLista = CAST(Lista AS CHAR) and IdLibro = CAST(Libro AS CHAR(6))) THEN insert into cabezalista(IdLista, IdLibro, Rsoc, Dire ccion, Telefono, Extension, Indicativo, NumUnico, DRsoc, DDir, Dtel, DExt, D Adi, DInd, DNumUni, Serie, IdAviso, PosLogo, IdReferencia, CRsoc, CDir, CTel , CExt, Cind, CNumUni, TipoTel, MAScaraTel, posXTex, posytex, posxdir, posyd ir, posxtel, posytel, posxext, posyext, posxind, posyind, posxnum, posynum) values(Lista, Libro, RS, Dir, Tel, E xt, Ind, Nuni, DRS, DDir2, DTel2, DExt2, DAdi2, DInd2, DNuni, Ser, Avi, PosA vi, Refe, CRS, CDir2, CTel2, CExt2, CInd2, CNuni, TipoT, MAScTel, pXTex, pYT ex, pXDir, pYDir, pXTel, 0, 0, 0, 0, 0, 0, 0); return Lista; ELSE return Libro; END IF; END; The above is in plpgsql language. THANKS FOR YOUR HELP!!!!!!!!!!! I'll be forever in debt with this mailing list! JuanF PG rulz! > > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend >
On Thu, 4 Sep 2003, Juan Francisco Diaz wrote: > El 9/4/03 5:00 PM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> escribi�: > > > On Thu, 4 Sep 2003, Juan Francisco Diaz wrote: > > > >> Hi again, I just tried the first load in my brand new pg database... > >> And IT SUCKS! yeah! it took over an hour to load around 200 thousand > >> records, while the sqlserver2000 took about 5-7 minutes... > >> Im using the SAME program, some changes necessary but nothing really > >> important. > > > > How were you loading the data, one row per transaction or batched in some > > fashion? > One row per transaction, in both cases sqlserver and pgsql. There is a speed difference generally right now on loading single rows or multiple rows in a transaction. Batching is often preferred if possible. :) > > What do the schema(s) of the table(s) involved look like? What (if any) > > constraints are involved (esp. foreign keys). > There are 50 tables, but the ones were the insertion is taking place are > only 10. Each of these 10 has one FK ((compoiste) and in a couple cases 2 FK > (composite). > > The most used tables have more than 20 columns, and in two cases i have to > break the insertion into two, an insertion and an update due to the large > number of columns (39 and 45). Functions dont allow more than 32 parameters, > anyone know why? You can change it at compile time. Raising it offers a small amount of overhead currently and so it's not raised for the default install. > I dont know if you would like to see ALL of the schemas, or if what ive told > you is enough or if maybe you would like to see the schemas of the top 5 > tables. One or two representative ones are probably fine. It's a question of data types (do all the foreign key constraints reference columns of the same type for example) and indexes mostly. Note my examples of things to look for down below as well. > > What do the specs of the relative machines look like? Especially given > > that in one case you're running multiple things on one machine and in the > > other it sounds like they're on separate machines? > > > Yes sqlserver is in one server and locally im using postgresl. > My local machine is a PowerPc G4 533Mhz with 256Megs in RAM, Mac OSX 10.2.6 > The DELL server, on which the sqlserver 2000 is running, is a P3 1.4gHz, 1 > gig RAM. (What a difference uh? I didnt realize the specs were so > different... Until now) That's a pretty big difference, especially the ram (especially since it seems like it's also got apache and the client program running on it). Another one might be related to the disk systems. > > What do your function(s) look like? Have you tried doing a subset not > > using functions for comparison? > > > Ok, so im a newbie and i dont understand exactly the question, but the > following is one of the most used functions (please tell me how and what is > a subset and for what is it useful?): I meant a subset of the inserts directly rather than through functions to figure out if things were related to insert performance or the function overhead. See below for a comment on this particular function, however. > [function bits snipped] > ����������������IF�CAST(Lista�AS�CHAR)�not�in�(select�IdLista�from�cabezalis > ta�where�IdLista�=�CAST(Lista�AS�CHAR)�and�IdLibro�=�CAST(Libro�AS�CHAR(6))) > [function bits snipped] I'd suggest IF NOT EXISTS (select ...) since I believe that's equivalent and possibly faster since you're already doing a comparison of IdLista=CAST(Lista AS CHAR) in the subselect. Also, for this particular one, an index on (IdLista,IdLibro) may help the search (although it'd need to be maintained by the insert, so it may be a wash), and you should probably be careful with types here as well (and I'm not sure why you aren't just declaring lista and libro as the appropriate types and not casting them).