Thread: Permance issues with migrated db
I posted an issue I was having with a migration from MS SQL server to pgsql over the weekend. Tom's suggestion for the query I posted was right on, I made the appropriate updates to column types by dumping, changing and restoring the database. I then analyze'd the db and my query performance was equal to that of MS SQL. This is my first migration and also my first time trying to spot performance issues, mainly because all the previous db's I've worked with were built from scratch, never an issue with performance, but never worked with so much data either (not sure if that has anything to do with my issues). I have developed a view in pgsql that takes over 160K ms to execute, but when copied into MS SQL against the old database (with syntax mods of course), runs in a few seconds. Seems the issues are with tblcontactinfo and tblclientactivitytag. Only if I remove all references to *both* tables do I get good performance from the query. Thanks for any help! SELECT tblclientmaster.fldclientnumber, tblclientmaster.fldclientname, tblclientmaster.fldclienttype, tblclientmaster.fldbuyingstatus,tblclientmaster.fldsellingstatus, tblclientproductpreference.fldfullservice, tblclientproductpreference.fldlimitedservice,tblclientproductpreference.fldallsuite, tblclientproductpreference.fldbudget,tblclientproductpreference.fldconference, tblclientproductpreference.fldresort, tblclientproductpreference.flddailyfee,tblclientproductpreference.fldsemiprivate, tblclientproductpreference.fldprivate,tblclientproductpreference.fldmunicipal, tblclientroomsize.fldsize149, tblclientroomsize.fldsize299,tblclientroomsize.fldsize449, tblclientroomsize.fldsize599, tblclientroomsize.fldsize600, tblgeopreference.fldsw,tblgeopreference.fldnw, tblgeopreference.fldmw, tblgeopreference.fldw, tblgeopreference.fldma, tblgeopreference.fldse,tblgeopreference.flds, tblgeopreference.fldne, tblproductmaster.fldproductname, tblproductmaster.fldproductcode,tblcontactinfo.fldcontactnumber, tblcontactinfo.fldcontactfirstname, tblcontactinfo.fldcontactlastname,(tblcontactinfo.fldcontactaddress1::text || ' '::text) || tblcontactinfo.fldcontactaddress2::textAS fldcontactaddress, tblcontactinfo.fldcontactcity, tblcontactinfo.fldcontactstate,tblcontactinfo.fldcontactzipcode, tblcontactinfo.fldcontacttitle, tblcontactinfo.fldcontactphone2_type,tblcontactinfo.fldcontactphone2_num, tblcontactinfo.fldcontactphone3_num, tblcontactinfo.fldcontactphone4_num,tblcontactinfo.fldcontactphone5_num, tblcontactinfo.fldcontactemail, tblcontactinfo.fldenable,tblcontactinfo.fldcontactphone1_num, tblcontactinfo.fldperscomments, tblclientactivitytag.fldcontactactivitytag FROM tblclientmaster LEFT JOIN tblclientproductpreference ON tblclientmaster.fldclientnumber = tblclientproductpreference.fldclientnumber LEFT JOIN tblclientroomsize ON tblclientmaster.fldclientnumber = tblclientroomsize.fldclientnumber LEFT JOIN tblgeopreference ON tblclientmaster.fldclientnumber = tblgeopreference.fldclientnumber LEFT JOIN tblclientproductrelation ON tblclientmaster.fldclientnumber = tblclientproductrelation.fldclientnumber JOIN tblproductmaster ON tblclientproductrelation.fldproductnumber = tblproductmaster.fldproductnumber LEFT JOIN tblcontactinfo ON tblclientmaster.fldclientnumber = tblcontactinfo.fldclientnumber LEFT JOIN tblclientactivitytag ON tblclientmaster.fldclientnumber = tblclientactivitytag.fldclientnumber ORDER BY tblclientmaster.fldclientnumber, tblclientmaster.fldclientname, tblclientmaster.fldclienttype, tblclientmaster.fldbuyingstatus,tblclientmaster.fldsellingstatus, tblclientproductpreference.fldfullservice, tblclientproductpreference.fldlimitedservice,tblclientproductpreference.fldallsuite, tblclientproductpreference.fldbudget,tblclientproductpreference.fldconference, tblclientproductpreference.fldresort, tblclientproductpreference.flddailyfee,tblclientproductpreference.fldsemiprivate, tblclientproductpreference.fldprivate,tblclientproductpreference.fldmunicipal, tblclientroomsize.fldsize149, tblclientroomsize.fldsize299,tblclientroomsize.fldsize449, tblclientroomsize.fldsize599, tblclientroomsize.fldsize600, tblgeopreference.fldsw,tblgeopreference.fldnw, tblgeopreference.fldmw, tblgeopreference.fldw, tblgeopreference.fldma, tblgeopreference.fldse,tblgeopreference.flds, tblgeopreference.fldne, tblproductmaster.fldproductname, tblproductmaster.fldproductcode,tblcontactinfo.fldcontactnumber, tblcontactinfo.fldcontactfirstname, tblcontactinfo.fldcontactlastname,(tblcontactinfo.fldcontactaddress1::text || ' '::text) || tblcontactinfo.fldcontactaddress2::text,tblcontactinfo.fldcontactcity, tblcontactinfo.fldcontactstate, tblcontactinfo.fldcontactzipcode,tblcontactinfo.fldcontacttitle, tblcontactinfo.fldcontactphone2_type, tblcontactinfo.fldcontactphone2_num,tblcontactinfo.fldcontactphone3_num, tblcontactinfo.fldcontactphone4_num, tblcontactinfo.fldcontactphone5_num,tblcontactinfo.fldcontactemail, tblcontactinfo.fldenable, tblcontactinfo.fldcontactphone1_num,tblcontactinfo.fldperscomments, tblclientactivitytag.fldcontactactivitytag; Merge Left Join (cost=35366.42..57565.28 rows=565261 width=297) (actual time=1000.457..148111.905 rows=41866801 loops=1) Merge Cond: (tblclientmaster.fldclientnumber = tblclientactivitytag.fldclientnumber) -> Merge Join (cost=18768.38..29470.77 rows=76505 width=286) (actual time=437.703..1575.449 rows=360822 loops=1) Merge Cond: (tblclientmaster.fldclientnumber = tblclientproductrelation.fldclientnumber) -> Merge Left Join (cost=0.00..9495.89 rows=23878 width=257) (actual time=0.224..302.321 rows=24908 loops=1) Merge Cond: (tblclientmaster.fldclientnumber = tblcontactinfo.fldclientnumber) -> Merge Left Join (cost=0.00..4044.08 rows=16485 width=90) (actual time=0.172..182.162 rows=16547 loops=1) Merge Cond: (tblclientmaster.fldclientnumber = tblclientproductpreference.fldclientnumber) -> Merge Left Join (cost=0.00..3212.26 rows=16485 width=80) (actual time=0.134..130.524 rows=16514loops=1) Merge Cond: (tblclientmaster.fldclientnumber = tblclientroomsize.fldclientnumber) -> Merge Left Join (cost=0.00..2396.89 rows=16485 width=75) (actual time=0.097..81.108 rows=16497loops=1) Merge Cond: (tblclientmaster.fldclientnumber = tblgeopreference.fldclientnumber) -> Index Scan using pk_tblclientmaster on tblclientmaster (cost=0.00..1582.44 rows=16485width=67) (actual time=0.053..22.425 rows=16484 loops=1) -> Index Scan using ix_tblgeopreference_fldclientnumber on tblgeopreference (cost=0.00..606.38rows=13398 width=16) (actual time=0.033..14.121 rows=13397 loops=1) -> Index Scan using ix_tblclientroomsize_fldclientnumber on tblclientroomsize (cost=0.00..606.72rows=13395 width=13) (actual time=0.032..14.240 rows=13414 loops=1) -> Index Scan using ix_tblclientproductpreference on tblclientproductpreference (cost=0.00..623.95rows=13391 width=18) (actual time=0.031..15.194 rows=13439 loops=1) -> Index Scan using ix_tblcontactinfo_fldclientnumber on tblcontactinfo (cost=0.00..5113.21 rows=23878 width=175)(actual time=0.046..50.194 rows=24512 loops=1) -> Sort (cost=18768.38..18900.69 rows=52924 width=37) (actual time=437.401..724.717 rows=361884 loops=1) Sort Key: tblclientproductrelation.fldclientnumber -> Merge Join (cost=0.00..12985.06 rows=52924 width=37) (actual time=0.827..299.983 rows=52904 loops=1) Merge Cond: (tblclientproductrelation.fldproductnumber = tblproductmaster.fldproductnumber) -> Index Scan using ix_tblclientproductrelation_fldproductnumber on tblclientproductrelation (cost=0.00..2210.71rows=52950 width=12) (actual time=0.443..53.273 rows=52950 loops=1) -> Index Scan using pk_tblproperty on tblproductmaster (cost=0.00..9987.31 rows=52924 width=37) (actualtime=0.377..102.800 rows=52927 loops=1) -> Sort (cost=16598.04..16902.54 rows=121800 width=19) (actual time=562.569..29677.017 rows=41662721 loops=1) Sort Key: tblclientactivitytag.fldclientnumber -> Seq Scan on tblclientactivitytag (cost=0.00..3394.00 rows=121800 width=19) (actual time=12.772..115.473 rows=121800loops=1) Total runtime: 163914.674 ms Indexes were added in an effort to help the issue, but no help. These indexes do not exists in MS SQL server: CREATE TABLE "public"."tblclientactivitytag" ( "fldclientnumber" BIGINT, "fldcontactactivitytag" VARCHAR(100), "fldclientname_old" VARCHAR(100), "fldcontactfirstname" VARCHAR(50), "fldcontactlastname" VARCHAR(50), "fldcontactnumber" BIGINT, "fldclientactivitytagid" BIGSERIAL, CONSTRAINT "tblclientactivitytag_pkey" PRIMARY KEY("fldclientactivitytagid"), CONSTRAINT "test2" FOREIGN KEY ("fldclientnumber") REFERENCES "public"."tblclientmaster"("fldclientnumber") ON DELETE CASCADE ON UPDATE CASCADE NOT DEFERRABLE ) WITHOUT OIDS; CREATE INDEX "ix_tblclientactivitytag_fldclientactivitytagid" ON "public"."tblclientactivitytag" USING btree ("fldclientactivitytagid"); CREATE INDEX "ix_tblclientactivitytag_fldclientnumber" ON "public"."tblclientactivitytag" USING btree ("fldclientnumber"); CREATE INDEX "ix_tblclientactivitytag_fldcontactnumber" ON "public"."tblclientactivitytag" USING btree ("fldcontactnumber"); CREATE INDEX "ix_tblclientactivitytag_tag" ON "public"."tblclientactivitytag" USING btree ("fldcontactactivitytag"); CREATE TABLE "public"."tblcontactinfo" ( "fldclientnumber" BIGINT, "fldcontactnumber" BIGSERIAL, "fldcontactfirstname" VARCHAR(50), "fldcontactlastname" VARCHAR(50), "fldcontactaddress1" VARCHAR(60), "fldcontactaddress2" VARCHAR(50), "fldcontactcity" VARCHAR(50), "fldcontactstate" VARCHAR(50), "fldcontactzipcode" VARCHAR(10), "fldclientname_remove" VARCHAR(100), "fldcontacttype" VARCHAR(150), "fldcontactsalutation" VARCHAR(50), "fldcontactdear" VARCHAR(50), "fldcontacttitle" VARCHAR(100), "fldcontactphone1_type" VARCHAR(50), "fldcontactphone1_num" VARCHAR(20), "fldcontactphone1_num_ext" VARCHAR(20), "fldcontactphone2_type" VARCHAR(50), "fldcontactphone2_num" VARCHAR(20), "fldcontactphone3_type" VARCHAR(50), "fldcontactphone3_num" VARCHAR(20), "fldcontactphone4_type" VARCHAR(50), "fldcontactphone4_num" VARCHAR(20), "fldcontactphone5_type" VARCHAR(50), "fldcontactphone5_num" VARCHAR(20), "fldcontactemail" VARCHAR(50) NOT NULL, "fldcontactwebsite" VARCHAR(75), "fldperscomments" VARCHAR(900), "fldassistant" VARCHAR(100), "fldhfcode" VARCHAR(50), "fldenable" BOOLEAN NOT NULL, "fldpreviousclientnumber" BIGINT, "fldcreateddate" TIMESTAMP WITHOUT TIME ZONE, "fldcreatedby" VARCHAR(50), "fldlastupdateddate" TIMESTAMP WITHOUT TIME ZONE, "fldlastupdatedby" VARCHAR(50), "fldcontactbyemail" BOOLEAN NOT NULL, "fldcontactbyfax" BOOLEAN NOT NULL, "fldcontactbymail" BOOLEAN NOT NULL, "fldcontactbyphone" BOOLEAN NOT NULL, "fldcontactbycell" BOOLEAN NOT NULL, "fldcontactbypager" BOOLEAN NOT NULL, "fldcontactpassword" VARCHAR(8), CONSTRAINT "pk_tblcontactinfo" PRIMARY KEY("fldcontactnumber"), CONSTRAINT "test" FOREIGN KEY ("fldclientnumber") REFERENCES "public"."tblclientmaster"("fldclientnumber") ON DELETE CASCADE ON UPDATE CASCADE NOT DEFERRABLE ) WITHOUT OIDS; CREATE INDEX "ix_tblcontactinfo_fldclientnumber" ON "public"."tblcontactinfo" USING btree ("fldclientnumber"); CREATE INDEX "ix_tblcontactinfo_fldcontactcity" ON "public"."tblcontactinfo" USING btree ("fldcontactcity"); CREATE INDEX "ix_tblcontactinfo_fldcontactstate" ON "public"."tblcontactinfo" USING btree ("fldcontactstate"); CREATE INDEX "ix_tblcontactinfo_fldperscomments" ON "public"."tblcontactinfo" USING btree ("fldperscomments"); CREATE TABLE "public"."tblclientmaster" ( "fldclientnumber" BIGSERIAL, "fldclientname" VARCHAR(100), "fldclienttype" VARCHAR(50), "fldclientparentcompanyname_remove" VARCHAR(100), "fldclientparentcompanynumber" BIGINT, "fldchildren" SMALLINT, "fldclientbuyerseller" VARCHAR(10), "fldterms" VARCHAR(50), "fldmaxdollars" VARCHAR(20), "fldmaxdownpayment" VARCHAR(20), "fldenable" BOOLEAN NOT NULL, "fldcreateddate" TIMESTAMP WITHOUT TIME ZONE, "fldcreatedby" VARCHAR(10), "fldlastupdateddate" TIMESTAMP WITHOUT TIME ZONE, "fldlastupdatedby" VARCHAR(10), "fldclientwebsite" VARCHAR(100), "fldbuyingstatus" VARCHAR(50), "fldsellingstatus" VARCHAR(50), "fldequitystatus" VARCHAR(50), "fldstatusdate" TIMESTAMP WITHOUT TIME ZONE, CONSTRAINT "pk_tblclientmaster" PRIMARY KEY("fldclientnumber") ) WITHOUT OIDS; -- Robert
Robert Fitzpatrick wrote: > I have developed a view in pgsql that takes over 160K ms to execute, but > when copied into MS SQL against the old database (with syntax mods of > course), runs in a few seconds. Your query seems to produce 41.8 million rows. Are you sure MS-SQL is returning that many rows in a few seconds? > Merge Left Join (cost=35366.42..57565.28 rows=565261 width=297) (actual time=1000.457..148111.905 rows=41866801 loops=1) -- Richard Huxton Archonet Ltd
On Tue, 2007-05-22 at 17:21 +0100, Richard Huxton wrote: > Robert Fitzpatrick wrote: > > I have developed a view in pgsql that takes over 160K ms to execute, but > > when copied into MS SQL against the old database (with syntax mods of > > course), runs in a few seconds. > > Your query seems to produce 41.8 million rows. Are you sure MS-SQL is > returning that many rows in a few seconds? > I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100 PERCENT' after SELECT in the query. The Enterprise Manager does not indicate how many rows come back. I save it as a VIEW in MS SQL and do a 'select count(*)...' and, yes, it comes back 42164877 records. Just to be sure MS SQL hasn't done something to the structure (I noticed dbo prefixes, etc.), I pasted back into pgadmin, took off 'top 100 percent'. Then saved as a view and did a count(*) in pgsql, got 41866801. The difference in the number of records could vary due to the dataset age I'm using in pgsql for migration. The MS SQL db is still actively used. I hope I'm just missing something here...this are the queries for mssql and pgsql, respectively... SELECT TOP 100 PERCENT dbo.tblClientMaster.fldClientNumber, dbo.tblClientMaster.fldClientName, dbo.tblClientMaster.fldClientType, dbo.tblClientMaster.fldBuyingStatus, dbo.tblClientMaster.fldSellingStatus, dbo.tblClientProductPreference.fldFullService, dbo.tblClientProductPreference.fldLimitedService, dbo.tblClientProductPreference.fldAllSuite, dbo.tblClientProductPreference.fldBudget, dbo.tblClientProductPreference.fldConference, dbo.tblClientProductPreference.fldResort, dbo.tblClientProductPreference.fldDailyFee, dbo.tblClientProductPreference.fldSemiPrivate, dbo.tblClientProductPreference.fldPrivate, dbo.tblClientProductPreference.fldMunicipal, dbo.tblClientRoomSize.fldSize149, dbo.tblClientRoomSize.fldSize299, dbo.tblClientRoomSize.fldSize449,dbo.tblClientRoomSize.fldSize599, dbo.tblClientRoomSize.fldSize600, dbo.tblGeoPreference.fldSW, dbo.tblGeoPreference.fldNW, dbo.tblGeoPreference.fldMW, dbo.tblGeoPreference.fldW, dbo.tblGeoPreference.fldMA, dbo.tblGeoPreference.fldSE, dbo.tblGeoPreference.fldS,dbo.tblGeoPreference.fldNE, dbo.tblProductMaster.fldProductName, dbo.tblProductMaster.fldProductCode, dbo.tblContactInfo.fldContactNumber, dbo.tblContactInfo.fldContactFirstName, dbo.tblContactInfo.fldContactLastName, dbo.tblContactInfo.fldContactCity,dbo.tblContactInfo.fldContactState, dbo.tblContactInfo.fldContactZipCode, dbo.tblContactInfo.fldContactTitle, dbo.tblContactInfo.fldContactPhone2_Type, dbo.tblContactInfo.fldContactPhone2_Num, dbo.tblContactInfo.fldContactPhone3_Num, dbo.tblContactInfo.fldContactPhone4_Num, dbo.tblContactInfo.fldContactPhone5_Num, dbo.tblContactInfo.fldContactEMail, dbo.tblContactInfo.fldEnable,dbo.tblContactInfo.fldContactPhone1_Num, dbo.tblContactInfo.fldPersComments, dbo.tblClientActivityTag.fldContactActivityTag FROM dbo.tblClientMaster LEFT OUTER JOIN dbo.tblClientProductPreference ON dbo.tblClientMaster.fldClientNumber = dbo.tblClientProductPreference.fldClientNumberLEFT OUTER JOIN dbo.tblClientRoomSize ON dbo.tblClientMaster.fldClientNumber = dbo.tblClientRoomSize.fldClientNumberLEFT OUTER JOIN dbo.tblGeoPreference ON dbo.tblClientMaster.fldClientNumber = dbo.tblGeoPreference.fldClientNumberLEFT OUTER JOIN dbo.tblClientProductRelation ON dbo.tblClientMaster.fldClientNumber = dbo.tblClientProductRelation.fldClientNumberINNER JOIN dbo.tblProductMaster ON dbo.tblClientProductRelation.fldProductNumber = dbo.tblProductMaster.fldProductNumberLEFT OUTER JOIN dbo.tblContactInfo ON dbo.tblClientMaster.fldClientNumber = dbo.tblContactInfo.fldClientNumber LEFTOUTER JOIN dbo.tblClientActivityTag ON dbo.tblClientMaster.fldClientNumber = dbo.tblClientActivityTag.fldClientNumber SELECT tblclientmaster.fldclientnumber, tblclientmaster.fldclientname, tblclientmaster.fldclienttype, tblclientmaster.fldbuyingstatus,tblclientmaster.fldsellingstatus, tblclientproductpreference.fldfullservice, tblclientproductpreference.fldlimitedservice,tblclientproductpreference.fldallsuite, tblclientproductpreference.fldbudget,tblclientproductpreference.fldconference, tblclientproductpreference.fldresort, tblclientproductpreference.flddailyfee,tblclientproductpreference.fldsemiprivate, tblclientproductpreference.fldprivate,tblclientproductpreference.fldmunicipal, tblclientroomsize.fldsize149, tblclientroomsize.fldsize299,tblclientroomsize.fldsize449, tblclientroomsize.fldsize599, tblclientroomsize.fldsize600, tblgeopreference.fldsw,tblgeopreference.fldnw, tblgeopreference.fldmw, tblgeopreference.fldw, tblgeopreference.fldma, tblgeopreference.fldse,tblgeopreference.flds, tblgeopreference.fldne, tblproductmaster.fldproductname, tblproductmaster.fldproductcode,tblcontactinfo.fldcontactnumber, tblcontactinfo.fldcontactfirstname, tblcontactinfo.fldcontactlastname,tblcontactinfo.fldcontactcity, tblcontactinfo.fldcontactstate, tblcontactinfo.fldcontactzipcode,tblcontactinfo.fldcontacttitle, tblcontactinfo.fldcontactphone2_type, tblcontactinfo.fldcontactphone2_num,tblcontactinfo.fldcontactphone3_num, tblcontactinfo.fldcontactphone4_num, tblcontactinfo.fldcontactphone5_num,tblcontactinfo.fldcontactemail, tblcontactinfo.fldenable, tblcontactinfo.fldcontactphone1_num,tblcontactinfo.fldperscomments, tblclientactivitytag.fldcontactactivitytag FROM tblclientmaster LEFT JOIN tblclientproductpreference ON tblclientmaster.fldclientnumber = tblclientproductpreference.fldclientnumber LEFT JOIN tblclientroomsize ON tblclientmaster.fldclientnumber = tblclientroomsize.fldclientnumber LEFT JOIN tblgeopreference ON tblclientmaster.fldclientnumber = tblgeopreference.fldclientnumber LEFT JOIN tblclientproductrelation ON tblclientmaster.fldclientnumber = tblclientproductrelation.fldclientnumber JOIN tblproductmaster ON tblclientproductrelation.fldproductnumber = tblproductmaster.fldproductnumber LEFT JOIN tblcontactinfo ON tblclientmaster.fldclientnumber = tblcontactinfo.fldclientnumber LEFT JOIN tblclientactivitytag ON tblclientmaster.fldclientnumber = tblclientactivitytag.fldclientnumber; -- Robert
> I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100 > PERCENT' after SELECT in the query. The Enterprise Manager does not > indicate how many rows come back. I save it as a VIEW in MS SQL and do a > 'select count(*)...' and, yes, it comes back 42164877 records. No, it comes back 1 record with the count in it, the ORDER BY is useless for a count(*), etc. What is it that you are trying to do exactly ?
Robert Fitzpatrick <lists@webtent.net> writes: > On Tue, 2007-05-22 at 17:21 +0100, Richard Huxton wrote: >> Your query seems to produce 41.8 million rows. Are you sure MS-SQL is >> returning that many rows in a few seconds? > I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100 > PERCENT' after SELECT in the query. The Enterprise Manager does not > indicate how many rows come back. I save it as a VIEW in MS SQL and do a > 'select count(*)...' and, yes, it comes back 42164877 records. > Just to be sure MS SQL hasn't done something to the structure (I noticed > dbo prefixes, etc.), I pasted back into pgadmin, took off 'top 100 > percent'. Then saved as a view and did a count(*) in pgsql, got > 41866801. How much time do the two select count(*) operations take? That would be a reasonably fair comparison of the query engines, as opposed to whatever might be happening on the client side (in particular, I wonder whether the MS client is actually fetching all the rows or just the first few). regards, tom lane
On Tue, 2007-05-22 at 19:04 +0200, PFC wrote: > > I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100 > > PERCENT' after SELECT in the query. The Enterprise Manager does not > > indicate how many rows come back. I save it as a VIEW in MS SQL and do a > > 'select count(*)...' and, yes, it comes back 42164877 records. > > No, it comes back 1 record with the count in it, the ORDER BY is useless > for a count(*), etc. > > What is it that you are trying to do exactly ? Yes, one record indicating over 42 million records available from the view, correct? I realized this after my first post, there is no 'ORDER BY' in my last post with the two query examples. I'm trying to my query in pgsql to return a result in the same amount of time (approx) than it does in the existing mssql db. The query comes back with results using MS SQL Enterprise Manager in seconds and the same query in pgadmin takes super long. I just tried running the query now and it is still going with over 200K ms clocked. If I stop the query, remove all references to tblcontactinfo and tblactivitytag, the query comes back in less than 6000 ms. -- Robert
On Tue, 2007-05-22 at 13:10 -0400, Tom Lane wrote: > Robert Fitzpatrick <lists@webtent.net> writes: > > On Tue, 2007-05-22 at 17:21 +0100, Richard Huxton wrote: > >> Your query seems to produce 41.8 million rows. Are you sure MS-SQL is > >> returning that many rows in a few seconds? > > > I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100 > > PERCENT' after SELECT in the query. The Enterprise Manager does not > > indicate how many rows come back. I save it as a VIEW in MS SQL and do a > > 'select count(*)...' and, yes, it comes back 42164877 records. > > > Just to be sure MS SQL hasn't done something to the structure (I noticed > > dbo prefixes, etc.), I pasted back into pgadmin, took off 'top 100 > > percent'. Then saved as a view and did a count(*) in pgsql, got > > 41866801. > > How much time do the two select count(*) operations take? That would be > a reasonably fair comparison of the query engines, as opposed to > whatever might be happening on the client side (in particular, I wonder > whether the MS client is actually fetching all the rows or just the > first few). Takes 25K ms in pgsql, don't see a timer in MS Ent Manager, but only 5 seconds clocked. Maybe I should put together a php script to operate on each to be using the exact same client. I am doing all this all on the same server with PostgreSQL 8.2 loaded in Windows Server 2003 also with MS SQL server 2000. -- Robert
Robert Fitzpatrick wrote: > On Tue, 2007-05-22 at 13:10 -0400, Tom Lane wrote: >> Robert Fitzpatrick <lists@webtent.net> writes: >>> On Tue, 2007-05-22 at 17:21 +0100, Richard Huxton wrote: >>>> Your query seems to produce 41.8 million rows. Are you sure MS-SQL is >>>> returning that many rows in a few seconds? >>> I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100 >>> PERCENT' after SELECT in the query. The Enterprise Manager does not >>> indicate how many rows come back. I save it as a VIEW in MS SQL and do a >>> 'select count(*)...' and, yes, it comes back 42164877 records. >>> Just to be sure MS SQL hasn't done something to the structure (I noticed >>> dbo prefixes, etc.), I pasted back into pgadmin, took off 'top 100 >>> percent'. Then saved as a view and did a count(*) in pgsql, got >>> 41866801. >> How much time do the two select count(*) operations take? That would be >> a reasonably fair comparison of the query engines, as opposed to >> whatever might be happening on the client side (in particular, I wonder >> whether the MS client is actually fetching all the rows or just the >> first few). > > Takes 25K ms in pgsql, don't see a timer in MS Ent Manager, but only 5 > seconds clocked. Maybe I should put together a php script to operate on > each to be using the exact same client. I am doing all this all on the > same server with PostgreSQL 8.2 loaded in Windows Server 2003 also with > MS SQL server 2000. 1. Be aware that with the real query, PG is giving you the time to fetch the *last* row, whereas MS is probably the *first* row. I'm reasonably sure of this because 4.1 million rows * (say) 256 bytes would be 1GB of data to return, which in a few seconds seems optimistic. 2. If you don't want all the rows in one go, explicitly declare a cursor with PG. 3. The EXPLAIN ANALYSE will not have the format-and-transfer-data-to-client costs, but I think does the rest of the query. 4. We're still 5 x slower than MS-SQL (with the count). That might well be down to having to check visibility on each row with our MVCC rather than just going to the index. Hmm... How much of your machine is PG getting to use vs. MS-SQL? What are your shared_buffers, work_mem, effective_cache_size (and how much RAM on this box)? -- Richard Huxton Archonet Ltd
On Tue, 2007-05-22 at 19:03 +0100, Richard Huxton wrote: > 4. We're still 5 x slower than MS-SQL (with the count). That might > well > be down to having to check visibility on each row with our MVCC > rather > than just going to the index. Tips? I'd love to know how to see inside MVCC. I really appreciate the help! > > Hmm... How much of your machine is PG getting to use vs. MS-SQL? What > are your shared_buffers, work_mem, effective_cache_size (and how much > RAM on this box)? 3.5GB of RAM on a Xeon 2.8GHz server. I have default shared_buffers 32MB, no defaults changed except listen_addresses. How can I check work_mem and effective_cache_size? -- Robert
On Tue, 2007-05-22 at 14:30 -0400, Robert Fitzpatrick wrote: > On Tue, 2007-05-22 at 19:03 +0100, Richard Huxton wrote: > > > > Hmm... How much of your machine is PG getting to use vs. MS-SQL? What > > are your shared_buffers, work_mem, effective_cache_size (and how much > > RAM on this box)? > > 3.5GB of RAM on a Xeon 2.8GHz server. I have default shared_buffers > 32MB, no defaults changed except listen_addresses. How can I check > work_mem and effective_cache_size? I did some googling and came up with some ideas, I have it now with these settings and after restarting PG, no help. work_mem = 5MB shared_buffers = 128MB effective_cache_size = 800MB -- Robert
Robert Fitzpatrick wrote: > On Tue, 2007-05-22 at 19:03 +0100, Richard Huxton wrote: >> 4. We're still 5 x slower than MS-SQL (with the count). That might >> well >> be down to having to check visibility on each row with our MVCC >> rather >> than just going to the index. > > Tips? I'd love to know how to see inside MVCC. I really appreciate the > help! The main thing is that PostgreSQL's implementation of MVCC means that 1. (In many cases) writers need not block readers. 2. An update is effectively a delete and an insert. 3. VACUUM is needed to mark space from deleted rows for re-use. 4. The indexes don't carry visibility information, which means we need to go to the actual row on-disk to see if the current transaction can actually see the row. This last point is a problem for things like count(*) where we can't just count entries in the index because we don't know if some of the rows they point to might be deleted. The reason we don't store visibility info with the index is that it makes the index larger, so using up valuable RAM more quickly. For more info, see "Internals" in the manuals for a start. Then have a quick look around these for some more bits & pieces. There are some presentation slides somewhere. Note - I am *not* a developer, just a long-term user. http://www.postgresql.org/developer/ http://www.postgresql.org/docs/faqs.FAQ_DEV.html http://www.postgresql.org/docs/techdocs >> Hmm... How much of your machine is PG getting to use vs. MS-SQL? What >> are your shared_buffers, work_mem, effective_cache_size (and how much >> RAM on this box)? > > 3.5GB of RAM on a Xeon 2.8GHz server. I have default shared_buffers > 32MB, no defaults changed except listen_addresses. How can I check > work_mem and effective_cache_size? Ah - I bet MS-SQL is squatting on a gig of RAM or some-such too. Read through this - it's a little old, but still good advice. http://www.powerpostgresql.com/PerfList/ You'll probably find increasing work_mem (by a lot) for this one query will help you out. SET work_mem = <something large>; SELECT .... SET work_mem = <small again>; -- Richard Huxton Archonet Ltd
Robert Fitzpatrick wrote: > I did some googling and came up with some ideas, I have it now with > these settings and after restarting PG, no help. > > work_mem = 5MB My last post missed yours - you're ahead of my previous reply :-) Try 32MB, then 64MB, then perhaps 128MB. If it looks like you're not using stupid amounts of RAM on this one sort go higher still. -- Richard Huxton Archonet Ltd
On Tue, 2007-05-22 at 20:13 +0100, Richard Huxton wrote: > Robert Fitzpatrick wrote: > > I did some googling and came up with some ideas, I have it now with > > these settings and after restarting PG, no help. > > > > work_mem = 5MB > > My last post missed yours - you're ahead of my previous reply :-) > > Try 32MB, then 64MB, then perhaps 128MB. If it looks like you're not > using stupid amounts of RAM on this one sort go higher still. > This really has me perplexed now :\ I closed pgadmin, set it to 128MB work_mem and restarted PG, went back into pgadmin and created script from my saved view (again, this view is identical/copied from the same mssql view with only syntax changes). I started the query and then I started looking over that doc you sent me (again, since I just read through it from a google search)...after over 300000 ms, still going, not even finishing. I'm beginning to worry something is wrong with the dataset migrated from mssql? I can migrate again tonight, last done about a month ago. Since then it has been dumped from a linux server and loaded on this Windows server without issue. -- Robert