Grave performance issues... - Mailing list pgsql-general
From | Ztream |
---|---|
Subject | Grave performance issues... |
Date | |
Msg-id | a0fr1a$1eil$1@news.tht.net Whole thread Raw |
Responses |
Re: Grave performance issues...
|
List | pgsql-general |
I am porting a system that I have been developing on an MS platform (including SQL Server) to a linux system with postgresql. Apart from the usual difficulties (and not being very knowledgeable in the unix world), I have now gotten most things to work. However, I am experiencing some pretty serious performance issues. The system collects data from users thru a web interface, and also has a background process that does rather heavy work. I will here refer to but one of the SQL statements, to keep things focused. The idea here is that the background process is, among other things, constructing/updating a table called Distribution from a table called Data, which then represents the statistic distribution of the information found in Data (using user groups, too). The update-statement actually performing this looks like this: UPDATE Distribution SET Value = ( SELECT COUNT(*) FROM Data INNER JOIN GroupMember ON Data.UserID = GroupMember.UserID WHERE ItemID = Distribution.ItemID AND Grade = Distribution.Grade AND WeightGroupID = Distribution.WeightGroupID ) Distribution takes the form: CREATE TABLE Distribution ( Grade int NOT NULL, ItemID numeric(18, 0) NOT NULL, WeightGroupID numeric(18, 0) NOT NULL, Value float NOT NULL, PRIMARY KEY (Grade, ItemID, WeightGroupID) ); I am unsure if I need to post the DDL descriptions of the other tables; feel free to ask. Anyway, there are two seemingly distinct problems with this: *1 As you can see, this UPDATE touches the entire table upon each execution, which in postgresql seems to mean that performance quickly deteriorates unless you are constantly vacuuming. To answer an obvious question, I *do* feel that this is a necessary thing to do - if I were to update only those rows that would acquire a new value, it would most probably turn out to be the majority of them as the system is expected to receive a steady flow of input data when launched. This operation also needs to be performed very often - ideally about every 10th second. I tried leaving the background process running for a few days without vacuuming during the holiday, and while testing today I found that the above query takes aproximately 2 minutes to complete - for a Distribution size of 600 rows! Also, neither of the other two tables referenced contained more than 30 rows. *2 Even when the involved tables are freshly vacuumed (and vacuum analyzed), the above query using the data amount from (*1) takes about 4 seconds to complete. That should be compared to the about 50ms it takes to complete for the same amount of data on Windows 2000 against SQL Server on a comparable machine. Needless to say, this is rather catastrophical, and I'm pretty much out of ideas. The performance values given above are those I got while executing the query directly in psql, with no other connections to the database, so this does not seem to me to be a problem with the system in itself. A configuration problem, perhaps? A bug? SQL ignorance on my part? I could surely use some help. As previously noted, I am not very comfortable with unix systems, and not exactly an SQL guru either. The linux distribution is some version of redhat, and the postgresql version is 7.1.3. Any help or attempt to help would be greatly appreciated. / Ztream
pgsql-general by date: