Thread: UPDATE becomes mired / win32
I'm having an interesting (perhaps anomalous) variability in UPDATE performance on a table in my database, and wanted to see if there was any interest in looking further before I destroy the evidence and move on. The table, VOTER, contains 3,090,013 rows and each row is about 120 bytes wide. It's loaded via a batch process in one shot, and the load is followed by an VACUUM FULL ANALYZE. Its structure is shown at the bottom of the message. If I run the statement: (1): UPDATE voter SET gender = 'U'; on the table in this condition, the query effectively never ends -- I've allowed it to run for 12-14 hours before giving up. The plan for that statement is: Seq Scan on voter (cost=0.00..145117.38 rows=3127738 width=120) However, if I do the following: (2): CREATE TABLE voter_copy AS SELECT * FROM voter; (3): UPDATE voter_copy SET gender = 'U'; the query is much faster -- Seq Scan on voter_copy (cost=0.00..96231.35 rows=3090635 width=120) (actual time=108.056..43203.696 rows=3090013 loops=1) Total runtime: 117315.731 ms When (1) is running, the machine is very nearly idle, with no postmaster taking more than 1 or 2 % of the CPU. When (3) is running, about 60% CPU utilization occurs. The same behavior occurs if the table is dumped and reloaded. My environment is Windows XP SP2 and I'm on Postgresql 8.1.4 installed via the msi installer. Hardware is an Athlon 2000+ 1.67ghx, with 1G RAM. The database is hosted on a Seagate Barracuda 7200.10 connected via a FastTrak 4300 without any RAID configuration. dd shows a write speed of 39 MB/s and read speed of 44 MB/s. The server configuration deviates from the default in these statements: fsync = off shared_buffers = 25000 work_mem = 50000 maintenance_work_mem = 100000 CREATE TABLE voter ( voter_id int4, sos_voter_id varchar(20), household_id int4, first_name varchar(40), middle_name varchar(40), last_name varchar(40), name_suffix varchar(10), phone_number varchar(10), bad_phone_no bool, registration_date date, birth_year int4, gender char(1), pri_ind char(1), gen_1992_primary_party char(1), council_votes int2, primary_votes int2, council_primary_votes int2, special_votes int2, presidential_votes int2, votes int2, absentee_votes int2, last_voted_date date, first_voted_date date, rating char(1), score float4, general_votes int2 ) WITHOUT OIDS;
> The table, VOTER, contains 3,090,013 rows and each row is about 120 bytes > wide. It's loaded via a batch process in one shot, and the load is > followed by an VACUUM FULL ANALYZE. Its structure is shown at the bottom > of the message. if the table wasn't empty before and has indices defined, try a "REINDEX TABLE VOTER" before running the update. i had a similar case where an often updated table was vacuumed regurarly, but the indices grew and grew and grew. in my case the table - even when empty and analyze full'ed was 1.2gb according to pgadmin due to (outdated) indices. a reindex fixed all my performance issues. - thomas
Steve Peterson <stevep-hv@zpfe.com> writes: > If I run the statement: > (1): UPDATE voter SET gender = 'U'; > on the table in this condition, the query effectively never ends -- > I've allowed it to run for 12-14 hours before giving up. > ... > When (1) is running, the machine is very nearly idle, with no > postmaster taking more than 1 or 2 % of the CPU. Is the disk busy? If neither CPU nor I/O are saturated, then it's a good bet that the UPDATE isn't actually running at all, but is waiting for a lock somewhere. Have you looked into pg_locks to check for a conflicting lock? regards, tom lane
Both commands seem to be saturating the disk. There's nothing else running in the database, and all of the locks have 't' in the granted column, which I'm assuming means they're not blocked. According to the statistics, the original table has 889 mb and indexes of 911mb, whereas the copy has 1021 mb and no space for indexes. Steve At 03:28 PM 10/4/2006, Tom Lane wrote: >Steve Peterson <stevep-hv@zpfe.com> writes: > > If I run the statement: > > (1): UPDATE voter SET gender = 'U'; > > on the table in this condition, the query effectively never ends -- > > I've allowed it to run for 12-14 hours before giving up. > > ... > > When (1) is running, the machine is very nearly idle, with no > > postmaster taking more than 1 or 2 % of the CPU. > >Is the disk busy? If neither CPU nor I/O are saturated, then it's a >good bet that the UPDATE isn't actually running at all, but is waiting >for a lock somewhere. Have you looked into pg_locks to check for a >conflicting lock? > > regards, tom lane
I'm pretty sure that the table was empty before doing the load, but I gave this a shot. It didn't have an impact on the results. The behavior also persists across a dump/reload of the table into a new install on a different machine. IIRC dump/reload rebuilds indexes from scratch. Steve At 01:13 PM 10/4/2006, me@alternize.com wrote: >>The table, VOTER, contains 3,090,013 rows and each row is about 120 >>bytes wide. It's loaded via a batch process in one shot, and the >>load is followed by an VACUUM FULL ANALYZE. Its structure is shown >>at the bottom of the message. > > >if the table wasn't empty before and has indices defined, try a >"REINDEX TABLE VOTER" before running the update. i had a similar >case where an often updated table was vacuumed regurarly, but the >indices grew and grew and grew. in my case the table - even when >empty and analyze full'ed was 1.2gb according to pgadmin due to >(outdated) indices. a reindex fixed all my performance issues. > >- thomas >
Steve Peterson <stevep-hv@zpfe.com> writes: > The behavior also persists across a dump/reload of the table into a > new install on a different machine. IIRC dump/reload rebuilds > indexes from scratch. Hm. There must be something you're not telling us that accounts for the difference between the original table and the copied version --- foreign keys linking from other tables, perhaps? regards, tom lane