Hello!
I am relative newcomer to SQL and PostgreSQL world, so please forgive me
if this question is stupid.
I am experiencing strange behaviour, where simple UPDATE of one field is
very slow, compared to INSERT into table with multiple indexes. I have
two tables - one with raw data records (about 24000), where one field
contains status information (varchar(10)). First table has no indexes,
only primary key (recid). Second table contains processed records - some
fields are same as first table, others are calculated during processing.
Records are processed by Python script, which uses PyPgSQL for PostgreSQL
access.
Processing is done by selecting all records from table1 where status
matches certain criteria (import). Each record is processed and results
are inserted into table2, after inserting status field on same record in
table1 is updated with new value (done). Update statement itself is
extremely simple: "update table1 set status = 'done' where recid = ..."
Most interesting is, that insert takes 0.004 seconds in average, but
update takes 0.255 seconds in average. Processing of 24000 records took
around 1 hour 20 minutes.
Then i changed processing logic not to update every record in table1
after processing. Instead i did insert recid value into temporary table
and updated records in table1 after all records were processed and
inserted into table2:
UPDATE table1 SET Status = 'done' WHERE recid IN (SELECT recid FROM temptable)
This way i got processing time of 24000 records down to about 16 minutes.
About 13 minutes from this took last UPDATE statement.
Why is UPDATE so slow compared to INSERT? I would expect more or less
similar performance, or slower on insert since table2 has four indexes
in addition to primary key, table1 has only primary key, which is used
on update. Am i doing something wrong or is this normal?
I am using PostgreSQL 7.3.4, Debian/GNU Linux 3.0 (Woody),
kernel 2.4.21, Python 2.3.2, PyPgSQL 2.4
--
Ivar Zarans