If I make the index unique on the email column, the updates slows
down an additional 5 minutes to a total of 25 minutes.
I realize I did not understand how PostgreSQL handles updates,
but the performance implications are huge. It is not unreasonable
to have 3 indexes on a table with a number of columns. With the
way that PostgreSQL handles an update, any update to that table
will impact the entire data row and all of the indexes, which
seems invariably slow.
At Manfred's suggestion, I made the email column unique:
$ time psql -c "update emp set email = email||oid;"
UPDATE 1230703
real 4990m31.151s
user 0m0.010s
sys 0m0.000s
There was no index on the email column when I did that update.
I then created an index emp_ix03 on emp (email).
I then did the fk_dept_id update, which took 25 minutes 52 seconds.
A vacuum afterwards shows:
NOTICE: Index emp_ix03: Pages 14689; Tuples 1230703: Deleted 597153.
CPU 2.58s/8.88u sec elapsed 79.83 sec.
Having unique data in the indexed column does benefit this process.
It really seems that the mere fact of another index with keys in
it slows down the update.
Yet I can't imagine I'm the first person to experience this.
Thoughts?
Thanks,
Tom