Alter/update large tables - VERRRY annoying behaviour! - Mailing list pgsql-bugs

From Dmitry Tkach
Subject Alter/update large tables - VERRRY annoying behaviour!
Date
Msg-id 3CBB08C8.7000705@openratings.com
List pgsql-bugs
Hi, everybody!

I was wondering if it is a known "feature" to begin with, and if there are any plans to fix
it in the future?
I had two very large tables in my database (about 30 million rows each), connected by a foreign key,
and wanted to merge them together...

Something like this:

create table a
(
    id int primary key,
    some_data int
);
create table b
(
    id int unique references a,
    other_data int
);

So, what I did was:

alter table a add other_data int;
update a set other_data = b.other_data from b where b.id=a.id;

This took me awfully long, but worked (I guess).
I say 'I guess' because I haven't been able to verify that so far - when I tried to do

select * from a limit 1;

It just hangs on me ... at least, it looks like it does.

Lucky me, I had compiled the backend from source with full debug info, because if I hadn't done that
(as most users wouldn't have), I would certainly have thought that my database was hopelessly corrupted and would have had to
recreate it from scratch :-(
Instead, I loaded the whole thing into a debugger, because that seemed to be the only way to figure out what
the hell it was thinking about...
What I found was that it seems to have recreated my entire table when I updated it, and left all the
old tuples in it as well, so my 'select *...limit 1' query was cycling through 30 million deleted tuples, trying
to find the first one that was still valid, and that's what was taking so long...

First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert?
Or is my problem specific to the fact that I altered the table and added new columns?
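
For what it's worth, here is a small experiment (on a throwaway table - just my guess at a way to check this)
that should show whether a plain update leaves the old row version behind:

create table t (id int primary key, val int);
insert into t values (1, 10);
select ctid, xmin, * from t;   -- note the row's physical location (ctid)
update t set val = 11 where id = 1;
select ctid, xmin, * from t;   -- if ctid/xmin changed, the old version is
                               -- presumably still on disk until a vacuum
drop table t;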

Now, I understand that if I vacuumed it, the problem would have been resolved... The problems I have with that,
though, are:

- As I said, the behaviour was so unexpected that I would have trashed the whole database if I hadn't been able to debug
it... If there is no other possible solution, I think, at the very least, it should give the user some indication
that it's not hopelessly hung when doing that query...

- Vacuum isn't the speediest thing in the world either (it's been running for an hour now and still has not
finished - see the sketch below). I was hoping to finish modifying my schema first and then just vacuum everything once. So it would be
REALLY, REALLY helpful for situations like this if PG were smart enough to keep track somehow of those deleted
tuples, to avoid having to scan through them all every time...
In my particular situation, the solution would be trivial (just remembering the address of the first valid
tuple would suffice - because the entire table was updated)... I am not familiar enough with the internals to suggest
anything more general than this, but EVEN fixing only this particular scenario would, I believe, be extremely
useful....
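
(For the record, this is roughly what I'm running; the VERBOSE is just my guess at making it a bit less opaque,
since it at least reports what it found for each table and index as it finishes with it:)

vacuum verbose a;   -- reports how many dead row versions were found/removed
vacuum verbose b;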

Do you agree?

Thanks a lot!

Dima.

