Re: Thoughts on how to avoid a massive integer update. - Mailing list pgsql-general
From | Fehrle, Brian |
---|---|
Subject | Re: Thoughts on how to avoid a massive integer update. |
Date | |
Msg-id | 0F5992D4-9317-4007-9036-202DC89BA0D8@comscore.com |
In response to | Re: Thoughts on how to avoid a massive integer update. (Adrian Klaver <adrian.klaver@aklaver.com>) |
Responses | Re: Thoughts on how to avoid a massive integer update. |
List | pgsql-general |
On 5/4/20, 3:56 PM, "Adrian Klaver" <adrian.klaver@aklaver.com> wrote:

On 5/4/20 2:32 PM, Fehrle, Brian wrote:
> Hi all,
>
> This is a shot in the dark in hopes to find a magic bullet to fix an
> issue I have, I can’t personally think of any solution myself.
>
> I have a database with hundreds of terabytes of data, where every table
> has an integer column referencing a small table. For reasons out of my
> control and cannot change, I NEED to update every single row in all
> these tables, changing the integer value to a different integer.
>
> Since I have to deal with dead space, I can only do a couple tables at a
> time, then do a vacuum full after each one.

Why? A regular vacuum would mark the space as available.

A regular vacuum would mark the space as available ***for re-use***, but it will not release the space back to the drive as ‘unused’. 99% of my tables are old data that will not receive any future inserts or updates, which means that space that is marked ready for ‘reuse’ will never be used.

This means that when a 100GB table is updated, every row is marked as dead and re-created with the newly updated data. This table is now 200GB in size. A vacuum will keep it at 200GB of space used, freeing up the 100GB of dead space as ready-to-reuse. A vacuum full will make it 100GB again.

Since 99% of my tables will never be updated or inserted into again, this means my ~300 Terabytes of data would be ~600 Terabytes of data on disk. Thus, vacuum full.

More below.

> Another option is to build a new table with the new values, then drop
> the old one and swap in the new, either way is very time consuming.
>
> Initial tests suggest this effort will take several months to complete,
> not to mention cause blocking issues on tables being worked on.
>
> Does anyone have any hackery ideas on how to achieve this in less time?
> I was looking at possibly converting the integer column type to another
> that would present the integer differently, like a hex value, but
> everything still ends up requiring all data to be re-written to disk. In
> a well designed database (I didn’t design it :) ), I would simply change
> the data in the referenced table (200 total rows), however the key being
> referenced isn’t just an arbitrary ID, it’s actual ‘data’, and must be
> changed.

I'm not following above. Could you show an example table relationship?

It’s a simple one-to-many relationship:

*Info_table*
info_table_sid integer

*data_table*
data_table_sid integer,
info_table_id integer references info_table(info_table_sid),

>
> Thanks for any thoughts or ideas,
>
> * Brian F
>

--
Adrian Klaver
adrian.klaver@aklaver.com
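The space arithmetic in the reply above (100GB becoming 200GB after a full-table update, and ~300 Terabytes becoming ~600) can be sketched roughly as follows. This is only a back-of-the-envelope model, not a measurement: the function name `table_sizes_gb` is made up for illustration, and it assumes every row is rewritten exactly once, that dead and live row versions are the same size, and that indexes, TOAST, fillfactor and any truncation of empty trailing pages by a plain vacuum are ignored.

```python
def table_sizes_gb(live_gb: float) -> dict:
    """Approximate on-disk size of one table at each stage of the update.

    Hypothetical helper: a simplified model in which every row is updated
    exactly once and each update leaves one dead row version of equal size.
    """
    after_update = live_gb * 2        # old (dead) versions + new (live) versions
    after_vacuum = after_update       # plain VACUUM only marks dead space reusable
    after_vacuum_full = live_gb       # VACUUM FULL rewrites the table compactly
    return {
        "after_update": after_update,
        "after_vacuum": after_vacuum,
        "after_vacuum_full": after_vacuum_full,
    }


# The 100GB table from the message.
sizes = table_sizes_gb(100)
```

Under these assumptions, `table_sizes_gb(100)` gives 200 after the update, 200 after a plain vacuum, and 100 after a vacuum full. Applying the same doubling to the whole ~300 Terabyte data set gives the ~600 Terabyte figure quoted in the message.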