Re: Thoughts on how to avoid a massive integer update. - Mailing list pgsql-general

From Rob Sargent
Subject Re: Thoughts on how to avoid a massive integer update.
Date
Msg-id 1ad51d80-94d5-cd49-cda0-d55daf9bfbb8@gmail.com
Whole thread Raw
In response to Thoughts on how to avoid a massive integer update.  ("Fehrle, Brian" <bfehrle@comscore.com>)
List pgsql-general


On 5/4/20 3:32 PM, Fehrle, Brian wrote:

Hi all,

 

This is a shot in the dark in hopes to find a magic bullet to fix an issue I have, I can’t personally think of any solution myself.

 

I have a database with hundreds of terabytes of data, where every table has an integer column referencing a small table. For reasons out of my control and cannot change, I NEED to update every single row in all these tables, changing the integer value to a different integer.

 

Since I have to deal with dead space, I can only do a couple tables at a time, then do a vacuum full after each one.
Another option is to build a new table with the new values, then drop the old one and swap in the new, either way is very time consuming.

 

Initial tests suggest this effort will take several months to complete, not to mention cause blocking issues on tables being worked on.

 

Does anyone have any hackery ideas on how to achieve this in less time? I was looking at possibly converting the integer column type to another that would present the integer differently, like a hex value, but everything still ends up requiring all data to be re-written to disk. In a well designed database (I didn’t design it :) ), I would simply change the data in the referenced table (200 total rows), however the key being referenced isn’t just an arbitrary ID, it’s actual ‘data’, and must be changed.

 

Thanks for any thoughts or ideas,

  • Brian F

Here's my wife solution:  add a column to small table, fill with duplicate of the original column.  Change the foreign keys of the huge tables to reference the new column in the small table.  Update the original values to what you now need them to be.  (Not sure how much re-index cost you'll see here, but non-zero I'm guessing)

Your reports now must of course return the new value via joins to the dictionary(?) table.  Views my be your best bet here.

pgsql-general by date:

Previous
From: Clifford Snow
Date:
Subject: Installing Postgis25_11
Next
From: Tom Lane
Date:
Subject: Re: Temporary table has become problematically persistent