Re: Thoughts on how to avoid a massive integer update. - Mailing list pgsql-general

From Fehrle, Brian
Subject Re: Thoughts on how to avoid a massive integer update.
Date
Msg-id 0F5992D4-9317-4007-9036-202DC89BA0D8@comscore.com
In response to Re: Thoughts on how to avoid a massive integer update.  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Thoughts on how to avoid a massive integer update.
List pgsql-general

On 5/4/20, 3:56 PM, "Adrian Klaver" <adrian.klaver@aklaver.com> wrote:

    On 5/4/20 2:32 PM, Fehrle, Brian wrote:
    > Hi all,
    >
    > This is a shot in the dark in hopes of finding a magic bullet to fix an
    > issue I have; I can’t personally think of any solution myself.
    >
    > I have a database with hundreds of terabytes of data, where every table
    > has an integer column referencing a small table. For reasons that are out
    > of my control and that I cannot change, I NEED to update every single row
    > in all these tables, changing the integer value to a different integer.
    >
    > Since I have to deal with dead space, I can only do a couple tables at a
    > time, then do a vacuum full after each one.
    
    Why?
    A regular vacuum would mark the space as available.
    
A regular vacuum would mark the space as available ***for re-use***; it will not release the space back to the drive
as ‘unused’. 99% of my tables are old data that will not receive any future inserts or updates, which means that space
marked ready for ‘re-use’ will never actually be used.
 
This means that a 100GB table will be updated, and every row marked as dead and re-created with the newly updated data.
This table is now 200GB in size. A vacuum will keep it at 200GB of space used, freeing up the 100GB of dead space as
ready-to-reuse. A vacuum full will make it 100GB again.
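As a concrete illustration of that cycle (table and column names are hypothetical, sizes approximate):

    -- hypothetical 100GB table; the update rewrites every row
    UPDATE data_table SET info_table_id = info_table_id + 1000;
    SELECT pg_size_pretty(pg_total_relation_size('data_table'));  -- ~200GB: the old row versions are dead
    VACUUM data_table;       -- dead space marked reusable; the file stays ~200GB on disk
    VACUUM FULL data_table;  -- table rewritten compactly; back to ~100GB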
 

Since 99% of my tables will never be updated or inserted into again, this means my ~300 terabytes of data would be ~600
terabytes of data on disk. Thus, vacuum full.
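For what it’s worth, the rewrite-and-swap alternative mentioned in my original message (quoted below) would look
roughly like this; just a sketch, with a hypothetical id_map table holding the old-to-new value mapping:

    -- write the remapped rows once into a fresh table, then swap it in
    BEGIN;
    CREATE TABLE data_table_new AS
        SELECT d.data_table_sid,
               m.new_id AS info_table_id
        FROM data_table d
        JOIN id_map m ON m.old_id = d.info_table_id;
    DROP TABLE data_table;
    ALTER TABLE data_table_new RENAME TO data_table;
    COMMIT;
    -- indexes, constraints, defaults, and grants must be recreated by hand

Each table is written once at its final size instead of being doubled and then compacted, at the cost of rebuilding
everything that depended on the old table.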
 

    More below.
    
    > Another option is to build a new table with the new values, then drop
    > the old one and swap in the new, either way is very time consuming.
    >
    > Initial tests suggest this effort will take several months to complete,
    > not to mention cause blocking issues on tables being worked on.
    >
    > Does anyone have any hackery ideas on how to achieve this in less time?
    > I was looking at possibly converting the integer column type to another
    > that would present the integer differently, like a hex value, but
    > everything still ends up requiring all data to be re-written to disk. In
    > a well-designed database (I didn’t design it :) ), I would simply change
    > the data in the referenced table (200 total rows); however, the key being
    > referenced isn’t just an arbitrary ID, it’s actual ‘data’, and must be
    > changed.
    
    I'm not following above.
    
    Could you show an example table relationship?

It’s a simple one-to-many relationship:
CREATE TABLE info_table (
    info_table_sid integer PRIMARY KEY  -- the referenced key must be unique, so assume it is the PK
);

CREATE TABLE data_table (
    data_table_sid integer,
    info_table_id  integer REFERENCES info_table (info_table_sid)
);
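Since the referenced key is real data rather than an arbitrary ID, remapping a value in the 200-row info_table forces
a rewrite of every referencing row in data_table. A minimal sketch of that effect (key values hypothetical, constraint
name assumed to be the default) using ON UPDATE CASCADE:

    -- switch the FK to cascade updates (constraint name assumed to be the default one)
    ALTER TABLE data_table
        DROP CONSTRAINT data_table_info_table_id_fkey,
        ADD CONSTRAINT data_table_info_table_id_fkey
            FOREIGN KEY (info_table_id)
            REFERENCES info_table (info_table_sid)
            ON UPDATE CASCADE;

    -- remap one key in the small parent table...
    UPDATE info_table SET info_table_sid = 1001 WHERE info_table_sid = 1;
    -- ...and the cascade rewrites every data_table row that referenced 1,
    -- so the full-table rewrite (and the resulting bloat) happens either way.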

    >
    > Thanks for any thoughts or ideas,
    >
    >   * Brian F
    >
    
    
    --
    Adrian Klaver
    adrian.klaver@aklaver.com
    

