Thread: Gradual migration from integer to bigint?

Gradual migration from integer to bigint?

From
James Healy
Date:
My organization has a number of very large tables (most 100s of GB, a
couple over a TB) that were created many years ago by a tool that
defaulted to integer PKs rather than bigint. Those PKs have a number
of integer FKs in related tables as well. We shouldn't have let them
get so big, but that's a conversation for another day.

Some are approaching overflow and we're slowly doing the work to
migrate to bigint. Mostly via the well understood "add a new id_bigint
column, populate on new tuples, backfill the old, switch the PK"
method. The backfill is slow on these large tables, but it works and
there's plenty of blog posts and documentation to follow.
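
For the archives, a rough sketch of that method in SQL (table, column, and
trigger names here are hypothetical, and batch sizes would need tuning):

```sql
-- 1. Add the new column; it's nullable, so this is a catalog-only change.
ALTER TABLE accounts ADD COLUMN id_bigint bigint;

-- 2. Keep it populated on new and updated tuples.
CREATE FUNCTION accounts_copy_id() RETURNS trigger AS $$
BEGIN
    NEW.id_bigint := NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_copy_id
    BEFORE INSERT OR UPDATE ON accounts
    FOR EACH ROW EXECUTE FUNCTION accounts_copy_id();

-- 3. Backfill existing tuples in batches to limit lock and WAL pressure.
UPDATE accounts SET id_bigint = id
 WHERE id BETWEEN 1 AND 100000 AND id_bigint IS NULL;
-- ...repeat for successive id ranges...

-- 4. Build the replacement unique index without blocking writes,
--    then switch the PK in one short transaction.
CREATE UNIQUE INDEX CONCURRENTLY accounts_id_bigint_key
    ON accounts (id_bigint);
BEGIN;
ALTER TABLE accounts DROP CONSTRAINT accounts_pkey;
ALTER TABLE accounts ADD CONSTRAINT accounts_pkey
    PRIMARY KEY USING INDEX accounts_id_bigint_key;
COMMIT;
```

(The final step still implies a NOT NULL validation scan, and the FK columns
in related tables need the same treatment; the blog posts cover those details.)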

It did make me curious though: would it be possible for postgres to
support gradual migration from integer to bigint in a more transparent
way, where new and updated tuples are written as bigint, but existing
tuples can be read as integer?

I assume maybe a complication is that the catalog says the column is
either 32bit int or 64bit bigint and making that conditional is hard.
There's presumably other considerations I'm unaware of too. My core
question: are there significant technical blockers to supporting this
kind of gradual in place migration, or has it just not been enough of
a problem that it's received attention?

James



Re: Gradual migration from integer to bigint?

From
grimy.outshine830@aceecat.org
Date:
On Sat, Sep 30, 2023 at 03:55:20PM +1000, James Healy wrote:

> It did make me curious though: would it be possible for postgres to
> support gradual migration from integer to bigint in a more
> transparent way, where new and updated tuples are written as bigint,
> but existing tuples can be read as integer?

Language police: this is the *opposite* of "transparent". "Transparent"
and "automated" are not synonyms.

-- 
Ian



Re: Gradual migration from integer to bigint?

From
Bruce Momjian
Date:
On Sat, Sep 30, 2023 at 03:55:20PM +1000, James Healy wrote:
> My organization has a number of very large tables (most 100s of GB, a
> couple over a TB) that were created many years ago by a tool that
> defaulted to integer PKs rather than bigint. Those PKs have a number
> of integer FKs in related tables as well. We shouldn't have let them
> get so big, but that's a conversation for another day.
> 
> Some are approaching overflow and we're slowly doing the work to
> migrate to bigint. Mostly via the well understood "add a new id_bigint
> column, populate on new tuples, backfill the old, switch the PK"
> method. The backfill is slow on these large tables, but it works and
> there's plenty of blog posts and documentation to follow.
> 
> It did make me curious though: would it be possible for postgres to
> support gradual migration from integer to bigint in a more transparent
> way, where new and updated tuples are written as bigint, but existing
> tuples can be read as integer?
> 
> I assume maybe a complication is that the catalog says the column is
> either 32bit int or 64bit bigint and making that conditional is hard.
> There's presumably other considerations I'm unaware of too. My core
> question: are there significant technical blockers to supporting this
> kind of gradual in place migration, or has it just not been enough of
> a problem that it's received attention?

I think this talk will help you:

    https://www.youtube.com/watch?v=XYRgTazYuZ4

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Re: Gradual migration from integer to bigint?

From
James Healy
Date:
On Sun, 1 Oct 2023 at 04:35, Bruce Momjian <bruce@momjian.us> wrote:
> I think this talk will help you:
>
>         https://www.youtube.com/watch?v=XYRgTazYuZ4

Thanks, I hadn't seen that talk and it's a good summary of the issue
and available solutions.

However it doesn't really address the question of a gradual migration
process that can read 32bit ints but insert/update as 64bit bigints. I
remain curious about whether the postgres architecture just makes that
implausible, or if it could be done and just hasn't because the
options for a more manual migration are Good Enough.

James



Re: Gradual migration from integer to bigint?

From
Tom Lane
Date:
James Healy <james@yob.id.au> writes:
> However it doesn't really address the question of a gradual migration
> process that can read 32bit ints but insert/update as 64bit bigints. I
> remain curious about whether the postgres architecture just makes that
> implausible, or if it could be done and just hasn't because the
> options for a more manual migration are Good Enough.

I think what you're asking for is a scheme whereby some rows in a
table have datatype X in a particular column while other rows in
the very same physical table have datatype Y in the same column.
That is not happening, because there'd be no way to tell which
case applies to any particular row.

You could fantasize about labeling individual rows somehow, but
it's mere fantasy because there's no place to put such labels.
To the limited extent that we can find spare space in the
existing page layout, there are far better use-cases (see
nearby discussions about 64-bit XIDs, for example).  And nobody
is going to advocate breaking on-disk compatibility for this,
especially not a break that adds more per-row overhead.

So really the only way forward for this would be to provide more
automation for the existing conversion processes involving table
rewrites.  That's possible perhaps, but it doesn't really sound
compelling enough to justify a lot of work.

            regards, tom lane



Re: Gradual migration from integer to bigint?

From
Ron
Date:
On 9/30/23 22:37, Tom Lane wrote:
[snip]
> especially not a break that adds more per-row overhead.
>
> So really the only way forward for this would be to provide more
> automation for the existing conversion processes involving table
> rewrites.

When altering an unindexed INT column to BIGINT, do all of the table's other indices get rewritten?

-- 
Born in Arizona, moved to Babylonia.



Re: Gradual migration from integer to bigint?

From
James Healy
Date:
On Sun, 1 Oct 2023 at 14:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think what you're asking for is a scheme whereby some rows in a
> table have datatype X in a particular column while other rows in
> the very same physical table have datatype Y in the same column.
> That is not happening, because there'd be no way to tell which
> case applies to any particular row.

To be honest, I don't know enough about the postgresql on-disk format
and tuple shape to be confident in how this would be solved. I was
thinking more about the ergonomics of what would be helpful and
wondering how viable it was.

Sounds like not very viable. Rats.

The docs [1] on changing column types include:

> As an exception, when changing the type of an existing column, if the
> USING clause does not change the column contents and the old type is
> either binary coercible to the new type or an unconstrained domain over
> the new type, a table rewrite is not needed

... and mention the specific case of switching between VARCHAR and
TEXT not requiring a table or index rewrite.
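
For instance (hypothetical table), this kind of change is catalog-only
because varchar is binary coercible to text:

```sql
-- No table or index rewrite; only the catalog entry changes.
ALTER TABLE notes ALTER COLUMN body TYPE text;  -- body was varchar(255)
```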

Seems like the specific case of int->bigint is impossible to make as
easy, given the fixed sizes in the tuple and impossibility of knowing
from tuple to tuple whether to read 4 or 8 bytes.

regards,
James

[1] https://www.postgresql.org/docs/current/sql-altertable.html



Re: Gradual migration from integer to bigint?

From
Ann Harrison
Date:


On Sat, Sep 30, 2023 at 11:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Healy <james@yob.id.au> writes:
> However it doesn't really address the question of a gradual migration
> process that can read 32bit ints but insert/update as 64bit bigints. I
> remain curious about whether the postgres architecture just makes that
> implausible, or if it could be done and just hasn't because the
> options for a more manual migration are Good Enough.

I think what you're asking for is a scheme whereby some rows in a
table have datatype X in a particular column while other rows in
the very same physical table have datatype Y in the same column.
That is not happening, because there'd be no way to tell which
case applies to any particular row.

Other databases do allow that sort of gradual migration.  One example
has an internal table of record descriptions indexed by the table identifier 
and a description number.  Each record includes a header with various 
useful bits including its description number. When reading a record, 
the system notes the description number and looks up the description 
before parsing the record into columns.  

The transition is made easier if the database indexes are generic - 
for example, numbers rather than decimal[12,6], int32, etc., and string 
rather than varchar[12].   That way, increasing a column size doesn't
require re-indexing.

But, those are decisions that really had to be made early - making
a major format change 25+ years in would break too much.

Cheers,

Ann




Re: Gradual migration from integer to bigint?

From
Ireneusz Pluta
Date:
On 30.09.2023 at 07:55, James Healy wrote:
> ...
> We shouldn't have let them get so big, but that's a conversation
> for another day.
>
> Some are approaching overflow and we're slowly doing the work to
> migrate to bigint. Mostly via the well understood "add a new id_bigint
> column, populate on new tuples, backfill the old, switch the PK"
> method. The backfill is slow on these large tables, but it works and
> there's plenty of blog posts and documentation to follow.
Wouldn't wrapping around to negative numbers, as in 
https://www.youtube.com/watch?v=XYRgTazYuZ4&t=1338s, be a solution for you? 
At least it would buy more time for the slow migration process, or even 
serve as a definitive solution if you now take care not to let the keys 
grow too quickly.
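
For reference, a sketch of that wraparound trick on a hypothetical serial
PK (the negative half of the int4 range is unused if the ids started at 1):

```sql
-- Let the sequence continue into the negative int4 range,
-- roughly doubling the key space without any table rewrite.
ALTER SEQUENCE accounts_id_seq MINVALUE -2147483648;
ALTER SEQUENCE accounts_id_seq RESTART WITH -2147483648;
-- New rows now count upward from -2147483648 toward the old values.
```

The application and any FK-holding tables do, of course, have to tolerate
negative ids.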



Re: Gradual migration from integer to bigint?

From
Ron
Date:
On 10/1/23 12:04, Ireneusz Pluta wrote:
> On 30.09.2023 at 07:55, James Healy wrote:
>> ...
>> We shouldn't have let them get so big, but that's a conversation
>> for another day.
>>
>> Some are approaching overflow and we're slowly doing the work to
>> migrate to bigint. Mostly via the well understood "add a new id_bigint
>> column, populate on new tuples, backfill the old, switch the PK"
>> method. The backfill is slow on these large tables, but it works and
>> there's plenty of blog posts and documentation to follow.
> Wouldn't wrapping around to negative numbers, as in 
> https://www.youtube.com/watch?v=XYRgTazYuZ4&t=1338s, be a solution for you? 
> At least it would buy more time for the slow migration process, or even 
> serve as a definitive solution if you now take care not to let the keys 
> grow too quickly.

The application might not react well to negative numbers.

-- 
Born in Arizona, moved to Babylonia.



Re: Gradual migration from integer to bigint?

From
Nick Cleaton
Date:
On Sat, 30 Sept 2023, 23:37 Tom Lane, <tgl@sss.pgh.pa.us> wrote:

I think what you're asking for is a scheme whereby some rows in a
table have datatype X in a particular column while other rows in
the very same physical table have datatype Y in the same column.

An alternative for NOT NULL columns would be to use a new attnum for the bigint version of the id, but add a column to pg_attribute linking the new id col to the dropped old id col, to avoid the table rewrite.

Global read code change needed: on finding a NULL in a NOT NULL column, check for a link to a dropped old col and use that value instead if found. The check could be almost free in the normal case if there's already a check for unexpected NULL or tuple too short.

Then a metadata-only operation can create the new id col and drop and rename and link the old id col, and fix up fkeys etc for the attnum change.

Indexes are an issue. Require the in-advance creation of indexes like btree(id::bigint) mirroring every index involving id, maybe? Those could then be swapped in as part of the same metadata operation.
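
Those mirrored indexes might look something like this, built ahead of the
switch (names hypothetical):

```sql
-- Expression index over the casted id, built without blocking writes;
-- after a metadata-only type switch it could be swapped in as the PK index.
CREATE UNIQUE INDEX CONCURRENTLY accounts_id_bigint_idx
    ON accounts ((id::bigint));
```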

Re: Gradual migration from integer to bigint?

From
Bruce Momjian
Date:
On Sun, Oct  1, 2023 at 05:30:39AM -0400, Ann Harrison wrote:
> Other databases do allow that sort of gradual migration.  One example
> has an internal table of record descriptions indexed by the table identifier 
> and a description number.  Each record includes a header with various 
> useful bits including its description number. When reading a record, 
> the system notes the description number and looks up the description 
> before parsing the record into columns.  
> 
> The transition is made easier if the database indexes are generic - 
> for example, numbers rather than decimal[12,6], int32, etc., and string 
> rather than varchar[12].   That way, increasing a column size doesn't
> require re-indexing.
> 
> But, those are decisions that really had to be made early - making
> a major format change 25+ years in would break too much.

And the performance sounds terrible.  ;-)

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.