Thread: Gradual migration from integer to bigint?
My organization has a number of very large tables (most 100s of GB, a couple over a Tb) that were created many years ago by a tool that defaulted to integer PKs rather than bigint. Those PKs have a number of integer FKs in related tables as well. We shouldn't have let them get so big, but that's a conversation for another day. Some are approaching overflow and we're slowly doing the work to migrate to bigint. Mostly via the well understood "add a new id_bigint column, populate on new tuples, backfill the old, switch the PK" method. The backfill is slow on these large tables, but it works and there's plenty of blog posts and documentation to follow. It did make me curious though: would it be possible for postgres to support gradual migration from integer to bigint in a more transparent way, where new and updated tuples are written as bigint, but existing tuples can be read as integer? I assume maybe a complication is that the catalog says the column is either 32bit int or 64bit bigint and making that conditional is hard. There's presumably other considerations I'm unaware of too. My core question: are there significant technical blockers to supporting this kind of gradual in place migration, or has it just not been enough of a problem that it's received attention? James
On Sat, Sep 30, 2023 at 03:55:20PM +1000, James Healy wrote: > It did make me curious though: would it be possible for postgres to > support gradual migration from integer to bigint in a more > transparent way, where new and updated tuples are written as bigint, > but existing tuples can be read as integer? Language police: this is the *opposite* of "transparent". "trasparent" and "automated" are not synonyms. -- Ian
On Sat, Sep 30, 2023 at 03:55:20PM +1000, James Healy wrote: > My organization has a number of very large tables (most 100s of GB, a > couple over a Tb) that were created many years ago by a tool that > defaulted to integer PKs rather than bigint. Those PKs have a number > of integer FKs in related tables as well. We shouldn't have let them > get so big, but that's a conversation for another day. > > Some are approaching overflow and we're slowly doing the work to > migrate to bigint. Mostly via the well understood "add a new id_bigint > column, populate on new tuples, backfill the old, switch the PK" > method. The backfill is slow on these large tables, but it works and > there's plenty of blog posts and documentation to follow. > > It did make me curious though: would it be possible for postgres to > support gradual migration from integer to bigint in a more transparent > way, where new and updated tuples are written as bigint, but existing > tuples can be read as integer? > > I assume maybe a complication is that the catalog says the column is > either 32bit int or 64bit bigint and making that conditional is hard. > There's presumably other considerations I'm unaware of too. My core > question: are there significant technical blockers to supporting this > kind of gradual in place migration, or has it just not been enough of > a problem that it's received attention? I think this talk will help you: https://www.youtube.com/watch?v=XYRgTazYuZ4 -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Sun, 1 Oct 2023 at 04:35, Bruce Momjian <bruce@momjian.us> wrote: > I think this talk will help you: > > https://www.youtube.com/watch?v=XYRgTazYuZ4 Thanks, I hadn't seen that talk and it's a good summary of the issue and available solutions. However it doesn't really address the question of a gradual migration process that can read 32bit ints but insert/update as 64bit bigints. I remain curious about whether the postgres architecture just makes that implausible, or if it could be done and just hasn't because the options for a more manual migration are Good Enough. James
James Healy <james@yob.id.au> writes: > However it doesn't really address the question of a gradual migration > process that can read 32bit ints but insert/update as 64bit bigints. I > remain curious about whether the postgres architecture just makes that > implausible, or if it could be done and just hasn't because the > options for a more manual migration are Good Enough. I think what you're asking for is a scheme whereby some rows in a table have datatype X in a particular column while other rows in the very same physical table have datatype Y in the same column. That is not happening, because there'd be no way to tell which case applies to any particular row. You could fantasize about labeling individual rows somehow, but it's mere fantasy because there's noplace to put such labels. To the limited extent that we can find spare space in the existing page layout, there are far better use-cases (see nearby discussions about 64-bit XIDs, for example). And nobody is going to advocate breaking on-disk compatibility for this, especially not a break that adds more per-row overhead. So really the only way forward for this would be to provide more automation for the existing conversion processes involving table rewrites. That's possible perhaps, but it doesn't really sound compelling enough to justify a lot of work. regards, tom lane
On 9/30/23 22:37, Tom Lane wrote: [snip] > especially not a break that adds more per-row overhead. > > So really the only way forward for this would be to provide more > automation for the existing conversion processes involving table > rewrites. When altering an unindexed INT to BIGINT, do all of the indices get rewritten? -- Born in Arizona, moved to Babylonia.
On Sun, 1 Oct 2023 at 14:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think what you're asking for is a scheme whereby some rows in a > table have datatype X in a particular column while other rows in > the very same physical table have datatype Y in the same column. > That is not happening, because there'd be no way to tell which > case applies to any particular row. To be honest, I don't know enough about the postgresql on-disk format and tuple shape to be confident in how this would be solved. I was thinking more about the ergonomics of what would be helpful and wondering how viable it was. Sounds like not very viable. Rats. The docs [1] on changing column types include: > As an exception, when changing the type of an existing column, if the USING clause does not change the column contentsand the old type is either binary coercible to the new type or an unconstrained domain over the new type, a tablerewrite is not needed ... and mention the specific case of switching between VARCHAR and TEXT not requiring a table or index rewrite. Seems like the specific case of int->bigint is impossible to make as easy, given the fixed sizes in the tuple and impossibility of knowing from tuple to tuple whether to read 4 or 8 bytes. regards, James [1] https://www.postgresql.org/docs/current/sql-altertable.html
On Sat, Sep 30, 2023 at 11:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Healy <james@yob.id.au> writes:
> However it doesn't really address the question of a gradual migration
> process that can read 32bit ints but insert/update as 64bit bigints. I
> remain curious about whether the postgres architecture just makes that
> implausible, or if it could be done and just hasn't because the
> options for a more manual migration are Good Enough.
I think what you're asking for is a scheme whereby some rows in a
table have datatype X in a particular column while other rows in
the very same physical table have datatype Y in the same column.
That is not happening, because there'd be no way to tell which
case applies to any particular row.
Other databases do allow that sort of gradual migration. One example
has an internal table of record descriptions indexed the table identifier
and a description number. Each record includes a header with various
useful bits including its description number. When reading a record,
the system notes the description number and looks up the description
before parsing the record into columns.
The transition is made easier if the database indexes are generic -
for example, numbers rather than decimal[12,6], int32, etc., and string
rather than varchar[12]. That way, increasing a column size doesn't
require re-indexing.
But, those are decision that really had to be made early - making
a major format change 25+ years in would break too much.
Cheers,
Ann
W dniu 30.09.2023 o 07:55, James Healy pisze: > ... > We shouldn't have let them get so big, but that's a conversation > for another day. > > Some are approaching overflow and we're slowly doing the work to > migrate to bigint. Mostly via the well understood "add a new id_bigint > column, populate on new tuples, backfill the old, switch the PK" > method. The backfill is slow on these large tables, but it works and > there's plenty of blog posts and documentation to follow. wouldn't wrapping to negative numbers like: https://www.youtube.com/watch?v=XYRgTazYuZ4&t=1338s be a solution for you? At least for buying more time for the slow migration process. Or even as a definite solution if you now take care and not let the keys grow too quickly.
On 10/1/23 12:04, Ireneusz Pluta wrote: > W dniu 30.09.2023 o 07:55, James Healy pisze: >> ... >> We shouldn't have let them get so big, but that's a conversation >> for another day. >> >> Some are approaching overflow and we're slowly doing the work to >> migrate to bigint. Mostly via the well understood "add a new id_bigint >> column, populate on new tuples, backfill the old, switch the PK" >> method. The backfill is slow on these large tables, but it works and >> there's plenty of blog posts and documentation to follow. > wouldn't wrapping to negative numbers like: > https://www.youtube.com/watch?v=XYRgTazYuZ4&t=1338s be a solution for you? > At least for buying more time for the slow migration process. Or even as a > definite solution if you now take care and not let the keys grow too quickly. The application might not react well to negative numbers. -- Born in Arizona, moved to Babylonia.
On Sat, 30 Sept 2023, 23:37 Tom Lane, <tgl@sss.pgh.pa.us> wrote:
I think what you're asking for is a scheme whereby some rows in a
table have datatype X in a particular column while other rows in
the very same physical table have datatype Y in the same column.
An alternative for NOT NULL columns would be to use a new attnum for the bigint version of the id, but add a column to pg_attribute allowing linking the new id col to the dropped old id col, to avoid the table rewrite.
Global read code change needed: on finding a NULL in a NOT NULL column, check for a link to a dropped old col and use that value instead if found. The check could be almost free in the normal case if there's already a check for unexpected NULL or tuple too short.
Then a metadata-only operation can create the new id col and drop and rename and link the old id col, and fix up fkeys etc for the attnum change.
Indexes are an issue. Require the in-advance creation of indexes like btree(id::bigint) mirroring every index involving id maybe ? Those could then be swapped in as part of the same metadata operation.
On Sun, Oct 1, 2023 at 05:30:39AM -0400, Ann Harrison wrote: > Other databases do allow that sort of gradual migration. One example > has an internal table of record descriptions indexed the table identifier > and a description number. Each record includes a header with various > useful bits including its description number. When reading a record, > the system notes the description number and looks up the description > before parsing the record into columns. > > The transition is made easier if the database indexes are generic - > for example, numbers rather than decimal[12,6], int32, etc., and string > rather than varchar[12]. That way, increasing a column size doesn't > require re-indexing. > > But, those are decision that really had to be made early - making > a major format change 25+ years in would break too much. And the performance sounds terrible. ;-) -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.