Thread: How to add columns faster

How to add columns faster

From
yudhi s
Date:
Hello,
We have an application in which the tables will have approx ~200 columns in some of the big transaction tables when we will be serving all the business use cases. Currently it has ~100 columns to serve current business use cases to start with. As the team is following an agile approach , the attributes will be added gradually with each future release. But the concern is, the transaction tables will be holding billions of rows ( because of historical data migration from start) and will be range partitioned and each of those partitions can be as big as ~100GB and full table size can be in multiple Terabytes.

So our concern was , as the column addition using the traditional "Alter table" command in postgres looks to be a full table rewrite , it’s going to take a lot of time and resources with each of such releases for these big transaction tables. So what is the way to handle such scenarios and do these column additions in quick time with no/minimal downtime?

Regards
Yudhi

Re: How to add columns faster

From
Christophe Pettus
Date:

> On Mar 3, 2024, at 11:06, yudhi s <learnerdatabase99@gmail.com> wrote:
> as the column addition using the traditional "Alter table" command in postgres looks to be a full table rewrite

That's not always (or, really, usually) true.  Adding a new column in any recent version of PostgreSQL just alters the
systemcatalogs; it does not rewrite the table.  Make sure the new column is either NULL-able, or has a simple DEFAULT
expression(specifically, not using a VOLATILE function).  Per the documentation: 

> When a column is added with ADD COLUMN and a non-volatile DEFAULT is specified, the default is evaluated at the time
ofthe statement and the result stored in the table's metadata. That value will be used for the column for all existing
rows.If no DEFAULT is specified, NULL is used. In neither case is a rewrite of the table required. 
>
> Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and
itsindexes to be rewritten. 

https://www.postgresql.org/docs/current/sql-altertable.html


Re: How to add columns faster

From
Ron Johnson
Date:
On Sun, Mar 3, 2024 at 2:06 PM yudhi s <learnerdatabase99@gmail.com> wrote:
Hello,
We have an application in which the tables will have approx ~200 columns in some of the big transaction tables when we will be serving all the business use cases. Currently it has ~100 columns to serve current business use cases to start with. As the team is following an agile approach , the attributes will be added gradually with each future release. But the concern is, the transaction tables will be holding billions of rows ( because of historical data migration from start) and will be range partitioned and each of those partitions can be as big as ~100GB and full table size can be in multiple Terabytes.

So our concern was , as the column addition using the traditional "Alter table" command in postgres looks to be a full table rewrite , it’s going to take a lot of time and resources with each of such releases for these big transaction tables. So what is the way to handle such scenarios and do these column additions in quick time with no/minimal downtime?

Normalizing the database design completely eliminates the problem of any need to rewrite a table, since new attributes are added as new rows in a different table.

Re: How to add columns faster

From
yudhi s
Date:
On Mon, Mar 4, 2024 at 12:43 AM Christophe Pettus <xof@thebuild.com> wrote:

> On Mar 3, 2024, at 11:06, yudhi s <learnerdatabase99@gmail.com> wrote:
> as the column addition using the traditional "Alter table" command in postgres looks to be a full table rewrite

That's not always (or, really, usually) true.  Adding a new column in any recent version of PostgreSQL just alters the system catalogs; it does not rewrite the table.  Make sure the new column is either NULL-able, or has a simple DEFAULT expression (specifically, not using a VOLATILE function).  Per the documentation:

> When a column is added with ADD COLUMN and a non-volatile DEFAULT is specified, the default is evaluated at the time of the statement and the result stored in the table's metadata. That value will be used for the column for all existing rows. If no DEFAULT is specified, NULL is used. In neither case is a rewrite of the table required.
>
> Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten.

https://www.postgresql.org/docs/current/sql-altertable.html

Thanks for the clarification. In case of adding the column as volatile default (like current_timestamp function as default) or say adding NOT NULL column with some conditional population of existing values will be a full table rewrite. In such scenarios, the full table rewrite operation is going to take a long time , so what will be the fastest way to achieve that with minimal to no downtime? 

Apology if this is dumb one, but considering the partitions in postgres are as good as different tables, can we add the new column someway at the table level and add the columns to each of the partitions individually and then attach or it has to happen at one shot only? 

 

Re: How to add columns faster

From
Christophe Pettus
Date:

> On Mar 3, 2024, at 11:40, yudhi s <learnerdatabase99@gmail.com> wrote:
> Thanks for the clarification. In case of adding the column as volatile default (like current_timestamp function as
default)or say adding NOT NULL column with some conditional population of existing values will be a full table rewrite.
Insuch scenarios, the full table rewrite operation is going to take a long time , so what will be the fastest way to
achievethat with minimal to no downtime?  

For other situations, you probably have to create the column as NULL-able, and then backfill it in groups of records
untilthe whole column is populated.  Remember that dropping the NULL constraint afterwards will require a full table
read(although not a rewrite). 


Re: How to add columns faster

From
Christophe Pettus
Date:

> On Mar 3, 2024, at 12:00, Christophe Pettus <xof@thebuild.com> wrote:
> Remember that dropping the NULL constraint afterwards will require a full table read (although not a rewrite).

Sorry, badly put: Adding a NOT NULl constraint afterwards will require a full table read (although not a rewrite).