Thread: How to add columns faster
Hello,
We have an application in which some of the big transaction tables will have approximately 200 columns once we are serving all the business use cases. Currently they have ~100 columns to serve the current business use cases to start with. As the team is following an agile approach, the attributes will be added gradually with each future release. But the concern is that the transaction tables will be holding billions of rows (because of historical data migration from the start) and will be range partitioned, and each of those partitions can be as big as ~100 GB, with the full table size running to multiple terabytes.
So our concern was, as the column addition using the traditional "Alter table" command in postgres looks to be a full table rewrite, it is going to take a lot of time and resources with each such release for these big transaction tables. So what is the way to handle such scenarios and do these column additions quickly, with no/minimal downtime?
Regards
Yudhi
> On Mar 3, 2024, at 11:06, yudhi s <learnerdatabase99@gmail.com> wrote:
> as the column addition using the traditional "Alter table" command in postgres looks to be a full table rewrite

That's not always (or, really, usually) true. Adding a new column in any recent version of PostgreSQL just alters the system catalogs; it does not rewrite the table. Make sure the new column is either NULL-able, or has a simple DEFAULT expression (specifically, not using a VOLATILE function). Per the documentation:

> When a column is added with ADD COLUMN and a non-volatile DEFAULT is specified, the default is evaluated at the time of the statement and the result stored in the table's metadata. That value will be used for the column for all existing rows. If no DEFAULT is specified, NULL is used. In neither case is a rewrite of the table required.
>
> Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten.

https://www.postgresql.org/docs/current/sql-altertable.html
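For illustration, with hypothetical table and column names, the difference looks roughly like this:

    -- Metadata-only in PostgreSQL 11 and later: the constant default is stored
    -- in the catalog and applied to existing rows on read, so no table rewrite.
    ALTER TABLE transactions ADD COLUMN risk_flag boolean DEFAULT false;

    -- Also metadata-only: a NULL-able column with no default.
    ALTER TABLE transactions ADD COLUMN settlement_ref text;

    -- Forces a full rewrite of the table and its indexes: clock_timestamp()
    -- is VOLATILE, so each existing row needs its own evaluated value.
    ALTER TABLE transactions ADD COLUMN loaded_at timestamptz DEFAULT clock_timestamp();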
On Sun, Mar 3, 2024 at 2:06 PM yudhi s <learnerdatabase99@gmail.com> wrote:
Hello, We have an application in which some of the big transaction tables will have approximately 200 columns once we are serving all the business use cases. Currently they have ~100 columns to serve the current business use cases to start with. As the team is following an agile approach, the attributes will be added gradually with each future release. But the concern is that the transaction tables will be holding billions of rows (because of historical data migration from the start) and will be range partitioned, and each of those partitions can be as big as ~100 GB, with the full table size running to multiple terabytes. So our concern was, as the column addition using the traditional "Alter table" command in postgres looks to be a full table rewrite, it is going to take a lot of time and resources with each such release for these big transaction tables. So what is the way to handle such scenarios and do these column additions quickly, with no/minimal downtime?
Normalizing the database design completely eliminates the problem of any need to rewrite a table, since new attributes are added as new rows in a different table.
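A rough sketch, with hypothetical names, of what that can look like:

    -- Each new business attribute becomes a row in a narrow side table
    -- rather than a new column on the wide transaction table.
    CREATE TABLE transaction_attribute (
        transaction_id  bigint NOT NULL,
        attribute_name  text   NOT NULL,
        attribute_value text,
        PRIMARY KEY (transaction_id, attribute_name)
    );

    -- A release that introduces a new attribute just starts inserting rows;
    -- the multi-terabyte transaction table is never altered.
    INSERT INTO transaction_attribute (transaction_id, attribute_name, attribute_value)
    VALUES (1001, 'settlement_ref', 'ABC-123');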
On Mon, Mar 4, 2024 at 12:43 AM Christophe Pettus <xof@thebuild.com> wrote:
> On Mar 3, 2024, at 11:06, yudhi s <learnerdatabase99@gmail.com> wrote:
> as the column addition using the traditional "Alter table" command in postgres looks to be a full table rewrite
That's not always (or, really, usually) true. Adding a new column in any recent version of PostgreSQL just alters the system catalogs; it does not rewrite the table. Make sure the new column is either NULL-able, or has a simple DEFAULT expression (specifically, not using a VOLATILE function). Per the documentation:
> When a column is added with ADD COLUMN and a non-volatile DEFAULT is specified, the default is evaluated at the time of the statement and the result stored in the table's metadata. That value will be used for the column for all existing rows. If no DEFAULT is specified, NULL is used. In neither case is a rewrite of the table required.
>
> Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten.
https://www.postgresql.org/docs/current/sql-altertable.html
Thanks for the clarification. In the case of adding a column with a volatile default (like the current_timestamp function as the default), or adding a NOT NULL column with some conditional population of existing values, it will be a full table rewrite. In such scenarios the full table rewrite operation is going to take a long time, so what will be the fastest way to achieve that with minimal to no downtime?
Apologies if this is a dumb one, but considering that the partitions in postgres are as good as different tables, can we somehow add the new column at the table level by adding it to each of the partitions individually and then attaching them, or does it have to happen in one shot only?
> On Mar 3, 2024, at 11:40, yudhi s <learnerdatabase99@gmail.com> wrote:
> Thanks for the clarification. In the case of adding a column with a volatile default (like the current_timestamp function as the default), or adding a NOT NULL column with some conditional population of existing values, it will be a full table rewrite. In such scenarios the full table rewrite operation is going to take a long time, so what will be the fastest way to achieve that with minimal to no downtime?

For other situations, you probably have to create the column as NULL-able, and then backfill it in groups of records until the whole column is populated. Remember that dropping the NULL constraint afterwards will require a full table read (although not a rewrite).
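As a rough sketch of that (hypothetical table, column, and batch size; the real population rule will obviously differ):

    -- 1. Metadata-only: add the column NULL-able, with no volatile default.
    ALTER TABLE transactions ADD COLUMN processed_flag boolean;

    -- 2. Backfill in limited batches; repeat until it reports UPDATE 0.
    --    Small batches bound the lock time and WAL generated per statement.
    UPDATE transactions
    SET    processed_flag = (status = 'DONE')   -- placeholder for the real rule
    WHERE  id IN (
               SELECT id
               FROM   transactions
               WHERE  processed_flag IS NULL
               LIMIT  10000
           );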
> On Mar 3, 2024, at 12:00, Christophe Pettus <xof@thebuild.com> wrote:
> Remember that dropping the NULL constraint afterwards will require a full table read (although not a rewrite).

Sorry, badly put: Adding a NOT NULL constraint afterwards will require a full table read (although not a rewrite).
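If the column ultimately needs to be NOT NULL, the last step after the backfill would be something like this (same hypothetical names as above):

    -- Scans the whole table to check for NULLs (no rewrite), under an
    -- ACCESS EXCLUSIVE lock. On PostgreSQL 12+ the scan can be skipped if a
    -- previously VALIDATEd CHECK (processed_flag IS NOT NULL) constraint
    -- already exists on the table.
    ALTER TABLE transactions ALTER COLUMN processed_flag SET NOT NULL;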