Thread: Column type modification in big tables

Column type modification in big tables

From
Lok P
Date:
Hello all,
We have a Postgres table that is range-partitioned on a timestamp column, with a total size of ~3TB holding ~5 billion rows spread across ~150 daily partitions, and the table has 140+ columns. This table is also a child of another partitioned table, and partition creation on it is handled through the pg_partman extension.

We have a requirement to modify the existing column types/lengths as below. Doing it directly through a single ALTER command will probably scan and rewrite the whole table, which may take hours of run time.

So we are trying to understand from the experts what the best possible way is to tackle such changes in a Postgres database, and what side effects we may see considering that this table is a child of another one and also has a dependency on the pg_partman extension.

two of the columns from varchar(20) to numeric(3)
one of the columns from varchar(20) to varchar(2)
one of the columns from numeric(10,2) to numeric(8,2)

Regards
Lok

Re: Column type modification in big tables

From
sud
Date:

On Wed, Aug 7, 2024 at 4:39 PM Lok P <loknath.73@gmail.com> wrote:
Hello all,
We have a Postgres table that is range-partitioned on a timestamp column, with a total size of ~3TB holding ~5 billion rows spread across ~150 daily partitions, and the table has 140+ columns. This table is also a child of another partitioned table, and partition creation on it is handled through the pg_partman extension.

We have a requirement to modify the existing column types/lengths as below. Doing it directly through a single ALTER command will probably scan and rewrite the whole table, which may take hours of run time.

So we are trying to understand from the experts what the best possible way is to tackle such changes in a Postgres database, and what side effects we may see considering that this table is a child of another one and also has a dependency on the pg_partman extension.

two of the columns from varchar(20) to numeric(3)
one of the columns from varchar(20) to varchar(2)
one of the columns from numeric(10,2) to numeric(8,2)




Others may correct me, but I think if you don't have an FK defined on these columns you can do the below.

-- Add the new columns. This is very fast (a few seconds), as it only touches the catalog.

ALTER TABLE tab1 ADD COLUMN new_column1 numeric(3), ADD COLUMN new_column2 varchar(2);

-- Backfill the data partition-wise, committing after each partition if needed.

UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
.....

-- Drop the old columns. This is also very fast (a few seconds), as it only marks them as dropped in the catalog.

ALTER TABLE tab1 DROP COLUMN old_column1, DROP COLUMN old_column2;
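
Note: if the downstream code expects the original column names, you would presumably also want a final rename step so the new columns take over the old names. A rough sketch (column names here are just placeholders):

-- Hypothetical final step: give the new columns the original names.
ALTER TABLE tab1 RENAME COLUMN new_column1 TO old_column1;
ALTER TABLE tab1 RENAME COLUMN new_column2 TO old_column2;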

Re: Column type modification in big tables

From
Lok P
Date:


On Wed, Aug 7, 2024 at 4:51 PM sud <suds1434@gmail.com> wrote:


Others may correct me, but I think if you don't have an FK defined on these columns you can do the below.

-- Add the new columns. This is very fast (a few seconds), as it only touches the catalog.

ALTER TABLE tab1 ADD COLUMN new_column1 numeric(3), ADD COLUMN new_column2 varchar(2);

-- Backfill the data partition-wise, committing after each partition if needed.

UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
.....

-- Drop the old columns. This is also very fast (a few seconds), as it only marks them as dropped in the catalog.

ALTER TABLE tab1 DROP COLUMN old_column1, DROP COLUMN old_column2;



Thank you so much.

I understand this will be the fastest possible way to achieve the column modification.

But regarding the dropped columns that will still be sitting in the table and consuming storage space: is it fine to leave them as is, or will autovacuum remove those column values behind the scenes? And in any case, once those partitions are purged, the dropped-column data will be purged along with them. Is this understanding correct?

And also, will this have any impact on the partition maintenance which is currently done by pg_partman, since the template table is now different internally (though not from the outside)? Will it cause a conflict because of those dropped columns on the main table?

Re: Column type modification in big tables

From
sud
Date:


On Wed, Aug 7, 2024 at 5:00 PM Lok P <loknath.73@gmail.com> wrote:


On Wed, Aug 7, 2024 at 4:51 PM sud <suds1434@gmail.com> wrote:


Others may correct me, but I think if you don't have an FK defined on these columns you can do the below.

-- Add the new columns. This is very fast (a few seconds), as it only touches the catalog.

ALTER TABLE tab1 ADD COLUMN new_column1 numeric(3), ADD COLUMN new_column2 varchar(2);

-- Backfill the data partition-wise, committing after each partition if needed.

UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
.....

-- Drop the old columns. This is also very fast (a few seconds), as it only marks them as dropped in the catalog.

ALTER TABLE tab1 DROP COLUMN old_column1, DROP COLUMN old_column2;



Thank you so much.

I understand this will be the fastest possible way to achieve the column modification.

But regarding the dropped columns that will still be sitting in the table and consuming storage space: is it fine to leave them as is, or will autovacuum remove those column values behind the scenes? And in any case, once those partitions are purged, the dropped-column data will be purged along with them. Is this understanding correct?

And also, will this have any impact on the partition maintenance which is currently done by pg_partman, since the template table is now different internally (though not from the outside)? Will it cause a conflict because of those dropped columns on the main table?

I think leaving the table as is after dropping the columns will be fine for you, because your regular partition maintenance/drop will slowly purge the historical partitions and they will eventually be removed. And since you update those new columns with the old column values, autovacuum should also take care of removing the old row versions (which are actually dead tuples).

Not sure if pg_partman will cause any issue now that the table has the column data type/length changed. Others may confirm.
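
And if you really need to reclaim the space of the dropped columns before those partitions get purged, I think that needs a full rewrite of each partition, something like the below (partition name is a placeholder; note this takes an ACCESS EXCLUSIVE lock for the duration):

-- Rewrites the partition, reclaiming space from dropped columns and dead rows.
VACUUM FULL tab1_part1;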

 

Re: Column type modification in big tables

From
Lok P
Date:

On Thu, Aug 8, 2024 at 1:06 AM sud <suds1434@gmail.com> wrote:


On Wed, Aug 7, 2024 at 5:00 PM Lok P <loknath.73@gmail.com> wrote:


On Wed, Aug 7, 2024 at 4:51 PM sud <suds1434@gmail.com> wrote:


Others may correct me, but I think if you don't have an FK defined on these columns you can do the below.

-- Add the new columns. This is very fast (a few seconds), as it only touches the catalog.

ALTER TABLE tab1 ADD COLUMN new_column1 numeric(3), ADD COLUMN new_column2 varchar(2);

-- Backfill the data partition-wise, committing after each partition if needed.

UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS numeric(3)), new_column2 = CAST(old_column2 AS varchar(2));
COMMIT;
.....

-- Drop the old columns. This is also very fast (a few seconds), as it only marks them as dropped in the catalog.

ALTER TABLE tab1 DROP COLUMN old_column1, DROP COLUMN old_column2;



Thank you so much.

I understand this will be the fastest possible way to achieve the column modification.

But regarding the dropped columns that will still be sitting in the table and consuming storage space: is it fine to leave them as is, or will autovacuum remove those column values behind the scenes? And in any case, once those partitions are purged, the dropped-column data will be purged along with them. Is this understanding correct?

And also, will this have any impact on the partition maintenance which is currently done by pg_partman, since the template table is now different internally (though not from the outside)? Will it cause a conflict because of those dropped columns on the main table?

I think leaving the table as is after dropping the columns will be fine for you, because your regular partition maintenance/drop will slowly purge the historical partitions and they will eventually be removed. And since you update those new columns with the old column values, autovacuum should also take care of removing the old row versions (which are actually dead tuples).

Not sure if pg_partman will cause any issue now that the table has the column data type/length changed. Others may confirm.

Thank you so much. 

Can anybody suggest any other possible way here? With this approach we also need to update the existing values into the new columns using UPDATE commands (even if we update one partition at a time). And as I see it, almost all the values in the column are not null, which means we will update almost ~5 billion rows across all the partitions. So my question is: is there any parameter (like work_mem, maintenance_work_mem etc.) we can set to make this update faster?
Or is there any other way to get these columns altered apart from this method?
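
For reference, the kind of per-partition batched backfill we are thinking of looks roughly like the below (table/column names are placeholders; COMMIT inside DO needs PostgreSQL 11+ and must not run inside an outer transaction block):

DO $$
DECLARE
    n bigint;
BEGIN
    LOOP
        -- Update up to 50000 not-yet-backfilled rows per transaction.
        UPDATE tab1_part1 t
        SET    new_column1 = t.old_column1::numeric(3)
        WHERE  t.ctid IN (SELECT ctid
                          FROM tab1_part1
                          WHERE new_column1 IS NULL
                            AND old_column1 IS NOT NULL
                          LIMIT 50000);
        GET DIAGNOSTICS n = ROW_COUNT;
        EXIT WHEN n = 0;
        COMMIT;  -- keep transactions short so autovacuum can clean up behind us
    END LOOP;
END $$;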

Re: Column type modification in big tables

From
Alban Hertroys
Date:
> On 8 Aug 2024, at 20:38, Lok P <loknath.73@gmail.com> wrote:
>
> Thank you so much.
>
> Can anybody suggest any other possible way here? With this approach we also need to update the existing values into
> the new columns using UPDATE commands (even if we update one partition at a time). And as I see it, almost all the
> values in the column are not null, which means we will update almost ~5 billion rows across all the partitions. So my
> question is: is there any parameter (like work_mem, maintenance_work_mem etc.) we can set to make this update faster?
> Or is there any other way to get these columns altered apart from this method?

Just a wild thought here that I’m currently not able to check… Can you add views as partitions? They would be read-only
of course, but that would allow you to cast the columns in your original partitions to the new format, while you can add
any new partitions in the new format.

I suspect it’s not allowed, but perhaps worth a try.

Alban Hertroys
--
There is always an exception to always.







Re: Column type modification in big tables

From
Greg Sabino Mullane
Date:
On Thu, Aug 8, 2024 at 2:39 PM Lok P <loknath.73@gmail.com> wrote:
Can anybody suggest any other possible way here.

Sure - how about not changing the column type at all?

> one of the columns from varchar(20) to varchar(2)

ALTER TABLE foobar ADD CONSTRAINT twocharplease CHECK (length(mycol) <= 2) NOT VALID;

> one of the columns from numeric(10,2) to numeric(8,2)

ALTER TABLE foobar ADD CONSTRAINT eightprecision CHECK (abs(mycol) < 10^6) NOT VALID;

> two of the columns from varchar(20) to numeric(3)

This one is trickier, as we don't know the contents, nor why it is going to numeric(3) - not a terribly useful data type, but let's roll with it and assume the stuff in the varchar is a number of some sort, and that we don't allow nulls:

ALTER TABLE foobar ADD CONSTRAINT onekorless CHECK (mycol::numeric(3) is not null) NOT VALID;

You probably want to check on the validity of the existing rows: see the docs for ALTER TABLE ... VALIDATE CONSTRAINT.
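
For example, once you are happy the existing data conforms, validation is just (a sketch using the constraint names above):

ALTER TABLE foobar VALIDATE CONSTRAINT twocharplease;
ALTER TABLE foobar VALIDATE CONSTRAINT eightprecision;
ALTER TABLE foobar VALIDATE CONSTRAINT onekorless;

Validation scans the table but only takes a SHARE UPDATE EXCLUSIVE lock, so normal reads and writes keep working.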


Cheers,
Greg

Re: Column type modification in big tables

From
Lok P
Date:


On Fri, Aug 9, 2024 at 2:06 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Thu, Aug 8, 2024 at 2:39 PM Lok P <loknath.73@gmail.com> wrote:
Can anybody suggest any other possible way here.

Sure - how about not changing the column type at all?

> one of the columns from varchar(20) to varchar(2)

ALTER TABLE foobar ADD CONSTRAINT twocharplease CHECK (length(mycol) <= 2) NOT VALID;

> one of the columns from numeric(10,2) to numeric(8,2)

ALTER TABLE foobar ADD CONSTRAINT eightprecision CHECK (abs(mycol) < 10^6) NOT VALID;

> two of the columns from varchar(20) to numeric(3)

This one is trickier, as we don't know the contents, nor why it is going to numeric(3) - not a terribly useful data type, but let's roll with it and assume the stuff in the varchar is a number of some sort, and that we don't allow nulls:

ALTER TABLE foobar ADD CONSTRAINT onekorless CHECK (mycol::numeric(3) is not null) NOT VALID;

You probably want to check on the validity of the existing rows: see the docs for ALTER TABLE ... VALIDATE CONSTRAINT.




Thank you so much. Will definitely try to evaluate this approach. The only concern I have is that this data moves downstream with exactly the same data type and length, so will it cause the downstream code to break while using this column in join or filter criteria? Also, I believe the optimizer won't be able to utilize this information while preparing the execution plan.

Another thing, correct me if I'm wrong: my understanding is that if we run the "validate constraint" command after adding the "check constraint ... not valid", it will do a full table scan across all the partitions, but it's still beneficial compared to updating the column values for every row. Correct me if I'm wrong.



Re: Column type modification in big tables

From
Greg Sabino Mullane
Date:
On Fri, Aug 9, 2024 at 6:39 AM Lok P <loknath.73@gmail.com> wrote:
Thank you so much. Will definitely try to evaluate this approach. The only concern I have is that this data moves downstream with exactly the same data type and length, so will it cause the downstream code to break while using this column in join or filter criteria? Also, I believe the optimizer won't be able to utilize this information while preparing the execution plan.

Yes, this is not as ideal as rewriting the table, but you asked for other approaches :) As to the impact on your downstream stuff, I think you have to try it and see. Not clear what you mean by the optimizer; it's not going to really care about numeric(10) versus numeric(8) or varchar(20) vs varchar(2). It's possible the varchar -> numeric change could cause issues, but without real-world queries and data we cannot say.
 
Another thing, correct me if I'm wrong: my understanding is that if we run the "validate constraint" command after adding the "check constraint ... not valid", it will do a full table scan across all the partitions, but it's still beneficial compared to updating the column values for every row. Correct me if I'm wrong.

Yes, it needs to scan the entire table, but it's a lightweight lock, won't block concurrent access, will not need to detoast, and makes no table or index updates. Versus an entire table rewrite which will do heavy locking, take up tons of I/O, update all the indexes, and generate quite a lot of WAL. 

Cheers,
Greg
 

Re: Column type modification in big tables

From
Lok P
Date:

On Fri, Aug 9, 2024 at 9:19 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Fri, Aug 9, 2024 at 6:39 AM Lok P <loknath.73@gmail.com> wrote:
Thank you so much. Will definitely try to evaluate this approach. The only concern I have is that this data moves downstream with exactly the same data type and length, so will it cause the downstream code to break while using this column in join or filter criteria? Also, I believe the optimizer won't be able to utilize this information while preparing the execution plan.

Yes, this is not as ideal as rewriting the table, but you asked for other approaches :) As to the impact on your downstream stuff, I think you have to try it and see. Not clear what you mean by the optimizer; it's not going to really care about numeric(10) versus numeric(8) or varchar(20) vs varchar(2). It's possible the varchar -> numeric change could cause issues, but without real-world queries and data we cannot say.
 
Another thing, correct me if I'm wrong: my understanding is that if we run the "validate constraint" command after adding the "check constraint ... not valid", it will do a full table scan across all the partitions, but it's still beneficial compared to updating the column values for every row. Correct me if I'm wrong.

Yes, it needs to scan the entire table, but it's a lightweight lock, won't block concurrent access, will not need to detoast, and makes no table or index updates. Versus an entire table rewrite which will do heavy locking, take up tons of I/O, update all the indexes, and generate quite a lot of WAL. 


Thank you so much Greg.

We are considering the options in case we are able to get a large downtime window to get this activity done.

Some teammates suggested altering the column with the "USING" clause. I am not really able to understand the difference; also, when I tested on a simple table, the "USING" clause seemed to take more time compared to a plain ALTER. But again, I don't see any way to see the progress and estimated completion time. Can you share your thoughts on this?

ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) USING mycol::NUMERIC(3);
VS
ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) ;

*****
Another thing that also comes to mind is whether we should just create a new partitioned table (say new_part_table) from scratch from the DDL of the existing table (say old_part_table), then load the data into it (INSERT INTO new_part_table ... SELECT ... FROM old_part_table), and then create the indexes, constraints etc., something like the steps below (a rough sketch of the data load and rename steps follows the list).

Will this approach be faster/better compared to the simple "alter table alter column" approach above, considering we will have 4-6 hours of downtime for altering three different columns on this ~5TB table?

-- Steps

Create a table exactly the same as the existing partitioned table, but with the modified column types/lengths.

drop indexes;  (except maybe the PK and FK indexes)
drop constraints;

insert into new_part_table (...) select (...) from old_part_table;

create indexes concurrently ;
create constraints; (But this table is also a child table of another partitioned table, so creating the foreign key may be resource-consuming here too.)

drop the old_part_table;
rename the new_part_table to old_part_table;
rename all the partitions;

VACUUM  old_part_table  ;
ANALYZE  old_part_table  ; 
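
For illustration, the data load and rename steps above would look roughly like the below (column names are placeholders; the three changed columns need explicit casts in the select list):

INSERT INTO new_part_table (id, created_at, colA, colB, colC)
SELECT id, created_at,
       colA::numeric(3),
       colB::varchar(2),
       colC::numeric(8,2)
FROM old_part_table;

ALTER TABLE old_part_table RENAME TO old_part_table_bak;
ALTER TABLE new_part_table RENAME TO old_part_table;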

Re: Column type modification in big tables

From
sud
Date:

On Sat, Aug 10, 2024 at 12:52 AM Lok P <loknath.73@gmail.com> wrote:

On Fri, Aug 9, 2024 at 9:19 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Fri, Aug 9, 2024 at 6:39 AM Lok P <loknath.73@gmail.com> wrote:
Thank you so much. Will definitely try to evaluate this approach. The only concern I have is that this data moves downstream with exactly the same data type and length, so will it cause the downstream code to break while using this column in join or filter criteria? Also, I believe the optimizer won't be able to utilize this information while preparing the execution plan.

Yes, this is not as ideal as rewriting the table, but you asked for other approaches :) As to the impact on your downstream stuff, I think you have to try it and see. Not clear what you mean by the optimizer; it's not going to really care about numeric(10) versus numeric(8) or varchar(20) vs varchar(2). It's possible the varchar -> numeric change could cause issues, but without real-world queries and data we cannot say.
 
Another thing, correct me if I'm wrong: my understanding is that if we run the "validate constraint" command after adding the "check constraint ... not valid", it will do a full table scan across all the partitions, but it's still beneficial compared to updating the column values for every row. Correct me if I'm wrong.

Yes, it needs to scan the entire table, but it's a lightweight lock, won't block concurrent access, will not need to detoast, and makes no table or index updates. Versus an entire table rewrite which will do heavy locking, take up tons of I/O, update all the indexes, and generate quite a lot of WAL. 


Thank you so much Greg.

We are considering the options in case we are able to get a large downtime window to get this activity done.

Some teammates suggested altering the column with the "USING" clause. I am not really able to understand the difference; also, when I tested on a simple table, the "USING" clause seemed to take more time compared to a plain ALTER. But again, I don't see any way to see the progress and estimated completion time. Can you share your thoughts on this?

ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) USING mycol::NUMERIC(3);
VS
ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) ;

*****
Another thing that also comes to mind is whether we should just create a new partitioned table (say new_part_table) from scratch from the DDL of the existing table (say old_part_table), then load the data into it (INSERT INTO new_part_table ... SELECT ... FROM old_part_table), and then create the indexes, constraints etc., something like the steps below.

Will this approach be faster/better compared to the simple "alter table alter column" approach above, considering we will have 4-6 hours of downtime for altering three different columns on this ~5TB table?

-- Steps

Create a table exactly the same as the existing partitioned table, but with the modified column types/lengths.

drop indexes;  (except maybe the PK and FK indexes)
drop constraints;

insert into new_part_table (...) select (...) from old_part_table;

create indexes concurrently ;
create constraints; (But this table is also a child table of another partitioned table, so creating the foreign key may be resource-consuming here too.)

drop the old_part_table;
rename the new_part_table to old_part_table;
rename all the partitions;

VACUUM  old_part_table  ;
ANALYZE  old_part_table  ; 


My 2 cents:
If you have enough time, then from a simplicity point of view your single-line ALTER command may look good, but how are you going to see how much progress it has made so far and how much time it's going to take to finish? And you have ~6 hrs of downtime, but if it fails in the 5th hour you will be in a bad position.


Re: Column type modification in big tables

From
Lok P
Date:

On Sat, Aug 10, 2024 at 5:47 PM sud <suds1434@gmail.com> wrote:


My 2 cents:
If you have enough time, then from a simplicity point of view your single-line ALTER command may look good, but how are you going to see how much progress it has made so far and how much time it's going to take to finish? And you have ~6 hrs of downtime, but if it fails in the 5th hour you will be in a bad position.


 
Does it mean that, if we get enough downtime, we should rather go with the option of recreating the table from scratch, populating the data from the existing table, and then renaming it back? It does look more complicated, considering the many steps like recreating indexes and constraints, renaming, and then running vacuum and analyze etc.

Can someone throw some light on this: in case we get 5-6 hrs of downtime for this change, which method should we choose for this ALTER operation?

Re: Column type modification in big tables

From
Greg Sabino Mullane
Date:
On Sat, Aug 10, 2024 at 5:06 PM Lok P <loknath.73@gmail.com> wrote:
 
Can someone throw some light on this: in case we get 5-6 hrs of downtime for this change, which method should we choose for this ALTER operation?

We can't really answer that. Only you know what resources you have, what risk/reward you are willing to handle, and how long things may take. For that latter item, your best bet is to try this out on the same/similar hardware and see how long it takes. Do a smaller table and extrapolate if you need to. Or promote one of your replicas offline and modify that. I've given you a low-risk / medium-reward option with check constraints, but for the ALTER TABLE options you really need to try it and see (on non-prod).

it seems the "USING" clause takes more time as compared to normal ALTER. But again I don't see any way to see the progress and estimated completion time. Can you share your thoughts on this?

There should be no difference if they are doing the same conversion.

 Will this approach be faster/better as compared to the simple "alter table alter column approach" as above

Seems a lot more complicated to me than a simple ALTER. But measurement is key. Create a new test cluster using pgBackRest or whatever you have. Then run your ALTER TABLE and see how long it takes (remember that multiple columns can be changed in a single ALTER TABLE statement). 
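
For example, a single statement covering all three changes could look like this (column names are placeholders; the varchar -> numeric change is the one that needs a USING clause), so the table is rewritten once rather than three times:

ALTER TABLE foobar
    ALTER COLUMN colA TYPE numeric(3) USING colA::numeric(3),
    ALTER COLUMN colB TYPE varchar(2),
    ALTER COLUMN colC TYPE numeric(8,2);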

Cheers,
Greg

Re: Column type modification in big tables

From
Lok P
Date:

On Tue, Aug 13, 2024 at 7:39 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Sat, Aug 10, 2024 at 5:06 PM Lok P <loknath.73@gmail.com> wrote:
 
Can someone throw some light on this: in case we get 5-6 hrs of downtime for this change, which method should we choose for this ALTER operation?

We can't really answer that. Only you know what resources you have, what risk/reward you are willing to handle, and how long things may take. For that latter item, your best bet is to try this out on the same/similar hardware and see how long it takes. Do a smaller table and extrapolate if you need to. Or promote one of your replicas offline and modify that. I've given you a low-risk / medium-reward option with check constraints, but for the ALTER TABLE options you really need to try it and see (on non-prod).

 
"Do a smaller table and extrapolate if you need to. Or promote one of your replicas offline and modify that. I've given you a low-risk / medium-reward option with check constraints, but for the ALTER TABLE options you really need to try it and see (on non-prod)."

Is there any possible method (maybe by looking into the data dictionary tables/views etc.) to see the progress of the ALTER statement, by which we can estimate its expected completion time? I understand pg_stat_activity doesn't show any completion percentage of a statement, but I am wondering whether there is any other possible way to estimate the amount of time the ALTER command will take to complete in prod.

Re: Column type modification in big tables

From
veem v
Date:

On Tue, 13 Aug 2024 at 19:39, Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Sat, Aug 10, 2024 at 5:06 PM Lok P <loknath.73@gmail.com> wrote:
 
Can someone throw some light on this: in case we get 5-6 hrs of downtime for this change, which method should we choose for this ALTER operation?

We can't really answer that. Only you know what resources you have, what risk/reward you are willing to handle, and how long things may take. For that latter item, your best bet is to try this out on the same/similar hardware and see how long it takes. Do a smaller table and extrapolate if you need to. Or promote one of your replicas offline and modify that. I've given you a low-risk / medium-reward option with check constraints, but for the ALTER TABLE options you really need to try it and see (on non-prod).


What if the OP opts for a strategy something like the below?
1) Detach the partitions. 2) Alter the individual partitions with the required column type and length. 3) Alter the parent table. 4) Attach the partitions back to the main table.

This should be faster and also lets each partition be handled individually in a controlled fashion, something like the sketch below.
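
Roughly, per partition (names and range bounds are placeholders):

ALTER TABLE old_part_table DETACH PARTITION old_part_table_p20240801;
ALTER TABLE old_part_table_p20240801
    ALTER COLUMN colA TYPE numeric(3) USING colA::numeric(3);
-- ...repeat for the remaining partitions, then alter the (now empty) parent...
ALTER TABLE old_part_table
    ALTER COLUMN colA TYPE numeric(3) USING colA::numeric(3);
-- ...and re-attach each partition with its original bounds:
ALTER TABLE old_part_table ATTACH PARTITION old_part_table_p20240801
    FOR VALUES FROM ('2024-08-01') TO ('2024-08-02');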

Re: Column type modification in big tables

From
"Peter J. Holzer"
Date:
On 2024-08-14 01:26:36 +0530, Lok P wrote:
> Is there any possible method (maybe by looking into the data dictionary tables/views etc.) to see the progress of the
> ALTER statement, by which we can estimate its expected completion time? I understand pg_stat_activity doesn't show
> any completion percentage of a statement, but I am wondering whether there is any other possible way to estimate the
> amount of time the ALTER command will take to complete in prod.

You could look at the data files. Tables in PostgreSQL are stored as a
series of 1GB files, so watching them being created and/or read
gives you a pretty good idea about progress.

For example, here is an alter table (changing one column from int to
bigint)  on a 1.8 GB table on my laptop:

The original table: Two data files with 1 and 0.8 GB respectively:

22:26:51 1073741824 Aug 13 22:24 266648
22:26:51  853794816 Aug 13 22:26 266648.1

The operation begins: A data file for the new table appears:

22:26:55 1073741824 Aug 13 22:26 266648
22:26:55  853794816 Aug 13 22:26 266648.1
22:26:55   79298560 Aug 13 22:26 266659

... and grows:

22:26:57 1073741824 Aug 13 22:26 266648
22:26:57  853794816 Aug 13 22:26 266648.1
22:26:57  208977920 Aug 13 22:26 266659

... and grows:

22:26:59 1073741824 Aug 13 22:26 266648
22:26:59  853794816 Aug 13 22:26 266648.1
22:26:59  284024832 Aug 13 22:26 266659

and now the table has exceeded 1 GB, so there's a second file:

22:27:17 1073741824 Aug 13 22:26 266648
22:27:17 1073741824 Aug 13 22:27 266659
22:27:17  853794816 Aug 13 22:27 266648.1
22:27:17    3022848 Aug 13 22:27 266659.1

... and a third:

22:27:44 1073741824 Aug 13 22:26 266648
22:27:44 1073741824 Aug 13 22:27 266659
22:27:44 1073741824 Aug 13 22:27 266659.1
22:27:44  853794816 Aug 13 22:27 266648.1
22:27:44   36798464 Aug 13 22:27 266659.2

almost finished:

22:28:08 1073741824 Aug 13 22:26 266648
22:28:08 1073741824 Aug 13 22:27 266659
22:28:08 1073741824 Aug 13 22:27 266659.1
22:28:08  853794816 Aug 13 22:27 266648.1
22:28:08   36798464 Aug 13 22:28 266659.2

Done: The old table has been reduced to an empty file (not sure why
PostgreSQL keeps that around):

22:28:10 1073741824 Aug 13 22:27 266659
22:28:10 1073741824 Aug 13 22:27 266659.1
22:28:10   36798464 Aug 13 22:28 266659.2
22:28:10          0 Aug 13 22:28 266648

Of course you need to be postgres or root to do this. Be careful!

Watching the access times may be useful, too, but on Linux by default
the access time is only updated under some special circumstances, so
this may be misleading.
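
To find which files to watch, pg_relation_filepath() returns the path relative to the data directory (relation name below is a placeholder):

-- e.g. base/16384/266648; the rewrite shows up as new files (a new relfilenode)
-- in the same directory, as in the listing above.
SELECT pg_relation_filepath('tab1_part1');
SHOW data_directory;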

        hp


--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Column type modification in big tables

From
Dominique Devienne
Date:
On Tue, Aug 13, 2024 at 10:54 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> You could look at the data files. Tables in PostgreSQL are stored as a
> series of 1GB files, so you watching them being created and/or read
> gives you a pretty good idea about progress.

Thanks Peter, very insightful. Appreciated. --DD