Thread: Determine potential change in table size after a column dropped?

Determine potential change in table size after a column dropped?

From

Wells Oliver

Date:

22 January 2022, 07:16:23

I have a large large large table with many many many rows, and it's a certain size in pg_relation_size -- there's a timestamp with tz column on this table that's mostly kind of useless, and I want to figure out how much space it would free if we just dropped it. Can I easily do this?

I can fire up a replica and drop the column there and compare but that takes some time, so wondering if there's some simple math here due to an inherent tz timestamp column size, or something.

Thanks.

Wells Oliver
wells.oliver@gmail.com

Re: Determine potential change in table size after a column dropped?

From

Vijaykumar Jain

Date:

22 January 2022, 07:43:41

On Sat, Jan 22, 2022, 12:47 PM Wells Oliver <wells.oliver@gmail.com> wrote:

I have a large large large table with many many many rows, and it's a certain size in pg_relation_size -- there's a timestamp with tz column on this table that's mostly kind of useless, and I want to figure out how much space it would free if we just dropped it. Can I easily do this?

https://www.postgresql.org/docs/current/sql-altertable.html

The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.

To force immediate reclamation of space occupied by a dropped column, you can execute one of the forms of ALTER TABLE that performs a rewrite of the whole table. This results in reconstructing each row with the dropped column replaced by a null value.

............

But you could still try to get a rough estimate, you could use pg_column_size() and pg_stat_user_tables to get an approx set of tuples and multiply with the size. (Not including indexes though)

Re: Determine potential change in table size after a column dropped?

From

Ron

Date:

22 January 2022, 13:38:26

On 1/22/22 1:43 AM, Vijaykumar Jain wrote:

On Sat, Jan 22, 2022, 12:47 PM Wells Oliver <wells.oliver@gmail.com> wrote:
I have a large large large table with many many many rows, and it's a certain size in pg_relation_size -- there's a timestamp with tz column on this table that's mostly kind of useless, and I want to figure out how much space it would free if we just dropped it. Can I easily do this?

https://www.postgresql.org/docs/current/sql-altertable.html

The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.

To force immediate reclamation of space occupied by a dropped column, you can execute one of the forms of ALTER TABLE that performs a rewrite of the whole table. This results in reconstructing each row with the dropped column replaced by a null value.

What about VACUUM FULL?

--
Angular momentum makes the world go 'round.

Re: Determine potential change in table size after a column dropped?

From

Wells Oliver

Date:

22 January 2022, 17:08:31

I need only drop the column and VACUUM FULL the table, and not the entire DB, right?

On Sat, Jan 22, 2022 at 5:38 AM Ron <ronljohnsonjr@gmail.com> wrote:

On 1/22/22 1:43 AM, Vijaykumar Jain wrote:

On Sat, Jan 22, 2022, 12:47 PM Wells Oliver <wells.oliver@gmail.com> wrote:
I have a large large large table with many many many rows, and it's a certain size in pg_relation_size -- there's a timestamp with tz column on this table that's mostly kind of useless, and I want to figure out how much space it would free if we just dropped it. Can I easily do this?

https://www.postgresql.org/docs/current/sql-altertable.html

The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.

To force immediate reclamation of space occupied by a dropped column, you can execute one of the forms of ALTER TABLE that performs a rewrite of the whole table. This results in reconstructing each row with the dropped column replaced by a null value.

What about VACUUM FULL?

--
Angular momentum makes the world go 'round.

Wells Oliver
wells.oliver@gmail.com

Re: Determine potential change in table size after a column dropped?

From

"David G. Johnston"

Date:

22 January 2022, 17:27:11

On Sat, Jan 22, 2022 at 10:09 AM Wells Oliver <wells.oliver@gmail.com> wrote:

I need only drop the column and VACUUM FULL the table, and not the entire DB, right?

Do you seriously think that vacuuming the entire database is a possible requirement here?

In terms of estimated space savings, the data type documentation tells you how much space each data type consumes. A reasonable first estimate, knowing nothing else about the internals of the system, is to multiply that by the number of rows. That would provide a lower-bound estimate. If that amount makes the effort worth it then no further info is needed.

David J.

Re: Determine potential change in table size after a column dropped?

From

Laurenz Albe

Date:

24 January 2022, 08:28:44

On Sat, 2022-01-22 at 09:08 -0800, Wells Oliver wrote:
> I need only drop the column and VACUUM FULL the table, and not the entire DB, right?

Not that VACUUM (FULL) will *not* physically get rid of a dropped column,
as it just copies the complete rows to a new table.

You would need something like:

CREATE TABLE newtab (LIKE oldtab);
INSERT INTO newtab SELECT * FROM oldtab;

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

Re: Determine potential change in table size after a column dropped?

From

Thomas Kellerer

Date:

24 January 2022, 08:44:12

Laurenz Albe schrieb am 24.01.2022 um 09:28:
> On Sat, 2022-01-22 at 09:08 -0800, Wells Oliver wrote:
>> I need only drop the column and VACUUM FULL the table, and not the entire DB, right?
>
> Not that VACUUM (FULL) will *not* physically get rid of a dropped column,
> as it just copies the complete rows to a new table.

I always wondered why that is the case.

If the table is rewritten entirely, wouldn't that also be an option to
really get rid of dropped columns?

Is there a technical reason, or just a case of "no one cared enough to change it"?

Regards
Thomas

Re: Determine potential change in table size after a column dropped?

From

Wells Oliver

Date:

24 January 2022, 16:08:10

So, there's really no way to reclaim space from a dropped column other than entirely creating a new table?

On Mon, Jan 24, 2022 at 12:28 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Sat, 2022-01-22 at 09:08 -0800, Wells Oliver wrote:
> I need only drop the column and VACUUM FULL the table, and not the entire DB, right?

Not that VACUUM (FULL) will *not* physically get rid of a dropped column,
as it just copies the complete rows to a new table.

You would need something like:

CREATE TABLE newtab (LIKE oldtab);
INSERT INTO newtab SELECT * FROM oldtab;

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Wells Oliver
wells.oliver@gmail.com

Re: Determine potential change in table size after a column dropped?

From

Laurenz Albe

Date:

24 January 2022, 16:30:42

On Mon, 2022-01-24 at 08:08 -0800, Wells Oliver wrote:
> > > I need only drop the column and VACUUM FULL the table, and not the entire DB, right?
> > 
> > Not that VACUUM (FULL) will *not* physically get rid of a dropped column,
> > as it just copies the complete rows to a new table.
> > 
> > You would need something like:
> > 
> > CREATE TABLE newtab (LIKE oldtab);
> > INSERT INTO newtab SELECT * FROM oldtab;
>
> So, there's really no way to reclaim space from a dropped column other than
> entirely creating a new table?

Correct, as far as I know.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

Re: Determine potential change in table size after a column dropped?

From

Holger Jakobs

Date:

01 February 2022, 13:49:33

Am 01.02.22 um 14:46 schrieb Jiankang Ji:

Hi All Supermen Experts,

I'm new in pgsql and have a similar problem for a timescale pgDB. A DB table is for storing raw sessions data received through IoT network from many remote machines. The data format is the same for all the machines but the sessions lasting-periods could be different from 1 minute to 1 hour and such. Each machine could be activated once a day or a few times a day randomly.

My question is:
1. How to setup a watch-dog to detect new data has been added into the DB, and
2. How to pick-up the newly completed sessions data since last pick-up and put it into a buffer table dedicated to new data for further ETL processing?

If you have some scripts in pgSQL, Python or C, it will be greatly appreciated!

Thank you.

Best regards, Ji

In order to get notified should new rows arrive (or current ones updated or deleted), you can install a trigger which fires a NOTIFY command on a name (channel).

All other sessions which have issued a LISTEN on the same name (channel) will receive a notification.

Unfortunately, not all languages and drivers support this.

Recently, I updated the code for pg_listen in the script language Tcl. It's committed, but no new version released yet.

-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012

Attachment

OpenPGP_signature

Re: Determine potential change in table size after a column dropped?

From

Jiankang Ji

Date:

01 February 2022, 14:24:05

Hi Holger Jakobs,

Thank you so much for your quick reply and instruction. I will try later.

Have a nice day.

Br, Ji

On Tue, 1 Feb 2022 at 14:49, Holger Jakobs <holger@jakobs.com> wrote:

Am 01.02.22 um 14:46 schrieb Jiankang Ji:
Hi All Supermen Experts,

I'm new in pgsql and have a similar problem for a timescale pgDB. A DB table is for storing raw sessions data received through IoT network from many remote machines. The data format is the same for all the machines but the sessions lasting-periods could be different from 1 minute to 1 hour and such. Each machine could be activated once a day or a few times a day randomly.

My question is:
1. How to setup a watch-dog to detect new data has been added into the DB, and
2. How to pick-up the newly completed sessions data since last pick-up and put it into a buffer table dedicated to new data for further ETL processing?

If you have some scripts in pgSQL, Python or C, it will be greatly appreciated!

Thank you.

Best regards, Ji

In order to get notified should new rows arrive (or current ones updated or deleted), you can install a trigger which fires a NOTIFY command on a name (channel).
All other sessions which have issued a LISTEN on the same name (channel) will receive a notification.
Unfortunately, not all languages and drivers support this.
Recently, I updated the code for pg_listen in the script language Tcl. It's committed, but no new version released yet.
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012

Re: Determine potential change in table size after a column dropped?

From

Holger Jakobs

Date:

02 February 2022, 07:16:00

Am 01.02.22 um 14:46 schrieb Jiankang Ji:

Hi All Supermen Experts,

I'm new in pgsql and have a similar problem for a timescale pgDB. A DB table is for storing raw sessions data received through IoT network from many remote machines. The data format is the same for all the machines but the sessions lasting-periods could be different from 1 minute to 1 hour and such. Each machine could be activated once a day or a few times a day randomly.

My question is:
1. How to setup a watch-dog to detect new data has been added into the DB, and
2. How to pick-up the newly completed sessions data since last pick-up and put it into a buffer table dedicated to new data for further ETL processing?

If you have some scripts in pgSQL, Python or C, it will be greatly appreciated!

Thank you.

Best regards, Ji

In order to get notified should new rows arrive (or current ones updated or deleted), you can install a trigger which fires a NOTIFY command on a name (channel).

All other sessions which have issued a LISTEN on the same name (channel) will receive a notification.

Unfortunately, not all languages and drivers support this.

Recently, I updated the code for pg_listen in the script language Tcl. It's committed, but no new version released yet.

-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012

Attachment

OpenPGP_signature