Re: System administration functions about relation size ignore changes in the table structure - Mailing list pgsql-bugs

From Erki Eessaar
Subject Re: System administration functions about relation size ignore changes in the table structure
Date
Msg-id AM9PR01MB8268A9EA16A0D77CD69D759BFED0A@AM9PR01MB8268.eurprd01.prod.exchangelabs.com
Whole thread Raw
In response to Re: System administration functions about relation size ignore changes in the table structure  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: System administration functions about relation size ignore changes in the table structure
Re: System administration functions about relation size ignore changes in the table structure
List pgsql-bugs
Hello

Thank you. However, I made another small experiment (see the code in the end) and I still have questions.

I created a table with a textual column and generated 15_000 rows of test data. After that I added a generated column based on the textual column and indeed the relation size increased.

After that I dropped ALL the columns of the table (by the way it is a nice feature that PostgreSQL allows tables without columns). VACUUM ANALYZE did not change the relation size. VACUUM FULL reduced the size, but it was still pretty large.

After that I tried to add a mandatory (NOT NULL) column to the table (with a new name). It failed!!

After that I tried to add an optional (permit NULLs) column to the table. It succeeded but the table now had 15_000 rows with NULLs.

Then I added a small amount of test data. It increased the relation size a bit.

I want to understand is it an expected behavior? 

Is it possible somehow to reorganize pages so that the pages that were once used by the relation would be released? 

The border case with evolving the table structure produces a result that from the point of view of database user is a bug.

Best regards
Erki Eessaar
******************************
DROP TABLE IF EXISTS Invoice;

CREATE TABLE Invoice (invoice_code INTEGER NOT NULL,
client_id INTEGER NOT NULL,
invoice_date DATE NOT NULL,
total_sum INTEGER NOT NULL,
description TEXT NOT NULL,
CONSTRAINT pk_invoice PRIMARY KEY (invoice_code));

INSERT INTO Invoice (invoice_code, client_id, invoice_date, total_sum, description)
SELECT generator AS invoice_code,
floor(random() * (1_000 - 1 + 1) + 1)::int AS client_id,
'2015-01-01'::date + floor(random() * (current_date - '2015-01-01' + 1) + 1)::int AS invoice_date,
floor(random() * (40_000-1+1) + 1)::integer AS total_sum,
(SELECT string_agg(x,', ') AS tulemus
FROM (SELECT arr[floor(random() * (8-1+1) + 1)] AS x
FROM (SELECT '{important, awful, irrelevant, quick, paper-based, electronic, top-priority, cheating, very big}'::text[] AS arr) AS foo,
generate_series(1,floor(random() * (10-1+1) +
1)::integer + (generator*0)) AS g) AS f) AS description
FROM generate_series(1, 15_000) AS generator;

VACUUM FULL;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 1966080


ALTER TABLE Invoice ADD COLUMN description_upper TEXT GENERATED ALWAYS AS (Upper(description)) STORED NOT NULL;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
---Result: 2981888

ALTER TABLE Invoice DROP COLUMN description_upper;
ALTER TABLE Invoice DROP COLUMN description;
ALTER TABLE Invoice DROP COLUMN invoice_date;
ALTER TABLE Invoice DROP COLUMN client_id;
ALTER TABLE Invoice DROP COLUMN total_sum;
ALTER TABLE Invoice DROP COLUMN invoice_code CASCADE;

VACUUM FULL;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 425984


ALTER TABLE Invoice ADD COLUMN invoice_id SMALLINT NOT NULL;
--ERROR:  column "invoice_id" of relation "invoice" contains null values

ALTER TABLE Invoice ADD COLUMN invoice_id SMALLINT;

SELECT Count(*) AS cnt FROM Invoice;
--Table has 15000 rows where invoice_id is NULL

INSERT INTO Invoice (invoice_id)
SELECT generator AS invoice_id
FROM generate_series(1, 1_000) AS generator;

VACUUM FULL;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 458752

DELETE FROM Invoice;
VACUUM FULL;
SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 0

From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Saturday, October 14, 2023 20:02
To: Erki Eessaar <erki.eessaar@taltech.ee>
Cc: pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: Re: System administration functions about relation size ignore changes in the table structure
 
Erki Eessaar <erki.eessaar@taltech.ee> writes:
> According to the example at the end of the letter the functions pg_total_relation_size,  pg_table_size and pg_indexes_size seem to ignore adding or removing columns from a table.

I don't think this is a bug, it's more like a poor test case designed
with no understanding of Postgres internals.

1. Dropping a column doesn't ever produce an instantaneous reduction
of table size.  The column is still there physically.  We'll replace
it with NULLs over time as existing rows are updated.

2. Adding a column may not cause an instantaneous increase in size
either, if we can avoid an immediate table rewrite; which is true
if the column has a null or constant default value.

3. Even for your example with adding a generated column, you would
only see an increase in table size if we could now fit fewer rows per
page than before.  Your test case seems to involve comparing a table
with 3 integer-size columns to one with 3 integers and 2 smallints.
On 64-bit machines those come out to the same size (16 data bytes per
row) because the row length is padded to a multiple of 8 bytes for
alignment reasons.

                        regards, tom lane

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used
Next
From: "David G. Johnston"
Date:
Subject: Re: System administration functions about relation size ignore changes in the table structure