Thread: Why are data files getting modified?
I have a table with monthly partitions, starting from 2011.
Each partition has about 2-3 billion rows and is about 40TB of data in total.
I am running a select query to \copy the data into csv files for each month.
I see there is an AUTOVACUUM job started on all the partitions that have been read (on which select has run). I am assuming the reason why Postgres is autovacuuming these large partitions is this:
"This implies that if a table is not otherwise vacuumed, autovacuum will be invoked on it approximately once every
"This implies that if a table is not otherwise vacuumed, autovacuum will be invoked on it approximately once every
autovacuum_freeze_max_age
minus vacuum_freeze_min_age
transactions. For tables that are regularly vacuumed for space reclamation purposes, this is of little importance. However, for static tables (including tables that receive inserts, but no updates or deletes), there is no need to vacuum for space reclamation, so it can be useful to try to maximize the interval between forced autovacuums on very large static tables. Obviously one can do this either by increasing autovacuum_freeze_max_age
or decreasing vacuum_freeze_min_age
."https://www.postgresql.org/docs/13/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
Please see these partition tables are static, i.e., they received data many years ago, with no updates ever.
Please see these partition tables are static, i.e., they received data many years ago, with no updates ever.
db=# SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid) as age, c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <> 2147483647 ORDER BY 3 DESC LIMIT 20;
table_name | type | age | relfrozenxid
---------------------------+------+-----------+--------------
hist_omx_orderaudit201506 | r | 201463508 | 319891
hist_omx_orderaudit201509 | r | 201418497 | 364902
hist_omx_orderaudit201205 | r | 201142042 | 641357
table_name | type | age | relfrozenxid
---------------------------+------+-----------+--------------
hist_omx_orderaudit201506 | r | 201463508 | 319891
hist_omx_orderaudit201509 | r | 201418497 | 364902
hist_omx_orderaudit201205 | r | 201142042 | 641357
autovacuum_freeze_max_age
---------------------------
200000000
(1 row)
---------------------------
200000000
(1 row)
Am I wrong in my understanding why this autovacuum is getting triggered?
However, my main question is that why do the data files in the tablespace, corresponding to these partition tables (that have been read or selected recently), or the ones autovacuumed recently have a latest timestamp?
However, my main question is that why do the data files in the tablespace, corresponding to these partition tables (that have been read or selected recently), or the ones autovacuumed recently have a latest timestamp?
There was no insert or update on these partitions, so why do I see an updated timestamp? Does autovacuum change something on the datafiles?
Thanks
This correspondence (including any attachments) is for the intended recipient(s) only. It may contain confidential or privileged information or both. No confidentiality or privilege is waived or lost by any mis-transmission. If you receive this correspondence by mistake, please contact the sender immediately, delete this correspondence (and all attachments) and destroy any hard copies. You must not use, disclose, copy, distribute or rely on any part of this correspondence (including any attachments) if you are not the intended recipient(s).
On Wed, 2022-02-09 at 17:29 +0900, Abhishek Bhola wrote: > I have a table with monthly partitions, starting from 2011. > Each partition has about 2-3 billion rows and is about 40TB of data in total. > I am running a select query to \copy the data into csv files for each month. > > I see there is an AUTOVACUUM job started on all the partitions that have been > read (on which select has run). I am assuming the reason why Postgres is > autovacuuming these large partitions is this: > > db=# SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid) as age, > c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <> 2147483647 ORDER BY 3 DESC LIMIT 20; > table_name | type | age | relfrozenxid > ---------------------------+------+-----------+-------------- > hist_omx_orderaudit201506 | r | 201463508 | 319891 > hist_omx_orderaudit201509 | r | 201418497 | 364902 > hist_omx_orderaudit201205 | r | 201142042 | 641357 > > autovacuum_freeze_max_age > --------------------------- > 200000000 > (1 row) > > Am I wrong in my understanding why this autovacuum is getting triggered? > > However, my main question is that why do the data files in the tablespace, corresponding > to these partition tables (that have been read or selected recently), or the ones > autovacuumed recently have a latest timestamp? > > There was no insert or update on these partitions, so why do I see an updated timestamp? > Does autovacuum change something on the datafiles? Yes, autovacuum will freeze old rows in the table, which modifies the data files. To keep this to a minimum, you can run VACUUM (FREEZE) on a partition as soon as you know that its data won't get modified any more. Still, I wouldn't be sure that the file modification timestamp won't change any more after that. But that should be pretty irrelevant; at any rate, no substantial data modifications will occur after that. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
I see, but then why is the autovacuum getting triggered after the select command?
On Thu, Feb 10, 2022, 12:14 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2022-02-09 at 17:29 +0900, Abhishek Bhola wrote:
> I have a table with monthly partitions, starting from 2011.
> Each partition has about 2-3 billion rows and is about 40TB of data in total.
> I am running a select query to \copy the data into csv files for each month.
>
> I see there is an AUTOVACUUM job started on all the partitions that have been
> read (on which select has run). I am assuming the reason why Postgres is
> autovacuuming these large partitions is this:
>
> db=# SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid) as age,
> c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <> 2147483647 ORDER BY 3 DESC LIMIT 20;
> table_name | type | age | relfrozenxid
> ---------------------------+------+-----------+--------------
> hist_omx_orderaudit201506 | r | 201463508 | 319891
> hist_omx_orderaudit201509 | r | 201418497 | 364902
> hist_omx_orderaudit201205 | r | 201142042 | 641357
>
> autovacuum_freeze_max_age
> ---------------------------
> 200000000
> (1 row)
>
> Am I wrong in my understanding why this autovacuum is getting triggered?
>
> However, my main question is that why do the data files in the tablespace, corresponding
> to these partition tables (that have been read or selected recently), or the ones
> autovacuumed recently have a latest timestamp?
>
> There was no insert or update on these partitions, so why do I see an updated timestamp?
> Does autovacuum change something on the datafiles?
Yes, autovacuum will freeze old rows in the table, which modifies the data files.
To keep this to a minimum, you can run VACUUM (FREEZE) on a partition as soon
as you know that its data won't get modified any more.
Still, I wouldn't be sure that the file modification timestamp won't change
any more after that. But that should be pretty irrelevant; at any rate, no
substantial data modifications will occur after that.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
This correspondence (including any attachments) is for the intended recipient(s) only. It may contain confidential or privileged information or both. No confidentiality or privilege is waived or lost by any mis-transmission. If you receive this correspondence by mistake, please contact the sender immediately, delete this correspondence (and all attachments) and destroy any hard copies. You must not use, disclose, copy, distribute or rely on any part of this correspondence (including any attachments) if you are not the intended recipient(s).
On Thu, 2022-02-10 at 00:29 +0900, Abhishek Bhola wrote: > > > I see there is an AUTOVACUUM job started on all the partitions that have been > > > read (on which select has run). I am assuming the reason why Postgres is > > > autovacuuming these large partitions is this: > > > > > > db=# SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid) as age, > > > c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <> 2147483647 ORDER BY 3 DESC LIMIT 20; > > > table_name | type | age | relfrozenxid > > > ---------------------------+------+-----------+-------------- > > > hist_omx_orderaudit201506 | r | 201463508 | 319891 > > > hist_omx_orderaudit201509 | r | 201418497 | 364902 > > > hist_omx_orderaudit201205 | r | 201142042 | 641357 > > > > > > autovacuum_freeze_max_age > > > --------------------------- > > > 200000000 > > > (1 row) > > I see, but then why is the autovacuum getting triggered after the select command? Autovacuum is triggered because "relfrozenxid" is nore than 200000000 transactions in the past. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com