Thread: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

vacuumdb --jobs deadlock: how to avoid pg_catalog ?

From
Eduard Rozenberg
Date:
Hello there,

I'm a long-time postgres user but vacuumdb newbie :).

Using vacuumdb (v12.2) with '---jobs' to vacuum a v9.6 database on localhost with parallel processes:

$ time vacuumdb --full --echo -U postgres --jobs=8 -d mydatabase_test -p 5434 --password

As shown below I ran into the (100%?) predictable/expected deadlock with sys catalog tables documented in the vacuumdb doc page (https://www.postgresql.org/docs/12/app-vacuumdb.html).

Question: I specified "-d mydatabase_test" - why is it also doing a full vac on the system catalog (postgres db) which is practically guaranteed to cause a deadlock? I don't need or want it to do a full vac on the postgres db as I can do that myself later if I need to, and it takes no time compared to full vac of my own 7 TB database that I am "extremely eager" to parallelize.

Is my only option list all schemas.tables and feed to vacuumdb as "-t schema1.table1 -t schema1.table2 ........" ? That's manageable but unfortunate as it creates more work (having to list all tables) and adds additional work for automation scripts as well.

I've tried searching forums/goog for past issues related to this. I did find an old discussion about possible enhancements to "vacuumdb --jobs" to help the user avoid these predictable deadlocks, where the conclusion was to let the user figure out a way to deal with it such as via "-t table -t table -t table .......".

Maybe "--jobs" should only apply to the user's own databases, and when vacuumdb is working on system catalog tables it should apply a strictly serial/sequential approach to handle those tables?

Thanks!


------------------------------------------
output from vacuumdb's --echo
------------------------------------------
...
VACUUM (FULL) mycompanytesting.sometable1;
VACUUM (FULL) index.mydatabasestd_missing;
VACUUM (FULL) mycompanytesting.sometable2;
VACUUM (FULL) mycompanytesting.sometable3;
VACUUM (FULL) pg_catalog.pg_attribute;
VACUUM (FULL) mycompanydata.sometable4;
VACUUM (FULL) pg_catalog.pg_statistic;
VACUUM (FULL) mycompanytesting.sometable5;
VACUUM (FULL) pg_catalog.pg_proc;
VACUUM (FULL) pg_catalog.pg_depend;
VACUUM (FULL) pg_catalog.pg_class;
VACUUM (FULL) pg_catalog.pg_type;
vacuumdb: error: vacuuming of database "mydatabase_test" failed: ERROR:  deadlock detected
DETAIL:  Process 28183 waits for AccessShareLock on relation 1259 of database 35239378; blocked by process 28182.
Process 28182 waits for AccessShareLock on relation 1247 of database 35239378; blocked by process 28183.
HINT:  See server log for query details.


------------------------------
/var/log/postgresql-9.6
------------------------------
...
LOG:  received SIGHUP, reloading configuration files
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.pg_toast.pg_toast_35240266"
LOG:  could not receive data from client: Connection timed out
LOG:  received SIGHUP, reloading configuration files
LOG:  received SIGHUP, reloading configuration files
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
LOG:  received SIGHUP, reloading configuration files
LOG:  received SIGHUP, reloading configuration files
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.pg_toast.pg_toast_35240330"
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
ERROR:  deadlock detected
DETAIL:  Process 28183 waits for AccessShareLock on relation 1259 of database 35239378; blocked by process 28182.
Process 28182 waits for AccessShareLock on relation 1247 of database 35239378; blocked by process 28183.
Process 28183: VACUUM (FULL) pg_catalog.pg_type;
Process 28182: VACUUM (FULL) pg_catalog.pg_class;
HINT:  See server log for query details.
STATEMENT:  VACUUM (FULL) pg_catalog.pg_type;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) pg_catalog.pg_class;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) pg_catalog.pg_depend;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) someschema.sometable
LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost
LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) mycompanydata.wildcard_replacement_bkp;
LOG:  could not send data to client: Broken pipe
STATEMENT:  VACUUM (FULL) mycompanydata.wildcard_replacement_bkp;
FATAL:  connection to client lost
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) pg_catalog.pg_proc;
LOG:  could not send data to client: Broken pipe
STATEMENT:  VACUUM (FULL) pg_catalog.pg_proc;
FATAL:  connection to client lost
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) mycompanydata.cpc_tag_score2;
LOG:  could not send data to client: Broken pipe
STATEMENT:  VACUUM (FULL) mycompanydata.cpc_tag_score2;
LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost
FATAL:  connection to client lost
LOG:  received SIGHUP, reloading configuration files
LOG:  received SIGHUP, reloading configuration files

Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

From
Adrian Klaver
Date:
On 5/14/20 8:35 AM, Eduard Rozenberg wrote:
> Hello there,
> 
> I'm a long-time postgres user but vacuumdb newbie :).
> 
> Using vacuumdb (v12.2) with '---jobs' to vacuum a v9.6 database on 
> localhost with parallel processes:
> 
> $ time vacuumdb --full --echo -U postgres --jobs=8 -d mydatabase_test -p 
> 5434 --password
> 
> As shown below I ran into the (100%?) predictable/expected deadlock with 
> sys catalog tables documented in the vacuumdb doc page 
> (https://www.postgresql.org/docs/12/app-vacuumdb.html).
> 
> Question: I specified "-d mydatabase_test" - why is it also doing a full 
> vac on the system catalog (postgres db) which is practically guaranteed 

The postgres database and the system catalogs are two separate things.

> to cause a deadlock? I don't need or want it to do a full vac on the 
> postgres db as I can do that myself later if I need to, and it takes no 
> time compared to full vac of my own 7 TB database that I am "extremely 
> eager" to parallelize.

See:

https://www.postgresql.org/docs/12/sql-vacuum.html

"FULL

     Selects “full” vacuum, which can reclaim more space, but takes much 
longer and exclusively locks the table. This method also requires extra 
disk space, since it writes a new copy of the table and doesn't release 
the old copy until the operation is complete. Usually this should only 
be used when a significant amount of space needs to be reclaimed from 
within the table.
"

Pretty sure all that rewriting of tables bloats the system catalogs and 
so they get a VACUUM FULL.


The question here is why is it necessary to do a VACUUM FULL in the 
first place?

> 
> Is my only option list all schemas.tables and feed to vacuumdb as "-t 
> schema1.table1 -t schema1.table2 ........" ? That's manageable but 
> unfortunate as it creates more work (having to list all tables) and adds 
> additional work for automation scripts as well.
> 
> I've tried searching forums/goog for past issues related to this. I did 
> find an old discussion about possible enhancements to "vacuumdb --jobs" 
> to help the user avoid these predictable deadlocks, where the conclusion 
> was to let the user figure out a way to deal with it such as via "-t 
> table -t table -t table .......".
> 
> Maybe "--jobs" should only apply to the user's own databases, and when 
> vacuumdb is working on system catalog tables it should apply a strictly 
> serial/sequential approach to handle those tables?
> 
> Thanks!
> 
> 
> ------------------------------------------
> output from vacuumdb's --echo
> ------------------------------------------
> ...
> VACUUM (FULL) mycompanytesting.sometable1;
> VACUUM (FULL) index.mydatabasestd_missing;
> VACUUM (FULL) mycompanytesting.sometable2;
> VACUUM (FULL) mycompanytesting.sometable3;
> VACUUM (FULL) pg_catalog.pg_attribute;
> VACUUM (FULL) mycompanydata.sometable4;
> VACUUM (FULL) pg_catalog.pg_statistic;
> VACUUM (FULL) mycompanytesting.sometable5;
> VACUUM (FULL) pg_catalog.pg_proc;
> VACUUM (FULL) pg_catalog.pg_depend;
> VACUUM (FULL) pg_catalog.pg_class;
> VACUUM (FULL) pg_catalog.pg_type;
> vacuumdb: error: vacuuming of database "mydatabase_test" failed: ERROR: 
>   deadlock detected
> DETAIL:  Process 28183 waits for AccessShareLock on relation 1259 of 
> database 35239378; blocked by process 28182.
> Process 28182 waits for AccessShareLock on relation 1247 of database 
> 35239378; blocked by process 28183.
> HINT:  See server log for query details.
> 
> 
> ------------------------------
> /var/log/postgresql-9.6
> ------------------------------
> ...
> LOG:  received SIGHUP, reloading configuration files
> ERROR:  canceling autovacuum task
> CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
> ERROR:  canceling autovacuum task
> CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
> ERROR:  canceling autovacuum task
> CONTEXT:  automatic vacuum of table 
> "mydatabase_test.pg_toast.pg_toast_35240266"
> LOG:  could not receive data from client: Connection timed out
> LOG:  received SIGHUP, reloading configuration files
> LOG:  received SIGHUP, reloading configuration files
> ERROR:  canceling autovacuum task
> CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
> LOG:  received SIGHUP, reloading configuration files
> LOG:  received SIGHUP, reloading configuration files
> ERROR:  canceling autovacuum task
> CONTEXT:  automatic vacuum of table 
> "mydatabase_test.pg_toast.pg_toast_35240330"
> ERROR:  canceling autovacuum task
> CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
> ERROR:  deadlock detected
> DETAIL:  Process 28183 waits for AccessShareLock on relation 1259 of 
> database 35239378; blocked by process 28182.
> Process 28182 waits for AccessShareLock on relation 1247 of database 
> 35239378; blocked by process 28183.
> Process 28183: VACUUM (FULL) pg_catalog.pg_type;
> Process 28182: VACUUM (FULL) pg_catalog.pg_class;
> HINT:  See server log for query details.
> STATEMENT:  VACUUM (FULL) pg_catalog.pg_type;
> ERROR:  canceling statement due to user request
> STATEMENT:  VACUUM (FULL) pg_catalog.pg_class;
> ERROR:  canceling statement due to user request
> STATEMENT:  VACUUM (FULL) pg_catalog.pg_depend;
> ERROR:  canceling statement due to user request
> STATEMENT:  VACUUM (FULL) someschema.sometable
> LOG:  could not send data to client: Broken pipe
> FATAL:  connection to client lost
> LOG:  could not send data to client: Broken pipe
> FATAL:  connection to client lost
> ERROR:  canceling statement due to user request
> STATEMENT:  VACUUM (FULL) mycompanydata.wildcard_replacement_bkp;
> LOG:  could not send data to client: Broken pipe
> STATEMENT:  VACUUM (FULL) mycompanydata.wildcard_replacement_bkp;
> FATAL:  connection to client lost
> ERROR:  canceling statement due to user request
> STATEMENT:  VACUUM (FULL) pg_catalog.pg_proc;
> LOG:  could not send data to client: Broken pipe
> STATEMENT:  VACUUM (FULL) pg_catalog.pg_proc;
> FATAL:  connection to client lost
> ERROR:  canceling statement due to user request
> STATEMENT:  VACUUM (FULL) mycompanydata.cpc_tag_score2;
> LOG:  could not send data to client: Broken pipe
> STATEMENT:  VACUUM (FULL) mycompanydata.cpc_tag_score2;
> LOG:  could not send data to client: Broken pipe
> FATAL:  connection to client lost
> FATAL:  connection to client lost
> LOG:  received SIGHUP, reloading configuration files
> LOG:  received SIGHUP, reloading configuration files
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

From
Eduard Rozenberg
Date:
@Adrian thanks.

I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) previously on a test db copy and saw the DB size (postgres 9.6) shrink from 6.8 TB to 4.4 TB of actual, real disk space usage ($ df -d 1).

I don't know the reason so much space was "locked up" (other than there is a lot of data "churn" from daily updates). But I definitely do need to do the vac full on the production db to get down to the smaller size - cannot afford the 2.4 TB of "wasted" space on an ongoing basis.

Based on your email it sounds like the vacuuming of those pg_catalog.* tables is done regardless, as a normal part of doing vac full on my own database.

Unfortunately I still don't see an ideal way to run vacuum full in parallel via vacuumdb without running into the expected and documented deadlock. Only method I'm aware of is to list each table individually with "-t table1 -t table2..." to "vacuum db --jobs" which is not pleasant and not exceedingly beautiful.

Thanks.

On May 14, 2020, at 10:35, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 5/14/20 8:35 AM, Eduard Rozenberg wrote:
Hello there,
I'm a long-time postgres user but vacuumdb newbie :).
Using vacuumdb (v12.2) with '---jobs' to vacuum a v9.6 database on localhost with parallel processes:
$ time vacuumdb --full --echo -U postgres --jobs=8 -d mydatabase_test -p 5434 --password
As shown below I ran into the (100%?) predictable/expected deadlock with sys catalog tables documented in the vacuumdb doc page (https://www.postgresql.org/docs/12/app-vacuumdb.html).
Question: I specified "-d mydatabase_test" - why is it also doing a full vac on the system catalog (postgres db) which is practically guaranteed 

The postgres database and the system catalogs are two separate things.

to cause a deadlock? I don't need or want it to do a full vac on the postgres db as I can do that myself later if I need to, and it takes no time compared to full vac of my own 7 TB database that I am "extremely eager" to parallelize.

See:

https://www.postgresql.org/docs/12/sql-vacuum.html

"FULL

   Selects “full” vacuum, which can reclaim more space, but takes much longer and exclusively locks the table. This method also requires extra disk space, since it writes a new copy of the table and doesn't release the old copy until the operation is complete. Usually this should only be used when a significant amount of space needs to be reclaimed from within the table.
"

Pretty sure all that rewriting of tables bloats the system catalogs and so they get a VACUUM FULL.


The question here is why is it necessary to do a VACUUM FULL in the first place?

Is my only option list all schemas.tables and feed to vacuumdb as "-t schema1.table1 -t schema1.table2 ........" ? That's manageable but unfortunate as it creates more work (having to list all tables) and adds additional work for automation scripts as well.
I've tried searching forums/goog for past issues related to this. I did find an old discussion about possible enhancements to "vacuumdb --jobs" to help the user avoid these predictable deadlocks, where the conclusion was to let the user figure out a way to deal with it such as via "-t table -t table -t table .......".
Maybe "--jobs" should only apply to the user's own databases, and when vacuumdb is working on system catalog tables it should apply a strictly serial/sequential approach to handle those tables?
Thanks!
------------------------------------------
output from vacuumdb's --echo
------------------------------------------
...
VACUUM (FULL) mycompanytesting.sometable1;
VACUUM (FULL) index.mydatabasestd_missing;
VACUUM (FULL) mycompanytesting.sometable2;
VACUUM (FULL) mycompanytesting.sometable3;
VACUUM (FULL) pg_catalog.pg_attribute;
VACUUM (FULL) mycompanydata.sometable4;
VACUUM (FULL) pg_catalog.pg_statistic;
VACUUM (FULL) mycompanytesting.sometable5;
VACUUM (FULL) pg_catalog.pg_proc;
VACUUM (FULL) pg_catalog.pg_depend;
VACUUM (FULL) pg_catalog.pg_class;
VACUUM (FULL) pg_catalog.pg_type;
vacuumdb: error: vacuuming of database "mydatabase_test" failed: ERROR:   deadlock detected
DETAIL:  Process 28183 waits for AccessShareLock on relation 1259 of database 35239378; blocked by process 28182.
Process 28182 waits for AccessShareLock on relation 1247 of database 35239378; blocked by process 28183.
HINT:  See server log for query details.
------------------------------
/var/log/postgresql-9.6
------------------------------
...
LOG:  received SIGHUP, reloading configuration files
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.pg_toast.pg_toast_35240266"
LOG:  could not receive data from client: Connection timed out
LOG:  received SIGHUP, reloading configuration files
LOG:  received SIGHUP, reloading configuration files
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
LOG:  received SIGHUP, reloading configuration files
LOG:  received SIGHUP, reloading configuration files
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.pg_toast.pg_toast_35240330"
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
ERROR:  deadlock detected
DETAIL:  Process 28183 waits for AccessShareLock on relation 1259 of database 35239378; blocked by process 28182.
Process 28182 waits for AccessShareLock on relation 1247 of database 35239378; blocked by process 28183.
Process 28183: VACUUM (FULL) pg_catalog.pg_type;
Process 28182: VACUUM (FULL) pg_catalog.pg_class;
HINT:  See server log for query details.
STATEMENT:  VACUUM (FULL) pg_catalog.pg_type;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) pg_catalog.pg_class;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) pg_catalog.pg_depend;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) someschema.sometable
LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost
LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) mycompanydata.wildcard_replacement_bkp;
LOG:  could not send data to client: Broken pipe
STATEMENT:  VACUUM (FULL) mycompanydata.wildcard_replacement_bkp;
FATAL:  connection to client lost
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) pg_catalog.pg_proc;
LOG:  could not send data to client: Broken pipe
STATEMENT:  VACUUM (FULL) pg_catalog.pg_proc;
FATAL:  connection to client lost
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) mycompanydata.cpc_tag_score2;
LOG:  could not send data to client: Broken pipe
STATEMENT:  VACUUM (FULL) mycompanydata.cpc_tag_score2;
LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost
FATAL:  connection to client lost
LOG:  received SIGHUP, reloading configuration files
LOG:  received SIGHUP, reloading configuration files


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

From
Adrian Klaver
Date:
On 5/14/20 11:27 AM, Eduard Rozenberg wrote:
> @Adrian thanks.
> 
> I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) 
> previously on a test db copy and saw the DB size (postgres 9.6) shrink 
> from 6.8 TB to 4.4 TB of actual, real disk space usage ($ df -d 1).
> 
> I don't know the reason so much space was "locked up" (other than there 
> is a lot of data "churn" from daily updates). But I definitely do need 
> to do the vac full on the production db to get down to the smaller size 
> - cannot afford the 2.4 TB of "wasted" space on an ongoing basis.

It may not be wasted space. A regular VACUUM marks space within a table 
available for reuse(and not returned to OS) when it removes unneeded 
tuples. It then fills that space up with new tuples, roughly speaking. 
So if the vacuuming is working properly you will reach a steady state 
where space is reused and the database on disk size grows slowly as 
reusable space is occupied. I would monitor the database size on a 
regular basis. My guess is that the VACUUM FULL is dropping the OS used 
space and then it fills up again as the database does those updates.

> 
> Based on your email it sounds like the vacuuming of those pg_catalog.* 
> tables is done regardless, as a normal part of doing vac full on my own 
> database.
> 
> Unfortunately I still don't see an ideal way to run vacuum full in 
> parallel via vacuumdb without running into the expected and documented 
> deadlock. Only method I'm aware of is to list each table individually 
> with "-t table1 -t table2..." to "vacuum db --jobs" which is not 
> pleasant and not exceedingly beautiful.
> 
> Thanks.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

From
Eduard Rozenberg
Date:
@Adrian thanks again.

I read the postgres docs the same way - that previously used space is marked as available and therefore no need for
vacuumfull. Previously used = now available space, which gets re-used, in theory. 

And yet this same DB is growing at 30-50 GB weekly, despite the fact that 2.4 TB of previously used space is clearly
available("clearly available" as proven by vacuum full shrinking the DB space usage by 2.4 TB). I did verify
postgresql.confhas always been properly configured re: autovacuum:  'autovacuum = on'and 'track_counts = on' 

I'm not planning on running VACUUM FULL regularly, just "this one time". And I was trying to to parallelize VACUUM FULL
andminimize downtime of the production DB caused by table locking. And then I found the option of using "vacuumdb
--jobs"which sounded like the perfect solution except for "well you can't actually use --jobs because you'll run into a
deadlockand everybody knows that and nobody has a (good) solution for it" :). 

--Ed

> On May 14, 2020, at 11:46, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 5/14/20 11:27 AM, Eduard Rozenberg wrote:
>> @Adrian thanks.
>> I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) previously on a test db copy and saw the DB
size(postgres 9.6) shrink from 6.8 TB to 4.4 TB of actual, real disk space usage ($ df -d 1). 
>> I don't know the reason so much space was "locked up" (other than there is a lot of data "churn" from daily
updates).But I definitely do need to do the vac full on the production db to get down to the smaller size - cannot
affordthe 2.4 TB of "wasted" space on an ongoing basis. 
>
> It may not be wasted space. A regular VACUUM marks space within a table available for reuse(and not returned to OS)
whenit removes unneeded tuples. It then fills that space up with new tuples, roughly speaking. So if the vacuuming is
workingproperly you will reach a steady state where space is reused and the database on disk size grows slowly as
reusablespace is occupied. I would monitor the database size on a regular basis. My guess is that the VACUUM FULL is
droppingthe OS used space and then it fills up again as the database does those updates. 
>
>> Based on your email it sounds like the vacuuming of those pg_catalog.* tables is done regardless, as a normal part
ofdoing vac full on my own database. 
>> Unfortunately I still don't see an ideal way to run vacuum full in parallel via vacuumdb without running into the
expectedand documented deadlock. Only method I'm aware of is to list each table individually with "-t table1 -t
table2..."to "vacuum db --jobs" which is not pleasant and not exceedingly beautiful. 
>> Thanks.
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com




Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

From
Adrian Klaver
Date:
On 5/14/20 12:03 PM, Eduard Rozenberg wrote:
> @Adrian thanks again.
> 
> I read the postgres docs the same way - that previously used space is marked as available and therefore no need for
vacuumfull. Previously used = now available space, which gets re-used, in theory.
 
> 
> And yet this same DB is growing at 30-50 GB weekly, despite the fact that 2.4 TB of previously used space is clearly
available("clearly available" as proven by vacuum full shrinking the DB space usage by 2.4 TB). I did verify
postgresql.confhas always been properly configured re: autovacuum:  'autovacuum = on'and 'track_counts = on'
 

Well if I'm counting zeros right 50GB on 4.4TB database is ~1.14%. Does 
that sound right for new data being added?


One place to look to see how aggressively the autovacuum  is being done 
here:

https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ALL-TABLES-VIEW

What you find there may mean tweaking the values as explained here:

https://www.postgresql.org/docs/12/routine-vacuuming.html#AUTOVACUUM

Could be there are just a few tables that account for most of the churn 
and a manual VACUUM on them is needed. Say if there are regularly 
scheduled large UPDATEs to tables, incorporate a VACUUM after.

> 
> I'm not planning on running VACUUM FULL regularly, just "this one time". And I was trying to to parallelize VACUUM
FULLand minimize downtime of the production DB caused by table locking. And then I found the option of using "vacuumdb
--jobs"which sounded like the perfect solution except for "well you can't actually use --jobs because you'll run into a
deadlockand everybody knows that and nobody has a (good) solution for it" :).
 
> 
> --Ed
> 
>> On May 14, 2020, at 11:46, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> On 5/14/20 11:27 AM, Eduard Rozenberg wrote:
>>> @Adrian thanks.
>>> I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) previously on a test db copy and saw the
DBsize (postgres 9.6) shrink from 6.8 TB to 4.4 TB of actual, real disk space usage ($ df -d 1).
 
>>> I don't know the reason so much space was "locked up" (other than there is a lot of data "churn" from daily
updates).But I definitely do need to do the vac full on the production db to get down to the smaller size - cannot
affordthe 2.4 TB of "wasted" space on an ongoing basis.
 
>>
>> It may not be wasted space. A regular VACUUM marks space within a table available for reuse(and not returned to OS)
whenit removes unneeded tuples. It then fills that space up with new tuples, roughly speaking. So if the vacuuming is
workingproperly you will reach a steady state where space is reused and the database on disk size grows slowly as
reusablespace is occupied. I would monitor the database size on a regular basis. My guess is that the VACUUM FULL is
droppingthe OS used space and then it fills up again as the database does those updates.
 
>>
>>> Based on your email it sounds like the vacuuming of those pg_catalog.* tables is done regardless, as a normal part
ofdoing vac full on my own database.
 
>>> Unfortunately I still don't see an ideal way to run vacuum full in parallel via vacuumdb without running into the
expectedand documented deadlock. Only method I'm aware of is to list each table individually with "-t table1 -t
table2..."to "vacuum db --jobs" which is not pleasant and not exceedingly beautiful.
 
>>> Thanks.
>>
>>
>> -- 
>> Adrian Klaver
>> adrian.klaver@aklaver.com
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

From
Virendra Kumar
Date:
You might have index fragmentation and possibly reindexing them conncurrently on PG12, should do that. As everyone agreed most of space will be marked for re-use later for table segments but indices in your case could be problem. On previous versions you can use pg_repack to do index rebuilds which will help reduce space consumed.

Regards,
Virendra


On Thursday, May 14, 2020, 1:20:41 PM PDT, Eduard Rozenberg <edrozenberg@gmail.com> wrote:


@Adrian thanks again.

I read the postgres docs the same way - that previously used space is marked as available and therefore no need for vacuum full. Previously used = now available space, which gets re-used, in theory.

And yet this same DB is growing at 30-50 GB weekly, despite the fact that 2.4 TB of previously used space is clearly available ("clearly available" as proven by vacuum full shrinking the DB space usage by 2.4 TB). I did verify postgresql.conf has always been properly configured re: autovacuum:  'autovacuum = on'and 'track_counts = on'

I'm not planning on running VACUUM FULL regularly, just "this one time". And I was trying to to parallelize VACUUM FULL and minimize downtime of the production DB caused by table locking. And then I found the option of using "vacuumdb --jobs" which sounded like the perfect solution except for "well you can't actually use --jobs because you'll run into a deadlock and everybody knows that and nobody has a (good) solution for it" :).

--Ed

> On May 14, 2020, at 11:46, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 5/14/20 11:27 AM, Eduard Rozenberg wrote:
>> @Adrian thanks.
>> I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) previously on a test db copy and saw the DB size (postgres 9.6) shrink from 6.8 TB to 4.4 TB of actual, real disk space usage ($ df -d 1).
>> I don't know the reason so much space was "locked up" (other than there is a lot of data "churn" from daily updates). But I definitely do need to do the vac full on the production db to get down to the smaller size - cannot afford the 2.4 TB of "wasted" space on an ongoing basis.
>
> It may not be wasted space. A regular VACUUM marks space within a table available for reuse(and not returned to OS) when it removes unneeded tuples. It then fills that space up with new tuples, roughly speaking. So if the vacuuming is working properly you will reach a steady state where space is reused and the database on disk size grows slowly as reusable space is occupied. I would monitor the database size on a regular basis. My guess is that the VACUUM FULL is dropping the OS used space and then it fills up again as the database does those updates.
>
>> Based on your email it sounds like the vacuuming of those pg_catalog.* tables is done regardless, as a normal part of doing vac full on my own database.
>> Unfortunately I still don't see an ideal way to run vacuum full in parallel via vacuumdb without running into the expected and documented deadlock. Only method I'm aware of is to list each table individually with "-t table1 -t table2..." to "vacuum db --jobs" which is not pleasant and not exceedingly beautiful.
>> Thanks.
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com


Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

From
Michael Lewis
Date:
On Thu, May 14, 2020 at 2:20 PM Eduard Rozenberg <edrozenberg@gmail.com> wrote:
I did verify postgresql.conf has always been properly configured re: autovacuum:  'autovacuum = on'and 'track_counts = on'

This may be insufficient to keep up if you have large tables. The default scale factor allows for 20% of the rows to be dead before the autovacuum will kick in to mark the space as available for reuse. Assuming you have the I/O capacity and prefer to do a little cleanup more often rather than HUGE cleanup work all at once on rare occasions, it may be ideal to look at turning down the autovacuum_vacuum_scale_factor. You can tweak these settings on large tables only, or increase the autovacuum_vacuum_threshold at the same time to compensate a bit for decreasing the scale factor. You can also look at pg_stat_activity for autovacuums, and if you see that some are running for hours, then probably they are trying to do too much work all at once and waiting too long before tidying up. Also, the default autovacuum_vacuum_cost_delay was changed from 20ms to 2ms with PG12 so that may be worth considering as a best practice even on older versions.