Thread: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17717
Logged by:          Gunnar L
Email address:      postgresql@taljaren.se
PostgreSQL version: 15.0
Operating system:   Ubuntu Linux
Description:

We have observed a significant slowdown in vacuumdb performance between different
versions of PostgreSQL, and possibly also a memory issue.

We run a specific data model, where each customer has its own schema with its own
set of tables. Each database server hosts 16 databases, each containing around 250
customer schemas. Because Postgres creates a new file for each database object, we
end up with around 5 million files on each database server. This may or may not be
related to the issue we're seeing (new algorithms with new time complexity?).

We upgraded from PostgreSQL 9.5 to PostgreSQL 13 and noticed a significant slowdown
in how vacuumdb performs. Before, we could run a "vacuumdb -a -z" each night, taking
around 2 hours to complete. After the upgrade, we see constant 100% CPU utilization
during the vacuumdb process (almost no I/O activity), and vacuumdb cannot complete
within a reasonable time. We're able to vacuum about 3-4 databases each night.

We were able to recreate this issue using a simple bash script to generate a similar
setup. From local testing, here are our findings.

Concerning speed:
* Versions 9.5, 10, 11 are fast (9.5 slower than 10 and 11)
* Versions 12, 13, 14 are very, very slow
* Version 15 is faster (a lot faster than 12, 13, 14) but not nearly as fast as 10 or 11.

Concerning memory usage:
* Version 15 is using a lot more shared memory, OR it might not be releasing it
  properly after vacuuming a db.

These are the timings for vacuuming the 16 dbs:

Version   Seconds   Completed
-------------------------------------------------------
9.5           412   16/16
10            178   16/16
11            166   16/16
12           8319   1/16 or 2/16 (manually aborted)
13          18853   3/16 or 4/16 (manually aborted)
14          16857   3/16 or 4/16 (manually aborted)
15            617   1/16 (crashed!)
15           4158   6/16 (crashed! --shm-size=256mb)
15           9500   16/16 (--shm-size=4096mb)

The timing of the only successful run for Postgres 15 is somewhat flaky, since the
machine was suspended for about 1-1.5 hours, so 9500 is only an estimate. But the
first run (1 db completed in 10 minutes) suggests that it is faster than 12-14 yet
slower than 10 and 11 (3 minutes to complete everything).

The following describes our setup. This is the script (called setup.sh) we're using
to populate the databases (we give a port number as parameter):

##### start of setup.sh
export PGPASSWORD=mysecretpassword
PORT=$1

echo "" > tables_$PORT.sql
for schema in `seq -w 1 250`; do
    echo "create schema schema$schema;" >> tables_$PORT.sql
    for table in `seq -w 1 500`; do
        echo "create table schema$schema.table$table (id int);" >> tables_$PORT.sql
    done
done

echo "Setting up db: 01"
createdb -h localhost -U postgres -p $PORT db01
psql -q -h localhost -U postgres -p $PORT db01 -f tables_$PORT.sql

# This seems to be the fastest way to create the databases
for db in `seq -w 2 16`; do
    echo "Setting up db: $db"
    createdb -h localhost -U postgres -p $PORT --template db01 db$db
done
####### end of setup.sh

To execute a test for a particular Postgres version (in this example PG 9.5), we run
the following. It will set up PG 9.5 on port 15432:

docker run --rm --name pg95 -e POSTGRES_PASSWORD=mysecretpassword -p 15432:5432 -d postgres:9.5
./setup.sh 15432
date; time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"; date

(The date commands are added to keep track of when tasks were started.)
Here is the complete set of commands, output, and comments (we use different ports
for different versions of PG). The date commands are included since the runs take
some time.

time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real    6m52,070s
user    0m0,048s
sys     0m0,029s

time docker exec -it pg10 bash -c "vacuumdb -a -z -U postgres"
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real    2m58,354s
user    0m0,043s
sys     0m0,013s

time docker exec -it pg11 bash -c "vacuumdb -a -z -U postgres"
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real    2m46,181s
user    0m0,047s
sys     0m0,012s

date; time docker exec -it pg12 bash -c "vacuumdb -a -z -U postgres"; date
lör 10 dec 2022 18:57:43 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
^CCancel request sent
vacuumdb: error: vacuuming of table "schema241.table177" in database "db02" failed: ERROR: canceling statement due to user request

real    138m39,600s
user    0m0,177s
sys     0m0,418s
lör 10 dec 2022 21:16:22 CET

date; time docker exec -it pg13 bash -c "vacuumdb -a -z -U postgres"
lör 10 dec 2022 07:22:32 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
vacuumdb: vacuuming database "db03"
vacuumdb: vacuuming database "db04"
^CCancel request sent

real    314m13,172s
user    0m0,551s
sys     0m0,663s
lör 10 dec 2022 12:37:03 CET

date; time docker exec -it pg14 bash -c "vacuumdb -a -z -U postgres"; date
lör 10 dec 2022 14:15:37 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
vacuumdb: vacuuming database "db03"
vacuumdb: vacuuming database "db04"
^CCancel request sent

real    280m57,172s
user    0m0,586s
sys     0m0,559s
lör 10 dec 2022 18:56:34 CET

date; time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
lör 10 dec 2022 12:50:25 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
vacuumdb: error: processing of database "db02" failed: ERROR: could not resize shared memory segment "/PostgreSQL.2952321776" to 27894720 bytes: No space left on device

real    10m17,913s
user    0m0,030s
sys     0m0,049s
lör 10 dec 2022 13:00:43 CET

# It was faster, but we need to extend shared memory to make it work
docker run --rm --name pg15 --shm-size=256mb -e POSTGRES_PASSWORD=mysecretpassword -p 55555:5432 -d postgres:15

date; time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
mån 12 dec 2022 08:56:17 CET
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db07"
vacuumdb: error: processing of database "db07" failed: ERROR: could not resize shared memory segment "/PostgreSQL.1003084622" to 27894720 bytes: No space left on device

real    69m18,345s
user    0m0,217s
sys     0m0,086s
mån 12 dec 2022 10:05:36 CET

docker run --rm --name pg15 --shm-size=4096mb -e POSTGRES_PASSWORD=mysecretpassword -p 55555:5432 -d postgres:15

date; time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
mån 12 dec 2022 11:16:11 CET
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real    232m46,168s
user    0m0,227s
sys     0m0,467s
mån 12 dec 2022 15:08:57 CET

Here is the hardware that was used:

AMD Ryzen 7 PRO 5850U with Radeon Graphics
8 cores, 16 threads

$ free
              total        used        free      shared  buff/cache   available
Mem:       28562376     5549716      752624     1088488    22260036    21499752
Swap:        999420      325792      673628

Disk: NVMe device, Samsung SSD 980 1TB
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> We run a specific data model, where each customer has its own schema with
> its own set of tables. Each database server hosts 16 databases, each
> containing around 250 customer schemas. Due to postgres creating a new file
> for each database object, we end up with around 5 million files on each
> database server. This may or may not be related to the issue we're seeing
> (new algorithms with new time complexity?)

> We upgraded from postgresql 9.5 to postgresql 13, and noticed a significant
> slowdown in how vacuumdb performs. Before, we could run a vacuumdb -a -z
> each night, taking around 2 hours to complete. After the upgrade, we see a
> constant 100% CPU utilization during the vacuumdb process (almost no I/O
> activity), and vacuumdb cannot complete within a reasonable time. We're able
> to vacuum about 3-4 databases each night.

I poked into this a little bit. On HEAD, watching things with "perf"
identifies vac_update_datfrozenxid() as the main time sink. It's not hard to
see why: that does a seqscan of pg_class, and it's invoked at the end of each
vacuum() call. So if you try to vacuum each table in the DB separately,
you're going to end up spending O(N^2) time in often-useless rescans of
pg_class. This isn't a huge problem in ordinary-sized DBs, but with 125000
small tables in the DB it becomes the dominant cost.

> Concerning speed:
> * Version 9.5, 10, 11 are fast (9.5 slower than 10 and 11)
> * Version 12, 13, 14 are very, very slow
> * Version 15 is faster (a lot faster than 12,13,14) but not nearly as fast
> as 10 or 11.

The reason for the v12 performance change is that up through v11, "vacuumdb
-a -z" would just issue "VACUUM (ANALYZE);" in each DB. So
vac_update_datfrozenxid only ran once. Beginning in v12 (commit e0c2933a7),
vacuumdb issues a separate VACUUM command for each targeted table, which
causes the problem.

I'm not sure why there's a performance delta from 14 to 15. It doesn't look
like vacuumdb itself had any material changes, so we must have done something
different on the backend side. This may indicate that there's another O(N^2)
behavior that we got rid of in v15. Anyway, that change isn't bad, so I did
not poke into it too much.

Conclusions:

* As a short-term fix, you could try using vacuumdb from v11 with the newer
  servers. Or just do "psql -c 'vacuum analyze'" and not bother with vacuumdb
  at all. (On HEAD, with this example database, 'vacuum analyze' takes about
  7 seconds per DB for me, versus ~10 minutes using vacuumdb.)

* To fix vacuumdb properly, it might be enough to get it to batch VACUUMs,
  say by naming up to 1000 tables per command instead of just one. I'm not
  sure how that would interact with its parallelization logic, though. It's
  not really solving the O(N^2) issue either, just pushing it further out.

* A better idea, though sadly not very back-patchable, could be to expose a
  VACUUM option to control whether it runs vac_update_datfrozenxid, so that
  vacuumdb can do that just once at the end. Considering that
  vac_update_datfrozenxid requires an exclusive lock, the current behavior is
  poison for parallel vacuuming quite aside from the O(N^2) issue. This might
  tie into some work Peter G. has been pursuing, too.

			regards, tom lane
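For illustration, a minimal sketch of the "skip vacuumdb entirely" workaround
described above, assuming the same docker-based test setup as the reproduction
script (host, port 15432, the postgres superuser and its password are carried over
from setup.sh and should be adjusted as needed):

    # One database-wide VACUUM (ANALYZE) per database instead of one VACUUM
    # per table, so vac_update_datfrozenxid runs only once per database.
    export PGPASSWORD=mysecretpassword
    for db in $(psql -h localhost -p 15432 -U postgres -At \
                -c "SELECT datname FROM pg_database WHERE datallowconn"); do
        psql -h localhost -p 15432 -U postgres -d "$db" -c "VACUUM (ANALYZE);"
    done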
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Peter Geoghegan
Date:
On Thu, Dec 15, 2022 at 10:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> * A better idea, though sadly not very back-patchable, could
> be to expose a VACUUM option to control whether it runs
> vac_update_datfrozenxid, so that vacuumdb can do that just
> once at the end. Considering that vac_update_datfrozenxid
> requires an exclusive lock, the current behavior is poison for
> parallel vacuuming quite aside from the O(N^2) issue. This
> might tie into some work Peter G. has been pursuing, too.

That sounds like a good idea to me. But do we actually need a VACUUM option
for this? I wonder if we could get away with having the VACUUM command never
call vac_update_datfrozenxid(), except when run in single-user mode. It would
be nice to make pg_xact/clog truncation autovacuum's responsibility.

Autovacuum already does things differently to the VACUUM command, and for
reasons that seem related to this complaint about vacuumdb. Besides,
autovacuum is already on the hook to call vac_update_datfrozenxid() for the
benefit of databases that haven't actually been vacuumed, per the
do_autovacuum() comments right above its vac_update_datfrozenxid() call.

-- 
Peter Geoghegan
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Dec 15, 2022 at 10:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * A better idea, though sadly not very back-patchable, could
>> be to expose a VACUUM option to control whether it runs
>> vac_update_datfrozenxid, so that vacuumdb can do that just
>> once at the end. Considering that vac_update_datfrozenxid
>> requires an exclusive lock, the current behavior is poison for
>> parallel vacuuming quite aside from the O(N^2) issue. This
>> might tie into some work Peter G. has been pursuing, too.

> That sounds like a good idea to me. But do we actually need a VACUUM
> option for this? I wonder if we could get away with having the VACUUM
> command never call vac_update_datfrozenxid(), except when run in
> single-user mode. It would be nice to make pg_xact/clog truncation
> autovacuum's responsibility.

I could get behind manual VACUUM not invoking vac_update_datfrozenxid by
default, perhaps. But if it can never call it, then that is a fairly
important bit of housekeeping that is unreachable except by autovacuum. No
doubt the people who turn off autovacuum are benighted, but they're still
out there.

Could we get somewhere by saying that manual VACUUM calls
vac_update_datfrozenxid only if it's a full-DB vacuum (ie, no table was
specified)? That would fix the problem at hand. However, it'd mean (since
v12) that a vacuumdb run never calls vac_update_datfrozenxid at all, which
would result in horrible problems for any poor sods who think that a cronjob
running "vacuumdb -a" is an adequate substitute for autovacuum.

Or maybe we could modify things so that "autovacuum = off" doesn't prevent
occasional cycles of vac_update_datfrozenxid-and-nothing-else?

In the end I feel like a manual way to call vac_update_datfrozenxid would be
the least magical way of running this.

			regards, tom lane
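To make the proposed distinction concrete, a sketch of the two invocation forms;
the differing datfrozenxid behavior described in the comments is the proposal under
discussion, not how any released version behaves:

    # Database-wide vacuum: no table named, so under the proposal it would
    # still finish with vac_update_datfrozenxid.
    psql -U postgres -d db01 -c "VACUUM (ANALYZE);"

    # Per-table vacuum, as vacuumdb has issued since v12: under the proposal
    # it would skip the datfrozenxid update.
    psql -U postgres -d db01 -c "VACUUM (ANALYZE) schema001.table001;"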
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Peter Geoghegan
Date:
On Thu, Dec 15, 2022 at 1:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I could get behind manual VACUUM not invoking vac_update_datfrozenxid
> by default, perhaps. But if it can never call it, then that is a
> fairly important bit of housekeeping that is unreachable except by
> autovacuum. No doubt the people who turn off autovacuum are benighted,
> but they're still out there.

I wouldn't mind adding another option for this to VACUUM. We already have a
couple of VACUUM options that are only really needed as escape hatches, or
perhaps as testing tools used by individual Postgres hackers. Another one
doesn't seem too bad. The VACUUM command should eventually become totally
niche, so I'm not too concerned about going overboard here.

> Could we get somewhere by saying that manual VACUUM calls
> vac_update_datfrozenxid only if it's a full-DB vacuum (ie, no table
> was specified)? That would fix the problem at hand.

That definitely seems reasonable.

> Or maybe we could modify things so that "autovacuum = off" doesn't prevent
> occasional cycles of vac_update_datfrozenxid-and-nothing-else?

That's what I was thinking of. It seems like a more natural approach to me,
at least offhand.

I have to imagine that the vast majority of individual calls to
vac_update_datfrozenxid have just about zero chance of updating datfrozenxid
or datminmxid as things stand. There is bound to be some number of
completely static tables in every database (maybe just system catalogs).
Those static tables are bound to be the tables that hold back
datfrozenxid/datminmxid approximately all the time.

To me this suggests that vac_update_datfrozenxid should fully own the fact
that it's supposed to be called out of band, possibly only in autovacuum.

Separately, I wonder if it would make sense to invent a new fast-path for
the VACUUM command that is designed to inexpensively determine that it
cannot possibly matter if vac_update_datfrozenxid is never called, given the
specifics (the details of the target rel and its TOAST rel).

-- 
Peter Geoghegan
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Dec 15, 2022 at 1:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Or maybe we could modify things so that "autovacuum = off" doesn't prevent
>> occasional cycles of vac_update_datfrozenxid-and-nothing-else?

> That's what I was thinking of. It seems like a more natural approach
> to me, at least offhand.

Seems worth looking into. But I suppose the launch frequency would have to
be more often than the current behavior for autovacuum = off, so it would
complicate the logic in that area.

> I have to imagine that the vast majority of individual calls to
> vac_update_datfrozenxid have just about zero chance of updating
> datfrozenxid or datminmxid as things stand.

That is a really good point. How about teaching VACUUM to track the oldest
original relfrozenxid and relminmxid among the table(s) it processed, and
skip vac_update_datfrozenxid unless at least one of those matches the
database's values? For extra credit, also skip if we didn't successfully
advance the source rel's value.

This might lead to a fix that solves the OP's problem while not changing
anything fundamental, which would make it reasonable to back-patch.

			regards, tom lane
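As a rough illustration of the check being proposed here: only tables whose
relfrozenxid (or relminmxid) still matches the database's current value can let
vac_update_datfrozenxid advance anything. An ad hoc diagnostic query along these
lines (a sketch only, not part of any patch) lists those "laggard" tables in one
of the test databases:

    psql -U postgres -d db01 -c "
        SELECT c.oid::regclass AS rel,
               age(c.relfrozenxid) AS xid_age,
               mxid_age(c.relminmxid) AS mxid_age
        FROM pg_class c
        WHERE c.relkind IN ('r', 'm', 't')
          AND (c.relfrozenxid = (SELECT datfrozenxid FROM pg_database
                                 WHERE datname = current_database())
               OR c.relminmxid = (SELECT datminmxid FROM pg_database
                                  WHERE datname = current_database()));"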
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Peter Geoghegan
Date:
On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I have to imagine that the vast majority of individual calls to
> > vac_update_datfrozenxid have just about zero chance of updating
> > datfrozenxid or datminmxid as things stand.
>
> That is a really good point. How about teaching VACUUM to track
> the oldest original relfrozenxid and relminmxid among the table(s)
> it processed, and skip vac_update_datfrozenxid unless at least one
> of those matches the database's values? For extra credit, also
> skip if we didn't successfully advance the source rel's value.

Hmm. I think that that would probably work.

It would certainly work on 15+, because there tends to be "natural
diversity" among the relfrozenxid values seen for each table, due to the
"track oldest extant XID" work; we no longer see many tables that all have
the same relfrozenxid, that advance in lockstep. But even that factor
probably doesn't matter, since we only need one "laggard relfrozenxid"
static table for the scheme to work and work well. That is probably a safe
bet on all versions, though I'd have to check to be sure.

> This might lead to a fix that solves the OP's problem while not
> changing anything fundamental, which would make it reasonable
> to back-patch.

That's a big plus. This is a nasty regression. I wouldn't call it a
must-fix, but it's bad enough to be worth fixing if we can come up with a
reasonably non-invasive approach.

-- 
Peter Geoghegan
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That is a really good point. How about teaching VACUUM to track
>> the oldest original relfrozenxid and relminmxid among the table(s)
>> it processed, and skip vac_update_datfrozenxid unless at least one
>> of those matches the database's values? For extra credit, also
>> skip if we didn't successfully advance the source rel's value.

> Hmm. I think that that would probably work.

> It would certainly work on 15+, because there tends to be "natural
> diversity" among the relfrozenxid values seen for each table, due to
> the "track oldest extant XID" work; we no longer see many tables that
> all have the same relfrozenxid, that advance in lockstep. But even
> that factor probably doesn't matter, since we only need one "laggard
> relfrozenxid" static table for the scheme to work and work well. That
> is probably a safe bet on all versions, though I'd have to check to be
> sure.

Oh, I see your point: if a whole lot of tables have the same relfrozenxid
and it matches datfrozenxid, this won't help. Still, we can hope that that's
an uncommon situation. If we postulate somebody trying to use scheduled
"vacuumdb -z" in place of autovacuum, they shouldn't really have that
situation. Successively vacuuming many tables should normally result in the
tables' relfrozenxids not being all the same, unless they were unlucky
enough to have a very long-running transaction holding back the global xmin
horizon the whole time.

			regards, tom lane
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Michael Paquier
Date:
On Thu, Dec 15, 2022 at 01:56:30PM -0500, Tom Lane wrote:
> * To fix vacuumdb properly, it might be enough to get it to
> batch VACUUMs, say by naming up to 1000 tables per command
> instead of just one. I'm not sure how that would interact
> with its parallelization logic, though. It's not really
> solving the O(N^2) issue either, just pushing it further out.

I have been thinking about this part, and using a hardcoded rule for the
batches would be tricky. The list of relations returned by the scan of
pg_class is ordered by relpages, so depending on the distribution of the
sizes (a few large tables and a lot of small tables, an exponential
distribution of table sizes, etc.), we may finish with more downsides than
upsides in some cases, even if we use a linear rule based on the number of
relations, or even if we distribute the relations across the slots in a
round-robin fashion, for example.

In order to control all that, rather than a hardcoded rule, could it be as
simple as introducing an option like vacuumdb --batch=N, defaulting to 1, to
let users control the number of relations grouped in a single command, with
a round-robin distribution for each slot?
--
Michael
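For illustration, what the batching ideas would amount to on the wire; the --batch
flag is hypothetical and does not exist in any released vacuumdb, while the
multi-table VACUUM syntax shown below has existed since PostgreSQL 11:

    # Hypothetical invocation (the --batch option is only a proposal):
    vacuumdb -a -z -U postgres --jobs=4 --batch=1000

    # Each batch would then be sent as a single command naming many relations
    # instead of one command per table, e.g.:
    psql -U postgres -d db01 -c \
        "VACUUM (ANALYZE) schema001.table001, schema001.table002, schema001.table003;"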
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Christophe Pettus
Date:
> In order to control all that, rather than a hardcoded rule, could it
> be as simple as introducing an option like vacuumdb --batch=N
> defaulting to 1 to let users control the number of relations grouped
> in a single command with a round robin distribution for each slot?

My first reaction to that is: Is it possible to explain to a DBA what N
should be for a particular cluster?
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Nathan Bossart
Date:
On Thu, Dec 15, 2022 at 08:39:54PM -0800, Peter Geoghegan wrote:
> On Thu, Dec 15, 2022 at 1:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I could get behind manual VACUUM not invoking vac_update_datfrozenxid
>> by default, perhaps. But if it can never call it, then that is a
>> fairly important bit of housekeeping that is unreachable except by
>> autovacuum. No doubt the people who turn off autovacuum are benighted,
>> but they're still out there.
>
> I wouldn't mind adding another option for this to VACUUM. We already
> have a couple of VACUUM options that are only really needed as escape
> hatches, or perhaps as testing tools used by individual Postgres
> hackers. Another one doesn't seem too bad. The VACUUM command should
> eventually become totally niche, so I'm not too concerned about going
> overboard here.

Perhaps there could also be an update-datfrozenxid function that vacuumdb
calls when finished with a database. Even if vacuum becomes smarter about
calling vac_update_datfrozenxid, this might still be worth doing.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
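A sketch of how that could look from vacuumdb's side; the function name below is
invented purely for illustration and does not exist in any PostgreSQL release:

    # After all per-table VACUUMs in a database finish, vacuumdb would issue
    # one final call such as (hypothetical function name):
    psql -U postgres -d db01 -c "SELECT pg_update_datfrozenxid();"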
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Michael Paquier
Date:
On Sat, Dec 17, 2022 at 06:23:27PM -0800, Christophe Pettus wrote:
> My first reaction to that is: Is it possible to explain to a DBA
> what N should be for a particular cluster?

Even assuming that we can come up with a rather straightforward yet portable
rule for distributing the relations across the slots, like the one I
mentioned above (which is not the best thing depending on the sizes and the
number of tables), that would be quite tricky, IMO.
--
Michael
Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From
Justin Pryzby
Date:
On Sun, Dec 18, 2022 at 11:21:47AM +0900, Michael Paquier wrote:
> On Thu, Dec 15, 2022 at 01:56:30PM -0500, Tom Lane wrote:
>> * To fix vacuumdb properly, it might be enough to get it to
>> batch VACUUMs, say by naming up to 1000 tables per command
>> instead of just one. I'm not sure how that would interact
>> with its parallelization logic, though. It's not really
>> solving the O(N^2) issue either, just pushing it further out.
>
> I have been thinking about this part, and using a hardcoded rule for
> the batches would be tricky. The list of relations returned by the
> scan of pg_class are ordered by relpages, so depending on the
> distribution of the sizes (few tables with a large size and a lot of
> table with small sizes, exponential distribution of table sizes), we
> may finish with more downsides than upsides in some cases, even if we
> use a linear rule based on the number of relations, or even if we
> distribute the relations across the slots in a round robin fashion for
> example.

I've always found it weird that it uses "ORDER BY relpages". I'd prefer if
it could ORDER BY age(relfrozenxid) or GREATEST(age(relfrozenxid),
age(relminmxid)), at least if you specify one of the --min-*age parms. Or
something less hardcoded and unconfigurable.

-- 
Justin
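A sketch of the kind of catalog ordering that suggestion implies, using mxid_age()
for the multixact column; this is an illustration of the idea, not vacuumdb's actual
query:

    psql -U postgres -d db01 -c "
        SELECT c.oid::regclass
        FROM pg_class c
        WHERE c.relkind IN ('r', 'm')
        ORDER BY GREATEST(age(c.relfrozenxid), mxid_age(c.relminmxid)) DESC;"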