Thread: Long-running and non-finishing VACUUM ANALYZE on large table
Hi everyone,

I'm currently running PostgreSQL 9.3.6 on Ubuntu 14.04 Server and am having severe problems getting one large table properly vacuumed (16,743,000,000 reltuples). That is, autovacuum on that table does not finish (even after six days it is still running). To narrow down the problem, I stopped autovacuum, temporarily disabled it (but ONLY on that particular table; not a permanent solution, I know) and started a manual VACUUM ANALYZE on that table via PGAdmin instead. Like the autovacuum, the manually started process also runs virtually endlessly (until I terminate it after some days), but PGAdmin's verbose output (see below) tells me at least that it (apparently) tries to process the same table over and over again. I'm definitely not a Postgres guru, but this can't be the expected behaviour, even on such a big table, right? Did I overlook something?

Hoping that this might be of help, please find the following information below:

1st) definition of the problematic table
2nd) the PGAdmin output

Please let me know if you need more information (e.g., specific postgresql.conf settings). Any help is greatly appreciated!

Kind regards,
Jan

#################### 1st) ####################

CREATE TABLE protein_hsps
(
  id bigserial NOT NULL,
  origin bigint NOT NULL,
  bit_score real,
  e_value real,
  alignment_length smallint NOT NULL,
  query_start smallint NOT NULL,
  query_end smallint NOT NULL,
  subject_start integer NOT NULL,
  subject_end integer NOT NULL,
  identical smallint NOT NULL,
  positive smallint NOT NULL,
  reciprocal_best_hit boolean,
  CONSTRAINT protein_hsps_pkey PRIMARY KEY (id),
  CONSTRAINT protein_hsps_origin_fkey FOREIGN KEY (origin)
      REFERENCES protein_comparisons (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

-- Index: protein_hsps_clustidx_on_origin
CREATE INDEX protein_hsps_clustidx_on_origin
  ON protein_hsps
  USING btree
  (origin);

#################### 2nd) ####################

Output of VACUUM ANALYZE when run in PGAdmin:

INFO: vacuuming "public.protein_hsps"
INFO: scanned index "protein_hsps_pkey" to remove 178956773 row versions
DETAIL: CPU 342.46s/2774.51u sec elapsed 4359.26 sec.
INFO: scanned index "protein_hsps_clustidx_on_origin" to remove 178956773 row versions
DETAIL: CPU 361.58s/2750.58u sec elapsed 4324.01 sec.
INFO: "protein_hsps": removed 178956773 row versions in 1880456 pages
DETAIL: CPU 30.06s/17.31u sec elapsed 160.45 sec.
INFO: scanned index "protein_hsps_pkey" to remove 178956763 row versions
DETAIL: CPU 360.96s/2751.12u sec elapsed 4330.33 sec.
INFO: scanned index "protein_hsps_clustidx_on_origin" to remove 178956763 row versions
DETAIL: CPU 356.65s/2737.37u sec elapsed 4288.50 sec.
INFO: "protein_hsps": removed 178956763 row versions in 1902608 pages
DETAIL: CPU 30.20s/18.78u sec elapsed 174.97 sec.
INFO: scanned index "protein_hsps_pkey" to remove 178956767 row versions
DETAIL: CPU 358.25s/2812.84u sec elapsed 4448.54 sec.
INFO: scanned index "protein_hsps_clustidx_on_origin" to remove 178956767 row versions
DETAIL: CPU 365.69s/2835.24u sec elapsed 4513.28 sec.
INFO: "protein_hsps": removed 178956767 row versions in 1899058 pages
DETAIL: CPU 30.48s/18.15u sec elapsed 160.12 sec.
INFO: scanned index "protein_hsps_pkey" to remove 178956762 row versions
DETAIL: CPU 361.19s/2668.89u sec elapsed 4287.59 sec.
INFO: scanned index "protein_hsps_clustidx_on_origin" to remove 178956762 row versions
DETAIL: CPU 360.95s/2668.46u sec elapsed 4244.87 sec.
INFO: "protein_hsps": removed 178956762 row versions in 1891629 pages
DETAIL: CPU 29.31s/18.06u sec elapsed 136.22 sec.
INFO: scanned index "protein_hsps_pkey" to remove 178956716 row versions
DETAIL: CPU 365.00s/2817.88u sec elapsed 4542.31 sec.
INFO: scanned index "protein_hsps_clustidx_on_origin" to remove 178956716 row versions
DETAIL: CPU 360.58s/2776.57u sec elapsed 4368.76 sec.
INFO: "protein_hsps": removed 178956716 row versions in 1877406 pages
DETAIL: CPU 29.04s/17.29u sec elapsed 119.11 sec.
INFO: scanned index "protein_hsps_pkey" to remove 178956712 row versions
DETAIL: CPU 356.50s/2644.50u sec elapsed 4272.66 sec.
INFO: scanned index "protein_hsps_clustidx_on_origin" to remove 178956712 row versions
DETAIL: CPU 371.59s/2644.97u sec elapsed 4236.70 sec.
INFO: "protein_hsps": removed 178956712 row versions in 1917558 pages
DETAIL: CPU 31.86s/20.42u sec elapsed 161.58 sec.
INFO: scanned index "protein_hsps_pkey" to remove 178956753 row versions
DETAIL: CPU 376.24s/2722.70u sec elapsed 4352.62 sec.
INFO: scanned index "protein_hsps_clustidx_on_origin" to remove 178956753 row versions
DETAIL: CPU 378.60s/2723.70u sec elapsed 4323.40 sec.
INFO: "protein_hsps": removed 178956753 row versions in 1922079 pages
DETAIL: CPU 30.48s/21.27u sec elapsed 136.20 sec.
INFO: scanned index "protein_hsps_pkey" to remove 178956686 row versions
DETAIL: CPU 381.81s/2709.34u sec elapsed 4380.95 sec.
INFO: scanned index "protein_hsps_clustidx_on_origin" to remove 178956686 row versions
DETAIL: CPU 382.59s/2695.10u sec elapsed 4321.23 sec.
INFO: "protein_hsps": removed 178956686 row versions in 1924530 pages
DETAIL: CPU 31.39s/21.19u sec elapsed 135.10 sec.
INFO: scanned index "protein_hsps_pkey" to remove 178956721 row versions
DETAIL: CPU 372.96s/2578.62u sec elapsed 4235.85 sec.
(truncated here, process is still running though ...)
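For reference, disabling autovacuum for a single table as described above is presumably done via the per-table storage parameter; a minimal sketch, including the re-enable step:

ALTER TABLE protein_hsps SET (autovacuum_enabled = false);
-- run the manual maintenance pass:
VACUUM VERBOSE ANALYZE protein_hsps;
-- and hand the table back to autovacuum afterwards:
ALTER TABLE protein_hsps RESET (autovacuum_enabled);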
On 10/02/2015 09:48 AM, Jan wrote:
> Hi everyone,
>
> I'm currently running PostgreSQL 9.3.6 on Ubuntu 14.04 Server and am
> having severe problems getting one large table properly vacuumed
> (16,743,000,000 reltuples).

16B in a single table? I am not surprised you are having issues.

> That is, autovacuum on that table does not finish (even after six days
> it is still running). To narrow down the problem, I stopped autovacuum,
> temporarily disabled it (but ONLY on that particular table; not a
> permanent solution, I know) and started a manual VACUUM ANALYZE on that
> table via PGAdmin instead. Like the autovacuum, the manually started
> process also runs virtually endlessly (until I terminate it after some
> days), but PGAdmin's verbose output (see below) tells me at least that
> it (apparently) tries to process the same table over and over again.
> I'm definitely not a Postgres guru, but this can't be the expected
> behaviour, even on such a big table, right? Did I overlook something?

178,956,763 row versions to be removed. I have no doubt that it is taking what seems like forever. I strongly suggest you partition this beast.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.
Jan <pgsql.admin@j.mk-contact.de> writes:
> That is, autovacuum on that table does not finish (even after six days
> it is still running). To narrow down the problem, I stopped autovacuum,
> temporarily disabled it (but ONLY on that particular table; not a
> permanent solution, I know) and started a manual VACUUM ANALYZE on that
> table via PGAdmin instead. Like the autovacuum, the manually started
> process also runs virtually endlessly (until I terminate it after some
> days), but PGAdmin's verbose output (see below) tells me at least that
> it (apparently) tries to process the same table over and over again.
> I'm definitely not a Postgres guru, but this can't be the expected
> behaviour, even on such a big table, right? Did I overlook something?

No, that looks fairly normal: each time it fills up memory with TIDs of dead tuples, it has to go and scan the indexes to remove index entries pointing at those tuples.

You could increase maintenance_work_mem to reduce the number of scans through the indexes, but it looks like you've already got that set to 1GB or so, so I'm not sure that increasing it further would be a good idea.

The good news is you are making progress. Perhaps you should just wait it out. However, it'd be good to try to estimate how many dead tuples there are (have you checked the physical size of the table?). If there are vastly more dead tuples than live ones, a VACUUM FULL might be a less painful way to get out of this, though it would lock out other accesses to the table so you might have to take some downtime to do it.

While you're waiting, you should try to figure out how you got into a state with so many dead tuples, and think about how to readjust things so it doesn't happen again.

			regards, tom lane
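A minimal sketch of the kind of checks suggested here, against the table from the original post (the statistics view gives only an estimate of dead tuples, but it is cheap to query):

-- estimated live/dead tuple counts from the statistics collector:
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'protein_hsps';

-- physical size of the heap alone, and of the table including its indexes and TOAST data:
SELECT pg_size_pretty(pg_relation_size('protein_hsps'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('protein_hsps')) AS total_size;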
Ben Primrose <bprimrose@tracelink.com> writes:
> Tom, how'd you determine the maintenance_work_mem?

From the numbers of tuples shown as being removed in each index pass. The work-mem usage is 6 bytes per TID, and it'll stop and do an index pass once we get close to full.

			regards, tom lane
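For illustration, the arithmetic behind that estimate, assuming maintenance_work_mem is set to 1GB:

-- 1 GB of maintenance_work_mem divided by 6 bytes per dead-tuple TID:
SELECT (1024::bigint * 1024 * 1024) / 6 AS max_dead_tids_per_index_pass;
-- ≈ 178,956,970, which lines up with the ~178,956,7xx row versions removed per index pass in the log above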
Good to know, thanks for the response.

Ben

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, October 02, 2015 4:16 PM
To: Ben Primrose
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Long-running and non-finishing VACUUM ANALYZE on large table

Ben Primrose <bprimrose@tracelink.com> writes:
> Tom, how'd you determine the maintenance_work_mem?

From the numbers of tuples shown as being removed in each index pass. The work-mem usage is 6 bytes per TID, and it'll stop and do an index pass once we get close to full.

			regards, tom lane
Tom, thanks a lot for your assessment, and please excuse my late response (I had to gather some data and think this over)!

On 10/02/2015 07:16 PM, Tom Lane wrote:
> Jan <pgsql.admin@j.mk-contact.de> writes:
>> That is, autovacuum on that table does not finish (even after six days
>> it is still running). To narrow down the problem, I stopped autovacuum,
>> temporarily disabled it (but ONLY on that particular table; not a
>> permanent solution, I know) and started a manual VACUUM ANALYZE on that
>> table via PGAdmin instead. Like the autovacuum, the manually started
>> process also runs virtually endlessly (until I terminate it after some
>> days), but PGAdmin's verbose output (see below) tells me at least that
>> it (apparently) tries to process the same table over and over again.
>> I'm definitely not a Postgres guru, but this can't be the expected
>> behaviour, even on such a big table, right? Did I overlook something?
> No, that looks fairly normal: each time it fills up memory with TIDs
> of dead tuples, it has to go and scan the indexes to remove index entries
> pointing at those tuples.

I'm still not getting the math behind it. The tuple stats below show a dead tuple count of 63,187,655, whereas the PGAdmin output (see my initial e-mail) reported the message "scanned index protein_hsps_pkey to remove 178956753 row versions" nine times before I cancelled it. That is, multiplying 178,956,753 by 9 yields 1,610,610,777 (dead) rows. But the latter number is much higher than the ~63M above. Am I comparing the wrong numbers?

Two additional related questions here:
(i) Is it possible to infer from the intermediate VACUUM ANALYZE output how many passes the VACUUM ANALYZE needs in total?
(ii) Since you mention the indexes on that table, would an index rebuild be a faster alternative for getting rid of old row versions?

> You could increase maintenance_work_mem to reduce the number of scans
> through the indexes, but it looks like you've already got that set to
> 1GB or so, so I'm not sure that increasing it further would be a good
> idea.

Exactly, maintenance_work_mem is currently set to 1GB.

> The good news is you are making progress. Perhaps you should just wait it
> out. However, it'd be good to try to estimate how many dead tuples there
> are (have you checked the physical size of the table?). If there are
> vastly more dead tuples than live ones, a VACUUM FULL might be a less
> painful way to get out of this, though it would lock out other accesses
> to the table so you might have to take some downtime to do it.

I executed the following two commands to get the current tuple statistics and the physical size.

1st)

-- CREATE EXTENSION pgstattuple;
SELECT * FROM pgstattuple('protein_hsps');

-- yielded:
table_len            1484828352512
tuple_count          11353611882
tuple_len            874228114914
tuple_percent        58.88
dead_tuple_count     63187655
dead_tuple_len       4810998304
dead_tuple_percent   0.32
free_space           495246133064
free_percent         33.35

2nd)

SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON N.oid = C.relnamespace
WHERE relname IN ('protein_hsps', 'protein_hsps_clustidx_on_origin', 'protein_hsps_pkey')
ORDER BY pg_relation_size(C.oid) DESC;

-- yielded:
public.protein_hsps                        1383 GB
public.protein_hsps_clustidx_on_origin      499 GB
public.protein_hsps_pkey                    494 GB

> While you're waiting, you should try to figure out how you got into a
> state with so many dead tuples, and think about how to readjust things
> so it doesn't happen again.

Some more background: the whole database was recently migrated to a new database server and thus restored from a dump file. That is, the table 'protein_hsps' and its indexes were built from scratch. Since then, the only operations on that table were some column type conversions (e.g., integer to smallint, double to real). Data-wise, the only operations were UPDATEs on a single boolean column, adding precalculated values (true/false) to EACH row in the table (~16bn rows). These UPDATEs were most likely the cause of the (huge) number of dead tuples (0.32%, see above), weren't they?

My original "reasoning" behind not partitioning the table was:
1st) Postgres supports bigint/bigserial, so billions of rows are possible, in principle ...
2nd) The clustered index will help me efficiently pick row sets by a certain criterion (column 'origin' in this case)
3rd) I never actually delete rows; I only do INSERTs and UPDATEs (however, I just realized that UPDATEs also create dead tuples).
4th) Partitioning still involves a fair amount of manual work and planning (setting up triggers etc.; a minimal sketch follows below)

Is there a good rule of thumb for how large a table can get (row-count-wise) before partitioning becomes more or less mandatory?

Kind regards,
Jan
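For what it's worth, a minimal sketch of 9.3-era partitioning by 'origin' via inheritance plus a routing trigger. The child table name and range boundary here are made up; real boundaries would have to follow the actual distribution of 'origin', and primary/foreign keys and indexes must be re-created on every child because they are not inherited:

-- one child table per origin range, with a CHECK constraint describing its slice:
CREATE TABLE protein_hsps_origin_0 (
    CHECK (origin >= 0 AND origin < 100000000)
) INHERITS (protein_hsps);

CREATE INDEX ON protein_hsps_origin_0 (origin);

-- route INSERTs on the parent into the matching child:
CREATE OR REPLACE FUNCTION protein_hsps_insert_router() RETURNS trigger AS $$
BEGIN
    IF NEW.origin >= 0 AND NEW.origin < 100000000 THEN
        INSERT INTO protein_hsps_origin_0 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'origin % has no matching partition', NEW.origin;
    END IF;
    RETURN NULL;  -- the row is stored in the child, not the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER protein_hsps_insert_trg
    BEFORE INSERT ON protein_hsps
    FOR EACH ROW EXECUTE PROCEDURE protein_hsps_insert_router();

With constraint_exclusion left at its default of 'partition', queries filtering on origin only touch the relevant children, and each child can be vacuumed independently, which is the main operational win for a table this size.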
Jan <pgsql.admin@j.mk-contact.de> writes:
> I'm still not getting the math behind it. The tuple stats below show a
> dead tuple count of 63,187,655, whereas the PGAdmin output (see my
> initial e-mail) reported the message "scanned index protein_hsps_pkey to
> remove 178956753 row versions" nine times before I cancelled it. That
> is, multiplying 178,956,753 by 9 yields 1,610,610,777 (dead) rows. But
> the latter number is much higher than the ~63M above. Am I comparing the
> wrong numbers?

There's something awfully wacky about that. I suspect that pgstattuple is somehow giving wrong answers, but I don't see any plausible theory as to why.

> Some more background: the whole database was recently migrated to a new
> database server and thus restored from a dump file. That is, the table
> 'protein_hsps' and its indexes were built from scratch. Since then, the
> only operations on that table were some column type conversions (e.g.,
> integer to smallint, double to real). Data-wise, the only operations
> were UPDATEs on a single boolean column, adding precalculated values
> (true/false) to EACH row in the table (~16bn rows). These UPDATEs were
> most likely the cause of the (huge) number of dead tuples (0.32%, see
> above), weren't they?

Such an UPDATE should have left the table 50% dead tuples, since every row would leave behind a dead version. On the other hand, an ALTER COLUMN TYPE operation should rewrite the whole table and leave no dead tuples behind. No matter which one you did last, it doesn't square with 0.32% dead tuples.

My best guess at this point is that what you did last is an UPDATE, so you have 50% dead tuples, and for some reason pgstattuple is not telling you the truth about that. But the VACUUM is showing reality.

How long did those UPDATEs and ALTER TABLEs take? If an ALTER seemed tolerable then maybe what you want to do is VACUUM FULL, which would be roughly the same cost.

			regards, tom lane
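A tiny illustration of why such a whole-table UPDATE roughly doubles the number of row versions (hsps_demo is a hypothetical throwaway table, not part of the schema above):

CREATE TEMP TABLE hsps_demo (id int PRIMARY KEY, reciprocal_best_hit boolean);
INSERT INTO hsps_demo VALUES (1, NULL);
SELECT ctid FROM hsps_demo WHERE id = 1;   -- physical location of the current row version, e.g. (0,1)
UPDATE hsps_demo SET reciprocal_best_hit = true WHERE id = 1;
SELECT ctid FROM hsps_demo WHERE id = 1;   -- a new location, e.g. (0,2); the old version at (0,1) is now a dead tuple awaiting VACUUM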
On 10/07/2015 01:53 AM, Tom Lane wrote:
> Jan <pgsql.admin@j.mk-contact.de> writes:
>> I'm still not getting the math behind it. The tuple stats below show a
>> dead tuple count of 63,187,655, whereas the PGAdmin output (see my
>> initial e-mail) reported the message "scanned index protein_hsps_pkey to
>> remove 178956753 row versions" nine times before I cancelled it. That
>> is, multiplying 178,956,753 by 9 yields 1,610,610,777 (dead) rows. But
>> the latter number is much higher than the ~63M above. Am I comparing the
>> wrong numbers?
> There's something awfully wacky about that. I suspect that pgstattuple
> is somehow giving wrong answers, but I don't see any plausible theory
> as to why.
>
>> Some more background: the whole database was recently migrated to a new
>> database server and thus restored from a dump file. That is, the table
>> 'protein_hsps' and its indexes were built from scratch. Since then, the
>> only operations on that table were some column type conversions (e.g.,
>> integer to smallint, double to real). Data-wise, the only operations
>> were UPDATEs on a single boolean column, adding precalculated values
>> (true/false) to EACH row in the table (~16bn rows). These UPDATEs were
>> most likely the cause of the (huge) number of dead tuples (0.32%, see
>> above), weren't they?
> Such an UPDATE should have left the table 50% dead tuples, since every
> row would leave behind a dead version. On the other hand, an ALTER
> COLUMN TYPE operation should rewrite the whole table and leave no dead
> tuples behind. No matter which one you did last, it doesn't square with
> 0.32% dead tuples.
>
> My best guess at this point is that what you did last is an UPDATE,
> so you have 50% dead tuples, and for some reason pgstattuple is not
> telling you the truth about that. But the VACUUM is showing reality.

Exactly, I altered the column types first. (That had been on my list for a long time, and I used the planned database server downtime for these optimizations.) After that, back in production mode, the aforementioned UPDATEs were applied (not all at once, but in batches of rows). So I also assume 50% dead tuples right now.

> How long did those UPDATEs and ALTER TABLEs take? If an ALTER seemed
> tolerable then maybe what you want to do is VACUUM FULL, which would
> be roughly the same cost.

I logged the exact execution times of each ALTER statement (unfortunately not available right now because I'm at home), and these took roughly 1-2 days each. Now I will go for the VACUUM FULL and will report back here once it's done. *fingers crossed*

Many thanks for your time!

Kind regards,
Jan
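The command in question is presumably something along these lines, run in a maintenance window since VACUUM FULL holds an ACCESS EXCLUSIVE lock on the table for its whole duration:

VACUUM FULL VERBOSE protein_hsps;
-- optionally followed by a fresh ANALYZE so the planner statistics match the rewritten table:
ANALYZE VERBOSE protein_hsps;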
On 10/07/2015 01:53 AM, Tom Lane wrote:
> Such an UPDATE should have left the table 50% dead tuples, since every
> row would leave behind a dead version. On the other hand, an ALTER
> COLUMN TYPE operation should rewrite the whole table and leave no dead
> tuples behind. No matter which one you did last, it doesn't square with
> 0.32% dead tuples.
>
> My best guess at this point is that what you did last is an UPDATE,
> so you have 50% dead tuples, and for some reason pgstattuple is not
> telling you the truth about that. But the VACUUM is showing reality.
>
> How long did those UPDATEs and ALTER TABLEs take? If an ALTER seemed
> tolerable then maybe what you want to do is VACUUM FULL, which would
> be roughly the same cost.

Tom, as you predicted, the VACUUM FULL finished relatively quickly yesterday, after about 31h. Here is the verbose output:

INFO: vacuuming "public.protein_hsps"
INFO: "protein_hsps": found 63187655 removable, 11353611882 nonremovable row versions in 181253461 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU 2814.17s/8479.90u sec elapsed 15451.26 sec.
Total query runtime: 114969739 ms.

Here is the output from pgstattuple (before / after vacuum):

-- SELECT * FROM pgstattuple('protein_hsps');
-- yielded:
table_len            1484828352512 / 958853496832
tuple_count          11353611882 / 11353611882
tuple_len            874228114914 / 874228114914
tuple_percent        58.88 / 91.17
dead_tuple_count     63187655 / 0
dead_tuple_len       4810998304 / 0
dead_tuple_percent   0.32 / 0
free_space           495246133064 / 1872767456
free_percent         33.35 / 0.2

And the table sizes (before / after vacuum):

-- SELECT nspname || '.' || relname AS "relation", pg_size_pretty(pg_relation_size(C.oid)) AS "size"
-- FROM pg_class C LEFT JOIN pg_namespace N ON N.oid = C.relnamespace
-- WHERE relname IN ('protein_hsps', 'protein_hsps_clustidx_on_origin', 'protein_hsps_pkey')
-- yielded:
public.protein_hsps                        1383 GB / 893 GB
public.protein_hsps_clustidx_on_origin      499 GB / 238 GB
public.protein_hsps_pkey                    494 GB / 238 GB

The only thing I currently don't understand is the pgstattuple output, which tells me that there is 0.2% free space left. There are actually 7.3T (i.e., 71%) free on the device, so either this is displayed wrongly or this value refers to something other than the free disk space on the drive.

Again, many thanks for the advice!

Kind regards,
Jan
2015-10-10 16:38 GMT+02:00 Jan <pgsql.admin@j.mk-contact.de>:
> [...]
> The only thing I currently don't understand is the pgstattuple output,
> which tells me that there is 0.2% free space left. There are actually
> 7.3T (i.e., 71%) free on the device, so either this is displayed wrongly
> or this value refers to something other than the free disk space on the
> drive.
Actually, it is the free space inside the table's files. It has nothing to do with your device (PostgreSQL doesn't know how much free space is available on your disks).
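To see the distinction: the sizes PostgreSQL can report are those of its own relations and databases, while filesystem-level free space has to be checked at the OS level (the path below is just an example data directory):

-- what PostgreSQL knows about: relation and database sizes
SELECT pg_size_pretty(pg_total_relation_size('protein_hsps')) AS table_with_indexes,
       pg_size_pretty(pg_database_size(current_database()))   AS whole_database;
-- free space on the disk is outside PostgreSQL's view; check it from the shell, e.g.:
--   df -h /var/lib/postgresql/9.3/main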
--