Thread: autovacuum blocks the operation of another manual vacuum
Hi,

I have a question about the behavior of autovacuum. When I have a big table A which is being processed by autovacuum, and I manually run VACUUM (FULL) to clean another table B, I always get something like "found 0 removable, 14283 nonremovable row versions". However, if I stop autovacuum and vacuum the big table A manually instead, I can clean table B (e.g. found 22615 removable, 2049 nonremovable row versions).

Is this correct? Why do vacuum and autovacuum behave differently?

P.S. My PostgreSQL is 8.4.
Excerpts from kuopo's message of vie nov 12 05:01:24 -0300 2010:

> I have a question about the behavior of autovacuum. When I have a big
> table A which is being processed by autovacuum, I also manually use
> (full) vacuum to clean another table B. Then I found that I always got
> something like "found 0 removable, 14283 nonremovable row".
>
> Is this correct? Why do vacuum and autovacuum have different actions?

VACUUM FULL does not assume that it can clean up tuples while other transactions are running, and that includes the (non-full, or "lazy") vacuum that autovacuum is running. Autovacuum only runs lazy vacuum, and that one is aware that other concurrent vacuums can be ignored.

Just don't use VACUUM FULL unless strictly necessary. It has other drawbacks.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
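The distinction can be tried directly in psql. A minimal sketch, assuming a table named b exists (the table name is only illustrative):

```sql
-- Lazy vacuum: runs alongside other transactions, and concurrent
-- lazy vacuums (including autovacuum's) are ignored when deciding
-- which dead tuples are removable.
VACUUM VERBOSE b;

-- VACUUM FULL: rewrites the table under an exclusive lock and treats
-- any concurrent transaction -- including autovacuum's lazy vacuum on
-- another table -- as potentially still able to see old tuples, so
-- those tuples show up as "nonremovable".
VACUUM FULL VERBOSE b;
```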
Hi,

Thanks for your response. I've checked it again and found that the main cause is the execution of ANALYZE. As I mentioned, I have two tables: table A is a big one (around 10M~100M records) for log data, and table B is a small one (around 1k records) for keeping some current status. There are a lot of update operations and some search operations on table B. For performance reasons, I would like to keep table B as compact as possible. Following your suggestion, I now invoke standard vacuum (not full) more frequently (e.g., once per minute).

However, while table A is being analyzed, the autovacuum or vacuum on table B cannot find any removable row versions (the number of nonremovable row versions and pages keeps increasing). After the analysis finishes, the search operations on table B are still inefficient. If I then run a full vacuum, the search operations on table B become fast again.

Any suggestions for this situation?

On Tue, Nov 16, 2010 at 11:26 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Vacuum full does not assume that it can clean up tuples while other
> transactions are running, and that includes the (non full, or "lazy")
> vacuum that autovacuum is running. Autovacuum only runs lazy vacuum;
> and that one is aware that other concurrent vacuums can be ignored.
>
> Just don't use vacuum full unless strictly necessary. It has other
> drawbacks.
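Instead of an external once-a-minute vacuum, a frequently updated small table can also be handled by tuning autovacuum per table (per-table autovacuum storage parameters exist since 8.3; the table name b and the exact values below are only illustrative):

```sql
-- Make autovacuum visit table b much more eagerly than the
-- global defaults would: after 1% of the table plus 50 rows
-- have accumulated as dead tuples.
ALTER TABLE b SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 50
);
```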
Excerpts from kuopo's message of jue nov 18 04:10:36 -0300 2010:

> However, when I analyze the table A, the autovacuum or vacuum on the
> table B cannot find any removable row version (the number of
> nonremovable row versions and pages keeps increasing). After the
> analysis finishes, the search operations on the table B is still
> inefficient. If I call full vacuum right now, then I can have quick
> response time of the search operations on the table B again.

Hmm, I don't think we can optimize the analyze-only operation the same way we optimize vacuum (i.e. allow vacuum to proceed while it's in progress). Normally analyze shouldn't take all that long anyway -- why is it that slow? Are you calling it in a transaction that also does other stuff? Are you analyzing more than one table in a single transaction, perhaps even the whole database?

Perhaps you could speed it up by lowering vacuum_cost_delay, if it's set to a nonzero value.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
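Since vacuum_cost_delay is a session-settable parameter, the speed-up can be applied to just this one ANALYZE without touching postgresql.conf. A sketch (the table name a is only illustrative):

```sql
-- Disable cost-based delay for this session only, and analyze
-- just the one big table rather than the whole database.
SET vacuum_cost_delay = 0;
ANALYZE a;
RESET vacuum_cost_delay;
```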
> Excerpts from kuopo's message of jue nov 18 04:10:36 -0300 2010:
>> However, when I analyze the table A, the autovacuum or vacuum on the
>> table B cannot find any removable row version (the number of
>> nonremovable row versions and pages keeps increasing).

Hi, I don't know how to fix the long VACUUM/ANALYZE, but have you tried to minimize the growth using HOT?

HOT means that if you update only columns that are not indexed, and if the new row version fits into the same page (into an update chain), the update does not leave behind a dead row that needs vacuuming.

Are there any indexes on the small table? How large is it? You've mentioned there are about 2049 rows - that might be just a few pages, so the indexes would not be very efficient anyway.

Try to remove the indexes, and maybe create the table with a smaller fillfactor (so that there is more free space on each page for the updates).

That should be much more efficient, and the table should not grow.

You can see whether HOT works through the pg_stat_all_tables view (columns n_tup_upd and n_tup_hot_upd).

regards
Tomas
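The fillfactor idea can be sketched like this (the table name b and the value 70 are only illustrative; a lower fillfactor trades some disk space for update locality):

```sql
-- Leave ~30% free space on each heap page so that updated row
-- versions can stay on the same page, keeping updates HOT-eligible.
ALTER TABLE b SET (fillfactor = 70);

-- Note: SET (fillfactor) only affects pages written afterwards;
-- existing pages keep their layout until the table is rewritten.
```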
Excerpts from kuopo's message of dom nov 21 11:15:52 -0300 2010:

> In my experiment, I need about 1~3 min to finish the analyze operation
> on the big table (which depends on the value of vacuum_cost_delay). I
> am not surprised because this table is a really big one (now, it has
> over 200M records).

Okay. You may want to consider lowering the statistics target for all the columns in that table; that would reduce analyze time, at the cost of possibly worsening the plans for that table, depending on how irregular the data distribution is. See ALTER TABLE / SET STATISTICS in the documentation, and the default_statistics_target parameter in postgresql.conf.

> However, most of my concern is the behavior of analyze/vacuum. You
> mentioned that the analyze-only operation cannot be optimized the
> same way as vacuum. Does that mean an analyze operation on one table
> would unavoidably affect a vacuum running on another one?

That's correct. I think you can run VACUUM ANALYZE, and it would do both things at once; AFAIK this is also optimized like VACUUM is, but I admit I'm not 100% sure (and I can't check right now).

> If this is a normal reaction for an analyze operation, maybe I should
> try to lower vacuum_cost_delay or use more powerful hardware to
> minimize the interference period.

I think it would make sense to use as low a cost_delay as possible for this ANALYZE. (Note you can change it locally with a SET command; there is no need to touch postgresql.conf, so you can change it when you analyze just this large table.)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
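The statistics-target change can be sketched as follows (the column name log_time is hypothetical; in 8.4 the default default_statistics_target is 100):

```sql
-- Keep a much smaller statistics sample for a column of the big
-- log table, so ANALYZE has less work to do on it.
ALTER TABLE a ALTER COLUMN log_time SET STATISTICS 10;

-- Subsequent analyzes of this table are correspondingly cheaper.
ANALYZE a;
```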
In my experiment, I need about 1~3 min to finish the analyze operation on the big table (depending on the value of vacuum_cost_delay). I am not surprised, because this table is a really big one (now, it has over 200M records).

However, most of my concern is the behavior of analyze/vacuum. You mentioned that the analyze-only operation cannot be optimized the same way as vacuum. Does that mean an analyze operation on one table would unavoidably affect a vacuum running on another one? If this is the normal behavior for an analyze operation, maybe I should try to lower vacuum_cost_delay or use more powerful hardware to minimize the interference period, so the pages of the small table would not increase quickly.

Do you have any suggestions? Thanks!!

On Sat, Nov 20, 2010 at 9:49 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Hmm, I don't think we can optimize the analyze-only operation the same
> way we optimize vacuum (i.e. allow vacuum to proceed while it's in
> progress). Normally analyze shouldn't take all that long anyway -- why
> is it that slow? Are you calling it in a transaction that also does
> other stuff? Are you analyzing more than one table in a single
> transaction, perhaps even the whole database?
>
> Perhaps you could speed it up by lowering vacuum_cost_delay, if it's set
> to a nonzero value.
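The interference window can also be observed directly: any transaction whose xact_start predates the tuples' deletion holds them as nonremovable for concurrent vacuums. A sketch for 8.4's pg_stat_activity (column names as in that release):

```sql
-- Find the oldest open transactions; a long-running ANALYZE of the
-- big table will appear here, and dead rows in other tables created
-- after its xact_start cannot be removed until it finishes.
SELECT procpid, xact_start, current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
```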
Thanks for your information. I am using PostgreSQL 8.4, and this version should already support HOT. The frequently updated columns are not indexed columns, so the frequent updates should not create many dead rows. I also did a small test: even if I don't execute vacuum, the number of pages of the small table does not increase. However, analyzing the big table still bothers me. According to my current results, while the analyze operation is running, vacuum and HOT do not function as I expect.

On Sat, Nov 20, 2010 at 12:43 PM, <tv@fuzzy.cz> wrote:
> Hi, I don't know how to fix the long VACUUM/ANALYZE, but have you tried to
> minimize the growth using HOT?
>
> HOT means that if you update only columns that are not indexed, and if the
> update can fit into the same page (into an update chain), this would not
> create a dead row.
>
> You can see if HOT works through pg_stat_all_tables view (columns
> n_tup_upd and n_tup_hot_upd).
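Whether the updates are in fact HOT can be confirmed with the view Tomas mentioned (the table name b is only illustrative):

```sql
-- If HOT is working, n_tup_hot_upd should track n_tup_upd closely;
-- a large gap means many updates are creating new index entries
-- and dead rows that only vacuum can reclaim.
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_all_tables
WHERE relname = 'b';
```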