Thread: Autovacuum - what does it actually do?
Hello Everybody, I am trying to establish what does Autovaccum actually do, as it is causing data load failures? Documenatation from : http://www.postgresql.org/docs/8.3/static/routine-vacuuming.html States that VACCUMis needed to: 1. To recover or reuse disk space occupied by updated or deleted rows. 2. To update data statistics used by the PostgreSQL query planner. 3. To protect against loss of very old data due to transaction ID wraparound. Documentation from: PgADMINIII help, on the auto vacuum daemon: When enabled, the autovacuum daemon runs periodically and checks for tables that have had a large number of inserted, updatedor deleted tuples. So, which is it, perhaps I am mixing vaccum with autovacuum ? Updated or deleted rows - at which point the hoovering of dead stuff to keep the disk space tidy makes sence or Updated, deleted and inserted rows - the inserted rows trouble me, are the inserted rows merely analyzed for the benefitof the optimizer, or does vacuum actually do anything in their case as well? I ask because I am running into a weird scenario with large dataloads and the autovacuum process. My setup: Windows 2003 server, sp2 PostgreSQL 8.2.4 installed from the gui (autovacuum on in postmaster.conf by default, no changes to postmaster.conf at all) Type of data loaded: large spatial (non postgis) -> lots of blob data(4 gigs) This is my scenario: I am loading various sets of data, on several ones (that happen to be very large)it seems like the autovacuum threshold istriggered and on these select few datasets it crashes the postmaster: With code -1073741819, which I looked up to mean an access violation to an address space, it was apparently bugged: BUG #3427: Autovacuum crashed server http://archives.postgresql.org/pgsql-bugs/2007-07/msg00016.php The bug does not explain what is actually happening, what is autovacuum doing to get an access violation, why would it happenon very specific data everytime ? Sorry for the long post, Thank you, Sincerely, Kasia Pg_log: 2007-11-07 08:32:29 LOG: autovacuum process (PID 3668) exited with exit code -1073741819 2007-11-07 08:32:29 LOG: terminating any other active server processes 2007-11-07 08:32:29 WARNING: terminating connection because of crash of another server process 2007-11-07 08:32:29 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,because another server process exited abnormally and possibly corrupted shared memory. 2007-11-07 08:32:29 HINT: In a moment you should be able to reconnect to the database and repeat your command. 2007-11-07 08:32:29 WARNING: terminating connection because of crash of another server process 2007-11-07 08:32:29 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,because another server process exited abnormally and possibly corrupted shared memory. 2007-11-07 08:32:29 HINT: In a moment you should be able to reconnect to the database and repeat your command. 2007-11-07 08:32:29 WARNING: terminating connection because of crash of another server process 2007-11-07 08:32:29 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,because another server process exited abnormally and possibly corrupted shared memory. 2007-11-07 08:32:29 HINT: In a moment you should be able to reconnect to the database and repeat your command. 2007-11-07 08:32:29 WARNING: terminating connection because of crash of another server process 2007-11-07 08:32:29 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,because another server process exited abnormally and possibly corrupted shared memory. 2007-11-07 08:32:29 HINT: In a moment you should be able to reconnect to the database and repeat your command. 2007-11-07 08:32:29 LOG: all server processes terminated; reinitializing 2007-11-07 08:32:29 FATAL: the database system is starting up 2007-11-07 08:32:29 LOG: database system was interrupted at 2007-11-07 08:32:29 Pacific Standard Time 2007-11-07 08:32:29 LOG: checkpoint record is at 15/431DF0C8 2007-11-07 08:32:29 LOG: redo record is at 15/43008230; undo record is at 0/0; shutdown FALSE 2007-11-07 08:32:29 LOG: next transaction ID: 0/99496820; next OID: 6802317 2007-11-07 08:32:29 LOG: next MultiXactId: 1; next MultiXactOffset: 0 2007-11-07 08:32:29 LOG: database system was not properly shut down; automatic recovery in progress 2007-11-07 08:32:29 FATAL: the database system is starting up 2007-11-07 08:32:29 LOG: redo starts at 15/43008230 2007-11-07 08:32:29 FATAL: the database system is starting up
On Fri, 2007-11-09 at 09:06 -0800, Kasia Tuszynska wrote: > Hello Everybody, > I am trying to establish what does Autovaccum actually do, as it is causing data load failures? Autovacuum is a daemon that triggers vacuums when enough dead tuples are generated in a table. The idea is that if it set up properly, the database can automatically vacuum the tables when they need to get vacuumed as opposed to the administrator having to do it manually. There is still some tuning involved, and it's not perfect, but that's another story. There is no difference in the work that is actually done between a vacuum triggered by autovacuum and one triggered manually. > I ask because I am running into a weird scenario with large dataloads and the autovacuum process. > My setup: > Windows 2003 server, sp2 > PostgreSQL 8.2.4 installed from the gui (autovacuum on in postmaster.conf by default, no changes to postmaster.conf atall) > Type of data loaded: large spatial (non postgis) -> lots of blob data(4 gigs) > > This is my scenario: > I am loading various sets of data, on several ones (that happen to be very large)it seems like the autovacuum thresholdis triggered and on these select few datasets it crashes the postmaster: > With code -1073741819, which I looked up to mean an access violation to an address space, it was apparently bugged: Can you provide a bit more info on this data load procedure? Are you deleting from a table then loading the data? Are you loading into a table that already has data that is being changed? How are you loading it? Is it all in one transaction? A data load itself should not trigger a vacuum. Dead tuples are created by updating or deleting tuples, not by inserting them. You will need to analyze the table afterwards, it's possibe that autovac is triggering that analyze. > BUG #3427: Autovacuum crashed server > > http://archives.postgresql.org/pgsql-bugs/2007-07/msg00016.php > > The bug does not explain what is actually happening, what is autovacuum doing to get an access violation, why would ithappen on very specific data everytime ? I really don't know anything about Windows, but on UNIX, if a process dies a hard enough death, it can corrupt the shared memory segment and bring the whole postmaster down. That's what it looks like is happening. I have no why though. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
-----Original Message----- From: Brad Nicholson [mailto:bnichols@ca.afilias.info] Sent: Friday, November 09, 2007 12:23 PM To: Kasia Tuszynska Cc: pgsql-novice@postgresql.org Subject: Re: [NOVICE] Autovacuum - what does it actually do? On Fri, 2007-11-09 at 09:06 -0800, Kasia Tuszynska wrote: > Hello Everybody, > I am trying to establish what does Autovaccum actually do, as it is causing data load failures? Autovacuum is a daemon that triggers vacuums when enough dead tuples are generated in a table. The idea is that if it set up properly, the database can automatically vacuum the tables when they need to get vacuumed as opposed to the administrator having to do it manually. There is still some tuning involved, and it's not perfect, but that's another story. There is no difference in the work that is actually done between a vacuum triggered by autovacuum and one triggered manually. > I ask because I am running into a weird scenario with large dataloads and the autovacuum process. > My setup: > Windows 2003 server, sp2 > PostgreSQL 8.2.4 installed from the gui (autovacuum on in postmaster.conf by default, no changes to postmaster.conf at all) > Type of data loaded: large spatial (non postgis) -> lots of blob data(4 gigs) > > This is my scenario: > I am loading various sets of data, on several ones (that happen to be very large)it seems like the autovacuum threshold is triggered and on these select few datasets it crashes the postmaster: > With code -1073741819, which I looked up to mean an access violation to an address space, it was apparently bugged: Can you provide a bit more info on this data load procedure? Are you deleting from a table then loading the data? Are you loading into a table that already has data that is being changed? How are you loading it? Is it all in one transaction? A data load itself should not trigger a vacuum. Dead tuples are created by updating or deleting tuples, not by inserting them. You will need to analyze the table afterwards, it's possibe that autovac is triggering that analyze. > BUG #3427: Autovacuum crashed server > > http://archives.postgresql.org/pgsql-bugs/2007-07/msg00016.php > > The bug does not explain what is actually happening, what is autovacuum doing to get an access violation, why would it happen on very specific data everytime ? I really don't know anything about Windows, but on UNIX, if a process dies a hard enough death, it can corrupt the shared memory segment and bring the whole postmaster down. That's what it looks like is happening. I have no why though. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp. Hi Brad, Thank you for taking the time to answer my question. A little bit more about what is happening. "Can you provide a bit more info on this data load procedure? Are you deleting from a table then loading the data? Are you loading into a table that already has data that is being changed? How are you loading it? Is it all in one transaction?" - this is a fresh load, the table is created and data is inserted, there is no pre- exiting data to be considered for a delete or an update. - potentially a gist index would be created on the table as the table is created and than, it is immediately dropped to be recreated after the bulk load has completed. - the data is has a spatial component stored in a user defined spatial type, similar in usage and construction to the postgis spatial type, it stored data in binary and it is what utilizes the gist - we are loading data in many transactions with an autocommitt interval set on the client app that is loading. - turning autovacuum off in postmaster.conf solves the issue and data is loaded without a problem. Thank you, Kasia