Thread: Autovacuum - what does it actually do?

Autovacuum - what does it actually do?

From
"Kasia Tuszynska"
Date:
Hello Everybody,
I am trying to establish what does Autovaccum actually do, as it is causing data load failures?

Documenatation from : http://www.postgresql.org/docs/8.3/static/routine-vacuuming.html
States that VACCUMis needed to:
1. To recover or reuse disk space occupied by updated or deleted rows.
2. To update data statistics used by the PostgreSQL query planner.
3. To protect against loss of very old data due to transaction ID wraparound.

Documentation from: PgADMINIII help, on the auto vacuum daemon:
When enabled, the autovacuum daemon runs periodically and checks for tables that have had a large number of inserted,
updatedor deleted tuples. 

So, which is it, perhaps I am mixing vaccum with autovacuum ?
Updated or deleted rows - at which point the hoovering of dead stuff to keep the disk space tidy makes sence or
Updated, deleted and inserted rows - the inserted rows trouble me, are the inserted rows merely analyzed for the
benefitof the optimizer, or does vacuum actually do anything in their case as well?  

I ask because I am running into a weird scenario with large dataloads and the autovacuum process.
My setup:
Windows 2003 server, sp2
PostgreSQL 8.2.4 installed from the gui (autovacuum on in postmaster.conf by default, no changes to postmaster.conf at
all)
Type of data loaded: large spatial (non postgis) -> lots of blob data(4 gigs)

This is my scenario:
I am loading various sets of data, on several ones (that happen to be very large)it seems like the autovacuum threshold
istriggered and on these select few datasets it crashes the postmaster: 
With code -1073741819, which I looked up to mean an access violation to an address space, it was apparently bugged:

  BUG #3427: Autovacuum crashed server

  http://archives.postgresql.org/pgsql-bugs/2007-07/msg00016.php

The bug does not explain what is actually happening, what is autovacuum doing to get an access violation, why would it
happenon very specific data everytime ? 

Sorry for the long post,
Thank you,
Sincerely,
Kasia

Pg_log:
2007-11-07 08:32:29 LOG:  autovacuum process (PID 3668) exited with exit code -1073741819
2007-11-07 08:32:29 LOG:  terminating any other active server processes
2007-11-07 08:32:29 WARNING:  terminating connection because of crash of another server process
2007-11-07 08:32:29 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and
exit,because another server process exited abnormally and possibly corrupted shared memory. 
2007-11-07 08:32:29 HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2007-11-07 08:32:29 WARNING:  terminating connection because of crash of another server process
2007-11-07 08:32:29 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and
exit,because another server process exited abnormally and possibly corrupted shared memory. 
2007-11-07 08:32:29 HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2007-11-07 08:32:29 WARNING:  terminating connection because of crash of another server process
2007-11-07 08:32:29 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and
exit,because another server process exited abnormally and possibly corrupted shared memory. 
2007-11-07 08:32:29 HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2007-11-07 08:32:29 WARNING:  terminating connection because of crash of another server process
2007-11-07 08:32:29 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and
exit,because another server process exited abnormally and possibly corrupted shared memory. 
2007-11-07 08:32:29 HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2007-11-07 08:32:29 LOG:  all server processes terminated; reinitializing
2007-11-07 08:32:29 FATAL:  the database system is starting up
2007-11-07 08:32:29 LOG:  database system was interrupted at 2007-11-07 08:32:29 Pacific Standard Time
2007-11-07 08:32:29 LOG:  checkpoint record is at 15/431DF0C8
2007-11-07 08:32:29 LOG:  redo record is at 15/43008230; undo record is at 0/0; shutdown FALSE
2007-11-07 08:32:29 LOG:  next transaction ID: 0/99496820; next OID: 6802317
2007-11-07 08:32:29 LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2007-11-07 08:32:29 LOG:  database system was not properly shut down; automatic recovery in progress
2007-11-07 08:32:29 FATAL:  the database system is starting up
2007-11-07 08:32:29 LOG:  redo starts at 15/43008230
2007-11-07 08:32:29 FATAL:  the database system is starting up



Re: Autovacuum - what does it actually do?

From
Brad Nicholson
Date:
On Fri, 2007-11-09 at 09:06 -0800, Kasia Tuszynska wrote:
> Hello Everybody,
> I am trying to establish what does Autovaccum actually do, as it is causing data load failures?

Autovacuum is a daemon that triggers vacuums when enough dead tuples are
generated in a table.  The idea is that if it set up properly, the
database can automatically vacuum the tables when they need to get
vacuumed as opposed to the administrator having to do it manually.
There is still some tuning involved, and it's not perfect, but that's
another story.

There is no difference in the work that is actually done between a
vacuum triggered by autovacuum and one triggered manually.

> I ask because I am running into a weird scenario with large dataloads and the autovacuum process.
> My setup:
> Windows 2003 server, sp2
> PostgreSQL 8.2.4 installed from the gui (autovacuum on in postmaster.conf by default, no changes to postmaster.conf
atall) 
> Type of data loaded: large spatial (non postgis) -> lots of blob data(4 gigs)
>
> This is my scenario:
> I am loading various sets of data, on several ones (that happen to be very large)it seems like the autovacuum
thresholdis triggered and on these select few datasets it crashes the postmaster: 
> With code -1073741819, which I looked up to mean an access violation to an address space, it was apparently bugged:

Can you provide a bit more info on this data load procedure?  Are you deleting from a table then loading the data?
Are you loading into a table that already has data that is being changed?  How are you loading it?  Is it all in
one transaction?

A data load itself should not trigger a vacuum.  Dead tuples are created by updating or deleting tuples, not by
inserting
them.  You will need to analyze the table afterwards, it's possibe that autovac is triggering that analyze.

>   BUG #3427: Autovacuum crashed server
>
>   http://archives.postgresql.org/pgsql-bugs/2007-07/msg00016.php
>
> The bug does not explain what is actually happening, what is autovacuum doing to get an access violation, why would
ithappen on very specific data everytime ? 


I really don't know anything about Windows, but on UNIX, if a process
dies a hard enough death, it can corrupt the shared memory segment and
bring the whole postmaster down.  That's what it looks like is
happening.  I have no why though.
--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: Autovacuum - what does it actually do?

From
"Kasia Tuszynska"
Date:

-----Original Message-----
From: Brad Nicholson [mailto:bnichols@ca.afilias.info]
Sent: Friday, November 09, 2007 12:23 PM
To: Kasia Tuszynska
Cc: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Autovacuum - what does it actually do?

On Fri, 2007-11-09 at 09:06 -0800, Kasia Tuszynska wrote:
> Hello Everybody,
> I am trying to establish what does Autovaccum actually do, as it is
causing data load failures?

Autovacuum is a daemon that triggers vacuums when enough dead tuples are
generated in a table.  The idea is that if it set up properly, the
database can automatically vacuum the tables when they need to get
vacuumed as opposed to the administrator having to do it manually.
There is still some tuning involved, and it's not perfect, but that's
another story.

There is no difference in the work that is actually done between a
vacuum triggered by autovacuum and one triggered manually.

> I ask because I am running into a weird scenario with large dataloads
and the autovacuum process.
> My setup:
> Windows 2003 server, sp2
> PostgreSQL 8.2.4 installed from the gui (autovacuum on in
postmaster.conf by default, no changes to postmaster.conf at all)
> Type of data loaded: large spatial (non postgis) -> lots of blob
data(4 gigs)
>
> This is my scenario:
> I am loading various sets of data, on several ones (that happen to be
very large)it seems like the autovacuum threshold is triggered and on
these select few datasets it crashes the postmaster:
> With code -1073741819, which I looked up to mean an access violation
to an address space, it was apparently bugged:

Can you provide a bit more info on this data load procedure?  Are you
deleting from a table then loading the data?
Are you loading into a table that already has data that is being
changed?  How are you loading it?  Is it all in
one transaction?

A data load itself should not trigger a vacuum.  Dead tuples are created
by updating or deleting tuples, not by inserting
them.  You will need to analyze the table afterwards, it's possibe that
autovac is triggering that analyze.

>   BUG #3427: Autovacuum crashed server
>
>   http://archives.postgresql.org/pgsql-bugs/2007-07/msg00016.php
>
> The bug does not explain what is actually happening, what is
autovacuum doing to get an access violation, why would it happen on very
specific data everytime ?


I really don't know anything about Windows, but on UNIX, if a process
dies a hard enough death, it can corrupt the shared memory segment and
bring the whole postmaster down.  That's what it looks like is
happening.  I have no why though.
--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.

Hi Brad,
Thank you for taking the time to answer my question.
A little bit more about what is happening.

"Can you provide a bit more info on this data load procedure?  Are you
deleting from a table then loading the data?
Are you loading into a table that already has data that is being
changed?  How are you loading it?  Is it all in
one transaction?"
- this is a fresh load, the table is created and data is inserted, there
is no pre- exiting data to be considered for a delete or an update.
- potentially a gist index would be created on the table as the table
is created and than, it is immediately dropped to be recreated after the
bulk load has completed.
- the data is has a spatial component stored in a user defined spatial
type, similar in usage and construction to the postgis spatial type, it
stored data in binary and it is what utilizes the gist
- we are loading data in many transactions with an autocommitt interval
set on the client app that is loading.
- turning autovacuum off in postmaster.conf solves the issue and data is
loaded without a problem.

Thank you,
Kasia