Thread: Deadlock while doing VACUUM??

Deadlock while doing VACUUM??

From
Kevin Brown
Date:
I've been experimenting with Matthew T. O'Connor's pg_avd (auto vacuum
daemon), which is actually quite an interesting piece of software that
so far seems to function pretty well.

I've ported it to PG 7.2.4, which is the version I'm running on my
system, and experimenting with it has uncovered what I can only
consider a potentially serious bug in PG.

When a heavy INSERT or UPDATE load on a table is occurring (lots of
quick INSERTs or UPDATEs within a single transaction), a VACUUM
ANALYZE (or just straight VACUUM) has a really good chance (10% or so)
of causing either the INSERT/UPDATE or the VACUUM to fail with a
"deadlock detected" error.  Usually the INSERT/UPDATE is what fails.
I have verified this with manual VACUUM and VACUUM ANALYZE commands
issued on the command line while a perl script is generating a heavy
load of inserts and/or updates, so this isn't pg_avd's fault.

A straight ANALYZE on the table doesn't seem to trigger the problem,
while a straight VACUUM does.

Statistics collection is enabled -- it's required for pg_avd to run --
but I suspect that this is irrelevant (what does a straight VACUUM do
with the statistics tables, if anything?).

Even dropping the table and all of its indexes and recreating it does
nothing to solve this problem, so I don't have any reason to suspect
corruption of the table itself.  Corruption elsewhere is a
possibility, I suppose, but I haven't noticed any strangeness
elsewhere.

The only other thing of note is that a REALLY LONG running INSERT
... SELECT (with the source being a temporary table) is running
concurrently on a different table in a different database.  That, too,
is running within a transaction, but I don't have reason to believe
that it's having an effect.  When it completes I'll do more testing to
determine if it really did have an effect.

I've searched the archives for references to this and have found
nothing relevant (all references I found are to older implementations
in which VACUUM locks the table).  Have I managed to uncover something
new here?

I don't have 7.3.x (or later) installed so I can't test this myself on
that or CVS tip, but I may have to if nobody else has the time.


Anyone have any suggestions on what to do about this?  This problem
pretty much kills pg_avd's usefulness, at least on 7.2.x.  :-(


I'll be happy to supply the Perl script I'm using to do the inserts,
if that'll help people track this down...



-- 
Kevin Brown                          kevin@sysexperts.com



Re: Deadlock while doing VACUUM??

From
Tom Lane
Date:
Kevin Brown <kevin@sysexperts.com> writes:
> When a heavy INSERT or UPDATE load on a table is occurring (lots of
> quick INSERTs or UPDATEs within a single transaction), a VACUUM
> ANALYZE (or just straight VACUUM) has a really good chance (10% or so)
> of causing either the INSERT/UPDATE or the VACUUM to fail with a
> "deadlock detected" error.

I was unable to replicate this in CVS tip, using "pgbench -c 10 -t 1000"
as the load generator.
        regards, tom lane



Re: Deadlock while doing VACUUM??

From
"Thomas T. Thai"
Date:
> Even dropping the table and all of its indexes and recreating it does
> nothing to solve this problem, so I don't have any reason to suspect
> corruption of the table itself.  Corruption elsewhere is a
> possibility, I suppose, but I haven't noticed any strangeness
> elsewhere.

This sounds very similiar to problems I was having under the thread:
 Re: [HACKERS] BUG: Vacuum Analyze - datumGetSize: Invalid typLen 0

Dropping the table, indexes, and recreating it didn't help me either. I
had to drop the database!

What kind of platform are you on? 64-bit?

I've noticed that 7.4-snapshot fixed a huge amount of bugs, but the vacuum
analyze bug is still there.

--
Thomas



Re: Deadlock while doing VACUUM??

From
Kevin Brown
Date:
Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > When a heavy INSERT or UPDATE load on a table is occurring (lots of
> > quick INSERTs or UPDATEs within a single transaction), a VACUUM
> > ANALYZE (or just straight VACUUM) has a really good chance (10% or so)
> > of causing either the INSERT/UPDATE or the VACUUM to fail with a
> > "deadlock detected" error.
> 
> I was unable to replicate this in CVS tip, using "pgbench -c 10 -t 1000"
> as the load generator.

I guess I should mention that I was doing VACUUM on the individual
tables, not the entire database.  I have no idea if that makes any
difference.

I'll see about compiling and installing CVS tip for testing.  I'll
also see about doing some testing on 7.3.2.


Can anyone here independently verify what I've found on 7.2.4?  You'll
probably have to run a process that does continuous INSERTs on a
table, and another that does continuous VACUUMs on the same table at
the same time.



-- 
Kevin Brown                          kevin@sysexperts.com



Re: Deadlock while doing VACUUM??

From
Kevin Brown
Date:
Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > When a heavy INSERT or UPDATE load on a table is occurring (lots of
> > quick INSERTs or UPDATEs within a single transaction), a VACUUM
> > ANALYZE (or just straight VACUUM) has a really good chance (10% or so)
> > of causing either the INSERT/UPDATE or the VACUUM to fail with a
> > "deadlock detected" error.
> 
> I was unable to replicate this in CVS tip, using "pgbench -c 10 -t 1000"
> as the load generator.

I finally got 7.3.2 installed, and confirmed that the problem does not
exist on that version.  So this is something that's limited to the
7.2.x tree.  Which, I guess, means that it's not going to get fixed
for that tree (I assume that 7.2.x is effectively end-of-lifed)...

On 7.3.2, a concurrent VACUUM appears to slow inserts down A LOT, but
it won't deadlock them.


-- 
Kevin Brown                          kevin@sysexperts.com



Re: Deadlock while doing VACUUM??

From
Tom Lane
Date:
Kevin Brown <kevin@sysexperts.com> writes:
> I finally got 7.3.2 installed, and confirmed that the problem does not
> exist on that version.  So this is something that's limited to the
> 7.2.x tree.  Which, I guess, means that it's not going to get fixed
> for that tree (I assume that 7.2.x is effectively end-of-lifed)...

I dug through the CVS logs, and could not find any entry between 7.2 and
7.3 that seemed like it might fix such a problem.  If we knew what the
problem was, maybe we could put together a back-patch.  I'm not
personally eager to spend more time on 7.2.*, though.
        regards, tom lane