Thread: Vacuum now uses AccessShareLock for analyze

Vacuum now uses AccessShareLock for analyze

From
Bruce Momjian
Date:
I have changed vacuum so analyze now uses AccessShareLock.  (Is this the
proper lock for analyze?)

The code will now vacuum all requested relations.  It will then analyze
each relation.  This way, all the exclusive vacuum work is done first,
then analyze can continue with shared locks.

The code is much clearer with that functionality split into separate
functions.

How separate do people want vacuum and analyze?  Analyze currently does
not record the number of tuples and pages, because vacuum does that.  Do
people want analyze as a separate command and in a separate file?

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Vacuum now uses AccessShareLock for analyze

From
The Hermit Hacker
Date:
On Mon, 29 May 2000, Bruce Momjian wrote:

> I have changed vacuum so analyze now uses AccessShareLock.  (Is this the
> proper lock for analyze?)
> 
> The code will now vacuum all requested relations.  It will then analyze
> each relation.  This way, all the exclusive vacuum work is done first,
> then analyze can continue with shared locks.

hrmmm, here's a thought ... why not vacuum->analyze each relation in
order?  the 'exclusive lock' will prevent anyone from reading, so do a
relation, release the lock to analyze that relation and let ppl access the
database ... then do the next ... instead of doing an exclusive lock for
the duration of the whole database ...




Re: Vacuum now uses AccessShareLock for analyze

From
Bruce Momjian
Date:
> On Mon, 29 May 2000, Bruce Momjian wrote:
> 
> > I have changed vacuum so analyze now uses AccessShareLock.  (Is this the
> > proper lock for analyze?)
> > 
> > The code will now vacuum all requested relations.  It will then analyze
> > each relation.  This way, all the exclusive vacuum work is done first,
> > then analyze can continue with shared locks.
> 
> hrmmm, here's a thought ... why not vacuum->analyze each relation in
> order?  the 'exclusive lock' will prevent anyone from reading, so do a
> relation, release the lock to analyze that relation and let ppl access the
> database ... then do the next ... instead of doing an exclusive lock for
> the duration of the whole database ...

No, each table is locked one at a time.  We do all the single-table
locks first so the rest is all shared access.  Does that make sense?

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Vacuum now uses AccessShareLock for analyze

From
The Hermit Hacker
Date:
On Mon, 29 May 2000, Bruce Momjian wrote:

> > On Mon, 29 May 2000, Bruce Momjian wrote:
> > 
> > > I have changed vacuum so analyze now uses AccessShareLock.  (Is this the
> > > proper lock for analyze?)
> > > 
> > > The code will now vacuum all requested relations.  It will then analyze
> > > each relation.  This way, all the exclusive vacuum work is done first,
> > > then analyze can continue with shared locks.
> > 
> > hrmmm, here's a thought ... why not vacuum->analyze each relation in
> > order?  the 'exclusive lock' will prevent anyone from reading, so do a
> > relation, release the lock to analyze that relation and let ppl access the
> > database ... then do the next ... instead of doing an exclusive lock for
> > the duration of the whole database ...
> 
> No, each table is locked one at a time.  We do all the single-table
> locks first so the rest is all shared access.  Does that make sense?

its what I suspected, but my point was that if we did the ANALYZE for the
relation right after the VACUUM for it, there would be a period of time
where readers could come in and process ... think of it as a 'breather'
before the next VACUUM starts, vs just jumping into the next ...

Overall time for doing the vacuum shouldn't be any longer, but it would
give gaps where readers could get in and out ... we're a relational
database, so I imagine ppl are doing JOINs ... if RelationA is locked
while ReaderA is trying to doign a JOIN between RA and RB, ReaderA is
gonna be screwed ... if we did a quick ANALZE between RelationA and
RelationB, then ReaderA would have a chance to do its processing while the
ANALYZE is running, instead of having to wait for both RelationA and
RelationB to be finished ...



Re: Vacuum now uses AccessShareLock for analyze

From
Bruce Momjian
Date:
> its what I suspected, but my point was that if we did the ANALYZE for the
> relation right after the VACUUM for it, there would be a period of time
> where readers could come in and process ... think of it as a 'breather'
> before the next VACUUM starts, vs just jumping into the next ...
> 
> Overall time for doing the vacuum shouldn't be any longer, but it would
> give gaps where readers could get in and out ... we're a relational
> database, so I imagine ppl are doing JOINs ... if RelationA is locked
> while ReaderA is trying to doign a JOIN between RA and RB, ReaderA is
> gonna be screwed ... if we did a quick ANALZE between RelationA and
> RelationB, then ReaderA would have a chance to do its processing while the
> ANALYZE is running, instead of having to wait for both RelationA and
> RelationB to be finished ...

Good point.  Because we are only doing one table at a time, they could
get in and do something, but they could also get part-way in and have
another table locked, and since we are only locking one at a time, it
seemed better to get it all done first.  Comments?

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Vacuum now uses AccessShareLock for analyze

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> The code will now vacuum all requested relations.  It will then analyze
> each relation.  This way, all the exclusive vacuum work is done first,
> then analyze can continue with shared locks.

I agree with Marc: it'd make more sense to do it one table at a time,
ie,get exclusive lock on table Avacuum table Acommit, release lockget shared lock on table Agather stats for table
Acommit,release lockrepeat sequence for table Betc
 

> The code is much clearer with that functionality split into separate
> functions.

Wouldn't surprise me.

> How separate do people want vacuum and analyze?  Analyze currently does
> not record the number of tuples and pages, because vacuum does that.  Do
> people want analyze as a separate command and in a separate file?

We definitely want a separate command that can invoke just the analyze
part.  I'd guess something like "ANALYZE [ VERBOSE ] optional-table-name
(optional-list-of-columns)" pretty much like VACUUM.

I would be inclined to move the code out to a new file, just because
vacuum.c is so darn big, but that's purely a code-beautification issue.

On the number of tuples/pages issue, I'd suggest removing that function
from plain vacuum and make the analyze part do it instead.  It's always
made me uncomfortable that vacuum needs to update system relations while
it's holding an exclusive lock on the table-being-vacuumed (which might
be another system catalog, or even pg_class itself).  It works, more or
less, but that update-tuple-in-place code is awfully ugly and
fragile-looking.  I'm also worried that there could be deadlock
scenarios between concurrent vacuums (eg, one guy working on pg_class,
another on pg_statistic, both need to get in and update the other guy's
table.  Oops.  That particular problem should be gone with your changes,
but maybe there are still problems just from the need to update
pg_class).
        regards, tom lane


Re: Vacuum now uses AccessShareLock for analyze

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > The code will now vacuum all requested relations.  It will then analyze
> > each relation.  This way, all the exclusive vacuum work is done first,
> > then analyze can continue with shared locks.
> 
> I agree with Marc: it'd make more sense to do it one table at a time,
> ie,
>     get exclusive lock on table A
>     vacuum table A
>     commit, release lock
>     get shared lock on table A
>     gather stats for table A
>     commit, release lock
>     repeat sequence for table B
>     etc

OK, changed.

I will work on the additional issues in the next week or so.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Vacuum now uses AccessShareLock for analyze

From
Daniel Kalchev
Date:
While we are at VACUUM, it would be good idea to have VACUUM flag each 'vacuumed' table (with some sort of 'clean'
flag)- this will permit vaster vacuuming of mostly static databases, such that contain 'history' data in some tables.
 

Daniel



Re: Vacuum now uses AccessShareLock for analyze

From
Bruce Momjian
Date:
> > How separate do people want vacuum and analyze?  Analyze currently does
> > not record the number of tuples and pages, because vacuum does that.  Do
> > people want analyze as a separate command and in a separate file?
> 
> We definitely want a separate command that can invoke just the analyze
> part.  I'd guess something like "ANALYZE [ VERBOSE ] optional-table-name
> (optional-list-of-columns)" pretty much like VACUUM.

OK.

> 
> I would be inclined to move the code out to a new file, just because
> vacuum.c is so darn big, but that's purely a code-beautification issue.

Done.

> 
> On the number of tuples/pages issue, I'd suggest removing that function
> from plain vacuum and make the analyze part do it instead.  It's always
> made me uncomfortable that vacuum needs to update system relations while
> it's holding an exclusive lock on the table-being-vacuumed (which might
> be another system catalog, or even pg_class itself).  It works, more or
> less, but that update-tuple-in-place code is awfully ugly and
> fragile-looking.  I'm also worried that there could be deadlock
> scenarios between concurrent vacuums (eg, one guy working on pg_class,
> another on pg_statistic, both need to get in and update the other guy's
> table.  Oops.  That particular problem should be gone with your changes,
> but maybe there are still problems just from the need to update
> pg_class).

How do I find the number of pages from heapscan?  Can that number just
be computed from the file size.  I can get the block number of the last
entry in the scan, but that doesn't show me expired rows at the end.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Vacuum now uses AccessShareLock for analyze

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> How do I find the number of pages from heapscan?

IIRC, heap_beginscan updates the relcache entry with the current number
of blocks (as determined the hard way, with lseek).  Should be able to
use that, even though it might be a little bit out of date by the time
you finish the scan ...
        regards, tom lane