Thread: Vacuum now uses AccessShareLock for analyze
I have changed vacuum so analyze now uses AccessShareLock. (Is this the proper lock for analyze?) The code will now vacuum all requested relations. It will then analyze each relation. This way, all the exclusive vacuum work is done first, then analyze can continue with shared locks. The code is much clearer with that functionality split into separate functions. How separate do people want vacuum and analyze? Analyze currently does not record the number of tuples and pages, because vacuum does that. Do people want analyze as a separate command and in a separate file? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Mon, 29 May 2000, Bruce Momjian wrote: > I have changed vacuum so analyze now uses AccessShareLock. (Is this the > proper lock for analyze?) > > The code will now vacuum all requested relations. It will then analyze > each relation. This way, all the exclusive vacuum work is done first, > then analyze can continue with shared locks. hrmmm, here's a thought ... why not vacuum->analyze each relation in order? the 'exclusive lock' will prevent anyone from reading, so do a relation, release the lock to analyze that relation and let ppl access the database ... then do the next ... instead of doing an exclusive lock for the duration of the whole database ...
> On Mon, 29 May 2000, Bruce Momjian wrote: > > > I have changed vacuum so analyze now uses AccessShareLock. (Is this the > > proper lock for analyze?) > > > > The code will now vacuum all requested relations. It will then analyze > > each relation. This way, all the exclusive vacuum work is done first, > > then analyze can continue with shared locks. > > hrmmm, here's a thought ... why not vacuum->analyze each relation in > order? the 'exclusive lock' will prevent anyone from reading, so do a > relation, release the lock to analyze that relation and let ppl access the > database ... then do the next ... instead of doing an exclusive lock for > the duration of the whole database ... No, each table is locked one at a time. We do all the single-table locks first so the rest is all shared access. Does that make sense? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Mon, 29 May 2000, Bruce Momjian wrote: > > On Mon, 29 May 2000, Bruce Momjian wrote: > > > > > I have changed vacuum so analyze now uses AccessShareLock. (Is this the > > > proper lock for analyze?) > > > > > > The code will now vacuum all requested relations. It will then analyze > > > each relation. This way, all the exclusive vacuum work is done first, > > > then analyze can continue with shared locks. > > > > hrmmm, here's a thought ... why not vacuum->analyze each relation in > > order? the 'exclusive lock' will prevent anyone from reading, so do a > > relation, release the lock to analyze that relation and let ppl access the > > database ... then do the next ... instead of doing an exclusive lock for > > the duration of the whole database ... > > No, each table is locked one at a time. We do all the single-table > locks first so the rest is all shared access. Does that make sense? its what I suspected, but my point was that if we did the ANALYZE for the relation right after the VACUUM for it, there would be a period of time where readers could come in and process ... think of it as a 'breather' before the next VACUUM starts, vs just jumping into the next ... Overall time for doing the vacuum shouldn't be any longer, but it would give gaps where readers could get in and out ... we're a relational database, so I imagine ppl are doing JOINs ... if RelationA is locked while ReaderA is trying to doign a JOIN between RA and RB, ReaderA is gonna be screwed ... if we did a quick ANALZE between RelationA and RelationB, then ReaderA would have a chance to do its processing while the ANALYZE is running, instead of having to wait for both RelationA and RelationB to be finished ...
> its what I suspected, but my point was that if we did the ANALYZE for the > relation right after the VACUUM for it, there would be a period of time > where readers could come in and process ... think of it as a 'breather' > before the next VACUUM starts, vs just jumping into the next ... > > Overall time for doing the vacuum shouldn't be any longer, but it would > give gaps where readers could get in and out ... we're a relational > database, so I imagine ppl are doing JOINs ... if RelationA is locked > while ReaderA is trying to doign a JOIN between RA and RB, ReaderA is > gonna be screwed ... if we did a quick ANALZE between RelationA and > RelationB, then ReaderA would have a chance to do its processing while the > ANALYZE is running, instead of having to wait for both RelationA and > RelationB to be finished ... Good point. Because we are only doing one table at a time, they could get in and do something, but they could also get part-way in and have another table locked, and since we are only locking one at a time, it seemed better to get it all done first. Comments? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > The code will now vacuum all requested relations. It will then analyze > each relation. This way, all the exclusive vacuum work is done first, > then analyze can continue with shared locks. I agree with Marc: it'd make more sense to do it one table at a time, ie,get exclusive lock on table Avacuum table Acommit, release lockget shared lock on table Agather stats for table Acommit,release lockrepeat sequence for table Betc > The code is much clearer with that functionality split into separate > functions. Wouldn't surprise me. > How separate do people want vacuum and analyze? Analyze currently does > not record the number of tuples and pages, because vacuum does that. Do > people want analyze as a separate command and in a separate file? We definitely want a separate command that can invoke just the analyze part. I'd guess something like "ANALYZE [ VERBOSE ] optional-table-name (optional-list-of-columns)" pretty much like VACUUM. I would be inclined to move the code out to a new file, just because vacuum.c is so darn big, but that's purely a code-beautification issue. On the number of tuples/pages issue, I'd suggest removing that function from plain vacuum and make the analyze part do it instead. It's always made me uncomfortable that vacuum needs to update system relations while it's holding an exclusive lock on the table-being-vacuumed (which might be another system catalog, or even pg_class itself). It works, more or less, but that update-tuple-in-place code is awfully ugly and fragile-looking. I'm also worried that there could be deadlock scenarios between concurrent vacuums (eg, one guy working on pg_class, another on pg_statistic, both need to get in and update the other guy's table. Oops. That particular problem should be gone with your changes, but maybe there are still problems just from the need to update pg_class). regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > > The code will now vacuum all requested relations. It will then analyze > > each relation. This way, all the exclusive vacuum work is done first, > > then analyze can continue with shared locks. > > I agree with Marc: it'd make more sense to do it one table at a time, > ie, > get exclusive lock on table A > vacuum table A > commit, release lock > get shared lock on table A > gather stats for table A > commit, release lock > repeat sequence for table B > etc OK, changed. I will work on the additional issues in the next week or so. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
While we are at VACUUM, it would be good idea to have VACUUM flag each 'vacuumed' table (with some sort of 'clean' flag)- this will permit vaster vacuuming of mostly static databases, such that contain 'history' data in some tables. Daniel
> > How separate do people want vacuum and analyze? Analyze currently does > > not record the number of tuples and pages, because vacuum does that. Do > > people want analyze as a separate command and in a separate file? > > We definitely want a separate command that can invoke just the analyze > part. I'd guess something like "ANALYZE [ VERBOSE ] optional-table-name > (optional-list-of-columns)" pretty much like VACUUM. OK. > > I would be inclined to move the code out to a new file, just because > vacuum.c is so darn big, but that's purely a code-beautification issue. Done. > > On the number of tuples/pages issue, I'd suggest removing that function > from plain vacuum and make the analyze part do it instead. It's always > made me uncomfortable that vacuum needs to update system relations while > it's holding an exclusive lock on the table-being-vacuumed (which might > be another system catalog, or even pg_class itself). It works, more or > less, but that update-tuple-in-place code is awfully ugly and > fragile-looking. I'm also worried that there could be deadlock > scenarios between concurrent vacuums (eg, one guy working on pg_class, > another on pg_statistic, both need to get in and update the other guy's > table. Oops. That particular problem should be gone with your changes, > but maybe there are still problems just from the need to update > pg_class). How do I find the number of pages from heapscan? Can that number just be computed from the file size. I can get the block number of the last entry in the scan, but that doesn't show me expired rows at the end. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > How do I find the number of pages from heapscan? IIRC, heap_beginscan updates the relcache entry with the current number of blocks (as determined the hard way, with lseek). Should be able to use that, even though it might be a little bit out of date by the time you finish the scan ... regards, tom lane