Thread: Missing CONCURRENT VACUUM (Was: Release notes for 8.1)

Missing CONCURRENT VACUUM (Was: Release notes for 8.1)

From
Hannu Krosing
Date:
Once more: 

I would like to get at least some answer, why my patch for enabling
concurrent VACUUM was left out from 8.1.

It was submitted well in time, and there was only minimal amount of
discussion of an earlier patch,and AFAIK I addressed all the issues
raised there.

I really hate to have to start maintaining my own patches for "24/7,
OLTP-enabled" postgreSQL, at least unless I know that there is a good
reason why the mainline postgreSQL can't have these.

-- 
Hannu Krosing <hannu@skype.net>



Re: Missing CONCURRENT VACUUM (Was: Release notes for 8.1)

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> Once more: 
> I would like to get at least some answer, why my patch for enabling
> concurrent VACUUM was left out from 8.1.

You did not respond to this:
http://archives.postgresql.org/pgsql-patches/2005-08/msg00238.php
        regards, tom lane


Re: Missing CONCURRENT VACUUM (Was: Release notes for 8.1)

From
Bruce Momjian
Date:
Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > Once more: 
> > I would like to get at least some answer, why my patch for enabling
> > concurrent VACUUM was left out from 8.1.
> 
> You did not respond to this:
> http://archives.postgresql.org/pgsql-patches/2005-08/msg00238.php

Yep, that's why I put it in the 8.2 queue.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On T, 2005-08-16 at 18:26 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > Once more: 
> > I would like to get at least some answer, why my patch for enabling
> > concurrent VACUUM was left out from 8.1.
> 
> You did not respond to this:
> http://archives.postgresql.org/pgsql-patches/2005-08/msg00238.php

Somehow this did not reach me :(

I'll answer this here:

> Bruce Momjian <pgman ( at ) candle ( dot ) pha ( dot ) pa ( dot ) us> writes:
> >> Is there any particular reason for not putting it in 8.1 ?
> 
> > I thought there was still uncertainty about the patch.  Is there?
> 
> Considerable uncertainty, in my mind.  What we've got here is some
> pretty fundamental hacking on the transaction visibility logic, and
> neither Hannu nor anyone else has produced a convincing argument
> that it's correct.  "It hasn't failed yet in my usage" isn't enough
> to give me a good feeling about it. 

Agreed.

>  Some specific concerns:
> 
> * Given that VACUUM ANALYZE does create new output tuples stamped with
> its xid, I'm unclear on what happens in pg_statistic with this code in
> place.  

Actually any VACUUM, not only VACUUM ANALYSE, updates pg_class at the end.
That's why I exclude only one of the transactions of the VACUUM command, and that 
transaction does not create any new tuples, it only removes old ones.

> It seems entirely possible that someone might conclude the
> analyze tuples are from a crashed transaction and mark them invalid
> before the analyze can commit (notice TransactionIdIsInProgress does not
> bother looking in PGPROC when the tuple xmin is less than RecentXmin).

Once more, only 2nd transaction of LAZY VACUUM is affected, and that one does 
only (heap scan + clean indexes + clean heap) and _only_ removes old tuples.

> * If the vacuum xact is older than what others think is the global xmin,
> it could have problems with other vacuums removing tuples it should
> still be able to see (presumably only in the system catalogs, so maybe
> this isn't an issue, but I'm unsure). 

The cleanup transaction does no lookups in system catalogs.

> A related scenario that I don't
> think can be dismissed is someone truncating off part of pg_subtrans or
> pg_multixact that the vacuum still needs.

In my patch I specifically exclude TruncateSUBTRANS from using the
inVacuum flag

At the time I originally submitted my patch, GetOldestXmin was only used
in VACUUM and CREATE INDEX, others had other means of getting oldest
Xmin, and these were not affected by my patch. When I reworked it before
last submit, i changed the only nwe use (in xlog.c, line 5165) to use a
new version of GetOldestXmin with an extra flag tu tell it to NOT exlude
transactions running vacuum.

The transaction running the heap scan/cleanup part of the vacuum command
only sets a new isVacuum flag, and this is only used by GetOldestXmin
functions. Other means of getting OldestXmin still get exactly the old
behaviour. GetOldestXmin() does exclude xmin's with isVacuum set only
when called by other VACUUMS. this is controlled by the (new) second
argument of GetOldestXmin().

So I think that both your concerns expressed here are _already_
addressed by the latest patch in:

http://archives.postgresql.org/pgsql-patches/2005-07/msg00086.php

Please check the actual patch and advise if anything is still missing.

-- 
Hannu Krosing <hannu@skype.net>




Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> On T, 2005-08-16 at 18:26 -0400, Tom Lane wrote:
>> Some specific concerns:
>> 
>> * Given that VACUUM ANALYZE does create new output tuples stamped with
>> its xid, I'm unclear on what happens in pg_statistic with this code in
>> place.  

> Actually any VACUUM, not only VACUUM ANALYSE, updates pg_class at the end.
> That's why I exclude only one of the transactions of the VACUUM command, and that 
> transaction does not create any new tuples, it only removes old ones.

vac_update_relstats isn't the issue because it doesn't create a new
tuple.  I was concerned about ANALYZE --- but since that's done in a
separate transaction that's not marked inVacuum, it's not at risk.  So
OK, that's all right.

>> * If the vacuum xact is older than what others think is the global xmin,
>> it could have problems with other vacuums removing tuples it should
>> still be able to see (presumably only in the system catalogs, so maybe
>> this isn't an issue, but I'm unsure). 

> The cleanup transaction does no lookups in system catalogs.

Oh?  It certainly has to open relations and indexes.  I think that all
of that stuff may be done with SnapshotNow, rather than an xmin-related
snap, but it's still nervous-making.

>> A related scenario that I don't
>> think can be dismissed is someone truncating off part of pg_subtrans or
>> pg_multixact that the vacuum still needs.

> In my patch I specifically exclude TruncateSUBTRANS from using the
> inVacuum flag

You missed vac_truncate_clog, though.

> So I think that both your concerns expressed here are _already_
> addressed by the latest patch in:
> http://archives.postgresql.org/pgsql-patches/2005-07/msg00086.php

I have to admit that in my earlier message, I was looking at the version
of the patch that Bruce had on his patch page --- which I now see was not
the latest.  The idea of making GetOldestXmin only conditionally ignore
vacuums certainly makes it a lot safer.

> Please check the actual patch and advise if anything is still missing.

There's still a fair amount of breakage in this patch --- eg, in the
VACUUM FULL case it manages to invoke *both* full_vacuum_rel and
lazy_vacuum_rel --- but I think it can probably be made to work.
I'll take another pass at it.
        regards, tom lane


Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> Please check the actual patch and advise if anything is still missing.

While testing this I realized that it does not in fact work as
advertised.  It will only exclude long-running VACUUMs from other
VACUUMs' OldestXmin if *all* the transactions in the system are lazy
VACUUMs.  If there is even one regular transaction in the system,
that transaction will include the VACUUMs in its MyProc->xmin, and
thence GetOldestXmin will have to include them in its result.

AFAICS the only way to fix this would be to exclude inVacuum
transactions from GetSnapshotData's calculations as well.  That
makes the patch far more invasive, and I'm not confident I can work
out all the implications.  (In particular, the consequences for
TransactionIdIsInProgress look bad.  I don't think we want a VACUUM
to be seen as not-in-progress.)

So I'm bouncing this patch again...
        regards, tom lane


Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:
>> While testing this I realized that it does not in fact work as
>> advertised.  It will only exclude long-running VACUUMs from other
>> VACUUMs' OldestXmin if *all* the transactions in the system are lazy
>> VACUUMs.  If there is even one regular transaction in the system,
>> that transaction will include the VACUUMs in its MyProc->xmin, and
>> thence GetOldestXmin will have to include them in its result.

> Only if these regular transactions are running in SERIALIZABLE isolation
> level, else MyProc->xmin is not set inside GetSnapshotData.

Better read the code again.  The first snap in *any* transaction sets
MyProc->xmin.
        regards, tom lane


Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > Please check the actual patch and advise if anything is still missing.
> 
> While testing this I realized that it does not in fact work as
> advertised.  It will only exclude long-running VACUUMs from other
> VACUUMs' OldestXmin if *all* the transactions in the system are lazy
> VACUUMs.  If there is even one regular transaction in the system,
> that transaction will include the VACUUMs in its MyProc->xmin, and
> thence GetOldestXmin will have to include them in its result.

Only if these regular transactions are running in SERIALIZABLE isolation
level, else MyProc->xmin is not set inside GetSnapshotData. As my
transactions mostly run in READ COMMITTED mode I did not care about
those and later forgot about it.

> AFAICS the only way to fix this would be to exclude inVacuum
> transactions from GetSnapshotData's calculations as well.  That
> makes the patch far more invasive, and I'm not confident I can work
> out all the implications.  (In particular, the consequences for
> TransactionIdIsInProgress look bad.  I don't think we want a VACUUM
> to be seen as not-in-progress.)
> 
> So I'm bouncing this patch again...

Please don't. 

Even with current functionality it is part of solving the problem of
being able to vacuum small tables while vacuuming big ones at the same
time. In a scenario, where I use it, there are lot of OLTP (30-50ms)
transactions. These are run in READ COMMITTED mode, but even if I needed
them to be in SERIALIZABLE mode (or my reasoning about MyProc->xmin is
wrong), I can find (or force if needed) a few ms window where no other
transaction is running to start my VACUUM LAZY in a mode it can actually
clean up the tables.

This is still much better than not being able to do it at all. I'm ready
to write the documentation explaining it all in detail to those really
needing it.

-- 
Hannu Krosing <hannu@skype.net>



Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On K, 2005-08-17 at 16:45 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:
> >> While testing this I realized that it does not in fact work as
> >> advertised.  It will only exclude long-running VACUUMs from other
> >> VACUUMs' OldestXmin if *all* the transactions in the system are lazy
> >> VACUUMs.  If there is even one regular transaction in the system,
> >> that transaction will include the VACUUMs in its MyProc->xmin, and
> >> thence GetOldestXmin will have to include them in its result.
> 
> > Only if these regular transactions are running in SERIALIZABLE isolation
> > level, else MyProc->xmin is not set inside GetSnapshotData.
> 
> Better read the code again.  The first snap in *any* transaction sets
> MyProc->xmin.

Can't find the place :(

Could you point to the file / function that does this.

-- 
Hannu Krosing <hannu@skype.net>



Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On K, 2005-08-17 at 23:58 +0300, Hannu Krosing wrote:
> On K, 2005-08-17 at 16:45 -0400, Tom Lane wrote:
> > Hannu Krosing <hannu@skype.net> writes:
> > > On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:
> > >> While testing this I realized that it does not in fact work as
> > >> advertised.  It will only exclude long-running VACUUMs from other
> > >> VACUUMs' OldestXmin if *all* the transactions in the system are lazy
> > >> VACUUMs.  If there is even one regular transaction in the system,
> > >> that transaction will include the VACUUMs in its MyProc->xmin, and
> > >> thence GetOldestXmin will have to include them in its result.
> > 
> > > Only if these regular transactions are running in SERIALIZABLE isolation
> > > level, else MyProc->xmin is not set inside GetSnapshotData.
> > 
> > Better read the code again.  The first snap in *any* transaction sets
> > MyProc->xmin.
> 
> Can't find the place :(
> 
> Could you point to the file / function that does this.

Nevermind. Found it - GetTransactionSnapshot() always gets a
serializable snapshot first.

-- 
Hannu Krosing <hannu@tm.ee>


Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On E, 2005-08-22 at 00:36 +0300, Hannu Krosing wrote:
> On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:
> > Hannu Krosing <hannu@skype.net> writes:
> > > Please check the actual patch and advise if anything is still missing.
> > 
> > While testing this I realized that it does not in fact work as
> > advertised.  It will only exclude long-running VACUUMs from other
> > VACUUMs' OldestXmin if *all* the transactions in the system are lazy
> > VACUUMs.  If there is even one regular transaction in the system,
> > that transaction will include the VACUUMs in its MyProc->xmin, and
> > thence GetOldestXmin will have to include them in its result.
> 
> Thanks for spotting it. I guess in my case it was performing more or
> less as expected due to my load fluctuating between 0 and 300 concurrent
> transaction each minute, so the vacuum hit the 0 point often enough to
> be useful. I am running it in a loop, with 15 sec sleep between each
> vacuum.
> 
> > AFAICS the only way to fix this would be to exclude inVacuum
> > transactions from GetSnapshotData's calculations as well. 
> 
> I fixed this in a more local way by adding an extra "xmin" to proc for
> transactions where inVacuum is false (proc->nonInVacuumXmin) which is
> calculated together with proc->xmin.

Somehow this still did not fix the issue of getting the xmin of long
vacuum through concurrent transactions.

My initial testing was wrong.

Probably you should ignore the whole thing until I figure out why this
does not work ...

-- 
Hannu Krosing <hannu@tm.ee>


Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On E, 2005-08-22 at 01:14 +0300, Hannu Krosing wrote:

> > I fixed this in a more local way by adding an extra "xmin" to proc for
> > transactions where inVacuum is false (proc->nonInVacuumXmin) which is
> > calculated together with proc->xmin.
> 
> Somehow this still did not fix the issue of getting the xmin of long
> vacuum through concurrent transactions.

Actually it seems to work, if the "other" transaction actually does
something. It does not work when the only thing the other transactions
has doen is BEGIN WORK;

> My initial testing was wrong.

And my second testing was wrong too!

I tested in the following way (

create bigtable (i serial primary key, t text);
-- filled bigtable with 8 M rows
create tinytable(i serial primary key, t text);
insert into tinytable select * from bigtable limit 50;

opened 3 psql connections and did the following

1) vacuum bigtable;
2) update tinytable set t = random(); -- this creates dead tuples in                                     -- parallel
withvacuum bigtable
 
3) begin;
2) vacuum verbose tinytable;  -- cant free tuples
3) select 1;
2) vacuum verbose tinytable; -- can free the tuples

+ 30 sec

1) vacuum bigtable finishes


initially I just did begin and thought that concurrent transactions are
still blocking. 

When I added a notice inside GetSnapshotData, I saw that it was not
called by plain BEGIN WORK;

Probably nonInVacuumXmin needs more care, i.e. initialising and setting
it outside GetSnapshotData, at trx start and/or end. I'm too sleepy now
to investigate further (it's 2:10 am here).

> Probably you should ignore the whole thing until I figure out why this
> does not work ...

Please try to apply the patch, even if dangling BEGIN WORK; still causes 
problems - I'll fix this tomorrow.

I'd really like to see this in 8.1.

-- 
Hannu Krosing <hannu@tm.ee>


Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On E, 2005-08-22 at 00:36 +0300, Hannu Krosing wrote:
> On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:
> > That
> > makes the patch far more invasive, and I'm not confident I can work
> > out all the implications.  (In particular, the consequences for
> > TransactionIdIsInProgress look bad.  I don't think we want a VACUUM
> > to be seen as not-in-progress.)
> 
> The way the attached patch does it should hopefully not have these
> implications.
> 
> > So I'm bouncing this patch again...
> 
> Please check the attached patch and apply if possible. 
> 
> I also think this is a good time for this change as some people are
> already doing extensive concurrent vacuum testing due to the t_ctid
> chain fix, so this one would get all the testing for free.
> 
> This patch is against CVS checkout only a few hours old.

Alas, I just noticed that it still runs *both* full_vacuum_rel and
lazy_vacuum_rel due to missing {}.

And probably misses vac_truncate_clog use of inVacuum (have to check
that).

Sorry for confusion.

-- 
Hannu Krosing <hannu@tm.ee>


Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On E, 2005-08-22 at 02:14 +0300, Hannu Krosing wrote:
> Probably nonInVacuumXmin needs more care, i.e. initialising and setting
> it outside GetSnapshotData, at trx start and/or end. I'm too sleepy now
> to investigate further (it's 2:10 am here).

The attached patch works now as advertized so that concurrent non-vacuum
transactions don't interfere with vacuums OldestXmin calculations.

I fixed also the "VACUUM FULL case it manages to invoke *both*
full_vacuum_rel and lazy_vacuum_rel " bug.

But I could not find the breakage (from your Aug 17 email) with

> > In my patch I specifically exclude TruncateSUBTRANS from using the
> > inVacuum flag
>
> You missed vac_truncate_clog, though.

Has somethoing changed in vac_truncate_clog or am I just too
lazy/stupid ?

--
Hannu Krosing <hannu@skype.net>

Attachment

Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> Please try to apply the patch, even if dangling BEGIN WORK; still causes 
> problems - I'll fix this tomorrow.

No.  A patch that you yourself have so little confidence in, in a
fundamental part of the system?

This will be lucky if it gets into 8.2, after considerably more review
than I have time to give it now.  It clearly needs such review, and we
are way too late with the beta already.
        regards, tom lane


Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On E, 2005-08-22 at 10:03 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > Please try to apply the patch, even if dangling BEGIN WORK; still causes 
> > problems - I'll fix this tomorrow.
> 
> No.  A patch that you yourself have so little confidence in, in a
> fundamental part of the system?

Yeah, I know I sound flaky. Never should I submit patches when
terminally tired :( . I had just recently read something about starting
beta on monday and wanted to get it in before that.

I had quite much confidence in it not breaking anything (at least not
anything more than what the previous crippled patch did), I had just to
fix the parts regarding initialising nonInVacuumXmin, which is done in
todays patch.

> This will be lucky if it gets into 8.2, after considerably more review
> than I have time to give it now.  It clearly needs such review, and we
> are way too late with the beta already.

When I started with the patch a long time ago, GetOldestXmin was used in
less places than it is now. Since then more functionality seems to have
started using it without paying attention that it is used for two
logically different things.

I hope that some care will be taken in future to distinguish between
"real" transactions (those that modify data and whose actions can be
rollbacked) and vacuum (and maybe some other maintenance tasks) where
transaction is used as a programming convenience without any real
transaction-like behaviour present.

Getting my todays patch in would be a way to make sure this distinction
will get more attention.

-- 
Hannu Krosing <hannu@tm.ee>



Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> But I could not find the breakage (from your Aug 17 email) with 

>> You missed vac_truncate_clog, though.

That was fixed (and documented), along with some other problems,
in the modified patch I sent back to you:

http://archives.postgresql.org/pgsql-patches/2005-08/msg00260.php

AFAICT you ignored that ...
        regards, tom lane


Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On E, 2005-08-22 at 14:05 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > But I could not find the breakage (from your Aug 17 email) with 
> 
> >> You missed vac_truncate_clog, though.
> 
> That was fixed (and documented), along with some other problems,
> in the modified patch I sent back to you:
> 
> http://archives.postgresql.org/pgsql-patches/2005-08/msg00260.php
> 
> AFAICT you ignored that ...

This is the second mail you refer to that I have not received, neithet
through list nor directly.

I must make a serious inquiry into my mailhost to see what is going on.

Or start looking for your answers in exclusively in archives :(

Thanks. I'll check that.

-- 
Hannu Krosing <hannu@skype.net>



Re: Missing CONCURRENT VACUUM (Was: Release notes for

From
Hannu Krosing
Date:
On E, 2005-08-22 at 23:17 +0300, Hannu Krosing wrote:
> On E, 2005-08-22 at 14:05 -0400, Tom Lane wrote:
> > Hannu Krosing <hannu@skype.net> writes:
> > > But I could not find the breakage (from your Aug 17 email) with 
> > 
> > >> You missed vac_truncate_clog, though.
> > 
> > That was fixed (and documented), along with some other problems,
> > in the modified patch I sent back to you:
> > 
> > http://archives.postgresql.org/pgsql-patches/2005-08/msg00260.php
> > 
> > AFAICT you ignored that ...
>
> I must make a serious inquiry into my mailhost to see what is going on.

Again I answered too quickly. I see it now. 

Your answer was to pgsql-patches list and to my terminally spam-infested
old address hannu@tm.ee, whereas I expected it for some reason as a
reply to my hannu@skype.net address and/or pgsql-hackers list from/to
which the last mails were sent

Sorry, must look more aggressively in future :)

-- 
Hannu Krosing <hannu@tm.ee>