Thread: Re: [PATCHES] The vacuum-ignore-vacuum patch

Re: [PATCHES] The vacuum-ignore-vacuum patch

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Hannu Krosing asked me about his patch to ignore transactions running
> VACUUM LAZY in other vacuum transactions.  I attach a version of the
> patch updated to the current sources.

nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
version of the computation?

In general, it seems to me that a transaction running lazy vacuum could
be ignored for every purpose except truncating clog/subtrans.  Since it
will never insert its own XID into the database (note: VACUUM ANALYZE is
run as two separate transactions, hence the pg_statistic rows inserted
by ANALYZE are not a counterexample), there's no need for anyone to
include it as running in their snapshots.  So unless I'm missing
something, this is a safe change for lazy vacuum, but perhaps not for
full vacuum, which *does* put its XID into the database.
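A minimal sketch of this idea, written in Python rather than PostgreSQL's C and built on a toy model. Each backend is reduced to its XID plus a hypothetical `in_vacuum` flag, and a snapshot is reduced to (xmin, xmax, running set). The sketch builds a snapshot that simply leaves lazy-vacuum XIDs out of the running set. None of these names are the actual PostgreSQL structures.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Backend:
    xid: Optional[int]       # the backend's own XID, None if none was assigned
    in_vacuum: bool = False  # hypothetical flag: True while running lazy VACUUM


@dataclass(frozen=True)
class Snapshot:
    xmin: int                # every XID below xmin is finished
    xmax: int                # every XID at or above xmax is not yet visible
    running: FrozenSet[int]  # XIDs treated as still in progress


def take_snapshot(backends: List[Backend], next_xid: int) -> Snapshot:
    """Build a snapshot that ignores backends running lazy VACUUM.

    Lazy vacuum never stores its own XID in any tuple, so no tuple's
    visibility can depend on whether that XID is considered running.
    """
    running = frozenset(
        b.xid for b in backends if b.xid is not None and not b.in_vacuum
    )
    return Snapshot(xmin=min(running, default=next_xid), xmax=next_xid, running=running)


if __name__ == "__main__":
    backends = [Backend(xid=100, in_vacuum=True), Backend(xid=105)]
    print(take_snapshot(backends, next_xid=110))
```

Without the flag, the vacuum's XID 100 would hold xmin at 100 for as long as the vacuum runs. As the message notes, the same shortcut would not be safe for VACUUM FULL, which does write its XID into tuples.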

A possible objection to this is that it would foreclose running VACUUM
and ANALYZE as a single transaction, exactly because of the point that
we couldn't insert pg_statistic rows using a lazy vacuum's XID.  I think
there was some discussion of doing that in connection with enlarging
ANALYZE's sample greatly --- if ANALYZE goes back to being a full scan
or nearly so, it'd sure be nice to combine it with the VACUUM scan.
However maybe we should just accept that as the price of not having
multiple vacuums interfere with each other.

            regards, tom lane

Re: [PATCHES] The vacuum-ignore-vacuum patch

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Hannu Krosing asked me about his patch to ignore transactions running
> > VACUUM LAZY in other vacuum transactions.  I attach a version of the
> > patch updated to the current sources.
>
> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
> version of the computation?

Hmm ... I remember removing a now-useless variable somewhere, but maybe
this one escaped me.  I don't have the code handy -- will check.

> In general, it seems to me that a transaction running lazy vacuum could
> be ignored for every purpose except truncating clog/subtrans.  Since it
> will never insert its own XID into the database (note: VACUUM ANALYZE is
> run as two separate transactions, hence the pg_statistic rows inserted
> by ANALYZE are not a counterexample), there's no need for anyone to
> include it as running in their snapshots.  So unless I'm missing
> something, this is a safe change for lazy vacuum, but perhaps not for
> full vacuum, which *does* put its XID into the database.

But keep in mind that in the current code, clog truncation takes
relminxid (actually datminxid) into account, not running transactions,
so AFAICS this shouldn't affect anything.

Subtrans truncation is different and it certainly should consider lazy
vacuum's Xids.
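The two truncation cutoffs described in this message can be sketched in Python as follows. The names are hypothetical and this is a toy model, not the real truncation code. Clog is bounded by the oldest datminxid. Subtrans is bounded by the xmin of every running transaction, lazy vacuums included.

```python
from typing import Iterable, Tuple


def clog_cutoff(datminxids: Iterable[int]) -> int:
    """Oldest XID whose clog entry must be kept: the minimum datminxid.

    Running transactions do not enter this computation at all, so
    ignoring lazy vacuums elsewhere cannot move this cutoff.
    """
    return min(datminxids)


def subtrans_cutoff(backends: Iterable[Tuple[int, bool]], next_xid: int) -> int:
    """Oldest XID whose subtrans entry must be kept.

    `backends` holds (xmin, in_vacuum) pairs. The in_vacuum flag is
    deliberately ignored: as argued here, subtrans truncation must
    still respect lazy vacuums.
    """
    return min((xmin for xmin, _in_vacuum in backends), default=next_xid)
```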

> A possible objection to this is that it would foreclose running VACUUM
> and ANALYZE as a single transaction, exactly because of the point that
> we couldn't insert pg_statistic rows using a lazy vacuum's XID.  I think
> there was some discussion of doing that in connection with enlarging
> ANALYZE's sample greatly --- if ANALYZE goes back to being a full scan
> or nearly so, it'd sure be nice to combine it with the VACUUM scan.
> However maybe we should just accept that as the price of not having
> multiple vacuums interfere with each other.

Hmm, what about having a single scan for both, and then starting a
normal transaction just for the sake of inserting the pg_statistic
tuple?

I think the interactions of Xids and vacuum and other stuff are starting
to get complex; IMHO it warrants having a README.vacuum, or something.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [PATCHES] The vacuum-ignore-vacuum patch

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> A possible objection to this is that it would foreclose running VACUUM
>> and ANALYZE as a single transaction, exactly because of the point that
>> we couldn't insert pg_statistic rows using a lazy vacuum's XID.

> Hmm, what about having a single scan for both, and then starting a
> normal transaction just for the sake of inserting the pg_statistic
> tuple?

We could, but I think memory consumption would be the issue.  VACUUM
wants a lotta memory for the dead-TIDs array, ANALYZE wants a lot for
its statistics gathering ... even more if it's trying to take a larger
sample than before.  (This is probably why we kept them separate in
the last rewrite.)
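Here is a back-of-the-envelope sketch of the VACUUM side of that memory budget. It assumes, as in lazy vacuum of this era, that the dead-TID array is sized from maintenance_work_mem, and that each entry is a 6-byte ItemPointerData (a 4-byte block number plus a 2-byte offset). The 16 MB figure is only an example value.

```python
TID_SIZE_BYTES = 6  # sizeof(ItemPointerData): 4-byte block number + 2-byte line pointer offset


def max_dead_tids(mem_budget_kb: int) -> int:
    """Number of dead-tuple TIDs that fit in a memory budget given in kB."""
    return mem_budget_kb * 1024 // TID_SIZE_BYTES


if __name__ == "__main__":
    # Example: a 16 MB budget holds roughly 2.8 million dead-tuple TIDs.
    print(max_dead_tids(16 * 1024))
```

A combined VACUUM/ANALYZE pass would have to fit ANALYZE's sample rows alongside this array, which is the concern raised in the message.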

> I think the interactions of Xids and vacuum and other stuff are starting
> to get complex; IMHO it warrants having a README.vacuum, or something.

Go for it ...

            regards, tom lane

Re: The vacuum-ignore-vacuum patch

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Hannu Krosing asked me about his patch to ignore transactions running
> > VACUUM LAZY in other vacuum transactions.  I attach a version of the
> > patch updated to the current sources.
> 
> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
> version of the computation?

Hmm, not useless at all really -- only a bug of mine.  Turns out the
notInVacuumXmin stuff is essential, so I put it back.

I noticed something however -- in calculating the OldestXmin we always
consider all DBs, even though there is a parameter for skipping backends
not in the current DB -- this is because the Xmin we store in PGPROC is
always computed using all backends.  The allDbs parameter only allows us
to skip the Xid of a transaction running elsewhere, but this is not very
helpful because the Xmin of transactions running in the local DB will
include those foreign Xids.

In case I'm not explaining myself clearly: the problem is that if I open
a transaction in database A and then vacuum a table in database B, tuples
in that table deleted after the transaction in database A started cannot
be removed.

To solve this problem, one idea is to change the new member of PGPROC to
"current database's not in vacuum Xmin", which is the minimum of Xmins
of backends running in my database which are not executing a lazy
vacuum.  This can be used to vacuum non-shared relations.

We could either add it anew, beside nonInVacuumXmin, or replace
nonInVacuumXmin.  The difference will be whether we will have something
to be used to vacuum shared relations or not.  I think in general,
shared relations are not vacuumed much so it shouldn't be too much of a
problem if we leave them to be vacuumed with the regular, all-databases,
include-vacuum Xmin.
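The problem and the proposed per-database value can be sketched with a toy Python model. `Proc`, `local_xmin`, and both helpers are hypothetical stand-ins, not PGPROC fields. Each backend advertises a global xmin computed over all databases, vacuums included. Under the proposal, it also advertises a current-database xmin that skips lazy vacuums. A vacuum with allDbs=false skips backends of other databases, but it still inherits their XIDs through the global xmin of local backends. Shared relations would keep using the all-databases value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proc:
    db: str
    xid: Optional[int] = None
    in_vacuum: bool = False
    xmin: Optional[int] = None        # all databases, vacuums included
    local_xmin: Optional[int] = None  # hypothetical: current database, lazy vacuums skipped


def set_xmins(procs: List[Proc], me: Proc, next_xid: int) -> None:
    """Fill in the xmins `me` would advertise after taking a snapshot."""
    glob = loc = next_xid
    for p in procs:
        if p.xid is None:
            continue
        glob = min(glob, p.xid)
        if p.db == me.db and not p.in_vacuum:
            loc = min(loc, p.xid)
    me.xmin, me.local_xmin = glob, loc


def oldest_xmin(procs: List[Proc], my_db: str, next_xid: int,
                all_dbs: bool, use_local: bool = False) -> int:
    """Cutoff a vacuum in `my_db` would use."""
    result = next_xid
    for p in procs:
        if not all_dbs and p.db != my_db:
            continue  # allDbs=false: skip backends of other databases
        if use_local and p.in_vacuum:
            continue  # proposal: lazy vacuums do not hold each other back
        if p.xid is not None:
            result = min(result, p.xid)
        x = p.local_xmin if use_local else p.xmin
        if x is not None:
            result = min(result, x)
    return result


if __name__ == "__main__":
    a = Proc("a", xid=100)  # long-running transaction in database A
    b = Proc("b", xid=105)  # transaction in database B, started later
    procs = [a, b]
    for p in procs:
        set_xmins(procs, p, next_xid=110)
    print(oldest_xmin(procs, "b", 110, all_dbs=False))                  # held back via b.xmin
    print(oldest_xmin(procs, "b", 110, all_dbs=False, use_local=True))  # proposal
```

With the global value, database A's XID 100 leaks into the cutoff through `b.xmin` even though backend `a` itself is skipped. With the local value the cutoff becomes 105.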

The other POV is that we don't really care about long-running
transactions in other databases unless they are lazy vacuums, a case which
is appropriately covered by the patch as it currently stands.  This seems
to be the POV that Hannu takes: the only long-running transactions he
cares about are lazy vacuums.

Thoughts?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: The vacuum-ignore-vacuum patch

From
Hannu Krosing
Date:
On Thu, 2006-07-27 at 19:29, Alvaro Herrera wrote:

> 
> We could either add it anew, beside nonInVacuumXmin, or replace
> nonInVacuumXmin.  The difference will be whether we will have something
> to be used to vacuum shared relations or not.  I think in general,
> shared relations are not vacuumed much so it shouldn't be too much of a
> problem if we leave them to be vacuumed with the regular, all-databases,
> include-vacuum Xmin.

Yes. I don't think that vacuuming shared relations will ever be a
significant performance concern.

> The other POV is that we don't really care about long-running
> transactions in other databases unless they are lazy vacuums, a case which
> is appropriately covered by the patch as it currently stands.  This seems
> to be the POV that Hannu takes: the only long-running transactions he
> cares about are lazy vacuums.

Yes. The original target audience of this patch is users running 24/7
OLTP databases with big slow changing tables and small fast-changing
tables which need to stay small even at the time when the big ones are
vacuumed.

The other possible transactions which _could_ possibly be ignored while
VACUUMING are those from ANALYSE and non-lazy VACUUMs.

I don't care about them because:
 - ANALYSE is relatively fast, even on huge tables, and thus can be ignored.
 - If you do run VACUUM FULL on anything bigger than a few thousand rows,
   then you are not running a 24/7 OLTP database anyway.
 - I also can't see a use case for an OLTP database where VACUUM FREEZE is
   required.


Maybe we could also start ignoring the transactions that are running the
new CONCURRENT CREATE INDEX command, as it also runs inside its own
transaction(s) which can't possibly touch the tuples in the table being
vacuumed as it locks out VACUUM on the indexed table.

That would probably be quite easy to do by just having CONCURRENT CREATE
INDEX also mark its transactions as ignorable by VACUUM. Maybe the
variable name for that (proc->inVacuum) needs to be changed to something
like trxSafeToIgnoreByVacuum.


-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: The vacuum-ignore-vacuum patch

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
>> version of the computation?

> Hmm, not useless at all really -- only a bug of mine.  Turns out the
> notInVacuumXmin stuff is essential, so I put it back.

Uh, why?

> I noticed something however -- in calculating the OldestXmin we always
> consider all DBs, even though there is a parameter for skipping backends
> not in the current DB -- this is because the Xmin we store in PGPROC is
> always computed using all backends.  The allDbs parameter only allows us
> to skip the Xid of a transaction running elsewhere, but this is not very
> helpful because the Xmin of transactions running in the local DB will
> include those foreign Xids.

Yeah, this has been recognized for some time.  However the overhead of
calculating local and global xmins in *every* transaction start is a
significant reason not to do it.
        regards, tom lane


Re: The vacuum-ignore-vacuum patch

From
Bruce Momjian
Date:
Another idea Jan had today was whether we could vacuum more rows if a
long-running backend is in serializable mode, like pg_dump.

---------------------------------------------------------------------------

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
> >> version of the computation?
> 
> > Hmm, not useless at all really -- only a bug of mine.  Turns out the
> > notInVacuumXmin stuff is essential, so I put it back.
> 
> Uh, why?
> 
> > I noticed something however -- in calculating the OldestXmin we always
> > consider all DBs, even though there is a parameter for skipping backends
> > not in the current DB -- this is because the Xmin we store in PGPROC is
> > always computed using all backends.  The allDbs parameter only allows us
> > to skip the Xid of a transaction running elsewhere, but this is not very
> > helpful because the Xmin of transactions running in the local DB will
> > include those foreign Xids.
> 
> Yeah, this has been recognized for some time.  However the overhead of
> calculating local and global xmins in *every* transaction start is a
> significant reason not to do it.
> 
>             regards, tom lane
> 

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: The vacuum-ignore-vacuum patch

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
> >> version of the computation?
> 
> > Hmm, not useless at all really -- only a bug of mine.  Turns out the
> > notInVacuumXmin stuff is essential, so I put it back.
> 
> Uh, why?

Because it's used to determine the Xmin that our vacuum will use.  If
there is a transaction whose Xmin calculation included the Xid of a
transaction running vacuum, we have gained nothing from directly
excluding said vacuum's Xid, because it will affect us anyway indirectly
via that transaction's Xmin.
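The indirect effect can be shown with a toy Python model. The names are hypothetical. `not_in_vacuum_xmin` is modeled on the value this thread calls nonInVacuumXmin (also spelled notInVacuumXmin). Vacuum v1 (XID 100) is running when ordinary transaction t (XID 105) takes its snapshot. A later vacuum v2 skips both vacuums' XIDs directly, but it is still held at 100 through t's ordinary xmin.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proc:
    xid: Optional[int]
    in_vacuum: bool = False
    xmin: Optional[int] = None                # xmin over all running xacts
    not_in_vacuum_xmin: Optional[int] = None  # xmin ignoring lazy vacuums


def advertise(procs: List[Proc], me: Proc, next_xid: int) -> None:
    """Record both xmins `me` would publish when taking its snapshot."""
    xids = [p.xid for p in procs if p.xid is not None]
    me.xmin = min(xids, default=next_xid)
    me.not_in_vacuum_xmin = min(
        (p.xid for p in procs if p.xid is not None and not p.in_vacuum),
        default=next_xid,
    )


def vacuum_horizon(procs: List[Proc], next_xid: int, use_not_in_vacuum: bool) -> int:
    result = next_xid
    for p in procs:
        if p.in_vacuum:
            continue  # direct exclusion of lazy vacuums' own XIDs
        if p.xid is not None:
            result = min(result, p.xid)
        x = p.not_in_vacuum_xmin if use_not_in_vacuum else p.xmin
        if x is not None:
            result = min(result, x)
    return result


if __name__ == "__main__":
    v1 = Proc(xid=100, in_vacuum=True)  # long lazy vacuum of a big table
    t = Proc(xid=105)
    advertise([v1, t], t, next_xid=106)
    v2 = Proc(xid=108, in_vacuum=True)  # vacuum of a small, busy table
    procs = [v1, t, v2]
    print(vacuum_horizon(procs, 110, use_not_in_vacuum=False))
    print(vacuum_horizon(procs, 110, use_not_in_vacuum=True))
```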

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/


Re: The vacuum-ignore-vacuum patch

From
Hannu Krosing
Date:
On Thu, 2006-07-27 at 22:05, Bruce Momjian wrote:
> Another idea Jan had today was whether we could vacuum more rows if a
> long-running backend is in serializable mode, like pg_dump.

I don't see how this gives us ability to vacuum more rows, as the
snapshot of a serializable transaction is the oldest one.


-- 
----------------
Hannu Krosing




Re: The vacuum-ignore-vacuum patch

From
Bruce Momjian
Date:
Hannu Krosing wrote:
> On Thu, 2006-07-27 at 22:05, Bruce Momjian wrote:
> > Another idea Jan had today was whether we could vacuum more rows if a
> > long-running backend is in serializable mode, like pg_dump.
> 
> I don't see how this gives us ability to vacuum more rows, as the
> snapshot of a serializable transaction is the oldest one.

Good question.  Imagine you have a serializable transaction like
pg_dump, and then you have lots of newer transactions.  If pg_dump is
xid=12, and all the new transactions start at xid=30, any row created
and expired between 12 and 30 can be removed because they are not
visible.  For a use case, imagine an UPDATE chain where a row was
created by xid=15 and expired by xid=19.  Right now, we don't remove that
row, though we could.
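Bruce's example can be checked with a toy visibility model in Python. It assumes every XID has committed and ignores subtransactions and hint bits. The version created by 15 and expired by 19 is invisible to pg_dump's snapshot and to every newer snapshot. The removal rule modeled here compares only the deleting XID with the oldest xmin, so it still keeps that version. Tom's reply explains why removing it anyway is unsafe when the serializable transaction performs updates.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Snap:
    xmin: int
    xmax: int
    running: FrozenSet[int] = frozenset()

    def sees(self, xid: int) -> bool:
        # Simplification: every XID in this example has committed.
        return xid < self.xmax and xid not in self.running


def tuple_visible(snap: Snap, t_xmin: int, t_xmax: Optional[int]) -> bool:
    """Visible if the creator is seen and the deleter (if any) is not."""
    return snap.sees(t_xmin) and (t_xmax is None or not snap.sees(t_xmax))


def vacuum_can_remove(t_xmax: Optional[int], oldest_xmin: int) -> bool:
    """Removal rule as modeled: deleted before every snapshot's xmin."""
    return t_xmax is not None and t_xmax < oldest_xmin


if __name__ == "__main__":
    pg_dump = Snap(xmin=12, xmax=12)                        # took its snapshot at xid 12
    newer = Snap(xmin=12, xmax=31, running=frozenset({12}))  # still sees pg_dump running
    oldest = min(pg_dump.xmin, newer.xmin)
    print(tuple_visible(pg_dump, 15, 19), tuple_visible(newer, 15, 19))
    print(vacuum_can_remove(19, oldest))
```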

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com


Re: The vacuum-ignore-vacuum patch

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Good question.  Imagine you have a serializable transaction like
> pg_dump, and then you have lots of newer transactions.  If pg_dump is
> xid=12, and all the new transactions start at xid=30, any row created
> and expired between 12 and 30 can be removed because they are not
> visible.

This reasoning is bogus.

It would probably be safe for pg_dump because it's a read-only
operation, but it fails badly if the serializable transaction is trying
to do updates.  An update needs to chase the chain of newer versions of
the row forward from the version that's visible to the xact's
serializable snapshot, to see if anyone has committed a newer version.
Your proposal would remove elements of that chain, thereby possibly
allowing the serializable xact to conclude it may update the tuple
when it should have given an error.
        regards, tom lane


Re: The vacuum-ignore-vacuum patch

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Good question.  Imagine you have a serializable transaction like
> > pg_dump, and then you have lots of newer transactions.  If pg_dump is
> > xid=12, and all the new transactions start at xid=30, any row created
> > and expired between 12 and 30 can be removed because they are not
> > visible.
> 
> This reasoning is bogus.
> 
> It would probably be safe for pg_dump because it's a read-only
> operation, but it fails badly if the serializable transaction is trying
> to do updates.  An update needs to chase the chain of newer versions of
> the row forward from the version that's visible to the xact's
> serializable snapshot, to see if anyone has committed a newer version.
> Your proposal would remove elements of that chain, thereby possibly
> allowing the serializable xact to conclude it may update the tuple
> when it should have given an error.

So in fact members of the chain are not visible, but vacuum doesn't have
a strong enough lock to remove parts of the chain.  What seems strange
is that vacuum can trim the chain, but only if it removes members starting
from the head.  I assume this is because you don't need to rejoin the
chain around the expired tuples.

("bogus" seems a little strong.)

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com


Re: The vacuum-ignore-vacuum patch

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> Uh, why?
> 
> > Because it's used to determine the Xmin that our vacuum will use.  If
> > there is a transaction whose Xmin calculation included the Xid of a
> > transaction running vacuum, we have gained nothing from directly
> > excluding said vacuum's Xid, because it will affect us anyway indirectly
> > via that transaction's Xmin.
> 
> But the patch changes things so that *everyone* excludes the vacuum from
> their xmin.  Or at least I thought that was the plan.

We shouldn't do that, because that Xmin is also used to truncate
SUBTRANS.  Unless we are prepared to say that vacuum does not use
subtransactions so it doesn't matter.  This is true currently, so we
could go ahead and do it (unless I'm missing something) -- but it means
lazy vacuum will never be able to use subtransactions.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/


Re: The vacuum-ignore-vacuum patch

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Uh, why?

> Because it's used to determine the Xmin that our vacuum will use.  If
> there is a transaction whose Xmin calculation included the Xid of a
> transaction running vacuum, we have gained nothing from directly
> excluding said vacuum's Xid, because it will affect us anyway indirectly
> via that transaction's Xmin.

But the patch changes things so that *everyone* excludes the vacuum from
their xmin.  Or at least I thought that was the plan.
        regards, tom lane


Re: The vacuum-ignore-vacuum patch

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> But the patch changes things so that *everyone* excludes the vacuum from
>> their xmin.  Or at least I thought that was the plan.

> We shouldn't do that, because that Xmin is also used to truncate
> SUBTRANS.

Yeah, but you were going to change that, no?  Truncating SUBTRANS will
need to include the vacuum xact's xmin, but we don't need it for any
other purpose.

> but it means
> lazy vacuum will never be able to use subtransactions.

This patch already depends on the assumption that lazy vacuum will never
do any transactional updates, so I don't see what it would need
subtransactions for.
        regards, tom lane


Re: The vacuum-ignore-vacuum patch

From
"Jim C. Nasby"
Date:
On Fri, Jul 28, 2006 at 03:08:08AM +0300, Hannu Krosing wrote:
> > The other POV is that we don't really care about long-running
> > transactions in other databases unless they are lazy vacuums, a case which
> > is appropriately covered by the patch as it currently stands.  This seems
> > to be the POV that Hannu takes: the only long-running transactions he
> > cares about are lazy vacuums.
> 
> Yes. The original target audience of this patch is users running 24/7
> OLTP databases with big slow changing tables and small fast-changing
> tables which need to stay small even at the time when the big ones are
> vacuumed.
> 
> The other possible transactions which _could_ possibly be ignored while
> VACUUMING are those from ANALYSE and non-lazy VACUUMs.

There are other transactions to consider: user transactions that will
run a long time, but only hit a limited number of relations. These are
as big a problem in an OLTP environment as vacuum is.

Rather than coming up with machinery that will special-case vacuum or
pg_dump, etc., I'd suggest thinking about a generic framework that would
work for any long-running transaction. One possibility:

Transaction flags itself as 'long-running' and provides a list of
exactly what relations it will be touching.

That list is stored someplace a future vacuum can get at.

The transaction runs, with additional checks that ensure it will not
touch any relations that aren't in the list it provided. 

Any vacuums that start will take into account these lists of relations
from long-running transactions and build a list of XIDs that have
provided a list, and the minimum XID for every relation that was listed.
If vacuum wants to vacuum a relation that has been listed as part of a
long-running transaction, it will use the oldest XID in the
database/cluster or the oldest XID listed for that relation, whichever
is older. If it wants to vacuum a relation that is not listed, it will
use the oldest XID in the database/cluster, excluding those XIDs that
have listed exactly what relations they will be looking at.
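The vacuum-side rule of this proposal can be sketched as follows (Python, all names hypothetical). The sketch assumes that "the oldest XID in the database/cluster" means the oldest XID among transactions that did not declare a list. Otherwise the per-relation lists could never help. Under that reading, both the "listed" and the "not listed" cases reduce to one minimum over the transactions that can see the relation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Xact:
    xid: int
    # Hypothetical: relations the transaction declared up front,
    # or None if it never flagged itself as long-running.
    relations: Optional[FrozenSet[str]] = None


def vacuum_horizon(xacts: Iterable[Xact], rel: str, next_xid: int) -> int:
    """Cutoff XID for vacuuming `rel` under the proposed scheme.

    Undeclared transactions constrain every relation; declared ones
    constrain only the relations on their list.
    """
    relevant = [
        x.xid for x in xacts if x.relations is None or rel in x.relations
    ]
    return min(relevant, default=next_xid)


if __name__ == "__main__":
    xacts = [Xact(100, frozenset({"big_table"})), Xact(105)]
    print(vacuum_horizon(xacts, "small_table", 110))  # long report ignored
    print(vacuum_horizon(xacts, "big_table", 110))    # long report respected
```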

That scheme won't help pg_dump... in order to do so, you'd need to allow
transactions to drop relations from their list.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: The vacuum-ignore-vacuum patch

From
Chris Browne
Date:
jnasby@pervasive.com ("Jim C. Nasby") writes:
> There are other transactions to consider: user transactions that will
> run a long time, but only hit a limited number of relations. These are
> as big a problem in an OLTP environment as vacuum is.
>
> Rather than coming up with machinery that will special-case vacuum or
> pg_dump, etc., I'd suggest thinking about a generic framework that would
> work for any long-running transaction. One possibility:
>
> Transaction flags itself as 'long-running' and provides a list of
> exactly what relations it will be touching.
>
> That list is stored someplace a future vacuum can get at.
>
> The transaction runs, with additional checks that ensure it will not
> touch any relations that aren't in the list it provided. 

One thought that's a bit different...

How about we mark transactions that are in serializable mode?  That
would merely be a flag...

We would know that, for each such transaction, we could treat all
tuples "deadified" after those transactions as being dead and
cleanable.

That doesn't require any knowledge of relations that are
touched/locked...
-- 
"cbbrowne","@","cbbrowne.com"
http://www.ntlug.org/~cbbrowne/nonrdbms.html
To err is human, to moo bovine. 


Re: The vacuum-ignore-vacuum patch

From
Hannu Krosing
Date:
On Fri, 2006-07-28 at 12:38, Jim C. Nasby wrote:
> On Fri, Jul 28, 2006 at 03:08:08AM +0300, Hannu Krosing wrote:
> > > The other POV is that we don't really care about long-running
> > > transactions in other databases unless they are lazy vacuums, a case which
> > > is appropriately covered by the patch as it currently stands.  This seems
> > > to be the POV that Hannu takes: the only long-running transactions he
> > > cares about are lazy vacuums.
> > 
> > Yes. The original target audience of this patch is users running 24/7
> > OLTP databases with big slow changing tables and small fast-changing
> > tables which need to stay small even at the time when the big ones are
> > vacuumed.
> > 
> > The other possible transactions which _could_ possibly be ignored while
> > VACUUMING are those from ANALYSE and non-lazy VACUUMs.
> 
> There are other transactions to consider: user transactions that will
> run a long time, but only hit a limited number of relations. These are
> as big a problem in an OLTP environment as vacuum is.

These transactions are better kept out of an OLTP database; by their
nature they belong in an OLAP db :)

The reason I addressed VACUUM first was the fact that you can't
avoid VACUUM on an OLTP db.

> Rather than coming up with machinery that will special-case vacuum or
> pg_dump, etc., I'd suggest thinking about a generic framework that would
> work for any long-running transaction.

So instead of actually *solving* one problem you suggest *thinking*
about solving the general case ?

We have been *thinking* about dead-space-map for at least three years by
now.

> One possibility:
> 
> Transaction flags itself as 'long-running' and provides a list of
> exactly what relations it will be touching.
> 
> That list is stored someplace a future vacuum can get at.
> 
> The transaction runs, with additional checks that ensure it will not
> touch any relations that aren't in the list it provided. 

I have thought about that too, but checking on each data change seemed
too expensive to me, at least for the first cut.

There seem to be some ways to avoid actual checking for table-in-list,
but you still have to check whether you have to check.

> Any vacuums that start will take into account these lists of relations
> from long-running transactions and build a list of XIDs that have
> provided a list, and the minimum XID for every relation that was listed.
> If vacuum wants to vacuum a relation that has been listed as part of a
> long-running transaction, it will use the oldest XID in the
> database/cluster or the oldest XID listed for that relation, whichever
> is older. If it wants to vacuum a relation that is not listed, it will
> use the oldest XID in the database/cluster, excluding those XIDs that
> have listed exactly what relations they will be looking at.
> 
> That scheme won't help pg_dump... in order to do so, you'd need to allow
> transactions to drop relations from their list.

The whole thing is probably doable, but I doubt it will be done before
8.2 (or even 8.5, considering that I had the first vacuum-ignore-vacuum
patch ready by 8.0 (I think))

-- 
----------------
Hannu Krosing



Re: The vacuum-ignore-vacuum patch

From
Jim Nasby
Date:
On Jul 28, 2006, at 5:05 PM, Hannu Krosing wrote:
> On Fri, 2006-07-28 at 12:38, Jim C. Nasby wrote:
>> There are other transactions to consider: user transactions that will
>> run a long time, but only hit a limited number of relations. These are
>> as big a problem in an OLTP environment as vacuum is.
>
> These transactions are better kept out of an OLTP database, by their
> nature they belong to OLAP db :)

Sure, but that's not always possible/practical.

>> Rather than coming up with machinery that will special-case vacuum or
>> pg_dump, etc., I'd suggest thinking about a generic framework that would
>> work for any long-running transaction.
>
> So instead of actually *solving* one problem you suggest *thinking*
> about solving the general case ?
>
> We have been *thinking* about dead-space-map for at least three
> years by now.

No, I just wanted anyone who was actually going to work on this to
think about a more general fix. If the vacuum-only fix has a chance
of getting into core a version before the general case, I'll happily
take what I can get.

>> One possibility:
>>
>> Transaction flags itself as 'long-running' and provides a list of
>> exactly what relations it will be touching.
>>
>> That list is stored someplace a future vacuum can get at.
>>
>> The transaction runs, with additional checks that ensure it will not
>> touch any relations that aren't in the list it provided.
>
> I have thought about that too, but checking on each data change seemed
> too expensive to me, at least for the first cut.
>
> There seem to be some ways to avoid actual checking for table-in-list,
> but you still have to check whether you have to check.

Well, presumably the check to see if you have to check would be
extremely cheap. As for checking that only approved relations are
touched, you can do that by analyzing the rules/triggers/etc that are
on all the tables involved. Or for a start, just disallow this on
tables with rules or triggers (well, we'd probably have to allow for
RI).
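The "check whether you have to check" step can be sketched as a guard that ordinary transactions leave after a single test (Python, all names hypothetical). Only transactions that declared a relation list pay for the set lookup. The harder part Jim mentions, proving in advance that rules and triggers only reach listed relations, is not modeled here.

```python
from typing import FrozenSet, Optional


class UndeclaredRelationError(Exception):
    """Raised when a long-running transaction touches an unlisted relation."""


def check_relation_access(declared: Optional[FrozenSet[int]], rel_oid: int) -> None:
    """Hypothetical guard run before a transaction touches a relation.

    Ordinary transactions carry declared=None and return immediately
    (the cheap "do I have to check?" test); only transactions that
    declared a list perform the membership check.
    """
    if declared is None:
        return
    if rel_oid not in declared:
        raise UndeclaredRelationError(f"relation {rel_oid} is not in the declared list")
```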
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com




Re: The vacuum-ignore-vacuum patch

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> But the patch changes things so that *everyone* excludes the vacuum from
> >> their xmin.  Or at least I thought that was the plan.
>
> > We shouldn't do that, because that Xmin is also used to truncate
> > SUBTRANS.
>
> Yeah, but you were going to change that, no?  Truncating SUBTRANS will
> need to include the vacuum xact's xmin, but we don't need it for any
> other purpose.

That's correct.

> > but it means
> > lazy vacuum will never be able to use subtransactions.
>
> This patch already depends on the assumption that lazy vacuum will never
> do any transactional updates, so I don't see what it would need
> subtransactions for.

Here is a patch pursuant to these ideas.  The main change is that in
GetSnapshotData, a backend is skipped entirely if inVacuum is found to
be true.
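Because every backend now skips lazy vacuums when it takes a snapshot, the xmin it advertises already excludes them. This appears to be Tom's point in asking why a separate not-in-vacuum xmin is needed. The loop can be sketched as a toy in Python, with hypothetical names; this is not the real GetSnapshotData. Subtrans truncation would still need its own horizon that includes vacuums, as discussed earlier in the thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proc:
    xid: Optional[int]
    in_vacuum: bool = False
    xmin: Optional[int] = None  # xmin this backend advertises


def snapshot_xmin(procs: List[Proc], next_xid: int) -> int:
    """Compute xmin, skipping in_vacuum backends entirely.

    Neither the vacuum's XID nor its advertised xmin can hold the
    result back.
    """
    result = next_xid
    for p in procs:
        if p.in_vacuum:
            continue
        for x in (p.xid, p.xmin):
            if x is not None:
                result = min(result, x)
    return result


if __name__ == "__main__":
    v = Proc(xid=100, in_vacuum=True, xmin=100)
    t = Proc(xid=105)
    t.xmin = snapshot_xmin([v, t], next_xid=106)  # t never records 100
    v2 = Proc(xid=108, in_vacuum=True)
    print(snapshot_xmin([v, t, v2], next_xid=110))
```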

I've been trying to update my SSH CVS several times today but I can't
reach the server.  Maybe it's the DoS attack that it's been under, I
don't know.

--
Alvaro Herrera                                http://www.CommandPrompt.com/

Attachment

Re: The vacuum-ignore-vacuum patch

From
Alvaro Herrera
Date:
Jim Nasby wrote:
> On Jul 28, 2006, at 5:05 PM, Hannu Krosing wrote:

> >So instead of actually *solving* one problem you suggest *thinking*
> >about solving the general case ?
> >
> >We have been *thinking* about dead-space-map for at least three
> >years by now.
> 
> No, I just wanted anyone who was actually going to work on this to  
> think about a more general fix. If the vacuum-only fix has a chance  
> of getting into core a version before the general case, I'll happily  
> take what I can get.

Well, the vacuum-only fix has the advantage that the patch has already
been written, tested, discussed, beaten to death, resurrected,
rewritten, and is ready to be committed, while the "general solution" is
not even past the handwaving phase, let alone *thinking*.

And we have only three days before feature freeze, so if you want the
general solution for 8.2 you should start *thinking* really fast :-)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/


Re: The vacuum-ignore-vacuum patch

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Here is a patch pursuant to these ideas.  The main change is that in
> GetSnapshotData, a backend is skipped entirely if inVacuum is found to
> be true.

Patch applied.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/