Thread: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

From
"Florian G. Pflug"
Date:
Hi

When we lazily assign XIDs, we gain another flag beside the existing
MyXactMadeTempRelUpdates, MyXactMadeXLogEntry, MyLastRecPtr and smgr's
pendingDeletes to tell what kind of actions a transaction performed. Adding
TransactionIsIsValid(GetCurrentTransactionIdIfAny()) on top of that
makes things quite impenetrable - at least for me. So I'm trying to
wrap my head around that logic, and simplify it a bit if possible.
(Nowadays, async commit even adds a bit more complexity)

Currently, we write out a COMMIT record if a transaction either created
any transaction-controlled XLOG entries (MyLastRecPtr.xrecoff != 0), or
scheduled the deletion of files on commit. Afterwards, the xlog is flushed
to the end last record created by the session (ProcLastRecEnd) if the
transaction created any xlog record at all. If we either made
transaction-controlled XLOG entries, or temporary relation updates, we
update the XID status in the CLOG.

An ABORT record is currently created if a transaction either created
transaction-controlled XLOG entries or scheduled the deletion of files
on abort. If we schedules file deletions, we flush the XLOG up to the
ABORT record. If we either made transaction-controlled XLOG entries,
updated temporary relations, or scheduled deletions we update the XID
status in the CLOG.

For subtransaction commit, a COMMIT record is emitted if we either made
transaction-controlled XLOG entries, or updated temporary relations.
No XLOG flush is performed.

Subtransaction ABORTS are handled the same way as regular transaction
aborts.

For toplevel transaction commits, we defer flushing the xlog if
synchronous_commit = off, and we didn't schedule any file deletions.

Now, with lazy XID assignment I believe the following holds true
.) If we didn't assign a valid XID, we cannot have made transaction-controlled   XLOG entries (Can be checked by
assertingthat the current transaction id   is valid if XLOG_NO_TRAN isn't set in XLogInsert).
 
.) We cannot have scheduled files for deletion (on either COMMIT or ABORT)   if we don't have a valid XID, since any
suchdeletion will happen together   with a catalog update. Note that this is already assumed to be true for
subtransactions,since they only call RecordSubTransaction{Commit|Abort}   if they have an XID assigned.
 

I propose to do the following in my lazy XID assignment patch - can
anyone see a hole in that?

.) Get rid of MyLastRecPtr and MyXactMadeTempRelUpdates. Those are   superseeded by
TransactionIdIsValid(GetCurrentTransactionIdIfAny()).
.) Get rid of MyXactMadeXLogEntry. Instead, just reset ProcLast
.) Rename ProcLastRecEnd to XactLastRecEnd, and reset when starting   a new toplevel transaction.

Transaction COMMIT:  Write an COMMIT record if and only if we have a valid XID.  Then, flush the XLOG to XactLastRecEnd
ifthat is set, and  synchronous_commit=on.  Aferwards, update the CLOG if and only if we have a valid XID.
 

Transaction ABORT:  Write an ABORT record if and only if we have a valid XID.  Then, flush the XLOG to XactLastRecEnd
ifthat is set, and  we scheduled on-abort deletions.  Update the CLOG if and only if we have a valid XID.
 

Subtransaction COMMIT:  Update the CLOG if and only if we have a valid XID.

Subtransaction ABORT:  Write an ABORT record if and only if we have a valid XID.  Then, flush the XLOG to
XactLastRecEndif that is set, and  we scheduled on-abort deletions.  Update the CLOG if and only if we have a valid
XID.

I think we might go even further, and *never* flush the XLOG on abort,
since if we crash just before the abort won't log anything either. But
if we leak the leftover files in such a case, that's probably a bad idea.

greetings, Florian Pflug



"Florian G. Pflug" <fgp@phlo.org> writes:
> I propose to do the following in my lazy XID assignment patch - can
> anyone see a hole in that?

Cleaning up this area seems like a good idea.  Just FYI, one reason why
there are so many LastRec pointer variables is that the WAL record
format used to include a back-link to the previous record of the same
transaction, so we needed to track that location.  Since that's gone,
simplification is definitely possible.  A lot of the other behavior
you're looking at "just grew" as incremental optimizations added over
time.

One comment is that at the time we make an entry into smgr's
pending-deletes list, I think we might not have acquired an XID yet
--- if I understand your patch correctly, a CREATE TABLE would acquire
an XID when it makes its first catalog insertion, and that happens
after creating the on-disk table file.  So it seems like a good idea
for smgr itself to trigger acquisition of an XID before it makes a
pending-deletes entry.  This ensures that you can't have a situation
where you have deletes to record and no XID; otherwise, an elog
between smgr insertion and catalog insertion would lead to just that.

> .) Rename ProcLastRecEnd to XactLastRecEnd, and reset when starting
>     a new toplevel transaction.

I'm not very happy with that name for the variable, because it looks
like it might refer to the last transaction-controlled record we
emitted, rather than the last record of any type.  Don't have a really
good suggestion though --- CurXactLastRecEnd is the best I can do.

One thought here is that it's not clear that we really need a concept of
transaction-controlled vs not-transaction-controlled xlog records
anymore.  In CVS HEAD, the *only* difference no_tran makes is whether
to set MyLastRecPtr, and you propose removing that variable.  This
seems sane to me --- the reason for having the distinction at all was
Vadim's plan to implement transaction UNDO by scanning its xlog records
backwards, and that idea is as dead as a doornail.  So we could simplify
matters conceptually if we got rid of any reference to such a
distinction.
        regards, tom lane


Re: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> I propose to do the following in my lazy XID assignment patch - can
>> anyone see a hole in that?>
> One comment is that at the time we make an entry into smgr's
> pending-deletes list, I think we might not have acquired an XID yet
> --- if I understand your patch correctly, a CREATE TABLE would acquire
> an XID when it makes its first catalog insertion, and that happens
> after creating the on-disk table file.  So it seems like a good idea
> for smgr itself to trigger acquisition of an XID before it makes a
> pending-deletes entry.  This ensures that you can't have a situation
> where you have deletes to record and no XID; otherwise, an elog
> between smgr insertion and catalog insertion would lead to just that.

I wonder a bit about the whole special-casing
of COMMITs/ABORTs with pending delete, though. A crash might always leave
stray file around, so there ought to be a way to clean them up anyway.
Still, for now I'll go with your suggestion, and force XID assignment
in the smgr.

>> .) Rename ProcLastRecEnd to XactLastRecEnd, and reset when starting
>>     a new toplevel transaction.
> 
> I'm not very happy with that name for the variable, because it looks
> like it might refer to the last transaction-controlled record we
> emitted, rather than the last record of any type.  Don't have a really
> good suggestion though --- CurXactLastRecEnd is the best I can do.

Hm.. don't have a good suggestion, either - the reason I want to rename
it is that ProcLastRecEnd doesn't sound like it's be reset at transaction
start.

> One thought here is that it's not clear that we really need a concept of
> transaction-controlled vs not-transaction-controlled xlog records
> anymore.  In CVS HEAD, the *only* difference no_tran makes is whether
> to set MyLastRecPtr, and you propose removing that variable.  This
> seems sane to me --- the reason for having the distinction at all was
> Vadim's plan to implement transaction UNDO by scanning its xlog records
> backwards, and that idea is as dead as a doornail.  So we could simplify
> matters conceptually if we got rid of any reference to such a
> distinction.

I've thinking about keeping XLOG_NO_TRAN, and doing
if (!no_tran)  Assert(TransactionIdIsValid(GetCurrentTransactionIdIfAny())
in xlog.c as a safety measure. We can't make that assertion
unconditionally, I think, because nextval() won't force XID
assigment, but might do XLogInsert.

greetings, Florian Pflug



"Florian G. Pflug" <fgp@phlo.org> writes:
> Tom Lane wrote:
>> One thought here is that it's not clear that we really need a concept of
>> transaction-controlled vs not-transaction-controlled xlog records
>> anymore.

> I've thinking about keeping XLOG_NO_TRAN, and doing
> if (!no_tran)
>    Assert(TransactionIdIsValid(GetCurrentTransactionIdIfAny())
> in xlog.c as a safety measure.

Why do you think this is a safety measure?  All that it is checking
is whether the caller has preserved an entirely useless distinction.
The real correctness property is that you can't write your XID
into a heap tuple or XLOG record if you haven't acquired an XID,
but that seems nearly tautological.
        regards, tom lane


Re: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Tom Lane wrote:
>>> One thought here is that it's not clear that we really need a concept of
>>> transaction-controlled vs not-transaction-controlled xlog records
>>> anymore.
> 
>> I've thinking about keeping XLOG_NO_TRAN, and doing
>> if (!no_tran)
>>    Assert(TransactionIdIsValid(GetCurrentTransactionIdIfAny())
>> in xlog.c as a safety measure.
> 
> Why do you think this is a safety measure?  All that it is checking
> is whether the caller has preserved an entirely useless distinction.
> The real correctness property is that you can't write your XID
> into a heap tuple or XLOG record if you haven't acquired an XID,
> but that seems nearly tautological.

I was confused. I wanted to protect against the case the an XID hits
the disk, but doesn't show up in any xl_xid field, and therefore might
be reused after crash recovery. But of course, to make that happen
you'd have to actually *store* the XID into the data you pass to
XLogInsert, which is kind of hard if you haven't asked for it first.

So, I now agree, XLOG_NO_TRAN should be buried.

greetings, Florian Pflug


On Wed, 2007-08-29 at 19:32 +0200, Florian G. Pflug wrote:

> I propose to do the following in my lazy XID assignment patch 

The lazy XID assignment seems to be the key to unlocking this whole
area.

> - can
> anyone see a hole in that?
> 
> .) Get rid of MyLastRecPtr and MyXactMadeTempRelUpdates. Those are
>     superseeded by TransactionIdIsValid(GetCurrentTransactionIdIfAny()).
> .) Get rid of MyXactMadeXLogEntry. Instead, just reset ProcLast
> .) Rename ProcLastRecEnd to XactLastRecEnd, and reset when starting
>     a new toplevel transaction.

I followed you up to this point. Nothing bad springs immediately to
mind, but please can you explain the proposals rather than just assert
them and ask us to find the holes? 

> I think we might go even further, and *never* flush the XLOG on abort,
> since if we crash just before the abort won't log anything either. But
> if we leak the leftover files in such a case, that's probably a bad idea.

That doesn't gain us much, yet we lose the ability to tell the
difference between an abort and a crash.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> One comment is that at the time we make an entry into smgr's
> pending-deletes list, I think we might not have acquired an XID yet
> --- if I understand your patch correctly, a CREATE TABLE would acquire
> an XID when it makes its first catalog insertion, and that happens
> after creating the on-disk table file.  So it seems like a good idea
> for smgr itself to trigger acquisition of an XID before it makes a
> pending-deletes entry.  This ensures that you can't have a situation
> where you have deletes to record and no XID; otherwise, an elog
> between smgr insertion and catalog insertion would lead to just that.

Hm.. I was just going to implement this, but I'm now wondering if
thats really worth it.

For smgrcreate, this would catch the following case:
.) CREATE something
.) smgrcreate: Creates file, and puts it onto the delete-on-abort   list
.) We elog() *before* acquiring an XID
.) RecordTransactionAbort or RecordSubTransactionAbort:   We don't write an ABORT record.
.) We crash *before* actually deleting the file

Compare the probability of that happening (The elog *and* the crash)
with the probability of
.) CREATE something
.) smgrcreate: Creates the file
.) We crash *before* we have to chance to commit or abort.

The window in which a crash causes us to leak the file seems to be much
wider in the second case, yet forcing XID assignment will not help to
preven it, unless I'm overlooking something.

In the smgrunlink case, there is no reason at all to force XID assignment,
because if we abort or crash, we don't want to unlink anyway, and if we
survive until we commit, we'll assign the XID during the inevitable catalog
update.

The only thing the forced XID assignment would buy is to be able to stick
if (TransactionIdIsValid(GetCurrentTransactionIdIfAny()))  Assert(nrels == 0);
into the various Record{Sub|}Transction{Commit|Abort} functions

So unless I'm overlooking something, I believe for now it's best to ignore this
issued, and to do a proper fix in the long run that removes *all* possible
leakages.

greetings, Florian Pflug



"Florian G. Pflug" <fgp@phlo.org> writes:
>> One comment is that at the time we make an entry into smgr's
>> pending-deletes list, I think we might not have acquired an XID yet

> Hm.. I was just going to implement this, but I'm now wondering if
> thats really worth it.

Basically what you'd give up is the ability to Assert() that there are
no deletable files if there's no XID, which seems to me to be an
important cross-check ... although maybe making smgr do that turns
this "cross-check" into a tautology ... hmm.  I guess the case that's
bothering me is where we reach commit with deletable files and no XID.
But that should probably be an error condition anyway, ie, we should
error out and turn it into an abort.  On the abort side we'd consider
it OK to have files and no XID.  Seems reasonable to me.

The only way we could make this more robust is if we could have
WAL-before-data rule for file *creation*, but I think that's not
possible given that we don't know what relfilenode number we will use
until we've successfully created a file.  So there will always be
windows where a crash leaks unreferenced files.  There's been some
debate about having crash recovery search for and delete such files, but
so far I've resisted it on the grounds that it sounds like data loss
waiting to happen --- someday it'll delete a file you wished it'd kept.
        regards, tom lane


Re: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>>> One comment is that at the time we make an entry into smgr's
>>> pending-deletes list, I think we might not have acquired an XID yet
> 
>> Hm.. I was just going to implement this, but I'm now wondering if
>> thats really worth it.
> 
> Basically what you'd give up is the ability to Assert() that there are
> no deletable files if there's no XID, which seems to me to be an
> important cross-check ... although maybe making smgr do that turns
> this "cross-check" into a tautology ... hmm.  I guess the case that's
> bothering me is where we reach commit with deletable files and no XID.
> But that should probably be an error condition anyway, ie, we should
> error out and turn it into an abort.  On the abort side we'd consider
> it OK to have files and no XID.  Seems reasonable to me.

I've done that now, and it turned out nicely. There is an Assertion
on "(nrels == 0) || xid assigned" in the COMMIT path, but
not in the ABORT path. Seems reasonable and safe.

And I'm quite tempted to not flush the XLOG at all during ABORT, and to
only force synchronous commits if one of the to-be-deleted files is
non-temporary. The last idea widens the leakage window quite a bit
though, so I maybe I should rather resist that temptation...

OTOH, it'd allow aynchronous commits for transactions that created
temporary tables.

> The only way we could make this more robust is if we could have
> WAL-before-data rule for file *creation*, but I think that's not
> possible given that we don't know what relfilenode number we will use
> until we've successfully created a file.  So there will always be
> windows where a crash leaks unreferenced files.  There's been some
> debate about having crash recovery search for and delete such files, but
> so far I've resisted it on the grounds that it sounds like data loss
> waiting to happen --- someday it'll delete a file you wished it'd kept.

It seems doable, but it's not pretty. One possible scheme would be to
emit a record *after* chosing a name but *before* creating the file,
and then a second record when the file is actually created successfully.

Then, during replay we could remember a list of xids and filenames,
and remove those files for which we either haven't seen a "created
successfully" record, or no COMMIT record for the creating xid.

With this scheme, I'd be natural to force XID assignment in smgrcreate,
because we'd actually depend on logging the xid there.

greetings, Florian Pflug



"Florian G. Pflug" <fgp@phlo.org> writes:
> And I'm quite tempted to not flush the XLOG at all during ABORT, and to
> only force synchronous commits if one of the to-be-deleted files is
> non-temporary.

+1 on the first, but -1 on the second, because we'd have to track
whether deleted files are temp or not ... it's very unclear that it'd
be worth the trouble.

> OTOH, it'd allow aynchronous commits for transactions that created
> temporary tables.

It'd be for xacts that *dropped* temp tables, no?  I'm not sure that
is a performance-critical path --- probably it more usually gets done
after the client's already disconnected.

>> The only way we could make this more robust is if we could have
>> WAL-before-data rule for file *creation*, but I think that's not
>> possible given that we don't know what relfilenode number we will use
>> until we've successfully created a file.

> It seems doable, but it's not pretty. One possible scheme would be to
> emit a record *after* chosing a name but *before* creating the file,

No, because the way you know the name is good is a successful
open(O_CREAT).
        regards, tom lane


Re: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> And I'm quite tempted to not flush the XLOG at all during ABORT, and to
>> only force synchronous commits if one of the to-be-deleted files is
>> non-temporary.
> 
> +1 on the first, but -1 on the second, because we'd have to track
> whether deleted files are temp or not ... it's very unclear that it'd
> be worth the trouble.

I'm confused. smgrcreate already tracks if it created a file for a temp
relation or not (at least it has an isTemp parameter, which it also
stores in the delete list). I'd have thought the second suggestion would
amount to a few lines of code in smgrGetPendingDeletes to let it report
if all pending deletes carried the isTemp flag or not.

>>> The only way we could make this more robust is if we could have
>>> WAL-before-data rule for file *creation*, but I think that's not
>>> possible given that we don't know what relfilenode number we will use
>>> until we've successfully created a file.
> 
>> It seems doable, but it's not pretty. One possible scheme would be to
>> emit a record *after* chosing a name but *before* creating the file,
> 
> No, because the way you know the name is good is a successful
> open(O_CREAT).

The idea was to log *twice*. Once the we're about to create a file, and
the second time that we succeeded. That way, the filename shows up in the
log, even if we crash immediatly after physically creating the file, which
gives recovery at least a chance to clean up the mess.

greetings, Florian Pflug



"Florian G. Pflug" <fgp@phlo.org> writes:

>>> It seems doable, but it's not pretty. One possible scheme would be to
>>> emit a record *after* chosing a name but *before* creating the file,
>>
>> No, because the way you know the name is good is a successful
>> open(O_CREAT).
>
> The idea was to log *twice*. Once the we're about to create a file, and
> the second time that we succeeded. That way, the filename shows up in the
> log, even if we crash immediatly after physically creating the file, which
> gives recovery at least a chance to clean up the mess.

It sounds like if the reason it fails is because someone else created the same
file name you'll delete the wrong file?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


"Florian G. Pflug" <fgp@phlo.org> writes:
> Tom Lane wrote:
>> +1 on the first, but -1 on the second, because we'd have to track
>> whether deleted files are temp or not ... it's very unclear that it'd
>> be worth the trouble.

> I'm confused. smgrcreate already tracks if it created a file for a temp
> relation or not (at least it has an isTemp parameter, which it also
> stores in the delete list).

Doh, I was thinking it did not track that --- serves me right for not
checking the code before opining.

>> No, because the way you know the name is good is a successful
>> open(O_CREAT).

> The idea was to log *twice*. Once the we're about to create a file, and
> the second time that we succeeded. That way, the filename shows up in the
> log, even if we crash immediatly after physically creating the file, which
> gives recovery at least a chance to clean up the mess.

Hmm.  Seems like a lot of work for something that's not a large problem.
Anyway, I'd counsel not thinking about that for now, but keep the patch
focused on solving one problem.
        regards, tom lane


Re: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

From
"Florian G. Pflug"
Date:
Gregory Stark wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
> 
>>>> It seems doable, but it's not pretty. One possible scheme would be to
>>>> emit a record *after* chosing a name but *before* creating the file,
>>> No, because the way you know the name is good is a successful
>>> open(O_CREAT).
>> The idea was to log *twice*. Once the we're about to create a file, and
>> the second time that we succeeded. That way, the filename shows up in the
>> log, even if we crash immediatly after physically creating the file, which
>> gives recovery at least a chance to clean up the mess.
> 
> It sounds like if the reason it fails is because someone else created the same
> file name you'll delete the wrong file?

Carefull bookkeeping during recovery should be able to eliminate that risk,
I think. I've thought a bit more like this, and came up with the following
idea that also take checkpoints into account.

We keep a global table of (xid, filename) pairs in shared memory. File creation
becomes  1) Generate a new filename  2) Add (CurrentTransactionId, filename) to the list, emit a XLOG record     saying
wedid this, and flush the log. If the filename is already on     the list, start over at (1).  3) Create the file. If
thisfails, delete the list entry and the file,     and start over at (1).  4) On (sub) transaction ABORT, we remove
entrieswith the xids we abort,     and delete the files.  5) On top transaction COMMIT, we remove entries with the xids
wecommit,     and keep the files.  6) During top transaction PREPARE, we record the entries with matching xids     in
the2PC state file.
 

When creating a checkpoint, we include the global filelist in the checkpoint. We
might need some interlock to ensure that concurrent global filelist updates 
don't get lost - but maybe doing things in the correct order is sufficient to
guarantee this.

During recovery, we track the fate of the files in a similar (but local) list. .) We initialize our local tracking list
withthe one found in the latest    CHECKPOINT. .) When we encounter a COMMIT record, we remove all files with xids
matching   those in the COMMIT record without deleting them. .) When we encounter a PREPARE record, we remove all files
withmatching xids,    and record them in the 2PC state file. They are deleted if the PREPARED    transaction is
aborted..) When we encounter an ABORT record, we remove all files with matching xids    from the list, and delete them.
.)When we encounter a runtime CHECKPOINT, it's list should match our tracking    list. .) When we encounter a shutdown
CHECKPOINT,we remove all files from our local    list that are not in the checkpoint's list, and delete those files.
 

The XLOG flush in step (2) is pretty nasty, but I think any solution that
guarantees to prevent leaks will have to flush something to disk at that
point. The global table isn't too appealing either, because it
will limit how many concurrent transactions will be able to create files. It
could be replaced by some on-disk thing, though.

This solution sounds rather heavy-weight, but I thought I'd share the idea.

Back to work on lazy xid assignment now ;-)

greetings, Florian Pflug


Re: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Tom Lane wrote:
>>> +1 on the first, but -1 on the second, because we'd have to track
>>> whether deleted files are temp or not ... it's very unclear that it'd
>>> be worth the trouble.
> 
>> I'm confused. smgrcreate already tracks if it created a file for a temp
>> relation or not (at least it has an isTemp parameter, which it also
>> stores in the delete list).
> 
> Doh, I was thinking it did not track that --- serves me right for not
> checking the code before opining.

So should we not force synchronous commit if all to-be-deleted files
are temporary? Or is that pushing our luck too much, because it opens
the window of possible file leakage considerably wider?

I don't have a strong preference - it seems to be low-hanging fruit
performance-wise - but then, maybe that fruit should be left hanging
until a time where we have some way of removing unreferenced files...

Thoughts, anyone?

>>> No, because the way you know the name is good is a successful
>>> open(O_CREAT).
> 
>> The idea was to log *twice*. Once the we're about to create a file, and
>> the second time that we succeeded. That way, the filename shows up in the
>> log, even if we crash immediatly after physically creating the file, which
>> gives recovery at least a chance to clean up the mess.
> 
> Hmm.  Seems like a lot of work for something that's not a large problem.
> Anyway, I'd counsel not thinking about that for now, but keep the patch
> focused on solving one problem.

Don't worry, I won't be doing anything like that in this patch ;-) But since
my work on the patch made me thing about these things, I thought I'd share
my ideas...

greetings, Florian Pflug



"Florian G. Pflug" <fgp@phlo.org> writes:
> So should we not force synchronous commit if all to-be-deleted files
> are temporary? Or is that pushing our luck too much, because it opens
> the window of possible file leakage considerably wider?

I think this area is something to be left for later.  Just worry about
XID avoidance for now, and maybe we'll revisit file creation/deletion
at another time.  (I'm not of the opinion that it's so broken that we
need to force two WAL flushes per file creation to fix it.)
        regards, tom lane


On Thu, 2007-08-30 at 00:33 -0400, Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
> > And I'm quite tempted to not flush the XLOG at all during ABORT, and to
> > only force synchronous commits if one of the to-be-deleted files is
> > non-temporary.
> 
> +1 on the first, but -1 on the second, because we'd have to track
> whether deleted files are temp or not ... it's very unclear that it'd
> be worth the trouble.

+1 to first: Aborts aren't particularly frequent actions, but having
them return faster makes sense if you have a session pool and people are
waiting to run another transaction.

-1 to second: If there are any temporary deleted files then it is
because we've done a large sort, hash join etc, so avoiding the sync
makes no difference to the overall response time. 

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



On Thu, 2007-08-30 at 14:14 -0400, Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
> > So should we not force synchronous commit if all to-be-deleted files
> > are temporary? Or is that pushing our luck too much, because it opens
> > the window of possible file leakage considerably wider?
> 
> I think this area is something to be left for later.  Just worry about
> XID avoidance for now, and maybe we'll revisit file creation/deletion
> at another time.  (I'm not of the opinion that it's so broken that we
> need to force two WAL flushes per file creation to fix it.)

Agreed

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Simon Riggs <simon@2ndquadrant.com> writes:
> -1 to second: If there are any temporary deleted files then it is
> because we've done a large sort, hash join etc, so avoiding the sync
> makes no difference to the overall response time. 

I think you're confused, actually: this is not about temporary sort
files, it's about dropping temp relations.  The scenario for this to
happen in ABORT is a rollback of a transaction that made a temp table.

Whether that's common enough to be worth special optimization, I'm
not sure.  Your point that the transaction may have been heavyweight
enough that it'd hardly matter is still on-target, you just got there
by the wrong reasoning ;-)
        regards, tom lane


Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > -1 to second: If there are any temporary deleted files then it is
> > because we've done a large sort, hash join etc, so avoiding the sync
> > makes no difference to the overall response time. 
> 
> I think you're confused, actually: this is not about temporary sort
> files, it's about dropping temp relations.  The scenario for this to
> happen in ABORT is a rollback of a transaction that made a temp table.

Why don't we create temp tables in a separate directory, anyway?  That
would make it trivially easy to get rid of them during recovery: just
zap all the files in there.

-- 
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)


Re: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> Whether that's common enough to be worth special optimization, I'm
> not sure.  Your point that the transaction may have been heavyweight
> enough that it'd hardly matter is still on-target, you just got there
> by the wrong reasoning ;-)

My point is that there is hardly any effort needed to code this (Certainly
less then what went into the mails about this until now ;-). As far as
I can see, it's adding a parameter "bool *allRelsTemp" to GetPendingDeletes,
and doing "*allRelsTemp &&= pendingDelete->isTemp" in that function.

So IMHO the question is really only about increasing the possibility of
file leakage this optimization would introduce, not about the effort needed
to implement this.

And actually, with that point of view, Simon's argument is actually in *favor*
of *allowing* asynchronous commit if only temptables are to be deleted.
Because *if* the transaction did a lot of work after creating a
(on-commit drop) temp table, then window where a crash leads to file leakage
*is* already very large. So it bit more by waiting for the wal writer
to flush the COMMIT record won't matter much.

I just wanted to write that I'll let this matter rest now, since IMHO none
of the arguments, neither pro nor contra, are really convincing, and so
keeping the status quo seems to be the most appropriate.

But just that second Alvaro's mail showed up in my inbox, saying that
temptable files are created in a special directory anyway, so cleaning
*them* up after a crash is really easy. If that is true for all
tablespaces, then this might be the convincing argument in favour of
allowing asynchronous commit if all pending deletes are temptables.

greetings, Florian Pflug



Alvaro Herrera <alvherre@commandprompt.com> writes:
> Why don't we create temp tables in a separate directory, anyway?  That
> would make it trivially easy to get rid of them during recovery: just
> zap all the files in there.

Hmm ... doesn't do anything to fix the problem of file leakage for
crashes when making/deleting non-temp tables, but at least we could be
reasonably confident we weren't throwing away any data someone had a
long-term interest in.
        regards, tom lane