Re: Proposal: Snapshot cloning - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Proposal: Snapshot cloning
Msg-id 0393D34D-EA0B-4737-BD60-D8C4CB5AD01E@decibel.org
In response to Re: Proposal: Snapshot cloning  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Jan 29, 2007, at 11:28 PM, Tom Lane wrote:
> Jim Nasby <decibel@decibel.org> writes:
>> On Jan 26, 2007, at 4:48 PM, Tom Lane wrote:
>>> I don't actually see that it buys you a darn thing ... you still won't
>>> be able to delete dead updated tuples because of the possibility of
>>> the LRT deciding to chase ctid chains up from the tuples it can see.
>
>> Well, Simon was talking about a serialized LRT, which ISTM shouldn't
>> be hunting down ctid chains past the point it serialized at.
>
> How you figure that?  If the LRT wants to update a tuple, it's got to
> chase the ctid chain to see whether the head update committed or not.
> It's not an error for a serializable transaction to update a tuple that
> was tentatively updated by a transaction that rolled back.

Nuts. :(

>> Even if that's not the case, there is also the possibility of an LRT
>> publishing information about what tables it will hit.
>
> I think we already bought 99% of the possible win there by fixing
> vacuum.  Most ordinary transactions aren't going to be able to predict
> which other tables the user might try to touch.

Presumably a single-statement transaction could do that in most (if not all) cases.

But even if we didn't support automatically detecting which tables a transaction touches, we could let the user specify a list up front and then bomb out if the transaction tried to hit anything that wasn't in that list. That would allow users who create LRTs to limit their impact on vacuum. The safe way to perform that check would be to check each buffer before accessing it, but I'm unsure how large a performance impact that would entail; I don't know how much code we run through to pull a tuple out of a page and do something with it, compared to the cost of checking whether that buffer belongs to a relation or file that's on the "approved list".
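
To make that concrete, here's a rough standalone C sketch of what the check itself might look like (this is not backend code; check_buffer_access, the struct layout, and the OIDs are all made up):

/*
 * Standalone sketch, not PostgreSQL backend code: the transaction keeps
 * the list of relation OIDs it declared at start, and every buffer access
 * is checked against it.  All names and OIDs here are made up.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef unsigned int Oid;

typedef struct Transaction
{
    const Oid  *approved_rels;  /* relations declared at transaction start */
    int         napproved;      /* number of entries in approved_rels */
} Transaction;

/* Is relid on the transaction's approved list? */
static bool
relation_is_approved(const Transaction *xact, Oid relid)
{
    for (int i = 0; i < xact->napproved; i++)
    {
        if (xact->approved_rels[i] == relid)
            return true;
    }
    return false;
}

/*
 * Hypothetical hook run before handing out a buffer that belongs to relid:
 * bomb out if the relation wasn't declared up front.
 */
static void
check_buffer_access(const Transaction *xact, Oid relid)
{
    if (!relation_is_approved(xact, relid))
    {
        fprintf(stderr, "ERROR: relation %u is not in this transaction's approved list\n",
                relid);
        exit(1);
    }
}

int
main(void)
{
    Oid         declared[] = {16384, 16401};    /* made-up table OIDs */
    Transaction xact = {declared, 2};

    check_buffer_access(&xact, 16384);          /* declared: allowed */
    printf("access to relation 16384 allowed\n");

    check_buffer_access(&xact, 24576);          /* undeclared: errors out */
    return 0;
}

A linear scan like that is cheap for a handful of declared tables; the real question above is how often the check fires, since it would run once per buffer access rather than once per statement.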

Perhaps a better way would be to allow users to mark vacuum-critical tables for "restricted" access. To access a restricted table, a transaction would have to declare up front which restricted tables it is going to hit (or maybe just lump all restricted tables into one group), and it would log its XID somewhere that vacuum can look at. If a transaction that hasn't declared it will touch restricted tables tries to do so, it errors out. We might want some way to flag buffers as belonging to a restricted table (or one of its indexes) so that transactions that aren't hitting restricted tables wouldn't have to pay a large performance penalty to figure that out. But you'd only have to mark those buffers when they're read in from the OS, and presumably a restricted table will be small enough that its buffers should stay put. Logging the XID could prove to be a serialization point, but we could possibly avoid that by using per-relation locks.
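
Roughly, the bookkeeping I'm imagining looks like this standalone sketch (again not backend code; the registry layout, the names, and the XIDs are invented, and real XID comparisons would have to handle wraparound rather than using plain '<'):

/*
 * Standalone sketch, not backend code: transactions that declare access to
 * restricted tables register their XID in a small shared registry; vacuum
 * uses only that registry to compute the horizon for a restricted table.
 */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int TransactionId;

#define MAX_RESTRICTED_XACTS 8
#define INVALID_XID          0

/* Shared registry of XIDs that declared access to restricted tables. */
static TransactionId restricted_xids[MAX_RESTRICTED_XACTS];

/* Called at transaction start by anyone declaring restricted access. */
static bool
register_restricted_xact(TransactionId xid)
{
    for (int i = 0; i < MAX_RESTRICTED_XACTS; i++)
    {
        if (restricted_xids[i] == INVALID_XID)
        {
            restricted_xids[i] = xid;   /* real code would hold a lock here */
            return true;
        }
    }
    return false;                       /* registry full: caller errors out */
}

/* Called at commit/abort to drop the registration. */
static void
unregister_restricted_xact(TransactionId xid)
{
    for (int i = 0; i < MAX_RESTRICTED_XACTS; i++)
    {
        if (restricted_xids[i] == xid)
            restricted_xids[i] = INVALID_XID;
    }
}

/*
 * Horizon vacuum would use for a restricted table: the oldest registered
 * XID, or next_xid if no transaction has declared restricted access.
 */
static TransactionId
restricted_table_horizon(TransactionId next_xid)
{
    TransactionId oldest = next_xid;

    for (int i = 0; i < MAX_RESTRICTED_XACTS; i++)
    {
        if (restricted_xids[i] != INVALID_XID && restricted_xids[i] < oldest)
            oldest = restricted_xids[i];
    }
    return oldest;
}

int
main(void)
{
    TransactionId next_xid = 1000;

    /*
     * An LRT at XID 400 never declares restricted access, so it never shows
     * up in the registry; a short transaction at XID 990 does declare it.
     */
    register_restricted_xact(990);
    printf("horizon for restricted table: %u\n",
           restricted_table_horizon(next_xid));     /* prints 990, not 400 */

    unregister_restricted_xact(990);
    printf("horizon with nothing registered: %u\n",
           restricted_table_horizon(next_xid));     /* prints 1000 */
    return 0;
}

The point being that when vacuum processes a restricted table it only has to honor the XIDs in that registry, so an undeclared LRT elsewhere in the system can't hold back dead-tuple removal there.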
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



