On 19/09/10 21:57, I wrote:
> Putting that aside for now, we have one very serious problem with this
> algorithm:
>
>> While they [SIREAD locks] are associated with a transaction, they must
>> survive a successful COMMIT of that transaction, and remain until all
>> overlapping transactions complete.
>
> Long-running transactions are already nasty because they prevent VACUUM
> from cleaning up old tuple versions, but this escalates the problem to a
> whole new level. If you have one old transaction sitting idle, every
> transaction that follows consumes a little bit of shared memory, until
> that old transaction commits. Eventually you will run out of shared
> memory, and will not be able to start new transactions anymore.
>
> Is there anything we can do about that? Just a thought, but could you
> somehow coalesce the information about multiple already-committed
> transactions to keep down the shared memory usage? For example, if you
> have this:
>
> 1. Transaction <slow> begins
> 2. 100 other transactions begin and commit
>
> Could you somehow group together the 100 committed transactions and
> represent them with just one SERIALIZABLEXACT struct?

Ok, I think I've come up with a scheme that puts an upper bound on the
amount of shared memory used, wrt. number of transactions. You can still
run out of shared memory if you lock a lot of objects, but that doesn't
worry me as much.

When a transaction commits, its predicate locks must still be held, but
it's not important anymore *who* holds them, as long as they're held for
long enough.

Let's move the finishedBefore field from SERIALIZABLEXACT to
PREDICATELOCK. When a transaction commits, set the finishedBefore field
in all the PREDICATELOCKs it holds, and then release the
SERIALIZABLEXACT struct. The predicate locks stay without an associated
SERIALIZABLEXACT entry until finishedBefore expires.
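
To make the commit-time handoff concrete, here is a rough sketch in C. The
Sketch* types and functions are hypothetical stand-ins, not the real
SERIALIZABLEXACT and PREDICATELOCK structs. It also assumes that
finishedBefore is a transaction id, that a lock "expires" once no active
transaction is older than that id, and that xid wraparound can be ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified types; xid wraparound is ignored in this sketch. */
typedef uint32_t SketchXid;
#define SKETCH_INVALID_XID 0

typedef struct SketchXact {
    SketchXid xid;
} SketchXact;

typedef struct SketchLock {
    int target;                   /* stand-in for a lock target tag */
    SketchXact *owner;            /* NULL once the owner has committed */
    SketchXid finishedBefore;     /* invalid while the owner is running */
    struct SketchLock *nextHeld;  /* next lock held by the same owner */
} SketchLock;

/*
 * On commit: stamp every held lock with finishedBefore, detach it from its
 * owner, then release the per-transaction struct (assumed malloc'd here).
 */
static void sketch_commit(SketchXact *xact, SketchLock *held, SketchXid nextXid)
{
    assert(nextXid != SKETCH_INVALID_XID);
    for (SketchLock *l = held; l != NULL; l = l->nextHeld) {
        l->finishedBefore = nextXid;
        l->owner = NULL;
    }
    free(xact);
}

/*
 * Assumed expiry rule: an ownerless lock can be released once the oldest
 * active transaction is not older than finishedBefore.
 */
static int sketch_lock_expired(const SketchLock *l, SketchXid oldestActiveXid)
{
    return l->owner == NULL && oldestActiveXid >= l->finishedBefore;
}
```

The point is only that after sketch_commit() returns, nothing per-transaction
is left in memory; the locks carry all the information that is still needed.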

Whenever there are two predicate locks on the same target that both
belonged to already-committed transactions, the one with the smaller
finishedBefore can be dropped, because the one with the higher
finishedBefore value already covers it.
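
The coalescing rule could look roughly like the sketch below. CommittedLock
and coalesce_committed() are hypothetical names, the integer target stands in
for a real lock target tag, and the quadratic scan over a flat array is only
for illustration; a real implementation would presumably do this per target
in the lock hash table.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchXid;       /* hypothetical; ignores xid wraparound */

/* A lock left behind by an already-committed transaction. */
typedef struct CommittedLock {
    int target;                   /* stand-in for a lock target tag */
    SketchXid finishedBefore;
} CommittedLock;

/*
 * Compact locks[0..n) in place so that each target keeps a single entry
 * carrying the largest finishedBefore seen for it. Returns the new count.
 */
static size_t coalesce_committed(CommittedLock *locks, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Look for an already-kept lock on the same target. */
        for (j = 0; j < kept; j++) {
            if (locks[j].target == locks[i].target)
                break;
        }

        if (j == kept) {
            /* First lock seen on this target: keep it. */
            locks[kept++] = locks[i];
        } else if (locks[i].finishedBefore > locks[j].finishedBefore) {
            /* The later-expiring lock covers the earlier one. */
            locks[j].finishedBefore = locks[i].finishedBefore;
        }
    }
    return kept;
}
```

With this, the number of ownerless locks is bounded by the number of distinct
lock targets, regardless of how many transactions have committed.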

There. That was surprisingly simple, I must be missing something.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com