Re: Speedup twophase transactions - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Speedup twophase transactions |
Date | |
Msg-id | CAB7nPqS-6gbaRJjQ0x+hh8zMWMJhyOF2-d97nPdOV+z9b95aJg@mail.gmail.com Whole thread Raw |
In response to | Re: Speedup twophase transactions (Stas Kelvich <s.kelvich@postgrespro.ru>) |
Responses |
Re: Speedup twophase transactions
|
List | pgsql-hackers |
On Sat, Sep 17, 2016 at 2:45 AM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote: > Looking through old version i’ve noted few things that were bothering me: > > * In case of crash replay PREPARE redo accesses SUBTRANS, but StartupSUBTRANS() isn’t called yet > in StartupXLOG(). > * Several functions in twophase.c have to walk through both PGPROC and pg_twophase directory. > > Now I slightly changed order of things in StartupXLOG() so twophase state loaded from from file and > StartupSUBTRANS called before actual recovery starts. So in all other functions we can be sure that > file were already loaded in memory. > > Also since 2pc transactions now are dumped to files only on checkpoint, we can get rid of > PrescanPreparedTransactions() — all necessary info can reside in checkpoint itself. I’ve changed > behaviour of oldestActiveXid write in checkpoint and that’s probably discussable topic, but ISTM > that simplifies a lot recovery logic in both twophase.c and xlog.c. More comments on that in patch itself. Finally I am back on this one.. First be more careful about typos in the comments of your code. Just reading through the patch I found quite a couple of places that needed to be fixed. cosequent, unavalable, SUBTRANCE are some examples among many other things.. + StartupCLOG(); + StartupSUBTRANS(checkPoint.oldestActiveXid); + RecoverPreparedFromFiles(); [...] /* - * Startup commit log and subtrans only. MultiXact and commit - * timestamp have already been started up and other SLRUs are not - * maintained during recovery and need not be started yet. - */ - StartupCLOG(); - StartupSUBTRANS(oldestActiveXID); Something is definitely wrong in this patch if we begin to do that. There is no need to move those two calls normally, and I think that we had better continue to do that only for hot standbys just to improve 2PC handling... CleanupPreparedTransactionsOnPITR() cleans up pg_twophase, but isn't it necessary to do something as well for what is in memory? I have been thinking about this patch quite a bit, and the approach taken looks really over-complicated to me. Basically what we are trying to do here is to reuse as much as possible code paths that are being used by non-recovery code paths during recovery, and then try to adapt a bunch of things like SUBTRANS structures, CLOG, XID handling and PGXACT things in sync to handle the 2PC information in memory correctly. I am getting to think that this is *really* bug-prone in the future and really going to make maintenance hard. Taking one step back, and I know that I am a noisy one, but looking at this patch is making me aware of the fact that it is trying more and more to patch things around the more we dig into issues, and I'd suspect that trying to adapt for example sub-transaction and clog handling is just the tip of the iceberg and that there are more things that need to be taken care of if we continue to move on with this approach. Coming to the point: why don't we simplify things? In short: - Just use a static list and allocate a chunk of shared memory as needed. DSM could come into play here to adapt to the size of a 2PC status file, this way there is no need to allocate a chunk of memory with an upper bound. Still we could use an hardcoded upper-bound of say 2k with max_prepared_transactions, and just write the 2PC status file to disk if its size is higher than what can stick into memory. - save 2PC information in memory instead of writing it to a 2PC file when XLOG_XACT_PREPARE shows up. - flush information to disk when there is a valid restart point in RecoveryRestartPoint(). We would need as well to tweak PrescanPreparedTransactions accordingly than everything we are trying to do here. That would be way more simple than what's proposed here, and we'd still keep all the performance benefits. -- Michael
pgsql-hackers by date: