Re: [HACKERS] snapbuild woes - Mailing list pgsql-hackers
From: Petr Jelinek
Subject: Re: [HACKERS] snapbuild woes
Date:
Msg-id: ba7a3052-6e71-d162-d4a0-b5bdd7145c19@2ndquadrant.com
In response to: Re: [HACKERS] snapbuild woes (Andres Freund <andres@anarazel.de>)
Responses: Re: [HACKERS] snapbuild woes
List: pgsql-hackers
On 01/05/17 21:14, Andres Freund wrote:
> On 2017-05-01 11:09:44 +0200, Petr Jelinek wrote:
>> On 01/05/17 10:03, Andres Freund wrote:
>>> On 2017-05-01 03:54:49 +0200, Petr Jelinek wrote:
>>>> But, I still think we need to restart the tracking after new
>>>> xl_running_xacts. Reason for that is afaics any of the catalog snapshots
>>>> that we assigned to transactions at the end of SnapBuildCommitTxn might
>>>> be corrupted otherwise as they were built before we knew one of the
>>>> supposedly running txes was actually already committed and that
>>>> transaction might have done catalog changes.
>>>
>>> I'm afraid you're right. But I think this is even more complicated: The
>>> argument in your version that this can only happen once, seems to also
>>> be holey: Just imagine a pg_usleep(3000 * 1000000) right before
>>> ProcArrayEndTransaction() and enjoy the picture.
>>
>> Well yes, transaction can in theory have written commit/abort xlog
>> record and stayed in proc for more than single xl_running_xacts write.
>> But then the condition which we test that the new xl_running_xacts has
>> bigger xmin than the previously tracked one's xmax would not be
>> satisfied and we would not enter the relevant code path yet. So I think
>> we should not be able to get any xids we didn't see. But we have to
>> restart tracking from beginning (after first checking if we didn't
>> already see anything that the xl_running_xacts considers as running),
>> that's what my code did.
>
> But to get that correct, we'd have to not only track ->committed, but
> also somehow maintain ->aborted, and not just for the transactions in
> the original set of running transactions. That'd be fairly complicated
> and large. The reason I was trying - and it's definitely not correct as
> I had proposed - to use the original running_xacts record is that that
> only required tracking as many transaction statuses as in the first
> xl_running_xacts. Am I missing something?
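(For readers following along: the "bigger xmin than the previously tracked one's xmax" condition from the quoted paragraph can be sketched as below. This is a hypothetical, self-contained illustration with simplified types and made-up names, not the actual snapbuild.c code.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Hypothetical, simplified tracking state for the initial snapshot
 * build; names only loosely follow the real structures.
 */
typedef struct TrackedRunningXacts
{
    TransactionId xmin;   /* oldest xid still considered running */
    TransactionId xmax;   /* first xid not yet assigned at track time */
} TrackedRunningXacts;

/*
 * A new xl_running_xacts record lets us restart tracking from scratch
 * only when its xmin is past the previously tracked record's xmax:
 * then every transaction from the old set has finished, so no xid can
 * have slipped by unseen.  Otherwise we stay on the current tracking.
 */
static bool
can_restart_tracking(const TrackedRunningXacts *tracked,
                     TransactionId new_xmin)
{
    return new_xmin > tracked->xmax;
}
```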
Aah, now I understand, we talked about slightly different things; I
considered the running-xacts thing to be a first step towards tracking
aborted txes everywhere. I am not sure it's that complicated, it would
be exactly the same as the committed tracking, except we'd do it only
before we reach SNAPBUILD_CONSISTENT. It would definitely be a larger
patch, I agree, but I can give it a try.

If you think that adding the SNAPBUILD_BUILD_INITIAL_SNAPSHOT would be
a less invasive/smaller patch, I am okay with doing that for PG10. I
think we'll have to revisit tracking of aborted transactions in PG11
then though, because of the 'snapshot too large' issue when exporting;
at least I don't see any other way to fix that.

> The probabilistic tests catch the issues here fairly quickly, btw, if
> you run with synchronous_commit=on, while pgbench is running, because
> the WAL flushes make this more likely. Runs this query:
>
> SELECT account_count, teller_count, account_sum - teller_sum s
> FROM
> (
>     SELECT count(*) account_count, SUM(abalance) account_sum
>     FROM pgbench_accounts
> ) a,
> (
>     SELECT count(*) teller_count, SUM(tbalance) teller_sum
>     FROM pgbench_tellers
> ) t
>
> which, for my scale, should always return:
> ┌─────────┬─────┬───┐
> │    a    │  t  │ s │
> ├─────────┼─────┼───┤
> │ 2000000 │ 200 │ 0 │
> └─────────┴─────┴───┘
>
> but with my POC patch occasionally returns things like:
> ┌─────────┬─────┬───────┐
> │    a    │  t  │   s   │
> ├─────────┼─────┼───────┤
> │ 2000000 │ 212 │ 37358 │
> └─────────┴─────┴───────┘
>
> which obviously shouldn't be the case.

Very nice (the test, not the failures ;)) !

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
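(Editor's sketch: tracking aborted xids "exactly the same as the committed tracking", as proposed in the reply above, amounts to keeping a second sorted xid array mirroring ->committed. The struct and function names below are invented for illustration; real code would live in snapbuild.c and use its allocators.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/*
 * Hypothetical mirror of the ->committed array, holding aborted xids.
 * Per the proposal, it would only need to be maintained until the
 * builder reaches SNAPBUILD_CONSISTENT.
 */
typedef struct AbortedXids
{
    TransactionId *xip;        /* sorted array of aborted xids */
    size_t         xcnt;       /* number of entries in use */
    size_t         xcnt_space; /* allocated capacity */
} AbortedXids;

static int
xid_cmp(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;
    return (xa < xb) ? -1 : (xa > xb) ? 1 : 0;
}

/* Record an aborted xid, keeping the array sorted for binary search. */
static void
aborted_add(AbortedXids *ab, TransactionId xid)
{
    if (ab->xcnt == ab->xcnt_space)
    {
        ab->xcnt_space = ab->xcnt_space ? ab->xcnt_space * 2 : 8;
        ab->xip = realloc(ab->xip, ab->xcnt_space * sizeof(TransactionId));
    }
    ab->xip[ab->xcnt++] = xid;
    qsort(ab->xip, ab->xcnt, sizeof(TransactionId), xid_cmp);
}

/* Was this xid seen to abort before consistency was reached? */
static bool
aborted_lookup(const AbortedXids *ab, TransactionId xid)
{
    return bsearch(&xid, ab->xip, ab->xcnt,
                   sizeof(TransactionId), xid_cmp) != NULL;
}
```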