Thread: Re: [BUGS] server crash in very big transaction [postgresql 8.0beta1]
I wrote:
> What is happening of course is that more than 16K subtransaction IDs
> won't fit in a commit record (since XLOG records have a 16-bit length
> field). We're gonna have to rethink the representation of subxact
> commit in XLOG.

After some further thought, I think there are basically two ways to
attack this:

1. Allow XLOG records to be larger than 64K.

2. Split transaction commit into multiple XLOG records when there are
many subtransactions.

#2 looks pretty painful because of the need to ensure that transaction
commit is still an atomic action. It's probably doable in principle
with something similar to the solution we are using for btree page
split logging (ie, record enough info so that the replay logic can
complete the commit even if the later records aren't recoverable from
the log). But I don't see all the details right off, and it sure seems
risky.

I'm inclined to go with #1. There are various ways we could do it
but the most straightforward would be to just widen the xl_len field
to 32 bits. This would cost either 4 or 8 bytes per XLOG record
(because of MAXALIGN restrictions) but we could more than buy that back
by eliminating the xl_prev and/or xl_xact_prev fields, which have no use
in the current system. (They were intended to support UNDO but it seems
clear that we will never do that.)

Or we could assign an rmgr value to represent an "extension" record that
is to be merged with a following "normal" record. This is kinda klugy
but would avoid wasting bits on xl_len in the vast majority of records.
Also we'd not have to force an initdb since the file format would
remain upward-compatible.

Thoughts?

			regards, tom lane
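The size figures in the message above can be checked with a little arithmetic: a 16-bit xl_len caps record data at 65535 bytes, which is 16383 four-byte transaction IDs, and widening the field grows the header by 4 or 8 bytes depending on MAXALIGN. The sketch below models the header with a field list assumed from this thread (a 64-bit CRC, the unused xl_prev and xl_xact_prev pointers, an xid, the length, and two one-byte fields); `HeaderLen16`, `HeaderLen32`, `LogPtrSketch` and the helper functions are hypothetical names, not the actual XLogRecord definition, and the struct sizes assume a typical ABI where `uint32_t` is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte log pointer, modeled as two 32-bit halves. */
typedef struct {
    uint32_t xlogid;
    uint32_t xrecoff;
} LogPtrSketch;

/* Rough model of a record header with a 16-bit length (field set assumed). */
typedef struct {
    uint32_t     crc[2];     /* 64-bit CRC stored as two halves */
    LogPtrSketch prev;       /* like xl_prev: unused, intended for UNDO */
    LogPtrSketch xact_prev;  /* like xl_xact_prev: unused, intended for UNDO */
    uint32_t     xid;
    uint16_t     len;        /* 16-bit length: at most 65535 bytes of data */
    uint8_t      info;
    uint8_t      rmid;
} HeaderLen16;

/* Same header with the length widened to 32 bits. */
typedef struct {
    uint32_t     crc[2];
    LogPtrSketch prev;
    LogPtrSketch xact_prev;
    uint32_t     xid;
    uint32_t     len;        /* 32-bit length */
    uint8_t      info;
    uint8_t      rmid;
} HeaderLen32;

/* Round len up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t align, size_t len)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Extra bytes per record caused by widening len, for a given MAXALIGN. */
static size_t widening_cost(size_t maxalign)
{
    return align_up(maxalign, sizeof(HeaderLen32)) -
           align_up(maxalign, sizeof(HeaderLen16));
}

/* Number of 4-byte transaction IDs that fit in max_len bytes of data. */
static size_t max_xids(size_t max_len)
{
    return max_len / sizeof(uint32_t);
}
```

With these field sizes the 16-bit header is 32 bytes and the widened one 36 bytes before MAXALIGN, so the cost is 4 bytes with MAXALIGN 4 and 8 bytes with MAXALIGN 8; dropping the two pointer fields would save 16 bytes, matching the count given later in the thread.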
On Tue, 24 Aug 2004, Tom Lane wrote:

> I wrote:
> > What is happening of course is that more than 16K subtransaction IDs
> > won't fit in a commit record (since XLOG records have a 16-bit length
> > field). We're gonna have to rethink the representation of subxact
> > commit in XLOG.
>
> After some further thought, I think there are basically two ways to
> attack this:
>
> 1. Allow XLOG records to be larger than 64K.
>
> 2. Split transaction commit into multiple XLOG records when there are
> many subtransactions.
>
> [snip]
>
> I'm inclined to go with #1. There are various ways we could do it
> but the most straightforward would be to just widen the xl_len field
> to 32 bits. This would cost either 4 or 8 bytes per XLOG record
> (because of MAXALIGN restrictions) but we could more than buy that back
> by eliminating the xl_prev and/or xl_xact_prev fields, which have no use
> in the current system. (They were intended to support UNDO but it seems
> clear that we will never do that.)

If we have to do an initdb for a subsequent beta, could we just remove
these anyway? By my count, we've got at least 16 bytes there.

As for extending the length of xl_len, what happens if someone now has
2^30 subtransaction IDs (as unlikely as that sounds)? What I mean is, it
would be good if we could detect this at a point when we can issue an
ERROR. If we go down this path, we should also document the maximum
number of subtransaction IDs which can be used within a single block, so
that people who look at doing stuff on that scale are aware of the
limitations.

>
> Or we could assign an rmgr value to represent an "extension" record that
> is to be merged with a following "normal" record. This is kinda klugy
> but would avoid wasting bits on xl_len in the vast majority of records.
> Also we'd not have to force an initdb since the file format would
> remain upward-compatible.

This is a better idea, I think, as it avoids the problems above and, as
you say, will be binary compatible.

Gavin
On Wed, Aug 25, 2004 at 11:21:49AM +1000, Gavin Sherry wrote:
> On Tue, 24 Aug 2004, Tom Lane wrote:
> > 1. Allow XLOG records to be larger than 64K.
> >
> > 2. Split transaction commit into multiple XLOG records when there are
> > many subtransactions.
>
> [snip]
>
> > I'm inclined to go with #1. There are various ways we could do it
> > but the most straightforward would be to just widen the xl_len field
> > to 32 bits. This would cost either 4 or 8 bytes per XLOG record
> > (because of MAXALIGN restrictions) but we could more than buy that back
> > by eliminating the xl_prev and/or xl_xact_prev fields, which have no use
> > in the current system. (They were intended to support UNDO but it seems
> > clear that we will never do that.)

If we agree to never implement UNDO, there's a bunch of other code that
could be removed. Is there anyone that thinks we have any chance of not
doing it?

OTOH, if those fields are unused, we could just remove them for now in
any case. It's unlikely that there won't be a catalog update for some
other reason before someone implements UNDO anyway.

> As for extending the length of xl_len, what happens if someone now has
> 2^30 subtransaction IDs (as unlikely as that sounds)?

The commit xlog record also carries dropped table information, 12 bytes
apiece (on 32 bit machines?). It's unlikely that anyone will drop 2^13
tables in a single transaction, but it adds to the space taken by the
child xid list.

> > Or we could assign an rmgr value to represent an "extension" record that
> > is to be merged with a following "normal" record. This is kinda klugy
> > but would avoid wasting bits on xl_len in the vast majority of records.
> > Also we'd not have to force an initdb since the file format would
> > remain upward-compatible.
>
> This is a better idea, I think, as it avoids the problems above and, as
> you say, will be binary compatible.

I also think this is a good idea. Would it be generalized or only
applicable to xl_xact_{commit,abort} records?
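Gavin asks for an oversized commit to be caught while an ERROR can still be raised, and Alvaro points out that dropped-relation entries (12 bytes each) share the same record as the child XIDs. A minimal sketch of such a pre-insert check, assuming the 16-bit length limit, 4-byte transaction IDs, and a hypothetical 16-byte fixed part of the commit record; `commit_payload_len`, `check_commit_fits` and `COMMIT_FIXED_LEN` are illustrative names, not PostgreSQL code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_RECORD_LEN   65535u  /* largest length a 16-bit xl_len can hold */
#define REL_ENTRY_SIZE   12u     /* per dropped relation, figure from the thread */
#define XID_SIZE         4u      /* one TransactionId */
#define COMMIT_FIXED_LEN 16u     /* hypothetical fixed part of a commit record */

/* Payload size of a commit record.  64-bit arithmetic means even absurd
 * counts such as 2^30 subtransactions cannot wrap around. */
static uint64_t commit_payload_len(uint64_t nrels, uint64_t nsubxacts)
{
    return COMMIT_FIXED_LEN + nrels * REL_ENTRY_SIZE + nsubxacts * XID_SIZE;
}

/* Meant to run before commit is irrevocable, so the caller can still abort.
 * Returns 0 if the record fits, or -1 after printing an error message. */
static int check_commit_fits(uint64_t nrels, uint64_t nsubxacts)
{
    uint64_t len = commit_payload_len(nrels, nsubxacts);

    if (len > MAX_RECORD_LEN) {
        fprintf(stderr, "ERROR: commit record would need %llu bytes (limit %u)\n",
                (unsigned long long) len, MAX_RECORD_LEN);
        return -1;
    }
    return 0;
}
```

Under these assumptions 16379 subtransactions still fit, 16380 do not, and 2^13 dropped tables alone (98304 bytes of entries) already exceed the limit.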
-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Living and ceasing to live are imaginary solutions. Existence is elsewhere." (Andre Breton)
Gavin Sherry <swm@linuxworld.com.au> writes:
> As for extending the length of xl_len, what happens if someone now has
> 2^30 subtransaction IDs (as unlikely as that sounds)?

They'll have run out of RAM to store the subxact-related storage before
that (not to mention most likely have exhausted the CommandCounter
range, not to mention exhausted their patience --- it takes a good while
even to exercise the 2^16-subxact case). I'm satisfied if we can
approach that limit. Exceeding it will be a task for some other release.

			regards, tom lane
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> If we agree to never implement UNDO, there's a bunch of other code that
> could be removed.

Yeah, I've been thinking of going around and cleaning out the deadwood,
but beta is not the time for it.

> The commit xlog record also carries dropped table information, 12 bytes
> apiece (on 32 bit machines?).

Good point --- someone will eventually hit that case too, if we don't
increase the XLOG record size limit.

>>> Or we could assign an rmgr value to represent an "extension" record that
>>> is to be merged with a following "normal" record.

> I also think this is a good idea. Would it be generalized or only
> applicable to xl_xact_{commit,abort} records?

I was envisioning it as a general mechanism --- I see no point in
restricting it to commit/abort records. If anything it would take extra
code to restrict it to that case ...

			regards, tom lane
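The general "extension record" idea can be sketched as a split on the writing side and a merge on the replay side: every chunk except the last is tagged with a reserved rmgr id meaning "merge with the next record", and the final chunk carries the real rmgr id. This is a minimal sketch assuming a hypothetical reserved id (255 here) and the names `ChunkSketch`, `split_payload` and `merge_records`; it ignores the CRC, alignment and page-boundary handling a real XLogInsert would need. Later in the thread this approach is dropped in favor of widening xl_len.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHUNK_LEN   65535u  /* what a 16-bit xl_len can describe */
#define RM_EXTENSION_ID 255u    /* hypothetical id: "merge with following record" */

typedef struct {
    uint8_t        rmid;  /* resource manager id */
    uint16_t       len;   /* chunk length, fits the 16-bit field */
    const uint8_t *data;  /* points into the caller's payload */
} ChunkSketch;

/* Split a payload into records; all but the last are extension records.
 * Returns the record count; the caller must supply enough slots in out. */
static size_t split_payload(uint8_t rmid, const uint8_t *payload, size_t total,
                            ChunkSketch *out)
{
    size_t n = 0, off = 0;

    do {
        size_t remaining = total - off;
        size_t this_len = remaining > MAX_CHUNK_LEN ? MAX_CHUNK_LEN : remaining;
        int    last = (this_len == remaining);

        out[n].rmid = last ? rmid : RM_EXTENSION_ID;
        out[n].len = (uint16_t) this_len;
        out[n].data = payload + off;
        off += this_len;
        n++;
    } while (off < total);
    return n;
}

/* Replay side: concatenate extension chunks until the first normal record,
 * then return the merged payload (malloc'd) plus its real rmgr id.  Returns
 * NULL if the group never ends in a normal record (e.g. a truncated log). */
static uint8_t *merge_records(const ChunkSketch *recs, size_t nrecs,
                              uint8_t *rmid_out, size_t *len_out)
{
    size_t total = 0, off = 0;
    uint8_t *buf;

    for (size_t i = 0; i < nrecs; i++)
        total += recs[i].len;
    buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < nrecs; i++) {
        memcpy(buf + off, recs[i].data, recs[i].len);
        off += recs[i].len;
        if (recs[i].rmid != RM_EXTENSION_ID) {  /* normal record ends the group */
            *rmid_out = recs[i].rmid;
            *len_out = off;
            return buf;
        }
    }
    free(buf);
    return NULL;
}
```

Treating a group without its closing normal record as incomplete is what keeps the merged record atomic on replay, which is the same concern Tom raises about splitting commit records.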
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Wed, Aug 25, 2004 at 11:21:49AM +1000, Gavin Sherry wrote:
>> On Tue, 24 Aug 2004, Tom Lane wrote:
>>> Or we could assign an rmgr value to represent an "extension" record that
>>> is to be merged with a following "normal" record. This is kinda klugy
>>> but would avoid wasting bits on xl_len in the vast majority of records.
>>> Also we'd not have to force an initdb since the file format would
>>> remain upward-compatible.
>>
>> This is a better idea, I think, as it avoids the problems above and, as
>> you say, will be binary compatible.

> I also think this is a good idea. Would it be generalized or only
> applicable to xl_xact_{commit,abort} records?

After looking into this I've decided that it's not very practical ---
it would require major rewriting of XLogInsert, which I'm disinclined to
do at this stage of the beta cycle. Widening the xl_len field seems much
safer.

It's not really an initdb-forcing change anyway; all you need to do to
upgrade an existing 8.0beta1 installation is run pg_resetxlog (assuming
you shut down the old postmaster cleanly).

			regards, tom lane
This has just been fixed by Tom and will be in beta2.

---------------------------------------------------------------------------

Tom Lane wrote:
> Gavin Sherry <swm@linuxworld.com.au> writes:
> > As for extending the length of xl_len, what happens if someone now has
> > 2^30 subtransaction IDs (as unlikely as that sounds)?
>
> They'll have run out of RAM to store the subxact-related storage before
> that (not to mention most likely have exhausted the CommandCounter
> range, not to mention exhausted their patience --- it takes a good while
> even to exercise the 2^16-subxact case). I'm satisfied if we can
> approach that limit. Exceeding it will be a task for some other release.
>
> 			regards, tom lane

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073