Thread: SSI SLRU strategy choices
I'm now deep enough into the SLRU techniques to see what my options are for storing the data appropriate for SLRU. This consists of a uint64 commitSeqNo (which is overkill enough that I'd be comfortable stealing a bit or two from the high end for SLRU usage), which needs to be associated with an xid. The xids would have gaps, since we only need to track committed serializable transactions which still matter because of a long-running transaction and weren't subject to early cleanup based on previously posted rules. These will be looked up by xid. The options I see are:

(1) Store the xid and commitSeqNo in each SLRU entry -- with alignment, that's 16 bytes per entry. Simple, but requires a sequential search for the xid. Wouldn't scale well.

(2) Use 8-byte SLRU entries and map the xid values over the SLRU space, with each spot allowing two different xid values. At first blush that looks good, because transaction ID wrap-around techniques mean that the two values for any one spot couldn't be active at the same time. The high bit could flag that the xid is "present", with the rest of the bits being from the commitSeqNo. The problem is that the SLRU code appears to get confused about wrap-around when the SLRU space is half full, so we would get into trouble if we burned through more than 2^30 transactions during one long-running serializable read-write transaction. I still like this option best, with resort to killing the long-running transaction at that point.

(3) Use two SLRU spaces. You'd look up randomly into the first one based on xid, and get a position in the second one, which would hold the commitSeqNo, assigned to sequential slots. This would potentially allow us to burn through more transactions, because some are likely to be subject to early cleanup. The marginal extension of the failure point doesn't seem like it merits the extra complexity.

(4) Change SLRU to tolerate more entries. At most this raises the number of transactions we can burn through during a long-running transaction from 2^30 to 2^31. That hardly seems worth the potential to destabilize a lot of critical code.

Does (2) sound good to anyone else? Other ideas? Does it sound like I'm totally misunderstanding anything?

-Kevin
On 29.12.2010 00:10, Kevin Grittner wrote:
> (2) Use 8-byte SLRU entries and map the xid values over the SLRU
> space, with each spot allowing two different xid values. At first
> blush that looks good, because transaction ID wrap-around techniques
> mean that the two values for any one spot couldn't be active at the
> same time. The high bit could flag that the xid is "present", with
> the rest of the bits being from the commitSeqNo. The problem is
> that the SLRU code appears to get confused about wrap-around when
> the SLRU space is half full, so we would get into trouble if we
> burned through more than 2^30 transactions during one long-running
> serializable read-write transaction. I still like this option best,
> with resort to killing the long-running transaction at that point.

If you burn through more than 2^30 XIDs while a long-running transaction is still running, you have bigger problems. 2^30 is the maximum number of XIDs you can burn through before you reach XID wrap-around anyway. GetNewTransactionId() will stop assigning new XIDs when you approach that limit.

(I'm not sure how you arrived at that number, though. ISTM that the slru code can handle 2^30 *pages* before getting into trouble, assuming the PagePrecedes function can handle that.)

The only issue I can see with that is that you allocate those 8 bytes for every xid, even if it's a non-serializable transaction or a subtransaction. But the overhead is probably not significant in practice.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> I'm not sure how you arrived at that number, though.

http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/include/access/slru.h;h=710cca70acd67e03e5f3a255b048a719ae4c4709

The way I read this, each segment is (BLCKSZ * SLRU_PAGES_PER_SEGMENT) long, which is (8kB * 32), or 256kB. The number of files is limited to 64k because of the 0000 to FFFF segment file naming. So total space is limited to 16GB. When an SLRU is used to store xids for random access, that's 4 bytes per entry, so 2^32 entries are possible, but the SLRU code considers it a problem for the space to become more than half full. With the eight-byte entries I need, there are 2^31 slots for entries, with the ability to use 2^30 before the space becomes half full and SLRU complains. Does that look right to you, or have I misunderstood something?

> The only issue I can see with that is that you allocate those 8
> bytes for every xid, even if it's a non-serializable transaction
> or a subtransaction. But the overhead is probably not significant
> in practice.

Right. And it avoids having to sequentially search for the desired xid. A sequential search seems to me like it would get into O(N^2) performance under extreme load, whereas this approach has a couple of performance plateaus at O(1), which will be, I think, the normal case, and only goes to O(N) performance under extreme load.

-Kevin
Excerpts from Kevin Grittner's message of Wed Dec 29 12:20:20 -0300 2010:

> http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/include/access/slru.h;h=710cca70acd67e03e5f3a255b048a719ae4c4709
>
> The way I read this, each segment is (BLCKSZ *
> SLRU_PAGES_PER_SEGMENT) long, which is (8kB * 32), or 256kB. The
> number of files is limited to 64k because of the 0000 to FFFF
> segment file naming. So total space is limited to 16GB. When an
> SLRU is used to store xids for random access, that's 4 bytes per
> entry, so 2^32 entries are possible, but SLRU code considers it a
> problem for the space to become more than half full. With the
> eight-byte entries I need, there are 2^31 slots for entries, with
> the ability to use 2^30 before it becomes half full and SLRU
> complains.

If these limitations become a problem, you can always change them. A couple of zeroes at the start of the pg_clog filenames aren't going to bother anyone, I don't think. Not so sure about your new proposed design's space usage.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> If these limitations become a problem, you can always change them.
> A couple of zeroes at the start of the pg_clog filenames aren't
> going to bother anyone, I don't think. Not so sure about your new
> proposed design's space usage.

I guess that's a call the community can make now -- if a serializable transaction which is not flagged as read only remains open long enough for over a billion other transactions to commit, is it OK for the old transaction to be automatically canceled? Is it worth messing with the SLRU limits to double that?

Beyond a certain point you have transaction ID wrap-around, so at that point this would be the least of your troubles -- canceling the old transaction might even be helpful. I thought that was at 2 billion, but Heikki was saying it's at 1 billion in an earlier post.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> if a serializable transaction which is not flagged as read only
> remains open long enough for over a billion other transactions to
> commit

Maybe a clarification and example would be useful. We're talking about going through a billion transactions which were assigned a TransactionId, not all database transactions. An example of how you could hit that is a sustained commit rate of 5000 data-modifying transactions per second while a single read-write transaction stays open for 2.3 days.

-Kevin