Striping CLOG and Subtrans - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Striping CLOG and Subtrans |
Date | |
Msg-id | 1133615666.2906.765.camel@localhost.localdomain |
Responses | Re: Striping CLOG and Subtrans |
List | pgsql-hackers |
Transaction IDs are assigned consecutively. As a result, accesses to the CLOG and Subtrans bitmaps tend to concentrate at the recently allocated end of those data structures, which (at a high level) are updated whenever a transaction is created or completes.

Earlier we found that CPU false sharing reduces both performance and scalability: any cache line that has to be shared across multiple backends causes additional memory accesses and cache-line bouncing.

Currently, the layout of CLOG and Subtrans is also consecutive, i.e. the entries for XId and XId+1 are adjacent in memory. CLOG stores 2 bits per transaction, so, assuming the smallest cache-line size of any modern CPU (64 bytes), each cache line holds the status of 256 transactions. In the worst case, if all transactions have roughly the same duration, nearly all of those transactions may need access at almost the same moment. We can imagine many situations that are far from the worst case, but there will be times when contention for those cache lines is high. The real access pattern is more likely to look like several overlaid Normal distributions, but either way there will be "hot spots". These hot spots could be responsible for extended lock hold times, which in turn cause contention for the LWLocks protecting CLOG and Subtrans. That contention affects everybody.

I propose that we lay out the XIds differently within CLOG and Subtrans, so that XId and XId+1 are at least one cache line apart. That way, a run of adjacent XIds would be spread across different cache lines within one CLOG page instead of sharing a single line (with the default 8 KB page and 64-byte lines, a page contains 128 cache lines, so up to 128 consecutive XIds would each land on a distinct line). The page mapping implied by TransactionIdToPage() would remain: XIds still fill consecutive pages, but the on-page layout becomes significantly different, while keeping the same size and other characteristics.
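The hot-spot arithmetic above can be checked with a small C sketch of the current consecutive layout. It is not the clog.c code: it assumes the default 8 KB page (BLCKSZ = 8192), 2 status bits per XId and a 64-byte cache line, and the helpers consecutive_byte() and consecutive_line() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: default 8 KB CLOG page, 2 status bits per XId, 64-byte
 * cache line.  The helper names below are illustrative, not from clog.c. */
#define BLCKSZ              8192
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE (8 / CLOG_BITS_PER_XACT)           /* 4 */
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)     /* 32768 */
#define CACHE_LINE          64
#define XACTS_PER_LINE      (CACHE_LINE * CLOG_XACTS_PER_BYTE) /* 256 */
#define LINES_PER_PAGE      (BLCKSZ / CACHE_LINE)              /* 128 */

typedef uint32_t TransactionId;

/* Consecutive layout: byte offset of an XId's status within its page. */
static uint32_t consecutive_byte(TransactionId xid)
{
    return (xid % CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_BYTE;
}

/* Cache line (0 .. LINES_PER_PAGE - 1) within the page holding the XId. */
static uint32_t consecutive_line(TransactionId xid)
{
    uint32_t byteno = consecutive_byte(xid);

    assert(byteno < BLCKSZ);    /* offset must fall inside the page */
    return byteno / CACHE_LINE;
}
```

With this layout, XIds 0 through 255 all map to line 0 and XId 256 starts line 1, so a burst of nearly simultaneous status updates for nearby XIds all contends for the same 64-byte line.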
Thus, this could be almost a three-line change, touching only:

- clog.c: TransactionIdToByte(), TransactionIdToBIndex()
- subtrans.c: TransactionIdToEntry()

These would be changed so that the XIds are striped across the page, rather than assumed to be sequential. The exact striping mechanism is just some simple modulo arithmetic based upon a multiple of ALIGNOF_BUFFER.

The Subtrans code makes no assumption about where XIds are located, but the CLOG code does make one: at startup, all XIds on the current page higher than the current XId are zeroed by writing zeroes to the higher bytes. I propose that we still perform the logical operation of zeroing the later XIds, but in a slightly different physical form. How we do that might affect startup time, so I have avoided specifying a particular striping algorithm in case we need to change it to improve the CLOG startup code.

These changes have almost no negative impact on runtime performance and can be implemented with minimal change. We can debate whether the false-sharing phenomenon actually occurs, but the bottom line ISTM is that if we can prevent it from ever occurring, almost for free, then why not? If it does occur, we know it can destroy SMP performance, and on some CPUs it can lead to context switching.

Comments?

Best Regards, Simon Riggs
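The post deliberately leaves the striping algorithm unspecified, so the following C sketch is only one possible instance of the idea, under stated assumptions: an 8 KB page, 2 status bits per XId, and a fixed 64-byte stride standing in for "a multiple of ALIGNOF_BUFFER". The functions striped_position(), striped_byte(), striped_bindex(), set_status(), get_status() and zero_later_xids() are hypothetical stand-ins for the macros listed above and for the startup zeroing; they are not PostgreSQL code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not taken from clog.c): 8 KB page, 2 bits per XId, and a
 * 64-byte stride standing in for "a multiple of ALIGNOF_BUFFER". */
#define BLCKSZ              8192
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)   /* 32768 */
#define CLOG_XACT_BITMASK   ((1 << CLOG_BITS_PER_XACT) - 1)
#define STRIPE              64
#define XACTS_PER_STRIPE    (STRIPE * CLOG_XACTS_PER_BYTE)   /* 256 */
#define STRIPES_PER_PAGE    (BLCKSZ / STRIPE)                /* 128 */

typedef uint32_t TransactionId;

/*
 * Hypothetical striped position of an XId within its page.  Consecutive
 * XIds go to consecutive stripes; the slot inside a stripe advances only
 * after every stripe on the page has been used once.  This is a
 * permutation of 0 .. CLOG_XACTS_PER_PAGE - 1, so the page size is unchanged.
 */
static uint32_t striped_position(TransactionId xid)
{
    uint32_t pg_index = xid % CLOG_XACTS_PER_PAGE;
    uint32_t stripe = pg_index % STRIPES_PER_PAGE;
    uint32_t slot = pg_index / STRIPES_PER_PAGE;

    return stripe * XACTS_PER_STRIPE + slot;
}

/* Striped counterparts of TransactionIdToByte() / TransactionIdToBIndex(). */
static uint32_t striped_byte(TransactionId xid)
{
    return striped_position(xid) / CLOG_XACTS_PER_BYTE;
}

static int striped_bindex(TransactionId xid)
{
    return (int) (striped_position(xid) % CLOG_XACTS_PER_BYTE);
}

static void set_status(uint8_t *page, TransactionId xid, int status)
{
    uint32_t byteno = striped_byte(xid);
    int shift = striped_bindex(xid) * CLOG_BITS_PER_XACT;

    assert(status >= 0 && status <= CLOG_XACT_BITMASK);
    page[byteno] = (uint8_t) ((page[byteno] & ~(CLOG_XACT_BITMASK << shift)) |
                              (status << shift));
}

static int get_status(const uint8_t *page, TransactionId xid)
{
    int shift = striped_bindex(xid) * CLOG_BITS_PER_XACT;

    return (page[striped_byte(xid)] >> shift) & CLOG_XACT_BITMASK;
}

/*
 * Logical startup zeroing: reset every XId after cur_xid on the same page
 * to status 0 (TRANSACTION_STATUS_IN_PROGRESS in clog.h).  Under striping
 * those bits are scattered across the page, so rather than writing zeroes
 * to the higher bytes in one block, this clears them XId by XId.
 */
static void zero_later_xids(uint8_t *page, TransactionId cur_xid)
{
    TransactionId page_start = cur_xid - cur_xid % CLOG_XACTS_PER_PAGE;
    uint32_t i;

    for (i = cur_xid % CLOG_XACTS_PER_PAGE + 1; i < CLOG_XACTS_PER_PAGE; i++)
        set_status(page, page_start + i, 0);
}
```

Because the position is a permutation within the page, the page still holds exactly CLOG_XACTS_PER_PAGE entries and the page mapping is untouched; with these constants, XIds 1000 and 1001 land exactly 64 bytes apart. The later XIds in each stripe form a contiguous tail, so a real patch might clear those tails in bulk rather than going XId by XId, which is the kind of startup-cost trade-off the post leaves open.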