Striping CLOG and Subtrans - Mailing list pgsql-hackers

From Simon Riggs
Subject Striping CLOG and Subtrans
Msg-id 1133615666.2906.765.camel@localhost.localdomain
Transaction Ids are assigned consecutively. As a result, accesses to the
bitmaps for CLOG and Subtrans tend to concentrate on the most recently
allocated end of those data structures, which are updated on successful
transaction creation/completion (high level view).

Earlier we found that CPU false sharing reduces both performance and
scalability. Any cache line that needs to be shared across multiple
backends causes additional memory accesses and cache line bouncing.

Currently, the layout of CLOG and Subtrans is also consecutive, i.e. an
XId and XId+1 are adjacent in memory.

Within each cache line for CLOG, we will have about 256 transactions,
assuming the smallest cache line size of any modern CPU, 64 bytes. If
all transactions are roughly the same duration, then we may get a worst
case in which nearly all of those transactions require nearly
simultaneous access. We can imagine many situations where we are far
from the worst case, but there will be times when contention for those
cache lines is high. The actual situation seems likely to be multiple
overlaid Normal distributions, but in any case: there will be "hot
spots". These hot spots could be responsible for extended locking times,
which then cause contention for the LWLocks protecting CLOG and
Subtrans. That contention affects everybody.
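
The arithmetic behind the 256 figure can be sketched as follows. This is
only an illustration: the constant and function names are hypothetical,
the 2 status bits per transaction and the 8192-byte page are assumptions
about a default build, and the mapping shown is the sequential layout
described above, not a copy of the clog.c macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values are assumptions stated in the lead-in. */
#define XACT_BITS        2      /* status bits per transaction */
#define XACTS_PER_BYTE   4      /* 8 bits / XACT_BITS */
#define CACHE_LINE_BYTES 64     /* smallest cache line considered */
#define PAGE_BYTES       8192   /* assumed page size */
#define XACTS_PER_LINE   (CACHE_LINE_BYTES * XACTS_PER_BYTE)  /* 256 */
#define XACTS_PER_PAGE   (PAGE_BYTES * XACTS_PER_BYTE)        /* 32768 */

typedef uint32_t Xid;

/* Byte offset within the page under the sequential layout. */
static unsigned seq_byte(Xid xid)
{
    unsigned byte = (xid % XACTS_PER_PAGE) / XACTS_PER_BYTE;

    assert(byte < PAGE_BYTES);
    return byte;
}

/* Cache line (within the page) that holds this XID's status bits. */
static unsigned seq_cache_line(Xid xid)
{
    return seq_byte(xid) / CACHE_LINE_BYTES;
}
```

With these definitions, XIds 0 through 255 all fall on cache line 0 of
their page, which is the hot spot described above.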

I propose that we lay the XIds out differently within the CLOG and
Subtrans, so that XId and XId+1 are at least one cache line apart. In
this way, accesses to 256 adjacent XId numbers would be spread across
the cache lines of one CLOG page (128 lines of 64 bytes, assuming an
8192-byte page) rather than all landing on a single line.

The layout implied by TransactionIdToPage() would remain. So XIds would
still map to consecutive pages; only the on-page layout would change
significantly, while the page size and other characteristics stay the
same.

Thus, it would seem that this could be almost a 3 line change:

    Clog.c      TransactionIdToByte()
                TransactionIdToBIndex()
    Subtrans.c  TransactionIdToEntry()

These would be changed so that we stripe the XIds across the page,
rather than assume they are sequential. The exact striping mechanism is
just some simple modulo arithmetic based upon a multiple of
ALIGNOF_BUFFER.
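
One possible striping scheme, as a minimal sketch only. No particular
algorithm is specified here, so everything below is an assumption: the
names are hypothetical, the stripe width is taken as 64 bytes (standing
in for "a multiple of ALIGNOF_BUFFER"), the page is assumed to be 8192
bytes, and CLOG's 2 bits per transaction are assumed. The idea is that
the index within the page picks the stripe by modulo and the slot
within the stripe by division.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names and assumed values; see the lead-in. */
#define XACTS_PER_BYTE   4
#define PAGE_BYTES       8192
#define STRIPE_BYTES     64                              /* assumed stripe width */
#define STRIPES_PER_PAGE (PAGE_BYTES / STRIPE_BYTES)     /* 128 */
#define XACTS_PER_PAGE   (PAGE_BYTES * XACTS_PER_BYTE)   /* 32768 */

typedef uint32_t Xid;

/* Index of the XID within its page; same as in the sequential layout. */
static unsigned page_index(Xid xid)   { return xid % XACTS_PER_PAGE; }

/* Consecutive XIDs rotate through the stripes of the page. */
static unsigned striped_stripe(Xid xid) { return page_index(xid) % STRIPES_PER_PAGE; }

/* Slot within the stripe, 0..255. */
static unsigned striped_slot(Xid xid)   { return page_index(xid) / STRIPES_PER_PAGE; }

/* Striped counterpart of a "transaction id to byte" mapping. */
static unsigned striped_byte(Xid xid)
{
    unsigned byte = striped_stripe(xid) * STRIPE_BYTES
                  + striped_slot(xid) / XACTS_PER_BYTE;

    assert(byte < PAGE_BYTES);
    return byte;
}

/* Striped counterpart of a "transaction id to index within byte" mapping. */
static unsigned striped_bindex(Xid xid) { return striped_slot(xid) % XACTS_PER_BYTE; }

/* Returns 1 if every XID on a page gets its own (byte, bindex) position. */
static int striped_layout_is_bijective(void)
{
    static unsigned char seen[XACTS_PER_PAGE];

    memset(seen, 0, sizeof seen);
    for (Xid xid = 0; xid < XACTS_PER_PAGE; xid++)
    {
        unsigned pos = striped_byte(xid) * XACTS_PER_BYTE + striped_bindex(xid);

        if (seen[pos])
            return 0;
        seen[pos] = 1;
    }
    return 1;
}
```

Under this sketch XId and XId+1 always land in different stripes, the
page still holds the same number of transactions, and the page an XId
belongs to is unchanged.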

The Subtrans code makes no assumption about the way in which XIds are
located, but there is an assumption within the CLOG code: At startup
time, all XIds on the current page higher than the current XId are
zeroed by writing zeroes to higher bytes. I would propose that we still
perform the logical operation of zeroing later XIds, but in a slightly
different physical form. The way we do that might affect startup time,
so I have avoided specifying a particular stripe algorithm in case we
need to change that to improve the CLOG startup code. 
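
To illustrate what "logical zeroing in a different physical form" could
look like, here is a deliberately simple sketch. It assumes the same
hypothetical striping as a 64-byte stripe on an 8192-byte page with 2
bits per transaction; none of the names come from the CLOG code. It
clears each later XId individually, which is exactly the kind of
per-entry loop that might cost more at startup than zeroing a run of
trailing bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names and assumed values; see the lead-in. */
#define XACT_BITS        2
#define XACT_MASK        0x03
#define XACTS_PER_BYTE   4
#define PAGE_BYTES       8192
#define STRIPE_BYTES     64
#define STRIPES_PER_PAGE (PAGE_BYTES / STRIPE_BYTES)
#define XACTS_PER_PAGE   (PAGE_BYTES * XACTS_PER_BYTE)

/* Locate the byte and bit shift for a page index under the striped layout. */
static void striped_locate(unsigned pgindex, unsigned *byte, unsigned *shift)
{
    unsigned stripe = pgindex % STRIPES_PER_PAGE;
    unsigned slot = pgindex / STRIPES_PER_PAGE;

    *byte = stripe * STRIPE_BYTES + slot / XACTS_PER_BYTE;
    *shift = (slot % XACTS_PER_BYTE) * XACT_BITS;
}

/* Read the 2-bit status stored for a page index. */
static unsigned get_status(const unsigned char *page, unsigned pgindex)
{
    unsigned byte, shift;

    striped_locate(pgindex, &byte, &shift);
    return (page[byte] >> shift) & XACT_MASK;
}

/* Clear the status of every page index >= first_unused, one entry at a time. */
static void zero_later_xids(unsigned char *page, unsigned first_unused)
{
    assert(first_unused <= XACTS_PER_PAGE);
    for (unsigned i = first_unused; i < XACTS_PER_PAGE; i++)
    {
        unsigned byte, shift;

        striped_locate(i, &byte, &shift);
        page[byte] &= (unsigned char) ~(XACT_MASK << shift);
    }
}
```

A stripe algorithm chosen with startup in mind might let the same
logical operation be done with a few masked byte writes per stripe
instead, which is why the choice is left open.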

These changes have almost no negative impact on run time performance and
can be implemented with minimal change. We can discuss whether the false
sharing phenomenon actually occurs, but the bottom line ISTM is that if
we can avoid it ever occurring, for almost free, then why not? If it
does occur then we know it can destroy SMP performance and on some
CPUs can lead to context switching.

Comments?

Best Regards, Simon Riggs




