Hi,
Let me share some numbers from a few more tests. I've been experimenting
with two optimization ideas - alignment and non-temporal writes.
The first idea (alignment) is not entirely unique to PMEM - we already
align various structures to cache lines in a number of places, and the
same reasoning applies to PMEM. The cache lines are 64B, so I've tweaked
the WAL format to align records accordingly - the header sizes are
padded to a multiple of 64B, and space is reserved in 64B chunks. It's a
bit crude, but good enough for experiments, I think. This does mean the
WAL format would not be compatible with the current one, and the padding
adds some space overhead (I haven't measured how much).
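To make the alignment concrete, here's a minimal sketch of the rounding
I mean (the names are made up for illustration, not taken from the
actual patch):

#include <stddef.h>

#define PMEM_CACHELINE 64	/* cache line size in bytes */

/* Round a length up to the next multiple of the cache line size,
 * so records and headers start on 64B boundaries. */
static inline size_t
pmem_align_len(size_t len)
{
	return (len + PMEM_CACHELINE - 1) & ~((size_t) (PMEM_CACHELINE - 1));
}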
The second idea is somewhat specific to PMEM - the pmem_memcpy provided
by libpmem accepts flags determining whether the data should go through
the CPU cache, whether it should be flushed, and so on. So far the code
has been using

pmem_memcpy(..., PMEM_F_MEM_NOFLUSH);

following the idea that caching the data in the CPU cache and then
flushing it in larger chunks is more efficient. I've heard
recommendations to use non-temporal writes (which bypass the CPU
cache), so I tested switching to

pmem_memcpy(..., PMEM_F_MEM_NONTEMPORAL);
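For context, here's a minimal sketch of how the two variants differ
(dest, src and len are placeholders for the pmem-mapped destination and
the record being copied, not names from the patch):

#include <libpmem.h>

/* Variant 1: copy through the CPU cache without flushing, then flush
 * explicitly later, possibly covering several records at once. */
pmem_memcpy(dest, src, len, PMEM_F_MEM_NOFLUSH);
/* ... more copies ... */
pmem_flush(dest, len);	/* initiate flush of the affected cache lines */
pmem_drain();		/* wait for the flushes to complete */

/* Variant 2: non-temporal stores bypass the CPU cache; pmem_memcpy
 * drains the stores before returning, so no explicit flush is needed. */
pmem_memcpy(dest, src, len, PMEM_F_MEM_NONTEMPORAL);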
The experimental patches doing these things are attached, as usual.
The results are a bit better than for the preceding patches, but only
by a couple of percent, which is a bit disappointing. Attached is a PDF
with charts for the three WAL segment sizes, same as before.
It's possible the patches introduce some internal bottleneck, so I plan
to focus on profiling and optimizing them next. Of course, I'd welcome
any feedback and ideas about what might be wrong ;-)
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company