Misaligned BufferDescriptors causing major performance problems on AMD - Mailing list pgsql-hackers
From:      Andres Freund
Subject:   Misaligned BufferDescriptors causing major performance problems on AMD
Date:
Msg-id:    20140202151319.GD32123@awork2.anarazel.de
Responses: Re: Misaligned BufferDescriptors causing major performance problems on AMD
List:      pgsql-hackers
Hi,

In the nearby thread at
http://archives.postgresql.org/message-id/20140202140014.GM5930%40awork2.anarazel.de
Peter and I discovered that there is a large performance difference between
different max_connections settings on a larger machine (4x Opteron 6272,
64 cores in total) in a read-only pgbench test...

Just as a reference, we're talking about a performance degradation from
475963.613865 tps to 197744.913556 tps in a pgbench -S -c 64 -j 64 run,
just by setting max_connections to 90 instead of 91...

On 2014-02-02 15:00:14 +0100, Andres Freund wrote:
> On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote:
> > Here are the results of a benchmark on Nathan Boley's 64-core, 4
> > socket server:
> > http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
>
> That's interesting. The maximum number of what you see here (~293125)
> is markedly lower than what I can get.
>
> ... poke around ...
>
> Hm, that's partially because you're using pgbench without -M prepared if
> I see that correctly. The bottleneck in that case is primarily memory
> allocation. But even after that I am getting higher numbers: ~342497.
>
> Trying to nail down the difference it oddly seems to be your
> max_connections=80 vs my 100. The profile in both cases is markedly
> different, way much more spinlock contention with 80. All in
> Pin/UnpinBuffer().
>
> I think =80 has to lead to some data being badly aligned. I can
> reproduce that =91 has *much* better performance than =90:
> 170841.844938 vs 368490.268577 tps in a 10s test. Reproducible both
> with and without the test.
> That's certainly worth some investigation.
> This is *not* reproducible on the Intel machine, so it might be the
> associativity of the L1/L2 cache on the AMD.

So, I looked into this, and I am fairly certain it's because of the
(mis-)alignment of the buffer descriptors. With certain max_connections
settings InitBufferPool() happens to get 64-byte-aligned addresses, with
others not. I checked the alignment with gdb to confirm that.

A quick hack (attached) making BufferDescriptors 64-byte aligned indeed
restored performance across all max_connections settings. It's not
surprising that a misaligned buffer descriptor causes problems - there'll
be plenty of false sharing of the spinlocks otherwise. Curious that the
Intel machine isn't hurt much by this.

Now all this hinges on the fact that, by mere accident, BufferDescriptors
are 64 bytes in size:

struct sbufdesc {
        BufferTag          tag;                  /*     0    20 */
        BufFlags           flags;                /*    20     2 */
        uint16             usage_count;          /*    22     2 */
        unsigned int       refcount;             /*    24     4 */
        int                wait_backend_pid;     /*    28     4 */
        slock_t            buf_hdr_lock;         /*    32     1 */

        /* XXX 3 bytes hole, try to pack */

        int                buf_id;               /*    36     4 */
        int                freeNext;             /*    40     4 */

        /* XXX 4 bytes hole, try to pack */

        LWLock *           io_in_progress_lock;  /*    48     8 */
        LWLock *           content_lock;         /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */

        /* size: 64, cachelines: 1, members: 10 */
        /* sum members: 57, holes: 2, sum holes: 7 */
};

We could polish up the attached patch and apply it to all the branches;
the memory cost is minimal. But I wonder if we shouldn't instead make
ShmemInitStruct() always return cacheline-aligned addresses. That will
require some fiddling, but it might be a good idea nonetheless?

I think we should also consider some more reliable measures to keep
BufferDescriptors cacheline-sized, rather than relying on this happy
accident. Debugging alignment issues isn't fun, too much of a guessing
game...

Thoughts?
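For illustration only, here is a minimal standalone sketch of the two ideas
above: padding every descriptor out to a full cache line so its size no
longer depends on a happy accident, and rounding the base address of the
array up to a cache-line boundary, the way an always-aligned
ShmemInitStruct() would. This is not the attached patch and does not use
PostgreSQL's shared-memory machinery; all names here (CACHELINE_SIZE,
BufDesc, BufDescPadded, cacheline_align) are made up for the example.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CACHELINE_SIZE 64

/* Stand-in for struct sbufdesc; the real fields are omitted for brevity. */
typedef struct BufDesc
{
    int     buf_id;
    int     refcount;
    /* ... tag, flags, usage_count, locks, etc. ... */
} BufDesc;

/* Pad every descriptor to exactly one cache line. */
typedef union BufDescPadded
{
    BufDesc desc;
    char    pad[CACHELINE_SIZE];
} BufDescPadded;

/* Fail at compile time instead of relying on an accidental size of 64. */
static_assert(sizeof(BufDesc) <= CACHELINE_SIZE, "BufDesc larger than a cache line");
static_assert(sizeof(BufDescPadded) == CACHELINE_SIZE, "padding broken");

/* Round a pointer up to the next cache-line boundary. */
static void *
cacheline_align(void *ptr)
{
    uintptr_t p = (uintptr_t) ptr;

    return (void *) ((p + CACHELINE_SIZE - 1) & ~((uintptr_t) (CACHELINE_SIZE - 1)));
}

int
main(void)
{
    int     nbuffers = 1024;

    /* Over-allocate so a cache-line boundary is guaranteed to fall inside. */
    void   *raw = malloc(nbuffers * sizeof(BufDescPadded) + CACHELINE_SIZE);
    BufDescPadded *descs = cacheline_align(raw);

    printf("base %% 64 = %zu\n",
           (size_t) ((uintptr_t) descs % CACHELINE_SIZE));

    /* Every descriptor now starts on its own cache line. */
    for (int i = 0; i < nbuffers; i++)
        assert(((uintptr_t) &descs[i]) % CACHELINE_SIZE == 0);

    free(raw);
    return 0;
}

With both pieces in place, each descriptor's spinlock lives on its own cache
line regardless of how the surrounding shared-memory allocations happen to
shift, which is what removes the false sharing described above.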
Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services