Thread: Memory Alignment in Postgres
I'm continuously studying the Postgres codebase. Hopefully I'll be able to make some contributions in the future.

For now I'm intrigued by the extensive use of memory alignment. I'm sure there's some reasoning behind it: some legacy, and some architectures that require it.

That aside, since it wastes space (a lot of space in some cases) there must be a tipping point somewhere. I'm sure one can prove aligned access is faster in a micro-benchmark, but I'm not sure that's the case in a DBMS like Postgres, especially in the page/rows area.

Just for the sake of comparison, MySQL's COMPACT storage (default and recommended since 5.5) doesn't align data at all, and MySQL NDB uses a fixed 4-byte alignment. Not sure about Oracle and others.

Is it worth the extra space on newer architectures (especially Intel)? Do you guys think this is something worth looking at?

I've been trying to mess with the *ALIGN macros, but so far I haven't been able to get any conclusive results. My guess is that I'm missing something in the code, or pgbench doesn't stress the difference enough.
--
Arthur Silva
On Tue, Sep 9, 2014 at 11:08:05AM -0300, Arthur Silva wrote:
> I'm continuously studying Postgres codebase. Hopefully I'll be able to make
> some contributions in the future.
>
> For now I'm intrigued about the extensive use of memory alignment. I'm sure
> there's some legacy and some architecture that requires it reasoning behind it.
>
> That aside, since it wastes space (a lot of space in some cases) there must be
> a tipping point somewhere. I'm sure one can prove aligned access is faster in a
> micro-benchmark but I'm not sure it's the case in a DBMS like postgres,
> specially in the page/rows area.
>
> Just for the sake of comparison Mysql COMPACT storage (default and recommended
> since 5.5) doesn't align data at all. Mysql NDB uses a fixed 4-byte alignment.
> Not sure about Oracle and others.
>
> Is it worth the extra space in newer architectures (specially Intel)?
> Do you guys think this is something worth looking at?
>
> I'm trying to messing with the *ALIGN macros but so far I wasn't able to get
> any conclusive results. My guess is that I'm missing something in the code or
> pg_bench doesn't stress the difference enough.

Postgres reads data blocks from disk and puts them in shared memory, and then the CPU accesses those values, like floats and integers, as though they were in allocated memory, i.e. we make no adjustments to the data from disk all the way to the CPU. I don't think anyone has measured the overhead of doing less alignment, but I would be interested to see any test results produced.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +
On Tue, Sep 9, 2014 at 10:08 AM, Arthur Silva <arthurprs@gmail.com> wrote:
> I'm continuously studying Postgres codebase. Hopefully I'll be able to make
> some contributions in the future.
>
> For now I'm intrigued about the extensive use of memory alignment. I'm sure
> there's some legacy and some architecture that requires it reasoning behind
> it.
>
> That aside, since it wastes space (a lot of space in some cases) there must
> be a tipping point somewhere. I'm sure one can prove aligned access is
> faster in a micro-benchmark but I'm not sure it's the case in a DBMS like
> postgres, specially in the page/rows area.
>
> Just for the sake of comparison Mysql COMPACT storage (default and
> recommended since 5.5) doesn't align data at all. Mysql NDB uses a fixed
> 4-byte alignment. Not sure about Oracle and others.
>
> Is it worth the extra space in newer architectures (specially Intel)?
> Do you guys think this is something worth looking at?

Yes. At least in my opinion, though, it's not a good project for a beginner. If you get your changes to take effect, you'll find that a lot of things will break in places that are not easy to find or fix. You're getting into really low-level areas of the system that get touched infrequently and require a lot of expertise in how things work today to adjust.

The idea I've had before is to try to reduce the widest alignment we ever require from 8 bytes to 4 bytes. That is, look for types with typalign = 'd', and rewrite them to have typalign = 'i' by having them use two 4-byte loads to load an eight-byte value. In practice, I think this would probably save a high percentage of what can be saved, because 8-byte alignment implies a maximum of 7 bytes of wasted space, while 4-byte alignment implies a maximum of 3 bytes of wasted space.

And it would probably be pretty cheap, too, because any type with less than 8-byte alignment wouldn't be affected at all, and even those types that were affected would only be slightly slowed down by doing two loads instead of one. In contrast, getting rid of alignment requirements completely would save a little more space, but probably at the cost of a lot more slowdown: any type with alignment requirements would have to fetch the value byte-by-byte instead of pulling the whole thing out at once.

But there are a couple of obvious problems with this idea, too, such as:

1. It's really complicated and a ton of work.
2. It would break pg_upgrade pretty darn badly unless we employed some even-more-complex strategy to mitigate that.
3. The savings might not be enough to justify the effort.

It might be interesting for someone to develop a tool measuring the number of bytes of alignment padding we lose per tuple or per page and gather some statistics on it on various databases. That would give us some sense as to the possible savings.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Sep 10, 2014 at 11:43:52AM -0400, Robert Haas wrote:
> But there are a couple of obvious problems with this idea, too, such as:
>
> 1. It's really complicated and a ton of work.
> 2. It would break pg_upgrade pretty darn badly unless we employed some
>    even-more-complex strategy to mitigate that.
> 3. The savings might not be enough to justify the effort.
>
> It might be interesting for someone to develop a tool measuring the
> number of bytes of alignment padding we lose per tuple or per page and
> gather some statistics on it on various databases. That would give us
> some sense as to the possible savings.

And will we ever implement a logical attribute system so we can reorder the stored attributes to minimize wasted space?

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +
On Wed, Sep 10, 2014 at 4:29 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Sep 10, 2014 at 11:43:52AM -0400, Robert Haas wrote:
>> But there are a couple of obvious problems with this idea, too, such as:
>>
>> 1. It's really complicated and a ton of work.
>> 2. It would break pg_upgrade pretty darn badly unless we employed some
>>    even-more-complex strategy to mitigate that.
>> 3. The savings might not be enough to justify the effort.
>>
>> It might be interesting for someone to develop a tool measuring the
>> number of bytes of alignment padding we lose per tuple or per page and
>> gather some statistics on it on various databases. That would give us
>> some sense as to the possible savings.
>
> And will we ever implement a logical attribute system so we can reorder
> the stored attributes to minimize wasted space?

You forgot to attach the patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Sep 10, 2014 at 12:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 9, 2014 at 10:08 AM, Arthur Silva <arthurprs@gmail.com> wrote:
>> I'm continuously studying Postgres codebase. Hopefully I'll be able to make
>> some contributions in the future.
>>
>> For now I'm intrigued about the extensive use of memory alignment. I'm sure
>> there's some legacy and some architecture that requires it reasoning behind
>> it.
>>
>> That aside, since it wastes space (a lot of space in some cases) there must
>> be a tipping point somewhere. I'm sure one can prove aligned access is
>> faster in a micro-benchmark but I'm not sure it's the case in a DBMS like
>> postgres, specially in the page/rows area.
>>
>> Just for the sake of comparison Mysql COMPACT storage (default and
>> recommended since 5.5) doesn't align data at all. Mysql NDB uses a fixed
>> 4-byte alignment. Not sure about Oracle and others.
>>
>> Is it worth the extra space in newer architectures (specially Intel)?
>> Do you guys think this is something worth looking at?
>
> Yes. At least in my opinion, though, it's not a good project for a
> beginner. If you get your changes to take effect, you'll find that a
> lot of things will break in places that are not easy to find or fix.
> You're getting into really low-level areas of the system that get
> touched infrequently and require a lot of expertise in how things work
> today to adjust.
I thought all memory alignment (or at least the bulk of it) was handled using some codebase-wide macros/settings; otherwise, how could different parts of the code interoperate? Poking at this area might suffice for some initial testing to check whether it's worth any more attention.

Unaligned memory access received a lot of attention from Intel in the post-Nehalem era, so it may very well pay off on Intel servers. You might find this blog post and its comments/external links interesting: http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

I'm a newbie in the codebase, so please let me know if I'm talking nonsense.
> The idea I've had before is to try to reduce the widest alignment we
> ever require from 8 bytes to 4 bytes. That is, look for types with
> typalign = 'd', and rewrite them to have typalign = 'i' by having them
> use two 4-byte loads to load an eight-byte value. In practice, I
> think this would probably save a high percentage of what can be saved,
> because 8-byte alignment implies a maximum of 7 bytes of wasted space,
> while 4-byte alignment implies a maximum of 3 bytes of wasted space.
> And it would probably be pretty cheap, too, because any type with less
> than 8 byte alignment wouldn't be affected at all, and even those
> types that were affected would only be slightly slowed down by doing
> two loads instead of one. In contrast, getting rid of alignment
> requirements completely would save a little more space, but probably
> at the cost of a lot more slowdown: any type with alignment
> requirements would have to fetch the value byte-by-byte instead of
> pulling the whole thing out at once.
Does byte-by-byte access still hold true nowadays? I thought modern processors fetched memory in at least word-sized chunks (so 4/8 bytes) and then merge/sliced.
> But there are a couple of obvious problems with this idea, too, such as:
>
> 1. It's really complicated and a ton of work.
> 2. It would break pg_upgrade pretty darn badly unless we employed some
>    even-more-complex strategy to mitigate that.
> 3. The savings might not be enough to justify the effort.
Very true.
> It might be interesting for someone to develop a tool measuring the
> number of bytes of alignment padding we lose per tuple or per page and
> gather some statistics on it on various databases. That would give us
> some sense as to the possible savings.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
On Thu, Sep 11, 2014 at 9:32 AM, Arthur Silva <arthurprs@gmail.com> wrote:
> I thought all memory alignment was (or at least the bulk of it) handled
> using some codebase wide macros/settings, otherwise how could different
> parts of the code inter-op? Poking this area might suffice for some initial
> testing to check if it's worth any more attention.

Well, sure, but the issues aren't too simple. For example, I think there are cases where we rely on the alignment bytes being zero to distinguish between an aligned value following and an unaligned toasted value. That stuff can make your head explode, or at least mine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-09-11 10:32:24 -0300, Arthur Silva wrote:
> Unaligned memory access received a lot attention in Intel post-Nehalen era.
> So it may very well pay off on Intel servers. You might find this blog post
> and it's comments/external-links interesting
> http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

FWIW, the reported results are imo pretty meaningless for postgres. It's sequential access over a larger amount of memory, i.e. a perfectly prefetchable workload where it doesn't matter if superfluous cachelines are fetched, because they're going to be needed next round anyway.

In many production workloads, one of the busiest accesses to individual datums is the binary search on individual pages during index lookups. That's pretty much exactly the contrary of the above.

Not saying that it's not going to be a benefit in many scenarios, but it's far from as simple as saying that unaligned accesses on their own aren't penalized anymore.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 11, 2014 at 8:32 AM, Arthur Silva <arthurprs@gmail.com> wrote:
> On Wed, Sep 10, 2014 at 12:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Yes. At least in my opinion, though, it's not a good project for a
>> beginner. If you get your changes to take effect, you'll find that a
>> lot of things will break in places that are not easy to find or fix.
>> You're getting into really low-level areas of the system that get
>> touched infrequently and require a lot of expertise in how things work
>> today to adjust.
>
> I thought all memory alignment was (or at least the bulk of it) handled
> using some codebase wide macros/settings, otherwise how could different
> parts of the code inter-op? Poking this area might suffice for some initial
> testing to check if it's worth any more attention.
>
> Unaligned memory access received a lot attention in Intel post-Nehalen era.
> So it may very well pay off on Intel servers. You might find this blog post
> and it's comments/external-links interesting
> http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
>
> I'm a newbie in the codebase, so please let me know if I'm saying anything
> non-sense.

Be advised of the difficulties you are going to face here. Assuming for a second there is no reason not to go unaligned on Intel, and there are material benefits to justify the effort, that doesn't necessarily hold for other platforms like ARM/POWER. Even though Intel handles the vast majority of installations, it's not gonna fly to optimize for that platform at the expense of others, so there'd have to be some kind of compile-time setting to control alignment behavior. That being said, if you could pull this off cleanly, it'd be pretty neat.

merlin
Merlin Moncure <mmoncure@gmail.com> writes:
> Be advised of the difficulties you are going to face here. Assuming
> for a second there is no reason not to go unaligned on Intel and there
> are material benefits to justify the effort, that doesn't necessarily
> hold for other platforms like arm/power.

Note that on many (most?) non-Intel architectures, unaligned access is simply not an option. The chips themselves will throw SIGBUS or equivalent if you try it. Some kernels provide signal handlers that emulate the unaligned access in software rather than killing the process; but the performance consequences of hitting such traps more than very occasionally would be catastrophic.

Even on Intel, I'd wonder what unaligned accesses do to atomicity guarantees and suchlike. This is not a big deal for row data storage, but we'd have to be careful about it if we were to back off alignment requirements for in-memory data structures such as latches and buffer headers.

Another fun thing you'd need to deal with is ensuring that the C structs we overlay onto catalog data rows still match up with the data layout rules.

On the whole, I'm pretty darn skeptical that such an effort would repay itself. There are lots of more promising things to hack on.

			regards, tom lane
On 2014-09-11 11:39:12 -0400, Tom Lane wrote:
> Even on Intel, I'd wonder what unaligned accesses do to atomicity
> guarantees and suchlike.

They pretty much kill atomicity guarantees. Atomicity is guaranteed while you're inside a cacheline, but not once you span them.

> This is not a big deal for row data storage,
> but we'd have to be careful about it if we were to back off alignment
> requirements for in-memory data structures such as latches and buffer
> headers.

Right. I don't think that's an option.

> Another fun thing you'd need to deal with is ensuring that the C structs
> we overlay onto catalog data rows still match up with the data layout
> rules.

Yea, this would require some nastiness in the bki generation, but it'd probably be doable to have different alignment for system catalogs.

> On the whole, I'm pretty darn skeptical that such an effort would repay
> itself. There are lots of more promising things to hack on.

I have no desire to hack on it, but I can understand the desire to reduce the space overhead...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 11, 2014 at 11:27 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-09-11 10:32:24 -0300, Arthur Silva wrote:
>> Unaligned memory access received a lot attention in Intel post-Nehalen era.
>> So it may very well pay off on Intel servers. You might find this blog post
>> and it's comments/external-links interesting
>> http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
>
> FWIW, the reported results are imo pretty meaningless for postgres. It's
> sequential access over larger amount of memory. I.e. a perfectly
> prefetchable workload where it doesn't matter if superfluous cachelines
> are fetched because they're going to be needed next round anyway.
>
> In many production workloads one of the most busy accesses to individual
> datums is the binary search on individual pages during index
> lookups. That's pretty much exactly the contrary to the above.
>
> Not saying that it's not going to be a benefit in many scenarios, but
> it's far from being as simple as saying that unaligned accesses on their
> own aren't penalized anymore.
I modified the test code to use a completely random scan pattern to test something that completely trashes the cache. It's not realistic, but it still confirms the hypothesis that the overhead is minimal on modern Intel.
------------------ test results compiling for 32bit ------------------
processing word of size 2
offset = 0
average time for offset 0 is 422.7
offset = 1
average time for offset 1 is 422.85
processing word of size 4
offset = 0
average time for offset 0 is 436.6
offset = 1
average time for offset 1 is 451
offset = 2
average time for offset 2 is 444.3
offset = 3
average time for offset 3 is 441.9
processing word of size 8
offset = 0
average time for offset 0 is 630.15
offset = 1
average time for offset 1 is 653
offset = 2
average time for offset 2 is 655.5
offset = 3
average time for offset 3 is 660.85
offset = 4
average time for offset 4 is 650.1
offset = 5
average time for offset 5 is 656.9
offset = 6
average time for offset 6 is 656.6
offset = 7
average time for offset 7 is 656.9
------------------ test results compiling for 64bit ------------------
processing word of size 2
offset = 0
average time for offset 0 is 402.55
offset = 1
average time for offset 1 is 406.9
processing word of size 4
offset = 0
average time for offset 0 is 424.05
offset = 1
average time for offset 1 is 436.55
offset = 2
average time for offset 2 is 435.1
offset = 3
average time for offset 3 is 435.3
processing word of size 8
offset = 0
average time for offset 0 is 444.9
offset = 1
average time for offset 1 is 470.25
offset = 2
average time for offset 2 is 468.95
offset = 3
average time for offset 3 is 476.75
offset = 4
average time for offset 4 is 474.9
offset = 5
average time for offset 5 is 468.25
offset = 6
average time for offset 6 is 469.8
offset = 7
average time for offset 7 is 469.1
On Thu, Sep 11, 2014 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Note that on many (most?) non-Intel architectures, unaligned access is
> simply not an option. The chips themselves will throw SIGBUS or
> equivalent if you try it. Some kernels provide signal handlers that
> emulate the unaligned access in software rather than killing the process;
> but the performance consequences of hitting such traps more than very
> occasionally would be catastrophic.
>
> Even on Intel, I'd wonder what unaligned accesses do to atomicity
> guarantees and suchlike. This is not a big deal for row data storage,
> but we'd have to be careful about it if we were to back off alignment
> requirements for in-memory data structures such as latches and buffer
> headers.
>
> Another fun thing you'd need to deal with is ensuring that the C structs
> we overlay onto catalog data rows still match up with the data layout
> rules.
>
> On the whole, I'm pretty darn skeptical that such an effort would repay
> itself. There are lots of more promising things to hack on.

Indeed, I don't know of any other architecture where this would even be an option, so if this ever moves forward it must be turned on at compile time for x86-64 only. I wonder how MySQL handles its rows on those architectures, given that its storage format is completely packed.

If we just reduced the alignment requirements when laying out columns in rows and indexes by reducing/removing the typalign padding, that'd be enough gain in my (humble) opinion.

Once you start assuming alignment is not an issue, you see potential savings everywhere, which is kinda insane...

I'm unsure what this means in terms of patch complexity, but judging by the reactions so far I'm assuming a lot.
On Thu, Sep 11, 2014 at 02:54:36PM -0300, Arthur Silva wrote:
> Indeed I don't know any other architectures that this would be at an
> option. So if this ever moves forward it must be turned on at compile time
> for x86-64 only. I wonder how the Mysql handle their rows even on those
> architectures as their storage format is completely packed.
>
> If we just reduced the alignment requirements when laying out columns in
> the rows and indexes by reducing/removing padding -- typalign, it'd be
> enough gain in my (humble) opinion.
>
> If you think alignment is not an issue you can see saving everywhere, which
> is kinda insane...
>
> I'm unsure how this equates in patch complexity, but judging by the
> reactions so far I'm assuming a lot.

If the column order in the table was independent of the physical layout, it would be possible to order columns to reduce the padding needed. Not my suggestion, just repeating a valid comment from earlier in the thread.

Regards,
Ken