Thread: Question on alignment
In copydir.c:copy_file() I read

	/* Use palloc to ensure we get a maxaligned buffer */
	buffer = palloc(COPY_BUF_SIZE);

No data type wider than a single byte is used to access the data in the
buffer, and neither read() nor write() should require any specific alignment.
Can someone please explain why alignment matters here?

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
On 01/04/2019 11:01, Antonin Houska wrote:
> In copydir.c:copy_file() I read
>
> 	/* Use palloc to ensure we get a maxaligned buffer */
> 	buffer = palloc(COPY_BUF_SIZE);
>
> No data type wider than a single byte is used to access the data in the
> buffer, and neither read() nor write() should require any specific alignment.
> Can someone please explain why alignment matters here?

An aligned buffer can allow optimizations in the kernel, when it copies the
data. So it's not strictly required, but potentially makes the read() and
write() faster.

- Heikki
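To make the point about alignment concrete, here is a minimal sketch of how one might check the alignment of a buffer address. It is a standalone illustration, not code from copydir.c: `is_aligned` and `malloc_is_maxaligned` are hypothetical helpers, and plain `malloc()` stands in for `palloc()`, on the assumption that both return memory aligned for any fundamental type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns 1 if p lies on an `align`-byte boundary.
 * `align` must be a non-zero power of two.
 */
static int
is_aligned(const void *p, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t) p & (align - 1)) == 0;
}

/*
 * Hypothetical helper: allocates `size` bytes with malloc() and reports
 * whether the result is aligned for max_align_t (C11), which is roughly
 * what "maxaligned" means in the comment quoted above.
 */
static int
malloc_is_maxaligned(size_t size)
{
	void	   *buf = malloc(size);
	int			ok;

	if (buf == NULL)
		return 0;
	ok = is_aligned(buf, _Alignof(max_align_t));
	free(buf);
	return ok;
}
```

A local `char buffer[N]` array, by contrast, is only guaranteed the alignment of `char`, so `is_aligned(buffer, _Alignof(max_align_t))` may or may not hold for it depending on where the compiler places it.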
Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> On 01/04/2019 11:01, Antonin Houska wrote:
> > In copydir.c:copy_file() I read
> >
> > 	/* Use palloc to ensure we get a maxaligned buffer */
> > 	buffer = palloc(COPY_BUF_SIZE);
> >
> > No data type wider than a single byte is used to access the data in the
> > buffer, and neither read() nor write() should require any specific alignment.
> > Can someone please explain why alignment matters here?
>
> An aligned buffer can allow optimizations in the kernel, when it copies the
> data. So it's not strictly required, but potentially makes the read() and
> write() faster.

Thanks. Your response reminds me of buffer alignment:

/*
 * Preferred alignment for disk I/O buffers.  On some CPUs, copies between
 * user space and kernel space are significantly faster if the user buffer
 * is aligned on a larger-than-MAXALIGN boundary.  Ideally this should be
 * a platform-dependent value, but for now we just hard-wire it.
 */
#define ALIGNOF_BUFFER	32

Is this what you mean? Since palloc() only ensures MAXIMUM_ALIGNOF, that
wouldn't help here anyway.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
Antonin Houska <ah@cybertec.at> wrote:

> Since palloc() only ensures MAXIMUM_ALIGNOF, that wouldn't help here anyway.

After some more search I'm not sure about that. The following comment
indicates that MAXALIGN helps too:

/*
 * Use this, not "char buf[BLCKSZ]", to declare a field or local variable
 * holding a page buffer, if that page might be accessed as a page and not
 * just a string of bytes.  Otherwise the variable might be under-aligned,
 * causing problems on alignment-picky hardware.  (In some places, we use
 * this to declare buffers even though we only pass them to read() and
 * write(), because copying to/from aligned buffers is usually faster than
 * using unaligned buffers.)  We include both "double" and "int64" in the
 * union to ensure that the compiler knows the value must be MAXALIGN'ed
 * (cf. configure's computation of MAXIMUM_ALIGNOF).
 */
typedef union PGAlignedBlock
{
	char		data[BLCKSZ];
	double		force_align_d;
	int64		force_align_i64;
} PGAlignedBlock;

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
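The union trick in that comment can be tried in isolation. The following is a minimal sketch, not the PostgreSQL definition: `DemoAlignedBlock`, `DEMO_BLCKSZ` and `demo_block_is_aligned` are hypothetical names, 8192 is just an assumed block size, and `int64_t` stands in for PostgreSQL's `int64`. It requires a C11 compiler for `_Alignof` and `static_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BLCKSZ 8192		/* assumed block size, for illustration only */

/*
 * Same shape as the union quoted above: the char array carries the data,
 * and the other two members exist only to raise the union's alignment.
 */
typedef union DemoAlignedBlock
{
	char		data[DEMO_BLCKSZ];
	double		force_align_d;
	int64_t		force_align_i64;
} DemoAlignedBlock;

/* A union is at least as strictly aligned as its most demanding member. */
static_assert(_Alignof(DemoAlignedBlock) >= _Alignof(double),
			  "union must be aligned for double");
static_assert(_Alignof(DemoAlignedBlock) >= _Alignof(int64_t),
			  "union must be aligned for int64_t");

/*
 * Returns 1 if a local DemoAlignedBlock's data array starts on a boundary
 * suitable for both double and int64_t.
 */
static int
demo_block_is_aligned(void)
{
	DemoAlignedBlock block;
	uintptr_t	addr = (uintptr_t) block.data;

	return addr % _Alignof(double) == 0 && addr % _Alignof(int64_t) == 0;
}
```

Declaring the buffer as a plain `char data[DEMO_BLCKSZ]` local would give the compiler no such obligation, which is the under-alignment risk the comment describes.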
On Mon, Apr 01, 2019 at 02:38:30PM +0200, Antonin Houska wrote:
> After some more search I'm not sure about that. The following comment
> indicates that MAXALIGN helps too:

The performance argument is true, but the reason why PGAlignedBlock was
introduced is explained here:
https://www.postgresql.org/message-id/1535618100.1286.3.camel@credativ.de
--
Michael
Antonin Houska <ah@cybertec.at> writes:
> Antonin Houska <ah@cybertec.at> wrote:
>> Since palloc() only ensures MAXIMUM_ALIGNOF, that wouldn't help here anyway.

> After some more search I'm not sure about that. The following comment
> indicates that MAXALIGN helps too:

Well, there is more than one thing going on here, and more than one
level of potential optimization.  On just about any hardware I know,
misalignment below the machine's natural word width is going to cost
cycles in memcpy (or whatever equivalent the kernel is using).  Intel
CPUs tend to throw many many transistors at minimizing such costs, but
that still doesn't make it zero.  On some hardware, you can get further
speedups with alignment to a bigger-than-word-width boundary, allowing
memcpy to use specialized instructions (SSE2 stuff on Intel, IIRC).
But there's a point of diminishing returns there, plus it takes extra
work and more wasted space to arrange for anything to have extra
alignment.  So we generally only bother with ALIGNOF_BUFFER for shared
buffers.

			regards, tom lane
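The "extra work and more wasted space" can be seen in a small sketch of how one might obtain a buffer aligned on a larger boundary from an allocator that only guarantees the maximum fundamental alignment. This is an illustration under stated assumptions, not PostgreSQL code: `DemoAlignedBuf`, `demo_align_up`, `demo_alloc_aligned` and `DEMO_ALIGNOF_BUFFER` are hypothetical names, `malloc()` stands in for `palloc()`, and 32 is taken from the `ALIGNOF_BUFFER` value quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_ALIGNOF_BUFFER 32	/* assumed, mirrors the quoted ALIGNOF_BUFFER */

/* The caller must keep `raw` around: that is the pointer to pass to free(). */
typedef struct DemoAlignedBuf
{
	void	   *raw;			/* what malloc() returned */
	char	   *data;			/* first suitably aligned byte inside raw */
} DemoAlignedBuf;

/* Round `value` up to the next multiple of `align` (a power of two). */
static uintptr_t
demo_align_up(uintptr_t value, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value + (align - 1)) & ~(align - 1);
}

/*
 * Over-allocate by up to (alignment - 1) bytes, then round the start
 * address up.  Returns 1 on success, 0 if malloc() fails.
 */
static int
demo_alloc_aligned(DemoAlignedBuf *buf, size_t size)
{
	buf->raw = malloc(size + DEMO_ALIGNOF_BUFFER - 1);
	if (buf->raw == NULL)
	{
		buf->data = NULL;
		return 0;
	}
	buf->data = (char *) demo_align_up((uintptr_t) buf->raw,
									   DEMO_ALIGNOF_BUFFER);
	return 1;
}
```

Each such buffer costs up to 31 padding bytes plus the bookkeeping of a second pointer, which is cheap for one large, long-lived allocation but less attractive for every short-lived I/O buffer.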
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Antonin Houska <ah@cybertec.at> writes:
> > Antonin Houska <ah@cybertec.at> wrote:
> >> Since palloc() only ensures MAXIMUM_ALIGNOF, that wouldn't help here anyway.
>
> > After some more search I'm not sure about that. The following comment
> > indicates that MAXALIGN helps too:
>
> Well, there is more than one thing going on here, and more than one
> level of potential optimization.  On just about any hardware I know,
> misalignment below the machine's natural word width is going to cost
> cycles in memcpy (or whatever equivalent the kernel is using).  Intel
> CPUs tend to throw many many transistors at minimizing such costs, but
> that still doesn't make it zero.  On some hardware, you can get further
> speedups with alignment to a bigger-than-word-width boundary, allowing
> memcpy to use specialized instructions (SSE2 stuff on Intel, IIRC).
> But there's a point of diminishing returns there, plus it takes extra
> work and more wasted space to arrange for anything to have extra
> alignment.

Thanks for this summary.

> So we generally only bother with ALIGNOF_BUFFER for shared buffers.

ok, I'll consider this a (reasonable) convention.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com