Thread: Ensuring hash tuples are properly maxaligned
I've been poking around in the PHJ code trying to identify the reason why there are still so many buildfarm failures. I've not nailed it down yet, but one thing I did notice is that there's an entirely undocumented assumption that offsetof(HashMemoryChunkData, data) is maxalign'ed. If it isn't, the allocation code will give back non- maxaligned pointers, which will appear to work as long as you only test on alignment-lax processors. Now, the existing definition of the struct seems safe on all architectures we support, but it would not take much to break it. I think we ought to do what we did recently in the memory-context code: insert an explicit padding calculation and a static assertion that it's right. Hence, the attached proposed patch (in which I also failed to resist the temptation to make the two code paths in dense_alloc() look more alike). Any objections? regards, tom lane diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c index 4e1a280..6aad152 100644 *** a/src/backend/executor/nodeHash.c --- b/src/backend/executor/nodeHash.c *************** dense_alloc(HashJoinTable hashtable, Siz *** 2651,2667 **** size = MAXALIGN(size); /* ! * If tuple size is larger than of 1/4 of chunk size, allocate a separate ! * chunk. */ if (size > HASH_CHUNK_THRESHOLD) { /* allocate new chunk and put it at the beginning of the list */ newChunk = (HashMemoryChunk) MemoryContextAlloc(hashtable->batchCxt, ! offsetof(HashMemoryChunkData, data) + size); newChunk->maxlen = size; ! newChunk->used = 0; ! newChunk->ntuples = 0; /* * Add this chunk to the list after the first existing chunk, so that --- 2651,2673 ---- size = MAXALIGN(size); /* ! * If the data[] field of HashMemoryChunkData has a non-maxaligned offset, ! * we'll return non-maxaligned allocations, which is bad. ! */ ! StaticAssertStmt(HASH_CHUNK_HEADER_SIZE == MAXALIGN(HASH_CHUNK_HEADER_SIZE), ! "sizeof(HashMemoryChunkData) is not maxaligned"); ! ! /* ! * If tuple size is larger than threshold, allocate a separate chunk. */ if (size > HASH_CHUNK_THRESHOLD) { /* allocate new chunk and put it at the beginning of the list */ newChunk = (HashMemoryChunk) MemoryContextAlloc(hashtable->batchCxt, ! HASH_CHUNK_HEADER_SIZE + size); newChunk->maxlen = size; ! newChunk->used = size; ! newChunk->ntuples = 1; /* * Add this chunk to the list after the first existing chunk, so that *************** dense_alloc(HashJoinTable hashtable, Siz *** 2678,2686 **** hashtable->chunks = newChunk; } - newChunk->used += size; - newChunk->ntuples += 1; - return newChunk->data; } --- 2684,2689 ---- *************** dense_alloc(HashJoinTable hashtable, Siz *** 2693,2699 **** { /* allocate new chunk and put it at the beginning of the list */ newChunk = (HashMemoryChunk) MemoryContextAlloc(hashtable->batchCxt, ! offsetof(HashMemoryChunkData, data) + HASH_CHUNK_SIZE); newChunk->maxlen = HASH_CHUNK_SIZE; newChunk->used = size; --- 2696,2702 ---- { /* allocate new chunk and put it at the beginning of the list */ newChunk = (HashMemoryChunk) MemoryContextAlloc(hashtable->batchCxt, ! HASH_CHUNK_HEADER_SIZE + HASH_CHUNK_SIZE); newChunk->maxlen = HASH_CHUNK_SIZE; newChunk->used = size; diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h index d8c82d4..cb91f79 100644 *** a/src/include/executor/hashjoin.h --- b/src/include/executor/hashjoin.h *************** typedef struct HashMemoryChunkData *** 127,139 **** dsa_pointer shared; } next; char data[FLEXIBLE_ARRAY_MEMBER]; /* buffer allocated at the end */ } HashMemoryChunkData; typedef struct HashMemoryChunkData *HashMemoryChunk; #define HASH_CHUNK_SIZE (32 * 1024L) ! #define HASH_CHUNK_HEADER_SIZE (offsetof(HashMemoryChunkData, data)) #define HASH_CHUNK_THRESHOLD (HASH_CHUNK_SIZE / 4) /* --- 127,152 ---- dsa_pointer shared; } next; + /* + * It's required that data[] start at a maxaligned offset. This padding + * calculation is correct as long as int is not wider than size_t and + * dsa_pointer is not wider than a regular pointer, but there's a static + * assertion to check things in nodeHash.c. + */ + #define HASHMEMORYCHUNKDATA_RAWSIZE (SIZEOF_SIZE_T * 3 + SIZEOF_VOID_P) + + /* ensure proper alignment by adding padding if needed */ + #if (HASHMEMORYCHUNKDATA_RAWSIZE % MAXIMUM_ALIGNOF) != 0 + char padding[MAXIMUM_ALIGNOF - HASHMEMORYCHUNKDATA_RAWSIZE % MAXIMUM_ALIGNOF]; + #endif + char data[FLEXIBLE_ARRAY_MEMBER]; /* buffer allocated at the end */ } HashMemoryChunkData; typedef struct HashMemoryChunkData *HashMemoryChunk; #define HASH_CHUNK_SIZE (32 * 1024L) ! #define HASH_CHUNK_HEADER_SIZE offsetof(HashMemoryChunkData, data) #define HASH_CHUNK_THRESHOLD (HASH_CHUNK_SIZE / 4) /*
Hi, On 2018-01-02 19:08:49 -0500, Tom Lane wrote: > Now, the existing definition of the struct seems safe on all > architectures we support, but it would not take much to break it. > I think we ought to do what we did recently in the memory-context > code: insert an explicit padding calculation and a static assertion > that it's right. Hence, the attached proposed patch (in which > I also failed to resist the temptation to make the two code paths > in dense_alloc() look more alike). Any objections? Generally no objections. > + /* > + * It's required that data[] start at a maxaligned offset. This padding > + * calculation is correct as long as int is not wider than size_t and > + * dsa_pointer is not wider than a regular pointer, but there's a static > + * assertion to check things in nodeHash.c. > + */ But note that dsa_pointer can be wider than a regular pointer on platforms without atomics support. > +#define HASHMEMORYCHUNKDATA_RAWSIZE (SIZEOF_SIZE_T * 3 + SIZEOF_VOID_P) > + > + /* ensure proper alignment by adding padding if needed */ > +#if (HASHMEMORYCHUNKDATA_RAWSIZE % MAXIMUM_ALIGNOF) != 0 > + char padding[MAXIMUM_ALIGNOF - HASHMEMORYCHUNKDATA_RAWSIZE % MAXIMUM_ALIGNOF]; > +#endif > + > char data[FLEXIBLE_ARRAY_MEMBER]; /* buffer allocated at the end */ > } HashMemoryChunkData; Wonder if this specific case wouldn't be easier handled with a union? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-01-02 19:08:49 -0500, Tom Lane wrote: >> Now, the existing definition of the struct seems safe on all >> architectures we support, but it would not take much to break it. >> I think we ought to do what we did recently in the memory-context >> code: insert an explicit padding calculation and a static assertion >> that it's right. Hence, the attached proposed patch (in which >> I also failed to resist the temptation to make the two code paths >> in dense_alloc() look more alike). Any objections? > But note that dsa_pointer can be wider than a regular pointer on > platforms without atomics support. Hm. I did not get that impression from the comments in dsa.h, but if it's true then this approach won't work --- and indeed the hash code would be actively broken in such a case, so it's a problem we must fix. The other idea I was considering was to get rid of the data[] member entirely, define HASH_CHUNK_HEADER_SIZE as MAXALIGN(sizeof(HashMemoryChunkData)), and replace the references to data[] with pointer arithmetic along the lines of ((char *) chunk) + HASH_CHUNK_HEADER_SIZE This would be a bit more invasive, but probably not too ugly if we encapsulated that pointer arithmetic in a macro. And it'd certainly be a bunch more robust against people throwing random fields into the struct. regards, tom lane
On Wed, Jan 3, 2018 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: >> But note that dsa_pointer can be wider than a regular pointer on >> platforms without atomics support. > > Hm. I did not get that impression from the comments in dsa.h, > but if it's true then this approach won't work --- and indeed the > hash code would be actively broken in such a case, so it's a problem > we must fix. Maybe Andres is thinking of dsa_pointer_atomic? dsa_pointer is normally the size of a pointer (well, really, the size of size_t), though it could be *narrower* if you don't have atomics or ask for it with USE_SMALL_DSA_POINTER -- Thomas Munro http://www.enterprisedb.com
On 2018-01-03 14:29:15 +1300, Thomas Munro wrote: > On Wed, Jan 3, 2018 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@anarazel.de> writes: > >> But note that dsa_pointer can be wider than a regular pointer on > >> platforms without atomics support. > > > > Hm. I did not get that impression from the comments in dsa.h, > > but if it's true then this approach won't work --- and indeed the > > hash code would be actively broken in such a case, so it's a problem > > we must fix. > > Maybe Andres is thinking of dsa_pointer_atomic? dsa_pointer is > normally the size of a pointer (well, really, the size of size_t), > though it could be *narrower* if you don't have atomics or ask for it > with USE_SMALL_DSA_POINTER Yep, I was.
Andres Freund <andres@anarazel.de> writes: > On 2018-01-03 14:29:15 +1300, Thomas Munro wrote: >> On Wed, Jan 3, 2018 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Andres Freund <andres@anarazel.de> writes: >>> But note that dsa_pointer can be wider than a regular pointer on >>> platforms without atomics support. >>> Hm. I did not get that impression from the comments in dsa.h, >>> but if it's true then this approach won't work --- and indeed the >>> hash code would be actively broken in such a case, so it's a problem >>> we must fix. >> Maybe Andres is thinking of dsa_pointer_atomic? dsa_pointer is >> normally the size of a pointer (well, really, the size of size_t), >> though it could be *narrower* if you don't have atomics or ask for it >> with USE_SMALL_DSA_POINTER > Yep, I was. OK, then there's not a live bug, but I'm a bit tempted to get rid of the data[] member anyway. It's not clear to me now that keeping it results in net cleaner code. Thoughts? regards, tom lane
On 2018-01-02 20:40:50 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2018-01-03 14:29:15 +1300, Thomas Munro wrote: > >> On Wed, Jan 3, 2018 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> Andres Freund <andres@anarazel.de> writes: > >>> But note that dsa_pointer can be wider than a regular pointer on > >>> platforms without atomics support. > > >>> Hm. I did not get that impression from the comments in dsa.h, > >>> but if it's true then this approach won't work --- and indeed the > >>> hash code would be actively broken in such a case, so it's a problem > >>> we must fix. > > >> Maybe Andres is thinking of dsa_pointer_atomic? dsa_pointer is > >> normally the size of a pointer (well, really, the size of size_t), > >> though it could be *narrower* if you don't have atomics or ask for it > >> with USE_SMALL_DSA_POINTER > > > Yep, I was. > > OK, then there's not a live bug, but I'm a bit tempted to get rid of > the data[] member anyway. It's not clear to me now that keeping it > results in net cleaner code. Thoughts? I like that plan. I don't think the data field buys us anything, and I personally in most cases find "fully manual" alignment code easier to reason about than fiddling with padding fields. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-01-02 20:40:50 -0500, Tom Lane wrote: >> OK, then there's not a live bug, but I'm a bit tempted to get rid of >> the data[] member anyway. It's not clear to me now that keeping it >> results in net cleaner code. Thoughts? > I like that plan. I don't think the data field buys us anything, and I > personally in most cases find "fully manual" alignment code easier to > reason about than fiddling with padding fields. Agreed --- I'll go do that. regards, tom lane