Thread: "Truncated" tuples for tuple hash tables
While looking at the recently-noticed problem that HashAggregate nodes store more columns of the input than they need to, I couldn't help noticing how much of the hashtable space goes into HeapTuple header overhead. A couple months ago we were able to get a useful improvement in sorting by not storing unnecessary header fields in sort files, and I'm strongly tempted to do the same in tuple hash tables. Unlike the case with sort temp files, it's important to be able to access the stored data without moving/copying it. So, not wishing to duplicate all the tuple access machinery we have already, I'm envisioning a compromise design that leaves a couple bytes on the table but looks enough like a standard tuple to be directly usable. Specifically, something like this: typedef struct TruncatedTupleData { uint32 t_len; /* length of tuple */ char pad[...]; /* see below */ int16 t_natts; /* number of attributes */ ... the rest matching HeapTupleHeaderData ... } The padding would be chosen such that the offset of t_natts would have the same value modulo MAXIMUM_ALIGNOF as it has in HeapTupleHeaderData. This ensures that if a TruncatedTuple is stored starting on a MAXALIGN boundary, data within it is correctly aligned the same as it would be in a normal tuple. With the current struct definitions, 2 bytes of padding would be needed on all supported platforms, and a TruncatedTuple would be 16 bytes shorter than a regular tuple. However, because we are also eliminating a HeapTupleData struct, the total savings in tuple hash tables would be 36 to 40 bytes per tuple. To make use of a TruncatedTuple, we'd set up a temporary HeapTupleData struct with its t_data field pointing 16 bytes before the start of the TruncatedTuple. As long as the code using it never tries to access any of the missing fields (t_xmin through t_ctid), this would work exactly like a normal HeapTuple. Going forward, we'd have to be careful to preserve the existing field-ordering separation between transaction status fields and data content fields in tuple headers, but that's just a matter of adding some comments to htup.h. It's tempting to think about using this same representation for tuples stored in memory by tuplesort.c and tuplestore.c. That'd probably require some changes in their APIs, but I think it's doable. Comments? Anyone think this is too ugly for words? regards, tom lane
On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote: > While looking at the recently-noticed problem that HashAggregate nodes > store more columns of the input than they need to, I couldn't help > noticing how much of the hashtable space goes into HeapTuple header > overhead. A couple months ago we were able to get a useful improvement > in sorting by not storing unnecessary header fields in sort files, and > I'm strongly tempted to do the same in tuple hash tables. > > Unlike the case with sort temp files, it's important to be able to > access the stored data without moving/copying it. So, not wishing to > duplicate all the tuple access machinery we have already, I'm > envisioning a compromise design that leaves a couple bytes on the table > but looks enough like a standard tuple to be directly usable. I considered this, but ran into the problem that heap_getattr fell back to fastgetattr, which wouldn't know what kind of tuple it was given. Now, if you're going to add a special heap_getattr for these tuples, then ofcourse there's no problem. Maybe create a version of heap_getattr that takes the fallback function as a parameter? Anyway, I think it's a good idea. Most places in the backend after the SeqScan/IndexScan node really don't care about most of the header fields and being able to drop them would be nice. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote: >> Unlike the case with sort temp files, it's important to be able to >> access the stored data without moving/copying it. So, not wishing to >> duplicate all the tuple access machinery we have already, I'm >> envisioning a compromise design that leaves a couple bytes on the table >> but looks enough like a standard tuple to be directly usable. > I considered this, but ran into the problem that heap_getattr fell back > to fastgetattr, which wouldn't know what kind of tuple it was given. > Now, if you're going to add a special heap_getattr for these tuples, > then ofcourse there's no problem. No, I'm specifically *not* going to do that. AFAICS the proposed representation requires no changes in heap_getattr; if it did, it'd be too invasive to contemplate :-(. It should look exactly like any other HeapTuple structure, except that the "transaction status" fields will contain garbage. Do you see something I missed? regards, tom lane
Tom Lane said: > > To make use of a TruncatedTuple, we'd set up a temporary HeapTupleData > struct with its t_data field pointing 16 bytes before the start of the > TruncatedTuple. As long as the code using it never tries to > access any > of the missing fields (t_xmin through t_ctid), this would work exactly > like a normal HeapTuple. > This sounds like a security risk. What's the worst thing that could be in those 16 bytes, and could that be used to bite (sorry) us? If those 16 bytes could be user data in another tuple, there might be an attack there.
"Bort, Paul" <pbort@tmwsystems.com> writes: > Tom Lane said: >> As long as the code using it never tries to access any >> of the missing fields (t_xmin through t_ctid), this would work exactly >> like a normal HeapTuple. > This sounds like a security risk. No more than any other wild-pointer problem. At the level of C code it's always trivial to break anything ;-). The reason we don't need to worry is that in the upper levels of the executor there is no such thing as access to system columns --- any attempt to fetch a system column is converted to a Var reference that appears in the initial table-scan node, and thereafter it's an ordinary user column. This must be so because trying to keep the system columns in their normal privileged positions breaks down as soon as you consider a join; there would only be room for one, and the query might well be trying to fetch (say) ctid from more than one table. So any code that was trying to fetch these fields would be buggy anyway. In the case of the TupleHashTable code, the only access that need be provided is through a TupleTableSlot. We could get a little bit of protection against programming mistakes by using the "virtual tuple" feature of slots to disallow attempts to fetch any system columns from a truncated tuple. I'm not sure if this would be feasible for tuplesort though; haven't looked at how it's used yet. regards, tom lane
On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote: > On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote: > > While looking at the recently-noticed problem that HashAggregate nodes > > store more columns of the input than they need to, I couldn't help > > noticing how much of the hashtable space goes into HeapTuple header > > overhead. A couple months ago we were able to get a useful improvement > > in sorting by not storing unnecessary header fields in sort files, and > > I'm strongly tempted to do the same in tuple hash tables. > Anyway, I think it's a good idea. Most places in the backend after the > SeqScan/IndexScan node really don't care about most of the header > fields and being able to drop them would be nice. I understood Tom meant to do this only for HashAgg and Tuplestore. Tom, is it possible to extend this further across the executor as Martijn suggests? That would be useful, even if it is slightly harder to measure the benefit than it is with the can-spill-to-disk cases. IMHO it would be better to call the format TupleWithoutVisibilityData so its very clear that accessing the visibility fields aren't present and any attempt to access them would be a mistake. TruncatedTuple isn't clear about what's missing or its consequences. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes: > On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote: >> Anyway, I think it's a good idea. Most places in the backend after the >> SeqScan/IndexScan node really don't care about most of the header >> fields and being able to drop them would be nice. > I understood Tom meant to do this only for HashAgg and Tuplestore. Tom, > is it possible to extend this further across the executor as Martijn > suggests? That would be useful, even if it is slightly harder to measure > the benefit than it is with the can-spill-to-disk cases. There isn't any benefit except where we collect lots of tuples, which is to say tuplesort/tuplestore/tuplehashtable. In other places in the executor, there's basically only one transient tuple in existence per plan node; jumping through hoops to save 16 bytes per plan node is just silly. (What's more, as of 8.1 most of those tuples will be in "virtual tuple" format anyway, and so the optimization wouldn't make any difference at all...) > IMHO it would be better to call the format TupleWithoutVisibilityData so > its very clear that accessing the visibility fields aren't present and > any attempt to access them would be a mistake. TruncatedTuple isn't > clear about what's missing or its consequences. I'm not wedded to "TruncatedTuple", but I don't like your suggestion better; it presumes too much about what the difference might be between truncated and full tuples. Even today, I don't think "TupleWithoutVisibilityData" makes it clear that t_ctid is missing; and down the road there might be other header fields that we don't need to have in in-memory tuples. Another small problem is that given our naming conventions for structs vs pointers to structs, using "Data" as the last word of a struct name is a bad idea --- for consistency, pointers to it would be typedef TupleWithoutVisibility, which seems a bit weird. For consistency with the existing struct names, I think we need to choose a name of the form "<adjective>Tuple". I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but that seems too generic. Any other thoughts? regards, tom lane
On Mon, 2006-06-26 at 14:36 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote: > >> Anyway, I think it's a good idea. Most places in the backend after the > >> SeqScan/IndexScan node really don't care about most of the header > >> fields and being able to drop them would be nice. > > > I understood Tom meant to do this only for HashAgg and Tuplestore. Tom, > > is it possible to extend this further across the executor as Martijn > > suggests? That would be useful, even if it is slightly harder to measure > > the benefit than it is with the can-spill-to-disk cases. > > There isn't any benefit OK, see that... > I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but > that seems too generic. Any other thoughts? I like MemoryTuple but since we only use it when we go to disk... ExecutorTuple, MinimalTuple, DataOnlyTuple, MultTuple, TempFileTuple Pick one: I'm sorry I opined. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes: > On Mon, 2006-06-26 at 14:36 -0400, Tom Lane wrote: >> I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but >> that seems too generic. Any other thoughts? > I like MemoryTuple but since we only use it when we go to disk... > ExecutorTuple, MinimalTuple, DataOnlyTuple, MultTuple, TempFileTuple MinimalTuple seems good to me ... regards, tom lane
I wrote: > There isn't any benefit except where we collect lots of tuples, which is > to say tuplesort/tuplestore/tuplehashtable. In other places in the > executor, there's basically only one transient tuple in existence per > plan node; jumping through hoops to save 16 bytes per plan node is just > silly. (What's more, as of 8.1 most of those tuples will be in "virtual > tuple" format anyway, and so the optimization wouldn't make any > difference at all...) After further study of the code, here's my hit-list of places that could make worthwhile use of MinimalTuples: tuplesort.c (in-memory, on-disk case done already)tuplestore.c (in-memory and on-disk)TupleHashTable (execGrouping.c ---used by nodeAgg and nodeSubplan)hash joins (in-memory hash table and tuple "batch" files)analyze.c (tuples collected in-memoryfor stats analysis) It looks like there is actually not anyplace else in the executor where we "materialize" tuples anymore, except for execMain.c's INSERT/UPDATE code, which of course is going to want full tuples it can stash on disk. Everything else is dealing in TupleTableSlots that probably contain virtual tuples. So in one sense this *is* "all across the executor". But the amount of code to touch seems pretty limited. regards, tom lane
On Mon, 2006-06-26 at 16:43 -0400, Tom Lane wrote: > I wrote: > > There isn't any benefit except where we collect lots of tuples, which is > > to say tuplesort/tuplestore/tuplehashtable. In other places in the > > executor, there's basically only one transient tuple in existence per > > plan node; jumping through hoops to save 16 bytes per plan node is just > > silly. (What's more, as of 8.1 most of those tuples will be in "virtual > > tuple" format anyway, and so the optimization wouldn't make any > > difference at all...) > > After further study of the code, here's my hit-list of places that could > make worthwhile use of MinimalTuples: > > tuplesort.c (in-memory, on-disk case done already) > tuplestore.c (in-memory and on-disk) > TupleHashTable (execGrouping.c --- used by nodeAgg and nodeSubplan) > hash joins (in-memory hash table and tuple "batch" files) Thats the list I thought you meant. > analyze.c (tuples collected in-memory for stats analysis) Do we save enough there for us to care? Will that allow us to increase the sample size for larger tables? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes: > On Mon, 2006-06-26 at 16:43 -0400, Tom Lane wrote: >> analyze.c (tuples collected in-memory for stats analysis) > Do we save enough there for us to care? Possibly not --- it's certainly low-priority, but I listed it for completeness. regards, tom lane