Thread: WIP patch: reducing overhead for repeat de-TOASTing
Attached is a worked-out patch for the approach proposed here:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
namely, that cache management for de-TOASTed datums is handled by
TupleTableSlots.

To avoid premature detoasting of values that we might never need, the
patch introduces a concept of an "indirect TOAST pointer", which has
the same 0x80 or 0x01 header as an external TOAST pointer, but can be
told apart by having a different length byte.  Within that we have

* pointer to original toasted field within the Slot's tuple
* pointer to the owning Slot
* pointer to decompressed copy, or NULL if not decompressed yet

Some fairly straightforward extensions to the TupleTableSlot code,
heaptuple.c, and tuptoaster.c make it all go.

My original thoughts had included turning FREE_IF_COPY() into a no-op,
but on investigation that seems impractical.  One case that still
depends on that pfree is where we have palloc'd a 4-byte-header copy
of a short-header datum to support code that needs properly aligned
datum content.  The solution adopted in the patch is to arrange for
pfree() applied to a cacheable detoasted object to be a no-op, whereas
it still works normally for non-cached detoasted objects.  We do this
by inserting a dummy chunk header that points to a dummy memory
context whose pfree support method does nothing.  I think this part of
the patch would be required for any toast caching method, not just
this one.

What I like about this patch is that it's a fairly small-footprint
change, it doesn't add much overhead, and it covers caching of
decompression for in-line-compressed datums as well as the out-of-line
case.

One thing I really *don't* like about it is that it requires everyplace
that copies Datums to know about indirect pointers: in general, the
copy must be a copy of the original toasted Datum, not of the indirect
pointer, else we have indirect pointers that can outlive their owning
TupleTableSlot (or at least outlive its current tuple cycle).  There
only seem to be about half a dozen such places in the current code,
but still it seems a rather fragile coding rule.

After playing with it for a little bit, I'm not convinced that it buys
enough performance win to be worth applying --- the restriction of
cache lifespan to one tuple cycle of a TupleTableSlot is awfully
restrictive.  (For example, sorts that involve toasted sort keys
continue to suck, because the tuples being sorted aren't in Slots.)
It would probably fix the specific case that the PostGIS hackers were
complaining of, but I think we need something more.

Still, I wanted to get it into the archives because the idea of
indirect toast pointers might be useful for something else.

			regards, tom lane
Attachment
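For concreteness, here is a minimal C sketch of the two mechanisms the message above describes: the indirect-pointer layout and the no-op pfree trick.  The type, field, and function names here are illustrative assumptions for this sketch, not the patch's actual definitions.

    /*
     * Sketch of an "indirect TOAST pointer" (names are hypothetical).
     * Like an external TOAST pointer it begins with the 0x80 or 0x01
     * varlena header, but carries a different length byte so the two
     * kinds can be told apart.
     */
    typedef struct IndirectToastPointer
    {
        struct varlena *orig;           /* original toasted field within
                                         * the owning Slot's tuple */
        struct TupleTableSlot *owner;   /* Slot whose current tuple cycle
                                         * bounds this pointer's lifetime */
        struct varlena *decompressed;   /* cached decompressed copy, or
                                         * NULL if not decompressed yet */
    } IndirectToastPointer;

    /*
     * Sketch of the no-op pfree trick: a cacheable detoasted object is
     * prefixed with a dummy chunk header pointing at a dummy memory
     * context whose free method does nothing, so pfree() (and hence
     * FREE_IF_COPY()) silently becomes a no-op for cached objects while
     * behaving normally for non-cached ones.
     */
    static void
    DetoastCacheFree(MemoryContext context, void *pointer)
    {
        /* intentionally empty: the owning Slot reclaims the cache */
    }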
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > After playing with it for a little bit, I'm not convinced that it buys > enough performance win to be worth applying --- the restriction of cache > lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive. > (For example, sorts that involve toasted sort keys continue to suck, > because the tuples being sorted aren't in Slots.) It would probably > fix the specific case that the PostGIS hackers were complaining of, > but I think we need something more. > > Still, I wanted to get it into the archives because the idea of indirect > toast pointers might be useful for something else. I do like that it handles even inline-compressed cases. What I didn't like about the managed cache was that it couldn't handle such cases. I could easily imagine the PostGIS case arising for inline compressed data structures. I wonder if it isn't worthwhile just for that case even if there's a further cache behind it for repeated fetches of out-of-line data. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Tom Lane wrote:
> Attached is a worked-out patch for the approach proposed here:
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
> namely, that cache management for de-TOASTed datums is handled by
> TupleTableSlots.
>
> To avoid premature detoasting of values that we might never need, the
> patch introduces a concept of an "indirect TOAST pointer", which has
> the same 0x80 or 0x01 header as an external TOAST pointer, but can be
> told apart by having a different length byte.  Within that we have
>
> * pointer to original toasted field within the Slot's tuple
> * pointer to the owning Slot
> * pointer to decompressed copy, or NULL if not decompressed yet
>
> Some fairly straightforward extensions to the TupleTableSlot code,
> heaptuple.c, and tuptoaster.c make it all go.
>
> My original thoughts had included turning FREE_IF_COPY() into a no-op,
> but on investigation that seems impractical.  One case that still
> depends on that pfree is where we have palloc'd a 4-byte-header copy
> of a short-header datum to support code that needs properly aligned
> datum content.  The solution adopted in the patch is to arrange for
> pfree() applied to a cacheable detoasted object to be a no-op, whereas
> it still works normally for non-cached detoasted objects.  We do this
> by inserting a dummy chunk header that points to a dummy memory
> context whose pfree support method does nothing.  I think this part of
> the patch would be required for any toast caching method, not just
> this one.
>
> What I like about this patch is that it's a fairly small-footprint
> change, it doesn't add much overhead, and it covers caching of
> decompression for in-line-compressed datums as well as the out-of-line
> case.
>
> One thing I really *don't* like about it is that it requires everyplace
> that copies Datums to know about indirect pointers: in general, the
> copy must be a copy of the original toasted Datum, not of the indirect
> pointer, else we have indirect pointers that can outlive their owning
> TupleTableSlot (or at least outlive its current tuple cycle).  There
> only seem to be about half a dozen such places in the current code,
> but still it seems a rather fragile coding rule.
>
> After playing with it for a little bit, I'm not convinced that it buys
> enough performance win to be worth applying --- the restriction of
> cache lifespan to one tuple cycle of a TupleTableSlot is awfully
> restrictive.  (For example, sorts that involve toasted sort keys
> continue to suck, because the tuples being sorted aren't in Slots.)
> It would probably fix the specific case that the PostGIS hackers were
> complaining of, but I think we need something more.
>
> Still, I wanted to get it into the archives because the idea of
> indirect toast pointers might be useful for something else.

Hi Tom,

Thanks very much for supplying the WIP patch.  In the interest of
testing and feedback, I've just downloaded PostgreSQL CVS head and
applied your patch, compiled up a slightly modified version of PostGIS
(without RECHECKs on the GiST opclass) and loaded in the same dataset.
Unfortunately I have to report back that with your WIP patch applied,
timings seem to have become several orders of magnitude *worse*:

pg84@zeno:~$ psql -d postgis
psql (8.4devel)
Type "help" for help.
postgis=# explain analyze select count(*) from geography where centroid && (select the_geom from geography where id=69495);
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7100.28..7100.29 rows=1 width=0) (actual time=18238.932..18238.934 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=4387) (actual time=27.472..69.223 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using geography_centroid_idx on geography  (cost=0.00..8.27 rows=1 width=0) (actual time=118.371..18196.041 rows=32880 loops=1)
         Index Cond: (centroid && $0)
 Total runtime: 18239.918 ms
(7 rows)

In fact, even sequential scans seem to have gone up by several orders
of magnitude:

postgis=# set enable_indexscan = 'f';
SET
postgis=# set enable_bitmapscan = 'f';
SET
postgis=# explain analyze select count(*) from geography where centroid && (select the_geom from geography where id=69495);
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=14184.01..14184.01 rows=1 width=0) (actual time=9117.022..9117.024 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=4387) (actual time=23.780..54.362 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=0) (actual time=55.016..9073.084 rows=32880 loops=1)
         Filter: (centroid && $0)
 Total runtime: 9117.174 ms
(7 rows)

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
Mark,

On Tue, Jul 1, 2008 at 1:15 PM, Mark Cave-Ayland
<mark.cave-ayland@siriusit.co.uk> wrote:
> Thanks very much for supplying the WIP patch.  In the interest of
> testing and feedback, I've just downloaded PostgreSQL CVS head and
> applied your patch, compiled up a slightly modified version of PostGIS
> (without RECHECKs on the GiST opclass) and loaded in the same dataset.

From the discussion we had a few months ago, I don't think the no
RECHECK trick works with CVS tip anymore.

See my post on the "Remove lossy-operator RECHECK flag?" thread:
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00847.php
and follow-ups.

That said, perhaps it's not the only problem you have but I thought it
was worth mentioning.

--
Guillaume
Guillaume Smet wrote:
> From the discussion we had a few months ago, I don't think the no
> RECHECK trick works with CVS tip anymore.
>
> See my post on the "Remove lossy-operator RECHECK flag?" thread:
> http://archives.postgresql.org/pgsql-hackers/2008-04/msg00847.php
> and follow-ups.
>
> That said, perhaps it's not the only problem you have but I thought it
> was worth mentioning.

Yeah sorry, that might not have been as clear as I hoped.  What I meant
was that I removed the explicit RECHECK clause from the GiST operator
class definition - since, as you correctly mention, CVS tip throws an
error if this is present.

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk> writes:
> Unfortunately I have to report back that with your WIP patch applied,
> timings seem to have become several orders of magnitude *worse*:

Ugh.  Could I pester you to run the case under gprof or oprofile?
Or, if you can give me step-by-step directions for setting up the test
scenario, I could do that here.

			regards, tom lane
[ back on-list ]

Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk> writes:
> Thanks very much for supplying the WIP patch.  In the interest of
> testing and feedback, I've just downloaded PostgreSQL CVS head and
> applied your patch, compiled up a slightly modified version of PostGIS
> (without RECHECKs on the GiST opclass) and loaded in the same dataset.
> Unfortunately I have to report back that with your WIP patch applied,
> timings seem to have become several orders of magnitude *worse*:

OK, I've reproduced the test case locally.  I believe that when you
say "worse", you mean "worse than 8.3", right?  And you did tell me
offlist that you were testing with --enable-cassert.  CVS HEAD has
very substantially greater cassert overhead because of the
randomize_memory addition --- oprofile output for this test looks like

samples   %        image name     symbol name
1239580   78.7721  postgres       randomize_mem
143544     9.1218  libc-2.7.so    memcpy
48039      3.0528  libc-2.7.so    memset
13838      0.8794  postgres       LWLockAcquire
12176      0.7738  postgres       index_getnext
11697      0.7433  postgres       LWLockRelease
10406      0.6613  postgres       hash_search_with_hash_value
4739       0.3012  postgres       toast_fetch_datum
4099       0.2605  postgres       _bt_checkkeys
3905       0.2482  postgres       AllocSetAlloc
3751       0.2384  postgres       PinBuffer
3545       0.2253  postgres       UnpinBuffer

I'm inclined to think that we'd better turn that off by default,
since it's not looking like it's catching anything new.

However, even with cassert turned off, the de-TOAST patch isn't
helping your test case; though it does help if I turn the query into
a join.  Here's what I get for

                            8.3     HEAD    HEAD + patch
original query [1]          4575    4669    4613
original + force_2d [2]      196     195     201
converted to join [3]       4603    4667     209
original, indexscans off    2335    2391    2426
join, indexscans off        2350    2373     124

[1] select count(*) from geography where centroid &&
    (select the_geom from geography where id=69495);
[2] select count(*) from geography where centroid &&
    (select force_2d(the_geom) from geography where id=69495);
[3] select count(*) from geography g1, geography g2
    where g1.centroid && g2.the_geom and g2.id=69495;

All times in msec, median of 3 trials using psql \timing (times seem
to be reproducible within about 1%, if data is already cached).
Default parameters all around.

The join form of the query produces the results I expected, with
g2.the_geom coming from the outer side of a nestloop join and getting
detoasted only once.  After some digging I found the reason why the
original query isn't getting any benefit: it's copying the_geom up
from an InitPlan, and nodeSubplan.c does that like this:

    /*
     * We need to copy the subplan's tuple into our own context, in case
     * any of the params are pass-by-ref type --- the pointers stored in
     * the param structs will point at this copied tuple! node->curTuple
     * keeps track of the copied tuple for eventual freeing.
     */
    if (node->curTuple)
        heap_freetuple(node->curTuple);
    node->curTuple = ExecCopySlotTuple(slot);

    /*
     * Now set all the setParam params from the columns of the tuple
     */
    foreach(l, subplan->setParam)
    {
        int            paramid = lfirst_int(l);
        ParamExecData *prm = &(econtext->ecxt_param_exec_vals[paramid]);

        prm->execPlan = NULL;
        prm->value = heap_getattr(node->curTuple, i, tdesc,
                                  &(prm->isnull));
        i++;
    }

So the Param is pointing at a bare toasted Datum, not an indirect
pointer in a Slot, and there's no way to avoid detoasting each time
the Param is referenced.  It would be simple enough to fix
nodeSubplan.c to copy the data into an upper-level Slot rather than a
bare tuple, along the lines sketched below.
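A rough sketch of that kind of fix, for illustration only: the curSlot field is hypothetical (SubPlanState has no such member), while ExecCopySlot and slot_getattr are existing executor helpers; how the patch's indirect pointers would actually be handed out is an assumption here.

    /*
     * Hypothetical variant: keep the subplan's output tuple in a Slot
     * owned by this node instead of a bare copied tuple, so that Params
     * fetched from it can be indirect TOAST pointers into the Slot.
     */
    ExecCopySlot(node->curSlot, slot);  /* curSlot is assumed, not existing code */

    /*
     * Now set all the setParam params from the columns of the tuple.
     */
    foreach(l, subplan->setParam)
    {
        int            paramid = lfirst_int(l);
        ParamExecData *prm = &(econtext->ecxt_param_exec_vals[paramid]);

        prm->execPlan = NULL;
        /* slot_getattr could return an indirect pointer owned by curSlot */
        prm->value = slot_getattr(node->curSlot, i, &(prm->isnull));
        i++;
    }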
But this makes me wonder how many other places are like this.  In the
past there wasn't that much benefit to pulling data from a Slot
instead of a bare tuple, so I'm afraid we might have a number of
similar gotchas we'd have to track down.

The other thing that worries me is that the force_2d query
measurements suggest a runtime penalty of two or three percent for the
patch, which is higher than I was hoping for.  But that isn't that
much more than the noise level in this test.

On the whole I'm still feeling pretty discouraged about this patch ...

			regards, tom lane
Tom Lane wrote:
> It would be simple enough to fix nodeSubplan.c to copy the data into
> an upper-level Slot rather than a bare tuple.  But this makes me wonder
> how many other places are like this.  In the past there wasn't that much
> benefit to pulling data from a Slot instead of a bare tuple, so I'm
> afraid we might have a number of similar gotchas we'd have to track
> down.

I compare this to adding CHECK_FOR_INTERRUPTS(): let's declare that
every usage of bare tuples is a not-very-serious bug, and we can fix
them one by one as we come across them.

--
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> It would be simple enough to fix nodeSubplan.c to copy the data into
>> an upper-level Slot rather than a bare tuple.  But this makes me wonder
>> how many other places are like this.  In the past there wasn't that much
>> benefit to pulling data from a Slot instead of a bare tuple, so I'm
>> afraid we might have a number of similar gotchas we'd have to track
>> down.

> I compare this to adding CHECK_FOR_INTERRUPTS(): let's declare that
> every usage of bare tuples is a not-very-serious bug, and we can fix
> them one by one as we come across them.

Unfortunately we can't usefully have such a rule --- consider sorting,
for example.  We're not going to change over to using TupleTableSlots
as the items being sorted.  What I foresee if we go down this path is
that there will be some places where we can fix toasting performance
problems by inserting a Slot, and some where we can't.

			regards, tom lane
Tom Lane wrote:
> OK, I've reproduced the test case locally.  I believe that when you
> say "worse", you mean "worse than 8.3", right?  And you did tell me
> offlist that you were testing with --enable-cassert.  CVS HEAD has
> very substantially greater cassert overhead because of the
> randomize_memory addition --- oprofile output for this test looks like
>
> samples   %        image name     symbol name
> 1239580   78.7721  postgres       randomize_mem
> 143544     9.1218  libc-2.7.so    memcpy
> 48039      3.0528  libc-2.7.so    memset
> 13838      0.8794  postgres       LWLockAcquire
> 12176      0.7738  postgres       index_getnext
> 11697      0.7433  postgres       LWLockRelease
> 10406      0.6613  postgres       hash_search_with_hash_value
> 4739       0.3012  postgres       toast_fetch_datum
> 4099       0.2605  postgres       _bt_checkkeys
> 3905       0.2482  postgres       AllocSetAlloc
> 3751       0.2384  postgres       PinBuffer
> 3545       0.2253  postgres       UnpinBuffer
>
> I'm inclined to think that we'd better turn that off by default,
> since it's not looking like it's catching anything new.

Yes, I suspect that's probably it.  I applied the patch straight to
CVS tip as I wasn't aware of any changes that would affect the
unpatched result, but I was obviously wrong ;)

(cut)

> On the whole I'm still feeling pretty discouraged about this patch ...

At the very least we have some more information on how an eventual
solution should work, and a test case to help analyse the
effectiveness of any potential solution.

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
>> I'm inclined to think that we'd better turn that off by default,
>> since it's not looking like it's catching anything new.

Well at least it caught the bug that Mark was performance testing with
a --enable-cassert build :/

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!
Gregory Stark wrote:
> Well at least it caught the bug that Mark was performance testing
> with a --enable-cassert build :/

True ;)  I appreciated that there would be some overhead, but I didn't
think it would be that much.  This was mainly because I seem to
remember there was talk a while back of enabling some assertions in
production builds.

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
Added to TODO:

	Eliminate de-TOASTing of values when not needed

---------------------------------------------------------------------------

Tom Lane wrote:
> Attached is a worked-out patch for the approach proposed here:
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
> namely, that cache management for de-TOASTed datums is handled by
> TupleTableSlots.
>
> To avoid premature detoasting of values that we might never need, the
> patch introduces a concept of an "indirect TOAST pointer", which has
> the same 0x80 or 0x01 header as an external TOAST pointer, but can be
> told apart by having a different length byte.  Within that we have
>
> * pointer to original toasted field within the Slot's tuple
> * pointer to the owning Slot
> * pointer to decompressed copy, or NULL if not decompressed yet
>
> Some fairly straightforward extensions to the TupleTableSlot code,
> heaptuple.c, and tuptoaster.c make it all go.
>
> My original thoughts had included turning FREE_IF_COPY() into a no-op,
> but on investigation that seems impractical.  One case that still
> depends on that pfree is where we have palloc'd a 4-byte-header copy
> of a short-header datum to support code that needs properly aligned
> datum content.  The solution adopted in the patch is to arrange for
> pfree() applied to a cacheable detoasted object to be a no-op, whereas
> it still works normally for non-cached detoasted objects.  We do this
> by inserting a dummy chunk header that points to a dummy memory
> context whose pfree support method does nothing.  I think this part of
> the patch would be required for any toast caching method, not just
> this one.
>
> What I like about this patch is that it's a fairly small-footprint
> change, it doesn't add much overhead, and it covers caching of
> decompression for in-line-compressed datums as well as the out-of-line
> case.
>
> One thing I really *don't* like about it is that it requires everyplace
> that copies Datums to know about indirect pointers: in general, the
> copy must be a copy of the original toasted Datum, not of the indirect
> pointer, else we have indirect pointers that can outlive their owning
> TupleTableSlot (or at least outlive its current tuple cycle).  There
> only seem to be about half a dozen such places in the current code,
> but still it seems a rather fragile coding rule.
>
> After playing with it for a little bit, I'm not convinced that it buys
> enough performance win to be worth applying --- the restriction of
> cache lifespan to one tuple cycle of a TupleTableSlot is awfully
> restrictive.  (For example, sorts that involve toasted sort keys
> continue to suck, because the tuples being sorted aren't in Slots.)
> It would probably fix the specific case that the PostGIS hackers were
> complaining of, but I think we need something more.
>
> Still, I wanted to get it into the archives because the idea of
> indirect toast pointers might be useful for something else.
>
> 			regards, tom lane

Content-Description: toast-indirect-1.patch.gz

[ Type application/octet-stream treated as attachment, skipping... ]

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
> Added to TODO:
>	Eliminate de-TOASTing of values when not needed

That's a fairly bad description of what the patch was about.
I changed it to

	Reduce costs of repeat de-TOASTing of values

			regards, tom lane
On Sun, 2008-06-29 at 16:57 -0400, Tom Lane wrote:
> After playing with it for a little bit, I'm not convinced that it buys
> enough performance win to be worth applying --- the restriction of
> cache lifespan to one tuple cycle of a TupleTableSlot is awfully
> restrictive.  (For example, sorts that involve toasted sort keys
> continue to suck, because the tuples being sorted aren't in Slots.)
> It would probably fix the specific case that the PostGIS hackers were
> complaining of, but I think we need something more.
>
> Still, I wanted to get it into the archives because the idea of
> indirect toast pointers might be useful for something else.

Thank you for posting this to the list; this does help us at Truviso
(sometimes).  In some real cases, we're seeing about a 15-20x
improvement in the overall query, going from about 9 seconds to under
500 ms.  In other cases that could theoretically benefit from TOAST
caching, we see no improvement at all.

As you say, the cases where it helps are fairly narrow.  It's also
very susceptible to changes in the plan.  For instance, if the toasted
value is on one side of a nested loop, the patch helps a lot; but it
goes back to the performance of unpatched PostgreSQL for the same plan
if the nested loop is inverted, as shown below.

Regards,
	Jeff Davis

[ plan estimates are omitted for readability ]

----------- UNPATCHED -----------

Nested Loop (actual time=3.021..1020.923 rows=1000 loops=1)
  -> Seq Scan on toasted (actual time=0.016..0.017 rows=1 loops=1)
  -> Seq Scan on a (actual time=0.016..0.735 rows=1000 loops=1)
Total runtime: 1021.397 ms

Nested Loop (actual time=1.352..1037.207 rows=1000 loops=1)
  -> Seq Scan on a (actual time=0.007..0.364 rows=1000 loops=1)
  -> Seq Scan on toasted (actual time=0.001..0.002 rows=1 loops=1000)
Total runtime: 1037.653 ms

------------ PATCHED ------------

-- patch helps
Nested Loop (actual time=1.266..2.247 rows=1000 loops=1)
  -> Seq Scan on toasted (actual time=0.003..0.072 rows=1 loops=1)
  -> Seq Scan on a (actual time=0.006..0.328 rows=1000 loops=1)
Total runtime: 2.519 ms

-- patch has no effect
Nested Loop (actual time=1.283..1012.371 rows=1000 loops=1)
  -> Seq Scan on a (actual time=0.010..0.719 rows=1000 loops=1)
  -> Seq Scan on toasted (actual time=0.001..0.059 rows=1 loops=1000)
Total runtime: 1012.973 ms
Jeff Davis <pgsql@j-davis.com> writes:
> On Sun, 2008-06-29 at 16:57 -0400, Tom Lane wrote:
>> After playing with it for a little bit, I'm not convinced that it buys
>> enough performance win to be worth applying --- the restriction of
>> cache lifespan to one tuple cycle of a TupleTableSlot is awfully
>> restrictive.

> Thank you for posting this to the list; this does help us at Truviso
> (sometimes).  In some real cases, we're seeing about a 15-20x
> improvement in the overall query, going from about 9 seconds to under
> 500 ms.  In other cases that could theoretically benefit from TOAST
> caching, we see no improvement at all.

> As you say, the cases where it helps are fairly narrow.

Thanks for giving it a workout.  Looks like we do indeed need to work
on the other approach with a more persistent toasted-object cache.
But the numbers you got are good evidence that this will be worth
doing if we can get it right.

I might try to look at this after the September commit fest (but if
anyone else wants to tackle it, feel free!)

			regards, tom lane