Thread: WIP patch: reducing overhead for repeat de-TOASTing

WIP patch: reducing overhead for repeat de-TOASTing

From
Tom Lane
Date:
Attached is a worked-out patch for the approach proposed here:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
namely, that cache management for de-TOASTed datums is handled
by TupleTableSlots.

To avoid premature detoasting of values that we might never need, the
patch introduces a concept of an "indirect TOAST pointer", which has
the same 0x80 or 0x01 header as an external TOAST pointer, but can
be told apart by having a different length byte.  Within that we have
    * pointer to original toasted field within the Slot's tuple
    * pointer to the owning Slot
    * pointer to decompressed copy, or NULL if not decompressed yet
Some fairly straightforward extensions to the TupleTableSlot code,
heaptuple.c, and tuptoaster.c make it all go.

My original thoughts had included turning FREE_IF_COPY() into a no-op,
but on investigation that seems impractical.  One case that still
depends on that pfree is where we have palloc'd a 4-byte-header copy
of a short-header datum to support code that needs properly aligned
datum content.  The solution adopted in the patch is to arrange for
pfree() applied to a cacheable detoasted object to be a no-op, whereas
it still works normally for non-cached detoasted objects.  We do this
by inserting a dummy chunk header that points to a dummy memory context
whose pfree support method does nothing.  I think this part of the patch
would be required for any toast caching method, not just this one.

What I like about this patch is that it's a fairly small-footprint
change, it doesn't add much overhead, and it covers caching of
decompression for in-line-compressed datums as well as the out-of-line
case.

One thing I really *don't* like about it is that it requires everyplace
that copies Datums to know about indirect pointers: in general, the copy
must be a copy of the original toasted Datum, not of the indirect
pointer, else we have indirect pointers that can outlive their owning
TupleTableSlot (or at least outlive its current tuple cycle).  There
only seem to be about half a dozen such places in the current code,
but still it seems a rather fragile coding rule.

After playing with it for a little bit, I'm not convinced that it buys
enough performance win to be worth applying --- the restriction of cache
lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.
(For example, sorts that involve toasted sort keys continue to suck,
because the tuples being sorted aren't in Slots.)  It would probably
fix the specific case that the PostGIS hackers were complaining of,
but I think we need something more.

Still, I wanted to get it into the archives because the idea of indirect
toast pointers might be useful for something else.

            regards, tom lane


Attachment

Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> After playing with it for a little bit, I'm not convinced that it buys
> enough performance win to be worth applying --- the restriction of cache
> lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.
> (For example, sorts that involve toasted sort keys continue to suck,
> because the tuples being sorted aren't in Slots.)  It would probably
> fix the specific case that the PostGIS hackers were complaining of,
> but I think we need something more.
>
> Still, I wanted to get it into the archives because the idea of indirect
> toast pointers might be useful for something else.

I do like that it handles even inline-compressed cases. What I didn't like
about the managed cache was that it couldn't handle such cases. I could easily
imagine the PostGIS case arising for inline compressed data structures. I
wonder if it isn't worthwhile just for that case even if there's a further
cache behind it for repeated fetches of out-of-line data.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Mark Cave-Ayland
Date:
Tom Lane wrote:
> Attached is a worked-out patch for the approach proposed here:
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
> namely, that cache management for de-TOASTed datums is handled
> by TupleTableSlots.
> 
> To avoid premature detoasting of values that we might never need, the
> patch introduces a concept of an "indirect TOAST pointer", which has
> the same 0x80 or 0x01 header as an external TOAST pointer, but can
> be told apart by having a different length byte.  Within that we have
>     * pointer to original toasted field within the Slot's tuple
>     * pointer to the owning Slot
>     * pointer to decompressed copy, or NULL if not decompressed yet
> Some fairly straightforward extensions to the TupleTableSlot code,
> heaptuple.c, and tuptoaster.c make it all go.
> 
> My original thoughts had included turning FREE_IF_COPY() into a no-op,
> but on investigation that seems impractical.  One case that still
> depends on that pfree is where we have palloc'd a 4-byte-header copy
> of a short-header datum to support code that needs properly aligned
> datum content.  The solution adopted in the patch is to arrange for
> pfree() applied to a cacheable detoasted object to be a no-op, whereas
> it still works normally for non-cached detoasted objects.  We do this
> by inserting a dummy chunk header that points to a dummy memory context
> whose pfree support method does nothing.  I think this part of the patch
> would be required for any toast caching method, not just this one.
> 
> What I like about this patch is that it's a fairly small-footprint
> change, it doesn't add much overhead, and it covers caching of
> decompression for in-line-compressed datums as well as the out-of-line
> case.
> 
> One thing I really *don't* like about it is that it requires everyplace
> that copies Datums to know about indirect pointers: in general, the copy
> must be a copy of the original toasted Datum, not of the indirect
> pointer, else we have indirect pointers that can outlive their owning
> TupleTableSlot (or at least outlive its current tuple cycle).  There
> only seem to be about half a dozen such places in the current code,
> but still it seems a rather fragile coding rule.
> 
> After playing with it for a little bit, I'm not convinced that it buys
> enough performance win to be worth applying --- the restriction of cache
> lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.
> (For example, sorts that involve toasted sort keys continue to suck,
> because the tuples being sorted aren't in Slots.)  It would probably
> fix the specific case that the PostGIS hackers were complaining of,
> but I think we need something more.
> 
> Still, I wanted to get it into the archives because the idea of indirect
> toast pointers might be useful for something else.

Hi Tom,

Thanks very much for supplying the WIP patch. In the interest of testing 
and feedback, I've just downloaded PostgreSQL CVS head and applied your 
patch, compiled up a slightly modified version of PostGIS (without 
RECHECKs on the GiST opclass) and loaded in the same dataset.

Unfortunately I have to report back that with your WIP patch applied, 
timings seem to have become several orders of magnitude *worse*:


pg84@zeno:~$ psql -d postgis
psql (8.4devel)
Type "help" for help.

postgis=# explain analyze select count(*) from geography where centroid 
&& (select the_geom from geography where id=69495);
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7100.28..7100.29 rows=1 width=0) (actual time=18238.932..18238.934 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=4387) (actual time=27.472..69.223 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using geography_centroid_idx on geography  (cost=0.00..8.27 rows=1 width=0) (actual time=118.371..18196.041 rows=32880 loops=1)
         Index Cond: (centroid && $0)
 Total runtime: 18239.918 ms
(7 rows)


In fact, even sequential scans seem to have gone up by several orders of 
magnitude:


postgis=# set enable_indexscan = 'f';
SET
postgis=# set enable_bitmapscan = 'f';
SET
postgis=# explain analyze select count(*) from geography where centroid 
&& (select the_geom from geography where id=69495);
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=14184.01..14184.01 rows=1 width=0) (actual time=9117.022..9117.024 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=4387) (actual time=23.780..54.362 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=0) (actual time=55.016..9073.084 rows=32880 loops=1)
         Filter: (centroid && $0)
 Total runtime: 9117.174 ms
(7 rows)


ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
"Guillaume Smet"
Date:
Mark,

On Tue, Jul 1, 2008 at 1:15 PM, Mark Cave-Ayland
<mark.cave-ayland@siriusit.co.uk> wrote:
> Thanks very much for supplying the WIP patch. In the interest of testing and
> feedback, I've just downloaded PostgreSQL CVS head and applied your patch,
> compiled up a slightly modified version of PostGIS (without RECHECKs on the
> GiST opclass) and loaded in the same dataset.

From the discussion we had a few months ago, I don't think the no
RECHECK trick works with CVS tip anymore.

See my post on the "Remove lossy-operator RECHECK flag?" thread:
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00847.php
and follow-ups.

That said, perhaps it's not the only problem you have but I thought it
was worth mentioning.

-- 
Guillaume


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Mark Cave-Ayland
Date:
Guillaume Smet wrote:

> From the discussion we had a few months ago, I don't think the no
> RECHECK trick works with CVS tip anymore.
> 
> See my post on the "Remove lossy-operator RECHECK flag?" thread:
> http://archives.postgresql.org/pgsql-hackers/2008-04/msg00847.php
> and follow-ups.
> 
> That said, perhaps it's not the only problem you have but I thought it
> was worth mentioning.

Yeah sorry, that might not have been as clear as I hoped. What I meant 
was that I removed the explicit RECHECK clause from the GiST operator 
class definition - since as you correctly mention, CVS tip throws an 
error if this is present.


ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Tom Lane
Date:
Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk> writes:
> Unfortunately I have to report back that with your WIP patch applied, 
> timings seem to have become several orders of magnitude *worse*:

Ugh.  Could I pester you to run the case under gprof or oprofile?
Or, if you can give me step-by-step directions for setting up the
test scenario, I could do that here.
        regards, tom lane


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Tom Lane
Date:
[ back on-list ]

Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk> writes:
> Thanks very much for supplying the WIP patch. In the interest of testing 
> and feedback, I've just downloaded PostgreSQL CVS head and applied your 
> patch, compiled up a slightly modified version of PostGIS (without 
> RECHECKs on the GiST opclass) and loaded in the same dataset.
> Unfortunately I have to report back that with your WIP patch applied, 
> timings seem to have become several orders of magnitude *worse*:

OK, I've reproduced the test case locally.  I believe that when you
say "worse", you mean "worse than 8.3", right?  And you did tell me
offlist that you were testing with --enable-cassert.  CVS HEAD has
very substantially greater cassert overhead because of the
randomize_memory addition --- oprofile output for this test looks like

samples  %        image name               symbol name
1239580  78.7721  postgres                 randomize_mem
143544    9.1218  libc-2.7.so              memcpy
48039     3.0528  libc-2.7.so              memset
13838     0.8794  postgres                 LWLockAcquire
12176     0.7738  postgres                 index_getnext
11697     0.7433  postgres                 LWLockRelease
10406     0.6613  postgres                 hash_search_with_hash_value
4739      0.3012  postgres                 toast_fetch_datum
4099      0.2605  postgres                 _bt_checkkeys
3905      0.2482  postgres                 AllocSetAlloc
3751      0.2384  postgres                 PinBuffer
3545      0.2253  postgres                 UnpinBuffer

I'm inclined to think that we'd better turn that off by default,
since it's not looking like it's catching anything new.

However, even with cassert turned off, the de-TOAST patch isn't
helping your test case; though it does help if I turn the query
into a join.  Here's what I get for
                             8.3     HEAD    HEAD + patch

original query [1]          4575     4669    4613
original + force_2d [2]      196      195     201
converted to join [3]       4603     4667     209
original, indexscans off    2335     2391    2426
join, indexscans off        2350     2373     124

[1] select count(*) from geography where centroid && (select the_geom from geography where id=69495);

[2] select count(*) from geography where centroid && (select force_2d(the_geom) from geography where id=69495);

[3] select count(*) from geography g1, geography g2 where g1.centroid && g2.the_geom and g2.id=69495;

All times in msec, median of 3 trials using psql \timing (times seem
to be reproducible within about 1%, if data is already cached).
Default parameters all around.

The join form of the query produces the results I expected, with
g2.the_geom coming from the outer side of a nestloop join and getting
detoasted only once.  After some digging I found the reason why the
original query isn't getting any benefit: it's copying the_geom up
from an InitPlan, and nodeSubplan.c does that like this:
       /*
        * We need to copy the subplan's tuple into our own context, in case
        * any of the params are pass-by-ref type --- the pointers stored in
        * the param structs will point at this copied tuple! node->curTuple
        * keeps track of the copied tuple for eventual freeing.
        */
       if (node->curTuple)
           heap_freetuple(node->curTuple);
       node->curTuple = ExecCopySlotTuple(slot);

       /*
        * Now set all the setParam params from the columns of the tuple
        */
       foreach(l, subplan->setParam)
       {
           int            paramid = lfirst_int(l);
           ParamExecData *prm = &(econtext->ecxt_param_exec_vals[paramid]);

           prm->execPlan = NULL;
           prm->value = heap_getattr(node->curTuple, i, tdesc,
                                     &(prm->isnull));
           i++;
       }

So the Param is pointing at a bare toasted Datum, not an indirect
pointer in a Slot, and there's no way to avoid detoasting each time
the Param is referenced.

It would be simple enough to fix nodeSubplan.c to copy the data into
an upper-level Slot rather than a bare tuple.  But this makes me wonder
how many other places are like this.  In the past there wasn't that much
benefit to pulling data from a Slot instead of a bare tuple, so I'm
afraid we might have a number of similar gotchas we'd have to track
down.

The other thing that worries me is that the force_2d query measurements
suggest a runtime penalty of two or three percent for the patch, which
is higher than I was hoping for.  But that isn't that much more than
the noise level in this test.

On the whole I'm still feeling pretty discouraged about this patch ...
        regards, tom lane


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Alvaro Herrera
Date:
Tom Lane wrote:

> It would be simple enough to fix nodeSubplan.c to copy the data into
> an upper-level Slot rather than a bare tuple.  But this makes me wonder
> how many other places are like this.  In the past there wasn't that much
> benefit to pulling data from a Slot instead of a bare tuple, so I'm
> afraid we might have a number of similar gotchas we'd have to track
> down.

I compare this to adding CHECK_FOR_INTERRUPTS(): let's declare that
every usage of bare tuples is a not-very-serious bug, and we can fix
them one by one as we come across them.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> It would be simple enough to fix nodeSubplan.c to copy the data into
>> an upper-level Slot rather than a bare tuple.  But this makes me wonder
>> how many other places are like this.  In the past there wasn't that much
>> benefit to pulling data from a Slot instead of a bare tuple, so I'm
>> afraid we might have a number of similar gotchas we'd have to track
>> down.

> I compare this to adding CHECK_FOR_INTERRUPTS(): let's declare that
> every usage of bare tuples is a not-very-serious bug, and we can fix
> them one by one as we come across them.

Unfortunately we can't usefully have such a rule --- consider sorting
for example.  We're not going to change over to using TupleTableSlots
as the items being sorted.  What I foresee if we go down this path
is that there will be some places where we can fix toasting performance
problems by inserting a Slot, and some where we can't.
        regards, tom lane


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Mark Cave-Ayland
Date:
Tom Lane wrote:
> OK, I've reproduced the test case locally.  I believe that when you
> say "worse", you mean "worse than 8.3", right?  And you did tell me
> offlist that you were testing with --enable-cassert.  CVS HEAD has
> very substantially greater cassert overhead because of the
> randomize_memory addition --- oprofile output for this test looks like
>
> samples  %        image name               symbol name
> 1239580  78.7721  postgres                 randomize_mem
> 143544    9.1218  libc-2.7.so              memcpy
> 48039     3.0528  libc-2.7.so              memset
> 13838     0.8794  postgres                 LWLockAcquire
> 12176     0.7738  postgres                 index_getnext
> 11697     0.7433  postgres                 LWLockRelease
> 10406     0.6613  postgres                 hash_search_with_hash_value
> 4739      0.3012  postgres                 toast_fetch_datum
> 4099      0.2605  postgres                 _bt_checkkeys
> 3905      0.2482  postgres                 AllocSetAlloc
> 3751      0.2384  postgres                 PinBuffer
> 3545      0.2253  postgres                 UnpinBuffer
>
> I'm inclined to think that we'd better turn that off by default,
> since it's not looking like it's catching anything new.

Yes, I suspect that's probably it. I applied the patch straight to CVS 
tip as I wasn't aware of any changes that would affect the unpatched 
result, but I was obviously wrong ;)

(cut)
> On the whole I'm still feeling pretty discouraged about this patch ...

At the very least we have some more information on how an eventual 
solution should work, and a test case to help analyse the effectiveness 
of any potential solution.


ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Gregory Stark
Date:
>> I'm inclined to think that we'd better turn that off by default,
>> since it's not looking like it's catching anything new.

Well at least it caught the bug that Mark was performance testing with a
--enable-cassert build :/


-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Mark Cave-Ayland
Date:
Gregory Stark wrote:
> Well at least it caught the bug that Mark was performance testing with a
> --enable-cassert build :/

True ;)  I appreciated that there would be some overhead, but I didn't 
think it would be that much. This was mainly since I seem to remember 
there was talk a while back of enabling some assertions in production 
builds.


ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Bruce Momjian
Date:
Added to TODO:
Eliminate de-TOASTing of values when not needed

---------------------------------------------------------------------------

Tom Lane wrote:
> Attached is a worked-out patch for the approach proposed here:
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
> namely, that cache management for de-TOASTed datums is handled
> by TupleTableSlots.
> 
> To avoid premature detoasting of values that we might never need, the
> patch introduces a concept of an "indirect TOAST pointer", which has
> the same 0x80 or 0x01 header as an external TOAST pointer, but can
> be told apart by having a different length byte.  Within that we have
>     * pointer to original toasted field within the Slot's tuple
>     * pointer to the owning Slot
>     * pointer to decompressed copy, or NULL if not decompressed yet
> Some fairly straightforward extensions to the TupleTableSlot code,
> heaptuple.c, and tuptoaster.c make it all go.
> 
> My original thoughts had included turning FREE_IF_COPY() into a no-op,
> but on investigation that seems impractical.  One case that still
> depends on that pfree is where we have palloc'd a 4-byte-header copy
> of a short-header datum to support code that needs properly aligned
> datum content.  The solution adopted in the patch is to arrange for
> pfree() applied to a cacheable detoasted object to be a no-op, whereas
> it still works normally for non-cached detoasted objects.  We do this
> by inserting a dummy chunk header that points to a dummy memory context
> whose pfree support method does nothing.  I think this part of the patch
> would be required for any toast caching method, not just this one.
> 
> What I like about this patch is that it's a fairly small-footprint
> change, it doesn't add much overhead, and it covers caching of
> decompression for in-line-compressed datums as well as the out-of-line
> case.
> 
> One thing I really *don't* like about it is that it requires everyplace
> that copies Datums to know about indirect pointers: in general, the copy
> must be a copy of the original toasted Datum, not of the indirect
> pointer, else we have indirect pointers that can outlive their owning
> TupleTableSlot (or at least outlive its current tuple cycle).  There
> only seem to be about half a dozen such places in the current code,
> but still it seems a rather fragile coding rule.
> 
> After playing with it for a little bit, I'm not convinced that it buys
> enough performance win to be worth applying --- the restriction of cache
> lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.
> (For example, sorts that involve toasted sort keys continue to suck,
> because the tuples being sorted aren't in Slots.)  It would probably
> fix the specific case that the PostGIS hackers were complaining of,
> but I think we need something more.
> 
> Still, I wanted to get it into the archives because the idea of indirect
> toast pointers might be useful for something else.
> 
>             regards, tom lane
> 

Content-Description: toast-indirect-1.patch.gz

[ Type application/octet-stream treated as attachment, skipping... ]


-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

 + If your life is a hard drive, Christ can be your backup. +


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Added to TODO:
>     Eliminate de-TOASTing of values when not needed

That's a fairly bad description of what the patch was about.  I changed
it to
Reduce costs of repeat de-TOASTing of values
        regards, tom lane


Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Jeff Davis
Date:
On Sun, 2008-06-29 at 16:57 -0400, Tom Lane wrote:
> After playing with it for a little bit, I'm not convinced that it buys
> enough performance win to be worth applying --- the restriction of cache
> lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.
> (For example, sorts that involve toasted sort keys continue to suck,
> because the tuples being sorted aren't in Slots.)  It would probably
> fix the specific case that the PostGIS hackers were complaining of,
> but I think we need something more.
> 
> Still, I wanted to get it into the archives because the idea of indirect
> toast pointers might be useful for something else.
> 

Thank you for posting this to the list, this does help us at Truviso
(sometimes). In some real cases, we're seeing about 15-20x improvement
of the overall query; going from about 9 seconds to under 500 ms. In
other cases that could theoretically benefit from TOAST caching, we see
no improvement at all.

As you say, the cases where it helps are fairly narrow. It's
also very susceptible to changes in the plan. For instance, if the
toasted value is on one side of a nested loop, the patch helps a
lot; but it goes back to the performance of unpatched PostgreSQL for the
same plan if the nested loop is inverted, as shown below.

Regards,
	Jeff Davis

[ plan estimates are omitted for readability ]

----------- UNPATCHED -----------

Nested Loop (actual time=3.021..1020.923 rows=1000 loops=1)
  ->  Seq Scan on toasted (actual time=0.016..0.017 rows=1 loops=1)
  ->  Seq Scan on a (actual time=0.016..0.735 rows=1000 loops=1)
Total runtime: 1021.397 ms

Nested Loop (actual time=1.352..1037.207 rows=1000 loops=1)
  ->  Seq Scan on a (actual time=0.007..0.364 rows=1000 loops=1)
  ->  Seq Scan on toasted (actual time=0.001..0.002 rows=1 loops=1000)
Total runtime: 1037.653 ms


------------ PATCHED ------------
-- patch helps
Nested Loop (actual time=1.266..2.247 rows=1000 loops=1)
  ->  Seq Scan on toasted (actual time=0.003..0.072 rows=1 loops=1)
  ->  Seq Scan on a (actual time=0.006..0.328 rows=1000 loops=1)
Total runtime: 2.519 ms

-- patch has no effect
Nested Loop (actual time=1.283..1012.371 rows=1000 loops=1)
  ->  Seq Scan on a (actual time=0.010..0.719 rows=1000 loops=1)
  ->  Seq Scan on toasted (actual time=0.001..0.059 rows=1 loops=1000)
Total runtime: 1012.973 ms





Re: WIP patch: reducing overhead for repeat de-TOASTing

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Sun, 2008-06-29 at 16:57 -0400, Tom Lane wrote:
>> After playing with it for a little bit, I'm not convinced that it buys
>> enough performance win to be worth applying --- the restriction of cache
>> lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.

> Thank you for posting this to the list, this does help us at Truviso
> (sometimes). In some real cases, we're seeing about 15-20x improvement
> of the overall query; going from about 9 seconds to under 500 ms. In
> other cases that could theoretically benefit from TOAST caching, we see
> no improvement at all.

> As you say, the cases where it helps are fairly narrow.

Thanks for giving it a workout.  Looks like we do indeed need to work on
the other approach with a more persistent toasted-object cache.  But the
numbers you got are good evidence that this will be worth doing if we
can get it right.

I might try to look at this after the September commit fest (but if
anyone else wants to tackle it, feel free!)
        regards, tom lane