Thread: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
Hello this patch remove a multiple detoasting of varlena values in plpgsql. It is usable mainly for iteration over longer array directly loaded from relation. It's doesn't have a impact on semantic or behave - it's just eliminate some performance trap. sample: table 10000 rows one column with array with 1000 string fields: patched pl time: 6 sec unpatched pl time: 170 sec This doesn't change my opinion on FOR-IN-ARRAY cycle (is still important for readability) - just remove one critical performance issue Regards Pavel Stehule
Attachment
Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Andrew Dunstan
Date:
On 11/22/2010 07:46 AM, Pavel Stehule wrote: > Hello > > this patch remove a multiple detoasting of varlena values in plpgsql. > > It is usable mainly for iteration over longer array directly loaded > from relation. > > It's doesn't have a impact on semantic or behave - it's just eliminate > some performance trap. > > sample: table 10000 rows one column with array with 1000 string fields: > > patched pl time: 6 sec > unpatched pl time: 170 sec > Since you haven't told us exactly how you tested this it's hard to gauge the test results. cheers andrew
Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2010/11/22 Andrew Dunstan <andrew@dunslane.net>: > > > On 11/22/2010 07:46 AM, Pavel Stehule wrote: >> >> Hello >> >> this patch remove a multiple detoasting of varlena values in plpgsql. >> >> It is usable mainly for iteration over longer array directly loaded >> from relation. >> >> It's doesn't have a impact on semantic or behave - it's just eliminate >> some performance trap. >> >> sample: table 10000 rows one column with array with 1000 string fields: >> >> patched pl time: 6 sec >> unpatched pl time: 170 sec >> > > Since you haven't told us exactly how you tested this it's hard to gauge the > test results. sorry - it is related to tests from FOR-IN-ARRAY thread create table t1000(x text[]); CREATE OR REPLACE FUNCTION rndstr() RETURNS text AS $$select array_to_string(array(select substring('ABCDEFGHIJKLMNOPQ' FROM (random()*16)::int FOR 1) from generate_series(1,10)),'')$$ LANGUAGE sql; create or replace function rndarray(int) returns text[] as $$select array(select rndstr() from generate_series(1,$1)) $$ language sql; insert into t1000 select rndarray(1000) from generate_series(1,10000); CREATE OR REPLACE FUNCTION public.filter02(text[], text, integer)RETURNS text[]LANGUAGE plpgsql AS $function$ DECLAREs text[] := '{}';l int := 0; i int;v text; BEGINFOR i IN array_lower($1,1)..array_upper($1,1)LOOP EXIT WHEN l = $3; IF $1[i] LIKE $2 THEN s := s || $1[i]; l:= l + 1; END IF;END LOOP;RETURN s; END;$function$ test query: select avg(array_upper(filter02(x,'%AA%', 10),1)) from t1000; Regards Pavel Stehule > > cheers > > andrew >
Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Alvaro Herrera
Date:
Excerpts from Pavel Stehule's message of lun nov 22 10:01:23 -0300 2010: > sorry - it is related to tests from FOR-IN-ARRAY thread > test query: select avg(array_upper(filter02(x,'%AA%', 10),1)) from t1000; Yeah, I can measure a 25x improvement in that test with the patch applied. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Noah Misch
Date:
Hello Pavel, I'm reviewing this patch for CommitFest 2011-01. The patch seems fully desirable. It appropriately contains no documentation updates. It contains no new tests, and that's probably fine, too; I can't think of any corner cases where this would do something other than work correctly or break things comprehensively. Using your test case from here: http://archives.postgresql.org/message-id/AANLkTikDCW+c-C4U4NgaOBhpFSZkb5Uy_ZuaTDZfPMSn@mail.gmail.com I observed a 28x speedup, similar to Álvaro's report. I also made my own test case, attached, to evaluate this from a somewhat different angle and also to consider the worst case. With a 100 KiB string (good case), I see a 4.8x speedup. With a 1 KiB string (worst case - never toasted), I see no statistically significant change. This is to be expected. A few specific questions and comments follow, all minor. Go ahead and mark the patch ready for committer when you've acted on them, or declined to do so, to your satisfaction. I don't think a re-review will be needed. Thanks, nm On Mon, Nov 22, 2010 at 01:46:51PM +0100, Pavel Stehule wrote: > *** ./pl_exec.c.orig 2010-11-16 10:28:42.000000000 +0100 > --- ./pl_exec.c 2010-11-22 13:33:01.597726809 +0100 The patch applies cleanly, but the format is slightly nonstandard (-p0 when applied from src/pl/plpgsql/src, rather than -p1 from the root). > *************** > *** 3944,3949 **** > --- 3965,3993 ---- > > *typeid = var->datatype->typoid; > *typetypmod = var->datatype->atttypmod; > + > + /* > + * explicit deTOAST and decomprim for varlena types "decompress", perhaps? > + */ > + if (var->should_be_detoasted) > + { > + Datum dvalue; > + > + Assert(!var->isnull); > + > + oldcontext = MemoryContextSwitchTo(estate->fn_mcxt); > + dvalue = PointerGetDatum(PG_DETOAST_DATUM(var->value)); > + if (dvalue != var->value) > + { > + if (var->freeval) > + free_var(var); > + var->value = dvalue; > + var->freeval = true; > + } > + MemoryContextSwitchTo(oldcontext); This line adds trailing white space. > + var->should_be_detoasted = false; > + } > + > *value = var->value; > *isnull = var->isnull; > break; > *** ./plpgsql.h.orig 2010-11-16 10:28:42.000000000 +0100 > --- ./plpgsql.h 2010-11-22 13:12:38.897851879 +0100 > *************** > *** 644,649 **** > --- 645,651 ---- > bool fn_is_trigger; > PLpgSQL_func_hashkey *fn_hashkey; /* back-link to hashtable key */ > MemoryContext fn_cxt; > + MemoryContext fn_mcxt; /* link to function's memory context */ > > Oid fn_rettype; > int fn_rettyplen; > *************** > *** 692,697 **** > --- 694,701 ---- > Oid rettype; /* type of current retval */ > > Oid fn_rettype; /* info about declared function rettype */ > + MemoryContext fn_mcxt; /* link to function's memory context */ > + > bool retistuple; > bool retisset; > I only see PLpgSQL_execstate.fn_mcxt getting populated in this patch. Is the PLpgSQL_function.fn_mcxt leftover from an earlier design? I could not readily tell the difference between fn_cxt and fn_mcxt. As I understand it, fn_mcxt is the SPI procCxt, and fn_cxt is the long-lived context used to cache facts across many transactions. Perhaps name the member something like "top_cxt", "exec_cxt" or "proc_cxt" and comment it like "/* SPI Proc memory context */" to make this clearer.
Attachment
Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
Hello 2011/1/15 Noah Misch <noah@leadboat.com>: > Hello Pavel, > > I'm reviewing this patch for CommitFest 2011-01. > Thank you very much, I am sending a updated version with little bit more comments. But I am sure, so somebody with good English have to edit my comments. Minimally I hope, so your questions will be answered. > The patch seems fully desirable. It appropriately contains no documentation > updates. It contains no new tests, and that's probably fine, too; I can't think > of any corner cases where this would do something other than work correctly or > break things comprehensively. > > Using your test case from here: > http://archives.postgresql.org/message-id/AANLkTikDCW+c-C4U4NgaOBhpFSZkb5Uy_ZuaTDZfPMSn@mail.gmail.com > I observed a 28x speedup, similar to Álvaro's report. I also made my own test > case, attached, to evaluate this from a somewhat different angle and also to > consider the worst case. With a 100 KiB string (good case), I see a 4.8x > speedup. With a 1 KiB string (worst case - never toasted), I see no > statistically significant change. This is to be expected. > > A few specific questions and comments follow, all minor. Go ahead and mark the > patch ready for committer when you've acted on them, or declined to do so, to > your satisfaction. I don't think a re-review will be needed. > > Thanks, > nm > > On Mon, Nov 22, 2010 at 01:46:51PM +0100, Pavel Stehule wrote: >> *** ./pl_exec.c.orig 2010-11-16 10:28:42.000000000 +0100 >> --- ./pl_exec.c 2010-11-22 13:33:01.597726809 +0100 > > The patch applies cleanly, but the format is slightly nonstandard (-p0 when > applied from src/pl/plpgsql/src, rather than -p1 from the root). > >> *************** >> *** 3944,3949 **** >> --- 3965,3993 ---- >> >> *typeid = var->datatype->typoid; >> *typetypmod = var->datatype->atttypmod; >> + >> + /* >> + * explicit deTOAST and decomprim for varlena types > > "decompress", perhaps? > fixed >> + */ >> + if (var->should_be_detoasted) >> + { >> + Datum dvalue; >> + >> + Assert(!var->isnull); >> + >> + oldcontext = MemoryContextSwitchTo(estate->fn_mcxt); >> + dvalue = PointerGetDatum(PG_DETOAST_DATUM(var->value)); >> + if (dvalue != var->value) >> + { >> + if (var->freeval) >> + free_var(var); >> + var->value = dvalue; >> + var->freeval = true; >> + } >> + MemoryContextSwitchTo(oldcontext); > > This line adds trailing white space. > >> + var->should_be_detoasted = false; >> + } >> + >> *value = var->value; >> *isnull = var->isnull; >> break; > >> *** ./plpgsql.h.orig 2010-11-16 10:28:42.000000000 +0100 >> --- ./plpgsql.h 2010-11-22 13:12:38.897851879 +0100 > >> *************** >> *** 644,649 **** >> --- 645,651 ---- >> bool fn_is_trigger; >> PLpgSQL_func_hashkey *fn_hashkey; /* back-link to hashtable key */ >> MemoryContext fn_cxt; >> + MemoryContext fn_mcxt; /* link to function's memory context */ >> >> Oid fn_rettype; >> int fn_rettyplen; >> *************** >> *** 692,697 **** >> --- 694,701 ---- >> Oid rettype; /* type of current retval */ >> >> Oid fn_rettype; /* info about declared function rettype */ >> + MemoryContext fn_mcxt; /* link to function's memory context */ >> + >> bool retistuple; >> bool retisset; >> > > I only see PLpgSQL_execstate.fn_mcxt getting populated in this patch. Is the > PLpgSQL_function.fn_mcxt leftover from an earlier design? I have to access to top execution context from exec_eval_datum function. This function can be called from parser's context, and without explicit switch to top execution context a variables are detoasted in wrong context. > > I could not readily tell the difference between fn_cxt and fn_mcxt. As I > understand it, fn_mcxt is the SPI procCxt, and fn_cxt is the long-lived context > used to cache facts across many transactions. Perhaps name the member something > like "top_cxt", "exec_cxt" or "proc_cxt" and comment it like "/* SPI Proc memory > context */" to make this clearer. I used a top_exec_cxt name Pavel Stehule Regards >
Attachment
Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Noah Misch
Date:
On Sun, Jan 16, 2011 at 06:49:27PM +0100, Pavel Stehule wrote: > I am sending a updated version with little bit more comments. But I am > sure, so somebody with good English have to edit my comments. > Minimally I hope, so your questions will be answered. Thanks. I edited the comments and white space somewhat, as attached. I'll go ahead and mark it Ready for Committer.
Attachment
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes: > On Sun, Jan 16, 2011 at 06:49:27PM +0100, Pavel Stehule wrote: >> I am sending a updated version with little bit more comments. But I am >> sure, so somebody with good English have to edit my comments. >> Minimally I hope, so your questions will be answered. > Thanks. I edited the comments and white space somewhat, as attached. I'll go > ahead and mark it Ready for Committer. I looked at this patch and found it fairly awkward. What is the point of adding an additional flag to every variable, as opposed to just forcibly detoasting during assignment? If it's to avoid detoasting when the variable is read in a way that doesn't require detoasting, it fails rather completely IMO, since exec_eval_datum certainly doesn't know that. The added memory context variable seems redundant as well. regards, tom lane
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/1/18 Tom Lane <tgl@sss.pgh.pa.us>: > Noah Misch <noah@leadboat.com> writes: >> On Sun, Jan 16, 2011 at 06:49:27PM +0100, Pavel Stehule wrote: >>> I am sending a updated version with little bit more comments. But I am >>> sure, so somebody with good English have to edit my comments. >>> Minimally I hope, so your questions will be answered. > >> Thanks. I edited the comments and white space somewhat, as attached. I'll go >> ahead and mark it Ready for Committer. > > I looked at this patch and found it fairly awkward. What is the point > of adding an additional flag to every variable, as opposed to just > forcibly detoasting during assignment? If it's to avoid detoasting when > the variable is read in a way that doesn't require detoasting, it fails > rather completely IMO, since exec_eval_datum certainly doesn't know > that. I am not sure about false overhead of detoasting. This is a safe variant. I don't would to decrease performance. Not sure if it's necessary. But detoasting on assignment isn't enought: some most simple example - searching a maximum for i in array_lower(a,1) .. array_upper(a,1) loop if x < a[i] then x = a[i]; end if; end loop; in this cycle the variable a wasn't modified. Any access to this variable means a detoast and decompres. So there is necessary to modify a process. Detoast not on assign, but detoast on usage. Regards Pavel Stehule > > The added memory context variable seems redundant as well. I didn't find a pointer on top execution context available from execution state. I am sure, so we have to switch to this context explicitly. Regards Pavel Stehule > > regards, tom lane >
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2011/1/18 Tom Lane <tgl@sss.pgh.pa.us>: >> I looked at this patch and found it fairly awkward. What is the point >> of adding an additional flag to every variable, as opposed to just >> forcibly detoasting during assignment? > But detoasting on assignment isn't enought: > for i in array_lower(a,1) .. array_upper(a,1) > loop > if x < a[i] then > x = a[i]; > end if; > end loop; > in this cycle the variable a wasn't modified. Any access to this > variable means a detoast and decompres. How so? In what I'm envisioning, a would have been decompressed when it was originally assigned to. regards, tom lane
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/1/18 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2011/1/18 Tom Lane <tgl@sss.pgh.pa.us>: >>> I looked at this patch and found it fairly awkward. What is the point >>> of adding an additional flag to every variable, as opposed to just >>> forcibly detoasting during assignment? > >> But detoasting on assignment isn't enought: > >> for i in array_lower(a,1) .. array_upper(a,1) >> loop >> if x < a[i] then >> x = a[i]; >> end if; >> end loop; > >> in this cycle the variable a wasn't modified. Any access to this >> variable means a detoast and decompres. > > How so? In what I'm envisioning, a would have been decompressed when it > was originally assigned to. > oh, I understand now. I afraid so it can be overhad, because there can be path where you doesn't use a some variables from parameter list. There are lot of user procedures, where not all parameters are used, so I think is better to wait on first usage. Probably these procedures can be written in SQL or C, but it can decrese a performance of some current trivial functions in plpgsql. So my strategy is simple - wait with detoasting to last moment, but don't repeat detoasting. My opinion isn't strong in this topic. One or twenty useless detoasting isn't really significant in almost use cases (problem is thousands detoasting). Regards Pavel Stehule > regards, tom lane >
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > opinion isn't strong in this topic. One or twenty useless detoasting > isn't really significant in almost use cases (problem is thousands > detoasting). Yeah. Many-times-repeated detoasting is really bad, and this is not the only place in the backend where we have this problem. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/1/19 Robert Haas <robertmhaas@gmail.com>: > On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> opinion isn't strong in this topic. One or twenty useless detoasting >> isn't really significant in almost use cases (problem is thousands >> detoasting). > > Yeah. Many-times-repeated detoasting is really bad, and this is not > the only place in the backend where we have this problem. :-( > ??? some general solution can be pretty hardcore - some like handlers instead pointers on varlena :( I know mainly about plpgsql performance problems - but it is my interes. Plpgsql can be simply fixed - maybe 10 maybe less lines of code. Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> opinion isn't strong in this topic. One or twenty useless detoasting >> isn't really significant in almost use cases (problem is thousands >> detoasting). > Yeah. Many-times-repeated detoasting is really bad, and this is not > the only place in the backend where we have this problem. :-( Yeah, there's been some discussion of a more general solution, and I think I even had a trial patch at one point (which turned out not to work terribly well, but maybe somebody will have a better idea someday). In the meantime, the proposal at hand seems like a bit of a stop-gap, which is why I'd prefer to see something with a very minimal code footprint. Detoast at assignment would likely need only a few lines of code added in a single place. regards, tom lane
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> opinion isn't strong in this topic. One or twenty useless detoasting >>> isn't really significant in almost use cases (problem is thousands >>> detoasting). > >> Yeah. Many-times-repeated detoasting is really bad, and this is not >> the only place in the backend where we have this problem. :-( > > Yeah, there's been some discussion of a more general solution, and I > think I even had a trial patch at one point (which turned out not to > work terribly well, but maybe somebody will have a better idea someday). I'm pretty doubtful that there's going to be a general solution to this problem - I think it's going to require gradual refactoring of problem spots. > In the meantime, the proposal at hand seems like a bit of a stop-gap, > which is why I'd prefer to see something with a very minimal code > footprint. Detoast at assignment would likely need only a few lines > of code added in a single place. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Noah Misch
Date:
On Wed, Jan 19, 2011 at 12:10:16PM -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > >> opinion isn't strong in this topic. One or twenty useless detoasting > >> isn't really significant in almost use cases (problem is thousands > >> detoasting). > > > Yeah. Many-times-repeated detoasting is really bad, and this is not > > the only place in the backend where we have this problem. :-( > > Yeah, there's been some discussion of a more general solution, and I > think I even had a trial patch at one point (which turned out not to > work terribly well, but maybe somebody will have a better idea someday). > In the meantime, the proposal at hand seems like a bit of a stop-gap, > which is why I'd prefer to see something with a very minimal code > footprint. Detoast at assignment would likely need only a few lines > of code added in a single place. What qualifies a patch as stop-gap scale? Pavel's patch is ~60 lines. If adding PLpgSQL_var.should_be_detoasted is your main pain point, testing VARATT_IS_EXTENDED there might be the least-harmful way to avoid it. Saving a few more lines by moving the work to exec_assign_value probably does not justify the associated performance regressions Pavel has cited. nm
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes: > On Wed, Jan 19, 2011 at 12:10:16PM -0500, Tom Lane wrote: >> In the meantime, the proposal at hand seems like a bit of a stop-gap, >> which is why I'd prefer to see something with a very minimal code >> footprint. Detoast at assignment would likely need only a few lines >> of code added in a single place. > What qualifies a patch as stop-gap scale? Pavel's patch is ~60 lines. Yeah, but they're spread out all over plpgsql, and seem likely to metastasize to other places --- the additional field that needs to be initialized is the main culprit. I'd like a one-spot patch that will be easy to remove when/if it's no longer needed. > If adding PLpgSQL_var.should_be_detoasted is your main pain point, testing > VARATT_IS_EXTENDED there might be the least-harmful way to avoid it. I thought about that too, but adding an additional set of tests into exec_eval_datum isn't free --- that's a hot spot for plpgsql execution. Doing it in exec_assign_value would be significantly cheaper, first because it's reasonable to assume that assignments are less frequent than reads, and second because there's already a test there to detect pass-by-ref datatypes, as well as a datumCopy() step that could be skipped altogether when we detoast. regards, tom lane
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Yeah. Many-times-repeated detoasting is really bad, and this is not >>> the only place in the backend where we have this problem. :-( >> Yeah, there's been some discussion of a more general solution, and I >> think I even had a trial patch at one point (which turned out not to >> work terribly well, but maybe somebody will have a better idea someday). > I'm pretty doubtful that there's going to be a general solution to > this problem - I think it's going to require gradual refactoring of > problem spots. Do you remember the previous discussion? One idea that was on the table was to make the TOAST code maintain a cache of detoasted values, which could be indexed by the toast pointer OIDs (toast rel OID + value OID), and then PG_DETOAST_DATUM might give back a pointer into the cache instead of a fresh value. In principle that could be done in a fairly centralized way. The hard part is to know when a cache entry is not actively referenced anymore ... regards, tom lane
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Wed, Jan 19, 2011 at 2:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> Yeah. Many-times-repeated detoasting is really bad, and this is not >>>> the only place in the backend where we have this problem. :-( > >>> Yeah, there's been some discussion of a more general solution, and I >>> think I even had a trial patch at one point (which turned out not to >>> work terribly well, but maybe somebody will have a better idea someday). > >> I'm pretty doubtful that there's going to be a general solution to >> this problem - I think it's going to require gradual refactoring of >> problem spots. > > Do you remember the previous discussion? One idea that was on the table > was to make the TOAST code maintain a cache of detoasted values, which > could be indexed by the toast pointer OIDs (toast rel OID + value OID), > and then PG_DETOAST_DATUM might give back a pointer into the cache > instead of a fresh value. In principle that could be done in a fairly > centralized way. The hard part is to know when a cache entry is not > actively referenced anymore ... I do remember that discussion. Aside from the problem you mention, it also seems that maintaining the hash table and doing lookups into it would have some intrinsic cost. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > I do remember that discussion. Aside from the problem you mention, it > also seems that maintaining the hash table and doing lookups into it > would have some intrinsic cost. Well, sure, but it's still far cheaper than going out to the toast table (which will require multiple hashtable lookups, not to mention I/O and possible lock blocking). If we could solve the refcounting problem I think this would be a very significant win. regards, tom lane
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote: > If we could solve the refcounting problem I think this would be a > very significant win. Instead of trying to keep a refcount, how about just evicting from the buffer as needed based on LRU? -Kevin
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If we could solve the refcounting problem I think this would be a >> very significant win. > Instead of trying to keep a refcount, how about just evicting from > the buffer as needed based on LRU? Well, unless you know for certain that an item is no longer used, you can't evict it. There are ways you could finesse that --- for instance, you might try to only flush between statements, when certainly no user functions are running --- but the problem is that the cache is likely to contain some large values that you can't adopt such a laissez faire approach with, or you'll run out of memory. One idea that I think we discussed was to tie cache entries to the memory context they were demanded in, and mark them unused at the next context reset/delete. That way they'd be considered unused at the same points where the current implementation would certainly have discarded the value. This isn't perfect (because of pfree) but might be good enough. regards, tom lane
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Wed, Jan 19, 2011 at 4:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > One idea that I think we discussed was to tie cache entries to the > memory context they were demanded in, and mark them unused at the next > context reset/delete. That way they'd be considered unused at the same > points where the current implementation would certainly have discarded > the value. This isn't perfect (because of pfree) but might be good > enough. Yeah, I was thinking that's probably what would have to be done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
Hello, because I am not sure so any complex solution can be done to deadline for 9.1, I created a patch that is based on Tom ideas - just explicitly detoast function parameters. Regards Pavel 2011/1/19 Robert Haas <robertmhaas@gmail.com>: > On Wed, Jan 19, 2011 at 4:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> One idea that I think we discussed was to tie cache entries to the >> memory context they were demanded in, and mark them unused at the next >> context reset/delete. That way they'd be considered unused at the same >> points where the current implementation would certainly have discarded >> the value. This isn't perfect (because of pfree) but might be good >> enough. > > Yeah, I was thinking that's probably what would have to be done. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Attachment
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Noah Misch
Date:
On Sat, Jan 22, 2011 at 11:32:02AM +0100, Pavel Stehule wrote: > because I am not sure so any complex solution can be done to deadline > for 9.1, I created a patch that is based on Tom ideas - just > explicitly detoast function parameters. I can confirm that, for your original test case, this yields performance comparable to that of your original patch. This patch hooks into plpgsql_exec_function, detoasting only the function arguments. Therefore, it doesn't help in a test case like the one I posted in my original review. That test case initialized a variable using SELECT INTO, then used the variable in a loop. Is there any benefit to doing this in plpgsql_exec_function, versus exec_assign_value (Tom's suggestion), which would presumably help the other test case also? As we've discussed, unlike the original patch, this yields similarly grand performance regressions on functions that receive toasted arguments and never use them. Who is prepared to speculate that this will help more people than it will hurt? This patch is easier on -hackers than the original, but it seems much more likely to create measurable performance regressions in the field. It's clear the committers prefer it this way, but I remain skeptical. nm
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
Hello 2011/1/25 Noah Misch <noah@leadboat.com>: > On Sat, Jan 22, 2011 at 11:32:02AM +0100, Pavel Stehule wrote: >> because I am not sure so any complex solution can be done to deadline >> for 9.1, I created a patch that is based on Tom ideas - just >> explicitly detoast function parameters. > > I can confirm that, for your original test case, this yields performance > comparable to that of your original patch. I know it :(. I am thinking, so detoasting on usage is better, but I am don't know more about Tom or Rober's plans. > > This patch hooks into plpgsql_exec_function, detoasting only the function > arguments. Therefore, it doesn't help in a test case like the one I posted in > my original review. That test case initialized a variable using SELECT INTO, > then used the variable in a loop. Is there any benefit to doing this in > plpgsql_exec_function, versus exec_assign_value (Tom's suggestion), which would > presumably help the other test case also? I can explicitly detosting on assign stmt too. It's 6 lines more. Regards Pavel > > As we've discussed, unlike the original patch, this yields similarly grand > performance regressions on functions that receive toasted arguments and never > use them. Who is prepared to speculate that this will help more people than it > will hurt? This patch is easier on -hackers than the original, but it seems > much more likely to create measurable performance regressions in the field. > It's clear the committers prefer it this way, but I remain skeptical. > > nm >
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Heikki Linnakangas
Date:
On 25.01.2011 06:29, Pavel Stehule wrote: > 2011/1/25 Noah Misch<noah@leadboat.com>: >> On Sat, Jan 22, 2011 at 11:32:02AM +0100, Pavel Stehule wrote: >>> because I am not sure so any complex solution can be done to deadline >>> for 9.1, I created a patch that is based on Tom ideas - just >>> explicitly detoast function parameters. >> >> I can confirm that, for your original test case, this yields performance >> comparable to that of your original patch. > > I know it :(. I am thinking, so detoasting on usage is better, but I > am don't know more about Tom or Rober's plans. Detoasting on first usage, ie. exec_eval_datum(), seems the best to me. Compared to detoasting on assignment, it avoids the performance regression if the value is never used, and I don't think checking if the value is toasted at every exec_eval_datum() call adds too much overhead. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Detoasting on first usage, ie. exec_eval_datum(), seems the best to me. > Compared to detoasting on assignment, it avoids the performance > regression if the value is never used, and I don't think checking if the > value is toasted at every exec_eval_datum() call adds too much overhead. The arguments that were made against this were about maintenance costs and code footprint. Claims about performance are not really relevant, especially when they're entirely unsupported by evidence. regards, tom lane
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Tue, Jan 25, 2011 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Detoasting on first usage, ie. exec_eval_datum(), seems the best to me. >> Compared to detoasting on assignment, it avoids the performance >> regression if the value is never used, and I don't think checking if the >> value is toasted at every exec_eval_datum() call adds too much overhead. > > The arguments that were made against this were about maintenance costs > and code footprint. Claims about performance are not really relevant, > especially when they're entirely unsupported by evidence. How much evidence do you need to the effect that detoasting a value that's never used will hurt performance? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 25, 2011 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The arguments that were made against this were about maintenance costs >> and code footprint. �Claims about performance are not really relevant, >> especially when they're entirely unsupported by evidence. > How much evidence do you need to the effect that detoasting a value > that's never used will hurt performance? I agree that at some microscopic level it will cost something. What's not been shown is that there's any significant cost in any real-world use pattern. As Pavel said upthread, the main thing here is to get rid of cases where there are many many repeated detoastings. Adding an occasional detoast that wouldn't have happened before is a good tradeoff for that. To convince me that we should contort the code to go further, you need to show that that's more than a negligible cost. regards, tom lane
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Tue, Jan 25, 2011 at 10:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jan 25, 2011 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The arguments that were made against this were about maintenance costs >>> and code footprint. Claims about performance are not really relevant, >>> especially when they're entirely unsupported by evidence. > >> How much evidence do you need to the effect that detoasting a value >> that's never used will hurt performance? > > I agree that at some microscopic level it will cost something. What's > not been shown is that there's any significant cost in any real-world > use pattern. As Pavel said upthread, the main thing here is to get rid > of cases where there are many many repeated detoastings. Adding an > occasional detoast that wouldn't have happened before is a good tradeoff > for that. To convince me that we should contort the code to go further, > you need to show that that's more than a negligible cost. Well, what if somebody calls the function like this? SELECT foo(a, b) FROM huge_table This is not a particularly uncommon thing to do, and it might easily be the case that foo accesses b for some values of a but not all. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/1/25 Tom Lane <tgl@sss.pgh.pa.us>: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jan 25, 2011 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The arguments that were made against this were about maintenance costs >>> and code footprint. Claims about performance are not really relevant, >>> especially when they're entirely unsupported by evidence. > >> How much evidence do you need to the effect that detoasting a value >> that's never used will hurt performance? > > I agree that at some microscopic level it will cost something. What's > not been shown is that there's any significant cost in any real-world > use pattern. As Pavel said upthread, the main thing here is to get rid > of cases where there are many many repeated detoastings. Adding an > occasional detoast that wouldn't have happened before is a good tradeoff > for that. To convince me that we should contort the code to go further, > you need to show that that's more than a negligible cost. I did a few tests: create table a1(a int); create table a2(a int) insert into a1 select array_fill(1, array[100]) from generate_series(1,10000); insert into a2 select array_fill(1, array[10000]) from generate_series(1,10000); create or replace function s(int[]) returns int as $$ declare s int = 0; i int; begin for i in array_lower($1,1) .. array_upper($1,1) loop s := s + $1[i]; end loop; return s; end; $$ language plpgsql immutable; next I tested queries 1, select sum(s(a)) from a1; 2, select sum(s(a)) from a2; variant a) my first patch - detoast on first usage with avoiding to useless detoast checking variant b) my first patch - detoast on first usage without avoiding to useless detoast checking time for 1 - about 300 ms, a is bout 1.5% faster than b time for 2 - about 30000 ms, a is about 3% faster than b Regards Pavel Stehule
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Tue, Jan 25, 2011 at 11:29 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > variant a) my first patch - detoast on first usage with avoiding to > useless detoast checking > variant b) my first patch - detoast on first usage without avoiding to > useless detoast checking > > time for 1 - about 300 ms, a is bout 1.5% faster than b > time for 2 - about 30000 ms, a is about 3% faster than b This makes your approach sound pretty good, but it sounds like we might need to find a better way to structure the code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/1/28 Robert Haas <robertmhaas@gmail.com>: > On Tue, Jan 25, 2011 at 11:29 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> variant a) my first patch - detoast on first usage with avoiding to >> useless detoast checking >> variant b) my first patch - detoast on first usage without avoiding to >> useless detoast checking >> >> time for 1 - about 300 ms, a is bout 1.5% faster than b >> time for 2 - about 30000 ms, a is about 3% faster than b > > This makes your approach sound pretty good, but it sounds like we > might need to find a better way to structure the code. > do you have a any idea? Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Fri, Jan 28, 2011 at 2:19 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2011/1/28 Robert Haas <robertmhaas@gmail.com>: >> On Tue, Jan 25, 2011 at 11:29 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> variant a) my first patch - detoast on first usage with avoiding to >>> useless detoast checking >>> variant b) my first patch - detoast on first usage without avoiding to >>> useless detoast checking >>> >>> time for 1 - about 300 ms, a is bout 1.5% faster than b >>> time for 2 - about 30000 ms, a is about 3% faster than b >> >> This makes your approach sound pretty good, but it sounds like we >> might need to find a better way to structure the code. >> > > do you have a any idea? At first blush, the should_be_detoasted flag looks completely unnecessary to me. I don't see why we have to set a flag in one part of the code to tell some other part of the code whether or not it should detoast. Why is that necessary or a good idea? Why can't we just make the decision at the point where we're detoasting? Also, I believe I'm agreeing with Tom when I say that exec_eval_datum() doesn't look like the right place to do this. Right now that function is a very simple switch, and I've found that it's usually a bad idea to stick complex code into the middle of such things. It looks like function only has three callers, so maybe we should look at each caller individually and see whether it needs this logic or not. For example, I'm guessing make_tuple_from_row() doesn't, because it's only calling exec_eval_datum() so that it can pass the results to heap_form_tuple(), which already contains some logic to detoast in certain cases. It's not clear why we'd want different logic here than we do anywhere else. The other callers are exec_assign_value(), where this looks like it could be important to avoid repeatedly detoasting the array, and plpgsql_param_fetch(), which I'm not sure about, but perhaps you have an idea or can benchmark it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/2/4 Robert Haas <robertmhaas@gmail.com>: > On Fri, Jan 28, 2011 at 2:19 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> 2011/1/28 Robert Haas <robertmhaas@gmail.com>: >>> On Tue, Jan 25, 2011 at 11:29 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>> variant a) my first patch - detoast on first usage with avoiding to >>>> useless detoast checking >>>> variant b) my first patch - detoast on first usage without avoiding to >>>> useless detoast checking >>>> >>>> time for 1 - about 300 ms, a is bout 1.5% faster than b >>>> time for 2 - about 30000 ms, a is about 3% faster than b >>> >>> This makes your approach sound pretty good, but it sounds like we >>> might need to find a better way to structure the code. >>> >> >> do you have a any idea? > > At first blush, the should_be_detoasted flag looks completely > unnecessary to me. I don't see why we have to set a flag in one part > of the code to tell some other part of the code whether or not it > should detoast. Why is that necessary or a good idea? Why can't we > just make the decision at the point where we're detoasting? a logic about detoasting isn't exactly simple, and you see on tests, so "bypass" save a 3%. I can't to say, if it's much or less. Just take a 3%. Isn't nothing simpler than remove these flags. > > Also, I believe I'm agreeing with Tom when I say that > exec_eval_datum() doesn't look like the right place to do this. Right > now that function is a very simple switch, and I've found that it's > usually a bad idea to stick complex code into the middle of such > things. It looks like function only has three callers, so maybe we > should look at each caller individually and see whether it needs this > logic or not. For example, I'm guessing make_tuple_from_row() > doesn't, because it's only calling exec_eval_datum() so that it can > pass the results to heap_form_tuple(), which already contains some > logic to detoast in certain cases. It's not clear why we'd want > different logic here than we do anywhere else. The other callers are > exec_assign_value(), where this looks like it could be important to > first we have to decide when we will do a detoasting - there are two variants a) when we write a toasted data to variable b) when we use a toasted data @a needs to modify: copy parameters and exec_assign_value, @b needs to modify plpgsql_param_fetch or exec_eval_datum. Really important is plpgsql_param_fetch, because it is only point, where we know, so some Datum will be used, but wasn't detoasted yet. Problem is "wrong" memory context. And plpgsql_param_fetch is callback, that is emitted from different SPI functions so is hard to move a some lines to function, that will run under good context. There isn't necessary to detoast Datum, when value is only copied. The disadvantage of @a can be so we remove 95% of pathologic cases, and we create a 5% new. @b is terrible because the most important point (main work) is done under parser memory context. I don't understand you what is complex on (this code can be moved to function). But @b needs only one place for detosting, because it's nice centralised, that you and Tom dislike. oldctx = MemoryContextSwitchTo(fcecxt); if (!var->typbyval && var->typlen == -1) x = detoast_datum(var); if (x != var) { pfree(var) var = x; } MemoryContextSwitchTo(oldctx); You can put this code to plpgsql_param_fetch or exec_eval_datum or you have to copy this code two or three times. I have not a idea how to design it well to by you and Tom happy :( When I thinking about it, then I am sure, so best solution is really global cache of detoasted values. It cannot be well solved on local area. On local area I can do correct decision only when I know, so I am inside a cycle and I work with some Datum repeatedly. Detoasting on assign can has impacts on FOR stmt, iteration over cursors ... FOR a,b,c IN SELECT * FROM foo LOOP if a THEN CONTINUE; END IF; process b,c -- b and c are toastable values. END LOOP; > avoid repeatedly detoasting the array, and plpgsql_param_fetch(), > which I'm not sure about, but perhaps you have an idea or can > benchmark it. It's hard to benchmark it. I can create a use case, where @a win or @b win. I believe so in almost all cases @a or @b will be better than current state. And I afraid, so there can be situation where implicit detoast can be bad. Regards Pavel Stehule p.s. as one idea is a flag for function, that can to specify, if inside function can be detoasting lazy or aggressive .
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Noah Misch
Date:
Let's see if I can summarize the facts we've collected so far. I see four options based on the discussion: 1. Add PLpgSQL_var.should_be_detoasted; check it in plpgsql_param_fetch(). Essentially Pavel's original patch, only with the check logic moved up from exec_eval_datum() to plpgsql_param_fetch() to avoid bothering a couple other callers that would not benefit. Tom and Robert objected to the new bookkeeping. 2. Deduce the need to detoast and do so in plpgsql_param_fetch(). Avoids the new bookkeeping. Tom objected to the qualitative performance implications, and Pavel measured a 3% performance regression. 3. Deduce the need to detoast and do so in exec_assign_value(). Tom's proposal. This avoids the new bookkeeping and does not touch a hot path. It could eliminate a datum copy in some cases. Pavel, Noah, Heikki and Robert objected to the detoasts of never-referenced variables. 4. Change nothing. Or to perhaps put it even simpler: 1. Swallow some ugly bookkeeping. 2. Slow down a substantial range of PL/pgSQL code to a small extent. 3. Slow down unused-toasted-variable use cases to a large extent. 4. Leave the repeated-detoast use cases slow to a large extent. In principle, given access to a global profile of PL/pgSQL usage, we could choose objectively between #2, #3 and #4. I can't see an objective method for choosing between #1 and the others; we'd need a conversion factor between the value of the performance improvement and the cost of that code. In practice, we're in wholly subjective territory. nm
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Sun, Feb 6, 2011 at 5:52 AM, Noah Misch <noah@leadboat.com> wrote: > 1. Add PLpgSQL_var.should_be_detoasted; check it in plpgsql_param_fetch(). > Essentially Pavel's original patch, only with the check logic moved up from > exec_eval_datum() to plpgsql_param_fetch() to avoid bothering a couple other > callers that would not benefit. Tom and Robert objected to the new bookkeeping. I don't understand why it's necessary. It seems to me that the case we're concerned about is when someone is referencing a variable that is toasted which they might later want to reference again. We're going to have to notice that the value is toasted and detoast it anyway before we can really do anything with it. So why can't we arrange to overwrite the *source* of the data we're fetching with the detoasted version? I know this is probably a stupid question, but i don't understand the code well enough to see why this can't work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Noah Misch
Date:
On Sun, Feb 06, 2011 at 08:15:30AM -0500, Robert Haas wrote: > On Sun, Feb 6, 2011 at 5:52 AM, Noah Misch <noah@leadboat.com> wrote: > > 1. Add PLpgSQL_var.should_be_detoasted; check it in plpgsql_param_fetch(). > > Essentially Pavel's original patch, only with the check logic moved up from > > exec_eval_datum() to plpgsql_param_fetch() to avoid bothering a couple other > > callers that would not benefit. ?Tom and Robert objected to the new bookkeeping. > > I don't understand why it's necessary. It seems to me that the case > we're concerned about is when someone is referencing a variable that > is toasted which they might later want to reference again. We're > going to have to notice that the value is toasted and detoast it > anyway before we can really do anything with it. So why can't we > arrange to overwrite the *source* of the data we're fetching with the > detoasted version? > > I know this is probably a stupid question, but i don't understand the > code well enough to see why this can't work. The detoast currently happens well after PL/pgSQL has handed off the datum. Consider this function, my original benchmark when reviewing this patch: CREATE OR REPLACE FUNCTION f(runs int) RETURNS void LANGUAGE plpgsql AS $$ DECLARE foo text; BEGIN SELECT c INTOfoo FROM t; FOR n IN 1 .. runs LOOP PERFORM foo < 'x'; END LOOP; END $$; Suppose "foo" is toasted. As the code stands in master, it gets detoasted in text_lt(). Certainly we won't overwrite the source back in PL/pgSQL from the detoast point in text_lt(). Pavel's optimization requires that we identify the need to detoast the datum earlier and do so preemptively. Thanks, nm
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Sun, Feb 6, 2011 at 8:47 AM, Noah Misch <noah@leadboat.com> wrote: > On Sun, Feb 06, 2011 at 08:15:30AM -0500, Robert Haas wrote: >> On Sun, Feb 6, 2011 at 5:52 AM, Noah Misch <noah@leadboat.com> wrote: >> > 1. Add PLpgSQL_var.should_be_detoasted; check it in plpgsql_param_fetch(). >> > Essentially Pavel's original patch, only with the check logic moved up from >> > exec_eval_datum() to plpgsql_param_fetch() to avoid bothering a couple other >> > callers that would not benefit. ?Tom and Robert objected to the new bookkeeping. >> >> I don't understand why it's necessary. It seems to me that the case >> we're concerned about is when someone is referencing a variable that >> is toasted which they might later want to reference again. We're >> going to have to notice that the value is toasted and detoast it >> anyway before we can really do anything with it. So why can't we >> arrange to overwrite the *source* of the data we're fetching with the >> detoasted version? >> >> I know this is probably a stupid question, but i don't understand the >> code well enough to see why this can't work. > > The detoast currently happens well after PL/pgSQL has handed off the datum. > Consider this function, my original benchmark when reviewing this patch: > > CREATE OR REPLACE FUNCTION f(runs int) RETURNS void LANGUAGE plpgsql AS $$ > DECLARE > foo text; > BEGIN > SELECT c INTO foo FROM t; > FOR n IN 1 .. runs LOOP > PERFORM foo < 'x'; > END LOOP; > END > $$; > > Suppose "foo" is toasted. As the code stands in master, it gets detoasted in > text_lt(). Certainly we won't overwrite the source back in PL/pgSQL from the > detoast point in text_lt(). Right, that much seems obvious... > Pavel's optimization requires that we identify the > need to detoast the datum earlier and do so preemptively. I guess I need to look at the patch more. I'm failing to understand why that can't be done within one or two functions, without passing around a new piece of state. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Sun, Feb 6, 2011 at 9:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Suppose "foo" is toasted. As the code stands in master, it gets detoasted in >> text_lt(). Certainly we won't overwrite the source back in PL/pgSQL from the >> detoast point in text_lt(). > > Right, that much seems obvious... > >> Pavel's optimization requires that we identify the >> need to detoast the datum earlier and do so preemptively. > > I guess I need to look at the patch more. I'm failing to understand > why that can't be done within one or two functions, without passing > around a new piece of state. So it looks like there are only two places that set should_be_detoasted to something other than false. plpgsql_exec_function does this: var->should_be_detoasted = !var->isnull && !var->datatype->typbyval && var->datatype->typlen == -1; And exec_assign_value does this, which appears to be the same test in a different guise: if (!var->datatype->typbyval && !*isNull) { var->freeval = true; var->should_be_detoasted = var->datatype->typlen == -1; } Every other place where we set this flag, we appear to already know either that the value is null or that it can't be toasted anyhow. So can we just get rid of should_be_detoasted, and have exec_eval_datum() or its callers instead test: !var->isnull && var->datatype->typbyval && var->datatype->typlen == -1 && VARATT_IS_EXTENDED(var->value) I haven't tested this, but it's not clear that'd be measurably slower than checking a single Boolean. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Noah Misch
Date:
On Mon, Feb 07, 2011 at 11:16:18PM -0500, Robert Haas wrote: > So > can we just get rid of should_be_detoasted, and have exec_eval_datum() > or its callers instead test: > > !var->isnull && var->datatype->typbyval && var->datatype->typlen == -1 > && VARATT_IS_EXTENDED(var->value) FWIW, this is what I meant by option 2 in my summary. > I haven't tested this, but it's not clear that'd be measurably slower > than checking a single Boolean. Pavel benchmarked this or something close, measuring a performance loss: http://archives.postgresql.org/message-id/AANLkTikDHekc9r38w2ttzoMDr8vDaVAnr3LhqfJkEuL9@mail.gmail.com Tom also expressed concern over performance: http://archives.postgresql.org/message-id/24266.1295462892@sss.pgh.pa.us Not sure what's next. nm
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/2/8 Robert Haas <robertmhaas@gmail.com>: > On Sun, Feb 6, 2011 at 9:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> Suppose "foo" is toasted. As the code stands in master, it gets detoasted in >>> text_lt(). Certainly we won't overwrite the source back in PL/pgSQL from the >>> detoast point in text_lt(). >> >> Right, that much seems obvious... >> >>> Pavel's optimization requires that we identify the >>> need to detoast the datum earlier and do so preemptively. >> >> I guess I need to look at the patch more. I'm failing to understand >> why that can't be done within one or two functions, without passing >> around a new piece of state. > > So it looks like there are only two places that set > should_be_detoasted to something other than false. > > plpgsql_exec_function does this: > > var->should_be_detoasted = !var->isnull && > !var->datatype->typbyval > && var->datatype->typlen == -1; > > And exec_assign_value does this, which appears to be the same test in > a different guise: > > if (!var->datatype->typbyval && !*isNull) > { > var->freeval = true; > var->should_be_detoasted = var->datatype->typlen == -1; > } > > Every other place where we set this flag, we appear to already know > either that the value is null or that it can't be toasted anyhow. So > can we just get rid of should_be_detoasted, and have exec_eval_datum() > or its callers instead test: > > !var->isnull && var->datatype->typbyval && var->datatype->typlen == -1 > && VARATT_IS_EXTENDED(var->value) > > I haven't tested this, but it's not clear that'd be measurably slower > than checking a single Boolean. Probably less than 1-3%. I tested situation where these checks was when Datum was used. Assignment is less often, so some shortcut there these isn't not too important. Pavel p.s. There is lot of uneffectivity in assignment statement - different typoid, typmod means IO cast, so 1-3% from these checks doesn't play a big role. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/2/8 Noah Misch <noah@leadboat.com>: > On Mon, Feb 07, 2011 at 11:16:18PM -0500, Robert Haas wrote: >> So >> can we just get rid of should_be_detoasted, and have exec_eval_datum() >> or its callers instead test: >> >> !var->isnull && var->datatype->typbyval && var->datatype->typlen == -1 >> && VARATT_IS_EXTENDED(var->value) > > FWIW, this is what I meant by option 2 in my summary. > >> I haven't tested this, but it's not clear that'd be measurably slower >> than checking a single Boolean. > > Pavel benchmarked this or something close, measuring a performance loss: > http://archives.postgresql.org/message-id/AANLkTikDHekc9r38w2ttzoMDr8vDaVAnr3LhqfJkEuL9@mail.gmail.com I tested this in situation when Datum is detoasted on usage, not in assignment. So impact will be less. Regards Pavel Stehule > > Tom also expressed concern over performance: > http://archives.postgresql.org/message-id/24266.1295462892@sss.pgh.pa.us > > Not sure what's next. > > nm >
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Mon, Feb 7, 2011 at 11:52 PM, Noah Misch <noah@leadboat.com> wrote: > On Mon, Feb 07, 2011 at 11:16:18PM -0500, Robert Haas wrote: >> So >> can we just get rid of should_be_detoasted, and have exec_eval_datum() >> or its callers instead test: >> >> !var->isnull && var->datatype->typbyval && var->datatype->typlen == -1 >> && VARATT_IS_EXTENDED(var->value) > > FWIW, this is what I meant by option 2 in my summary. > >> I haven't tested this, but it's not clear that'd be measurably slower >> than checking a single Boolean. > > Pavel benchmarked this or something close, measuring a performance loss: > http://archives.postgresql.org/message-id/AANLkTikDHekc9r38w2ttzoMDr8vDaVAnr3LhqfJkEuL9@mail.gmail.com > > Tom also expressed concern over performance: > http://archives.postgresql.org/message-id/24266.1295462892@sss.pgh.pa.us > > Not sure what's next. Well, Pavel's subsequent reply suggested that he didn't test exactly this thing, so maybe there's hope. Or maybe not. If Tom thought one branch inside exec_eval_datum() was going to be too expensive, four isn't going to be better. But I think we're out of time to work on this for this cycle. Even if my latest idea is brilliant (and it may not be), we still have to test it in a variety of cases and get consensus on it, which seems like more than we can manage right now. I think it's time to mark this one Returned with Feedback, or perhaps Rejected would be more accurate in this instance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Noah Misch
Date:
On Tue, Feb 08, 2011 at 08:00:42AM +0100, Pavel Stehule wrote: > 2011/2/8 Noah Misch <noah@leadboat.com>: > > On Mon, Feb 07, 2011 at 11:16:18PM -0500, Robert Haas wrote: > >> So > >> can we just get rid of should_be_detoasted, and have exec_eval_datum() > >> or its callers instead test: > >> > >> !var->isnull && var->datatype->typbyval && var->datatype->typlen == -1 > >> && VARATT_IS_EXTENDED(var->value) > > > > FWIW, this is what I meant by option 2 in my summary. > > > >> I haven't tested this, but it's not clear that'd be measurably slower > >> than checking a single Boolean. > > > > Pavel benchmarked this or something close, measuring a performance loss: > > http://archives.postgresql.org/message-id/AANLkTikDHekc9r38w2ttzoMDr8vDaVAnr3LhqfJkEuL9@mail.gmail.com > > I tested this in situation when Datum is detoasted on usage, not in > assignment. So impact will be less. Robert spoke of doing it on usage (exec_eval_datum()) too, though.
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/2/8 Noah Misch <noah@leadboat.com>: > On Tue, Feb 08, 2011 at 08:00:42AM +0100, Pavel Stehule wrote: >> 2011/2/8 Noah Misch <noah@leadboat.com>: >> > On Mon, Feb 07, 2011 at 11:16:18PM -0500, Robert Haas wrote: >> >> So >> >> can we just get rid of should_be_detoasted, and have exec_eval_datum() >> >> or its callers instead test: >> >> >> >> !var->isnull && var->datatype->typbyval && var->datatype->typlen == -1 >> >> && VARATT_IS_EXTENDED(var->value) >> > >> > FWIW, this is what I meant by option 2 in my summary. >> > >> >> I haven't tested this, but it's not clear that'd be measurably slower >> >> than checking a single Boolean. >> > >> > Pavel benchmarked this or something close, measuring a performance loss: >> > http://archives.postgresql.org/message-id/AANLkTikDHekc9r38w2ttzoMDr8vDaVAnr3LhqfJkEuL9@mail.gmail.com >> >> I tested this in situation when Datum is detoasted on usage, not in >> assignment. So impact will be less. > > Robert spoke of doing it on usage (exec_eval_datum()) too, though. > I am blind, sorry Regards Pavel
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/2/8 Robert Haas <robertmhaas@gmail.com>: > On Mon, Feb 7, 2011 at 11:52 PM, Noah Misch <noah@leadboat.com> wrote: >> On Mon, Feb 07, 2011 at 11:16:18PM -0500, Robert Haas wrote: >>> So >>> can we just get rid of should_be_detoasted, and have exec_eval_datum() >>> or its callers instead test: >>> >>> !var->isnull && var->datatype->typbyval && var->datatype->typlen == -1 >>> && VARATT_IS_EXTENDED(var->value) >> >> FWIW, this is what I meant by option 2 in my summary. >> >>> I haven't tested this, but it's not clear that'd be measurably slower >>> than checking a single Boolean. >> >> Pavel benchmarked this or something close, measuring a performance loss: >> http://archives.postgresql.org/message-id/AANLkTikDHekc9r38w2ttzoMDr8vDaVAnr3LhqfJkEuL9@mail.gmail.com >> >> Tom also expressed concern over performance: >> http://archives.postgresql.org/message-id/24266.1295462892@sss.pgh.pa.us >> >> Not sure what's next. > > Well, Pavel's subsequent reply suggested that he didn't test exactly > this thing, so maybe there's hope. > > Or maybe not. If Tom thought one branch inside exec_eval_datum() was > going to be too expensive, four isn't going to be better. > > But I think we're out of time to work on this for this cycle. Even if > my latest idea is brilliant (and it may not be), we still have to test > it in a variety of cases and get consensus on it, which seems like > more than we can manage right now. I think it's time to mark this one > Returned with Feedback, or perhaps Rejected would be more accurate in > this instance. if you have a briliant idea, then, please, send a path :). There was more ideas, and I am little bit lost. I'll have a time on weekend, and I can do some tests. Regards Pavel Stehule > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Noah Misch
Date:
On Tue, Feb 08, 2011 at 10:24:03AM -0500, Robert Haas wrote: > Well, Pavel's subsequent reply suggested that he didn't test exactly > this thing, so maybe there's hope. No hope on that basis, no. > Or maybe not. If Tom thought one branch inside exec_eval_datum() was > going to be too expensive, four isn't going to be better. He was commenting on a proposal equivalent to yours. You might want to reread this thread in its entirety; we're coming full circle. > But I think we're out of time to work on this for this cycle. Even if > my latest idea is brilliant (and it may not be), we still have to test > it in a variety of cases and get consensus on it, which seems like > more than we can manage right now. I think it's time to mark this one > Returned with Feedback, or perhaps Rejected would be more accurate in > this instance. It's not as if this patch raised complex questions that folks need more time to digest. For a patch this small and simple, we minimally owe Pavel a direct answer about its rejection. Thanks, nm
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/2/8 Noah Misch <noah@leadboat.com>: > On Tue, Feb 08, 2011 at 10:24:03AM -0500, Robert Haas wrote: >> Well, Pavel's subsequent reply suggested that he didn't test exactly >> this thing, so maybe there's hope. > > No hope on that basis, no. > >> Or maybe not. If Tom thought one branch inside exec_eval_datum() was >> going to be too expensive, four isn't going to be better. > > He was commenting on a proposal equivalent to yours. You might want to reread > this thread in its entirety; we're coming full circle. > >> But I think we're out of time to work on this for this cycle. Even if >> my latest idea is brilliant (and it may not be), we still have to test >> it in a variety of cases and get consensus on it, which seems like >> more than we can manage right now. I think it's time to mark this one >> Returned with Feedback, or perhaps Rejected would be more accurate in >> this instance. > > It's not as if this patch raised complex questions that folks need more time to > digest. For a patch this small and simple, we minimally owe Pavel a direct > answer about its rejection. > It's not necessary. But we did not go one step forward :(. The same situation can be repeated next year. so I would to see a Robert's patch, if it is possible. And can be nice if this simple issue can be done with this commitfest. I can do more performance test of my initial patch. Maybe this patch isn't nice, but I am sure, so this version has a minimal negative impact and maximal positive impact. We speaking about twenty lines that can removed when people will report a potential problems, so any variant - mine or Tom's can be simply removed. It doesn't modify behave of executions, it doesn't append a new feature. It just remove a one pathologic (+/-) issue. I can live without this patch. I know workaround and know a symptoms. It's pity so there isn't any PostGIS user (in this discus). For these people it can be a implicit detoasting interesting and can help with real tests. Regards Pavel > Thanks, > nm >
Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Robert Haas
Date:
On Tue, Feb 8, 2011 at 11:38 AM, Noah Misch <noah@leadboat.com> wrote: > It's not as if this patch raised complex questions that folks need more time to > digest. For a patch this small and simple, we minimally owe Pavel a direct > answer about its rejection. Well, I don't see how we can give a totally straightforward answer at this point. There are several proposals on the table, and they have different pros and cons. Nobody is completely happy with any of them, AFAICT. I think as far as the original patch goes, it's rejected. Is there a variant of that approach that gives the same benefit with better style? I don't know. I might be able to figure something out if I spent an afternoon on it, but why is that my job? There is sometimes a perception among non-committers that committers are hiding the ball, as if the criteria for patch acceptance were purely arbitrary and we make people guess what we want and then complain when we don't get it. I've even had that perception myself a time or two, but I try hard not to do that and I think (hope) that other committers do as well. I've had my own share of ideas that I thought were good and then had to abandon them either because they had some deficiency which someone pointed out to me and I had to give up, or else because I couldn't get consensus that the new behavior was better than the old, even though it emphatically seemed so to me. I've also had plenty of ideas that got shot down multiple times before finally being accepted. I can't, and don't, accept that there isn't some way to improve the repeated detoasting situation, but I do not know what the best solution is technically. I don't even have an opinion, without a lot more work. And I certainly don't have the ability to know what Tom or someone else will think about the code that that solution requires. The only thing I think IS clear is that despite three weeks of discussion, we have no consensus on any of the proposed patches, nor is there any clear path to reaching a consensus. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Bruce Momjian
Date:
What happened with this patch? Alvaro saw a 25x speedup. --------------------------------------------------------------------------- Pavel Stehule wrote: > Hello > > this patch remove a multiple detoasting of varlena values in plpgsql. > > It is usable mainly for iteration over longer array directly loaded > from relation. > > It's doesn't have a impact on semantic or behave - it's just eliminate > some performance trap. > > sample: table 10000 rows one column with array with 1000 string fields: > > patched pl time: 6 sec > unpatched pl time: 170 sec > > This doesn't change my opinion on FOR-IN-ARRAY cycle (is still > important for readability) - just remove one critical performance > issue > > Regards > > Pavel Stehule [ Attachment, skipping... ] > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes: > What happened with this patch? Alvaro saw a 25x speedup. It got bounced. regards, tom lane
Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
From
Pavel Stehule
Date:
2011/3/11 Bruce Momjian <bruce@momjian.us>: > > What happened with this patch? Alvaro saw a 25x speedup. There is not conformance about form in this patch. But there are a FOREACH statement - so if somebody uses this statement, then he will not have a problems with performance. Regards Pavel > > --------------------------------------------------------------------------- > > Pavel Stehule wrote: >> Hello >> >> this patch remove a multiple detoasting of varlena values in plpgsql. >> >> It is usable mainly for iteration over longer array directly loaded >> from relation. >> >> It's doesn't have a impact on semantic or behave - it's just eliminate >> some performance trap. >> >> sample: table 10000 rows one column with array with 1000 string fields: >> >> patched pl time: 6 sec >> unpatched pl time: 170 sec >> >> This doesn't change my opinion on FOR-IN-ARRAY cycle (is still >> important for readability) - just remove one critical performance >> issue >> >> Regards >> >> Pavel Stehule > > [ Attachment, skipping... ] > >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + It's impossible for everything to be true. + >