Thread: [BUGS] PostgreSQL bug report - unexpected behavior of suppress_redundant_updates_trigger

Summary
The trigger suppress_redundant_updates_trigger does not suppress updates that assign an equal value to strings larger than 2000 octets (16000 bits)

Platform information
  • PostgreSQL version: 9.6.3, server 9.5.6
  • C library version: glibc 2.19 in the Docker container running the PostgreSQL server (glibc 2.25 on the host Linux)
  • uname -a => Linux adb 4.11.3-1-ARCH #1 SMP PREEMPT Sun May 28 10:40:17 CEST 2017 x86_64 GNU/Linux
The behavior was observed on various other systems as well.

Full steps to reproduce, as executed:
$ psql -h localhost -U postgres
psql (9.6.3, server 9.5.6)
Type "help" for help.
postgres=# CREATE DATABASE test_db;
CREATE DATABASE
postgres=# CREATE TABLE test_table (id int, field text);
CREATE TABLE
postgres=# INSERT INTO test_table VALUES (1, 'hi');
INSERT 0 1
postgres=# UPDATE test_table SET field = 'hi' WHERE id = 1;
UPDATE 1
postgres=# CREATE TRIGGER suppress_redundant_updates BEFORE UPDATE ON test_table FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();
CREATE TRIGGER
postgres=# UPDATE test_table SET field = 'hi' WHERE id = 1;
UPDATE 0
test_db=# UPDATE test_table SET field = rpad('', 2001, 'a') WHERE id = 1;
UPDATE 1
test_db=# UPDATE test_table SET field = rpad('', 2001, 'a') WHERE id = 1;
UPDATE 1 <--- BUG: expected 0, as we ran the same update twice

-- other tests:
test_db=# UPDATE test_table SET field = rpad('', 2000, 'a') WHERE id = 1;
UPDATE 1
test_db=# UPDATE test_table SET field = rpad('', 2000, 'a') WHERE id = 1;
UPDATE 0 <--- exactly 2000 octets, so OK

test_db=# UPDATE test_table SET field = rpad('', 1000, 'ó') || 'a' WHERE id = 1;
UPDATE 1
test_db=# UPDATE test_table SET field = rpad('', 1000, 'ó') || 'a' WHERE id = 1;
UPDATE 1 <--- BUG: 'ó' is 2 octets long, so the string is 2001 octets
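
To check how the values were actually stored, the logical size can be
compared with the on-disk size; a sketch, output omitted since the
exact numbers vary:

-- octet_length() reports the logical size; pg_column_size() reports the
-- stored (possibly compressed) size, so a smaller second number means
-- the value was compressed
SELECT octet_length(field), pg_column_size(field)
  FROM test_table WHERE id = 1;

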
On Mon, Jun 19, 2017 at 5:20 PM, Artus de benque
<artusdebenque@gmail.com> wrote:
> postgres=# UPDATE test_table SET field = 'hi' WHERE id = 1;
> UPDATE 0
> test_db=# UPDATE test_table SET field = rpad('', 2001, 'a') WHERE id = 1;
> UPDATE 1
> test_db=# UPDATE test_table SET field = rpad('', 2001, 'a') WHERE id = 1;
> UPDATE 1 <--- BUG: expected 0, as we ran the same update twice

Seems like in "suppress_redundant_updates_trigger" we are comparing
the toasted tuple with the new tuple, and that is the cause of the bug.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Dilip Kumar <dilipbalaut@gmail.com> writes:
> On Mon, Jun 19, 2017 at 5:20 PM, Artus de benque
> <artusdebenque@gmail.com> wrote:
>> postgres=# UPDATE test_table SET field = 'hi' WHERE id = 1;
>> UPDATE 0
>> test_db=# UPDATE test_table SET field = rpad('', 2001, 'a') WHERE id = 1;
>> UPDATE 1
>> test_db=# UPDATE test_table SET field = rpad('', 2001, 'a') WHERE id = 1;
>> UPDATE 1 <--- BUG: expected 0, as we ran the same update twice

> Seems like in "suppress_redundant_updates_trigger" we are comparing
> the toasted tuple with the new tuple, and that is the cause of the bug.

I don't think it's a bug, I think it's an intentional design tradeoff.
To suppress an update in this case, the trigger would have to grovel
through the individual fields and detoast them before comparing.
That would add a lot of cycles, and only seldom add successes.

Possibly we should adjust the documentation so that it doesn't imply
that this trigger guarantees to suppress every no-op update.
        regards, tom lane



On Mon, Jun 19, 2017 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Seems like in "suppress_redundant_updates_trigger" we are comparing
>> the toasted tuple with the new tuple, and that is the cause of the bug.
>
> I don't think it's a bug, I think it's an intentional design tradeoff.
> To suppress an update in this case, the trigger would have to grovel
> through the individual fields and detoast them before comparing.
> That would add a lot of cycles, and only seldom add successes.
>
> Possibly we should adjust the documentation so that it doesn't imply
> that this trigger guarantees to suppress every no-op update.

That doesn't sound like a very plausible argument to me.  I don't
think that a proposal to add a function named
sometimes_suppress_redundant_updates_trigger() would've attracted many
votes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jun 19, 2017 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't think it's a bug, I think it's an intentional design tradeoff.
>> To suppress an update in this case, the trigger would have to grovel
>> through the individual fields and detoast them before comparing.
>> That would add a lot of cycles, and only seldom add successes.
>> 
>> Possibly we should adjust the documentation so that it doesn't imply
>> that this trigger guarantees to suppress every no-op update.

> That doesn't sound like a very plausible argument to me.  I don't
> think that a proposal to add a function named
> sometimes_suppress_redundant_updates_trigger() would've attracted many
> votes.

You'd be wrong.  The entire point of this trigger is to save cycles,
so having it eat a lot of cycles only to fail is not an improvement.
        regards, tom lane



Hi,

It looks like you know what is happening, but I found that I had made an error in my original assumption (the steps to reproduce are still valid):

The size of the string at which the trigger does not work as expected varies, depending on the size of the other fields in the row. 

The 'limit size' is lower if I set bigger values in another text field of the same row (it seems the limit is reached when the text cells in the row add up to more than 2000 octets).
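
That observation would fit a threshold that applies to the row as a
whole rather than to each column separately; a rough probe of the
total row size, as a sketch (the reported number is approximate):

-- size of the whole row value, to compare against the ~2000 octet threshold
SELECT pg_column_size(t) FROM test_table t WHERE id = 1;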


Sorry if this is noise, and thank you for looking into the bug (or documentation error).

Regards,

Artus de Benque


Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Jun 19, 2017 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I don't think it's a bug, I think it's an intentional design tradeoff.
> >> To suppress an update in this case, the trigger would have to grovel
> >> through the individual fields and detoast them before comparing.
> >> That would add a lot of cycles, and only seldom add successes.
> >> 
> >> Possibly we should adjust the documentation so that it doesn't imply
> >> that this trigger guarantees to suppress every no-op update.
> 
> > That doesn't sound like a very plausible argument to me.  I don't
> > think that a proposal to add a function named
> > sometimes_suppress_redundant_updates_trigger() would've attracted many
> > votes.
> 
> You'd be wrong.  The entire point of this trigger is to save cycles,
> so having it eat a lot of cycles only to fail is not an improvement.

I suppose that either behavior may be desirable depending on
circumstances.  Maybe it is possible to have each installed trigger be
configurable so that it can select either behavior.  (Maybe use the
trigger argument as a column list, and for each column in the list, do a
full detoast and compare instead of relying on toast pointer equality).
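
If that existed, usage might look something like this (hypothetical
syntax; the current trigger does not accept a column list):

CREATE TRIGGER suppress_redundant_updates
BEFORE UPDATE ON test_table
FOR EACH ROW
-- hypothetical argument: detoast and compare 'field', bitwise-compare the rest
EXECUTE PROCEDURE suppress_redundant_updates_trigger('field');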

The current behavior seems more convenient in more cases, and so should
remain the default.

But this sounds like an additional feature, not a bug.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 06/19/2017 11:40 AM, Dilip Kumar wrote:
> ... Artus de benque ... wrote:
>> ...=# UPDATE test_table SET field = rpad('', 2001, 'a') WHERE id = 1;
>
> Seems like in "suppress_redundant_updates_trigger" we are comparing
> the toasted tuple with the new tuple, and that is the cause of the bug.

Something still puzzles me about this, though, maybe only because
I don't know enough about TOAST.

The size of 'field' ends up 2001, or just over the threshold where
TOASTing will be attempted at all. The report doesn't mention changing
the strategy from the default EXTENDED, so won't the first thing
attempted be compression? Won't that succeed spectacularly, since the
test string is a single character 2001 times, probably producing
a compressed string a handful of bytes long, well under the threshold,
obviating any need to go further with TOAST pointers?

Is the compression algorithm nondeterministic? Is there some way
that compressing the same 2001*'a' on two occasions would produce
compressed strings that don't match?

What exactly is s_r_u_t() comparing, in the case where the TOASTed
value has been compressed, but not out-of-lined?
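
A probe that might shed light, as a sketch (numbers will vary):
pg_column_size() reports the size of a datum as it currently exists,
so the transient and stored versions can be compared directly:

-- transient value: 4-byte varlena header plus 2001 data bytes, uncompressed
SELECT pg_column_size(rpad('', 2001, 'a'));
-- stored value: should be far smaller if it was compressed in line
SELECT pg_column_size(field) FROM test_table WHERE id = 1;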

-Chap



J Chapman Flack <jflack@math.purdue.edu> writes:
> On 06/19/2017 11:40 AM, Dilip Kumar wrote:
>> ... Artus de benque ... wrote:
>>> ...=# UPDATE test_table SET field = rpad('', 2001, 'a') WHERE id = 1;

>> Seems like in "suppress_redundant_updates_trigger" we are comparing
>> the toasted tuple with the new tuple, and that is the cause of the bug.

> Something still puzzles me about this, though, maybe only because
> I don't know enough about TOAST.

> The size of 'field' ends up 2001, or just over the threshold where
> TOASTing will be attempted at all. The report doesn't mention changing
> the strategy from the default EXTENDED, so won't the first thing
> attempted be compression? Won't that succeed spectacularly, since the
> test string is a single character 2001 times, probably producing
> a compressed string a handful of bytes long, well under the threshold,
> obviating any need to go further with TOAST pointers?

Right, given the facts at hand, the stored old tuple has probably
got a compressed-in-line version of "field".  However, the *new*
tuple is a transient tuple containing the uncompressed result of
rpad().  We don't bother to try to compress fields or shove them
out-of-line until after all the BEFORE ROW triggers are done ---
if we did, the effort might just be wasted, if the triggers change
those fields or cancel the update altogether.  So the trigger is
seeing a compressed vs. an uncompressed version of the field value,
and since it's just doing a dumb bitwise compare, they don't look
equal.
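
That difference can even be observed from SQL with a throwaway logging
trigger; a sketch with made-up names, assuming the test table from
upthread:

-- hypothetical helper, for illustration only
CREATE FUNCTION log_field_sizes() RETURNS trigger AS $$
BEGIN
  -- OLD.field comes from the heap (possibly compressed); NEW.field is the
  -- transient, uncompressed result of the UPDATE's expression
  RAISE NOTICE 'old: % octets, new: % octets',
    pg_column_size(OLD.field), pg_column_size(NEW.field);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_field_sizes BEFORE UPDATE ON test_table
FOR EACH ROW EXECUTE PROCEDURE log_field_sizes();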

As I mentioned upthread, it'd certainly be possible for the trigger
to iterate through the fields in a datatype-aware fashion and undo
compression or out-of-lineing before the comparison.  But that would
eat a lot more cycles than the current implementation, and it seems
dubious that it's worth it.  If the trigger is succeeding (ie,
detecting a no-op update) often enough that it would be worth that,
you've really got an application-stupidity problem to fix.

> Is the compression algorithm nondeterministic?

I don't think so.
        regards, tom lane



Tom Lane wrote:

> As I mentioned upthread, it'd certainly be possible for the trigger
> to iterate through the fields in a datatype-aware fashion and undo
> compression or out-of-lineing before the comparison.  But that would
> eat a lot more cycles than the current implementation, and it seems
> dubious that it's worth it.  If the trigger is succeeding (ie,
> detecting a no-op update) often enough that it would be worth that,
> you've really got an application-stupidity problem to fix.

ISTM the whole point of suppress_redundant_updates_trigger is to cope
with application stupidity.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> ... If the trigger is succeeding (ie,
>> detecting a no-op update) often enough that it would be worth that,
>> you've really got an application-stupidity problem to fix.

> ISTM the whole point of suppress_redundant_updates_trigger is to cope
> with application stupidity.

I think it's a suitable band-aid for limited amounts of stupidity.
But it eliminates only a small fraction of the total overhead involved
in a useless update command.  So I remain of the opinion that if that's
happening a lot, you're better off fixing the problem somewhere upstream.
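
One such upstream fix is to make the statement itself skip no-op rows,
so that neither the trigger nor the heap update runs at all; sketched
against the example table from this thread:

UPDATE test_table
   SET field = 'new value'
 WHERE id = 1
   -- a SQL-level comparison detoasts both sides, so equal values are
   -- skipped even when the stored copy is compressed
   AND field IS DISTINCT FROM 'new value';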
        regards, tom lane



On Mon, Jun 19, 2017 at 2:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> ... If the trigger is succeeding (ie,
>>> detecting a no-op update) often enough that it would be worth that,
>>> you've really got an application-stupidity problem to fix.
>
>> ISTM the whole point of suppress_redundant_updates_trigger is to cope
>> with application stupidity.
>
> I think it's a suitable band-aid for limited amounts of stupidity.
> But it eliminates only a small fraction of the total overhead involved
> in a useless update command.  So I remain of the opinion that if that's
> happening a lot, you're better off fixing the problem somewhere upstream.

At first glance I think I'd rather have it do the correct thing all of
the time, even if it takes longer, so that my only trade-off decision
is whether to improve performance by fixing the application.

Ideally, if the input tuple wouldn't require compression, we wouldn't
bother to decompress the stored tuple.

David J.



On 06/19/2017 05:19 PM, David G. Johnston wrote:

> At first glance I think I'd rather have it do the correct thing all of
> the time, even if it takes longer, so that my only trade-off decision
> is whether to improve performance by fixing the application.
> 
> Ideally if the input tuple wouldn't require compression we wouldn't
> bother to decompress the stored tuple.

That looks like one reasonable elimination check.

I wonder how much closer it could get with some changes that wouldn't
necessarily use many more cycles.

One might be a 'less easy' queue: march through the tuple
comparing fields, and if one is found to be TOASTed, throw it
on the queue and march on. Only if all the easy ones match
is there any point in looking at the queue.

At that point, there could be a tunable for how much effort
to expend. Perhaps I'm willing to decompress an inline value,
but not retrieve an out-of-line one? For the TOAST compression
algorithm I'm not sure of the balance between compression
and decompression effort; I know gzip decompression is pretty cheap.

-Chap