Thread: BUG #8163: simultaneous nearly identical update queries execute extremely slowly

BUG #8163: simultaneous nearly identical update queries execute extremely slowly

From
tcook@blackducksoftware.com
Date:
The following bug has been logged on the website:

Bug reference:      8163
Logged by:          Todd Cook
Email address:      tcook@blackducksoftware.com
PostgreSQL version: 8.4.16
Operating system:   Fedora 14
Description:        =


When nearly identical update queries arrive simultaneously, the first one to
execute runs normally, but subsequent executions run _extremely_ slowly. =

We've seen this behaviour in production, and the contrived test case below
reproduces the issue.

test=3D> select version() ;
                                                      version               =

                                       =

---------------------------------------------------------------------------=
-----------------------------------------
 PostgreSQL 8.4.16 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.4.4 20100726 (Red Hat 4.4.4-13), 64-bit


To set up:

create table prof as select i as id, i::text col1, (i*2)::text col2 ,
(i*3)::text col3, i*2 col4, md5((i % 5999)::text) as hash, (i % 6000)::text
as hint, (i*4)::text col6, i*5 col7, i*6 col8 from
generate_series(1,36000000) i ;
create table tree as select 'fixed16charstrng'::text as changeme, md5((i %
200000)::text) as hash from generate_series(1,400000) i ;
create index tree_hash_idx on tree(hash) ;

The problematic query run in isolation:

explain analyze update tree set changeme =3D 'changed' where hash in (select
hash from prof where hint =3D '2500') ;
                                                          QUERY PLAN        =

                                                 =

---------------------------------------------------------------------------=
---------------------------------------------------
 Nested Loop  (cost=3D1000098.75..1000104.44 rows=3D11583 width=3D39) (actu=
al
time=3D6765.316..6871.167 rows=3D11998 loops=3D1)
   ->  HashAggregate  (cost=3D1000098.75..1000098.76 rows=3D1 width=3D33) (=
actual
time=3D6765.264..6768.259 rows=3D5999 loops=3D1)
         ->  Seq Scan on prof  (cost=3D0.00..1000084.15 rows=3D5840 width=
=3D33)
(actual time=3D1.351..6755.691 rows=3D6000 loops=3D1)
               Filter: (hint =3D '2500'::text)
   ->  Index Scan using tree_hash_idx on tree  (cost=3D0.00..5.65 rows=3D2
width=3D39) (actual time=3D0.014..0.016 rows=3D2 loops=3D5999)
         Index Cond: (tree.hash =3D prof.hash)
 Total runtime: 7132.700 ms
(7 rows)

To exercise the problem (assuming a database named "test"):
psql -c "update tree set changeme =3D 'changed' where hash in (select hash
from prof where hint =3D '2500')" test &
psql -c "update tree set changeme =3D 'changed' where hash in (select hash
from prof where hint =3D '2501')" test &
psql -c "update tree set changeme =3D 'changed' where hash in (select hash
from prof where hint =3D '2502')" test &
psql -c "update tree set changeme =3D 'changed' where hash in (select hash
from prof where hint =3D '2503')" test &
psql -c "update tree set changeme =3D 'changed' where hash in (select hash
from prof where hint =3D '2504')" test &
psql -c "update tree set changeme =3D 'changed' where hash in (select hash
from prof where hint =3D '2505')" test &
psql -c "update tree set changeme =3D 'changed' where hash in (select hash
from prof where hint =3D '2506')" test &

One of the update begins executing immediately, while the others block
waiting on the first (which is expected). The first update finished in under
10 seconds, and another one started executing; however, this second one has
now been executing for 2 hours.

strace output from that backend is almost exclusively reads, with only a few
calls to lseek.  Attaching with gdb and interrupting a few times mostly gave
this backtrace:

#0  0x0000003b812d3490 in __read_nocancel () from /lib64/libc.so.6
#1  0x00000000005cd0cd in FileRead ()
#2  0x00000000005dc55d in mdread ()
#3  0x00000000005ca315 in ReadBuffer_common ()
#4  0x00000000005cac7f in ReadBufferExtended ()
#5  0x0000000000460c8b in heapgetpage ()
#6  0x000000000046110a in heapgettup_pagemode ()
#7  0x0000000000461b56 in heap_getnext ()
#8  0x000000000054ef18 in SeqNext ()
#9  0x00000000005429ba in ExecScan ()
#10 0x000000000053b8a8 in ExecProcNode ()
#11 0x0000000000547ac8 in ExecAgg ()
#12 0x000000000053b7b8 in ExecProcNode ()
#13 0x000000000054e031 in ExecNestLoop ()
#14 0x000000000053b818 in ExecProcNode ()
#15 0x000000000053827e in EvalPlanQualNext ()
#16 0x000000000053867b in EvalPlanQual ()
#17 0x0000000000539afd in standard_ExecutorRun ()
#18 0x00007f796347881b in pgss_ExecutorRun (queryDesc=3D0x1af53b0,
direction=3DForwardScanDirection, count=3D0) at pg_stat_statements.c:516
#19 0x00000000005e3ad1 in ProcessQuery ()
#20 0x00000000005e3cd4 in PortalRunMulti ()
#21 0x00000000005e4452 in PortalRun ()
#22 0x00000000005dfac9 in exec_simple_query ()
#23 0x00000000005e10f3 in PostgresMain ()
#24 0x00000000005b4e73 in ServerLoop ()
#25 0x00000000005b729e in PostmasterMain ()
#26 0x0000000000562578 in main ()
On 05/15/13 13:27, tcook@blackducksoftware.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      8163
> Logged by:          Todd Cook
> Email address:      tcook@blackducksoftware.com
> PostgreSQL version: 8.4.16
> Operating system:   Fedora 14
> Description:
>
> When nearly identical update queries arrive simultaneously, the first one to
> execute runs normally, but subsequent executions run _extremely_ slowly.
> We've seen this behaviour in production, and the contrived test case below
> reproduces the issue.

I've repeated the test below on a 9.1.9 installation, and it works fine there.
Each update finished in about 7 seconds.

-- todd

>
> test=> select version() ;
>                                                        version
>
> --------------------------------------------------------------------------------------------------------------------
>   PostgreSQL 8.4.16 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
> 4.4.4 20100726 (Red Hat 4.4.4-13), 64-bit
>
>
> To set up:
>
> create table prof as select i as id, i::text col1, (i*2)::text col2 ,
> (i*3)::text col3, i*2 col4, md5((i % 5999)::text) as hash, (i % 6000)::text
> as hint, (i*4)::text col6, i*5 col7, i*6 col8 from
> generate_series(1,36000000) i ;
> create table tree as select 'fixed16charstrng'::text as changeme, md5((i %
> 200000)::text) as hash from generate_series(1,400000) i ;
> create index tree_hash_idx on tree(hash) ;
>
> The problematic query run in isolation:
>
> explain analyze update tree set changeme = 'changed' where hash in (select
> hash from prof where hint = '2500') ;
>                                                            QUERY PLAN
>
>
------------------------------------------------------------------------------------------------------------------------------
>   Nested Loop  (cost=1000098.75..1000104.44 rows=11583 width=39) (actual
> time=6765.316..6871.167 rows=11998 loops=1)
>     ->  HashAggregate  (cost=1000098.75..1000098.76 rows=1 width=33) (actual
> time=6765.264..6768.259 rows=5999 loops=1)
>           ->  Seq Scan on prof  (cost=0.00..1000084.15 rows=5840 width=33)
> (actual time=1.351..6755.691 rows=6000 loops=1)
>                 Filter: (hint = '2500'::text)
>     ->  Index Scan using tree_hash_idx on tree  (cost=0.00..5.65 rows=2
> width=39) (actual time=0.014..0.016 rows=2 loops=5999)
>           Index Cond: (tree.hash = prof.hash)
>   Total runtime: 7132.700 ms
> (7 rows)
>
> To exercise the problem (assuming a database named "test"):
> psql -c "update tree set changeme = 'changed' where hash in (select hash
> from prof where hint = '2500')" test &
> psql -c "update tree set changeme = 'changed' where hash in (select hash
> from prof where hint = '2501')" test &
> psql -c "update tree set changeme = 'changed' where hash in (select hash
> from prof where hint = '2502')" test &
> psql -c "update tree set changeme = 'changed' where hash in (select hash
> from prof where hint = '2503')" test &
> psql -c "update tree set changeme = 'changed' where hash in (select hash
> from prof where hint = '2504')" test &
> psql -c "update tree set changeme = 'changed' where hash in (select hash
> from prof where hint = '2505')" test &
> psql -c "update tree set changeme = 'changed' where hash in (select hash
> from prof where hint = '2506')" test &
>
> One of the update begins executing immediately, while the others block
> waiting on the first (which is expected). The first update finished in under
> 10 seconds, and another one started executing; however, this second one has
> now been executing for 2 hours.
>
> strace output from that backend is almost exclusively reads, with only a few
> calls to lseek.  Attaching with gdb and interrupting a few times mostly gave
> this backtrace:
>
> #0  0x0000003b812d3490 in __read_nocancel () from /lib64/libc.so.6
> #1  0x00000000005cd0cd in FileRead ()
> #2  0x00000000005dc55d in mdread ()
> #3  0x00000000005ca315 in ReadBuffer_common ()
> #4  0x00000000005cac7f in ReadBufferExtended ()
> #5  0x0000000000460c8b in heapgetpage ()
> #6  0x000000000046110a in heapgettup_pagemode ()
> #7  0x0000000000461b56 in heap_getnext ()
> #8  0x000000000054ef18 in SeqNext ()
> #9  0x00000000005429ba in ExecScan ()
> #10 0x000000000053b8a8 in ExecProcNode ()
> #11 0x0000000000547ac8 in ExecAgg ()
> #12 0x000000000053b7b8 in ExecProcNode ()
> #13 0x000000000054e031 in ExecNestLoop ()
> #14 0x000000000053b818 in ExecProcNode ()
> #15 0x000000000053827e in EvalPlanQualNext ()
> #16 0x000000000053867b in EvalPlanQual ()
> #17 0x0000000000539afd in standard_ExecutorRun ()
> #18 0x00007f796347881b in pgss_ExecutorRun (queryDesc=0x1af53b0,
> direction=ForwardScanDirection, count=0) at pg_stat_statements.c:516
> #19 0x00000000005e3ad1 in ProcessQuery ()
> #20 0x00000000005e3cd4 in PortalRunMulti ()
> #21 0x00000000005e4452 in PortalRun ()
> #22 0x00000000005dfac9 in exec_simple_query ()
> #23 0x00000000005e10f3 in PostgresMain ()
> #24 0x00000000005b4e73 in ServerLoop ()
> #25 0x00000000005b729e in PostmasterMain ()
> #26 0x0000000000562578 in main ()
>
>
>
"Todd A. Cook" <tcook@blackducksoftware.com> writes:
> On 05/15/13 13:27, tcook@blackducksoftware.com wrote:
>> When nearly identical update queries arrive simultaneously, the first one to
>> execute runs normally, but subsequent executions run _extremely_ slowly.
>> We've seen this behaviour in production, and the contrived test case below
>> reproduces the issue.

> I've repeated the test below on a 9.1.9 installation, and it works fine there.

Given the reference to EvalPlanQual in your stack trace, I'm thinking
the explanation is this 9.0 fix:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_0_BR [9f2ee8f28] 2009-10-26 02:26:45 +0000

    Re-implement EvalPlanQual processing to improve its performance and eliminate
    a lot of strange behaviors that occurred in join cases.  We now identify the
    "current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
    UPDATE/SHARE queries.  If an EvalPlanQual recheck is necessary, we jam the
    appropriate row into each scan node in the rechecking plan, forcing it to emit
    only that one row.  The former behavior could rescan the whole of each joined
    relation for each recheck, which was terrible for performance, and what's much
    worse could result in duplicated output tuples.

    Also, the original implementation of EvalPlanQual could not re-use the recheck
    execution tree --- it had to go through a full executor init and shutdown for
    every row to be tested.  To avoid this overhead, I've associated a special
    runtime Param with each LockRows or ModifyTable plan node, and arranged to
    make every scan node below such a node depend on that Param.  Thus, by
    signaling a change in that Param, the EPQ machinery can just rescan the
    already-built test plan.

    This patch also adds a prohibition on set-returning functions in the
    targetlist of SELECT FOR UPDATE/SHARE.  This is needed to avoid the
    duplicate-output-tuple problem.  It seems fairly reasonable since the
    other restrictions on SELECT FOR UPDATE are meant to ensure that there
    is a unique correspondence between source tuples and result tuples,
    which an output SRF destroys as much as anything else does.

            regards, tom lane
On 05/15/13 16:10, Tom Lane wrote:
> "Todd A. Cook" <tcook@blackducksoftware.com> writes:
>> On 05/15/13 13:27, tcook@blackducksoftware.com wrote:
>>> When nearly identical update queries arrive simultaneously, the first one to
>>> execute runs normally, but subsequent executions run _extremely_ slowly.
>>> We've seen this behaviour in production, and the contrived test case below
>>> reproduces the issue.
>
>> I've repeated the test below on a 9.1.9 installation, and it works fine there.
>
> Given the reference to EvalPlanQual in your stack trace, I'm thinking
> the explanation is this 9.0 fix:

Thanks for the explanation.  Is there any chance of that fix being backpatched
into 8.4?

-- todd


> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Branch: master Release: REL9_0_BR [9f2ee8f28] 2009-10-26 02:26:45 +0000
>
>      Re-implement EvalPlanQual processing to improve its performance and eliminate
>      a lot of strange behaviors that occurred in join cases.  We now identify the
>      "current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
>      UPDATE/SHARE queries.  If an EvalPlanQual recheck is necessary, we jam the
>      appropriate row into each scan node in the rechecking plan, forcing it to emit
>      only that one row.  The former behavior could rescan the whole of each joined
>      relation for each recheck, which was terrible for performance, and what's much
>      worse could result in duplicated output tuples.
>
>      Also, the original implementation of EvalPlanQual could not re-use the recheck
>      execution tree --- it had to go through a full executor init and shutdown for
>      every row to be tested.  To avoid this overhead, I've associated a special
>      runtime Param with each LockRows or ModifyTable plan node, and arranged to
>      make every scan node below such a node depend on that Param.  Thus, by
>      signaling a change in that Param, the EPQ machinery can just rescan the
>      already-built test plan.
>
>      This patch also adds a prohibition on set-returning functions in the
>      targetlist of SELECT FOR UPDATE/SHARE.  This is needed to avoid the
>      duplicate-output-tuple problem.  It seems fairly reasonable since the
>      other restrictions on SELECT FOR UPDATE are meant to ensure that there
>      is a unique correspondence between source tuples and result tuples,
>      which an output SRF destroys as much as anything else does.
>
>             regards, tom lane
>
>
"Todd A. Cook" <tcook@blackducksoftware.com> writes:
> On 05/15/13 16:10, Tom Lane wrote:
>> Given the reference to EvalPlanQual in your stack trace, I'm thinking
>> the explanation is this 9.0 fix:

> Thanks for the explanation.  Is there any chance of that fix being backpatched
> into 8.4?

None whatsoever.  Aside from the mentioned change in semantics, there
were numerous followup fixes, which would be difficult to disentangle
from everything else that changed in 9.0.  We would not risk
destabilizing 8.4 like that.

8.4 is barely more a year from EOL anyway, so you really ought to be
thinking about an upgrade.

            regards, tom lane