Thread: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

From
cbw
Date:
I have a DO block that has a couple of inserts (copying large amounts
of data from staging tables) that I am passing to the server using
JDBC,

When I execute the inserts independently, they work fine. But when I
submit them as part of the do block, the backend goes into
trigger.c:afterTriggerInvokeEvents and never returns. This happens
after the second insert starts running.

I have tried this in version 11.7 and 12.2 (on both Linux and Windows).

I grabbed the source from git and did a Linux build (REL_12_STABLE)
and can see that the method in question just keeps looping through
'chunks' and 'events'. I disabled the user triggers on the table in
question and the 'events' seem to be various foreign key references
and other constraints.

I bumped the log level up to debug5 and can see rows in the first
table getting inserted. Sometimes I see a few rows for the second
table and sometimes I don't. After that there are no additional log
entries from the backend process. I have let the process run overnight
(the inserts take about 12 minutes to complete when run
independently).

Any tips about how to go about debugging this would be appreciated. I
am struggling to see exactly what the events are so maybe so
suggestions on the best place to add some logging?

Here is the DO block:

DO
$$
    BEGIN
        PERFORM XeP_set_identifier('xi_batch_user');

        alter table xe_patient_visit disable trigger USER;
        alter table xe_auth disable trigger USER;

        RAISE NOTICE '% : inserting visits...', clock_timestamp();

        INSERT INTO
            xe_patient_visit
        (
            createtstamp,
            creationuser,
            modifiedtstamp,
            modifieduser,
            active,
            visitid,
            sourcesystem,
            status,
            visittypeid,
            ipid,
            accountid,
            ivid
        )
        SELECT (now() AT TIME ZONE 'utc') AS createtstamp,
               'xi_batch_user' AS creationuser,
               (now() AT TIME ZONE 'utc') AS modifiedtstamp,
               'xi_batch_user' AS modifieduser,
               'y' AS active,
               authorizationid AS visitid,
               'staging' AS sourcesystem,
               a.status,
               'AUTH' AS visittypeid,
               p.ipid,
               e.accountid,
               nextval('seq_xe_patient_visit') AS ivid
        FROM (SELECT authorizationid,
                     memberid,
                     CASE
                         WHEN authorizationstatus = 'Fully Approved'
THEN 'AUTH_APPROVED'
                         WHEN authorizationstatus = 'Partially
Approved' THEN 'AUTH_REDUCED'
                         WHEN authorizationstatus = 'Voided' THEN
'AUTH_COMPLETED'
                         WHEN authorizationstatus = 'Incomplete' THEN
'AUTH_PEND'
                         WHEN authorizationstatus = 'Pending Decision'
THEN 'AUTH_PEND'
                         WHEN authorizationstatus = 'Denied' THEN
'AUTH_DENIED' END

                        AS status,
                     row_number() OVER (PARTITION BY authorizationid
ORDER BY authorizationid) AS rownum
              FROM staging."authorization") a
                 JOIN xe_patient p ON p.patientid = a.memberid
                 JOIN xe_enterprise_data e ON e.accountid =
p.accountid AND e.enterpriseid = 'staging'
        WHERE rownum = 1
        ON CONFLICT (accountid, visitid)
            DO NOTHING;


        RAISE NOTICE '% : inserting auths...', clock_timestamp();
        INSERT INTO
            xe_auth
        (
            createtstamp,
            creationuser,
            modifiedtstamp,
            modifieduser,
            active,
            accountid,
            receiveddate,
            authnum,
            authtypeid,
            authsubtypeid,
            umurgencyid,
            ivid
        )
        SELECT (now() AT TIME ZONE 'utc') AS createtstamp,
               'xi_batch_user' AS creationuser,
               (now() AT TIME ZONE 'utc') AS modifiedtstamp,
               'xi_batch_user' AS modifieduser,
               'y' AS active,
               ed.accountid,
               receiveddate,
               authnum,
               a.authtypeid,
               at.authtypeid,
               umurgencyid,
               ivid
        FROM (
                 SELECT cast(receiveddate AS timestamp) AS receiveddate,
                        authorizationid AS authnum,
                        CASE
                            WHEN authorizationcategory = 'Inpatient'
                                THEN 'AUTH_IPA'
                            ELSE 'AUTH_SVC' END AS authtypeid,
                        authorizationtype,
                        CASE
                            WHEN authorizationurgency = 'ROUTINE' THEN 'STD'
                            WHEN authorizationurgency = 'EXPEDITED' THEN 'EXP'
                            END
                                                        AS umurgencyid
                 FROM staging."authorization"
             ) a
                 JOIN xe_patient_visit v ON v.visitid = a.authnum
                 JOIN xe_enterprise_data ed ON ed.accountid =
v.accountid AND ed.enterpriseid = 'staging'
                 JOIN xe_auth_type at ON at.name = a.authorizationtype
        ON CONFLICT (authnum, accountid)
            DO NOTHING;

    END
$$



Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

From
Tom Lane
Date:
cbw <cbwhitebu@gmail.com> writes:
> I have a DO block that has a couple of inserts (copying large amounts
> of data from staging tables) that I am passing to the server using
> JDBC,
> When I execute the inserts independently, they work fine. But when I
> submit them as part of the do block, the backend goes into
> trigger.c:afterTriggerInvokeEvents and never returns. This happens
> after the second insert starts running.

Have you got deferred uniqueness or exclusion constraints on the
target table?

If so, perhaps a plausible theory is that when you submit the queries
separately, the unique_key_recheck trigger is never fired at all --- but
if they are in the same transaction, then recheck events get queued
because the index can't tell whether the earlier row should be treated
as committed.  This requires some assumptions about the table schema
(which you haven't shown us) but it's a bit hard to see why the second
query would act differently in the two contexts otherwise.

A variant of that theory is that foreign key trigger firings are being
skipped in one case but not the other; but offhand I think those
optimizations only apply to update/delete cases not inserts.  Anyway
that still requires some assumptions about moving parts that you
haven't shown us.

The short answer very likely is going to be that you need to perform
the queries as separate transactions, or the second one drowns in
trigger overhead.  Tracing down exactly why might not be worth a
lot of trouble.

            regards, tom lane



Tom,

Thanks for that feedback.

I have tried adding a commit after the first insert but the problem
still persists so I don't think this problem is related to transaction
boundaries.

I am seeing this same problem with other table inserts as well.

My current hypothesis is that this bug happens when I am inserting
data that references rows in another table that I just inserted in the
same DO block. I don't know yet whether it only happens with the large
data sets I am processing.

I have also tried creating a procedure for the same code instead of
the DO block and the problem happens there as well.

This is a huge schema but I have included the definitions of the two
table in my examples. As you can see the second table references the
primary key of the first table (see constraint xe_auth_f5). There are
no deferred constraints.

--- first table-----
create table xe_patient_visit
(
    accountid numeric(10) not null,
    createtstamp timestamp not null,
    creationuser varchar(60) not null,
    modifiedtstamp timestamp not null,
    modifieduser varchar(60) not null,
    active varchar(1) default 'y'::character varying not null
        constraint xe_patient_visit_active_check
            check ((active)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    iscmgmtpatient varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_iscmgmtpatient_check
            check ((iscmgmtpatient)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    isinerror varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_isinerror_check
            check ((isinerror)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    ismlc varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_ismlc_check
            check ((ismlc)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    isreadmit varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_isreadmit_check
            check ((isreadmit)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text, ('u'::character
varying)::text])),
    isvisitlocked varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_isvisitlocked_check
            check ((isvisitlocked)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    iswalkin varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_iswalkin_check
            check ((iswalkin)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    needsbed varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_needsbed_check
            check ((needsbed)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    accidentdatetime timestamp,
    accidentdatetimesrv timestamp,
    authorizedadmitdate date,
    clearancedatetime timestamp,
    clearancedatetimesrv timestamp,
    closedate timestamp,
    closedatesrv timestamp,
    enddate timestamp,
    enddatesrv timestamp,
    expectedenddate timestamp,
    expectedenddatesrv timestamp,
    expireattemptdatetime timestamp,
    mergeverifieddate timestamp,
    prioradmissiondate date,
    referringletterdate date,
    requestedadmitdate date,
    startdate timestamp,
    startdatesrv timestamp,
    accidentdatetimeoff varchar(5),
    accidentstate varchar(100),
    accidenttypeid varchar(32)
        constraint xe_patient_visit_f0
            references xg_svc_code_catalog,
    admitcategoryid varchar(16)
        constraint xe_patient_visit_f1
            references xe_admit_cat,
    bedclassid varchar(32),
    bedtext varchar(100),
    clearancedatetimeoff varchar(5),
    closecomment varchar(200),
    closedateoff varchar(5),
    closeusername varchar(60),
    comments varchar(200),
    dischargeinstruction varchar(2000),
    dischargelocation varchar(100),
    displayroombed varchar(50),
    enddateoff varchar(5),
    errordescription varchar(1024),
    expectedenddateoff varchar(5),
    expectedlengthofstay varchar(16),
    expectedlengthofstayunits varchar(64)
        constraint xe_patient_visit_f10
            references xe_unit,
    mlcdescription varchar(500),
    mergecomments varchar(200),
    mergeverifiedby varchar(60),
    patientstatus varchar(16),
    performlocation varchar(100),
    priordiagnosisdesc varchar(200),
    reasonfordischarge varchar(100),
    reasonforvisit varchar(256),
    referralsourceid varchar(20),
    referraltype varchar(20),
    referringhospitalname varchar(200),
    referringhospitalpatientid varchar(100),
    referringlettercontext varchar(200),
    referringletterid varchar(200),
    referringphysicianname varchar(200),
    referringphysicianphoneno varchar(50),
    sourcesystem varchar(100) default 'SYNERGY'::character varying not null,
    startdateoff varchar(5),
    status varchar(16)
        constraint xe_patient_visit_f24
            references xe_visit_status,
    statusreasonid varchar(16)
        constraint xe_patient_visit_f25
            references xe_visit_status,
    statusreasontext varchar(200),
    transferstatus varchar(32),
    visitid varchar(30) not null,
    visittypeid varchar(16) not null
        constraint xe_patient_visit_f27
            references xe_visit_type,
    bedid numeric(18),
    clinicid numeric(18),
    dischargedisposition numeric(18)
        constraint xe_patient_visit_f5
            references xg_clinical_ref_data,
    dischargeorderowner numeric(18)
        constraint xe_patient_visit_f6
            references xe_resource,
    dischargereasonid numeric(18),
    dischargetounitid numeric(18),
    dischargetype numeric(18)
        constraint xe_patient_visit_f9
            references xg_clinical_ref_data,
    facilityid numeric(18),
    hcu numeric(18),
    ipid numeric(18) not null,
    incomingconditionid numeric(18)
        constraint xe_patient_visit_f14
            references xg_clinical_ref_data,
    locationid numeric(18)
        constraint xe_patient_visit_f15
            references xe_location,
    mergesourceipid numeric(18),
    orgunitid numeric(18),
    parentid numeric(18),
    patientmrdid numeric(18),
    patienttypeid numeric(18),
    previousivid numeric(18),
    readmitdays numeric(3),
    readmitivid numeric(18),
    restrictvalue numeric(5) default '1'::numeric not null,
    servicebookingid numeric(18),
    servicecenterid numeric(18),
    serviceid numeric(18)
        constraint xe_patient_visit_f22
            references xg_svc_base,
    sourceofadmitid numeric(18)
        constraint xe_patient_visit_f23
            references xg_pos,
    typeofcareid numeric(18)
        constraint xe_patient_visit_f26
            references xg_clinical_ref_data,
    workid numeric(18),
    ivid numeric(18) not null,
    constraint pk_xe_patient_visit
        primary key (ivid, accountid),
    constraint xe_patient_visit_eu
        unique (visitid, accountid),
    constraint xe_patient_visit_f11
        foreign key (facilityid, accountid) references xe_facility,
    constraint xe_patient_visit_f12
        foreign key (hcu, accountid) references xe_org_unit,
    constraint xe_patient_visit_f13
        foreign key (ipid, accountid) references xe_patient,
    constraint xe_patient_visit_f16
        foreign key (mergesourceipid, accountid) references xe_patient,
    constraint xe_patient_visit_f17
        foreign key (orgunitid, accountid) references xe_org_unit,
    constraint xe_patient_visit_f18
        foreign key (patientmrdid, accountid) references xe_patient_mrd,
    constraint xe_patient_visit_f19
        foreign key (patienttypeid, accountid) references xe_patient_type,
    constraint xe_patient_visit_f2
        foreign key (bedclassid, accountid) references xe_bed_class,
    constraint xe_patient_visit_f20
        foreign key (servicebookingid, accountid) references xe_srvc_booking,
    constraint xe_patient_visit_f21
        foreign key (servicecenterid, accountid) references xe_org_unit,
    constraint xe_patient_visit_f28
        foreign key (workid, accountid) references xe_workflow_inst,
    constraint xe_patient_visit_f3
        foreign key (bedid, accountid) references xe_bed,
    constraint xe_patient_visit_f4
        foreign key (clinicid, accountid) references xe_org_unit,
    constraint xe_patient_visit_f7
        foreign key (dischargereasonid, accountid) references
xe_discharge_reason,
    constraint xe_patient_visit_f8
        foreign key (dischargetounitid, accountid) references xe_org_unit
);

alter table xe_patient_visit owner to xowner;

create index xe_patient_visit_12
    on xe_patient_visit (startdate, accountid, status, visittypeid);

create index xe_patient_visit_13
    on xe_patient_visit (accountid, status, visittypeid, ivid, enddate);

create index xe_patient_visit_14
    on xe_patient_visit (enddate, hcu, accountid);

create index xe_patient_visit_15
    on xe_patient_visit (servicebookingid, accountid);

create index xe_patient_visit_16
    on xe_patient_visit (parentid, accountid, ivid);

create index xe_patient_visit_i1
    on xe_patient_visit (ipid, accountid);

create trigger xe_patient_visit_ht
    after insert or update or delete
    on xe_patient_visit
    for each row
execute procedure xep_xe_patient_visit();

create trigger xe_patient_visit_xe_auth_te
    after insert or update
    on xe_patient_visit
    for each row
execute procedure xep_xe_patient_visit_xe_auth_tef();

create trigger xe_patient_visit_xe_patient_issue_te
    after insert or update
    on xe_patient_visit
    for each row
execute procedure xep_xe_patient_visit_xe_patient_issue_tef();

create trigger xe_patient_visit_xe_visit_case_mgmt_pgm_te
    after insert or update
    on xe_patient_visit
    for each row
execute procedure xep_xe_patient_visit_xe_visit_case_mgmt_pgm_tef();

create trigger xe_patient_visit_xe_visit_case_mgmt_te
    after insert or update
    on xe_patient_visit
    for each row
execute procedure xep_xe_patient_visit_xe_visit_case_mgmt_tef();

--- second table ---
create table xe_auth
(
    accountid numeric(10) not null,
    createtstamp timestamp not null,
    creationuser varchar(60) not null,
    modifiedtstamp timestamp not null,
    modifieduser varchar(60) not null,
    active varchar(1) default 'y'::character varying not null
        constraint xe_auth_active_check
            check ((active)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    additionalcomments varchar(4000),
    isappeal varchar(1) default 'n'::character varying not null
        constraint xe_auth_isappeal_check
            check ((isappeal)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    isextended varchar(1) default 'n'::character varying not null
        constraint xe_auth_isextended_check
            check ((isextended)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    ispatientreq varchar(1) default 'n'::character varying not null
        constraint xe_auth_ispatientreq_check
            check ((ispatientreq)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    ispreauth varchar(1) default 'n'::character varying not null
        constraint xe_auth_ispreauth_check
            check ((ispreauth)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    isreopen varchar(1) default 'n'::character varying not null
        constraint xe_auth_isreopen_check
            check ((isreopen)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    isselfreferral varchar(1) default 'n'::character varying not null
        constraint xe_auth_isselfreferral_check
            check ((isselfreferral)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    medicalnecessity varchar(4000),
    needsccr varchar(1) default 'y'::character varying not null
        constraint xe_auth_needsccr_check
            check ((needsccr)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    approveddate timestamp,
    cancelleddate timestamp,
    compliancedate timestamp,
    denieddate timestamp,
    duedate timestamp,
    expectedbydate timestamp,
    expirationdate timestamp,
    nextreviewdate timestamp,
    outboundqueuedtime timestamp,
    receiveddate timestamp,
    authnum varchar(9) not null,
    authsubtypeid varchar(30)
        constraint xe_auth_f0
            references xe_auth_type,
    authtypeid varchar(30) not null
        constraint xe_auth_f1
            references xe_auth_type,
    denialcode varchar(60),
    deniedbyusername varchar(60),
    guidelines varchar(200),
    pricingnote varchar(100),
    primaryusername varchar(60),
    umurgencyid varchar(32)
        constraint xe_auth_f13
            references xg_um_urgency,
    allowedlos numeric(3),
    authorizedlos numeric(3),
    denialreasonid numeric(18)
        constraint xe_auth_f2
            references xg_clinical_ref_data,
    expediterequestortypeid numeric(18)
        constraint xe_auth_f4
            references xg_clinical_ref_data,
    owner numeric(18),
    requestedlos numeric(3),
    requestedposid numeric(18)
        constraint xe_auth_f8
            references xg_pos,
    restrictvalue numeric(5) default '1'::numeric not null,
    servicetypeid numeric(18)
        constraint xe_auth_f9
            references xg_clinical_ref_data,
    sourceofadmitid numeric(18)
        constraint xe_auth_f10
            references xg_pos,
    treatmentcategoryid numeric(18)
        constraint xe_auth_f11
            references xg_clinical_ref_data,
    typeofcareid numeric(18)
        constraint xe_auth_f12
            references xg_clinical_ref_data,
    ivid numeric(18) not null,
    constraint pk_xe_auth
        primary key (ivid, accountid),
    constraint xe_auth_eu
        unique (authnum, accountid),
    constraint xe_auth_f3
        foreign key (deniedbyusername, accountid) references xe_user_data,
    constraint xe_auth_f5
        foreign key (ivid, accountid) references xe_patient_visit,
    constraint xe_auth_f6
        foreign key (owner, accountid) references xe_staff,
    constraint xe_auth_f7
        foreign key (primaryusername, accountid) references xe_user_data
);

alter table xe_auth owner to xowner;

create trigger xe_auth_ht
    after insert or update or delete
    on xe_auth
    for each row
execute procedure xep_xe_auth();

create trigger xe_auth_xe_patient_visit_te
    after insert or update
    on xe_auth
    for each row
execute procedure xep_xe_auth_xe_patient_visit_tef();

On Tue, Apr 21, 2020 at 7:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> cbw <cbwhitebu@gmail.com> writes:
> > I have a DO block that has a couple of inserts (copying large amounts
> > of data from staging tables) that I am passing to the server using
> > JDBC,
> > When I execute the inserts independently, they work fine. But when I
> > submit them as part of the do block, the backend goes into
> > trigger.c:afterTriggerInvokeEvents and never returns. This happens
> > after the second insert starts running.
>
> Have you got deferred uniqueness or exclusion constraints on the
> target table?
>
> If so, perhaps a plausible theory is that when you submit the queries
> separately, the unique_key_recheck trigger is never fired at all --- but
> if they are in the same transaction, then recheck events get queued
> because the index can't tell whether the earlier row should be treated
> as committed.  This requires some assumptions about the table schema
> (which you haven't shown us) but it's a bit hard to see why the second
> query would act differently in the two contexts otherwise.
>
> A variant of that theory is that foreign key trigger firings are being
> skipped in one case but not the other; but offhand I think those
> optimizations only apply to update/delete cases not inserts.  Anyway
> that still requires some assumptions about moving parts that you
> haven't shown us.
>
> The short answer very likely is going to be that you need to perform
> the queries as separate transactions, or the second one drowns in
> trigger overhead.  Tracing down exactly why might not be worth a
> lot of trouble.
>
>                         regards, tom lane



Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

From
Alvaro Herrera
Date:
On 2020-Apr-21, Tom Lane wrote:

> A variant of that theory is that foreign key trigger firings are being
> skipped in one case but not the other; but offhand I think those
> optimizations only apply to update/delete cases not inserts.  Anyway
> that still requires some assumptions about moving parts that you
> haven't shown us.

I wonder if there are any funny interactions between trigger tuple
acquisition and the ON CONFLICT stuff.  The only thing that occurs to me
to explain the fact that it only fails with the two stmts in the DO
block is that the second insert can see rows as inserted by the same
transaction.

I would start by taking a few dozen backtraces and comparing them to see
if any progress is being made.

The fact that this reproduces in 11.2 would seem to discard theories
about tuple table slot changes and table AM.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Apr-21, Tom Lane wrote:
>> A variant of that theory is that foreign key trigger firings are being
>> skipped in one case but not the other; but offhand I think those
>> optimizations only apply to update/delete cases not inserts.  Anyway
>> that still requires some assumptions about moving parts that you
>> haven't shown us.

> I wonder if there are any funny interactions between trigger tuple
> acquisition and the ON CONFLICT stuff.  The only thing that occurs to me
> to explain the fact that it only fails with the two stmts in the DO
> block is that the second insert can see rows as inserted by the same
> transaction.

Yeah, I'll bet a good deal that the use of ON CONFLICT is part of the
issue here.  That can queue firings of the unique_key_recheck trigger,
but I'm too lazy to go figure out exactly when that happens or gets
optimized away.  (AfterTriggerSaveEvent definitely does know about
optimizing it away in some cases...)

            regards, tom lane



Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

From
Andres Freund
Date:
Hi,

On 2020-04-21 11:19:35 -0400, Alvaro Herrera wrote:
> On 2020-Apr-21, Tom Lane wrote:
> 
> > A variant of that theory is that foreign key trigger firings are being
> > skipped in one case but not the other; but offhand I think those
> > optimizations only apply to update/delete cases not inserts.  Anyway
> > that still requires some assumptions about moving parts that you
> > haven't shown us.
> 
> I wonder if there are any funny interactions between trigger tuple
> acquisition and the ON CONFLICT stuff.  The only thing that occurs to me
> to explain the fact that it only fails with the two stmts in the DO
> block is that the second insert can see rows as inserted by the same
> transaction.

I wonder if there's potentially some issue with the wrong snapshot being
used for the different statements...

> I would start by taking a few dozen backtraces and comparing them to see
> if any progress is being made.

It could also be informative to look at the walstream with pg_waldump
while the problem is occuring. Would tell us about row locks acquired
during on conflict / trigger handling etc.

> The fact that this reproduces in 11.2 would seem to discard theories
> about tuple table slot changes and table AM.

Phew ;)

Greetings,

Andres Freund



Andres,

Thanks for taking a look at this.

Here is a sample of pg_waldump while the problem is occurring:

rmgr: Heap        len (rec/tot):    185/   185, tx:      35961, lsn:
A/D37435A8, prev A/D37415A0, desc: INSERT off 40 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D3743668, prev A/D37435A8, desc: INSERT_LEAF off 96, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     54/    54, tx:      35961, lsn:
A/D37436B0, prev A/D3743668, desc: LOCK off 41: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1792
rmgr: Heap        len (rec/tot):    194/   194, tx:      35961, lsn:
A/D37436E8, prev A/D37436B0, desc: INSERT off 41 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D37437B0, prev A/D37436E8, desc: INSERT_LEAF off 93, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     59/  8175, tx:      35961, lsn:
A/D37437F8, prev A/D37437B0, desc: LOCK off 33: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1799 FPW
rmgr: Heap        len (rec/tot):    185/   185, tx:      35961, lsn:
A/D3745800, prev A/D37437F8, desc: INSERT off 42 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D37458C0, prev A/D3745800, desc: INSERT_LEAF off 95, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Standby     len (rec/tot):     66/    66, tx:          0, lsn:
A/D3745908, prev A/D37458C0, desc: LOCK xid 35961 db 99095 rel 125491
xid 35961 db 99095 rel 125485 xid 35961 db 99095 rel 125490
rmgr: Standby     len (rec/tot):     54/    54, tx:          0, lsn:
A/D3745950, prev A/D3745908, desc: RUNNING_XACTS nextXid 35962
latestCompletedXid 35960 oldestRunningXid 35961; 1 xacts: 35961
rmgr: Heap        len (rec/tot):     59/  8175, tx:      35961, lsn:
A/D3745988, prev A/D3745950, desc: LOCK off 4: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1772 FPW
rmgr: Heap        len (rec/tot):    194/   194, tx:      35961, lsn:
A/D3747990, prev A/D3745988, desc: INSERT off 43 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D3747A58, prev A/D3747990, desc: INSERT_LEAF off 89, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     54/    54, tx:      35961, lsn:
A/D3747AA0, prev A/D3747A58, desc: LOCK off 3: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1799
rmgr: Heap        len (rec/tot):    193/   193, tx:      35961, lsn:
A/D3747AD8, prev A/D3747AA0, desc: INSERT off 44 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D3747BA0, prev A/D3747AD8, desc: INSERT_LEAF off 96, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     54/    54, tx:      35961, lsn:
A/D3747BE8, prev A/D3747BA0, desc: LOCK off 16: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1799
rmgr: Heap        len (rec/tot):    185/   185, tx:      35961, lsn:
A/D3747C20, prev A/D3747BE8, desc: INSERT off 45 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D3747CE0, prev A/D3747C20, desc: INSERT_LEAF off 97, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     59/  8167, tx:      35961, lsn:
A/D3747D28, prev A/D3747CE0, desc: LOCK off 16: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1801 FPW
rmgr: Heap        len (rec/tot):    186/   186, tx:      35961, lsn:
A/D3749D28, prev A/D3747D28, desc: INSERT off 46 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D3749DE8, prev A/D3749D28, desc: INSERT_LEAF off 99, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Standby     len (rec/tot):     66/    66, tx:          0, lsn:
A/D3749E30, prev A/D3749DE8, desc: LOCK xid 35961 db 99095 rel 125491
xid 35961 db 99095 rel 125485 xid 35961 db 99095 rel 125490
rmgr: Standby     len (rec/tot):     54/    54, tx:          0, lsn:
A/D3749E78, prev A/D3749E30, desc: RUNNING_XACTS nextXid 35962
latestCompletedXid 35960 oldestRunningXid 35961; 1 xacts: 35961
rmgr: Heap        len (rec/tot):     59/  8191, tx:      35961, lsn:
A/D3749EB0, prev A/D3749E78, desc: LOCK off 13: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1802 FPW
rmgr: Heap        len (rec/tot):    190/   190, tx:      35961, lsn:
A/D374BEC8, prev A/D3749EB0, desc: INSERT off 47 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D374BF88, prev A/D374BEC8, desc: INSERT_LEAF off 100, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     59/  8239, tx:      35961, lsn:
A/D374BFD0, prev A/D374BF88, desc: LOCK off 54: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104057 blk 1 FPW
rmgr: Heap        len (rec/tot):     59/  8167, tx:      35961, lsn:
A/D374E030, prev A/D374BFD0, desc: LOCK off 41: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1822 FPW
rmgr: Heap        len (rec/tot):    178/   178, tx:      35961, lsn:
A/D3750030, prev A/D374E030, desc: INSERT off 48 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D37500E8, prev A/D3750030, desc: INSERT_LEAF off 104, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Standby     len (rec/tot):     66/    66, tx:          0, lsn:
A/D3750130, prev A/D37500E8, desc: LOCK xid 35961 db 99095 rel 125491
xid 35961 db 99095 rel 125485 xid 35961 db 99095 rel 125490
rmgr: Standby     len (rec/tot):     54/    54, tx:          0, lsn:
A/D3750178, prev A/D3750130, desc: RUNNING_XACTS nextXid 35962
latestCompletedXid 35960 oldestRunningXid 35961; 1 xacts: 35961
rmgr: Heap        len (rec/tot):     59/  8199, tx:      35961, lsn:
A/D37501B0, prev A/D3750178, desc: LOCK off 42: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1826 FPW
rmgr: Heap        len (rec/tot):    194/   194, tx:      35961, lsn:
A/D37521D0, prev A/D37501B0, desc: INSERT off 49 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D3752298, prev A/D37521D0, desc: INSERT_LEAF off 106, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     59/  8175, tx:      35961, lsn:
A/D37522E0, prev A/D3752298, desc: LOCK off 41: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1833 FPW
rmgr: Heap        len (rec/tot):    185/   185, tx:      35961, lsn:
A/D37542E8, prev A/D37522E0, desc: INSERT+INIT off 1 flags 0x00,
blkref #0: rel 1663/99095/104041 blk 9
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D37543A8, prev A/D37542E8, desc: INSERT_LEAF off 108, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     59/  8183, tx:      35961, lsn:
A/D37543F0, prev A/D37543A8, desc: LOCK off 4: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1834 FPW
rmgr: Heap        len (rec/tot):    185/   185, tx:      35961, lsn:
A/D3756400, prev A/D37543F0, desc: INSERT off 2 flags 0x00, blkref #0:
rel 1663/99095/104041 blk 9
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D37564C0, prev A/D3756400, desc: INSERT_LEAF off 109, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     54/    54, tx:      35961, lsn:
A/D3756508, prev A/D37564C0, desc: LOCK off 7: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1834
rmgr: Heap        len (rec/tot):    185/   185, tx:      35961, lsn:
A/D3756540, prev A/D3756508, desc: INSERT off 3 flags 0x00, blkref #0:
rel 1663/99095/104041 blk 9
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D3756600, prev A/D3756540, desc: INSERT_LEAF off 110, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Standby     len (rec/tot):     66/    66, tx:          0, lsn:
A/D3756648, prev A/D3756600, desc: LOCK xid 35961 db 99095 rel 125491
xid 35961 db 99095 rel 125485 xid 35961 db 99095 rel 125490
rmgr: Standby     len (rec/tot):     54/    54, tx:          0, lsn:
A/D3756690, prev A/D3756648, desc: RUNNING_XACTS nextXid 35962
latestCompletedXid 35960 oldestRunningXid 35961; 1 xacts: 35961
rmgr: Heap        len (rec/tot):     59/  8191, tx:      35961, lsn:
A/D37566C8, prev A/D3756690, desc: LOCK off 27: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1549 FPW
rmgr: Heap        len (rec/tot):    185/   185, tx:      35961, lsn:
A/D37586E0, prev A/D37566C8, desc: INSERT off 4 flags 0x00, blkref #0:
rel 1663/99095/104041 blk 9
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D37587A0, prev A/D37586E0, desc: INSERT_LEAF off 69, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     59/  8191, tx:      35961, lsn:
A/D37587E8, prev A/D37587A0, desc: LOCK off 30: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1554 FPW
rmgr: Heap        len (rec/tot):    185/   185, tx:      35961, lsn:
A/D375A800, prev A/D37587E8, desc: INSERT off 5 flags 0x00, blkref #0:
rel 1663/99095/104041 blk 9
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D375A8C0, prev A/D375A800, desc: INSERT_LEAF off 70, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap        len (rec/tot):     59/  8175, tx:      35961, lsn:
A/D375A908, prev A/D375A8C0, desc: LOCK off 46: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1555 FPW
rmgr: Heap        len (rec/tot):    199/   199, tx:      35961, lsn:
A/D375C910, prev A/D375A908, desc: INSERT off 6 flags 0x00, blkref #0:
rel 1663/99095/104041 blk 9
rmgr: Btree       len (rec/tot):     72/    72, tx:      35961, lsn:
A/D375C9D8, prev A/D375C910, desc: INSERT_LEAF off 71, blkref #0: rel
1663/99095/113582 blk 2

On Wed, Apr 22, 2020 at 8:39 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-04-21 11:19:35 -0400, Alvaro Herrera wrote:
> > On 2020-Apr-21, Tom Lane wrote:
> >
> > > A variant of that theory is that foreign key trigger firings are being
> > > skipped in one case but not the other; but offhand I think those
> > > optimizations only apply to update/delete cases not inserts.  Anyway
> > > that still requires some assumptions about moving parts that you
> > > haven't shown us.
> >
> > I wonder if there are any funny interactions between trigger tuple
> > acquisition and the ON CONFLICT stuff.  The only thing that occurs to me
> > to explain the fact that it only fails with the two stmts in the DO
> > block is that the second insert can see rows as inserted by the same
> > transaction.
>
> I wonder if there's potentially some issue with the wrong snapshot being
> used for the different statements...
>
> > I would start by taking a few dozen backtraces and comparing them to see
> > if any progress is being made.
>
> It could also be informative to look at the walstream with pg_waldump
> while the problem is occuring. Would tell us about row locks acquired
> during on conflict / trigger handling etc.
>
> > The fact that this reproduces in 11.2 would seem to discard theories
> > about tuple table slot changes and table AM.
>
> Phew ;)
>
> Greetings,
>
> Andres Freund



This turned out to be the result of outdated table statistics.

We were adding a large number of rows to a table then adding a large
number of rows to a table that referenced the first table. Because the
table statistics were so wrong we ended up with triggers, etc using
catastrophically bad execution plans.

The fix was to add an 'analyze first_table' statement after adding the
rows to the first table.

Many thanks to Christophe Pettus of PGExperts for solving this issue.

On Mon, Apr 20, 2020 at 9:07 PM cbw <cbwhitebu@gmail.com> wrote:
>
> I have a DO block that has a couple of inserts (copying large amounts
> of data from staging tables) that I am passing to the server using
> JDBC,
>
> When I execute the inserts independently, they work fine. But when I
> submit them as part of the do block, the backend goes into
> trigger.c:afterTriggerInvokeEvents and never returns. This happens
> after the second insert starts running.
>
> I have tried this in version 11.7 and 12.2 (on both Linux and Windows).
>
> I grabbed the source from git and did a Linux build (REL_12_STABLE)
> and can see that the method in question just keeps looping through
> 'chunks' and 'events'. I disabled the user triggers on the table in
> question and the 'events' seem to be various foreign key references
> and other constraints.
>
> I bumped the log level up to debug5 and can see rows in the first
> table getting inserted. Sometimes I see a few rows for the second
> table and sometimes I don't. After that there are no additional log
> entries from the backend process. I have let the process run overnight
> (the inserts take about 12 minutes to complete when run
> independently).
>
> Any tips about how to go about debugging this would be appreciated. I
> am struggling to see exactly what the events are so maybe so
> suggestions on the best place to add some logging?
>
> Here is the DO block:
>
> DO
> $$
>     BEGIN
>         PERFORM XeP_set_identifier('xi_batch_user');
>
>         alter table xe_patient_visit disable trigger USER;
>         alter table xe_auth disable trigger USER;
>
>         RAISE NOTICE '% : inserting visits...', clock_timestamp();
>
>         INSERT INTO
>             xe_patient_visit
>         (
>             createtstamp,
>             creationuser,
>             modifiedtstamp,
>             modifieduser,
>             active,
>             visitid,
>             sourcesystem,
>             status,
>             visittypeid,
>             ipid,
>             accountid,
>             ivid
>         )
>         SELECT (now() AT TIME ZONE 'utc') AS createtstamp,
>                'xi_batch_user' AS creationuser,
>                (now() AT TIME ZONE 'utc') AS modifiedtstamp,
>                'xi_batch_user' AS modifieduser,
>                'y' AS active,
>                authorizationid AS visitid,
>                'staging' AS sourcesystem,
>                a.status,
>                'AUTH' AS visittypeid,
>                p.ipid,
>                e.accountid,
>                nextval('seq_xe_patient_visit') AS ivid
>         FROM (SELECT authorizationid,
>                      memberid,
>                      CASE
>                          WHEN authorizationstatus = 'Fully Approved'
> THEN 'AUTH_APPROVED'
>                          WHEN authorizationstatus = 'Partially
> Approved' THEN 'AUTH_REDUCED'
>                          WHEN authorizationstatus = 'Voided' THEN
> 'AUTH_COMPLETED'
>                          WHEN authorizationstatus = 'Incomplete' THEN
> 'AUTH_PEND'
>                          WHEN authorizationstatus = 'Pending Decision'
> THEN 'AUTH_PEND'
>                          WHEN authorizationstatus = 'Denied' THEN
> 'AUTH_DENIED' END
>
>                         AS status,
>                      row_number() OVER (PARTITION BY authorizationid
> ORDER BY authorizationid) AS rownum
>               FROM staging."authorization") a
>                  JOIN xe_patient p ON p.patientid = a.memberid
>                  JOIN xe_enterprise_data e ON e.accountid =
> p.accountid AND e.enterpriseid = 'staging'
>         WHERE rownum = 1
>         ON CONFLICT (accountid, visitid)
>             DO NOTHING;
>
>
>         RAISE NOTICE '% : inserting auths...', clock_timestamp();
>         INSERT INTO
>             xe_auth
>         (
>             createtstamp,
>             creationuser,
>             modifiedtstamp,
>             modifieduser,
>             active,
>             accountid,
>             receiveddate,
>             authnum,
>             authtypeid,
>             authsubtypeid,
>             umurgencyid,
>             ivid
>         )
>         SELECT (now() AT TIME ZONE 'utc') AS createtstamp,
>                'xi_batch_user' AS creationuser,
>                (now() AT TIME ZONE 'utc') AS modifiedtstamp,
>                'xi_batch_user' AS modifieduser,
>                'y' AS active,
>                ed.accountid,
>                receiveddate,
>                authnum,
>                a.authtypeid,
>                at.authtypeid,
>                umurgencyid,
>                ivid
>         FROM (
>                  SELECT cast(receiveddate AS timestamp) AS receiveddate,
>                         authorizationid AS authnum,
>                         CASE
>                             WHEN authorizationcategory = 'Inpatient'
>                                 THEN 'AUTH_IPA'
>                             ELSE 'AUTH_SVC' END AS authtypeid,
>                         authorizationtype,
>                         CASE
>                             WHEN authorizationurgency = 'ROUTINE' THEN 'STD'
>                             WHEN authorizationurgency = 'EXPEDITED' THEN 'EXP'
>                             END
>                                                         AS umurgencyid
>                  FROM staging."authorization"
>              ) a
>                  JOIN xe_patient_visit v ON v.visitid = a.authnum
>                  JOIN xe_enterprise_data ed ON ed.accountid =
> v.accountid AND ed.enterpriseid = 'staging'
>                  JOIN xe_auth_type at ON at.name = a.authorizationtype
>         ON CONFLICT (authnum, accountid)
>             DO NOTHING;
>
>     END
> $$