Logical replication issue with row level trigger - Mailing list pgsql-general

From Aleš Zelený
Subject Logical replication issue with row level trigger
Date
Msg-id CAODqTUZsf8BasR-3SY3V7jeDVj-v4-komPpsR-SFLfp78dDiAQ@mail.gmail.com
Whole thread Raw
List pgsql-general
Hello,

I've found an issue with logical replication causing
1) large memory allocation by wal sender process (RSS 60GB)
2) large amount of "xid" files in $PGDATA/pg_replslot/<repl_slot_name> directory - seems to be amount of rows handled by trigger + few more

Having several millions of files in a single directory for replication slot makes it hard to work on (tested ext4 and zfs).
Regarding the memory - such large allocation was no expected and as result swap was used with all performance impact expected.

Environment and conditions to reproduce:

PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit

create trigger function with:
1) DML modifying data in another table
2) use EXCEPTION clause (implicit subtransaction [https://www.postgresql.org/docs/10/plpgsql-structure.html])

Note, that is does not matter, whether tables used for testcase are replicated or no (this is what I use in testcase).

# publication side
pg_createcluster -d /pgsql/pgclusterzfs/i_pub/10/ipub -p 5532 10 ipub -- -k
pg_ctlcluster 10 ipub start
psql -p 5532 -c "alter system set wal_level=logical;"
pg_ctlcluster 10 ipub restart

psql -p 5532 <<EOF
create table foo( foo_id bigserial, foo_val text, PRIMARY KEY (foo_id));
create table bar( bar_id bigserial, bar_val bigint, PRIMARY KEY (bar_id));
create publication pub_test FOR TABLE foo, bar;
EOF

# subscription side
pg_createcluster -d /pgsql/pgclusterzfs/i_sub/10/isub -p 5533 10 isub -- -k
pg_ctlcluster 10 isub start
psql -p 5533 <<EOF
create table foo( foo_id bigint, foo_val text, PRIMARY KEY (foo_id));
create table bar( bar_id bigint, bar_val bigint, PRIMARY KEY (bar_id));
create subscription sub_test CONNECTION 'port=5532 dbname=postgres' PUBLICATION pub_test;
EOF


# publication side
psql -p 5532 <<EOF
create table local_bar( bar_id bigserial, bar_val bigint, PRIMARY KEY (bar_id));

create table local_bar_aud (
      local_bar_aud_id bigserial
    , bar_id bigint not null
    , bar_val bigint
    , aud_op text not null
    , aud_time timestamptz not null
    , aud_current_user text not null
    , aud_session_user text not null
    , aud_pid int
    , PRIMARY KEY (local_bar_aud_id)
);
EOF

Trigger function :

CREATE OR REPLACE FUNCTION public.local_bar_aud()
 RETURNS trigger
 LANGUAGE plpgsql
 SECURITY DEFINER
 SET search_path TO 'pg_catalog', 'public'
AS $function$
BEGIN
  IF ( TG_OP = 'INSERT' ) THEN
    INSERT INTO public.local_bar_aud (
          bar_id,
          bar_val,
          aud_op,
          aud_time,
          aud_current_user,
          aud_session_user,
          aud_pid
      ) VALUES (
         NEW.bar_id,
         NEW.bar_val,
         TG_OP,
         now(),
         current_user::text,
         session_user::text,
         pg_backend_pid()
      );
    RETURN NEW;
  END IF;
  EXCEPTION
    WHEN OTHERS THEN
        RAISE EXCEPTION 'ERROR [OTHER] from trigger function.';
      RETURN NULL;
END;
$function$;

psql -p 5532 <<EOF
CREATE TRIGGER local_bar_audit_trg AFTER INSERT ON local_bar FOR EACH ROW EXECUTE PROCEDURE local_bar_aud();

insert into local_bar (bar_val) select i from generate_series(1, 1E6) gs(i);
EOF

Using 1 mio rows is enough to show that wal sended allocates ~4.5GB in RSS and create 1mio xid files in the $PGDATA/pg_replslot/sub_test directory.

If the exception clause is removed or the trigger did not manipulate with other table data (e.g. setting a new vale if used before insert), there is no memory issue at all.

Similar impact to wal sender memory allocation can be also simulated by simple script:

begin transaction;

SAVEPOINT my_savepoint; INSERT INTO public.bar (bar_id, bar_val) VALUES (49471972, 1); RELEASE SAVEPOINT my_savepoint;
SAVEPOINT my_savepoint; INSERT INTO public.bar (bar_id, bar_val) VALUES (49471973, 2); RELEASE SAVEPOINT my_savepoint;
SAVEPOINT my_savepoint; INSERT INTO public.bar (bar_id, bar_val) VALUES (49471974, 3); RELEASE SAVEPOINT my_savepoint;
.... repeat 1 mio rows ....
commit transaction;


I rewrite the original trigger to most minimalistic version I was able to reproduce the issue, so the exception here is just to be able demonstrate my finding.

I'll ask developers to remove exception handling from triggers if possible (or use statement triggers, what works well if I reworked this "auditing" example above).

Questions:
Is there a way how to mitigate such issue to stop a transaction fast enough in case it might result in huge memory allocation of wal sender processes (I even run out of 300GB swap, since I have multiple subscribers and had to drop subscriptions - even deleteing the "xid" files lasts severla hours...) ?

Is the above a bug, or just a trigger function programming issue (I haven't found in docs reason not to have exception caluse in trigger function [yes, there is noted exception clause impact to performance])?

Thanks for advises/hints

Ales Zeleny

pgsql-general by date:

Previous
From: Michel Pelletier
Date:
Subject: Re: Question about MemoryContextRegisterResetCallback
Next
From: Zach van Rijn
Date:
Subject: Re: Static PostgreSQL Binaries (Linux + Windows)