Re: [BUGS] BUG #14758: Segfault with logical replication on afunction index - Mailing list pgsql-bugs

From Masahiko Sawada
Subject Re: [BUGS] BUG #14758: Segfault with logical replication on afunction index
Date
Msg-id CAD21AoCHutPoUdtZg2RHDneXEFnbL3zp+FcQkMrZ8Lrfa3kZLQ@mail.gmail.com
Whole thread Raw
In response to [BUGS] BUG #14758: Segfault with logical replication on a function index  (scott@deltaex.com)
Responses Re: [BUGS] BUG #14758: Segfault with logical replication on afunction index
Re: [BUGS] BUG #14758: Segfault with logical replication on afunction index
List pgsql-bugs
On Mon, Jul 24, 2017 at 4:22 PM,  <scott@deltaex.com> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14758
> Logged by:          Scott Milliken
> Email address:      scott@deltaex.com
> PostgreSQL version: 10beta2
> Operating system:   Linux 4.10.0-27-generic #30~16.04.2-Ubuntu S
> Description:
>
> I'm testing logical replication on 10beta2, and found a segfault that I can
> reliably reproduce with an index on a not-actually immutable function.
>
> Here's the function in question:
>
> ```
> CREATE OR REPLACE FUNCTION public.immutable_random(integer)
>  RETURNS double precision
>  LANGUAGE sql
>  IMMUTABLE
> AS $function$SELECT random();
> $function$;
> ```
>
> It's not actually immutable since it's calling random (a hack to get an
> efficient random sort on a table).
>
> (Aside: I'd understand if it errored on creation of the index, but would
> really prefer to keep using this instead of tablesample because it's fast,
> deterministic, and doesn't have sampling biases like the SYSTEM sampling.)
>
>
> Here's full reproduction instructions:
>
>
> Primary:
> ```
> mkdir -p /tmp/test-seg0
> PGPORT=5301 initdb -D /tmp/test-seg0
> echo "wal_level = logical" >> /tmp/test-seg0/postgresql.conf
> PGPORT=5301 pg_ctl -D /tmp/test-seg0 start
> for (( ; ; )); do if pg_isready -d postgres -p 5301; then break; fi; sleep
> 1; done
> psql -p 5301 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER
> CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
> createdb -p 5301 -E utf8 test
>
> psql -p 5301 -U test test -c "CREATE TABLE testtbl (id int, name text);"
> psql -p 5301 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
> testtbl_pkey PRIMARY KEY (id);"
> psql -p 5301 -U test test -c "CREATE PUBLICATION testpub FOR TABLE
> testtbl;"
> psql -p 5301 -U test test -c "INSERT INTO testtbl (id, name) VALUES (1,
> 'a');"
> ```
>
> Secondary:
> ```
> mkdir -p /tmp/test-seg1
> PGPORT=5302 initdb -D /tmp/test-seg1
> PGPORT=5302 pg_ctl -D /tmp/test-seg1 start
> for (( ; ; )); do if pg_isready -d postgres -p 5302; then break; fi; sleep
> 1; done
> psql -p 5302 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER
> CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
> createdb -p 5302 -E utf8 test
>
> psql -p 5302 -U test test -c "CREATE TABLE testtbl (id int, name text);"
> psql -p 5302 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
> testtbl_pkey PRIMARY KEY (id);"
> psql -p 5302 -U test test -c 'CREATE FUNCTION
> public.immutable_random(integer) RETURNS double precision LANGUAGE sql
> IMMUTABLE AS $function$ SELECT random(); $function$'
> psql -p 5302 -U test test -c "CREATE INDEX ix_testtbl_random ON testtbl
> USING btree (immutable_random(id));"
> psql -p 5302 -U test test -c "CREATE SUBSCRIPTION test0_testpub CONNECTION
> 'port=5301 user=test dbname=test' PUBLICATION testpub;"
> ```
>
> The secondary crashes with a segfault:
>
> ```
> 2017-07-23 23:55:37.961 PDT [4823] LOG:  logical replication table
> synchronization worker for subscription "test0_testpub", table "testtbl"
> has started
> 2017-07-23 23:55:38.244 PDT [4758] LOG:  worker process: logical replication
> worker for subscription 16396 sync 16386 (PID 4823) was terminated by signal
> 11: Segmentation fault
> 2017-07-23 23:55:38.244 PDT [4758] LOG:  terminating any other active server
> processes
> 2017-07-23 23:55:38.245 PDT [4763] WARNING:  terminating connection because
> of crash of another server process
> 2017-07-23 23:55:38.245 PDT [4763] DETAIL:  The postmaster has commanded
> this server process to roll back the current transaction and exit, because
> another server process exited
>  abnormally and possibly corrupted shared memory.
> 2017-07-23 23:55:38.245 PDT [4763] HINT:  In a moment you should be able to
> reconnect to the database and repeat your command.
> 2017-07-23 23:55:38.247 PDT [4758] LOG:  all server processes terminated;
> reinitializing
> 2017-07-23 23:55:38.256 PDT [4826] LOG:  database system was interrupted;
> last known up at 2017-07-23 23:55:36 PDT
> 2017-07-23 23:55:38.809 PDT [4826] LOG:  database system was not properly
> shut down; automatic recovery in progress
> 2017-07-23 23:55:38.812 PDT [4826] LOG:  redo starts at 0/173AEA0
> 2017-07-23 23:55:38.815 PDT [4826] LOG:  invalid record length at 0/17B50B0:
> wanted 24, got 0
> 2017-07-23 23:55:38.815 PDT [4826] LOG:  redo done at 0/17B5070
> 2017-07-23 23:55:38.815 PDT [4826] LOG:  last completed transaction was at
> log time 2017-07-23 23:55:37.962957-07
> ```
>

Thank you for the reporting and precise reproducing steps!
I could reproduced this issue and it seems to me that the cause of
this is that the table sync worker didn't get a snapshot before
starting table copy. Attached patch fixes this problem.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

pgsql-bugs by date:

Previous
From: Rick Otten
Date:
Subject: Re: [BUGS] signal 11 segfaults with parallel workers
Next
From: Etsuro Fujita
Date:
Subject: Re: [HACKERS] [BUGS] BUG #14759: insert into foreign data partitionsfail