Re: Support logical replication of DDLs - Mailing list pgsql-hackers

From Jonathan S. Katz
Subject Re: Support logical replication of DDLs
Date
Msg-id 831d8428-7a62-3bc6-4e3b-93458469f6a7@postgresql.org
Whole thread Raw
In response to Re: Support logical replication of DDLs  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Support logical replication of DDLs
List pgsql-hackers
On 2/17/23 4:15 AM, Amit Kapila wrote:
> On Fri, Feb 17, 2023 at 1:13 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>>
>> On 2/16/23 2:38 PM, Alvaro Herrera wrote:
>>> On 2023-Feb-16, Jonathan S. Katz wrote:
>>>
>>>> On 2/16/23 12:53 PM, Alvaro Herrera wrote:
>>>
>>>>> I don't think this is the fault of logical replication.  Consider that
>>>>> for the backend server, the function source code is just an opaque
>>>>> string that is given to the plpgsql engine to interpret.  So there's no
>>>>> way for the logical DDL replication engine to turn this into runnable
>>>>> code if the table name is not qualified.
>>>>
>>>> Sure, that's fair. That said, the example above would fall under a "typical
>>>> use case", i.e. I'm replicating functions that call tables without schema
>>>> qualification. This is pretty common, and as logical replication becomes
>>>> used for more types of workloads (e.g. high availability), we'll definitely
>>>> see this.
>>>
>>> Hmm, I think you're saying that replay should turn check_function_bodies
>>> off, and I think I agree with that.
>>
>> Yes, exactly. +1
>>
> 
> But will that be sufficient? I guess such functions can give errors at
> a later stage when invoked at DML or another DDL time. Consider the
> following example:
> 
> Pub:
> CREATE PUBLICATION pub FOR ALL TABLES with (ddl = 'all');
> 
> Sub:
> (Set check_function_bodies = off in postgresql.conf)
> CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres' PUBLICATION pub;
> 
> Pub:
> CREATE FUNCTION t1(a int) RETURNS int AS $$
> select a+1;
> $$ LANGUAGE sql;
> CREATE FUNCTION t(a int) RETURNS int AS $$
> select t1(a);
> $$ LANGUAGE sql;
> CREATE TABLE tbl1 (a int primary key, b text);
> create index idx on tbl1(t(a));
> 
> insert into tbl1 values (1,1); -- This insert on publisher causes an
> error on the subscriber. Check subscriber Logs (ERROR:  function
> t1(integer) does not exist at character 9.)

I did reproduce this as is. I also reproduced this when I rewrote the 
function in PL/pgSQL.

I also did an experiment using PL/v8[1] where I rewrote the functions 
above and did two tests: one via SPI, the other via PL/v8's ability to 
find and call a function[2]. In both cases, the INSERT statement failed 
citing the inability to find the function. The calls did work when I 
schema-qualified.

However, when I converted the SQL-only functions to use the SQL standard 
syntax (BEGIN ATOMIC), I did not get this error and was able to 
successfully use this index with the table on both publisher and 
subscriber. I believe this is due to the generated function body having 
all of the schema qualifications in it.

> This happens because of the function used in the index expression.
> Now, this is not the only thing, the replication can even fail during
> DDL replication when the function like above is IMMUTABLE and used as
> follows: ALTER TABLE tbl ADD COLUMN d int DEFAULT t(1);
> 
> Normally, it is recommended that users can fix such errors by
> schema-qualifying affected names. See commits 11da97024a and
> 582edc369c.

I'm very familiar with those CVEs, but even though these are our 
recommended best practices, there is still a lot of code that does not 
schema-qualify the names of functions (including many of our own examples ;)

If we're going to say "You can use logical replication to replicate 
functions, but you have to ensure you've schema-qualified any function 
calls within them," I think that will prevent people from being able to 
use this feature, particularly on existing applications.

I guess there's a connection I'm missing here. For the failing examples 
above, I look at the pg_proc entries on both the publisher and the 
subscriber and they're identical. I'm not understanding why creating and 
executing the functions works on the publisher, but it does not on the 
subscriber. What additional info would the subscriber need to be able to 
successfully run these functions? Would we need to pass in some 
additional context, e.g. what the search_path was at the time the 
publisher created the function?

Thanks,

Jonathan

[1] https://plv8.github.io/
[2] https://plv8.github.io/#-code-plv8-find_function-code-

Attachment

pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)
Next
From: "shiy.fnst@fujitsu.com"
Date:
Subject: "out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes