Re: row filtering for logical replication - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: row filtering for logical replication
Date
Msg-id CAHE3wggWAtbT+Yt_MDy0uQwZcc6x820OwHp3_TBtAqtBeW4+Ug@mail.gmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Responses Re: row filtering for logical replication
Re: row filtering for logical replication
Re: row filtering for logical replication
List pgsql-hackers
Em qui, 22 de nov de 2018 às 20:03, Petr Jelinek
<petr.jelinek@2ndquadrant.com> escreveu:
> Firstly, I am not sure if it's wise to allow UDFs in the filter clause
> for the table. The reason for that is that we can't record all necessary
> dependencies there because the functions are black box for parser. That
> means if somebody drops object that an UDF used in replication filter
> depends on, that function will start failing. But unlike for user
> sessions it will start failing during decoding (well processing in
> output plugin). And that's not recoverable by reading the missing
> object, the only way to get out of that is either to move slot forward
> which means losing part of replication stream and need for manual resync
> or full rebuild of replication. Neither of which are good IMHO.
>
It is a foot gun but there are several ways to do bad things in
postgres. CREATE PUBLICATION is restricted to superusers and role with
CREATE privilege in current database. AFAICS a role with CREATE
privilege cannot drop objects whose owner is not himself. I wouldn't
like to disallow UDFs in row filtering expressions just because
someone doesn't set permissions correctly. Do you have any other case
in mind?

> Secondly, do we want to at least notify user on filters (or maybe even
> disallow them) with combination of action + column where column value
> will not be logged? I mean for example we do this when processing the
> filter against a row:
>
> > +             ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple, ecxt->ecxt_scantuple, false);
>
We could emit a LOG message. That could possibly be an option but it
could be too complex for the first version.

> But if user has expression on column which is not part of replica
> identity that expression will always return NULL for DELETEs because
> only replica identity is logged with actual values and everything else
> in NULL in old_tuple. So if publication replicates deletes we should
> check for this somehow.
>
In this case, we should document this behavior. That is a recurring
question in wal2json issues. Besides that we should explain that
UPDATE/DELETE tuples doesn't log all columns (people think the
behavior is equivalent to triggers; it is not unless you set REPLICA
IDENTITY FULL).

> Not really sure that this is actually worth it given that we have to
> allocate and free this in a loop now while before it was just sitting on
> a stack.
>
That is a experimentation code that should be in a separate patch.
Don't you think low memory use is a good goal? I also think that
MaxTupleAttributeNumber is an extreme value. I didn't some preliminary
tests and didn't notice overheads. I'll leave these modifications in a
separate patch.

> Won't this leak memory? The list_free only frees the list cells, but not
> the nodes you stored there before.
>
Good catch. It should be list_free_deep.

> Also I think we should document here that the expression is run with the
> session environment of the replication connection (so that it's more
> obvious that things like CURRENT_USER will not return user which changed
> tuple but the replication user).
>
Sure.

> It would be nice if 0006 had regression test and 0007 TAP test.
>
Sure.

Besides the problem presented by Hironobu-san, I'm doing some cleanup
and improving docs. I also forget to declare pg_publication_rel TOAST
table.

Thanks for your review.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: prewarm compiler warnings
Next
From: Alvaro Herrera
Date:
Subject: Re: row filtering for logical replication