Re: row filtering for logical replication - Mailing list pgsql-hackers
From | Euler Taveira |
---|---|
Subject | Re: row filtering for logical replication |
Date | |
Msg-id | 3ddceeeb-c6b6-419f-8d11-4f61eb3c3d46@www.fastmail.com Whole thread Raw |
In response to | Re: row filtering for logical replication (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
List | pgsql-hackers |
On Sun, Jul 11, 2021, at 8:09 PM, Tomas Vondra wrote:
I took a look at this patch, which seems to be in CF since 2018. I haveonly some basic comments and observations at this point:
Tomas, thanks for reviewing this patch again.
1) alter_publication.sgmlI think "expression is executed" sounds a bit strange, perhaps"evaluated" would be better?
Fixed.
2) create_publication.sgmlWhy is the patch changing publish_via_partition_root docs? That seemslike a rather unrelated bit.
Removed. I will submit a separate patch for this.
The <literal>WHERE</literal> clause should probably contain onlycolumns that are part of the primary key or be covered by<literal>REPLICA ...I'm not sure what exactly is this trying to say. What does "shouldprobably ..." mean in practice for the users? Does that mean somethingbad will happen for other columns, or what? I'm sure this wording willbe quite confusing for users.
Reading again it seems "probably" is confusing. Let's remove it.
It may also be unclear whether the condition is evaluated on the old ornew row, so perhaps add an example illustrating that & more detailedcomment, or something. E.g. what will happen withUPDATE departments SET active = false WHERE active;
Yeah. I avoided to mention this internal detail about old/new row but it seems
better to be clear. How about the following paragraph?
<para>
The <literal>WHERE</literal> clause should contain only columns that are
part of the primary key or be covered by <literal>REPLICA
IDENTITY</literal> otherwise, <command>DELETE</command> operations will not
be replicated. That's because old row is used and it only contains primary
key or columns that are part of the <literal>REPLICA IDENTITY</literal>; the
remaining columns are <literal>NULL</literal>. For <command>INSERT</command>
and <command>UPDATE</command> operations, any column might be used in the
<literal>WHERE</literal> clause. New row is used and it contains all
columns. A <literal>NULL</literal> value causes the expression to evaluate
to false; avoid using columns without not-null constraints in the
<literal>WHERE</literal> clause. The <literal>WHERE</literal> clause does
not allow functions and user-defined operators.
</para>
3) publication_add_relationDoes this need to build the parse state even for whereClause == NULL?
No. Fixed.
4) AlterPublicationTablesI wonder if this new reworked code might have issues with subscriptionscontaining many tables, but I haven't tried.
This piece of code is already complicated. Amit complained about it too [1].
Are you envisioning any specific issue (other than open thousands of relations,
do some stuff, and close them all)? IMO the open/close relation should be
postponed for as long as possible.
5) OpenTableListI really dislike that the list can have two different node types(Relation and PublicationTable). In principle we don't actually need theextra flag, we can simply check the node type directly by IsA() and actbased on that. However, I think it'd be better to just use a single nodetype from all places.
Amit complained about having a runtime test for ALTER PUBLICATION ... DROP
TABLE in case user provides a WHERE clause [2]. I did that way (runtime test)
because it simplified the code. I would tend to avoid moving grammar task into
a runtime, that's why I agreed to change it. I didn't like the multi-node
argument handling for OpenTableList() (mainly because of the extra argument in
the function signature) but with your suggestion (IsA()) maybe it is
acceptable. What do you think? I included IsA() in v19.
I don't see why not to set whereClause every time, I don't think theextra if saves anything, it's just a bit more complex.
See runtime test in [2].
5) CloseTableListThe comment about node types seems pointless, this function has no flagand the element type does not matter.
Fixed.
6) parse_agg.c... are not allowed in publication WHERE expressionsI think all similar cases use "WHERE conditions" instead.
No. Policy, index, statistics, partition, column generation use expressions.
COPY and trigger use conditions. It is also referred as expression in the
synopsis.
7) transformExprRecurseThe check at the beginning seems rather awkward / misplaced - it's waytoo specific for this location (there are no other p_expr_kindreferences in this function). Wouldn't transformFuncCall (or maybeParseFuncOrColumn) be a more appropriate place?
Probably. I have to try the multiple possibilities to make sure it forbids all
cases.
Initially I was wondering why not to allow function calls in WHEREconditions, but I see that was discussed in the past as problematic. Butthat reminds me that I don't see any docs describing what expressionsare allowed in WHERE conditions - maybe we should explicitly list whatexpressions are allowed?
I started to investigate how to safely allow built-in functions. There is a
long discussion about using functions in a logical decoding context. As I said
during the last CF for v14, I prefer this to be a separate feature. I realized
that I mentioned that functions and user-defined operators are not allowed in
the commit message but forgot to mention it in the documentation.
8) pgoutput.cI have not reviewed this in detail yet, but there seems to be somethingwrong because `make check-world` fails in subscription/010_truncate.plafter hitting an assert (backtrace attached) during "START_REPLICATIONSLOT" in get_rel_sync_entry in this code:
That's because I didn't copy the TupleDesc in CacheMemoryContext. Greg pointed
it too in a previous email [3]. The new patch (v19) includes a fix for it.
pgsql-hackers by date: