Re: row filtering for logical replication - Mailing list pgsql-hackers
From: Petr Jelinek
Subject: Re: row filtering for logical replication
Date:
Msg-id: 7106a0fc-8017-c0fe-a407-9466c9407ff8@2ndquadrant.com
In response to: Re: row filtering for logical replication (Euler Taveira <euler@timbira.com.br>)
Responses: 2 × Re: row filtering for logical replication
List: pgsql-hackers
On 01/11/2018 01:29, Euler Taveira wrote:
> On Wed, Feb 28, 2018 at 20:03, Euler Taveira
> <euler@timbira.com.br> wrote:
>> The attached patches add support for filtering rows in the publisher.
>>
> I rebased the patch. I added row filtering for initial
> synchronization, pg_dump support and psql support. 0001 removes unused
> code. 0002 reduces memory use. 0003 passes only the structure member
> that is used in create_estate_for_relation. 0004 reuses a parser node
> for row filtering. 0005 is the feature. 0006 prints the WHERE
> expression in psql. 0007 adds pg_dump support. 0008 is only for debug
> purposes (I'm not sure some of these messages will be part of the
> final patch). 0001, 0002, 0003 and 0008 are not mandatory for this
> feature.
>
> Comments?

Hi,

I think there are two main topics that still need to be discussed about
this patch.

Firstly, I am not sure it's wise to allow UDFs in the filter clause for
a table. The reason is that we can't record all the necessary
dependencies there, because the functions are a black box for the
parser. That means that if somebody drops an object that a UDF used in
a replication filter depends on, that function will start failing. But
unlike in user sessions, it will start failing during decoding (well,
during processing in the output plugin). And that's not recoverable by
re-creating the missing object; the only way out of it is either to
move the slot forward, which means losing part of the replication
stream and needing a manual resync, or a full rebuild of replication.
Neither of which is good IMHO.

Secondly, do we want to at least notify the user about filters (or
maybe even disallow them) that combine an action with a column whose
value will not be logged? I mean, for example, we do this when
processing the filter against a row:

> + ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple,
> + ecxt->ecxt_scantuple, false);

But if the user has an expression on a column which is not part of the
replica identity, that expression will always return NULL for DELETEs,
because only the replica identity is logged with actual values and
everything else is NULL in old_tuple. So if the publication replicates
deletes, we should check for this somehow.

Btw, about the code (you already fixed the wrong reloid in sync, so
skipping that).

0002:
> + for (tupn = 0; tupn < walres->ntuples; tupn++)
> {
> - char *cstrs[MaxTupleAttributeNumber];
> + char **cstrs;
>
> CHECK_FOR_INTERRUPTS();
>
> /* Do the allocations in temporary context. */
> oldcontext = MemoryContextSwitchTo(rowcontext);
>
> + cstrs = palloc(nfields * sizeof(char *));

I am not really sure this is actually worth it, given that we now have
to allocate and free the array in every loop iteration, while before
it was just sitting on the stack.

0005:
> @@ -654,5 +740,10 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
> */
> hash_seq_init(&status, RelationSyncCache);
> while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
> + {
> entry->replicate_valid = false;
> + if (list_length(entry->row_filter) > 0)
> + list_free(entry->row_filter);
> + entry->row_filter = NIL;
> + }

Won't this leak memory? list_free() only frees the list cells, but not
the nodes you stored in them before.

Also, I think we should document here that the expression is run in
the session environment of the replication connection (so that it's
more obvious that things like CURRENT_USER will return the replication
user, not the user who changed the tuple).

It would be nice if 0006 had a regression test and 0007 a TAP test.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services