Re: row filtering for logical replication - Mailing list pgsql-hackers
From | Euler Taveira |
---|---|
Subject | Re: row filtering for logical replication |
Date | |
Msg-id | CAHE3wggWAtbT+Yt_MDy0uQwZcc6x820OwHp3_TBtAqtBeW4+Ug@mail.gmail.com Whole thread Raw |
In response to | Re: row filtering for logical replication (Petr Jelinek <petr.jelinek@2ndquadrant.com>) |
Responses |
Re: row filtering for logical replication
Re: row filtering for logical replication Re: row filtering for logical replication |
List | pgsql-hackers |
Em qui, 22 de nov de 2018 às 20:03, Petr Jelinek <petr.jelinek@2ndquadrant.com> escreveu: > Firstly, I am not sure if it's wise to allow UDFs in the filter clause > for the table. The reason for that is that we can't record all necessary > dependencies there because the functions are black box for parser. That > means if somebody drops object that an UDF used in replication filter > depends on, that function will start failing. But unlike for user > sessions it will start failing during decoding (well processing in > output plugin). And that's not recoverable by reading the missing > object, the only way to get out of that is either to move slot forward > which means losing part of replication stream and need for manual resync > or full rebuild of replication. Neither of which are good IMHO. > It is a foot gun but there are several ways to do bad things in postgres. CREATE PUBLICATION is restricted to superusers and role with CREATE privilege in current database. AFAICS a role with CREATE privilege cannot drop objects whose owner is not himself. I wouldn't like to disallow UDFs in row filtering expressions just because someone doesn't set permissions correctly. Do you have any other case in mind? > Secondly, do we want to at least notify user on filters (or maybe even > disallow them) with combination of action + column where column value > will not be logged? I mean for example we do this when processing the > filter against a row: > > > + ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple, ecxt->ecxt_scantuple, false); > We could emit a LOG message. That could possibly be an option but it could be too complex for the first version. > But if user has expression on column which is not part of replica > identity that expression will always return NULL for DELETEs because > only replica identity is logged with actual values and everything else > in NULL in old_tuple. So if publication replicates deletes we should > check for this somehow. > In this case, we should document this behavior. That is a recurring question in wal2json issues. Besides that we should explain that UPDATE/DELETE tuples doesn't log all columns (people think the behavior is equivalent to triggers; it is not unless you set REPLICA IDENTITY FULL). > Not really sure that this is actually worth it given that we have to > allocate and free this in a loop now while before it was just sitting on > a stack. > That is a experimentation code that should be in a separate patch. Don't you think low memory use is a good goal? I also think that MaxTupleAttributeNumber is an extreme value. I didn't some preliminary tests and didn't notice overheads. I'll leave these modifications in a separate patch. > Won't this leak memory? The list_free only frees the list cells, but not > the nodes you stored there before. > Good catch. It should be list_free_deep. > Also I think we should document here that the expression is run with the > session environment of the replication connection (so that it's more > obvious that things like CURRENT_USER will not return user which changed > tuple but the replication user). > Sure. > It would be nice if 0006 had regression test and 0007 TAP test. > Sure. Besides the problem presented by Hironobu-san, I'm doing some cleanup and improving docs. I also forget to declare pg_publication_rel TOAST table. Thanks for your review. -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
pgsql-hackers by date: