Thread: Watching for view changes
Hi!

I am trying to see how I could use NOTIFY/LISTEN to watch changes to the results of a complicated SELECT query which spans multiple tables. Ideally, I would like to leave it to PostgreSQL to determine when some data (and which data) in the result of the SELECT query has changed. So I am thinking that creating a temporary view using that query could be a way, if only I could find a way to watch such a view for changes somehow. But it seems this is not really possible. I looked into two mechanisms:

- Logical replication. Instead of NOTIFY/LISTEN I could simply create a publication over a view and then subscribe to it. But it seems logical replication can be done only over base tables and not views. [1]
- Using an AFTER trigger on the view to get a notification when the view changes. I could even use transition relations to get information about what changed. But sadly it seems this is possible only if there is also an INSTEAD OF trigger on the view. I would like to get a notification when the view has changed because the underlying tables have changed, not because of an UPDATE query on the view itself. Moreover, I do not really need writable views. [2]

So I wonder if I am missing anything. Is there some other best practice for getting notified when the result of a query changes in real time? And information about what changed?

How hard would it be to implement such triggers on a view, firing whenever the view changes? Is there a process to make a feature request?

(Also, I have not really managed to get statement-level AFTER triggers to run on a view at all. If a query is rewritten through INSTEAD OF, then the triggers on the underlying tables fire, not the view's. So I am not sure what the expected use there even is.)

[1] https://www.postgresql.org/docs/devel/logical-replication-restrictions.html
[2] https://www.postgresql.org/docs/devel/trigger-definition.html

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
On 12/20/18 3:17 AM, Mitar wrote:
> Hi!
>
> I am trying to see how I could use NOTIFY/LISTEN to watch changes of a
> complicated SELECT query which spans multiple tables. Ideally, I would
> like to leave to PostgreSQL to determine when some data (and which
> data) in the result of the SELECT query has changed. So I am thinking
[snip]

There was a LONG thread on this list a few months ago about monitoring for DDL changes. It should prove helpful (or cause you to despair).

--
Angular momentum makes the world go 'round.
Hi!

Thanks. Care for a link, or at least the thread subject?

Also, Googling around, this seems to be a pretty popular request. I am quite surprised that there is not something available for this out of the box, efficient and scalable.

Mitar

On Thu, Dec 20, 2018 at 7:33 AM Ron <ronljohnsonjr@gmail.com> wrote:
> There was a LONG thread on this list a few months ago about monitoring for
> DDL changes. It should prove helpful (or cause you to despair).
[snip]

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
On 12/20/18 7:54 AM, Mitar wrote:
> Hi!
>
> Thanks. Care for a link or at least thread subject?

https://www.postgresql.org/message-id/CA%2BFnnTxqScf6mDw_7HLMfu7YTagPrsYUw-bc%3DOKsBQ0TqprvEA%40mail.gmail.com

[snip]

--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, Dec 20, 2018 at 4:17 AM Mitar <mmitar@gmail.com> wrote:
> I am trying to see how I could use NOTIFY/LISTEN to watch changes of a
> complicated SELECT query which spans multiple tables. Ideally, I would
> like to leave to PostgreSQL to determine when some data (and which
> data) in the result of the SELECT query has changed. So I am thinking
> that creating a temporary view using that query could be a way, only
> if I would find a way to watch such view for changes somehow.
[snip]

If, as I assume, the FROM clause of the SELECT is essentially static, just with varying WHERE conditions, you could just use an AFTER INSERT/UPDATE/DELETE trigger on each of the tables involved to send a NOTIFY whenever data that is exposed by the SELECT is changed. You can deliver a payload, such as the table name, primary key value and, with a little work, even a list of fields that were modified as a JSON blob, and let the application do whatever needs to be done to react to the changes -- issue other queries, etc.

Of course that depends on your application knowing when it's appropriate to NOTIFY, or being able to handle spurious NOTIFYs.

HTH,

--
Mike Rylander
 | Executive Director
 | Equinox Open Library Initiative
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: miker@equinoxinitiative.org
 | web: http://equinoxinitiative.org
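[Editorial sketch of the trigger-plus-NOTIFY approach described above. Channel, table, and function names are illustrative, not from the thread; this assumes PostgreSQL 9.4 or later and a live server.]

```sql
-- Generic row-level trigger: send the table name, operation, and the
-- affected row as a JSON payload on channel "watchers".
CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
DECLARE
  payload json;
BEGIN
  payload := json_build_object(
    'table', TG_TABLE_NAME,
    'op',    TG_OP,
    'row',   CASE WHEN TG_OP = 'DELETE'
                  THEN row_to_json(OLD)
                  ELSE row_to_json(NEW) END
  );
  PERFORM pg_notify('watchers', payload::text);
  RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comments_notify
  AFTER INSERT OR UPDATE OR DELETE ON comments
  FOR EACH ROW EXECUTE PROCEDURE notify_change();
```

A client that has issued `LISTEN watchers;` then receives one JSON payload per modified row. Note that a NOTIFY payload is limited to slightly under 8000 bytes, so very wide rows may need to be reduced to a primary key in the payload.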
Hi!

Ah, that was a thread about table changes. I am interested in data changes (results from a query).

Thanks.

Mitar

On Thu, Dec 20, 2018 at 8:01 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> https://www.postgresql.org/message-id/CA%2BFnnTxqScf6mDw_7HLMfu7YTagPrsYUw-bc%3DOKsBQ0TqprvEA%40mail.gmail.com
[snip]

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
> On Dec 20, 2018, at 1:04 PM, Mitar <mmitar@gmail.com> wrote:
>
> Ah, that was a thread about table changes. I am interested in data
> changes (results from a query).

Are you hoping to see the difference in the returned values for successive calls to the same query?
Hi!

On Thu, Dec 20, 2018 at 8:11 AM Mike Rylander <mrylander@gmail.com> wrote:
> If, as I assume, the FROM clause of the SELECT is essentially static,
> just with varying WHERE conditions, you could just use an AFTER
> INSERT/UPDATE/DELETE trigger on each of the tables involved to send a
> NOTIFY whenever data that is exposed by the SELECT is changed.

You mean that for every SQL query I am making to the database, I would manually determine which tables are involved and then set up triggers with NOTIFY so that the client knows the SELECT might have changed?

I am trying to see if this could be abstracted out so that it would be done automatically for any query. I have looked into using EXPLAIN to get a list of tables involved in a query, but the issue is that it looks non-trivial to determine what has really changed as a consequence of those tables changing. I would have to cache the result of the query myself and then do a diff myself? Are there any ways to do this inside PostgreSQL without having to cache results on the client and diff them there?

> You can deliver a payload, such as the table name, primary key value and,
> with a little work, even a list of fields that were modified as a JSON
> blob, and let the application do whatever needs to be done to react
> to the changes -- issue other queries, etc.

But this would mean that I would have to know how changes on the involved tables influence the query results. I would like not to have to do SQL query parsing and understanding on the client. So ideally, I would get the information directly from PostgreSQL. For me, an API where I could define an AFTER UPDATE ... FOR EACH ROW trigger on a view would be perfect. In that trigger I could get information about which rows of the view changed and then use NOTIFY to inform the client. Or even use transition relations to get the old and new state in the FOR EACH STATEMENT case (but then I would probably still have to diff them myself). And a view can represent any query, without me having to understand and parse it.

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
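[Editorial note: the transition relations mentioned above do work today on plain tables (PostgreSQL 10+); it is the view-level equivalent that is missing. A sketch on a table, with illustrative names:]

```sql
-- Statement-level AFTER trigger with transition relations: fires once
-- per statement, with all affected rows exposed as the relations
-- old_rows and new_rows. Joining them on the primary key would allow
-- computing a per-statement diff inside the trigger.
CREATE OR REPLACE FUNCTION report_changes() RETURNS trigger AS $$
DECLARE
  n bigint;
BEGIN
  SELECT count(*) INTO n FROM new_rows;
  PERFORM pg_notify('changes',
                    TG_TABLE_NAME || ': ' || n || ' row(s) updated');
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comments_changes
  AFTER UPDATE ON comments
  REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE report_changes();
```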
Hi!

On Thu, Dec 20, 2018 at 12:07 PM Rob Sargent <robjsargent@gmail.com> wrote:
> Are you hoping to see the difference in the returned values for successive calls to the same query?

I would like to get, in as close to real time as possible, a notification that the results of a query would have changed if I were to run the same query again, without having to do polling and without having to do diffing myself.

So if my query were SELECT * FROM table, then an AFTER UPDATE trigger on that table would give me such information on a row or statement basis. But I am not doing just such queries. So I was thinking that I could create a view and then do SELECT * FROM view, and get update triggers for changes on the view. Those changes would happen because of changes to the underlying tables. And I would like to be able to know when, and what, in that query's results has changed when the underlying tables change.

I have found this package [1] which seems close in spirit, but it does not seem to work in all cases. There is also an older, different package. [2] I would like to see if something like this could be done inside the database itself.

[1] https://github.com/nothingisdead/pg-live-query
[2] https://github.com/numtel/pg-live-select

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
From: Mitar <mmitar@gmail.com>
> On Thu, Dec 20, 2018 at 12:07 PM Rob Sargent <robjsargent@gmail.com> wrote:
> > Are you hoping to see the difference in the returned values for successive calls to the same query?
>
> i would like to in as close to real-time as possible get notification when results of a query would have changed if I would do the same query again, without me having to do polling or without me having to do diffing.

Hmm, I guess I could see that as long as the DB wasn't too [write] busy, else you'd be flooded with notifications.

Some years ago when I was working on a web app in Perl (which uses the DBI module for all communication to the DB), I subclassed DBI in order to see all calls to the DB. For most statements, I just let all the calls go thru. But for inserts, I wrote to a file the data that was being inserted; for deletes I wrote what was being deleted; and for updates I wrote the before and after values. (For the last 2 I changed the delete/update into a select to get the data.) It made it much easier to see how data changed -- especially when I was new to the app.

You could do something like that, where you have an interceptor that reports on data changes, filtering/searching for just the parts you want as you see fit. Of course, that would be just for your app; it wouldn't catch changes made from psql and other tools. Maybe it's a useful idea for you ... or maybe not. 😊

If you had to have all statements no matter what tool was used to change data, I'd probably change log_statements to "all", "tail -f" the Pg log, and "do the right thing" (which could be non-trivial).
Kevin
On 12/20/18 2:20 PM, Mitar wrote:
> i would like to in as close to real-time as possible get notification
> when results of a query would have changed if I would do the same
> query again, without me having to do polling or without me having to
> do diffing.

That's nothing like what you wrote in the Subject line.

--
Angular momentum makes the world go 'round.
On Thu, 20 Dec 2018 12:12:14 -0800, Mitar <mmitar@gmail.com> wrote:
> You mean that for every SQL query I am making to a database, I would
> manually determine which tables are involved and then setup triggers
> with NOTIFY that the SELECT might have changed?

You can just put an update trigger on every table. You aren't forced to listen for notifications.

> I am trying to see if this could be something I could abstract out
> that it would be done automatically for any query. [snip] Are
> there any ways to do this inside PostgreSQL without having to cache
> results on the client and do it there?

You don't need to cache anything on the client. An update trigger on a table can notify a listening client when data is changed.

The difficulty is that views are not guaranteed to be updateable. As a technical matter, you can put a trigger on a view, but it may never fire.

AFAIK, update and delete triggers do work on materialized views, because they really are physical tables. Depending on your use case, materialized views may or may not be of help to you.

> But this would mean that I would have to know how changes on involved
> tables influence query results.

The trigger function itself is agnostic WRT the format of the table - the old and new row data are provided generically as records, and you can convert the record data, e.g., to JSON or XML, without knowing its format. AFAIHS, you really only need to know the table format to inspect or modify the row data.

> I would like to not have to do SQL query parsing and understanding
> on the client. So ideally, I would get information directly from
> PostgreSQL. For me, an API where I could do AFTER UPDATE trigger on
> FOR EACH ROW on a view would be perfect. In that trigger I could get
> information which rows of the view changed and then use NOTIFY to
> inform the client.

You might want to use BEFORE UPDATE so you get both the old and new row data.

YMMV,
George
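[Editorial sketch of such a format-agnostic trigger, reporting only the columns that actually changed. Names are illustrative; note that in a row-level UPDATE trigger both OLD and NEW are available in AFTER triggers as well, so BEFORE is not strictly required for this. Assumes PostgreSQL 9.5+ for to_jsonb.]

```sql
-- Report only the fields whose value changed, as a JSON object,
-- without knowing the table's column list in advance.
CREATE OR REPLACE FUNCTION notify_changed_fields() RETURNS trigger AS $$
DECLARE
  changed jsonb;
BEGIN
  SELECT COALESCE(jsonb_object_agg(n.key, n.value), '{}'::jsonb)
    INTO changed
    FROM jsonb_each(to_jsonb(NEW)) AS n
    JOIN jsonb_each(to_jsonb(OLD)) AS o ON o.key = n.key
   WHERE n.value IS DISTINCT FROM o.value;
  PERFORM pg_notify('changes',
                    json_build_object('table', TG_TABLE_NAME,
                                      'changed', changed)::text);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comments_fields
  AFTER UPDATE ON comments
  FOR EACH ROW EXECUTE PROCEDURE notify_changed_fields();
```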
Hi!
On Thu, Dec 20, 2018 at 12:54 PM Kevin Brannen <KBrannen@efji.com> wrote:
> Hmm, I guess I could see that as long as the DB wasn't too [write] busy, else you'd be flooded with notifications.

Sure. But this could be resolved by allowing notifications to be batched together -- debounced. So one could maybe configure how often at most one wants such notifications, and notifications arriving more often would be combined into one.

> Maybe it's a useful idea for you ... or maybe not. 😊

Thanks. Yes, this is one approach: hooking into every modify call at the app level and in this way having some information about what is changing. I would prefer doing it in the database, though, so that it would be independent of the source of the change. Moreover, not all UPDATE queries really end up changing the data.
Mitar
--
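[Editorial note: PostgreSQL already does a limited form of this batching. Identical notifications (same channel, same payload) issued within one transaction are folded into a single delivery, and a statement-level trigger collapses per-row noise into one message per statement. A sketch, with illustrative names:]

```sql
-- One notification per modifying statement, not per row. Within a
-- transaction, duplicate NOTIFYs with the same channel and payload
-- may be delivered only once.
CREATE OR REPLACE FUNCTION notify_dirty() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('dirty', TG_TABLE_NAME);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comments_dirty
  AFTER INSERT OR UPDATE OR DELETE ON comments
  FOR EACH STATEMENT EXECUTE PROCEDURE notify_dirty();
```

Any further debouncing (e.g. a minimum interval between reactions) would have to live in the listening client.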
Hi!

Thanks for more responses.

On Thu, Dec 20, 2018 at 6:28 PM George Neuner <gneuner2@comcast.net> wrote:
> You can just put an update trigger on every table. You aren't forced
> to listen for notifications.

You are right. It is easier to just have triggers in advance.

> The difficulty is that views are not guaranteed to be updateable.

What do you mean by that? Just to be clear: I care only about read-only views. The changes to views I care about happen because the underlying tables are updated. I do not care about UPDATE queries against the views themselves.

So you mean here that it is not guaranteed that you can make an UPDATE query against a view? Yes, that I know. But if the underlying tables change, the view is always updated, no? I mean, this is the main purpose of a view. :-)

> As a technical matter, you can put a trigger on a view, but it may never fire.

It fires only for UPDATE queries against the view itself (and only if an INSTEAD OF trigger does not invalidate the update). But it never fires for updates which happen because of changes to the underlying tables.

I would like to know:

a) Do people agree/think that would be a good API for my use case?
b) How hard would it be to implement/develop something like that? Is this something PostgreSQL already knows internally, and it is just a question of exposing it?
c) Is there some better way to achieve this?

> AFAIK, update and delete triggers do work on materialized views,
> because they really are physical tables. Depending on your use case,
> materialized views may or may not be of help to you.

Yes, materialized views are too heavy for me. But having update and delete triggers on memory-only temporary views would be perfect. Also, materialized views have to be manually refreshed, no? So it is not really true that they get updated automatically (and that update triggers would run) as soon as the underlying tables are modified?

> You might want to use BEFORE UPDATE so you get both the old and new
> row data.

Not sure how this helps. If I have a query like (or a view defined with such a query):

SELECT "_id", "body",
       (SELECT row_to_json(posts)
          FROM posts
         WHERE posts."_id" = comments."postId") AS "post"
  FROM comments

and I get a trigger notification that some row in the "comments" table has changed, how do I get the updated row of the query results (or of the view defined with such a query)? I would have to parse the SQL and figure out how to do the transformation myself, no? So I am not sure how triggers on the underlying tables can really tell me what in the view has been updated.

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
On Fri, 21 Dec 2018 20:49:23 -0800, Mitar <mmitar@gmail.com> wrote:
> So you mean here that it is not guaranteed that you can make an UPDATE
> query against a view? Yes, that I know. But if the underlying tables
> change, the view is always updated, no? I mean, this is the main purpose
> of a view. :-)

Terminology. <grin> UPDATE vs update.

I'm certain you know this already, but to be clear: a normal view really is just a SELECT whose results reflect the current data in the underlying tables. That is, a normal view is *code*, NOT data, and its result is an anonymous *virtual* table that exists only in the transaction that applied the view. Thus the results can't be monitored for changes without employing some kind of deliberate caching.

A materialized view IS exactly such a deliberate cache of results from applying a view. It is a real table that can be monitored for changes using INSERT, UPDATE and/or DELETE triggers.

> It fires only for UPDATE queries against the view itself (and only if
> an INSTEAD OF trigger does not invalidate the update). But it never fires
> for updates which happen because of changes to the underlying tables.

Right. A trigger on a (normal) view actually is on the virtual result table - but that table doesn't exist outside of the transaction that applied the view.

I'm not certain offhand, but I *believe* that - like a SELECT - view results will change on the fly during a transaction if the underlying tables change and the isolation settings allow seeing it. But even if a view will update while open, to use this you'd have to hold the result set open (with a cursor) while also keeping the transaction open. Once the result set is closed, the view results are gone.

With a materialized view, you must apply the view code again (call REFRESH) to see changes to the underlying tables - it doesn't happen automagically. But when refreshed, triggers on the cache table would react to changes.

> I would like to know:
>
> a) Do people agree/think that would be a good API for my use case?
> b) How hard would it be to implement/develop something like that? Is
> this something PostgreSQL already knows internally, and it is just a
> question of exposing it?

It probably would be possible to take a trigger set on a view and transitively trigger on the base tables underlying it. But to what end? The results (of a normal view) don't persist beyond the current transaction, and allowing them to do so would, AFAICS, be a violation of the SQL standard. Postgresql is not going to do that (certainly not for a new feature, only for backward compatibility).

> c) Is there some better way to achieve this?

Unfortunately, materialized views are the only reasonable server-side solution that I can think of.

> Also, materialized views have to be manually refreshed, no? So it is
> not really true that they get updated automatically (and that update
> triggers would run) as soon as the underlying tables are modified?

Yes. But you can do that with triggers on the base tables. Or timers, if "most current" information is not critical.

Perhaps you can use synchronized normal and materialized views: use the normal view for current information in transactions, and the materialized view to asynchronously update passive LISTEN clients with delta information.

> > I would like to not have to do SQL query parsing and understanding
> > on the client.

That I understand. <grin> I'm a compiler and language hobbyist ... even just parsing modern SQL can be painful.

> > You might want to use BEFORE UPDATE so you get both the old and new
> > row data.
>
> Not sure how this helps.

If you know specifically what columns have changed, only they need be communicated to a LISTEN client. Not necessary to send the whole row (which may be much larger). AFTER shows you only the new row. BEFORE shows you both the old and new rows, so you can determine what changed (and how, if relevant).

> So, I am not sure how triggers on underlying tables can really inform
> how to know what in the view has been updated?

They can't - if any results are computed, you'd have to apply the view again to see the changes. And then you'd have a whole new result set, not just the updates to the old one. I don't see any way out of this that doesn't involve a materialized view. It's the only way I can see to get just the updates and not deal with the whole result set again.

YMMV,
George
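[Editorial sketch of the synchronization suggested above. All names are illustrative. Note that REFRESH ... CONCURRENTLY cannot run inside a transaction block, and hence not inside a trigger, so the refresh itself is left to a listening worker.]

```sql
-- Cache the query's results in a materialized view.
CREATE MATERIALIZED VIEW comment_feed AS
  SELECT c."_id", c."body", row_to_json(p) AS "post"
    FROM comments c
    JOIN posts p ON p."_id" = c."postId";

-- A unique index is required for REFRESH ... CONCURRENTLY.
CREATE UNIQUE INDEX ON comment_feed ("_id");

-- Base-table triggers only signal that the cache is stale.
CREATE OR REPLACE FUNCTION mark_stale() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('stale', 'comment_feed');
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comments_stale AFTER INSERT OR UPDATE OR DELETE
  ON comments FOR EACH STATEMENT EXECUTE PROCEDURE mark_stale();
CREATE TRIGGER posts_stale AFTER INSERT OR UPDATE OR DELETE
  ON posts FOR EACH STATEMENT EXECUTE PROCEDURE mark_stale();

-- A worker that LISTENs on 'stale' then runs, outside any explicit
-- transaction:
--   REFRESH MATERIALIZED VIEW CONCURRENTLY comment_feed;
-- and computes the delta, e.g. against a snapshot it kept of the
-- previous contents.
```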
Hi!

On Fri, Dec 21, 2018 at 11:10 PM George Neuner <gneuner2@comcast.net> wrote:
> A materialized view IS exactly such a deliberate cache of results from
> applying a view. It is a real table that can be monitored for changes
> using INSERT, UPDATE and/or DELETE triggers.

Caching is needed if you want to compute the difference between the previous version and the new one. But if you only want to know the new value, then I could imagine that a simple implementation would, on every change to any underlying table, check whether the change matches the query's selectors and, if so, apply the query's operations/projections and produce the new value. So yes, you need caching if you want to decrease CPU use, but you could also see it as new values being computed again and again through the query. Whether the caching you are mentioning would really improve performance I do not know, so it might be premature optimization.

If we do not go down the cache path, then it seems there is no way around "apply this query again to those updated rows from the table". In a way I see the query as a transformation (in relational algebra) of the original tables to the results, and I would need to be able to apply that transformation again to the new rows. And if some additional data were necessary (some rows from other, non-changed tables), it would just be recomputed, instead of using a cache.

If we do go down the cache path, then I agree, materialized views seem nice. But I would prefer temporary materialized views: they should be cleaned up at the end of the session. Moreover, they should ideally live just in memory, not on disk. Materialized views are currently stored to disk, no?

> With a materialized view, you must apply the view code again (call
> REFRESH) to see changes to the underlying tables - it doesn't happen
> automagically. But when refreshed, triggers on the cache table would
> react to changes.

So you are saying I could use triggers on the cache table to know what really changed, instead of having to compute the diff myself? Interesting.

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Hi, perhaps you can use triggers for some tables.
Regards.
From: Mitar <mmitar@gmail.com>
Sent: Saturday, December 22, 2018 1:21:49 AM
To: Kevin Brannen
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Watching for view changes
[snip]
Is this of theoretical interest (up to and including a specification/requirement), or is it a practical concern (i.e., needing to know when to update somebody's dashboard widget, but the query is too slow to simply refresh on demand)?
> On Dec 22, 2018, at 9:42 AM, Ricardo Martin Gomez <rimartingomez@hotmail.com> wrote:
>
> Hi, perhaps you can use triggers for some tables.
[snip]
Hi!
This is of very practical concern. :-) I have many apps developed in Meteor [1] using MongoDB. The whole Meteor stack is about reactive programming, where the web UI automatically rerenders as data in the database changes. Meteor achieves this using complicated server-side code which tails the MongoDB oplog and maps the changes it observes onto registered reactive queries, determining how their results update. This is then pushed to the client, which rerenders.
This approach has scalability issues, and it also costs a lot of server-side resources to first copy data from the DB into a server-side component and then keep that state in sync with the DB. It generally has to reimplement oplog parsing, and query parsing and evaluation, to be able to do all that.
I like this declarative style of programming, where you define reactive queries that select data from the DB, define a transformation, and then render the result in the UI. As data in the DB changes, everything else gets updated automatically. It is a pretty nice way of programming, without having to think about all the places which might be updating the DB and how to update the UI based on each of them. Maybe not for everyone and all tasks, but I generally work with collaborative online tools where such a real-time aspect of working together is pretty neat.
So I like PostgreSQL, and I have used it in other apps. Now I am trying to see if I could find an efficient way for PostgreSQL to run such a reactive query and send me data as the query's result changes. I think the DB already has to deal with most of this logic, and wiring it together in the DB instead of in the server side of the app might allow better performance and scaling.
For example, CREATE PUBLICATION seems a reasonable API as well (instead of a trigger + notification + working around the limit on how much data can be sent in a notification), but it does not work on materialized views. Why is that? I thought materialized views are the same as tables, just with a nicer API to copy a query's results into those tables when wanted. Currently what is there is very similar to what MongoDB provides: publication/observe at the table level. So if I were not using joins, I could SUBSCRIBE to the PUBLICATION. I would still have to implement the logic to map those changes to changes in the result of a query, though, to know how to update the results. If I could publish a materialized view, PostgreSQL could do that mapping for me. And I could also use joins.

It seems materialized views are close to this, but not perfect. So I have some questions:
- Is there a technical reason why a PUBLICATION cannot be created on a materialized view? I mean, I could probably create/simulate materialized views manually through regular tables. What are the differences between regular tables and materialized views?
- Are there limits on how many subscribers a PUBLICATION can effectively have?
- For my case it would be great if materialized views could be TEMPORARY, in-memory (and if PostgreSQL runs out of buffer space for one, I would prefer an error), and UNLOGGED. Are there any particular reasons which would prevent them from being implemented as such?
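For context on the first question, a quick sketch of the restriction as it stands (all names hypothetical; the exact error text may vary by version):

```sql
-- A hypothetical base table and a materialized view derived from it.
CREATE TABLE my_table (id int PRIMARY KEY, total numeric);
CREATE MATERIALIZED VIEW my_matview AS
  SELECT id, total FROM my_table WHERE total > 0;

-- Publishing the base table works:
CREATE PUBLICATION my_pub FOR TABLE my_table;

-- Publishing the materialized view is rejected; logical replication
-- only supports base tables:
ALTER PUBLICATION my_pub ADD TABLE my_matview;
-- ERROR:  "my_matview" is not a table
```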
Mitar
On Sat, Dec 22, 2018 at 1:16 PM Rob Sargent <robjsargent@gmail.com> wrote:
> Is this of theoretical interest (up to and including a specification/requirement), or is this a practical concern (i.e. the need to know when to update somebody's dashboard widget, where the query is too slow to simply refresh on-demand)?
On Fri, 21 Dec 2018 23:41:16 -0800, Mitar <mmitar@gmail.com> wrote:

>Hi!
>
>On Fri, Dec 21, 2018 at 11:10 PM George Neuner <gneuner2@comcast.net> wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view. It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Caching is needed if you want to compute a difference between previous
>version and new. But if you want to just know new value, then I could
>imagine that (a simple implementation would) on every change to any
>underlying table check if this change matches selectors of the query
>and if such apply its operations/projections and produce the new
>value.

Yes, that could be done. But it isn't. In effect you are asking the DBMS also to be a spreadsheet: i.e. change this cell and everything that depends on it gets recomputed.

A spreadsheet is an order of magnitude simpler to implement than a DBMS, but the combination would be an order of magnitude (or more) harder. Even object graph databases don't offer spreadsheet functionality, and it would be a lot easier to do there than in a table relational system.

>So yes, you need caching if you want to decrease CPU use, but you
>could also see it as new values being computed again and again through
>query. Would such caching you are mentioning really improve
>performance, I do not know, so it might be premature optimization?

It may take only 3 cycles to multiply two numbers, but it can take thousands of cycles [or millions if the data is on disk] to get those two numbers into the multiplier. There always are exceptions, but the general rule is that whenever the result requires:
 - significant computation,
 - significant resources, or
 - significant time
then you should cache the result instead of recomputing it.

Joins and sorts can take a whole lot of memory (and spill onto disk if they overflow the work buffer). A fetch of a table or index not in memory is simple but takes a lot of time - as well as maybe pushing something else out (increasing the complexity of a competing query).

>If we do go down the cache path, then I agree, materialized views
>seems nice, but I would prefer temporary materialized views: they
>should be cleaned up at the end of the session. Moreover, they should
>be ideally just in memory, not really on disk. Materialized views are
>currently stored to disk, no?

In PG, all *named* tables are backed on disk - even temporary tables. Only anonymous tables of query results can exist entirely in memory [and even they can spill onto disk when necessary].

With enough memory you can cache all your tables in shared buffers and have enough extra that you never run out of work buffers and never overflow a work buffer. But that is the best you can achieve with PG.

George
Hi!

On Sun, Dec 23, 2018 at 1:00 AM George Neuner <gneuner2@comcast.net> wrote:
> A spreadsheet is an order of magnitude simpler to implement than a
> DBMS, but the combination would be an order of magnitude (or more)
> harder. Even object graph databases don't offer spreadsheet
> functionality, and it would be a lot easier to do there than in a
> table relational system.

But having that readily available would be so cool. :-) Especially because it is hard. And it does not necessarily have to be the full graph. Just one level of dependencies. Then those are recomputed. And then anything depending on the now-changed values gets recomputed again. And so on. So there is no need to traverse the whole graph at once.

> In PG, all *named* tables are backed on disk - even temporary tables.
> Only anonymous tables of query results can exist entirely in memory
> [and even they can spill onto disk when necessary].
>
> With enough memory you can cache all your tables in shared buffers and
> have enough extra that you never run out of work buffers and never
> overflow a work buffer. But that is the best you can achieve with PG.

Thanks for all this input. So I am now thinking in terms of materialized views. For my case it would be great if materialized views could be TEMPORARY (removed at the end of the session), in-memory (and if PostgreSQL runs out of buffer space for one, I would prefer an error instead of spilling to disk), and UNLOGGED. Are there any particular reasons which would prevent them from being implemented as such? I thought that materialized views are just a fancy table with a stored query, so that you can easily REFRESH them instead of doing that yourself.

I can then wire triggers on the underlying tables to REFRESH materialized views automatically. Is there some easy way to debounce those refreshes? If I just blindly trigger REFRESH in every row trigger, this could do a lot of work. I would prefer to combine all changes into, for example, 100 ms batches and refresh only once per 100 ms. So in some way I would like to be able to control the level of real-time I need. I have found a blog post [1] about this, but it seems pretty tricky and requires an outside cron job. For a 100 ms batching time this feels like something better done inside PostgreSQL itself.

The last question is how to get changes in materialized views streamed to the client. It seems one option is a trigger on the materialized view which uses NOTIFY to tell the client about the change. But NOTIFY has a limit on the payload size, so I cannot just send the change itself to the client. It seems I would need an additional table to store the change; the client would then get a notification, read from that table, and remove the rows read. So in a way I would need my own queue for changes. Any other suggestions on how to do that?

I looked into PUBLICATION and SUBSCRIBE, but it seems this is only supported between servers, not server-client, and also works only on base tables, not materialized views (not sure again why, because aren't materialized views just tables?). Would it be possible for a client to subscribe instead of a server?

[1] https://onethingsimple.com/2017/10/sync-materialized-views-after-debounce-period/

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
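The pieces described above could be wired together roughly like this (a sketch only; `my_cache`, the source tables `a` and `b`, and the channel name are all hypothetical, and the debouncing itself happens in the listening client):

```sql
-- Hypothetical materialized view over a join of two source tables.
CREATE MATERIALIZED VIEW my_cache AS
  SELECT a.id, b.total FROM a JOIN b ON b.a_id = a.id;
-- REFRESH ... CONCURRENTLY requires a unique index on the view.
CREATE UNIQUE INDEX ON my_cache (id);

-- Cheap statement-level triggers on source tables just signal
-- "something changed"; they carry no payload.
CREATE FUNCTION source_changed() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('refresh_my_cache', '');
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER a_changed AFTER INSERT OR UPDATE OR DELETE ON a
  FOR EACH STATEMENT EXECUTE PROCEDURE source_changed();
-- (and the same trigger on table b)

-- A client LISTENing on 'refresh_my_cache' debounces the notifications
-- (e.g. at most once per 100 ms) and then runs:
REFRESH MATERIALIZED VIEW CONCURRENTLY my_cache;
```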
Hi!

On Fri, Dec 21, 2018 at 11:10 PM George Neuner <gneuner2@comcast.net> wrote:
> A materialized view IS exactly such a deliberate cache of results from
> applying a view. It is a real table that can be monitored for changes
> using INSERT, UPDATE and/or DELETE triggers.

Are you sure one can use triggers on a materialized view? I am getting:

"my_materialized_view" is not a table or view

as an error when I am trying to create a trigger on a materialized view.

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
On Sun, 23 Dec 2018 23:06:51 -0800, Mitar <mmitar@gmail.com> wrote:

>On Fri, Dec 21, 2018 at 11:10 PM George Neuner <gneuner2@comcast.net> wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view. It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Are you sure one can use triggers on a materialized view? I am getting:
>
>"my_materialized_view" is not a table or view
>
>as an error when I am trying to create a trigger on materialized view.

IIRC the cache table's name is generated. I don't know the proper incantations to get it from the catalogs, but an easy way to find it is to put the materialized view into its own tablespace, then search pg_tables for objects in that space.

George
On Sun, 23 Dec 2018 10:10:50 -0800, Mitar <mmitar@gmail.com> wrote:

>I can then wire triggers on underlying tables to REFRESH materialized
>views automatically. Is there some easy way to debounce those
>refreshes? If I just blindly trigger REFRESH in every row trigger,
>this could do a lot of work. I would prefer to combine all changes for
>example into 100 ms batches and refresh only once per 100 ms. So in
>some way I would like to be able to control the level of real-time I
>would need. I have found a blog post [1] about this, but it seems
>pretty tricky and requires an outside cron job. For 100 ms batching
>time this feels like something better done inside PostgreSQL itself.

See the pg_cron extension. But cron doesn't allow sub-second scheduling. Why do you need it so frequently? PG is not a real time platform, or even an all-in-memory database. If you try to use it that way, you are guaranteed to have problems.

>The last question is how do I get changes in materialized views
>streamed to the client. It seems one option is a trigger on the
>materialized view which uses NOTIFY to tell the client about the
>change. But NOTIFY has limit on the payload size, so I cannot just
>send the change to the client.

If you trigger by row, the NOTIFY payload is just that one row. If the data can't fit into the 8K payload, then you need to send some kind of row id and have the client read the changes explicitly.

Also remember that you may have to deal with DELETEd rows. If you can't send row data by NOTIFY, then the client HAS to cache the whole view anyway to see what's been deleted.

>It seems I would need additional table
>to store the change and then client would get notification, read from
>that table, and remove the rows read. So in a way I would need my own
>queue for changes.

There's no reason to do that. The client has to interpret the view changes and incorporate them into its own local data structures. Segregating new/modified rows separately on the server side seems to me to be a waste of effort.

If it makes sense, have the client collect some number of notifications and read all the indicated rows in one query.

George
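The id-only scheme George describes could look roughly like this (a sketch; `my_cache`, the channel name, and the sample ids are hypothetical):

```sql
-- Trigger side: put only the operation and primary key into the
-- payload, staying well under the ~8 KB NOTIFY payload limit.
CREATE FUNCTION cache_row_changed() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('cache_changed',
                    TG_OP || ':' || COALESCE(NEW.id, OLD.id)::text);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Client side: collect notified ids for a short window, then read all
-- the changed rows in one query. For DELETE notifications the id is
-- simply gone from the cache, so nothing needs to be fetched.
SELECT * FROM my_cache WHERE id IN (1, 7, 42);
```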
Hi!

On Mon, Dec 24, 2018 at 12:20 AM George Neuner <gneuner2@comcast.net> wrote:
> Also remember that you may have to deal with DELETEd rows. If you
> can't send row data by NOTIFY, then the client HAS to cache the whole
> view anyway to see what's been deleted.

The client side (web browser) is caching the view (because it is rendering it). I do not want the server-side component (the PostgreSQL client) to have to cache it as well. So I will use a materialized view to cache it in PostgreSQL directly, and then sync it all the way up to the web browser. Not sure if this changes anything?

> If it makes sense, have the client collect some number of
> notifications and read all the indicated rows in one query.

You are right. Thanks again.

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Hi!

On Sun, Dec 23, 2018 at 11:59 PM George Neuner <gneuner2@comcast.net> wrote:
> IIRC the cache table's name is generated. I don't know the proper
> incantations to get it from the catalogs, but an easy way to find it
> is to put the materialized view into its own tablespace, then search
> pg_tables for objects in that space.

Oh, what a trick. :-)

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Hi!

I am unable to find the cache table's name. :-(

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
On Mon, 24 Dec 2018 01:16:53 -0800, Mitar <mmitar@gmail.com> wrote:

>I am unable to find cache table's name. :-(

Did you execute the view or simply define it? I think the table is created the 1st time the view is executed.

George
Hi!

On Mon, Dec 24, 2018 at 1:31 AM George Neuner <gneuner2@comcast.net> wrote:
> Did you execute the view or simply define it? I think the table is
> created the 1st time the view is executed.

The view is populated with data and I can select from it. Still, I cannot find any table which would look like it in pg_tables.

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
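For what it is worth, this may be because a materialized view is not a separately named cache table at all: it is its own relation kind in the system catalogs, so it shows up in pg_matviews (and in pg_class with relkind 'm') rather than in pg_tables:

```sql
-- List materialized views via their dedicated catalog view:
SELECT schemaname, matviewname FROM pg_matviews;

-- Or directly from pg_class; relkind 'm' marks materialized views:
SELECT relname FROM pg_class WHERE relkind = 'm';
```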
Hi!

I started a patch to address this limitation. [1]

[1] https://www.postgresql.org/message-id/CAKLmikMN%2B0W79A7Wg4krK55cTLxb%3DTARdz2b%3Dvqp19ao6AAZjw%40mail.gmail.com

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Hi!

I have a followup to this thread. George, thanks for all the help. I spent some time investigating various approaches and am reporting some findings here, so that they might help others as well.

First, I tried the approach with MATERIALIZED VIEWs holding the cached contents of the query. I attached triggers to all source tables which then notified (using LISTEN/NOTIFY) the client about changes. The client throttled those notifications and eventually triggered a REFRESH MATERIALIZED VIEW. Because I also attached triggers to the materialized view, I got notifications (using LISTEN/NOTIFY) about which rows had changed. The client might then decide to fetch those rows as well.

For this to work well I made two patches: a patch to allow creation of TEMPORARY MATERIALIZED VIEWs [1] and a patch to allow attaching triggers to materialized views [2]. In the second patch I also changed the REFRESH MATERIALIZED VIEW CONCURRENTLY logic to issue not just DELETEs and INSERTs for changed rows, but to compute which rows have changed and issue UPDATEs for them as well. This makes it easier for the client to know what changed.

This worked well, but it was slower than some other packages I found on the web which try to provide similar functionality. I made a benchmark to compare them [3] and found out that there was room for improvement.

First, instead of sending updates of a MATERIALIZED VIEW using LISTEN/NOTIFY and then fetching rows, I used a trigger to copy changes to another temporary table, and then just used DELETE FROM temp_table RETURNING * to get all results from the table and return them to the client, all inside the same transaction, so data in that temporary table was never committed. This made things a bit better; mostly, the latency between a change and getting it to the client became more predictable. Before, there were sometimes quite large spikes.

Once I did that, I realized that in fact a MATERIALIZED VIEW is not really necessary.
All I need is a place to cache previous results of the query; I do not really care about updates to the MATERIALIZED VIEW itself. So I decided to do the following. If I already have a temporary table with previous results, then when I want to refresh my query, I create a new temporary table using CREATE TABLE AS with the PREPAREd query, compute the diff between those two tables in the same way REFRESH MATERIALIZED VIEW CONCURRENTLY does, in one query, and return those results to the client. Then I just DROP the old cache table and rename the new cache table to the old cache name. So instead of computing a diff, updating the materialized view, running triggers, and copying to a table, I just compute the diff and that is it. This works very well now. I made and published a package doing this [4].

Of course, the main problem is still that for every change in a source table I eventually have to refresh the query. Without more logic this can really become problematic. Ideally, we could reuse some ideas/code from Incremental View Maintenance [5] to compute what changes to the query results should happen based on changes to source tables. Then we could just push those to the client (and update the temporary table). We would then do a full refresh only when the changes could not be inferred.

Thanks again for everything.

[1] https://commitfest.postgresql.org/21/1951/
[2] https://commitfest.postgresql.org/21/1948/
[3] https://github.com/mitar/node-pg-reactivity-benchmark
[4] https://github.com/tozd/node-reactive-postgres
[5] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
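The snapshot-and-diff cycle described above can be sketched roughly as follows (table names, the `id` key, and the prepared statement name `my_query` are all hypothetical; the published implementation is node-reactive-postgres):

```sql
-- Take a new snapshot of the query results into a fresh temporary table.
-- CREATE TABLE AS EXECUTE runs a previously PREPAREd statement.
CREATE TEMPORARY TABLE cache_new AS EXECUTE my_query;

-- One-pass diff against the previous snapshot, in the spirit of
-- REFRESH MATERIALIZED VIEW CONCURRENTLY: rows only in the old snapshot
-- were deleted, rows only in the new one inserted, rows present in both
-- but different were updated.
SELECT COALESCE(n.id, o.id) AS id,
       CASE WHEN o.id IS NULL THEN 'insert'
            WHEN n.id IS NULL THEN 'delete'
            ELSE 'update'
       END AS op
FROM cache_new AS n
FULL OUTER JOIN cache_old AS o ON o.id = n.id
WHERE o.id IS NULL OR n.id IS NULL OR o IS DISTINCT FROM n;

-- The new snapshot becomes the next cycle's old one.
DROP TABLE cache_old;
ALTER TABLE cache_new RENAME TO cache_old;
```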