Thread: Watching for view changes

Watching for view changes

From
Mitar
Date:
Hi!

I am trying to see how I could use NOTIFY/LISTEN to watch changes of a
complicated SELECT query which spans multiple tables. Ideally, I would
like to leave to PostgreSQL to determine when some data (and which
data) in the result of the SELECT query has changed. So I am thinking
that creating a temporary view using that query could be a way, only
if I would find a way to watch such view for changes somehow.

But it seems this is not really possible. I looked into two mechanisms:

- Logical replication. Instead of NOTIFY/LISTEN I could simply create
a publication over a view and then subscribe to it. But it seems
logical replication can be done only over base tables and not views.
[1]
- Using "after" trigger on the view to get notification when the view
gets changed. I could even use  transition relations to have
information what changed. But sadly it seems that this is possible
only if there is also INSTEAD OF trigger on the view. But I would like
to get notification when the view has changed because underlying
tables have changed, and not because of an UPDATE query on the view
itself. Moreover, I do not really need writable views. [2]

So I wonder if I am missing anything. Is there some other best
practice how to get notifications when result of a query changes in
real-time? And information what changed?

How hard it would be to implement such triggers on a view for whenever
a view changes? Is there a process to make a feature request?

(Also, I have not really managed to get statement level "after"
triggers to be run on a view for at all. Because if I rewrite a query
with INSTEAD OF then triggers on those tables are triggered, not
really view's. So not sure what is even expected use there.)

[1] https://www.postgresql.org/docs/devel/logical-replication-restrictions.html
[2] https://www.postgresql.org/docs/devel/trigger-definition.html


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
Ron
Date:
On 12/20/18 3:17 AM, Mitar wrote:
> Hi!
>
> I am trying to see how I could use NOTIFY/LISTEN to watch changes of a
> complicated SELECT query which spans multiple tables. Ideally, I would
> like to leave to PostgreSQL to determine when some data (and which
> data) in the result of the SELECT query has changed. So I am thinking
[snip]

There was a LONG thread on this list a few months ago about monitoring for 
DDL changes.  It should prove helpful (or cause you to despair).


-- 
Angular momentum makes the world go 'round.


Re: Watching for view changes

From
Mitar
Date:
Hi!

Thanks. Care for a link or at least thread subject?

Also, Googling around this seems a pretty popular request. I am quite
surprised that there is not something out-of-box available for this,
efficient and scalable.


Mitar

On Thu, Dec 20, 2018 at 7:33 AM Ron <ronljohnsonjr@gmail.com> wrote:
>
> On 12/20/18 3:17 AM, Mitar wrote:
> > Hi!
> >
> > I am trying to see how I could use NOTIFY/LISTEN to watch changes of a
> > complicated SELECT query which spans multiple tables. Ideally, I would
> > like to leave to PostgreSQL to determine when some data (and which
> > data) in the result of the SELECT query has changed. So I am thinking
> [snip]
>
> There was a LONG thread on this list a few months ago about monitoring for
> DDL changes.  It should prove helpful (or cause you to despair).
>
>
> --
> Angular momentum makes the world go 'round.
>


-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
Adrian Klaver
Date:
On 12/20/18 7:54 AM, Mitar wrote:
> Hi!
> 
> Thanks. Care for a link or at least thread subject?

https://www.postgresql.org/message-id/CA%2BFnnTxqScf6mDw_7HLMfu7YTagPrsYUw-bc%3DOKsBQ0TqprvEA%40mail.gmail.com

> 
> Also, Googling around this seems a pretty popular request. I am quite
> surprised that there is not something out-of-box available for this,
> efficient and scalable.
> 
> 
> Mitar
> 
> On Thu, Dec 20, 2018 at 7:33 AM Ron <ronljohnsonjr@gmail.com> wrote:
>>
>> On 12/20/18 3:17 AM, Mitar wrote:
>>> Hi!
>>>
>>> I am trying to see how I could use NOTIFY/LISTEN to watch changes of a
>>> complicated SELECT query which spans multiple tables. Ideally, I would
>>> like to leave to PostgreSQL to determine when some data (and which
>>> data) in the result of the SELECT query has changed. So I am thinking
>> [snip]
>>
>> There was a LONG thread on this list a few months ago about monitoring for
>> DDL changes.  It should prove helpful (or cause you to despair).
>>
>>
>> --
>> Angular momentum makes the world go 'round.
>>
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Watching for view changes

From
Mike Rylander
Date:
On Thu, Dec 20, 2018 at 4:17 AM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> I am trying to see how I could use NOTIFY/LISTEN to watch changes of a
> complicated SELECT query which spans multiple tables. Ideally, I would
> like to leave to PostgreSQL to determine when some data (and which
> data) in the result of the SELECT query has changed. So I am thinking
> that creating a temporary view using that query could be a way, only
> if I would find a way to watch such view for changes somehow.
>

If, as I assume, the FROM clause of the SELECT is essentially static,
just with varying WHERE conditions, you could just use an AFTER
INSERT/UPDATE/DELETE trigger on each of the tables involved to send a
NOTIFY whenever data that is exposed by the SELECT is changed.  You
can deliver a payload, such as the table name, primary key value and,
with a little work, even a list of fields that were modified as a JSON
blob, and let the application do whatever needs to be done to react
the the changes -- issue other queries, etc.

Of course that depends on your application knowing when it's
appropriate to NOTIFY, or being able to handle spurious NOTIFYs.

HTH,

--
Mike Rylander
 | Executive Director
 | Equinox Open Library Initiative
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker@equinoxinitiative.org
 | web:  http://equinoxinitiative.org

> But it seems this is not really possible. I looked into two mechanisms:
>
> - Logical replication. Instead of NOTIFY/LISTEN I could simply create
> a publication over a view and then subscribe to it. But it seems
> logical replication can be done only over base tables and not views.
> [1]
> - Using "after" trigger on the view to get notification when the view
> gets changed. I could even use  transition relations to have
> information what changed. But sadly it seems that this is possible
> only if there is also INSTEAD OF trigger on the view. But I would like
> to get notification when the view has changed because underlying
> tables have changed, and not because of an UPDATE query on the view
> itself. Moreover, I do not really need writable views. [2]
>
> So I wonder if I am missing anything. Is there some other best
> practice how to get notifications when result of a query changes in
> real-time? And information what changed?
>
> How hard it would be to implement such triggers on a view for whenever
> a view changes? Is there a process to make a feature request?
>
> (Also, I have not really managed to get statement level "after"
> triggers to be run on a view for at all. Because if I rewrite a query
> with INSTEAD OF then triggers on those tables are triggered, not
> really view's. So not sure what is even expected use there.)
>
> [1] https://www.postgresql.org/docs/devel/logical-replication-restrictions.html
> [2] https://www.postgresql.org/docs/devel/trigger-definition.html
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>


Re: Watching for view changes

From
Mitar
Date:
Hi!

Ah, that was a thread about table changes. I am interested in data
changes (results from a query).

Thanks.


Mitar

On Thu, Dec 20, 2018 at 8:01 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 12/20/18 7:54 AM, Mitar wrote:
> > Hi!
> >
> > Thanks. Care for a link or at least thread subject?
>
> https://www.postgresql.org/message-id/CA%2BFnnTxqScf6mDw_7HLMfu7YTagPrsYUw-bc%3DOKsBQ0TqprvEA%40mail.gmail.com
>
> >
> > Also, Googling around this seems a pretty popular request. I am quite
> > surprised that there is not something out-of-box available for this,
> > efficient and scalable.
> >
> >
> > Mitar
> >
> > On Thu, Dec 20, 2018 at 7:33 AM Ron <ronljohnsonjr@gmail.com> wrote:
> >>
> >> On 12/20/18 3:17 AM, Mitar wrote:
> >>> Hi!
> >>>
> >>> I am trying to see how I could use NOTIFY/LISTEN to watch changes of a
> >>> complicated SELECT query which spans multiple tables. Ideally, I would
> >>> like to leave to PostgreSQL to determine when some data (and which
> >>> data) in the result of the SELECT query has changed. So I am thinking
> >> [snip]
> >>
> >> There was a LONG thread on this list a few months ago about monitoring for
> >> DDL changes.  It should prove helpful (or cause you to despair).
> >>
> >>
> >> --
> >> Angular momentum makes the world go 'round.
> >>
> >
> >
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
Rob Sargent
Date:

> On Dec 20, 2018, at 1:04 PM, Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> Ah, that was a thread about table changes. I am interested in data
> changes (results from a query).
>
> Thanks.
>
Are you hoping to see the difference in the returned values for successive calls to the same query?



Re: Watching for view changes

From
Mitar
Date:
Hi!

On Thu, Dec 20, 2018 at 8:11 AM Mike Rylander <mrylander@gmail.com> wrote:
> If, as I assume, the FROM clause of the SELECT is essentially static,
> just with varying WHERE conditions, you could just use an AFTER
> INSERT/UPDATE/DELETE trigger on each of the tables involved to send a
> NOTIFY whenever data that is exposed by the SELECT is changed.

You mean that for every SQL query I am making to a database, I would
manually determine which tables are involved and then setup triggers
with NOTIFY that the SELECT might have changed?

I am trying to see if this could be something I could abstract out
that it would be done automatically for any query. I have looked into
using EXPLAIN to get a list of tables involved in a query, but the
issue is that it look non-trivial to determine what has really changed
as a consequence of those tables changing. I would have to cache
myself what was the result of a query and then myself do a diff? Are
there any ways to do this inside PostgreSQL without having to cache
results on the client and do it there?

> You can deliver a payload, such as the table name, primary key value and,
> with a little work, even a list of fields that were modified as a JSON
> blob, and let the application do whatever needs to be done to react
> the the changes -- issue other queries, etc.

But this would mean that I would have to know how changes on involved
tables influence query results. I would like to not have to do SQL
query parsing and understanding on the client. So ideally, I would get
information directly from PostgreSQL. For me, an API where I could do
AFTER UPDATE trigger on FOR EACH ROW on a view would be perfect. In
that trigger I could get information which rows of the view changed
and then use NOTIFY to inform the client. Or even use transition
relations to get old and new state in the case FOR EACH STATEMENT (but
then I would still have to diff it probably myself). And view could
represent any query, without me having to try to understand and parse
it.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
Mitar
Date:
Hi!

On Thu, Dec 20, 2018 at 12:07 PM Rob Sargent <robjsargent@gmail.com> wrote:
> Are you hoping to see the difference in the returned values for successive calls to the same query?

i would like to in as close to real-time as possible get notification
when results of a query would have changed if I would do the same
query again, without me having to do polling or without me having to
do diffing.

So if my query was SELECT * FROM table then having after update
trigger on that table would give me such information on a row to row
basis, or statement basis. But I am not doing just such queries. So I
was thinking that I could create a view and then do SELECT * FROM view
to get update trigger on changes on the view. So changes would happen
because of changes do underlying tables. And I would like to be able
to know when and what in that query has changed when underlying tables
have changed.

I have found this package [1] which seems to be close in the idea, but
it does not seem to work in all cases. There is also an older
different package. [2] I would like to see if something like this
could be done inside database itself.

[1] https://github.com/nothingisdead/pg-live-query
[2] https://github.com/numtel/pg-live-select


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


RE: Watching for view changes

From
Kevin Brannen
Date:
From: Mitar <mmitar@gmail.com>

> On Thu, Dec 20, 2018 at 12:07 PM Rob Sargent <robjsargent@gmail.com> wrote:
> > Are you hoping to see the difference in the returned values for successive calls to the same query?
>
> i would like to in as close to real-time as possible get notification when results of a query would have changed if I
woulddo the same query again, without me having to do polling or without me having to do diffing.
 


Hmm, I guess I could see that as long as the DB wasn't too [write] busy, else you'd be flooded with notifications.

Some years ago when I was working on a web app in Perl (which uses the DBI module for all communication to the DB), I
subclassedDBI in order to see all calls to the DB. For most statements, I just let all the calls go thru. But for
inserts,I wrote to a file the data that was being inserted; for deletes I wrote what was being deleted; and for updates
Iwrote the before and after values. (For the last 2 I changed the delete/update into a select to get the data.) It made
itmuch easier to see how data changed -- especially when I was new to the app.
 

You could do something like that, where you have an interceptor that reports on data changes, filtering/searching for
justthe parts you want as you see fit. Of course, that would be just for your app, it wouldn't catch changes made from
psqland other tools.
 

Maybe it's a useful idea for you ... or maybe not. 😊

If you had to have all statements no matter what tool was used to change data, I'd probably change log_statements to
"all","tail -f" the Pg log, and "do the right thing" (which could be non-trivial).
 

Kevin
This e-mail transmission, and any documents, files or previous e-mail messages attached to it, may contain confidential
information.If you are not the intended recipient, or a person responsible for delivering it to the intended recipient,
youare hereby notified that any disclosure, distribution, review, copy or use of any of the information contained in or
attachedto this message is STRICTLY PROHIBITED. If you have received this transmission in error, please immediately
notifyus by reply e-mail, and destroy the original transmission and its attachments without reading them or saving them
todisk. Thank you.
 

Re: Watching for view changes

From
Ron
Date:
On 12/20/18 2:20 PM, Mitar wrote:
> Hi!
>
> On Thu, Dec 20, 2018 at 12:07 PM Rob Sargent <robjsargent@gmail.com> wrote:
>> Are you hoping to see the difference in the returned values for successive calls to the same query?
> i would like to in as close to real-time as possible get notification
> when results of a query would have changed if I would do the same
> query again, without me having to do polling or without me having to
> do diffing.

That's nothing like what you wrote in the Subject line.

-- 
Angular momentum makes the world go 'round.


Re: Watching for view changes

From
George Neuner
Date:
On Thu, 20 Dec 2018 12:12:14 -0800, Mitar <mmitar@gmail.com> wrote:


>On Thu, Dec 20, 2018 at 8:11 AM Mike Rylander <mrylander@gmail.com> wrote:
>> If, as I assume, the FROM clause of the SELECT is essentially static,
>> just with varying WHERE conditions, you could just use an AFTER
>> INSERT/UPDATE/DELETE trigger on each of the tables involved to send a
>> NOTIFY whenever data that is exposed by the SELECT is changed.
>
>You mean that for every SQL query I am making to a database, I would
>manually determine which tables are involved and then setup triggers
>with NOTIFY that the SELECT might have changed?

You can just put an update trigger on every table.  You aren't forced
to listen for notifications.


>I am trying to see if this could be something I could abstract out
>that it would be done automatically for any query. I have looked into
>using EXPLAIN to get a list of tables involved in a query, but the
>issue is that it look non-trivial to determine what has really changed
>as a consequence of those tables changing. I would have to cache
>myself what was the result of a query and then myself do a diff? Are
>there any ways to do this inside PostgreSQL without having to cache
>results on the client and do it there?

You don't need to cache anything on the client.  An update trigger on
a table can notify a listening client when data is changed.  

The difficulty is that views are not guaranteed to be updateable.  As
a technical matter, you can put a trigger on a view, but it may never
fire.

AFAIK, update and delete triggers do work on materialized views,
because they really are physical tables.  Depending on your use case,
materialized views may or may not be of help to you.



>> You can deliver a payload, such as the table name, primary key value and,
>> with a little work, even a list of fields that were modified as a JSON
>> blob, and let the application do whatever needs to be done to react
>> the the changes -- issue other queries, etc.
>
>But this would mean that I would have to know how changes on involved
>tables influence query results. 

The trigger function itself is agnostic WRT the format of the table -
the old and new row data are provided generically as records, and you
can to convert the record data, e.g., to JSON or XML, without knowing
its format.

AFAIHS, you really only need to know the table format to inspect or
modify the row data.


>I would like to not have to do SQL query parsing and understanding
>on the client. So ideally, I would get information directly from 
>PostgreSQL. For me, an API where I could do AFTER UPDATE trigger on
>FOR EACH ROW on a view would be perfect. In that trigger I could get
>information which rows of the view changed and then use NOTIFY to 
>inform the client. 

You might want to use BEFORE UPDATE so you get both the old and new
row data.


YMMV,
George



Re: Watching for view changes

From
Mitar
Date:
Hi!

On Thu, Dec 20, 2018 at 12:54 PM Kevin Brannen <KBrannen@efji.com> wrote:
Hmm, I guess I could see that as long as the DB wasn't too [write] busy, else you'd be flooded with notifications.

Sure. But this could be resolved by allowing notifications to be batched together. Debounce them. So could maybe configure how often you want such notifications and if they are more often they would be combined together into one.
 
Maybe it's a useful idea for you ... or maybe not. 😊

Thanks. Yes, this is one approach to do it. Hooking into every modify call at the app level and in this way have some information what is changing. I would prefer doing it in the database though, so that it could be independent from the source of the change. Moreover, not all UPDATE queries really do end up updating the data.


Mitar

--

Re: Watching for view changes

From
Mitar
Date:
Hi!

Thanks for more responses.

On Thu, Dec 20, 2018 at 6:28 PM George Neuner <gneuner2@comcast.net> wrote:
> >You mean that for every SQL query I am making to a database, I would
> >manually determine which tables are involved and then setup triggers
> >with NOTIFY that the SELECT might have changed?
>
> You can just put an update trigger on every table.  You aren't forced
> to listen for notifications.

You are right. It is easier to just have triggers in advance.

> The difficulty is that views are not guaranteed to be updateable.

What you mean by that? I mean, just to be clear. I care only about
read-only views. The changes to views I care about change view because
underlying tables are updated. I do not care about UPDATE queries
against views themselves.

So you mean here that it is not guaranteed that you can make an UPDATE
query against a view? Yes, that I know. But if underlying tables
change, view is always updated, no? I mean, this is the main purpose
of a view. :-)

> As a technical matter, you can put a trigger on a view, but it may never fire.

If fires only for UPDATE queries against views themselves (and only if
INSTEAD OF trigger does not invalidate the update). But it never fires
for updates which happen because of changes to the underlying tables.
I would like to know:

a) Do people agree/think that would be a good API for my use case?
b) How hard would it be to implement/develop something like that? Is
this something PostgreSQL already knows internally and it is just a
question of exposing it?
c) Is there some better way to achieve this?

> AFAIK, update and delete triggers do work on materialized views,
> because they really are physical tables.  Depending on your use case,
> materialized views may or may not be of help to you.

Yes, materialized views are too heavy for me. But having update and
delete triggers only memory-only temporary views would be perfect.

Also, materialized views have to be manually refreshed, no? So it is
not really true that they get updated automatically (and that update
triggers would run) as soon as underling tables are modified?

> >I would like to not have to do SQL query parsing and understanding
> >on the client. So ideally, I would get information directly from
> >PostgreSQL. For me, an API where I could do AFTER UPDATE trigger on
> >FOR EACH ROW on a view would be perfect. In that trigger I could get
> >information which rows of the view changed and then use NOTIFY to
> >inform the client.
>
> You might want to use BEFORE UPDATE so you get both the old and new
> row data.

Not sure how this helps. If I have a query like (or a view defined
with such query):

SELECT "_id", "body", (SELECT row_to_json(posts) FROM posts WHERE
posts."_id"=comments."postId") AS "post" FROM comments

If I get a trigger notification that some row in "comments" table has
changed. How do I get an updated row in the query results (or the view
defined with such query). I would have to parse the SQL and figure out
how to do transformation myself, no?

So, I am not sure how triggers on underlying tables can really inform
how to know what in the view has been updated?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
George Neuner
Date:
On Fri, 21 Dec 2018 20:49:23 -0800, Mitar <mmitar@gmail.com> wrote:

>On Thu, Dec 20, 2018 at 6:28 PM George Neuner <gneuner2@comcast.net> wrote:
>
>> The difficulty is that views are not guaranteed to be updateable.
>
>What you mean by that? I mean, just to be clear. I care only about
>read-only views. The changes to views I care about change view because
>underlying tables are updated. I do not care about UPDATE queries
>against views themselves.
>
>So you mean here that it is not guaranteed that you can make an UPDATE
>query against a view? Yes, that I know. But if underlying tables
>change, view is always updated, no? I mean, this is the main purpose
>of a view. :-)

Terminology. <grin>  UPDATE vs update.

I'm certain you know this already, but to be clear:


A normal view really is just a SELECT whose results reflect the
current data in the underlying tables.  That is, a normal view is
*code*, NOT data, and its result is an anonymous *virtual* table that
exists only in the transaction that applied the view.  Thus the
results can't be monitored for changes without employing some kind of
deliberate caching.

A materialized view IS exactly such a deliberate cache of results from
applying a view.  It is a real table that can be monitored for changes
using INSERT, UPDATE and/or DELETE triggers.  


>> As a technical matter, you can put a trigger on a view, but it may never fire.
>
>If fires only for UPDATE queries against views themselves (and only if
>INSTEAD OF trigger does not invalidate the update). But it never fires
>for updates which happen because of changes to the underlying tables.

Right.  A trigger on a (normal) view actually is on the virtual result
table - but that table doesn't exist outside of the transaction that
applied the view.

I'm not certain offhand, but I *believe* that - like a SELECT - view
results will change on the fly during a transaction if the underlying
tables change and the isolation settings allow to see it.

But even if a view will update while open, to use this you'd have to
hold the result set open (with a cursor) while also keeping the
transaction open.  Once the result set is closed, the view results are
gone.


With a materialized view, you must apply the view code again (call
REFRESH) to see changes to the underlying tables - it doesn't happen
automagically.  But when refreshed, triggers on the cache table would
react to changes.


>I would like to know:
>
>a) Do people agree/think that would be a good API for my use case?
>b) How hard would it be to implement/develop something like that? Is
>this something PostgreSQL already knows internally and it is just a
>question of exposing it?

It probably would be possible to take a trigger set on a view and
transitively trigger on the base tables underlying it.  

But to what end?  The results (of a normal view) don't persist beyond
the current transaction, and allowing them to do so would, AFAICS, be
a violation of the SQL standard.  Postgresql is not going to do that
(certainly not for a new feature, only for backward compatibility).

>c) Is there some better way to achieve this?
>  :
>Yes, materialized views are too heavy for me. But having update and
>delete triggers only memory-only temporary views would be perfect.

Unfortunately, materialized views are the only reasonable server side
solution that I can think of.


>Also, materialized views have to be manually refreshed, no? So it is
>not really true that they get updated automatically (and that update
>triggers would run) as soon as underling tables are modified?

Yes.  But you can do that with triggers on the base tables.  Or timers
if "most current" information is not critical.  

Perhaps you can use synchronized normal and materialized views.  Use
the normal view for current information in transactions, and the
materialized view to asynchronously update passive LISTEN clients with
delta information.


>> >I would like to not have to do SQL query parsing and understanding
>> >on the client.

That I understand. <grin>  I'm a compiler and language hobbiest ... 
even just parsing modern SQL can be painful.


>> So ideally, I would get information directly from
>> >PostgreSQL. For me, an API where I could do AFTER UPDATE trigger on
>> >FOR EACH ROW on a view would be perfect. In that trigger I could get
>> >information which rows of the view changed and then use NOTIFY to
>> >inform the client.
>>
>> You might want to use BEFORE UPDATE so you get both the old and new
>> row data.
>
>Not sure how this helps. 

If you know specifically what columns have changed, only they would
need be communicated to a LISTEN client.  Not necessary to send the
whole row (which may be much larger).

AFTER shows you only the new row.  BEFORE shows you both the old and
new rows so you can determine what changed (and how, if relevant).


>So, I am not sure how triggers on underlying tables can really inform
>how to know what in the view has been updated?

They can't - if any results are computed you'd have to apply the view
again to see the changes.  And then you'd have to a whole new view and
not just updates to the old.

I don't see any way out of this that doesn't involve a materialized
view.  It's the only way I can see to get just updates and not deal
with the whole result set again.


YMMV,
George



Re: Watching for view changes

From
Mitar
Date:
Hi!

On Fri, Dec 21, 2018 at 11:10 PM George Neuner <gneuner2@comcast.net> wrote:
> A materialized view IS exactly such a deliberate cache of results from
> applying a view.  It is a real table that can be monitored for changes
> using INSERT, UPDATE and/or DELETE triggers.

Caching is needed if you want to compute a difference between previous
version and new. But if you want to just know new value, then I could
imagine that (a simple implementation would) on every change to any
underlying table check if this change matches selectors of the query
and if such apply its operations/projections and produce the new
value.

So yes, you need caching if you want to decrease CPU use, but you
could also see it as new values being computed again and again through
query. Would such caching you are mentioning really improve
performance, I do not know, so it might be premature optimization?

If we do not go down the cache path, then it seems there is no other
way to have this "apply this query again on those updated rows from
table". In a way I see query as a transformation (in relational
algebra) of original tables to results and I would need to be able to
apply that transformation again on new rows. And if some additional
data is necessary (some rows for other non-changed tables) it would
just recompute that again, instead of using cache.

If we do go down the cache path, then I agree, materialized views
seems nice, but I would prefer temporary materialized views: they
should be cleaned up at the end of the session. Moreover, they should
be ideally just in memory, not really on disk. Materialized views are
currently stored to disk, no?

> With a materialized view, you must apply the view code again (call
> REFRESH) to see changes to the underlying tables - it doesn't happen
> automagically.  But when refreshed, triggers on the cache table would
> react to changes.

So you are saying I could use triggers on the cache table to know what
really changed instead of having to compute diff myself? Interesting.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
Ricardo Martin Gomez
Date:
Hi, perhaps you can use triggers for some tables.
Regards.


From: Mitar <mmitar@gmail.com>
Sent: Saturday, December 22, 2018 1:21:49 AM
To: Kevin Brannen
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Watching for view changes
 
Hi!

On Thu, Dec 20, 2018 at 12:54 PM Kevin Brannen <KBrannen@efji.com> wrote:
Hmm, I guess I could see that as long as the DB wasn't too [write] busy, else you'd be flooded with notifications.

Sure. But this could be resolved by allowing notifications to be batched together. Debounce them. So could maybe configure how often you want such notifications and if they are more often they would be combined together into one.
 
Maybe it's a useful idea for you ... or maybe not. 😊

Thanks. Yes, this is one approach to do it. Hooking into every modify call at the app level and in this way have some information what is changing. I would prefer doing it in the database though, so that it could be independent from the source of the change. Moreover, not all UPDATE queries really do end up updating the data.


Mitar

--

Re: Watching for view changes

From
Rob Sargent
Date:
Is this of theoretical interest (up to and including a specification/requirement) or this a practical concern (i.e. need to know when to update somebody’s dashboard widget (but the query is too slow to simply refresh on-demand)?


On Dec 22, 2018, at 9:42 AM, Ricardo Martin Gomez <rimartingomez@hotmail.com> wrote:

Hi, perhaps you can use triggers for some tables.
Regards.


From: Mitar <mmitar@gmail.com>
Sent: Saturday, December 22, 2018 1:21:49 AM
To: Kevin Brannen
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Watching for view changes
 
Hi!

On Thu, Dec 20, 2018 at 12:54 PM Kevin Brannen <KBrannen@efji.com> wrote:
Hmm, I guess I could see that as long as the DB wasn't too [write] busy, else you'd be flooded with notifications.

Sure. But this could be resolved by allowing notifications to be batched together. Debounce them. So could maybe configure how often you want such notifications and if they are more often they would be combined together into one.
 
Maybe it's a useful idea for you ... or maybe not. 😊

Thanks. Yes, this is one approach to do it. Hooking into every modify call at the app level and in this way have some information what is changing. I would prefer doing it in the database though, so that it could be independent from the source of the change. Moreover, not all UPDATE queries really do end up updating the data.


Mitar

--

Re: Watching for view changes

From
Mitar
Date:
Hi!

This is of very practical concern. :-) I have many apps I developed in Meteor [1] using MongoDB. The whole Meteor stack is about reactive programming where web UI automatically rerenders as data in the database is changing. Meteor achieves this using complicated server-side code which tails MongoDB oplog and then maps this to reactive queries and maps how they update based on changes it observes in the oplog. This is then pushed to the client which rerenders.

This approach has scalability issues and also it costs a lot of resources on the server side to first copy data from the DB into server-side component and then keep that state in the server-side component in sync with the DB. It generally has to reimplement oplog parsing, query parsing and evaluation, to be able to do all that.

I like this declerative style of programming. Where you define reactive queries where you select data from DB, define a transformation, and then render it in UI. As data in DB is changing, everything else gets updated automatically. It is a pretty nice way of programming. Without having to think about which all places might be updating DB and how to update UI based on all those places. Maybe not for everyone and all tasks, but in my case I generally work with collaborative online tools where such real-time aspect of working together is pretty neat.

So I like PostgreSQL and I have used in other apps. And now I am trying to see if I could find an efficient way for PostgreSQL to have such reactive query and send me data as the query is changing. I think DB already has to deal with most of such logic and wiring it together in the DB instead of server-side of the app might allow better performance and scaling here.

For example, CREATE PUBLICATION seems a reasonable API as well (instead of a trigger + notification + working around a limit on how much data can be send in a notification), but it does not work on materialized views. Why is that? I thought materialized views are the same as tables, just that there is nicer API to copy a query into those tables when wanted. Currently it seems what is there is very similar to what MongoDB provides: publication/observe on a table level. So if I would not be using joins I could SUBSCRIBE to the PUBLICATION. I would still have to implement logic how to map those changes to changes to a result of a query though, to know how to update results. If I could publish a materialized view, PostgreSQL could do that mapping for me. And I could also do joins.

So it seems materialized views are close to this, but not perfect. So I have some questions:

- Is there a technical reason why PUBLICATION cannot be done on a materialized views? I mean, I could manually create/simulate materialized views through regular tables probably. What are differences between regular tables and materialized views?
- Are there limits on how many subscribers to a PUBLICATION can there effectively be?
- For my case it would be great if materialized views could be TEMPORARY, in-memory (and if PostgreSQL runs out of buffer space for it, I would prefer an error), and UNLOGGED. Any particular reasons which would prevent them to be implemented as such?



Mitar

On Sat, Dec 22, 2018 at 1:16 PM Rob Sargent <robjsargent@gmail.com> wrote:
Is this of theoretical interest (up to and including a specification/requirement) or this a practical concern (i.e. need to know when to update somebody’s dashboard widget (but the query is too slow to simply refresh on-demand)?


On Dec 22, 2018, at 9:42 AM, Ricardo Martin Gomez <rimartingomez@hotmail.com> wrote:

Hi, perhaps you can use triggers for some tables.
Regards.


From: Mitar <mmitar@gmail.com>
Sent: Saturday, December 22, 2018 1:21:49 AM
To: Kevin Brannen
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Watching for view changes
 
Hi!

On Thu, Dec 20, 2018 at 12:54 PM Kevin Brannen <KBrannen@efji.com> wrote:
Hmm, I guess I could see that as long as the DB wasn't too [write] busy, else you'd be flooded with notifications.

Sure. But this could be resolved by allowing notifications to be batched together. Debounce them. So could maybe configure how often you want such notifications and if they are more often they would be combined together into one.
 
Maybe it's a useful idea for you ... or maybe not. 😊

Thanks. Yes, this is one approach to do it. Hooking into every modify call at the app level and in this way have some information what is changing. I would prefer doing it in the database though, so that it could be independent from the source of the change. Moreover, not all UPDATE queries really do end up updating the data.


Mitar

--



--

Re: Watching for view changes

From
George Neuner
Date:
On Fri, 21 Dec 2018 23:41:16 -0800, Mitar <mmitar@gmail.com> wrote:

>Hi!
>
>On Fri, Dec 21, 2018 at 11:10 PM George Neuner <gneuner2@comcast.net> wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view.  It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Caching is needed if you want to compute a difference between previous
>version and new. But if you want to just know new value, then I could
>imagine that (a simple implementation would) on every change to any
>underlying table check if this change matches selectors of the query
>and if such apply its operations/projections and produce the new
>value.

Yes, that could be done.  But it isn't.  In effect you are asking the
DBMS also to be a spreadsheet: i.e. change this cell and everything
that depends on it gets recomputed.

A spreadsheet is an order of magnitude simpler to implement than a
DBMS, but the combination would be an order of magnitude (or more)
harder.  Even object graph databases don't offer spreadsheet
functionality, and it would be a lot easier to do there than in a
table relational system.


>So yes, you need caching if you want to decrease CPU use, but you
>could also see it as new values being computed again and again through
>query. Would such caching you are mentioning really improve
>performance, I do not know, so it might be premature optimization?

It may take only 3 cycles to multiply two numbers, but it can take
thousands of cycles [or millions if the data is on disk] to get those
two numbers into the multiplier.

There always are exceptions, but the general rule is that whenever the
result requires:
 - significant computation, 
 - significant resources, or
 - significant time
then you should cache the result instead of recomputing it.

Joins and sorts can take a whole lot of memory (and spill onto disk if
they overflow the work buffer).  A fetch of a table or index not in
memory is simple but takes a lot of time - as well as maybe pushing
something else out (increasing the complexity of a competing query).


>If we do go down the cache path, then I agree, materialized views
>seems nice, but I would prefer temporary materialized views: they
>should be cleaned up at the end of the session. Moreover, they should
>be ideally just in memory, not really on disk. Materialized views are
>currently stored to disk, no?

In PG, all *named* tables are backed on disk - even temporary tables.
Only anonymous tables of query results can exist entirely in memory
[and even they can spill onto disk when necessary].

With enough memory you can cache all your tables in shared buffers and
have enough extra that you never run out of work buffers and never
overflow a work buffer.  But that is the best you can achieve with PG.

George



Re: Watching for view changes

From
Mitar
Date:
Hi!

On Sun, Dec 23, 2018 at 1:00 AM George Neuner <gneuner2@comcast.net> wrote:
> A spreadsheet is an order of magnitude simpler to implement than a
> DBMS, but the combination would be an order of magnitude (or more)
> harder.  Even object graph databases don't offer spreadsheet
> functionality, and it would be a lot easier to do there than in a
> table relational system.

But having that readily available would be so cool. :-) Especially
because it is hard. And also, it does not necessary have to be full
graph. Just one level of dependencies. Then those are recomputed. And
then anything depending on now changed values get recomputed again.
And so on. So no need to traverse the whole graph at once.

> In PG, all *named* tables are backed on disk - even temporary tables.
> Only anonymous tables of query results can exist entirely in memory
> [and even they can spill onto disk when necessary].
>
> With enough memory you can cache all your tables in shared buffers and
> have enough extra that you never run out of work buffers and never
> overflow a work buffer.  But that is the best you can achieve with PG.

Thanks for all this input. So I am now thinking in terms of a
materialized views.

For my case it would be great if materialized views could be TEMPORARY
(removed at the end of session), in-memory (and if PostgreSQL runs out
of buffer space for it, I would prefer an error, instead of spilling
to the disk), and UNLOGGED. Any particular reasons which would prevent
them to be implemented as such? I through that materialized views are
just a fancy table with stored query, so that you can easily REFRESH
them, instead of doing that yourself.

I can then wire triggers on underlying tables to REFRESH materialized
views automatically. Is there some easy way to debounce those
refreshes? If I just blindly trigger REFRESH in every row trigger,
this could do a lot of work. I would prefer to combine all changes for
example into 100 ms batches and refresh only once per 100 ms. So in
some way I would like to be able to control the level of real-time I
would need. I have found a blog post [1] about this, but it seems
pretty tricky and requires an outside cron job. For 100 ms batching
time this feels like something better done inside PostgreSQL itself.

The last question is how do I get changes in materialized views
streamed to the client. It seems one option is a trigger on the
materialized view which uses NOTIFY to tell the client about the
change. But NOTIFY has limit on the payload size, so I cannot just
send the change to the client. It seems I would need additional table
to store the change and then client would get notification, read from
that table, and remove the rows read. So in a way I would need my own
queue for changes. Any other suggestion how to do that? I looked into
PUBLICATION and SUBSCRIBE, but it seems this is only supported between
servers, not server-client, and also works only on base tables, not
materialized views (not sure again why, because aren't materialized
views just tables). Would it be possible to use client to subscribe
instead of a server?

[1] https://onethingsimple.com/2017/10/sync-materialized-views-after-debounce-period/


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
Mitar
Date:
Hi!

On Fri, Dec 21, 2018 at 11:10 PM George Neuner <gneuner2@comcast.net> wrote:
> A materialized view IS exactly such a deliberate cache of results from
> applying a view.  It is a real table that can be monitored for changes
> using INSERT, UPDATE and/or DELETE triggers.

Are you sure one can use triggers on a materialized view? I am getting:

"my_materialized_view" is not a table or view

as an error when I am trying to create a trigger on materialized view.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
George Neuner
Date:
On Sun, 23 Dec 2018 23:06:51 -0800, Mitar <mmitar@gmail.com> wrote:

>On Fri, Dec 21, 2018 at 11:10 PM George Neuner <gneuner2@comcast.net> wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view.  It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Are you sure one can use triggers on a materialized view? I am getting:
>
>"my_materialized_view" is not a table or view
>
>as an error when I am trying to create a trigger on materialized view.

IIRC the cache table's name is generated.  I don't know the proper
incantations to get it from the catalogs, but an easy way to find it
is to put the materialized view into its own tablespace, then search
pg_tables for objects in that space.

George



Re: Watching for view changes

From
George Neuner
Date:
On Sun, 23 Dec 2018 10:10:50 -0800, Mitar <mmitar@gmail.com> wrote:

>I can then wire triggers on underlying tables to REFRESH materialized
>views automatically. Is there some easy way to debounce those
>refreshes? If I just blindly trigger REFRESH in every row trigger,
>this could do a lot of work. I would prefer to combine all changes for
>example into 100 ms batches and refresh only once per 100 ms. So in
>some way I would like to be able to control the level of real-time I
>would need. I have found a blog post [1] about this, but it seems
>pretty tricky and requires an outside cron job. For 100 ms batching
>time this feels like something better done inside PostgreSQL itself.

see the pg_cron extension.

But cron doesn't allow sub-second scheduling. Why do you need it so
frequently?  

PG is not a real time platform, or even an all-in-memory database.  If
you try to use it that way, you are guaranteed to have problems.


>The last question is how do I get changes in materialized views
>streamed to the client. It seems one option is a trigger on the
>materialized view which uses NOTIFY to tell the client about the
>change. But NOTIFY has limit on the payload size, so I cannot just
>send the change to the client. 

If you trigger by row, the NOTIFY payload is just that one row.  If
the data can't fit into the 8K payload, then you need to send some
kind of row id and have the client read the changes explicitly.

Also remember that you may have to deal with DELETEd rows. If you
can't send row data by NOTIFY, then the client HAS to cache the whole
view anyway to see what's been deleted.


>It seems I would need additional table
>to store the change and then client would get notification, read from
>that table, and remove the rows read. So in a way I would need my own
>queue for changes. 

There's no reason to do that. The client has to interpret the view
changes and incorporate them into its own local data structures.
Segregating new/modified rows separately on the server side seems to
me to be a waste of effort.

If it makes sense, have the client collect some number of
notifications and read all the indicated rows in one query.


George



Re: Watching for view changes

From
Mitar
Date:
Hi!

On Mon, Dec 24, 2018 at 12:20 AM George Neuner <gneuner2@comcast.net> wrote:
> Also remember that you may have to deal with DELETEd rows. If you
> can't send row data by NOTIFY, then the client HAS to cache the whole
> view anyway to see what's been deleted.

Client-side (web browser) is caching the view (because it is rendering
it). I do not want the server-side component (PostgreSQL client) to
have to cache it as well. So I will use materialized view to cache it
in the PostgreSQL directly, and the sync it all the way up to the web
browser.

Not sure if this changes anything?

> If it makes sense, have the client collect some number of
> notifications and read all the indicated rows in one query.

You are right. Thanks again.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
Mitar
Date:
Hi!

On Sun, Dec 23, 2018 at 11:59 PM George Neuner <gneuner2@comcast.net> wrote:
> IIRC the cache table's name is generated.  I don't know the proper
> incantations to get it from the catalogs, but an easy way to find it
> is to put the materialized view into its own tablespace, then search
> pg_tables for objects in that space.

Oh, what a trick. :-)


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
Mitar
Date:
Hi!

I am unable to find cache table's name. :-(


Mitar

On Mon, Dec 24, 2018 at 1:01 AM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> On Sun, Dec 23, 2018 at 11:59 PM George Neuner <gneuner2@comcast.net> wrote:
> > IIRC the cache table's name is generated.  I don't know the proper
> > incantations to get it from the catalogs, but an easy way to find it
> > is to put the materialized view into its own tablespace, then search
> > pg_tables for objects in that space.
>
> Oh, what a trick. :-)
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
George Neuner
Date:
On Mon, 24 Dec 2018 01:16:53 -0800, Mitar <mmitar@gmail.com> wrote:

>
>I am unable to find cache table's name. :-(
>

Did you execute the view or simply define it?  I think the table is
created the 1st time the view is executed.

George



Re: Watching for view changes

From
Mitar
Date:
Hi!

On Mon, Dec 24, 2018 at 1:31 AM George Neuner <gneuner2@comcast.net> wrote:
> Did you execute the view or simply define it?  I think the table is
> created the 1st time the view is executed.

View is populated with data and I can select from it. Still, I cannot
find any table which would look like it in pg_tables?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
Mitar
Date:
Hi!

I started a patch to address this limitation. [1]

[1] https://www.postgresql.org/message-id/CAKLmikMN%2B0W79A7Wg4krK55cTLxb%3DTARdz2b%3Dvqp19ao6AAZjw%40mail.gmail.com


Mitar

On Mon, Dec 24, 2018 at 1:38 AM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> On Mon, Dec 24, 2018 at 1:31 AM George Neuner <gneuner2@comcast.net> wrote:
> > Did you execute the view or simply define it?  I think the table is
> > created the 1st time the view is executed.
>
> View is populated with data and I can select from it. Still, I cannot
> find any table which would look like it in pg_tables?
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Watching for view changes

From
Mitar
Date:
Hi!

I have a followup to this thread.

George, thanks for all the help. I spend some time now investigating
various approaches here and I am reporting here some findings, so that
they might help others as well.

First, I have tried the approach with MATERIALIZED VIEWs to hold the
cached contents of the query. I attached triggers to all source tables
which then notified (using LISTEN/NOTIFY) the client about changes.
Client throttled those notifications and eventually triggered a
REFRESH MATERIALIZED VIEW. Because I also attached triggers on the
materialized view, I got notifications (using LISTEN/NOTIFY) of what
rows have changed. Client might decided to fetch also rows themselves.
For this to work well I made two patches. A patch to allow creation of
TEMPORARY MATERIALIZED VIEWs [1] and a patch to allow attaching
triggers on materialized views [2]. In the second patch I also changed
the REFRESH MATERIALIZED VIEW CONCURRENTLY logic to issue not just
REMOVEs and INSERTs for changed rows, but to compute which rows have
changed and issue UPDATEs for them as well. This makes it easier for
the client to know what changed.

This worked well but it was slower than some other packages I have
found on the web which were trying to provide a similar functionality.
I made a benchmark to compare them [3] and found out that there was
room for improvement.

First, instead of sending updates of a MATERIALIZED VIEW using
LISTEN/NOTIFY and then fetching rows, I instead used a trigger to copy
changes to another temporary table, and then just used DELETE FROM
temp_table RETURNING * to get all results from a table and returning
it to the client, all inside same transaction, so data in that
temporary table was never committed.

This made things a bit better, mostly latency between a change and
getting it to the client became more predictable. Before there were
sometimes quite large spikes. Once I did that I realized that in fact
MATERIALIZED VIEW is not really necessary. All I need is a place to
cache previous results of the query, but I do not really care about
updates to the MATERIALIZED VIEW. So I decided to do the following. If
I already have a temporary table with previous results, when I want to
refresh my query, I create a new temporary table using CREATE TABLE AS
using the PREPAREd query, I compute diff between those two tables in
the same way as REFRESH MATERIALIZED VIEW CONCURRENTLY does, in one
query, and I return those results to the client. Then I just DROP
TABLE old cache table, and rename new cache table to old cache name.
So instead of computing a diff, updating materialized view, running
triggers, and copying to the table, I just compute a diff and this is
it.

This works now very well. I made and published a package doings this [4].

Of course, the main problem is still that for every change in source
table I have to eventually refresh the query. And without more logic
this can really become problematic. Ideally, we could reuse some
ideas/code from Incremental View Maintenance [5] to compute what
changes to the query results should happen based on changes to source
tables. Then we could just push those to the client (and update
temporary table). We would then do full refresh only when things could
not be inferred.

Thanks again for everything.

[1] https://commitfest.postgresql.org/21/1951/
[2] https://commitfest.postgresql.org/21/1948/
[3] https://github.com/mitar/node-pg-reactivity-benchmark
[4] https://github.com/tozd/node-reactive-postgres
[5] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m