Re: Watching for view changes - Mailing list pgsql-general

From George Neuner
Subject Re: Watching for view changes
Date
Msg-id l1gu1e9pova7q543j5bcs0o0i1njm5io09@4ax.com
Whole thread Raw
In response to Watching for view changes  (Mitar <mmitar@gmail.com>)
Responses Re: Watching for view changes  (Mitar <mmitar@gmail.com>)
Re: Watching for view changes  (Mitar <mmitar@gmail.com>)
List pgsql-general
On Fri, 21 Dec 2018 23:41:16 -0800, Mitar <mmitar@gmail.com> wrote:

>Hi!
>
>On Fri, Dec 21, 2018 at 11:10 PM George Neuner <gneuner2@comcast.net> wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view.  It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Caching is needed if you want to compute a difference between previous
>version and new. But if you want to just know new value, then I could
>imagine that (a simple implementation would) on every change to any
>underlying table check if this change matches selectors of the query
>and if such apply its operations/projections and produce the new
>value.

Yes, that could be done.  But it isn't.  In effect you are asking the
DBMS also to be a spreadsheet: i.e. change this cell and everything
that depends on it gets recomputed.

A spreadsheet is an order of magnitude simpler to implement than a
DBMS, but the combination would be an order of magnitude (or more)
harder.  Even object graph databases don't offer spreadsheet
functionality, and it would be a lot easier to do there than in a
table relational system.


>So yes, you need caching if you want to decrease CPU use, but you
>could also see it as new values being computed again and again through
>query. Would such caching you are mentioning really improve
>performance, I do not know, so it might be premature optimization?

It may take only 3 cycles to multiply two numbers, but it can take
thousands of cycles [or millions if the data is on disk] to get those
two numbers into the multiplier.

There always are exceptions, but the general rule is that whenever the
result requires:
 - significant computation, 
 - significant resources, or
 - significant time
then you should cache the result instead of recomputing it.

Joins and sorts can take a whole lot of memory (and spill onto disk if
they overflow the work buffer).  A fetch of a table or index not in
memory is simple but takes a lot of time - as well as maybe pushing
something else out (increasing the complexity of a competing query).


>If we do go down the cache path, then I agree, materialized views
>seems nice, but I would prefer temporary materialized views: they
>should be cleaned up at the end of the session. Moreover, they should
>be ideally just in memory, not really on disk. Materialized views are
>currently stored to disk, no?

In PG, all *named* tables are backed on disk - even temporary tables.
Only anonymous tables of query results can exist entirely in memory
[and even they can spill onto disk when necessary].

With enough memory you can cache all your tables in shared buffers and
have enough extra that you never run out of work buffers and never
overflow a work buffer.  But that is the best you can achieve with PG.

George



pgsql-general by date:

Previous
From: Mitar
Date:
Subject: Re: Watching for view changes
Next
From: Istvan Soos
Date:
Subject: logical replication resiliency