Re: Watching Views - Mailing list pgsql-general

From Nick Guenther
Subject Re: Watching Views
Date
Msg-id 20140723201041.52584g72qnpk1smc@www.nexusmail.uwaterloo.ca
Whole thread Raw
In response to Re: Watching Views  (David G Johnston <david.g.johnston@gmail.com>)
Responses Re: Watching Views  (David G Johnston <david.g.johnston@gmail.com>)
List pgsql-general


Quoting David G Johnston <david.g.johnston@gmail.com>:

> Nick Guenther wrote
>> Dear List,
>>
>> In principle, I am looking for some way to say
>> ```
>> CREATE VIEW view13131 AS select (id, name, bank_account) from actors
>> where age > 22;
>> WATCH view13131;
>> ```
>>
>> and get output to stdout like
>> ```
>> ....
>> INSERT view13131 VALUES (241, "Mortimer", 131.09);
>> ...
>> INSERT view13131 VALUES (427, "Schezwan", 95.89);
>> UPDATE view13131 SET bank_account = 1017.12 WHERE id = 427;
>> DELETE FROM view13131 WHERE id = 92;
>> ...
>> ```
>
> 9.4 - http://www.postgresql.org/docs/9.4/static/logicaldecoding.html
>
> Though I doubt your requirement to obtain only a subset of data is something
> that can be accommodated; especially in SQL form.
>
> And, yes, you can create triggers on views.
>
> http://www.postgresql.org/docs/9.4/static/sql-createtrigger.html
>
> But assuming your view is meant to be dynamic, covering only the subset of
> data you wish to watch, no one is going to be using your view to actually
> Insert/Update/Delete against the underlying table(s) so it will not do you
> any good to add triggers to it.
>
> You probably need to create some kind of materialized view and add a trigger
> to the relevant source table to maintain that view on an ongoing basis.
> Then remove the trigger (and optionally the materialized view) when you no
> longer care to watch.
>
> This:
> http://www.postgresql.org/docs/9.4/static/sql-creatematerializedview.html
> .... or roll your own.
>
> You can also use the full power of whatever programming languages you can
> install onto the server (e.g., pl/perl, pl/python, pl/sh) to link up with
> the outside world from inside one of those triggers...

Thanks David, your tips were very helpful. I'm not a SQL expert, but
these ideas springboarded me ahead. I've been playing with your ideas
this week and I've come up with a way to extract the logical changes
without using 9.4, which I'll share here in case anyone else is
curious. This is preliminary, so don't rely on it for anything
important. It doesn't write "DELETE FROM " lines, but it does write
the data in a json-esque format which could be without too much work
turned into my invented WATCH language.


```{psql}
-- watch.psql
-- This postgres + pl/python2 script demonstrates watching changes to
a table via a trigger.
-- Python opens a FIFO on which it writes lines like
--  "+ {'rating': 3, 'kind': 'Documentary', 'name': 'The Mail Man'}".
-- The FIFO part is the flakiest bit, because it requires you to load
the trigger,
-- then immediately run a reader (e.g. `tail -f
/path/to/postgres/data/_changes_films`)
-- *before* any DB updates happen.
-- If you have no reader, updates will fail (the kernel will raise
ENXIO at "print >>FIFO").
-- The other option is to ignore the ENXIOs, but then changes will get
lost. I'm not sure.
-- Some more design (subprocess? calling this from a master script?)
can probably fix that awkwardness.
--
-- The second point of flakiness is that attaching the trigger is
rather verbose.
-- This can be solved with pl/pgsql subroutines.
--
-- This could be probably done in plpgsql, but I know python better,
and it comes with serialization (json, msgpack, pickle) available
easily.
-- these tips are due to
--
http://www.postgresql.org/message-id/1405660725952-5811931.post@n5.nabble.com
and
--
http://www.postgresql.org/message-id/1405703990457-5811982.post@n5.nabble.com
-- The reason I'm not using "Logical Decoding"
<http://www.postgresql.org/docs/devel/static/logicaldecoding-example.html> is
because it's still in devel---not even Arch Linux, usually full of
bleeding edge code, has this feature yet. Plus it requires fiddling
with the .conf file.


DROP TABLE IF EXISTS films CASCADE;
CREATE TABLE films (name text, kind text, rating int);


DROP FUNCTION IF EXISTS watch_table();
CREATE FUNCTION watch_table() RETURNS trigger AS $$
   tablename = TD["args"][0]

   FIFO = "_changes_%s" % (tablename,)
   if "FIFO" not in SD:
     #this is our first time running in this instance of the python
interpreter:
     # run initializations

     #PL/Python is really meant for small one-off tasks, mostly. Most
data should probably just be stuffed straight into the database.
     # however, things like file descriptors don't work so well like that
     # for these things, we need to use the facilities PL/python
provides:
http://www.postgresql.org/docs/9.3/static/plpython-sharing.html
     #  summary is: SD stands for "static data" and behaves like
static locals in C (they must have some kind of trap table kicking
around that switches in values of SD when the appropriate function is
called).
     #              GD stands for "global data" and is the same everywhere
     #        both begin as empty dictionaries
     #   note also that it seems that one python interpreter is
invoked ~per client connection~; not per-statement (which would be too
fine) nor per
     import sys, os

     if os.path.exists(FIFO):
       #TODO: check that, if it exists, it's a FIFO and we have perms on it
       pass
     else:
       print("attempting to construct fifo", FIFO)
       try:
         os.mkfifo(FIFO)
       except Exception as e:
         import traceback
         traceback.print_exc()
         print("couldn't make FIFO '%s'. ignoring" % FIFO)
         pass
     # XXX problem: a nonblocking pipe cannot be opened until there is
a reader to read it; the reader may go away after a moment and
everything will be peachy, but startup is hard
     # ..hm.

     fd = os.open(FIFO, os.O_WRONLY | os.O_NONBLOCK) #O_NONBLOCK is
key; otherwise, this line will *hang* the postgres process until
someone opens the other end
     FIFO = os.fdopen(fd, "w", 0) #OVERWRITES; buffering=0 means
unbuffered, important for our real-time changes goal
     SD["FIFO"] = FIFO

   FIFO = SD["FIFO"] #retrieve the FIFO from the static data, if this
is our second (or even first) time around

   print "Change occurred:", TD  #debugging
   if TD["event"] == "INSERT":
     print >>FIFO, "+", TD["new"]
   elif TD["event"] == "UPDATE":
     print >>FIFO, TD["old"], "-->", TD["new"]
   elif TD["event"] == "DELETE":
     print >>FIFO, "-", TD["old"]
$$ language plpython2u;


CREATE TRIGGER watch_table___
   AFTER INSERT OR UPDATE OR DELETE
   ON films
   FOR EACH ROW
   EXECUTE PROCEDURE watch_table('films');


-- Demo: run `tail -f data/_changes_films` and then let the following
lines run.
-- You should see
-- + {'rating': 0, 'kind': 'Documentary', 'name': 'Grass'}
-- + {'rating': 3, 'kind': 'Documentary', 'name': 'The Mail Man'}
-- + {'rating': 5, 'kind': 'Comedy', 'name': 'Superfly'}
-- - {'rating': 0, 'kind': 'Documentary', 'name': 'Grass'}
-- - {'rating': 3, 'kind': 'Documentary', 'name': 'The Mail Man'}
--  as written, you might need to run this script twice: once to get
the mkfifo() to happen
--  then again with tail running.
INSERT INTO films VALUES ('Grass', 'Documentary', 0);
INSERT INTO films VALUES ('The Mail Man', 'Documentary', 3);
INSERT INTO films VALUES ('Superfly', 'Comedy', 5);
DELETE FROM films WHERE rating < 5;
```

As you said, attaching the trigger to a view is useless (for
BEFORE/AFTER, which I'm interested in, also only works on statement
level changes, which I would rather not have to deal with). I tried
attaching my trigger to a materialized view and found that postgres
does not support that; as you said, I need to write a trigger on the
source to keep the materialized view and the source in sync. But in
that case I don't think a materialized view helps me at all, since
without triggers on it I can't even have it compute the diffs for me
(and I suspect that internally postgres simply truncates the old table
and refills it, so there would be nothing to hook anyway).

My bottleneck is the size of my database and the network: I want to
take slices of a potentially gigabytes-large database and stream them
out to multiple clients. Thus I need to stream--not poll--for changes.
I think a materialized view would force me to poll for changes, and in
that case I would be doing REFRESH MATERIALIZED VIEW on each poll and
therefore not winning anything over a regualar view. Ideally, when an
update comes in to a parent table I want all views that have sliced it
to be notified; I'm interested in doing dataflow* programming,
essentially. Is there maybe some sort of extension that does
dependency tracking inside of postgres?


--
Nick Guenther
4B Stats/CS
University of Waterloo


* Good examples of dataflow programming are in Knockout
<http://knockoutjs.com/documentation/computedObservables.html> and
Lava <http://lava.codeplex.com/>. Also Elm <http://elm-lang.org/>,
though Elm calls "dataflow" "functional reactive".





pgsql-general by date:

Previous
From: Anil Menon
Date:
Subject: Re: Referencing serial col's sequence for insert
Next
From: "Vasudevan, Ramya"
Date:
Subject: event triggers in 9.3.4