Thread: Materialized View Summary

Materialized View Summary

From
"Jonathan M. Gardner"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've written a summary of my findings on implementing and using
materialized views in PostgreSQL. I've already deployed eagerly updating
materialized views on several views in a production environment for a
company called RedWeek: http://redweek.com/. As a result, some queries
that were taking longer than 30 seconds to run now run in a fraction of a
millisecond.

You can view my summary at
http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

Comments and suggestions are definitely welcome.

- --
Jonathan Gardner
jgardner@jonathangardner.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQFAO3eZqp6r/MVGlwwRAnpEAKC8+/lFyPBbXetPEfFLwgUvJZLCmgCfYlmR
0vZmCcbGSNT/m/W8QOIhufk=
=snCu
-----END PGP SIGNATURE-----

Re: [SQL] Materialized View Summary

From
Richard Huxton
Date:
On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote:
>
> I've written a summary of my findings on implementing and using
> materialized views in PostgreSQL. I've already deployed eagerly updating
> materialized views on several views in a production environment for a
> company called RedWeek: http://redweek.com/. As a result, some queries
> that were taking longer than 30 seconds to run now run in a fraction of a
> millisecond.
>
> You can view my summary at
> http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

Interesting (and well written) summary. Even if not a "built in" feature, I'm
sure that plenty of people will find this useful. Make sure it gets linked to
from techdocs.

If you could identify candidate keys on a view, you could conceivably automate
the process even more. That's got to be possible in some cases, but I'm not
sure how difficult it is to do in all cases.

--
  Richard Huxton
  Archonet Ltd

Re: [HACKERS] [SQL] Materialized View Summary

From
Robert Treat
Date:
On Tue, 2004-02-24 at 12:11, Richard Huxton wrote:
> On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote:
> >
> > I've written a summary of my findings on implementing and using
> > materialized views in PostgreSQL. I've already deployed eagerly updating
> > materialized views on several views in a production environment for a
> > company called RedWeek: http://redweek.com/. As a result, some queries
> > that were taking longer than 30 seconds to run now run in a fraction of a
> > millisecond.
> >
> > You can view my summary at
> > http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html


have you done much concurrency testing on your snapshot views? I
implemented a similar scheme in one of my databases but found problems
when I had concurrent "refresh attempts".  I ended up serializing the
calls view LOCKing, which was ok for my needs, but I thought potentially
problematic in other cases.

>
> Interesting (and well written) summary. Even if not a "built in" feature, I'm
> sure that plenty of people will find this useful. Make sure it gets linked to
> from techdocs.

Done. :-)

>
> If you could identify candidate keys on a view, you could conceivably automate
> the process even more. That's got to be possible in some cases, but I'm not
> sure how difficult it is to do in all cases.
>

it seems somewhere between Joe Conways work work arrays and polymorphic
functions in 7.4 this should be feasible.


Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: [HACKERS] [SQL] Materialized View Summary

From
"Jonathan M. Gardner"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm not sure if my original reply made it through. Ignore the last one if
it did.

On Tuesday 24 February 2004 1:48 pm, Robert Treat wrote:
> On Tue, 2004-02-24 at 12:11, Richard Huxton wrote:
> > On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote:
> > > I've written a summary of my findings on implementing and using
> > > materialized views in PostgreSQL. I've already deployed eagerly
> > > updating materialized views on several views in a production
> > > environment for a company called RedWeek: http://redweek.com/. As a
> > > result, some queries that were taking longer than 30 seconds to run
> > > now run in a fraction of a millisecond.
> > >
> > > You can view my summary at
> > > http://jonathangardner.net/PostgreSQL/materialized_views/matviews.h
> > >tml
>
> have you done much concurrency testing on your snapshot views? I
> implemented a similar scheme in one of my databases but found problems
> when I had concurrent "refresh attempts".  I ended up serializing the
> calls view LOCKing, which was ok for my needs, but I thought
> potentially problematic in other cases.
>

We are running into some small problems with deadlocks and multiple
inserts. It's not a problem unless we do a mass update to the data or
something like that. I'm interested in how you solved your problem.

I am playing with an exclusive lock scheme that will lock all the
materialized views with an exclusive lock (see Section 12.3 for a
reminder on what exactly this means). The locks have to occur in order,
so I use a recursive function to traverse a dependency tree to the root
and then lock from there. Right now, we only have one materialized view
tree, but I can see some schemas having multiple seperate trees with
multiple roots. So I put in an ordering to lock the tables in a
pre-defined order.

But if the two dependency trees are totally seperate, it is possible for
one transaction to lock tree A and then tree B, and for another to lock
tree B and then tree A, causing deadlock.

Unfortunately, I can't force any update to the underlying tables to force
this locking function to be called. So we will probably call this
manually before we touch any of those tables.

In the future, it would be nice to have a hook into the locking mechanism
so any kind of lock on the underlying tables can trigger this.

Also, building the dependency trees is completely manual. Until I can get
some functions to actually assemble the triggers and such, automatic
building of the trees will be difficult.


- --
Jonathan Gardner
jgardner@jonathangardner.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQFAPFqRqp6r/MVGlwwRAnvPAJ90lEEyaBzAfUoLZU93ZDvkojaAwwCdGjaA
YBlO57OiZidZuQ5/S0u6wXM=
=bMYE
-----END PGP SIGNATURE-----

Re: [HACKERS] [SQL] Materialized View Summary

From
Robert Treat
Date:
On Wed, 2004-02-25 at 03:19, Jonathan M. Gardner wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I'm not sure if my original reply made it through. Ignore the last one if
> it did.

But I liked the last one :-)

>
> On Tuesday 24 February 2004 1:48 pm, Robert Treat wrote:
> > On Tue, 2004-02-24 at 12:11, Richard Huxton wrote:
> > > On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote:
> > > > I've written a summary of my findings on implementing and using
> > > > materialized views in PostgreSQL. I've already deployed eagerly
> > > > updating materialized views on several views in a production
> > > > environment for a company called RedWeek: http://redweek.com/. As a
> > > > result, some queries that were taking longer than 30 seconds to run
> > > > now run in a fraction of a millisecond.
> > > >
> > > > You can view my summary at
> > > > http://jonathangardner.net/PostgreSQL/materialized_views/matviews.h
> > > >tml
> >
> > have you done much concurrency testing on your snapshot views? I
> > implemented a similar scheme in one of my databases but found problems
> > when I had concurrent "refresh attempts".  I ended up serializing the
> > calls view LOCKing, which was ok for my needs, but I thought
> > potentially problematic in other cases.
> >
>
> We are running into some small problems with deadlocks and multiple
> inserts. It's not a problem unless we do a mass update to the data or
> something like that. I'm interested in how you solved your problem.
>

Well, I have two different cases actually. In one case I have a master
table with what are essentially 4 or 5 matviews based off of that.  I
don't allow updates to the matviews, only to the master table, and only
via stored procedures. This would work better if locking semantics
inside of pl functions worked properly, but currently we have the
application lock the table in exclusive access mode and then call the
function to make the data changes which then fires off a function to
update the matviews.  Since it's all within a transaction, readers of
the matviews are oblivious to the change.  IMO this whole method is a
wizardry in database hack jobs that I would love to replace.

The second case, and this one being much simpler, started out as a view
that does aggregation across several other views and tables, which is
pretty resource intensive but only returns 4 rows. I refresh the matview
via a cron job which basically does a SELECT * FOR UPDATE on the
matview, deletes the entire contents, then does an INSERT INTO matview
SELECT * FROM view.  Again since it's in a transaction, readers of the
matview are happy (and apps are only granted select on the matview).
Concurrency is kept because the cron job must wait to get a LOCK on the
table before it can proceed with the delete/update.  I have a feeling
that this method could fall over given a high enough number of
concurrent updaters, but works pretty well for our needs.

> I am playing with an exclusive lock scheme that will lock all the
> materialized views with an exclusive lock (see Section 12.3 for a
> reminder on what exactly this means). The locks have to occur in order,
> so I use a recursive function to traverse a dependency tree to the root
> and then lock from there. Right now, we only have one materialized view
> tree, but I can see some schemas having multiple seperate trees with
> multiple roots. So I put in an ordering to lock the tables in a
> pre-defined order.
>
> But if the two dependency trees are totally seperate, it is possible for
> one transaction to lock tree A and then tree B, and for another to lock
> tree B and then tree A, causing deadlock.
>
> Unfortunately, I can't force any update to the underlying tables to force
> this locking function to be called. So we will probably call this
> manually before we touch any of those tables.

Yeah, I ran into similar problems as this, but ISTM you could do a
before update trigger on the matview to do the locking (though I'd guess
this would end in trouble due to plpgsql lock semantics, so maybe i
shouldn't send you down a troubled road...)

>
> In the future, it would be nice to have a hook into the locking mechanism
> so any kind of lock on the underlying tables can trigger this.
>
> Also, building the dependency trees is completely manual. Until I can get
> some functions to actually assemble the triggers and such, automatic
> building of the trees will be difficult.
>

I just noticed that your summary doesn't make use of postgresql RULES in
any way, how much have you traveled down that path? We had cooked up a
scheme for our second case where we would have a table that held an
entry for the matview and then a timestamp of the last update/insert
into any of the base tables the matview depended on.  when then would
create rules on all the base tables to do an update to the refresh table
any time they were updated/inserted/deleted.  We would then put a
corresponding rule on the matview so that on each select from the
matview, it would check to see if any of it's base tables had changed
and if so fire off a refresh of itself.  We ended up abandoning this
idea as the complexity seemed to high when the simple scheme above
worked equally well for our needs.


Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: [HACKERS] [SQL] Materialized View Summary

From
Hans-Jürgen Schönig
Date:
Richard Huxton wrote:
> On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote:
>
>>I've written a summary of my findings on implementing and using
>>materialized views in PostgreSQL. I've already deployed eagerly updating
>>materialized views on several views in a production environment for a
>>company called RedWeek: http://redweek.com/. As a result, some queries
>>that were taking longer than 30 seconds to run now run in a fraction of a
>>millisecond.
>>
>>You can view my summary at
>>http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
>
>
> Interesting (and well written) summary. Even if not a "built in" feature, I'm
> sure that plenty of people will find this useful. Make sure it gets linked to
> from techdocs.
>
> If you could identify candidate keys on a view, you could conceivably automate
> the process even more. That's got to be possible in some cases, but I'm not
> sure how difficult it is to do in all cases.
>



Are there any plans to rewrite that in C and add proper support for SQL
commands? (e.g. "CREATE MATERIALIZED VIEW", "DROP VIEW", ...).


    Best regards,

        Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at


Re: [HACKERS] [SQL] Materialized View Summary

From
Jonathan Gardner
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 24 February 2004 01:48 pm, Robert Treat wrote:
> On Tue, 2004-02-24 at 12:11, Richard Huxton wrote:
> > On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote:
> > > I've written a summary of my findings on implementing and using
> > > materialized views in PostgreSQL. I've already deployed eagerly
> > > updating materialized views on several views in a production
> > > environment for a company called RedWeek: http://redweek.com/. As a
> > > result, some queries that were taking longer than 30 seconds to run
> > > now run in a fraction of a millisecond.
> > >
> > > You can view my summary at
> > > http://jonathangardner.net/PostgreSQL/materialized_views/matviews.htm
> > >l
>
> have you done much concurrency testing on your snapshot views? I
> implemented a similar scheme in one of my databases but found problems
> when I had concurrent "refresh attempts".  I ended up serializing the
> calls view LOCKing, which was ok for my needs, but I thought potentially
> problematic in other cases.
>

I don't actually use snapshot views in production. I would imagine that if
you had two seperate processes trying to update the views simultaneously,
that would be a problem. All I can say is "don't do that". I think you'd
want to lock the table before we go and start messing with it on that
scale.

We are running into some deadlock issues and some other problems with eager
mvs, but they are very rare and hard to reproduce. I think we are going to
start locking the row before updating it and see if that solves it. We also
just discovered the "debug_deadlock" feature.

I'll post my findings and summaries of the information I am getting here
soon.

I'm interested in whatever you've been working on WRT materialized views.
What cases do you think will be problematic? Do you have ideas on how to
work around them? Are there issues that I'm not addressing but should be?

> > Interesting (and well written) summary. Even if not a "built in"
> > feature, I'm sure that plenty of people will find this useful. Make
> > sure it gets linked to from techdocs.
>
> Done. :-)
>

*blush*

> > If you could identify candidate keys on a view, you could conceivably
> > automate the process even more. That's got to be possible in some
> > cases, but I'm not sure how difficult it is to do in all cases.
>
> it seems somewhere between Joe Conways work work arrays and polymorphic
> functions in 7.4 this should be feasible.
>

I'll have to look at what he is doing in more detail.

- --
Jonathan M. Gardner
Web Developer, Amazon.com
jonagard@amazon.com - (206) 266-2906
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQFAO837BFeYcclU5Q0RAhonAKDBY7Svz9/vxmerS+y/h2mLgV1ZZQCdFlnd
7aMPFvRx4O8qg+sJfWkaBh8=
=zdhL
-----END PGP SIGNATURE-----

Re: [HACKERS] Materialized View Summary

From
Mark Gibson
Date:
Jonathan M. Gardner wrote:

> You can view my summary at
> http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
>
> Comments and suggestions are definitely welcome.
>
Fantastic, I was planning on a bit of materialized view investigations
myself
when time permits, I'm pleased to see you've started the ball rolling.

I was thinking about your problem with mutable functions used in a
materialized view.

How about eliminating the mutable functions as much as possible from the
underlying
view definition, and create another view on top of the materialized view
that has the mutable bits!
Giving you the best of both worlds.

I haven't tried this or thought it through very much - too busy - but
I'd thought I'd throw
it in for a bit o' head scratching, and chin stroking :)

Cheers
--
Mark Gibson <gibsonm |AT| cromwell |DOT| co |DOT| uk>
Web Developer & Database Admin
Cromwell Tools Ltd.
Leicester, England.