Thread: GSoC - Materialized Views - is stale or fresh?
I am curious how I could solve the following problem: during refreshing, I would like to know whether the MV is stale or fresh. My idea: the MV needs to know whether its last refresh (transaction id) is older than any INSERT, UPDATE, or DELETE transaction launched against its source tables. So if the MV has information about the last (highest) xmin in the source tables, it could simply compare its own xmin to the xmins (xmax for deleted rows) from the source tables and decide whether it is stale or fresh.

The whole realization could look like this:

1. Add a new column in pg_class (or somewhere in pg_stat*?): pg_class.rellastxid (of type xid).
2. After each INSERT, UPDATE, or DELETE statement (transaction), pg_class.rellastxid would be updated. That should not be too time- or memory-consuming, since pg_class is cached, I guess.
3. When refreshing, as described above, comparing the MV's rellastxid to the source tables' rellastxid could answer whether the MV is stale or still fresh. The decision whether to run the refresh would be as simple as it can be.

a) Is the idea right?
b) Could there be cases where it does not hold (apart from xid wraparound)?
c) I was looking for some help with this in pg_stat*, but there is no information about transactions, last changes in relations, or anything similar.
d) Or are there other mechanisms or ideas for checking whether an MV's source tables have changed since the last refresh?

Thanks for replies

Pavel Baros
Pavel Baros wrote:
> After each INSERT, UPDATE, DELETE statement (transaction)
> pg_class.rellastxid would be updated. That should not be time- or
> memory- consuming (not so much) since pg_class is cached, I guess.

An update in PostgreSQL is essentially an INSERT, followed by a later DELETE when VACUUM gets to the dead row that is no longer visible. The problem with this approach is that it will leave behind so many dead rows in pg_class due to the heavy updates that the whole database could grind to a halt, as so many operations will have to sort through all that garbage. It could potentially double the total write volume on the system, and you'll completely kill people who don't have autovacuum running during some periods of the day.

The basic idea of saving the last update time for each relation is not unreasonable, but you can't store the results by updating pg_class. My first thought would be to send this information as a message to the statistics collector. It's already being sent updates at the point you're interested in, for the counters of how many INSERT/UPDATE/DELETE statements are executing against the table. You might bundle your last update information into that existing message with minimal overhead.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
2010/6/14 Greg Smith <greg@2ndquadrant.com>:
> Pavel Baros wrote:
>> After each INSERT, UPDATE, DELETE statement (transaction)
>> pg_class.rellastxid would be updated. That should not be time- or memory-
>> consuming (not so much) since pg_class is cached, I guess.
>
> An update in PostgreSQL is essentially an INSERT, followed by a later DELETE
> when VACUUM gets to the dead row that is no longer visible. The problem with
> this approach is that it will leave behind so many dead rows in pg_class due
> to the heavy updates that the whole database could grind to a halt, as so
> many operations will have to sort through all that garbage. It could
> potentially double the total write volume on the system, and you'll
> completely kill people who don't have autovacuum running during some periods
> of the day.
>
> The basic idea of saving the last update time for each relation is not
> unreasonable, but you can't store the results by updating pg_class. My
> first thought would be to send this information as a message to the
> statistics collector. It's already being sent updates at the point you're
> interested in, for the counters of how many INSERT/UPDATE/DELETE statements
> are executing against the table. You might bundle your last update
> information into that existing message with minimal overhead.

Right. Do remember that the stats collector is designed to be lossy, though, so you're not guaranteed that the information reaches the other end. In reality it tends to do that, but there needs to be some sort of recovery path for the case when it doesn't.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
On Mon, Jun 14, 2010 at 5:00 AM, Magnus Hagander <magnus@hagander.net> wrote:
> 2010/6/14 Greg Smith <greg@2ndquadrant.com>:
>> Pavel Baros wrote:
>>> After each INSERT, UPDATE, DELETE statement (transaction)
>>> pg_class.rellastxid would be updated. That should not be time- or memory-
>>> consuming (not so much) since pg_class is cached, I guess.
>>
>> An update in PostgreSQL is essentially an INSERT, followed by a later DELETE
>> when VACUUM gets to the dead row that is no longer visible. The problem with
>> this approach is that it will leave behind so many dead rows in pg_class due
>> to the heavy updates that the whole database could grind to a halt, as so
>> many operations will have to sort through all that garbage. It could
>> potentially double the total write volume on the system, and you'll
>> completely kill people who don't have autovacuum running during some periods
>> of the day.
>>
>> The basic idea of saving the last update time for each relation is not
>> unreasonable, but you can't store the results by updating pg_class. My
>> first thought would be to send this information as a message to the
>> statistics collector. It's already being sent updates at the point you're
>> interested in, for the counters of how many INSERT/UPDATE/DELETE statements
>> are executing against the table. You might bundle your last update
>> information into that existing message with minimal overhead.
>
> Right. Do remember that the stats collector is designed to be lossy,
> though, so you're not guaranteed that the information reaches the
> other end. In reality it tends to do that, but there needs to be some
> sort of recovery path for the case when it doesn't.

What Pavel's trying to do here is be smart about figuring out when an MV needs to be refreshed. I'm pretty sure this is the wrong way to go about it, but it seems entirely premature considering that we don't have a working implementation of a *manually* refreshed MV.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> wrote:
> What Pavel's trying to do here is be smart about figuring out when
> an MV needs to be refreshed. I'm pretty sure this is the wrong
> way to go about it, but it seems entirely premature considering
> that we don't have a working implementation of a *manually*
> refreshed MV.

Agreed all around.

At the risk of sounding obsessed, this is an area where predicate locks might be usefully extended, if and when the serializable patch makes it in.

-Kevin
> At the risk of sounding obsessed, this is an area where predicate
> locks might be usefully extended, if and when the serializable patch
> makes it in.

Yes, we see your patch in 9.1-first. ;-)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com