Thread: GSoC - Materialized Views - is stale or fresh?

GSoC - Materialized Views - is stale or fresh?

From
Pavel Baros
Date:
I am curious how I could solve the following problem:

When refreshing, I would like to know whether the MV is stale or fresh. I
had an idea:

In fact, the MV needs to know whether its last refresh (transaction id) is
older than any INSERT, UPDATE or DELETE transaction run against its source
tables. So if the MV has information about the last (highest) xmin in the
source tables, it could simply compare its own xmin to the xmins (xmax for
deleted rows) from the source tables and decide whether it is stale or fresh.

The whole implementation could look like this:

    1. Add a new column to pg_class (or somewhere in pg_stat*?):
       pg_class.rellastxid (of type xid).

    2. After each INSERT, UPDATE, DELETE statement (transaction)
       pg_class.rellastxid would be updated. That should not be time- or
       memory- consuming (not so much) since pg_class is cached, I guess.

    3. When refreshing, as described above, comparing the MV's rellastxid
       with the source tables' rellastxid values could tell whether the MV
       is stale or still fresh. The decision whether to run a refresh would
       be as simple as it can be.
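
A minimal sketch of the comparison in step 3, written in Python rather than
backend C. The rellastxid column is only the proposal above and does not exist
in PostgreSQL; xid_precedes and mv_is_stale are hypothetical names. The
comparison is circular (modulo 2^32), in the style the backend uses for normal
XIDs, so it is only meaningful while the two XIDs are less than 2^31 apart.
Special XIDs are ignored here.

```python
from typing import Iterable

XID_MODULUS = 2 ** 32
HALF_RANGE = 2 ** 31


def xid_precedes(a: int, b: int) -> bool:
    """Return True if xid a is logically older than xid b (circular order)."""
    diff = (a - b) % XID_MODULUS
    # Read as a signed 32-bit value, diff is negative exactly when a precedes b.
    return diff >= HALF_RANGE


def mv_is_stale(mv_rellastxid: int, source_rellastxids: Iterable[int]) -> bool:
    """Stale if any source table carries a newer xid than the last refresh."""
    return any(xid_precedes(mv_rellastxid, src) for src in source_rellastxids)
```

The sketch compares XID order only. It says nothing about commit order or
visibility, so a transaction with an older xid that commits after the refresh
would not be noticed by this check.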


a) Is the idea right?

b) Could there be cases where it does not hold (apart from xid wraparound)?

c) I was looking for some help with this in pg_stat*, but there is no
information there about transactions, the last changes to relations, or
anything similar.

d) Or are there other mechanisms or ideas for checking whether the MV's source
tables have changed since the last refresh?


Thanks for any replies.

Pavel Baros


Re: GSoC - Materialized Views - is stale or fresh?

From
Greg Smith
Date:
Pavel Baros wrote:
> After each INSERT, UPDATE, DELETE statement (transaction) 
> pg_class.rellastxid would be updated. That should not be time- or 
> memory- consuming (not so much) since pg_class is cached, I guess.

An update in PostgreSQL is essentially an INSERT followed by a later DELETE 
when VACUUM gets to the dead row no longer visible.  The problem with 
this approach is that it will leave behind so many dead rows in pg_class 
due to the heavy updates that the whole database could grind to a halt, 
as so many operations will have to sort through all that garbage.  It 
could potentially double the total write volume on the system, and 
you'll completely kill people who don't have autovacuum running during 
some periods of the day.
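
To make the cost concrete, here is a deliberately crude toy model of that
behaviour. ToyHeap is a hypothetical name, not PostgreSQL code, and it ignores
any opportunistic cleanup the real storage layer does. It only counts how many
dead row versions pile up when one row is updated once per modifying
transaction and nothing vacuums in between.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHeap:
    """Toy table: every UPDATE writes a new version and leaves the old one dead."""
    live: dict = field(default_factory=dict)  # key -> current value
    dead_versions: int = 0                    # dead versions awaiting vacuum()

    def insert(self, key, value) -> None:
        self.live[key] = value

    def update(self, key, value) -> None:
        if key not in self.live:
            raise KeyError(key)
        # The previous version is not overwritten; it lingers until vacuum().
        self.dead_versions += 1
        self.live[key] = value

    def vacuum(self) -> int:
        """Reclaim all dead versions and report how many there were."""
        reclaimed = self.dead_versions
        self.dead_versions = 0
        return reclaimed
```

With one row standing in for a table's pg_class entry, a thousand modifying
transactions leave a thousand dead versions behind until vacuum() is called.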

The basic idea of saving the last update time for each relation is not 
unreasonable, but you can't store the results by updating pg_class.  My 
first thought would be to send this information as a message to the 
statistics collector.  It's already being sent updates at the point 
you're interested in for the counters of how many INSERT/UPDATE/DELETE 
statements are executing against the table.  You might bundle your last 
update information into that existing message with minimal overhead.
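
As a rough illustration of the bundling idea, assuming a made-up message
layout: TableStatsMessage and ToyCollector are hypothetical and do not reflect
the real statistics collector's structures. The only point is that the extra
last_xid field travels with counters that are already being sent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TableStatsMessage:
    """Hypothetical per-table message; not the real collector layout."""
    table: str
    inserted: int = 0
    updated: int = 0
    deleted: int = 0
    last_xid: Optional[int] = None  # the assumed extra field


class ToyCollector:
    def __init__(self) -> None:
        self.counters: Dict[str, List[int]] = {}  # table -> [ins, upd, del]
        self.last_xid: Dict[str, int] = {}        # table -> last reported xid

    def receive(self, msg: TableStatsMessage) -> None:
        totals = self.counters.setdefault(msg.table, [0, 0, 0])
        totals[0] += msg.inserted
        totals[1] += msg.updated
        totals[2] += msg.deleted
        if msg.last_xid is not None:
            # Keeps the most recently received value, not a wraparound-aware max.
            self.last_xid[msg.table] = msg.last_xid
```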

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: GSoC - Materialized Views - is stale or fresh?

From
Magnus Hagander
Date:
2010/6/14 Greg Smith <greg@2ndquadrant.com>:
> Pavel Baros wrote:
>>
>> After each INSERT, UPDATE, DELETE statement (transaction)
>> pg_class.rellastxid would be updated. That should not be time- or memory-
>> consuming (not so much) since pg_class is cached, I guess.
>
> An update in PostgreSQL is essentially an INSERT followed by a later DELETE
> when VACUUM gets to the dead row no longer visible.  The problem with this
> approach is that it will leave behind so many dead rows in pg_class due to
> the heavy updates that the whole database could grind to a halt, as so many
> operations will have to sort through all that garbage.  It could potentially
> double the total write volume on the system, and you'll completely kill
> people who don't have autovacuum running during some periods of the day.
>
> The basic idea of saving the last update time for each relation is not
> unreasonable, but you can't store the results by updating pg_class.  My
> first thought would be to send this information as a message to the
> statistics collector.  It's already being sent updates at the point you're
> interested in for the counters of how many INSERT/UPDATE/DELETE statements
> are executing against the table.  You might bundle your last update
> information into that existing message with minimal overhead.

Right. Do remember that the stats collector is designed to be lossy,
though, so you're not guaranteed that the information reaches the
other end. In reality it tends to do that, but there needs to be some
sort of recovery path for the case when it doesn't.
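
One possible shape for such a recovery path, purely as an assumption for
illustration: the sender numbers its messages, the receiver notices gaps, and
anything uncertain is treated as "refresh needed". GapDetectingReceiver and
needs_refresh are hypothetical names, and the xid comparison is a plain
integer comparison that ignores wraparound.

```python
from typing import Optional


class GapDetectingReceiver:
    """Hypothetical receiver that notices missing sequence numbers from one sender."""

    def __init__(self) -> None:
        self.next_expected = 0
        self.last_xid: Optional[int] = None
        self.reliable = True  # becomes False once a message is known to be missing

    def receive(self, seq: int, last_xid: int) -> None:
        if seq != self.next_expected:
            self.reliable = False
        self.next_expected = seq + 1
        self.last_xid = last_xid


def needs_refresh(receiver: GapDetectingReceiver, mv_refresh_xid: int) -> bool:
    # Conservative fallback: with missing or no information, refresh anyway.
    if not receiver.reliable or receiver.last_xid is None:
        return True
    return receiver.last_xid > mv_refresh_xid
```

This narrows the problem rather than closing it: if the most recent message is
the one that got lost, no gap is visible until a later message arrives.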

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: GSoC - Materialized Views - is stale or fresh?

From
Robert Haas
Date:
On Mon, Jun 14, 2010 at 5:00 AM, Magnus Hagander <magnus@hagander.net> wrote:
> 2010/6/14 Greg Smith <greg@2ndquadrant.com>:
>> Pavel Baros wrote:
>>>
>>> After each INSERT, UPDATE, DELETE statement (transaction)
>>> pg_class.rellastxid would be updated. That should not be time- or memory-
>>> consuming (not so much) since pg_class is cached, I guess.
>>
>> An update in PostgreSQL is essentially an INSERT followed by a later DELETE
>> when VACUUM gets to the dead row no longer visible.  The problem with this
>> approach is that it will leave behind so many dead rows in pg_class due to
>> the heavy updates that the whole database could grind to a halt, as so many
>> operations will have to sort through all that garbage.  It could potentially
>> double the total write volume on the system, and you'll completely kill
>> people who don't have autovacuum running during some periods of the day.
>>
>> The basic idea of saving the last update time for each relation is not
>> unreasonable, but you can't store the results by updating pg_class.  My
>> first thought would be to send this information as a message to the
>> statistics collector.  It's already being sent updates at the point you're
>> interested in for the counters of how many INSERT/UPDATE/DELETE statements
>> are executing against the table.  You might bundle your last update
>> information into that existing message with minimal overhead.
>
> Right. Do remember that the stats collector is designed to be lossy,
> though, so you're not guaranteed that the information reaches the
> other end. In reality it tends to do that, but there needs to be some
> sort of recovery path for the case when it doesn't.

What Pavel's trying to do here is be smart about figuring out when an
MV needs to be refreshed.  I'm pretty sure this is the wrong way to go
about it, but it seems entirely premature considering that we don't
have a working implementation of a *manually* refreshed MV.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: GSoC - Materialized Views - is stale or fresh?

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> What Pavel's trying to do here is be smart about figuring out when
> an MV needs to be refreshed.  I'm pretty sure this is the wrong
> way to go about it, but it seems entirely premature considering
> that we don't have a working implementation of a *manually*
> refreshed MV.

Agreed all around.

At the risk of sounding obsessed, this is an area where predicate
locks might be usefully extended, if and when the serializable patch
makes it in.

-Kevin


Re: GSoC - Materialized Views - is stale or fresh?

From
Josh Berkus
Date:
> At the risk of sounding obsessed, this is an area where predicate
> locks might be usefully extended, if and when the serializable patch
> makes it in.

Yes, we see your patch in 9.1-first.  ;-)


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com