Re: pg_dump versus materialized views - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: pg_dump versus materialized views
Date
Msg-id 1367778152.74126.YahooMailNeo@web162903.mail.bf1.yahoo.com
Whole thread Raw
In response to pg_dump versus materialized views  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Consider the following scenario:
>
> create table t1 as select 1 as x;
> create materialized view mv1 as select * from t1;
> create materialized view mv2 as select * from mv1;
> create materialized view mv3 as select * from mv2;
> refresh materialized view mv1 with no data;
>
> This state is not dumpable by pg_dump, because there's no way to REFRESH
> mv2 or mv3 as long as mv1 hasn't been populated.  (Well, I suppose we
> could REFRESH mv1, then mv2, then mv3, then mv1 again WITH NO DATA; but
> do we really want to do that?  pg_dump is certainly unable to do such
> a thing today, anyway.)
>
> Currently, what pg_dump does about this is to silently suppress the
> REFRESH commands for mv2 and mv3.  Well, that's nice in terms of not
> having a dump script that fails, but I'm not sure it meets the principle
> of least astonishment.
>
> Now, this is only an issue if you assume that pg_dump's charter is to
> reproduce exactly the set of populated/non-populated MV states in the
> source database.  Considering that it has little hope of exactly
> reproducing their contents, I'm not sure that that charter makes much
> sense anyway.
>
> I've thought for some time that, given that it can't reproduce the MV
> states exactly, pg_dump shouldn't even try.  I think it would be more
> useful to have two operating modes selectable by command line switch:
> refresh all matviews, or refresh none of them.
>
> Or maybe there's some other better way to approach it.
>
> Thoughts?

This was discussed some time ago, and discussion then petered out
without any clear consensus.  I think there was some support for a
pg_dump (and pg_restore?) setting for --populate-matviews = {none,
all, conditional} or some such, where "conditional" would be the
current behavior.  There was a feeling that people should be able
to dodge populating individual expensive matviews if desired, to
limit restore time and schedule population during off-hours.  There
was also a feeling that it would be bad to have data missing on the
target which was present on the source for the normal case where
intermediate matviews don't exist or are populated.

Certainly it is possible for the application of pg_dump output to a
database to generate a state which doesn't exactly match the source
in other regards, based on what users and extensions are present on
the target, but for matview population this is based on internal
state of the source, so it is different.  I don't think anyone saw
a clear precedent.

Throwing away the results in an intermediate matview while keeping
results generated off of it seems like an infrequent case, so one
question is whether we do something odd in 99% of the cases just to
be more consistent with the 1% case.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: Remaining beta blockers
Next
From: Kevin Grittner
Date:
Subject: Re: Remaining beta blockers