Re: Remaining beta blockers - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Remaining beta blockers
Date
Msg-id 1367773351.65924.YahooMailNeo@web162905.mail.bf1.yahoo.com
Whole thread Raw
In response to Re: Remaining beta blockers  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Remaining beta blockers  (Kevin Grittner <kgrittn@ymail.com>)
Re: Remaining beta blockers  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> I'm going to submit a modified version of the second patch today.
>> My biggest problems with it as it stands are the name chosen for
>> the new pg_class column, and the hard-coded assumption that this
>> relation-level flag is a good long-term indicator of whether all
>> connections find a matview to be scannable.  Scannability should
>> be abstracted to allow easy addition of other factors as we add
>> them.  Whether or not the "populated" state is in the catalog, it
>> is a serious mistake to conflate that with scannability.
>>
>> Scannability will always be influenced by whether the matview has
>> been populated, but it is short-sighted not to draw a distinction
>> now, so that work people do in their applications is does not have
>> to be redone as we make scannability tests better.  Not to mention
>> the confusion factor of treating them as this patch does and then
>> trying to properly draw the distinction later.  IMV this patch, as
>> it stands, does much more to paint us into a corner regarding
>> future expansion than what came before.
>>
>> As one example, I think it is highly desirable, long term, to allow
>> different sessions to set GUCs to different tolerances for old
>> data, and thus have different perspectives on whether a matview is
>> scannable; and this patch forces that to be the same for every
>> session.  The code I committed allows expansion in the direction of
>> different session perspectives on scannability, and the suggested
>> patch closes that door.
>
> [ raised eyebrow... ]  There is certainly nothing about file-size-based
> state that would particularly support per-session scannability
> determination.

I didn't mean to suggest that there was; I was talking about
enshrining the notion that the relation was either scannable by all
or by none into pg_class.

> If you want to call the pg_class column relispopulated rather
> than relisscannable, I have no particular objection to that

That column name and the wording of some comments are the main
things, although I'm also wondering whether it is bad form to force
users to test the pg_class.relispopulated column if they want to
test whether they can currently scan a matview, by removing the
pg_relation_is_scannable() function.  As I mentioned earlier when
you asked why these two distinct properties weren't both exposed, I
mentioned that I hadn't thought that the "populated" property was
likely to be useful at the SQL level, but then questioned that,
saying that I wasn't sure I picked the right property to pay
attention to in pg_dump - and if pg_dump needed the "populated"
property it had to be exposed.  I've come around to thinking that
it is more proper to use populated, but I have the same question
you asked earlier -- If it will be important or users to understand
that these are distinct properties, why are we just exposing one of
them?

In the absence of a specific use case, maybe it is OK to leave
pg_relation_is_scannable() off for this release, but I worry we'll
encourage fuzzy thinking on the point that could be hard to undo
later.  The flip side of that is that it might be confusing to try
to explain why users should care which test they use before they
are capable of returning different results.  They might "get it"
more easily if the function is introduced at the same time we
introduce other criteria for determining scannability (primarily
based around how current the results are) and other methods for
dealing with stale data (like the ability to force a
concurrency-friendly form of REFRESH on an attempt to query stale
data).

Also, rather than do the direct update to pg_class in pg_dump, how
would you feel about an ALTER MATERIALIZED VIEW option to set the
populated state?  I haven't written that yet, but I figured that
when we went to an in-catalog representation of populated state, we
would want that.

I'm just reviewing the changes I made, and figured it might be good
to show a diff between my form of the patch and yours, but I'm
getting a lot spurious differences based on how we generate our
context diff files (or maybe the versions of some software
involved).  You you share how you generate your patch file?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: 9.3 release notes suggestions
Next
From: Kevin Grittner
Date:
Subject: Re: pg_dump versus materialized views