Re: BUG #6041: Unlogged table was created bad in slave node - Mailing list pgsql-bugs

From Robert Haas
Subject Re: BUG #6041: Unlogged table was created bad in slave node
Date
Msg-id BANLkTim433vF5HWjbJ0FSWm_-xA8DDaGNg@mail.gmail.com
Whole thread Raw
In response to Re: BUG #6041: Unlogged table was created bad in slave node  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #6041: Unlogged table was created bad in slave node
Re: BUG #6041: Unlogged table was created bad in slave node
List pgsql-bugs
On Thu, May 26, 2011 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Excerpts from Euler Taveira de Oliveira's message of jue may 26 12:00:05 -0400 2011:
>>> I think we should emit the real cause in those cases, if possible (not too
>>> much overhead). The message would be "Unlogged table content is not available
>>> in standby server".
>
>> I guess what it should do is create an empty file in the slave.
>
> Probably it should, because won't the table malfunction after the slave
> is promoted to master, if there's no file at all there?  Or will the
> process of coming live create an empty file even if there was none?

Coming live creates an empty file.

> But Euler is pointing out a different issue, which is usability.  If the
> slave just acts like the table is present but empty, we are likely to
> get bug reports about that too.  An error telling you you aren't allowed
> to access such a table on slaves would be more user-friendly, if we can
> do it without too much pain.

I looked into this a bit.  A few observations:

(1) This problem is actually not confined to unlogged tables;
temporary tables have the same issue.  For example, if you create a
temporary table on the master and then, on the slave, do SELECT * FROM
 pg_temp_3.hi_mom (or whatever the name of the temp schema where the
temp table is) you get the same error.  In fact I suspect if you took
a base backup that included the temporary relation and matched the
backend ID you could even manage to read out the old contents (modulo
any fun and exciting XID wraparound issues).  But the problem is of
course more noticeable for unlogged tables since they're not hidden
away in a special funny schema.

(2) It's somewhat difficult to get a clean fix for this error because
it's coming from deep down in the bowels of the system, inside
mdnblocks().  At that point, we haven't got a Relation available, just
an SMgrRelation, so the information that this relation is unlogged is
not really available, at least not without some sort of gross
modularity violation like calling smgrexists() on the init fork.  It
doesn't particularly seem like a good idea to conditionalize the
behavior of mdnblocks() on relpersistence anyway; that's a pretty
gross modularity violation all by itself.

(3) mdnblocks() gets called twice during the process of doing a simple
select on a relation - once from the planner, via estimate_rel_size,
and again from the executor, via initscan.  A fairly obvious fix for
this problem is to skip the call to estimate_rel_size() if we're
dealing with a temporary or unlogged relation and are in recovery; and
make heap_beginscan_internal() throw a suitable error under similar
circumstances (or do we want to throw the error at plan time?  tossing
it out from all the way down in estimate_rel_size() seems a bit
wacky).  Patch taking this approach attached.

(4) It strikes me that it might be possible to address this problem a
bit more cleanly by allowing mdnblocks() and smgrnblocks() and
RelationGetNumberOfBlocksInFork() to take a boolean argument
indicating whether or not an error should be thrown if the underlying
physical file happens not to exist.  When no error is to be signaled,
we simply return 0 when the main fork doesn't exist, rather than
throwing an error.  We could then make estimate_rel_size() use this
variant, eliminating the need for an explicit test in
get_relation_info().  I'm sort of inclined to go this route even
though it would require touching a bit more code.  We've already
whacked around RelationGetNumberOfBlocks() a bit in this release (the
function that formerly had that name is now
RelationGetNumberOfBlocksInFork(), and RelationGetNumberOfBlocks() is
now a macro that calls that function w/MAIN_FORKNUM) so it doesn't
seem like a bad time to get any other API changes that might be useful
out of our system.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-bugs by date:

Previous
From: Robert Haas
Date:
Subject: Re: BUG #6022: Postgre84+RHEL6+Veritas file system?
Next
From: Robert Haas
Date:
Subject: Re: BUG #5984: Got FailedAssertion("!(opaque->btpo_prev == target)", File: "nbtpage.c", Line: 1166)