Re: BUG #6041: Unlogged table was created bad in slave node - Mailing list pgsql-bugs
From | Robert Haas |
---|---|
Subject | Re: BUG #6041: Unlogged table was created bad in slave node |
Date | |
Msg-id | BANLkTim433vF5HWjbJ0FSWm_-xA8DDaGNg@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #6041: Unlogged table was created bad in slave node (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: BUG #6041: Unlogged table was created bad in slave node
Re: BUG #6041: Unlogged table was created bad in slave node |
List | pgsql-bugs |
On Thu, May 26, 2011 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> Excerpts from Euler Taveira de Oliveira's message of jue may 26 12:00:05 -0400 2011: >>> I think we should emit the real cause in those cases, if possible (not too >>> much overhead). The message would be "Unlogged table content is not available >>> in standby server". > >> I guess what it should do is create an empty file in the slave. > > Probably it should, because won't the table malfunction after the slave > is promoted to master, if there's no file at all there? Or will the > process of coming live create an empty file even if there was none? Coming live creates an empty file. > But Euler is pointing out a different issue, which is usability. If the > slave just acts like the table is present but empty, we are likely to > get bug reports about that too. An error telling you you aren't allowed > to access such a table on slaves would be more user-friendly, if we can > do it without too much pain. I looked into this a bit. A few observations: (1) This problem is actually not confined to unlogged tables; temporary tables have the same issue. For example, if you create a temporary table on the master and then, on the slave, do SELECT * FROM pg_temp_3.hi_mom (or whatever the name of the temp schema where the temp table is) you get the same error. In fact I suspect if you took a base backup that included the temporary relation and matched the backend ID you could even manage to read out the old contents (modulo any fun and exciting XID wraparound issues). But the problem is of course more noticeable for unlogged tables since they're not hidden away in a special funny schema. (2) It's somewhat difficult to get a clean fix for this error because it's coming from deep down in the bowels of the system, inside mdnblocks(). At that point, we haven't got a Relation available, just an SMgrRelation, so the information that this relation is unlogged is not really available, at least not without some sort of gross modularity violation like calling smgrexists() on the init fork. It doesn't particularly seem like a good idea to conditionalize the behavior of mdnblocks() on relpersistence anyway; that's a pretty gross modularity violation all by itself. (3) mdnblocks() gets called twice during the process of doing a simple select on a relation - once from the planner, via estimate_rel_size, and again from the executor, via initscan. A fairly obvious fix for this problem is to skip the call to estimate_rel_size() if we're dealing with a temporary or unlogged relation and are in recovery; and make heap_beginscan_internal() throw a suitable error under similar circumstances (or do we want to throw the error at plan time? tossing it out from all the way down in estimate_rel_size() seems a bit wacky). Patch taking this approach attached. (4) It strikes me that it might be possible to address this problem a bit more cleanly by allowing mdnblocks() and smgrnblocks() and RelationGetNumberOfBlocksInFork() to take a boolean argument indicating whether or not an error should be thrown if the underlying physical file happens not to exist. When no error is to be signaled, we simply return 0 when the main fork doesn't exist, rather than throwing an error. We could then make estimate_rel_size() use this variant, eliminating the need for an explicit test in get_relation_info(). I'm sort of inclined to go this route even though it would require touching a bit more code. We've already whacked around RelationGetNumberOfBlocks() a bit in this release (the function that formerly had that name is now RelationGetNumberOfBlocksInFork(), and RelationGetNumberOfBlocks() is now a macro that calls that function w/MAIN_FORKNUM) so it doesn't seem like a bad time to get any other API changes that might be useful out of our system. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-bugs by date: