Re: Fixing flat user/group files at database startup - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Fixing flat user/group files at database startup |
Date | |
Msg-id | 23853.1107625330@sss.pgh.pa.us |
In response to | Fixing flat user/group files at database startup (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Fixing flat user/group files at database startup |
List | pgsql-hackers |
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Fri, Feb 04, 2005 at 03:16:33PM -0500, Tom Lane wrote:
>> Michael Klatt reported here:
>> http://archives.postgresql.org/pgsql-admin/2005-02/msg00031.php
>> that we have problems because the flat files global/pg_pwd
>> and global/pg_group aren't rebuilt following WAL recovery.

> I thought that maybe we could reconstruct the file using the previous
> file and the WAL entry, but then I noticed that we don't emit a special
> WAL entry for user create/delete/update, so this would also require a
> lot of new code.

What of mods that later get rolled back?  Seems pretty messy to do it
that way, even if there were adequate support in the WAL mechanisms.

>> One idea I'm toying with is to try to make something like
>> GetRawDatabaseInfo but not as klugy.

> If you make something similar to GetRawDatabaseInfo, then you don't need
> the plain files at all, do you?  By doing that, a lot of code could go
> away.

Not unless we're willing to let the postmaster execute the
GetRawDatabaseInfo substitute, which I think is a nonstarter on
reliability grounds.

I'm actually thinking more in the other direction: get rid of
GetRawDatabaseInfo in its present form, instead relying on a flat-file
representation of pg_database to let new backends find out the OID and
tablespace of their target database.  That would let us fix many of the
weird corner cases in which GetRawDatabaseInfo fails (see the knee-jerk
recommendation to "VACUUM pg_database and CHECKPOINT" that we make any
time someone reports they can't log into one specific database).  The
trick is to have infrastructure that makes the flat files more reliable
than they are now.

>> But going in this direction would require writing a fair amount of new
>> code, probably too much to consider backpatching into 8.0.*.

> If you don't backport this into 8.0, then 8.0 will be broken forever?

[ shrug... ]  This bug has been there since 7.1; it just has a greater
chance of biting someone in 8.0 than it did before.  If we have to
document "do something to force a pg_shadow update" as part of the PITR
recovery process in 8.0.*, well, it's ugly but IMHO it beats putting a
large amount of poorly tested code into a dot-release.

I've been fooling with trying to make a small kluge that would be
reasonable to back-port, but not having a lot of luck so far.  What I've
got at the moment is this bit of code to be called at the end of the
BS_XLOG_STARTUP case in bootstrap.c:

/*
 * This routine is called once during database startup, after completing
 * WAL replay if needed.  Its purpose is to sync the flat files with the
 * current state of the pg_shadow and pg_group tables.  This is particularly
 * important during PITR operation, since the flat files will come from the
 * base backup which may be far out of sync with the current state.
 *
 * In theory we could skip this if no WAL replay occurred, but it seems
 * safest to just do it always.
 */
void
BuildUserGroupFiles(void)
{
    RelFileNode rnode;
    Relation    rel;

    /* use a fake relcache similar to WAL recovery */
    XLogInitRelationCache();

    /* need to have a resource owner too to keep heapscan machinery happy */
    CurrentResourceOwner = ResourceOwnerCreate(NULL, "BuildUserGroupFiles");

    /* hard-wired path to pg_shadow */
    rnode.spcNode = GLOBALTABLESPACE_OID;
    rnode.dbNode = 0;
    rnode.relNode = RelOid_pg_shadow;

    rel = XLogOpenRelation(true, 0, rnode);

    write_user_file(rel);

    /* hard-wired path to pg_group */
    rnode.spcNode = GLOBALTABLESPACE_OID;
    rnode.dbNode = 0;
    rnode.relNode = RelOid_pg_group;

    rel = XLogOpenRelation(true, 0, rnode);

    write_group_file(rel);

    XLogCloseRelationCache();
}

However this still dumps core because write_user_file and
write_group_file expect to be able to use heap_getattr, which needs to
have a tupdesc, which the phony relcache entries made by
XLogOpenRelation haven't got.  We could fix that by consing up tupdescs
from fixed data in pg_attribute.h (a la the formrdesc'd reldescs in
relcache.c) but that's pretty messy.  I gave up after thinking ahead and
realizing that it will still fail miserably on out-of-line toasted
datums, which write_group_file at least has got to be able to cope with
--- the grolist array might be large enough to require toasting.  The
WAL recovery environment has definitely not got the infrastructure
needed by the tuple toaster.

What I am thinking is that we have to punt on this problem until the
proposed pg_role catalog is in place.  With the grolist array
representation of group membership replaced by a fixed-width
pg_role_members catalog, there would be no need to deal with any
potentially-toasted columns while extracting the data needed for the
flat files.

            regards, tom lane
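
Illustrative sketch (not part of the original message): the idea of a
flat-file representation of pg_database, from which a new backend could
find the OID and tablespace of its target database without touching the
catalogs, might look roughly like the following.  Everything here is an
assumption made for illustration: the one-line-per-database format
("name" oid tablespace_oid) and the function name lookup_database are
hypothetical and are not taken from the PostgreSQL sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical flat-file format, one database per line:
 *     "dbname" database_oid tablespace_oid
 *
 * Scan the in-memory file contents for 'name'.  On a match, store the two
 * OIDs through the output pointers and return 1; otherwise return 0.
 */
static int
lookup_database(const char *flat, const char *name,
                unsigned int *db_oid, unsigned int *tblspc_oid)
{
    const char *line = flat;

    assert(flat != NULL && name != NULL);

    while (*line != '\0')
    {
        char         entry[64];
        unsigned int oid;
        unsigned int spc;
        const char  *eol = strchr(line, '\n');

        /* a usable line must yield all three fields */
        if (sscanf(line, " \"%63[^\"]\" %u %u", entry, &oid, &spc) == 3 &&
            strcmp(entry, name) == 0)
        {
            *db_oid = oid;
            *tblspc_oid = spc;
            return 1;
        }

        if (eol == NULL)
            break;              /* last line had no trailing newline */
        line = eol + 1;
    }

    return 0;
}
```

The lookup needs only plain text parsing: no relcache, no tuple
descriptors and no detoasting.  That is the property that makes a flat
file usable in places where the catalog access machinery is not
available, provided the file itself is kept in sync with the catalog.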