WIP: getting rid of the pg_database flat file - Mailing list pgsql-hackers

From Tom Lane
Subject WIP: getting rid of the pg_database flat file
Msg-id 2993.1250032280@sss.pgh.pa.us
Responses Re: WIP: getting rid of the pg_database flat file  (Andrew Dunstan <andrew@dunslane.net>)
Re: WIP: getting rid of the pg_database flat file  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
In the discussion of bug #4919 I wrote:
> In some sense this is a bootstrap problem: what does it take to get to
> the point of being able to read pg_database and its indexes?  That is
> necessarily not dependent on the particular database we want to join.
> Maybe we could solve it by having the relcache write a "global" cache
> file containing only entries for the global tables, and load that before
> we have identified the database we want to join (after which, we'll load
> another cache file for the local entries).  It would doubtless take some
> rearrangement of the backend startup sequence, but it doesn't seem
> obviously impossible.

Attached is a proof-of-concept patch which shows that this idea makes it
possible to start backends without the pg_database flat file, and that the
required search of pg_database can be done with an index as long as we
have the shared relcache cache file available (which should always be true
except for the first backend start after postmaster bootup or crash
recovery).  There are a few loose ends yet to fix, but on the whole it
was easier than I expected.  The main costs of doing it this way are:

* pg_database has to become a nailed-in-cache relation, as does its
index on datname.  (Its index on OID will have to be nailed too, unless
we can get rid of the kluge that lets autovacuum give InitPostgres a
database OID instead of database name.  I have not looked at autovacuum
yet.)  This doesn't really cost anything except a few more bytes in the
relcache ... and in reality I suspect pg_database is always in that
cache anyway.

* We have to have a Schema_pg_database macro in pg_attribute.h.  This
means a little more hand maintenance (unless we accept Robert Haas'
patch to autogenerate all that stuff); but it's still not a big problem.

I think this is clearly worth cleaning up and committing, since even
without any further progress it eliminates number-of-databases as a
significant factor in backend startup time.  Does anyone have any
objection to the above side-effects?

To actually get rid of the pg_database flat file, we'd need to take the
further step of teaching the AV launcher to read pg_database for itself,
or else refactor things so that the AV workers can do that for it.
(Alvaro, any comments about the best way to proceed there?)

I'd also like to look into getting rid of the pg_auth flat file.
As previously noted, that means postponing client auth to later in the
startup sequence.  If we were willing to eliminate role membership as an
available piece of information for auth method selection, we could still
do much of the auth work before initializing the backend proper; in
particular we could issue a password challenge and wait for a response,
which would be good in terms of reducing our exposure to lightweight DDoS
attacks.  I'm not sure if anyone will think that's a good tradeoff though,
since any attacker who can connect to the postmaster port can probably
DDoS the postmaster just fine anyway.

Comments?

            regards, tom lane


Attachment
