Thread: Resurrecting pg_upgrade
I am planning to solve the ancient problem of updating to a new major version without dump/reload, by means of writing a new, more bulletproof implementation of Bruce's old pg_upgrade script. Here are some design notes --- please comment.

The upgrade scenario
--------------------

I envision the upgrade process as working like this:

1. Shut down old postmaster.
2. If planning to install over old software, move at least the old "postgres" executable out of the way.
3. Install new-version Postgres software.
4. Run pg_upgrade (of new version). It will require access to the old postgres executable, as well as the new postgres and pg_dump executables.
5. (optional) Rename data directory --- see below.
6. Start new postmaster.
7. When satisfied that upgrade was successful, rm -rf old $PGDATA.

pg_upgrade will build a new $PGDATA directory without changing the contents of the old one in any way --- this policy ensures that you're not screwed if the upgrade fails for some reason. A disadvantage is that you need a place to put the new $PGDATA that's different from the old. Of course you can manually rename the directories before or after running pg_upgrade, but does anyone feel that's not acceptable? We could make this simpler if we were willing to let pg_upgrade move the old files out of the way, but that loses the "no damage to old installation" property. (But I think Oliver was suggesting adopting version-specific pathnames for PGDATA, in which case this is really no issue anyway.)

What pg_upgrade itself does is basically:

1. Run "pg_dumpall --schema-only" (or a near equivalent thereof) to obtain the schema info for the old installation.
2. initdb the new installation.
3. Load the above script into the new installation.
4. Physically link (hard-link) user table and index files into the new PGDATA directory, replacing the empty files created by loading the schema.

Since we use physical links, there is little growth of total disk space used, and yet the old PGDATA directory remains unscathed. There are some additional steps and special considerations, which are elaborated on below, but that's the basic idea. Note that it should be possible to upgrade even very large databases in a short time, since the user table contents are not scanned.

In principle, since pg_dump can dump schemas from several versions back, this scheme allows upgrades to jump across several versions, not just one. The only hard restriction in the current plan is that the contents of user table files cannot change. We could support changes in contents of indexes (for example, 7.3 to 7.4 btree or hash index migration) by issuing a REINDEX instead of moving the old index contents. However, this loses the potential for pg_upgrade to be fast and not chew much additional disk space. It'd still beat a dump/reload though.

Currently the no-table-contents-changes restriction keeps us from upgrading from versions older than 7.4 anyway (since type NUMERIC had its on-disk representation changed in 7.4). We could possibly upgrade 7.3 databases that contain no NUMERIC columns, if we take the hit of rebuilding indexes. But is it worth bothering with?

Communication
-------------

I think it's important to be able to run pg_upgrade with the postmaster shut down. Otherwise there is too much risk that some other user will change the database while we are working. The original pg_upgrade script left it to the DBA to ensure this wouldn't happen, but that seems like a foot-gun of much too large caliber.
Instead, all operations should be done through a standalone backend. An additional advantage of doing it this way is that a standalone backend is implicitly superuser, and so no SQL permissions issues will get in the way.

This makes it difficult to use pg_dump, however, since pg_dump doesn't know how to talk to a standalone backend. I propose the following solution:

1. The backend should be modified so that a standalone backend is capable of interacting with a client using the normal FE/BE protocol over stdin/stdout (this would be selected with a command line switch to override the existing standalone-backend behavior).

2. libpq should be extended so that one way to contact the database server is to fork/exec a standalone backend and communicate with it using the above facility. There are any number of ways we could design this feature, but one convenient way is that a PGHOST value starting with a pipe symbol "|" could be taken to mean doing this, with the rest of the string providing the postgres executable's name and possibly command-line options. libpq would tack on the switch needed for FE/BE protocol and the database name to connect to, and exec() the result.

This would allow both pg_dump and psql to be used with standalone backends. There are probably other uses for such a feature besides pg_upgrade's need.

Does this approach create any problems for the Windows port? I'm assuming we can set up pipes in both directions between a parent and child process --- is that available in Windows?

Gotchas
-------

To do the physical relinking of table files, pg_upgrade will need to discover the mapping from old installation's DB OIDs and relfilenode numbers to new ones. This mapping will get even more complex when tablespaces enter the picture; but it seems very doable.

Beware of old files that are symlinks to someplace else; need to replicate the symlink if so. What should happen if hard link fails because of cross-file-system reference (most likely because a whole database subdir was symlinked)? May be okay to punt, or make a symlink, but the latter raises a risk that rm -rf OLDPGDATA will break things. Do we have any supported platforms that don't implement hard-links (perhaps Windows??)

WAL: need to do pg_resetxlog in new installation to bring WAL end up to or beyond end of old WAL. This ensures LSN values in user table page headers will be considered valid. Do not need to worry about copying old XLOG, fortunately.

pg_clog: need to copy old installation's clog (and hence, its XID counter) so that xact status in existing user tables will be valid. This means doing the resetxlog and clog copy immediately after initdb's VACUUM FREEZE, before we start loading the old schema. Note we do NOT need to vacuum the copied-over tables.

OIDs: it's not essential, but likely a good idea to make the OID counter match the old installation's too.

pg_control: make sure all fields of new pg_control match old, e.g. check that compile-time options are the same. This will require knowledge of past and present pg_control layouts, but that seems tolerable.

Large objects: not a problem, just migrate pg_largeobject contents as though it were a user table. Since LOs will thereby retain their old OIDs, we don't need to worry about fixing references to them from user tables.

TOAST tables need to retain the same OID they had before, since this OID is recorded in TOAST external references, which we don't want to update.
It should not be impossible to adjust their OIDs after loading the schema definition, but it surely has potential to be tedious. (NB: without UPDATE SET oid = val, it won't be possible to do this via a SQL command; might be better to attack that than develop a low-level hack. Are there any other cases where pg_upgrade needs to massage the new installation at levels below SQL? User table contents upgrade is one, but we're not gonna support that in the short term anyway.)

What do we do if desired OID conflicts with some existing table? Renumbering a table in the general case is theoretically possible, but there are far more things to worry about than for TOAST tables (eg, references in stored rules...) Best solution might be to examine all the TOAST table OIDs before deciding where to set the new installation's OID counter. Also possibly detect case where old TOAST table is empty, and don't bother migrating if so, thus reducing number of cases where collision can occur.

Note: we do not change the *name* of a toast table, since that's derived from its parent table's OID not its own. No need to touch the toast table's index, either. We do have to update reltoastrelid in parent. Also note mapping from old DB to new can't rely on table names for matching TOAST tables; have to drive it off the parent tables. TOAST valueid OIDs are not a problem, since we'll migrate the toast-table contents in toto.

What about migrating config files? In particular, changes in names or semantics of pre-existing config vars seem like a big issue.

First cut: just copy the files.

Second cut: extract non-default values from old file, insert into new file (so as to preserve comments about variables that didn't exist in old version).

We could imagine adding smarts about specific variable names here, if particular variables change in ways that we can deal with specially.

Future config-file changes such as changing the syntax of pg_hba.conf would need to come with update code that can reformat the old file; or perhaps in some cases we'd have to punt and leave it to manual updating.

Thoughts?

			regards, tom lane
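For concreteness, a minimal POSIX sketch of the fork/exec-plus-pipes arrangement libpq would need in order to speak FE/BE protocol to a standalone backend over its stdin/stdout. The "--fe-be-over-stdio" switch is hypothetical (the proposal only says some new command-line switch would select the mode), and the actual protocol exchange and error recovery are omitted.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int
main(void)
{
	int		to_backend[2];		/* parent writes -> child's stdin */
	int		from_backend[2];	/* child's stdout -> parent reads */
	pid_t	pid;

	if (pipe(to_backend) < 0 || pipe(from_backend) < 0)
	{
		perror("pipe");
		return 1;
	}

	pid = fork();
	if (pid < 0)
	{
		perror("fork");
		return 1;
	}

	if (pid == 0)
	{
		/* child: wire the pipes to stdin/stdout, then exec the backend */
		dup2(to_backend[0], 0);
		dup2(from_backend[1], 1);
		close(to_backend[1]);
		close(from_backend[0]);
		execlp("postgres", "postgres", "--fe-be-over-stdio", "template1",
			   (char *) NULL);
		perror("execlp");
		_exit(1);
	}

	/* parent: close the child's ends, then speak FE/BE protocol over
	 * to_backend[1] (writes) and from_backend[0] (reads) */
	close(to_backend[0]);
	close(from_backend[1]);

	/* ... startup packet, queries, terminate message would go here ... */
	return 0;
}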
Tom Lane wrote:
> I think it's important to be able to run pg_upgrade with the
> postmaster shut down. Otherwise there is too much risk that some
> other user will change the database while we are working. The
> original pg_upgrade script left it to the DBA to ensure this wouldn't
> happen, but that seems like a foot-gun of much too large caliber.
> Instead, all operations should be done through a standalone backend.
> An additional advantage of doing it this way is that a standalone
> backend is implicitly superuser, and so no SQL permissions issues
> will get in the way.

This would also be a nice solution for people who want a standalone, server-less database system. But for the purpose of pg_upgrade it seems like a lot of work for what could just be a magic switch in the postmaster to really kick everyone else out.

> What about migrating config files? In particular, changes in names
> or semantics of pre-existing config vars seem like a big issue.
> First cut: just copy the files.
> Second cut: extract non-default values from old file, insert into
> new file (so as to preserve comments about variables that didn't
> exist in old version).
> We could imagine adding smarts about specific variable names here,
> if particular variables change in ways that we can deal with
> specially.

I would be very careful about making too many smart guesses when upgrading configuration files. This can get really annoying for users who expect it to behave just a little bit differently. Or you get conflicts of authority with packaging tools. Making adjustments because of syntax or name changes is OK, but everything else must be evaluated carefully.
re Windows: pipes, yes, hard links, no (and no sane symlinks either) - also of course no (sane) shell - is this going to be a script or a C program?

Maybe use an option which you would disable on Windows to copy the files instead of hardlinking them. Yes, it would take lots more time and space, but copying raw files would surely still be a lot faster than loading the dump.

cheers

andrew

Tom Lane wrote:
[snip whole lotta good stuff]
Andrew Dunstan <andrew@dunslane.net> writes:
> re Windows: pipes, yes, hard links, no (and no sane symlinks either) -
> also of course no (sane) shell - is this going to be a script or a C
> program?

C, certainly.

> Maybe use an option which you would disable on Windows to copy the files
> instead of hardlinking them. Yes it would take lots more time and space,
> but copying raw files would surely still be a lot faster than loading
> the dump.

Yeah, that's what we'll have to do if there's no hard-link capability.

			regards, tom lane
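A rough sketch of the "hard-link if possible, otherwise copy" fallback being discussed, for a single relation file. The function names are illustrative only (not pg_upgrade's actual code), error handling is minimal, and Windows is ignored here.

#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

static int
copy_file(const char *src, const char *dst)
{
	char	buf[65536];
	ssize_t nread;
	int		in, out;

	if ((in = open(src, O_RDONLY)) < 0)
		return -1;
	if ((out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0)
	{
		close(in);
		return -1;
	}
	while ((nread = read(in, buf, sizeof(buf))) > 0)
	{
		if (write(out, buf, nread) != nread)
		{
			nread = -1;
			break;
		}
	}
	close(in);
	close(out);
	return (nread < 0) ? -1 : 0;
}

/*
 * Try to hard-link an old relation file into the new PGDATA; fall back
 * to a physical copy if linking is impossible (cross-filesystem link,
 * or a filesystem with no hard-link support).
 */
static int
transfer_relfile(const char *oldpath, const char *newpath)
{
	if (link(oldpath, newpath) == 0)
		return 0;
	if (errno == EXDEV || errno == EPERM || errno == ENOSYS)
		return copy_file(oldpath, newpath);
	return -1;
}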
On Fri, 2003-12-12 at 14:51, Andrew Dunstan wrote:
> re Windows: pipes, yes, hard links, no (and no sane symlinks either)

Actually, NTFS does support hard links, there is just no support for it in any MS file management GUI.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnfiles/html/ntfs5.asp

From the above link:

BOOL CreateHardLink(
  LPCTSTR lpFileName,
  LPCTSTR lpExistingFileName,
  LPSECURITY_ATTRIBUTES lpSecurityAttributes
);

> Maybe use an option which you would disable on Windows to copy the files
> instead of hardlinking them. Yes it would take lots more time and space,
> but copying raw files would surely still be a lot faster than loading
> the dump.

I think this would be a good feature even without hard link problems. If I am a paranoid admin, and I can afford the time and disk space required, I would want to keep a complete copy of my database, even after the new server is up and running.
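If NTFS hard links are used, the platform difference could be hidden behind a small wrapper along these lines (a sketch only; CreateHardLink needs Windows 2000 or later and an NTFS volume, and the wrapper name is illustrative):

#ifdef WIN32
#include <windows.h>

static int
pg_link_file(const char *oldpath, const char *newpath)
{
	/* note the argument order: new link name first, existing file second */
	return CreateHardLinkA(newpath, oldpath, NULL) ? 0 : -1;
}
#else
#include <unistd.h>

static int
pg_link_file(const char *oldpath, const char *newpath)
{
	return link(oldpath, newpath);
}
#endif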
On Fri, 12 Dec 2003, Peter Eisentraut wrote:

> Tom Lane wrote:
> > I think it's important to be able to run pg_upgrade with the
> > postmaster shut down. Otherwise there is too much risk that some
> > other user will change the database while we are working. The
> > original pg_upgrade script left it to the DBA to ensure this wouldn't
> > happen, but that seems like a foot-gun of much too large caliber.
> > Instead, all operations should be done through a standalone backend.
> > An additional advantage of doing it this way is that a standalone
> > backend is implicitly superuser, and so no SQL permissions issues
> > will get in the way.
>
> This would also be a nice solution for people who want a standalone,
> server-less database system. But for the purpose of pg_upgrade it
> seems like a lot of work for what could just be a magic switch in the
> postmaster to really kick everyone else out.

I was going to ask that one, in relation to a 'READONLY' mode where only the superuser can do writes (ie. single master, multi slave with slaves used as readonly databases), but ... how often do people write apps that connect as the superuser in their normal interactions? I know in our case, even though I should know better, we still do it to a large extent ... so how would you know who to legitimately 'kick out'?

Hrmmmmm ... how about a READ ONLY EXCEPT FROM <IP> mode? In the case of a slave, the IP would be the master server itself ... in the case of pg_upgrade, it would be localhost?

Then again, in the case of pg_upgrade, wouldn't just disabling access from anywhere except localhost prevent others from getting in?

----
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org   Yahoo!: yscrappy   ICQ: 7615664
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> Instead, all operations should be done through a standalone backend.

> This would also be a nice solution for people who want a standalone,
> server-less database system. But for the purpose of pg_upgrade it
> seems like a lot of work for what could just be a magic switch in the
> postmaster to really kick everyone else out.

I don't think the approach I proposed is really materially harder than convincing the postmaster to boot everyone else out. (For one thing, I'm not sure how the postmaster could reliably distinguish "you" from "everyone else", bearing in mind that "you" will be needing to make multiple connections to the old database.) I also like the fact that using a standalone backend dodges all issues about user permissions and whether pg_hba.conf will let you connect to a particular database.

>> We could imagine adding smarts about specific variable names here,
>> if particular variables change in ways that we can deal with
>> specially.

> I would be very careful about making too many smart guesses when
> upgrading configuration files.

Agreed. That was more in the line of speculation than something I wanted to do in the near term. It does mean that people will need to rein in the urge to rename configuration variables ;-)

			regards, tom lane
"Marc G. Fournier" <scrappy@postgresql.org> writes: > Then again, in the case of pg_upgrade, wouldn't just disabling access from > anywhere except localhost prevent others from getting in? Not if your normal operating mode includes connections from clients running locally. I really don't see any clean way to ensure that pg_upgrade (and subsidiary pg_dump runs invoked by it) are the only ones allowed to connect to the database, if we keep the normal postmaster running. But if we shut down the postmaster then it's trivial. regards, tom lane
Matthew T. O'Connor wrote:

>On Fri, 2003-12-12 at 14:51, Andrew Dunstan wrote:
>
>>re Windows: pipes, yes, hard links, no (and no sane symlinks either)
>
>Actually, NTFS does support hard links, there is just no support for it
>in any MS file management GUI.
>
>http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnfiles/html/ntfs5.asp
>
>[snip]
>

I learn something new every day! :-)

I guess we *are* specifying that only NTFS will be supported on Windows? (I saw someone the other day running XP on FAT - I couldn't believe it!)

>>Maybe use an option which you would disable on Windows to copy the files
>>instead of hardlinking them. Yes it would take lots more time and space,
>>but copying raw files would surely still be a lot faster than loading
>>the dump.
>
>I think this would be a good feature even without hard link problems.
>If I am a paranoid admin, and I can afford the time and disk space
>required, I would want to keep a complete copy of my database, even
>after the new server is up and running.
>

I agree. A little paranoia never hurt anyone. Of course, you could always back it up manually beforehand too.

cheers

andrew
"Matthew T. O'Connor" <matthew@zeut.net> writes: > On Fri, 2003-12-12 at 14:51, Andrew Dunstan wrote: >> Maybe use an option which you would disable on Windows to copy the files >> instead of hardlinking them. > I think this would be a good feature even without hard link problems. > If I am a paranoid admin, and I can afford the time and disk space > required, I would want to keep a complete copy of my database, even > after the new server is up and running. That's a good point --- if the upgrade appears to work, but when you actually start the new postmaster there's some incompatibility that results in corruption of your user table files, then you're screwed under a hard link approach. Even though your old $PGDATA directory structure is still intact, the files it contains are corrupted. (Even if they're not corrupt, but you hit some other reason for backing out the update, you probably can't, because very very soon your old WAL and clog will be irretrievably out of date compared to the data files.) Okay, so we want an option to copy even if we could hard link. No problem. Alternative thought: just recommend that if possible, people take a filesystem dump of their old PGDATA directory after stopping the old postmaster. This would be sufficient for retreating to the prior version if needed. It might or might not be slower than copying all the files to a new PGDATA ... regards, tom lane
On Fri, 2003-12-12 at 15:42, Tom Lane wrote:
> Alternative thought: just recommend that if possible, people take
> a filesystem dump of their old PGDATA directory after stopping
> the old postmaster. This would be sufficient for retreating to
> the prior version if needed. It might or might not be slower
> than copying all the files to a new PGDATA ...

Certainly the easier path code-wise :-) Being the belt, suspenders and steel tip boots (foot gun protection) type that I am, I would make a backup even if pg_upgrade copies all the data files. Having pg_upgrade copy the data files gives you an extra layer of protection if desired, and can possibly save an admin who fails to get a good backup of the old PGDATA for whatever reason.
Matthew T. O'Connor wrote:

>On Fri, 2003-12-12 at 15:42, Tom Lane wrote:
>
>>Alternative thought: just recommend that if possible, people take
>>a filesystem dump of their old PGDATA directory after stopping
>>the old postmaster. This would be sufficient for retreating to
>>the prior version if needed. It might or might not be slower
>>than copying all the files to a new PGDATA ...
>
>Certainly the easier path code-wise :-) Being the belt, suspenders and
>steel tip boots (foot gun protection) type that I am, I would make a
>backup even if pg_upgrade copies all the data files. Having pg_upgrade
>copy the data files gives you an extra layer of protection if desired,
>and can possibly save an admin who fails to get a good backup of the old
>PGDATA for whatever reason.
>

I'd be in favor of a prompt at the beginning of the script: "Have you made a copy of the PGDATA directory?" If answered no, then ask for confirmation to proceed without a backup. To skip the prompt, have an option for '--skip-prompt' for those who are a little more sure of themselves or want to write a more automated script for this process.

This approach gives more flexibility, as there may not be sufficient storage available for double the existing database size for conversion on that mount point / disk. The admin doing the upgrade can copy the existing database wherever they need it: tape, another filesystem, NFS mount, etc.

--
Thomas Swan
Why not go the other way.

1) Dump the schemas.
2) Initdb with the new schemas in a tmp PGDATA
3) backup the schemas in the current PGDATA
4) move the new schemas from the new db into the current one.

This means that doing an update you would only have to have space for the system catalogs not the whole database.

How about a postmaster switch to point to the location of the system catalog files separate from the data files?

On Fri, 2003-12-12 at 16:14, Thomas Swan wrote:
> Matthew T. O'Connor wrote:
>
> >On Fri, 2003-12-12 at 15:42, Tom Lane wrote:
> >
> >>Alternative thought: just recommend that if possible, people take
> >>a filesystem dump of their old PGDATA directory after stopping
> >>the old postmaster. This would be sufficient for retreating to
> >>the prior version if needed. It might or might not be slower
> >>than copying all the files to a new PGDATA ...
> >
> >Certainly the easier path code-wise :-) Being the belt, suspenders and
> >steel tip boots (foot gun protection) type that I am, I would make a
> >backup even if pg_upgrade copies all the data files. Having pg_upgrade
> >copy the data files gives you an extra layer of protection if desired,
> >and can possibly save an admin who fails to get a good backup of the old
> >PGDATA for whatever reason.
>
> I'd be in favor of a prompt at the beginning of the script: "Have you made
> a copy of the PGDATA directory?" If answered no, then ask for
> confirmation to proceed without a backup. To skip the prompt, have an
> option for '--skip-prompt' for those who are a little more sure of
> themselves or want to write a more automated script for this process.
>
> This approach gives more flexibility, as there may not be sufficient
> storage available for double the existing database size for conversion
> on that mount point / disk. The admin doing the upgrade can copy the
> existing database wherever they need it: tape, another filesystem, NFS
> mount, etc.
>
> --
> Thomas Swan

--
Dave Smith
CANdata Systems Ltd
416-493-9020
Dave Smith <dave.smith@candata.com> writes:
> Why not go the other way.
> 1) Dump the schemas.
> 2) Initdb with the new schemas in a tmp PGDATA
> 3) backup the schemas in the current PGDATA
> 4) move the new schemas from the new db into the current one.

This seems like approximately the same thing except you lose the property of not having modified the old DB if you fail partway through. What's the advantage exactly?

> This means that doing an update you would only have to have space for
> the system catalogs not the whole database.

That's true either way.

			regards, tom lane
On Fri, 12 Dec 2003, Tom Lane wrote:

> Alternative thought: just recommend that if possible, people take
> a filesystem dump of their old PGDATA directory after stopping
> the old postmaster. This would be sufficient for retreating to
> the prior version if needed. It might or might not be slower
> than copying all the files to a new PGDATA ...

Being able to do a dump to a gzip'd tar file might be an option here also ... it should take a significantly smaller footprint to create the "backup", and on an idle server, shouldn't take any longer than other backup methods ...

----
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org   Yahoo!: yscrappy   ICQ: 7615664
Tom Lane wrote:

>Dave Smith <dave.smith@candata.com> writes:
>
>>Why not go the other way.
>>1) Dump the schemas.
>>2) Initdb with the new schemas in a tmp PGDATA
>>3) backup the schemas in the current PGDATA
>>4) move the new schemas from the new db into the current one.
>
>This seems like approximately the same thing except you lose the
>property of not having modified the old DB if you fail partway through.
>What's the advantage exactly?
>

I do not think that approach buys you much. More than just the schemas change from each major release. The binary (on-disk) format of the relations can change as well, hence the need for the upgrade program. A schema with corrupt data is worthless.

** Warning the user to back up the PGDATA directory should be sufficient, IMHO. Perhaps even echo a URL to the postgresql.org site for specific backup and upgrade procedures and recommendations. With a full copy of the PGDATA directory, an admin can copy the data back, reinstall the old version of PostgreSQL, and do a postmortem while the old version is still operational, without having to keep the service unavailable. If someone is absolutely certain the upgrade will work without any errors, then they can holster their loaded gun with the safety off. If there is an error, the data can be copied back, the old postmaster started, and the problem possibly corrected (maybe a reindex operation or the like). Then repeat the upgrade procedure. This approach seems much simpler and more flexible, as the admin could back up the database to tape or some other medium, possibly multiple volumes, and then do the upgrade in place.

** If the pg_upgrade program were to read/copy old data and output new data, doubling the storage requirements, then you would have a quick way to restart the upgrade procedure on failure without having to load the old data again. It seems to me that an error in the upgrade program would likely happen again at the same point on a repeat attempt, so I don't think there are any significant advantages to the upgrade program doing the copy/backup operation. Exceptionally large databases would have to find additional storage for the copy operation. If the copy-and-upgrade approach were to be followed, it would be advantageous to the admin to be able to specify where the copy of the existing PGDATA would go, or where the newly generated files could go before they could be moved back to the PGDATA directory.

>>This means that doing an update you would only have to have space for
>>the system catalogs not the whole database.
>
>That's true either way.
>
>			regards, tom lane
On Fri, 2003-12-12 at 14:00, Tom Lane wrote:
> Currently the no-table-contents-changes restriction keeps us from
> upgrading from versions older than 7.4 anyway (since type NUMERIC had its
> on-disk representation changed in 7.4). We could possibly upgrade 7.3
> databases that contain no NUMERIC columns, if we take the hit of
> rebuilding indexes. But is it worth bothering with?

How limiting is the above? Does this mean that pg_upgrade will be rendered invalid if there is an on-disk representation change? Do we think we will make it from 7.4 -> 7.5 without on-disk changes? Do we think at this point most upgrades will be without on-disk changes?

Or am I missing something, and pg_upgrade will / can do some magic to work around on-disk changes?
"Matthew T. O'Connor" <matthew@zeut.net> writes: >> [ pg_upgrade won't be able to change user table representation ] > How limiting is the above? Does this mean that pg_upgrade will be > rendered invalid if there is an on-disk representation change? Do we > think we will make it from 7.4 -> 7.5 without on-disk changes? Do we > think at this point most upgrades will be without on-disk changes? Per prior discussion, we will enforce some sort of limit on how often the representation of user tables/indexes can be changed. The idea will be to "batch" such changes so that you only have to do a dump/reload every N major releases instead of every one. In other words, pg_upgrade will work for most version upgrades but we reserve the right to occasionally make releases where it doesn't work. How large N will be in practice remains to be seen, of course, but I'd expect something on the order of 4 or 5. In theory pg_upgrade could be made to apply changes in user data representation, but I'm unconvinced that such a process would be a big improvement over dump/reload. regards, tom lane
> Per prior discussion, we will enforce some sort of limit on how often
> the representation of user tables/indexes can be changed. The idea will
> be to "batch" such changes so that you only have to do a dump/reload
> every N major releases instead of every one. In other words, pg_upgrade
> will work for most version upgrades but we reserve the right to
> occasionally make releases where it doesn't work.
>
> How large N will be in practice remains to be seen, of course, but I'd
> expect something on the order of 4 or 5.
>
> In theory pg_upgrade could be made to apply changes in user data
> representation, but I'm unconvinced that such a process would be a big
> improvement over dump/reload.

Will we now have to be careful to NEVER re-use OIDs in the system catalogs?

Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Will we now have to be careful to NEVER re-use OIDs in the system catalogs?

No. The proposed pg_upgrade procedure doesn't try to reproduce OIDs of catalog entries (other than toast-table OIDs, which are never preassigned anyway), so there's no issue.

Good point though --- thanks for thinking about it.

			regards, tom lane
> No. The proposed pg_upgrade procedure doesn't try to reproduce OIDs of
> catalog entries (other than toast-table OIDs, which are never
> preassigned anyway), so there's no issue.
>
> Good point though --- thanks for thinking about it.

What about cached OIDs in view and function definitions, etc...? Like if someone had a view that used the old oidrand() function and now we reused that oid for a new, completely different function, would breakage occur?

Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> What about cached OIDs in view and function definitions, etc...?

Doesn't matter. Catalog entries are dumped and reloaded; there is no carry-forward of OIDs.

I suppose if someone were storing OIDs of tables or functions or views in user tables, this procedure would break the references. But that would be true of a dump/reload under current procedures as well. I'm willing to say that that's unsupported.

			regards, tom lane
On Sun, Dec 14, 2003 at 09:48:20PM -0500, Tom Lane wrote:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> > What about cached OIDs in view and function definitions, etc...?
>
> Doesn't matter. Catalog entries are dumped and reloaded; there is no
> carry-forward of OIDs.
>
> I suppose if someone were storing OIDs of tables or functions or views
> in user tables, this procedure would break the references. But that
> would be true of a dump/reload under current procedures as well. I'm
> willing to say that that's unsupported.

Large objects included?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
FOO MANE PADME HUM
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
>> Doesn't matter. Catalog entries are dumped and reloaded; there is no
>> carry-forward of OIDs.

> Large objects included?

No. Large object OIDs are preserved in the given proposal.

(Note to self: I wonder whether the recently-added COMMENT ON LARGE OBJECT facility works at all over dump/reload...)

			regards, tom lane
On Sun, 2003-12-14 at 18:02, Tom Lane wrote:
> "Matthew T. O'Connor" <matthew@zeut.net> writes:
> > How limiting is the above? Does this mean that pg_upgrade will be
> > rendered invalid if there is an on-disk representation change? Do we
> > think we will make it from 7.4 -> 7.5 without on-disk changes? Do we
> > think at this point most upgrades will be without on-disk changes?
>
> How large N will be in practice remains to be seen, of course, but I'd
> expect something on the order of 4 or 5.

Ok, this is what I was looking for. If we are serious about this, would it make sense to start a new policy of bumping the major version number every time an upgrade requires a dump / reload? So PostgreSQL 8.0 would be the next version with on-disk changes, all the 8.x releases would have the same on-disk format, and the next time the disk format changes, then we are on 9.0.
> No. Large object OIDs are preserved in the given proposal.
>
> (Note to self: I wonder whether the recently-added COMMENT ON LARGE
> OBJECT facility works at all over dump/reload...)

How do you mean? pg_dump never writes out the COMMENT ON commands...

Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> (Note to self: I wonder whether the recently-added COMMENT ON LARGE
>> OBJECT facility works at all over dump/reload...)

> How do you mean? pg_dump never writes out the COMMENT ON commands...

Oh, okay, it doesn't work.

Care to think about how to fix that?

			regards, tom lane
>> How do you mean? pg_dump never writes out the COMMENT ON commands...
>
> Oh, okay, it doesn't work.
>
> Care to think about how to fix that?

I think you're going to have to explain the exact problem to me - I don't quite get what you mean?

Do you mean using pg_dump with the '-b' option? How does pg_dump dump the blobs? Does it do a pg_dumplo effort where it puts them all as files in directories?

Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> Oh, okay, it doesn't work.

> Do you mean using pg_dump with the '-b' option?

Yeah. Don't you think that should preserve comments on large objects, now that such comments are alleged to be a supported facility?

> How does pg_dump dump the blobs?

It dumps them, reloads them (which causes them to be assigned new OIDs) and then runs around and tries to fix up references to them to have the correct OIDs. Possibly that last pass could be extended to include pg_description. Not sure of details though.

			regards, tom lane
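One possible shape for that extra pass, as a sketch only: it assumes pg_dump kept its old-to-new large object OID mapping in a table named blob_xref(oldoid, newoid), which is a hypothetical name, and that the session is superuser so pg_description can be updated directly.

#include <stdio.h>
#include <libpq-fe.h>

static void
fixup_blob_comments(PGconn *conn)
{
	PGresult   *res;

	/* remap pg_description entries that point at old large object OIDs */
	res = PQexec(conn,
				 "UPDATE pg_description SET objoid = x.newoid "
				 "FROM blob_xref x "
				 "WHERE pg_description.objoid = x.oldoid "
				 "  AND pg_description.classoid = "
				 "      (SELECT oid FROM pg_class WHERE relname = 'pg_largeobject')");
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
		fprintf(stderr, "comment fixup failed: %s", PQerrorMessage(conn));
	PQclear(res);
}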
"Matthew T. O'Connor" <matthew@zeut.net> writes: > On Sun, 2003-12-14 at 18:02, Tom Lane wrote: >> How large N will be in practice remains to be seen, of course, but I'd >> expect something on the order of 4 or 5. > Ok, this is what I was looking for. If we are serious about this, would > it make sense to start a new policy of bumping the major version number > every time an upgrade requires a dump / reload? That was discussed already. I think it's purely a cosmetic issue, but have no objection to doing it that way... regards, tom lane
> Yeah. Don't you think that should preserve comments on large objects,
> now that such comments are alleged to be a supported facility?
>
>> How does pg_dump dump the blobs?
>
> It dumps them, reloads them (which causes them to be assigned new OIDs)
> and then runs around and tries to fix up references to them to have the
> correct OIDs. Possibly that last pass could be extended to include
> pg_description. Not sure of details though.

OK, I'll look into it.

Chris
Hi,

> Alternative thought: just recommend that if possible, people
> take a filesystem dump of their old PGDATA directory after
> stopping the old postmaster. This would be sufficient for
> retreating to the prior version if needed. It might or might
> not be slower than copying all the files to a new PGDATA ...

Filesystem-level snapshots make this very easy. Combined with:

Dave Smith <dave.smith@candata.com> writes:
> Why not go the other way.
> 1) Dump the schemas.
> 2) Initdb with the new schemas in a tmp PGDATA
> 3) backup the schemas in the current PGDATA
> 4) move the new schemas from the new db into the current one.

Then it would be possible to:

1) Stop old postmaster
2) Make a filesystem snapshot
3) Upgrade the schemas in-place (as described above)
4) Start new postmaster
*) On error: revert filesystem to snapshot

Would be very nice for those who can use filesystem snapshots.

Sander.
Tom Lane wrote:
> "Marc G. Fournier" <scrappy@postgresql.org> writes:
>> Then again, in the case of pg_upgrade, wouldn't just disabling access from
>> anywhere except localhost prevent others from getting in?
>
> Not if your normal operating mode includes connections from clients
> running locally. I really don't see any clean way to ensure that
> pg_upgrade (and subsidiary pg_dump runs invoked by it) are the only
> ones allowed to connect to the database, if we keep the normal
> postmaster running. But if we shut down the postmaster then it's
> trivial.

If you want to prevent "accidental" access, start postmaster on a non-standard port.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
On Tue, 16 Dec 2003, Jan Wieck wrote:

> If you want to prevent "accidental" access, start postmaster on a
> non-standard port.

That seems like an unfriendly thing to do. You'd have to check to see what port is "standard" for this particular installation, and pick something else. You may choose an unused port, but perhaps it needs to be used in a few minutes by some other process, but then will be occupied. The administrator may also not be happy to have an open port facing the world, or even just other possibly untrusted users on the same machine, assuming you bind to localhost.

Jon
On Tue, 16 Dec 2003, Jon Jensen wrote:

> On Tue, 16 Dec 2003, Jan Wieck wrote:
>
> > If you want to prevent "accidental" access, start postmaster on a
> > non-standard port.
>
> That seems like an unfriendly thing to do. You'd have to check to see what
> port is "standard" for this particular installation, and pick something
> else. You may choose an unused port, but perhaps it needs to be used in a
> few minutes by some other process, but then will be occupied. The
> administrator may also not be happy to have an open port facing the world,
> or even just other possibly untrusted users on the same machine, assuming
> you bind to localhost.

But aren't ports above a certain number "fair game"?

Yep, just answered my own question, quoting from /etc/services:

# The latest IANA port assignments can be gotten from
# http://www.iana.org/assignments/port-numbers
# The Well Known Ports are those from 0 through 1023.
# The Registered Ports are those from 1024 through 49151
# The Dynamic and/or Private Ports are those from 49152 through 65535

so as long as we use 49152 and above we're cool.
On Tue, 16 Dec 2003, scott.marlowe wrote:

> > > If you want to prevent "accidental" access, start postmaster on a
> > > non-standard port.
> >
> > That seems like an unfriendly thing to do. You'd have to check to see what
> > port is "standard" for this particular installation, and pick something
> > else. You may choose an unused port, but perhaps it needs to be used in a
> > few minutes by some other process, but then will be occupied. The
> > administrator may also not be happy to have an open port facing the world,
> > or even just other possibly untrusted users on the same machine, assuming
> > you bind to localhost.
>
> But aren't ports above a certain number "fair game"?
>
> Yep, just answered my own question, quoting from /etc/services:
>
> # The latest IANA port assignments can be gotten from
> # http://www.iana.org/assignments/port-numbers
> # The Well Known Ports are those from 0 through 1023.
> # The Registered Ports are those from 1024 through 49151
> # The Dynamic and/or Private Ports are those from 49152 through 65535
>
> so as long as we use 49152 and above we're cool.

I guess that would work, though this objection remains:

> > The administrator may also not be happy to have an open port facing
> > the world, or even just other possibly untrusted users on the same
> > machine, assuming you bind to localhost.

If the PostgreSQL administrator and the system administrator were always the same person, that would be less of a big deal since the sysadmin would know what's going on (assuming s/he reads the docs).

Why not use a different UNIX socket in a directory not accessible to others? That would be more secure.

Jon
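A minimal sketch of that idea: make a private mode-0700 directory, point the postmaster's Unix socket at it, and omit -i so no TCP port is opened at all. The data directory path is only an example, and unix_socket_directory is the 7.x-era setting name; this is an illustration, not anyone's actual proposal.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	char	sockdir[] = "/tmp/pg_upgrade_sock_XXXXXX";
	char	option[512];

	if (mkdtemp(sockdir) == NULL)	/* creates the directory with mode 0700 */
	{
		perror("mkdtemp");
		return 1;
	}
	snprintf(option, sizeof(option), "unix_socket_directory=%s", sockdir);

	/* no -i switch, so only the Unix socket in the private directory exists */
	execlp("postmaster", "postmaster",
		   "-D", "/usr/local/pgsql/data",
		   "-c", option,
		   (char *) NULL);
	perror("execlp");
	return 1;
}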
Jan Wieck <JanWieck@Yahoo.com> writes:
> If you want to prevent "accidental" access, start postmaster on a
> non-standard port.

... thus making pg_upgrade subject to all sorts of interesting questions about whether particular ports get filtered by kernel iptables rules? This doesn't seem like a really great alternative to me ...

			regards, tom lane