Thread: pg_restore and create FK without verification check
--- ow <oneway_111@yahoo.com> wrote:
> IMHO, not only does data need to be loaded before FK constraints are
> created, but there also has to be a feature that allows creating an FK
> constraint WITHOUT verifying that all loaded/existing records satisfy
> the constraint. The ability to create an FK constraint without
> verification of existing records should exist by itself (maybe
> restricted to superusers) and also as an option in pg_restore.
>
> More details:
> http://archives.postgresql.org/pgsql-admin/2003-11/msg00308.php
> http://archives.postgresql.org/pgsql-admin/2003-11/msg00323.php

It appears there's not a lot of interest in discussing the possibility of
creating FK constraints WITHOUT the verification check. How then should one
handle the situation with pg_restore and large dbs, where creating the FK
constraint(s) may take hours?

Thanks
ow wrote:
> It appears there's not a lot of interest in discussing the possibility of
> creating FK constraints WITHOUT the verification check. How then should
> one handle the situation with pg_restore and large dbs, where creating
> the FK constraint(s) may take hours?

I'd prefer a backup/restore method that dumps physical data, so that at
restore time there's no need to recreate FKs. But I didn't see any feedback
on that proposal either.

Regards,
Andreas
Andreas Pflug kirjutas K, 26.11.2003 kell 12:09:
> ow wrote:
>> It appears there's not a lot of interest in discussing the possibility
>> of creating FK constraints WITHOUT the verification check. How then
>> should one handle the situation with pg_restore and large dbs, where
>> creating the FK constraint(s) may take hours?
>
> I'd prefer a backup/restore method that dumps physical data, so that at
> restore time there's no need to recreate FKs. But I didn't see any
> feedback on that proposal either.

Was this proposal a separate one from using WAL logs for PITR?

-------------
Hannu
Hannu Krosing wrote:
> Andreas Pflug kirjutas K, 26.11.2003 kell 12:09:
>> I'd prefer a backup/restore method that dumps physical data, so that at
>> restore time there's no need to recreate FKs. But I didn't see any
>> feedback on that proposal either.
>
> Was this proposal a separate one from using WAL logs for PITR?

Yes, I mentioned it just a few days ago when discussing dependency in
pg_dump.

This is somewhat complementary to WAL and PITR. I'm looking for a fast way
to dump and restore a complete database, like a physical file copy, without
shutting down the backend. I was thinking of a BACKUP command that streams
out the files, including any indexes and non-vacuumed tuples. A database
recreated from that wouldn't be as clean as a pg_dump/pg_restored database,
but it would be up much faster, and there wouldn't be any dependency
problem.

This doesn't really replace pg_dump/pg_restore, because it probably
wouldn't be able to upgrade a cluster. Still, it would be helpful for
disaster recovery.

Regards,
Andreas
--- Andreas Pflug <pgadmin@pse-consulting.de> wrote:
> Yes, I mentioned it just a few days ago when discussing dependency in
> pg_dump. This is somewhat complementary to WAL and PITR. I'm looking for
> a fast way to dump and restore a complete database, like a physical file
> copy, without shutting down the backend. I was thinking of a BACKUP
> command that streams out the files, including any indexes and
> non-vacuumed tuples. A database recreated from that wouldn't be as clean
> as a pg_dump/pg_restored database, but it would be up much faster, and
> there wouldn't be any dependency problem.
> This doesn't really replace pg_dump/pg_restore, because it probably
> wouldn't be able to upgrade a cluster. Still, it would be helpful for
> disaster recovery.

I think creating an FK without the verification check is still needed,
especially when:
1) the original db is corrupted
2) upgrading a cluster
3) there's a need to BACKUP/RESTORE a *schema* instead of a db.

Thanks
ow wrote:
> I think creating an FK without the verification check is still needed,
> especially when:
> 1) the original db is corrupted
> 2) upgrading a cluster

Agreed. This might be useful for replication purposes too; in MSSQL, you
can write "CREATE TRIGGER ... NOT FOR REPLICATION". I'd like to see a
transaction safe way (an ENABLE/DISABLE TRIGGER command) to do this, along
the lines sketched below.

> 3) there's a need to BACKUP/RESTORE a *schema* instead of a db.
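A purely hypothetical sketch of what such a command might look like (no
such syntax exists in PostgreSQL today; table name and syntax are
invented for illustration):

    ALTER TABLE orders DISABLE TRIGGER ALL;   -- hypothetical command
    -- ... bulk load or replicate data here ...
    ALTER TABLE orders ENABLE TRIGGER ALL;    -- hypothetical command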
Hello

I was asking about this too, one or two weeks ago.

>>>> It appears there's not a lot of interest in discussing the
>>>> possibility of creating FK constraints WITHOUT the verification
>>>> check. How then should one handle the situation with pg_restore and
>>>> large dbs, where creating the FK constraint(s) may take hours?
>>>
>>> I'd prefer a backup/restore method that dumps physical data, so that
>>> at restore time there's no need to recreate FKs. But I didn't see any
>>> feedback on that proposal either.
>>
>> Was this proposal a separate one from using WAL logs for PITR?

My question then was:

>> Q2: New situation: Why is it not a good idea to back up the database
>> files of a cluster incl. all c_log and x_log (log files last) to get a
>> "physical hot backup"?
>> In principle it is the same situation as a server that is crashing (not
>> at once, but over some period of time). After restoring, it should redo
>> and roll back automatically, as after a crash. This method (physical
>> hot backup) would improve backup and restore times dramatically.

The answer from Robert Treat was:

> Essentially I think you're right, it should behave much like a crashing
> server. The main reason why people don't recommend it is that (depending
> on your OS setup) there is the potential to lose data that has been
> committed but not actually written to disk. Note that you shouldn't get
> corrupted data from this, but in many cases losing data is just as bad,
> so we don't recommend it. If you really want to do this, you should
> either shut down the database or get LVM going.

I have not done many tests yet, but in principle I was able to hot-backup
a cluster, or only one database, and restore it. But Robert's answer makes
me a little afraid: it means, for me, that he/they do not trust their own
recovery mechanism. A definitive answer from Robert is still outstanding.

In my opinion, a high-grade professional database system (as PostgreSQL is,
or wants to be) should have some hot backup features. Otherwise you are
NEVER able to handle VLDBs. There was a discussion about a 32 TB PostgreSQL
database, and I bet my next bonus this year :-) that they are also not
able to back it up and restore it in a reasonable time.

Regards

Oli

-------------------------------------------------------
Oli Sennhauser
Database-Engineer (Oracle & PostgreSQL)
Rebenweg 6
CH - 8610 Uster / Switzerland

Phone (+41) 1 940 24 82 or Mobile (+41) 79 450 49 14
e-Mail oli.sennhauser@bluewin.ch
Website http://mypage.bluewin.ch/shinguz/PostgreSQL/

Secure (signed/encrypted) e-Mail with a Free Personal SwissSign ID:
http://www.swisssign.ch

Import the SwissSign Root Certificate:
http://swisssign.net/cgi-bin/trust/import
>> Q2: New situation: Why is it not a good idea to back up the database
>> files of a cluster incl. all c_log and x_log (log files last) to get a
>> "physical hot backup"?

The key issue here is to have a pg_control file to start from, with a
finished checkpoint from before you began the backup. Then you need to
ensure that you have all logfiles from that checkpoint until the backup
finishes; the last thing to back up must be the last active x_log. It would
probably be a good idea not to have a vacuum running concurrently :-)
Something like the sequence sketched below. And then you need to do a lot
of tests, since nobody else does it yet.

I think this is an issue because it is such high risk; nobody will step up
easily and say that it is safe.

Andreas
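A rough sketch of that ordering, untested and exactly as high-risk as
described above (paths are illustrative and assume a default $PGDATA
layout):

    psql -c "CHECKPOINT;" template1           # finish a checkpoint first
    cp $PGDATA/global/pg_control /backup/     # control file pointing at it
    tar cf /backup/base.tar $PGDATA/base      # data files, copied live
    tar cf /backup/clog.tar $PGDATA/pg_clog   # transaction status
    tar cf /backup/xlog.tar $PGDATA/pg_xlog   # WAL last, incl. active segment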
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> This is somewhat complementary to WAL and PITR. I'm looking for a fast
> way to dump and restore a complete database, like a physical file copy,
> without shutting down the backend. I was thinking of a BACKUP command
> that streams out the files, including any indexes and non-vacuumed
> tuples. A database recreated from that wouldn't be as clean as a
> pg_dump/pg_restored database, but it would be up much faster, and there
> wouldn't be any dependency problem.

It's already intended to support this as part of the PITR work. The idea
is you force a checkpoint and then make a tar-format dump of the database
tree (tar or whatever floats your boat, but anyway a filesystem-level
backup). The database need not be stopped while you do this, and you don't
need a filesystem that can do snapshots or anything fancy like that. The
tar dump itself most likely does not represent a consistent state of the
database by the time you are done making it. That is okay, because you have
also been archiving off to tape (or someplace) all the WAL data generated
since that pre-dump checkpoint. You can continue archiving the WAL series
for however far forward from the original dump you feel like. If you need
to recover, you reload the database from the tar dump and then replay the
WAL series against it. This is indistinguishable from a crash recovery
situation --- the "inconsistent" tar dump looks just like a disk that has
received some but not all of the updates since the last checkpoint. Replay
will fix it.

The cool thing about this is you can actually bring the DB to the state it
was in at any specific point in time covered by your WAL archive --- just
run the WAL replay as far as you want, then stop. Solves the "junior DBA
deleted all my data Saturday morning" problem, thus "PITR". Now the uncool
thing is you need massive amounts of secondary storage to archive all that
WAL data, if your installation has heavy update activity. But it seems to
me it would address the need you mention above --- you'd just not bother
to continue archiving WAL past the end of the dump operation.

In principle you could do this today, but we don't have enough support
code in place to make it work smoothly, eg WAL segment files aren't
labeled with enough identifying information to let you manage an archive
full of 'em. Still it doesn't seem that far away.

> This doesn't really replace pg_dump/pg_restore, because it probably
> wouldn't be able to upgrade a cluster.

Right, any such physical dump would be limited to restoring a whole
cluster as-is: no imports into other clusters, no selectivity, no fancy
games. The main reason is you'd have to dump and restore pg_clog along
with the data files.

            regards, tom lane
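Hypothetically, the full scheme would look something like this (none of
the archive-management tooling existed at the time; the paths and archive
layout are invented for illustration):

    psql -c "CHECKPOINT;" template1
    tar cf /archive/base.tar $PGDATA        # possibly inconsistent -- OK
    # ... keep copying each filled WAL segment out of $PGDATA/pg_xlog
    #     into /archive/wal/ as it is completed ...
    # To recover to a point in time: untar the base dump, put the archived
    # WAL segments back into pg_xlog up to the desired stopping point, and
    # start the postmaster; crash recovery replays the WAL.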
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This doesn't really replace pg_dump/pg_restore, because it probably
>> wouldn't be able to upgrade a cluster.
>
> Right, any such physical dump would be limited to restoring a whole
> cluster as-is: no imports into other clusters, no selectivity, no fancy
> games. The main reason is you'd have to dump and restore pg_clog along
> with the data files.

But that would not help people who HAVE to use pg_dump/pg_restore (e.g. to
backup/restore a single schema), would it? Depending on the db size, etc.,
creating the FK constraint(s) may take many hours. How should this be
handled then?

Thanks
ow <oneway_111@yahoo.com> writes:
> --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Right, any such physical dump would be limited to restoring a whole
>> cluster as-is: no imports into other clusters, no selectivity, no fancy
>> games.

> But that would not help people who HAVE to use pg_dump/pg_restore (e.g.
> to backup/restore a single schema), would it? Depending on the db size,
> etc., creating the FK constraint(s) may take many hours. How should this
> be handled then?

Quite honestly, I think they should check their foreign keys. In a partial
restore situation there is no guarantee that the referenced table and the
referencing table are being restored at the same time from the same dump.
An override in that situation looks like a great tool for shooting
yourself in the foot.

People might be more interested in debating this topic with you if we
hadn't discussed it at length just a couple months back. There wasn't
consensus then that we had to offer an escape hatch, and you've not
offered any argument that wasn't made before.

            regards, tom lane
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Quite honestly, I think they should check their foreign keys.

Generally speaking, I agree. The problem is that verification of FK
constraint(s) may take too long, depending on the size of the db and other
conditions. In my case, on test data, it takes about 1 hour to create the
tables and copy the data, then about 40 min to create the indexes, then
4.5 hours to create one (1) FK constraint. In production, we'll have
10-100x more data than we have for testing. If we have a problem in
production, the time needed to restore the db is simply going to kill us.

> People might be more interested in debating this topic with you if we
> hadn't discussed it at length just a couple months back. There wasn't
> consensus then that we had to offer an escape hatch, and you've not
> offered any argument that wasn't made before.

I'm simply presenting a problem for which I currently do not see any
solution (it's very important for us to be able to restore the db within a
reasonable amount of time). If there's no solution and none is planned,
then we cannot use pgsql, can we?

Thanks
ow <oneway_111@yahoo.com> writes:
> --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Quite honestly, I think they should check their foreign keys.

What should I do if I *know* there will be an FK failure but I want to
correct it manually? Perhaps by creating all the necessary target records,
perhaps by deleting or updating the dead references, perhaps with a mix of
these.

As it stands I have to delete the FK constraint, load the table, and fix
the data. Then I recreate the FK constraint -- with the danger that I'll
get the definition wrong -- and wait for the constraint to be verified
(roughly the sequence sketched below).

If I could disable and re-enable the constraint, the danger that I would
get the definition wrong would be eliminated. And if I had already done
the work to ensure there were no broken relationships, I would optionally
be able to skip the redundant automatic check. I could even have done the
verification myself while the data wasn't live, for example.

The database is a tool. It's annoying to have a tool that tries to
outsmart the user.

--
greg
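For concreteness, the workaround described above looks something like this
(table, column, and file names are made up):

    ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;
    COPY orders FROM '/tmp/orders.copy';
    -- ... fix up the broken references by hand ...
    ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id);
    -- the ADD CONSTRAINT re-verifies every row, which is the step
    -- one would like to be able to skip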
Tom Lane wrote:
> It's already intended to support this as part of the PITR work. The idea
> is you force a checkpoint and then make a tar-format dump of the
> database tree [...] If you need to recover, you reload the database from
> the tar dump and then replay the WAL series against it.
>
> The cool thing about this is you can actually bring the DB to the state
> it was in at any specific point in time covered by your WAL archive ---
> just run the WAL replay as far as you want, then stop. Solves the
> "junior DBA deleted all my data Saturday morning" problem, thus "PITR".
> Now the uncool thing is you need massive amounts of secondary storage to
> archive all that WAL data,

Shouldn't be a problem, since there are few databases out there worldwide
exceeding today's average disk capacity...

> if your installation has heavy update activity. But it seems to me it
> would address the need you mention above --- you'd just not bother to
> continue archiving WAL past the end of the dump operation.

PITR is cool, no question; it's more than I've been requesting. When the
database server burns, I'd be quite happy if I could restore to my latest
tape's point in time, since the WAL log disk probably isn't functional
either. So having a fast backup of the snapshot from when the backup
CHECKPOINT was issued would be enough; no WAL replay needed.

> In principle you could do this today, but we don't have enough support
> code in place to make it work smoothly, eg WAL segment files aren't
> labeled with enough identifying information to let you manage an archive
> full of 'em. Still it doesn't seem that far away.

So I issue CHECKPOINT, and tar the cluster or database. Still, I have two
questions:
- how do I restore a single database?
- while tar is running, CHECKPOINT(n+1) might be recorded in some files,
  while others have CHECKPOINT(n). How does the backend know to roll back
  to CHECKPOINT(n)?

Regards,
Andreas
Greg Stark wrote:
> If I could disable and re-enable the constraint, the danger that I would
> get the definition wrong would be eliminated. And if I had already done
> the work to ensure there were no broken relationships, I would
> optionally be able to skip the redundant automatic check. I could even
> have done the verification myself while the data wasn't live, for
> example.

Since FKs are implemented as triggers, you could disable all triggers on
the table right now, no? (Roughly as sketched below.) It could be a bit
more comfortable, I agree, and I hope for an upcoming DISABLE TRIGGER
command.

While talking about this: I could add ENABLE/DISABLE TRIGGER functionality
to pgadmin3. Unfortunately, on pg_trigger.tgenabled there's still the
comment "not presently checked everywhere it should be, so disabling a
trigger by setting this to false does not work reliably". I wonder if this
is still true for 7.4. I can't imagine that this would be so hard to fix.

Regards,
Andreas
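The usual catalog hack for this -- roughly what pg_dump's disable-triggers
option emits, shown here with a made-up table name and requiring superuser
rights -- is:

    UPDATE pg_class SET reltriggers = 0
     WHERE oid = 'fktable'::regclass;         -- disable all triggers
    -- ... load data ...
    UPDATE pg_class
       SET reltriggers = (SELECT count(*) FROM pg_trigger
                           WHERE tgrelid = pg_class.oid)
     WHERE oid = 'fktable'::regclass;         -- re-enable them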
Andreas Pflug <pgadmin@pse-consulting.de> writes:
>> In principle you could do this today, but we don't have enough
>> support code in place to make it work smoothly, eg WAL segment files
>> aren't labeled with enough identifying information to let you manage
>> an archive full of 'em. Still it doesn't seem that far away.

> So I issue CHECKPOINT, and tar the cluster or database. Still, I have
> two questions:
> - how do I restore a single database?

You don't. As I said, any physical backup is going to be all-or-nothing.
These techniques are not a replacement for pg_dump.

> - while tar is running, CHECKPOINT(n+1) might be recorded in some files,
>   while others have CHECKPOINT(n). How does the backend know to roll
>   back to CHECKPOINT(n)?

That's part of the management code that we need to write before this will
really be very useful; you need to be able to associate the starting time
of a tar dump with the most recent previous CHECKPOINT in the WAL logs.
Not hard in principle, just takes some code we ain't got yet.

            regards, tom lane
Tom Lane wrote:
>> - how do I restore a single database?
>
> You don't. As I said, any physical backup is going to be
> all-or-nothing. These techniques are not a replacement for pg_dump.

That's sad. I've been backing up and restoring single databases from a
cluster frequently, so I'd really like the database to be selectable.

> That's part of the management code that we need to write before this
> will really be very useful; you need to be able to associate the
> starting time of a tar dump with the most recent previous CHECKPOINT in
> the WAL logs. Not hard in principle, just takes some code we ain't got
> yet.

So I'll lie back patiently (more or less :-)

Regards,
Andreas
On Wed, 26 Nov 2003, ow wrote:

>> People might be more interested in debating this topic with you if we
>> hadn't discussed it at length just a couple months back. There wasn't
>> consensus then that we had to offer an escape hatch, and you've not
>> offered any argument that wasn't made before.
>
> I'm simply presenting a problem for which I currently do not see any
> solution (it's very important for us to be able to restore the db within
> a reasonable amount of time). If there's no solution and none is
> planned, then we cannot use pgsql, can we?

You can build your own solution; that's the nice thing about open source.
If you wanted to go the SET-variable route to control the checks that
CHECK and FOREIGN KEY constraints do at ALTER time, it's almost certainly
less than an hour's worth of work.
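Purely as an illustration of that SET-variable idea (no such variable
exists in PostgreSQL; both the name and the behavior here are invented):

    SET skip_constraint_validation = on;   -- hypothetical GUC
    ALTER TABLE fktable ADD CONSTRAINT fktable_ref_fk
        FOREIGN KEY (ref_id) REFERENCES pktable (id);
    SET skip_constraint_validation = off;  -- constraint added unchecked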
On Wed, 26 Nov 2003, Tom Lane wrote:

> Quite honestly, I think they should check their foreign keys. In a
> partial restore situation there is no guarantee that the referenced
> table and the referencing table are being restored at the same time
> from the same dump. An override in that situation looks like a great
> tool for shooting yourself in the foot.
>
> People might be more interested in debating this topic with you if we
> hadn't discussed it at length just a couple months back. There wasn't
> consensus then that we had to offer an escape hatch, and you've not
> offered any argument that wasn't made before.

I actually thought the majority in the past discussion felt that an escape
hatch was a good idea, but that the discussion broke down in trying to
determine what sort of hatch it might be (IIRC, it drifted off into a
general discussion of disabling constraints during normal operation, as
opposed to at ALTER time).
On Wed, 2003-11-26 at 12:43, Andreas Pflug wrote:
> Since FKs are implemented as triggers, you could disable all triggers on
> the table right now, no? It could be a bit more comfortable, I agree,
> and I hope for an upcoming DISABLE TRIGGER command.

ISTM I've done this before: from a pg_dump -Fc backup, first do a
schema-only restore (-s) and then a data-only restore with
--disable-triggers, to load the data without checking foreign keys; see
the sketch below. There's certainly potential for trouble with this
method, I suppose, but it seems like it would accomplish what the original
poster requires...

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
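Spelled out, the sequence would look roughly like this (database and file
names are illustrative; assumes a 7.4-era pg_restore, and superuser rights
for the trigger disabling):

    pg_dump -Fc mydb > mydb.dump          # compressed archive backup
    createdb newdb
    pg_restore -s -d newdb mydb.dump      # schema only; FKs created empty
    pg_restore -a --disable-triggers -d newdb mydb.dump
                                          # data only; FK triggers disabled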
ow <oneway_111@yahoo.com> writes:
> --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Quite honestly, I think they should check their foreign keys.

> Generally speaking, I agree. The problem is that verification of FK
> constraint(s) may take too long, depending on the size of the db and
> other conditions. In my case, on test data, it takes about 1 hour to
> create the tables and copy the data, then about 40 min to create the
> indexes, then 4.5 hours to create one (1) FK constraint.

If you're seeing this on 7.4, I'd like to see the details of the exact
commands being issued. If it's not 7.4, it's not a relevant measurement.

            regards, tom lane
On Wed, 26 Nov 2003, Tom Lane wrote:

> ow <oneway_111@yahoo.com> writes:
>> Generally speaking, I agree. The problem is that verification of FK
>> constraint(s) may take too long, depending on the size of the db and
>> other conditions. In my case, on test data, it takes about 1 hour to
>> create the tables and copy the data, then about 40 min to create the
>> indexes, then 4.5 hours to create one (1) FK constraint.
>
> If you're seeing this on 7.4, I'd like to see the details of the exact
> commands being issued. If it's not 7.4, it's not a relevant measurement.

IIRC, he was. I think the thing causing the difference between his times
and the ones we typically saw when doing the tests was that he didn't have
an index on the fktable's referencing column.
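That is, the index Stephan refers to would be created like this for a
single-column FK (names are made up; the referenced side is already
indexed by its primary key):

    CREATE INDEX fktable_ref_idx ON fktable (ref_id);
    ALTER TABLE fktable ADD CONSTRAINT fktable_ref_fk
        FOREIGN KEY (ref_id) REFERENCES pktable (id);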
Stephan Szabo wrote:
> IIRC, he was. I think the thing causing the difference between his times
> and the ones we typically saw when doing the tests was that he didn't
> have an index on the fktable's referencing column.

A common mistake; I can't count how often I've made this one. And it's not
easy to find, because EXPLAIN won't explain triggers.

I'm planning to create some kind of FK index wizard in pgAdmin3, which
finds FKs using columns that aren't covered by an appropriate index (a
rough idea of the check is sketched below). Maybe this check could also be
performed (as a NOTICE) when the FK is created?

Regards,
Andreas
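A rough sketch of such a check against the 7.4 catalogs, simplified to
single-column FKs and first index columns only (an illustration of the
wizard idea, not pgAdmin3 code):

    SELECT c.conrelid::regclass AS fk_table, c.conname
      FROM pg_constraint c
     WHERE c.contype = 'f'
       AND NOT EXISTS (SELECT 1
                         FROM pg_index i
                        WHERE i.indrelid = c.conrelid
                          AND i.indkey[0] = c.conkey[1]);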
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Wed, 26 Nov 2003, Tom Lane wrote:
>> If you're seeing this on 7.4, I'd like to see the details of the exact
>> commands being issued. If it's not 7.4, it's not a relevant measurement.

> IIRC, he was. I think the thing causing the difference between his times
> and the ones we typically saw when doing the tests was that he didn't
> have an index on the fktable's referencing column.

I'm not convinced it should matter all that much, for the single-query
test method that 7.4 uses. That's why I wanted to see details.

            regards, tom lane
--- Andreas Pflug <pgadmin@pse-consulting.de> wrote:
> Stephan Szabo wrote:
>> IIRC, he was. I think the thing causing the difference between his
>> times and the ones we typically saw when doing the tests was that he
>> didn't have an index on the fktable's referencing column.
>
> A common mistake; I can't count how often I've made this one.

Wrong. It's a mistake to think that you always HAVE to have an index on
the FK column. See the links below for more details:

http://archives.postgresql.org/pgsql-admin/2003-11/msg00317.php
http://archives.postgresql.org/pgsql-admin/2003-11/msg00319.php
> A common mistake; I can't count how often I've made this one. And it's
> not easy to find, because EXPLAIN won't explain triggers.
> I'm planning to create some kind of FK index wizard in pgAdmin3, which
> finds FKs using columns that aren't covered by an appropriate index.
> Maybe this check could also be performed (as a NOTICE) when the FK is
> created?

Weird - I'm planning the exact same thing for phpPgAdmin! Great minds
think alike :P

Chris
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
>> So I issue CHECKPOINT, and tar the cluster or database. Still, I have
>> two questions:
>> - how do I restore a single database?
>
> You don't. As I said, any physical backup is going to be
> all-or-nothing. These techniques are not a replacement for pg_dump.

But this is just an artifact of the fact that the WAL is a single
instance-wide entity, rather than a per-database entity. And since
databases are completely separate entities that cannot be simultaneously
accessed by any query (corrections welcome), there isn't any reason in
principle that the WAL files couldn't also be created on a per-database
basis.

I'm sure, of course, that doing so would bring with it a new set of
problems and tradeoffs, so it might not be worth it...

--
Kevin Brown    kevin@sysexperts.com
On Thu, Nov 27, 2003 at 12:40:28AM +0100, Andreas Pflug wrote:

> A common mistake; I can't count how often I've made this one. And it's
> not easy to find, because EXPLAIN won't explain triggers.

That's a pity. And the lack of EXPLAINing function execution, too. Maybe
it's not that hard to do?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Just treat us the way you want to be treated + some extra allowance for
ignorance." (Michael Brusser)
Kevin Brown <kevin@sysexperts.com> writes:
> Tom Lane wrote:
>> You don't. As I said, any physical backup is going to be
>> all-or-nothing. These techniques are not a replacement for pg_dump.

> But this is just an artifact of the fact that the WAL is a single
> instance-wide entity, rather than a per-database entity. And since
> databases are completely separate entities that cannot be simultaneously
> accessed by any query (corrections welcome), there isn't any reason in
> principle that the WAL files couldn't also be created on a per-database
> basis.

WAL is not the bottleneck ... as I already mentioned today, pg_clog (and
more specifically the meaning of transaction IDs) is what really makes a
cluster an indivisible whole at the physical level.

If you want to do separate physical dumps/restores, the answer is to set
up separate clusters (separate postmasters). Not so hard, is it?

            regards, tom lane
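For what it's worth, a separate cluster is just (port and path chosen for
illustration):

    initdb -D /var/lib/pgsql/cluster2
    postmaster -D /var/lib/pgsql/cluster2 -p 5433 &
    createdb -p 5433 mydb        # this db can now be physically backed up
                                 # and restored independently of cluster 1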
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> That's a pity. And the lack of EXPLAINing function execution, too.
> Maybe it's not that hard to do?

Not sure if it's hard or not, but it'd sure be a nice thing to have.

            regards, tom lane
Tom Lane wrote:
> WAL is not the bottleneck ... as I already mentioned today, pg_clog (and
> more specifically the meaning of transaction IDs) is what really makes a
> cluster an indivisible whole at the physical level.
>
> If you want to do separate physical dumps/restores, the answer is to set
> up separate clusters (separate postmasters). Not so hard, is it?

Well, aside from the fact that separate clusters have completely separate
user databases, listen on different ports, will compete with other
clusters on the same system for resources that would be better managed by
a single cluster, and generally have to be maintained as completely
separate entities from start to finish, no, it's not that hard. ;-)

The ability to restore a single large database quickly is, I think, a
reasonable request; it's just that right now it's difficult (perhaps
impossible) to satisfy that request. It's probably something that we'll
have to deal with if we want PG to be useful to people managing really
large databases on really, really big iron, though.

--
Kevin Brown    kevin@sysexperts.com
--- Alvaro Herrera <alvherre@dcc.uchile.cl> wrote:
> On Thu, Nov 27, 2003 at 12:40:28AM +0100, Andreas Pflug wrote:
>> A common mistake; I can't count how often I've made this one. And it's
>> not easy to find, because EXPLAIN won't explain triggers.
>
> That's a pity. And the lack of EXPLAINing function execution, too.
> Maybe it's not that hard to do?

I'd like to emphasize again that NOT having an index on the FK column is a
perfectly valid approach, despite some opinions to the contrary. In fact,
requiring an index on the FK column(s) when it is not required by the
application's logic IS a mistake, since it slows down
inserts/deletes/updates/vacuum/reindex/etc. on the respective table and
wastes disk space (a considerable amount, possibly, on large tables).

Also, an FK column index DOES NOT, in general, solve performance issues
with the FK verification check. Someone may (and, I'm sure, will) simply
have more data or more constraints. The only solution here appears to be
the "--disable-triggers" option suggested by Robert Treat. If it works
then I'm fine; somehow I did not see that option in the beginning.

Thanks
On Wed, Nov 26, 2003 at 10:11:20PM -0800, ow wrote:

> I'd like to emphasize again that NOT having an index on the FK column is
> a perfectly valid approach, despite some opinions to the contrary.

In what scenarios? I'd easily buy this if you are talking about small
tables.

> Also, an FK column index DOES NOT, in general, solve performance issues
> with the FK verification check. Someone may (and, I'm sure, will) simply
> have more data or more constraints.

More data? Hmm ... if you have a lot of data in the referenced table,
you'd better _have_ an index unless you want a lot of seqscans.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Sadness is a wall between two gardens" (Khalil Gibran)
--- Alvaro Herrera <alvherre@dcc.uchile.cl> wrote:
> In what scenarios? I'd easily buy this if you are talking about small
> tables.

Read the message again.
ow wrote:
> I'd like to emphasize again that NOT having an index on the FK column is
> a perfectly valid approach, despite some opinions to the contrary.

OW, you might insist that there are several cases where an index is not
needed, but I didn't propose creating the index automatically (that really
shouldn't happen); I merely proposed sending a NOTICE to the user, so that
he can check whether he should perhaps create an index. Databases will
certainly suffer much more from missing indexes than from too many
indexes.

Regards,
Andreas
Kevin Brown wrote:
>> WAL is not the bottleneck ... as I already mentioned today, pg_clog
>> (and more specifically the meaning of transaction IDs) is what really
>> makes a cluster an indivisible whole at the physical level.
>
> The ability to restore a single large database quickly is, I think, a
> reasonable request; it's just that right now it's difficult (perhaps
> impossible) to satisfy that request.

I could live perfectly well with a single-database restore solution that
can't cope with WAL but merely contains the snapshot present at the
CHECKPOINT when the backup started.

Additionally, I could imagine a restore where only one db is restored, and
the WAL is replayed from the complete cluster backup set while ignoring
all WAL entries not meant for the database being restored. Imagine you
have a full backup at midnight, and at 5PM you say "sh*t, I need an 11am
PITR on my ABC database, while leaving the other five in the cluster
untouched". I'd drop the offending DB, restore it, and replay the WAL.
Does this sound too esoteric?

Regards,
Andreas
ow wrote:
> --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> People might be more interested in debating this topic with you if we
>> hadn't discussed it at length just a couple months back. There wasn't
>> consensus then that we had to offer an escape hatch, and you've not
>> offered any argument that wasn't made before.
>
> I'm simply presenting a problem for which I currently do not see any
> solution (it's very important for us to be able to restore the db within
> a reasonable amount of time). If there's no solution and none is
> planned, then we cannot use pgsql, can we?

You're simply presenting a problem that isn't there in the first place. If
you really feel the need to shoot yourself in the foot, use separate
schema and data dumps and do the latter with "-X disable-triggers".

And now will you please put it to rest?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
>>> People might be more interested in debating this topic with you if we
>>> hadn't discussed it at length just a couple months back. There wasn't
>>> consensus then that we had to offer an escape hatch, and you've not
>>> offered any argument that wasn't made before.
>>
>> I'm simply presenting a problem for which I currently do not see any
>> solution (it's very important for us to be able to restore the db
>> within a reasonable amount of time). If there's no solution and none is
>> planned, then we cannot use pgsql, can we?
>
> You're simply presenting a problem that isn't there in the first place.
> If you really feel the need to shoot yourself in the foot, use separate
> schema and data dumps and do the latter with "-X disable-triggers".
>
> And now will you please put it to rest?

If this is not a prio 1 problem, what are the prio 1 problems then??? You
are a developer, right? Did you ever manage a big database in production?
What should I tell my customers when they want to have a not-that-big
database (100 GB) in PostgreSQL: "I am sorry, but we are not able to do
performant backups, I recommend you choose Oracle instead"??? Is this what
we/you recommend?

Regards

Oli
Oli Sennhauser wrote:
>> You're simply presenting a problem that isn't there in the first place.
>> If you really feel the need to shoot yourself in the foot, use separate
>> schema and data dumps and do the latter with "-X disable-triggers".
>>
>> And now will you please put it to rest?
>
> If this is not a prio 1 problem, what are the prio 1 problems then???

Did you read my mail or only that last sentence?

> You are a developer, right? Did you ever manage a big database in
> production? What should I tell my customers when they want to have a
> not-that-big database (100 GB) in PostgreSQL: "I am sorry, but we are
> not able to do performant backups, I recommend you choose Oracle
> instead"??? Is this what we/you recommend?

Among many other things I am a developer too, and I have managed customer
databases up to 1.2 TB. But I wonder what you are.

You should tell your customers that they have to dump their databases as

    pg_dump -s swisscheese >swisscheese.schema.dump
    pg_dump -a -X disable-triggers swisscheese >swisscheese.data.dump

This is what I recommended in my previous mail. Is that an unacceptable
solution for your customers, or what is the problem?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
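The corresponding restore side would then be (a sketch, using the database
name from Jan's example; the data load skips the per-row FK checks because
the dump itself disables the triggers around the load):

    createdb swisscheese
    psql swisscheese <swisscheese.schema.dump
    psql swisscheese <swisscheese.data.dump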