Thread: Adding foreign key constraints without integrity check?
Is there a way to add a foreign key constraint without having to wait for it to check the consistency of all existing records? If a database is being reloaded (pg_dumpall then load), it really shouldn't be necessary to check the referential integrity - or at least I should be able to stipulate that I am accepting that risk. My database reload is currently taking about 6 hours to load the data, 42 hours to reindex, and about another 40 hours or so to check the foreign key constraints (about 1.2 billion rows). That's a very long time to be out of commission. I'd really like to eliminate that second 40 hours so I can get it down to a normal weekend. Wes
Wes <wespvp@syntegra.com> writes: > My database reload is currently taking about 6 hours to load the data, 42 > hours to reindex, and about another 40 hours or so to check the foreign key > constraints (about 1.2 billion rows). What PG version is this, and what have you got maintenance_work_mem set to? Undersized m_w_m would hurt both index build and FK checking ... regards, tom lane
On 6/18/06 10:48 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> My database reload is currently taking about 6 hours to load the data, 42 >> hours to reindex, and about another 40 hours or so to check the foreign key >> constraints (about 1.2 billion rows). > > What PG version is this, and what have you got maintenance_work_mem set to? > Undersized m_w_m would hurt both index build and FK checking ... Sorry, forgot that info.. PG is 8.1.4 (I'm testing an upgrade from 7.3.4 to 8.1.4). maintenance_work_mem is set to 983025 - 1.5 times the previous value of 655350. Current RSS of postmaster is about 1.3 GB. System memory is 2GB (would like more, but...). Data is on one array (hardware RAID 5 of eight 73 GB 10k SCSI drives), indexes on another (hardware RAID 5 of five 15K 146 GB SCSI drives on a separate channel), and pg_xlog on a third RAID 1 on a third channel). There are two 2.4 GHz Xeon processors). Wes
Wes wrote: > Is there a way to add a foreign key constraint without having to wait for it > to check the consistency of all existing records? If a database is being > reloaded (pg_dumpall then load), it really shouldn't be necessary to check > the referential integrity - or at least I should be able to stipulate that I > am accepting that risk. You could create the fk-constraints _first_, then disable them, load the data, reindex, and reenable them afterwards. pg_dump/pg_restore can enable and disable fk-constraints before restoring the data, I believe. It does so by tweaking the system catalogs. The only problem I can see is that creating the fk-constraints might create some indices too. But maybe you can manually drop those indices afterwards - I don't know if the fk really _depends_ on the index, or if it creates it only for convenience. greetings, Florian Pflug
> You could create the fk-constraints _first_, then disable them, load > the data, reindex, and reenable them afterwards. > > pg_dump/pg_restore can enable and disable fk-constraints before restoring > the data, I believe. It does so by tweaking the system catalogs. Are referring to '--disable-triggers' on pg_dump? Will this work for foreign key constraints? The doc talks about triggers, but doesn't say anything about FK constraints (are these implemented as triggers?) I don't use pg_restore, just psql. > The only problem I can see is that creating the fk-constraints might create > some indices too. But maybe you can manually drop those indices afterwards - I > don't know if the fk really _depends_ on the index, or if it creates it only > for convenience. I don't see any indexes being added to the table beyond what I add, and those added as a primary key constraint. Currently, pg_dump outputs the FK constraints after the indexes are built, as the last steps. If I try to add the FK constraints after loading the database definitions, but without any indexes, I'm not sure what would happen. Wes
> What PG version is this, and what have you got maintenance_work_mem set to? > Undersized m_w_m would hurt both index build and FK checking ... Looking at the stats again, I did see something unexpected. With MWM set to 983025, index builds were running about 1.3 GB for both RSS and virtual. Adding foreign key constraints, RSS is about 1.1 GB but virtual is slightly over 2 GB. Wes
Wes wrote: >> You could create the fk-constraints _first_, then disable them, load >> the data, reindex, and reenable them afterwards. >> >> pg_dump/pg_restore can enable and disable fk-constraints before restoring >> the data, I believe. It does so by tweaking the system catalogs. > > Are referring to '--disable-triggers' on pg_dump? Will this work for > foreign key constraints? The doc talks about triggers, but doesn't say > anything about FK constraints (are these implemented as triggers?) I don't > use pg_restore, just psql. Yes, I was referring to "--disable-triggers". I always assumes that it disables FK-Constraints as well as triggers, but now that you ask I realize that I might have never actually tried that ;-) But FK-Constraints _are_ implemented as triggers internally, so I guess it should work. >> The only problem I can see is that creating the fk-constraints might create >> some indices too. But maybe you can manually drop those indices afterwards - I >> don't know if the fk really _depends_ on the index, or if it creates it only >> for convenience. > > I don't see any indexes being added to the table beyond what I add, and > those added as a primary key constraint. Currently, pg_dump outputs the FK > constraints after the indexes are built, as the last steps. If I try to add > the FK constraints after loading the database definitions, but without any > indexes, I'm not sure what would happen. Hm.. it i tried it out, and came to the conclusion that my approach doesn't work :-( You can only create an FK if the fields you are referencing in the foreign table form a PK there. And creating a PK implicitly creates an index, which you can't drop without dropping the PK :-( So unless you find a way to force postgres to ignore the index when inserting data, my suggestion won't work :-( greetings, Florian Pflug
Florian, Are you certain: "You can only create an FK if the fields you are referencing in the foreign table form a PK there. And creating a PK implicitly creates an index, which you can't drop without dropping the PK :-(" ???? I'm not sure I am convinced the necessity of a foreign key, "needing" to reference a primary keyed entry from a different table. Florian G. Pflug wrote: > Wes wrote: > >>> You could create the fk-constraints _first_, then disable them, load >>> the data, reindex, and reenable them afterwards. >>> >>> pg_dump/pg_restore can enable and disable fk-constraints before >>> restoring >>> the data, I believe. It does so by tweaking the system catalogs. >> >> >> Are referring to '--disable-triggers' on pg_dump? Will this work for >> foreign key constraints? The doc talks about triggers, but doesn't say >> anything about FK constraints (are these implemented as triggers?) I >> don't >> use pg_restore, just psql. > > Yes, I was referring to "--disable-triggers". I always assumes that it > disables FK-Constraints as well as triggers, but now that you ask I > realize that I might have never actually tried that ;-) > > But FK-Constraints _are_ implemented as triggers internally, so I > guess it should work. > >>> The only problem I can see is that creating the fk-constraints might >>> create >>> some indices too. But maybe you can manually drop those indices >>> afterwards - I >>> don't know if the fk really _depends_ on the index, or if it creates >>> it only >>> for convenience. >> >> >> I don't see any indexes being added to the table beyond what I add, and >> those added as a primary key constraint. Currently, pg_dump outputs >> the FK >> constraints after the indexes are built, as the last steps. If I try >> to add >> the FK constraints after loading the database definitions, but >> without any >> indexes, I'm not sure what would happen. > > Hm.. it i tried it out, and came to the conclusion that my approach > doesn't work :-( > > You can only create an FK if the fields you are referencing in the > foreign table form a PK there. And creating a PK implicitly creates an > index, which you can't drop without dropping the PK :-( > > So unless you find a way to force postgres to ignore the index when > inserting data, my suggestion won't work :-( > > greetings, Florian Pflug > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match
On Sun, 2006-06-18 at 22:41, Wes wrote: > Is there a way to add a foreign key constraint without having to wait for it > to check the consistency of all existing records? If a database is being > reloaded (pg_dumpall then load), it really shouldn't be necessary to check > the referential integrity - or at least I should be able to stipulate that I > am accepting that risk. > > My database reload is currently taking about 6 hours to load the data, 42 > hours to reindex, and about another 40 hours or so to check the foreign key > constraints (about 1.2 billion rows). That's a very long time to be out of > commission. I'd really like to eliminate that second 40 hours so I can get > it down to a normal weekend. Are you sure that's really the problem? Do you have indexes on the referring tables (i.e. the foreign key that points to the other table's primary key). Not having an index on the subordinate table makes each and every check on the FK->PK relationship require a seq scan of the subordinate table.
On Jun 18, 2006, at 11:08 PM, Wes wrote: > System memory is 2GB (would like more, but...). Data is on one array > (hardware RAID 5 of eight 73 GB 10k SCSI drives), indexes on another > (hardware RAID 5 of five 15K 146 GB SCSI drives on a separate > channel), and > pg_xlog on a third RAID 1 on a third channel). There are two 2.4 > GHz Xeon > processors). FWIW, RAID5 isn't normally a good recipe for good database performance. This won't help with constraint checking, but you should consider turning fsync off during your restore. Also, why are you frequently dumping and restoring? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On 6/19/06 3:24 PM, "Scott Marlowe" <smarlowe@g2switchworks.com> wrote: > Are you sure that's really the problem? Do you have indexes on the > referring tables (i.e. the foreign key that points to the other table's > primary key). Not having an index on the subordinate table makes each > and every check on the FK->PK relationship require a seq scan of the > subordinate table. Yes, I am sure I have the indexes for both sides. (I just double checked). If not, wouldn't I have abysmal load performance in production, or does that function differently? The constraint that just finished after 18+ hours has a non-composite index on both sides. The other constraint: ADD CONSTRAINT "$2" FOREIGN KEY (recipient) REFERENCES addresses(address_key); has a normal index on address_key. The recipient index is (recipient, date) composite index. This index has always been used when searching on just recipient, and the last time I rebuilt the database (a year ago), the FK addition was inline with expectations. In every case, it takes about the same time to add the foreign key constraint as to create the index, maybe a little more. This is true regardless of whether one of the indexes is composite or not. One constraint build just finished after a little more than 18 hours. The table where the constraint is being created has about 900 million rows. The 'references x' table has about 200 million rows. It churns along eventually eating up about 50 GB or so in the 'base' directory. When that stops growing, it took maybe another hour or two to complete. I'd just like to be able to tell it to simply add the constraint without doing any checking. I know the data is consistent. I just dumped it from a database that has referential integrity enabled. Even if there were an error that crept in to the old database, I don't care - just add the constraint so I can get back online. Right now I'm looking at around 80-90 hours total to do a reload. What's it going to be a year or two from now? I could be out of commission for a week. Wes
On 6/19/06 3:47 PM, "Jim Nasby" <jnasby@pervasive.com> wrote: > FWIW, RAID5 isn't normally a good recipe for good database performance. Understood, but RAID 1 isn't always feasible. The database performs very well with the current hardware configuration - I have no complaints. The only time we have issues is during index rebuilds or a reload. > This won't help with constraint checking, but you should consider > turning fsync off during your restore. I didn't think about doing that, but I wonder how much it would help? The actual data load only takes about 6 hours or so. It's all the index building and constraint checking that takes the time. This time seems to be spent mostly writing temp files and sorting. > Also, why are you frequently dumping and restoring? We have to periodically reindex due to the fact that vacuum processes indexes in index order instead of disk space order. A 'vacuum full' is currently taking about 24 hours. After a reindex, I expect that to drop to 3-4 hours until the index becomes fragmented. I don't think much can be done about the time to reindex (or the order vacuum processes indexes). I wouldn't say we dump/reload frequently. Right now I'm doing a trial run for the 8.x upgrade - I have to make sure I can complete it in the allotted time. However, about a year ago, I had to do multiple reloads in a 2 week period. We had database corruption, and had to reload from source to get a good database - multiple times. We had a defective disk array that wasn't reporting any errors but kept corrupting the database. Besides PG upgrades, I have to consider how long it takes to recover from system failures if I can't use the offline file system backup. I'll be switching to online backups once we get upgraded, but if a reload fails there, I'll again have to fall back to the weekly source backup. Wes
On Mon, 2006-06-19 at 16:39, Wes wrote: > On 6/19/06 3:47 PM, "Jim Nasby" <jnasby@pervasive.com> wrote: > > > FWIW, RAID5 isn't normally a good recipe for good database performance. > > Understood, but RAID 1 isn't always feasible. The database performs very > well with the current hardware configuration - I have no complaints. The > only time we have issues is during index rebuilds or a reload. I think Jim was thinking of RAID 1+0. If you gotta use RAID 5, at least have a contoller with several hundred megs of battery backed cache. Better yet, use a RAID controller with a gig or so of BBU cache and run RAID 1+0 on it. If you can. > > Also, why are you frequently dumping and restoring? > > We have to periodically reindex due to the fact that vacuum processes > indexes in index order instead of disk space order. A 'vacuum full' is > currently taking about 24 hours. After a reindex, I expect that to drop to > 3-4 hours until the index becomes fragmented. I don't think much can be > done about the time to reindex (or the order vacuum processes indexes). Actually, the same question about why your frequently dumping and restoring applies to full vacuums. Why are you doing them? A properly running database should not need vacuum fulls. Nor reindexing. Standard vacuums, scheduled to run often enough, combined with a large enough Free Space Map should mean never needing a full vacuum or reindex. If vacuums slow your system down too much, then you don't have enough I/O bandwidth, and need to see the previous point about a better RAID setup. You should look into slony. You can replicate from one version of pgsql to another, a feature intended to be used for updating. That way, you can put an 8.1 server behind your 7.4 or 8.0 server, replicate to it, wait for it to catch up, shut down your app, fail over your server to the 8.1 machine and viola, you're on 8.1 with minimum downtime.
louis gonzales wrote: > Florian, > Are you certain: > > "You can only create an FK if the fields you are referencing in the > foreign table form a PK there. And creating a PK implicitly creates an > index, which you can't drop without dropping the PK :-(" > Arg.. Should have written "unique index" instead of primary key.. But it doesn't change much, since a unique index and a pk are nearly the same. > I'm not sure I am convinced the necessity of a foreign key, "needing" to > reference a primary keyed entry from a different table. I tried the following: create table a(id int4) ; create table b(id int4, a_id int4) ; alter table b add constraint pk foreign key (a_id) references a (id) ; The alter table gave me an error stating that I need to have a unique index defined on a.id... greetings, Florian Pflug
On 6/19/06 4:48 PM, "Scott Marlowe" <smarlowe@g2switchworks.com> wrote: > > I think Jim was thinking of RAID 1+0. If you gotta use RAID 5, at least > have a contoller with several hundred megs of battery backed cache. > Better yet, use a RAID controller with a gig or so of BBU cache and run > RAID 1+0 on it. If you can. Yes, I realize that is ideal (I meant to write RAID 10 not 1). That takes hardware I don't have, and can't justify spending the money on right now. I/O doesn't appear to be a major issue, although I wasn't constantly monitoring it. Most of the time, the I/O load was not that high - CPU was pegged. Normal production updates and queries are well within (not at all low) expectations. > Actually, the same question about why your frequently dumping and > restoring applies to full vacuums. Why are you doing them? A properly > running database should not need vacuum fulls. Nor reindexing. Simple... VACUUM FULL reads the entire database. Between a full source dump and a full vacuum, I've got a better chance of finding and recovering from corruption sooner. Once bit... There is no (last time I checked) utility to verify full database integrity, and there are no CRC's on data blocks. A pg_dumpall doesn't touch the indexes. While this won't catch everything, seems pretty good at catching corrupted indexes. I've seen no more effective ways of verifying the entire database. The kind of hardware problem we encountered last year is rare, but it does happen. I've seen similar things many times over the years. RAID doesn't help you when a controller mirrors garbage. > You should look into slony. You can replicate from one version of pgsql > to another, a feature intended to be used for updating. I'd love to, but that requires hardware I don't have. Wes
BTW, I do appreciate any and all suggestions. I hope my last message didn't come across otherwise. That's not what was intended. It's just that adding/replacing hardware is not an option right now. Maybe next year... I'm still trying to dig up another 2GB memory. The database actually performs extremely well with the current hardware in the vast majority of cases. I just wish I could cut off the wasted 40+ hours adding the foreign key constraints - it would cut the reload time in half. It's too bad the suggestion of creating the FK before hand didn't pan out. That looked like a good shot. Wes
Florian, I understand where you're coming from. Indexes are always unique and all RDBMS systems use them to 'uniquely' identify a row from the the perspective of internal software management. Index != PrimaryKey, so every table created, despite any Primary/Foreign key contraints put on them, always have a 1-1 Index per row entry. At least that's the way I understand it, can someone else affirm this statement or redirect a misguided 'me ;)'? Thanks group, Florian G. Pflug wrote: > louis gonzales wrote: > >> Florian, >> Are you certain: >> >> "You can only create an FK if the fields you are referencing in the >> foreign table form a PK there. And creating a PK implicitly creates >> an index, which you can't drop without dropping the PK :-(" >> > Arg.. Should have written "unique index" instead of primary key.. > But it doesn't change much, since a unique index and a pk are nearly > the same. > >> I'm not sure I am convinced the necessity of a foreign key, "needing" >> to reference a primary keyed entry from a different table. > > I tried the following: > create table a(id int4) ; > create table b(id int4, a_id int4) ; > alter table b add constraint pk foreign key (a_id) references a (id) ; > Not sure, but maybe the syntax on this is slightly ambiguous. Try creating table b with a primary key constraint on a_id, then alter the table to add foreign key constraint. I'm going to look up a couple of references and see what I can dig up. That may be perfectly legitimate syntax, but it just seems off to me. Sorry if it is, I've spent the last few days on Oracle 9i, so I'm jumping around in my memory.... quite a bit for validity amongst different syntax. > The alter table gave me an error stating that I need to have a unique > index > defined on a.id... > > greetings, Florian Pflug > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend
Florian, So if you: create table test ( id varchar(2) primary key, age int ); create table test2 ( id varchar(2) primary key, age2 int ); alter table test2 add foreign key (id) references test (id); \d test2 you'll see that attribute "id" from test2, now has both a primary key constraint and a foreign key that references the primary key of test. perhaps you can assert two constraints at the same time during an "alter table ..." not sure why your example syntax is failing louis gonzales wrote: > Florian, > I understand where you're coming from. Indexes are always unique and > all RDBMS systems use them to 'uniquely' identify a row from the the > perspective of internal software management. Index != PrimaryKey, so > every table created, despite any Primary/Foreign key contraints put on > them, always have a 1-1 Index per row entry. At least that's the way > I understand it, can someone else affirm this statement or redirect a > misguided 'me ;)'? > > Thanks group, > > Florian G. Pflug wrote: > >> louis gonzales wrote: >> >>> Florian, >>> Are you certain: >>> >>> "You can only create an FK if the fields you are referencing in the >>> foreign table form a PK there. And creating a PK implicitly creates >>> an index, which you can't drop without dropping the PK :-(" >>> >> Arg.. Should have written "unique index" instead of primary key.. >> But it doesn't change much, since a unique index and a pk are nearly >> the same. >> >>> I'm not sure I am convinced the necessity of a foreign key, >>> "needing" to reference a primary keyed entry from a different table. >> >> >> I tried the following: >> create table a(id int4) ; >> create table b(id int4, a_id int4) ; >> alter table b add constraint pk foreign key (a_id) references a (id) ; >> > Not sure, but maybe the syntax on this is slightly ambiguous. Try > creating table b with a primary key constraint on a_id, then alter the > table to add foreign key constraint. I'm going to look up a couple of > references and see what I can dig up. That may be perfectly > legitimate syntax, but it just seems off to me. > > Sorry if it is, I've spent the last few days on Oracle 9i, so I'm > jumping around in my memory.... quite a bit for validity amongst > different syntax. > >> The alter table gave me an error stating that I need to have a unique >> index >> defined on a.id... >> > > >> greetings, Florian Pflug >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 6: explain analyze is your friend > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
louis gonzales wrote: > Florian, > I understand where you're coming from. Indexes are always unique and > all RDBMS systems use them to 'uniquely' identify a row from the the > perspective of internal software management. Surely there are non-unique indices - meaning indices for which there are more then one entry for a given key. > Index != PrimaryKey, so > every table created, despite any Primary/Foreign key contraints put on > them, always have a 1-1 Index per row entry. At least that's the way I > understand it, can someone else affirm this statement or redirect a > misguided 'me ;)'? In postgresql at least, I believe that if you create no index (or pk), then there is no index. The only exception are toast-tables, but you don't even see those tables normally, and they're just an implementation detail of how large attributes are stored. However, the whole point of this thread was whether there is a way to create a FK without postgres checking if it's actually satisfied, or not. This could speed up restoring a dump, because you know that the FK is actually satisfied in that case. My suggestion was to create the fk _before_ loading the data, and disable it similarly to what "--disable-triggers" doest. It turned out, however, that a FK always depends on a unique index (be it a primary key, or not), which prevents my plan from working :-( greetings, Florian Pflug
On 6/20/06 5:07 AM, "Florian G. Pflug" <fgp@phlo.org> wrote: > My suggestion was to create the fk _before_ loading the data, and disable it > similarly to what "--disable-triggers" doest. It turned out, however, that a > FK always depends on a unique index (be it a primary key, or not), which > prevents > my plan from working :-( That was a great idea - too bad it didn't pan out. I don't suppose there's any (reasonable) way to directly insert into the system tables to create the constraint? I could knock almost 2 days off of the almost 4 days to reload if I could solve this. Wes
On Tue, Jun 20, 2006 at 00:49:21 -0400, louis gonzales <gonzales@linuxlouis.net> wrote: > Florian, > I understand where you're coming from. Indexes are always unique and > all RDBMS systems use them to 'uniquely' identify a row from the the > perspective of internal software management. Index != PrimaryKey, so > every table created, despite any Primary/Foreign key contraints put on > them, always have a 1-1 Index per row entry. At least that's the way I > understand it, can someone else affirm this statement or redirect a > misguided 'me ;)'? Note that indexes are not always unique. They can sometimes still be useful for speeding up performance even when there are duplicates. Postgres also has partial indexes which cover only some of the rows in a table, based on a where condition.
Wes wrote: > On 6/20/06 5:07 AM, "Florian G. Pflug" <fgp@phlo.org> wrote: > >> My suggestion was to create the fk _before_ loading the data, and disable it >> similarly to what "--disable-triggers" doest. It turned out, however, that a >> FK always depends on a unique index (be it a primary key, or not), which >> prevents >> my plan from working :-( > > That was a great idea - too bad it didn't pan out. > > I don't suppose there's any (reasonable) way to directly insert into the > system tables to create the constraint? I could knock almost 2 days off of > the almost 4 days to reload if I could solve this. I believe that it's possible, but it's hard to get right. A foreign key consists of at least .) The actual definition in pg_constraints or so .) A trigger .) Various reconds in pg_depend To be on the safe side, you'd need to look at the sourcecode, and miminc what is done there. Maybe you could ask at some postgresql support companies how much effort it would be to add a "without check" flag to "alter table add constraint foreign key", and how much they'd charge for it... greetings, Florian Pflug
On 6/20/06 8:17 PM, "Florian G. Pflug" <fgp@phlo.org> wrote: > Maybe you could ask at some postgresql support companies how much effort it > would > be to add a "without check" flag to "alter table add constraint foreign key", > and > how much they'd charge for it... Or if I get ambitious, dig into the code myself if I can figure out where to start... Wes
On Jun 21, 2006, at 8:38 AM, Wes wrote: >> Maybe you could ask at some postgresql support companies how much >> effort it >> would >> be to add a "without check" flag to "alter table add constraint >> foreign key", >> and >> how much they'd charge for it... > > Or if I get ambitious, dig into the code myself if I can figure out > where to > start... If nothing else, you should bring it up on -hackers and ask to have this added as a TODO. It seems like a worth-while addition to pg_dump/ restore to me... To answer another of your emails in this thread... a LAZY vacuum of the entire database will read every table and index in the database. Wanting to read the entire database to check for corruption is no reason to do a VACUUM FULL. Also, if you're curious about restore time for your upgrade, you should be doing the restore to an 8.1.4 database, not to your current version. There's been a ton of performance improvements made. In fact, I'm wondering if constraint checking in restore has been improved... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On 6/22/06 2:57 PM, "Jim Nasby" <jnasby@pervasive.com> wrote: > If nothing else, you should bring it up on -hackers and ask to have > this added as a TODO. It seems like a worth-while addition to pg_dump/ > restore to me... Thanks for the suggestion. > To answer another of your emails in this thread... a LAZY vacuum of > the entire database will read every table and index in the database. > Wanting to read the entire database to check for corruption is no > reason to do a VACUUM FULL. Sorry, I misspoke. I do not do a VACUUM FULL. I do a VACUUM VERBOSE ANALYZE on a weekly basis. The verbose allows me to see which tables/indexes are getting bad (vacuum time wise). This is currently taking about 24 hours on a weekend (very little production load competing with it). The time drops dramatically after a reindex, then creeps up again as the indexes are updated in random order. > Also, if you're curious about restore > time for your upgrade, you should be doing the restore to an 8.1.4 > database, not to your current version. There's been a ton of > performance improvements made. In fact, I'm wondering if constraint > checking in restore has been improved... Yes, that is what I did. I'm in the process of testing an upgrade from 7.3.x to 8.1.4 - export from 7.3.x and import into 8.1.4. Unfortunately, I'm sitting at about 90 hours when I've got about an 80 hour window on a long weekend... Wes
On 6/18/2006 11:41 PM, Wes wrote: > Is there a way to add a foreign key constraint without having to wait for it > to check the consistency of all existing records? If a database is being > reloaded (pg_dumpall then load), it really shouldn't be necessary to check > the referential integrity - or at least I should be able to stipulate that I > am accepting that risk. > > My database reload is currently taking about 6 hours to load the data, 42 > hours to reindex, and about another 40 hours or so to check the foreign key > constraints (about 1.2 billion rows). That's a very long time to be out of > commission. I'd really like to eliminate that second 40 hours so I can get > it down to a normal weekend. The original implementation when the feature was new was to restore the foreign key constraints at the end of the dump with CREATE CONSTRAINT TRIGGER, which does not check the existing data. pg_dump was changed to use ALTER TABLE instead. Maybe it would be good to have the old behaviour as an option? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
8.1 improved locking for foreign key references but had an unexpected consequence to our application - no parallel loads in our application. The application does an EXCLUSIVE lock on 'addresses'. It then gets all of the keys from 'addresses' it needs, and adds new ones encountered in this load. It then completes the transaction, releases the exclusive lock, and inserts the other table's records using the values read from/inserted into 'addresses'. There are foreign key constraints between the various tables and 'addresses' to insure referential integrity. Previously (pgsql 7.4.5), multiple loads would run simultaneously - and occasionally got 'deadlock detected' with the foreign key locks even though they were referenced in sorted order. When loading tables other than 'addresses', foreign key locks did not prevent other jobs from grabbing the exclusive lock on 'addresses'. With 8.1.4, the foreign key locks prevent other instances from grabbing the lock, so they wait until the first job is complete - only one job loads at a time. About EXCLUSIVE locks, the manual says: "...only reads from the table can proceed in parallel with a transaction holding this lock mode." What is now the appropriate lock? It needs to: 1. Prevent others from updating the table 2. Block other jobs that are requesting the same lock (if job 2 does a SELECT and finds nothing, it will try to create the record that job 1 may already have created in its transaction). 3. Not conflict with foreign key reference locks SHARE does not appear to be appropriate - it would fail #2. Maybe "SHARE UPDATE EXCLUSIVE"? Wes
On Sun, Jul 16, 2006 at 05:46:16PM -0500, Wes wrote: > Previously (pgsql 7.4.5), multiple loads would run simultaneously - and > occasionally got 'deadlock detected' with the foreign key locks even though > they were referenced in sorted order. When loading tables other than > 'addresses', foreign key locks did not prevent other jobs from grabbing the > exclusive lock on 'addresses'. Unless I'm misunderstanding you or a bug was fixed between 7.4.5 and 7.4.13 (the version I'm running), I'm not convinced that last statement is true. EXCLUSIVE conflicts with all lock types except ACCESS SHARE; foreign key references prior to 8.1 use SELECT FOR UPDATE and in 8.1 they use SELECT FOR SHARE, but in both cases they acquire ROW SHARE on the referenced table, which conflicts with EXCLUSIVE. > With 8.1.4, the foreign key locks prevent other instances from grabbing the > lock, so they wait until the first job is complete - only one job loads at a > time. Again, maybe I'm misunderstanding you, but the following example behaves the same way in 8.1.4 and 7.4.13 (foo has a foreign key reference to addresses): T1: BEGIN; T1: INSERT INTO foo (address_id) VALUES (1); T2: BEGIN; T2: LOCK TABLE addresses IN EXCLUSIVE MODE; T2: (blocked until T1 completes) Does this example differ from what you're doing or seeing? > What is now the appropriate lock? It needs to: > > 1. Prevent others from updating the table > 2. Block other jobs that are requesting the same lock (if job 2 does a > SELECT and finds nothing, it will try to create the record that job 1 may > already have created in its transaction). > 3. Not conflict with foreign key reference locks SHARE ROW EXCLUSIVE is the weakest lock that meets these requirements. It conflicts with itself (#2) and with ROW EXCLUSIVE, which UPDATE, DELETE, and INSERT acquire (#1), but doesn't conflict with ROW SHARE, which is what SELECT FOR UPDATE/SHARE acquire (#3). -- Michael Fuhr