Thread: Adding foreign key constraints without integrity check?

Adding foreign key constraints without integrity check?

From

Wes

Date:

19 June 2006, 00:41:22

Is there a way to add a foreign key constraint without having to wait for it
to check the consistency of all existing records?  If a database is being
reloaded (pg_dumpall then load), it really shouldn't be necessary to check
the referential integrity - or at least I should be able to stipulate that I
am accepting that risk.

My database reload is currently taking about 6 hours to load the data, 42
hours to reindex, and about another 40 hours or so to check the foreign key
constraints (about 1.2 billion rows).  That's a very long time to be out of
commission.  I'd really like to eliminate that second 40 hours so I can get
it down to a normal weekend.

Wes

Re: Adding foreign key constraints without integrity check?

From

Tom Lane

Date:

19 June 2006, 00:48:41

Wes <wespvp@syntegra.com> writes:
> My database reload is currently taking about 6 hours to load the data, 42
> hours to reindex, and about another 40 hours or so to check the foreign key
> constraints (about 1.2 billion rows).

What PG version is this, and what have you got maintenance_work_mem set to?
Undersized m_w_m would hurt both index build and FK checking ...

            regards, tom lane

Re: Adding foreign key constraints without integrity

From

Wes

Date:

19 June 2006, 01:09:05

On 6/18/06 10:48 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

>> My database reload is currently taking about 6 hours to load the data, 42
>> hours to reindex, and about another 40 hours or so to check the foreign key
>> constraints (about 1.2 billion rows).
>
> What PG version is this, and what have you got maintenance_work_mem set to?
> Undersized m_w_m would hurt both index build and FK checking ...

Sorry, forgot that info..

PG is 8.1.4 (I'm testing an upgrade from 7.3.4 to 8.1.4).

maintenance_work_mem  is set to 983025 - 1.5 times the previous value of
655350.  Current RSS of postmaster is about 1.3 GB.

System memory is 2GB (would like more, but...).  Data is on one array
(hardware RAID 5 of eight 73 GB 10k SCSI drives), indexes on another
(hardware RAID 5 of five 15K 146 GB SCSI drives on a separate channel), and
pg_xlog on a third RAID 1 on a third channel).  There are two 2.4 GHz Xeon
processors).

Wes

Re: Adding foreign key constraints without integrity check?

From

"Florian G. Pflug"

Date:

19 June 2006, 07:34:57

Wes wrote:
> Is there a way to add a foreign key constraint without having to wait for it
> to check the consistency of all existing records?  If a database is being
> reloaded (pg_dumpall then load), it really shouldn't be necessary to check
> the referential integrity - or at least I should be able to stipulate that I
> am accepting that risk.
You could create the fk-constraints _first_, then disable them, load
the data, reindex, and reenable them afterwards.

pg_dump/pg_restore can enable and disable fk-constraints before restoring
the data, I believe. It does so by tweaking the system catalogs.

The only problem I can see is that creating the fk-constraints might create
some indices too. But maybe you can manually drop those indices afterwards - I
don't know if the fk really _depends_ on the index, or if it creates it only
for convenience.

greetings, Florian Pflug

Re: Adding foreign key constraints without integrity

From

Wes

Date:

19 June 2006, 09:44:44

> You could create the fk-constraints _first_, then disable them, load
> the data, reindex, and reenable them afterwards.
>
> pg_dump/pg_restore can enable and disable fk-constraints before restoring
> the data, I believe. It does so by tweaking the system catalogs.

Are referring to '--disable-triggers' on pg_dump?  Will this work for
foreign key constraints?  The doc talks about triggers, but doesn't say
anything about FK constraints (are these implemented as triggers?)  I don't
use pg_restore, just psql.

> The only problem I can see is that creating the fk-constraints might create
> some indices too. But maybe you can manually drop those indices afterwards - I
> don't know if the fk really _depends_ on the index, or if it creates it only
> for convenience.

I don't see any indexes being added to the table beyond what I add, and
those added as a primary key constraint.  Currently, pg_dump outputs the FK
constraints after the indexes are built, as the last steps.  If I try to add
the FK constraints after loading the database definitions, but without any
indexes, I'm not sure what would happen.

Wes

Re: Adding foreign key constraints without integrity

From

Wes

Date:

19 June 2006, 11:22:18

> What PG version is this, and what have you got maintenance_work_mem set to?
> Undersized m_w_m would hurt both index build and FK checking ...

Looking at the stats again, I did see something unexpected.  With MWM set to
983025, index builds were running about 1.3 GB for both RSS and virtual.
Adding foreign key constraints, RSS is about 1.1 GB but virtual is slightly
over 2 GB.

Wes

Re: Adding foreign key constraints without integrity check?

From

"Florian G. Pflug"

Date:

19 June 2006, 12:07:45

Wes wrote:
>> You could create the fk-constraints _first_, then disable them, load
>> the data, reindex, and reenable them afterwards.
>>
>> pg_dump/pg_restore can enable and disable fk-constraints before restoring
>> the data, I believe. It does so by tweaking the system catalogs.
>
> Are referring to '--disable-triggers' on pg_dump?  Will this work for
> foreign key constraints?  The doc talks about triggers, but doesn't say
> anything about FK constraints (are these implemented as triggers?)  I don't
> use pg_restore, just psql.
Yes, I was referring to "--disable-triggers". I always assumes that it
disables FK-Constraints as well as triggers, but now that you ask I
realize that I might have never actually tried that ;-)

But FK-Constraints _are_ implemented as triggers internally, so I guess
it should work.

>> The only problem I can see is that creating the fk-constraints might create
>> some indices too. But maybe you can manually drop those indices afterwards - I
>> don't know if the fk really _depends_ on the index, or if it creates it only
>> for convenience.
>
> I don't see any indexes being added to the table beyond what I add, and
> those added as a primary key constraint.  Currently, pg_dump outputs the FK
> constraints after the indexes are built, as the last steps.  If I try to add
> the FK constraints after loading the database definitions, but without any
> indexes, I'm not sure what would happen.
Hm.. it i tried it out, and came to the conclusion that my approach
doesn't work :-(

You can only create an FK if the fields you are referencing in the
foreign table form a PK there. And creating a PK implicitly creates an
index, which you can't drop without dropping the PK :-(

So unless you find a way to force postgres to ignore the index when
inserting data, my suggestion won't work :-(

greetings, Florian Pflug

Re: Adding foreign key constraints without integrity check?

From

louis gonzales

Date:

19 June 2006, 16:29:54

Florian,
Are you certain:

"You can only create an FK if the fields you are referencing in the
foreign table form a PK there. And creating a PK implicitly creates an
index, which you can't drop without dropping the PK :-("

????

I'm not sure I am convinced the necessity of a foreign key, "needing" to
reference a primary keyed entry from a different table.




Florian G. Pflug wrote:

> Wes wrote:
>
>>> You could create the fk-constraints _first_, then disable them, load
>>> the data, reindex, and reenable them afterwards.
>>>
>>> pg_dump/pg_restore can enable and disable fk-constraints before
>>> restoring
>>> the data, I believe. It does so by tweaking the system catalogs.
>>
>>
>> Are referring to '--disable-triggers' on pg_dump?  Will this work for
>> foreign key constraints?  The doc talks about triggers, but doesn't say
>> anything about FK constraints (are these implemented as triggers?)  I
>> don't
>> use pg_restore, just psql.
>
> Yes, I was referring to "--disable-triggers". I always assumes that it
> disables FK-Constraints as well as triggers, but now that you ask I
> realize that I might have never actually tried that ;-)
>
> But FK-Constraints _are_ implemented as triggers internally, so I
> guess it should work.
>
>>> The only problem I can see is that creating the fk-constraints might
>>> create
>>> some indices too. But maybe you can manually drop those indices
>>> afterwards - I
>>> don't know if the fk really _depends_ on the index, or if it creates
>>> it only
>>> for convenience.
>>
>>
>> I don't see any indexes being added to the table beyond what I add, and
>> those added as a primary key constraint.  Currently, pg_dump outputs
>> the FK
>> constraints after the indexes are built, as the last steps.  If I try
>> to add
>> the FK constraints after loading the database definitions, but
>> without any
>> indexes, I'm not sure what would happen.
>
> Hm.. it i tried it out, and came to the conclusion that my approach
> doesn't work :-(
>
> You can only create an FK if the fields you are referencing in the
> foreign table form a PK there. And creating a PK implicitly creates an
> index, which you can't drop without dropping the PK :-(
>
> So unless you find a way to force postgres to ignore the index when
> inserting data, my suggestion won't work :-(
>
> greetings, Florian Pflug
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match

Re: Adding foreign key constraints without integrity

From

Scott Marlowe

Date:

19 June 2006, 17:25:20

On Sun, 2006-06-18 at 22:41, Wes wrote:
> Is there a way to add a foreign key constraint without having to wait for it
> to check the consistency of all existing records?  If a database is being
> reloaded (pg_dumpall then load), it really shouldn't be necessary to check
> the referential integrity - or at least I should be able to stipulate that I
> am accepting that risk.
>
> My database reload is currently taking about 6 hours to load the data, 42
> hours to reindex, and about another 40 hours or so to check the foreign key
> constraints (about 1.2 billion rows).  That's a very long time to be out of
> commission.  I'd really like to eliminate that second 40 hours so I can get
> it down to a normal weekend.

Are you sure that's really the problem?  Do you have indexes on the
referring tables (i.e. the foreign key that points to the other table's
primary key).  Not having an index on the subordinate table makes each
and every check on the FK->PK relationship require a seq scan of the
subordinate table.

Re: Adding foreign key constraints without integrity

From

Jim Nasby

Date:

19 June 2006, 17:57:08

On Jun 18, 2006, at 11:08 PM, Wes wrote:
> System memory is 2GB (would like more, but...).  Data is on one array
> (hardware RAID 5 of eight 73 GB 10k SCSI drives), indexes on another
> (hardware RAID 5 of five 15K 146 GB SCSI drives on a separate
> channel), and
> pg_xlog on a third RAID 1 on a third channel).  There are two 2.4
> GHz Xeon
> processors).

FWIW, RAID5 isn't normally a good recipe for good database performance.

This won't help with constraint checking, but you should consider
turning fsync off during your restore.

Also, why are you frequently dumping and restoring?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Adding foreign key constraints without integrity

From

Wes

Date:

19 June 2006, 18:21:57

On 6/19/06 3:24 PM, "Scott Marlowe" <smarlowe@g2switchworks.com> wrote:

> Are you sure that's really the problem?  Do you have indexes on the
> referring tables (i.e. the foreign key that points to the other table's
> primary key).  Not having an index on the subordinate table makes each
> and every check on the FK->PK relationship require a seq scan of the
> subordinate table.

Yes, I am sure I have the indexes for both sides.  (I just double checked).
If not, wouldn't I have abysmal load performance in production, or does that
function differently?

The constraint that just finished after 18+ hours has a non-composite index
on both sides.  The other constraint:

ADD CONSTRAINT "$2" FOREIGN KEY (recipient) REFERENCES
addresses(address_key);

has a normal index on address_key.  The recipient index is (recipient, date)
composite index.  This index has always been used when searching on just
recipient, and the last time I rebuilt the database (a year ago), the FK
addition was inline with expectations.

In every case, it takes about the same time to add the foreign key
constraint as to create the index, maybe a little more.  This is true
regardless of whether one of the indexes is composite or not.  One
constraint build just finished after a little more than 18 hours.  The table
where the constraint is being created has about 900 million rows.  The
'references x' table has about 200 million rows.  It churns along eventually
eating up about 50 GB or so in the 'base' directory. When that stops
growing, it took maybe another hour or two to complete.

I'd just like to be able to tell it to simply add the constraint without
doing any checking.  I know the data is consistent.  I just dumped it from a
database that has referential integrity enabled.  Even if there were an
error that crept in to the old database, I don't care - just add the
constraint so I can get back online.  Right now I'm looking at around 80-90
hours total to do a reload.  What's it going to be a year or two from now?
I could be out of commission for a week.

Wes

Re: Adding foreign key constraints without integrity

From

Wes

Date:

19 June 2006, 18:39:47

On 6/19/06 3:47 PM, "Jim Nasby" <jnasby@pervasive.com> wrote:

> FWIW, RAID5 isn't normally a good recipe for good database performance.

Understood, but RAID 1 isn't always feasible.  The database performs very
well with the current hardware configuration - I have no complaints.  The
only time we have issues is during index rebuilds or a reload.

> This won't help with constraint checking, but you should consider
> turning fsync off during your restore.

I didn't think about doing that, but I wonder how much it would help?  The
actual data load only takes about 6 hours or so.  It's all the index
building and constraint checking that takes the time.  This time seems to be
spent mostly writing temp files and sorting.

> Also, why are you frequently dumping and restoring?

We have to periodically reindex due to the fact that vacuum processes
indexes in index order instead of disk space order.  A 'vacuum full' is
currently taking about 24 hours.  After a reindex, I expect that to drop to
3-4 hours until the index becomes fragmented.  I don't think much can be
done about the time to reindex (or the order vacuum processes indexes).

I wouldn't say we dump/reload frequently.  Right now I'm doing a trial run
for the 8.x upgrade - I have to make sure I can complete it in the allotted
time.  However, about a year ago, I had to do multiple reloads in a 2 week
period.  We had database corruption, and had to reload from source to get a
good database - multiple times.  We had a defective disk array that wasn't
reporting any errors but kept corrupting the database.

Besides PG upgrades, I have to consider how long it takes to recover from
system failures if I can't use the offline file system backup.  I'll be
switching to online backups once we get upgraded, but if a reload fails
there, I'll again have to fall back to the weekly source backup.

Wes

Re: Adding foreign key constraints without integrity

From

Scott Marlowe

Date:

19 June 2006, 18:48:54

On Mon, 2006-06-19 at 16:39, Wes wrote:
> On 6/19/06 3:47 PM, "Jim Nasby" <jnasby@pervasive.com> wrote:
>
> > FWIW, RAID5 isn't normally a good recipe for good database performance.
>
> Understood, but RAID 1 isn't always feasible.  The database performs very
> well with the current hardware configuration - I have no complaints.  The
> only time we have issues is during index rebuilds or a reload.

I think Jim was thinking of RAID 1+0.  If you gotta use RAID 5, at least
have a contoller with several hundred megs of battery backed cache.
Better yet, use a RAID controller with a gig or so of BBU cache and run
RAID 1+0 on it. If you can.

> > Also, why are you frequently dumping and restoring?
>
> We have to periodically reindex due to the fact that vacuum processes
> indexes in index order instead of disk space order.  A 'vacuum full' is
> currently taking about 24 hours.  After a reindex, I expect that to drop to
> 3-4 hours until the index becomes fragmented.  I don't think much can be
> done about the time to reindex (or the order vacuum processes indexes).

Actually, the same question about why your frequently dumping and
restoring applies to full vacuums.  Why are you doing them?  A properly
running database should not need vacuum fulls.  Nor reindexing.

Standard vacuums, scheduled to run often enough, combined with a large
enough Free Space Map should mean never needing a full vacuum or
reindex.  If vacuums slow your system down too much, then you don't have
enough I/O bandwidth, and need to see the previous point about a better
RAID setup.

You should look into slony.  You can replicate from one version of pgsql
to another, a feature intended to be used for updating.

That way, you can put an 8.1 server behind your 7.4 or 8.0 server,
replicate to it, wait for it to catch up, shut down your app, fail over
your server to the 8.1 machine and viola, you're on 8.1 with minimum
downtime.

Re: Adding foreign key constraints without integrity check?

From

"Florian G. Pflug"

Date:

19 June 2006, 19:44:24

louis gonzales wrote:
> Florian,
> Are you certain:
>
> "You can only create an FK if the fields you are referencing in the
> foreign table form a PK there. And creating a PK implicitly creates an
> index, which you can't drop without dropping the PK :-("
>
Arg.. Should have written "unique index" instead of primary key..
But it doesn't change much, since a unique index and a pk are nearly
the same.

> I'm not sure I am convinced the necessity of a foreign key, "needing" to
> reference a primary keyed entry from a different table.
I tried the following:
create table a(id int4) ;
create table b(id int4, a_id int4) ;
alter table b add constraint pk foreign key (a_id) references a (id) ;

The alter table gave me an error stating that I need to have a unique index
defined on a.id...

greetings, Florian Pflug

Re: Adding foreign key constraints without integrity

From

Wes

Date:

19 June 2006, 20:09:19

On 6/19/06 4:48 PM, "Scott Marlowe" <smarlowe@g2switchworks.com> wrote:

>
> I think Jim was thinking of RAID 1+0.  If you gotta use RAID 5, at least
> have a contoller with several hundred megs of battery backed cache.
> Better yet, use a RAID controller with a gig or so of BBU cache and run
> RAID 1+0 on it. If you can.

Yes, I realize that is ideal (I meant to write RAID 10 not 1).  That takes
hardware I don't have, and can't justify spending the money on right now.
I/O doesn't appear to be a major issue, although I wasn't constantly
monitoring it.  Most of the time, the I/O load was not that high - CPU was
pegged. Normal production updates and queries are well within (not at all
low) expectations.

> Actually, the same question about why your frequently dumping and
> restoring applies to full vacuums.  Why are you doing them?  A properly
> running database should not need vacuum fulls.  Nor reindexing.

Simple...  VACUUM FULL reads the entire database.  Between a full source
dump and a full vacuum, I've got a better chance of finding and recovering
from corruption sooner.  Once bit...  There is no (last time I checked)
utility to verify full database integrity, and there are no CRC's on data
blocks.  A pg_dumpall doesn't touch the indexes.  While this won't catch
everything, seems pretty good at catching corrupted indexes.  I've seen no
more effective ways of verifying the entire database.  The kind of hardware
problem we encountered last year is rare, but it does happen.  I've seen
similar things many times over the years.  RAID doesn't help you when a
controller mirrors garbage.

> You should look into slony.  You can replicate from one version of pgsql
> to another, a feature intended to be used for updating.

I'd love to, but that requires hardware I don't have.

Wes

Re: Adding foreign key constraints without integrity

From

Wes

Date:

19 June 2006, 23:23:27

BTW, I do appreciate any and all suggestions.  I hope my last message didn't
come across otherwise.  That's not what was intended.  It's just that
adding/replacing hardware is not an option right now.   Maybe next year...
I'm still trying to dig up another 2GB memory.

The database actually performs extremely well with the current hardware in
the vast majority of cases.  I just wish I could cut off the wasted 40+
hours adding the foreign key constraints - it would cut the reload time in
half.

It's too bad the suggestion of creating the FK before hand didn't pan out.
That looked like a good shot.

Wes

Re: Adding foreign key constraints without integrity check?

From

louis gonzales

Date:

20 June 2006, 01:49:48

Florian,
I understand where you're coming from.  Indexes are always unique and
all RDBMS systems use them to 'uniquely' identify a row from the the
perspective of internal software management.  Index != PrimaryKey, so
every table created, despite any Primary/Foreign key contraints put on
them, always have a 1-1 Index per row entry.  At least that's the way I
understand it, can someone else affirm this statement or redirect a
misguided 'me ;)'?

Thanks group,

Florian G. Pflug wrote:

> louis gonzales wrote:
>
>> Florian,
>> Are you certain:
>>
>> "You can only create an FK if the fields you are referencing in the
>> foreign table form a PK there. And creating a PK implicitly creates
>> an index, which you can't drop without dropping the PK :-("
>>
> Arg.. Should have written "unique index" instead of primary key..
> But it doesn't change much, since a unique index and a pk are nearly
> the same.
>
>> I'm not sure I am convinced the necessity of a foreign key, "needing"
>> to reference a primary keyed entry from a different table.
>
> I tried the following:
> create table a(id int4) ;
> create table b(id int4, a_id int4) ;
> alter table b add constraint pk foreign key (a_id) references a (id) ;
>
Not sure, but maybe the syntax on this is slightly ambiguous.  Try
creating table b with a primary key constraint on a_id, then alter the
table to add foreign key constraint.  I'm going to look up a couple of
references and see what I can dig up.  That may be perfectly legitimate
syntax, but it just seems off to me.

Sorry if it is, I've spent the last few days on Oracle 9i, so I'm
jumping around in my memory.... quite a bit for validity amongst
different syntax.

> The alter table gave me an error stating that I need to have a unique
> index
> defined on a.id...
>

> greetings, Florian Pflug
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

Re: Adding foreign key constraints without integrity check?

From

louis gonzales

Date:

20 June 2006, 02:04:17

Florian,

So if you:

create table test (
id varchar(2) primary key,
age int );

create table test2 (
id varchar(2) primary key,
age2 int );

alter table test2 add foreign key (id) references test (id);

\d test2

you'll see that attribute "id" from test2, now has both a primary key
constraint and a foreign key that references the primary key of test.

perhaps you can assert two constraints at the same time during an "alter
table ..." not sure why your example syntax is failing


louis gonzales wrote:

> Florian,
> I understand where you're coming from.  Indexes are always unique and
> all RDBMS systems use them to 'uniquely' identify a row from the the
> perspective of internal software management.  Index != PrimaryKey, so
> every table created, despite any Primary/Foreign key contraints put on
> them, always have a 1-1 Index per row entry.  At least that's the way
> I understand it, can someone else affirm this statement or redirect a
> misguided 'me ;)'?
>
> Thanks group,
>
> Florian G. Pflug wrote:
>
>> louis gonzales wrote:
>>
>>> Florian,
>>> Are you certain:
>>>
>>> "You can only create an FK if the fields you are referencing in the
>>> foreign table form a PK there. And creating a PK implicitly creates
>>> an index, which you can't drop without dropping the PK :-("
>>>
>> Arg.. Should have written "unique index" instead of primary key..
>> But it doesn't change much, since a unique index and a pk are nearly
>> the same.
>>
>>> I'm not sure I am convinced the necessity of a foreign key,
>>> "needing" to reference a primary keyed entry from a different table.
>>
>>
>> I tried the following:
>> create table a(id int4) ;
>> create table b(id int4, a_id int4) ;
>> alter table b add constraint pk foreign key (a_id) references a (id) ;
>>
> Not sure, but maybe the syntax on this is slightly ambiguous.  Try
> creating table b with a primary key constraint on a_id, then alter the
> table to add foreign key constraint.  I'm going to look up a couple of
> references and see what I can dig up.  That may be perfectly
> legitimate syntax, but it just seems off to me.
>
> Sorry if it is, I've spent the last few days on Oracle 9i, so I'm
> jumping around in my memory.... quite a bit for validity amongst
> different syntax.
>
>> The alter table gave me an error stating that I need to have a unique
>> index
>> defined on a.id...
>>
>
>
>> greetings, Florian Pflug
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: explain analyze is your friend
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

Re: Adding foreign key constraints without integrity check?

From

"Florian G. Pflug"

Date:

20 June 2006, 07:08:08

louis gonzales wrote:
> Florian,
> I understand where you're coming from.  Indexes are always unique and
> all RDBMS systems use them to 'uniquely' identify a row from the the
> perspective of internal software management.
Surely there are non-unique indices - meaning indices for which there
are more then one entry for a given key.

> Index != PrimaryKey, so
> every table created, despite any Primary/Foreign key contraints put on
> them, always have a 1-1 Index per row entry.  At least that's the way I
> understand it, can someone else affirm this statement or redirect a
> misguided 'me ;)'?

In postgresql at least, I believe that if you create no index (or pk), then
there is no index. The only exception are toast-tables, but you don't even
see those tables normally, and they're just an implementation detail of how
large attributes are stored.

However, the whole point of this thread was whether there is a way to create
a FK without postgres checking if it's actually satisfied, or not. This could
speed up restoring a dump, because you know that the FK is actually satisfied in
that case.

My suggestion was to create the fk _before_ loading the data, and disable it
similarly to what "--disable-triggers" doest. It turned out, however, that a
FK always depends on a unique index (be it a primary key, or not), which prevents
my plan from working :-(

greetings, Florian Pflug

Re: Adding foreign key constraints without integrity

From

Wes

Date:

20 June 2006, 11:40:37

On 6/20/06 5:07 AM, "Florian G. Pflug" <fgp@phlo.org> wrote:

> My suggestion was to create the fk _before_ loading the data, and disable it
> similarly to what "--disable-triggers" doest. It turned out, however, that a
> FK always depends on a unique index (be it a primary key, or not), which
> prevents
> my plan from working :-(

That was a great idea - too bad it didn't pan out.

I don't suppose there's any (reasonable) way to directly insert into the
system tables to create the constraint?  I could knock almost 2 days off of
the almost 4 days to reload if I could solve this.

Wes

Re: Adding foreign key constraints without integrity check?

From

Bruno Wolff III

Date:

20 June 2006, 17:21:23

On Tue, Jun 20, 2006 at 00:49:21 -0400,
  louis gonzales <gonzales@linuxlouis.net> wrote:
> Florian,
> I understand where you're coming from.  Indexes are always unique and
> all RDBMS systems use them to 'uniquely' identify a row from the the
> perspective of internal software management.  Index != PrimaryKey, so
> every table created, despite any Primary/Foreign key contraints put on
> them, always have a 1-1 Index per row entry.  At least that's the way I
> understand it, can someone else affirm this statement or redirect a
> misguided 'me ;)'?

Note that indexes are not always unique. They can sometimes still be useful for
speeding up performance even when there are duplicates. Postgres also has
partial indexes which cover only some of the rows in a table, based on
a where condition.

Re: Adding foreign key constraints without integrity check?

From

"Florian G. Pflug"

Date:

20 June 2006, 22:18:11

Wes wrote:
> On 6/20/06 5:07 AM, "Florian G. Pflug" <fgp@phlo.org> wrote:
>
>> My suggestion was to create the fk _before_ loading the data, and disable it
>> similarly to what "--disable-triggers" doest. It turned out, however, that a
>> FK always depends on a unique index (be it a primary key, or not), which
>> prevents
>> my plan from working :-(
>
> That was a great idea - too bad it didn't pan out.
>
> I don't suppose there's any (reasonable) way to directly insert into the
> system tables to create the constraint?  I could knock almost 2 days off of
> the almost 4 days to reload if I could solve this.

I believe that it's possible, but it's hard to get right. A foreign key
consists of at least
.) The actual definition in pg_constraints or so
.) A trigger
.) Various reconds in pg_depend

To be on the safe side, you'd need to look at the sourcecode, and miminc
what is done there.

Maybe you could ask at some postgresql support companies how much effort it would
be to add a "without check" flag to "alter table add constraint foreign key", and
how much they'd charge for it...

greetings, Florian Pflug

Re: Adding foreign key constraints without integrity

From

Wes

Date:

21 June 2006, 10:39:04

On 6/20/06 8:17 PM, "Florian G. Pflug" <fgp@phlo.org> wrote:

> Maybe you could ask at some postgresql support companies how much effort it
> would
> be to add a "without check" flag to "alter table add constraint foreign key",
> and
> how much they'd charge for it...

Or if I get ambitious, dig into the code myself if I can figure out where to
start...

Wes

Re: Adding foreign key constraints without integrity

From

Jim Nasby

Date:

22 June 2006, 16:59:11

On Jun 21, 2006, at 8:38 AM, Wes wrote:
>> Maybe you could ask at some postgresql support companies how much
>> effort it
>> would
>> be to add a "without check" flag to "alter table add constraint
>> foreign key",
>> and
>> how much they'd charge for it...
>
> Or if I get ambitious, dig into the code myself if I can figure out
> where to
> start...

If nothing else, you should bring it up on -hackers and ask to have
this added as a TODO. It seems like a worth-while addition to pg_dump/
restore to me...

To answer another of your emails in this thread... a LAZY vacuum of
the entire database will read every table and index in the database.
Wanting to read the entire database to check for corruption is no
reason to do a VACUUM FULL. Also, if you're curious about restore
time for your upgrade, you should be doing the restore to an 8.1.4
database, not to your current version. There's been a ton of
performance improvements made. In fact, I'm wondering if constraint
checking in restore has been improved...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Adding foreign key constraints without integrity

From

Wes

Date:

23 June 2006, 15:29:54

On 6/22/06 2:57 PM, "Jim Nasby" <jnasby@pervasive.com> wrote:

> If nothing else, you should bring it up on -hackers and ask to have
> this added as a TODO. It seems like a worth-while addition to pg_dump/
> restore to me...

Thanks for the suggestion.

> To answer another of your emails in this thread... a LAZY vacuum of
> the entire database will read every table and index in the database.
> Wanting to read the entire database to check for corruption is no
> reason to do a VACUUM FULL.

Sorry, I misspoke.  I do not do a VACUUM FULL.  I do a VACUUM VERBOSE
ANALYZE on a weekly basis.  The verbose allows me to see which
tables/indexes are getting bad (vacuum time wise).  This is currently taking
about 24 hours on a weekend (very little production load competing with it).
The time drops dramatically after a reindex, then creeps up again as the
indexes are updated in random order.

> Also, if you're curious about restore
> time for your upgrade, you should be doing the restore to an 8.1.4
> database, not to your current version. There's been a ton of
> performance improvements made. In fact, I'm wondering if constraint
> checking in restore has been improved...

Yes, that is what I did.  I'm in the process of testing an upgrade from
7.3.x to 8.1.4 - export from 7.3.x and import into 8.1.4.  Unfortunately,
I'm sitting at about 90 hours when I've got about an 80 hour window on a
long weekend...

Wes

Re: Adding foreign key constraints without integrity check?

From

Jan Wieck

Date:

24 June 2006, 17:20:28

On 6/18/2006 11:41 PM, Wes wrote:

> Is there a way to add a foreign key constraint without having to wait for it
> to check the consistency of all existing records?  If a database is being
> reloaded (pg_dumpall then load), it really shouldn't be necessary to check
> the referential integrity - or at least I should be able to stipulate that I
> am accepting that risk.
>
> My database reload is currently taking about 6 hours to load the data, 42
> hours to reindex, and about another 40 hours or so to check the foreign key
> constraints (about 1.2 billion rows).  That's a very long time to be out of
> commission.  I'd really like to eliminate that second 40 hours so I can get
> it down to a normal weekend.

The original implementation when the feature was new was to restore the
foreign key constraints at the end of the dump with CREATE CONSTRAINT
TRIGGER, which does not check the existing data. pg_dump was changed to
use ALTER TABLE instead. Maybe it would be good to have the old
behaviour as an option?


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Lock changes with 8.1 - what's the right lock?

From

Wes

Date:

16 July 2006, 19:46:29

8.1 improved locking for foreign key references but had an unexpected
consequence to our application - no parallel loads in our application.

The application does an EXCLUSIVE lock on 'addresses'.  It then gets all of
the keys from 'addresses' it needs, and adds new ones encountered in this
load.  It then completes the transaction, releases the exclusive lock, and
inserts the other table's records using the values read from/inserted into
'addresses'.

There are foreign key constraints between the various tables and 'addresses'
to insure referential integrity.

Previously (pgsql 7.4.5), multiple loads would run simultaneously - and
occasionally got 'deadlock detected' with the foreign key locks even though
they were referenced in sorted order.  When loading tables other than
'addresses', foreign key locks did not prevent other jobs from grabbing the
exclusive lock on 'addresses'.

With 8.1.4, the foreign key locks prevent other instances from grabbing the
lock, so they wait until the first job is complete - only one job loads at a
time.

About EXCLUSIVE locks, the manual says: "...only reads from the table can
proceed in parallel with a transaction holding this lock mode."

What is now the appropriate lock?  It needs to:

  1. Prevent others from updating the table
  2. Block other jobs that are requesting the same lock (if job 2 does a
SELECT and finds nothing, it will try to create the record that job 1 may
already have created in its transaction).
  3. Not conflict with foreign key reference locks

SHARE does not appear to be appropriate - it would fail #2.  Maybe "SHARE
UPDATE EXCLUSIVE"?

Wes

Re: Lock changes with 8.1 - what's the right lock?

From

Michael Fuhr

Date:

17 July 2006, 01:33:43

On Sun, Jul 16, 2006 at 05:46:16PM -0500, Wes wrote:
> Previously (pgsql 7.4.5), multiple loads would run simultaneously - and
> occasionally got 'deadlock detected' with the foreign key locks even though
> they were referenced in sorted order.  When loading tables other than
> 'addresses', foreign key locks did not prevent other jobs from grabbing the
> exclusive lock on 'addresses'.

Unless I'm misunderstanding you or a bug was fixed between 7.4.5
and 7.4.13 (the version I'm running), I'm not convinced that last
statement is true.  EXCLUSIVE conflicts with all lock types except
ACCESS SHARE; foreign key references prior to 8.1 use SELECT FOR
UPDATE and in 8.1 they use SELECT FOR SHARE, but in both cases they
acquire ROW SHARE on the referenced table, which conflicts with
EXCLUSIVE.

> With 8.1.4, the foreign key locks prevent other instances from grabbing the
> lock, so they wait until the first job is complete - only one job loads at a
> time.

Again, maybe I'm misunderstanding you, but the following example
behaves the same way in 8.1.4 and 7.4.13 (foo has a foreign key
reference to addresses):

T1: BEGIN;
T1: INSERT INTO foo (address_id) VALUES (1);
T2: BEGIN;
T2: LOCK TABLE addresses IN EXCLUSIVE MODE;
T2: (blocked until T1 completes)

Does this example differ from what you're doing or seeing?

> What is now the appropriate lock?  It needs to:
>
>   1. Prevent others from updating the table
>   2. Block other jobs that are requesting the same lock (if job 2 does a
> SELECT and finds nothing, it will try to create the record that job 1 may
> already have created in its transaction).
>   3. Not conflict with foreign key reference locks

SHARE ROW EXCLUSIVE is the weakest lock that meets these requirements.
It conflicts with itself (#2) and with ROW EXCLUSIVE, which UPDATE,
DELETE, and INSERT acquire (#1), but doesn't conflict with ROW SHARE,
which is what SELECT FOR UPDATE/SHARE acquire (#3).

--
Michael Fuhr