Thread: How are foreign key constraints built?

How are foreign key constraints built?

From
Wes
Date:
How are foreign key constraints built?  In loading my database into
PostgreSQL 8.0, on the command:

ALTER TABLE ONLY TABLEA
    ADD CONSTRAINT "$1" FOREIGN KEY (mkey) REFERENCES tableb(mkey) ON DELETE
CASCADE;

I ended up with the following in pg_tmp as it is adding the constraint:

-rw-------    1 postgres  admin  1073741824 23 Jan 06:09 pgsql_tmp2559.25
-rw-------    1 postgres  admin  1073741824 23 Jan 06:24 pgsql_tmp2559.26
-rw-------    1 postgres  admin  1073741824 23 Jan 06:39 pgsql_tmp2559.27
-rw-------    1 postgres  admin   636526592 23 Jan 06:48 pgsql_tmp2559.28
-rw-------    1 postgres  admin  1073741824 23 Jan 11:51 pgsql_tmp2559.29
-rw-------    1 postgres  admin  1073741824 23 Jan 11:34 pgsql_tmp2559.30
-rw-------    1 postgres  admin  1073741824 23 Jan 11:36 pgsql_tmp2559.31
-rw-------    1 postgres  admin  1073741824 23 Jan 11:37 pgsql_tmp2559.32
-rw-------    1 postgres  admin  1073741824 23 Jan 11:38 pgsql_tmp2559.33
-rw-------    1 postgres  admin  1073741824 23 Jan 11:39 pgsql_tmp2559.34
-rw-------    1 postgres  admin  1073741824 23 Jan 11:41 pgsql_tmp2559.35
-rw-------    1 postgres  admin  1073741824 23 Jan 11:42 pgsql_tmp2559.36
-rw-------    1 postgres  admin  1073741824 23 Jan 11:43 pgsql_tmp2559.37
-rw-------    1 postgres  admin  1073741824 23 Jan 11:45 pgsql_tmp2559.38
-rw-------    1 postgres  admin  1073741824 23 Jan 11:46 pgsql_tmp2559.39
-rw-------    1 postgres  admin  1073741824 23 Jan 11:47 pgsql_tmp2559.40
-rw-------    1 postgres  admin  1073741824 23 Jan 11:49 pgsql_tmp2559.41
-rw-------    1 postgres  admin  1073741824 23 Jan 11:50 pgsql_tmp2559.42
-rw-------    1 postgres  admin   603136000 23 Jan 11:51 pgsql_tmp2559.43

I believe files 25-28 are about the size of one of the indexes, but 29-43 is
about 35% larger than the tmp files when building the other index (the other
index is about 10GB).

There's no problem here, I'd just like to understand what it is doing.  I
expected adding the foreign key constraint would just use the existing
indexes to verify the database is currently consistent.  Is this just
working space to more efficiently build the initial constraint, or does it
actually write this to the database?

Wes



Re: How are foreign key constraints built?

From
Tom Lane
Date:
Wes <wespvp@syntegra.com> writes:
> There's no problem here, I'd just like to understand what it is doing.

Either a hash or merge join between the two tables, to verify that all
the keys in the referencing table exist in the referenced table.  The
intermediate data is evidently spilling to disk.

> I expected adding the foreign key constraint would just use the existing
> indexes to verify the database is currently consistent.

People have this weird notion that an index-based plan is always faster
than anything else.  If you like you can try the operation with "set
enable_seqscan = off", but I bet it will take longer.

            regards, tom lane

Re: How are foreign key constraints built?

From
"Jim C. Nasby"
Date:
On Sun, Jan 23, 2005 at 02:01:41PM -0500, Tom Lane wrote:
> Wes <wespvp@syntegra.com> writes:
> > There's no problem here, I'd just like to understand what it is doing.
>
> Either a hash or merge join between the two tables, to verify that all
> the keys in the referencing table exist in the referenced table.  The
> intermediate data is evidently spilling to disk.

If this were done with a nested loop, you wouldn't need any intermediate
data, right? You can just throw the result-set away. ISTM that in the
case of just verifying existance of data, it might be faster to do a
nested loop that doesn't have to spill anywhere, instead of a hash or
merge that needs to generate a bunch of intermediate data. Is the
optimizer able to take this into account?

> > I expected adding the foreign key constraint would just use the existing
> > indexes to verify the database is currently consistent.
>
> People have this weird notion that an index-based plan is always faster
> than anything else.  If you like you can try the operation with "set
> enable_seqscan = off", but I bet it will take longer.

Well, every other database I've used can do index covering, which means
index scans *are* faster.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: How are foreign key constraints built?

From
Alvaro Herrera
Date:
On Sun, Jan 23, 2005 at 03:19:10PM -0600, Jim C. Nasby wrote:

> > People have this weird notion that an index-based plan is always faster
> > than anything else.  If you like you can try the operation with "set
> > enable_seqscan = off", but I bet it will take longer.
>
> Well, every other database I've used can do index covering, which means
> index scans *are* faster.

... on those database systems.  Indexes are different in Postgres in
general: they don't have visibility info (other systems don't need it,
tuples are always visible), and in some databases you have clustered
indexes, where the index is also the heap.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"We are who we choose to be", sang the goldfinch
when the sun is high (Sandman)

Re: How are foreign key constraints built?

From
"Jim C. Nasby"
Date:
On Sun, Jan 23, 2005 at 06:45:36PM -0300, Alvaro Herrera wrote:
> On Sun, Jan 23, 2005 at 03:19:10PM -0600, Jim C. Nasby wrote:
>
> > > People have this weird notion that an index-based plan is always faster
> > > than anything else.  If you like you can try the operation with "set
> > > enable_seqscan = off", but I bet it will take longer.
> >
> > Well, every other database I've used can do index covering, which means
> > index scans *are* faster.
>
> ... on those database systems.  Indexes are different in Postgres in
> general: they don't have visibility info (other systems don't need it,
> tuples are always visible), and in some databases you have clustered
> indexes, where the index is also the heap.

Yes, I understand. I was just pointing out that in other databases, an
index scan of even the entire table can be faster, hence the mentality
that index scans are always better.

I really hope that the current discussion on hackers about tuple
visibility in indexes leads somewhere; I think that would be a huge gain
for PostgreSQL.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: How are foreign key constraints built?

From
Greg Stark
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:

> Well, every other database I've used can do index covering, which means
> index scans *are* faster.

Still not necessarily true. In a case like this it would still be random
access which would be slower than sequential access.

Though Oracle is capable of taking the best of both worlds and doing a hash
join but taking the data from sequentially reading the index instead of the
table.

--
greg

Re: How are foreign key constraints built?

From
"Jim C. Nasby"
Date:
On Mon, Jan 24, 2005 at 09:24:49AM -0500, Greg Stark wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
>
> > Well, every other database I've used can do index covering, which means
> > index scans *are* faster.
>
> Still not necessarily true. In a case like this it would still be random
> access which would be slower than sequential access.
Actually, even with random access, a covering index can still be faster.
Imagine a single-field index on a table with 40 fields.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: How are foreign key constraints built?

From
Wes
Date:
On 1/23/05 1:01 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> If you like you can try the operation with "set
> enable_seqscan = off", but I bet it will take longer.

Ouch!  That was a big mistake.  Since inquiring minds want to know, I
decided to give that a try.  The expected outcome is to beat the heck out of
the index disks as it read one index and referenced the other to see if the
value existed.  What appears to have happened is that it went through the
same process as before, but read each data record via the index.  It still
created all the pgsql_tmp files, the data disk was still the heaviest hit
(expected no or little access there), and it beat the living daylights out
of my swap - pageins/outs like crazy.  The I/O on the index disks was
negligible compared to the data and swap disks.  I won't try that again...

Wes



Re: How are foreign key constraints built?

From
"Jim C. Nasby"
Date:
On Tue, Jan 25, 2005 at 09:38:20AM -0600, Wes wrote:
> On 1/23/05 1:01 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> > If you like you can try the operation with "set
> > enable_seqscan = off", but I bet it will take longer.
>
> Ouch!  That was a big mistake.  Since inquiring minds want to know, I
> decided to give that a try.  The expected outcome is to beat the heck out of
> the index disks as it read one index and referenced the other to see if the
> value existed.  What appears to have happened is that it went through the
> same process as before, but read each data record via the index.  It still
> created all the pgsql_tmp files, the data disk was still the heaviest hit
> (expected no or little access there), and it beat the living daylights out
> of my swap - pageins/outs like crazy.  The I/O on the index disks was
> negligible compared to the data and swap disks.  I won't try that again...

If the OS is swapping you've got serious issues; you need to look at
your configurating settings that deal with memory and figure out why
you're running out.

And yes, PostgreSQL can't do 'index covering', so even when it accesses
a table via an index it still has to read the base table. This is why if
you need to read the entire table it's faster to seqscan than index
scan.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"