Thread: constraint with reference to the same table

constraint with reference to the same table

From

"Victor Yegorov"

Date:

14 May 2003, 19:09:36

Hello.

I'm using PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66.

Here is topic. Table transactions:

=> \d transactions
      Table "public.transactions"
   Column    |     Type     | Modifiers
-------------+--------------+-----------
 trxn_id     | integer      | not null
 trxn_ret    | integer      |
 trxn_for    | integer      |
 status      | numeric(2,0) | not null
 auth_status | numeric(2,0) | not null
Indexes: transactions_pkey primary key btree (trxn_id)
Foreign Key constraints: trxns_id FOREIGN KEY (trxn_id) REFERENCES connections(conn_id) ON UPDATE NO ACTION ON DELETE
NOACTION, 
                         trxns_ret FOREIGN KEY (trxn_ret) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON
DELETENO ACTION, 
                         trxns_for FOREIGN KEY (trxn_for) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON
DELETENO ACTION 

As you can see, trxns_ret and trxns_for constraints references to the same table they come from.

Maintenance of system includes the following step:
delete from transactions where transactions.trxn_id = uneeded_trxns.trxn_id;
transactions volume is about 10K-20K rows.
uneeded_trxns volume is about 3K-5K rows.


Problem: It takes to MUCH time. EXPLAIN says:
=> explain delete from transactions where transactions.trxn_id = balance_delete_data.conn_id;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Hash Join  (cost=86.47..966.66 rows=5238 width=14)
   Hash Cond: ("outer".trxn_id = "inner".conn_id)
   ->  Seq Scan on transactions  (cost=0.00..503.76 rows=24876 width=10)
   ->  Hash  (cost=73.38..73.38 rows=5238 width=4)
         ->  Seq Scan on balance_delete_data  (cost=0.00..73.38 rows=5238 width=4)
(5 rows)

I was waiting for about 30 minutes and then hit ^C.

After some time spent dropping indexes and constraints, I've found out, that problem was in
those 2 "cyclic" constraints. After drop, query passed in some seconds (that is suitable).

Question: why so?
Thanks in advance.

--

Victor Yegorov

Re: constraint with reference to the same table

From

Stephan Szabo

Date:

14 May 2003, 19:28:59

On Thu, 15 May 2003, Victor Yegorov wrote:

> I'm using PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66.
>
> Here is topic. Table transactions:
>
> => \d transactions
>       Table "public.transactions"
>    Column    |     Type     | Modifiers
> -------------+--------------+-----------
>  trxn_id     | integer      | not null
>  trxn_ret    | integer      |
>  trxn_for    | integer      |
>  status      | numeric(2,0) | not null
>  auth_status | numeric(2,0) | not null
> Indexes: transactions_pkey primary key btree (trxn_id)
> Foreign Key constraints: trxns_id FOREIGN KEY (trxn_id) REFERENCES connections(conn_id) ON UPDATE NO ACTION ON DELETE
NOACTION, 
>                          trxns_ret FOREIGN KEY (trxn_ret) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON
DELETENO ACTION, 
>                          trxns_for FOREIGN KEY (trxn_for) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON
DELETENO ACTION 
>
> As you can see, trxns_ret and trxns_for constraints references to the same table they come from.
>
> Maintenance of system includes the following step:
> delete from transactions where transactions.trxn_id = uneeded_trxns.trxn_id;
> transactions volume is about 10K-20K rows.
> uneeded_trxns volume is about 3K-5K rows.
>
>
> Problem: It takes to MUCH time. EXPLAIN says:
>
> I was waiting for about 30 minutes and then hit ^C.
>
> After some time spent dropping indexes and constraints, I've found out, that problem was in
> those 2 "cyclic" constraints. After drop, query passed in some seconds (that is suitable).
>
> Question: why so?

For each row dropped it's making sure that no row has either a trxn_ret or
trxn_for that pointed to that row.  If those columns aren't indexed it's
going to be amazingly slow (if they are indexed it'll probably only be
normally slow ;) ).

Re: constraint with reference to the same table

From

Rudi Starcevic

Date:

14 May 2003, 19:51:41

Hi,

Can I confirm what this means then ..

For large table's each column with ref. inegritry I should create an index on those columns ?

So if I create a table like this :
CREATE TABLE business_businesstype
(
b_bt_id serial PRIMARY KEY,
b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL,
bt_id integer REFERENCES businesstype ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
);

I should then create 2 index's

CREATE INDEX business_idx ON business_businesstype (business);
CREATE INDEX businesstype_idx ON business_businesstype (businesstype);

Thanks
Regards
Rudi.

Stephan Szabo wrote:

On Thu, 15 May 2003, Victor Yegorov wrote:

I'm using PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66.

Here is topic. Table transactions:

=> \d transactions     Table "public.transactions"  Column    |     Type     | Modifiers
-------------+--------------+-----------trxn_id     | integer      | not nulltrxn_ret    | integer      |trxn_for    | integer      |status      | numeric(2,0) | not nullauth_status | numeric(2,0) | not null
Indexes: transactions_pkey primary key btree (trxn_id)
Foreign Key constraints: trxns_id FOREIGN KEY (trxn_id) REFERENCES connections(conn_id) ON UPDATE NO ACTION ON DELETE NO ACTION,                        trxns_ret FOREIGN KEY (trxn_ret) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON DELETE NO ACTION,                        trxns_for FOREIGN KEY (trxn_for) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON DELETE NO ACTION

As you can see, trxns_ret and trxns_for constraints references to the same table they come from.

Maintenance of system includes the following step:
delete from transactions where transactions.trxn_id = uneeded_trxns.trxn_id;
transactions volume is about 10K-20K rows.
uneeded_trxns volume is about 3K-5K rows.


Problem: It takes to MUCH time. EXPLAIN says:

I was waiting for about 30 minutes and then hit ^C.

After some time spent dropping indexes and constraints, I've found out, that problem was in
those 2 "cyclic" constraints. After drop, query passed in some seconds (that is suitable).

Question: why so?

For each row dropped it's making sure that no row has either a trxn_ret or
trxn_for that pointed to that row.  If those columns aren't indexed it's
going to be amazingly slow (if they are indexed it'll probably only be
normally slow ;) ).



---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: constraint with reference to the same table

From

Rudi Starcevic

Date:

14 May 2003, 20:02:07

  Hi,

Oops - sorry I made a typo on those 2 index's.

Wrong:
CREATE  INDEX business_idx ON  business_businesstype (business);
CREATE  INDEX businesstype_idx ON  business_businesstype (businesstype);

Right:
CREATE  INDEX business_idx ON  business_businesstype (b_id);
CREATE  INDEX businesstype_idx ON  business_businesstype (bt_id);

The table:
CREATE TABLE business_businesstype
(
b_bt_id serial PRIMARY KEY,
b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT
NULL,
bt_id integer REFERENCES businesstype ON UPDATE CASCADE ON DELETE
CASCADE NOT NULL
);

Thanks
Regards
Rudi.

Re: constraint with reference to the same table

From

"Victor Yegorov"

Date:

14 May 2003, 20:10:11

* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 02:59]:
> Hi,
>
> Can I confirm what this means then ..
>
> For large table's each column with ref. inegritry I should create an
> index on those columns ?

I think, that indicies are needed only at delete stage to decrease search
time of possible referencing rows.
Not only, of course, but when we speak about
INSERT/UPDATE/DELETE data it is so.

On the other side, indicies increases total query runtime, because for
each row deleted/updated/inserted it'll be necessary to update each index.

In my case, I at first drop "cyclic" constraints, do the job and then
restore them.


--

Victor Yegorov

Re: constraint with reference to the same table

From

Stephan Szabo

Date:

14 May 2003, 20:46:51

On Thu, 15 May 2003, Rudi Starcevic wrote:

> Can I confirm what this means then ..
>
> For large table's each column with ref. inegritry I should create an
> index on those columns ?

In general, yes.  There's always an additional cost with having additional
indexes to modifications to the table, so you need to balance the costs by
what sorts of queries you're doing.  For example, if you're doing a
references constraint to a table that is mostly there for say providing a
nice name for something and those values aren't likely to change (and it's
okay if a change were expensive) then you wouldn't necessarily want the
additional index.

Re: constraint with reference to the same table

From

Rudi Starcevic

Date:

14 May 2003, 20:47:49

Victor,

I see.
Good point.

Thank you kindly.
Regards
Rudi.

Victor Yegorov wrote:

* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 02:59]:

Hi,

Can I confirm what this means then ..

For large table's each column with ref. inegritry I should create an 
index on those columns ?

I think, that indicies are needed only at delete stage to decrease search
time of possible referencing rows.
Not only, of course, but when we speak about
INSERT/UPDATE/DELETE data it is so.

On the other side, indicies increases total query runtime, because for
each row deleted/updated/inserted it'll be necessary to update each index.

In my case, I at first drop "cyclic" constraints, do the job and then
restore them.

Re: constraint with reference to the same table

From

Stephan Szabo

Date:

14 May 2003, 20:49:41

On Thu, 15 May 2003, Victor Yegorov wrote:

> * Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 02:59]:
> > Hi,
> >
> > Can I confirm what this means then ..
> >
> > For large table's each column with ref. inegritry I should create an
> > index on those columns ?
>
> I think, that indicies are needed only at delete stage to decrease search
> time of possible referencing rows.
> Not only, of course, but when we speak about
> INSERT/UPDATE/DELETE data it is so.
>
> On the other side, indicies increases total query runtime, because for
> each row deleted/updated/inserted it'll be necessary to update each index.
>
> In my case, I at first drop "cyclic" constraints, do the job and then
> restore them.

That can be a win, but if you're actually dropping and adding the
constraint again it may not be on large tables since it'll still do a
whole bunch of index lookups to check the existing rows when the alter
table add constraint happens.  Disabling triggers and re-enabling them is
faster but breaks the guarantee of the constraint.

Re: constraint with reference to the same table

From

Victor Yegorov

Date:

14 May 2003, 21:01:02

* Stephan Szabo <sszabo@megazone23.bigpanda.com> [15.05.2003 03:54]:
>
> That can be a win, but if you're actually dropping and adding the
> constraint again it may not be on large tables since it'll still do a
> whole bunch of index lookups to check the existing rows when the alter
> table add constraint happens.  Disabling triggers and re-enabling them is
> faster but breaks the guarantee of the constraint.

You're right. I thought of big tables after posting the reply. My solution
is suitable for my case, i.e. not so big tables.

Returning to the very first question I asked.
May be it is usefull to implicitly create index on foreign key columns?
Actually, untill you had pointed on seq. scans, I thought Postgres is
using internal indicies - don't ask me why.


--

Victor Yegorov

Re: constraint with reference to the same table

From

Rudi Starcevic

Date:

14 May 2003, 21:04:09

Stephan,

Thanks also - I'm actually building a new database as I write this so this topic is perfect timing for me.

I'm using ref. integrity right now mostly for many-to-many type situations.

For example.
I create a table of People,
then a table of Business's,
then I need to relate many people to many business's.

So I create a business_people table *with* index's to the referred to tables
Eg:
CREATE TABLE business_people
(
b_p_id serial PRIMARY KEY,
b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL,
p_id integer REFERENCES people ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
);
CREATE INDEX b_p_b_id_idx ON business_people (b_id);
CREATE INDEX b_p_p_id_idx ON business_people (p_id);

The b_id and p_id are primary key's in other table's so they have an index too.

So far I think I've done every thing right.
Can I ask if you'd agree or not ?

As a side note when I build my PG database's I do it 100% by hand in text.
That is I write Create table statements, save them to file then cut'n'paste them into phpPgAdmin or use PSQL.
So the code I have below is the same code I use build the DB.
I wonder if this is OK or would make other PG user's gasp.
I'm sure most database people out there, not sure about PG people, would use some sort of GUI.

Thanks kindly
I appreciate your time guy's.
Regards
Rudi.

Stephan Szabo wrote:

On Thu, 15 May 2003, Rudi Starcevic wrote:

Can I confirm what this means then ..

For large table's each column with ref. inegritry I should create an
index on those columns ?

In general, yes.  There's always an additional cost with having additional
indexes to modifications to the table, so you need to balance the costs by
what sorts of queries you're doing.  For example, if you're doing a
references constraint to a table that is mostly there for say providing a
nice name for something and those values aren't likely to change (and it's
okay if a change were expensive) then you wouldn't necessarily want the
additional index.



---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Re: constraint with reference to the same table

From

Stephan Szabo

Date:

14 May 2003, 21:23:30

On Thu, 15 May 2003, Rudi Starcevic wrote:

> I'm using ref. integrity right now mostly for many-to-many type situations.
>
> For example.
> I create a table of People,
> then a table of Business's,
> then I need to relate many people to many business's.
>
> So I create a business_people table *with* index's to the referred to tables
> Eg:
> CREATE TABLE business_people
> (
> b_p_id serial PRIMARY KEY,
> b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT
> NULL,
> p_id integer REFERENCES people   ON UPDATE CASCADE ON DELETE CASCADE NOT
> NULL
> );
> CREATE  INDEX b_p_b_id_idx ON  business_people (b_id);
> CREATE  INDEX b_p_p_id_idx ON  business_people (p_id);
>
> The b_id and p_id are primary key's in other table's so they have an
> index too.
>
> So far I think I've done every thing right.
> Can I ask if you'd agree or not ?

Generally, yes, I'd agree with something like that, although I might not
have given a separate serial and instead made the primary key the two id
integers (since I'm not sure having the same reference twice makes sense
and I'm not sure that you'll need to reference the relationship itself
separately).  If you weren't likely to be doing your own lookups on b_id
and p_id I'd have to consider the indexes more carefully, since I'd expect
that inserts/updates to business_people are much much more likely than
deletes or key updates to business or people.

> As a side note when I build my PG database's I do it 100% by hand in text.
> That is I write Create table statements, save them to file then
> cut'n'paste them into phpPgAdmin or use PSQL.
> So the code I have below is the same code I use build the DB.
> I wonder if this is OK or would make other PG user's gasp.
> I'm sure most database people out there, not sure about PG people, would
> use some sort of GUI.

I generally do something like the above, or make the tables, get them to
what I want and schema dump them.

Re: constraint with reference to the same table

From

Stephan Szabo

Date:

14 May 2003, 21:24:32

On Thu, 15 May 2003, Victor Yegorov wrote:

> * Stephan Szabo <sszabo@megazone23.bigpanda.com> [15.05.2003 03:54]:
> >
> > That can be a win, but if you're actually dropping and adding the
> > constraint again it may not be on large tables since it'll still do a
> > whole bunch of index lookups to check the existing rows when the alter
> > table add constraint happens.  Disabling triggers and re-enabling them is
> > faster but breaks the guarantee of the constraint.
>
> You're right. I thought of big tables after posting the reply. My solution
> is suitable for my case, i.e. not so big tables.

This may become slightly a higher point of balance if we change the alter
table time check to a single query rather than repeated checks as well.

> Returning to the very first question I asked.
> May be it is usefull to implicitly create index on foreign key columns?

Maybe, it seems to me that we've been trying to move away from such
implicit behavior (such as serial columns no longer implicitly being
unique) in general.  I don't personally have a strong feeling on the
subject.

Re: constraint with reference to the same table

From

Rudi Starcevic

Date:

14 May 2003, 21:43:20

Stephen,


>> although I might not
>> have given a separate serial and instead made the primary key the two id
>> integers (since I'm not sure having the same reference twice makes sense
>> and I'm not sure that you'll need to reference the relationship itself
>> separately).

Yes I see.
That's a very good point.
If I make the primary key across both the business and person instead of using
a new primary key/serial then that will prevent the same business to person
relationship being entered twice.

If I did it that way would this be OK:

New:
CREATE TABLE business_person
(
b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL,
pn_id integer REFERENCES person ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
PRIMARY KEY(b_id,pn_id);
);
CREATE INDEX b_pn_b_id_idx  ON business_person (b_id);
CREATE INDEX b_pn_pn_id_idx ON business_person (pn_id);


Old:
CREATE TABLE business_person
(
b_pn_id serial PRIMARY KEY,
b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL,
pn_id integer REFERENCES person ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
);
CREATE INDEX b_pn_b_id_idx  ON business_person (b_id);
CREATE INDEX b_pn_pn_id_idx ON business_person (pn_id);

As I'd like to sometime's look up business's, sometime's look up people and sometimes
look up both I think I should keep the Index's.

Cheers
Rudi.

Re: constraint with reference to the same table

From

"Victor Yegorov"

Date:

14 May 2003, 21:55:32

* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 04:46]:
> Stephen,
>
>
> New:
> CREATE TABLE business_person
> (
> b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT
> NULL,
> pn_id integer REFERENCES person ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
> PRIMARY KEY(b_id,pn_id);
> );
> CREATE INDEX b_pn_b_id_idx  ON business_person (b_id);
> CREATE INDEX b_pn_pn_id_idx ON business_person (pn_id);

May be it's better to name indexes a bit more clearer? No impact on overall
performance, but you'll ease your life, if you project will grow to hundreds
of tables and thousands of indicies.

> As I'd like to sometime's look up business's, sometime's look up people and
> sometimes
> look up both I think I should keep the Index's.

If your lookups are part of business logic, than it's ok. Also, if your
system generates reports using several table joins that may speed up the
things.

Otherwise, for curiosity cases, it's better to wait some time for the result
of one-time queries.

--

Victor Yegorov

Re: constraint with reference to the same table

From

Rudi Starcevic

Date:

14 May 2003, 22:02:40

Victor,

>> May be it's better to name indexes a bit more clearer? No impact on overall
>> performance, but you'll ease your life, if you project will grow to hundreds
>> of tables and thousands of indicies.

Very true.
Instead of: b_pn_b_id_idx,
I think better would be: busines_person_b_id_idx

Thanks
Rudi.

Victor Yegorov wrote:

* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 04:46]:

Stephen,


New:
CREATE TABLE business_person
(
b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT 
NULL,
pn_id integer REFERENCES person ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
PRIMARY KEY(b_id,pn_id);
);
CREATE INDEX b_pn_b_id_idx  ON business_person (b_id);
CREATE INDEX b_pn_pn_id_idx ON business_person (pn_id);

May be it's better to name indexes a bit more clearer? No impact on overall
performance, but you'll ease your life, if you project will grow to hundreds
of tables and thousands of indicies.

As I'd like to sometime's look up business's, sometime's look up people and 
sometimes
look up both I think I should keep the Index's.

If your lookups are part of business logic, than it's ok. Also, if your
system generates reports using several table joins that may speed up the
things.

Otherwise, for curiosity cases, it's better to wait some time for the result
of one-time queries.

Re: constraint with reference to the same table

From

Rudi Starcevic

Date:

14 May 2003, 22:10:04

Perhaps I also need a 3rd Index ?

One for Business's
One for People and
One for Business_People.

I think I may need the 3rd Index for query's like

Select b_id
 From business_people
where b_id = 1 and pn_id = 2;

I think this way I have an Index for 3 type's of queries.

When I looking for data on just the business,
when I'm looking for data on just people and
when I'm looking for data on business people relationships.

Cheers
Rudi.

Re: constraint with reference to the same table

From

Victor Yegorov

Date:

14 May 2003, 22:18:56

* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 05:15]:
> Perhaps I also need a 3rd Index ?
>
> One for Business's
> One for People and
> One for Business_People.
>
> I think I may need the 3rd Index for query's like

You don't need it. Primary key on that 2 columns will create a unique index
on them. Of course, if you left things unchanged - you'll need to create
business_people index yourself.

execute:

=> \d business_people

and take a glance on a line, describing primary key.

--

Victor Yegorov

Re: constraint with reference to the same table

From

Rudi Starcevic

Date:

14 May 2003, 22:24:30

Victor,

>> You don't need it. Primary key on that 2 columns will create a unique index
>> on them. Of course, if you left things unchanged - you'll need to create
>> business_people index yourself.

Ahh of course ..

"I see said the blind man !" ..

Thanks heaps.
I think now it's pretty clear to me.
I feel I have pretty much optimised my code / sql schema.

Thank you both,
it's a tremendous help - one learns something every day with this list.

Kind regards
Rudi.

Victor Yegorov wrote:

* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 05:15]:

Perhaps I also need a 3rd Index ?

One for Business's
One for People and
One for Business_People.

I think I may need the 3rd Index for query's like

You don't need it. Primary key on that 2 columns will create a unique index
on them. Of course, if you left things unchanged - you'll need to create
business_people index yourself.

execute:

=> \d business_people

and take a glance on a line, describing primary key.