Thread: constraint with reference to the same table
Hello. I'm using PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66. Here is topic. Table transactions: => \d transactions Table "public.transactions" Column | Type | Modifiers -------------+--------------+----------- trxn_id | integer | not null trxn_ret | integer | trxn_for | integer | status | numeric(2,0) | not null auth_status | numeric(2,0) | not null Indexes: transactions_pkey primary key btree (trxn_id) Foreign Key constraints: trxns_id FOREIGN KEY (trxn_id) REFERENCES connections(conn_id) ON UPDATE NO ACTION ON DELETE NOACTION, trxns_ret FOREIGN KEY (trxn_ret) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON DELETENO ACTION, trxns_for FOREIGN KEY (trxn_for) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON DELETENO ACTION As you can see, trxns_ret and trxns_for constraints references to the same table they come from. Maintenance of system includes the following step: delete from transactions where transactions.trxn_id = uneeded_trxns.trxn_id; transactions volume is about 10K-20K rows. uneeded_trxns volume is about 3K-5K rows. Problem: It takes to MUCH time. EXPLAIN says: => explain delete from transactions where transactions.trxn_id = balance_delete_data.conn_id; QUERY PLAN ----------------------------------------------------------------------------------- Hash Join (cost=86.47..966.66 rows=5238 width=14) Hash Cond: ("outer".trxn_id = "inner".conn_id) -> Seq Scan on transactions (cost=0.00..503.76 rows=24876 width=10) -> Hash (cost=73.38..73.38 rows=5238 width=4) -> Seq Scan on balance_delete_data (cost=0.00..73.38 rows=5238 width=4) (5 rows) I was waiting for about 30 minutes and then hit ^C. After some time spent dropping indexes and constraints, I've found out, that problem was in those 2 "cyclic" constraints. After drop, query passed in some seconds (that is suitable). Question: why so? Thanks in advance. -- Victor Yegorov
On Thu, 15 May 2003, Victor Yegorov wrote: > I'm using PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66. > > Here is topic. Table transactions: > > => \d transactions > Table "public.transactions" > Column | Type | Modifiers > -------------+--------------+----------- > trxn_id | integer | not null > trxn_ret | integer | > trxn_for | integer | > status | numeric(2,0) | not null > auth_status | numeric(2,0) | not null > Indexes: transactions_pkey primary key btree (trxn_id) > Foreign Key constraints: trxns_id FOREIGN KEY (trxn_id) REFERENCES connections(conn_id) ON UPDATE NO ACTION ON DELETE NOACTION, > trxns_ret FOREIGN KEY (trxn_ret) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON DELETENO ACTION, > trxns_for FOREIGN KEY (trxn_for) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON DELETENO ACTION > > As you can see, trxns_ret and trxns_for constraints references to the same table they come from. > > Maintenance of system includes the following step: > delete from transactions where transactions.trxn_id = uneeded_trxns.trxn_id; > transactions volume is about 10K-20K rows. > uneeded_trxns volume is about 3K-5K rows. > > > Problem: It takes to MUCH time. EXPLAIN says: > > I was waiting for about 30 minutes and then hit ^C. > > After some time spent dropping indexes and constraints, I've found out, that problem was in > those 2 "cyclic" constraints. After drop, query passed in some seconds (that is suitable). > > Question: why so? For each row dropped it's making sure that no row has either a trxn_ret or trxn_for that pointed to that row. If those columns aren't indexed it's going to be amazingly slow (if they are indexed it'll probably only be normally slow ;) ).
Hi,
Can I confirm what this means then ..
For large table's each column with ref. inegritry I should create an index on those columns ?
So if I create a table like this :
CREATE TABLE business_businesstype
(
b_bt_id serial PRIMARY KEY,
b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL,
bt_id integer REFERENCES businesstype ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
);
I should then create 2 index's
CREATE INDEX business_idx ON business_businesstype (business);
CREATE INDEX businesstype_idx ON business_businesstype (businesstype);
Thanks
Regards
Rudi.
Stephan Szabo wrote:
Can I confirm what this means then ..
For large table's each column with ref. inegritry I should create an index on those columns ?
So if I create a table like this :
CREATE TABLE business_businesstype
(
b_bt_id serial PRIMARY KEY,
b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL,
bt_id integer REFERENCES businesstype ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
);
I should then create 2 index's
CREATE INDEX business_idx ON business_businesstype (business);
CREATE INDEX businesstype_idx ON business_businesstype (businesstype);
Thanks
Regards
Rudi.
Stephan Szabo wrote:
On Thu, 15 May 2003, Victor Yegorov wrote:I'm using PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66. Here is topic. Table transactions: => \d transactions Table "public.transactions" Column | Type | Modifiers -------------+--------------+-----------trxn_id | integer | not nulltrxn_ret | integer |trxn_for | integer |status | numeric(2,0) | not nullauth_status | numeric(2,0) | not null Indexes: transactions_pkey primary key btree (trxn_id) Foreign Key constraints: trxns_id FOREIGN KEY (trxn_id) REFERENCES connections(conn_id) ON UPDATE NO ACTION ON DELETE NO ACTION, trxns_ret FOREIGN KEY (trxn_ret) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON DELETE NO ACTION, trxns_for FOREIGN KEY (trxn_for) REFERENCES transactions(trxn_id) ON UPDATE NO ACTION ON DELETE NO ACTION As you can see, trxns_ret and trxns_for constraints references to the same table they come from. Maintenance of system includes the following step: delete from transactions where transactions.trxn_id = uneeded_trxns.trxn_id; transactions volume is about 10K-20K rows. uneeded_trxns volume is about 3K-5K rows. Problem: It takes to MUCH time. EXPLAIN says: I was waiting for about 30 minutes and then hit ^C. After some time spent dropping indexes and constraints, I've found out, that problem was in those 2 "cyclic" constraints. After drop, query passed in some seconds (that is suitable). Question: why so?For each row dropped it's making sure that no row has either a trxn_ret or trxn_for that pointed to that row. If those columns aren't indexed it's going to be amazingly slow (if they are indexed it'll probably only be normally slow ;) ). ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Hi, Oops - sorry I made a typo on those 2 index's. Wrong: CREATE INDEX business_idx ON business_businesstype (business); CREATE INDEX businesstype_idx ON business_businesstype (businesstype); Right: CREATE INDEX business_idx ON business_businesstype (b_id); CREATE INDEX businesstype_idx ON business_businesstype (bt_id); The table: CREATE TABLE business_businesstype ( b_bt_id serial PRIMARY KEY, b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL, bt_id integer REFERENCES businesstype ON UPDATE CASCADE ON DELETE CASCADE NOT NULL ); Thanks Regards Rudi.
* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 02:59]: > Hi, > > Can I confirm what this means then .. > > For large table's each column with ref. inegritry I should create an > index on those columns ? I think, that indicies are needed only at delete stage to decrease search time of possible referencing rows. Not only, of course, but when we speak about INSERT/UPDATE/DELETE data it is so. On the other side, indicies increases total query runtime, because for each row deleted/updated/inserted it'll be necessary to update each index. In my case, I at first drop "cyclic" constraints, do the job and then restore them. -- Victor Yegorov
On Thu, 15 May 2003, Rudi Starcevic wrote: > Can I confirm what this means then .. > > For large table's each column with ref. inegritry I should create an > index on those columns ? In general, yes. There's always an additional cost with having additional indexes to modifications to the table, so you need to balance the costs by what sorts of queries you're doing. For example, if you're doing a references constraint to a table that is mostly there for say providing a nice name for something and those values aren't likely to change (and it's okay if a change were expensive) then you wouldn't necessarily want the additional index.
Victor,
I see.
Good point.
Thank you kindly.
Regards
Rudi.
Victor Yegorov wrote:
I see.
Good point.
Thank you kindly.
Regards
Rudi.
Victor Yegorov wrote:
* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 02:59]:Hi, Can I confirm what this means then .. For large table's each column with ref. inegritry I should create an index on those columns ?I think, that indicies are needed only at delete stage to decrease search time of possible referencing rows. Not only, of course, but when we speak about INSERT/UPDATE/DELETE data it is so. On the other side, indicies increases total query runtime, because for each row deleted/updated/inserted it'll be necessary to update each index. In my case, I at first drop "cyclic" constraints, do the job and then restore them.
On Thu, 15 May 2003, Victor Yegorov wrote: > * Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 02:59]: > > Hi, > > > > Can I confirm what this means then .. > > > > For large table's each column with ref. inegritry I should create an > > index on those columns ? > > I think, that indicies are needed only at delete stage to decrease search > time of possible referencing rows. > Not only, of course, but when we speak about > INSERT/UPDATE/DELETE data it is so. > > On the other side, indicies increases total query runtime, because for > each row deleted/updated/inserted it'll be necessary to update each index. > > In my case, I at first drop "cyclic" constraints, do the job and then > restore them. That can be a win, but if you're actually dropping and adding the constraint again it may not be on large tables since it'll still do a whole bunch of index lookups to check the existing rows when the alter table add constraint happens. Disabling triggers and re-enabling them is faster but breaks the guarantee of the constraint.
* Stephan Szabo <sszabo@megazone23.bigpanda.com> [15.05.2003 03:54]: > > That can be a win, but if you're actually dropping and adding the > constraint again it may not be on large tables since it'll still do a > whole bunch of index lookups to check the existing rows when the alter > table add constraint happens. Disabling triggers and re-enabling them is > faster but breaks the guarantee of the constraint. You're right. I thought of big tables after posting the reply. My solution is suitable for my case, i.e. not so big tables. Returning to the very first question I asked. May be it is usefull to implicitly create index on foreign key columns? Actually, untill you had pointed on seq. scans, I thought Postgres is using internal indicies - don't ask me why. -- Victor Yegorov
Stephan,
Thanks also - I'm actually building a new database as I write this so this topic is perfect timing for me.
I'm using ref. integrity right now mostly for many-to-many type situations.
For example.
I create a table of People,
then a table of Business's,
then I need to relate many people to many business's.
So I create a business_people table *with* index's to the referred to tables
Eg:
CREATE TABLE business_people
(
b_p_id serial PRIMARY KEY,
b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL,
p_id integer REFERENCES people ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
);
CREATE INDEX b_p_b_id_idx ON business_people (b_id);
CREATE INDEX b_p_p_id_idx ON business_people (p_id);
The b_id and p_id are primary key's in other table's so they have an index too.
So far I think I've done every thing right.
Can I ask if you'd agree or not ?
As a side note when I build my PG database's I do it 100% by hand in text.
That is I write Create table statements, save them to file then cut'n'paste them into phpPgAdmin or use PSQL.
So the code I have below is the same code I use build the DB.
I wonder if this is OK or would make other PG user's gasp.
I'm sure most database people out there, not sure about PG people, would use some sort of GUI.
Thanks kindly
I appreciate your time guy's.
Regards
Rudi.
Stephan Szabo wrote:
Thanks also - I'm actually building a new database as I write this so this topic is perfect timing for me.
I'm using ref. integrity right now mostly for many-to-many type situations.
For example.
I create a table of People,
then a table of Business's,
then I need to relate many people to many business's.
So I create a business_people table *with* index's to the referred to tables
Eg:
CREATE TABLE business_people
(
b_p_id serial PRIMARY KEY,
b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL,
p_id integer REFERENCES people ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
);
CREATE INDEX b_p_b_id_idx ON business_people (b_id);
CREATE INDEX b_p_p_id_idx ON business_people (p_id);
The b_id and p_id are primary key's in other table's so they have an index too.
So far I think I've done every thing right.
Can I ask if you'd agree or not ?
As a side note when I build my PG database's I do it 100% by hand in text.
That is I write Create table statements, save them to file then cut'n'paste them into phpPgAdmin or use PSQL.
So the code I have below is the same code I use build the DB.
I wonder if this is OK or would make other PG user's gasp.
I'm sure most database people out there, not sure about PG people, would use some sort of GUI.
Thanks kindly
I appreciate your time guy's.
Regards
Rudi.
Stephan Szabo wrote:
On Thu, 15 May 2003, Rudi Starcevic wrote:Can I confirm what this means then .. For large table's each column with ref. inegritry I should create an index on those columns ?In general, yes. There's always an additional cost with having additional indexes to modifications to the table, so you need to balance the costs by what sorts of queries you're doing. For example, if you're doing a references constraint to a table that is mostly there for say providing a nice name for something and those values aren't likely to change (and it's okay if a change were expensive) then you wouldn't necessarily want the additional index. ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
On Thu, 15 May 2003, Rudi Starcevic wrote: > I'm using ref. integrity right now mostly for many-to-many type situations. > > For example. > I create a table of People, > then a table of Business's, > then I need to relate many people to many business's. > > So I create a business_people table *with* index's to the referred to tables > Eg: > CREATE TABLE business_people > ( > b_p_id serial PRIMARY KEY, > b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT > NULL, > p_id integer REFERENCES people ON UPDATE CASCADE ON DELETE CASCADE NOT > NULL > ); > CREATE INDEX b_p_b_id_idx ON business_people (b_id); > CREATE INDEX b_p_p_id_idx ON business_people (p_id); > > The b_id and p_id are primary key's in other table's so they have an > index too. > > So far I think I've done every thing right. > Can I ask if you'd agree or not ? Generally, yes, I'd agree with something like that, although I might not have given a separate serial and instead made the primary key the two id integers (since I'm not sure having the same reference twice makes sense and I'm not sure that you'll need to reference the relationship itself separately). If you weren't likely to be doing your own lookups on b_id and p_id I'd have to consider the indexes more carefully, since I'd expect that inserts/updates to business_people are much much more likely than deletes or key updates to business or people. > As a side note when I build my PG database's I do it 100% by hand in text. > That is I write Create table statements, save them to file then > cut'n'paste them into phpPgAdmin or use PSQL. > So the code I have below is the same code I use build the DB. > I wonder if this is OK or would make other PG user's gasp. > I'm sure most database people out there, not sure about PG people, would > use some sort of GUI. I generally do something like the above, or make the tables, get them to what I want and schema dump them.
On Thu, 15 May 2003, Victor Yegorov wrote: > * Stephan Szabo <sszabo@megazone23.bigpanda.com> [15.05.2003 03:54]: > > > > That can be a win, but if you're actually dropping and adding the > > constraint again it may not be on large tables since it'll still do a > > whole bunch of index lookups to check the existing rows when the alter > > table add constraint happens. Disabling triggers and re-enabling them is > > faster but breaks the guarantee of the constraint. > > You're right. I thought of big tables after posting the reply. My solution > is suitable for my case, i.e. not so big tables. This may become slightly a higher point of balance if we change the alter table time check to a single query rather than repeated checks as well. > Returning to the very first question I asked. > May be it is usefull to implicitly create index on foreign key columns? Maybe, it seems to me that we've been trying to move away from such implicit behavior (such as serial columns no longer implicitly being unique) in general. I don't personally have a strong feeling on the subject.
Stephen, >> although I might not >> have given a separate serial and instead made the primary key the two id >> integers (since I'm not sure having the same reference twice makes sense >> and I'm not sure that you'll need to reference the relationship itself >> separately). Yes I see. That's a very good point. If I make the primary key across both the business and person instead of using a new primary key/serial then that will prevent the same business to person relationship being entered twice. If I did it that way would this be OK: New: CREATE TABLE business_person ( b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL, pn_id integer REFERENCES person ON UPDATE CASCADE ON DELETE CASCADE NOT NULL PRIMARY KEY(b_id,pn_id); ); CREATE INDEX b_pn_b_id_idx ON business_person (b_id); CREATE INDEX b_pn_pn_id_idx ON business_person (pn_id); Old: CREATE TABLE business_person ( b_pn_id serial PRIMARY KEY, b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL, pn_id integer REFERENCES person ON UPDATE CASCADE ON DELETE CASCADE NOT NULL ); CREATE INDEX b_pn_b_id_idx ON business_person (b_id); CREATE INDEX b_pn_pn_id_idx ON business_person (pn_id); As I'd like to sometime's look up business's, sometime's look up people and sometimes look up both I think I should keep the Index's. Cheers Rudi.
* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 04:46]: > Stephen, > > > New: > CREATE TABLE business_person > ( > b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT > NULL, > pn_id integer REFERENCES person ON UPDATE CASCADE ON DELETE CASCADE NOT NULL > PRIMARY KEY(b_id,pn_id); > ); > CREATE INDEX b_pn_b_id_idx ON business_person (b_id); > CREATE INDEX b_pn_pn_id_idx ON business_person (pn_id); May be it's better to name indexes a bit more clearer? No impact on overall performance, but you'll ease your life, if you project will grow to hundreds of tables and thousands of indicies. > As I'd like to sometime's look up business's, sometime's look up people and > sometimes > look up both I think I should keep the Index's. If your lookups are part of business logic, than it's ok. Also, if your system generates reports using several table joins that may speed up the things. Otherwise, for curiosity cases, it's better to wait some time for the result of one-time queries. -- Victor Yegorov
Victor, >> May be it's better to name indexes a bit more clearer? No impact on overall >> performance, but you'll ease your life, if you project will grow to hundreds >> of tables and thousands of indicies. Very true. Instead of: b_pn_b_id_idx, I think better would be: busines_person_b_id_idx Thanks Rudi.
Victor Yegorov wrote:
* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 04:46]:Stephen, New: CREATE TABLE business_person ( b_id integer REFERENCES business ON UPDATE CASCADE ON DELETE CASCADE NOT NULL, pn_id integer REFERENCES person ON UPDATE CASCADE ON DELETE CASCADE NOT NULL PRIMARY KEY(b_id,pn_id); ); CREATE INDEX b_pn_b_id_idx ON business_person (b_id); CREATE INDEX b_pn_pn_id_idx ON business_person (pn_id);May be it's better to name indexes a bit more clearer? No impact on overall performance, but you'll ease your life, if you project will grow to hundreds of tables and thousands of indicies.As I'd like to sometime's look up business's, sometime's look up people and sometimes look up both I think I should keep the Index's.If your lookups are part of business logic, than it's ok. Also, if your system generates reports using several table joins that may speed up the things. Otherwise, for curiosity cases, it's better to wait some time for the result of one-time queries.
Perhaps I also need a 3rd Index ? One for Business's One for People and One for Business_People. I think I may need the 3rd Index for query's like Select b_id From business_people where b_id = 1 and pn_id = 2; I think this way I have an Index for 3 type's of queries. When I looking for data on just the business, when I'm looking for data on just people and when I'm looking for data on business people relationships. Cheers Rudi.
* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 05:15]: > Perhaps I also need a 3rd Index ? > > One for Business's > One for People and > One for Business_People. > > I think I may need the 3rd Index for query's like You don't need it. Primary key on that 2 columns will create a unique index on them. Of course, if you left things unchanged - you'll need to create business_people index yourself. execute: => \d business_people and take a glance on a line, describing primary key. -- Victor Yegorov
Victor,
"I see said the blind man !" ..
Thanks heaps.
I think now it's pretty clear to me.
I feel I have pretty much optimised my code / sql schema.
Thank you both,
it's a tremendous help - one learns something every day with this list.
Kind regards
Rudi.
Victor Yegorov wrote:
>> You don't need it. Primary key on that 2 columns will create a unique index >> on them. Of course, if you left things unchanged - you'll need to create >> business_people index yourself.Ahh of course ..
"I see said the blind man !" ..
Thanks heaps.
I think now it's pretty clear to me.
I feel I have pretty much optimised my code / sql schema.
Thank you both,
it's a tremendous help - one learns something every day with this list.
Kind regards
Rudi.
Victor Yegorov wrote:
* Rudi Starcevic <rudi@oasis.net.au> [15.05.2003 05:15]:Perhaps I also need a 3rd Index ? One for Business's One for People and One for Business_People. I think I may need the 3rd Index for query's likeYou don't need it. Primary key on that 2 columns will create a unique index on them. Of course, if you left things unchanged - you'll need to create business_people index yourself. execute: => \d business_people and take a glance on a line, describing primary key.