Thread: schema/db design wrt performance
All, I just noted in another thread that use of foreign keys in postgres significantly hinders performance. I'm wondering what other aspects we should take into consideration in the design of our database. We're coming from Sybase and trying to design a more encompassing, modular, generic database that won't take a serious performance hit under postgres. Thanks, -X __________________________________________________ Do you Yahoo!? Yahoo! Mail Plus - Powerful. Affordable. Sign up now. http://mailplus.yahoo.com
I think FK on every database (oracle and MSSQL too) may hit performance, but only in DML (insert/update/delete). These are tradeoffs... referencial integrity vs. problems with batchload for example. My Oracle experience say when I need to do batchload, I disable constraints and then apply and work over exceptions. If you don't make referencial integrity on database maybe you need to do it on you application... and I think will be very painfull. -- Fernando O. Papa DBA > -----Mensaje original----- > De: CaptainX0r [mailto:captainx0r@yahoo.com] > Enviado el: jueves, 16 de enero de 2003 10:52 > Para: pgsql-performance@postgresql.org > Asunto: [PERFORM] schema/db design wrt performance > > > All, > > I just noted in another thread that use of foreign keys in > postgres significantly hinders performance. I'm wondering > what other aspects we should take into consideration in the > design of our database. We're coming from Sybase and trying > to design a more encompassing, modular, generic database that > won't take a serious performance hit under postgres. > > Thanks, > > -X > > __________________________________________________ > Do you Yahoo!? > Yahoo! Mail Plus - Powerful. Affordable. Sign up now. http://mailplus.yahoo.com ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
On Thu, Jan 16, 2003 at 05:51:40AM -0800, CaptainX0r wrote: > All, > > I just noted in another thread that use of foreign keys in postgres > significantly hinders performance. I'm wondering what other Since I think I'm the one responsible for this, I'd better say something clearer for the record. The foreign keys implementation in PostgreSQL essentially uses SELECT . . . FOR UPDATE to ensure that referenced data doesn't go away while a referencing datum is being inserted or altered. The problem with this is that frequently-referenced data are therefore effectively locked during the operation. Other writers will block on the locked data until the first writer finishes. So, for instance, consider two artificial-example tables: create table account (acct_id serial primary key); create table acct_activity (acct_id int references account(acct_id), trans_on timestamp, val numeric(12,2)); If a user has multiple connections and charges things to the same account in more than one connection at the same time, the transactions will have to be processed, effectively, in series: each one will have to wait for another to commit in order to complete. This is just a performance bottleneck. But it gets worse. Suppose the account table is like this: create table account (acct_id serial primary key, con_id int references contact(con_id)); create table contact (con_id serial primary key, name text, address1 text [. . .]); Now, if another transaction is busy trying to delete a contact at the same time the account table is being updated to reflect, say, a new contact, you run the risk of deadlock. The FK support in PostgreSQL is therefore mostly useful for low-volume applications. It can be made to work under heavier load if you use it very carefully and program your application for it. But I suggest avoiding it for heavy-duty use if you really can. > take into consideration in the design of our database. We're > coming from Sybase and trying to design a more encompassing, > modular, generic database that won't take a serious performance hit > under postgres. Avoid NOT IN. This is difficult, because the workaround in Postgres (NOT EXISTS) is frequently lousy on other systems. Apparently there is some fix for this contemplated for 7.4, but I've been really busy lately, so I haven't been following -hackers. Someone else can probably say something more useful about it. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote: > On Thu, Jan 16, 2003 at 05:51:40AM -0800, CaptainX0r wrote: > > All, > > > > I just noted in another thread that use of foreign keys in postgres > > significantly hinders performance. I'm wondering what other > > Since I think I'm the one responsible for this, I'd better say > something clearer for the record. > > The foreign keys implementation in PostgreSQL essentially uses SELECT > . . . FOR UPDATE to ensure that referenced data doesn't go away while a > referencing datum is being inserted or altered. > > The problem with this is that frequently-referenced data are > therefore effectively locked during the operation. Other writers > will block on the locked data until the first writer finishes. > > So, for instance, consider two artificial-example tables: > > create table account (acct_id serial primary key); > > create table acct_activity (acct_id int references > account(acct_id), trans_on timestamp, val numeric(12,2)); > > If a user has multiple connections and charges things to the same > account in more than one connection at the same time, the > transactions will have to be processed, effectively, in series: each > one will have to wait for another to commit in order to complete. This is true even though the default transaction mode is READ COMMITTED? -- +------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "Basically, I got on the plane with a bomb. Basically, I | | tried to ignite it. Basically, yeah, I intended to damage | | the plane." | | RICHARD REID, who tried to blow up American Airlines | | Flight 63 | +------------------------------------------------------------+
On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote: > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote: > > If a user has multiple connections and charges things to the same > > account in more than one connection at the same time, the > > transactions will have to be processed, effectively, in series: each > > one will have to wait for another to commit in order to complete. > > This is true even though the default transaction mode is > READ COMMITTED? Yes. Remember, _both_ of these are doing SELECT. . .FOR UPDATE. Which means they both try to lock the corresponding record. But they can't _both_ lock the same record; that's what the lock prevents. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
On Thu, 2003-01-16 at 09:39, Andrew Sullivan wrote: > On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote: > > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote: > > > > If a user has multiple connections and charges things to the same > > > account in more than one connection at the same time, the > > > transactions will have to be processed, effectively, in series: each > > > one will have to wait for another to commit in order to complete. > > > > This is true even though the default transaction mode is > > READ COMMITTED? > > Yes. Remember, _both_ of these are doing SELECT. . .FOR UPDATE. > Which means they both try to lock the corresponding record. But they > can't _both_ lock the same record; that's what the lock prevents. Could BEFORE INSERT|UPDATE|DELETE triggers perform the same functionality while touching only the desired records, thus decreasing conflict? -- +------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "Basically, I got on the plane with a bomb. Basically, I | | tried to ignite it. Basically, yeah, I intended to damage | | the plane." | | RICHARD REID, who tried to blow up American Airlines | | Flight 63 | +------------------------------------------------------------+
On 16 Jan 2003, Ron Johnson wrote: > On Thu, 2003-01-16 at 09:39, Andrew Sullivan wrote: > > On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote: > > > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote: > > > > > > If a user has multiple connections and charges things to the same > > > > account in more than one connection at the same time, the > > > > transactions will have to be processed, effectively, in series: each > > > > one will have to wait for another to commit in order to complete. > > > > > > This is true even though the default transaction mode is > > > READ COMMITTED? > > > > Yes. Remember, _both_ of these are doing SELECT. . .FOR UPDATE. > > Which means they both try to lock the corresponding record. But they > > can't _both_ lock the same record; that's what the lock prevents. > > Could BEFORE INSERT|UPDATE|DELETE triggers perform the same > functionality while touching only the desired records, thus > decreasing conflict? It does limit it to the corresponding records, but if you say insert a row pointing at customer 1, and in another transaction insert a row pointing at customer 1, the second waits on the first.
On Thu, Jan 16, 2003 at 09:50:04AM -0600, Ron Johnson wrote: > > Could BEFORE INSERT|UPDATE|DELETE triggers perform the same > functionality while touching only the desired records, thus > decreasing conflict? You can make the constraint DEFERRABLE INITIALY DEFERRED. It helps somewhat. But the potential for deadlock, and the backing up, will still happen to some degree. It's a well-known flaw in the FK system. I beleive the FK implementation was mostly intended as a proof of concept. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
On Thu, 2003-01-16 at 10:02, Stephan Szabo wrote: > On 16 Jan 2003, Ron Johnson wrote: > > > On Thu, 2003-01-16 at 09:39, Andrew Sullivan wrote: > > > On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote: > > > > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote: > > > > > > > > If a user has multiple connections and charges things to the same > > > > > account in more than one connection at the same time, the > > > > > transactions will have to be processed, effectively, in series: each > > > > > one will have to wait for another to commit in order to complete. > > > > > > > > This is true even though the default transaction mode is > > > > READ COMMITTED? > > > > > > Yes. Remember, _both_ of these are doing SELECT. . .FOR UPDATE. > > > Which means they both try to lock the corresponding record. But they > > > can't _both_ lock the same record; that's what the lock prevents. > > > > Could BEFORE INSERT|UPDATE|DELETE triggers perform the same > > functionality while touching only the desired records, thus > > decreasing conflict? > > It does limit it to the corresponding records, but if you > say insert a row pointing at customer 1, and in another transaction > insert a row pointing at customer 1, the second waits on the first. 2 points: 1. Don't you *want* TXN2 to wait on TXN1? 2. In an OLTP environment (heck, in *any* environment), the goal is to minimize txn length, so TXN2 shouldn't be waiting on TXN1 for more than a fraction of a second anyway. Am I missing something? -- +------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "Basically, I got on the plane with a bomb. Basically, I | | tried to ignite it. Basically, yeah, I intended to damage | | the plane." | | RICHARD REID, who tried to blow up American Airlines | | Flight 63 | +------------------------------------------------------------+
On 16 Jan 2003, Ron Johnson wrote: > On Thu, 2003-01-16 at 10:02, Stephan Szabo wrote: > > On 16 Jan 2003, Ron Johnson wrote: > > > > > On Thu, 2003-01-16 at 09:39, Andrew Sullivan wrote: > > > > On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote: > > > > > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote: > > > > > > > > > > If a user has multiple connections and charges things to the same > > > > > > account in more than one connection at the same time, the > > > > > > transactions will have to be processed, effectively, in series: each > > > > > > one will have to wait for another to commit in order to complete. > > > > > > > > > > This is true even though the default transaction mode is > > > > > READ COMMITTED? > > > > > > > > Yes. Remember, _both_ of these are doing SELECT. . .FOR UPDATE. > > > > Which means they both try to lock the corresponding record. But they > > > > can't _both_ lock the same record; that's what the lock prevents. > > > > > > Could BEFORE INSERT|UPDATE|DELETE triggers perform the same > > > functionality while touching only the desired records, thus > > > decreasing conflict? > > > > It does limit it to the corresponding records, but if you > > say insert a row pointing at customer 1, and in another transaction > > insert a row pointing at customer 1, the second waits on the first. > > 2 points: > > 1. Don't you *want* TXN2 to wait on TXN1? Not really. Maybe I was unclear though. Given create table pktable(a int primary key); create table fktable(a int references pktable); insert into pktable values (1); The blocking would occur on: T1: begin; T2: begin; T1: insert into fktable values (1); T2: insert into fktable values (1); This doesn't need to block. The reason for the lock is to prevent someone from updating or deleting the row out of pktable, but it also prevents this kind of thing. This becomes an issue if you say have tables that store mappings and a table that has an fk to that. You'll be inserting lots of rows with say customertype=7 which points into a table with types and they'll block. Worse, if you say do inserts with different customertypes in different orders in two transactions you can deadlock yourself.
On Thu, Jan 16, 2003 at 10:25:36AM -0600, Ron Johnson wrote: > > 2 points: > > 1. Don't you *want* TXN2 to wait on TXN1? Not really. You really just want a tag which prevents TXN2 from committing when its reference data might go away. So what you want is a lock which says "don't delete, no matter what", until TXN2 commits. Then TXN1 could fail or not, depending on what it's trying to do. The problem is that there isn't a lock of the right strength to do that. > 2. In an OLTP environment (heck, in *any* environment), the goal > is to minimize txn length, so TXN2 shouldn't be waiting on > TXN1 for more than a fraction of a second anyway. Right. But it's possible to have multiple REFERENCES constraints to the same table; that's why I picked an account table, for instance, because you might have a large number of different kinds of things that the same account can do. So while you're correct that one wants to minimize txn length, it's also true that, when the effects are followed across a large system, you can easily start tripping over the FKs. The real problem, then, only shows up on a busy system with a table which gets referenced a lot. I should note, by the way, that the tremendous performance improvements available in 7.2.x have reduced the problem considerably from 7.1.x, at least in my experience. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
On Thu, 2003-01-16 at 10:38, Stephan Szabo wrote: > On 16 Jan 2003, Ron Johnson wrote: > > > On Thu, 2003-01-16 at 10:02, Stephan Szabo wrote: > > > On 16 Jan 2003, Ron Johnson wrote: > > > > > > > On Thu, 2003-01-16 at 09:39, Andrew Sullivan wrote: > > > > > On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote: > > > > > > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote: > > > > > > > > > > > > If a user has multiple connections and charges things to the same > > > > > > > account in more than one connection at the same time, the > > > > > > > transactions will have to be processed, effectively, in series: each > > > > > > > one will have to wait for another to commit in order to complete. > > > > > > > > > > > > This is true even though the default transaction mode is > > > > > > READ COMMITTED? > > > > > > > > > > Yes. Remember, _both_ of these are doing SELECT. . .FOR UPDATE. > > > > > Which means they both try to lock the corresponding record. But they > > > > > can't _both_ lock the same record; that's what the lock prevents. > > > > > > > > Could BEFORE INSERT|UPDATE|DELETE triggers perform the same > > > > functionality while touching only the desired records, thus > > > > decreasing conflict? > > > > > > It does limit it to the corresponding records, but if you > > > say insert a row pointing at customer 1, and in another transaction > > > insert a row pointing at customer 1, the second waits on the first. > > > > 2 points: > > > > 1. Don't you *want* TXN2 to wait on TXN1? > > Not really. Maybe I was unclear though. > > Given > create table pktable(a int primary key); > create table fktable(a int references pktable); > insert into pktable values (1); > > The blocking would occur on: > T1: begin; > T2: begin; > T1: insert into fktable values (1); > T2: insert into fktable values (1); > > This doesn't need to block. The reason for > the lock is to prevent someone from updating > or deleting the row out of pktable, but it > also prevents this kind of thing. This becomes > an issue if you say have tables that store mappings > and a table that has an fk to that. You'll > be inserting lots of rows with say > customertype=7 which points into a table with > types and they'll block. Worse, if you say > do inserts with different customertypes in > different orders in two transactions you > can deadlock yourself. So Postgres will think it's possible that I could modify the reference table that "customertype=7" refers to? If so, bummer. The commercial RDBMS that I use (Rdb/VMS) allows one to specify that certain tables are only for read access. For example: SET TRANSACTION READ WRITE RESERVING T_MASTER, T_DETAIL FOR SHARED WRITE, T_MAPPING1, T_MAPPING2, T_MAPPING3 FOR SHARED READ; Thus, only minimal locking is taken out on T_MAPPING1, T_MAPPING2 & T_MAPPING3, but if I try to "UPDATE T_MAPPING1" or reference any other table, even in a SELECT statement, then the statement will fail. Rdb also alows for exclusive write locks: SET TRANSACTION READ WRITE RESERVING T_MASTER, T_DETAIL FOR SHARED WRITE, T_MAPPING1, T_MAPPING2, T_MAPPING3 FOR SHARED READ, T_FOOBAR FOR EXCLUSIVE WRITE; Thus, even though there is concurrent access to the other tables, a table lock on T_FOOBAR is taken out. This cuts IO usage in 1/2, but obviously must be used with great discretion. -- +------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "Basically, I got on the plane with a bomb. Basically, I | | tried to ignite it. Basically, yeah, I intended to damage | | the plane." | | RICHARD REID, who tried to blow up American Airlines | | Flight 63 | +------------------------------------------------------------+