Thread: schema/db design wrt performance

schema/db design wrt performance

From
CaptainX0r
Date:
All,

I just noted in another thread that use of foreign keys in postgres
significantly hinders performance.  I'm wondering what other aspects we should
take into consideration in the design of our database.  We're coming from
Sybase and trying to design a more encompassing, modular, generic database that
won't take a serious performance hit under postgres.

Thanks,

-X

__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

Re: schema/db design wrt performance

From
"Fernando Papa"
Date:
I think FK on every database (oracle and MSSQL too) may hit performance,
but only in DML (insert/update/delete). These are tradeoffs...
referencial integrity vs. problems with batchload for example.
My Oracle experience say when I need to do batchload, I disable
constraints and then apply and work over exceptions.
If you don't make referencial integrity on database maybe you need to do
it on you application... and I think will be very painfull.

--
Fernando O. Papa
DBA


> -----Mensaje original-----
> De: CaptainX0r [mailto:captainx0r@yahoo.com]
> Enviado el: jueves, 16 de enero de 2003 10:52
> Para: pgsql-performance@postgresql.org
> Asunto: [PERFORM] schema/db design wrt performance
>
>
> All,
>
> I just noted in another thread that use of foreign keys in
> postgres significantly hinders performance.  I'm wondering
> what other aspects we should take into consideration in the
> design of our database.  We're coming from Sybase and trying
> to design a more encompassing, modular, generic database that
> won't take a serious performance hit under postgres.
>
> Thanks,
>
> -X
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: schema/db design wrt performance

From
Andrew Sullivan
Date:
On Thu, Jan 16, 2003 at 05:51:40AM -0800, CaptainX0r wrote:
> All,
>
> I just noted in another thread that use of foreign keys in postgres
> significantly hinders performance.  I'm wondering what other

Since I think I'm the one responsible for this, I'd better say
something clearer for the record.

The foreign keys implementation in PostgreSQL essentially uses SELECT
. . . FOR UPDATE to ensure that referenced data doesn't go away while a
referencing datum is being inserted or altered.

The problem with this is that frequently-referenced data are
therefore effectively locked during the operation.  Other writers
will block on the locked data until the first writer finishes.

So, for instance, consider two artificial-example tables:

create table account (acct_id serial primary key);

create table acct_activity (acct_id int references
account(acct_id), trans_on timestamp, val numeric(12,2));

If a user has multiple connections and charges things to the same
account in more than one connection at the same time, the
transactions will have to be processed, effectively, in series: each
one will have to wait for another to commit in order to complete.

This is just a performance bottleneck.  But it gets worse.  Suppose
the account table is like this:

create table account (acct_id serial primary key, con_id int
references contact(con_id));

create table contact (con_id serial primary key, name text, address1
text [. . .]);

Now, if another transaction is busy trying to delete a contact at the
same time the account table is being updated to reflect, say, a new
contact, you run the risk of deadlock.

The FK support in PostgreSQL is therefore mostly useful for
low-volume applications.  It can be made to work under heavier load
if you use it very carefully and program your application for it.
But I suggest avoiding it for heavy-duty use if you really can.

> take into consideration in the design of our database.  We're
> coming from Sybase and trying to design a more encompassing,
> modular, generic database that won't take a serious performance hit
> under postgres.

Avoid NOT IN.  This is difficult, because the workaround in Postgres
(NOT EXISTS) is frequently lousy on other systems.  Apparently there
is some fix for this contemplated for 7.4, but I've been really busy
lately, so I haven't been following -hackers.  Someone else can
probably say something more useful about it.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: schema/db design wrt performance

From
Ron Johnson
Date:
On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote:
> On Thu, Jan 16, 2003 at 05:51:40AM -0800, CaptainX0r wrote:
> > All,
> >
> > I just noted in another thread that use of foreign keys in postgres
> > significantly hinders performance.  I'm wondering what other
>
> Since I think I'm the one responsible for this, I'd better say
> something clearer for the record.
>
> The foreign keys implementation in PostgreSQL essentially uses SELECT
> . . . FOR UPDATE to ensure that referenced data doesn't go away while a
> referencing datum is being inserted or altered.
>
> The problem with this is that frequently-referenced data are
> therefore effectively locked during the operation.  Other writers
> will block on the locked data until the first writer finishes.
>
> So, for instance, consider two artificial-example tables:
>
> create table account (acct_id serial primary key);
>
> create table acct_activity (acct_id int references
> account(acct_id), trans_on timestamp, val numeric(12,2));
>
> If a user has multiple connections and charges things to the same
> account in more than one connection at the same time, the
> transactions will have to be processed, effectively, in series: each
> one will have to wait for another to commit in order to complete.

This is true even though the default transaction mode is
READ COMMITTED?

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "Basically, I got on the plane with a bomb. Basically, I   |
|  tried to ignite it. Basically, yeah, I intended to damage |
|  the plane."                                               |
|    RICHARD REID, who tried to blow up American Airlines    |
|                  Flight 63                                 |
+------------------------------------------------------------+


Re: schema/db design wrt performance

From
Andrew Sullivan
Date:
On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote:
> On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote:

> > If a user has multiple connections and charges things to the same
> > account in more than one connection at the same time, the
> > transactions will have to be processed, effectively, in series: each
> > one will have to wait for another to commit in order to complete.
>
> This is true even though the default transaction mode is
> READ COMMITTED?

Yes.  Remember, _both_ of these are doing SELECT. . .FOR UPDATE.
Which means they both try to lock the corresponding record.  But they
can't _both_ lock the same record; that's what the lock prevents.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: schema/db design wrt performance

From
Ron Johnson
Date:
On Thu, 2003-01-16 at 09:39, Andrew Sullivan wrote:
> On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote:
> > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote:
>
> > > If a user has multiple connections and charges things to the same
> > > account in more than one connection at the same time, the
> > > transactions will have to be processed, effectively, in series: each
> > > one will have to wait for another to commit in order to complete.
> >
> > This is true even though the default transaction mode is
> > READ COMMITTED?
>
> Yes.  Remember, _both_ of these are doing SELECT. . .FOR UPDATE.
> Which means they both try to lock the corresponding record.  But they
> can't _both_ lock the same record; that's what the lock prevents.

Could BEFORE INSERT|UPDATE|DELETE triggers perform the same
functionality while touching only the desired records, thus
decreasing conflict?

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "Basically, I got on the plane with a bomb. Basically, I   |
|  tried to ignite it. Basically, yeah, I intended to damage |
|  the plane."                                               |
|    RICHARD REID, who tried to blow up American Airlines    |
|                  Flight 63                                 |
+------------------------------------------------------------+


Re: schema/db design wrt performance

From
Stephan Szabo
Date:
On 16 Jan 2003, Ron Johnson wrote:

> On Thu, 2003-01-16 at 09:39, Andrew Sullivan wrote:
> > On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote:
> > > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote:
> >
> > > > If a user has multiple connections and charges things to the same
> > > > account in more than one connection at the same time, the
> > > > transactions will have to be processed, effectively, in series: each
> > > > one will have to wait for another to commit in order to complete.
> > >
> > > This is true even though the default transaction mode is
> > > READ COMMITTED?
> >
> > Yes.  Remember, _both_ of these are doing SELECT. . .FOR UPDATE.
> > Which means they both try to lock the corresponding record.  But they
> > can't _both_ lock the same record; that's what the lock prevents.
>
> Could BEFORE INSERT|UPDATE|DELETE triggers perform the same
> functionality while touching only the desired records, thus
> decreasing conflict?

It does limit it to the corresponding records, but if you
say insert a row pointing at customer 1, and in another transaction
insert a row pointing at customer 1, the second waits on the first.



Re: schema/db design wrt performance

From
Andrew Sullivan
Date:
On Thu, Jan 16, 2003 at 09:50:04AM -0600, Ron Johnson wrote:
>
> Could BEFORE INSERT|UPDATE|DELETE triggers perform the same
> functionality while touching only the desired records, thus
> decreasing conflict?

You can make the constraint DEFERRABLE INITIALY DEFERRED.  It helps
somewhat.  But the potential for deadlock, and the backing up, will
still happen to some degree.  It's a well-known flaw in the FK
system.  I beleive the FK implementation was mostly intended as a
proof of concept.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: schema/db design wrt performance

From
Ron Johnson
Date:
On Thu, 2003-01-16 at 10:02, Stephan Szabo wrote:
> On 16 Jan 2003, Ron Johnson wrote:
>
> > On Thu, 2003-01-16 at 09:39, Andrew Sullivan wrote:
> > > On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote:
> > > > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote:
> > >
> > > > > If a user has multiple connections and charges things to the same
> > > > > account in more than one connection at the same time, the
> > > > > transactions will have to be processed, effectively, in series: each
> > > > > one will have to wait for another to commit in order to complete.
> > > >
> > > > This is true even though the default transaction mode is
> > > > READ COMMITTED?
> > >
> > > Yes.  Remember, _both_ of these are doing SELECT. . .FOR UPDATE.
> > > Which means they both try to lock the corresponding record.  But they
> > > can't _both_ lock the same record; that's what the lock prevents.
> >
> > Could BEFORE INSERT|UPDATE|DELETE triggers perform the same
> > functionality while touching only the desired records, thus
> > decreasing conflict?
>
> It does limit it to the corresponding records, but if you
> say insert a row pointing at customer 1, and in another transaction
> insert a row pointing at customer 1, the second waits on the first.

2 points:

1. Don't you *want* TXN2 to wait on TXN1?
2. In an OLTP environment (heck, in *any* environment), the goal
   is to minimize txn length, so TXN2 shouldn't be waiting on
   TXN1 for more than a fraction of a second anyway.

Am I missing something?

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "Basically, I got on the plane with a bomb. Basically, I   |
|  tried to ignite it. Basically, yeah, I intended to damage |
|  the plane."                                               |
|    RICHARD REID, who tried to blow up American Airlines    |
|                  Flight 63                                 |
+------------------------------------------------------------+


Re: schema/db design wrt performance

From
Stephan Szabo
Date:
On 16 Jan 2003, Ron Johnson wrote:

> On Thu, 2003-01-16 at 10:02, Stephan Szabo wrote:
> > On 16 Jan 2003, Ron Johnson wrote:
> >
> > > On Thu, 2003-01-16 at 09:39, Andrew Sullivan wrote:
> > > > On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote:
> > > > > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote:
> > > >
> > > > > > If a user has multiple connections and charges things to the same
> > > > > > account in more than one connection at the same time, the
> > > > > > transactions will have to be processed, effectively, in series: each
> > > > > > one will have to wait for another to commit in order to complete.
> > > > >
> > > > > This is true even though the default transaction mode is
> > > > > READ COMMITTED?
> > > >
> > > > Yes.  Remember, _both_ of these are doing SELECT. . .FOR UPDATE.
> > > > Which means they both try to lock the corresponding record.  But they
> > > > can't _both_ lock the same record; that's what the lock prevents.
> > >
> > > Could BEFORE INSERT|UPDATE|DELETE triggers perform the same
> > > functionality while touching only the desired records, thus
> > > decreasing conflict?
> >
> > It does limit it to the corresponding records, but if you
> > say insert a row pointing at customer 1, and in another transaction
> > insert a row pointing at customer 1, the second waits on the first.
>
> 2 points:
>
> 1. Don't you *want* TXN2 to wait on TXN1?

Not really.  Maybe I was unclear though.

Given
create table pktable(a int primary key);
create table fktable(a int references pktable);
insert into pktable values (1);

The blocking would occur on:
T1: begin;
T2: begin;
T1: insert into fktable values (1);
T2: insert into fktable values (1);

This doesn't need to block.  The reason for
the lock is to prevent someone from updating
or deleting the row out of pktable, but it
also prevents this kind of thing.  This becomes
an issue if you say have tables that store mappings
and a table that has an fk to that.  You'll
be inserting lots of rows with say
customertype=7 which points into a table with
types and they'll block.  Worse, if you say
do inserts with different customertypes in
different orders in two transactions you
can deadlock yourself.



Re: schema/db design wrt performance

From
Andrew Sullivan
Date:
On Thu, Jan 16, 2003 at 10:25:36AM -0600, Ron Johnson wrote:
>
> 2 points:
>
> 1. Don't you *want* TXN2 to wait on TXN1?

Not really.  You really just want a tag which prevents TXN2 from
committing when its reference data might go away.  So what you want
is a lock which says "don't delete, no matter what", until TXN2
commits.  Then TXN1 could fail or not, depending on what it's trying
to do.  The problem is that there isn't a lock of the right strength
to do that.

> 2. In an OLTP environment (heck, in *any* environment), the goal
>    is to minimize txn length, so TXN2 shouldn't be waiting on
>    TXN1 for more than a fraction of a second anyway.

Right.  But it's possible to have multiple REFERENCES constraints
to the same table; that's why I picked an account table, for
instance, because you might have a large number of different kinds of
things that the same account can do.  So while you're correct that
one wants to minimize txn length, it's also true that, when the
effects are followed across a large system, you can easily start
tripping over the FKs.  The real problem, then, only shows up on a
busy system with a table which gets referenced a lot.

I should note, by the way, that the tremendous performance
improvements available in 7.2.x have reduced the problem considerably
from 7.1.x, at least in my experience.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: schema/db design wrt performance

From
Ron Johnson
Date:
On Thu, 2003-01-16 at 10:38, Stephan Szabo wrote:
> On 16 Jan 2003, Ron Johnson wrote:
>
> > On Thu, 2003-01-16 at 10:02, Stephan Szabo wrote:
> > > On 16 Jan 2003, Ron Johnson wrote:
> > >
> > > > On Thu, 2003-01-16 at 09:39, Andrew Sullivan wrote:
> > > > > On Thu, Jan 16, 2003 at 08:34:38AM -0600, Ron Johnson wrote:
> > > > > > On Thu, 2003-01-16 at 08:20, Andrew Sullivan wrote:
> > > > >
> > > > > > > If a user has multiple connections and charges things to the same
> > > > > > > account in more than one connection at the same time, the
> > > > > > > transactions will have to be processed, effectively, in series: each
> > > > > > > one will have to wait for another to commit in order to complete.
> > > > > >
> > > > > > This is true even though the default transaction mode is
> > > > > > READ COMMITTED?
> > > > >
> > > > > Yes.  Remember, _both_ of these are doing SELECT. . .FOR UPDATE.
> > > > > Which means they both try to lock the corresponding record.  But they
> > > > > can't _both_ lock the same record; that's what the lock prevents.
> > > >
> > > > Could BEFORE INSERT|UPDATE|DELETE triggers perform the same
> > > > functionality while touching only the desired records, thus
> > > > decreasing conflict?
> > >
> > > It does limit it to the corresponding records, but if you
> > > say insert a row pointing at customer 1, and in another transaction
> > > insert a row pointing at customer 1, the second waits on the first.
> >
> > 2 points:
> >
> > 1. Don't you *want* TXN2 to wait on TXN1?
>
> Not really.  Maybe I was unclear though.
>
> Given
> create table pktable(a int primary key);
> create table fktable(a int references pktable);
> insert into pktable values (1);
>
> The blocking would occur on:
> T1: begin;
> T2: begin;
> T1: insert into fktable values (1);
> T2: insert into fktable values (1);
>
> This doesn't need to block.  The reason for
> the lock is to prevent someone from updating
> or deleting the row out of pktable, but it
> also prevents this kind of thing.  This becomes
> an issue if you say have tables that store mappings
> and a table that has an fk to that.  You'll
> be inserting lots of rows with say
> customertype=7 which points into a table with
> types and they'll block.  Worse, if you say
> do inserts with different customertypes in
> different orders in two transactions you
> can deadlock yourself.

So Postgres will think it's possible that I could modify the
reference table that "customertype=7" refers to?  If so, bummer.

The commercial RDBMS that I use (Rdb/VMS) allows one to specify
that certain tables are only for read access.

For example:
SET TRANSACTION READ WRITE
    RESERVING T_MASTER, T_DETAIL FOR SHARED WRITE,
              T_MAPPING1, T_MAPPING2, T_MAPPING3 FOR SHARED READ;

Thus, only minimal locking is taken out on T_MAPPING1, T_MAPPING2
& T_MAPPING3, but if I try to "UPDATE T_MAPPING1" or reference any
other table, even in a SELECT statement, then the statement will
fail.

Rdb also alows for exclusive write locks:
SET TRANSACTION READ WRITE
    RESERVING T_MASTER, T_DETAIL FOR SHARED WRITE,
              T_MAPPING1, T_MAPPING2, T_MAPPING3 FOR SHARED READ,
              T_FOOBAR FOR EXCLUSIVE WRITE;

Thus, even though there is concurrent access to the other tables,
a table lock on T_FOOBAR is taken out.  This cuts IO usage in 1/2,
but obviously must be used with great discretion.

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "Basically, I got on the plane with a bomb. Basically, I   |
|  tried to ignite it. Basically, yeah, I intended to damage |
|  the plane."                                               |
|    RICHARD REID, who tried to blow up American Airlines    |
|                  Flight 63                                 |
+------------------------------------------------------------+