Thread: Integrity on large sites

Integrity on large sites

From: Naz Gassiep
I'm working on a project at the moment that is using MySQL, and people
keep making assertions like this one:

"*Really* big sites don't ever have referential integrity. Or if the few
spots they do (like with financial transactions) it's implemented on the
application level (via, say, optimistic locking), never the database level."

This sounds to me like MySQLish. A large DB working with no RI would
give me nightmares. Is it really true that large sites turn RI off to
improve performance, or is that just a MySQL thing where it gets turned
off just because MySQL allows you to turn it off and improve
performance? Can you even turn RI off in PostgreSQL? Does Oracle, DB2 or
MSSQL allow you to turn it off? Am I just being naive in thinking that
everyone runs their DBs with RI in production?

- Naz

Re: Integrity on large sites

From: "Stuart Cooper"
> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."

Some large sites don't even use data types!

http://www.thestar.com/News/article/189175

"in some cases the field for the social insurance number was instead
filled in with a birth date."

(search the archives for "OT: Canadian Tax Database")

Cheers,
Stuart.

Re: Integrity on large sites

From: "Richard P. Welty"
Naz Gassiep wrote:
> I'm working in a project at the moment that is using MySQL, and people
> keep making assertions like this one:
>
> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."
>
> This sounds to me like MySQLish. A large DB working with no RI would
> give me nightmares. Is it really true that large sites turn RI off to
> improve performance,
i know from having worked in the shop that handles it that the databases
used in the processing of NYS Personal Income Tax (Informix) most assuredly
use referential integrity.

anything else would be suicide.

certain shops do turn it off for large databases. that doesn't make it a
good idea, or the industry norm.

richard


Re: Integrity on large sites

From: Ron Johnson

On 05/22/07 21:12, Naz Gassiep wrote:
> I'm working in a project at the moment that is using MySQL, and people
> keep making assertions like this one:
>
> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."
>
> This sounds to me like MySQLish. A large DB working with no RI would
> give me nightmares. Is it really true that large sites turn RI off to
> improve performance, or is that just a MySQL thing where it gets turned
> off just because MySQL allows you to turn it off and improve
> performance? Can you even turn RI off in PostgreSQL? Does Oracle, DB2 or
> MSSQL allow you to turn it off? Am I just being naive in thinking that
> everyone runs their DBs with RI in production?

Allow you to turn it off???

RI as in foreign keys or RI as in primary keys?

FKs are not implemented on our big transactional systems that use
Rdb/VMS.  Originally this was because the extra load would slow down
a system that needed every ounce of speed, back on late-1990s technology.

Now we have (some) faster hardware, but even higher posting volumes.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!


Re: Integrity on large sites

From: Ben
Not using foreign keys makes sense for some applications. WORM (write
once, read many) applications where you know you are loading accurate
data, for example. Or times when it doesn't matter if an application bug
corrupts your data.

But if you care about your data and if you can't trust your client to
edit it correctly, you'd better have referential integrity. Size is
irrelevant to that equation.
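
To make "have referential integrity" concrete, here is a minimal,
hypothetical PostgreSQL sketch (not from the thread): one REFERENCES
clause, after which the database itself rejects orphans no matter how
buggy the client is.

CREATE TABLE customers (
    customer_id serial PRIMARY KEY,
    name        text NOT NULL
);

CREATE TABLE orders (
    order_id    serial PRIMARY KEY,
    customer_id integer NOT NULL REFERENCES customers (customer_id),
    total       numeric(10,2) NOT NULL
);

-- With no customer 42 present, any client, buggy or not, gets an error
-- instead of an orphaned row:
INSERT INTO orders (customer_id, total) VALUES (42, 19.99);
-- ERROR:  insert or update on table "orders" violates foreign key
--         constraint "orders_customer_id_fkey"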

On May 22, 2007, at 7:12 PM, Naz Gassiep wrote:

> I'm working in a project at the moment that is using MySQL, and people
> keep making assertions like this one:
>
> "*Really* big sites don't ever have referential integrity. Or if
> the few
> spots they do (like with financial transactions) it's implemented
> on the
> application level (via, say, optimistic locking), never the
> database level."
>
> This sounds to me like MySQLish. A large DB working with no RI would
> give me nightmares. Is it really true that large sites turn RI off to
> improve performance, or is that just a MySQL thing where it gets
> turned
> off just because MySQL allows you to turn it off and improve
> performance? Can you even turn RI off in PostgreSQL? Does Oracle,
> DB2 or
> MSSQL allow you to turn it off? Am I just being naive in thinking that
> everyone runs their DBs with RI in production?
>
> - Naz


Re: Integrity on large sites

From: "btober"
----- Original Message Follows -----
From: "Stuart Cooper" <stuart.cooper@gmail.com>

>> "*Really* big sites don't ever have referential
>> integrity. Or if the few spots they do (like with
>> financial transactions) it's implemented on the
>application level (via, say, optimistic locking), never the
>database level."
>
>Some large sites don't even use data types!
>
>http://www.thestar.com/News/article/189175
>
>"in some cases the field for the social insurance number
>was instead filled in with a birth date."


But the fact that they don't use data types, or that some big sites
supposedly may not use referential integrity, does not make doing
without a Good Thing. The Canadian Tax article would, to any competent
systems admin, provide incredibly strong justification FOR using typed
and validated data and referential integrity. Anyone who has to be
concerned with the integrity and validity of their data -- which should
be the case everywhere, otherwise why bother collecting it -- has to
enforce those aspects, and an RDBMS is built to do exactly that.
"Turning it off" doesn't seem like a good way to address performance
issues. Buy bigger/better hardware and adjust configuration settings.
Data integrity has to be the first and fundamental concern. Performance
is irrelevant if you can't trust the data -- would having answers
faster be of any use if the answers were not reliable?



Re: Integrity on large sites

From: PFC

>>> "*Really* big sites don't ever have referential
>>> integrity. Or if the few spots they do (like with
>>> financial transactions) it's implemented on the
>> application level (via, say, optimistic locking), never the
>> database level."

    Sure, but in the forum benchmark I just did, when using MyISAM, with no
referential integrity checks, at the end of the benchmark, there is an
impressive number of records with broken foreign key relations... when the
user kills his HTTP connection or reloads at the wrong moment, and the
script is interrupted, or killed by an exception or whatever, boom.
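
A minimal sketch (hypothetical schema, not PFC's actual benchmark code) of
why that cannot happen once the constraint and a transaction are in place:
the REFERENCES clause means a post can never point at a missing topic, and
because both inserts share one transaction, a killed script leaves nothing
behind instead of half a thread.

CREATE TABLE topics (
    topic_id serial PRIMARY KEY,
    title    text NOT NULL
);

CREATE TABLE posts (
    post_id   serial PRIMARY KEY,
    topic_id  integer NOT NULL REFERENCES topics (topic_id),
    post_text text NOT NULL
);

BEGIN;
INSERT INTO topics (title) VALUES ('New topic');
INSERT INTO posts (topic_id, post_text)
    VALUES (currval('topics_topic_id_seq'), 'First post');
COMMIT;  -- if the client dies before this point, both rows vanish together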

Re: Integrity on large sites

From: Scott Ribe
> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."

Pure, utter, unadulterated bullshit. Speaking as someone who had years of
experience with Sybase SQL Server before either MySQL or PostgreSQL were
even created...

Some big sites do of course juggle performance vs in-database run-time
checks, but the statement as typically presented by MySQL partisans, that
it's never done at the database level, is just wrong. Whether it's a direct
deception, iow the speaker knows it to be false, or an indirect deception,
iow the speaker is implying a level of expertise he does not possess, either
way I would categorize it as a lie.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Integrity on large sites

From: Scott Marlowe
Naz Gassiep wrote:
> I'm working in a project at the moment that is using MySQL, and people
> keep making assertions like this one:
>
> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."
>
> This sounds to me like MySQLish. A large DB working with no RI would
> give me nightmares. Is it really true that large sites turn RI off to
> improve performance, or is that just a MySQL thing where it gets turned
> off just because MySQL allows you to turn it off and improve
> performance? Can you even turn RI off in PostgreSQL? Does Oracle, DB2 or
> MSSQL allow you to turn it off? Am I just being naive in thinking that
> everyone runs their DBs with RI in production?

Someone's been drinking the MySQL 3.23 Kool-Aid.

1: The bigger the amount of data you have to store, the more likely you
are to NEED referential integrity to make sure it's not getting all
messed up.  Not just financial data either.  What about applications
like trouble ticketing systems?  Can you imagine having tickets go
orphan in a system to keep track of issues?  What about parts inventory
systems?  Hospital medication tracking?  Transportation scheduling?
Fantasy Football?  All of those systems are likely to need RI to make
sure that the data inside them stays coherent.  We don't want to have
two customers thinking they have the same quarterback / taxi /
penicillin dosage / broken network router / water pump etc...

2: Handling RI in the application doesn't scale.  If everything you do
requires you to check in the app, lock the whole table to prevent race
conditions, and then commit, you'll never scale to any real number of
users.  You can have reliability and performance if you do RI in the
database.  You only get to pick one if you're gonna do RI in the
application.

3:  Of course you can turn off RI in PostgreSQL.  Either remove the FK
triggers or disable them db-wide.  You can do the same thing in Oracle as
well.  This is normally done during maintenance windows to allow data
that is known to be coherent to be imported quickly (see the sketch
below).  Doing so while processing transactions is suicidal.
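
A minimal sketch of that maintenance-window pattern in PostgreSQL, run
against a hypothetical posts table; the dump file path is made up, and
disabling the system-generated FK triggers this way requires superuser
rights:

-- Maintenance window only: the incoming data is already known to be coherent.
BEGIN;
ALTER TABLE posts DISABLE TRIGGER ALL;  -- also disables the FK enforcement triggers
COPY posts FROM '/tmp/posts.dump';      -- hypothetical bulk-load file
ALTER TABLE posts ENABLE TRIGGER ALL;   -- constraints enforced again from here on
COMMIT;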

Re: Integrity on large sites

From: "Alexander Staubo"
On 5/23/07, Naz Gassiep <naz@mira.net> wrote:
> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."

It's not just the big ones. Try using Ruby on Rails -- and its ORM,
ActiveRecord -- at some point, and you will notice the rampant
ignorance of referential integrity. ActiveRecord bears signs of having
been designed for MySQL.

For example, you need a plugin to add programmatic support for
foreign-key declarations to your schema code, and foreign key
relationships have to be explicitly defined using directives such as
"has_many". The unit test framework assumes it can delete rows in any
order, irrespective of foreign-key references. And so on.

Interestingly, ActiveRecord's support for polymorphic object
associations -- which allow you to define a reference to an object in an
arbitrary table -- violates RI *by definition*. There's no support
for setting up the check constraints that would be appropriate for
such attributes. All the more annoying, since such associations are
extremely useful.
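
For reference, a sketch in plain SQL -- hypothetical tables, not anything
ActiveRecord generates -- of what pinning such a reference down can look
like: a CHECK constraint can at least nail down the type column, while real
RI needs something like an "exclusive arc" of per-table foreign keys.

CREATE TABLE articles (article_id serial PRIMARY KEY, title text NOT NULL);
CREATE TABLE photos   (photo_id   serial PRIMARY KEY, title text NOT NULL);

-- Rails-style polymorphic reference: no foreign key is possible on
-- commentable_id, since it may point into either table.
CREATE TABLE comments (
    comment_id       serial  PRIMARY KEY,
    commentable_type text    NOT NULL
        CHECK (commentable_type IN ('Article', 'Photo')),
    commentable_id   integer NOT NULL,
    body             text    NOT NULL
);

-- One workaround that keeps genuine RI: one nullable FK per target table,
-- plus a CHECK that exactly one of them is set.
CREATE TABLE comments_with_ri (
    comment_id serial PRIMARY KEY,
    article_id integer REFERENCES articles (article_id),
    photo_id   integer REFERENCES photos (photo_id),
    body       text NOT NULL,
    CHECK ((article_id IS NULL) <> (photo_id IS NULL))
);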

Alexander.

Re: Integrity on large sites

From: PFC
> Some big sites do of course juggle performance vs in-database run-time
> checks, but the statements as typically presented by MySQL partisans,

    Live from the front :

    This freshly created database has had to endure a multithreaded query
assault for about 2 hours.
    It gave up.

TABLE `posts` (
   `post_id` int(11) NOT NULL auto_increment,
   `topic_id` int(11) NOT NULL,
etc...

mysql> SELECT max(post_id) FROM posts;
+--------------+
| max(post_id) |
+--------------+
|       591257 |
+--------------+

mysql> INSERT INTO posts (topic_id,post_text,user_id) VALUES (1,'DIE
BASTARD',666);
ERROR 1062 (23000): Duplicate entry '591257' for key 1

mysql> CHECK TABLE posts;
+-------------------+-------+----------+-----------------------------+
| Table             | Op    | Msg_type | Msg_text                    |
+-------------------+-------+----------+-----------------------------+
| forum_bench.posts | check | warning  | Table is marked as crashed  |
| forum_bench.posts | check | error    | Found 588137 keys of 588135 |
| forum_bench.posts | check | error    | Corrupt                     |
+-------------------+-------+----------+-----------------------------+

mysql> REPAIR TABLE posts;
+-------------------+--------+----------+----------+
| Table             | Op     | Msg_type | Msg_text |
+-------------------+--------+----------+----------+
| forum_bench.posts | repair | status   | OK       |
+-------------------+--------+----------+----------+

mysql> INSERT INTO posts (topic_id,post_text,user_id) VALUES (1,'DIE
BASTARD',666);
Query OK, 1 row affected, 1 warning (0.10 sec)

mysql> SHOW WARNINGS;
+---------+------+------------------------------------------------+
| Level   | Code | Message                                        |
+---------+------+------------------------------------------------+
| Warning | 1364 | Field 'post_time' doesn't have a default value |
+---------+------+------------------------------------------------+

mysql> SELECT max(post_id) FROM posts;
+--------------+
| max(post_id) |
+--------------+
|       591257 |
+--------------+

mysql> SELECT count(*) FROM posts UNION ALL SELECT sum( topic_post_count )
 FROM topics;
+----------+
| count(*) |
+----------+
|   588137 |
|   588145 |
+----------+

mysql> SELECT count(*) FROM topics WHERE topic_id NOT IN (SELECT topic_id
 FROM posts);
+----------+
| count(*) |
+----------+
|    11583 |
+----------+

(Note : there cannot be a topic without a post in it, ha !)

Try Postgres :

forum_bench=> SELECT count(*) FROM posts UNION ALL SELECT
sum( topic_post_count ) FROM topics;
  count
--------
  536108
  536108
(2 lignes)

forum_bench=> SELECT count(*) FROM topics WHERE topic_id NOT IN (SELECT
topic_id FROM posts);
  count
-------
      0
(1 ligne)



Re: Integrity on large sites

From: "Richard P. Welty"
Scott Marlowe wrote:
> 2: Handling RI in the application doesn't scale.  If everything you do
> requires you to check in the app, lock the whole table to prevent race
> conditions, and then commit, you'll never scale to any real number of
> users.  You can have reliability and performance if you do RI in the
> database.  You only get to pick one if you're gonna do RI in the
> application.
the other risk for RI in the app is the possibility of incompatible
implementations across different app versions or different apps that access
the same data.

not at all a fun place to be, that.

richard


Re: Integrity on large sites

From: Andrew Sullivan
On Wed, May 23, 2007 at 12:12:52PM +1000, Naz Gassiep wrote:
> give me nightmares. Is it really true that large sites turn RI off to
> improve performance,

You can't "turn it off", but you can "not use it".  And I suppose
there are shops where they don't use it; after all, you can make any
computer system arbitrarily fast if the answer doesn't have to be
right.

By the way, the idea that application-level checks will make your
data integrity cheaper is the sort of idea that application
programmers who don't know anything about databases like to flog.
There are of course occasions where this is trivially true (e.g., you
require two pieces of data in a command, so you error before talking
to the database if in your parse stage they're not both there).  But
any non-trivial integrity check is automatically going to impose
database queries, and anyone who claims the application programmer
can magically make the round trip cheaper than doing the same
operation inside the database is, bluntly, talking nonsense.
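
A hypothetical side-by-side, reusing the customers and orders tables
sketched earlier in the thread (assume orders.customer_id carries no
constraint yet): the application-level check is a round trip to the
database anyway, and it is racy on top of that, while the declared
constraint does the same work inside the engine on every write.

-- Application-level "RI": the app queries, inspects the result, then
-- writes; another session can delete customer 42 between the two statements.
SELECT 1 FROM customers WHERE customer_id = 42;
INSERT INTO orders (customer_id, total) VALUES (42, 19.99);

-- Database-level RI: one declaration, checked atomically, no extra round trip.
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id);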

Worse, by moving the checking code out to the application, you also
move the maintenance of all that checking code out into the
application, where two different programmers can implement these
one-off checks in subtly different ways, introducing strange,
hard-to-troubleshoot data anomalies that take days to puzzle out and
fix.  Then someone in senior management asks why the database didn't
just catch this on its own, at which point you re-implement all those
foreign keys while _still_ paying the cost of the client-side
validation code (which never gets ripped out), which means that the
whole thing ends up operating _slower_ than any other possible
implementation.  I Have Been In That Meeting.

The whole reason we're using relational databases is so that the
relations can be queried _and maintained_.  Database vendors didn't
invent foreign keys in order to slow down their systems so they could
have angry customers.  They implemented them in order to protect
their customers' data from bugs in application code.  If your data is
worth storing, it's surely worth storing correctly.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.
        --H.W. Fowler

Re: Integrity on large sites

From: Ron Johnson

On 05/23/07 12:48, PFC wrote:
>
>> Some big sites do of course juggle performance vs in-database run-time
>> checks, but the statements as typically presented by MySQL partisans,
>
>     Live from the front :
>
>     This freshly created database has had to endure a multithreaded
> query assault for about 2 hours.
>     It gave up.
>
> TABLE `posts` (
>   `post_id` int(11) NOT NULL auto_increment,
>   `topic_id` int(11) NOT NULL,
> etc...
>
> mysql> SELECT max(post_id) FROM posts;
> +--------------+
> | max(post_id) |
> +--------------+
> |       591257 |
> +--------------+
>
> mysql> INSERT INTO posts (topic_id,post_text,user_id) VALUES (1,'DIE
> BASTARD',666);
> ERROR 1062 (23000): Duplicate entry '591257' for key 1
>
> mysql> CHECK TABLE posts;
> +-------------------+-------+----------+-----------------------------+
> | Table             | Op    | Msg_type | Msg_text                    |
> +-------------------+-------+----------+-----------------------------+
> | forum_bench.posts | check | warning  | Table is marked as crashed  |
> | forum_bench.posts | check | error    | Found 588137 keys of 588135 |
> | forum_bench.posts | check | error    | Corrupt                     |
> +-------------------+-------+----------+-----------------------------+
>
> mysql> REPAIR TABLE posts;
> +-------------------+--------+----------+----------+
> | Table             | Op     | Msg_type | Msg_text |
> +-------------------+--------+----------+----------+
> | forum_bench.posts | repair | status   | OK       |
> +-------------------+--------+----------+----------+
>
> mysql> INSERT INTO posts (topic_id,post_text,user_id) VALUES (1,'DIE
> BASTARD',666);
> Query OK, 1 row affected, 1 warning (0.10 sec)
>
> mysql> SHOW WARNINGS;
> +---------+------+------------------------------------------------+
> | Level   | Code | Message                                        |
> +---------+------+------------------------------------------------+
> | Warning | 1364 | Field 'post_time' doesn't have a default value |
> +---------+------+------------------------------------------------+

What version of that pathetic RDBMS is this?

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!


Re: Integrity on large sites

From: "Alexander Staubo"
On 5/23/07, PFC <lists@peufeu.com> wrote:
> +-------------------+-------+----------+-----------------------------+
> | Table             | Op    | Msg_type | Msg_text                    |
> +-------------------+-------+----------+-----------------------------+
> | forum_bench.posts | check | warning  | Table is marked as crashed  |
> | forum_bench.posts | check | error    | Found 588137 keys of 588135 |
> | forum_bench.posts | check | error    | Corrupt                     |
> +-------------------+-------+----------+-----------------------------+

I find it hard to believe that this is MySQL's fault and not some
problem with your setup. Granted, MySQL is a pretty bad database, but
it's not *that* bad -- your example implies that heavily MyISAM-based
(you don't say whether this is MyISAM or InnoDB) sites such as
Slashdot and Flickr should be falling over every hour.

Alexander.

Re: Integrity on large sites

From: PFC

> What version of that pathetic RDBMS is this?

    MySQL 5.0.40, on Gentoo Linux, Core 2 Duo.

    The table in question takes about 100 inserts/deletes and 600 selects per
second.
    MyISAM isn't able to finish the benchmark. Actually, I have to run REPAIR
TABLE every 20 minutes, since it keeps getting corrupted.

> I find it hard to believe that this is MySQL's fault and not some
> problem with your setup.

    Yeah, me too.
    Is it a MyISAM bug, a Gentoo bug, a hardware bug? Who knows.
    Go into bugs.mysql.com and search for "corrupt" or "corruption".
    Postgres, InnoDB and memtest86 are perfectly happy on this machine.

    However, I have sometimes upgraded MySQL on websites, and found it to
crash repeatedly, then had to downgrade it.

    I submitted this one, consider it my pet bug :
    http://bugs.mysql.com/bug.php?id=28534




Re: Integrity on large sites

From: Tom Allison
On May 23, 2007, at 12:54 PM, Scott Ribe wrote:

>> "*Really* big sites don't ever have referential integrity. Or if
>> the few
>> spots they do (like with financial transactions) it's implemented
>> on the
>> application level (via, say, optimistic locking), never the
>> database level."
>
> Pure, utter, unadulterated bullshit. Speaking as someone who had
> years of
> experience with Sybase SQL Server before either MySQL or PostgreSQL
> were
> even created...
>
> Some big sites do of course juggle performance vs in-database run-time
> checks, but the statements as typically presented by MySQL
> partisans, that
> it's never done in the database level, is just wrong. Whether it's
> a direct
> deception, iow speaker knows it to be false, or an indirect
> deception, iow
> speaker is implying a level of expertise he does not possess,
> either way I
> would categorize it as a lie.

I concur with the claim of organic fertilizer.

I got into a rather spicy argument at the only Rails conference I
went to.  They have this mentality that with Rails you don't need to
put RI in the database because you can always run exists? checks
right before you do the insert to ensure the integrity of your data.  Not
only does this apply to referential integrity, but also to unique
values.  I was damn near screaming at them over the stupidity of such
a notion.
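
A sketch of why the exists?-then-insert pattern cannot actually guarantee
uniqueness (hypothetical table and values, plain SQL):

CREATE TABLE users (
    user_id serial PRIMARY KEY,
    email   text NOT NULL          -- note: no UNIQUE constraint
);

-- The Rails-style pattern, interleaved across two concurrent sessions:
--   session A: SELECT 1 FROM users WHERE email = 'a@example.com';  -- no row
--   session B: SELECT 1 FROM users WHERE email = 'a@example.com';  -- no row
--   session A: INSERT INTO users (email) VALUES ('a@example.com');
--   session B: INSERT INTO users (email) VALUES ('a@example.com');
-- Both "exists?" checks passed, both inserts succeeded: duplicate rows.

-- The database-level fix is one declaration; the race disappears because
-- the slower INSERT simply fails with a unique-violation error:
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);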

My experience is based on working at a rather large company that has
a really huge Oracle database.
When they designed it, they passed up on all referential integrity
and all unique constraints.
After five years, we have tables that are >60% duplicate records and
the database is coming to a standstill.
And there is no method in sight for fixing this one.

Bottom line, if the DBA or anyone else says we can't support RI or UNIQUE
because of the performance overhead... I would be inclined to look
for another DBA.
But I have to admit I am extremely opinionated about this, as I'm
the guy who does most of the performance and metric reporting using
these horrid tables.
It does provide infinite job security, but it's hardly worth it in
the long run.


Re: Integrity on large sites

From: Dave Page
Alexander Staubo wrote:
> On 5/23/07, PFC <lists@peufeu.com> wrote:
>> +-------------------+-------+----------+-----------------------------+
>> | Table             | Op    | Msg_type | Msg_text                    |
>> +-------------------+-------+----------+-----------------------------+
>> | forum_bench.posts | check | warning  | Table is marked as crashed  |
>> | forum_bench.posts | check | error    | Found 588137 keys of 588135 |
>> | forum_bench.posts | check | error    | Corrupt                     |
>> +-------------------+-------+----------+-----------------------------+
>
> I find it hard to believe that this is MySQL's fault and not some
> problem with your setup. Granted, MySQL is a pretty bad database, but
> it's not *that* bad -- your example implies that heavily MyISAM-based
> (you don't say whether this is MyISAM or InnoDB) sites such as
> Slashdot and Flickr should be falling over every hour.

I'm not going to comment on whose fault it is, but the OP quoted 100
updates and 600 selects per *second*. I can't imagine Flickr or Slashdot
(which is heavily cached for reading) are under anything like that sort
of constant load.

Regards, Dave.

Re: Integrity on large sites

From: Ron Johnson

On 05/23/07 20:29, Tom Allison wrote:
[snip]
> Bottom line, if the DBA or anyone says we can't support RI or UNIQUE
> because of the performance overhead...  I would be inclined to look for
> another DBA.
> But I have to admit.  I am extremely opinionated about this as I'm the
> guy who does most of the performance and metric reporting using these
> horrid tables.
> it does provide infinite job security, but it's hardly worth it in the
> long run.

We must be the exception to the rule.

In July 2005 we did a major long-weekend unload-reload archive of
our big (400M row) toll tables.  There was no RI on the tables,
either PK or FK.

When reloading the "keep" data, I created a PK (fortunately the
hashed/clustered design of the table means that PK enforcement is
almost zero-cost) and loaded the data.

There were ZERO duplicates.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!


Re: Integrity on large sites

From: Tom Lane
Dave Page <dpage@postgresql.org> writes:
> I'm not going to comment on who's fault it is, but the OP quoted 100
> updates and 600 selects per *second*. I can't imagine Flickr or Slashdot
> (which is heavily csched for reading) are under anything like that sort
> of constant load.

I'm pretty sure I remember reading that Slashdot had to put enormous
amounts of caching in front of their DB to keep it from falling over
on a regular basis.

            regards, tom lane

Re: Integrity on large sites

From: "Joshua D. Drake"
Dave Page wrote:

>> problem with your setup. Granted, MySQL is a pretty bad database, but
>> it's not *that* bad -- your example implies that heavily MyISAM-based
>> (you don't say whether this is MyISAM or InnoDB) sites such as
>> Slashdot and Flickr should be falling over every hour.
>
> I'm not going to comment on who's fault it is, but the OP quoted 100
> updates and 600 selects per *second*. I can't imagine Flickr or Slashdot
> (which is heavily csched for reading) are under anything like that sort
> of constant load.

Uhmmm.... I would not be surprised at *all* at Slashdot or Flickr doing
that type of velocity. We have customers right now that under peak are
doing 10 times that, and yes, these are customers that have similar types
of websites.

Sincerely,

Joshua D. Drake


>
> Regards, Dave.


--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: Integrity on large sites

From: PFC
    Flickr uses InnoDB, by the way.



On Thu, 24 May 2007 18:07:21 +0200, Joshua D. Drake <jd@commandprompt.com>
wrote:

> Dave Page wrote:
>
>>> problem with your setup. Granted, MySQL is a pretty bad database, but
>>> it's not *that* bad -- your example implies that heavily MyISAM-based
>>> (you don't say whether this is MyISAM or InnoDB) sites such as
>>> Slashdot and Flickr should be falling over every hour.
>>  I'm not going to comment on who's fault it is, but the OP quoted 100
>> updates and 600 selects per *second*. I can't imagine Flickr or Slashdot
>> (which is heavily csched for reading) are under anything like that sort
>> of constant load.
>
> Uhmmm.... I would not be surprised at *all* at slashdot or flickr doing
> that type of velocity. We have customers right now that under peak are
> doing 10 times that and yes these are customers that have similar types
> of websites.
>
> Sincerely,
>
> Joshua D. Drake

Re: Integrity on large sites

From: Dave Page
Joshua D. Drake wrote:
> Dave Page wrote:
>
>>> problem with your setup. Granted, MySQL is a pretty bad database, but
>>> it's not *that* bad -- your example implies that heavily MyISAM-based
>>> (you don't say whether this is MyISAM or InnoDB) sites such as
>>> Slashdot and Flickr should be falling over every hour.
>>
>> I'm not going to comment on who's fault it is, but the OP quoted 100
>> updates and 600 selects per *second*. I can't imagine Flickr or Slashdot
>> (which is heavily csched for reading) are under anything like that sort
>> of constant load.
>
> Uhmmm.... I would not be surprised at *all* at slashdot or flickr doing
> that type of velocity. We have customers right now that under peak are
> doing 10 times that and yes these are customers that have similar types
> of websites.

Well, taking the /. case, it's well known that they generate static pages
periodically, in much the same way as we do for our website, and serve
those rather than hitting the comments database for every web hit.

As for the updates, I imagine they are next to non-existent on /., but if
you look at the comment posts instead, which are probably their most
frequent type of non-select query, at 100 inserts per second that
equates to 8,640,000 comments added per day. Assuming they publish, say,
20 stories per day, that averages 432,000 comments per story, per day.

I don't recall the last time I saw that sort of response to a /. story...

Regards, Dave.

Re: Integrity on large sites

From: "Alexander Staubo"
On 5/24/07, PFC <lists@peufeu.com> wrote:
>         Flickr uses InnoDB, by the way.

The master does. The slaves use MyISAM, according to Cal Henderson:

  http://www.slideshare.net/coolpics/flickr-44054

Alexander.

Re: Integrity on large sites

From: Lew
PFC wrote:
>>>> "*Really* big sites don't ever have referential
>>>> integrity. Or if the few spots they do (like with
>>>> financial transactions) it's implemented on the
>>> application level (via, say, optimistic locking), never the
>>> database level."
>
>     Sure, but in the forum benchmark I just did, when using MyISAM, with
> no referential integrity checks, at the end of the benchmark, there is an
> impressive number of records with broken foreign key relations... when
> the user kills his HTTP connection or reloads at the wrong moment, and
> the script is interrupted, or killed by an exception or whatever, boom.

One assumes you mean implicit foreign key relations, since MyISAM doesn't
enforce them (hence the reason they're "broken", potentiated by the lack of
transaction support).

Sadly, there is a market for wrong answers faster.

--
Lew

Re: Integrity on large sites

From: Ron Mayer
Tom Lane wrote:
> Dave Page <dpage@postgresql.org> writes:
>> I can't imagine Flickr or Slashdot ...
>
> I'm pretty sure I remember reading that Slashdot had to put enormous
> amounts of cacheing in front of their DB to keep it from falling over
> on a regular basis.

Yes, slashdot and flickr both use memcached:

http://lists.danga.com/pipermail/memcached/2005-November/001726.html
"...we use both memcached and squid at flickr. ..."

http://lists.danga.com/pipermail/memcached/2003-July/000017.html
"...Slashdot's been using memcached to serve its comment text"