Thread: Why facebook used mysql ?

Why facebook used mysql ?

From
Sandeep Srinivasa
Date:
There was an interesting post today on highscalability - http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html

The discussion/comments touched upon why MySQL is a better idea for Facebook than Postgres. Here's an interesting one:
 
One is that PG doesn't scale as well on multiple cores as MySQL does nowadays.
Another is a fundamental difference in storage architecture - all MySQL/InnoDB data is either a clustered primary key, or a secondary key with PK pointers - these logical relationships between entries allow index-only scans, which are a must for web-facing databases (good response times, no variance).
One more reason is that in heavily indexed databases vacuum will have to do full index passes, rather than working with an LRU.
As for sharding, etc. - there's no way to scale vertically infinitely - so the "stupid people shard" point is very, very moot.
It is much cheaper to go the commodity hardware path.

or

In general PostgreSQL is faster at complex queries with a lot of joins and such, while MySQL is faster at simple queries such as primary-key lookups.
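The index-only-scan point in the first quoted comment can be illustrated with a toy Python model. This is purely didactic: the dict-based "heap" and "index" structures below are my own illustration, not PostgreSQL or InnoDB internals. A plain secondary index maps a key to the row's primary key, so reading any other column costs an extra heap fetch; a "covering" index also stores the queried column, so the lookup never touches the table.

```python
# Toy model only -- not PostgreSQL or InnoDB internals.
heap = {1: {"name": "alice", "email": "a@example.com"},
        2: {"name": "bob", "email": "b@example.com"}}
heap_reads = 0

plain_index = {"alice": 1, "bob": 2}              # name -> primary key
covering_index = {"alice": (1, "a@example.com"),  # name -> (pk, email)
                  "bob": (2, "b@example.com")}

def email_via_plain(name):
    global heap_reads
    pk = plain_index[name]
    heap_reads += 1            # must visit the heap to get the email
    return heap[pk]["email"]

def email_via_covering(name):
    pk, email = covering_index[name]
    return email               # answered from the index alone

assert email_via_plain("alice") == email_via_covering("alice")
assert heap_reads == 1  # only the plain-index path touched the heap
```

The "no variance" remark follows from the same picture: the covering path does a fixed amount of work per lookup, with no chance of an extra random I/O to the table.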

I wonder if anyone can comment on this - especially the part that PG doesn't scale as well as MySQL on multiple cores?

regards
Sandeep

Re: Why facebook used mysql ?

From
Richard Broersma
Date:
On Mon, Nov 8, 2010 at 8:24 PM, Sandeep Srinivasa <sss@clearsenses.com> wrote:
> I wonder if anyone can comment on this - especially the part that PG doesn't
> scale as well as MySQL on multiple cores?

Sorry Sandeep, there may be some that love to re-re-re-hash these
subjects.  I myself am losing interest.

The following link contains hundreds of comments that you may be
interested in, some that address issues that are much more interesting
and well established:

http://search.postgresql.org/search?q=mysql+performance&m=1&l=NULL&d=365&s=r&p=1



--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

Re: Why facebook used mysql ?

From
Sandeep Srinivasa
Date:
On Tue, Nov 9, 2010 at 10:31 AM, Richard Broersma <richard.broersma@gmail.com> wrote:
The following link contains hundreds of comments that you may be
interested in, some that address issues that are much more interesting
and well established:

http://search.postgresql.org/search?q=mysql+performance&m=1&l=NULL&d=365&s=r&p=1
 
I did actually try to search for topics on multiple cores vs MySQL, but I wasn't able to find anything of much use. Elsewhere (on Hacker News, for example), I have indeed come across statements that PG scales better on multiple cores, which are usually offset by claims that MySQL is better.

Google isn't of much use for this either - while MySQL has several resources talking about benchmarks/tuning on multi-core servers (e.g. http://dimitrik.free.fr/blog/archives/2010/09/mysql-performance-55-notes.html), I can't find any such serious discussion on PostgreSQL.

However, what I did find (http://www.pgcon.org/2008/schedule/events/72.en.html) was titled "Problems with PostgreSQL on Multi-core Systems with Multi-Terabyte Data" (interestingly, published by the PostgreSQL Performance Team @ Sun).

Ergo, my question still stands - maybe my google-fu was bad... which is why I am asking for help.

regards
Sandeep

Re: Why facebook used mysql ?

From
Scott Marlowe
Date:
On Mon, Nov 8, 2010 at 10:47 PM, Sandeep Srinivasa <sss@clearsenses.com> wrote:

> I did actually try to search for topics on multiple cores vs MySQL, but I
> wasn't able to find anything of much use. Elsewhere (on Hacker News, for
> example), I have indeed come across statements that PG scales better on
> multiple cores, which are usually offset by claims that MySQL is better.
> Google isn't of much use for this either - while MySQL has several resources
> talking about benchmarks/tuning on multi-core servers
> (e.g. http://dimitrik.free.fr/blog/archives/2010/09/mysql-performance-55-notes.html),
> I can't find any such serious discussion on PostgreSQL

Part of that is that 48-core machines with memory buses fast enough
to feed those cores are only now coming out in affordable packages
($10k or so for a machine with a handful of drives), so they're only
just getting tested.  I have 8-core and 12-core older-gen AMDs with
DDR667 and DDR800 memory, and neither scales past 8 cores, but that
limitation is due more to the slower HT bus on the older AMDs.
With the much faster HT buses on the 6xxx series Magny Cours CPUs
they scale right out to 40+ cores or so, and give great numbers.  The
taper as you go past 48 processes isn't too bad.  With proper pooling
to keep the number of active connections at or below say 50, it should
run well for a pretty huge load.  And in everyday operation they are
always responsive, even when things aren't going quite right
otherwise.
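The pooling idea above can be sketched roughly like so. This is a hypothetical Python toy using a semaphore as the admission gate; a real deployment would put an external pooler (pgbouncer, for instance) in front of Postgres, and the names here are illustrative, not a real pooler API.

```python
import threading

MAX_ACTIVE = 50                     # cap on simultaneously active queries
gate = threading.BoundedSemaphore(MAX_ACTIVE)
state_lock = threading.Lock()
active = 0
peak = 0

def run_query(query_id):
    global active, peak
    with gate:                      # blocks while MAX_ACTIVE queries run
        with state_lock:
            active += 1
            peak = max(peak, active)
        # ... execute the query against the database here ...
        with state_lock:
            active -= 1

# 200 clients arrive, but the database never sees more than 50 at once.
threads = [threading.Thread(target=run_query, args=(i,)) for i in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert peak <= MAX_ACTIVE           # the gate held the line
```

The point is that excess clients queue in the pooler rather than piling up as backend processes, which is what keeps the machine responsive under load.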

> However, what I did find
> (http://www.pgcon.org/2008/schedule/events/72.en.html) was titled "Problems
> with PostgreSQL on Multi-core Systems with Multi-Terabyte Data"
>  (interestingly, published by the Postgresql Performance Team @ Sun)

We're not a company selling a product, we're enthusiasts racing our
databases on the weekends, so to speak, and if someone has ideas on
what's slow and how to make it faster we talk about it.  :)   That
paper wasn't saying that postgresql is problematic at large scale so
much as addressing the problems that arise when you get there, and
looking ahead to ways of improving performance.

> Ergo, my question still stands - maybe my google-fu was bad... which is why
> I am asking for help.

To know if either is a good choice you really need to say what you're
planning on doing.  If you're building a petabyte-sized data warehouse,
look at what Yahoo did with a custom hacked version of pgsql.  If
you're gonna build another Facebook, look at what they did.  They're
both very different applications of a "database".

So, your question needs more substance.  What do you want to do with your db?
--
To understand recursion, one must first understand recursion.

Re: Why facebook used mysql ?

From
Merlin Moncure
Date:
On Mon, Nov 8, 2010 at 11:24 PM, Sandeep Srinivasa <sss@clearsenses.com> wrote:
> There was an interesting post today on highscalability
> - http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html
> The discussion/comments touched upon why mysql is a better idea for Facebook
> than Postgres. Here's an interesting one

postgresql might not be a good fit for this type of application, but
the reasoning given in the article is really suspicious.  The true
answer was hinted at in the comments: "we chose it first, and there
was never a reason to change it".  It really comes down to they
probably don't need much from the database other than a distributed
key value store, and they built a big software layer on top of that to
manage it.  Hm, I use facebook and I've seen tons of inconsistent
answers, missing notifications and such.  I wonder if there's a
connection there...

merlin

Re: Why facebook used mysql ?

From
Allan Kamau
Date:
On Tue, Nov 9, 2010 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Nov 8, 2010 at 11:24 PM, Sandeep Srinivasa <sss@clearsenses.com> wrote:
>> There was an interesting post today on highscalability
>> - http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html
>> The discussion/comments touched upon why mysql is a better idea for Facebook
>> than Postgres. Here's an interesting one
>
> postgresql might not be a good fit for this type of application, but
> the reasoning given in the article is really suspicious.  The true
> answer was hinted at in the comments: "we chose it first, and there
> was never a reason to change it".  It really comes down to they
> probably don't need much from the database other than a distributed
> key value store, and they built a big software layer on top of that to
> manage it.  Hm, I use facebook and I've seen tons of inconsistent
> answers, missing notifications and such.  I wonder if there's a
> connection there...
>
> merlin
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

I agree with Merlin.  There is a surprisingly big number of "good"
technology companies (including Google) out there using MySQL.  For
some time I have been wondering why, and have come up with a few
(possibly wrong) theories.  Such as: these companies are started by
application developers, not database experts; the cost (effort) of
changing to another database engine is substantial, given that there
are probably already so many inconsistencies in their current data
setup, coupled with a considerable amount of inconsistency cover-up
code in the application programs; and maybe the IT team is doubling up
as a fire-fighting department, constantly putting out the data-driven
fires.  This is then compounded by the rapid increase in data.

Allan.

Re: Why facebook used mysql ?

From
Scott Ribe
Date:
On Nov 9, 2010, at 7:04 AM, Allan Kamau wrote:

> have come up with a few
> (possibly wrong) theories.

They all sound reasonable. I think you missed an important one though: aggressive (and even sometimes outright false)
promotion and sales by the company MySQL AB.

When I started looking at databases, you didn't have to look very hard to find PostgreSQL, but you did have to at least
make a minimal effort.

Also, my understanding is that if you go way back on the PostgreSQL timeline to versions 6 and the earliest 7.x, it was a
little shaky. (I started with 7.3 or 7.4, and it has been rock solid.)

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: Why facebook used mysql ?

From
Vick Khera
Date:
On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote:
> Also, my understanding is that if you go way back on the PostgreSQL timeline to versions 6 and the earliest 7.x, it was
> a little shaky. (I started with 7.3 or 7.4, and it has been rock solid.)
>

In those same times, mysql was also, um, other than rock solid.  I
have somewhere a personal email from Monty describing how to
crash-recover corrupted myisam data files.  (I was customer number 13,
I believe... I wish I still had that support contract certificate as
an artifact.)

Re: Why facebook used mysql ?

From
Tom Lane
Date:
Vick Khera <vivek@khera.org> writes:
> On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote:
>> Also, my understanding is that if you go way back on the PostgreSQL timeline to versions 6 and the earliest 7.x, it was
>> a little shaky. (I started with 7.3 or 7.4, and it has been rock solid.)

> In those same times, mysql was also, um, other than rock solid.

I don't have enough operational experience with mysql to speak to how
reliable it was back in the day.  What it *did* have over postgres back
then was speed.  It was a whole lot faster, particularly on the sort of
single-stream-of-simple-queries cases that people who don't know
databases are likely to set up as benchmarks.  (mysql still beats us on
cases like that, though not by as much.)  I think that drove quite a
few early adoption decisions, and now folks are locked in; the cost of
conversion outweighs the (perceived) benefits.

            regards, tom lane

Re: Why facebook used mysql ?

From
Dmitriy Igrishin
Date:
Hey all,

IMO they chose MySQL because they had no knowledge of PostgreSQL
or of sound database design. Just a pile of data for SELECTing, with
minimal effort spent on data integrity and database server
programming (a la the typical PHP project).
Sorry :-)

2010/11/9 Tom Lane <tgl@sss.pgh.pa.us>
Vick Khera <vivek@khera.org> writes:
> On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote:
>> Also, my understanding is that if you go way back on the PostgreSQL timeline to versions 6 and earliest 7.x, it was a little shaky. (I started with 7.3 or 7.4, and it has been rock solid.)

> In those same times, mysql was also, um, other than rock solid.

I don't have enough operational experience with mysql to speak to how
reliable it was back in the day.  What it *did* have over postgres back
then was speed.  It was a whole lot faster, particularly on the sort of
single-stream-of-simple-queries cases that people who don't know
databases are likely to set up as benchmarks.  (mysql still beats us on
cases like that, though not by as much.)  I think that drove quite a
few early adoption decisions, and now folks are locked in; the cost of
conversion outweighs the (perceived) benefits.

                       regards, tom lane




--
// Dmitriy.


Re: Why facebook used mysql ?

From
"Gauthier, Dave"
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tom Lane
> Sent: Tuesday, November 09, 2010 10:55 AM
> To: Vick Khera
> Cc: Scott Ribe; Allan Kamau; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Why facebook used mysql ?
>
> Vick Khera <vivek@khera.org> writes:
> > On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote:
> >> Also, my understanding is that if you go way back on the PostgreSQL timeline to versions 6 and the earliest 7.x, it
> >> was a little shaky. (I started with 7.3 or 7.4, and it has been rock solid.)
>
> > In those same times, mysql was also, um, other than rock solid.

> I don't have enough operational experience with mysql to speak to how
> reliable it was back in the day.  What it *did* have over postgres back
> then was speed.  It was a whole lot faster, particularly on the sort of
> single-stream-of-simple-queries cases that people who don't know
> databases are likely to set up as benchmarks.  (mysql still beats us on
> cases like that, though not by as much.)  I think that drove quite a
> few early adoption decisions, and now folks are locked in; the cost of
> conversion outweighs the (perceived) benefits.

A different slant on this has to do with licensing and $$. Might Oracle decide some day to start charging for their
new-found DB?  They are a for-profit company that's beholden to their shareholders LONG before an open software community.
Consumers like Facebook and Google have deep pockets, something corporate executives really don't dismiss lightly.


Re: Why facebook used mysql ?

From
David Boreham
Date:
Also there's the strange and mysterious valley group-think syndrome.
I've seen this with several products/technologies over the years.
I suspect it comes from the VCs, but I'm not sure. The latest example
is "you should be using EC2". There always follows a discussion where
I can present 50 concrete reasons based on hard experience why
the suggestion is a bad idea and the other person presents nothing
besides "everyone's doing it". I saw exactly the same thing with MySQL
a few years ago. Before that it was Oracle. It's often easier to go along
with the flow and get some work done vs. trying to argue.



Re: Why facebook used mysql ?

From
Cédric Villemain
Date:
2010/11/9 Tom Lane <tgl@sss.pgh.pa.us>:
> Vick Khera <vivek@khera.org> writes:
>> On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote:
>>> Also, my understanding is that if you go way back on the PostgreSQL timeline to versions 6 and the earliest 7.x, it was
>>> a little shaky. (I started with 7.3 or 7.4, and it has been rock solid.)
>
>> In those same times, mysql was also, um, other than rock solid.
>
> I don't have enough operational experience with mysql to speak to how
> reliable it was back in the day.  What it *did* have over postgres back
> then was speed.  It was a whole lot faster, particularly on the sort of
> single-stream-of-simple-queries cases that people who don't know
> databases are likely to set up as benchmarks.  (mysql still beats us on
> cases like that, though not by as much.)  I think that drove quite a
> few early adoption decisions, and now folks are locked in; the cost of
> conversion outweighs the (perceived) benefits.

Facebook has written: "Flashcache [is] built primarily as a block
cache for InnoDB but is general purpose and can be used by other
applications as well."

https://github.com/facebook/flashcache/

A good tool, by the way.  It is the only place where I like to see
SSD disks.  (Not at Facebook, but with 'volatile' data.)

>
>                        regards, tom lane
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: Why facebook used mysql ?

From
Merlin Moncure
Date:
On Tue, Nov 9, 2010 at 10:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Vick Khera <vivek@khera.org> writes:
>> On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote:
>>> Also, my understanding is that if you go way back on the PostgreSQL timeline to versions 6 and the earliest 7.x, it was
>>> a little shaky. (I started with 7.3 or 7.4, and it has been rock solid.)
>
>> In those same times, mysql was also, um, other than rock solid.
>
> I don't have enough operational experience with mysql to speak to how
> reliable it was back in the day.  What it *did* have over postgres back
> then was speed.  It was a whole lot faster, particularly on the sort of
> single-stream-of-simple-queries cases that people who don't know
> databases are likely to set up as benchmarks.  (mysql still beats us on
> cases like that, though not by as much.)  I think that drove quite a
> few early adoption decisions, and now folks are locked in; the cost of
> conversion outweighs the (perceived) benefits.

Postgres 7.2 brought non-blocking vacuum.   Before that, you could
pretty much write off any 24x7-duty applications -- dealing with dead
tuples was just too much of a headache.   The mysql of the time, 3.23,
was fast but locky and utterly unsafe.  It had been easier to run
than Postgres, though, until recently (8.4 really changed things).

Postgres has been relatively disadvantaged in terms of administrative
overhead which is a bigger deal than sql features, replication,
performance, etc for high load website type cases.  heap FSM, tunable
autovacuum, checkpoint management, smarter/faster statistics
collector, and more backup options may not be as sexy as replication
etc but are very appealing features if you are running 50 database
servers backing a monster web site.   Dumping sys v ipc for mmap is a
hypothetical improvement in that vein :-) (aiui, it is not possible
though).

merlin

Re: Why facebook used mysql ?

From
Andy
Date:
--- On Tue, 11/9/10, Gauthier, Dave <dave.gauthier@intel.com> wrote:

> A different slant on this has to do with licensing and $$.
> Might Oracle decide some day to start charging for their new
> found DB?  They are a for-profit company that's
> beholden to their shareholders LONG before an open software
> community.  Consumers like Facebook and Google have
> deep pockets, something corporate executives really don't
> dismiss lightly.

This is just FUD.

MySQL is GPL'd, just like Linux is.

To say you should avoid MySQL because Oracle may someday start charging for it is like saying you should avoid Linux
because Red Hat may someday start charging for it.

That makes no sense, especially since both Oracle and Red Hat are already charging for their products. Doesn't mean you
can't keep using free Linux and MySQL.




Re: Why facebook used mysql ?

From
"Gauthier, Dave"
Date:
Think upgrades

-----Original Message-----
From: Andy [mailto:angelflow@yahoo.com]
Sent: Tuesday, November 09, 2010 12:02 PM
To: pgsql-general@postgresql.org; Gauthier, Dave
Subject: Re: [GENERAL] Why facebook used mysql ?


--- On Tue, 11/9/10, Gauthier, Dave <dave.gauthier@intel.com> wrote:

> A different slant on this has to do with licensing and $$.
> Might Oracle decide some day to start charging for their new
> found DB?  They are a for-profit company that's
> beholden to their shareholders LONG before an open software
> community.  Consumers like Facebook and Google have
> deep pockets, something corporate executives really don't
> dismiss lightly.

This is just FUD.

MySQL is GPL'd, just like Linux is.

To say you should avoid MySQL because Oracle may someday start charging for it is like saying you should avoid Linux
because Red Hat may someday start charging for it.

That makes no sense, especially since both Oracle and Red Hat are already charging for their products. Doesn't mean you
can't keep using free Linux and MySQL.




Re: Why facebook used mysql ?

From
Scott Marlowe
Date:
On Tue, Nov 9, 2010 at 10:00 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Postgres 7.2 brought non blocking vacuum.   Before that, you could
> pretty much write off any 24x7 duty applications -- dealing with dead
> tuples was just too much of a headache.

Amen!  I remember watching vacuum run alongside other queries and
getting all school-girl giggly over it.  Seriously, it was a big, big
change for pgsql.

> The mysql of the time, 3.23,
> was fast but locky and utterly unsafe.

True, it was common to see mysql back then just stop, dead.  Go to
bring it back up and have to repair tables.

> Postgres has been relatively disadvantaged in terms of administrative
> overhead which is a bigger deal than sql features, replication,
> performance, etc for high load website type cases.

I would say it's a bigger problem for adoption than for high load
sites.  If Joe User spends an hour a day keeping his database on his
workstation happy, he's probably not happy.  If Joe Admin spends an
hour a day keeping his 100 machine db farm happy, he's probably REALLY
happy that it only takes so long.

Re: Why facebook used mysql ?

From
Chris Browne
Date:
kamauallan@gmail.com (Allan Kamau) writes:
> I agree with Merlin, There is a surprising big number of "good"
> technology companies (including Google) out there using MySQL. For
> sometime I have been wondering why and have come up with a few
> (possibly wrong) theories. Such as: these companies are started by
> application developers not database experts, the cost (effort) of
> changing to other database engine is substantial given that that
> probably there is already so much inconsistencies in their current
> data setup coupled with considerable amount of inconsistency cover-up
> code at the application programs, and maybe the IT team is doubling up
> as a fire fighting department constantly putting out the data driven
> fires. This is then compounded by the rapid increase in data.

This wasn't a good explanation for what happened when Sabre announced
they were using MySQL:

   http://www.mysql.com/news-and-events/generate-article.php?id=2003_33

I used to work at Sabre, and what I saw was *mostly* an Oracle shop, but
with significant bastions of IMS, DB2, Teradata, and Informix.  Your
theory might fit with "dumb startups," but certainly not with Sabre,
which still has significant deployments of IMS!  :-)

I actually am inclined to go with "less rational" explanations; a lot of
decisions get made for reasons that do not connect materially (if at
all) with the technical issues.

One such would be that the lawyers and marketing folk that tend to be at
the executive layer do *their* thing of making deals, and when they're
busy "making deals," the only people interfacing with them are:

 - Salescritters from the Big O buying them lunch

 - Other Political Animals that Made The Decision to go with MySQL (or
   such) and are happy to explain, over golf, that "it went fine for us"
   (even if it didn't go entirely so fine; they didn't hear about it)

Lunch and golf can have material effects.
--
"cbbrowne","@","acm.org"
Rules of the Evil Overlord #67.  "No matter how many shorts we have in
the system, my  guards will be instructed to  treat every surveillance
camera malfunction as a full-scale emergency."

Re: Why facebook used mysql ?

From
Graham Leggett
Date:
On 09 Nov 2010, at 7:16 PM, Gauthier, Dave wrote:

> Think upgrades

This is covered by the GPL license. Once you have released code under
the GPL, all derivative code - i.e. upgrades - has to also be released
in source form, under the GPL license.

Regards,
Graham
--


Re: Why facebook used mysql ?

From
Andy
Date:
Any upgrades that are based on the MySQL source code will be legally required to be released under GPL too.

That's the beauty of GPL.

Software under MIT or BSD license could be hijacked by private companies. Software under GPL license could not.






Re: Why facebook used mysql ?

From
David Boreham
Date:
On 11/9/2010 10:27 AM, Graham Leggett wrote:
>
> This is covered by the GPL license. Once you have released code under
> the GPL, all derivative code - ie upgrades - have to also be released
> in source form, under the GPL license.

Sorry but this is 100% not true. It may be true for a 3rd party (you
release something under the GPL, I enhance it, therefore I am required
to release my enhancement under the GPL). But Oracle owns the copyright
to the MySQL code and therefore they can decide to do whatever they want
with it. The only thing they can't do is to 'un-release' existing code
released under the GPL. Everything else is possible.

Ownership of the copyright trumps the GPL.



Re: Why facebook used mysql ?

From
Dave Page
Date:
On Tue, Nov 9, 2010 at 9:28 AM, Andy <angelflow@yahoo.com> wrote:
> Any upgrades that are based on the MySQL source code will be legally required to be released under GPL too.
>
> That's the beauty of GPL.

Upgrades released by Oracle *do not* have to be under the GPL. They own
all the IP, and can release future versions under whatever terms they
see fit.

Other entities do have to use the GPL if they release their own updates.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Why facebook used mysql ?

From
David Boreham
Date:
In addition to the license a product is currently available under,
you need to also consider who owns its copyright; who owns
its test suite (which may not be open source at all); who
employs all the people who understand the code and who owns
the trademarks that identify the product.

Red Hat owns none of these things with respect to Linux
(although they do for various other products such as
their Directory Server and JBoss).



Re: Why facebook used mysql ?

From
Scott Marlowe
Date:
On Tue, Nov 9, 2010 at 10:28 AM, Andy <angelflow@yahoo.com> wrote:
> Any upgrades that are based on the MySQL source code will be legally required to be released under GPL too.
>
> That's the beauty of GPL.

This isn't entirely true.  Oracle owns all copyrights to the mysql
source code.  They can release a binary-only, commercially licensed
version with features that they choose NOT to release under the GPL.

Re: Why facebook used mysql ?

From
Andy
Date:
Not true.

As a condition of getting the European Commission's approval of its acquisition of Sun/MySQL, Oracle had to agree to
continue the GPL release.

And there are non-Oracle upgrades from Google, facebook, Percona, etc. So no one is beholden to Oracle.





Re: Why facebook used mysql ?

From
Lincoln Yeoh
Date:
At 12:24 PM 11/9/2010, Sandeep Srinivasa wrote:
>There was an interesting post today on highscalability -
>http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html
>
>I wonder if anyone can comment on this - especially the part that PG
>doesn't scale as well as MySQL on multiple cores?

The "multiple cores" part is unlikely to be very relevant in the
Facebook context.

Scaling over four or eight cores is not a biggie nowadays, right?  From
what I've seen, the Facebook- and Google-type companies tend to use
LOTS of cheap servers (dual core, quad core, whatever Intel or AMD can
churn out cheaply).  I doubt they use >=32 core machines for their
public-facing apps.

What's more important to such companies is the ability to scale over
multiple machines. There is no way a single server is going to handle
1 billion users.

So they have to design and build their apps accordingly, to not need
"massive serialization/locking". When you post something on Facebook,
nobody else has to wait for your post first. Nobody cares if their FB
friend/likes counter is "somewhat" wrong for a while (as long as it
eventually shows the correct figure). So scaling out over multiple
machines and "sharding" isn't as hard.

Whereas if you require all posts to have a globally unique ID taken
from an integer sequence that increases without any gaps, it becomes
a rather difficult problem to scale out over many machines. No matter
how many machines you have and wherever they are in the world, every
post has to wait for the sequence.
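A common way around that global sequence can be sketched in a few lines of Python. This is an illustrative toy, not Facebook's actual scheme: each shard mints IDs on its own by packing its shard number into the low bits of a shard-local counter, so no globally coordinated, gap-free sequence is ever needed.

```python
# Illustrative only -- not any particular company's real ID scheme.
NUM_SHARDS = 16

def make_id(shard, local_counter):
    # high bits: per-shard counter; low bits: which shard minted it
    return local_counter * NUM_SHARDS + shard

# 16 shards each mint 1000 IDs with zero cross-shard coordination,
# and every ID is still globally unique (though not gap-free).
ids = {make_id(s, c) for s in range(NUM_SHARDS) for c in range(1000)}
assert len(ids) == NUM_SHARDS * 1000
```

The trade-off is exactly the one described above: you give up the dense, strictly increasing sequence, and in exchange no post ever waits on another machine.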

As for why they used mysql- probably the same reason why they used
php. They're what the original developers used. It doesn't matter so
much as long as the app is not that slow and can scale out without
too much pain.

Regards,
Link.


Re: Why facebook used mysql ?

From
David Boreham
Date:
On 11/9/2010 10:45 AM, Andy wrote:
> As a condition of getting the European Commission's approval of its acquisition of Sun/MySQL, Oracle had to agree to
> continue the GPL release.

In case anyone is interested in what specifically Oracle agreed to do,
this is the text
from the decision (they agreed to do the following for 5 years
post-deal-closing) :

"Commitment to enhance MySQL in the future under the GPL. Oracle shall
continue to
enhance MySQL and make subsequent versions of MySQL, including Version 6,
available under the GPL. Oracle will not release any new, enhanced
version of MySQL
Enterprise Edition without contemporaneously releasing a new, also
enhanced version
of MySQL Community Edition licensed under the GPL. Oracle shall continue
to make
the source code of all versions of MySQL Community Edition publicly
available at no
charge."







Re: Why facebook used mysql ?

From
Dave Page
Date:
On Tue, Nov 9, 2010 at 5:45 PM, Andy <angelflow@yahoo.com> wrote:
> Not true.
>
> As a condition of getting the European Commission's approval of its acquisition of Sun/MySQL, Oracle had to agree to continue the GPL release.
>
> And there are non-Oracle upgrades from Google, facebook, Percona, etc. So no one is beholden to Oracle.

It is true. The EU commitments are entirely independent of the licencing.

Also note that the commitments
(http://www.oracle.com/us/corporate/press/042364) are pretty loosely
worded. For example, Oracle would be fulfilling them if they released
a new Enterprise version with 1000 new features, and a corresponding
community version with just one of those features.

And after 5 years (nearer to 4 now I guess), those commitments get
thrown away entirely.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Why facebook used mysql ?

From
Tom Lane
Date:
David Boreham <david_list@boreham.org> writes:
> In addition to the license a product is currently available under,
> you need to also consider who owns its copyright; who owns
> its test suite (which may not be open source at all); who
> employs all the people who understand the code and who owns
> the trademarks that identify the product.

Indeed.  One thing that's particularly worth noting is that the mysql
documentation is not, and never has been, freely redistributable.

The *real* license risk for mysql in the past was that all the people
who were qualified to do serious development worked for the same
company.  If that company chose to stop releasing updates for free ---
as they absolutely had the legal right to do --- the fact that you had
the source code for previous releases wasn't really going to do you a
whole lot of good.  (Now it's possible that users could band together
to start their own fork from the last GPL release, and eventually get to
the point of doing useful development.  We've seen that movie before,
in fact: it's called postgres, circa 1996 right after Berkeley abandoned
it.  So one could hope that after several years you might have a viable
development community, but there's gonna be a lot of pain first.)

The recent fragmentation of development talent over in the mysql world
might change things, but it's still very unclear what the long-term
result will be.  If I were about to choose a database to bet my company
on, I'd be afraid of picking mysql simply because its future development
path isn't clear.  Oracle may own the copyright, but all the key
developers left, so it's definitely not clear that they'll be able to do
much with it for some time to come (even assuming that they want to).
And who knows which of the forks will succeed?

            regards, tom lane

Re: Why facebook used mysql ?

From
Sandeep Srinivasa
Date:
hi,
   I am the OP.

With due respect to everyone (and sincere apologies to Richard Broersma), my intention was not to create a thread about MySQL/Oracle's business practices.

It was about the technical discussion on Highscalability - I have been trying to wrap my head around the concept of multiple-core scaling for Postgres, especially beyond 8 cores (like Scott's Magny-Cours example). My question is whether Postgres depends on the kernel scheduler for multiple CPU/core utilization.

If that is the case, then does using FreeBSD vs Linux give rise to any differences in scaling?

Taking the question one step further, do different Linux kernels (and schedulers) impact Postgres scalability? The Phoronix Test Suite already tests Linux kernel releases for performance regressions w.r.t. Postgres DB performance (e.g. http://www.phoronix.com/scan.php?page=article&item=linux_perf_regressions&num=1), but doesn't particularly focus on multiple cores.

Is it something that should be benchmarked ?

thanks
-Sandeep

P.S. on the topic of scalability, here is another article - http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html , where people have asked if a similar thing can be done using Postgres UDF or a marshalling  ODBA  http://scm.ywesee.com/?p=odba/.git;a=summary

Re: Why facebook used mysql ?

From
David Boreham
Date:
On 11/9/2010 11:10 AM, Sandeep Srinivasa wrote:
> It was about the technical discussion on Highscalability - I have been
> trying to wrap my head around the concept of multiple core scaling for
> Postgres, especially beyond 8 core (like Scott's Magny Coeurs
> example). My doubt arises from  whether Postgres depends on the kernel
> scheduler for multiple CPU/core utilization.
>
> If that is the case, then does using FreeBSD vs Linux give rise to any
> differences in scaling?

Hmm...typically multi-core scaling issues are in the area of memory
contention and cache coherence (and therefore are for the most part not
dependent on the OS and its scheduler).



Re: Why facebook used mysql ?

From
Sandeep Srinivasa
Date:


On Tue, Nov 9, 2010 at 11:46 PM, David Boreham <david_list@boreham.org> wrote:

Hmm...typically multi-core scaling issues are in the area of memory contention and cache coherence (and therefore are for the most part not dependent on the OS and its scheduler).

If it is independent of the OS, then how does one go about tuning it?

Consider this - I get a 12 core server on which I want multiple webserver instances + DB. Can one create CPU pools (say core 1,2,3 for webservers, 4,5,6,7 for DB, etc.) ?

I know about taskset, but should one be using it ?


Re: Why facebook used mysql ?

From
Scott Marlowe
Date:
On Tue, Nov 9, 2010 at 11:10 AM, Sandeep Srinivasa <sss@clearsenses.com> wrote:
> hi,
>    I am the OP.
> With due respect to everyone (and sincere apologies to Richard Broersma), my
> intention was not to create a thread about MySQL/Oracle's business
> practices.

Hehe, we head off on tangents.  It's common, don't worry.

> It was about the technical discussion on Highscalability - I have been
> trying to wrap my head around the concept of multiple core scaling for
> Postgres, especially beyond 8 core (like Scott's Magny Coeurs example). My
> doubt arises from  whether Postgres depends on the kernel scheduler for
> multiple CPU/core utilization.
> If that is the case, then does using FreeBSD vs Linux give rise to any
> differences in scaling?

All multi-process applications like pgsql have to depend on the OS
kernel scheduler to get their processes run.  But in terms of scaling,
that's usually not the biggest issue; it's getting rid of choke points
in the kernel, like much earlier Linux versions having one big spin
lock on huge chunks of the kernel.  That's been gone a long time, but
as the number of cores keeps going up, new chokepoints are found and
fixed in both Linux and BSD.

> Taking the question one step further, do different Linux kernels (and
> schedulers) impact Postgres scalability ? The Phoronix Test Suite already
> tests linux kernel releases for regressions in performance w.r.t postgres DB
> performance

The IO scheduler mostly just gets in the way on bigger machines with
battery backed caching controllers and / or SAN arrays.

> (e.g http://www.phoronix.com/scan.php?page=article&item=linux_perf_regressions&num=1),
> but doesnt particularly focus on multiple cores.
> Is it something that should be benchmarked ?

Yes.  Sadly, to do so you really need a $7500 machine.

Re: Why facebook used mysql ?

From
Graham Leggett
Date:
On 09 Nov 2010, at 7:30 PM, David Boreham wrote:

> Sorry but this is 100% not true. It may be true for a 3rd party (you
> release something under the GPL, I enhance it, therefore I am
> required to release my enhancement under the GPL). But Oracle owns
> the copyright to the MySql code and therefore they can decide to do
> whatever they want with it. The only thing they can't do is to 'un-
> release' existing code released under the GPL. Everything else is
> possible.
>
> Ownership of the copyright trumps the GPL.

Ownership of the copyright is owned by whoever made the contribution,
and any competent version control system will give you the list of
contributions (and therefore contributors). If a contribution was made
in terms of the GPL, then permission would need to be sought from
everyone who has made a contribution before it could be released under
a different license.

Regards,
Graham
--


Re: Why facebook used mysql ?

From
Scott Marlowe
Date:
On Tue, Nov 9, 2010 at 1:04 PM, Graham Leggett <minfrin@sharp.fm> wrote:
> On 09 Nov 2010, at 7:30 PM, David Boreham wrote:
>
>> Sorry but this is 100% not true. It may be true for a 3rd party (you
>> release something under the GPL, I enhance it, therefore I am required to
>> release my enhancement under the GPL). But Oracle owns the copyright to the
>> MySql code and therefore they can decide to do whatever they want with it.
>> The only thing they can't do is to 'un-release' existing code released under
>> the GPL. Everything else is possible.
>>
>> Ownership of the copyright trumps the GPL.
>
> Ownership of the copyright is owned by whoever made the contribution, and
> any competent version control system will give you the list of contributions
> (and therefore contributors). If a contribution was made in terms of the
> GPL, then permission would need to be sought from everyone who has made a
> contribution before it could be released under a different license.

Code contributed to MySQL AB MUST have its copyright assigned to MySQL
AB.  If it's been incorporated into MySQL proper, it's owned by MySQL
AB, now Oracle.

Re: Why facebook used mysql ?

From
Tom Lane
Date:
Scott Marlowe <scott.marlowe@gmail.com> writes:
> On Tue, Nov 9, 2010 at 1:04 PM, Graham Leggett <minfrin@sharp.fm> wrote:
>> Ownership of the copyright is owned by whoever made the contribution, and
>> any competent version control system will give you the list of contributions
>> (and therefore contributors). If a contribution was made in terms of the
>> GPL, then permission would need to be sought from everyone who has made a
>> contribution before it could be released under a different license.

> Contributed code to MySQL AB MUST be assigned copyright to MySQL AB.

Yeah, MySQL AB and successors have been very careful to ensure that they
have air-tight ownership of that code.  I've been asked for copyright
assignments for four-line patches :-(

            regards, tom lane

Re: Why facebook used mysql ?

From
Dann Corbit
Date:

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Sandeep Srinivasa
Sent: Tuesday, November 09, 2010 10:10 AM
To: Lincoln Yeoh
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Why facebook used mysql ?

 

Regarding scaling, there is an interesting NoSQL engine called Kyoto cabinet that has some testing in high volume transactions under different loads and different conditions.

The Kyoto cabinet data engine is written by the Tokyo cabinet author Mr. Hirabayashi of Fallabs.com.  In this document:

http://fallabs.com/kyotocabinet/spex.html

We find something interesting.  In the section called Transaction, we have this:

=============================================================
default
  risk on process crash: Some records may be missing.
  risk on system crash:  Some records may be missing.
  performance penalty:   none
  remark: Auto recovery after a crash will take time in proportion to the database size.

transaction
  implicit usage: open(..., BasicDB::OAUTOTRAN);
  explicit usage: begin_transaction(false); ...; end_transaction(true);
  risk on process crash: none
  risk on system crash:  Some records may be missing.
  performance penalty:   Throughput will be down to about 30% or less.

transaction + synchronize
  implicit usage: open(..., BasicDB::OAUTOTRAN | BasicDB::OAUTOSYNC);
  explicit usage: begin_transaction(true); ...; end_transaction(true);
  risk on process crash: none
  risk on system crash:  none
  performance penalty:   Throughput will be down to about 1% or less.
=============================================================

 

Notice that there is a 3:1 penalty for flushing requests from the program to the operating system, and a 100:1 penalty for hard-flushing from the operating system to the disk.

So the simple way to scale to huge volumes is simply to allow data loss.  That is a major way in which NoSQL data systems can achieve absurd transaction rates.
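The flush hierarchy described above is easy to observe directly. A small sketch (the absolute timings will vary wildly with hardware; on SSDs or battery-backed controllers the gap shrinks):

```python
import os
import tempfile
import time

def write_records(path, n, sync_each):
    """Append n small records, optionally forcing each one to stable storage."""
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for i in range(n):
            f.write(b"record-%06d\n" % i)
            f.flush()                  # program buffer -> OS page cache
            if sync_each:
                os.fsync(f.fileno())   # OS page cache -> disk (the 100:1 step)
    return time.perf_counter() - t0

tmpdir = tempfile.mkdtemp()
t_buffered = write_records(os.path.join(tmpdir, "buffered.log"), 500, False)
t_durable  = write_records(os.path.join(tmpdir, "durable.log"), 500, True)
# t_durable is typically far larger than t_buffered on spinning disks,
# which is exactly the durability-for-throughput trade NoSQL stores exploit.
```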

 

There are also distributed hash tables (these are also called NoSQL engines, but they are an entirely different technology).  With a distributed hash table, you can get enormous scaling and huge transaction volumes.

http://en.wikipedia.org/wiki/Distributed_hash_table

Distributed hash tables are another kind of key/value store (but an entirely different technology compared to traditional key/value stores like DBM).
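A toy illustration of the routing idea behind many DHTs - consistent hashing, where adding or removing a node remaps only a fraction of the keys. This is a generic sketch, not the algorithm of any particular system:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: a key maps to the first node clockwise."""

    def __init__(self, nodes, vnodes=64):
        # Place several virtual points per node so load spreads evenly.
        self._ring = []
        for node in nodes:
            for v in range(vnodes):
                self._ring.append((self._hash("%s#%d" % (node, v)), node))
        self._ring.sort()
        self._points = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping around.
        i = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[i][1]
```

Because only the keys between a departed node's points and its neighbours move, the cluster can grow or shrink without a global reshuffle - the property that lets DHTs scale to huge transaction volumes.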

 

When we design a data system, we should examine the project requirements and choose appropriate tools to solve the problems facing the project. 

 

For something like Facebook, a key/value store is a very good solution.  You do not have a big collection of related tables, and there are no billion-dollar bank transactions taking place where someone will get a bit bent out of shape if one goes missing.

 

For an analytic project where we plan to do cube operations, a column store like MonetDB is a good idea.

 

For a transactional system like “Point Of Sale” or Accounting, a traditional RDBMS like PostgreSQL is the best solution.

 

I think that an interesting path for growth would be to expand PostgreSQL to allow different table types.  For instance, all leaf tables (those tables without any children) could easily be Key/Value stores.  For analytics, create column store tables.  For ultra-high access, have a distributed hash table.

 

But right now, most RDBMS systems do not have these extra, special table types.  So if you want tools that do those things, then use those tools.

 

IMO-YMMV


Re: Why facebook used mysql ?

From
David Boreham
Date:
On 11/9/2010 11:36 AM, Sandeep Srinivasa wrote:

> If it is independent of the OS, then how does one go about tuning it.
>
> Consider this - I get a 12 core server on which I want multiple
> webserver instances + DB. Can one create CPU pools (say core 1,2,3 for
> webservers, 4,5,6,7 for DB, etc.) ?
>
> I know about taskset, but should one be using it ?

There are plenty of things you might do, but first you need to figure
out what problem you're solving.
I'd suggest deploying a relatively simple configuration then evaluate
its capacity under your workload.
Does it run fast enough? If so, then job done. If not then why not, and
so on...

The simplest configuration would be one web server instance and one DB
instance.

I don't think you should be looking at process partitioning and core
affinity unless you have already proved that
you have processes that don't scale over the cores you have, to deliver
the throughput you need.



Re: Why facebook used mysql ?

From
Scott Marlowe
Date:
On Tue, Nov 9, 2010 at 4:12 PM, David Boreham <david_list@boreham.org> wrote:
>
> I don't think you should be looking at process partitioning and core
> affinity unless you have already proved that
> you have processes that don't scale over the cores you have, to deliver the
> throughput you need.

Note that you're likely to get FAR more out of processor affinity with
multiple NICs, each assigned to its own core / set of cores that share
L3 cache and such.  Having the NICs, and maybe RAID controllers and/or
fibre channel cards etc., on their own set of cores in one group can
make a big difference.

Processor affinity doesn't seem to make much difference for me with
pgsql.  Modern Linux schedulers are pretty good at keeping things on
the same core for a while without predefined affinity.

Re: Why facebook used mysql ?

From
David Boreham
Date:
On 11/9/2010 5:05 PM, Scott Marlowe wrote:
> Note that you're likely to get FAR more out of processor affinity with
> multiple NICs assigned each to its own core / set of cores that share
> L3 cache and such.    Having the nics and maybe RAID controllers and /
> or fibre channel cards etc on their own set of cores in one group can
> make a big difference.

Be careful though: this phenomenon typically only comes into play at
very high I/O rates.
I wouldn't want to send the OP down this path without first verifying
that he has a problem.
Most folks are not trying to push 100k requests/s out their web servers..



Re: Why facebook used mysql ?

From
Robert Treat
Date:
On Tue, Nov 9, 2010 at 1:36 PM, Sandeep Srinivasa <sss@clearsenses.com> wrote:

On Tue, Nov 9, 2010 at 11:46 PM, David Boreham <david_list@boreham.org> wrote:

Hmm...typically multi-core scaling issues are in the area of memory contention and cache coherence (and therefore are for the most part not dependent on the OS and its scheduler).

If it is independent of the OS, then how does one go about tuning it.

Consider this - I get a 12 core server on which I want multiple webserver instances + DB. Can one create CPU pools (say core 1,2,3 for webservers, 4,5,6,7 for DB, etc.) ?

I know about taskset, but should one be using it ?


You can do this on some systems (we've done it on Solaris systems, for example), but realize that all of the high-scale websites run dedicated machines for database and web services; that's essentially a mandatory requirement just for the purpose of having visibility into what is affecting your server performance at scale.

It might also be worth mentioning that Facebook doesn't actually run MySQL like you'd get it from Oracle; they have their own custom patch set that is tuned specifically for their servers (based on their OS modifications as well). Probably the closest equivalent would be Percona's XtraDB storage engine, and I have seen some benchmarks that show comparable performance at 32 cores, if not slightly better, though of course it will be somewhat workload-dependent. It's mostly irrelevant to "internet oriented" companies, though; very few are looking for 32+ core systems as a solution to their problems.


Robert Treat
play: http://www.xzilla.net
work: http://www.omniti.com/is/hiring

Re: Why facebook used mysql ?

From
Ron Mayer
Date:
Lincoln Yeoh wrote:
> What's more important to such companies is the ability to scale over
> multiple machines.

That question - how much work it is to administer thousands of database
servers - seems to have been largely missing from this conversation.

Apparently back in 2008, Facebook had 1800 MySQL servers with 2 DBAs.[1]

I wonder how that compares with large-scale Postgres deployments.

  Ron

[1] http://perspectives.mvdirona.com/2008/04/22/1800MySQLServersWithTwoDBAs.aspx


Re: Why facebook used mysql ?

From
r t
Date:
On Sun, Nov 14, 2010 at 12:15 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
Lincoln Yeoh wrote:
> What's more important to such companies is the ability to scale over
> multiple machines.

That question - how much work it is to administer thousands of database
servers - seems to have been largely missing from this conversation.

Apparently back in 2008, Facebook had 1800 MySQL servers with 2 DBAs.[1]

I wonder how that compares with large-scale Postgres deployments.


From a technology standpoint, it doesn't need to be noticeably different, provided you use Postgres in a way similar to how Facebook is using MySQL. Well, at least now; 8.4's re-implementation of the free space map was critical for "zero-administration" type deployments. If you can script basic failover deployments (remember that 1/2 of those 1800 are just slave machines), you don't abstract storage from the app, and you keep database schemas similar across nodes, you can really ramp up the number of deployed servers per DBA.


Robert Treat
play: http://www.xzilla.net
work: http://www.omniti.com/is/hiring

when postgres failed to recover after the crash...

From
anypossibility
Date:
I am running Postgres version 8.3 on OS X.
The data directory is on a network volume.
The network volume was disconnected and the server crashed.
The log reported that the last known up was 9:30 pm (about 30 min prior to the server crash).
My conf checkpoint_segments setting = 3 (not sure if this is at all helpful info).
When Postgres tried to recover from the crash, the volume was still not mounted.
Once the storage volume was re-connected:
There were index corruptions. I fixed all that in single-user mode.
There were also some missing records...
I understand that some updates were lost because they hadn't been written to the disk yet.
However, it seems that records that were created a long time ago (but updated before the crash occurred) are completely missing (unable to find them even after a reindex is done).
Does this make sense? Or is this impossible, and the records might be somewhere on the disk?
Thank you very much for your time in advance.

Re: when postgres failed to recover after the crash...

From
Craig Ringer
Date:
On 15/11/10 07:04, anypossibility wrote:
> I am running postgres postgres version 8.3 on OS X.
> The data directory is on network volume.

What kind of network volume?

An AFP mount? SMB share? NFS? iSCSI?

In general, it's a really bad idea to run PostgreSQL (or any other
database) over file-level network storage like SMB/AFP/CIFS/NFS.
Block-level network storage like iSCSI is generally OK, depending on the
quality of the drivers in target and initiator.

> I understand that some updates were lost because they haven't been
> written to the disk yet hence updates are lost.
> However, it seems that record that were created long time ago (but
> updated before the crash occurs) is completely missing (unable to find
> even after reindex is done).
> Does this make sense? or Is this impossible and record might be
> somewhere on the disk?

Without details it is hard to know.

Before you do anything more, make a COMPLETE COPY of the entire data
directory, including the pg_clog, pg_xlog, etc directories as well as
the main data base storage. Put it somewhere safe and do not touch it
again, because it might be critical for recovery.
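That advice can be sketched in Python (the data-directory path is a placeholder to adjust for your installation; in real use, stop the postmaster first so the copy is internally consistent - here a miniature stand-in directory is used for demonstration):

```python
import pathlib
import shutil
import tempfile

def snapshot_data_dir(src, dst):
    """Copy the whole data directory - pg_xlog, pg_clog and all - preserving
    symlinks and file metadata. Stop the postmaster before calling this."""
    shutil.copytree(src, dst, symlinks=True)

# Demonstration on a tiny stand-in for a real data directory; in practice
# src would be something like /usr/local/pgsql/data (path is an assumption).
fake_data = pathlib.Path(tempfile.mkdtemp()) / "data"
(fake_data / "pg_xlog").mkdir(parents=True)
(fake_data / "pg_xlog" / "wal-segment").write_bytes(b"\x00" * 16)
backup = fake_data.parent / "pgdata-postmortem"
snapshot_data_dir(fake_data, backup)
```

Keeping the copy untouched matters because recovery tools may need the WAL and clog files exactly as they were at crash time.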

In addition to the network file system type, provide the log files
generated by PostgreSQL when you post a follow-up. These might provide
some explanation of what is wrong.

There is a significant chance your database is severely corrupted if
you've been using a network file system that doesn't respect write
ordering and had it unexpectedly disconnect.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: when postgres failed to recover after the crash...

From
Gabriele Bartolini
Date:
Hi,
> In general, it's a really bad idea to run PostgreSQL (or any other
> database) over file-level network storage like SMB/AFP/CIFS/NFS.
> Block-level network storage like iSCSI is generally OK, depending on the
> quality of the drivers in target and initiator.

What Craig says is true, and it might be worth reading the free
chapter about "Database Hardware" from Greg's book on high performance,
which you can download from
http://blog.2ndquadrant.com/en/2010/10/postgresql-90-high-performance.html

> Before you do anything more, make a COMPLETE COPY of the entire data
> directory, including the pg_clog, pg_xlog, etc directories as well as
> the main data base storage. Put it somewhere safe and do not touch it
> again, because it might be critical for recovery.
Yes, and in case you had any tablespaces, do not forget about them. A
cold backup in these cases is always a good thing.

Ciao,
Gabriele

--
  Gabriele Bartolini - 2ndQuadrant Italia
  PostgreSQL Training, Services and Support
  gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Why facebook used mysql ?

From
Fredric Fredricson
Date:
On 11/09/2010 06:01 PM, Andy wrote:
> MySQL is GPL'd, just like Linux is.
>
Well, it is and it isn't. A couple of years ago, when I was involved with
choosing a DB for a (proprietary) application, we could not figure MySQL's
license out. It was GPL'd, but at the same time, if you wanted to use it
commercially you had to pay. As far as we could tell, you should not need
the LGPL to make calls to a database.

We chose PostgreSQL based on features (MySQL did not have stored
procedures back then), so we never resolved the license "dilemma", but it
sure looked strange.

/Fredric

PS. The license cost as such would not be prohibitive, but in our case we
did not want the administration that follows licenses. The application
was an industrial machine, and there were several thousand already out
there that would be upgraded, and a couple of hundred a year produced.


Re: when postgres failed to recover after the crash...

From
anypossibility
Date:
Gabriele, 
Thank you for the link. I downloaded the book and read the chapter. Very useful information.

Craig,
The storage type is SAN over AFP. 

Unfortunately, it has been a week or so since the crash. We were able to recover the lost data from the last backup (a few hours old), but next time I will copy the entire data directory before restarting the server (thank you for the advice).

Here is the copy from pg_log:

Log from 09:19 PM (First time the volume was disconnected)
LOG:  database system was interrupted; last known up at 2010-11-10 21:01:40 MST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 4C/EAA135CC
LOG:  redo is not required
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

Log from 09:27 PM (When volume was re-mounted)
LOG:  database system was interrupted; last known up at 2010-11-10 21:19:13 MST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 4C/EAA1360C
LOG:  redo is not required
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down 











Re: when postgres failed to recover after the crash...

From
Craig Ringer
Date:
On 15/11/10 19:59, anypossibility wrote:
> Gabriele,
> Thank you for the link. I downloaded the book and read the chapter. Very
> useful information.
>
> Craig,
> The storage type is SAN over AFP.

I very, very strongly suggest getting your SAN host to export an iSCSI
volume to mount directly on your mac instead. Using PostgreSQL over AFP
is (as far as I know) not regularly tested, and it's certainly not a
good idea.

--
Craig Ringer