Re: Drupal and PostgreSQL - performance issues? - Mailing list pgsql-general

From Scott Marlowe
Subject Re: Drupal and PostgreSQL - performance issues?
Date
Msg-id dcc563d10810140556v399ce9ddq94e9c88da2b8a742@mail.gmail.com
In response to Re: Drupal and PostgreSQL - performance issues?  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
Responses Re: Drupal and PostgreSQL - performance issues?  ("Ang Chin Han" <ang.chin.han@gmail.com>)
Re: Drupal and PostgreSQL - performance issues?  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
List pgsql-general
On Tue, Oct 14, 2008 at 3:40 AM, Ivan Sergio Borgonovo
<mail@webthatworks.it> wrote:
> On Mon, 13 Oct 2008 20:45:39 -0600
> "Joshua Tolley" <eggyknap@gmail.com> wrote:
>
> Premise:
> I'm not sustaining that the "default" answers are wrong, but they are
> inadequate.
> BTW the OP made a direct comparison of pgsql and mysql running
> drupal. That's a bit different than just asking: how can I improve
> PostgreSQL performance.

Sadly, no one has run any meaningful benchmarks so far.

>> This is a useful question, but there are reasonable answers to it.
>> The key underlying principle is that it's impossible to know what
>> will work well in a given situation until that situation is
>> tested. That's why benchmarks from someone else's box are often
>> mostly useless on your box, except for predicting generalities and
>> then only when they agree with other people's benchmarks.
>> PostgreSQL ships with a very conservative default configuration
>> because (among other things, perhaps) 1) it's a configuration
>> that's very unlikely to fail miserably for most situations, and 2)
>
> So your target is potentially skilled DBAs who have a coffee pot as a
> testing machine?

Actually a lot has been done to better tune pgsql out of the box, but
since it uses shared memory and many OSes still come with incredibly
low shared memory settings, we're stuck.

> Still you've another DB that kick your ass in most common hardware
> configuration and workload. Something has to be done about the
> tuning. Again... a not tuned Ferrari can't win a F1 GP competing
> with a tuned McLaren but it can stay close. A Skoda Fabia can't.

Except the current benchmark is how fast you can change the tires.

> When people come here and ask why PostgreSQL is slow as a Skoda
> compared to a Ferrari in some tasks and you reply they have to
> tune... a) they will think you're trying to sell them a Skoda b)
> they will think you're selling a Ferrari in a mounting kit.

Actually the most common answer is to ask them if they've actually
used a realistic benchmark.  Then tune.

> It even doesn't help to guide people if 9 out of 10 you reply:
> before we give you any advice... you've to spend half day learning
> how to tune PostgreSQL. When they come back... you reply... but your
> benchmark was not suited for your real work workload.
> It makes helping people hard.

Getting things right is hard.  Do you think any joe can get behind the
wheel of an F1 car and just start driving?  Remember, for every
problem, there is a simple, easy, elegant answer, and it's wrong.

> Remember we are talking about PostgreSQL vs. MySQL performance
> running Drupal.

Yes, and the very first consideration should be, "Will the db I'm
choosing be likely to eat my data?"  If you're not sure on that one
all the benchmarketing in the world won't make a difference.

> But still people point at benchmarks where PostgreSQL outperforms
> MySQL.
> People get puzzled.

Because they don't understand what databases are and what they do maybe?

> Things like: MySQL will eat your data are hard to sustain and
> explain.

Google is your friend.  Heck, you can find account after account from
MySQL fanboys about their favorite database eating their data.

> I don't have direct experience on corrupted DB... but I'd say it is
> easier to program PostgreSQL than MySQL once your project is over 30
> lines of code because it is less sloppy.
> This is easier to prove: point at the docs and to SQL standard.

Lots of people feel MySQL's tutorial-style docs are easier to
comprehend, especially those unfamiliar with dbs.  I prefer
PostgreSQL's docs, as they are more thorough and better suited for a
semi-knowledgeable DBA.

>> it's assumed that if server performance matters, someone will
>> spend time tuning things. The fact that database X performs better
>> than PostgreSQL out of the box is fairly irrelevant; if
>> performance matters, you won't use the defaults, you'll find
>> better ones that work for you.
>
> The fact that out of the box on common hardware PostgreSQL
> under-performs MySQL with default config would matter if a few
> paragraphs below you wouldn't say that integrity has a *big*
> performance cost even on read-only operations.
> When people come back crying that PostgreSQL under-performs with
> Drupal they generally show a considerable gap between the 2.

Again, this is almost always for 1 to 5 users.  Real world DBs have
dozens to hundreds to even thousands of simultaneous users.  My
PostgreSQL servers at work routinely have 10 or 20 queries running at
the same time, and peak at 100 or more.

> But generally the performance gap is astonishing on default
> configuration.

Only for unrealistic benchmarks.  Seriously, for any benchmark with
large concurrency and / or high write percentage, PostgreSQL wins.
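
As a rough illustration of what "large concurrency" means for a
benchmark, here is a minimal Python sketch that runs the same workload
from a varying number of simultaneous clients and reports throughput.
It is not taken from any benchmark in this thread: fake_query is a
hypothetical stand-in for one database round trip (it only sleeps), and
the names run_client and measure_throughput are made up for this
example. To measure a real server, replace fake_query with a function
that issues an actual query over its own connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query():
    """Hypothetical stand-in for one database round trip (assumption)."""
    time.sleep(0.001)


def run_client(work, ops):
    """Run `ops` calls of `work` back to back, as one simulated client."""
    for _ in range(ops):
        work()
    return ops


def measure_throughput(work, clients, ops_per_client):
    """Return (total_ops, ops_per_second) for `clients` concurrent workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(run_client, work, ops_per_client)
                   for _ in range(clients)]
        total_ops = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    return total_ops, total_ops / elapsed


if __name__ == "__main__":
    # A single-client run says little about how a server behaves at 20 or 100.
    for clients in (1, 5, 20, 100):
        total, rate = measure_throughput(fake_query, clients, 50)
        print(f"{clients:>3} clients: {total} ops, {rate:.0f} ops/s")
```

The point of the sweep is to compare how throughput changes as the
client count grows, not the absolute number at one client.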

> It is hard to win the myth surrounding PostgreSQL...
> but again... if you've to trade integrity for speed... at least you
> should have numbers to show what are you talking about. Then people
> may decide.
> You're using a X% slower, Y% more reliable DB.
> You're using a X% slower, Y% more scalable DB. etc...

It's not just integrity for speed!  It's the fact that MySQL has
serious issues with large concurrency, especially when there's a fair
bit of writes going on.  This is especially true for MyISAM, but not
completely solved in the Oracle-owned InnoDB table handler.

> Well horror stories about PostgreSQL being doggy slow are quite
> common among MySQL users.

Users who run single thread benchmarks.  Let them pit their MySQL
servers against my production PostgreSQL servers with a realistic
load.

> If I see a performance gap of 50% I'm going to think that's not
> going to be that easy to fill it with "tuning".
> That means:
> - I may think that with a *reasonable* effort I could come close and
> then I'll have to find other good reasons other than performances to
> chose A in spite of B

Then you are putting your cart before your horse.  Choosing a db based
on a single synthetic benchmark is like buying a car based on the
color of the shift knob.  Quality is far more important.  And so is
your data: "MySQL mangling your data faster than any other db!" is
not a good selling point.

> Now... you've to tune is not the kind of answer that will help me to
> take a decision in favour of PostgreSQL.

Then please use MySQL.  I've got a db that works well for me.  When
MySQL proves incapable of handling the load, then come back and ask
for help migrating.

> Anyway a 50% or more performance gap is something that make hard to
> take any decision. It something that really makes hard even to give
> advices to new people.

Yes.  You keep harping on the 50% performance gap, one that neither you
nor anyone else has demonstrated to exist with any reasonable test.

> Why do comparisons between PostgreSQL and MySQL come up so
> frequently?
>
> Because MySQL "is the DB of the Web".
> Many web apps are (were) mainly "read-only" and their data integrity
> is (was) not so important.
> Many Web apps are (were) simple.
>
> Web apps and CMS are a reasonably large slice of what's moving on
> the net. Do these applications need the features PostgreSQL has?

Is their data important?  Is downtime a bad thing for them?

> Is there any trade off? Is it worth to pay that trade off?
>
> Is it worth to conquer this audience even if they are not skilled
> DBA?

Only if they're willing to learn.  I can't spend all day tuning their
pgsql servers for free.  If not, then let them go, and they'll come
back when they need to.
