Re: Drupal and PostgreSQL - performance issues? - Mailing list pgsql-general

From: Ivan Sergio Borgonovo
Subject: Re: Drupal and PostgreSQL - performance issues?
Msg-id: 20081014114011.0ebe495a@dawn.webthatworks.it
In response to: Re: Drupal and PostgreSQL - performance issues?  ("Joshua Tolley" <eggyknap@gmail.com>)
Responses: Re: Drupal and PostgreSQL - performance issues?  ("Scott Marlowe" <scott.marlowe@gmail.com>)
Re: Drupal and PostgreSQL - performance issues?  (Mikkel Høgh <mikkel@hoegh.org>)
Re: Drupal and PostgreSQL - performance issues?  (Greg Smith <gsmith@gregsmith.com>)
List: pgsql-general

On Mon, 13 Oct 2008 20:45:39 -0600
"Joshua Tolley" <eggyknap@gmail.com> wrote:

Premise:
I'm not claiming that the "default" answers are wrong, but they are
inadequate.
BTW the OP made a direct comparison of pgsql and mysql running
Drupal. That's a bit different from just asking: how can I improve
PostgreSQL performance?

I'm happy with PostgreSQL, it does what I think is important for me
better than MySQL... and I'm using it on Drupal in nearly all the
websites I developed.

> On Mon, Oct 13, 2008 at 1:02 AM, Ivan Sergio Borgonovo
> <mail@webthatworks.it> wrote:
> <snip>
> > Anyway I don't find myself comfortable with replies in these 2
> > lines of reasoning:
> > 1) default configuration of PostgreSQL generally doesn't perform
> > well
> > 2) PostgreSQL may be slower but mySQL may trash your data.

> > I think these answers don't make a good service to PostgreSQL.

> > 1) still leave the problem there and doesn't give any good reason
> > why Postgresql comes with a doggy default configuration on most
> > hardware. It still doesn't explain why I've to work more tuning
> > PostgreSQL to achieve similar performances of other DB when
> > other DB don't require tuning.

> This is a useful question, but there are reasonable answers to it.
> The key underlying principle is that it's impossible to know what
> will work well in a given situation until that situation is
> tested. That's why benchmarks from someone else's box are often
> mostly useless on your box, except for predicting generalities and
> then only when they agree with other people's benchmarks.
> PostgreSQL ships with a very conservative default configuration
> because (among other things, perhaps) 1) it's a configuration
> that's very unlikely to fail miserably for most situations, and 2)

So your targets are potentially skilled DBAs who have a coffee pot as
a testing machine?
I don't want the temporary needs of unskilled devs driving the
PostgreSQL project, but they are all potential users. Users, too, make
a project more successful. Not every dev using a DB is a DBA, and not
every project in need of a DB is mature enough to have DBA knowledge.

Still, you've got another DB that kicks your ass on the most common
hardware configurations and workloads. Something has to be done about
the tuning. Again... an untuned Ferrari can't win an F1 GP competing
with a tuned McLaren, but it can stay close. A Skoda Fabia can't.

When people come here and ask why PostgreSQL is as slow as a Skoda
compared to a Ferrari in some tasks and you reply that they have to
tune... a) they will think you're trying to sell them a Skoda, or b)
they will think you're selling a Ferrari as an assembly kit.

It doesn't even help to guide people if 9 times out of 10 you reply:
before we give you any advice... you have to spend half a day learning
how to tune PostgreSQL. When they come back... you reply... but your
benchmark was not suited to your real-world workload.
It makes helping people hard.

Remember we are talking about PostgreSQL vs. MySQL performance
running Drupal.

But still people point at benchmarks where PostgreSQL outperforms
MySQL.
People get puzzled.

Claims like "MySQL will eat your data" are hard to support and
explain.
I don't have direct experience of corrupted DBs... but I'd say it is
easier to program against PostgreSQL than MySQL once your project is
over 30 lines of code, because it is less sloppy.
This is easier to prove: point at the docs and at the SQL standard.

> it's assumed that if server performance matters, someone will
> spend time tuning things. The fact that database X performs better
> than PostgreSQL out of the box is fairly irrelevant; if
> performance matters, you won't use the defaults, you'll find
> better ones that work for you.

The fact that out of the box on common hardware PostgreSQL
underperforms MySQL with the default config would matter if, a few
paragraphs below, you didn't say that integrity has a *big*
performance cost even on read-only operations.
When people come back crying that PostgreSQL underperforms with
Drupal, they generally show a considerable gap between the two.

> > Making performance comparable without expert tuning will a) stop
> > most too easy critics about PostgreSQL performances b) give
> > developers much more feedback on PostgreSQL performance in
> > "nearer to optimal" setup.

> Most of the complaints of PostgreSQL being really slow are from
> people who either 1) use PostgreSQL assuming it's MySQL and
> therefore don't do things the way a real DBA would do them, or 2)
> simply repeat myths they've heard about PostgreSQL performance and
> have no experience to back up. While it would be nice to be able
> to win over such people, PostgreSQL developers tend to worry more
> about pleasing the people who really know what they're doing. (The
> apparent philosophical contradiction between my statements above
> and the fact that I'm writing something as inane as PL/LOLCODE
> doesn't cause me much lost sleep -- yet)

> > If it is easy to write a tool that will help you to tune
> > PostgreSQL, it seems it would be something that will really help
> > PostgreSQL diffusion and improvements. If it is *complicated* to
> > tune PostgreSQL so that it's performance can be *comparable* (I
> > didn't write optimal) with other DB we have a problem.

> It's not easy to write such a tool; the lists talk about one every
> few months, and invariably conclude it's harder than just teaching
> DBAs to do it (or alternatively letting those that need help pay
> those that can help to tune for them).

But generally the performance gap is astonishing on the default
configuration. It is hard to beat the myths surrounding PostgreSQL...
but again... if you've to trade integrity for speed... at least you
should have numbers to show what you are talking about. Then people
may decide.
You're using an X% slower, Y% more reliable DB.
You're using an X% slower, Y% more scalable DB. etc...
Or at least tell people first whether they are buying an SUV, a
Ferrari or a train.
We were talking about a CMS. So we know it is not a Ferrari, it is not
a Skoda, and it may be a train or an SUV (sort of...).

> As to whether it's a problem that it's a complex thing to tune,
> sure it would be nice if it were easier, and efforts are made
> along those lines all the time (cf. GUC simplification efforts for
> a contemporary example). But databases are complex things, and any
> tool that makes them overly simple is only glossing over the
> important details.

You trade complexity for flexibility... so is PostgreSQL an SUV, a
Ferrari, a Skoda and a train too, sold as an assembly kit?
I'd expect that if it were a Skoda I wouldn't need any tuning to beat
a Ferrari on fuel consumption.

> > 2) I never saw a "trashing data benchmark" comparing reliability
> > of PostgreSQL to MySQL. If what I need is a fast DB I'd chose
> > mySQL... I think this could still not be the best decision to
> > take based on *real situation*.

> If you've got an important application (for some definition of
> "important"), your considerations in choosing underlying software
> are more complex than "is it the fastest option". Horror stories
> about MySQL doing strange things to data, because of poor integrity
> constraints, ISAM tables, or other problems are fairly common
> (among PostgreSQL users, at least :) But I will also admit I have

Well, horror stories about PostgreSQL being dog slow are quite
common among MySQL users.
But while it is very easy to "prove" the latter on a test config with
PostgreSQL defaults, it is harder to prove the former.
So it would be better to rephrase the former so it is easier to
prove, or just change the terms of comparison.
Anyway, making PostgreSQL easier to tune, even if not optimally, would
be a good target.

> > What I get with that kind of answer is:
> > an admission: - PostgreSQL is slow

> People aren't saying that. They're saying it works better when
> someone who knows what they're doing runs it.

I find this a common excuse of programmers:
you, the user, are an asshole; my software is perfect.
It's not a matter of "better". When people come here saying
PostgreSQL performs badly serving Drupal, the performance gap is not
realistically described with just "better".

> > But is PostgreSQL competitive as a DB engine for apps like Drupal
> > for the "average user"?

> So are we talking about the "average user", or someone who needs
> real performance? The average user certainly cares about
> performance, but if (s)he really cares, (s)he will put time toward
> achieving performance.

If I see a performance gap of 50%, I'm going to think that it's not
going to be that easy to close it with "tuning".
That means:
- I may think that with a *reasonable* effort I could come close, and
then I'll have to find good reasons other than performance to
choose A instead of B
- I may think I need a miracle I'm not willing to bet on/pay for.

Now... "you have to tune" is not the kind of answer that will help me
make a decision in favour of PostgreSQL.

Anyway, a performance gap of 50% or more is something that makes it
hard to make any decision. It is something that makes it really hard
even to give advice to new people.

Reducing that gap with a set of "common cases" .conf files may help.
When people see a 10%-15% gap without too much effort, they will be
more willing to listen to what else you have to offer and will more
easily believe that the gap can be reduced further.
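
To make the "common cases" idea concrete, here is a minimal sketch of
what generating such a starter .conf could look like. Everything in it
is an assumption for illustration: the function names, the two
profiles, and the fractions of RAM are rules of thumb of the kind
usually quoted as starting points (about a quarter of RAM for
shared_buffers, half or more for effective_cache_size), not measured
or recommended values, and not an existing tool.

```python
# Hypothetical helper: rough starting values only, not tuned settings.
# Profile names and RAM fractions are illustrative assumptions.
PROFILES = {
    # Web/CMS box that also runs the web server and PHP.
    "web": {"shared_buffers": 0.25, "effective_cache_size": 0.50},
    # Machine that runs only the database.
    "dedicated": {"shared_buffers": 0.25, "effective_cache_size": 0.75},
}


def starter_conf(ram_mb, profile="web"):
    """Return a dict of postgresql.conf settings sized from total RAM in MB."""
    if ram_mb <= 0:
        raise ValueError("ram_mb must be positive")
    if profile not in PROFILES:
        raise ValueError("unknown profile: %s" % profile)
    return {
        name: "%dMB" % int(ram_mb * fraction)
        for name, fraction in PROFILES[profile].items()
    }


def render(settings):
    """Format the settings as lines that could be pasted into postgresql.conf."""
    return "\n".join("%s = %s" % (name, value) for name, value in settings.items())


if __name__ == "__main__":
    print(render(starter_conf(2048, "web")))
```

The point is not the numbers, which would still have to be checked
against a real workload, but that a handful of such profiles could
bring a default install closer to "comparable" before any expert
tuning.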

Unless every camp keeps on believing in myths and newcomers have to
take it all on faith.

You may think that people coming here asking for performance advice
have already collected enough information on e.g. PostgreSQL
features... but that may not be the case.

Think about Drupal developers coming here and asking... is it worth
supporting PostgreSQL?

Let me go even further off track...

Why do comparisons between PostgreSQL and MySQL come up so
frequently?

Because MySQL "is the DB of the Web".
Many web apps are (were) mainly "read-only" and their data integrity
is (was) not so important.
Many Web apps are (were) simple.

Web apps and CMSs are a reasonably large slice of what's moving on
the net. Do these applications need the features PostgreSQL has?
Is there any trade-off? Is it worth paying for that trade-off?

Is it worth winning over this audience even if they are not skilled
DBAs?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it