Thread: Drupal and PostgreSQL - performance issues?

Drupal and PostgreSQL - performance issues?

From
Mikkel Høgh
Date:
Hi there,

I've been toying with using PostgreSQL for some of my Drupal sites for
some time, and after his session at OpenSourceDays in Copenhagen last
weekend, Magnus Hagander told me that there a quite a few in the
PostgreSQL community using Drupal.

I have been testing it a bit performance-wise, and the numbers are
worrying. In my test, MySQL (using InnoDB) had a 40% lead in
performance, but I'm unsure whether this is indicative for PostgreSQL
performance in general or perhaps a misconfiguration on my part.

In any case, if anyone has any tips, input, etc. on how best to
configure PostgreSQL for Drupal, or can find a way to poke holes in my
analysis, I would love to hear your insights :)

The performance test results can be found on my blog:
http://mikkel.hoegh.org/blog/2008/drupal_database_performance_mysql_and_postgresql_compared
--
Kind regards,

Mikkel Høgh <mikkel@hoegh.org>


Attachment

Re: [GENERAL] Drupal and PostgreSQL - performance issues?

From
John DeSoi
Date:
On Oct 12, 2008, at 11:57 PM, Mikkel Høgh wrote:

> In any case, if anyone has any tips, input, etc. on how best to
> configure PostgreSQL for Drupal, or can find a way to poke holes in
> my analysis, I would love to hear your insights :)


I just came across this article about moving Drupal from MySQL to
PostgreSQL because of MyISAM data corruption and InnoDB was too slow.


http://groups.drupal.org/node/15793




John DeSoi, Ph.D.





Re: Drupal and PostgreSQL - performance issues?

From
"Scott Marlowe"
Date:
On Sun, Oct 12, 2008 at 9:57 PM, Mikkel Høgh <mikkel@hoegh.org> wrote:
> Hi there,
>
> I've been toying with using PostgreSQL for some of my Drupal sites for some
> time, and after his session at OpenSourceDays in Copenhagen last weekend,
> Magnus Hagander told me that there a quite a few in the PostgreSQL community
> using Drupal.
>
> I have been testing it a bit performance-wise, and the numbers are worrying.
> In my test, MySQL (using InnoDB) had a 40% lead in performance, but I'm
> unsure whether this is indicative for PostgreSQL performance in general or
> perhaps a misconfiguration on my part.

The test you're running is far too simple to tell you which database
will actually be faster in real world usage.  No updates, no inserts,
no interesting or complex work goes into just delivering the front
page over and over.  I suggest you invest some time learning how to
drive a real load testing tool like jmeter and build realistic test
cases (with insert / update / delete as well as selects) and then see
how the databases perform with 1, 2, 5, 10, 50, 100 consecutive
threads running at once.

Without a realistic test scenario and with no connection pooling and
with no performance tuning, I don't think you should make any
decisions right now about which is faster.  It may well be that in a
more realistic testing that mysql keeps up through 5 or 10 client
connections then collapses at 40 or 50, while pgsql keeps climbing in
performance.  This is the performance curve I'm used to seeing from
both dbs under heavy load.

In simple terms, you're kicking the tires and making a decision based on that.

Re: Drupal and PostgreSQL - performance issues?

From
Stephen Frost
Date:
* Mikkel Høgh (mikkel@hoegh.org) wrote:
> I have been testing it a bit performance-wise, and the numbers are
> worrying. In my test, MySQL (using InnoDB) had a 40% lead in
> performance, but I'm unsure whether this is indicative for PostgreSQL
> performance in general or perhaps a misconfiguration on my part.

The comments left on your blog would probably be a good first step, if
you're not doing them already..  Connection pooling could definitely
help if you're not already doing it.  Drupal's MySQL-isms don't help
things either, of course.

Also, you don't post anything about the PostgreSQL config, nor the
hardware it's running on.  The default PostgreSQL config usually isn't
appropriate for decent hardware and that could be a contributing factor
here.  It would also be useful to make sure you've analyze'd your tables
and didn't just do a fresh load w/o any statistics having been gathered.

We run Drupal on PostgreSQL for an internal site and it works reasonably
well.  We havn't had any performance problems but it's not a terribly
large site either.  The issues we've had tend to come from PostgreSQL's
somewhat less-than-supported status with Drupal.

I've been meaning to look into Drupal's PG support to see about
improving it.  Perhaps this winter I'll get a chance to.

    Thanks,

        Stephen

Attachment

Re: [GENERAL] Drupal and PostgreSQL - performance issues?

From
Mikkel Høgh
Date:
Alright, my benchmarks might have been a bit naïve.
When it comes to hardware, my webserver is a SunFire X2100 with an
Opteron 1210 Dual Core and 4 GB DDR2 RAM, running 64-bit Ubuntu Linux
Server 8.04 LTS.

When it comes to the resource usage section of my postgresql.conf, the
only thing that are not commented out are:
shared_buffers = 24MB
max_fsm_pages = 153600

I freely admit that the reason I haven't messed with these values is
that I have next to no clue what the different things do and how they
affect performance, so perhaps an apology is in order. As Scott wrote,
"Without a realistic test scenario and with no connection pooling and
with no performance tuning, I don't think you should make any
decisions right now about which is faster". My apologies.
--
Kind regards,

Mikkel Høgh <mikkel@hoegh.org>

On 13/10/2008, at 06.54, Stephen Frost wrote:

> * Mikkel Høgh (mikkel@hoegh.org) wrote:
>> I have been testing it a bit performance-wise, and the numbers are
>> worrying. In my test, MySQL (using InnoDB) had a 40% lead in
>> performance, but I'm unsure whether this is indicative for PostgreSQL
>> performance in general or perhaps a misconfiguration on my part.
>
> The comments left on your blog would probably be a good first step, if
> you're not doing them already..  Connection pooling could definitely
> help if you're not already doing it.  Drupal's MySQL-isms don't help
> things either, of course.
>
> Also, you don't post anything about the PostgreSQL config, nor the
> hardware it's running on.  The default PostgreSQL config usually isn't
> appropriate for decent hardware and that could be a contributing
> factor
> here.  It would also be useful to make sure you've analyze'd your
> tables
> and didn't just do a fresh load w/o any statistics having been
> gathered.
>
> We run Drupal on PostgreSQL for an internal site and it works
> reasonably
> well.  We havn't had any performance problems but it's not a terribly
> large site either.  The issues we've had tend to come from
> PostgreSQL's
> somewhat less-than-supported status with Drupal.
>
> I've been meaning to look into Drupal's PG support to see about
> improving it.  Perhaps this winter I'll get a chance to.
>
>     Thanks,
>
>         Stephen


Attachment

Re: Drupal and PostgreSQL - performance issues?

From
Greg Smith
Date:
On Sun, 12 Oct 2008, Scott Marlowe wrote:

> It may well be that in a more realistic testing that mysql keeps up
> through 5 or 10 client connections then collapses at 40 or 50, while
> pgsql keeps climbing in performance.

One of the best pro-PostgreSQL comparisons showing this behavior is at
http://tweakers.net/reviews/649/7 MySQL owns that benchmark until you hit
40 users, then...ouch.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Drupal and PostgreSQL - performance issues?

From
Mikkel Høgh
Date:
Well, in that benchmark, what you say is only true for the Niagara
processors. On the Opteron page, MySQL performance only drops slightly
as concurrency passes 50.

MySQL might have a problem with Niagara, but it doesn't seem like it
has the severe concurrency vulnerability you speak of.

There are many reasons to pick PostgreSQL, but this one doesn't seem
to be a general thing. In general, MySQL seems to have problems with
some kinds of threading, since their perfomance on Mac OS X is crappy
as well for that reason.
--
Kind regards,

Mikkel Høgh <mikkel@hoegh.org>

On 13/10/2008, at 10.43, Greg Smith wrote:

> On Sun, 12 Oct 2008, Scott Marlowe wrote:
>
>> It may well be that in a more realistic testing that mysql keeps up
>> through 5 or 10 client connections then collapses at 40 or 50,
>> while pgsql keeps climbing in performance.
>
> One of the best pro-PostgreSQL comparisons showing this behavior is
> at http://tweakers.net/reviews/649/7 MySQL owns that benchmark until
> you hit 40 users, then...ouch.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com
> Baltimore, MD


Attachment

Re: Drupal and PostgreSQL - performance issues?

From
Greg Smith
Date:
On Mon, 13 Oct 2008, Mikkel H�gh wrote:

> Well, in that benchmark, what you say is only true for the Niagara
> processors. On the Opteron page, MySQL performance only drops slightly as
> concurrency passes 50.

That's partly because the upper limit on the graph only goes to 100
concurrent processes.  Since the Opterons are faster, that's not a broad
enough scale to see how fast the right edge of the MySQL curve falls.

You are right that the Niagara processors have a sharper decline than the
more traditional platforms.  The MySQL 5.0.20a graphs at
http://tweakers.net/reviews/657/6 has a nice comparison graph showing a
few different architectures that's also interesting.

Anyway, you don't actually have to believe any of this; you've got a
testbed to try for yourself if you just crank the user count up.  The main
thing I was trying to suggest is that MySQL being a bit faster at 5 users
is not unusual, but it's not really representative of which performs
better either.

> In general, MySQL seems to have problems with some kinds of threading,
> since their perfomance on Mac OS X is crappy as well for that reason.

One of the reasons (but by no means not the only one) that PostgreSQL uses
a multi-process based architecture instead of a threaded one is because
thread library quality varies so much between platforms.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Drupal and PostgreSQL - performance issues?

From
"Ang Chin Han"
Date:
On Mon, Oct 13, 2008 at 11:57 AM, Mikkel Høgh <mikkel@hoegh.org> wrote:

> In any case, if anyone has any tips, input, etc. on how best to configure
> PostgreSQL for Drupal, or can find a way to poke holes in my analysis, I
> would love to hear your insights :)

It'd be more accurate to configure Drupal for PostgreSQL. We use
PostgreSQL for almost everything, including many drupal sites, but the
usage pattern of Drupal puts PostgreSQL at a disadvantage. In short,
Drupal issues a lot of small, simple SQL (100+ is the norm), that
makes tuning hard. To make it faster, you'd need to turn on Drupal's
caches (and PHP opcode caches) to reduce the number of SQLs issued. To
get even better numbers, you'd need to get Drupal to use memcached
instead of calling PostgreSQL for the simple lookups. You can use the
devel module in Drupal to have a look at the SQLs issued. Not pretty,
IMHO.

See: http://2bits.com/articles/benchmarking-postgresql-vs-mysql-performance-using-drupal-5x.html
http://2bits.com/articles/advcache-and-memcached-benchmarks-with-drupal.html

The most promising Drupal performance module for performance looks
like: http://drupal.org/project/cacherouter (900 req/s!) but I haven't
got the chance to give it a go yet.

I'm a die-hard PostgreSQL and Drupal supporter, but in this case, I
concede straight up Drupal+MySQL will always be faster than
Drupal+PostgreSQL because of the way Drupal uses the database. We
still use PostgreSQL for our Drupal sites though, because while it's
slower, it's plenty fast enough.

Re: [GENERAL] Drupal and PostgreSQL - performance issues?

From
"Scott Marlowe"
Date:
On Mon, Oct 13, 2008 at 12:00 AM, Mikkel Høgh <mikkel@hoegh.org> wrote:
> Alright, my benchmarks might have been a bit naïve.
> When it comes to hardware, my webserver is a SunFire X2100 with an Opteron
> 1210 Dual Core and 4 GB DDR2 RAM, running 64-bit Ubuntu Linux Server 8.04
> LTS.
>
> When it comes to the resource usage section of my postgresql.conf, the only
> thing that are not commented out are:
> shared_buffers = 24MB
> max_fsm_pages = 153600

Well, 24MB is pretty small.  See if you can increase your system's
shared memory and postgresql's shared_buffers to somewhere around 256M
to 512M.  It likely won't make a big difference in this scenario, but
overall it will definitely help.

> I freely admit that the reason I haven't messed with these values is that I
> have next to no clue what the different things do and how they affect
> performance, so perhaps an apology is in order. As Scott wrote, "Without a
> realistic test scenario and with no connection pooling and with no
> performance tuning, I don't think you should make any decisions right now
> about which is faster". My apologies.

No need for apologies.  You're looking for the best database for
drupal, and you're asking questions and trying to test to see which
one is best.  You just need to look deeper is all.  I would, however,
posit that you're putting the cart before the horse by looking at
performance first, instead of reliability.

On a machine with properly functioning hardware, postgresql is nearly
indestructable.  MySQL has a lot of instances in time where, if you
pull the plug / lose power it will scramble your db / lose part or all
of your data.  Databases are supposed to be durable.  InnoDB, the
table handler, is pretty good, but it's surrounded by a DB that was
designed for speed not reliability.

There was a time when Microsoft was trying to cast IIS as faster than
Apache, so they released a benchmark showing IIS being twice as fast
as apache at delivering static pages.  Let's say it was 10mS for
apache and 2mS for IIS.  Seems really fast.  Problem is, static pages
are cheap to deliver.  I can buy a $500 server to serve the static
content and if I need more speed, I can throw more servers at the
problem for $500, no OS license fees.

But for dynamic content, the difference was the other way around, and
the delivery times were much higher for IIS, like 50mS for apache and
250mS for IIS.  Suddenly, a handful of dynamic pages and the IIS
server was noticeably slower.

The same type of comparison tends to hold true for MySQL versus
PostgreSQL.  MySQL tends to be very very fast at "select * from table
where id=5" while PostgreSQL is much faster at 4 page long reporting
queries with 5 levels of subselects and a couple of unions.  Things
that make MySQL run so slow as to be useless.  Also, PostgreSQL tends
to keep better read performance as the number of writes increase.
This is the real test, so the point I was making before about
realistic tests is very important.

It's about graceful degradation.  PostgreSQL has it, and when your
site is getting 20 times the traffic you ever tested for, it's a
little late to figure out you might have picked the wrong DBMS.  Note
I'm not saying MySQL is the wrong choice, I'm saying you don't know
because you haven't proven it capable.

Re: [GENERAL] Drupal and PostgreSQL - performance issues?

From
"Scott Marlowe"
Date:
On Mon, Oct 13, 2008 at 8:19 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> There was a time when Microsoft was trying to cast IIS as faster than
> Apache, so they released a benchmark showing IIS being twice as fast
> as apache at delivering static pages.  Let's say it was 10mS for
> apache and 2mS for IIS.

Dyslexia strikes again!  That was supposed to be 5mS... anywho.

Re: [GENERAL] Drupal and PostgreSQL - performance issues?

From
Simon Waters
Date:
On Monday 13 October 2008 15:19:07 Scott Marlowe wrote:
>
> > shared_buffers = 24MB
> > max_fsm_pages = 153600
>
> Well, 24MB is pretty small.  See if you can increase your system's
> shared memory and postgresql's shared_buffers to somewhere around 256M
> to 512M.  It likely won't make a big difference in this scenario, but
> overall it will definitely help.

I noted after reading earlier messages in the thread, that my distro documents
that the values it default to for shared_buffers is rather small.

One of our servers is fairly pressed for memory (some of the time). Is there
any way to measure the amount of churn in the shared_buffers, as a way of
demonstrating that more is needed (or at this moment more would help)?

A few very small databases on this server, and one which is 768M (still pretty
small but a lot bigger than the rest, most of which is logging information).
The only "hot" information is the session table, ~9000 lines, one index on
the session id. Can I ask Postgres to tell me, or estimate, how much memory
this table would occupy if fully cached in memory?

Half the problem in modern computing is knowing what is "slow". In this case,
counting the rows of the session table takes about 100ms. Deleting expired
session rows about 120ms, more if it hasn't done it for a while, which is I
guess evidence that table isn't being cached in memory as efficiency as it
could be.

In this case the server thinks the system I/O is zero for half the tools in
use, because of the RAID hardware, so most of the Linux based tools are
useless in this context.

At the risk of thread hijacking, for the session table I wonder if we are
handling it the most efficient way. It is just a regular table, indexed on
session_id. Each request of note to the server requires retrieval of the
session record, and often updating the expiry information. Every N requests
the application also issues a:

DELETE FROM sessions WHERE expires<NOW() OR expires IS NULL;

Since there is no index on the table, it sequentially scans, and deletes the
stale records. I'm thinking since it is indexed for regular queries, making N
larger has almost no obvious penalty except we accumulate a small number of
stale records for longer. I'm not sure if an index on expires is worth it,
probably too small to make much difference either way.

As for Drupal on Postgres, it might be worth the effort for big
implementations, I did it for a while, but doing it again I'd go with MySQL.
Nothing to do with the database, everything to do with support for 3rd party
add-ins. Till Drupal gets to the automated testing of these things routinely
against different backends and configs...... Perhaps that is all that is
needed, a service for Drupal authors that tries their plugins against
Postgres automatically and complains if it doesn't work?

Re: [GENERAL] Drupal and PostgreSQL - performance issues?

From
Greg Smith
Date:
On Mon, 13 Oct 2008, Simon Waters wrote:

> One of our servers is fairly pressed for memory (some of the time). Is there
> any way to measure the amount of churn in the shared_buffers, as a way of
> demonstrating that more is needed (or at this moment more would help)?

If you wander to http://www.westnet.com/~gsmith/content/postgresql/ my
"Inside the PostgreSQL Buffer Cache" presentation goes over this topic in
extreme detail.

> Can I ask Postgres to tell me, or estimate, how much memory this table
> would occupy if fully cached in memory?

http://wiki.postgresql.org/wiki/Disk_Usage gives an example showing all
the biggest tables/indexes in your data, and links to an article giving
examples of how to find the size of all sorts of things.  One of the
queries in my presentation even shows you what % of each table is actually
being cached by the dedicated database memory.

You also need to consider the OS buffer cache to get the full picture,
which is a bit more complicated;
http://www.kennygorman.com/wordpress/?p=250 gives an example there you
might be able to use.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD