Re: Scalability in postgres - Mailing list pgsql-performance

From Mark Mielke
Subject Re: Scalability in postgres
Msg-id 4A28B378.90006@mark.mielke.cc
In response to Re: Scalability in postgres  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Scalability in postgres  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-performance
Greg Smith wrote:
> This thread reminds me of Jignesh's "Proposal of tunable fix for
> scalability of 8.4" thread from March, except with only a fraction of
> the real-world detail.  There are multiple high-profile locks causing
> scalability concerns at quadruple digit high user counts in the
> PostgreSQL code base, finding them is easy.  Shoot, I know exactly
> where a couple are, and I didn't have to think about it at all--just
> talked with Jignesh a couple of times, led me right to them.  Fixing
> them without causing regressions in low client count cases, now that's
> the hard part.  No amount of theoretical discussion advances that any
> until you're at least staring at a very specific locking problem
> you've already characterized extensively via profiling.  And even
> then, profiling trumps theory every time.  This is why I stay out of
> these discussions and work on boring benchmark tools instead.

I disagree that profiling trumps theory every time. Profiling is useful
for identifying places where the existing architecture exhibits the best
and worst behaviour. It doesn't tell you whether a different
architecture (even a slightly different architecture) would work better
or worse. It might help identify architecture problems. It does not
provide you with architectural solutions.

I think it would be more correct to say that prototyping trumps theory.
That is, if somebody has a theory, and they invest time into a
proof-of-concept patch, and post actual results to show you that "by
changing this code over here to that, I get an N% improvement when using
thousands of connections, at no measurable cost for the single
connection case", these results will be far more compelling than theory.

Still, it has to involve theory, as not everybody has the time to run
off and prototype every wild idea. Discussion can determine whether an
idea has enough merit to be worth investing in a prototype.

I think several valuable theories have been discussed, many of which
directly apply to the domain that PostgreSQL fits within. The question
isn't about how valuable these theories are - they ARE valuable. The
question is how much support from the team can be gathered to bring
about change, and how willing the team is to accept or invest in
architectural changes that might take PostgreSQL to the next level. The
real problem here lies in the words "invest" and "might". That is, people are
not going to invest in a "might" - people need to be convinced, and for
people that don't have a problem today, the motivation to make the
investment is far less.

In my case, all I have to offer you is theory at this time. I don't have
the time to work on PostgreSQL, and I have not invested the time to
learn the internals of PostgreSQL well enough to comfortably and
effectively make changes to implement a theory I might have. I want to
get there - but there are so many other projects and ideas to pursue,
and I only have a few hours a day to decide what to spend them on.

You can tell me "sorry, your contribution of theory isn't welcome". In
fact, that looks like exactly what you have done. :-)

If the general community agrees with you, I'll stop my contributions of
theories. :-)

I think, though, that some of the PostgreSQL architecture is "old
theory". I have this silly idea that PostgreSQL could one day be better
than Oracle (in terms of features and performance - PostgreSQL already
beats Oracle on cost :-) ). It won't get there without some significant
changes. In only the last few years, I have watched as some pretty
significant changes were introduced into PostgreSQL that substantially
improved its performance and feature set. Many of these changes might
have started with profiling - but the real change came from applied
theory, not from profiling. Bitmap index scans are an example of this.
Profiling tells you the "what": that large joins involving OR are slow. It
takes theory to answer "why?" and "so, what do we do about it?"

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

