Thread: Anyone working on better transaction locking?

Anyone working on better transaction locking?

From
"Ron Peacetree"
Date:
...and if so, what are the current efforts focusing on?



Re: Anyone working on better transaction locking?

From
cbbrowne@cbbrowne.com
Date:
Ron Peacetree wrote:
> ...and if so, what are the current efforts focusing on?

What is it that you think of as being potentially "better" about some
would-be-alternative "transaction locking" scheme?

PostgreSQL already supports MVCC, which is commonly considered to be the
"better" scheme that eliminates a lot of need to lock data.

Furthermore, the phrase "transaction locking" doesn't seem to describe
what one would want to lock.  I wouldn't want to lock a "transaction;"
I'd want to lock DATA.
--
(concatenate 'string "cbbrowne" "@cbbrowne.com")
http://www.ntlug.org/~cbbrowne/sap.html
Rules of  the Evil  Overlord #153.  "My Legions of  Terror will  be an
equal-opportunity employer. Conversely, when  it is prophesied that no
man  can defeat  me, I  will  keep in  mind the  increasing number  of
non-traditional gender roles." <http://www.eviloverlord.com/>



Re: Anyone working on better transaction locking?

From
"Ron Peacetree"
Date:
<cbbrowne@cbbrowne.com> wrote in message
news:20030407194827.D0A3A56B1B@cbbrowne.com...
> Ron Peacetree wrote:
> > ...and if so, what are the current efforts focusing on?
>
> What is it that you think of as being potentially "better" about
some
> would-be-alternative "transaction locking" scheme?
>
> PostgreSQL already supports MVCC, which is commonly considered to be
the
> "better" scheme that eliminates a lot of need to lock data.
Agreed.  FTR, the reason MVCC is "better" is that readers and writers
to the same data don't block each other.  In "traditional" locking
schemes, readers don't block each other, but readers and writers to
the same data do.  Clearly, writers to the same data must always block
each other.

Unfortunately, the performance of PostgreSQL MVCC in comparison to say
Oracle (the performance leader amongst MVCC DB's, and pretty much for
all DB's for that matter) is not competitive.  Therefore there is a
need to improve the implementation of MVCC that PostgreSQL uses.  If
someone can post a detailed blow-by-blow comparison of how the two
operate so that the entire list can see it that would be a Good Thing.
If I can, I'll put together the info and post it myself.


> Furthermore, the phrase "transaction locking" doesn't seem to
describe
> what one would want to lock.  I wouldn't want to lock a
"transaction;"
> I'd want to lock DATA.
>
*sigh*.  The accepted terminology within this domain for what we are
talking about is "transaction locking".  Therefore we should use it to
ease communications.  Argue with Codd and Date if you think the term
is a misnomer.  Secondly, you are thinking only in the space
dimension.  Locks have to protect data within a minimum space vs time
"bubble".  That bubble is defined by the beginning and end of a
transaction, hence we call the locking of resources we do during that
bubble as "transaction locking".



Re: Anyone working on better transaction locking?

From
Andrew Sullivan
Date:
On Tue, Apr 08, 2003 at 01:45:25PM +0000, Ron Peacetree wrote:
> Unfortunately, the performance of PostgreSQL MVCC in comparison to say
> Oracle (the performance leader amongst MVCC DB's, and pretty much for
> all DB's for that matter) is not competitive.  Therefore there is a

What, is this a troll?  The question apparently reduces to, "Why
isn't PostgreSQL as good as Oracle?"  I have two things to say about
that:

1.    For what?  There are things that Oracle users will tell you
not to do, because there is a faster way in Oracle.  

2.    How do you know?  I haven't seen any real benchmarks
comparing PostgreSQL and Oracle similarly tuned on similar hardware. 
So I'm sceptical.

But if you have specifica areas which you think need improvement (and
aren't already listed in the TODO), I'll bet people would like to
hear about it.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8                                        +1 416 646 3304
x110



Re: Anyone working on better transaction locking?

From
Tom Lane
Date:
"Ron Peacetree" <rjpeace@earthlink.net> writes:
> Unfortunately, the performance of PostgreSQL MVCC in comparison to say
> Oracle (the performance leader amongst MVCC DB's, and pretty much for
> all DB's for that matter) is not competitive.

Ron, the tests that I've seen offer no support for that thesis.  If you
want us to accept such a blanket statement as fact, you'd better back
it up with evidence.  Let's see some test cases.

Postgres certainly has plenty of performance issues, but I have no
reason to believe that the fundamental MVCC mechanism is one of them.
        regards, tom lane



Re: Anyone working on better transaction locking?

From
"Ron Peacetree"
Date:
"Andrew Sullivan" <andrew@libertyrms.info> wrote in message
news:20030408230518.GB32207@libertyrms.info...
> On Tue, Apr 08, 2003 at 01:45:25PM +0000, Ron Peacetree wrote:
> > Unfortunately, the performance of PostgreSQL MVCC in comparison to
> > say Oracle (the performance leader amongst MVCC DB's, and pretty
much
> > for all DB's for that matter) is not competitive.  Therefore there
is
>
> What, is this a troll?
Time will tell.  Hopefully not.


> The question apparently reduces to, "Why isn't PostgreSQL
> as good as Oracle?"
Actually, you've just used reductio absurdium, not I.  My question
compares PostgreSQL to the performance leaders within this domain
since I'll have to justify my decisions to my bosses based on such
comparisons.  If you think that is unrealistic, then I wish I worked
where you do.  If you think that is unreasonable, then I think you're
treating PostgreSQL as a religion and not a SW product that must
compete against every other DB solution in the real world in order to
be relevant or even survive.


> 1. For what?  There are things that Oracle users will tell you
> not to do, because there is a faster way in Oracle.
>
> 2. How do you know?  I haven't seen any real benchmarks
> comparing PostgreSQL and Oracle similarly tuned on similar hardware.
> So I'm sceptical.
Please see my response(s) to Tom below.


> But if you have specifica areas which you think need improvement
> (and aren't already listed in the TODO), I'll bet people would like
to
> hear about it.
Please see my posts with regards to sorting and searching, two phase
execution, and two phase commit.  I'll mention thread support in
passing, and I'll be bringing up other stuff as I investigate.  Then
I'll hopefully start helping to solve some of the outstanding issues
in priority order...


"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
news:4096.1049860699@sss.pgh.pa.us...
> Ron, the tests that I've seen offer no support for that thesis.
What tests?  I've seen no tests doing head-to-head,
feature-for-feature comparisons (particularly for low level features
like locking) of PostgreSQL vs the "biggies": DB2, Oracle, and SQL
Server.  What data I have been able to find is application level, and
certainly not head-to-head.  From those performance results, I've had
to try and extrapolate likely causes from behavioral characteristics,
docs, and what internal code I can look at (clearly not much from the
"biggies").

If you have specific head-to-head, feature-for-feature comparison test
results to share, PLEASE do so.  I need the data.


> If you want us to accept such a blanket statement as fact, you'd
> better back it up with evidence.  Let's see some test cases.
Soon as I have the HW and SW to do so, it'll happen.  I have some "bet
the company" decisions to make in the DB realm.

Test cases are, of course, not the only possible evidence.  I'll get
back to you and the list on this.


> Postgres certainly has plenty of performance issues, but I have no
> reason to believe that the fundamental MVCC mechanism is one of
> them.
Where in your opinion are they then?  How bad are they in comparison
to MySQL or any of the "Big Three"?



Re: Anyone working on better transaction locking?

From
Tom Lane
Date:
"Ron Peacetree" <rjpeace@earthlink.net> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> news:4096.1049860699@sss.pgh.pa.us...
>> Ron, the tests that I've seen offer no support for that thesis.

> What tests?  I've seen no tests doing head-to-head,
> feature-for-feature comparisons (particularly for low level features
> like locking) of PostgreSQL vs the "biggies": DB2, Oracle, and SQL
> Server.  What data I have been able to find is application level, and
> certainly not head-to-head.

Who said anything about feature-for-feature comparisons?  You made an
(unsupported) assertion about performance, which has little to do with
feature checklists.

The reason I don't believe there's any fundamental MVCC problem is that
no such problem showed up in the head-to-head performance tests that
Great Bridge did about two years ago.  GB is now defunct, and I have
not heard of anyone else willing to stick their neck out far enough to
publish comparative benchmarks against Oracle.  But I still trust the
results they got.

I have helped various people privately with Oracle-to-PG migration
performance problems, and so far the issues have never been MVCC or
transaction issues at all.  What I've seen is mostly planner
shortcomings, such as failure to optimize "foo IN (sub-SELECT)"
decently.  Some of these things are already addressed in development
sources for 7.4.


>> Postgres certainly has plenty of performance issues, but I have no
>> reason to believe that the fundamental MVCC mechanism is one of
>> them.

> Where in your opinion are they then?  How bad are they in comparison
> to MySQL or any of the "Big Three"?

See the TODO list for some of the known problems.  As for "how bad are
they", that depends completely on the particular application and queries
you are looking at ...
        regards, tom lane



Re: Anyone working on better transaction locking?

From
Andrew Sullivan
Date:
On Wed, Apr 09, 2003 at 05:41:06AM +0000, Ron Peacetree wrote:

> Actually, you've just used reductio absurdium, not I.  My question

Nonsense.  You explicitly made the MVCC comparison with Oracle, and
are asking for a "better" locking mechanism without providing any
evidence that PostgreSQL's is bad.  

> compares PostgreSQL to the performance leaders within this domain
> since I'll have to justify my decisions to my bosses based on such
> comparisons.  If you think that is unrealistic, then I wish I worked

Where I work, we test our systems to performance targets having to do
with what we use the database for.  Generic database benchmarks are
not something I have a great deal of faith in.  I repeat my assertion
that, if you have specific areas of concern and the like, and they're
not on the TODO (or in the FAQ), then people would be likely to be
interested; although they'll likely be more interested if the
specifics are not a lot of hand-wavy talk about PostgreSQL not doing
something the right way.

> treating PostgreSQL as a religion and not a SW product that must
> compete against every other DB solution in the real world in order to
> be relevant or even survive.

Actually, given that we are dependent on PostgreSQL's performance and
stability for the whole of the company's revenue, I am pretty certain
that I have as much "real world" experience of PostgreSQL use as
anyone else.  

> Please see my posts with regards to sorting and searching, two phase
> execution, and two phase commit.  

I think your other posts were similar to the one which started this
thread: full of mighty big pronouncements which turned out to depend
on a bunch of not-so-tenable assumptions.  

I'm sorry to be so cranky about this, but I get tired of having to
defend one of my employer's core technologies from accusations based
on half-truths and "everybody knows" assumptions.  For instance,

> I'll mention thread support in passing,

there's actually a FAQ item about thread support, because in the
opinion of those who have looked at it, the cost is just not worth
the benefit.  If you have evidence to the contrary (specific evidence,
please, for this application), and have already read all the previous
discussion of the topic, perhaps people would be interested in
opening that debate again (though I have my doubts).

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8                                        +1 416 646 3304
x110



Re: Anyone working on better transaction locking?

From
cbbrowne@cbbrowne.com
Date:
Tom Lane wrote:
> The reason I don't believe there's any fundamental MVCC problem is that
> no such problem showed up in the head-to-head performance tests that
> Great Bridge did about two years ago.  GB is now defunct, and I have
> not heard of anyone else willing to stick their neck out far enough to
> publish comparative benchmarks against Oracle.  But I still trust the
> results they got.

<irony-mode-on>
You're missing where Mr Peacetree documented how MVCC performance
degraded by 42.37% between versions 7.1 and 7.3.1, as well as his
extensive statistical analysis of the relative behaviours of
PostgreSQL's semantics versus those of DB/2's MVCC implementation.
</irony-mode-off>

> I have helped various people privately with Oracle-to-PG migration
> performance problems, and so far the issues have never been MVCC or
> transaction issues at all.  What I've seen is mostly planner
> shortcomings, such as failure to optimize "foo IN (sub-SELECT)"
> decently.  Some of these things are already addressed in development
> sources for 7.4.

Ah, but that's just anecdotal evidence...  

And if you used radix sorting, that would probably fix it all.  (At
least until you discovered that you needed 65 bit addressing to set
sort_mem high enough...  Oh, did I neglect to mention anything about
irony?)
--
output = reverse("gro.mca@" "enworbbc")
http://www.ntlug.org/~cbbrowne/oses.html
"Luckily for Microsoft,  it's difficult to see a  naked emperor in the
dark." --- Ted Lewis, (former) editor-in-chief, IEEE Computer



Re: Anyone working on better transaction locking?

From
Jan Wieck
Date:
Ron Peacetree wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> > [...]
> > If you want us to accept such a blanket statement as fact, you'd
> > better back it up with evidence.  Let's see some test cases.
> Soon as I have the HW and SW to do so, it'll happen.  I have some "bet
> the company" decisions to make in the DB realm.

And you are comparing what? Just pure features and/or performace, or
total cost of ownership for your particular case?

It is a common misunderstanding open source would be free software. It
is not because since the software comes as is, without any warranty and
it's usually hard to get support provided or backed by large companies,
it is safe to build you own support team (depends on how much you "bet
the company"). Replacing license fees and support contracts with payroll
entries plus taking the feature and performance differences into account
makes this comparision a very individual, non-portable task.

Unfortunately most manager type people can produce an annoyingly high
volume of questions and suggestions as long as they need more input,
then all of the sudden disappear when they made their decision.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Anyone working on better transaction locking?

From
"Ron Peacetree"
Date:
"Ron Peacetree" <rjpeace@earthlink.net> wrote in message
news:eS0la.16229$ey1.1398978@newsread1.prod.itd.earthlink.net...
> M$'s first implementation of threads was so "heavy" that it didn't
> help them (until they actually implemented real threads and called
> them "strings"),
TYPO ALERT: M$'s better implementation of threads is called "fibers",
not "strings"



Re: Anyone working on better transaction locking?

From
"Ron Peacetree"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
news:8236.1049906884@sss.pgh.pa.us...
> "Ron Peacetree" <rjpeace@earthlink.net> writes:
> > "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> > news:4096.1049860699@sss.pgh.pa.us...
> >> Ron, the tests that I've seen offer no support for that thesis.
>
> > What tests?  I've seen no tests doing head-to-head,
> > feature-for-feature comparisons (particularly for low level
features
> > like locking) of PostgreSQL vs the "biggies": DB2, Oracle, and SQL
> > Server.  What data I have been able to find is application level,
and
> > certainly not head-to-head.
>
> Who said anything about feature-for-feature comparisons?  You made
an
> (unsupported) assertion about performance, which has little to do
with
> feature checklists.
>
That's not quite fair.  My assertion was about the performance of an
exact feature in comparison to that same feature in another DB
product, not about overall application level performance...  As I
said, I'll get back to you and the list on this.


> The reason I don't believe there's any fundamental MVCC problem is
that
> no such problem showed up in the head-to-head performance tests that
> Great Bridge did about two years ago.  GB is now defunct, and I have
> not heard of anyone else willing to stick their neck out far enough
to
> publish comparative benchmarks against Oracle.  But I still trust
the
> results they got.
>
Last year eWeek did a shoot out that PostgreSQL was notable in its
absence from:
http://www.eweek.com/print_article/0,3668,a=23115,00.asp
Taking those results and adding PostgreSQL to them should be eminently
feasible since the entire environment used for the test is documented
and the actual scripts and data used for the test are also available.
Of course, MySQL has been evolving at such a ferocious rate that even
one year old results, let alone two year old ones, run the risk of not
being accurate for it.


> I have helped various people privately with Oracle-to-PG migration
> performance problems, and so far the issues have never been MVCC or
> transaction issues at all.  What I've seen is mostly planner
> shortcomings, such as failure to optimize "foo IN (sub-SELECT)"
> decently.  Some of these things are already addressed in development
> sources for 7.4.
>
It's probably worth noting that since SQL support was added to
Postgres rather than being part of the product from Day One, certain
"hard" SQL constructs may still be having teething problems.  NOT IN,
for instance, was a problem for both Oracle and SQL Server at some
point in their history (fuzzy memory: pre Oracle 6, not sure about SQL
Server version...)


> >> Postgres certainly has plenty of performance issues, but I have
no
> >> reason to believe that the fundamental MVCC mechanism is one of
> >> them.
>
> > Where in your opinion are they then?  How bad are they in
comparison
> > to MySQL or any of the "Big Three"?
>
> See the TODO list for some of the known problems.  As for "how bad
are
> they", that depends completely on the particular application and
queries
> you are looking at ...
>
Fair enough.



Re: Anyone working on better transaction locking?

From
"Ron Peacetree"
Date:
"Jan Wieck" <JanWieck@Yahoo.com> wrote in message
news:3E956DD8.29432405@Yahoo.com...
> Ron Peacetree wrote:
> > "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> > > [...]
> > > If you want us to accept such a blanket statement
> > > as fact, you'd better back it up with evidence.  Let's
> > > see some test cases.
> > Soon as I have the HW and SW to do so, it'll happen.
> > I have some "bet the company" decisions to make.
>
> And you are comparing what? Just pure features and/or
> performance, or total cost of ownership for your
> particular case?
>
Technical Analysis and Business Analysis are two separate, and equally
necessary, activities.  However, before one can accurately measure
things like Total Cost of Ownership, one needs to have accurately and
sufficiently characterized what will be owned and one's choices as to
what could be owned...


> It is a common misunderstanding open source would be
> free software. It is not because since the software comes
> as is, without any warranty and it's usually hard to get
> support provided or backed by large companies, it is safe
> to build you own support team (depends on how much
> you "bet the company"). Replacing license fees and
> support contracts with payroll entries plus taking the
> feature and performance differences into account  makes
> this comparision a very individual, non-portable task.
>
Very valid points, and I was a supporter of the FSF and the LPF when
Usenet was "the net" and backbone nodes communicated by modem, so I've
been wrestling with people's sometimes misappropriate
use/understanding of the operator "free" for some time.

However, a correctly done Technical Analysis =should= be reasonably
portable since among other things you don't want to have to start all
over if your company's business or business model changes.  Clearly
Business Analysis is very context dependant.

It should also be noted that given the prices of some of the solutions
out there, there are many companies who's choices are constrained, but
still need to stay in business...


> Unfortunately most manager type people can produce an
> annoyingly high volume of questions and suggestions as
> long as they need more input, then all of the sudden
> disappear when they made their decision.
>
Word.  Although the phrase "manager type people" could be replaced
with "people" and the above would still be true IMHO.  Thankfully,
most of my bosses are people who have worked their up from the
technical trenches, so the conversation at least rates to be focused
and reasonable while it's occurring...



Re: Anyone working on better transaction locking?

From
"Ron Peacetree"
Date:
"Andrew Sullivan" <andrew@libertyrms.info> wrote in message
news:20030409170926.GH2255@libertyrms.info...
> On Wed, Apr 09, 2003 at 05:41:06AM +0000, Ron Peacetree wrote:
> Nonsense.  You explicitly made the MVCC comparison with Oracle, and
> are asking for a "better" locking mechanism without providing any
> evidence that PostgreSQL's is bad.
>
Just because someone else's is "better" does not mean PostgreSQL's is
"bad", and I've never said such.  As I've said, I'll get back to Tom
and the list on this.


> > compares PostgreSQL to the performance leaders within this domain
> > since I'll have to justify my decisions to my bosses based on such
> > comparisons.  If you think that is unrealistic, then I wish I
> > worked where you do.
>
> Where I work, we test our systems to performance targets having to
> do with what we use the database for.  Generic database benchmarks
> are not something I have a great deal of faith in.  I repeat my
> assertion that, if you have specific areas of concern and the like,
> and they're not on the TODO (or in the FAQ), then people would be
> likely to be interested; although they'll likely be more interested
if the
> specifics are not a lot of hand-wavy talk about PostgreSQL not doing
> something the right way.
>
There's nothing "hand wavy"about this unless you think anything except
test cases is "hand wavy".  In that case, you're right.  I don't have
the time or resources to provide exhaustive tests between each DB for
each of the issues we are discussing.  If I did, I'd be publishing a
=very= lucrative newsletter for IT decision makers.  Also, there are
other
valid ways to analyze issues than just application level test cases.
In fact, there are some =better= ways, depending on the issue being
discussed.


> > treating PostgreSQL as a religion and not a SW product that must
> > compete against every other DB solution in the real world in order
> > to be relevant or even survive.
>
> Actually, given that we are dependent on PostgreSQL's performance
> and stability for the whole of the company's revenue, I am pretty
> certain that I have as much "real world" experience of PostgreSQL
> use as anyone else.
>
Your experience was not questioned, and there were "if" clauses at the
beginning of my comments that you seem to be ignoring.  I'm not here
to waste my or anyone else's time on flames.  We've all got work to
do.


> > Please see my posts with regards to ...
>
> I think your other posts were similar to the one which started this
> thread: full of mighty big pronouncements which turned out to depend
> on a bunch of not-so-tenable assumptions.
>
Hmmm.  Well, I don't think of algorithm analysis by the likes of
Knuth, Sedgewick, Gonnet, and Baeza-Yates as being "not so tenable
assumptions", but YMMV.  As for "mighty pronouncements", that also
seems a bit misleading since we are talking about quantifiable
programming and computer science issues, not unquantifiable things
like politics.


> I'm sorry to be so cranky about this, but I get tired of having to
> defend one of my employer's core technologies from accusations based
> on half-truths and "everybody knows" assumptions.  For instance,
>
Again, "accusations" is a bit strong.  I thought the discussion was
about the technical merits and costs of various features and various
ways to implement them, particularly when this product must compete
for installed base with other solutions.  Being coldly realistic about
what a product's strengths and weaknesses are is, again, just good
business.  Sun Tzu's comment about knowing the enemy and yourself
seems appropriate here...


> > I'll mention thread support in passing,
>
> there's actually a FAQ item about thread support, because in the
> opinion of those who have looked at it, the cost is just not worth
> the benefit.  If you have evidence to the contrary (specific
> evidence, please, for this application), and have already read all
the
> previous discussion of the topic, perhaps people would be interested
in
> opening that debate again (though I have my doubts).
>
Zeus had a performance ceiling roughly 3x that of Apache when Zeus
supported threading as well as pre-forking and Apache only supported
pre forking.  The Apache folks now support both.  DB2, Oracle, and SQL
Server all use threads.  Etc, etc.

That's an awful lot of very bright programmers and some serious $$
voting that threads are worth it.  Given all that, if PostgreSQL
specific
thread support is =not= showing itself to be a win that's an
unexpected
enough outcome that we should be asking hard questions as to why not.

At their core, threads are a context switching efficiency tweak.
Since DB's switch context a lot under many circumstances, threads
should be a win under such circumstances.  At the least, it should be
helpful in situations where we have multiple CPUs to split query
execution between.

M$'s first implementation of threads was so "heavy" that it didn't
help them (until they actually implemented real threads and called
them "strings"), but that was not due to the inefficacy of the
concept, but rather M$'s implementation and the system environment
within which that implementation was being used.  Perhaps something
similar is going on here?

Certainly it's =possible= that threads have nothing to offer
PostgreSQL, but IMHO it's not =probable=.  Just another thing for me
to add to my TODO heap for looking at...



Re: Anyone working on better transaction locking?

From
"scott.marlowe"
Date:
On Wed, 9 Apr 2003, Ron Peacetree wrote:

> "Andrew Sullivan" <andrew@libertyrms.info> wrote in message
> news:20030409170926.GH2255@libertyrms.info...
> > On Wed, Apr 09, 2003 at 05:41:06AM +0000, Ron Peacetree wrote:
> > Nonsense.  You explicitly made the MVCC comparison with Oracle, and
> > are asking for a "better" locking mechanism without providing any
> > evidence that PostgreSQL's is bad.
> >
> Just because someone else's is "better" does not mean PostgreSQL's is
> "bad", and I've never said such.  As I've said, I'll get back to Tom
> and the list on this.

But you didn't identify HOW it was better.  I think that's the point 
being made.

> > > Please see my posts with regards to ...
> >
> > I think your other posts were similar to the one which started this
> > thread: full of mighty big pronouncements which turned out to depend
> > on a bunch of not-so-tenable assumptions.
> >
> Hmmm.  Well, I don't think of algorithm analysis by the likes of
> Knuth, Sedgewick, Gonnet, and Baeza-Yates as being "not so tenable
> assumptions", but YMMV.  As for "mighty pronouncements", that also
> seems a bit misleading since we are talking about quantifiable
> programming and computer science issues, not unquantifiable things
> like politics.

But the real truth is revealed when the rubber hits the pavement.  
Remember that Linux Torvalds was roundly criticized for his choice of a 
monolithic development model for his kernel, and was literally told that 
his choice would restrict to "toy" status and that no commercial OS could 
scale with a monolithic kernel.

There's no shortage of people with good ideas, just people with the skills 
to implement those good ideas.  If you've got a patch to apply that's been 
tested to show something is faster EVERYONE here wants to see it.

If you've got a theory, no matter how well backed up by academic research, 
it's still just a theory.  Until someone writes to code to implement it, 
the gains are theoretical, and many things that MIGHT help don't because 
of the real world issues underlying your database, like I/O bandwidth or 
CPU <-> memory bandwidth.

> > I'm sorry to be so cranky about this, but I get tired of having to
> > defend one of my employer's core technologies from accusations based
> > on half-truths and "everybody knows" assumptions.  For instance,
> >
> Again, "accusations" is a bit strong.  I thought the discussion was
> about the technical merits and costs of various features and various
> ways to implement them, particularly when this product must compete
> for installed base with other solutions.  Being coldly realistic about
> what a product's strengths and weaknesses are is, again, just good
> business.  Sun Tzu's comment about knowing the enemy and yourself
> seems appropriate here...

No, you're wrong.  Postgresql doesn't have to compete.  It doesn't have to 
win.  it doesn't need a marketing department.  All those things are nice, 
and I'm glad if it does them, but doesn't HAVE TO.  Postgresql has to 
work.  It does that well.

Postgresql CAN compete if someone wants to put the effort into competing, 
but it isn't a priority for me.  Working is the priority, and if other 
people aren't smart enough to test Postgresql to see if it works for them, 
all the better, I keep my edge by having a near zero cost database engine, 
while the competition spends money on MSSQL or Oracle.

Tom and Andrew ARE coldly realistic about the shortcomings of postgresql.  
It has issues, and things that need to be fixed.  It needs more coders.  
It doesn't need every feature that Oracle or DB2 have.  Heck some of their 
"features" would be considered a mis-feature in the Postgresql world.

> > > I'll mention thread support in passing,
> >
> > there's actually a FAQ item about thread support, because in the
> > opinion of those who have looked at it, the cost is just not worth
> > the benefit.  If you have evidence to the contrary (specific
> > evidence, please, for this application), and have already read all
> the
> > previous discussion of the topic, perhaps people would be interested
> in
> > opening that debate again (though I have my doubts).
> >
> Zeus had a performance ceiling roughly 3x that of Apache when Zeus
> supported threading as well as pre-forking and Apache only supported
> pre forking.  The Apache folks now support both.  DB2, Oracle, and SQL
> Server all use threads.  Etc, etc.

Yes, and if you configured your apache server to have 20 or 30 spare 
servers, in the real world, it was nearly neck and neck to Zeus, but since 
Zeus cost like $3,000 a copy, it is still cheaper to just overwhelm it 
with more servers running apache than to use zeus.

> That's an awful lot of very bright programmers and some serious $$
> voting that threads are worth it.  

For THAT application.  for what a web server does, threads can be very 
useful, even useful enough to put up with the problems created by running 
threads on multiple threading libs on different OSes.  

Let me ask you, if Zeus scrams and crashes out, and it's installed 
properly so it just comes right back up, how much data can you lose?

If Postgresql scrams and crashes out, how much data can you lost?

> Given all that, if PostgreSQL
> specific
> thread support is =not= showing itself to be a win that's an
> unexpected
> enough outcome that we should be asking hard questions as to why not.

There HAS been testing on threads in Postgresql.  It has been covered to 
death.  The fact that you're still arguing proves you likely haven't read 
the archive (google has it back to way back when, use that to look it up) 
about this subject.

Threads COULD help on multi-sorted results, and a few other areas, but the 
increase in performance really wasn't that great for 95% of all the cases, 
and for the 5% it was, simple query planner improvements have provided far 
greater performance increases.

The problem with threading is that we can either use the one process -> 
many thread design, which I personally don't trust for something like a 
database, or a process per backend connection which can run 
multi-threaded.  This scenario makes Postgresql just as stable and 
reliable as it was as a multi-process app, but allows threaded performance 
in certain areas of the backend that are parallelizable to run in parallel 
on multi-CPU systems.

the gain, again, is minimal, and on a system with many users accessing it, 
there is NO real world gain.

> At their core, threads are a context switching efficiency tweak.

Except that on the two OSes which Postgresql runs on the most, threads are 
really no faster than processes.  In the Linux kernel, the only real 
difference is how the OS treats them, creation, destruction of threads 
versus processes is virtually identical there.

> Certainly it's =possible= that threads have nothing to offer
> PostgreSQL, but IMHO it's not =probable=.  Just another thing for me
> to add to my TODO heap for looking at...

It's been tested, it didn't help a lot, and it made it MUCH harder to 
maintain, as threads in Linux are handled by a different lib than in say 
Solaris, or Windows or any other OS.  I.e. you can't guarantee the thread 
lib you need will be there, and that there are no bugs.  MySQL still has 
thread bug issues pop up, most of which are in the thread libs themselves.



Re: Anyone working on better transaction locking?

From
Kevin Brown
Date:
Ron Peacetree wrote:
> Zeus had a performance ceiling roughly 3x that of Apache when Zeus
> supported threading as well as pre-forking and Apache only supported
> pre forking.  The Apache folks now support both.  DB2, Oracle, and SQL
> Server all use threads.  Etc, etc.

You can't use Apache as an example of why you should thread a database
engine, except for the cases where the database is used much like the
web server is: for numerous short transactions.

> That's an awful lot of very bright programmers and some serious $$
> voting that threads are worth it.  Given all that, if PostgreSQL
> specific thread support is =not= showing itself to be a win that's
> an unexpected enough outcome that we should be asking hard questions
> as to why not.

It's not that there won't be any performance benefits to be had from
threading (there surely will, on some platforms), but gaining those
benefits comes at a very high development and maintenance cost.  You
lose a *lot* of robustness when all of your threads share the same
memory space, and make yourself vulnerable to classes of failures that
simply don't happen when you don't have shared memory space.

PostgreSQL is a compromise in this regard: it *does* share memory, but
it only shares memory that has to be shared, and nothing else.  To get
the benefits of full-fledged threads, though, requires that all memory
be shared (otherwise the OS has to tweak the page tables whenever it
switches contexts between your threads).

> At their core, threads are a context switching efficiency tweak.

This is the heart of the matter.  Context switching is an operating
system problem, and *that* is where the optimization belongs.  Threads
exist in large part because operating system vendors didn't bother to
do a good job of optimizing process context switching and
creation/destruction.

Under Linux, from what I've read, process creation/destruction and
context switching happens almost as fast as thread context switching
on other operating systems (Windows in particular, if I'm not
mistaken).

> Since DB's switch context a lot under many circumstances, threads
> should be a win under such circumstances.  At the least, it should be
> helpful in situations where we have multiple CPUs to split query
> execution between.

This is true, but I see little reason that we can't do the same thing
using fork()ed processes and shared memory instead.

There is context switching within databases, to be sure, but I think
you'll be hard pressed to demonstrate that it is anything more than an
insignificant fraction of the total overhead incurred by the database.
I strongly suspect that much larger gains are to be had by optimizing
other areas of the database, such as the planner, the storage manager
(using mmap for file handling may prove useful here), the shared
memory system (mmap may be faster than System V style shared memory),
etc.

The big overhead in the process model on most platforms is in creation
and destruction of processes.  PostgreSQL has a relatively high
connection startup cost.  But there are ways of dealing with this
problem other than threading, namely the use of a connection caching
middleware layer.  Such layers exist for databases other than
PostgreSQL, so the high cost of fielding and setting up a database
connection is *not* unique to PostgreSQL ... which suggests that while
threading may help, it doesn't help *enough*.

I'd rather see some development work go into a connection caching
process that understands the PostgreSQL wire protocol well enough to
look like a PostgreSQL backend to connecting processes, rather than
see a much larger amount of effort be spent on converting PostgreSQL
to a threaded architecture (and then discover that connection caching
is still needed anyway).

> Certainly it's =possible= that threads have nothing to offer
> PostgreSQL, but IMHO it's not =probable=.  Just another thing for me
> to add to my TODO heap for looking at...

It's not that threads don't have anything to offer.  It's that the
costs associated with them are high enough that it's not at all clear
that they're an overall win.


-- 
Kevin Brown                          kevin@sysexperts.com



Re: Anyone working on better transaction locking?

From
Shridhar Daithankar
Date:
On Saturday 12 April 2003 03:02, you wrote:
> Ron Peacetree wrote:
> > Zeus had a performance ceiling roughly 3x that of Apache when Zeus
> > supported threading as well as pre-forking and Apache only supported
> > pre forking.  The Apache folks now support both.  DB2, Oracle, and SQL
> > Server all use threads.  Etc, etc.
>
> You can't use Apache as an example of why you should thread a database
> engine, except for the cases where the database is used much like the
> web server is: for numerous short transactions.

OK. Let me put my experiences. These are benchmarks on a intranet(100MBps lan) 
run off a 1GHZ P-III/IV webserver on mandrake9 for a single 8K file.

apache2044: 1300 rps
boa:      4500rps
Zeus:     6500 rps.

Apache does too many things to be a speed daemon and what it offers is pretty 
impressive from performance POV.

But database is not webserver. It is not suppose to handle tons of concurrent 
requests. That is a fundamental difference.

>
> > That's an awful lot of very bright programmers and some serious $$
> > voting that threads are worth it.  Given all that, if PostgreSQL
> > specific thread support is =not= showing itself to be a win that's
> > an unexpected enough outcome that we should be asking hard questions
> > as to why not.
>
> It's not that there won't be any performance benefits to be had from
> threading (there surely will, on some platforms), but gaining those
> benefits comes at a very high development and maintenance cost.  You
> lose a *lot* of robustness when all of your threads share the same
> memory space, and make yourself vulnerable to classes of failures that
> simply don't happen when you don't have shared memory space.

Well. Threading does not necessarily imply one thread per connection model. 
Threading can be used to make CPU work during I/O and taking advantage of SMP 
for things like sort etc. This is especially true for 2.4.x linux kernels 
where async I/O can not be used for threaded apps. as threads and signal do 
not mix together well.

One connection per thread is not a good model for postgresql since it has 
already built a robust product around process paradigm. If I have to start a 
new database project today, a mix of process+thread is what I would choose bu 
postgresql is not in same stage of life.

> > At their core, threads are a context switching efficiency tweak.
>
> This is the heart of the matter.  Context switching is an operating
> system problem, and *that* is where the optimization belongs.  Threads
> exist in large part because operating system vendors didn't bother to
> do a good job of optimizing process context switching and
> creation/destruction.

But why would a database need a tons of context switches if it is not supposed 
to service loads to request simaltenously? If there are 50 concurrent 
connections, how much context switching overhead is involved regardless of 
amount of work done in a single connection? Remeber that database state is 
maintened in shared memory. It does not take a context switch to access it.

The assumption stems from database being very efficient in creating and 
servicing a new connection. I am not very comfortable with that argument.

> Under Linux, from what I've read, process creation/destruction and
> context switching happens almost as fast as thread context switching
> on other operating systems (Windows in particular, if I'm not
> mistaken).

I hear solaris also has very heavy processes. But postgresql has other issues 
with solaris as well.
>
> > Since DB's switch context a lot under many circumstances, threads
> > should be a win under such circumstances.  At the least, it should be
> > helpful in situations where we have multiple CPUs to split query
> > execution between.

Can you give an example where database does a lot of context switching for 
moderate number of connections?
Shridhar



Re: Anyone working on better transaction locking?

From
Kevin Brown
Date:
Shridhar Daithankar wrote:
> Apache does too many things to be a speed daemon and what it offers
> is pretty impressive from performance POV.
>
> But database is not webserver. It is not suppose to handle tons of
> concurrent requests. That is a fundamental difference.

I'm not sure I necessarily agree with this.  A database is just a
tool, a means of reliably storing information in such a way that it
can be retrieved quickly.  Whether or not it "should" handle lots of
concurrent requests is a question that the person trying to use it
must answer.

A better answer is that a database engine that can handle lots of
concurrent requests can also handle a smaller number, but not vice
versa.  So it's clearly an advantage to have a database engine that
can handle lots of concurrent requests because such an engine can be
applied to a larger number of problems.  That is, of course, assuming
that all other things are equal...

There are situations in which a database would have to handle a lot of
concurrent requests.  Handling ATM transactions over a large area is
one such situation.  A database with current weather information might
be another, if it is actively queried by clients all over the country.
Acting as a mail store for a large organization is another.  And, of
course, acting as a filesystem is definitely another.  :-)

> Well. Threading does not necessarily imply one thread per connection
> model. Threading can be used to make CPU work during I/O and taking
> advantage of SMP for things like sort etc. This is especially true
> for 2.4.x linux kernels where async I/O can not be used for threaded
> apps. as threads and signal do not mix together well.

This is true, but whether you choose to limit the use of threads to a
few specific situations or use them throughout the database, the
dangers and difficulties faced by the developers when using threads
will be the same.

> One connection per thread is not a good model for postgresql since
> it has already built a robust product around process paradigm. If I
> have to start a new database project today, a mix of process+thread
> is what I would choose bu postgresql is not in same stage of life.

Certainly there are situations for which it would be advantageous to
have multiple concurrent actions happening on behalf of a single
connection, as you say.  But that doesn't automatically mean that a
thread is the best overall solution.  On systems such as Linux that
have fast process handling, processes are almost certainly the way to
go.  On other systems such as Solaris or Windows, threads might be the
right answer (on Windows they might be the *only* answer).  But my
argument here is simple: the responsibility of optimizing process
handling belongs to the maintainers of the OS.  Application developers
shouldn't have to worry about this stuff.

Of course, back here in the real world they *do* have to worry about
this stuff, and that's why it's important to quantify the problem.
It's not sufficient to say that "processes are slow and threads are
fast".  Processes on the target platform may well be slow relative to
other systems (and relative to threads).  But the question is: for the
problem being solved, how much overhead does process handling
represent relative to the total amount of overhead the solution itself
incurs?

For instance, if we're talking about addressing the problem of
distributing sorts across multiple CPUs, the amount of overhead
involved in doing disk activity while sorting could easily swamp, in
the typical case, the overhead involved in creating parallel processes
to do the sorts themselves.  And if that's the case, you may as well
gain the benefits of using full-fledged processes rather than deal
with the problems that come with the use of threads -- because the
gains to be found by using threads will be small in relative terms.

> > > At their core, threads are a context switching efficiency tweak.
> >
> > This is the heart of the matter.  Context switching is an operating
> > system problem, and *that* is where the optimization belongs.  Threads
> > exist in large part because operating system vendors didn't bother to
> > do a good job of optimizing process context switching and
> > creation/destruction.
> 
> But why would a database need a tons of context switches if it is
> not supposed to service loads to request simaltenously? If there are
> 50 concurrent connections, how much context switching overhead is
> involved regardless of amount of work done in a single connection? 
> Remeber that database state is maintened in shared memory. It does
> not take a context switch to access it.

If there are 50 concurrent connections with one process per
connection, then there are 50 database processes.  The context switch
overhead is incurred whenever the current process blocks (or exhausts
its time slice) and the OS activates a different process.  Since
database handling is generally rather I/O intensive as services go,
relatively few of those 50 processes are likely to be in a runnable
state, so I would expect the overall hit from context switching to be
rather low -- I'd expect the I/O subsystem to fall over well before
context switching became a real issue.

Of course, all of that is independent of whether or not the database
can handle a lot of simultaneous requests.

> > Under Linux, from what I've read, process creation/destruction and
> > context switching happens almost as fast as thread context switching
> > on other operating systems (Windows in particular, if I'm not
> > mistaken).
> 
> I hear solaris also has very heavy processes. But postgresql has
> other issues with solaris as well.

Yeah, I didn't want to mention Solaris because I haven't kept up with
it and thought that perhaps they had fixed this...


-- 
Kevin Brown                          kevin@sysexperts.com



Re: Anyone working on better transaction locking?

From
Shridhar Daithankar
Date:
On Saturday 12 April 2003 16:24, you wrote:
> A better answer is that a database engine that can handle lots of
> concurrent requests can also handle a smaller number, but not vice
> versa.  So it's clearly an advantage to have a database engine that
> can handle lots of concurrent requests because such an engine can be
> applied to a larger number of problems.  That is, of course, assuming
> that all other things are equal...
>
> There are situations in which a database would have to handle a lot of
> concurrent requests.  Handling ATM transactions over a large area is
> one such situation.  A database with current weather information might
> be another, if it is actively queried by clients all over the country.
> Acting as a mail store for a large organization is another.  And, of
> course, acting as a filesystem is definitely another.  :-)

Well, there is another aspect one should consider. Tuning a database engine 
for a specifiic workload is a hell of a job and shifting it to altogether 
other end of paradigm must be justified.

OK. Postgresql is not optimised to handle lots of concurrent connections, at 
least not much to allow one apache request handler to use a connection. Then 
middleware connection pooling like done in php might be a simpler solution to 
go rather than redoing the postgresql stuff. Because it works.

> This is true, but whether you choose to limit the use of threads to a
> few specific situations or use them throughout the database, the
> dangers and difficulties faced by the developers when using threads
> will be the same.

I do not agree. Let's say I put threading functions in posgresql that do not 
touch shared memory interface at all. They would be hell lot simpler to code 
and mainten than converting postgresql to one thread per connection model.

> Of course, back here in the real world they *do* have to worry about
> this stuff, and that's why it's important to quantify the problem.
> It's not sufficient to say that "processes are slow and threads are
> fast".  Processes on the target platform may well be slow relative to
> other systems (and relative to threads).  But the question is: for the
> problem being solved, how much overhead does process handling
> represent relative to the total amount of overhead the solution itself
> incurs?

That is correct. However it would be a fair assumption on part of postgresql 
developers that a process once setup does not have much of processing 
overhead involved as such, given the state of modern server class OS and 
hardware. So postgresql as it is, fits in that model. I mean it is fine that 
postgresql has heavy connections. Simpler solution is to pool them.

That gets me wondering. Has anybody ever benchmarked how much a database 
connection weighs in terms of memory/CPU/IO BW. for different databases on 
different platforms? Is postgresql really that slow?
Shridhar



Re: Anyone working on better transaction locking?

From
Greg Stark
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:

> But database is not webserver. It is not suppose to handle tons of concurrent 
> requests. That is a fundamental difference.

And in one fell swoop you've dismissed the entire OLTP database industry. 

Have you ever called a travel agent and had him or her look up a fare in the
airline database within seconds? Ever placed an order over the telephone? 
Ever used a busy database-backed web site?

On database-backed web sites, probably the main application for databases
today, almost certainly the main application for free software databases,
every web page request translates into at least one, probably several database
queries. 

All those database queries must complete within a limited time, measured in
milliseconds. When they complete another connection needs to be context
switched in and run again within milliseconds.

On a busy web site the database machine will have several processors and be
processing queries for several web pages simultaneously, but what really
matters is precisely the context switch time between one set of queries and
another.

The test I'm most interested in in the benchmarks effort is simply an index
lookup or update of a single record from a large table. How many thousands of
transactions per second is postgres going to be able to handle on the same
machine as mysql and oracle? How many hundreds of thousands of transactions
per second will they be able to handle on a 4 processor hyperthreaded machine
with a raid array striped across ten disks?

--
greg



Re: Anyone working on better transaction locking?

From
cbbrowne@cbbrowne.com
Date:
Scott Marlowe wrote:
> On Wed, 9 Apr 2003, Ron Peacetree wrote:
> 
> > "Andrew Sullivan" <andrew@libertyrms.info> wrote in message
> > news:20030409170926.GH2255@libertyrms.info...
> > > On Wed, Apr 09, 2003 at 05:41:06AM +0000, Ron Peacetree wrote:
> > > Nonsense.  You explicitly made the MVCC comparison with Oracle, and
> > > are asking for a "better" locking mechanism without providing any
> > > evidence that PostgreSQL's is bad.
> > >
> > Just because someone else's is "better" does not mean PostgreSQL's is
> > "bad", and I've never said such.  As I've said, I'll get back to Tom
> > and the list on this.
> 
> But you didn't identify HOW it was better.  I think that's the point 
> being made.

Oh, but he presented such detailed statistics to prove his case, didn't you 
see it?  :-)

> > > > Please see my posts with regards to ...
> > >
> > > I think your other posts were similar to the one which started this
> > > thread: full of mighty big pronouncements which turned out to depend
> > > on a bunch of not-so-tenable assumptions.
> > >
> > Hmmm.  Well, I don't think of algorithm analysis by the likes of
> > Knuth, Sedgewick, Gonnet, and Baeza-Yates as being "not so tenable
> > assumptions", but YMMV.  As for "mighty pronouncements", that also
> > seems a bit misleading since we are talking about quantifiable
> > programming and computer science issues, not unquantifiable things
> > like politics.
> 
> But the real truth is revealed when the rubber hits the pavement.  
> Remember that Linux Torvalds was roundly criticized for his choice of a 
> monolithic development model for his kernel, and was literally told that 
> his choice would restrict to "toy" status and that no commercial OS could 
> scale with a monolithic kernel.

Indeed.  I have the books from all of the above (when I studied databases 
under Gonnet, Baeza-Yates was his TA...).  And I have seen enough cases of the 
conglomeration of multiple algorithms not behaving the way a blind read of 
their books might suggest to refuse to blindly assume that things are so 
simple.

In the /real/ world, the dictates of flushing buffers to help ensure 
robustness can combine with having enough memory to virtually eliminate read 
I/O to substantially change the results from some simplistic O(f(n)) analysis.

Which is NOT to say that computational complexity is unimportant; what it 
indicates is that theoretical results are merely theoretical.  And may only 
represent a small part of what happens in practice.  The nonsense about radix 
sorts was a wonderful example; it would likely only be useful with PostgreSQL 
if you had some fantastical amount of memory that might not actually be able 
to be constructed within the confines of our solar system.

> There's no shortage of people with good ideas, just people with the skills 
> to implement those good ideas.  If you've got a patch to apply that's been 
> tested to show something is faster EVERYONE here wants to see it.
> 
> If you've got a theory, no matter how well backed up by academic research, 
> it's still just a theory.  Until someone writes to code to implement it, 
> the gains are theoretical, and many things that MIGHT help don't because 
> of the real world issues underlying your database, like I/O bandwidth or 
> CPU <-> memory bandwidth.

An unfortunate thing (to my mind) is that *genuinely novel* operating system 
research has pretty much disappeared.  All we see, these days, are rehashes of 
VMS, MVS, and Unix, along with some reimplementations of P-Code under monikers 
like "JVM", ".NET" or "Parrot."

There's good reason for it; if you build something that is much more than 95% 
indistinguishable from Unix, then you'll be left with the *enormous* projects 
of creating completely new infrastructure for compilers, data persistence 
("novel" would mean, to my mind, concepts different from files), program 
editors, and such.  But if it's 95% the same as Unix, then Emacs, GCC, CVS, 
PostgreSQL, and all sorts of "tool chain" are available to you.

What is unfortunate is that it would be nice to try out some things that are 
Very Different.  Unfortunately, it might take five years of slogging through 
recreating compilers and editors in order to get in about 6 months of "solid 
novel work."

Of course, if you don't plan to lift your finger to help make any of it 
happen, it's easy enough to "armchair quarterback" and suggest that someone 
else do all sorts of would-be "neat things."

> > > I'm sorry to be so cranky about this, but I get tired of having to
> > > defend one of my employer's core technologies from accusations based
> > > on half-truths and "everybody knows" assumptions.  For instance,
> > >
> > Again, "accusations" is a bit strong.  I thought the discussion was
> > about the technical merits and costs of various features and various
> > ways to implement them, particularly when this product must compete
> > for installed base with other solutions.  Being coldly realistic about
> > what a product's strengths and weaknesses are is, again, just good
> > business.  Sun Tzu's comment about knowing the enemy and yourself
> > seems appropriate here...

> No, you're wrong.  Postgresql doesn't have to compete.  It doesn't have to 
> win.  it doesn't need a marketing department.  All those things are nice, 
> and I'm glad if it does them, but doesn't HAVE TO.  Postgresql has to 
> work.  It does that well.

Having a bit more of a "marketing department" might be a nice thing; it could 
make it easier for people that would like to deploy PG to get the idea past 
the higher-ups that have a hard time listening to things that *don't* come 
from that department.

> > > > I'll mention thread support in passing,
> > >
> > > there's actually a FAQ item about thread support, because in the
> > > opinion of those who have looked at it, the cost is just not worth
> > > the benefit.  If you have evidence to the contrary (specific
> > > evidence, please, for this application), and have already read all
> > the
> > > previous discussion of the topic, perhaps people would be interested
> > in
> > > opening that debate again (though I have my doubts).
> > >
> > Zeus had a performance ceiling roughly 3x that of Apache when Zeus
> > supported threading as well as pre-forking and Apache only supported
> > pre forking.  The Apache folks now support both.  DB2, Oracle, and SQL
> > Server all use threads.  Etc, etc.
> 
> Yes, and if you configured your apache server to have 20 or 30 spare 
> servers, in the real world, it was nearly neck and neck to Zeus, but since 
> Zeus cost like $3,000 a copy, it is still cheaper to just overwhelm it 
> with more servers running apache than to use zeus.

All quite entertaining.  Andrew was perhaps trolling just a little bit there; 
our resident "algorithm expert" was certainly easily sucked into leaping down 
the path-too-much-trod.  Just as with choices of sorting algorithms, it's easy 
enough for there to be more to things than whatever the latest academic 
propaganda about threading is.

The VITAL point to be made about threading is that there is a tradeoff, and 
it's not the one that "armchair-quarterbacks-that-don't-write-code" likely 
think of.

--> Hand #1:  Implementing a threaded model would require a lot of work, and 
the *ACTUAL* expected benefits are unknown.

--> Hand #2:  So far, other *easier* optimizations have been providing 
significant speedups, requiring much less effort.

At some point in time, it might be that "doing threading" might become the 
strategy most expected to reap the most rewards for the least amount of 
programmer effort.  Until that time, it's not worth worrying about it.

> > That's an awful lot of very bright programmers and some serious $$
> > voting that threads are worth it.  
> 
> For THAT application.  for what a web server does, threads can be very 
> useful, even useful enough to put up with the problems created by running 
> threads on multiple threading libs on different OSes.  
> 
> Let me ask you, if Zeus scrams and crashes out, and it's installed
> properly so it just comes right back up, how much data can you lose?
> 
> If Postgresql scrams and crashes out, how much data can you lost?

There's another possibility, namely that the "voting" may not have anything to 
do with threading being "best."  Instead, it may be a road to allow the 
largest software houses, that can afford to have enough programmers that can 
"do threading," to crush smaller competitors.  After all, threading offers 
daunting new opportunities for deadlocks, data overruns, and crashes; if only 
those with the most, best thread programmers can compete, that discourages 
others from even /trying/ to compete.
--
output = ("cbbrowne" "@ntlug.org")
http://www3.sympatico.ca/cbbrowne/sgml.html
"I visited  a company  that was doing  programming in BASIC  in Panama
City and I asked them if they resented that the BASIC keywords were in
English.   The answer  was:  ``Do  you resent  that  the keywords  for
control of actions in music are in Italian?''"  -- Kent M Pitman



Re: Anyone working on better transaction locking?

From
Neil Conway
Date:
On Fri, 2003-04-11 at 17:32, Kevin Brown wrote:
> The big overhead in the process model on most platforms is in creation
> and destruction of processes.  PostgreSQL has a relatively high
> connection startup cost.  But there are ways of dealing with this
> problem other than threading, namely the use of a connection caching
> middleware layer.

Furthermore, IIRC PostgreSQL's relatively slow connection creation time
has as much to do with other per-backend initialization work as it does
with the time to actually fork() a new backend. If there is interest in
optimizing backend startup time, my guess would be that there is plenty
of room for improvement without requiring the replacement of processes
with threads.

Cheers,

Neil



Re: Anyone working on better transaction locking?

From
"Michael Paesold"
Date:
Neil Conway wrote:

> Furthermore, IIRC PostgreSQL's relatively slow connection creation time
> has as much to do with other per-backend initialization work as it does
> with the time to actually fork() a new backend. If there is interest in
> optimizing backend startup time, my guess would be that there is plenty
> of room for improvement without requiring the replacement of processes
> with threads.

I see there is a whole TODO Chapter devoted to the topic. There is the idea
of pre-forked and persistent backends. That would be very useful in an
environment where it's quite hard to use connection pooling. We are
currently working on a mail system for a free webmail. The mda (mail
delivery agent) written in C connects to the pg database to do some queries
everytime a new mail comes in. I didn't find a solution for connection
pooling yet.

About the TODO items, apache has a nice description of their accept()
serialization:
http://httpd.apache.org/docs-2.0/misc/perf-tuning.html

Perhaps this could be useful if someone decided to start implementing those
features.

Regards,
Michael Paesold



Re: Anyone working on better transaction locking?

From
Mark Kirkwood
Date:
Greg Stark wrote:

>Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
>
>  
>
>>But database is not webserver. It is not suppose to handle tons of concurrent 
>>requests. That is a fundamental difference.
>>    
>>
>
>And in one fell swoop you've dismissed the entire OLTP database industry. 
>
>Have you ever called a travel agent and had him or her look up a fare in the
>airline database within seconds? Ever placed an order over the telephone? 
>Ever used a busy database-backed web site?
>  
>
That situation is usually handled by means of a TP Monitor that keeps 
open database connections ( e.g, CICS + DB2 ).

I think there is some confusion between "many concurrent connections + 
short transactions" and "many connect / disconnect + short transactions" 
in some of this discussion.

OLTP systems typically fall into the first case - perhaps because their 
db products do not have fast connect / disconnect :-).  Postgresql plus 
some suitable middleware (e.g Php) will handle this configuration *with* 
its current transaction model.

I think you are actually talking about the connect / disconnect speed 
rather than the *transaction* model per se.

best wishes

Mark



Re: Anyone working on better transaction locking?

From
Greg Stark
Date:
Mark Kirkwood <markir@paradise.net.nz> writes:

> I think there is some confusion between "many concurrent connections + short
> transactions" and "many connect / disconnect + short transactions" in some of
> this discussion.

I was intended to clarify that but left it out. In fact I think that's
precisely one of the confusions that's obscuring things in this ongoing
debate.

Worrying about connection time is indeed a red herring. Most databases have
slow connection times so most database drivers implement some form of cached
connections. A lot of effort has gone into working around this particular
database design deficiency.

However even if you reuse existing database connections, you nonetheless are
still context switching between hundreds or potentially thousands of threads
of execution. The lighter-weight that context switch is, the faster it'll be
able to do that.

For a web site where all the queries are preparsed, all the data is cached in
ram, and all the queries involve quick single record lookups and updates, the
machine is often quite easily driven 100% cpu bound. 

It's tricky to evaluate the cost of the context switches because a big part of
the cost is simply the tlb flushes. Not only does a process context switch
involve swapping in memory maps and other housekeeping, but all future memory
accesses like the data copies that an OLTP system spends most of its time
doing are slowed down.

And the other question is how much memory does having many processes running
consume? Every page those processes are consuming that could have been shared
is a page that isn't being used for disk caching, and another page to pollute
the processor's cache.

So for example, I wonder how fast postgres would be if there were a thousand
connections open, all doing fast one-record index lookups as fast as they can.

People are going to say that would just be a poorly designed system, but I
think they're just not applying much foresight. Reasonably designed systems
easily need several hundred connections now, and future large systems will
undoubtedly need thousands.

Anyways, this is a long standing debate and the FAQ answer is mostly, we'll
find out when someone writes the code. Continuing to debate it isn't going to
be very productive. My only desire here is to see more people realize that
optimizing for tons of short transactions using data cached in ram is at least
as important as optimizing for big complex transactions on huge datasets.

--
greg



Re: Anyone working on better transaction locking?

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> However even if you reuse existing database connections, you nonetheless are
> still context switching between hundreds or potentially thousands of threads
> of execution. The lighter-weight that context switch is, the faster it'll be
> able to do that.

> It's tricky to evaluate the cost of the context switches because a big part of
> the cost is simply the tlb flushes. Not only does a process context switch
> involve swapping in memory maps and other housekeeping, but all future memory
> accesses like the data copies that an OLTP system spends most of its time
> doing are slowed down.

So?  You're going to be paying those costs *anyway*, because most of the
process context swaps will be between the application server and the
database.  A process swap is a process swap, and if you are doing only
very short transactions, few of those swaps will be between database
contexts --- app to database to app will be the common pattern.  Unless
you'd like to integrate the client into the same address space as the
database, I do not see that there's an argument here that says multiple
threads in the database will be markedly faster than multiple processes.
        regards, tom lane



Re: Anyone working on better transaction locking?

From
Mark Kirkwood
Date:
Greg Stark wrote:

>So for example, I wonder how fast postgres would be if there were a thousand
>connections open, all doing fast one-record index lookups as fast as they can.
>
Yes - some form of "connection reducing" middleare is probably needed at 
that point ( unless you have fairly highly spec'ed hardware )

>People are going to say that would just be a poorly designed system, but I
>think they're just not applying much foresight. Reasonably designed systems
>easily need several hundred connections now, and future large systems will
>undoubtedly need thousands.
>  
>
I guess the question could be reduced to : whether some form of TP 
Monitor functionality should be built into Postgresql? This *might* be a 
better approach - as there may be a limit to how much faster a Pg 
connection can get. By way of interest I notice that DB2 8.1 has a 
connection concentrator in it - probably for the very reason that we 
have been discussing...

Maybe there should be a TODO list item in the Pg "Exotic  Features" for 
connection pooling / concentrating ???

What do people think ?

Mark



Re: Anyone working on better transaction locking?

From
Kevin Brown
Date:
Shridhar Daithankar wrote:
> > There are situations in which a database would have to handle a lot of
> > concurrent requests.  Handling ATM transactions over a large area is
> > one such situation.  A database with current weather information might
> > be another, if it is actively queried by clients all over the country.
> > Acting as a mail store for a large organization is another.  And, of
> > course, acting as a filesystem is definitely another.  :-)
> 
> Well, there is another aspect one should consider. Tuning a database
> engine for a specifiic workload is a hell of a job and shifting it
> to altogether other end of paradigm must be justified.

Certainly, but that justification comes from the problem being
solved.  If the nature of the problem demands tons of short
transactions (and as I said, a number of problems have such a
requirement), then tuning the database so that it can deal with it is
a requirement if that database is to be used at all.

Now, keep in mind that "tuning the database" here covers a *lot* of
ground and a lot of solutions, including connection-pooling
middleware.

> OK. Postgresql is not optimised to handle lots of concurrent
> connections, at least not much to allow one apache request handler
> to use a connection. Then middleware connection pooling like done in
> php might be a simpler solution to go rather than redoing the
> postgresql stuff. Because it works.

I completely agree.  In fact, I see little reason to change PG's
method of connection handling because I see little reason that a
general-purpose connection pooling frontend can't be developed.

Another method that could help is to prefork the postmaster.

> > This is true, but whether you choose to limit the use of threads to a
> > few specific situations or use them throughout the database, the
> > dangers and difficulties faced by the developers when using threads
> > will be the same.
> 
> I do not agree. Let's say I put threading functions in posgresql
> that do not touch shared memory interface at all. They would be hell
> lot simpler to code and mainten than converting postgresql to one
> thread per connection model.

I think you misunderstand what I'm saying.

There are two approaches we've been talking about thus far:

1.  One thread per connection.  In this instance, every thread shares   exactly the same memory space.

2.  One process per connection, with each process able to create   additional worker threads to handle things like
concurrentsorts.   In this instance, threads that belong to the same process all   share the same memory space
(includingthe SysV shared memory pool   that the processes use to communicate with each other), but the   only memory
that*all* the threads will have in common is the SysV   shared memory pool.
 

Now, the *scope* of the problems introduced by using threading is
different between the two approaches, but the *nature* of the problems
is the same: for any given process, the introduction of threads will
significantly complicate the debugging of memory corruption issues.
This problem will be there no matter which approach you use; the only
difference will be the scale.

And that's why you're probably better off with the third approach:

3.  One process per connection, with each process able to create   additional worker subprocesses to handle things like
concurrent  sorts.  IPC between the subprocesses can be handled using a number   of different mechanisms, perhaps
includingthe already-used SysV   shared memory pool.
 

The reason you're probably better off with this third approach is that
by the time you need the concurrency for sorting, etc., the amount of
time you'll spend on the actual process of sorting, etc. will be so
much larger than the amount of time it takes to create, manage, and
destroy the concurrent processes (even on systems that have extremely
heavyweight processes, like Solaris and Windows) that there will be no
discernable difference between using threads and using processes.  It
may take a few milliseconds to create, manage, and destroy the
subprocesses, but the amount of work to be done is likely to represent
at least a couple of *hundred* milliseconds for a concurrent approach
to be worth it at all.  And if that's the case, you may as well save
yourself the problems associated with using threads.

Even if you'd gain as much as a 10% speed improvement by using threads
to handle concurrent sorts and such instead of processes (an
improvement that is likely to be very difficult to achieve), I think
you're still going to be better off using processes.  To justify the
dangers of using threads, you'd need to see something like a factor of
two or more gain in overall performance, and I don't see how that's
going to be possible even on systems with very heavyweight processes.


I might add that the case where you're likely to gain significant
benefits from using either threads or subprocesses to handle
concurrent sorts is one in which you probably *won't* get many
concurrent connections...because if you're dealing with a lot of
concurrent connections (no matter how long-lived they may be), you're
probably *already* using all of the CPUs on the machine anyway.  The
situation where doing the concurrent subprocesses or subthreads will
help you is one where the connections in question are relatively
long-lived and are performing big, complex queries -- exactly the
situation in which threads won't help you at all relative to
subprocesses, because the amount of work to do on behalf of the
connection will dwarf (that is, be many orders of magnitude greater
than) the amount of time it takes to create, manage, and tear down a
process.

> > Of course, back here in the real world they *do* have to worry about
> > this stuff, and that's why it's important to quantify the problem.
> > It's not sufficient to say that "processes are slow and threads are
> > fast".  Processes on the target platform may well be slow relative to
> > other systems (and relative to threads).  But the question is: for the
> > problem being solved, how much overhead does process handling
> > represent relative to the total amount of overhead the solution itself
> > incurs?
> 
> That is correct. However it would be a fair assumption on part of
> postgresql developers that a process once setup does not have much
> of processing overhead involved as such, given the state of modern
> server class OS and hardware. So postgresql as it is, fits in that
> model. I mean it is fine that postgresql has heavy
> connections. Simpler solution is to pool them.

I'm in complete agreement here, and it's why I have very little faith
that a threaded approach to any of the concurrency problems will yield
enough benefits to justify the very significant drawbacks that a
threaded approach brings to the table.


-- 
Kevin Brown                          kevin@sysexperts.com



Re: Anyone working on better transaction locking?

From
Kevin Brown
Date:
cbbrowne@cbbrowne.com wrote:
> > > That's an awful lot of very bright programmers and some serious $$
> > > voting that threads are worth it.  
> > 
> > For THAT application.  for what a web server does, threads can be very 
> > useful, even useful enough to put up with the problems created by running 
> > threads on multiple threading libs on different OSes.  
> > 
> > Let me ask you, if Zeus scrams and crashes out, and it's installed 
> > properly so it just comes right back up, how much data can you lose?
> > 
> > If Postgresql scrams and crashes out, how much data can you lost?
> 
> There's another possibility, namely that the "voting" may not have
> anything to do with threading being "best."  Instead, it may be a
> road to allow the largest software houses, that can afford to have
> enough programmers that can "do threading," to crush smaller
> competitors.  After all, threading offers daunting new opportunities
> for deadlocks, data overruns, and crashes; if only those with the
> most, best thread programmers can compete, that discourages others
> from even /trying/ to compete.

Yes, but any smart small software shop will realize that threading is
more about buzzword compliance than anything else.  In the real world
where things must get done, threading is just another tool to use when
it's appropriate.  And the only time it's appropriate is when the
amount of time it takes to create, manage, and tear down a process is
a very large fraction of the total amount of time it takes to do the
work.

If we're talking about databases, it's going to be very rare that
threads will *really* buy you any significant performance advantage
over concurrent processes + shared memory.


Buzzword compliance is nice but it doesn't get things done.  At the
end of the day, all that matters is whether or not the tool you chose
does the job you need it to do for as little money as possible.  I
hope in this lean economy that people are starting to realize this.


-- 
Kevin Brown                          kevin@sysexperts.com



Re: Anyone working on better transaction locking?

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> So?  You're going to be paying those costs *anyway*, because most of the
> process context swaps will be between the application server and the
> database.  

Separating the database and application onto dedicated machines is normally
the first major optimization busy sites do when they discover that having the
two on the same machine never scales well.

--
greg



Re: Anyone working on better transaction locking?

From
Shridhar Daithankar
Date:
On Sunday 13 April 2003 09:47, you wrote:
> Even if you'd gain as much as a 10% speed improvement by using threads
> to handle concurrent sorts and such instead of processes (an
> improvement that is likely to be very difficult to achieve), I think
> you're still going to be better off using processes.  To justify the
> dangers of using threads, you'd need to see something like a factor of
> two or more gain in overall performance, and I don't see how that's
> going to be possible even on systems with very heavyweight processes.

I couldn't agree more. 

There is just a corner case to justify threads. Looking around, it would be a 
fair assumption that on any platforms threads are at least as fast as 
processes. So using threads it is guarenteed that "sub-work" will be lot more 
faster.

Of course that does not justify threads even in 5% of cases. So again, no 
reason to use threads for sort etc. However the subprocesses used should be 
simple enough. A process as heavy as a full database connection might not be 
too good.
Shridhar



Re: Anyone working on better transaction locking?

From
Shridhar Daithankar
Date:
On Saturday 12 April 2003 20:29, you wrote:
> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> > But database is not webserver. It is not suppose to handle tons of
> > concurrent requests. That is a fundamental difference.
>
> And in one fell swoop you've dismissed the entire OLTP database industry.
>
> Have you ever called a travel agent and had him or her look up a fare in
> the airline database within seconds? Ever placed an order over the
> telephone? Ever used a busy database-backed web site?

Well, I was involved in designing a database solution for a telco for their 
support system. That was a fairly big database, aroung 600GB.  There was a 
response time limit as well. Though it was not millisecond.

Though project did not go thr. for non-technical reasons, the bechmark we did 
with postgresql/mysql/oracle left an impression that any of mysql/postgresql 
would handle that kind of load with server clustering. Furthermore postgresql 
would have been the choice given the mature SQL capabilities it has. Even 
with oracle, the database had to be clustered to keep the cost low enough.

> On database-backed web sites, probably the main application for databases
> today, almost certainly the main application for free software databases,
> every web page request translates into at least one, probably several
> database queries.

Queries != connection. We are talking about reducing number of connections 
required, not number of queries sent across.

> All those database queries must complete within a limited time, measured in
> milliseconds. When they complete another connection needs to be context
> switched in and run again within milliseconds.
>
> On a busy web site the database machine will have several processors and be
> processing queries for several web pages simultaneously, but what really
> matters is precisely the context switch time between one set of queries and
> another.

Well, If the application is split between application server and database 
server, I would rather put a cluster of low end database machines and have an 
data consolidating layer in middleware. That is cheaper than big iron 
database machine and can be expanded as required.

However this would not work in all cases unless you are able to partition the 
data. Otherwise you need a database that can have single database image 
across machines. 

If and when postgresql moves to mmap based model, postgresql running on mosix 
should be able to do it. Using PITR mechanism, it would get clustering 
abilities as well. This is been discussed before.

Right now, postgresql does not have any of these capabilities. So using 
application level data consolidation is the only answer

> The test I'm most interested in in the benchmarks effort is simply an index
> lookup or update of a single record from a large table. How many thousands
> of transactions per second is postgres going to be able to handle on the
> same machine as mysql and oracle? How many hundreds of thousands of
> transactions per second will they be able to handle on a 4 processor
> hyperthreaded machine with a raid array striped across ten disks?

I did the same test on a 4 way xeon machine with 4GB of RAM and 40GB of data. 
Both mysql and postgresql did lookups at approximately 80% speed of oracle. 
IIRC they were doing 600 queries per second but I could be off. It was more 
than 6 months ago.
However testing number of clients was not a criteria. We only tested with 10 
concurrent clients. Mysql freezes at high database loads and high number of 
concurrent connection. Postgresql has tendency to hold the load muh longer. 
So more the number of connections, faster will be response time. That should 
be a fairly flat curve for upto 100 concurrent connection. Good enough 
hardware assumed.
Shridhar



Re: Anyone working on better transaction locking?

From
Jan Wieck
Date:
Greg Stark wrote:
> 
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > So?  You're going to be paying those costs *anyway*, because most of the
> > process context swaps will be between the application server and the
> > database.
> 
> Separating the database and application onto dedicated machines is normally
> the first major optimization busy sites do when they discover that having the
> two on the same machine never scales well.

If there is enough of an "application" to separate it ... :-)

People talk about "database backed websites" when all they need is
thousands of single index lookups. Can someone give me a real world
example of such a website? And if so, what's wrong with using ndbm/gdbm? 

All these hypothetical arguments based on "the little test I made" don't
lead to anything. I can create such tests that push the CPU load to 20
or more or get all the drive LED's nervous in no time. They don't tell
anything, that's the problem. They are purely synthetic. They hammer a
few different simple queries in a totally unrealistic, hysteric fashion
against a database and are called benchmarks. There are absolutely no
means of consistency checks built into the tests and if one really runs
checksum tests after 100 concurrent clients hammered this other super
fast superior sql database for 10 minutes people wonder how inconsistent
it can become after 10 minutes ... without a single error message.

Anyone ever thought about a reference implementation of TPC-W? 


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Anyone working on better transaction locking?

From
Jan Wieck
Date:
Ron Peacetree wrote:
> 
> "Jan Wieck" <JanWieck@Yahoo.com> wrote in message
> > And you are comparing what? Just pure features and/or
> > performance, or total cost of ownership for your
> > particular case?
> >
> Technical Analysis and Business Analysis are two separate, and equally
> necessary, activities.  However, before one can accurately measure
> things like Total Cost of Ownership, one needs to have accurately and
> sufficiently characterized what will be owned and one's choices as to
> what could be owned...

Okay, so you are doing the technical analysis for now.

> [...]
> However, a correctly done Technical Analysis =should= be reasonably
> portable since among other things you don't want to have to start all
> over if your company's business or business model changes.  Clearly
> Business Analysis is very context dependant.

However, doing a technical analysis correctly does not mean to blindly
ask about all the advanced features for each subsystem. The technical
analysis is part of the entire evaluation process. That process starts
with collecting the business requirements and continues with specifying
the technical requirements based on that. Not the other way round,
because technology should not drive, it should serve (unless the
technology in question is your business).

Possible changes in business model might slightly change the technical
requirements in the future, so an appropriate security margin is added.
But the attempt to build canned technical analysis for later reuse is
what leads to the worst solutions. How good is a decision based on 2
year old technical information?

Now all the possible candidates get compared against these
"requirements". That candidate "O" has the super duper buzzword feature
"XYZ" candidate "P" does not have is of very little importance unless
"XYZ" is somewhere on the requirements list. The availability of that
extra feature will not result in any gain here.

In an earlier eMail you pointed out that 2 phase commit is essential for
SMP/distributed applications. I know well what a distributed application
is, but what in the world is an SMP application?


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Anyone working on better transaction locking?

From
Jan Wieck
Date:
Michael Paesold wrote:
> I see there is a whole TODO Chapter devoted to the topic. There is the idea
> of pre-forked and persistent backends. That would be very useful in an
> environment where it's quite hard to use connection pooling. We are
> currently working on a mail system for a free webmail. The mda (mail
> delivery agent) written in C connects to the pg database to do some queries
> everytime a new mail comes in. I didn't find a solution for connection
> pooling yet.

I am still playing with the model of reusing connections in a
transparent fashion with a pool manager that uses SCM_RIGHTS messages
over UNIX domain socketpairs. I will scribble down some concept anytime
soon. This will include some more advantages than pure startup cost
reduction, okay?


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Anyone working on better transaction locking?

From
Tom Lane
Date:
[ Warning, topic drift ahead ]

Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> However this would not work in all cases unless you are able to partition the
> data. Otherwise you need a database that can have single database image 
> across machines. 

> If and when postgresql moves to mmap based model, postgresql running on mosix
> should be able to do it.

In a thread that's been criticizing handwavy arguments for fundamental
redesigns offering dubious performance improvements, you should know
better than to say such a thing ;-)

I don't believe that such a design would work at all, much less have
any confidence that it would give acceptable performance.  Would mosix
shared memory support TAS mutexes?   I don't see how it could, really.
That leaves you needing to come up with some other low-level lock
mechanism and get it to have adequate performance across CPUs.  Even
after you've got the locking to work, what would performance be like?
Postgres is built on the assumption of cheap access to shared data
structures (lock manager, buffer manager, etc) and I don't think this'll
qualify as cheap.
        regards, tom lane



Re: Anyone working on better transaction locking?

From
Hannu Krosing
Date:
Tom Lane kirjutas P, 13.04.2003 kell 18:45:
> [ Warning, topic drift ahead ]
> 
> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> > However this would not work in all cases unless you are able to partition the
> > data. Otherwise you need a database that can have single database image 
> > across machines. 
> 
> > If and when postgresql moves to mmap based model, postgresql running on mosix
> > should be able to do it.
> 
> In a thread that's been criticizing handwavy arguments for fundamental
> redesigns offering dubious performance improvements, you should know
> better than to say such a thing ;-)
> 
> I don't believe that such a design would work at all, much less have
> any confidence that it would give acceptable performance.  Would mosix
> shared memory support TAS mutexes?   I don't see how it could, really.
> That leaves you needing to come up with some other low-level lock
> mechanism and get it to have adequate performance across CPUs.

Does anybody have any idea how Oracle RAC does it ?

They seem to need to syncronize a lot (at least locks and data cache
coherency) across different machines.

> Even
> after you've got the locking to work, what would performance be like?
> Postgres is built on the assumption of cheap access to shared data
> structures (lock manager, buffer manager, etc) and I don't think this'll
> qualify as cheap.

[OT]

I vaguely remember some messages about getting PG to work well on NUMA
computers, which by definition should have non-uniformly cheap access to
shared data structures. 

They must have faced similar problems.

-------------
Hannu



Re: Anyone working on better transaction locking?

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> I vaguely remember some messages about getting PG to work well on NUMA
> computers, which by definition should have non-uniformly cheap access to
> shared data structures. 

My recollection of the thread is that we didn't know how to do it ;-)
        regards, tom lane



Re: Anyone working on better transaction locking?

From
Mark Kirkwood
Date:
Mark Kirkwood wrote:

> Maybe there should be a TODO list item in the Pg "Exotic  Features" 
> for connection pooling / concentrating ???
>
>
Oh dear, there already is... (under "Startup Time"), I just missed it :-(


Mark



Re: Anyone working on better transaction locking?

From
Andrew Sullivan
Date:
On Sun, Apr 13, 2003 at 10:43:06PM +0300, Hannu Krosing wrote:
> Does anybody have any idea how Oracle RAC does it ?

According to some marketing literature I saw, it was licensed
technology; it was supposed to be related to VMS.  Moredetails I
don't have, though.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8                                        +1 416 646 3304
x110



Re: Anyone working on better transaction locking?

From
cbbrowne@cbbrowne.com
Date:
> On Sun, Apr 13, 2003 at 10:43:06PM +0300, Hannu Krosing wrote:
> > Does anybody have any idea how Oracle RAC does it ?
> 
> According to some marketing literature I saw, it was licensed
> technology; it was supposed to be related to VMS.  More details I
> don't have, though.

That would fit perfectly with it having been part of the purchase of Rdb
from Digital...  There might well be some "harvestable" Rdb information
out there somewhere...

http://citeseer.nj.nec.com/lomet92private.html

(note that (Rdb != /RDB) && (Rdb != Rand RDB); there's an unfortunate
preponderance of "things called RDB")
--
(reverse (concatenate 'string "moc.enworbbc@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/rdbms.html
Rules of the Evil Overlord #15. "I will never employ any device with a
digital  countdown.   If I  find  that  such  a device  is  absolutely
unavoidable, I  will set it to  activate when the  counter reaches 117
and the hero is just putting his plan into operation."
<http://www.eviloverlord.com/>