Thread: Threaded PosgreSQL server

Threaded PosgreSQL server

From

"Dann Corbit"

Date:

04 February 2002, 21:40:34

Are there any plans to merge the sources from the experimental threaded
server and the forked server so that a compile switch could choose the
model?

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

04 February 2002, 22:26:44

If someone wanted to submit appropriate patches for the v7.3 development
tree, that merge cleanly, I can't see why this wouldn't be a good thing
...

On Mon, 4 Feb 2002, Dann Corbit wrote:

> Are there any plans to merge the sources from the experimental threaded
> server and the forked server so that a compile switch could choose the
> model?
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

Re: Threaded PosgreSQL server

From

Myron Scott

Date:

04 February 2002, 23:46:05

I would love to see this happen but they are already quite different and
drifting further apart every day.  I am trying to integrate parts of the real
PostgreSQL into threaded postgres as time permits.

I think threaded postgres could serve as a vehicle for testing the
relative value of using threads, but trying to merge patches would be a
major task.  I found an interesting marketing white paper that covers
PostgreSQL, Illustra, Informix, DSA (using threads), and Datablade
extensions.  If nothing else, it shows that the PostgreSQL extension
model can be used in a threaded environment.

www.databaseassociates.com/pdf/infobj.pdf 

Myron Scott
mkscott@sacadia.com

On Mon, 4 Feb 2002, Marc G. Fournier wrote:

> 
> If someone wanted to submit appropriate patches for the v7.3 development
> tree, that merge cleanly, I can't see why this wouldn't be a good thing
> ...
> 
> 
> On Mon, 4 Feb 2002, Dann Corbit wrote:
> 
> > Are there any plans to merge the sources from the experimental threaded
> > server and the forked server so that a compile switch could choose the
> > model?
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> >     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> >
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

Re: Threaded PosgreSQL server

From

"D. Hageman"

Date:

05 February 2002, 00:36:53


I would have to contend that the two will never be merged into one 
source base.  If the threaded server is done correctly, then many of the 
internal structures and logic will be radically different.  I have to 
commend Mr. Scott for continuing on with this work when it was pretty 
obvious from previous discussions that this would not be "well received".


On Mon, 4 Feb 2002 mkscott@sacadia.com wrote:

> 
> 
> I would love to see this happen but they are already quite different and
> drifting further apart every day.  I am trying to integrate parts of the real
> PostgreSQL into threaded postgres as time permits.
> 
> I think threaded postgres could serve as a vehicle for testing the
> relative value of using threads, but trying to merge patches would be a
> major task.  I found an interesting marketing white paper that covers
> PostgreSQL, Illustra, Informix, DSA (using threads), and Datablade
> extensions.  If nothing else, it shows that the PostgreSQL extension
> model can be used in a threaded environment.
> 
> www.databaseassociates.com/pdf/infobj.pdf 
> 
> Myron Scott
> mkscott@sacadia.com
> 
> 
> On Mon, 4 Feb 2002, Marc G. Fournier wrote:
> 
> > 
> > If someone wanted to submit appropriate patches for the v7.3 development
> > tree, that merge cleanly, I can't see why this wouldn't be a good thing
> > ...
> > 
> > 
> > On Mon, 4 Feb 2002, Dann Corbit wrote:
> > 
> > > Are there any plans to merge the sources from the experimental threaded
> > > server and the forked server so that a compile switch could choose the
> > > model?
> > >


-- 
//========================================================\\
||  D. Hageman                    <dhageman@dracken.com>  ||
\\========================================================//

Re: Threaded PosgreSQL server

From

"Zeugswetter Andreas SB SD"

Date:

05 February 2002, 06:07:54

> If someone wanted to submit appropriate patches for the v7.3 development
> tree, that merge cleanly, I can't see why this wouldn't be a good thing
> ...

I thought that the one thread instead of one process per client model
would only be an advantage for the "native Windows port" ?

Imho a useful threaded model on unix would involve a separation of threads
and clients. ( 1 CPU thread per physical CPU, several IO threads)
But that would involve a complete redesign.

Andreas

> > Are there any plans to merge the sources from the experimental threaded
> > server and the forked server so that a compile switch could choose the
> > model?

Re: Threaded PosgreSQL server

From

Tom Lane

Date:

05 February 2002, 11:16:37

"Marc G. Fournier" <scrappy@hub.org> writes:
> If someone wanted to submit appropriate patches for the v7.3 development
> tree, that merge cleanly, I can't see why this wouldn't be a good thing
> ...

I would resist it.  I do not think we need the portability and
reliability headaches that would come with it.  Furthermore,
an #ifdef'd implementation would be the worst of all possible
worlds, as it would do major damage to readability of the code.
        regards, tom lane

Re: Threaded PosgreSQL server

From

Haroldo Stenger

Date:

05 February 2002, 13:40:22

Dann Corbit wrote:
> 
> Are there any plans to merge the sources from the experimental threaded
> server and the forked server so that a compile switch could choose the
> model?

Just a question, to enlighten my thinking: does the current experimental
threaded server disable the multi-process model? Or does it *add* the functionality
as a compile switch? (That would be the other way round from the one you pointed
out.)

I think this is important for evaluating the resistance to going multithreaded.

If they disabled the original method, I agree with Tom. If they *merged* both
flawlessly, I would try to consider it for the current tree.

Any comments?

Regards,
Haroldo.

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

05 February 2002, 23:59:32

On Tue, 5 Feb 2002, Haroldo Stenger wrote:

> Dann Corbit wrote:
> >
> > Are there any plans to merge the sources from the experimental threaded
> > server and the forked server so that a compile switch could choose the
> > model?
>
> Just a question, to enlighten my thinking: does the current experimental
> threaded server disable the multi-process model? Or does it *add* the functionality
> as a compile switch? (That would be the other way round from the one you pointed
> out.)
>
> I think this is important for evaluating the resistance to going multithreaded.
>
> If they disabled the original method, I agree with Tom. If they *merged* both
> flawlessly, I would try to consider it for the current tree.
>
> Any comments?

That's kinda what I was hoping ... is it something that could be
seamlessly integrated to have minimal impact on the code itself ... even
if there was some way of having a 'thread.c' vs 'non-thread.c' that could
be link'd in, with wrapper functions?
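
To make the 'thread.c' vs 'non-thread.c' idea concrete, here is a minimal sketch of such a wrapper layer. Everything in it is hypothetical: the names `worker_fn`, `worker_handle`, `spawn_worker_threaded`, `spawn_worker_forked` and `wait_worker` are invented for illustration and do not come from the PostgreSQL sources. Both variants are shown in one file for brevity; in the scheme described above each would live in its own file and only one would be linked in.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper API: callers never touch fork() or pthread_create(). */
typedef int (*worker_fn)(void *arg);

typedef struct {
    int       is_thread;   /* which model started this worker */
    pthread_t thread;      /* valid when is_thread != 0 */
    pid_t     pid;         /* valid when is_thread == 0 */
} worker_handle;

struct thread_start { worker_fn fn; void *arg; };

static void *thread_trampoline(void *p)
{
    struct thread_start start = *(struct thread_start *) p;
    free(p);
    return (void *) (intptr_t) start.fn(start.arg);
}

/* What a hypothetical thread.c would provide. */
int spawn_worker_threaded(worker_handle *h, worker_fn fn, void *arg)
{
    struct thread_start *start = malloc(sizeof *start);

    assert(h != NULL && fn != NULL);
    if (start == NULL)
        return -1;
    start->fn = fn;
    start->arg = arg;
    h->is_thread = 1;
    if (pthread_create(&h->thread, NULL, thread_trampoline, start) != 0) {
        free(start);
        return -1;
    }
    return 0;
}

/* What a hypothetical non-thread.c would provide. */
int spawn_worker_forked(worker_handle *h, worker_fn fn, void *arg)
{
    assert(h != NULL && fn != NULL);
    h->is_thread = 0;
    h->pid = fork();
    if (h->pid < 0)
        return -1;
    if (h->pid == 0)
        _exit(fn(arg) & 0xff);   /* child: run the worker, then exit */
    return 0;
}

/* Shared by both models: wait for the worker and return its result (0..255). */
int wait_worker(worker_handle *h)
{
    if (h->is_thread) {
        void *result;
        if (pthread_join(h->thread, &result) != 0)
            return -1;
        return (int) (intptr_t) result;
    } else {
        int status;
        if (waitpid(h->pid, &status, 0) < 0 || !WIFEXITED(status))
            return -1;
        return WEXITSTATUS(status);
    }
}

/* Trivial worker used only to exercise the wrappers. */
int demo_worker(void *arg)
{
    return *(int *) arg + 1;
}
```

The sketch hides only process or thread creation. It does nothing about the harder problem raised later in the thread, namely global state that a forked backend gets a private copy of but threads would share.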

Then again, has anyone looked at the Apache project?  Apache2 has several
"process models" ... prefork being one (like ours), or a 'worker', which
is a prefork/threaded model where you can have n child processes, with m
'threads' inside of each ... not sure if something like that could be
retrofit'd into what we have, but ... ?
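
The "n child processes, with m threads inside of each" shape can be sketched in a few lines of POSIX C. This is not Apache's worker implementation, which is considerably more involved; `run_worker_model`, `run_child` and `serve_client` are hypothetical names, and the "work" each thread does is a placeholder.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for serving one client: mark this thread's slot as handled. */
static void *serve_client(void *arg)
{
    int *handled = arg;
    *handled = 1;
    return NULL;
}

/* Runs in each child process: start nthreads threads, return how many ran. */
static int run_child(int nthreads)
{
    pthread_t *threads = calloc((size_t) nthreads, sizeof *threads);
    int *handled = calloc((size_t) nthreads, sizeof *handled);
    int started = 0, done = 0, i;

    if (threads != NULL && handled != NULL) {
        for (i = 0; i < nthreads; i++) {
            if (pthread_create(&threads[i], NULL, serve_client, &handled[i]) != 0)
                break;
            started++;
        }
        for (i = 0; i < started; i++) {
            pthread_join(threads[i], NULL);
            done += handled[i];
        }
    }
    free(threads);
    free(handled);
    return done;
}

/*
 * Fork nprocs children with nthreads threads each and return the total
 * number of threads that ran, or -1 on error.  The per-child count travels
 * back through the exit status, so nthreads must stay below 256.
 */
int run_worker_model(int nprocs, int nthreads)
{
    pid_t *pids;
    int total = 0, forked = 0, i;

    assert(nprocs > 0 && nthreads > 0 && nthreads < 256);
    pids = calloc((size_t) nprocs, sizeof *pids);
    if (pids == NULL)
        return -1;

    for (i = 0; i < nprocs; i++) {
        pids[i] = fork();
        if (pids[i] < 0)
            break;
        if (pids[i] == 0)
            _exit(run_child(nthreads));   /* child never returns here */
        forked++;
    }
    for (i = 0; i < forked; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) > 0 && WIFEXITED(status))
            total += WEXITSTATUS(status);
    }
    free(pids);
    return forked == nprocs ? total : -1;
}
```

Calling `run_worker_model(2, 3)` starts two processes with three threads each, so six units of placeholder work are reported back.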

Re: Threaded PosgreSQL server

From

"Dann Corbit"

Date:

06 February 2002, 00:19:28

-----Original Message-----
From: Marc G. Fournier [mailto:scrappy@hub.org]
Sent: Tuesday, February 05, 2002 11:37 AM
To: Haroldo Stenger
Cc: Dann Corbit; Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Threaded PosgreSQL server
[snip]
> That's kinda what I was hoping ... is it something that could be
> seamlessly integrated to have minimal impact on the code itself ... even
> if there was some way of having a 'thread.c' vs 'non-thread.c' that could
> be link'd in, with wrapper functions?

> Then again, has anyone looked at the Apache project?  Apache2 has several
> "process models" ... prefork being one (like ours), or a 'worker', which
> is a prefork/threaded model where you can have n child processes, with m
> 'threads' inside of each ... not sure if something like that could be
> retrofit'd into what we have, but ... ?

It could be done, but it might be an effort.  As an example the ACE project:
http://www.cs.wustl.edu/~schmidt/ACE.html
has a number of easily selected threading models.  It is also portable to an
enormous number of platforms (including all flavors of UNIX).  However, it
is C++ rather than C, and so that particular transition would probably be
pretty traumatic if someone tried to use ACE as a toolset.  But at least it
does demonstrate that such a thing is feasible.  As a "for instance" you can
look at the Jaws web server (which is both open source and very much faster
than the Apache server).  It can easily be built with many different threading
models.

Re: Threaded PosgreSQL server

From

Myron Scott

Date:

06 February 2002, 01:49:22

> 
> On Tue, 5 Feb 2002, Haroldo Stenger wrote:
> 
> > Just a question, to enlighten my thinking: does the current experimental
> > threaded server disable the multi-process model? Or does it *add* the functionality
> > as a compile switch? (That would be the other way round from the one you pointed
> > out.)
> >

Currently, exper. threaded postgres can have multiple processes using
multiple threads with the same shared memory.  There is no forking
involved in the process though.  Shared memory, mutexes, and conditional
locks go global or private to the process based on a run-time flag.
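
With POSIX threads, one way a run-time flag could switch locks between process-shared and process-private is through the `pshared` attribute. The sketch below shows that technique only; whether the experimental server does it this way is not stated here, and `init_lock` and `init_cond` are hypothetical helper names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Initialise a mutex as process-shared or process-private depending on a
 * run-time flag.  For the shared case to be useful, *lock must itself live
 * in memory that every participating process has mapped.
 */
int init_lock(pthread_mutex_t *lock, bool share_across_processes)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(lock != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&attr,
                                      share_across_processes
                                          ? PTHREAD_PROCESS_SHARED
                                          : PTHREAD_PROCESS_PRIVATE);
    if (rc == 0)
        rc = pthread_mutex_init(lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Same idea for a condition variable. */
int init_cond(pthread_cond_t *cond, bool share_across_processes)
{
    pthread_condattr_t attr;
    int rc;

    assert(cond != NULL);
    rc = pthread_condattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_condattr_setpshared(&attr,
                                     share_across_processes
                                         ? PTHREAD_PROCESS_SHARED
                                         : PTHREAD_PROCESS_PRIVATE);
    if (rc == 0)
        rc = pthread_cond_init(cond, &attr);
    pthread_condattr_destroy(&attr);
    return rc;
}
```

Both functions return 0 on success and a pthreads error number otherwise, following the usual pthreads convention.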

> 
> That's kinda what I was hoping ... is it something that could be
> seamlessly integrated to have minimal impact on the code itself ... even
> if there was some way of having a 'thread.c' vs 'non-thread.c' that could
> be link'd in, with wrapper functions?
> 

The first basic problem is that global variables are scattered throughout
the source as well as some static stack variables.  Hunting these down and
finding a home for them is, in and of itself, a major task.  For example,
flex produces code that is not thread safe, so you have to modify that too.
The current workaround in exper. threaded postgres is not pretty: one
"environment" structure that holds all the normal postgres globals in
thread local storage.  This makes compile time choices impractical, I
think.
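
The "environment" structure in thread local storage can be pictured with POSIX thread-specific data. The sketch below is an assumption about the general shape, not the experimental server's code: the `Env` fields, `get_env` and `set_depth_in_thread` are invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for former globals; the field names are made up. */
typedef struct Env {
    int query_depth;
    int error_count;
} Env;

static pthread_key_t  env_key;
static pthread_once_t env_once = PTHREAD_ONCE_INIT;

static void env_destroy(void *p)
{
    free(p);   /* runs automatically when a thread that owns an Env exits */
}

static void env_make_key(void)
{
    int rc = pthread_key_create(&env_key, env_destroy);
    assert(rc == 0);
    (void) rc;
}

/* Return the calling thread's environment, creating it on first use. */
Env *get_env(void)
{
    Env *env;

    pthread_once(&env_once, env_make_key);
    env = pthread_getspecific(env_key);
    if (env == NULL) {
        env = calloc(1, sizeof *env);
        if (env != NULL && pthread_setspecific(env_key, env) != 0) {
            free(env);
            env = NULL;
        }
    }
    return env;
}

/* Thread body for demonstration: writes to its own Env and reports the value. */
void *set_depth_in_thread(void *arg)
{
    Env *env = get_env();

    if (env == NULL)
        return (void *) (intptr_t) -1;
    env->query_depth = (int) (intptr_t) arg;
    return (void *) (intptr_t) env->query_depth;
}
```

Each thread that calls `get_env()` sees its own copy, so a write made in one thread is invisible to the others. That is what a former global needs if every connection runs in its own thread.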

Cheers,

Myron
mkscott@sacadia.com

Re: Threaded PosgreSQL server

From

Jean-Michel POURE

Date:

06 February 2002, 02:19:20

On Tuesday 5 February 2002 20:36, Marc G. Fournier wrote:
> Apache2 has several "process models" ... prefork being one (like ours), or
> a 'worker', which is a prefork/threaded model where you can have n child
> processes, with m 'threads' inside of each ... not sure if something like
> that could be retrofit'd into what we have, but ... ?

Why not try to link Cygwin statically?

Best regards,
Jean-Michel POURE

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

06 February 2002, 10:39:42

On Tue, 5 Feb 2002 mkscott@sacadia.com wrote:

> The first basic problem is that global variables are scattered
> throughout the source as well as some static stack variables.  Hunting
> these down and finding a home for them is, in and of itself, a major
> task.  For example, flex produces code that is not thread safe, you have
> to modify that too.  The current work around in exper. threaded
> postgres is not pretty, one "environment" structure that holds all the
> normal postgres globals in thread local storage.  This makes compile
> time choices impractical I think.

Okay, but this has been discussed in the past concerning threading ... the
first make work that would have to be done was 'cleaning the code' so that
it was thread-safe ...

Basically, if we were to look at moving *towards* a fork/thread model in
the future, what can we learn and incorporate from the work already being
done?  How much of the work in the threaded server is cleaning up the code
to be thread-safe, that would benefit the base code itself and start us
down that path?

Right now, from everything I've heard, making the code thread-safe is one
big onerous task ... but if we were to start incorporating changes from
the 'thread work' that is being done now, into the base server, and ppl
start thinking thread-safe when they are coding new stuff, over time, this
task becomes smaller ...

Re: Threaded PosgreSQL server

From

nconway@klamath.dyndns.org (Neil Conway)

Date:

06 February 2002, 14:40:53

On Tue, Feb 05, 2002 at 03:36:41PM -0400, Marc G. Fournier wrote:
> Then again, has anyone looked at the Apache project?  Apache2 has several
> "process models" ... prefork being one (like ours), or a 'worker', which
> is a prefork/threaded model where you can have n child processes, with m
> 'threads' inside of each ... not sure if something like that could be
> retrofit'd into what we have, but ... ?

We could even use the nice Apache Portable Runtime, which is a
platform-independent layer over threading/networking/shm/etc (there's a
summary here: http://apr.apache.org/docs/apr/modules.html).
This might improve PostgreSQL on non-UNIX platforms, namely Win32.

However, I think using threads is only a good idea if it gets us a
substantial performance increase. From what I've seen, that isn't the
case; and even if the time to create a connection is a bottleneck, there
are other, more conservative ways of improving it (e.g. pre-forking,
persistent backends, and IIRC some work Tom Lane was doing to reduce
backend startup time).

And given the complexity and reduced reliability that threads bring, I
think the only advantage would be buzzword-compliance -- which isn't a
priority, personally.

Cheers,

Neil

Re: Threaded PosgreSQL server

From

Doug McNaught

Date:

06 February 2002, 15:49:52

nconway@klamath.dyndns.org (Neil Conway) writes:

> However, I think using threads is only a good idea if it gets us a
> substantial performance increase. From what I've seen, that isn't the
> case; and even if the time to create a connection is a bottleneck, there
> are other, more conservative ways of improving it (e.g. pre-forking,
> persistent backends, and IIRC some work Tom Lane was doing to reduce
> backend startup time).

The one place where it could be a clear win would be in splitting
single very large queries over multiple CPUs.  This would probably
require an even larger redesign of the whole system than moving to a
query-per-thread rather than per-process model.  I think "real"
multi-master replication and clustering is a better goal in the short
term...

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.  --T. J. Jackson, 1863

Re: Threaded PosgreSQL server

From

Haroldo Stenger

Date:

06 February 2002, 16:00:37

Doug McNaught wrote:
> The one place where it could be a clear win would be in splitting
> single very large queries over multiple CPUs.  This would probably
> require an even larger redesign of the whole system than moving to a
> query-per-thread rather than per-process model.  I think "real"
> multi-master replication and clustering is a better goal in the short
> term...

Agreed.

Though, starting to think & code thread safe would be nice too. 

Regards,
Haroldo.

Re: Threaded PosgreSQL server

From

Myron Scott

Date:

06 February 2002, 16:40:55

On Wed, 6 Feb 2002, Marc G. Fournier wrote:

> Right now, from everything I've heard, making the code thread-safe is one
> big onerous task ... but if we were to start incorporating changes from
> the 'thread work' that is being done now, into the base server, and ppl
> start thinking thread-safe when they are coding new stuff, over time, this
> task becomes smaller ...
> 

I agree, once the move is made to thread-safe it becomes much easier to
maintain thread-safe code.  I also very much like the idea of multiple
thread/process models that could be chosen from.  I think the question has
always been the initial cost vs. benefit.  The group has not seen much to
be gained for the amount of initial work involved.  After working with the
code, I too felt it wasn't worth it.

After revisiting the threaded code after a long break I now see some real
benefits to threading.  For example,  I was able to incorporate Tom Lane's
lazy_vacuum code to do relation clean up automatically when a threshold of
page writes occurred.  I was also able to use the freespace information to
be shared among threads in the process without touching shared mem.  As a
result, a pgbench run with 20 clients and over 1,000,000 transactions
maintained a more or less constant tps with manual vacuum commands and far
less heap expansion.  You can do this with processes (planned for 7.3, I
think) but I think it was much easier with threads.  Other things may open
up with threads as well, like Java stored procedures.  Anyway, now I think
it is worth it.
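
The "clean up automatically when a threshold of page writes occurred" pattern is a natural fit for a background thread woken through a condition variable. The sketch below shows that general pattern only. It is not the lazy_vacuum code or the experimental server's implementation; `CleanupState` and the function names are hypothetical, and the cleanup itself is reduced to resetting a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    long page_writes;     /* writes since the last cleanup */
    long threshold;       /* cleanup is due once page_writes reaches this */
    long cleanups_run;
    bool shutting_down;
} CleanupState;

void cleanup_state_init(CleanupState *s, long threshold)
{
    assert(s != NULL && threshold > 0);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->wake, NULL);
    s->page_writes = 0;
    s->threshold = threshold;
    s->cleanups_run = 0;
    s->shutting_down = false;
}

/* Called by worker threads after each page write. */
void note_page_write(CleanupState *s)
{
    pthread_mutex_lock(&s->lock);
    if (++s->page_writes >= s->threshold)
        pthread_cond_signal(&s->wake);
    pthread_mutex_unlock(&s->lock);
}

/* Ask the background thread to finish any pending cleanup and exit. */
void request_shutdown(CleanupState *s)
{
    pthread_mutex_lock(&s->lock);
    s->shutting_down = true;
    pthread_cond_signal(&s->wake);
    pthread_mutex_unlock(&s->lock);
}

/* Background thread: sleeps until enough writes accumulate, then cleans up. */
void *cleanup_thread(void *arg)
{
    CleanupState *s = arg;

    pthread_mutex_lock(&s->lock);
    for (;;) {
        while (s->page_writes < s->threshold && !s->shutting_down)
            pthread_cond_wait(&s->wake, &s->lock);
        if (s->page_writes >= s->threshold) {
            /* A real server would do the relation cleanup here, most
             * likely with the lock released while it works. */
            s->page_writes = 0;
            s->cleanups_run++;
            continue;
        }
        break;   /* shutting down and nothing pending */
    }
    pthread_mutex_unlock(&s->lock);
    return NULL;
}
```

Because the counter and the cleanup thread live in the same address space, no shared-memory segment or inter-process signalling is needed, which is the convenience the paragraph above describes.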

Myron
mkscott@sacadia.com

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

06 February 2002, 17:09:56

On Wed, 6 Feb 2002, Peter Eisentraut wrote:

> Haroldo Stenger writes:
>
> > Though, starting to think & code thread safe would be nice too.
>
> The thing about thread-safeness is that it's only actually useful when
> you're using threads.  Otherwise it wastes everybody's time -- the
> programmer's, the computer's, and the user's.

The thing is, there are several areas where using threads would be a
benefit, from what I've read on this list over the years ... as time goes
on, less and less of the OSs in use don't have threads, so we have to
start *somewhere* to work towards that sort of hybrid system ...

Re: Threaded PosgreSQL server

From

Peter Eisentraut

Date:

06 February 2002, 17:09:56

Haroldo Stenger writes:

> Though, starting to think & code thread safe would be nice too.

The thing about thread-safeness is that it's only actually useful when
you're using threads.  Otherwise it wastes everybody's time -- the
programmer's, the computer's, and the user's.

-- 
Peter Eisentraut   peter_e@gmx.net

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

06 February 2002, 17:19:54

On Wed, 6 Feb 2002 mkscott@sacadia.com wrote:

> After revisiting the threaded code after a long break I now see some
> real benefits to threading.  For example, I was able to incorporate Tom
> Lane's lazy_vacuum code to do relation clean up automatically when a
> threshold of page writes occurred.  I was also able to use the freespace
> information to be shared among threads in the process without touching
> shared mem.  As a result, a pgbench run with 20 clients and over
> 1,000,000 transactions maintained a more or less constant tps with manual
> vacuum commands and far less heap expansion.  You can do this with
> processes (planned for 7.3 I think) but I think it was much easier with
> threads.  Other things may open up with threads as well like Java stored
> procedures.  Anyway, now I think it is worth it.

Are there code clean-ups that have gone into the thread'd code that could
be incorporated into the existing code base that would start us down that
path?  For instance, based on my limited understanding of threaded servers, I
believe that 'global variables' are generally considered "A Real Bad
Thing" ... in one of your emails, you mentioned:

"The first basic problem is that global variables are scattered throughout
the source as well as some static stack variables.  Hunting these down and
finding a home for them is, in and of itself, a major task.  For example,
flex produces code that is not thread safe, you have to modify that too.
The current work around in exper. threaded postgres is not pretty, one
"environment" structure that holds all the normal postgres globals in
thread local storage.  This makes compile time choices impractical I
think."

Now, what is a 'clean' solution to this?  Making sure that all variables
are passed through to various functions, maybe through a struct construct?
So, can we start there and work our way through the code?  Start simple
... take one of the global(s), put it into the struct and take it out of
global space and make sure that it's passed appropriately through all the
required functions ... add in the next one, and do another trace?

Someone, or a group of ppl, with thread knowledge needs to start this
forward ... once the clean up begins, even without any thread code thrown
in, it shouldn't be too difficult to keep it clean to go to 'the next
step', no?

Re: Threaded PosgreSQL server

From

Haroldo Stenger

Date:

06 February 2002, 17:41:58

Peter Eisentraut wrote:
> 
> Haroldo Stenger writes:
> 
> > Though, starting to think & code thread safe would be nice too.
> 
> The thing about thread-safeness is that it's only actually useful when
> you're using threads.  Otherwise it wastes everybody's time -- the
> programmer's, the computer's, and the user's.

Yes, I see. The scenario in which I see it being useful is thinking of
adding multi-threading for, say, PG v7.5, and preparing the road. But maybe it's
a worthless effort. Many developers are pointing that out. Let's forget about threads
for now.

By the way, my original question about how far integration went in the
multi-threading fork remains unanswered. I will assume it went threading, dropping
forever the original behaviour, which inclines me against considering threading a
viable option (for now).

Regards,
Haroldo.

Re: Threaded PosgreSQL server

From

Haroldo Stenger

Date:

06 February 2002, 17:42:02

"Marc G. Fournier" wrote:
> The thing is, there are several areas where using threads would be a
> benefit, from what I've read on this list over the years ... as time goes
> on, less and less of the OSs in use don't have threads, so we have to
> start *somewhere* to work towards that sort of hybrid system ...

Yes.

But, maybe things like full-fledged replication, savepoints/nested transactions,
out-of-transaction-scope cursors, and others must have priority over this; and
that mutating PG thread safe, will slow down a 7.3 release a lot, something not
wanted by many here.

Let's make a pros/cons list of thread-related aspects here. We saw a lot of
cons. Write some pros explicitly. We're not in a hurry anyway.

Regards,
Haroldo,

Re: Threaded PosgreSQL server

From

Haroldo Stenger

Date:

06 February 2002, 18:39:48

mkscott@sacadia.com wrote:
> 
> On Wed, 6 Feb 2002, Haroldo Stenger wrote:
> 
> >
> > By the way, my original question about how far integration went in the
> > multi-threading fork remains unanswered. I will assume it went threading, dropping
> > forever the original behaviour, which inclines me against considering threading a
> > viable option (for now).
> 
> Yes, you can use postmaster and fork for a connection...or at least you
> could prior to some recent changes.  I haven't tested it that way for
> awhile but it should work.

I find it very interesting. So you are telling us you were successful in
keeping both functionalities? So why don't you tell us how much of an effort it was
to convert the code to thread-safe? Just to compose a community view of the
issue, and make a rational decision...

Regards,
Haroldo.

Re: Threaded PosgreSQL server

From

Myron Scott

Date:

06 February 2002, 18:49:53

On Wed, 6 Feb 2002, Haroldo Stenger wrote:

> 
> By the way, my original question about how far integration went in the
> multi-threading fork remains unanswered. I will assume it went threading, dropping
> forever the original behaviour, which inclines me against considering threading a
> viable option (for now).

Yes, you can use postmaster and fork for a connection...or at least you
could prior to some recent changes.  I haven't tested it that way for
awhile but it should work.

Myron
mkscott@sacadia.com

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

06 February 2002, 21:19:58

On Wed, 6 Feb 2002, Haroldo Stenger wrote:

> "Marc G. Fournier" wrote:
> > The thing is, there are several areas where using threads would be a
> > benefit, from what I've read on this list over the years ... as time goes
> > on, less and less of the OSs in use don't have threads, so we have to
> > start *somewhere* to work towards that sort of hybrid system ...
>
> Yes.
>
> But, maybe things like full-fledged replication, savepoints/nested
> transactions, out-of-transaction-scope cursors, and others must have
> priority over this; and

If these are priorities for some, we do welcome patches from them to make
it happen ... it is an open source project ... I am trying to encourage
one person who has obviously spent a good deal of time on the whole
threaded issue to start working at using his experience with PgSQL and
Threading to see what, if anything, can be done to try and keep his work
and ours from diverging too far ...

> that mutating PG thread safe, will slow down a 7.3 release a lot,
> something not wanted by many here.

Depends on how it is handled ...

Re: Threaded PosgreSQL server

From

Jeff Davis

Date:

06 February 2002, 21:30:19

> Let's make a pros/cons list of thread-related aspects here. We saw a lot of
> cons. Write some pros explicitly. We're not in a hurry anyway.

I think in addition to pros/cons, an important question is:
How has threading influenced other DBMS's? I know MySQL uses threading, at 
least in the development version; how much has it helped? Is the utility of a 
database based partly on the presence of threading? Take Oracle, MsSQL, and 
others; which have threading and which seem to gain from threading?

I don't follow the other DB's as closely, so I don't know the answers.

I suspect that looking at other databases will give us a clue about the 
magnitude of the pros, rather than just the areas of influence.

Regards,
Jeff

Re: Threaded PosgreSQL server

From

Haroldo Stenger

Date:

06 February 2002, 21:49:48

"Marc G. Fournier" wrote:
> 
> On Wed, 6 Feb 2002, Haroldo Stenger wrote:
> 
> > "Marc G. Fournier" wrote:
> > > The thing is, there are several areas where using threads would be a
> > > benefit, from what I've read on this list over the years ... as time goes
> > > on, less and less of the OSs in use don't have threads, so we have to
> > > start *somewhere* to work towards that sort of hybrid system ...
> >
> > Yes.
> >
> > But, maybe things like full-fledged replication, savepoints/nested
> > transactions, out-of-transaction-scope cursors, and others must have
> > priority over this; and
> 
> If these are priorities for some, we do welcome patches from them to make
> it happen ... it is an open source project ... I am trying to encourage
> one person who has obviously spent a good deal of time on the whole
> threaded issue to start working at using his experience with PgSQL and
> Threading to see what, if anything, can be done to try and keep his work
> and ours from diverging too far ...

Yes, that was my very original thinking. We shouldn't waste programmers or code.
But we're trying to form an idea of cost/benefit/risk. Let's go on with this
discussion, basing it on the pros outlined by the threaded fork knowledge
holders, right? Maybe they are tired, maybe they spent too much effort, and
don't want to do it again. Should that be the case, at least let us obtain
information from the *development process* of their work in order to measure the
impact on the current source tree with the current programming force.

> 
> > that mutating PG thread safe, will slow down a 7.3 release a lot,
> > something not wanted by many here.
> 
> Depends on how it is handled ...

How do you see it not slowing down, when key developers said their view is that
multithreading will pose a major obstacle? Are you envisioning any special
approach not already talked about?

Regards,
Haroldo.

Re: Threaded PosgreSQL server

From

Myron Scott

Date:

06 February 2002, 22:20:29

On Wed, 6 Feb 2002, Marc G. Fournier wrote:

> Are there code clean-ups that have gone into the thread'd code that could
> be incorporated into the existing code base that would start us down that
> path?  

I don't think existing code is much help.  So much has changed since 7.0.2
that the current threaded code is probably only good for investigating
the benefits of threading and maybe some porting techniques.

> For instance, based on my limited understanding of threaded servers, I
> believe that 'global variables' are generally considered "A Real Bad
> Thing" ... in one of your emails, you mentioned:
> 
> "The first basic problem is that global variables are scattered throughout
> the source ..."
> 
> Now, what is a 'clean' solution to this?  

The current threaded postgres is messy because I just packed all the
global variables, including those produced by flex, into a 5K structure.
Every time threaded code needed a "global", it called a function to
retrieve a pointer from thread local storage.  When I profiled the code I
saw way too many calls to grab the environment structure and I modified
some hotspots to pass the structure down the call chain.  Ideally, I think
that the "environment" structure could be optimized for size and passed
down the call chain to reduce the number of times thread local storage is
accessed.  This is also bad because when anyone working on a segment of
code needs a global, they need to add it to the "environment" structure.
I don't think this would be a good situation for code maintainers.

> 
> Someone, or a group of ppl, with thread knowledge needs to start this
> forward ... once the clean up begins, even without any thread code thrown
> in, it shouldn't be too difficult to keep it clean to go to 'the next
> step', no?
> 

I came up with a process to find global variables in the code that became
somewhat effective and could be applied to the current code.  Someone else
might have a better way of doing this though.

Myron
mkscott@sacadia.com

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

06 February 2002, 22:29:58

On Wed, 6 Feb 2002, Haroldo Stenger wrote:

> > Depends on how it is handled ...
>
> How do you see it not slowing down, when key developers said their view
> is that multithreading will pose a major obstacle? Are you envisioning
> any special approach not already talked about?

Read my previous emails?  To move *any part of PgSQL* to a threaded model
(even one where each connection is still forked, but parts of each
connection are threaded), the mess of global variables needs to be cleaned
up ... that will be one of the "major obstacles" ... if someone with a
knowledge of making code thread-safe were to submit patches (even very
large ones) that start to clean this up, it could be broken down into more
manageable chunks ...

The second major obstacle that has been identified is cross-platform
compatibility ... as I mentioned already, and another has also, Apache2 has
their APR code that might help us reduce that obstacle to a more
manageable level, since, I believe, the Apache license wouldn't restrict
us from being able to use/distribute the code ... this is definitely
something that we'd have to look into to make sure though ...

The point is that nobody is even implying that this is a "for v7.3"
project ... there have been several projects that have been initiated over
the years that have straddled releases, and we have a lot of very good
developers, and testers, that will make sure that any changes are "for the
good" ...

Re: Threaded PosgreSQL server

From

Brian Bruns

Date:

06 February 2002, 22:40:39

On Wed, 6 Feb 2002, Haroldo Stenger wrote:

> > > that mutating PG thread safe, will slow down a 7.3 release a lot,
> > > something not wanted by many here.
> > 
> > Depends on how it is handled ...
> 
> How do you see it not slowing down, when key developers said their view is that
> multithreading will pose a major obstacle? Are you envisioning any special
> approach not already talked about?

Excuse my butting in, but in large part we are talking about changing
things like:

if (PqSomeStaticOrGlobalVariable) { ... }

to 

if (MyPort->PqSomeVariable) { ... }

converting to thread safety should not, at least for this kind of low 
hanging fruit, have any negative performance impact.  And from my vantage 
point it takes out a whole lot of "where did that come from and who set it 
when?" kinda questions when reading the code.  Of course I'm just getting 
my feet wet so feel free to correct my first impressions.
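
A minimal sketch of the kind of change described above, in plain C. The
`Port` struct, its `PqSomeVariable` field, and both helper functions are
hypothetical names built from the placeholders in this message; they are
not actual PostgreSQL definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Before: one process-wide flag that every caller reads implicitly. */
static bool PqSomeStaticOrGlobalVariable = false;

int handle_request_global(void)
{
    if (PqSomeStaticOrGlobalVariable)
        return 1;
    return 0;
}

/* After: the flag lives in a per-connection structure (hypothetical). */
typedef struct Port
{
    bool PqSomeVariable;
} Port;

int handle_request_port(const Port *MyPort)
{
    assert(MyPort != NULL);

    /* Same test as before, but the state now arrives with the caller. */
    if (MyPort->PqSomeVariable)
        return 1;
    return 0;
}
```

The second form costs one pointer dereference, and two connections can hold
different values without interfering with each other.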

Brian

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

06 February 2002, 22:50:57

On Wed, 6 Feb 2002, Brian Bruns wrote:

> On Wed, 6 Feb 2002, Haroldo Stenger wrote:
>
> > > > that mutating PG thread safe, will slow down a 7.3 release a lot,
> > > > something not wanted by many here.
> > >
> > > Depends on how it is handled ...
> >
> > How do you see it not slowing down, when key developers said their view is that
> > multithreading will pose a major obstacle? Are you envisioning any special
> > approach not already talked about?
>
> Excuse my butting in, but in large part we are talking about changing
> things like:
>
> if (PqSomeStaticOrGlobalVariable) { ... }
>
> to
>
> if (MyPort->PqSomeVariable) { ... }
>
> converting to thread safety should not, at least for this kind of low
> hanging fruit, have any negative performance impact.  And from my vantage
> point it takes out a whole lot of "where did that come from and who set it
> when?" kinda questions when reading the code.  Of course I'm just getting
> my feet wet so feel free to correct my first impressions.

This is one way that it could be accomplished ... I think one of the more
proper ways would be to convert the Global variables to proper function
calls ... a combination of the two would most likely be optimal ...
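
As a rough sketch of the function-call approach, assuming a hypothetical
flag and hypothetical accessor names (none of these exist in PostgreSQL
under these names): the variable becomes private to one file and callers go
through a getter and a setter.

```c
#include <stdbool.h>

/* Hypothetical setting, now private to this file instead of extern. */
static bool pq_some_variable = false;

/* Callers read the value through a function ... */
bool PqGetSomeVariable(void)
{
    return pq_some_variable;
}

/* ... and change it through one, so every write happens in one place. */
void PqSetSomeVariable(bool value)
{
    pq_some_variable = value;
}

/* Example call site: no direct reference to the variable remains. */
int handle_request(void)
{
    return PqGetSomeVariable() ? 1 : 0;
}
```

With this shape the storage behind the accessors could later be moved into
a per-connection or per-thread structure without touching the call sites.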

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

06 February 2002, 22:50:57

On Wed, 6 Feb 2002 mkscott@sacadia.com wrote:

> I came up with a process to find global variables in the code that
> became somewhat effective and could be applied to the current code.
> Someone else might have a better way of doing this though.

Is this something that could be added to the distribution similar to some
of the other development tools?  Is it a shell script?

Re: Threaded PosgreSQL server

From

Haroldo Stenger

Date:

07 February 2002, 00:19:37


Brian Bruns wrote:
> 
> On Wed, 6 Feb 2002, Haroldo Stenger wrote:
> 
> > > > that mutating PG thread safe, will slow down a 7.3 release a lot,
> > > > something not wanted by many here.
> > >
> > > Depends on how it is handled ...
> >
> > How do you see it not slowing down, when key developers said their view is that
> > multithreading will pose a major obstacle? Are you envisioning any special
> > approach not already talked about?
> 
> Excuse my butting in, but in large part we are talking about changing
> things like:
> 
> if (PqSomeStaticOrGlobalVariable) { ... }
> 
> to
> 
> if (MyPort->PqSomeVariable) { ... }
> 
> converting to thread safety should not, at least for this kind of low
> hanging fruit, have any negative performance impact.  And from my vantage
> point it takes out a whole lot of "where did that come from and who set it
> when?" kinda questions when reading the code.  Of course I'm just getting
> my feet wet so feel free to correct my first impressions.

Just that when I said "will slow down a 7.3 release a lot", I was referring to
*the date of the release*, not its inherent performance, whether the code is
multi-threaded or not. It was a software engineering sort of consideration.

Regards,
Haroldo.

Re: Threaded PosgreSQL server

From

Haroldo Stenger

Date:

07 February 2002, 00:49:35

Here I'll respectfully compile the opinions that I found to bear on a
decision:

Revisited key developer opinion 1:

Tom Lane wrote:
> > If someone wanted to submit appropriate patches for the v7.3 development
> > tree, that merge cleanly, I can't see why this wouldn't be a good thing
> 
> I would resist it.  I do not think we need the portability and
> reliability headaches that would come with it.  Furthermore,
> an #ifdef'd implementation would be the worst of all possible
> worlds, as it would do major damage to readability of the code.

Revisited key developer opinion 2:

Peter Eisentraut wrote:
> > Though, starting to think & code thread safe would be nice too.
> 
> The thing about thread-safeness is that it's only actually useful when
> you're using threads.  Otherwise it wastes everybody's time -- the
> programmer's, the computer's, and the user's.

So at least for Tom Lane and Peter E., threads are hard to implement. For Tom,
we would enter a world of portability and reliability headaches. For Peter,
unless we *want* threads, we don't have to start *now* coding thread safe.
Please correct me if I'm wrong.

Zeugswetter Andreas SB SD wrote:
> > If someone wanted to submit appropriate patches for the v7.3 development
> > tree, that merge cleanly, I can't see why this wouldn't be a good thing
> 
> I thought that the one thread instead of one process per client model
> would only be an advantage for the "native Windows port" ?
> 
> Imho a useful threaded model on unix would involve a separation of threads
> and clients. ( 1 CPU thread per physical CPU, several IO threads)
> But that would involve a complete redesign.

For Andreas, for a threaded PG to be useful under a Unix environment, a complete
PG redesign would be needed.

"Marc G. Fournier" wrote:
> 
> On Wed, 6 Feb 2002, Haroldo Stenger wrote:
> 
> > > Depends on how it is handled ...
> >
> > How do you see it not slowing down, when key developers said their view
> > is that multithreading will pose a major obstacle? Are you envisioning
> > any special approach not already talked about?
> 
> Read my previous emails?  To move *any part of PgSQL* to a threaded model
> (even one where each connection is still forked, but parts of each
> connection are threaded), the mess of global variables needs to be cleaned
> up ... that will be one of the "major obstacles" ... if someone with a
> knowledge of making code thread-safe were to submit patches (even very
> large ones) that start to clean this up, it could be broken down into more
> manageable chunks ...
> 

Yes, I too liked the idea of multiple processes, each running multiple threads,
distributed according to some sensible criteria.

I wonder whether cleaning up the mess of global variables seems inconvenient
from Peter's or Tom's point of view. Standard wisdom says globals should be
avoided. In current PG's case, they should be reworked one way or another.

> The second major obstacle that has been identified is cross-platform
> compatibility ... as I mentioned already, and another has also, Apache2 has
> their APR code that might help us reduce that obstacle to a more
> manageable level, since, I believe, the Apache license wouldn't restrict
> us to being able to use/distribute the code ... this is definitely
> something that we'd have to look into to make sure though ...

I agree with cross-pollination among open source projects. BTW, this practice
should be encouraged, and not called "stealing", not even as a joke, as I've
seen it called for example for the TCP/IP Linux stack code (99% sure this was
the one module), which came from the *BSD projects, in its very first version.
Also mentioning that BSD -> GPL was possible, but not the other way round; I
don't mean to start a war or anything, just exposing facts.

> The point is that nobody is even implying that this is a "for v7.3"
> project ... there have been several projects that have been initiated over
> the years that have straddled releases, and we have a lot of very good
> developers, and testers, that will make sure that any changes are "for the
> good" ...

Yes, I agree. If starting to think & code thread safe *now* proves *not* to be a
waste of everybody's time, that's the path to follow. This very point is the one
under technical examination, right? 

Regards,
Haroldo.

Re: Threaded PosgreSQL server

From

Date:

07 February 2002, 01:09:48

On Wed, 6 Feb 2002, Jeff Davis wrote:

> I think in addition to pros/cons, an important question is:
> How has threading influenced other DBMS's? I know MySQL uses threading, at 
> least in the development version; how much has it helped? Is the utility of a 

I think threads were, or are, a big deal for Informix (now IBM) Dynamic
Server.  With a combination of multiple processes and threads it is able to
spread a query among multiple processors and recruit more resources for
complex queries.

Myron

Re: Threaded PosgreSQL server

From

Date:

07 February 2002, 01:19:51

On Wed, 6 Feb 2002, Marc G. Fournier wrote:

> 
> Is this something that could be added to the distribution similar to some
> of the other development tools?  Is it a shell script?
> 

No, but I suppose it could and should be.  I just used a combination of the
commands nm and grep to find all the global symbols in the object files of
each subsection, then went through the code and determined whether they
needed to be moved.
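
The nm output has to come from the shell. As a rough sketch of the
filtering step that grep performed, the C function below classifies one
line of nm output. It assumes the default nm line format ("address type
name") and treats the type letters B/b (uninitialized data), D/d
(initialized data) and C (common) as candidates for review. The function
name and the exact filter are illustrative assumptions, not the process
actually used for the threaded server.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if one line of default-format nm output describes a writable
 * data symbol.  On success the symbol name is copied into name_out.
 */
bool nm_line_is_data_symbol(const char *line, char *name_out, size_t name_size)
{
    char   addr[64];
    char   type;
    char   name[256];
    size_t i;

    if (sscanf(line, "%63s %c %255s", addr, &type, name) != 3)
        return false;

    /* Undefined symbols have a blank address column; reject them. */
    for (i = 0; addr[i] != '\0'; i++)
        if (!isxdigit((unsigned char) addr[i]))
            return false;

    /* Keep bss, initialized-data and common symbols only. */
    if (type == '\0' || strchr("BbDdC", type) == NULL)
        return false;

    snprintf(name_out, name_size, "%s", name);
    return true;
}
```

A small driver could read nm output line by line from standard input and
print each matching name.  Lowercase b and d mark file-local (static)
variables, which are shared between threads just like extern ones.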

Myron

Re: Threaded PosgreSQL server

From

Hannu Krosing

Date:

07 February 2002, 05:23:04

On Wed, 2002-02-06 at 23:00, mkscott@sacadia.com wrote:
> 
> 
> On Wed, 6 Feb 2002, Marc G. Fournier wrote:
> 
> > Right now, from everything I've heard, making the code thread-safe is one
> > big onerous task ... but if we were to start incorporating changes from
> > the 'thread work' that is being done now, into the base server, and ppl
> > start thinking thread-safe when they are coding new stuff, over time, this
> > task becomes smaller ...
> > 
> 
> I agree, once the move is made to thread-safe it becomes much easier to
> maintain thread-safe code.  I also very much like the idea of multiple
> thread/process models that could be chosen from.  I think the question has
> always been the initial cost vs. benefit.  The group has not seen much to
> be gained for the amount of initial work involved.  After working with the
> code, I too felt it wasn't worth it.
> 
> After revisiting the threaded code after a long break I now see some real
> benefits to threading.  For example,  I was able to incorporate Tom Lane's
> lazy_vacuum code to do relation clean up automatically when a threshold of
> page writes occurred. 

Could you please explain why it was easier to do with your threaded
version than with the standard version ?

> I was also able to use the freespace information to
> be shared among threads in the process without touching shared mem.  As a
> result, a pgbench run with 20 clients and over 1,000,000
> transactions maintained a more or less constant tps with manual
> vacuum commands and far less heap expansion.

Do you mean that "it ran at more or less the same speed as when running
concurrent manual VACUUMs" ?

Btw, have you tried comparing pgbench runs on the threaded model vs the
forked model?  IIRC your code can run both ways.

> You can do this with processes (planned for 7.3 I think) but I
> think it was much easier with threads.  Other things may open up with
> threads as well like Java stored procedures.  Anyway, now I think it is
> worth it.

In my experience any code cleanup will eventually pay off (if the
project lives long enough :)

---------
Hannu

Re: Threaded PosgreSQL server

From

Karel Zak

Date:

07 February 2002, 06:10:31

On Thu, Feb 07, 2002 at 12:03:56PM +0200, Hannu Krosing wrote:

> Btw, have you tried comparing pgbench runs on threaded model vs forked
> model. IIRC your code can run both ways.
It depends on the OS. For example, fork and thread creation are very similar
on Linux. Maybe there can be some speed difference between locking and
access to shared memory?
 
IMHO the threaded version has a problem with backend crashes (users' bugs in PLs, etc.).

> > You can do this with processes (planned for 7.3 I think) but I
> > think it was much easier with threads.  Other things may open up with
> > threads as well like Java stored procedures.  Anyway, now I think it is
> > worth it.
Are all current PL interpreters thread safe?
       Karel

-- 
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

07 February 2002, 07:40:31

On Thu, 7 Feb 2002, Haroldo Stenger wrote:

>
>
> Brian Bruns wrote:
> >
> > On Wed, 6 Feb 2002, Haroldo Stenger wrote:
> >
> > > > > that mutating PG thread safe, will slow down a 7.3 release a lot,
> > > > > something not wanted by many here.
> > > >
> > > > Depends on how it is handled ...
> > >
> > > How do you see it not slowing down, when key developers said their view is that
> > > multithreading will pose a major obstacle? Are you envisioning any special
> > > approach not already talked about?
> >
> > Excuse my butting in, but in large part we are talking about changing
> > things like:
> >
> > if (PqSomeStaticOrGlobalVariable) { ... }
> >
> > to
> >
> > if (MyPort->PqSomeVariable) { ... }
> >
> > converting to thread safety should not, at least for this kind of low
> > hanging fruit, have any negative performance impact.  And from my vantage
> > point it takes out a whole lot of "where did that come from and who set it
> > when?" kinda questions when reading the code.  Of course I'm just getting
> > my feet wet so feel free to correct my first impressions.
>
> Just that when I said "will slow down a 7.3 release a lot", I was referring to
> *the date of the release*, not its inherent performance, the code to be
> multi-threaded or not. It was a software engineering sort of consideration.

Again, if we go at it as 'threaded for v7.3', then most probably ... but I
would not allow that to happen, nor would any of the *core* developers ...
what I am, and have been, advocating is starting down the 'thread-safe'
path ... as has actually been discussed before, there are sections of
PostgreSQL that could make use of threading without the whole system
*being* threaded ... stuff that, right now, is done sequentially that
could be done in parallel if threading were available ...

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

07 February 2002, 09:21:03

On Thu, 7 Feb 2002, Haroldo Stenger wrote:

> Here I'll respectfully compile the opinions that I found of impact over a
> decision:
>
> Revisited key developer opinion 1:
>
> Tom Lane wrote:
> > > If someone wanted to submit appropriate patches for the v7.3 development
> > > tree, that merge cleanly, I can't see why this wouldn't be a good thing
> >
> > I would resist it.  I do not think we need the portability and
> > reliability headaches that would come with it.  Furthermore,
> > an #ifdef'd implementation would be the worst of all possible
> > worlds, as it would do major damage to readability of the code.

Put this into context ... I had suggested someone submitting #ifdef'd code
that could implement threading, not that someone submit code to clean up
a mess that nobody *really* wants to clean up due to time and lack of
visibility/glory *grin*

> Revisited key developer opinion 2:
>
> Peter Eisentraut wrote:
> > > Though, starting to think & code thread safe would be nice too.
> >
> > The thing about thread-safeness is that it's only actually useful when
> > you're using threads.  Otherwise it wastes everybody's time -- the
> > programmer's, the computer's, and the user's.
>
>
> So at least for Tom Lane and Peter E., threads are hard to implement.
> For Tom, we would enter a world of portability and reliability
> headaches. For Peter, unless we *want* threads, we don't have to start
> *now* coding thread safe. Please correct me if I'm wrong.

yes and no ... Tom is/was looking at it from an 'implement it for all the
systems we currently support' point of view, without looking at (and Tom,
feel free to correct me if I'm wrong) what has been implemented outside of
our project to simplify the portability and reliability issues associated
with supporting both a fork and fork/thread model ... with the work that
the Apache group has done in this regard, and the fact that their license
is not restrictive, both issues may (or may not) be moot, but someone has
to investigate that ...

In Peter's case ... I'm sorry, but I was always taught in programming that
"global variables should be avoided at all costs" ... right now, all I'm
advocating *right now* is making our variables thread safe, which, from my
understanding, means getting rid of the global variables ... not sure how
that affects the users themselves, but, from a programmer's standpoint, the
'time' is what the person cleaning the code has to put into it ... once
it's cleaned up, any new code or changes should just automatically be
"global variables aren't permitted"

Both Tom and Peter have better/more important things on their plates than
to go through the code and clean up the global variables ...

Eventually, I would like to see, where possible, threaded code put in so
that each connection is *still* forked, but parts of the connection that
could handle more parallel processing make use of threads to speed them
up ...

> I wonder if cleaning up the mess of global variables, seems not
> convenient from Peter's or Tom's point of view. Standard wisdom says
> globals should be avoided. In current PG's case, they should be reworked
> in a way or another.

Correct, and that is what I am currently advocating ... if we get that
cleaned up, so that 'threaded' is possible, nothing stops the next step
being someone submitting a simple patch that uses threading to 'read from
disk while processing what has been read in, as it is being read in' ...
the point is, until we clean out the *time consuming, but relatively easy*
anti-thread issues we have, even if that is over several releases, nothing
else is going to happen because "it's too big of a job" ... what I would like
to see is someone submitting large patches that clean the global
variables, one global at a time ... I say large, because I would imagine
that pretty much any global is going to hit a *large* number of files to
remove it, and add it back in as an arg to functions ...

I can't see anyone convincingly argue against such patches, since, IMHO,
global variables are a remnant of when we took over the code from
Berkeley, I can't see any of the core developers actually *approving* of
them being there except the work involved in removing them ... :)

Re: Threaded PosgreSQL server

From

Justin Clift

Date:

07 February 2002, 09:53:55

Haroldo Stenger wrote:
<snip>
> 
> I agree with cross-pollination among open source projects. BTW, this practice
> should be encouraged, and not called "stealing", not even as a joke, as I've
> seen it called for example for the TCP/IP Linux stack code (99% sure this was
> the one module), which came from the *BSD projects, in its very first version.
> Also mentioning that BSD -> GPL was possible, but not the other way round; I
> don't mean to start a war or anything, just exposing facts.
> 
> > The point is that nobody is even implying that this is a "for v7.3"
> > project ... there have been several projects that have been initiated over
> > the years that have straddled releases, and we have a lot of very good
> > developers, and testers, that will make sure that any changes are "for the
> > good" ...
> 
> Yes, I agree. If starting to think & code thread safe *now* proves *not* to be a
> waste of everybody's time, that's the path to follow. This very point is the one
> under technical examination, right?

So, with this thought in mind of "starting to think & code thread safe",
we should start putting together a set of reference guidelines,
especially drawing on the experience of people who have good, solid
experience with threaded, multi-process, cross-platform coding.  It
should take into account that the people reading it may not be as
experienced in this um... specialised area of coding.

We've identified "global variables" needing to be done in a better and
more consistent way.

So, what else do coders need to do when "thinking and coding thread
safe", that we can make into a guideline for forthcoming PostgreSQL
coding?

:-)

Regards and best wishes,

Justin Clift

> Regards,
> Haroldo.

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi

Re: Threaded PosgreSQL server

From

Date:

07 February 2002, 11:40:12


> Again, if we go at it as 'threaded for v7.3', then most probably ... but I
> would not allow that to happen, nor would any of the *core* developers ...
> what I am, and have been, advocating is starting down the 'thread-safe'
> path ... as has actually been discussed before, there are sections of
> PostgreSQL that could make use of threading without the whole system
> *being* threaded ... stuff that, right now, are done sequentially that
> could be done in parallel if threading was available ...


How about doing what Marc suggests and starting to move toward reentrant
functions in postgres?

This could be done by creating a global private
memory area that is accessed much like shared memory is now with a hash
table setting aside memory for various code subsections.  We could put
all the global variables there with little impact on current functionality
and, if done right, speed.  I think I have a good idea as to where most of
the "difficult" globals are and could start working on moving them once
the global memory area was set up.  We can worry about threads vs.
processes later.
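
To make the idea concrete, here is a minimal sketch of such an area,
assuming a simple chained hash table keyed by subsection name.  The names
`Env`, `EnvEntry` and `env_get_space` are hypothetical and chosen for
illustration only; this is not code from either server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ENV_BUCKETS 64

/* One entry: a named block of memory owned by a code subsection. */
typedef struct EnvEntry
{
    char            *name;
    void            *space;
    struct EnvEntry *next;
} EnvEntry;

/* The private area; a threaded build would keep one per thread. */
typedef struct Env
{
    EnvEntry *buckets[ENV_BUCKETS];
} Env;

static unsigned env_hash(const char *s)
{
    unsigned h = 5381;

    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h % ENV_BUCKETS;
}

/*
 * Return the zero-filled block registered under `name`, allocating it on
 * first use.  Returns NULL if allocation fails.
 */
void *env_get_space(Env *env, const char *name, size_t size)
{
    unsigned  b;
    EnvEntry *e;

    assert(env != NULL && name != NULL && size > 0);
    b = env_hash(name);

    /* Later calls with the same name return the same block. */
    for (e = env->buckets[b]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->space;

    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->name = malloc(strlen(name) + 1);
    e->space = calloc(1, size);
    if (e->name == NULL || e->space == NULL)
    {
        free(e->name);
        free(e->space);
        free(e);
        return NULL;
    }
    strcpy(e->name, name);
    e->next = env->buckets[b];
    env->buckets[b] = e;
    return e->space;
}
```

Each subsection would gather its former globals into one struct and fetch
that struct by name on entry.  Whether the `Env` itself is per process or
per thread can then be decided later.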


comments?

Myron

Re: Threaded PosgreSQL server

From

mlw

Date:

07 February 2002, 12:10:17

Justin Clift wrote:
> 
> Haroldo Stenger wrote:
> <snip>
> >
> > I agree with cross-pollination among open source projects. BTW, this practice
> > should be encouraged, and not called "stealing", not even as a joke, as I've
> > seen it called for example for the TCP/IP Linux stack code (99% sure this was
> > the one module), which came from the *BSD projects, in its very first version.
> > Also mentioning that BSD -> GPL was possible, but not the other way round; I
> > don't mean to start a war or anything, just exposing facts.
> >
> > > The point is that nobody is even implying that this is a "for v7.3"
> > > project ... there have been several projects that have been initiated over
> > > the years that have straddled releases, and we have a lot of very good
> > > developers, and testers, that will make sure that any changes are "for the
> > > good" ...
> >
> > Yes, I agree. If starting to think & code thread safe *now* proves *not* to be a
> > waste of everybody's time, that's the path to follow. This very point is the one
> > under technical examination, right?
> 
> So, with this thought in mind of "starting to think & code thread safe",
> we should start putting together a set of reference guidelines,
> especially drawing on the experience of people whom have good, solid
> experience with threaded, multi-process, cross-platform coding.  It
> should take into account the people who are reading it, may not be as
> experienced in this um... specialised area of coding too.
> 
> We've identified "global variables" needing to be done in a better and
> more consistent way.
> 
> So, what else do coders need to do when "thinking and coding thread
> safe", that we can make into a guideline for forthcoming PostgreSQL
> coding?

Going from a "process model" to a "threaded model" is a HUGE
undertaking. In the process model, all data is assumed to be private,
and shared data must be explicitly shared.  In a threaded model all data
is implicitly shared and private data must be explicitly made private.
Do not underestimate what this means or how hard it is to convert one
to the other.

Also:

Think of file handles. In a threaded version of PostgreSQL, all
connections will be competing for file handles. I think the limit in
Linux is 1024.

All threads will be competing for memory mapping. As systems get more
and more RAM, on the x86 and other 32 bit machines, process space is
limited to 2 to 3 gig. If you have 8 gig in your system, PostgreSQL
won't be able to use it.

As I have said before, multithreading queries within a connection
process would be pretty cool. On a low-load server this could give a
big performance increase, but it may be easier to create a couple of I/O
threads per connection process and devise some queuing mechanism for
disk reads/writes; in essence, provide an asynchronous I/O system. This
would give you some of the performance of multithreading a query,
while not requiring a complete thread-safe implementation.

I think threading connections is a VERY bad idea. I am dubious that the
amount of work will result in a decent return on investment.

Re: Threaded PosgreSQL server

From

Thomas Lockhart

Date:

07 February 2002, 12:40:11

...
> As I have said before, multithreading queries within a connection
> process would be pretty cool, on a low load server, this could make a
> big performance increase, but it may be easier to create a couple I/O
> threads per connection process and devise some queuing mechanism for
> disk reads/write. In essence provide an asynchronous I/O system. This
> would give you some of the performance of multithreading a query,
> while not requiring a complete thread-safe implementation.

The other use case would be a high load server with only one or a few
connections (big queries, few clients); see below.

> I think threading connections is a VERY bad idea. I am dubious that the
> amount of work will result in a decent return on investment.

Agreed. A subset area which *might* be a benefit for the use case above
is to allow threading of subqueries, which might happen after the
optimizer section of code. That is a (pretty big) fraction of the code,
not all of it, and it would still continue the benefits of the
process-per-client model while allowing a client to spread across
multiple processors.

The other area which could be exploited with restructuring to allow
post-optimizer threading is distributed databases, where each of those
subqueries could be rerouted to another server.

A first cut would be to allow read-only distributed databases; that
might demote the nomenclature for this to federated databases, but it is
still an interesting capability.
                      - Thomas

Re: Threaded PosgreSQL server

From

Date:

07 February 2002, 12:40:15

On Thu, 7 Feb 2002, mlw wrote:

> 
> Going from a "process model" to a "threaded model" is a HUGE
> undertaking. In the process model, all data is assumed to be private,
> and shared data must be explicitly shared.  In a threaded model all data
> is implicitly shared and private data must be explicitly made private.
> Do not underestimate what this means or how hard it is to convert one
> to the other.

Agreed.

> 
> Also:
> 
> Think of file handles. In a threaded version of PostgreSQL, all
> connections will be competing for file handles. I think the limit in
> Linux is 1024.
> 

Yes, but because the current file manager is built with three layers of
abstraction (OS FD --> Postgres Vfd --> Postgres Storage Manager), it is
possible to manage and configure this very nicely.  For threaded postgres,
each thread has its own storage manager, and these share Vfd's up to a
sharing maximum.  This prevents too many threads from trying to seek on the
same OS FD.  The Vfd's manage OS FD resources.

> All threads will be competing for memory mapping. As systems get more
> and more RAM, on the x86 and other 32 bit machines, process space is
> limited to 2 to 3 gig. If you have 8 gig in your system, PostgreSQL
> won't be able to use it.
> 

You should be able to set up several processes in shared memory for the
db.  5 processes * 256 client threads per process = 1280 clients or
something like that. 

> As I have said before, multithreading queries within a connection
> process would be pretty cool, on a low load server, 

I think this would be possible now if I knew how to spin out subqueries
from the query tree.

Myron
mkscott@sacadia.com

Re: Threaded PosgreSQL server

From

Hannu Krosing

Date:

07 February 2002, 13:50:32

On Thu, 2002-02-07 at 19:13, mlw wrote:
> Justin Clift wrote:
> 
> Also:
> 
> Think of file handles. In a threaded version of PostgreSQL, all
> connections will be competing for file handles. I think the limit in
> Linux is 1024.

From what I've seen we are more likely to hit the per-system file handle
limit when all separate forks open the same files over and over again,
so as the number of processes grows we will be worse off than using the
same file handles for all connections in threaded mode. 

> I think threading connections is a VERY bad idea. I am dubious that the
> amount of work will result in a decent return on investment.
This whole thread started with a notion that this has already been done
once and the idea was to investigate what could be brought over to main
forked-only (the threaded version could be forked at the same time)
codebase.

----------------
Hannu

Re: Threaded PosgreSQL server

From

Hannu Krosing

Date:

07 February 2002, 13:50:41

On Thu, 2002-02-07 at 12:49, Karel Zak wrote:

>  IMHO in thread version is problem with backend crash (user's bugs in 
>  PL .etc).

The current behaviour for crashing one backend is also "terminate all
backends as something bad may have happened to shared memory".

----------
Hannu

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

07 February 2002, 14:03:35

On Thu, 7 Feb 2002, mlw wrote:

> As I have said before, multithreading queries within a connection
> process would be pretty cool, on a low load server, this could make a
> big performance increase, but it may be easier to create a couple I/O
> threads per connection process and devise some queuing mechanism for
> disk reads/write. In essence provide an asynchronous I/O system. This
> would give you some of the performance of multithreading a query,
> while not requiring a complete thread-safe implementation.
>
> I think threading connections is a VERY bad idea. I am dubious that the
> amount of work will result in a decent return on investment.

I don't believe anyone (or, at least I hope not) is advocating threading
connections ... with systems getting more and more CPUs, and more and more
RAM, what I'm advocating is looking at taking pieces from within the
connection itself and threading those, to improve performance ... from
what I can tell with Apache2 itself, there is no "thread only" model that
they are advocating ... the closest is their 'worker' where you can have
multiple connections threaded in multiple processes, so, in theory, you
could limit to a large number of threads and a very low number of
processes ...

Re: Threaded PosgreSQL server

From

Date:

07 February 2002, 14:50:11

On Thu, 7 Feb 2002, Marc G. Fournier wrote:

> 
> I don't believe anyone (or, at least I hope not) is advocating threading
> connections ... with systems getting more and more CPUs, and more and more
> RAM, what I'm advocating is looking at taking pieces from within the
> connection itself and threading those, to improve performance ... from
> what I can tell with Apache2 itself, there is no "thread only" model that
> they are advocating ... the closest is their 'worker' where you can have
> multiple connections threaded in multiple processes, so, in theory, you
> could limit to a large number of threads and a very low number of
> processes ...

Making postgres functions thread-safe increases the flexibility of the
codebase.  Whether the goal is threading connections, threading
sub-queries, increasing processor utilization, or some other unforeseen
optimization, having reentrant and thread-safe code leaves the door open
for new ideas.  Yes, writing reentrant code can be restrictive and a
little more complex, but not much; the big work is the upfront cost of
porting.  I have done it once and gained a great deal on projects that I
am working on.

Myron
mkscott@sacadia.com

Re: Threaded PosgreSQL server

From

"D. Hageman"

Date:

07 February 2002, 15:47:27

On Thu, 7 Feb 2002, mlw wrote:
> 
> Going from a "process model" to a "threaded model" is a HUGE
> undertaking. In the process model, all data is assumed to be private,
> and shared data must be explicitly shared.  In a threaded model all data
> is implicitly shared and private data must be explicitly made private.
> Do not under estimate what this means or how hard it is to convert one
> to the other.
> 

I agree with the first and last sentence ... the rest of the paragraph is
... well we argued this before - look in the archives.
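
For readers without the archives at hand, the part of the quoted paragraph
that concerns global variables can be illustrated with a minimal sketch.
It assumes a POSIX system with fork() and pthreads; the names counter,
counter_after_fork and counter_after_thread are made up for illustration
and have nothing to do with the PostgreSQL source.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;     /* an ordinary global variable */

/* Thread body: increments the global it shares with its creator. */
static void *bump(void *arg)
{
    (void) arg;
    counter++;
    return NULL;
}

/* Parent's view of counter after a forked child has incremented it. */
int counter_after_fork(void)
{
    counter = 0;
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        counter++;          /* changes only the child's private copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;
}

/* Creator's view of counter after another thread has incremented it. */
int counter_after_thread(void)
{
    counter = 0;
    pthread_t t;
    int rc = pthread_create(&t, NULL, bump, NULL);
    assert(rc == 0);
    pthread_join(t, NULL);
    return counter;
}
```

After the fork the parent still sees 0, because the child incremented its
own copy; after the thread has run the creator sees 1, because both share
the variable.  The sketch says nothing about which model is easier to work
with, which is the part under dispute.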

> Also:
> 
> Think of file handles. In a threaded version of postgreSQL, all
> connections will be competing for file handles. I think the limit in
> Linux is 1024.

Depends on how it is done.

> All threads will be competing for memory mapping. As systems get more
> and more RAM, on the x86 and other 32 bit machines, process space is
> limited to 2 to 3 gig. If you have 8 gig in your system, PostgreSQL
> won't be able to use it.

Depends on how it is done.

> I think threading connections is a VERY bad idea. I am dubious that the
> amount of work will result in a decent return on investment.

Depends on how it is done.  We should be careful not to assume that
threading postgresql instantly equates to threading connections.  That is
only *ONE* possible type of threading architecture one could choose.
Making broad generalized statements doesn't accomplish anything in this
debate ... instead be more focused with your comments so one can make
heads or tails out of them.

-- 
//========================================================\\
||  D. Hageman                    <dhageman@dracken.com>  ||
\\========================================================//

Re: Threaded PosgreSQL server

From

mlw

Date:

07 February 2002, 16:50:18

"D. Hageman" wrote:
> 
> On Thu, 7 Feb 2002, mlw wrote:
> >
> > Going from a "process model" to a "threaded model" is a HUGE
> > undertaking. In the process model, all data is assumed to be private,
> > and shared data must be explicitly shared.  In a threaded model all data
> > is implicitly shared and private data must be explicitly made private.
> > Do not under estimate what this means or how hard it is to convert one
> > to the other.
> >
> 
> I agree with the first and last sentence ... the rest of the paragraph is
> ... well we argued this before - look in the archives.

yes, I know.
> 
> > Also:
> >
> > Think of file handles. In a threaded version of postgreSQL, all
> > connections will be competing for file handles. I think the limit in
> > Linux is 1024.
> 
> Depends on how it is done.

How does it depend? If you have one process with multiple threads, you
will bump up against the process limit of file handles.

> 
> > All threads will be competing for memory mapping. As systems get more
> > and more RAM, on the x86 and other 32 bit machines, process space is
> > limited to 2 to 3 gig. If you have 8 gig in your system, PostgreSQL
> > won't be able to use it.
> 
> Depends on how it is done.

Again, how does it depend? If you have one process, there is a limit to
the amount of memory it can access. 3gig (2gig on older Windows) of
process space is a classic limitation of x86 operating systems.

> 
> > I think threading connections is a VERY bad idea. I am dubious that the
> > amount of work will result in a decent return on investment.
> 
> Depends on how it is done.  We should be careful not to assume that
> threading postgresql instantly equates to threading connections.  That is
> only *ONE* possible type of threading architecture one could choose.
> Making broad generalized statements doesn't accomplish anything in this
> debate ... instead be more focused with your comments so one can make
> heads or tails out of them.

There are, AFAIK two reasons to thread PostgreSQL:

(1) Run the multiple connections in their own thread with the assumption
that this is more efficient for [n] reasons.
(2) Run a single query across multiple threads, thus parallelizing the
query engine.

There is a mutant of this as well: (1a)  You could have multiple
processes each with [n] connection threads.

As far as PostgreSQL is concerned, I am dubious that (1) or (1a) will
provide any real benefit for the amount of work required to accomplish
it. Work on "pre-forking" would be FAR more productive.

The idea of parallelizing queries could be very worthwhile. However,
that being said, creating a set of I/O threads that get blocks from disk
devices asynchronously may be enough, with a very limited amount of work.
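
The I/O-thread idea can be sketched roughly as follows.  This is only an
illustration under stated assumptions, not a proposed design: one reader
thread per file, a fixed-depth queue guarded by a pthread mutex and two
condition variables, and an 8 kB block size.  ReadQueue, reader_main,
next_block and scan_file are hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE  8192    /* assumed block size */
#define QUEUE_DEPTH 4       /* how far the reader may run ahead */

typedef struct {
    char    data[BLOCK_SIZE];
    ssize_t len;            /* bytes read; 0 marks end of file */
} Block;

typedef struct {
    int             fd;
    Block           slots[QUEUE_DEPTH];
    int             head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty, not_full;
} ReadQueue;

/* I/O thread: reads blocks ahead of the consumer until end of file. */
static void *reader_main(void *arg)
{
    ReadQueue *q = arg;
    off_t offset = 0;

    for (;;) {
        Block b;
        b.len = pread(q->fd, b.data, BLOCK_SIZE, offset);
        if (b.len < 0)
            b.len = 0;      /* this sketch treats a read error as EOF */
        offset += b.len;

        pthread_mutex_lock(&q->lock);
        while (q->count == QUEUE_DEPTH)
            pthread_cond_wait(&q->not_full, &q->lock);
        q->slots[q->tail] = b;
        q->tail = (q->tail + 1) % QUEUE_DEPTH;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->lock);

        if (b.len == 0)
            return NULL;
    }
}

/* Consumer side: copy out the next block, waiting if the reader is behind. */
static ssize_t next_block(ReadQueue *q, char *out)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    ssize_t len = q->slots[q->head].len;
    memcpy(out, q->slots[q->head].data, (size_t) len);
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return len;
}

/* Scan a whole file through the queue; returns bytes seen, or -1. */
long scan_file(const char *path)
{
    ReadQueue q = { .fd = open(path, O_RDONLY) };
    if (q.fd < 0)
        return -1;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, reader_main, &q);
    assert(rc == 0);

    char buf[BLOCK_SIZE];
    long total = 0;
    ssize_t n;
    while ((n = next_block(&q, buf)) > 0)
        total += n;         /* stands in for real per-block query work */

    pthread_join(tid, NULL);
    close(q.fd);
    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return total;
}
```

While the consumer works on one block, the reader can already be blocked
in pread() on the next, so the I/O wait overlaps with the CPU work.  None
of the query code itself has to be thread-safe for this, since only the
queue is shared.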

I guess all I am saying, is that a person's time is really the only
limited resource. Tom, Bruce, Marc, Peter and everyone else have a
limited amount of time. If I could influence how those guys spend their
time, I would hope they spent time working on improving the
functionality of PostgreSQL, not the tedium of making it thread safe.

> 
> --
> //========================================================\\
> ||  D. Hageman                    <dhageman@dracken.com>  ||
> \\========================================================//
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: Threaded PosgreSQL server

From

"D. Hageman"

Date:

07 February 2002, 17:20:43

On Thu, 7 Feb 2002, mlw wrote:

<SNIP a bunch of crap that will hopefully be implicitly explained
and understood by the comments below>

> There are, AFAIK two reasons to thread PostgreSQL:
> 
> (1) Run the multiple connections in their own thread with the assumption
> that this is more efficient for [n] reasons.
> (2) Run a single query across multiple threads, thus parallelizing the
> query engine.

(3) Parallelize house keeping (for example vacuums) of the database.  I 
think they are going to call this processes or something slated for the 
next version? 

(4) Replication

(5) Referential Integrity cleanups

(6) EXOTIC FEATURES: crossdb

Oh yeah ... and we might be able to drop the whole startup time section
from the TODO list.  It all depends on how one wants to implement the
threads into postgresql.  Then again ... maybe a task of this endeavor
would be more appropriately forked off and proceeded on as a separate
project (as it kinda has already been done).

> I guess all I am saying, is that a person's time is really the only
> limited resource. Tom, Bruce, Marc, Peter and everyone else have a
> limited amount of time. If I could influence how those guys spend their
> time, I would hope they spent time working on improving the
> functionality of PostgreSQL, not the tedium of making it thread safe.

The people that do the biggest amount of coding should definitely code
what they feel is the best to work on - NO one is arguing that.  If a few
of them want to assist in this endeavor then they should do that as well.
Most importantly - we shouldn't belittle the efforts of those that do see
the vision of how this could be beneficial in the long run.  My point is
that I see more people wasting time complaining than it would take to make
up a list of coding practices to follow for future work that will make the
postgresql code base better.  (Come on ... the first thing a programmer is
taught is that global variables are BAD).

-- 
//========================================================\\
||  D. Hageman                    <dhageman@dracken.com>  ||
\\========================================================//

Re: Threaded PosgreSQL server

From

Tom Lane

Date:

07 February 2002, 17:50:02

"D. Hageman" <dhageman@dracken.com> writes:
> (Come on ... the first thing a programmer is 
> taught is that global variables are BAD).

Reality check time: I don't believe there are very many
gratuitously-static variables in the backend.  Most of the ones I can
think of offhand are associated with data structures that are actually
global, or at least would be of interest to more than one thread.
(For example, the catcache/relcache data structures are referenced from
static variables.  You would very likely want these caches to be shared
across as many threads as possible.  The data structures associated with
configuration variables would need to be shared by all threads executing
on behalf of a particular client connection.  Etc.)  So the hard part of
making the code "thread safe" is figuring out what we want to do with
potentially-sharable data structures: can they be shared, if so across
what scope, and what sort of locking penalty will we pay for sharing
them?

Maybe I'm missing something, but I don't think that a "coding practices"
document will do much of anything to improve our threading situation.
It might be worth having on other grounds, but not that one.
        regards, tom lane

Re: Threaded PosgreSQL server

From

Doug McNaught

Date:

07 February 2002, 17:50:02

"D. Hageman" <dhageman@dracken.com> writes:

> (3) Parallelize house keeping (for example vacuums) of the database.  I 
> think they are going to call this processes or something slated for the 
> next version? 
> 
> (4) Replication
> 
> (5) Referential Integrity cleanups
> 
> (6) EXOTIC FEATURES: crossdb

I fail to see how threads are required for any of these.  They could
just as well be done with a separate process(es) in the current model.

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.  --T. J. Jackson, 1863

Re: Threaded PosgreSQL server

From

"D. Hageman"

Date:

07 February 2002, 18:40:22

On 7 Feb 2002, Doug McNaught wrote:

> "D. Hageman" <dhageman@dracken.com> writes:
> 
> > (3) Parallelize house keeping (for example vacuums) of the database.  I 
> > think they are going to call this processes or something slated for the 
> > next version? 
> > 
> > (4) Replication
> > 
> > (5) Referential Integrity cleanups
> > 
> > (6) EXOTIC FEATURES: crossdb
> 
> I fail to see how threads are required for any of these.  They could
> just as well be done with a separate process(es) in the current model.
> 

Oh, I didn't realize the conversation was about what threads were
"required" for completing.  My mistake ... *cough* *cough*

-- 
//========================================================\\
||  D. Hageman                    <dhageman@dracken.com>  ||
\\========================================================//

Re: Threaded PosgreSQL server

From

"D. Hageman"

Date:

07 February 2002, 18:51:18

On Thu, 7 Feb 2002, Tom Lane wrote:

> 
> Maybe I'm missing something, but I don't think that a "coding practices"
> document will do much of anything to improve our threading situation.
> It might be worth having on other grounds, but not that one.
> 

You aren't missing anything.  A document of coding practices with points
on using thread-safe functions etc. isn't going to revolutionize anything.
However, it has the potential of being the best way to begin and soften
the cries of the luddites (which is the biggest problem at the moment).

-- 
//========================================================\\
||  D. Hageman                    <dhageman@dracken.com>  ||
\\========================================================//

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

07 February 2002, 20:11:30

On Thu, 7 Feb 2002, mlw wrote:

> How does it depend? If you have one process with multiple threads, you
> will bump up against the process limit of file handles.

So?  Use an OS that doesn't impose such limits, or lets you increase them?
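
On systems that let you raise the limit, a process can do so itself
through the POSIX getrlimit()/setrlimit() calls.  A minimal sketch, with
raise_nofile_limit being a made-up helper name; it can only go as high as
the hard limit, which is set by the administrator:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Raise the soft limit on open file descriptors to the hard limit.
 * Stores the resulting soft limit in *new_soft and returns 0, or
 * returns -1 if the system refuses.
 */
int raise_nofile_limit(rlim_t *new_soft)
{
    struct rlimit rl;

    assert(new_soft != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;      /* soft limit may not exceed hard */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *new_soft = rl.rlim_cur;
    return 0;
}
```

Some systems apply additional caps of their own, in which case the call
fails and the function reports the error rather than guessing a value.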

> Again, how does it depend? If you have one process, there is a limit to
> the amount of memory it can access. 3gig (2gig on older Windows) of
> process space is a classic limitation of x86 operating systems.

But, we aren't talking about one *big* process with many threads ... we
are talking several processes that make use of threads to speed up various
processes ... kinda like programming in C for 99% of a project, but going
to assembly for stuff that could use that little bit of a boost ...

> I guess all I am saying, is that a person's time is really the only
> limited resource. Tom, Bruce, Marc, Peter and everyone else have a
> limited amount of time. If I could influence how those guys spend their
> time, I would hope they spent time working on improving the
> functionality of PostgreSQL, not the tedium of making it thread safe.

Except that, as several ppl have pointed out, that 'tedium' could result
in functionality that we really don't have right now ... right now, with a
"non-threaded, single process per connection" model, you really aren't
making *as efficient use* of a multi-CPU environment ... how many queries
spend a good deal of time sitting in an I/O wait state because they have to
wait until all the data is read from the drive before they can start
processing?  Take a large database/application, on a Quad+ server,
where you don't have *a lot* of queries happening, but those that do are
*very* large ... that large query is currently stuck on the one CPU while
the other 3+ CPUs are sitting idle ... etc, etc ... there is functionality
that 'working around' in a non-threaded environment would be more tedious
than doing the code clean up itself, and, most likely, not nearly as
efficient as it could be ...

The first step has to be taken *sometime*, and best to encourage it while
we have ppl around that have the *knowledge* to take it ... god, I can
remember when doing the code cleanups to get configure integrated into our
build process (there was a time where configure didn't exist) was a
tedious process, but how many ppl out there could imagine us without it?

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

07 February 2002, 20:40:13

On Thu, 7 Feb 2002 mkscott@sacadia.com wrote:

> Making postgres functions thread-safe increases the flexibility of the
> codebase.  Whether threading connections, sub-queries, increasing
> processor utilization, or some other unforeseen optimization, having
> reentrant and thread-safe code leaves the door open for new ideas. Yes,
> writing reentrant code can be restrictive and a little more complex,
> but not much, the big work is the upfront cost of porting.  I have done
> it once and gained a great deal on projects that I am working on.

Would you be willing to take what you've learnt and work with the current
CVS tree towards making her thread-safe?  Even small steps regularly taken
bring us closer to being able to use even *some* threading in the backend
...

Re: Threaded PosgreSQL server

From

Lincoln Yeoh

Date:

07 February 2002, 22:40:26

At 04:39 PM 07-02-2002 -0500, mlw wrote:
>
>There are, AFAIK two reasons to thread PostgreSQL:
>
>(1) Run the multiple connections in their own thread with the assumption
>that this is more efficient for [n] reasons.
>(2) Run a single query across multiple threads, thus parallelizing the
>query engine.
>
>There is a mutant of this as well: (1a)  You could have multiple
>processes each with [n] connection threads.
>
>As far as PostgreSQL is concerned, I am dubious that (1) or (1a) will
>provide any real benefit for the amount of work required to accomplish
>it. Work on "pre-forking" would be FAR more productive.
>
>The idea of parallelizing queries could be very worthwhile. However,
>that being said, creating a set of I/O threads that get blocks from disk
>devices asynchronously may be enough, with a very limited amount of work.

2) seems to be the only good argument for threads so far. 1) may only be
true on certain O/Ses.

That said, are those large single queries typically CPU bound or IO bound
or neither?

If they are IO bound then given my limited understanding it is not easy to
see how spreading the query over additional CPUs is going to help.

I suggest that work on clustering postgresql may result in a more scalable
general solution than threaded postgresql. Looks to be more difficult, but
the benefits seem more tangible.

Cheerio,
Link.

Re: Threaded PosgreSQL server

From

Date:

08 February 2002, 02:41:19

On Thu, 7 Feb 2002, Marc G. Fournier wrote:

> 
> Would you be willing to take what you've learnt and work with the current
> CVS tree towards making her thread-safe?  Even small steps regularly taken
> bring us closer to being able to use even *some* threading in the backend
> ...
> 

I can definitely take a stab at it.  Maybe I can make a test case with
some globals that are accessed often and submit some patches to see what
people think.  Can I send them to you?

Myron
mkscott@sacadia.com

Re: Threaded PosgreSQL server

From

"Christopher Kings-Lynne"

Date:

08 February 2002, 03:20:04

> I can definitely take a stab at it.  Maybe I can make a test case with
> some globals that are accessed often and submit some patches to see what
> people think.  Can I send them to you?

Maybe we should assign someone (or a team) to be the 'thread strike force'.
Their job is to (at their leisure) tidy up various parts of the source code
in such a way that they should not affect other parts.  This should be done
during the release cycle, so there is plenty of time to test their changes.

Then, once the whole source tree has had its stylistic improvements, it
would become easier to switch to a threaded/mpm model...

Chris

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

08 February 2002, 08:20:50

On Thu, 7 Feb 2002 mkscott@sacadia.com wrote:

>
>
>
> On Thu, 7 Feb 2002, Marc G. Fournier wrote:
>
> >
> > Would you be willing to take what you've learnt and work with the current
> > CVS tree towards making her thread-safe?  Even small steps regularly taken
> > bring us closer to being able to use even *some* threading in the backend
> > ...
> >
>
> I can definitely take a stab at it.  Maybe I can make a test case with
> some globals that are accessed often and submit some patches to see what
> people think.  Can I send them to you?

Send them through to pgsql-patches@postgresql.org ... since we are right
at the start of the development cycle for v7.3, things should be a lot
easier ... pretty much expect to send them in, have them reviewed and
commented upon by various developers as to how this should be done this
way, and that shouldn't be done this way, and have to re-submit ... :)

Re: Threaded PosgreSQL server

From

"Marc G. Fournier"

Date:

08 February 2002, 08:20:51

On Fri, 8 Feb 2002, Christopher Kings-Lynne wrote:

> > I can definitely take a stab at it.  Maybe I can make a test case with
> > some globals that are accessed often and submit some patches to see what
> > people think.  Can I send them to you?
>
> Maybe we should assign someone (or a team) to be the 'thread strike force'.
> Their job is to (at their leisure) tidy up various parts of the source code
> in such a way that they should not affect other parts.  This should be done
> during the release cycle, so there is plenty of time to test their changes.
>
> Then, once the whole source tree has had its stylistic improvements, it
> would become easier to switch to a threaded/mpm model...

Woo hoo, he caught up with the thread *grin* *poke*

Yes, this is exactly what we've been discussing, while some have been
trying to tangent off onto side threads ...

Re: Threaded PosgreSQL server

From

Justin Clift

Date:

08 February 2002, 08:50:50

"Marc G. Fournier" wrote:
> 
<snip>
> 
> Woo hoo, he caught up with the thread *grin* *poke*
> 
> Yes, this is exactly what we've been discussing, while some have been
> trying to tangent off onto side threads ...

I feel this would benefit from some kind of PostgreSQL specific guide
for new coders to follow.  Doesn't have to be overdone, but it should at
least give people an idea of what stuff to keep in mind when coding.

???

Regards and best wishes,

Justin Clift

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi

Re: Threaded PosgreSQL server

From

Tom Lane

Date:

08 February 2002, 11:25:09

<mkscott@sacadia.com> writes:
> I can definitely take a stab at it.  Maybe I can make a test case with
> some globals that are accessed often and submit some patches to see what
> people think.  Can I send them to you?

I have a sneaking feeling that what you are going to come up with is a
multi-megabyte patch to convert CurrentMemoryContext into a non-global,
which will require changing the parameter list of damn near every
routine in the backend.

Personally I will vote for rejecting such a patch, as it will uglify the
code (and break nearly all existing user-written extension functions)
far more than is justified by what it accomplishes: exactly zero, in
terms of near-term usefulness.

I think what's more interesting to discuss at this stage is the
considerations I alluded to before: what are we going to do with the
caches and other potentially-sharable datastructures?  Without a
credible design for those issues, there is no point in sweating the
small-but-annoying stuff.
        regards, tom lane

Re: Threaded PosgreSQL server

From

Karel Zak

Date:

08 February 2002, 12:21:03

On Fri, Feb 08, 2002 at 11:17:51AM -0500, Tom Lane wrote:
> <mkscott@sacadia.com> writes:
> > I can definitely take a stab at it.  Maybe I can make a test case with
> > some globals that are accessed often and submit some patches to see what
> > people think.  Can I send them to you?
> 
> I have a sneaking feeling that what you are going to come up with is a
> multi-megabyte patch to convert CurrentMemoryContext into a non-global,
> which will require changing the parameter list of damn near every
> routine in the backend.

 Sorry, I have not been following this discussion closely, but since you
 are talking about PostgreSQL memory management and threads, I have a note.

 Dan Horak and I have worked for a year on the Mape project
 (http://mape.jcu.cz), and we have already ported the good postgres memory
 management into a threaded daemon. It works very well and it is a
 transparent solution: you do not have to rewrite routines that use
 MemoryContextSwitchTo or palloc() and other stuff, because everything is
 based on thread-specific contexts (see the man page for
 pthread_key_create). With this solution you do not have to change too
 many things in the current code.

        Karel

-- 
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
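
The thread-specific approach mentioned in this message might look roughly
like the sketch below.  It is not the Mape code, which is not shown in
this thread: DemoContext and the demo_* functions are hypothetical
stand-ins, and only pthread_key_create, pthread_once,
pthread_getspecific and pthread_setspecific are real POSIX calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context. */
typedef struct DemoContext {
    const char *name;
} DemoContext;

static pthread_key_t  current_ctx_key;
static pthread_once_t current_ctx_once = PTHREAD_ONCE_INIT;

/* Runs once per process: creates the key all threads will index by. */
static void make_key(void)
{
    int rc = pthread_key_create(&current_ctx_key, NULL);
    assert(rc == 0);
}

/* Replaces reading a global: each thread sees its own current context. */
DemoContext *demo_current_context(void)
{
    pthread_once(&current_ctx_once, make_key);
    return pthread_getspecific(current_ctx_key);
}

/* Same shape as a SwitchTo call: install ctx, hand back the old one. */
DemoContext *demo_context_switch_to(DemoContext *ctx)
{
    DemoContext *old = demo_current_context();
    int rc = pthread_setspecific(current_ctx_key, ctx);
    assert(rc == 0);
    return old;
}

/* Demo thread body: installs its own context and reports what it sees. */
void *demo_worker(void *arg)
{
    demo_context_switch_to(arg);
    return demo_current_context();
}
```

Because the lookup is hidden behind the two functions, callers keep their
existing signatures, which is the transparency described above.  The price
is a pthread_getspecific() call wherever the code used to read a plain
global.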

Re: Threaded PosgreSQL server

From

Date:

08 February 2002, 13:28:18

On Fri, 8 Feb 2002, Tom Lane wrote:

> I have a sneaking feeling that what you are going to come up with is a
> multi-megabyte patch to convert CurrentMemoryContext into a non-global,
> which will require changing the parameter list of damn near every
> routine in the backend.

While working with 7.0.2, I changed the call signature on only about 10
functions.  In the MemoryContext example,
MemoryContextSwitchTo(<Any>MemoryContext) turned into
MemoryContextSwitchTo(GetEnv()-><Any>MemoryContext).  You may be able
to do this with a #define.  When I profiled the code, this actually had
very little impact on CPU usage.  There were some hotspots where it made
more sense to pass the global environment to the function, but the list
is small.

> 
> Personally I will vote for rejecting such a patch, as it will uglify the
> code (and break nearly all existing user-written extension functions)
> far more than is justified by what it accomplishes: exactly zero, in
> terms of near-term usefulness.

I don't think that user functions need be broken.  As long as they use
palloc, a recompile may be all that is needed.

> 
> I think what's more interesting to discuss at this stage is the
> considerations I alluded to before: what are we going to do with the
> caches and other potentially-sharable datastructures?  Without a
> credible design for those issues, there is no point in sweating the
> small-but-annoying stuff.

As far as caches go, I punted on sharing.  Controlling access to the cache
hash tables looked like a lot of work and I thought the contention for this
resource would be high.  So I had each thread build separate cache
structures.  The one difference was I had the original cache build occur
from memory rather than the file pg_internal.init.  So when the first
thread for a particular db is built,  the cache structures are built in
system memory and copied into the appropriate MemoryContext.  Each
subsequent cache for the db is copied from main memory at thread build.

One place where sharing worked great was the file manager.  I modified
md.c to share Vfd's. I made the maximum number of threads that could share
one Vfd configurable so that the number of Vfds created and the contention
to those Vfd's could be balanced.

It seems obvious to me that we need to thread slowly and softly into this
area, so I promise not to spend a ton of time mangling the whole CVS tree;
that, most definitely, would be a waste of everybody's time. I think I can
find an example area that will be a small patch and submit it for review.
Hopefully this can get the ball rolling.

Myron
mkscott@sacadia.com

Re: Threaded PosgreSQL server

From

Tom Lane

Date:

08 February 2002, 13:52:58

<mkscott@sacadia.com> writes:
> On Fri, 8 Feb 2002, Tom Lane wrote:
>> I have a sneaking feeling that what you are going to come up with is a
>> multi-megabyte patch to convert CurrentMemoryContext into a non-global,
>> which will require changing the parameter list of damn near every
>> routine in the backend.

> While working with 7.0.2, I changed the call signature on only about 10
> functions.  In the MemoryContext example,
> MemoryContextSwitchTo(<Any>MemoryContext) turned into
> MemoryContextSwitchTo(GetEnv()-><Any>MemoryContext).  You may be able 
> to do this with a #define.

Oh, I see.  Okay, if we can hide the messiness inside #define's then it
might not be as bad as I was expecting.  That'd also allow the overhead
to be compiled away when we didn't need/want thread support, which'd be
even nicer.
        regards, tom lane
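
A rough sketch of how such a #define could look, with the threaded
variant selected by a compile switch.  Everything here is assumed for
illustration: DEMO_THREADS is a made-up switch, DemoEnv and DemoContext
are stand-in types, GetEnv() is modeled only on the name used above, and
C11 _Thread_local is just one possible way to give each thread its own
environment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context. */
typedef struct DemoContext {
    const char *name;
} DemoContext;

/* Hypothetical per-thread environment holding former globals. */
typedef struct DemoEnv {
    DemoContext *current_memory_context;
} DemoEnv;

#ifdef DEMO_THREADS
/* Threaded build: the "global" is a field of a per-thread struct. */
static _Thread_local DemoEnv demo_env;
#define GetEnv() (&demo_env)
#define CurrentMemoryContext (GetEnv()->current_memory_context)
#else
/* Non-threaded build: a plain global, so the macro layer costs nothing. */
static DemoContext *CurrentMemoryContext;
#endif

/* Caller code is written once and compiles unchanged in both builds. */
DemoContext *DemoContextSwitchTo(DemoContext *ctx)
{
    assert(ctx != NULL);
    DemoContext *old = CurrentMemoryContext;
    CurrentMemoryContext = ctx;
    return old;
}
```

Code that reads or assigns CurrentMemoryContext does not change between
the two builds, so existing callers and extension functions would need a
recompile rather than a rewrite, as suggested earlier in the thread for
functions that stick to palloc.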