Thread: Forking vs. Threading

Forking vs. Threading

From: Bryan Encina

Below is a post from the Fedora mailing list as to why one of the users
recommends Firebird over PostgreSQL (this thread came up because of
questions regarding MySQL licensing) and I was wondering if someone from
pgsql advocacy had any comments on this or would like to respond.

Sorry for top posting, but I didn't want to ruin any formatting.

-b

--------------------------------
I like PostgreSQL as far as its simplicity and things go.  It's nice,
and there are some good front ends for it.  The one complaint I have
with Postgres is that it forks.

MySQL and Firebird use threads and Postgres forks.  Forking is ok,
unless you have many database connections.  The more connections the
more processes.  I noticed while profiling an application that every
connection alone was taking over 1MB of memory.  This is the basis of
the process-per-connection gripe I have.

So, Postgres, sure I like it, but as far as a major DBMS goes, I think
it is limited by its memory usage.  That's just my opinion on the
matter.  However, it is a fact that it forks (forking takes more time
and more resources than threading).  One benefit in forking is the same
reason Apache forks (memory leaks can be minimized).  However, I think
if a DBMS has that bad of a memory leak... I won't use it.

I like to advocate Firebird as much as possible.  It runs on many
platforms and seems to be pretty scalable as far as connections and
usage goes, and it has a very flexible license as well.

I like all three mentioned DBMS, just different reasons for using them
at different times.

Wade


Re: Forking vs. Threading

From: Shridhar Daithankar

Bryan Encina wrote:

> Below is a post from the Fedora mailing list as to why one of the users
> recommends Firebird over PostgreSQL (this thread came up because of
> questions regarding MySQL licensing) and I was wondering if someone from
> pgsql advocacy had any comments on this or would like to respond.
>
> Sorry for top posting, but I didn't want to ruin any formatting.
>
> -b
>
> --------------------------------
> I like PostgreSQL as far as its simplicity and things go.  It's nice,
> and there are some good front ends for it.  The one complaint I have
> with Postgres is that it forks.
>
> MySQL and Firebird use threads and Postgres forks.  Forking is ok,
> unless you have many database connections.  The more connections the
> more processes.  I noticed while profiling an application that every
> connection alone was taking over 1MB of memory.  This is the basis of
> the process-per-connection gripe I have.

Umm... Noticed how much of that 1MB is shared?

> So, Postgres, sure I like it, but as far as a major DBMS goes, I think
> it is limited by its memory usage.  That's just my opinion on the
> matter.  However, it is a fact that it forks (forking takes more time
> and more resources than threading).  One benefit in forking is the same
> reason Apache forks (memory leaks can be minimized).  However, I think
> if a DBMS has that bad of a memory leak... I won't use it.

PostgreSQL uses fork because threading isn't as stable as forking on all the
platforms it runs on. Furthermore, when PostgreSQL started, it definitely was not.

Forking isn't as major an issue on Linux as it could be on, say, Solaris. However,
connection setup time is a rather tiny part of what a database actually does,
isn't it?

>
> I like to advocate Firebird as much as possible.  It runs on many
> platforms and seems to be pretty scalable as far as connections and
> usage goes, and it has a very flexible license as well.

Firebird does have an edge over PostgreSQL in having a native Windows port.
Nobody denies that.

>
> I like all three mentioned DBMS, just different reasons for using them
> at different times.

Certainly.

  Shridhar

Re: Forking vs. Threading

From: Neil Conway

On 19-Mar-04, at 9:08 AM, Shridhar Daithankar wrote:
> Bryan Encina wrote:
>> MySQL and Firebird use threads and Postgres forks.  Forking is ok,
>> unless you have many database connections.  The more connections the
>> more processes.  I noticed while profiling an application that every
>> connection alone was taking over 1MB of memory.  This is the basis of
>> the process-per-connection gripe I have.
>
> Umm... Noticed how much of that 1MB is shared?

Also, I'd expect that the amount of memory that a busy installation
should be devoting to caching I/O (whether done by the DBMS or the
kernel) will dwarf the amount of memory each backend has allocated
privately.

Assuming the kernel implements COW (which is reasonable, of course), I
don't think the overall difference in memory footprint should be very
significant -- or rather, if it is, it results from something other
than the choice between threads and fork().
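
As a rough way to check the shared-versus-private question raised above, here is a minimal sketch that totals the shared and private page counts reported in a process's /proc/<pid>/smaps file. It assumes a Linux kernel recent enough to provide smaps; the names summarize_smaps and summarize_pid are made up for this illustration and are not part of any PostgreSQL tooling.

```python
import os

# smaps fields counted in each bucket (hugetlb and swap fields are ignored here)
FIELDS = {
    "Shared_Clean": "shared_kb",
    "Shared_Dirty": "shared_kb",
    "Private_Clean": "private_kb",
    "Private_Dirty": "private_kb",
}

def summarize_smaps(text):
    """Sum the kB values of the shared and private fields in smaps-style text."""
    totals = {"shared_kb": 0, "private_kb": 0}
    for line in text.splitlines():
        field, _, rest = line.partition(":")
        bucket = FIELDS.get(field)
        if bucket is None:
            continue  # mapping header lines and unrelated fields
        totals[bucket] += int(rest.split()[0])
    return totals

def summarize_pid(pid):
    """Read /proc/<pid>/smaps (Linux only; needs permission to read that process)."""
    with open(f"/proc/{pid}/smaps") as f:
        return summarize_smaps(f.read())

if __name__ == "__main__":
    print(summarize_pid(os.getpid()))
```

Pointing summarize_pid at a backend's pid gives only an approximation: the kernel counts a page as shared when more than one process currently maps it, so the split can shift as other backends come and go.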

-Neil


Re: Forking vs. Threading

From: Enrico Weigelt

* Bryan Encina <bryan.encina@valleypres.org> wrote:

<big_snip>
> MySQL and Firebird use threads and Postgres forks.  Forking is ok,
> unless you have many database connections.  The more connections the
> more processes.  I noticed while profiling an application that every
> connection alone was taking over 1MB of memory.  This is the basis of
> the process-per-connection gripe I have.

That's not a noticeable problem.

fork() of course takes a while, but at least on Linux it is not as
expensive as one might expect (as process creation is on, let's say, Windows).
On Linux (up to 2.4), creating a new thread is essentially a fork, but without
copying segment descriptors and fds. On a normal fork(), most things are shared
with copy-on-write, so a fork is practically no more expensive than
pthread_create().
Well, that's at least the case for Linux up to 2.4; for other platforms
I can't say much.
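
One way to put a number on this claim for a given machine is a small timing loop. The sketch below is only an illustration: it uses Python's os.fork and threading.Thread rather than calling fork() and pthread_create() from C, so interpreter overhead is included in both figures, and the helper names time_forks and time_threads are invented here.

```python
import os
import threading
import time

def time_forks(n):
    """Average seconds to fork a child that exits at once, then reap it."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)          # child: leave immediately, skip cleanup handlers
        os.waitpid(pid, 0)       # parent: reap the child
    return (time.perf_counter() - start) / n

def time_threads(n):
    """Average seconds to start a thread that does nothing, then join it."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n

if __name__ == "__main__":
    n = 200
    print(f"fork + wait:   {time_forks(n) * 1e6:.0f} us each")
    print(f"thread + join: {time_threads(n) * 1e6:.0f} us each")
```

The results depend heavily on kernel version and on how much memory the forking process has mapped, so they say little about a real database backend beyond the order of magnitude.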

With forked processes you have separate data pages, so you don't have
to take care of locking for anything that isn't really in shared memory.
This makes the coder's life much easier (fewer errors), saves much of the time
otherwise consumed by locking, and also allows putting many things directly
into the data segment, which again means better performance (on various hardware
architectures, direct absolute addressing is much faster than indirect).
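
The separate-data-pages point can be shown with a short sketch, assuming a POSIX system where os.fork is available. A module-level dictionary stands in for process-private state; the names state, bump_in_child and bump_in_thread are hypothetical.

```python
import os
import threading

state = {"counter": 0}   # ordinary process memory, not a shared memory segment

def bump():
    state["counter"] += 1

def bump_in_child():
    """Run bump() in a forked child.

    Returns (change seen by the child, change seen by the parent).
    """
    before = state["counter"]
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        bump()                                   # touches the child's copy only
        os.write(w, str(state["counter"]).encode())
        os._exit(0)
    os.close(w)
    child_value = int(os.read(r, 32))
    os.close(r)
    os.waitpid(pid, 0)
    return child_value - before, state["counter"] - before

def bump_in_thread():
    """Run bump() in a thread and return the change seen by the caller."""
    before = state["counter"]
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return state["counter"] - before

if __name__ == "__main__":
    print("fork (child, parent):", bump_in_child())
    print("thread:", bump_in_thread())
```

The child's update never reaches the parent, so no lock is needed around it, whereas the thread's update lands in the caller's memory and would need locking as soon as several threads ran bump() concurrently.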

> So, Postgres, sure I like it, but as far as a major DBMS goes, I think
> it is limited by its memory usage.  That's just my opinion on the
Every RDBMS is limited by its memory usage :)
Such things just take a lot of memory (if they should be fast). Assuming
modern kernels do copy-on-write on/after fork, it shouldn't be such an issue.

> matter.  However, it is a fact that it forks (forking takes more time
> and more resources than threading).  One benefit in forking is the same
Does it really?
I can't imagine where so much time would go in fork() (at least on Linux).

> reason Apache forks (memory leaks can be minimized).  However, I think
> if a DBMS has that bad of a memory leak... I won't use it.
Well, PostgreSQL *did* have a memory leak (IIRC it was 7.1, which crashed on
some special kind of rewrite rules), but that was long ago :)
But that's not the (only) point.
Using different processes also has some other advantages (as stated above).
Since the MMU of modern CPUs can do memory protection, we should use it :)

If you talk about the Apache httpd, we shouldn't forget that there are
several MPMs out there: e.g. prefork + perchild for process-based,
threadpool + worker for thread-based. Both have pros and cons.
The MT-MPMs are sometimes a little bit faster, since they reuse many
resources, e.g. let one thread handle multiple requests or keep some
spare threads lying around waiting for connections.
But this can also be done with an MP architecture.

It probably wouldn't be such a hard job to implement this in PostgreSQL.
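
For readers unfamiliar with the prefork idea, a minimal sketch follows. It is a toy TCP server, assuming a POSIX system, and has nothing to do with how PostgreSQL or Apache actually implement things; all function names are invented for this example. The parent opens one listening socket and forks the workers up front, so no fork() happens at connection time.

```python
import os
import signal
import socket

def worker_loop(listener):
    """Child: accept and answer connections one at a time, forever."""
    try:
        while True:
            conn, _ = listener.accept()
            with conn:
                conn.sendall(f"served by pid {os.getpid()}\n".encode())
    finally:
        os._exit(0)   # never fall back into the parent's code path

def start_prefork_server(num_workers=3):
    """Fork the workers before any client connects; return (port, worker pids)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            worker_loop(listener)     # does not return
        pids.append(pid)
    listener.close()                  # workers keep their inherited copies
    return port, pids

def stop_workers(pids):
    """Terminate and reap all workers."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)

def ask(port):
    """Connect once and return the one-line reply."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        with s.makefile() as f:
            return f.readline().strip()

if __name__ == "__main__":
    port, pids = start_prefork_server()
    try:
        for _ in range(5):
            print(ask(port))
    finally:
        stop_workers(pids)
```

Each reply names the worker that handled it, which shows already-running processes being reused across connections. A real server would also need to respawn dead workers and reset per-connection state between clients.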

> I like to advocate Firebird as much as possible.  It runs on many
> platforms and seems to be pretty scalable as far as connections and
> usage goes, and it has a very flexible license as well.
Flexible license?
I haven't read the text yet, but from what I've heard, I don't really know
what I'm allowed to do and what not. Couldn't they just use some
established and well-known license?


cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service

  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     contact@metux.de
  cellphone: +49 174 7066481
---------------------------------------------------------------------
 -- DSL from 0 euros -- static IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------

Re: Forking vs. Threading

From: Enrico Weigelt

* Shridhar Daithankar <shridhar@frodo.hserus.net> wrote:

<snip>
> However, connection setup time is a rather tiny part of what a database
> actually does, isn't it?
Well, MySQL is said to be much faster at startup (I didn't really test it).
This could probably be solved with a prefork mechanism in PostgreSQL.


cu