Re: [HACKERS] Its not my fault. Its SEG's FAULT! - Mailing list pgsql-hackers

From dg@illustra.com (David Gould)
Subject Re: [HACKERS] Its not my fault. Its SEG's FAULT!
Msg-id 9804030707.AA12212@hawk.illustra.com
In response to Re: [HACKERS] Its not my fault. Its SEG's FAULT!  ("Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>)
List pgsql-hackers
Vadim:
> I agreed with Maurice.
> Using GC instead of MemoryDuration everywhere isn't a good idea for
> database server.

Why? Please state your reasons for this claim.

> But we could implement additional GC-like allocation mode and use it
> where it is appropriate!

This assumes that there is a "where it is not appropriate". My contention
is that it is generally appropriate. So my question must be, where is it
not appropriate and why?

> One example - using float8 (etc) in WHERE. We could switch to GC-allocation
> in the beginning of ExecQual () and destroy all allocations made in GC-mode
> before return().
>
> Another example - psort.c! With -S 8192 I see that server uses ~ 30M
> of memory - due to malloc/palloc overhead in palloc() for each tuple.
> None of these allocations will be freed until psort_end() <-
> good place for GC-destroyer.

The examples you give are certainly places where a GC would be very
useful.  But, I think restricting the GC to cover only some allocations
would lose most of the benefit of using a GC altogether.
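
For reference, the "GC-like allocation mode" proposed above amounts to a
region allocator: allocate freely, then destroy everything at one known
point such as the end of ExecQual() or psort_end(). Below is a minimal
sketch of that idea. The names Region, region_alloc and region_destroy are
hypothetical and are not part of PostgreSQL's palloc interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical region allocator: illustrative names, not PostgreSQL's API. */
#define REGION_ALIGN _Alignof(max_align_t)

typedef struct RegionBlock {
    struct RegionBlock *next;   /* previously filled block, if any */
    size_t used;                /* bytes handed out from data[] */
    size_t size;                /* capacity of data[] in bytes */
    max_align_t data[];         /* payload, aligned for any object type */
} RegionBlock;

typedef struct Region {
    RegionBlock *blocks;        /* newest block first */
    size_t block_size;          /* default payload size per block */
} Region;

/* Bump-allocate n bytes from the region; returns NULL on failure. */
void *region_alloc(Region *r, size_t n)
{
    assert(r != NULL);
    if (n > SIZE_MAX - REGION_ALIGN - sizeof(RegionBlock))
        return NULL;            /* rounding up would overflow */
    n = (n + REGION_ALIGN - 1) / REGION_ALIGN * REGION_ALIGN;

    RegionBlock *b = r->blocks;
    if (b == NULL || b->size - b->used < n) {
        /* Current block is missing or full: start a new one. */
        size_t size = n > r->block_size ? n : r->block_size;
        if (size > SIZE_MAX - sizeof(RegionBlock))
            return NULL;
        b = malloc(sizeof(RegionBlock) + size);
        if (b == NULL)
            return NULL;
        b->next = r->blocks;
        b->used = 0;
        b->size = size;
        r->blocks = b;
    }

    void *p = (char *) b->data + b->used;
    b->used += n;
    return p;
}

/* Free every block at once; returns the number of blocks released. */
size_t region_destroy(Region *r)
{
    size_t freed = 0;
    RegionBlock *b = r->blocks;

    while (b != NULL) {
        RegionBlock *next = b->next;
        free(b);
        b = next;
        freed++;
    }
    r->blocks = NULL;
    return freed;
}
```

A caller would set up a Region on entry, take its temporaries from
region_alloc(), and call region_destroy() before returning. Note that
this only helps where the point of destruction is known in advance; it is
not a tracing collector and does nothing for allocations whose lifetime
is unclear.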

First, the entire heap and stack have to be scanned as part of the root
set in either case. However, your proposal only lets the collector free
some of the garbage identified in that scan. This has the effect of making
the cost of each bit of reclaimed storage higher than it would be in the
general case. That is, the cost of a collection remains the same, but less
storage would be freed by each collection.
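
To put the same point as arithmetic, here is a tiny sketch. The function
name and the figures in the note after it are made up for illustration;
they are not measurements.

```c
#include <assert.h>

/* Amortized cost of one collection per unit of storage it reclaims. */
double cost_per_unit_freed(double scan_cost, double units_freed)
{
    assert(units_freed > 0.0);
    return scan_cost / units_freed;
}
```

Suppose a scan costs 10 ms and finds 8 MB of garbage. If the collector may
free all of it, the cost is 10 / 8 = 1.25 ms per MB. If only 2 MB of that
garbage lives in GC-mode allocations, the same 10 ms scan reclaims 2 MB,
or 5 ms per MB.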

Second, one of the reasons a GC can be faster than explicit allocation /
deallocation is that it frees the rest of the system from doing bookkeeping
work. A half-and-half system does not get this benefit.

PostgreSQL is, I think, an especially good candidate to use a GC, as the overall
complexity of the system makes it very hard to determine the real lifetime of
any particular allocation. This is why we have the complex MemoryDuration
system that we currently have. This is also why we have the leaks and vast
storage requirements that we have.

Finally, my main reason for suggesting GC is stability and correctness. With
an effective GC, many bugs simply never get the chance to exist at all.

A GC would likewise make the business of writing loadable functions for new
types, etc., much simpler and less error prone.

Did you have a chance to review the links I sent in the earlier posting?
Some of the papers referenced there are quite interesting, particularly
the Zorn papers on the real cost of explicit storage allocation.

-dg


David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
If simplicity worked, the world would be overrun with insects.
