Its not my fault. Its SEG's FAULT! - Mailing list pgsql-hackers

From dg@illustra.com (David Gould)
Subject Its not my fault. Its SEG's FAULT!
Msg-id 9804020844.AA10153@hawk.illustra.com
List pgsql-hackers
> Maurice:
> Sorry to bring bad news but it seems that the postgresql daemon is leaking
> memory when building indices.
> (When using electric fence it takes long enough to notice -:))
>
> Anybody want to recommend a good freeware tool which helps to find memory
> leaks?



> Bruce:
> Yea, as I reported earlier, it is probably all from the same place.  I
> used a pginterface C file do show it.  I think we need Purify.



> Vadim:
> Chris Albertson wrote:
> >
> > This is just one example.  ++Every time++ I do a SELECT where
> > the expected result is a large number of rows I get a
> > failure of some type.
> >
> > testdb=> select count(*) from tassm16
> > testdb-> where 15.5 < b_mag::float4 - (0.375 * (b_mag -
> >          r_mag)::float4);
> > FATAL 1:  palloc failure: memory exhausted
> > testdb=>
> >
> > I can make Postgresql 6.3 fail every time.  Just do a SELECT
> > where the number of rows returned is > a few million.  The
>
> 0.375 above is float8 and so server uses two float8 funcs to
> calculate right op of '<' ==> 2 * palloc(8) for each row.
> palloc(8) eats more than 8 bytes of memory (~ 24): 2 * 24 = 48,
> 48 * 1_million_of_rows = 48 Mb.
>
> This is problem of all data types passed by ref !!!
> And this is old problem.


I have been giving some thought to memory allocation in postgresql,
so the above messages are very timely (at least to me).

I would like to discuss the idea of replacing the current scheme of
explicit memory allocation and deallocation from separate
"MemoryDuration" pools with a conservative garbage collector.

For more information about garbage collection in general and about the
specific collector I am proposing see these urls:

  GC FAQ and links
    http://www.iecc.com/gclist/GC-faq.html

  Boehm-Demers-Weiser Collector
    http://reality.sgi.com/employees/boehm_mti/gc.html

Rationale:

From my experience, I think that this is a stone cold win for postgresql. I
have, of course, no before-and-after numbers (yet) to back this up, but the
following are, I think, true:

- MemoryDuration is really "poor man's garbage collection" anyway. The idea
  being that it is either a) hard, or b) impossible to arrange to free your
  memory at the right time. Some objects need to last for a session, some
  for a transaction, some for a statement, some for a function, and some
  for other indeterminate durations. The duration mechanism is meant to
  help free things at the right time. Arguably it almost, but not quite,
  works.

- MemoryDuration adds complexity to the code. A quick skim of the code base
  will show that duration gets switched all over the place, often for not
  very obvious reasons. An example is a function allocating working memory
  (per-function duration) and returning an allocated result (per-statement
  duration). This makes writing both server code and extension code much
  harder than it need be.


- MemoryDuration is very costly in terms of space:

a)  A function returning an allocated object returns memory that cannot be
    freed until the end of the statement. But as observed, often this memory
    is really needed only "per row". When millions of rows are involved, this
    gets pretty piggy (they just don't make swap partitions big enough...).

b)  Each chunk of allocated memory carries 8 to 16 bytes of bookkeeping
    overhead to link it to the MemoryContext structure that controls the
    duration. This is in addition to whatever overhead (often 8 bytes) is
    imposed by the underlying malloc() for boundary tags or size and arena
    information. Each allocation then has 16 to 24 bytes of overhead.

    The statistics I have seen for a derivative of postgres say that 86% of
    all allocations are 64 bytes or less. 75% are 32 bytes or less, and 43%
    are less than 16 bytes. This suggests that allocator overhead about
    doubles the storage needed.

c)  But it is really quite a lot worse than this. As noted above, memory
    is not freed for reuse in a timely way but accumulated until the end
    of the memory duration (i.e. statement or transaction). This is the
    usual reason for running out of memory in a large query. Additionally,
    the division of memory into separate pools creates extra fragmentation
    which can only make matters even worse.


- MemoryDuration is very costly in terms of speed:

a)  Some profiling (with Quantify) I have done on a derivative of postgres
    shows that even simple queries returning only one row, like
       "select * from table where key = <unique_value>;"
    spend about 25% to 30% of their total execution time in malloc(), free(),
    or one of the MemoryContext routines. This probably understates the case,
    since it is based on instruction counting, not actual time, and the
    "chase a big list of pointers" operation in MemoryContextDestroy() is
    almost guaranteed to have nasty cache behavior.

b)  There is quite a bit of bookkeeping code all over the system (to support
    releasing memory after an error, etc). This is in heavily trafficked
    paths. Since it is so widely distributed it is very hard to measure the
    slowdown, but there certainly is some. This could all be removed if we
    had garbage collection (although this in itself would be a big job).


- MemoryDuration is very costly in terms of correctness and stability:

  I am not going to say much here except to point out the number of
  freed pointer errors and memory leaks that have been found in the code
  to date. And that there are new ones in every release. And that I have
  spent a good part of the last three years chasing leaks out of a
  similar system, and no, I am not done yet.

  The very existence of companies and products like Purify should be a
  tipoff. There is no practical way to write large C programs with
  dynamic storage without significant storage management problems.


- MemoryDuration is very costly in terms of development cost:

  First, there are huge testing and debugging costs associated with
  manual storage management. This is basically all waste motion and a
  royal pain.

  Even more importantly, code written knowing that there is garbage
  collection tends to have substantially fewer source statements.
  A typical case is a routine that allocates several things and operates
  on them and checks for failures:

/* non garbage collected example
*/
dothing(...)
{
    if ((p1 = malloc(...)) == NULL)
        return ERR;
    ...
    if ((p2 = malloc(...)) == NULL) {
        free(p1);
        return ERR;
    }
    ...
    if ((p3 = malloc(...)) == NULL) {
        free(p2);
        free(p1);
        return ERR;
    }
    ...
    if ((p4 = malloc(...)) == NULL) {
        free(p3);
        free(p2);
        free(p1);
        return ERR;
    }
    if ((info = do_the_wild_thing(p1, p2, p3, p4)) == ERR) {
        free(p4);
        free(p3);
        free(p2);
        free(p1);
        return ERR;
    }
    ...
    free(info);
    free(p4);
    free(p3);
    free(p2);
    free(p1);
    return OK;
}



/* same example only this time with garbage collection
 */
dothing(...)
{
    if ((p1 = malloc(...)) == NULL)
        return ERR;
    ...
    if ((p2 = malloc(...)) == NULL)
        return ERR;
    ...
    if ((p3 = malloc(...)) == NULL)
        return ERR;
    ...
    if ((p4 = malloc(...)) == NULL)
        return ERR;
    ...
    if ((info = do_the_wild_thing(p1, p2, p3, p4)) == ERR)
        return ERR;
    ...
    return OK;
}


I know which one I would rather write! And it is fairly obvious which one
is more likely to work.


This is probably too long for one post, so I will stop now.

I would very much like comments and suggestions on this topic, especially if
this is something you have thought about or have experience with.

Unsupported assertions to the effect "GC is too slow ... only works with
lisp ..." etc are ok too, but will be eligible to win valuable prizes.

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"A week of coding can sometimes save an hour of thought."
