Re: [HACKERS] vacuum process size - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] vacuum process size
Msg-id 13993.935155147@sss.pgh.pa.us
In response to RE: [HACKERS] vacuum process size  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List pgsql-hackers
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> I found the following comment in utils/mmgr/aset.c.
> The high memory usage of big vacuum is probably caused by this
> change.

AFAIK, there is no "change" there.  free() doesn't give memory
back to the kernel either.

> Calling repalloc() many times with its size parameter increasing
> would need large amount of memory.

Good point, because aset.c doesn't coalesce adjacent free chunks.
And of course, reallocating the block bigger and bigger is exactly
the usual behavior with realloc-using code :-(
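
A toy calculation of the cost described above. This is a sketch, not aset.c: it assumes every grow request gets a fresh chunk and that the old chunk is parked on a free list which later, larger requests can never reuse, because adjacent free chunks are not merged. The names GrowthCost and grow_without_coalescing are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Result of the toy model (hypothetical type, not from PostgreSQL). */
typedef struct GrowthCost
{
    size_t live_bytes;  /* size of the chunk still in use */
    size_t held_bytes;  /* live chunk plus every stranded free chunk */
} GrowthCost;

/*
 * Model an array that starts at first_size bytes and is enlarged ngrows
 * times by increment bytes.  Each enlargement allocates a new chunk; the
 * old one stays held by the allocator but is too small for any later
 * request, so it is counted as wasted.
 */
GrowthCost
grow_without_coalescing(size_t first_size, size_t increment, int ngrows)
{
    GrowthCost cost;
    size_t     size = first_size;
    int        i;

    assert(first_size > 0 && increment > 0 && ngrows >= 0);

    cost.held_bytes = size;
    for (i = 0; i < ngrows; i++)
    {
        size += increment;          /* the new, larger request */
        cost.held_bytes += size;    /* old chunks are never reused */
    }
    cost.live_bytes = size;
    return cost;
}
```

Under this model, growing from 100 bytes in three steps of 100 leaves 400 bytes live but 1000 bytes held. With a fixed increment the held total grows roughly with the square of the number of steps.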

I don't think it would be a good idea to add coalescing logic to aset.c
--- that'd defeat the purpose of building a small/simple/fast allocator.

Perhaps for large standalone chunks (those that AllocSetAlloc made an
entire separate block for), AllocSetFree should free() the block instead
of putting the chunk on its own freelist.  Assuming that malloc/free are
smart enough to coalesce adjacent blocks, that would prevent the bad
behavior from recurring once the request size gets past
ALLOC_SMALLCHUNK_LIMIT, and for small requests we don't care.

But it doesn't look like there is any cheap way to detect that a chunk
being freed takes up all of its block.  We'd have to mark it specially
somehow.  A kluge that comes to mind is to set the chunk->size to zero
when it is a standalone allocation.
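
A minimal sketch of that marking idea, using a toy allocator rather than the real aset.c structures. Everything named Toy* or toy_* is hypothetical, TOY_STANDALONE_LIMIT merely stands in for the real threshold, and for brevity every chunk here is malloc'd separately, which the real allocator does not do for small chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cutoff above which a request gets a block of its own. */
#define TOY_STANDALONE_LIMIT 1024

typedef struct ToyChunk
{
    size_t           size;  /* usable bytes, or 0 for a standalone chunk */
    struct ToyChunk *next;  /* free-list link, used only while free */
} ToyChunk;

typedef struct ToySet
{
    ToyChunk *freelist;     /* freed small chunks kept for reuse */
    int       nreturned;    /* standalone chunks handed back to free() */
} ToySet;

/* Allocate a chunk; large requests are marked by storing size 0. */
void *
toy_alloc(ToySet *set, size_t size)
{
    ToyChunk *chunk;

    assert(set != NULL && size > 0);
    chunk = malloc(sizeof(ToyChunk) + size);
    if (chunk == NULL)
        return NULL;
    chunk->size = (size > TOY_STANDALONE_LIMIT) ? 0 : size;
    chunk->next = NULL;
    return chunk + 1;       /* user data starts right after the header */
}

/* Free a chunk: standalone ones go back to malloc, others are kept. */
void
toy_free(ToySet *set, void *ptr)
{
    ToyChunk *chunk = (ToyChunk *) ptr - 1;

    if (chunk->size == 0)
    {
        free(chunk);        /* lets malloc/free coalesce if they can */
        set->nreturned++;
    }
    else
    {
        chunk->next = set->freelist;
        set->freelist = chunk;
    }
}
```

One consequence visible even in the toy: once size is overwritten with zero, the chunk header no longer records how big the allocation is, so a real implementation would have to recover that from somewhere else when reallocating.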

I believe Jan designed the current aset.c logic.  Jan, any comments?

> Should vacuum call realloc() directly ?

Not unless you like *permanent* memory leaks instead of transient ones.
Consider what will happen at elog().

However, another possible solution is to redesign the data structure
in vacuum() so that it can be made up of multiple allocation blocks,
rather than insisting that all the array entries always be consecutive.
Then it wouldn't depend on repalloc at all.  On the whole I like that
idea better --- even if repalloc can be fixed not to waste memory, it
still implies copying large amounts of data around for no purpose.
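
A rough sketch of such a structure, assuming int entries and a fixed block size; it is not vacuum's actual code. EntryBlock, EntryList, ENTRIES_PER_BLOCK and the entry_list_* functions are invented for illustration. It uses malloc so that it compiles on its own; inside the backend the blocks would presumably come from palloc so that they are released at elog().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ENTRIES_PER_BLOCK 256   /* hypothetical block capacity */

typedef struct EntryBlock
{
    struct EntryBlock *next;
    int                nused;                       /* filled slots */
    int                entries[ENTRIES_PER_BLOCK];
} EntryBlock;

typedef struct EntryList
{
    EntryBlock *head;
    EntryBlock *tail;
    size_t      nentries;
} EntryList;

/* Append one entry; existing blocks are never moved or copied.
 * Returns 0 on success, -1 if a new block could not be allocated. */
int
entry_list_append(EntryList *list, int value)
{
    EntryBlock *tail;

    assert(list != NULL);
    tail = list->tail;
    if (tail == NULL || tail->nused == ENTRIES_PER_BLOCK)
    {
        EntryBlock *block = malloc(sizeof(EntryBlock));

        if (block == NULL)
            return -1;
        block->next = NULL;
        block->nused = 0;
        if (tail == NULL)
            list->head = block;
        else
            tail->next = block;
        list->tail = block;
        tail = block;
    }
    tail->entries[tail->nused++] = value;
    list->nentries++;
    return 0;
}

/* Return a pointer to entry number index, or NULL if out of range.
 * Every block before the tail is full, so whole blocks can be skipped. */
int *
entry_list_get(EntryList *list, size_t index)
{
    EntryBlock *block;

    assert(list != NULL);
    if (index >= list->nentries)
        return NULL;
    block = list->head;
    while (index >= ENTRIES_PER_BLOCK)
    {
        block = block->next;
        index -= ENTRIES_PER_BLOCK;
    }
    return &block->entries[index];
}

/* Release all blocks and reset the list to empty. */
void
entry_list_free(EntryList *list)
{
    assert(list != NULL);
    while (list->head != NULL)
    {
        EntryBlock *next = list->head->next;

        free(list->head);
        list->head = next;
    }
    list->tail = NULL;
    list->nentries = 0;
}
```

The trade-off is that random access walks the block chain instead of indexing one flat array. A sequential scan over the blocks avoids that cost.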
        regards, tom lane

