Home > mailing lists

Re: directory archive format for pg_dump - Mailing list pgsql-hackers

From	Joachim Wieland
Subject	Re: directory archive format for pg_dump
Date	December 1, 2010 22:35:57
Msg-id	AANLkTinauQsCGt-yeyeN_QWuN+tao+q126rwKnO2QWLP@mail.gmail.com Whole thread Raw
In response to	Re: directory archive format for pg_dump (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: directory archive format for pg_dump
List	pgsql-hackers

Tree view

On Wed, Dec 1, 2010 at 9:05 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Forgot attachment. This is also available in the above git repo.

I have quickly checked your modifications, on the one hand I like the
reduction of functions, I would have said that we have AH around all
the time and so we could just allocate once and stuff it all into
ctx->cs and reuse the buffers for every object, but re-allocating them
for every (dumpable) object should be fine as well.

Regarding the function pointers that you removed, you are now putting
back in what Dimitri wanted me to take out, namely switch/case
instructions for the algorithms and then #ifdefs for every algorithm.
It's not too many now since we have taken out LZF. Well, I can live
with both ways.

There is one thing however that I am not in favor of, which is the
removal of the "sizeHint" parameter for the read functions. The reason
for this parameter is not very clear now without LZF but I have tried
to put in a few comments to explain the situation (which you have
taken out as well :-) ).

The point is that zlib is a stream based compression algorithm, you
just stuff data in and from time to time you get data out and in the
end you explicitly flush the compressor. The read function can just
return as many bytes as it wants and we can just hand it all over to
zlib. Other compression algorithms however are block based and first
write a block header that contains the information on the next data
block, including uncompressed and compressed sizes. Now with the
sizeHint parameter I used, the compressor could tell the read function
that it just wants to read the fixed size header (6 bytes IIRC). In
the header it would look up the compressed size for the next block and
would then ask the read function to get exactly this amount of data,
decompress it and go on with the next block, and so forth...

Of course you can possibly do that memory management inside the
compressor with an extra buffer holding what you got in excess but
it's a pain. If you removed that part on purpose on the grounds that
there is no block based compression algorithm in core and probably
never will be, then that's okay :-)


Joachim

pgsql-hackers by date:

From: Jeff Davis
Date: 01 December 2010, 22:28:10
Subject: build problem

From: Josh Berkus
Date: 01 December 2010, 22:38:36
Subject: Re: Proposal: First step towards Intelligent, integrateddatabase

Re: directory archive format for pg_dump - Mailing list pgsql-hackers

Previous

Next