On Wed, 2002-03-13 at 12:16, Jan Wieck wrote:
> Jeffrey W. Baker wrote:
> > On Wed, 2002-03-13 at 07:22, Jan Wieck wrote:
> > > [...]
> > >
> > > Remember, TOAST doesn't only come in slices, don't you
> > > usually brown it? Meaning, the data gets compressed (with a
> > > lousy but really fast algorithm). What kind of data is
> > > resp_body? 50% compression ratio ... I guess it's html,
> > > right?
> >
> > It is gzipped and base64-encoded text. It's somewhat strange that a
> > fast LZ would deflate it very much, but I guess it must be an artifact
> > of the base64. The initial gzip tends to deflate the data by about 90%.
>
> Now THAT is very surprising to me! The SLZ algorithm used in
> TOAST will for sure not be able to squeeze anything out of a
> gzip compressed stream. The result would be bigger again.
> B64 changes the file size basically to 4/3rd, but since the
> input stream is gzipped, the resulting B64 stream shouldn't
> contain patterns that SLZ can use to reduce the size again.
>
> Are you sure you're B64-encoding the gzipped text?
I am positive:

rupert=# select substr(body, 0, 200) from resp_body where resp = (select max(resp) from resp_body);
eJztfXt34riy799hrf4OGuZMJ1k3BL949SScRQhJmCbAAbp7z75zV5bAAjxtbI5tkjB75rvfkiwb
GxxDHt0dgvtBjC2VpFLVr6qkknMydiZ6+WRMsFo+6dV7jVqZnOE5ami2oxkjG31ALWdMLLgxIIZN
UFvHDrFPsm7Z1MmEOBiNHWeaIf87025P07X7qWYRO40Gp
rupert=# select min(length(body)), max(length(body)), avg(length(body)) from resp_body;
 min |  max   |       avg
-----+--------+------------------
   0 | 261948 | 21529.5282897281
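
A rough sanity check on those numbers, as a sketch only. It assumes that length(body) counts base64 characters, that line breaks in the stored text are negligible, and that the roughly 90% deflation mentioned earlier applies to the average body. The helper name base64_len is made up for illustration. The sample above starts with "eJ", which is what the base64 encoding of a zlib header (bytes 0x78 0x9C) looks like, so the sketch uses zlib for its own test data.

```python
import base64
import math
import zlib


def base64_len(n_bytes: int) -> int:
    """Length of padded base64 output for n_bytes of input (hypothetical helper)."""
    return 4 * math.ceil(n_bytes / 3)


# Check the 4/3 rule against the real encoder on a compressed stream.
sample = ("Some repetitive wire-service text. " * 200).encode("ascii")
compressed = zlib.compress(sample)
b64 = base64.b64encode(compressed)
assert len(b64) == base64_len(len(compressed))

# Back-of-envelope sizes from the avg(length(body)) reported above.
avg_b64 = 21529.5282897281       # taken from the query output
avg_gz = avg_b64 * 3 / 4         # undo the base64 expansion (ignores padding)
avg_raw = avg_gz / (1 - 0.90)    # assumes the ~90% deflation quoted in this thread

print(f"compressed ~ {avg_gz:.0f} bytes, uncompressed ~ {avg_raw:.0f} bytes")
```

Under those assumptions the average compressed body is somewhat smaller than the stored length suggests, because a quarter of the stored size is base64 overhead.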
> I mean,
> you have an average body size of 23K "gzipped", so you're
> telling that the average uncompressed body size is about
> 230K? You are storing 230 Megabytes of raw body data per
> hour? Man, who is writing all that text?
Reuters.
I have increased the free space map and will be able to restart the
postmaster today at around midnight GMT.
Thanks for your help,
Jeffrey