Re: more about pg_toast growth - Mailing list pgsql-general

From Jan Wieck
Subject Re: more about pg_toast growth
Date
Msg-id 200203132016.g2DKGDr32370@saturn.janwieck.net
Whole thread Raw
In response to Re: more about pg_toast growth  ("Jeffrey W. Baker" <jwbaker@acm.org>)
Responses Re: more about pg_toast growth  ("Jeffrey W. Baker" <jwbaker@acm.org>)
List pgsql-general
Jeffrey W. Baker wrote:
> On Wed, 2002-03-13 at 07:22, Jan Wieck wrote:
> > [...]
> >
> >     Remember,  TOAST  doesn't  only  come  in  slices,  don't you
> >     usually brown it?  Meaning, the data gets compressed (with  a
> >     lousy  but  really  fast  algorithm).   What  kind of data is
> >     resp_body? 50% compression  ratio  ...  I  guess  it's  html,
> >     right?
>
> It is gzipped and base64-encoded text.  It's somewhat strange that a
> fast LZ would deflate it very much, but I guess it must be an artifact
> of the base64.  The initial gzip tends to deflate the data by about 90%.

    Now  THAT is very surprising to me! The SLZ algorithm used in
    TOAST will for sure not be able to squeeze anything out of  a
    gzip  compressed  stream.   The result would be bigger again.
    B64 changes the file size basically to 4/3rd, but  since  the
    input  stream  is gzipped, the resulting B64 stream shouldn't
    contain patterns that SLZ can use to reduce the size again.

    Are you sure you're B64-encoding the gzipped  text?  I  mean,
    you  have  an  average  body size of 23K "gzipped", so you're
    telling that the average  uncompressed  body  size  is  about
    230K?  You  are  storing  230  Megabytes of raw body data per
    hour? Man, who is writing all that text?

>
> >     Anyway,  I  would  suggest  you  increase  the  max_fsm_pages
> >     parameter.  Commented out parameters in  the  postgresql.conf
> >     file  means  "default".   You  said  you're doing about 1,000
> >     inserts an hour and a daily bulk delete  of  approx.  24,000.
> >     Assuming  most of the toast tuples are contigous, that'd mean
> >     you are freeing something like 35,000 toast  pages.  I  would
> >     suggest  a freespace map size of 50,000 pages, to start with.
> >     That should at least lower the growth rate. If you still  see
> >     growth, go ahead and increase it further.
>
> I will happily do so.  What is the argument against increasing the free
> space map?  Does it consume more memory?  The machine has 4GB main
> memory, and I wouln't notice increased consumption of a few megabytes.

    It allocates some more shared  memory.  It's  surely  in  the
    range of a few megabytes, so no need to worry in this case.

> What triggers storage in the toast tables?  Is it because of a tuple
> longer than some threshold?

    It is triggered by the attempt to store a row bigger than 1/4
    of the blocksize. Blocksize defaults to 8K. The toaster tries
    to shrink down the row size by first compressing attribute by
    attribute, then by moving them out to  the  toast  table.  As
    soon  as the row fit's into the 2K it'll stop and the storage
    happens.

    The 1/4 blocksize is something I choose because  it  had  the
    best  performance  in  my tests. The theory behind it is that
    your key fields are likely to be the small ones  that  remain
    uncompressed  in  the  main-row.   If more such main rows fit
    into fewer blocks, you have better caching of key attributes,
    thus faster scans and joins. And access to the huge values is
    only done after the  final  result  set  is  collected,  what
    usually  becomes a smaller percentage of the entire data, the
    larger the data is.  So it all reduces I/O.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


pgsql-general by date:

Previous
From: Elein
Date:
Subject: explain vs. explain analyze
Next
From: Fran Fabrizio
Date:
Subject: Re: Function created yet not found