Thread: More harrowing tales of TOAST growth

More harrowing tales of TOAST growth

From: Jeffrey Baker

(This isn't the full data I someday hope to have, but it is the
limit of my current understanding.)

I've been whining here about my databases growing for a while.  I
manage a database that I have to drop and reload every two weeks
simply because it grows to fill the available storage device, with
the associated performance penalty.

I have reduced the problem to its essence by writing a few programs
that trigger it.  My environment is the Debian GNU/Linux unstable
distribution, PostgreSQL 7.2.1, on an x86 host with 256MB of main
memory and a two-disk stripe set of 10,000RPM disks used exclusively
by the database.

I create the simple database with these commands:

createdb grow
psql -c "create table grow (body text)" grow

I then start 10 copies of the "churn" program, which is attached
here.  Churn inserts tuples of approximately 13KB which are
compressible to 10KB.  It does this continuously in a tight loop.
The ten processes simulate my situation of many clients writing into
the database.
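
The churn program itself isn't reproduced in this archive; a rough
stand-in, assuming a shell loop driving psql (the real program may well
differ), would be:

#!/bin/sh
# Hypothetical stand-in for the attached "churn" program: insert rows of
# roughly 13KB into grow in a tight loop.  The payload below is only a
# placeholder for the real program's data, which compressed to about 10KB.
BODY=$(head -c 13000 /dev/urandom | base64 | tr -d '\n' | head -c 13000)
while true; do
    psql -q -c "insert into grow (body) values ('$BODY')" grow
done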

I also start one copy of the vac program.  This program deletes all
tuples from grow and vacuums the database continuously.  This should
free all the space in the tables, which will be filled up again by
the churn programs.  I expect a sawtooth effect of database size
over time, with the database reaching an upper limit steady state
size.
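
The vac program likewise isn't reproduced here; it presumably amounts to
something like:

#!/bin/sh
# Hypothetical stand-in for the attached "vac" program: delete every tuple
# from grow and vacuum it, continuously.
while true; do
    psql -q -c "delete from grow" grow
    psql -q -c "vacuum grow" grow
done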

I ran this experiment for 10 minutes.  At time 0 the database size
was 2MB.  At 60 seconds the size was 22MB.  The size continued
upward:

Time (s) | Size (MB)
      0  |      2
     60  |      7
    120  |     60
    180  |    109
    240  |    181
    300  |    282
    360  |    284
    420  |    363
    480  |    442
    540  |    549
    600  |    557
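
(The thread doesn't say exactly how the sizes were sampled; one minimal
way to collect numbers like these, assuming du over the cluster's data
directory at a guessed path, is:)

#!/bin/sh
# Hypothetical size sampler: print elapsed seconds and on-disk size in MB
# once a minute for ten minutes.  The PGDATA path is an assumption.
PGDATA=/var/lib/postgres/data
t=0
while [ $t -le 600 ]; do
    echo "$t $(du -sm $PGDATA/base | cut -f1)"
    sleep 60
    t=$((t + 60))
done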

I stopped the experiment by killing all churn and vac processes,
which left the final 'delete from grow' command still running.
After it finished, there were 5020 tuples in grow and 35,140 tuples
in its TOAST table.  The physical size on disk was dominated by
the TOAST table at 539MB and its index of 14MB.  The actual grow
table was a mere 2MB.

The amount of useful data in the TOAST table was a scant 65MB
according to select sum(length(chunk_data)) from <table>.  That roughly
matches the ~5,000 tuples of approximately 13KB each in the grow table.
The TOAST data claimed 19,939 pages.
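
(For reference, the kind of queries behind numbers like these would be
roughly the following; pg_toast_NNNNN stands in for the grow table's
actual TOAST relation, which is named pg_toast_<oid of the owning table>
and so varies per installation:)

# Hypothetical inspection queries; pg_toast_NNNNN is a placeholder.
psql -c "select count(*) from grow" grow
psql -c "select relname, relpages, reltuples from pg_class where relname like 'pg_toast%'" grow
psql -c "select sum(length(chunk_data)) from pg_toast_NNNNN" grow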

I don't have a complete conclusion at this point but it seems
obvious to me that after a delete and vacuum, new tuples aren't
stuck into the freed space, but are rather tacked on the end, or
something conceptually similar.

-jwb

Attachment

Re: More harrowing tales of TOAST growth

From: Jeffrey Baker

On Sat, May 04, 2002 at 06:50:17PM -0700, Sean Chittenden wrote:
> > I ran this experiment for 10 minutes.  At time 0 the database size
> > was 2MB.  At 60 seconds the size was 22MB.  The size continued
> > upward:
> >
> > Time | Size
> > 0      2
> > 60     7
> > 120    60
> > 180    109
> > 240    181
> > 300    282
> > 360    284
> > 420    363
> > 480    442
> > 540    549
> > 600    557
>
> This may be outside the realm of possibility in your environment, but
> have you tried a VACUUM FULL or VACUUM FREEZE?  -sc

As I've said before, dump and reload is MUCH faster than vacuum
full, and they both require the same exclusive access.  Therefore
vacuum full isn't any better than my current situation.
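
(For context, the biweekly drop-and-reload presumably amounts to
something like the following; the exact procedure isn't spelled out in
the thread.)

pg_dump grow > grow.dump
dropdb grow
createdb grow
psql -f grow.dump grow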

-jwb

Re: More harrowing tales of TOAST growth

From: Sean Chittenden

> I ran this experiment for 10 minutes.  At time 0 the database size
> was 2MB.  At 60 seconds the size was 22MB.  The size continued
> upward:
>
> Time | Size
> 0      2
> 60     7
> 120    60
> 180    109
> 240    181
> 300    282
> 360    284
> 420    363
> 480    442
> 540    549
> 600    557

This may be outside the realm of possibility in your environment, but
have you tried a VACUUM FULL or VACUUM FREEZE?  -sc

--
Sean Chittenden

Re: More harrowing tales of TOAST growth

From: grant

It seems that your problem of the TOAST table's index growing because the
index tree doesn't collapse when all the growth is on one side, and Scott
Marlowe's problem of boolean indexes not collapsing because too many values
are the same, are closely related.  Has anyone with these problems tried
dropping and re-creating the index?  This would have some impact, but no
worse than the vacuum, and it would not require downtime; it may just cause
poor performance for queries that hit the table during the rebuild.
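
For an ordinary user-level index (the boolean case), the drop-and-recreate
being suggested would look something like the sketch below, with made-up
table and index names; whether the system-managed TOAST index can be
rebuilt the same way, and whether the drop and create can share a
transaction, is exactly what the questions below are asking.

psql grow <<'EOF'
-- Hypothetical rebuild of an ordinary index; mytable, flag, and flag_idx
-- are made-up names for illustration only.
BEGIN;
DROP INDEX flag_idx;
CREATE INDEX flag_idx ON mytable (flag);
COMMIT;
EOF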

Questions for a guru (probably Tom Lane)
1)  Are statistics generated at index creation time, or only by vacuum
analyze?
2)  Does vacuum analyze cause less locking than vacuum full?
3)  Can index creation and dropping be isolated in a transaction?
4)  Putting those considerations together, what is the best order of doing
things to recover the space (or at least make it reusable to stop the
growth) with the least impact on users?