Thread: Toasted table not deleted when no out of line columns left

Toasted table not deleted when no out of line columns left

From
Zoltan Boszormenyi
Date:
Hi,

we came across a database where a table had a toasted table,
keeping huge amounts of disk space allocated. However,
the table's current definition didn't explain why there was
a toasted table. Then upon some experiments, it struck me.
There _was_ a toasted field but as the schema was modified,
the fields was dropped, leaving only inline stored fields.
VACUUM [FULL] [ANALYZE] didn't cleaned up the space
that was used by the toasted table. My tests were done on 8.3.3.

As every statements that reference a table puts a lock on the
pg_class record, ALTER TABLE cannot progress until all locks
are gone, i.e. the transactions referencing the table finished.
It's true vice-versa, ALTER TABLE blocks every transactions
that may reference the table. Judging from that, the toasted table
cleanup may be part of ALTER TABLE DROP COLUMN.

Best regards,
Zoltán Böszörményi

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/



Re: Toasted table not deleted when no out of line columns left

From
Tom Lane
Date:
Zoltan Boszormenyi <zb@cybertec.at> writes:
> we came across a database where a table had a toasted table,
> keeping huge amounts of disk space allocated. However,
> the table's current definition didn't explain why there was
> a toasted table. Then upon some experiments, it struck me.
> There _was_ a toasted field but as the schema was modified,
> the fields was dropped, leaving only inline stored fields.
> VACUUM [FULL] [ANALYZE] didn't cleaned up the space
> that was used by the toasted table. My tests were done on 8.3.3.

This is not a bug; it is operating as designed.  Observe the statement
in the NOTES section of the ALTER TABLE page:
   The DROP COLUMN form does not physically remove the column, but   simply makes it invisible to SQL operations.
Subsequentinsert and   update operations in the table will store a null value for the   column. Thus, dropping a column
isquick but it will not immediately   reduce the on-disk size of your table, as the space occupied by the   dropped
columnis not reclaimed. The space will be reclaimed over   time as existing rows are updated.
 

... and it goes on to point out how to force immediate space reclamation
if you need that.  These statements apply independently of whether any
particular value is toasted or not.

The reason for this choice is that reclaiming the space immediately
would turn DROP COLUMN from a quick operation into a slow one, as it
would have to grovel over every row of the table looking for TOAST
pointers.

> Judging from that, the toasted table
> cleanup may be part of ALTER TABLE DROP COLUMN.

That would only help if you were dropping the last potentially-toastable
column of a table.  And implementing it would require introducing weird
corner cases into the tuple toaster, because it might now come across
TOAST pointers that point to a no-longer-existent table, and have to
consider that to be a no-op instead of an error condition.
        regards, tom lane


Re: Toasted table not deleted when no out of line columns left

From
"Hans-Jürgen Schönig"
Date:
>

*snip*

>> Judging from that, the toasted table
>> cleanup may be part of ALTER TABLE DROP COLUMN.
>
> That would only help if you were dropping the last potentially-
> toastable
> column of a table.  And implementing it would require introducing
> weird
> corner cases into the tuple toaster, because it might now come across
> TOAST pointers that point to a no-longer-existent table, and have to
> consider that to be a no-op instead of an error condition.
>
>             regards, tom lane




tom,

in our test case we had a table with 10 integer columns (nothing else)
along with a 10 gb toast table - this is why we were a little surprised.
in this case it can definitely be cleaned up.
it is clear that we definitely don't want to change columns directly
here when a column is dropped. - however, if there is not a single
toastable column left, we should definitely clean up.
we will compile a patch within the next days to cover this case.
many thanks,
    hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de





Re: Toasted table not deleted when no out of line columns left

From
Tom Lane
Date:
"Hans-Jürgen Schönig" <hs@cybertec.at> writes:
>> ...  And implementing it would require introducing weird
>> corner cases into the tuple toaster, because it might now come across
>> TOAST pointers that point to a no-longer-existent table, and have to
>> consider that to be a no-op instead of an error condition.

> we will compile a patch within the next days to cover this case.

I'm not sure which part of "no" you didn't understand, but: I do not
believe this is worth making the toast code less robust for.
        regards, tom lane


Re: Toasted table not deleted when no out of line columns left

From
Simon Riggs
Date:
On Sun, 2008-09-21 at 12:05 -0400, Tom Lane wrote:

> ... and it goes on to point out how to force immediate space reclamation
> if you need that.  These statements apply independently of whether any
> particular value is toasted or not.
> 
> The reason for this choice is that reclaiming the space immediately
> would turn DROP COLUMN from a quick operation into a slow one, as it
> would have to grovel over every row of the table looking for TOAST
> pointers.
> 
> > Judging from that, the toasted table
> > cleanup may be part of ALTER TABLE DROP COLUMN.

I thought Hans meant cleanup, not drop?

Perhaps there is room for a function that scans a toast table to remove
unreferenced toast data? It could be done much more efficiently than the
UPDATE and VACUUM FULL technique. No need to add it into DROP COLUMN,
but that doesn't mean it shouldn't be available somewhere, somehow.

Hans is likely to write this anyway for his customer, so it seems worth
defining how it should look so we can accept it into core. VACUUM TOAST
perhaps?

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Toasted table not deleted when no out of line columns left

From
"Hans-Jürgen Schönig"
Date:
On Sep 22, 2008, at 9:46 AM, Simon Riggs wrote:

>
> On Sun, 2008-09-21 at 12:05 -0400, Tom Lane wrote:
>
>> ... and it goes on to point out how to force immediate space
>> reclamation
>> if you need that.  These statements apply independently of whether
>> any
>> particular value is toasted or not.
>>
>> The reason for this choice is that reclaiming the space immediately
>> would turn DROP COLUMN from a quick operation into a slow one, as it
>> would have to grovel over every row of the table looking for TOAST
>> pointers.
>>
>>> Judging from that, the toasted table
>>> cleanup may be part of ALTER TABLE DROP COLUMN.
>
> I thought Hans meant cleanup, not drop?
>
> Perhaps there is room for a function that scans a toast table to
> remove
> unreferenced toast data? It could be done much more efficiently than
> the
> UPDATE and VACUUM FULL technique. No need to add it into DROP COLUMN,
> but that doesn't mean it shouldn't be available somewhere, somehow.
>
> Hans is likely to write this anyway for his customer, so it seems
> worth
> defining how it should look so we can accept it into core. VACUUM
> TOAST
> perhaps?



hello simon,

we definitely have to do something about this problem. VACUUM FULL is
not an option at all.
once the last text column is gone (toastable column) we definitely
have to reclaim space.
we just cannot afford to lose hundreds of gigs of good storage because
of this missing feature.

so, to comment tom's answer - it is not about not understanding "no";
it was more a request to get a "how to do it best" because we have to
do it somehow.
best regards,
    hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de



Re: Toasted table not deleted when no out of line columns left

From
Simon Riggs
Date:
On Mon, 2008-09-22 at 10:59 +0200, Hans-Jürgen Schönig wrote:
> On Sep 22, 2008, at 9:46 AM, Simon Riggs wrote:

> > I thought Hans meant cleanup, not drop?

> we definitely have to do something about this problem. 

I think the issue is identifying the problem. Reading the title of the
post, I think Tom says "no" to *deleting* the toast table. He also says
"no" to cleaning the table as part of DROP COLUMN. That still leaves you
an opening for an out-of-line command/function to perform a clean,
without deleting the table completely.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Toasted table not deleted when no out of line columns left

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> I think the issue is identifying the problem. Reading the title of the
> post, I think Tom says "no" to *deleting* the toast table. He also says
> "no" to cleaning the table as part of DROP COLUMN. That still leaves you
> an opening for an out-of-line command/function to perform a clean,

... see CLUSTER ...
        regards, tom lane


Re: Toasted table not deleted when no out of line columns left

From
Simon Riggs
Date:
On Mon, 2008-09-22 at 07:53 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > I think the issue is identifying the problem. Reading the title of the
> > post, I think Tom says "no" to *deleting* the toast table. He also says
> > "no" to cleaning the table as part of DROP COLUMN. That still leaves you
> > an opening for an out-of-line command/function to perform a clean,
> 
> ... see CLUSTER ...

It's possible we'd want to do this even with no indexes on a table and
we definitely might want to do it without taking lengthy locks. 

It's good that DROP COLUMN is very quick, but its not good that it
doesn't remove the space and there's no way to make it do that without
requiring locks to be held for long periods.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Toasted table not deleted when no out of line columns left

From
Gregory Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Simon Riggs <simon@2ndQuadrant.com> writes:
>> I think the issue is identifying the problem. Reading the title of the
>> post, I think Tom says "no" to *deleting* the toast table. He also says
>> "no" to cleaning the table as part of DROP COLUMN. That still leaves you
>> an opening for an out-of-line command/function to perform a clean,
>
> ... see CLUSTER ...

Hmm.... I wonder if this doesn't have the same problems you're describing with
the toaster. If someone has a cursor WITH HOLD against the table they don't
get a session level lock against the tables which fed the cursor do they? In
which case it's possible for there to be toast pointers in the cursor which
will expanded much later. If someone else has run CLUSTER in the intervening
time the user will get an error.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication
support!


Re: Toasted table not deleted when no out of line columns left

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Hmm.... I wonder if this doesn't have the same problems you're describing with
> the toaster. If someone has a cursor WITH HOLD against the table they don't
> get a session level lock against the tables which fed the cursor do
> they?

Hmm, interesting point.  The tuples are read out and saved in a
tuplestore before we release locks, but I'm not sure if there is
anything in that codepath that would detoast toast references.
Seems like you're right that there would need to be.
        regards, tom lane


Re: Toasted table not deleted when no out of line columns left

From
Hannu Krosing
Date:
On Mon, 2008-09-22 at 07:53 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > I think the issue is identifying the problem. Reading the title of the
> > post, I think Tom says "no" to *deleting* the toast table. He also says
> > "no" to cleaning the table as part of DROP COLUMN. That still leaves you
> > an opening for an out-of-line command/function to perform a clean,

As i understood the initial post, the situation is even worse for TOAST
tables than for ordinary tables - there is _NO_ way, except cluster or
explicit (CREATE TABLE new AS SELECT + create indexes + drop old table +
rename new to old) to clean up toast. For removing an inline column you
can let a (update pk_id=pk_id limit 1000 ; vacuum) script run in
background for a few weeks and get your space back.

> ... see CLUSTER ...
> 
>             regards, tom lane

CLUSTER is something, you could use, if you had a mostly idle database
and a lot of time.

On real-life databases where this actually matters, you usually have
neither.

----------------
Hannu





Re: Toasted table not deleted when no out of line columns left

From
Hannu Krosing
Date:
On Sun, 2008-09-21 at 12:05 -0400, Tom Lane wrote:
> Zoltan Boszormenyi <zb@cybertec.at> writes:
> > we came across a database where a table had a toasted table,
> > keeping huge amounts of disk space allocated. However,
> > the table's current definition didn't explain why there was
> > a toasted table. Then upon some experiments, it struck me.
> > There _was_ a toasted field but as the schema was modified,
> > the fields was dropped, leaving only inline stored fields.
> > VACUUM [FULL] [ANALYZE] didn't cleaned up the space
> > that was used by the toasted table. My tests were done on 8.3.3.
> 
> This is not a bug; it is operating as designed.  Observe the statement
> in the NOTES section of the ALTER TABLE page:
> 
>     The DROP COLUMN form does not physically remove the column, but
>     simply makes it invisible to SQL operations. Subsequent insert and
>     update operations in the table will store a null value for the
>     column. Thus, dropping a column is quick but it will not immediately
>     reduce the on-disk size of your table, as the space occupied by the
>     dropped column is not reclaimed. The space will be reclaimed over
>     time as existing rows are updated.

And it seems that it is never reclaimed (instead of "reclaimed over
time" as claimed in docs) if the column happens to have been toasted.

> ... and it goes on to point out how to force immediate space reclamation
> if you need that.  These statements apply independently of whether any
> particular value is toasted or not.

Are you sure ?

how do you explain the above "VACUUM [FULL] [ANALYZE] didn't cleaned up
the space" claim ?

Is it just not true ?

Or an overlooked corner case / implementation detail ?

----------------
Hannu




Re: Toasted table not deleted when no out of line columns left

From
Tom Lane
Date:
Hannu Krosing <hannu@krosing.net> writes:
> On Sun, 2008-09-21 at 12:05 -0400, Tom Lane wrote:
>> The DROP COLUMN form does not physically remove the column, but
>> simply makes it invisible to SQL operations. Subsequent insert and
>> update operations in the table will store a null value for the
>> column. Thus, dropping a column is quick but it will not immediately
>> reduce the on-disk size of your table, as the space occupied by the
>> dropped column is not reclaimed. The space will be reclaimed over
>> time as existing rows are updated.

> And it seems that it is never reclaimed (instead of "reclaimed over
> time" as claimed in docs) if the column happens to have been toasted.

Utterly false.  The toasted values will be deletable after their parent
rows have been updated.  This is exactly the same as for space in the
parent row itself.

> how do you explain the above "VACUUM [FULL] [ANALYZE] didn't cleaned up
> the space" claim ?

He didn't do any updates in the parent table.
        regards, tom lane