Thread: dead tuples and VACUUM

dead tuples and VACUUM

From
Dima Tkach
Date:
Hi, everybody!

I wanted to ask a few questions about VACUUM, and dead tuples in general

Suppose, I have a table with a few million rows, and every now and then
(say, monthly) every row in the table is updated.

First of all, it is my understanding that this monthly job will
double the size of the table (create a copy of every tuple it is
updating). Is that right?

Now, if I run VACUUM on that table, it is supposed to reclaim those dead
tuples.. Will it actually? I remember some discussions about 7.2, where
it was mentioned that plain (not FULL) VACUUM doesn't move rows between
pages (so, it seems to me, that if every row is updated, plain vacuum is
  useless). Is it any better in 7.3? Or does it still have to be VACUUM
FULL?

And finally, if I do *not* run VACUUM, and let it just sit there with
the doubled number of tuples, what will happen next month, when every
row gets updated again? Will it be able to reuse the dead tuples then,
or will it just keep creating the new ones?

My problem is that, as I described above, I am afraid that plain VACUUM
just won't help, and running VACUUM FULL takes *days* on my database,
and we just cannot afford that much downtime (in fact, it is quicker,
for some reason to dump and reload the table then to vacuum full it!).

I'd love to hear what you guys know about it!

Thanks a lot!

Dima


Re: dead tuples and VACUUM

From
Bruno Wolff III
Date:
On Sat, May 31, 2003 at 14:15:04 -0400,
  Dima Tkach <dmitry@openratings.com> wrote:
>
> First of all, it is my understanding that this monthly job will
> double the size of the table (create a copy of every tuple it is
> updating). Is that right?

Yes.

> Now, if I run VACUUM on that table, it is supposed to reclaim those dead
> tuples.. Will it actually? I remember some discussions about 7.2, where
> it was mentioned that plain (not FULL) VACUUM doesn't move rows between
> pages (so, it seems to me, that if every row is updated, plain vacuum is
>  useless). Is it any better in 7.3? Or does it still have to be VACUUM
> FULL?

You need to make sure FSM is set large enough to plain vacuum to recover
all of the tuples.

> And finally, if I do *not* run VACUUM, and let it just sit there with
> the doubled number of tuples, what will happen next month, when every
> row gets updated again? Will it be able to reuse the dead tuples then,
> or will it just keep creating the new ones?

If you don't run even plain vacuum the deleted tuples won't be reused.
If you have run plain vacuum with FSM set high enough the free space
will be reused.

Re: dead tuples and VACUUM

From
Andrew Sullivan
Date:
On Sat, May 31, 2003 at 02:15:04PM -0400, Dima Tkach wrote:
> First of all, it is my understanding that this monthly job will
> double the size of the table (create a copy of every tuple it is
> updating). Is that right?

Sort of.  Read on.

> Now, if I run VACUUM on that table, it is supposed to reclaim those dead
> tuples.. Will it actually? I remember some discussions about 7.2, where
> it was mentioned that plain (not FULL) VACUUM doesn't move rows between
> pages (so, it seems to me, that if every row is updated, plain vacuum is
>  useless). Is it any better in 7.3? Or does it still have to be VACUUM
> FULL?

If you replace every tuple in the table, you probably don't have a
large enough fress space map to track all of that.  So VACUUM FULL is
a good idea.

But another answer is to VACUUM every (say) couple thousand UPDATEs.
That'll keep the table size managable.  You could even run a parallel
VACUUM.  If the whole table is UPDATEd in one transaction, though,
that won't help.

> And finally, if I do *not* run VACUUM, and let it just sit there with
> the doubled number of tuples, what will happen next month, when every
> row gets updated again? Will it be able to reuse the dead tuples then,
> or will it just keep creating the new ones?

It can't use the free space until you've run VACUUM.

A
--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Table data type modification

From
"Guillaume Houssay"
Date:
Sorry for this very basic question.

After installing Postgresql 7.3.2 and migrating table description. I can not
modify the column types of some tables.

To do the modification, I tried both pgphpadmin and pgadmin with the
administrator user which comes by default after installation.

I am pretty sure this is link to the privilegies but I can not find out the
solution.

Could you please help ?

GH


Re: dead tuples and VACUUM

From
Dmitry Tkach
Date:
Thanks for the reply!

I'd still like to clariofy some points...

>If you replace every tuple in the table, you probably don't have a
>large enough fress space map to track all of that.  So VACUUM FULL is
>a good idea.
>
>
Where do I set that free space map? Is it a configuration parameter?
And what exactly does it mean? If it is not high enough and I never run
VACUUM FULL, are my dead tuples lost forever?

>But another answer is to VACUUM every (say) couple thousand UPDATEs.
>That'll keep the table size managable.
>
I am afraid, that is not feasible too - you see, that monthly update job
I mentioned runs continuosly and takes about a couple of weeks to
complete as it is.
If I pause it every couple thousands of rows to do a vacuum, it will, I
am afraid, take months (and I have to stay under at least 1 month,
because the whole idea is to update every month).

> You could even run a parallel
>VACUUM.  If the whole table is UPDATEd in one transaction, though,
>that won't help.
>
It is not one transaction. I am running about 10 parallel jobs, and each
of them commits every now and then (every 10K rows, I believe).
If I run VACUUM in parallel, what exactly is going to happen?

Will I still be loosing some of those dead tuples after each update?

Basically, my question is - if I let it run this way forever, will the
database just keep growing on me until I am out of the disk space?
Is there any way to prevent that, without having to take it offline for
days?

Thanks!

Dima



Re: dead tuples and VACUUM

From
Bruno Wolff III
Date:
On Mon, Jun 02, 2003 at 10:57:18 -0400,
  Dmitry Tkach <dmitry@openratings.com> wrote:
> Thanks for the reply!
>
> I'd still like to clariofy some points...
>
> >If you replace every tuple in the table, you probably don't have a
> >large enough fress space map to track all of that.  So VACUUM FULL is
> >a good idea.
> >
> >
> Where do I set that free space map? Is it a configuration parameter?
> And what exactly does it mean? If it is not high enough and I never run
> VACUUM FULL, are my dead tuples lost forever?

It is a configuration paramter. I believe the name is FSM.

>
> >But another answer is to VACUUM every (say) couple thousand UPDATEs.
> >That'll keep the table size managable.
> >
> I am afraid, that is not feasible too - you see, that monthly update job
> I mentioned runs continuosly and takes about a couple of weeks to
> complete as it is.
> If I pause it every couple thousands of rows to do a vacuum, it will, I
> am afraid, take months (and I have to stay under at least 1 month,
> because the whole idea is to update every month).

If you are doing normal vacuum (not vacuum full), it doesn't lock tables
so you can run it along with other updates without taking a big performance
hit. All it does is mark deleted tuples that are not visible to any
current transactions as safe to reuse. By running vacuum frequently you
can maintain a steady state size for your database. If it grows unusually
large and you need to recover some space for other uses then you would
want to use vacuum full.

>
> >You could even run a parallel
> >VACUUM.  If the whole table is UPDATEd in one transaction, though,
> >that won't help.
> >
> It is not one transaction. I am running about 10 parallel jobs, and each
> of them commits every now and then (every 10K rows, I believe).
> If I run VACUUM in parallel, what exactly is going to happen?

It will work just fine.

>
> Will I still be loosing some of those dead tuples after each update?

Vacuum will mark deleted tuples as reusuable if there are no open
transactions that can still see them.

>
> Basically, my question is - if I let it run this way forever, will the
> database just keep growing on me until I am out of the disk space?
> Is there any way to prevent that, without having to take it offline for
> days?

It should reach a steady state size that depends on how often you vacuum
and how often you update. If something goes wrong (say a transaction
is started intereactively and then left open for a day or two) then
the database may grow unusually large and you may need to do a vacuum
full.

>
> Thanks!
>
> Dima
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org