Thread: Big delete on big table... now what?

Big delete on big table... now what?

From
"Fernando Hevia"
Date:
Hi list.

I have a table with over 30 million rows. Performance was dropping steadily
so I moved old data not needed online to an historic table. Now the table
has about 14 million rows. I don't need the disk space returned to the OS
but I do need to improve performance. Will a plain vacuum do or is a vacuum
full necessary?
¿Would a vacuum full improve performance at all?

Thanks for your hindsight.
Regards,

Fernando.


Re: Big delete on big table... now what?

From
"Kevin Grittner"
Date:
>>> "Fernando Hevia" <fhevia@ip-tel.com.ar> wrote:

> I have a table with over 30 million rows. Performance was dropping
steadily
> so I moved old data not needed online to an historic table. Now the
table
> has about 14 million rows. I don't need the disk space returned to
the OS
> but I do need to improve performance. Will a plain vacuum do or is a
vacuum
> full necessary?
> *Would a vacuum full improve performance at all?

If this database can be out of production for long enough to run it
(possibly a few hours, depending on hardware, configuration, table
width, indexes) your best option might be to CLUSTER and ANALYZE the
table.  It gets more complicated if you can't tolerate down-time.

-Kevin

Re: Big delete on big table... now what?

From
Bill Moran
Date:
"Fernando Hevia" <fhevia@ip-tel.com.ar> wrote:
>
> Hi list.
>
> I have a table with over 30 million rows. Performance was dropping steadily
> so I moved old data not needed online to an historic table. Now the table
> has about 14 million rows. I don't need the disk space returned to the OS
> but I do need to improve performance. Will a plain vacuum do or is a vacuum
> full necessary?
> ¿Would a vacuum full improve performance at all?

If you can afford the downtime on that table, cluster would be best.

If not, do the normal vacuum and analyze.  This is unlikely to improve
the performance much (although it may shrink the table _some_) but
regular vacuum will keep performance from getting any worse.

You can also reindex pretty safely.  Any queries that run during the
reindex will just have to do so without the indexes.

Longer term, if you remove smaller groups of rows more frequently, you'll
probably be able to maintain performance and table bloat at a reasonable
level with normal vacuum.

--
Bill Moran
Collaborative Fusion Inc.

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Big delete on big table... now what?

From
Gregory Stark
Date:
"Bill Moran" <wmoran@collaborativefusion.com> writes:

> "Fernando Hevia" <fhevia@ip-tel.com.ar> wrote:
>>
>> Hi list.
>>
>> I have a table with over 30 million rows. Performance was dropping steadily
>> so I moved old data not needed online to an historic table. Now the table
>> has about 14 million rows. I don't need the disk space returned to the OS
>> but I do need to improve performance. Will a plain vacuum do or is a vacuum
>> full necessary?
>> ¿Would a vacuum full improve performance at all?
>
> If you can afford the downtime on that table, cluster would be best.
>
> If not, do the normal vacuum and analyze.  This is unlikely to improve
> the performance much (although it may shrink the table _some_) but
> regular vacuum will keep performance from getting any worse.

Note that CLUSTER requires enough space to store the new and the old copies of
the table simultaneously. That's the main reason for VACUUM FULL to still
exist.

There is also the option of doing something like (assuming id is already an
integer -- ie this doesn't actually change the data):

 ALTER TABLE x ALTER id TYPE integer USING id;

which will rewrite the whole table. This is effectively the same as CLUSTER
except it doesn't order the table according to an index. It will still require
enough space to hold two copies of the table but it will be significantly
faster.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: Big delete on big table... now what?

From
"Fernando Hevia"
Date:
> Gregory Stark <start@enterprisedb.com> writes:
>
> "Bill Moran" <wmoran@collaborativefusion.com> writes:
>
> > "Fernando Hevia" <fhevia@ip-tel.com.ar> wrote:
> >> Hi list.
> >> I have a table with over 30 million rows. Performance was dropping
> >> steadily so I moved old data not needed online to an
> historic table.
> >> Now the table has about 14 million rows. I don't need the
> disk space
> >> returned to the OS but I do need to improve performance.
> Will a plain
> >> vacuum do or is a vacuum full necessary?
> >> ¿Would a vacuum full improve performance at all?
> >
> > If you can afford the downtime on that table, cluster would be best.
> >
> > If not, do the normal vacuum and analyze.  This is unlikely
> to improve
> > the performance much (although it may shrink the table _some_) but
> > regular vacuum will keep performance from getting any worse.
>
> Note that CLUSTER requires enough space to store the new and
> the old copies of the table simultaneously. That's the main
> reason for VACUUM FULL to still exist.
>
> There is also the option of doing something like (assuming id
> is already an integer -- ie this doesn't actually change the data):
>
>  ALTER TABLE x ALTER id TYPE integer USING id;
>
> which will rewrite the whole table. This is effectively the
> same as CLUSTER except it doesn't order the table according
> to an index. It will still require enough space to hold two
> copies of the table but it will be significantly faster.
>

Yes, I can afford a downtime on Sunday.
Actually the clustering option would help since most of our slow queries use
the same index.

Thanks Bill and Gregory for the advice.
Regards,
Fernando.