Thread: REINDEX disk space requirements

REINDEX disk space requirements

From

David Schnur

Date:

06 November 2009, 12:00:11

Hello,

I'm a developer on a product that includes a built-in PostgreSQL DB; currently 8.3.5. One of our tables is very active - it can see tens of millions of rows inserted and deleted per day. Generally speaking, new rows arrive throughout the day, and older rows from previous days are periodically deleted. A task runs VACUUM ANALYZE after those deletions, to keep space available. We noticed that the index on this table sometimes grew larger over time, so we added a REINDEX at a low-activity time of day.

One large installation we're working with is seeing 'out of disk space' errors when performing the REINDEX. I don't have precise numbers at the moment, but here's what I know:

- Total DB size is ~100 GB

- Size of the main table is ~60 GB (~1B rows)

- Size of the main table PK index is ~20 GB

- Free space on disk is ~35 GB

- Disk quotas are not an issue

could not extend relation 1663/16384/5881417: wrote only 4096 of 8192 bytes at block 1631971

HINT: Check free disk space.

'REINDEX TABLE <redacted>'

Could this be caused by anything other than actually running out of space? If not, is there a way to calculate, based on the existing index size, table size or number of rows, roughly how much space the REINDEX requires, or to get an upper bound on that value? Thanks,

David

Re: REINDEX disk space requirements

From

Alvaro Herrera

Date:

06 November 2009, 13:11:43

David Schnur wrote:

> I'm a developer on a product that includes a built-in PostgreSQL DB;
> currently 8.3.5.  One of our tables is very active - it can see in the tens
> of millions of rows inserted and deleted per day.  Generally speaking, new
> rows arrive throughout the day, and older rows from previous days are
> periodically deleted.  A task runs VACUUM ANALYZE after those deletions, to
> keep space available.

Immediately after the deletions, or is there some delay?  Keep in mind
that rows cannot be reclaimed until the oldest transaction that was open
when they were deleted is finished.  So if you vacuum too quickly, the
deleted rows may not be reclaimed.  It's better if you insert some delay
between delete and vacuum, the duration of which depends on the
duration of your transactions.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: REINDEX disk space requirements

From

Tom Lane

Date:

06 November 2009, 13:20:32

David Schnur <dnschnur@gmail.com> writes:
> One large installation we're working with is seeing 'out of disk space'
> errors when performing the REINDEX.  I don't have precise numbers at the
> moment, but here's what I know:

> - Total DB size is ~100 GB
> - Size of the main table is ~60 GB (~1B rows)
> - Size of the main table PK index is ~20 GB
> - Free space on disk is ~35 GB

Out of disk space is 100% guaranteed here, because it'll take about
twice the index size to do a REINDEX --- there's a sort file that's
roughly the size of the index, plus the new index itself, and we
don't risk deleting the old index until the transaction commits.

Possibly you could drop and recreate the index instead of using REINDEX,
if you're going to have the table locked anyway.  But it seems to me
that you're likely to need more disk pretty soon, unless this DB is
more static than most.  Maybe just spring for more hardware now.

            regards, tom lane
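
The arithmetic behind the estimate above can be sketched as follows. This is only a rough illustration, not an official formula: the function names are hypothetical, and it assumes the rebuilt index ends up about as large as the existing one (an upper bound, since a bloated index usually shrinks when rebuilt). The drop-first case assumes the old index's space is returned before the rebuild starts.

```python
def reindex_peak_extra_gb(index_gb: float) -> float:
    """Rough extra disk needed while rebuilding: sort file + new index.

    Assumption: each is about the size of the existing index.
    """
    return 2 * index_gb


def fits(free_gb: float, index_gb: float, drop_first: bool = False) -> bool:
    """Check whether the rebuild fits in the available free space.

    With drop_first=True the old index is dropped before the rebuild,
    so its space counts as free (hypothetical simplification).
    """
    available = free_gb + (index_gb if drop_first else 0.0)
    return available >= reindex_peak_extra_gb(index_gb)


# Numbers from the thread: ~20 GB PK index, ~35 GB free.
print(fits(35, 20))                   # plain REINDEX
print(fits(35, 20, drop_first=True))  # drop, then recreate
```

With the thread's numbers, a plain REINDEX needs roughly 40 GB against 35 GB free, while dropping the index first would leave about 55 GB for the same work. These are estimates only; the real figures depend on how bloated the index is.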

Re: REINDEX disk space requirements

From

Anj Adu

Date:

06 November 2009, 13:44:16

An alternative to adding more hardware is to partition the table. This
may be your best solution for the long term too. The benefits are:

1. Elimination of frequent vacuums
2. Instant space reclamation via partition deletes

We had a situation similar to yours. We bit the bullet and
implemented partitioning, and that was the best decision we ever made.

On Fri, Nov 6, 2009 at 9:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Schnur <dnschnur@gmail.com> writes:
>> One large installation we're working with is seeing 'out of disk space'
>> errors when performing the REINDEX.  I don't have precise numbers at the
>> moment, but here's what I know:
>
>> - Total DB size is ~100 GB
>> - Size of the main table is ~60 GB (~1B rows)
>> - Size of the main table PK index is ~20 GB
>> - Free space on disk is ~35 GB
>
> Out of disk space is 100% guaranteed here, because it'll take about
> twice the index size to do a REINDEX --- there's a sort file that's
> roughly the size of the index, plus the new index itself, and we
> don't risk deleting the old index until the transaction commits.
>
> Possibly you could drop and recreate the index instead of using REINDEX,
> if you're going to have the table locked anyway.  But it seems to me
> that you're likely to need more disk pretty soon, unless this DB is
> more static than most.  Maybe just spring for more hardware now.
>
>                        regards, tom lane

Re: REINDEX disk space requirements

From

David Schnur

Date:

06 November 2009, 13:47:39

On Fri, Nov 6, 2009 at 12:11 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:

> Immediately after the deletions, or is there some delay? Keep in mind
> that rows cannot be reclaimed until the oldest transaction that was open
> when they were deleted is finished.

The VACUUM waits until after the DELETE has been committed. But when you refer to the oldest transaction, do you mean any transaction at all? Currently it's guaranteed that no other transaction is running when VACUUM starts, but we were thinking of changing that behavior. It would then be possible for an INSERT in a separate transaction to start running ~10 seconds before the DELETE is done, and continue running for ~10 seconds after the VACUUM starts.

Is that the problem you were referring to? How does VACUUM behave in that situation? It sounds like it returns without reclaiming anything?

On Fri, Nov 6, 2009 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Out of disk space is 100% guaranteed here, because it'll take about
> twice the index size to do a REINDEX --- there's a sort file that's
> roughly the size of the index, plus the new index itself, and we
> don't risk deleting the old index until the transaction commits.

Aha; I assumed it would require free space equal to the size of the index. If it needs double, that explains it pretty clearly. That's exactly what I was looking for; thank you!

David

Re: REINDEX disk space requirements

From

Alvaro Herrera

Date:

06 November 2009, 14:18:57

David Schnur wrote:
> On Fri, Nov 6, 2009 at 12:11 PM, Alvaro Herrera <alvherre@commandprompt.com>
>  wrote:
>
> > Immediately after the deletions, or is there some delay?  Keep in mind
> > that rows cannot be reclaimed until the oldest transaction that was open
> > when they were deleted is finished.
>
> The VACUUM waits until after the DELETE has been committed.  But when you
> refer to the oldest transaction, do you mean any transaction at all?

Yes.  But it goes back even further: is there any other transaction
running that was also running when the DELETE started?  If there is,
vacuum won't be able to reclaim the rows.

>  Currently it's guaranteed that no other transaction is running when VACUUM
> starts, but we were thinking of changing that behavior.  It would then be
> possible for an INSERT in a separate transaction to start running ~10
> seconds before the DELETE is done, and continue running for ~10 seconds
> after the VACUUM starts.
>
> Is that the problem you were referring to?

Yes.  You'd have to wait until the INSERT is finished, and run VACUUM
then.

> How does VACUUM behave in that situation?  It sounds like it returns
> without reclaiming anything?

Yes.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
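
The rule discussed in this exchange can be illustrated with a small timeline model. This is a deliberately simplified sketch, not how PostgreSQL tracks visibility internally: the `Txn` class and `blockers` function are hypothetical, times are arbitrary seconds, and the model treats any transaction that began before the DELETE committed and is still open when VACUUM starts as preventing the deleted rows from being reclaimed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Txn:
    name: str
    start: float
    end: Optional[float]  # None means the transaction is still running


def blockers(txns: List[Txn], delete_commit: float, vacuum_start: float) -> List[str]:
    """Names of transactions that would keep VACUUM from reclaiming rows.

    Simplified rule: the transaction began before the DELETE committed
    and has not finished by the time VACUUM starts.
    """
    return [
        t.name
        for t in txns
        if t.start < delete_commit and (t.end is None or t.end > vacuum_start)
    ]


# Scenario from the thread: the INSERT starts ~10 s before the DELETE
# finishes and keeps running ~10 s after VACUUM starts.
txns = [Txn("insert", start=90, end=110)]
print(blockers(txns, delete_commit=100, vacuum_start=100))  # VACUUM right away
print(blockers(txns, delete_commit=100, vacuum_start=111))  # VACUUM after INSERT ends
```

In this model, running VACUUM immediately leaves the INSERT as a blocker, while waiting until it has finished leaves none, which matches the advice to delay VACUUM until overlapping transactions complete.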