Thread: Performance Problem with Vacuum of bytea table (PG 8.0.13)

Performance Problem with Vacuum of bytea table (PG 8.0.13)

From
Bastian Voigt
Date:
Hi *,
for caching large autogenerated XML files, I have created a bytea table
in my database so that the cached files can be used by multiple servers.
There are about 500 rows and 10-20 Updates per minute on the table. The
files stored in the bytea are anything from 10kB to 10MB. My PostgreSQL
version is 8.0.13 on Gentoo Linux (x86) with PostGIS 1.2.0.

For vacuum I use the pg_autovacuum daemon. It decided to vacuum my cache
table about every 3 hours, the vacuum process takes 20-30 minutes
(oops!) every time.

Now my big big problem is that the database gets really really slow
during these 20 minutes and after the vacuum process is running for a
short time, many transactions show state "UPDATE waiting" in the process
list. In my Java application server I sometimes get tons of deadlock
Exceptions (waiting on ShareLock blahblah). The web frontend gets nearly
unusable, logging in takes more than 60 seconds, etc. etc.

Under normal circumstances my application is really fast, vacuuming
other tables is no problem, only the bytea table is really awkward

I hope some of you performance cracks can help me...


this is my table definition:

Table »public.binary_cache«
  Column  |             Type            | Attributes
----------+-----------------------------+-----------
 cache_id | bigint                      | not null
 date     | timestamp without time zone |
 data     | bytea                       |

Indexe:
    »binary_cache_pkey« PRIMARY KEY, btree (cache_id)


Thanks in advance for any hints!

--
Bastian Voigt
Neumünstersche Straße 4
20251 Hamburg
telefon +49 - 40  - 67957171
mobil   +49 - 179 - 4826359



Re: Performance Problem with Vacuum of bytea table (PG 8.0.13)

From
Richard Huxton
Date:
Bastian Voigt wrote:
> Hi *,
> for caching large autogenerated XML files, I have created a bytea table
> in my database so that the cached files can be used by multiple servers.
> There are about 500 rows and 10-20 Updates per minute on the table. The
> files stored in the bytea are anything from 10kB to 10MB. My PostgreSQL
> version is 8.0.13 on Gentoo Linux (x86) with PostGIS 1.2.0.
>
> For vacuum I use the pg_autovacuum daemon. It decided to vacuum my cache
> table about every 3 hours, the vacuum process takes 20-30 minutes
> (oops!) every time.

Try vacuuming every 3 minutes and see what happens.

--
   Richard Huxton
   Archonet Ltd

Re: Performance Problem with Vacuum of bytea table (PG 8.0.13)

From
Bastian Voigt
Date:

OK, I'll give that a try. What about pg_autovacuum then? Is it a problem
when two processes try to vacuum the same table in parallel? Or do I
need to deactivate autovacuum altogether?
>
> Try vacuuming every 3 minutes and see what happens.
>

(Sorry Richard, forgot to reply to the list!)
--
Bastian Voigt
Neumünstersche Straße 4
20251 Hamburg
telefon +49 - 40  - 67957171
mobil   +49 - 179 - 4826359




Re: Performance Problem with Vacuum of bytea table (PG 8.0.13)

From
Richard Huxton
Date:
Bastian Voigt wrote:
>
> OK, I'll give that a try. What about pg_autovacuum then? Is it a problem
> when two processes try to vacuum the same table in parallel? Or do I
> need to deactivate autovacuum altogether?

I was about to say that you can tune pg_autovacuum, but I just checked
your original post and you're running 8.0.x - not sure about that one.

You'll have to check the documentation for that version to see if you
can either:
1. exclude that table from pg_autovacuum
2. increase pg_autovacuum's sensitivity

If not, and this table is the most active, it might be simpler just to
run your own vacuum-ing from a cron job.

--
   Richard Huxton
   Archonet Ltd

Re: Performance Problem with Vacuum of bytea table (PG 8.0.13)

From
Bastian Voigt
Date:
Richard Huxton wrote:
>
> I was about to say that you can tune pg_autovacuum, but I just checked
> your original post and you're running 8.0.x - not sure about that one.
The system catalog pg_autovacuum which allows finetuning autovacuum at
table level was introduced in 8.1 :-(

> You'll have to check the documentation for that version to see if you
> can either:
> 1. exclude that table from pg_autovacuum
> 2. increase pg_autovacuum's sensitivity
(1) seems to be impossible (correct me if I'm wrong..), so maybe I'll go
for (2) ...

> If not, and this table is the most active, it might be simpler just to
> run your own vacuum-ing from a cron job.
Well, it is one of the most active, but there are others. pg_autovacuum
seems to do a very good job, apart from this one table...


--
Bastian Voigt
Neumünstersche Straße 4
20251 Hamburg
telefon +49 - 40  - 67957171
mobil   +49 - 179 - 4826359



Re: Performance Problem with Vacuum of bytea table (PG 8.0.13)

From
Richard Huxton
Date:
Bastian Voigt wrote:
> Richard Huxton wrote:
>>
>> I was about to say that you can tune pg_autovacuum, but I just checked
>> your original post and you're running 8.0.x - not sure about that one.
> The system catalog pg_autovacuum which allows finetuning autovacuum at
> table level was introduced in 8.1 :-(

Hmm - thought it might have been :-(

>> You'll have to check the documentation for that version to see if you
>> can either:
>> 1. exclude that table from pg_autovacuum
>> 2. increase pg_autovacuum's sensitivity
> (1) seems to be impossible (correct me if I'm wrong..), so maybe I'll go
> for (2) ...

No, the per-table stuff was via the system table.

>> If not, and this table is the most active, it might be simpler just to
>> run your own vacuum-ing from a cron job.
> Well, it is one of the most active, but there are others. pg_autovacuum
> seems to do a very good job, apart from this one table...

Do you have any settings in your postgresql.conf? Failing that, you
would have to poke around the source.

--
   Richard Huxton
   Archonet Ltd

No, this did not help. The vacuum process is still running far too long
and makes everything slow. It is even worse than before, cause now the
system is slow almost all the time while when vacuuming only every 3
hours it is only slow once every three hours.....


I now did the following. Well, no comment.....


Shellscript A:

while true
do
    psql -U $user -d $database -c "vacuum analyze verbose binary_cache"
    echo "Going to sleep"
    sleep 60
done


Shellscript B:

while true
do
        ps aux > $tempfile
        numwaiting=`grep UPDATE.waiting $tempfile | grep -c -v grep`
        echo "Number of waiting updates: $numwaiting"

        vacuumpid=`grep VACUUM $tempfile| grep -v grep | awk '{print $2}'`
        echo "PID of vacuum process: $vacuumpid"

        if [ $numwaiting -gt 5 ]
        then
                echo "Too many waiting transactions, killing vacuum
process $vacuumpid..."
                kill $vacuumpid
        fi
        echo "Sleeping 30 Seconds"
        sleep 30
done

--
Bastian Voigt
Neumünstersche Straße 4
20251 Hamburg
telefon +49 - 40  - 67957171
mobil   +49 - 179 - 4826359



Bastian Voigt wrote:
> No, this did not help. The vacuum process is still running far too long
> and makes everything slow. It is even worse than before, cause now the
> system is slow almost all the time while when vacuuming only every 3
> hours it is only slow once every three hours.....

Could you check the output of vacuum verbose on that table and see how
much work it's doing? I'd have thought the actual bytea data would be
TOASTed away to a separate table for storage, leaving the vacuum with
very little work to do.

It might well be your actual problem is your disk I/O is constantly
saturated and the vacuum just pushes it over the edge. In which case
you'll either need more/better disks or to find a quiet time once a day
to vacuum and just do so then.

--
   Richard Huxton
   Archonet Ltd

you should first cluster the table on primary key.
The table is probably already bloated from the 3 hr delay it had before.
First
CLUSTER "primary key index name" ON group_fin_account_tst;
Then
vacuum it every 3 minutes.
NB! clustering takes an access exclusive lock on table

Kristo

On 25.05.2007, at 15:30, Bastian Voigt wrote:

> No, this did not help. The vacuum process is still running far too
> long and makes everything slow. It is even worse than before, cause
> now the system is slow almost all the time while when vacuuming
> only every 3 hours it is only slow once every three hours.....
>
>
> I now did the following. Well, no comment.....
>
>
> Shellscript A:
>
> while true
> do
>    psql -U $user -d $database -c "vacuum analyze verbose binary_cache"
>    echo "Going to sleep"
>    sleep 60
> done
>
>
> Shellscript B:
>
> while true
> do
>        ps aux > $tempfile
>        numwaiting=`grep UPDATE.waiting $tempfile | grep -c -v grep`
>        echo "Number of waiting updates: $numwaiting"
>
>        vacuumpid=`grep VACUUM $tempfile| grep -v grep | awk '{print
> $2}'`
>        echo "PID of vacuum process: $vacuumpid"
>
>        if [ $numwaiting -gt 5 ]
>        then
>                echo "Too many waiting transactions, killing vacuum
> process $vacuumpid..."
>                kill $vacuumpid
>        fi
>        echo "Sleeping 30 Seconds"
>        sleep 30
> done
>
> --
> Bastian Voigt
> Neumünstersche Straße 4
> 20251 Hamburg
> telefon +49 - 40  - 67957171
> mobil   +49 - 179 - 4826359
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings


Bastian Voigt wrote:
> No, this did not help. The vacuum process is still running far too long
> and makes everything slow. It is even worse than before, cause now the
> system is slow almost all the time while when vacuuming only every 3
> hours it is only slow once every three hours.....
>
>
> I now did the following. Well, no comment.....

Killing the vacuum mid-process doesn't help you, because the table will
be in a sorrier state than it was when it started.

I think it would be better if you:

1. Revert pg_autovacuum changes so that it processes every 3 hours or
whatever, like you had at the start of this thread.  Or maybe less.
That one will take care of the _other_ tables.

2. Vacuum the bytea table manually more often, say every 10 minutes or
so (vacuum, sleep 10m, goto start).  Make sure this is done with an
appropriate vacuum_cost_delay setting (and related settings).

3. Raise max_fsm_pages so that a lot of pages with free space can be
recorded for that table


The point here is that vacuuming the bytea table can take a long time
due to vacuum_cost_delay, but it won't affect the rest of the system;
regular operation will continue to run at (almost) normal speed.  Having
a big number of free pages ensures that the free space in the table is
not "lost".

Also, you may want to reindex that table once, because with so many
killing vacuums you have probably screwed up the indexes big time (maybe
cluster it once instead of reindexing, because that will compact the
heap as well as the indexes).

Another recommendation is to upgrade to 8.2.4 which is faster and has
a better autovacuum.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Kristo Kaiv wrote:
> you should first cluster the table on primary key.
> The table is probably already bloated from the 3 hr delay it had before.
> First
> CLUSTER "primary key index name" ON group_fin_account_tst;
> Then
> vacuum it every 3 minutes.
> NB! clustering takes an access exclusive lock on table
Kristo,
thanks a bunch!!
This was the solution...
The cluster operation took about 60sec, and after it was done the vacuum
finished in only 10sec. or so, with no noticeable performance
bottleneck. Now vacuum is running every 2-3 minutes and makes no problems.

Hhhh, now I can look forward to a laid-back weekend..

Richard, Kristo, Alvaro, thanks 1000 times for responding so quickly

:-)

--
Bastian Voigt
Neumünstersche Straße 4
20251 Hamburg
telefon +49 - 40  - 67957171
mobil   +49 - 179 - 4826359



Richard Huxton wrote:
> Could you check the output of vacuum verbose on that table and see how
> much work it's doing? I'd have thought the actual bytea data would be
> TOASTed away to a separate table for storage, leaving the vacuum with
> very little work to do.
I'm quite new to postgres (actually I just ported our running
application from MySQL...), so I don't know what toast means. But I
noticed that vacuum also tried to cleanup some "toast" relations or so.
This was what took so long.

> It might well be your actual problem is your disk I/O is constantly
> saturated and the vacuum just pushes it over the edge. In which case
> you'll either need more/better disks or to find a quiet time once a
> day to vacuum and just do so then.
Yes, that was definitely the case. But now everything runs smoothly
again, so I don't think I need to buy new disks.

Regards
Bastian


--
Bastian Voigt
Neumünstersche Straße 4
20251 Hamburg
telefon +49 - 40  - 67957171
mobil   +49 - 179 - 4826359



Re: Performance Problem with Vacuum of bytea table (PG 8.0.13)

From
Tom Lane
Date:
Bastian Voigt <post@bastian-voigt.de> writes:
> Now my big big problem is that the database gets really really slow
> during these 20 minutes and after the vacuum process is running for a
> short time, many transactions show state "UPDATE waiting" in the process
> list. In my Java application server I sometimes get tons of deadlock
> Exceptions (waiting on ShareLock blahblah). The web frontend gets nearly
> unusable, logging in takes more than 60 seconds, etc. etc.

Hmm.  That's a bit weird --- what are they waiting on exactly?  Look in
pg_locks to see what the situation is.  A vacuum per se ought not be
blocking any updates.

Aside from the recommendation to make the vacuums happen more frequently
instead of less so, you should experiment with vacuum_cost_delay and
related parameters.  The idea is to reduce vacuum's I/O load so that it
doesn't hurt foreground response time.  This means any individual vacuum
will take longer, but you won't need to care.

            regards, tom lane

Re: Performance Problem with Vacuum of bytea table (PG 8.0.13)

From
"Jim C. Nasby"
Date:
On Fri, May 25, 2007 at 10:29:30AM +0200, Bastian Voigt wrote:
> Hi *,
> for caching large autogenerated XML files, I have created a bytea table
> in my database so that the cached files can be used by multiple servers.
> There are about 500 rows and 10-20 Updates per minute on the table. The
> files stored in the bytea are anything from 10kB to 10MB. My PostgreSQL
> version is 8.0.13 on Gentoo Linux (x86) with PostGIS 1.2.0.
>
> For vacuum I use the pg_autovacuum daemon. It decided to vacuum my cache
> table about every 3 hours, the vacuum process takes 20-30 minutes
> (oops!) every time.

You'll want to decrease autovacum_vacuum_scale_factor to 0.2 if you're
on anything less than 8.2.
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Attachment