Re: Performance query about large tables, lots of concurrent access - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: Performance query about large tables, lots of concurrent access
Date
Msg-id 467A9CE5.508@g2switchworks.com
Whole thread Raw
In response to Re: Performance query about large tables, lots of concurrent access  (Karl Wright <kwright@metacarta.com>)
Responses Re: Performance query about large tables, lots of concurrent access
List pgsql-performance
Karl Wright wrote:
> Scott Marlowe wrote:
>> Karl Wright wrote:
>>
>>> Shaun Thomas wrote:
>>>
>>>> On Wednesday 20 June 2007 12:55:20 pm Karl Wright wrote:
>>>>
>>>>
>>>>> I am afraid that I did answer this.  My largest tables
>>>>> are the ones continually being updated.  The smaller
>>>>> ones are updated only infrequently.
>>>>
>>>>
>>>>
>>>> You know, it actually sounds like you're getting whacked by the
>>>> same problem that got us a while back.  It sounds like you weren't
>>>> vacuuming frequently enough initially, and then tried vacuuming
>>>> later, only after you noticed performance degrade.
>>>>
>>>> Unfortunately what that means, is for several weeks or months,
>>>> Postgres has not been reusing rows on your (admittedly) active and
>>>> large tables; it just appends at the end, and lets old rows slowly
>>>> bloat that table larger and larger.  Indexes too, will suffer from
>>>> dead pages.  As frightening/sickening as this sounds, you may need
>>>> to dump/restore the really huge table, or vacuum-full to put it on
>>>> a crash diet, and then maintain a strict daily or bi-daily vacuum
>>>> schedule to keep it under control.
>>>>
>>>
>>> A nice try, but I had just completed a VACUUM on this database three
>>> hours prior to starting the VACUUM that I gave up on after 27
>>> hours.  So I don't see how much more frequently I could do it.  (The
>>> one I did earlier finished in six hours - but to accomplish that I
>>> had to shut down EVERYTHING else that machine was doing.)
>>
>>
>> So, have you ever run vacuum full or reindex on this database?
>>
>
> No.  However, this database has only existed since last Thursday
> afternoon.
Well, a couple of dozen update statements with no where clause on large
tables could bloat it right up.

It's not about age so much as update / delete patterns.
>
>> You are aware of the difference between how vacuum and vacuum full
>> work, right?
>>
>> vacuum := mark deleted tuples as available, leave in table
>> vacuum full := compact tables to remove deleted tuples.
>>
>> While you should generally avoid vacuum full, if you've let your
>> database get so bloated that the majority of space in your tables is
>> now empty / deleted tuples, you likely need to vacuuum full / reindex
>> it.
>>
> If the database is continually growing, should VACUUM FULL be necessary?
If it's only growing, with no deletes or updates, then no.  Generally,
on a properly vacuumed database, vacuum full should never be needed.
>> For instance, on my tiny little 31 Gigabyte reporting database, the
>> main table takes up about 17 Gigs.  This query gives you some idea
>> how many bytes each row is taking on average:
>>
>> select relname, relpages::float*8192 as size, reltuples,
>> (relpages::double precision*8192)/reltuples::double precision as
>> bytes_per_row from pg_class where relname = 'businessrequestsummary';
>>        relname         |    size     |  reltuples  |  bytes_per_row
>> ------------------------+-------------+-------------+-----------------
>> businessrequestsummary | 17560944640 | 5.49438e+07 | 319.61656229454
>>
>> Note that these numbers are updated by running analyze...
>>
>> What does it say about your DB?
>>
>
> I wish I could tell you.  Like I said, I had to abandon this project
> to test out an upgrade procedure involving pg_dump and pg_restore.
> (The upgrade also seems to take a very long time - over 6 hours so
> far.) When it is back online I can provide further information.

Well, let us know.  I would definitely recommend getting more / faster
disks.  Right now I've got a simple 4 disk RAID10 on the way to replace
the single SATA drive I'm running on right now.  I can't wait.

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Data transfer very slow when connected via DSL
Next
From: Andrew Sullivan
Date:
Subject: Re: Volunteer to build a configuration tool