Re: Performance query about large tables, lots of concurrent access - Mailing list pgsql-performance

From Karl Wright
Subject Re: Performance query about large tables, lots of concurrent access
Date
Msg-id 467AA77D.3030204@metacarta.com
In response to Re: Performance query about large tables, lots of concurrent access  (Scott Marlowe <smarlowe@g2switchworks.com>)
Responses Re: Performance query about large tables, lots of concurrent access  (Andrew Sullivan <ajs@crankycanuck.ca>)
Re: Performance query about large tables, lots of concurrent access  (Michael Stone <mstone+postgres@mathom.us>)
List pgsql-performance
Scott Marlowe wrote:
> Karl Wright wrote:
>> Scott Marlowe wrote:
>>> Karl Wright wrote:
>>>
>>>> Shaun Thomas wrote:
>>>>
>>>>> On Wednesday 20 June 2007 12:55:20 pm Karl Wright wrote:
>>>>>
>>>>>
>>>>>> I am afraid that I did answer this.  My largest tables
>>>>>> are the ones continually being updated.  The smaller
>>>>>> ones are updated only infrequently.
>>>>>
>>>>>
>>>>>
>>>>> You know, it actually sounds like you're getting whacked by the
>>>>> same problem that got us a while back.  It sounds like you weren't
>>>>> vacuuming frequently enough initially, and then tried vacuuming
>>>>> later, only after you noticed performance degrade.
>>>>>
>>>>> Unfortunately what that means, is for several weeks or months,
>>>>> Postgres has not been reusing rows on your (admittedly) active and
>>>>> large tables; it just appends at the end, and lets old rows slowly
>>>>> bloat that table larger and larger.  Indexes too, will suffer from
>>>>> dead pages.  As frightening/sickening as this sounds, you may need
>>>>> to dump/restore the really huge table, or vacuum-full to put it on
>>>>> a crash diet, and then maintain a strict daily or bi-daily vacuum
>>>>> schedule to keep it under control.
>>>>>
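[Editor's note: the recovery Shaun describes boils down to a handful of maintenance commands. A minimal sketch, assuming a bloated table named `documents` (the table name is hypothetical; substitute your own):]

```sql
-- VACUUM FULL compacts the table to reclaim space from dead tuples,
-- but it takes an exclusive lock, so run it in a maintenance window.
VACUUM FULL VERBOSE documents;

-- Indexes bloat too; rebuild them after the full vacuum.
REINDEX TABLE documents;

-- Refresh planner statistics for the newly compacted table.
ANALYZE documents;
```

[The dump/restore alternative achieves the same end by rewriting the table from scratch, at the cost of taking it offline for the duration.]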
>>>>
>>>> A nice try, but I had just completed a VACUUM on this database three
>>>> hours prior to starting the VACUUM that I gave up on after 27
>>>> hours.  So I don't see how much more frequently I could do it.  (The
>>>> one I did earlier finished in six hours - but to accomplish that I
>>>> had to shut down EVERYTHING else that machine was doing.)
>>>
>>>
>>> So, have you ever run vacuum full or reindex on this database?
>>>
>>
>> No.  However, this database has only existed since last Thursday
>> afternoon.
> Well, a couple of dozen update statements with no where clause on large
> tables could bloat it right up.
>
> It's not about age so much as update / delete patterns.
>>
>>> You are aware of the difference between how vacuum and vacuum full
>>> work, right?
>>>
>>> vacuum := mark deleted tuples as available, leave in table
>>> vacuum full := compact tables to remove deleted tuples.
>>>
>>> While you should generally avoid vacuum full, if you've let your
>>> database get so bloated that the majority of space in your tables is
>>> now empty / deleted tuples, you likely need to vacuum full / reindex
>>> it.
>>>
>> If the database is continually growing, should VACUUM FULL be necessary?
> If it's only growing, with no deletes or updates, then no.  Generally,
> on a properly vacuumed database, vacuum full should never be needed.
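[Editor's note: the routine, non-full maintenance referred to here is a plain VACUUM, which marks dead tuples reusable without taking the exclusive lock. A sketch, again with a hypothetical table name:]

```sql
-- Plain VACUUM: marks dead row versions as free space for reuse;
-- it runs concurrently with normal reads and writes.
VACUUM VERBOSE documents;

-- VACUUM ANALYZE additionally refreshes the planner's statistics.
VACUUM ANALYZE documents;
```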
>>> For instance, on my tiny little 31 Gigabyte reporting database, the
>>> main table takes up about 17 Gigs.  This query gives you some idea
>>> how many bytes each row is taking on average:
>>>
>>> select relname, relpages::float*8192 as size, reltuples,
>>> (relpages::double precision*8192)/reltuples::double precision as
>>> bytes_per_row from pg_class where relname = 'businessrequestsummary';
>>>        relname         |    size     |  reltuples  |  bytes_per_row
>>> ------------------------+-------------+-------------+-----------------
>>> businessrequestsummary | 17560944640 | 5.49438e+07 | 319.61656229454
>>>
>>> Note that these numbers are updated by running analyze...
>>>
>>> What does it say about your DB?
>>>
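[Editor's note: the same pg_class arithmetic can be run across all ordinary tables at once. A variant of Scott's query; the NULLIF guard is an addition here, to avoid division by zero on empty tables:]

```sql
SELECT relname,
       relpages::bigint * 8192 AS size_bytes,
       reltuples,
       (relpages::double precision * 8192)
         / NULLIF(reltuples::double precision, 0) AS bytes_per_row
FROM pg_class
WHERE relkind = 'r'          -- ordinary tables only
ORDER BY relpages DESC
LIMIT 10;
```

[As Scott notes, relpages and reltuples are estimates maintained by ANALYZE, so run ANALYZE first if the numbers look stale.]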
>>
>> I wish I could tell you.  Like I said, I had to abandon this project
>> to test out an upgrade procedure involving pg_dump and pg_restore.
>> (The upgrade also seems to take a very long time - over 6 hours so
>> far.) When it is back online I can provide further information.
>
> Well, let us know.  I would definitely recommend getting more / faster
> disks.  Right now I've got a simple 4 disk RAID10 on the way to replace
> the single SATA drive I'm running on right now.  I can't wait.
>

I checked the disk picture - this is a RAID array of six 15,000 RPM
drives with a bit more than 1 TB of total storage.  It would be hard to
get more/faster disks than that.

Karl
