Thread: Performance query about large tables, lots of concurrent access

Performance query about large tables, lots of concurrent access

From: Karl Wright
Hi,

I have an application which really exercises the performance of
postgresql in a major way, and I am running into a performance
bottleneck with Postgresql 8.1 that I do not yet understand.

Here are the details:

- There is a primary table, with some secondary tables
- The principal transaction consists of a "SELECT...FOR UPDATE",
followed by either an INSERT or an UPDATE on the primary table (a rough
sketch follows this list)
- INSERTs, DELETEs, and UPDATEs may occur on the secondary table
depending on what happens with the primary table, for any given
transaction.  The secondary table has about 10x the number of rows as
the primary.
- All operations are carefully chosen so that highly discriminatory
indexes are used to locate the record(s) in question.  The execution
plans show INDEX SCAN operations being done in all cases.
- At any given time, there are up to 100 of these operations going on at
once against the same database.
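
For concreteness, the core transaction looks roughly like this (the
table and column names here are illustrative, not the real schema):

   BEGIN;
   -- Lock the row (if any) matching the business key
   SELECT id, status
     FROM primary_table
    WHERE jobid = $1 AND keyhash = $2 AND keyvalue = $3
      FOR UPDATE;
   -- Then, depending on whether a row came back:
   UPDATE primary_table SET status = $4
    WHERE jobid = $1 AND keyhash = $2 AND keyvalue = $3;
   -- ...or...
   INSERT INTO primary_table (jobid, keyhash, keyvalue, status)
   VALUES ($1, $2, $3, $4);
   -- plus dependent INSERT/UPDATE/DELETE activity on the secondary table
   COMMIT;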

What I am seeing:

- In postgresql 7.4, the table activity seems to be gated by locks, and
runs rather slowly except when the sizes of the tables are small.
- In postgresql 8.1, locks do not seem to be an issue, and the activity
runs about 10x faster than for postgresql 7.4.
- For EITHER database version, the scaling behavior is not the log(n)
behavior I'd expect (where n is the number of rows in the table), but
much more like linear performance.  That is, as the tables grow,
performance drops off precipitously.  For a primary table size up to
100,000 rows or so, I get somewhere around 700 transactions per minute,
on average.  Between 100,000 and 1,000,000 rows I got some 150
transactions per minute.  At about 1,500,000 rows I get about 40
transactions per minute.
- Access to a row in the secondary table (which right now has 13,000,000
rows in it) via an index that has extremely good discriminatory ability
on a busy machine takes about 90 seconds elapsed time at the moment -
which I feel is pretty high.

I tried increasing the shared_buffers parameter to see if it had any
impact on overall throughput.  It was moderately helpful going from the
small default value up to 8192, but less helpful when I increased it
beyond that.  Currently I have it set to 131072.
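
(For reference, shared_buffers in 8.1 counts 8 kB buffers, so the
postgresql.conf line in question,

   shared_buffers = 131072        # 131072 x 8 kB buffers = 1 GB

works out to roughly 1 GB of the 16 GB on the box.)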

Question:  Does anyone have any idea what bottleneck I am hitting?  An
index's performance should in theory scale as the log of the number of
rows - what am I missing here?

Thanks very much!
Karl

Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
Karl Wright wrote:
> Hi,
>
> I have an application which really exercises the performance of
> postgresql in a major way, and I am running into a performance
> bottleneck with Postgresql 8.1 that I do not yet understand.
>
> Here are the details:
>
> - There is a primary table, with some secondary tables
> - The principal transaction consists of a "SELECT...FOR UPDATE",
> followed by either an INSERT or an UPDATE on the primary table
> - INSERTs, DELETEs, and UPDATEs may occur on the secondary table
> depending on what happens with the primary table, for any given
> transaction.  The secondary table has about 10x the number of rows as
> the primary.
> - All operations are carefully chosen so that highly discriminatory
> indexes are used to locate the record(s) in question.  The execution
> plans show INDEX SCAN operations being done in all cases.
> - At any given time, there are up to 100 of these operations going on at
> once against the same database.
>
> What I am seeing:
>
> - In postgresql 7.4, the table activity seems to be gated by locks, and
> runs rather slowly except when the sizes of the tables are small.
> - In postgresql 8.1, locks do not seem to be an issue, and the activity
> runs about 10x faster than for postgresql 7.4.
> - For EITHER database version, the scaling behavior is not the log(n)
> behavior I'd expect (where n is the number of rows in the table), but
> much more like linear performance.  That is, as the tables grow,
> performance drops off precipitously.  For a primary table size up to
> 100,000 rows or so, I get somewhere around 700 transactions per minute,
> on average.  Between 100,000 and 1,000,000 rows I got some 150
> transactions per minute.  At about 1,500,000 rows I get about 40
> transactions per minute.
> - Access to a row in the secondary table (which right now has 13,000,000
> rows in it) via an index that has extremely good discriminatory ability
> on a busy machine takes about 90 seconds elapsed time at the moment -
> which I feel is pretty high.
>
> I tried increasing the shared_buffers parameter to see if it had any
> impact on overall throughput.  It was moderately helpful going from the
> small default value up to 8192, but less helpful when I increased it
> beyond that.  Currently I have it set to 131072.
>
> Question:  Does anyone have any idea what bottleneck I am hitting?  An
> index's performance should in theory scale as the log of the number of
> rows - what am I missing here?
>
> Thanks very much!
> Karl
>

I suppose I should also have noted that the postgresql processes that
are dealing with the transactions seem to be CPU bound.  Here's a "top"
from the running system:

top - 15:58:50 up 4 days,  4:45,  1 user,  load average: 17.14, 21.05, 22.46
Tasks: 194 total,  15 running, 177 sleeping,   0 stopped,   2 zombie
Cpu(s): 98.4% us,  1.5% sy,  0.0% ni,  0.0% id,  0.1% wa,  0.0% hi,  0.0% si
Mem:  16634256k total, 16280244k used,   354012k free,   144560k buffers
Swap:  8008360k total,       56k used,  8008304k free, 15071968k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15966 postgres  18   0 1052m 1.0g 1.0g R 66.5  6.3   0:18.64 postmaster
14683 postgres  17   0 1053m 1.0g 1.0g R 54.9  6.3   0:17.90 postmaster
17050 postgres  15   0 1052m  93m  90m S 50.3  0.6   0:06.42 postmaster
16816 postgres  18   0 1052m 166m 162m R 46.3  1.0   0:04.80 postmaster
16697 postgres  18   0 1052m 992m 988m R 42.3  6.1   0:15.49 postmaster
17272 postgres  16   0 1053m 277m 273m S 30.8  1.7   0:09.91 postmaster
16659 postgres  16   0 1052m 217m 213m R 29.8  1.3   0:06.60 postmaster
15509 postgres  18   0 1052m 1.0g 1.0g R 23.2  6.4   0:26.72 postmaster
16329 postgres  18   0 1052m 195m 191m R 16.9  1.2   0:05.54 postmaster
14019 postgres  20   0 1052m 986m 983m R 16.5  6.1   0:16.50 postmaster
17002 postgres  18   0 1052m  38m  35m R 12.6  0.2   0:02.98 postmaster
16960 postgres  15   0 1053m 453m 449m S  3.3  2.8   0:10.39 postmaster
16421 postgres  15   0 1053m 1.0g 1.0g S  2.3  6.2   0:23.59 postmaster
13588 postgres  15   0 1052m 1.0g 1.0g D  0.3  6.4   0:47.89 postmaster
24708 root      15   0  2268 1136  836 R  0.3  0.0   0:05.92 top
     1 root      15   0  1584  520  452 S  0.0  0.0   0:02.08 init

Karl




Re: Performance query about large tables, lots of concurrent access

From: Tom Lane
Karl Wright <kwright@metacarta.com> writes:
> - At any given time, there are up to 100 of these operations going on at
> once against the same database.

It sounds like your hardware is far past "maxed out".  Which is odd
since tables with a million or so rows are pretty small for modern
hardware.  What's the CPU and disk hardware here, exactly?  What do you
see when watching vmstat or iostat (as appropriate for OS, which you
didn't mention either)?

            regards, tom lane

> Question:  Does anyone have any idea what bottleneck I am hitting?  An
> index's performance should in theory scale as the log of the number of
> rows - what am I missing here?

    These can help people on the list help you:

    - Your hardware config (CPU, RAM, disk)?
    - EXPLAIN ANALYZE output from the slow queries?
    - VACUUM and ANALYZE: do you run them?  How often?
    - VACUUM VERBOSE output

    For huge bits of text with long lines, mail is awkward; upload them
to a web host or something.
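
    For instance (the query here is just a placeholder for one of the
actual slow statements), from psql:

    EXPLAIN ANALYZE
      SELECT ... FROM some_big_table WHERE ...;

    VACUUM VERBOSE;                  -- whole database
    VACUUM VERBOSE some_big_table;   -- or a single table

    plus "vmstat 1" or "iostat -x 1" at the OS level while the load is
running.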

Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
Tom Lane wrote:
> Karl Wright <kwright@metacarta.com> writes:
>> - At any given time, there are up to 100 of these operations going on at
>> once against the same database.
>
> It sounds like your hardware is far past "maxed out".  Which is odd
> since tables with a million or so rows are pretty small for modern
> hardware.  What's the CPU and disk hardware here, exactly?  What do you
> see when watching vmstat or iostat (as appropriate for OS, which you
> didn't mention either)?
>
>             regards, tom lane
>

Yes, I was surprised as well, which is why I decided to post.

The hardware is a Dell 2950 with two dual-core processors, 16 GB of
memory, and a RAID disk controller.  The operating system is Debian
Linux (sarge plus mods, currently using the Postgresql 8.1 backport).

Also, as I said before, I have done extensive query analysis and found
that the plans for the queries that are taking a long time are in fact
very reasonable.  Here's an example from the application log of a query
that took way more time than its plan would seem to indicate it should:

 >>>>>>
[2007-06-18 09:39:49,783]ERROR Found a query that took more than a
minute: [UPDATE intrinsiclink SET isnew=? WHERE ((jobid=? AND
childidhash=? AND childid=?)) AND (isnew=? OR isnew=?)]
[2007-06-18 09:39:49,783]ERROR   Parameter 0: 'B'
[2007-06-18 09:39:49,783]ERROR   Parameter 1: '1181766706097'
[2007-06-18 09:39:49,783]ERROR   Parameter 2:
'7E130F3B688687757187F1638D8776ECEF3009E0'
[2007-06-18 09:39:49,783]ERROR   Parameter 3:
'http://norwich.openguides.org/?action=index;index_type=category;index_value=Cafe;format=atom'
[2007-06-18 09:39:49,783]ERROR   Parameter 4: 'E'
[2007-06-18 09:39:49,783]ERROR   Parameter 5: 'N'
[2007-06-18 09:39:49,797]ERROR  Plan: Index Scan using i1181764142395 on
intrinsiclink  (cost=0.00..14177.29 rows=5 width=253)
[2007-06-18 09:39:49,797]ERROR  Plan:   Index Cond: ((jobid = $2) AND
((childidhash)::text = ($3)::text))
[2007-06-18 09:39:49,797]ERROR  Plan:   Filter: ((childid = ($4)::text)
AND ((isnew = ($5)::bpchar) OR (isnew = ($6)::bpchar)))
[2007-06-18 09:39:49,797]ERROR
<<<<<<
(The intrinsiclink table above is the "child table" I was referring to
earlier, with 13,000,000 rows at the moment.)

Overnight I shut things down and ran a VACUUM operation to see if that
might help.  I'll post again when I find out if indeed that changed any
performance numbers.  If not, I'll be able to post vmstat output at that
time.

Karl




Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
An overnight VACUUM helped things quite a bit.  I am now getting
throughput of around 75 transactions per minute, where before I was
getting 30.  Also, the CPU is no longer pegged, and the machine's load
average has dropped to an acceptable 6-10 from somewhere above 20.

While this is still pretty far off the best performance I saw (when the
tables were smaller), it's reasonably consistent with O(log(n))
performance at least.

This particular run lasted four days before a VACUUM became essential.
The symptom that indicates that VACUUM is needed seems to be that the
CPU usage of any given postgresql query skyrockets.  Is this essentially
correct?

Karl

Karl Wright wrote:
> Tom Lane wrote:
>> Karl Wright <kwright@metacarta.com> writes:
>>> - At any given time, there are up to 100 of these operations going on
>>> at once against the same database.
>>
>> It sounds like your hardware is far past "maxed out".  Which is odd
>> since tables with a million or so rows are pretty small for modern
>> hardware.  What's the CPU and disk hardware here, exactly?  What do you
>> see when watching vmstat or iostat (as appropriate for OS, which you
>> didn't mention either)?
>>
>>             regards, tom lane
>>
>
> Yes, I was surprised as well, which is why I decided to post.
>
> The hardware is a Dell 2950, two processor, dual-core each processor, 16
> GB memory, with a RAID disk controller.  The operating system is Debian
> Linux (sarge plus mods, currently using the Postgresql 8.1 backport).
>
> Also, as I said before, I have done extensive query analysis and found
> that the plans for the queries that are taking a long time are in fact
> very reasonable.  Here's an example from the application log of a query
> that took way more time than its plan would seem to indicate it should:
>
>  >>>>>>
> [2007-06-18 09:39:49,783]ERROR Found a query that took more than a
> minute: [UPDATE intrinsiclink SET isnew=? WHERE ((jobid=? AND
> childidhash=? AND childid=?)) AND (isnew=? OR isnew=?)]
> [2007-06-18 09:39:49,783]ERROR   Parameter 0: 'B'
> [2007-06-18 09:39:49,783]ERROR   Parameter 1: '1181766706097'
> [2007-06-18 09:39:49,783]ERROR   Parameter 2:
> '7E130F3B688687757187F1638D8776ECEF3009E0'
> [2007-06-18 09:39:49,783]ERROR   Parameter 3:
> 'http://norwich.openguides.org/?action=index;index_type=category;index_value=Cafe;format=atom'
>
> [2007-06-18 09:39:49,783]ERROR   Parameter 4: 'E'
> [2007-06-18 09:39:49,783]ERROR   Parameter 5: 'N'
> [2007-06-18 09:39:49,797]ERROR  Plan: Index Scan using i1181764142395 on
> intrinsiclink  (cost=0.00..14177.29 rows=5 width=253)
> [2007-06-18 09:39:49,797]ERROR  Plan:   Index Cond: ((jobid = $2) AND
> ((childidhash)::text = ($3)::text))
> [2007-06-18 09:39:49,797]ERROR  Plan:   Filter: ((childid = ($4)::text)
> AND ((isnew = ($5)::bpchar) OR (isnew = ($6)::bpchar)))
> [2007-06-18 09:39:49,797]ERROR
> <<<<<<
> (The intrinsiclink table above is the "child table" I was referring to
> earlier, with 13,000,000 rows at the moment.)
>
> Overnight I shut things down and ran a VACUUM operation to see if that
> might help.  I'll post again when I find out if indeed that changed any
> performance numbers.  If not, I'll be able to post vmstat output at that
> time.
>
> Karl
>
>
>
>


Re: Performance query about large tables, lots of concurrent access

From: Alvaro Herrera
Karl Wright wrote:

> This particular run lasted four days before a VACUUM became essential.
> The symptom that indicates that VACUUM is needed seems to be that the
> CPU usage of any given postgresql query skyrockets.  Is this essentially
> correct?

Are you saying you weren't in the habit of running VACUUM all the time?
If so, that's where the problem lies.

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad; jugar
al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
La Feria de las Tinieblas, R. Bradbury)

Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
Alvaro Herrera wrote:
> Karl Wright wrote:
>
>> This particular run lasted four days before a VACUUM became essential.
>> The symptom that indicates that VACUUM is needed seems to be that the
>> CPU usage of any given postgresql query skyrockets.  Is this essentially
>> correct?
>
> Are you saying you weren't used to run VACUUM all the time?  If so,
> that's where the problem lies.
>

Postgresql 7.4 VACUUM runs for so long that starting it with a cron job
even every 24 hours caused multiple instances of VACUUM to eventually be
running in my case.  So I tried to find a VACUUM schedule that permitted
each individual vacuum to finish before the next one started.  A vacuum
seemed to require 4-5 days with this particular database - or at least
it did for 7.4.  So I had the VACUUM schedule set to run every six days.

I will be experimenting with 8.1 to see how long it takes to complete a
vacuum under load conditions tonight.

Karl


Re: Performance query about large tables, lots of concurrent access

From: Gregory Stark
"Karl Wright" <kwright@metacarta.com> writes:

> This particular run lasted four days before a VACUUM became essential. The
> symptom that indicates that VACUUM is needed seems to be that the CPU usage of
> any given postgresql query skyrockets.  Is this essentially correct?

Postgres is designed on the assumption that VACUUM is run regularly. By
"regularly" we're talking of an interval usually on the order of hours, or
even less. On some workloads some tables need to be vacuumed every 5 minutes,
for example.

VACUUM doesn't require shutting down the system, it doesn't lock any tables or
otherwise prevent other jobs from making progress. It does add extra i/o but
there are knobs to throttle its i/o needs. The intention is that VACUUM run in
the background more or less continually using spare i/o bandwidth.
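
(The knobs in question are the cost-based vacuum delay settings, e.g. in
postgresql.conf; the values below are only a starting point, not a
recommendation:

   vacuum_cost_delay = 20      # ms to sleep each time the cost limit is reached
   vacuum_cost_limit = 200     # accumulated page-cost budget before sleeping

Autovacuum has its own autovacuum_vacuum_cost_delay/_limit parameters in 8.1.)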

The symptom of not having run vacuum regularly is that tables and indexes
bloat to larger sizes than necessary. If you run "VACUUM VERBOSE" it'll tell
you how much bloat your tables and indexes are suffering from (though the
output is a bit hard to interpret).

Table and index bloat slow things down but not generally by increasing cpu
usage. Usually they slow things down by causing queries to require more i/o.

It's only UPDATES and DELETES that create garbage tuples that need to be
vacuumed though. If some of your tables are mostly insert-only they might
not need to be vacuumed as frequently, or at all.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Performance query about large tables, lots of concurrent access

From: Alvaro Herrera
Karl Wright wrote:
> Alvaro Herrera wrote:
> >Karl Wright wrote:
> >
> >>This particular run lasted four days before a VACUUM became essential.
> >>The symptom that indicates that VACUUM is needed seems to be that the
> >>CPU usage of any given postgresql query skyrockets.  Is this essentially
> >>correct?
> >
> >Are you saying you weren't used to run VACUUM all the time?  If so,
> >that's where the problem lies.
>
> Postgresql 7.4 VACUUM runs for so long that starting it with a cron job
> even every 24 hours caused multiple instances of VACUUM to eventually be
> running in my case.  So I tried to find a VACUUM schedule that permitted
> each individual vacuum to finish before the next one started.  A vacuum
> seemed to require 4-5 days with this particular database - or at least
> it did for 7.4.  So I had the VACUUM schedule set to run every six days.

How large is the database?  I must admit I have never seen a database
that took 4 days to vacuum.  This could mean that your database is
humongous, or that the vacuum strategy is wrong for some reason.

You know that you can run vacuum on particular tables, right?  It would
probably be a good idea to run vacuum on the most updated tables, and
leave alone those that are rarely or never updated (hopefully the biggest;
this would mean that an almost-complete vacuum run would take much less
than a whole day).
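
For example (the table names are whichever ones are most heavily
updated; one command per hot table, e.g. from cron):

  VACUUM VERBOSE ANALYZE intrinsiclink;
  VACUUM VERBOSE ANALYZE <some_other_hot_table>;

The big, mostly-static tables can then go on a much longer cycle.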

Or maybe vacuum was stuck waiting on a lock somewhere.

> I will be experimenting with 8.1 to see how long it takes to complete a
> vacuum under load conditions tonight.

You can also turn autovacuum on in 8.1, which might help quite a bit
with finding a good vacuum schedule (you would need to tune it a bit,
of course).

In any case, if you are struggling for performance you are strongly
advised to upgrade to 8.2.

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/DXLWNGRJD34J
"No single strategy is always right (Unless the boss says so)"
                                                  (Larry Wall)

Re: Performance query about large tables, lots of concurrent access

From: Mark Lewis
On Tue, 2007-06-19 at 09:37 -0400, Karl Wright wrote:
> Alvaro Herrera wrote:
> > Karl Wright wrote:
> >
> >> This particular run lasted four days before a VACUUM became essential.
> >> The symptom that indicates that VACUUM is needed seems to be that the
> >> CPU usage of any given postgresql query skyrockets.  Is this essentially
> >> correct?
> >
> > Are you saying you weren't used to run VACUUM all the time?  If so,
> > that's where the problem lies.
> >
>
> Postgresql 7.4 VACUUM runs for so long that starting it with a cron job
> even every 24 hours caused multiple instances of VACUUM to eventually be
> running in my case.  So I tried to find a VACUUM schedule that permitted
> each individual vacuum to finish before the next one started.  A vacuum
> seemed to require 4-5 days with this particular database - or at least
> it did for 7.4.  So I had the VACUUM schedule set to run every six days.
>
> I will be experimenting with 8.1 to see how long it takes to complete a
> vacuum under load conditions tonight.

The longer you wait between vacuuming, the longer each vacuum is going
to take.

There is of course a point of diminishing returns for vacuum where this
no longer holds true; if you vacuum too frequently the overhead of
running the vacuum will dominate the running time.  But 6 days for a
busy database is probably way, way, way past that threshold.

Generally, the busier the database the more frequently you need to
vacuum, not less.  If your update/delete transaction rate is high enough
then you may need to vacuum multiple times per hour, at least on some
tables.  Playing with autovacuum might help you out here, because it can
look at how badly a vacuum is needed and adjust the vacuuming rate on
the fly on a per-table basis.  Be sure to look up some reasonable
autovacuum settings first; the 8.1 defaults aren't.
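
A minimal sketch of that in postgresql.conf for 8.1 (the thresholds and
scale factors here are only illustrative and need tuning for a workload
like this one):

  stats_start_collector = on             # both required for autovacuum in 8.1
  stats_row_level = on
  autovacuum = on
  autovacuum_naptime = 60                # seconds between checks
  autovacuum_vacuum_scale_factor = 0.2   # vacuum after ~20% of a table changes
  autovacuum_vacuum_threshold = 1000
  autovacuum_vacuum_cost_delay = 20      # throttle autovacuum's i/o a bit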

-- Mark

Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
Gregory Stark wrote:
> "Karl Wright" <kwright@metacarta.com> writes:
>
>> This particular run lasted four days before a VACUUM became essential. The
>> symptom that indicates that VACUUM is needed seems to be that the CPU usage of
>> any given postgresql query skyrockets.  Is this essentially correct?
>
> Postgres is designed on the assumption that VACUUM is run regularly. By
> "regularly" we're talking of an interval usually on the order of hours, or
> even less. On some workloads some tables need to be vacuumed every 5 minutes,
> for example.

Fine - but what if the previous vacuum is still in progress, and does
not finish in 5 minutes?

>
> VACUUM doesn't require shutting down the system, it doesn't lock any tables or
> otherwise prevent other jobs from making progress. It does add extra i/o but
> there are knobs to throttle its i/o needs. The intention is that VACUUM run in
> the background more or less continually using spare i/o bandwidth.
>

This spare bandwidth is apparently hard to come by in my particular
application.  That's the only way I can reconcile your information with
it taking 4 days to complete.

> The symptom of not having run vacuum regularly is that tables and indexes
> bloat to larger sizes than necessary. If you run "VACUUM VERBOSE" it'll tell
> you how much bloat your tables and indexes are suffering from (though the
> output is a bit hard to interpret).
>
> Table and index bloat slow things down but not generally by increasing cpu
> usage. Usually they slow things down by causing queries to require more i/o.
>

Yes, that's what I understood, which is why I was puzzled by the effects
I was seeing.

> It's only UPDATES and DELETES that create garbage tuples that need to be
> vacuumed though. If some of your tables are mostly insert-only they might
> not need to be vacuumed as frequently, or at all.
>

Well, the smaller tables don't change much, but the bigger tables have a
lively mix of inserts and updates, so I would expect these would need
vacuuming often.

I'll post again when I can find a vacuum schedule that seems to work.

Karl

Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
Alvaro Herrera wrote:
> Karl Wright wrote:
>> Alvaro Herrera wrote:
>>> Karl Wright wrote:
>>>
>>>> This particular run lasted four days before a VACUUM became essential.
>>>> The symptom that indicates that VACUUM is needed seems to be that the
>>>> CPU usage of any given postgresql query skyrockets.  Is this essentially
>>>> correct?
>>> Are you saying you weren't used to run VACUUM all the time?  If so,
>>> that's where the problem lies.
>> Postgresql 7.4 VACUUM runs for so long that starting it with a cron job
>> even every 24 hours caused multiple instances of VACUUM to eventually be
>> running in my case.  So I tried to find a VACUUM schedule that permitted
>> each individual vacuum to finish before the next one started.  A vacuum
>> seemed to require 4-5 days with this particular database - or at least
>> it did for 7.4.  So I had the VACUUM schedule set to run every six days.
>
> How large is the database?  I must admit I have never seen a database
> that took 4 days to vacuum.  This could mean that your database is
> humongous, or that the vacuum strategy is wrong for some reason.
>

The database is humongous, and the machine is under intense load.  On the
instance where this long vacuum occurred, there were several large
tables - one with 7,000,000 rows, one with 14,000,000, one with
140,000,000, and one with 250,000,000.

> You know that you can run vacuum on particular tables, right?  It would
> be probably a good idea to run vacuum on the most updated tables, and
> leave alone those that are not or little updated (hopefully the biggest;
> this would mean that an almost-complete vacuum run would take much less
> than a whole day).

Yeah, sorry, that doesn't apply here.

>
> Or maybe vacuum was stuck waiting on a lock somewhere.
>
>> I will be experimenting with 8.1 to see how long it takes to complete a
>> vacuum under load conditions tonight.
>
> You can also turn autovacuum on in 8.1, which might help quite a bit
> with finding a good vacuum schedule (you would need a bit of tuning it
> though, of course).
>
> In any case, if you are struggling for performance you are strongly
> advised to upgrade to 8.2.
>

Ok - that's something I should be able to do once we can go to debian's
etch release.  There's a backport of 8.2 available there.  (The one for
sarge is still considered 'experimental').

Karl


Re: Performance query about large tables, lots of concurrent access

From: Bill Moran
In response to Karl Wright <kwright@metacarta.com>:

> Alvaro Herrera wrote:
> > Karl Wright wrote:
> >> Alvaro Herrera wrote:
> >>> Karl Wright wrote:
> >>>
> >>>> This particular run lasted four days before a VACUUM became essential.
> >>>> The symptom that indicates that VACUUM is needed seems to be that the
> >>>> CPU usage of any given postgresql query skyrockets.  Is this essentially
> >>>> correct?
> >>> Are you saying you weren't used to run VACUUM all the time?  If so,
> >>> that's where the problem lies.
> >> Postgresql 7.4 VACUUM runs for so long that starting it with a cron job
> >> even every 24 hours caused multiple instances of VACUUM to eventually be
> >> running in my case.  So I tried to find a VACUUM schedule that permitted
> >> each individual vacuum to finish before the next one started.  A vacuum
> >> seemed to require 4-5 days with this particular database - or at least
> >> it did for 7.4.  So I had the VACUUM schedule set to run every six days.
> >
> > How large is the database?  I must admit I have never seen a database
> > that took 4 days to vacuum.  This could mean that your database is
> > humongous, or that the vacuum strategy is wrong for some reason.
>
> The database is humongous, and the machine is under intense load.  On the
> instance where this long vacuum occurred, there were several large
> tables - one with 7,000,000 rows, one with 14,000,000, one with
> 140,000,000, and one with 250,000,000.

Don't rule out the possibility that the only way to fix this _might_ be to
throw more hardware at it.  Proper configuration can buy you a lot, but if
your usage is exceeding the available bandwidth of the IO subsystem, the
only way you're going to get better performance is to put in a faster IO
subsystem.

> > You know that you can run vacuum on particular tables, right?  It would
> > be probably a good idea to run vacuum on the most updated tables, and
> > leave alone those that are not or little updated (hopefully the biggest;
> > this would mean that an almost-complete vacuum run would take much less
> > than a whole day).
>
> Yeah, sorry, that doesn't apply here.

Why not?  I see no reason why an appropriate autovacuum schedule would not
apply to your scenario.  I'm not saying it does, only that your response
does not indicate that it doesn't, and thus I'm concerned that you're
writing autovacuum off without proper research.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
Bill Moran wrote:
> In response to Karl Wright <kwright@metacarta.com>:
>
>> Alvaro Herrera wrote:
>>> Karl Wright wrote:
>>>> Alvaro Herrera wrote:
>>>>> Karl Wright wrote:
>>>>>
>>>>>> This particular run lasted four days before a VACUUM became essential.
>>>>>> The symptom that indicates that VACUUM is needed seems to be that the
>>>>>> CPU usage of any given postgresql query skyrockets.  Is this essentially
>>>>>> correct?
>>>>> Are you saying you weren't used to run VACUUM all the time?  If so,
>>>>> that's where the problem lies.
>>>> Postgresql 7.4 VACUUM runs for so long that starting it with a cron job
>>>> even every 24 hours caused multiple instances of VACUUM to eventually be
>>>> running in my case.  So I tried to find a VACUUM schedule that permitted
>>>> each individual vacuum to finish before the next one started.  A vacuum
>>>> seemed to require 4-5 days with this particular database - or at least
>>>> it did for 7.4.  So I had the VACUUM schedule set to run every six days.
>>> How large is the database?  I must admit I have never seen a database
>>> that took 4 days to vacuum.  This could mean that your database is
>>> humongous, or that the vacuum strategy is wrong for some reason.
>> The database is humongous, and the machine is under intense load.  On the
>> instance where this long vacuum occurred, there were several large
>> tables - one with 7,000,000 rows, one with 14,000,000, one with
>> 140,000,000, and one with 250,000,000.
>
> Don't rule out the possibility that the only way to fix this _might_ be to
> throw more hardware at it.  Proper configuration can buy you a lot, but if
> your usage is exceeding the available bandwidth of the IO subsystem, the
> only way you're going to get better performance is to put in a faster IO
> subsystem.
>
>>> You know that you can run vacuum on particular tables, right?  It would
>>> be probably a good idea to run vacuum on the most updated tables, and
>>> leave alone those that are not or little updated (hopefully the biggest;
>>> this would mean that an almost-complete vacuum run would take much less
>>> than a whole day).
>> Yeah, sorry, that doesn't apply here.
>
> Why not?  I see no reason why an appropriate autovacuum schedule would not
> apply to your scenario.  I'm not saying it does, only that your response
> does not indicate that it doesn't, and thus I'm concerned that you're
> writing autovacuum off without proper research.
>

I'm not writing off autovacuum - just the concept that the large tables
aren't the ones that are changing.  Unfortunately, they *are* the most
dynamically updated.

Karl


Re: Performance query about large tables, lots of concurrent access

From: Gregory Stark
"Karl Wright" <kwright@metacarta.com> writes:

> Fine - but what if the previous vacuum is still in progress, and does not
> finish in 5 minutes?

Yes, well, there are problems with this design but the situation is already
much improved in 8.2 and there are more improvements on the horizon.

But it's likely that much of your pain is artificial here and once your
database is cleaned up a bit more it will be easier to manage.

> Well, the smaller tables don't change much, but the bigger tables have a lively
> mix of inserts and updates, so I would expect these would need vacuuming often.

Hm, I wonder if you're running into a performance bug that was fixed sometime
back around then. It involved having large numbers of tuples indexed with the
same key value. Every search for a single record required linearly searching
through the entire list of values.

If you have thousands of updates against the same tuple between vacuums you'll
have the same kind of situation and queries against that key will indeed
require lots of cpu.

To help any more you'll have to answer the basic questions like how many rows
are in the tables that take so long to vacuum, and how large are they on disk.
On 7.4 I think the best way to get the table size actually is by doing
"select relfilenode from pg_class where relname = 'tablename'" and then looking
in the postgres directory for the files in base/*/<relfilenode>*

The best information would be to do vacuum verbose and report the data it
prints out.
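
(On 8.1 the on-disk sizes can also be pulled straight from SQL, e.g.:

  SELECT pg_size_pretty(pg_relation_size('intrinsiclink')),
         pg_size_pretty(pg_total_relation_size('intrinsiclink'));

where the second number includes indexes and TOAST data.)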

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Performance query about large tables, lots of concurrent access

From: Kurt Overberg
A useful utility that I've found is pgFouine.  It has an option to
analyze VACUUM VERBOSE logs, and it has been instrumental in helping me
figure out what's been going on with my VACUUM that is taking 4+
hours, specifically tracking the tables that are taking the longest.
I highly recommend checking it out.  It would also perhaps be a good
idea, rather than simply starting a vacuum every 6 days, to set things
up so that a new vacuum starts as soon as the previous one finishes
(using a lock file or something that is polled every few hours or
minutes).  This way, a vacuum kicks off right when the other one
finishes, with each run hopefully getting shorter over time.
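
A crude sketch of that idea as a shell loop (the database name and log
path are placeholders):

  #!/bin/sh
  # each vacuum run starts as soon as the previous one finishes
  while true; do
      vacuumdb --analyze --verbose mydb >> /var/log/vacuum.log 2>&1
      sleep 300    # short breather between runs
  done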

Hope this helps...

/kurt


On Jun 19, 2007, at 10:06 AM, Karl Wright wrote:

> Alvaro Herrera wrote:
>> Karl Wright wrote:
>>> Alvaro Herrera wrote:
>>>> Karl Wright wrote:
>>>>
>>>>> This particular run lasted four days before a VACUUM became
>>>>> essential. The symptom that indicates that VACUUM is needed
>>>>> seems to be that the CPU usage of any given postgresql query
>>>>> skyrockets.  Is this essentially correct?
>>>> Are you saying you weren't used to run VACUUM all the time?  If so,
>>>> that's where the problem lies.
>>> Postgresql 7.4 VACUUM runs for so long that starting it with a
>>> cron job even every 24 hours caused multiple instances of VACUUM
>>> to eventually be running in my case.  So I tried to find a VACUUM
>>> schedule that permitted each individual vacuum to finish before
>>> the next one started.  A vacuum seemed to require 4-5 days with
>>> this particular database - or at least it did for 7.4.  So I had
>>> the VACUUM schedule set to run every six days.
>> How large is the database?  I must admit I have never seen a database
>> that took 4 days to vacuum.  This could mean that your database is
>> humongous, or that the vacuum strategy is wrong for some reason.
>
> The database is humongous, and the machine is under intense load.
> On the instance where this long vacuum occurred, there were several
> large tables - one with 7,000,000 rows, one with 14,000,000, one
> with 140,000,000, and one with 250,000,000.
>
>> You know that you can run vacuum on particular tables, right?  It
>> would
>> be probably a good idea to run vacuum on the most updated tables, and
>> leave alone those that are not or little updated (hopefully the
>> biggest;
>> this would mean that an almost-complete vacuum run would take much
>> less
>> than a whole day).
>
> Yeah, sorry, that doesn't apply here.
>
>> Or maybe vacuum was stuck waiting on a lock somewhere.
>>> I will be experimenting with 8.1 to see how long it takes to
>>> complete a vacuum under load conditions tonight.
>> You can also turn autovacuum on in 8.1, which might help quite a bit
>> with finding a good vacuum schedule (you would need a bit of
>> tuning it
>> though, of course).
>> In any case, if you are struggling for performance you are strongly
>> advised to upgrade to 8.2.
>
> Ok - that's something I should be able to do once we can go to
> debian's etch release.  There's a backport of 8.2 available there.
> (The one for sarge is still considered 'experimental').
>
> Karl
>
>


Re: Performance query about large tables, lots of concurrent access

From: Tom Lane
Karl Wright <kwright@metacarta.com> writes:
> Also, as I said before, I have done extensive query analysis and found
> that the plans for the queries that are taking a long time are in fact
> very reasonable.  Here's an example from the application log of a query
> that took way more time than its plan would seem to indicate it should:

> [2007-06-18 09:39:49,797]ERROR  Plan: Index Scan using i1181764142395 on
> intrinsiclink  (cost=0.00..14177.29 rows=5 width=253)
> [2007-06-18 09:39:49,797]ERROR  Plan:   Index Cond: ((jobid = $2) AND
> ((childidhash)::text = ($3)::text))
> [2007-06-18 09:39:49,797]ERROR  Plan:   Filter: ((childid = ($4)::text)
> AND ((isnew = ($5)::bpchar) OR (isnew = ($6)::bpchar)))

I see the discussion thread has moved on to consider lack-of-vacuuming
as the main problem, but I didn't want to let this pass without
comment.  The above plan is not necessarily good at all --- it depends
on how many rows are selected by the index condition alone (ie, jobid
and childidhash) versus how many are selected by the index and filter
conditions.  If the index retrieves many rows, most of which are
eliminated by the filter condition, it's still gonna take a long time.

In this case it looks like the planner is afraid that that's exactly
what will happen --- a cost of 14177 suggests that several thousand row
fetches are expected to happen, and yet it's only predicting 5 rows out
after the filter.  It's using this plan anyway because it has no better
alternative, but you should think about whether a different index
definition would help.
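
For instance (purely a sketch, since the right choice depends on the
actual selectivities involved; the index name is made up), an index that
also includes childid would let that comparison be evaluated as part of
the index condition rather than as a post-fetch filter:

  CREATE INDEX i_intrinsiclink_job_child
      ON intrinsiclink (jobid, childidhash, childid);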

            regards, tom lane

Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
Tom Lane wrote:
> Karl Wright <kwright@metacarta.com> writes:
>> Also, as I said before, I have done extensive query analysis and found
>> that the plans for the queries that are taking a long time are in fact
>> very reasonable.  Here's an example from the application log of a query
>> that took way more time than its plan would seem to indicate it should:
>
>> [2007-06-18 09:39:49,797]ERROR  Plan: Index Scan using i1181764142395 on
>> intrinsiclink  (cost=0.00..14177.29 rows=5 width=253)
>> [2007-06-18 09:39:49,797]ERROR  Plan:   Index Cond: ((jobid = $2) AND
>> ((childidhash)::text = ($3)::text))
>> [2007-06-18 09:39:49,797]ERROR  Plan:   Filter: ((childid = ($4)::text)
>> AND ((isnew = ($5)::bpchar) OR (isnew = ($6)::bpchar)))
>
> I see the discussion thread has moved on to consider lack-of-vacuuming
> as the main problem, but I didn't want to let this pass without
> comment.  The above plan is not necessarily good at all --- it depends
> on how many rows are selected by the index condition alone (ie, jobid
> and childidhash) versus how many are selected by the index and filter
> conditions.  If the index retrieves many rows, most of which are
> eliminated by the filter condition, it's still gonna take a long time.
>
> In this case it looks like the planner is afraid that that's exactly
> what will happen --- a cost of 14177 suggests that several thousand row
> fetches are expected to happen, and yet it's only predicting 5 rows out
> after the filter.  It's using this plan anyway because it has no better
> alternative, but you should think about whether a different index
> definition would help.
>
>             regards, tom lane
>

Well, that's odd, because the hash in question is the SHA-1 hash of a
URL.  There's essentially one row per URL in this table.  Even with a
large table I would not expect more than a couple of collisions at most.

How does it arrive at that estimate of 14,000?

Karl


Re: Performance query about large tables, lots of concurrent access

From: Tom Lane
Karl Wright <kwright@metacarta.com> writes:
> [2007-06-18 09:39:49,797]ERROR  Plan: Index Scan using i1181764142395 on
> intrinsiclink  (cost=0.00..14177.29 rows=5 width=253)
> [2007-06-18 09:39:49,797]ERROR  Plan:   Index Cond: ((jobid = $2) AND
> ((childidhash)::text = ($3)::text))
> [2007-06-18 09:39:49,797]ERROR  Plan:   Filter: ((childid = ($4)::text)
> AND ((isnew = ($5)::bpchar) OR (isnew = ($6)::bpchar)))

>> In this case it looks like the planner is afraid that that's exactly
>> what will happen --- a cost of 14177 suggests that several thousand row
>> fetches are expected to happen, and yet it's only predicting 5 rows out
>> after the filter.

> Well, that's odd, because the hash in question that it is using is the
> SHA-1 hash of a URL.  There's essentially one row per URL in this table.

What about isnew?

Also, how many rows do *you* expect out of the query?  The planner is
not going to be aware of the hashed relationship between childidhash
and childid --- it'll think those are independent conditions which they
evidently aren't.  So it may be that the query really does retrieve
thousands of rows, and the rows=5 estimate is bogus because it's
double-counting the selectivity of the childid condition.

            regards, tom lane

Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
Tom Lane wrote:
> Karl Wright <kwright@metacarta.com> writes:
>> [2007-06-18 09:39:49,797]ERROR  Plan: Index Scan using i1181764142395 on
>> intrinsiclink  (cost=0.00..14177.29 rows=5 width=253)
>> [2007-06-18 09:39:49,797]ERROR  Plan:   Index Cond: ((jobid = $2) AND
>> ((childidhash)::text = ($3)::text))
>> [2007-06-18 09:39:49,797]ERROR  Plan:   Filter: ((childid = ($4)::text)
>> AND ((isnew = ($5)::bpchar) OR (isnew = ($6)::bpchar)))
>
>>> In this case it looks like the planner is afraid that that's exactly
>>> what will happen --- a cost of 14177 suggests that several thousand row
>>> fetches are expected to happen, and yet it's only predicting 5 rows out
>>> after the filter.
>
>> Well, that's odd, because the hash in question that it is using is the
>> SHA-1 hash of a URL.  There's essentially one row per URL in this table.
>
> What about isnew?

Isnew is simply a flag which I want to set for all rows that belong to
this particular child, but only if it's one of two particular values.

>
> Also, how many rows do *you* expect out of the query?  The planner is
> not going to be aware of the hashed relationship between childidhash
> and childid --- it'll think those are independent conditions which they
> evidently aren't.  So it may be that the query really does retrieve
> thousands of rows, and the rows=5 estimate is bogus because it's
> double-counting the selectivity of the childid condition.
>

This can vary, but I expect there to be, on average, a few dozen rows
returned from the overall query.  The only way the index-condition part
of the query can be returning thousands of rows would be if: (a) there
is really a lot of data of this kind, or (b) the hash function is
basically not doing its job and there are thousands of collisions occurring.

In fact, that's not the case. In psql I just did the following analysis:

 >>>>>>
metacarta=> explain select count(*) from intrinsiclink where
jobid=1181766706097 and
childidhash='7E130F3B688687757187F1638D8776ECEF3009E0';
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=14992.23..14992.24 rows=1 width=0)
    ->  Index Scan using i1181764142395 on intrinsiclink
(cost=0.00..14971.81 rows=8167 width=0)
          Index Cond: ((jobid = 1181766706097::bigint) AND
((childidhash)::text = '7E130F3B688687757187F1638D8776ECEF3009E0'::text))
(3 rows)

metacarta=> select count(*) from intrinsiclink where jobid=1181766706097
and childidhash='7E130F3B688687757187F1638D8776ECEF3009E0';
  count
-------
      0
(1 row)
<<<<<<

Granted this is well after-the-fact, but you can see that the cost
estimate is wildly wrong in this case.

I did an ANALYZE on that table and repeated the explain, and got this:

 >>>>>>
metacarta=> analyze intrinsiclink;
ANALYZE
metacarta=> explain select count(*) from intrinsiclink where
jobid=1181766706097 and
childidhash='7E130F3B688687757187F1638D8776ECEF3009E0';
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=15276.36..15276.37 rows=1 width=0)
    ->  Index Scan using i1181764142395 on intrinsiclink
(cost=0.00..15255.53 rows=8333 width=0)
          Index Cond: ((jobid = 1181766706097::bigint) AND
((childidhash)::text = '7E130F3B688687757187F1638D8776ECEF3009E0'::text))
(3 rows)
<<<<<<

... even more wildly wrong.

Karl

>             regards, tom lane
>


Re: Performance query about large tables, lots of concurrent access

From: Francisco Reyes
Gregory Stark writes:

> VACUUM doesn't require shutting down the system, it doesn't lock any tables or
> otherwise prevent other jobs from making progress. It does add extra i/o but

In addition to what Gregory pointed out, you may want to also consider using
Autovacuum. That may also help.

Re: Performance query about large tables, lots of concurrent access

From: Francisco Reyes
Alvaro Herrera writes:

> How large is the database?  I must admit I have never seen a database
> that took 4 days to vacuum.  This could mean that your database is
> humongous, or that the vacuum strategy is wrong for some reason.

Especially with 16GB of RAM.

I have a setup with several databases (the largest of which is a 1TB database)
and I do a nightly vacuum analyze for ALL databases. It takes about 22
hours. And this is with constant updates to the large 1TB database. This is
with Postgresql 8.1.3

Re: Performance query about large tables, lots of concurrent access

From: Gregory Stark
"Karl Wright" <kwright@metacarta.com> writes:

>> In this case it looks like the planner is afraid that that's exactly
>> what will happen --- a cost of 14177 suggests that several thousand row
>> fetches are expected to happen, and yet it's only predicting 5 rows out
>> after the filter.  It's using this plan anyway because it has no better
>> alternative, but you should think about whether a different index
>> definition would help.

Another index won't help if the reason the cost is so high isn't because the
index isn't very selective but because there are lots of dead tuples.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Performance query about large tables, lots of concurrent access

From: Tom Lane
Karl Wright <kwright@metacarta.com> writes:
> I did an ANALYZE on that table and repeated the explain, and got this:
> ...
> ... even more wildly wrong.

Hmm.  You might need to increase the statistics target for your larger
tables.  It's probably not a big deal for queries like this one, but I'm
worried that you may be getting bad plans for complicated joins.

            regards, tom lane

Re: Performance query about large tables, lots of concurrent access

From: Francisco Reyes
Karl Wright writes:

> I'm not writing off autovacuum - just the concept that the large tables
> aren't the ones that are changing.  Unfortunately, they *are* the most
> dynamically updated.

Would it be possible for you to partition the tables?  By date, or in
some other fashion, to try to have some tables not be affected by the
updates/inserts?

I am in the process of breaking up a DB to have tables by date.  Our
historical data never changes.

Also, what is the physical size of all this data?
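
A minimal sketch of date-based partitioning with 8.1's table inheritance
and constraint exclusion (table names, columns, and ranges are purely
illustrative):

  CREATE TABLE events (id bigint, created date, payload text);
  CREATE TABLE events_2007_06
      (CHECK (created >= DATE '2007-06-01' AND created < DATE '2007-07-01'))
      INHERITS (events);
  -- repeat per month; with constraint_exclusion = on, queries filtering
  -- on "created" only touch the matching children, and partitions that
  -- never change can be left out of the vacuum schedule entirely.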


Re: Performance query about large tables, lots of concurrent access

From: Alvaro Herrera
Karl Wright wrote:

> I did an ANALYZE on that table and repeated the explain, and got this:
>
> >>>>>>
> metacarta=> analyze intrinsiclink;
> ANALYZE
> metacarta=> explain select count(*) from intrinsiclink where
> jobid=1181766706097 and
> childidhash='7E130F3B688687757187F1638D8776ECEF3009E0';
>                                                              QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=15276.36..15276.37 rows=1 width=0)
>    ->  Index Scan using i1181764142395 on intrinsiclink
> (cost=0.00..15255.53 rows=8333 width=0)
>          Index Cond: ((jobid = 1181766706097::bigint) AND
> ((childidhash)::text = '7E130F3B688687757187F1638D8776ECEF3009E0'::text))
> (3 rows)
> <<<<<<
>
> ... even more wildly wrong.

Interesting.  What is the statistics target for this table?  Try
increasing it, with ALTER TABLE ... SET STATISTICS, rerun analyze, and
try again.
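
For example (the column is taken from the plan above; the target value
is just a starting guess, the 8.1 default being 10):

  ALTER TABLE intrinsiclink ALTER COLUMN childidhash SET STATISTICS 100;
  ANALYZE intrinsiclink;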

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Performance query about large tables, lots of concurrent access

From: Gregory Stark
"Gregory Stark" <stark@enterprisedb.com> writes:

> "Karl Wright" <kwright@metacarta.com> writes:
>
>>> In this case it looks like the planner is afraid that that's exactly
>>> what will happen --- a cost of 14177 suggests that several thousand row
>>> fetches are expected to happen, and yet it's only predicting 5 rows out
>>> after the filter.  It's using this plan anyway because it has no better
>>> alternative, but you should think about whether a different index
>>> definition would help.
>
> Another index won't help if the reason the cost is so high isn't because the
> index isn't very selective but because there are lots of dead tuples.

Sorry, I didn't mean to say that was definitely the case, only that having
bloated tables with lots of dead index pointers could have similar symptoms
because the query still has to follow all those index pointers.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Performance query about large tables, lots of concurrent access

From: Heikki Linnakangas
Francisco Reyes wrote:

> I have a setup with several databases (the largest of which is 1TB
> database) and I do a nightly vacuum analyze for ALL databases. It takes
> about 22 hours. And this is with constant updates to the large 1TB
> database. This is with Postgresql 8.1.3

22h nightly? Wow, you have long nights ;-).

On a serious note, the index vacuum improvements in 8.2 might help you
to cut that down. You seem to be happy with your setup, but I thought
I'd mention it..

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Performance query about large tables, lots of concurrent access

From: Francisco Reyes
Heikki Linnakangas writes:

> On a serious note, the index vacuum improvements in 8.2 might help you
> to cut that down. You seem to be happy with your setup, but I thought
> I'd mention it..

I am really, really trying.. to go to 8.2.
I have a thread on "general" going on for about a week.
I am unable to restore a database on 8.2.4.. on a particular machine.
Don't know if the issue is the machine configuration or whether I have found
a Postgresql bug.

The plan is to copy the data over and work on migrating to the second
machine.

Also we are splitting the database so historical information (which never
changes for us) will be in one DB and all the active/current data will be on
another. This way our backups/vacuums will be faster.

Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
Francisco Reyes wrote:
> Alvaro Herrera writes:
>
>> How large is the database?  I must admit I have never seen a database
>> that took 4 days to vacuum.  This could mean that your database is
>> humongous, or that the vacuum strategy is wrong for some reason.
>
> Especially with 16GB of RAM.
>
> I have a setup with several databases (the largest of which is 1TB
> database) and I do a nightly vacuum analyze for ALL databases. It takes
> about 22 hours. And this is with constant updates to the large 1TB
> database. This is with Postgresql 8.1.3

Okay - I started a VACUUM with the 8.1 database yesterday morning,
having the database remain under load.  As of 12:30 today (~27 hours),
the original VACUUM was still running.  At that point:

(a) I had to shut it down anyway because I needed to do another
experiment having to do with database export/import performance, and
(b) the performance of individual queries had already degraded
significantly in the same manner as what I'd seen before.

So, I guess this means that there's no way I can keep the database
adequately vacuumed with my anticipated load and hardware.  One thing or
the other will have to change.

Is the VACUUM in 8.2 significantly faster than the one in 8.1?  Or, is
the database less sensitive performance-wise to delayed VACUUM commands?

Karl

Re: Performance query about large tables, lots of concurrent access

From: Heikki Linnakangas
Karl Wright wrote:
> So, I guess this means that there's no way I can keep the database
> adequately vacuumed with my anticipated load and hardware.  One thing or
> the other will have to change.

Have you checked your maintenance_work_mem setting? If it's not large
enough, vacuum will need to scan through all indexes multiple times
instead of just once. With 16 GB of RAM you should set it to something
like 2GB I think, or even more.
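
(In 8.1 that setting is an integer number of kilobytes, so something
along the lines of

  maintenance_work_mem = 2097152    # ~2 GB, used by VACUUM, CREATE INDEX, ...

in postgresql.conf, followed by a reload.)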

> Is the VACUUM in 8.2 significantly faster than the one in 8.1?

Yes, in particular if you have a lot of indexes. Scanning the indexes
was done in index page order, which in the worst case means random I/O, and
we used to do an extra scan of all index pages to collect empty ones.
Now it's all done as a single sequential pass.

>  Or, is
> the database less sensitive performance-wise to delayed VACUUM commands?

No.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Performance query about large tables, lots of concurrent access

From: Alvaro Herrera
Karl Wright wrote:

> (b) the performance of individual queries had already degraded
> significantly in the same manner as what I'd seen before.

You didn't answer whether you had smaller, more frequently updated
tables that need more vacuuming.  This comment makes me think you do.  I
think what you should be looking at is whether you can forget vacuuming
the whole database in one go, and make it more granular.

--
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)

Re: Performance query about large tables, lots of concurrent access

From: Francisco Reyes
Karl Wright writes:

> Okay - I started a VACUUM with the 8.1 database yesterday morning,
> having the database remain under load.  As of 12:30 today (~27 hours),
> the original VACUUM was still running.  At that point:

I don't recall if you said it already, but what is your
maintenance_work_mem?

> (a) I had to shut it down anyway because I needed to do another
> experiment having to do with database export/import performance, and

Do you know which tables change the most often?
Have you tried vacuuming those one at a time to see how long they
take?

> (b) the performance of individual queries had already degraded
> significantly in the same manner as what I'd seen before.

If you have a lot of inserts perhaps you can do analyze more often also.

> So, I guess this means that there's no way I can keep the database
> adequately vacuumed with my anticipated load and hardware.

It is a possibility, but you could consider other strategies.. totally
dependent on the programs accessing the data..

For example:
do you have any historical data that never changes?
Could that be moved to a different database in that same machine or another
machine? That would decrease your vacuum times.
Also partitioning the data so data that never changes is in separate
tables may also help (but I am not sure of this).

Given the sizes you sent to the list, it may be simply that it is more than
the hardware can handle.

Re: Performance query about large tables, lots of concurrent access

From: Karl Wright
Alvaro Herrera wrote:
> Karl Wright wrote:
>
>> (b) the performance of individual queries had already degraded
>> significantly in the same manner as what I'd seen before.
>
> You didn't answer whether you had smaller, more frequently updated
> tables that need more vacuuming.  This comment makes me think you do.  I
> think what you should be looking at is whether you can forget vacuuming
> the whole database in one go, and make it more granular.
>

I am afraid that I did answer this.  My largest tables are the ones
continually being updated.  The smaller ones are updated only infrequently.

Karl


Re: Performance query about large tables, lots of concurrent access

From
Alvaro Herrera
Date:
Karl Wright wrote:
> Alvaro Herrera wrote:
> >Karl Wright wrote:
> >
> >>(b) the performance of individual queries had already degraded
> >>significantly in the same manner as what I'd seen before.
> >
> >You didn't answer whether you had smaller, more frequently updated
> >tables that need more vacuuming.  This comment makes me think you do.  I
> >think what you should be looking at is whether you can forget vacuuming
> >the whole database in one go, and make it more granular.
>
> I am afraid that I did answer this.  My largest tables are the ones
> continually being updated.  The smaller ones are updated only infrequently.

Can you afford to vacuum them in parallel?

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Java is clearly an example of money oriented programming"  (A. Stepanov)

Re: Performance query about large tables, lots of concurrent access

From
Karl Wright
Date:
Francisco Reyes wrote:
> Karl Wright writes:
>
>> Okay - I started a VACUUM with the 8.1 database yesterday morning,
>> having the database remain under load.  As of 12:30 today (~27 hours),
>> the original VACUUM was still running.  At that point:
>
> I don't recall if you said it already, but what is your
> maintenance_work_mem?
>

I'm trying that now.

>> (a) I had to shut it down anyway because I needed to do another
>> experiment having to do with database export/import performance, and
>
> Do you know which tables change most often?
> Have you tried vacuuming those one at a time to see how long they take?

I can certainly do that, but at the rate it's currently operating that
may take many more days.

>
>> (b) the performance of individual queries had already degraded
>> significantly in the same manner as what I'd seen before.
>
> If you have a lot of inserts, perhaps you can also run ANALYZE more often.

I'm not getting bad query plans; I'm getting good plans but slow
execution.  This is consistent with what someone else said, which was
that if you didn't run VACUUM enough, then dead tuples would cause
performance degradation of the kind I am seeing.

(FWIW, ANALYZE operations are kicked off after every 30,000 inserts,
updates, or deletes, by the application itself).

>> So, I guess this means that there's no way I can keep the database
>> adequately vacuumed with my anticipated load and hardware.
>
> It is a possibility, but you could consider other strategies.. totally
> dependent on the programs accessing the data.
>
> For example:
> do you have any historical data that never changes?

Some, but it's insignificant compared to the big tables that change all
over the place.

> Could that be moved to a different database on that same machine or
> another machine? That would decrease your vacuum times.

That's not an option, since we ship appliances and this would require
that each appliance somehow come in pairs.

> Also, partitioning the data so that data which never changes is in separate
> tables may help (but I am not sure of this).
>

Right, see earlier discussion.

> Given the sizes you sent to the list, it may be simply that it is more
> than the hardware can handle.
>

I'm going to recommend going to 8.2 so that we get as much improvement
as possible before panicking entirely. :-)

Karl

Re: Performance query about large tables, lots of concurrent access

From
Karl Wright
Date:
Alvaro Herrera wrote:
> Karl Wright wrote:
>> Alvaro Herrera wrote:
>>> Karl Wright wrote:
>>>
>>>> (b) the performance of individual queries had already degraded
>>>> significantly in the same manner as what I'd seen before.
>>> You didn't answer whether you had smaller, more frequently updated
>>> tables that need more vacuuming.  This comment makes me think you do.  I
>>> think what you should be looking at is whether you can forget vacuuming
>>> the whole database in one go, and make it more granular.
>> I am afraid that I did answer this.  My largest tables are the ones
>> continually being updated.  The smaller ones are updated only infrequently.
>
> Can you afford to vacuum them in parallel?
>

Hmm, interesting question.  If VACUUM is disk limited then it wouldn't
help, probably, unless I moved various tables to different disks
somehow.  Let me think about whether that might be possible.

Karl


Re: Performance query about large tables, lots of concurrent access

From
Alvaro Herrera
Date:
Karl Wright wrote:
> Alvaro Herrera wrote:
> >Karl Wright wrote:

> >>I am afraid that I did answer this.  My largest tables are the ones
> >>continually being updated.  The smaller ones are updated only
> >>infrequently.
> >
> >Can you afford to vacuum them in parallel?
>
> Hmm, interesting question.  If VACUUM is disk limited then it wouldn't
> help, probably, unless I moved various tables to different disks
> somehow.  Let me think about whether that might be possible.

Well, is it disk limited?  Do you have the vacuum_delay stuff enabled?
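
A sketch of how to check, and how to adjust the throttling for one session
only (the values are just illustrative):

show vacuum_cost_delay;       -- 0 means cost-based throttling is off
set vacuum_cost_delay = 10;   -- milliseconds of sleep per cost batch
set vacuum_cost_limit = 200;  -- work allowed between sleeps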

--
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
"I would rather have GNU than GNOT."  (ccchips, lwn.net/Articles/37595/)

Re: Performance query about large tables, lots of concurrent access

From
Michael Stone
Date:
Is there a sensible way to partition the large table into smaller
tables?

Mike Stone

Re: Performance query about large tables, lots of concurrent access

From
"Joshua D. Drake"
Date:
Michael Stone wrote:
> Is there a sensible way to partition the large table into smaller tables?

It entirely depends on your data set.

Joshua D. Drake


>
> Mike Stone
>


--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: Performance query about large tables, lots of concurrent access

From
Andrew Sullivan
Date:
On Wed, Jun 20, 2007 at 02:01:34PM -0400, Karl Wright wrote:
> (FWIW, ANALYZE operations are kicked off after every 30,000 inserts,
> updates, or deletes, by the application itself).

I don't think you should do it that way.  I suspect that an automatic
VACUUM ANALYSE run way more often on each table -- like maybe in a loop
-- would be better for your case.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
                --Brad Holland

Re: Performance query about large tables, lots of concurrent access

From
Michael Stone
Date:
On Wed, Jun 20, 2007 at 11:14:45AM -0700, Joshua D. Drake wrote:
>Michael Stone wrote:
>>Is there a sensible way to partition the large table into smaller tables?
>
>It entirely depends on your data set.

Yes, that's why it was a question rather than a suggestion. :)

Mike Stone

Re: Performance query about large tables, lots of concurrent access

From
Shaun Thomas
Date:
On Wednesday 20 June 2007 12:55:20 pm Karl Wright wrote:

> I am afraid that I did answer this.  My largest tables
> are the ones continually being updated.  The smaller
> ones are updated only infrequently.

You know, it actually sounds like you're getting whacked by the same
problem that got us a while back.  It sounds like you weren't vacuuming
frequently enough initially, and then tried vacuuming later, only after
you noticed performance degrade.

Unfortunately what that means, is for several weeks or months, Postgres
has not been reusing rows on your (admittedly) active and large tables;
it just appends at the end, and lets old rows slowly bloat that table
larger and larger.  Indexes too, will suffer from dead pages.  As
frightening/sickening as this sounds, you may need to dump/restore the
really huge table, or vacuum-full to put it on a crash diet, and then
maintain a strict daily or bi-daily vacuum schedule to keep it under
control.

The reason I think this: even with several 200M row tables, vacuums
shouldn't take over 24 hours.  Ever.  Do a vacuum verbose and see just
how many pages it's trying to reclaim.  I'm willing to wager it's
several orders of magnitude higher than the max_fsm_pages setting
you've stuck in your config file.
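
Roughly, for the database-wide form (the INFO summary at the very end of the
output reports how many page slots are needed versus the current limits):

show max_fsm_pages;
vacuum verbose;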

You'll also want to see which rows in your 250M+ table are actually
active, and shunt the stable rows to another (warehouse) table maybe
available only via view or table partition.  I mean, your most active
table is also the largest?  Seems a bit backward, to me.

--

Shaun Thomas
Database Administrator

Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com

Re: Performance query about large tables, lots of concurrent access

From
Karl Wright
Date:
Shaun Thomas wrote:
> On Wednesday 20 June 2007 12:55:20 pm Karl Wright wrote:
>
>
>>I am afraid that I did answer this.  My largest tables
>>are the ones continually being updated.  The smaller
>>ones are updated only infrequently.
>
>
> You know, it actually sounds like you're getting whacked by the same
> problem that got us a while back.  It sounds like you weren't vacuuming
> frequently enough initially, and then tried vacuuming later, only after
> you noticed performance degrade.
>
> Unfortunately what that means, is for several weeks or months, Postgres
> has not been reusing rows on your (admittedly) active and large tables;
> it just appends at the end, and lets old rows slowly bloat that table
> larger and larger.  Indexes too, will suffer from dead pages.  As
> frightening/sickening as this sounds, you may need to dump/restore the
> really huge table, or vacuum-full to put it on a crash diet, and then
> maintain a strict daily or bi-daily vacuum schedule to keep it under
> control.
>

A nice try, but I had just completed a VACUUM on this database three
hours prior to starting the VACUUM that I gave up on after 27 hours.  So
I don't see how much more frequently I could do it.  (The one I did
earlier finished in six hours - but to accomplish that I had to shut
down EVERYTHING else that machine was doing.)

Karl


> The reason I think this: even with several 200M row tables, vacuums
> shouldn't take over 24 hours.  Ever.  Do a vacuum verbose and see just
> how many pages it's trying to reclaim.  I'm willing to wager it's
> several orders of magnitude higher than the max_fsm_pages setting
> you've stuck in your config file.
>
> You'll also want to see which rows in your 250M+ table are actually
> active, and shunt the stable rows to another (warehouse) table maybe
> available only via view or table partition.  I mean, your most active
> table is also the largest?  Seems a bit backward, to me.
>


Re: Performance query about large tables, lots of concurrent access

From
Andrew Sullivan
Date:
On Wed, Jun 20, 2007 at 05:29:41PM -0400, Karl Wright wrote:
> A nice try, but I had just completed a VACUUM on this database three
> hours prior to starting the VACUUM that I gave up on after 27 hours.

You keep putting it that way, but your problem is essentially that
you have several tables that _all_ need to be vacuumed.  VACUUM need
not actually be a database-wide operation.

> earlier finished in six hours - but to accomplish that I had to shut
> down EVERYTHING else that machine was doing.)

This suggests to me that you simply don't have enough machine for the
job.  You probably need more I/O, and actually more CPU wouldn't
hurt, because then you could run three VACUUMs on three separate
tables (on three separate disks, of course) and not have to switch
them off and on the CPU.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
        --Alexander Hamilton

Re: Performance query about large tables, lots of concurrent access

From
Scott Marlowe
Date:
Karl Wright wrote:
> Shaun Thomas wrote:
>> On Wednesday 20 June 2007 12:55:20 pm Karl Wright wrote:
>>
>>
>>> I am afraid that I did answer this.  My largest tables
>>> are the ones continually being updated.  The smaller
>>> ones are updated only infrequently.
>>
>>
>> You know, it actually sounds like you're getting whacked by the same
>> problem that got us a while back.  It sounds like you weren't
>> vacuuming frequently enough initially, and then tried vacuuming
>> later, only after you noticed performance degrade.
>>
>> Unfortunately what that means, is for several weeks or months,
>> Postgres has not been reusing rows on your (admittedly) active and
>> large tables; it just appends at the end, and lets old rows slowly
>> bloat that table larger and larger.  Indexes too, will suffer from
>> dead pages.  As frightening/sickening as this sounds, you may need to
>> dump/restore the really huge table, or vacuum-full to put it on a
>> crash diet, and then maintain a strict daily or bi-daily vacuum
>> schedule to keep it under control.
>>
>
> A nice try, but I had just completed a VACUUM on this database three
> hours prior to starting the VACUUM that I gave up on after 27 hours.
> So I don't see how much more frequently I could do it.  (The one I did
> earlier finished in six hours - but to accomplish that I had to shut
> down EVERYTHING else that machine was doing.)

So, have you ever run vacuum full or reindex on this database?

You are aware of the difference between how vacuum and vacuum full work,
right?

vacuum := mark deleted tuples as available, leave in table
vacuum full := compact tables to remove deleted tuples.

While you should generally avoid vacuum full, if you've let your
database get so bloated that the majority of space in your tables is now
empty / deleted tuples, you likely need to vacuum full / reindex it.

For instance, on my tiny little 31 Gigabyte reporting database, the main
table takes up about 17 Gigs.  This query gives you some idea how many
bytes each row is taking on average:

select relname, relpages::float*8192 as size, reltuples,
(relpages::double precision*8192)/reltuples::double precision as
bytes_per_row from pg_class where relname = 'businessrequestsummary';
        relname         |    size     |  reltuples  |  bytes_per_row
------------------------+-------------+-------------+-----------------
 businessrequestsummary | 17560944640 | 5.49438e+07 | 319.61656229454

Note that these numbers are updated by running analyze...

What does it say about your DB?
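
On 8.1 or later, the size functions also give the total on-disk footprint,
indexes and TOAST included, for example:

select pg_size_pretty(pg_total_relation_size('businessrequestsummary'));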

Re: Performance query about large tables, lots of concurrent access

From
Karl Wright
Date:
Scott Marlowe wrote:
> Karl Wright wrote:
>
>> Shaun Thomas wrote:
>>
>>> On Wednesday 20 June 2007 12:55:20 pm Karl Wright wrote:
>>>
>>>
>>>> I am afraid that I did answer this.  My largest tables
>>>> are the ones continually being updated.  The smaller
>>>> ones are updated only infrequently.
>>>
>>>
>>>
>>> You know, it actually sounds like you're getting whacked by the same
>>> problem that got us a while back.  It sounds like you weren't
>>> vacuuming frequently enough initially, and then tried vacuuming
>>> later, only after you noticed performance degrade.
>>>
>>> Unfortunately what that means, is for several weeks or months,
>>> Postgres has not been reusing rows on your (admittedly) active and
>>> large tables; it just appends at the end, and lets old rows slowly
>>> bloat that table larger and larger.  Indexes too, will suffer from
>>> dead pages.  As frightening/sickening as this sounds, you may need to
>>> dump/restore the really huge table, or vacuum-full to put it on a
>>> crash diet, and then maintain a strict daily or bi-daily vacuum
>>> schedule to keep it under control.
>>>
>>
>> A nice try, but I had just completed a VACUUM on this database three
>> hours prior to starting the VACUUM that I gave up on after 27 hours.
>> So I don't see how much more frequently I could do it.  (The one I did
>> earlier finished in six hours - but to accomplish that I had to shut
>> down EVERYTHING else that machine was doing.)
>
>
> So, have you ever run vacuum full or reindex on this database?
>

No.  However, this database has only existed since last Thursday afternoon.

> You are aware of the difference between how vacuum and vacuum full work,
> right?
>
> vacuum := mark deleted tuples as available, leave in table
> vacuum full := compact tables to remove deleted tuples.
>
> While you should generally avoid vacuum full, if you've let your
> database get so bloated that the majority of space in your tables is now
> empty / deleted tuples, you likely need to vacuum full / reindex it.
>

If the database is continually growing, should VACUUM FULL be necessary?

> For instance, on my tiny little 31 Gigabyte reporting database, the main
> table takes up about 17 Gigs.  This query gives you some idea how many
> bytes each row is taking on average:
>
> select relname, relpages::float*8192 as size, reltuples,
> (relpages::double precision*8192)/reltuples::double precision as
> bytes_per_row from pg_class where relname = 'businessrequestsummary';
>        relname         |    size     |  reltuples  |  bytes_per_row
> ------------------------+-------------+-------------+-----------------
> businessrequestsummary | 17560944640 | 5.49438e+07 | 319.61656229454
>
> Note that these numbers are updated by running analyze...
>
> What does it say about your DB?
>

I wish I could tell you.  Like I said, I had to abandon this project to
test out an upgrade procedure involving pg_dump and pg_restore.  (The
upgrade also seems to take a very long time - over 6 hours so far.)
When it is back online I can provide further information.

Karl

Re: Performance query about large tables, lots of concurrent access

From
Scott Marlowe
Date:
Karl Wright wrote:
> Scott Marlowe wrote:
>> Karl Wright wrote:
>>
>>> Shaun Thomas wrote:
>>>
>>>> On Wednesday 20 June 2007 12:55:20 pm Karl Wright wrote:
>>>>
>>>>
>>>>> I am afraid that I did answer this.  My largest tables
>>>>> are the ones continually being updated.  The smaller
>>>>> ones are updated only infrequently.
>>>>
>>>>
>>>>
>>>> You know, it actually sounds like you're getting whacked by the
>>>> same problem that got us a while back.  It sounds like you weren't
>>>> vacuuming frequently enough initially, and then tried vacuuming
>>>> later, only after you noticed performance degrade.
>>>>
>>>> Unfortunately what that means, is for several weeks or months,
>>>> Postgres has not been reusing rows on your (admittedly) active and
>>>> large tables; it just appends at the end, and lets old rows slowly
>>>> bloat that table larger and larger.  Indexes too, will suffer from
>>>> dead pages.  As frightening/sickening as this sounds, you may need
>>>> to dump/restore the really huge table, or vacuum-full to put it on
>>>> a crash diet, and then maintain a strict daily or bi-daily vacuum
>>>> schedule to keep it under control.
>>>>
>>>
>>> A nice try, but I had just completed a VACUUM on this database three
>>> hours prior to starting the VACUUM that I gave up on after 27
>>> hours.  So I don't see how much more frequently I could do it.  (The
>>> one I did earlier finished in six hours - but to accomplish that I
>>> had to shut down EVERYTHING else that machine was doing.)
>>
>>
>> So, have you ever run vacuum full or reindex on this database?
>>
>
> No.  However, this database has only existed since last Thursday
> afternoon.
Well, a couple of dozen update statements with no where clause on large
tables could bloat it right up.

It's not about age so much as update / delete patterns.
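
For example (names are placeholders), this rewrites every row and leaves the
old versions behind as dead tuples until a vacuum reclaims them:

update bigtable set some_col = some_col;
vacuum verbose bigtable;   -- reports how many dead row versions it removed
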
>
>> You are aware of the difference between how vacuum and vacuum full
>> work, right?
>>
>> vacuum := mark deleted tuples as available, leave in table
>> vacuum full := compact tables to remove deleted tuples.
>>
>> While you should generally avoid vacuum full, if you've let your
>> database get so bloated that the majority of space in your tables is
>> now empty / deleted tuples, you likely need to vacuum full / reindex
>> it.
>>
> If the database is continually growing, should VACUUM FULL be necessary?
If it's only growing, with no deletes or updates, then no.  Generally,
on a properly vacuumed database, vacuum full should never be needed.
>> For instance, on my tiny little 31 Gigabyte reporting database, the
>> main table takes up about 17 Gigs.  This query gives you some idea
>> how many bytes each row is taking on average:
>>
>> select relname, relpages::float*8192 as size, reltuples,
>> (relpages::double precision*8192)/reltuples::double precision as
>> bytes_per_row from pg_class where relname = 'businessrequestsummary';
>>        relname         |    size     |  reltuples  |  bytes_per_row
>> ------------------------+-------------+-------------+-----------------
>> businessrequestsummary | 17560944640 | 5.49438e+07 | 319.61656229454
>>
>> Note that these numbers are updated by running analyze...
>>
>> What does it say about your DB?
>>
>
> I wish I could tell you.  Like I said, I had to abandon this project
> to test out an upgrade procedure involving pg_dump and pg_restore.
> (The upgrade also seems to take a very long time - over 6 hours so
> far.) When it is back online I can provide further information.

Well, let us know.  I would definitely recommend getting more / faster
disks.  Right now I've got a simple 4 disk RAID10 on the way to replace
the single SATA drive I'm running on right now.  I can't wait.

Re: Performance query about large tables, lots of concurrent access

From
Karl Wright
Date:
Scott Marlowe wrote:
> Karl Wright wrote:
>> Scott Marlowe wrote:
>>> Karl Wright wrote:
>>>
>>>> Shaun Thomas wrote:
>>>>
>>>>> On Wednesday 20 June 2007 12:55:20 pm Karl Wright wrote:
>>>>>
>>>>>
>>>>>> I am afraid that I did answer this.  My largest tables
>>>>>> are the ones continually being updated.  The smaller
>>>>>> ones are updated only infrequently.
>>>>>
>>>>>
>>>>>
>>>>> You know, it actually sounds like you're getting whacked by the
>>>>> same problem that got us a while back.  It sounds like you weren't
>>>>> vacuuming frequently enough initially, and then tried vacuuming
>>>>> later, only after you noticed performance degrade.
>>>>>
>>>>> Unfortunately what that means, is for several weeks or months,
>>>>> Postgres has not been reusing rows on your (admittedly) active and
>>>>> large tables; it just appends at the end, and lets old rows slowly
>>>>> bloat that table larger and larger.  Indexes too, will suffer from
>>>>> dead pages.  As frightening/sickening as this sounds, you may need
>>>>> to dump/restore the really huge table, or vacuum-full to put it on
>>>>> a crash diet, and then maintain a strict daily or bi-daily vacuum
>>>>> schedule to keep it under control.
>>>>>
>>>>
>>>> A nice try, but I had just completed a VACUUM on this database three
>>>> hours prior to starting the VACUUM that I gave up on after 27
>>>> hours.  So I don't see how much more frequently I could do it.  (The
>>>> one I did earlier finished in six hours - but to accomplish that I
>>>> had to shut down EVERYTHING else that machine was doing.)
>>>
>>>
>>> So, have you ever run vacuum full or reindex on this database?
>>>
>>
>> No.  However, this database has only existed since last Thursday
>> afternoon.
> Well, a couple of dozen update statements with no where clause on large
> tables could bloat it right up.
>
> It's not about age so much as update / delete patterns.
>>
>>> You are aware of the difference between how vacuum and vacuum full
>>> work, right?
>>>
>>> vacuum := mark deleted tuples as available, leave in table
>>> vacuum full := compact tables to remove deleted tuples.
>>>
>>> While you should generally avoid vacuum full, if you've let your
>>> database get so bloated that the majority of space in your tables is
>>> now empty / deleted tuples, you likely need to vacuum full / reindex
>>> it.
>>>
>> If the database is continually growing, should VACUUM FULL be necessary?
> If it's only growing, with no deletes or updates, then no.  Generally,
> on a properly vacuumed database, vacuum full should never be needed.
>>> For instance, on my tiny little 31 Gigabyte reporting database, the
>>> main table takes up about 17 Gigs.  This query gives you some idea
>>> how many bytes each row is taking on average:
>>>
>>> select relname, relpages::float*8192 as size, reltuples,
>>> (relpages::double precision*8192)/reltuples::double precision as
>>> bytes_per_row from pg_class where relname = 'businessrequestsummary';
>>>        relname         |    size     |  reltuples  |  bytes_per_row
>>> ------------------------+-------------+-------------+-----------------
>>> businessrequestsummary | 17560944640 | 5.49438e+07 | 319.61656229454
>>>
>>> Note that these numbers are updated by running analyze...
>>>
>>> What does it say about your DB?
>>>
>>
>> I wish I could tell you.  Like I said, I had to abandon this project
>> to test out an upgrade procedure involving pg_dump and pg_restore.
>> (The upgrade also seems to take a very long time - over 6 hours so
>> far.) When it is back online I can provide further information.
>
> Well, let us know.  I would definitely recommend getting more / faster
> disks.  Right now I've got a simple 4 disk RAID10 on the way to replace
> the single SATA drive I'm running on right now.  I can't wait.
>

I checked the disk picture - this is a RAID disk array with 6 drives,
with a bit more than 1Tbyte total storage.  15,000 RPM.  It would be
hard to get more/faster disk than that.

Karl

Re: Performance query about large tables, lots of concurrent access

From
Andrew Sullivan
Date:
On Thu, Jun 21, 2007 at 12:29:49PM -0400, Karl Wright wrote:
> I checked the disk picture - this is a RAID disk array with 6 drives,
> with a bit more than 1Tbyte total storage.  15,000 RPM.  It would be
> hard to get more/faster disk than that.

What kind of RAID?  It's _easy_ to get faster disk than 6 drives in
RAID5, even if they're 15,000 RPM.  The rotation speed is the least
of your problems in many RAID implementations.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.
        --H.W. Fowler

Re: Performance query about large tables, lots of concurrent access

From
Scott Marlowe
Date:
Andrew Sullivan wrote:
> On Thu, Jun 21, 2007 at 12:29:49PM -0400, Karl Wright wrote:
>
>> I checked the disk picture - this is a RAID disk array with 6 drives,
>> with a bit more than 1Tbyte total storage.  15,000 RPM.  It would be
>> hard to get more/faster disk than that.
>>
>
> What kind of RAID?  It's _easy_ to get faster disk than 6 drives in
> RAID5, even if they're 15,000 RPM.  The rotation speed is the least
> of your problems in many RAID implementations.
>
Also, the controller means a lot.  I'd rather have a 4 disk RAID-10 with
an Areca card with BBU Cache than a 16 disk RAID 5 on an adaptec (with
or without cache... :) )

Re: Performance query about large tables, lots of concurrent access

From
Scott Marlowe
Date:
Andrew Sullivan wrote:
> On Thu, Jun 21, 2007 at 12:29:49PM -0400, Karl Wright wrote:
>
>> I checked the disk picture - this is a RAID disk array with 6 drives,
>> with a bit more than 1Tbyte total storage.  15,000 RPM.  It would be
>> hard to get more/faster disk than that.
>>
>
> What kind of RAID?  It's _easy_ to get faster disk than 6 drives in
> RAID5, even if they're 15,000 RPM.  The rotation speed is the least
> of your problems in many RAID implementations.

Oh, and the driver rev means a lot too.  Some older driver revisions for
some RAID cards are very slow.

Re: Performance query about large tables, lots of concurrent access

From
"Joshua D. Drake"
Date:
Scott Marlowe wrote:
> Andrew Sullivan wrote:
>> On Thu, Jun 21, 2007 at 12:29:49PM -0400, Karl Wright wrote:
>>
>>> I checked the disk picture - this is a RAID disk array with 6 drives,
>>> with a bit more than 1Tbyte total storage.  15,000 RPM.  It would be
>>> hard to get more/faster disk than that.
>>>
>>
>> What kind of RAID?  It's _easy_ to get faster disk than 6 drives in
>> RAID5, even if they're 15,000 RPM.  The rotation speed is the least
>> of your problems in many RAID implementations.
>>
> Also, the controller means a lot.  I'd rather have a 4 disk RAID-10 with
> an Areca card with BBU Cache than a 16 disk RAID 5 on an adaptec (with
> or without cache... :) )

Oh come on... Adaptec makes a great skeet.

Joshua D. Drake




--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: Performance query about large tables, lots of concurrent access

From
Michael Stone
Date:
On Thu, Jun 21, 2007 at 12:29:49PM -0400, Karl Wright wrote:
>I checked the disk picture - this is a RAID disk array with 6 drives,
>with a bit more than 1Tbyte total storage.  15,000 RPM.  It would be
>hard to get more/faster disk than that.

Well, it's not hard to get more disk than that, but you'd probably have to
look at an external storage array (or more than one). A larger number of
larger/slower drives, splitting indices away from data, etc., will
almost certainly outperform 6 disks, 15k RPM or not.
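
Splitting indexes onto separate spindles is mostly a matter of tablespaces
(available since 8.0); a sketch with made-up names and paths -- the directory
must already exist and be owned by the postgres user:

create tablespace indexspace location '/mnt/otherarray/pg_indexes';
alter index idx_primary_lookup set tablespace indexspace;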

Mike Stone

Re: Performance query about large tables, lots of concurrent access

From
Jean-David Beyer
Date:
Michael Stone wrote:
> On Thu, Jun 21, 2007 at 12:29:49PM -0400, Karl Wright wrote:
>> I checked the disk picture - this is a RAID disk array with 6 drives,
>> with a bit more than 1Tbyte total storage.  15,000 RPM.  It would be
>> hard to get more/faster disk than that.
>
> Well, it's not hard to get more disk than that, but you'd probably have to
> look at an external storage array (or more than one). A larger number of
> larger/slower drives, splitting indices away from data, etc., will almost
> certainly outperform 6 disks, 15k RPM or not.
>
I also have 6 hard drives (four of these are 10,000 RPM Ultra/320 SCSI hard
drives, and the other two soon will be), 4 of which are dedicated
exclusively to the DBMS, and the other two are for everything else. I am
currently running IBM DB2, using the 4 SCSI drives in raw mode and letting
DB2 do the IO (except for the bottom level device drivers). I have all the
Indices on one drive, and most of the data on the other three, except for
some very small, seldom used tables (one has two rows, one has about 10
rows) that are managed by the OS on the other drives. I have tested this and
the bottleneck is the logfiles. For this reason, I am about to upgrade the
"everything else" drives to SCSI drives (the logfiles are on one of these).
They are currently 7200 rpm EIDE drives, but the SCSI ones are sitting on
top of the computer now, ready to be installed.

When I upgrade from RHEL3 to RHEL5 (disks for that are also sitting on top
of the computer), I will be switching from DB2 to PostgreSQL, and that will
be an opportunity to lay out the disks differently. I think the partitions
will end up being about the same, but for the four main data drives I am
thinking about doing something like the layout below, where DTi is the data
for table i and XTi is its index.

Drive 3    Drive 4    Drive 5    Drive 6
DT1        XT1
           DT2        XT2
                      DT3        XT3
XT4                              DT4
etc.

Now once that is set up and populated, it might make sense to move things
around somewhat to further reduce seek contention. But that would require
actually populating the database and measuring it.
This setup would probably be pretty good if using just T1 and T3, for
example, but less good if using just T1 and T2. So the ideal layout, such as
it is, would depend on the access patterns of the programs that normally use
the database.

These drives are about 17 GBytes each, which is enough for the database in
question. (The other two are about 80 GBytes each, which is enough to run
Linux and my other stuff on.)

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 08:45:01 up 4 days, 16:20, 3 users, load average: 4.23, 4.24, 4.21