Thread: vacuum locking

vacuum locking

From
Rob Nagler
Date:
It seems a simple "vacuum" (not full or analyze) slows down the
database dramatically.  I am running vacuum every 15 minutes, but it
takes about 5 minutes to run even after a fresh import.  Even with
vacuuming every 15 minutes, I'm not sure vacuuming is working
properly.

There are a lot of updates.  The slowest relation is the primary key
index, which is composed of a sequence.  I've appended a csv with the
parsed output from vacuum.  The page counts are growing way too fast
imo.  I believe this is caused by the updates, and index pages not
getting re-used.  The index values aren't changing, but other values
in the table are.

Any suggestions on how to make vacuuming more effective and reduce the
time it takes to vacuum?  I'd settle for less frequent vacuuming or
perhaps index rebuilding.  The database can be re-imported in about an
hour.

Rob
----------------------------------------------------------------
Samples spaced every 15 minutes
Pages,Tuples,Deleted
7974,1029258,1536
7979,1025951,4336
7979,1026129,52
7979,1025618,686
7979,1025520,152
7980,1025583,28
7995,1028008,6
8004,1030016,14
8010,1026149,4965
8012,1026684,6
8014,1025910,960
8020,1026812,114
8027,1027642,50
8031,1027913,362
8040,1028368,784
8046,1028454,1143
8049,1029155,6
8053,1029980,10
8065,1031506,24
8084,1029134,4804
8098,1031004,346
8103,1029412,3044
8118,1029736,1872
8141,1031643,1704
8150,1032597,286
8152,1033222,6
8159,1029436,4845
8165,1029987,712
8170,1030229,268
8176,1029568,1632
8189,1030136,1540
8218,1030915,3963
8255,1033049,4598
8297,1036583,3866
8308,1031412,8640
8315,1031987,1058
8325,1033892,6
8334,1030589,4625
8350,1031709,1040
8400,1033071,5946
8426,1031555,8368
8434,1031638,2240
8436,1031703,872
8442,1031891,612



Re: vacuum locking

From
Shridhar Daithankar
Date:
Rob Nagler wrote:

> It seems a simple "vacuum" (not full or analyze) slows down the
> database dramatically.  I am running vacuum every 15 minutes, but it
> takes about 5 minutes to run even after a fresh import.  Even with
> vacuuming every 15 minutes, I'm not sure vacuuming is working
> properly.
>
> There are a lot of updates.  The slowest relation is the primary key
> index, which is composed of a sequence.  I've appended a csv with the
> parsed output from vacuum.  The page counts are growing way too fast
> imo.  I believe this is caused by the updates, and index pages not
> getting re-used.  The index values aren't changing, but other values
> in the table are.

You should try the 7.4 beta and pg_autovacuum, which is a contrib module in CVS tip.
It works with 7.3 as well.

A major reason for 7.4 is that it fixes index growth during vacuum. So if your database is
fit, it will stay that way with proper vacuuming.
>
> Any suggestions on how to make vacuuming more effective and reduce the
> time it takes to vacuum?  I'd settle for less frequent vacuuming or
> perhaps index rebuilding.  The database can be re-imported in about an
> hour.

Make sure that you have FSM properly tuned. Bump it up from the defaults to suit your
needs. I hope you have gone through this page for general-purpose settings.

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

>
> Rob
> ----------------------------------------------------------------
> Samples spaced every 15 minutes
> Pages,Tuples,Deleted
> 7974,1029258,1536
> 7979,1025951,4336
> 7979,1026129,52
> 7979,1025618,686

Assuming those were incremental figures, the largest you have is ~8000 tuples per 15
minutes and 26 pages. I think with proper FSM/shared buffers/effective cache and
pg_autovacuum with a 1 min. polling interval, you could end up in a lot better shape.

Let us know if it works.

  Shridhar


Re: vacuum locking

From
Rod Taylor
Date:
> Any suggestions on how to make vacuuming more effective and reduce the
> time it takes to vacuum?  I'd settle for less frequent vacuuming or
> perhaps index rebuilding.  The database can be re-imported in about an
> hour.

Which version and what are your FSM settings?


Re: vacuum locking

From
Rob Nagler
Date:
Shridhar Daithankar writes:
> You should try the 7.4 beta and pg_autovacuum, which is a contrib module
> in CVS tip.

It's on our todo list. :)

How does pg_autovacuum differ from vacuumdb?  I mean it seems to call
the vacuum operation underneath just as vacuumdb does.  I obviously
didn't follow the logic as to how it gets there. :-)

> Make sure that you have FSM properly tuned. Bump it up from the defaults to
> suit your needs. I hope you have gone through this page for general-purpose
> settings.

I didn't start vacuuming regularly until recently, so I didn't see
this problem.

> Assuming those were incremental figures, the largest you have is ~8000
> tuples per 15 minutes and 26 pages. I think with proper FSM/shared
> buffers/effective cache and pg_autovacuum with a 1 min. polling
> interval, you could end up in a lot better shape.

Here are the numbers that are different.  I'm using 7.3:

shared_buffers = 8000
sort_mem = 8000
vacuum_mem = 64000
effective_cache_size = 40000

free says:
             total       used       free     shared    buffers     cached
Mem:       1030676    1005500      25176          0      85020     382280
-/+ buffers/cache:     538200     492476
Swap:      2096472     272820    1823652

It seems effective_cache_size is about right.

vacuum_mem might be slowing down the system?  But if I reduce it,
won't vacuuming get slower?

max_fsm_relations is probably too low (the default in my conf file
says 100, probably needs to be 1000).  Not sure how this affects disk
usage.

Here's the summary for the two active tables during a vacuum interval
with high activity.  The other tables don't get much activity, and are
much smaller.  As you can see, the 261 + 65 seconds add up to the bulk of the 5
minutes it takes to vacuum.

INFO:  Removed 8368 tuples in 427 pages.
        CPU 0.06s/0.04u sec elapsed 1.54 sec.
INFO:  Pages 24675: Changed 195, Empty 0; Tup 1031519: Vac 8368, Keep 254, UnUsed 1739.
        Total CPU 2.92s/2.58u sec elapsed 65.35 sec.

INFO:  Removed 232 tuples in 108 pages.
        CPU 0.01s/0.02u sec elapsed 0.27 sec.
INFO:  Pages 74836: Changed 157, Empty 0; Tup 4716475: Vac 232, Keep 11, UnUsed 641.
        Total CPU 10.19s/6.03u sec elapsed 261.44 sec.

How would vacuuming every minute finish in time?  It isn't changing
much in the second table, but it's taking 261 seconds to wade through
5m rows.

Assuming I vacuum every 15 minutes, it would seem like max_fsm_pages
should be 1000, because that's about what was reclaimed.  The default
is 10000.  Do I need to change this?

Sorry to be so dense, but I just don't know what the right values are.

Thanks muchly for the advice,
Rob



Re: vacuum locking

From
Josh Berkus
Date:
Rob,

> vacuum_mem might be slowing down the system?  But if I reduce it,
> won't vacuuming get slower?

Yes, but it will have less of an impact on the system while it's running.

> INFO:  Removed 8368 tuples in 427 pages.
>         CPU 0.06s/0.04u sec elapsed 1.54 sec.
> INFO:  Pages 24675: Changed 195, Empty 0; Tup 1031519: Vac 8368, Keep 254,
> UnUsed 1739. Total CPU 2.92s/2.58u sec elapsed 65.35 sec.
>
> INFO:  Removed 232 tuples in 108 pages.
>         CPU 0.01s/0.02u sec elapsed 0.27 sec.
> INFO:  Pages 74836: Changed 157, Empty 0; Tup 4716475: Vac 232, Keep 11,
> UnUsed 641.
>         Total CPU 10.19s/6.03u sec elapsed 261.44 sec.

What sort of disk array do you have?   That seems like a lot of time
considering how little work VACUUM is doing.


--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: vacuum locking

From
Manfred Koizar
Date:
On Fri, 17 Oct 2003 09:52:26 -0600, Rob Nagler <nagler@bivio.biz>
wrote:
>INFO:  Removed 8368 tuples in 427 pages.
>        CPU 0.06s/0.04u sec elapsed 1.54 sec.
>INFO:  Pages 24675: Changed 195, Empty 0; Tup 1031519: Vac 8368, Keep 254, UnUsed 1739.
>        Total CPU 2.92s/2.58u sec elapsed 65.35 sec.
>
>INFO:  Removed 232 tuples in 108 pages.
>        CPU 0.01s/0.02u sec elapsed 0.27 sec.
>INFO:  Pages 74836: Changed 157, Empty 0; Tup 4716475: Vac 232, Keep 11, UnUsed 641.
>        Total CPU 10.19s/6.03u sec elapsed 261.44 sec.

The low UnUsed numbers indicate that FSM is working fine.

>Assuming I vacuum every 15 minutes, it would seem like max_fsm_pages
>should be 1000, because that's about what was reclaimed.  The default
>is 10000.  Do I need to change this?

ISTM you are VACUUMing too aggressively.  You are reclaiming less than
1% and 0.005%, respectively, of tuples.  I would increase FSM settings
to ca. 1000 fsm_relations, 100000 fsm_pages and VACUUM *less* often,
say every two hours or so.

... or configure autovacuum to VACUUM a table when it has 10% dead
tuples.
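
In postgresql.conf terms, the FSM part of that suggestion would look roughly
like this (7.3 parameter names; a sketch, so size the values to your actual
number of relations and update rate):

    max_fsm_relations = 1000      # max relations with tracked free space; default is 100
    max_fsm_pages = 100000        # max disk pages with tracked free space; default is 10000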

Servus
 Manfred

Re: vacuum locking

From
Rob Nagler
Date:
Manfred Koizar writes:
> ISTM you are VACUUMing too aggressively.  You are reclaiming less than
> 1% and 0.005%, respectively, of tuples.  I would increase FSM settings
> to ca. 1000 fsm_relations, 100000 fsm_pages and VACUUM *less* often,
> say every two hours or so.

I did this.  We'll see how it goes.

> ... or configure autovacuum to VACUUM a table when it has 10% dead
> tuples.

This solution doesn't really fix the fact that VACUUM consumes the
disk while it is running.  I want to avoid the erratic performance on
my web server when VACUUM is running.

mfg,
Rob


Re: vacuum locking

From
Rob Nagler
Date:
Josh Berkus writes:
> Yes, but it will have less of an impact on the system while it's running.

We'll find out.  I lowered vacuum_mem to 32000.

> What sort of disk array do you have?   That seems like a lot of time
> considering how little work VACUUM is doing.

Vendor: DELL     Model: PERCRAID Mirror  Rev: V1.0
  Type:   Direct-Access                    ANSI SCSI revision: 02

Two 10K disks attached.

Rob

Re: vacuum locking

From
Vivek Khera
Date:
>>>>> "RN" == Rob Nagler <nagler@bivio.biz> writes:


RN> Vendor: DELL     Model: PERCRAID Mirror  Rev: V1.0
RN>   Type:   Direct-Access                    ANSI SCSI revision: 02


AMI or Adaptec based?

If AMI, make sure it has write-back cache enabled (and you have
battery backup!), and disable the 'readahead' feature if you can.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: vacuum locking

From
Vivek Khera
Date:
>>>>> "RN" == Rob Nagler <nagler@bivio.biz> writes:

RN> This solution doesn't really fix the fact that VACUUM consumes the
RN> disk while it is running.  I want to avoid the erratic performance on
RN> my web server when VACUUM is running.

What's the disk utilization prior to running vacuum?  If it is
hovering around 95% or more of capacity, of course you're gonna
overwhelm it.

This ain't Star Trek -- the engines can't run at 110%, Cap'n!


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: vacuum locking

From
Rob Nagler
Date:
Vivek Khera writes:
> AMI or Adaptec based?

Adaptec, I think.  AIC-7899 LVD SCSI is what dmidecode says, and
Red Hat/Adaptec aacraid driver, Aug 18 2003 is what comes up when it
boots.  I haven't been able to use the aac utilities with this driver,
however, so it's hard to interrogate the device.

> If AMI, make sure it has write-back cache enabled (and you have
> battery backup!), and disable the 'readahead' feature if you can.

I can't do this so easily.  It's at a colo, and it's production.
I doubt this has anything to do with this problem, anyway.  We're
talking about hundreds of megabytes of data.

> What's the disk utilization prior to running vacuum?  If it is
> hovering around 95% or more of capacity, of course you're gonna
> overwhelm it.

Here's the vmstat 5 at a random time:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0 272372  38416  78220 375048   0   3     2     0    0     0   2   2   0
 0  0  0 272372  30000  78320 375660   0   0    34   274  382   284   5   1  94
 0  1  0 272372  23012  78372 375924   0   0    25   558  445   488   8   2  90
 1  0  0 272368  22744  78472 376192   0   6   125   594  364   664   9   3  88

And here it is during vacuum:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  2  1 277292   9620  72028 409664  46  32  4934  4812 1697   966   8   4  88
 0  3  0 277272   9588  72096 412964  61   0  7303  2478 1391   976   3   3  94
 2  2  0 277336   9644  72136 393264 1326  32  2827  2954 1693  1519   8   3  89

The pages are growing proportionately with the number of tuples, btw.
Here's a vacuum snippet from a few days ago after a clean import,
running every 15 minutes:

INFO:  Removed 2192 tuples in 275 pages.
        CPU 0.06s/0.01u sec elapsed 0.91 sec.
INFO:  Pages 24458: Changed 260, Empty 0; Tup 1029223: Vac 2192, Keep 3876, UnUsed 26.
        Total CPU 2.91s/2.22u sec elapsed 65.74 sec.

And here's the latest today, running every 2 hours:

INFO:  Removed 28740 tuples in 1548 pages.
        CPU 0.08s/0.06u sec elapsed 3.73 sec.
INFO:  Pages 27277: Changed 367, Empty 0; Tup 1114178: Vac 28740, Keep 1502, UnUsed 10631.
        Total CPU 4.78s/4.09u sec elapsed 258.10 sec.

The big tables/indexes are taking longer, but it's a big CPU/elapsed
time savings to vacuum every two hours vs every 15 minutes.

There's still the problem that when vacuum is running, interactive
performance drops dramatically.  A query that takes a couple of
seconds to run when the db isn't being vacuumed will take minutes when
vacuum is running.  It's tough for me to correlate exactly, but I
suspect that while postgres is vacuuming an index or table, nothing else
runs.  In between relations, other stuff gets to run, and then vacuum
hogs all the resources again.  This could be for disk reasons or
simply because postgres locks the index or table while it is being
vacuumed.  Either way, the behavior is unacceptable.  Users shouldn't
have to wait minutes while the database picks up after itself.

The concept of vacuuming seems to be problematic.  I'm not sure why
the database simply can't garbage collect incrementally.  AGC (automatic
garbage collection) is very tricky, especially AGC that involves gigabytes of
data on disk.
Incremental garbage collection seems to be what other databases do,
and it's been my experience that other databases don't have the type
of unpredictable behavior I'm seeing with Postgres.  I'd rather the
database be a little bit slower on average than have to figure out the
best time to inconvenience my users.

Since my customer already has Oracle, we'll be running tests in the
coming month(s :-) with Oracle to see how it performs under the same
load and hardware.  I'll keep this group posted.

Rob



Re: vacuum locking

From
Tom Lane
Date:
Rob Nagler <nagler@bivio.biz> writes:
> Here's the vmstat 5 at a random time:

>    procs                      memory    swap          io     system         cpu
>  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
>  0  0  0 272372  38416  78220 375048   0   3     2     0    0     0   2   2   0
>  0  0  0 272372  30000  78320 375660   0   0    34   274  382   284   5   1  94
>  0  1  0 272372  23012  78372 375924   0   0    25   558  445   488   8   2  90
>  1  0  0 272368  22744  78472 376192   0   6   125   594  364   664   9   3  88

> And here it is during vacuum:

>    procs                      memory    swap          io     system         cpu
>  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
>  1  2  1 277292   9620  72028 409664  46  32  4934  4812 1697   966   8   4  88
>  0  3  0 277272   9588  72096 412964  61   0  7303  2478 1391   976   3   3  94
>  2  2  0 277336   9644  72136 393264 1326  32  2827  2954 1693  1519   8   3  89

The increased I/O activity is certainly to be expected, but what I find
striking here is that you've got substantial swap activity in the second
trace.  What is causing that?  Not VACUUM I don't think.  It doesn't have
any huge memory demand.  But swapping out processes could account for
the perceived slowdown in interactive response.

            regards, tom lane

Re: vacuum locking

From
Mario Weilguni
Date:
On Thursday, 23 October 2003 01:32, Rob Nagler wrote:
> The concept of vacuuming seems to be problematic.  I'm not sure why
> the database simply can't garbage collect incrementally.  AGC (automatic
> garbage collection) is very tricky, especially AGC that involves gigabytes of
> data on disk.
> Incremental garbage collection seems to be what other databases do,
> and it's been my experience that other databases don't have the type
> of unpredictable behavior I'm seeing with Postgres.  I'd rather the
> database be a little bit slower on average than have to figure out the
> best time to inconvenience my users.

I think Oracle does not do garbage collection; it overwrites the tuples directly
and stores the old tuples in undo buffers. Since most transactions
commit, this is a big win.


Re: vacuum locking

From
Andrew Sullivan
Date:
On Wed, Oct 22, 2003 at 09:27:47PM -0400, Tom Lane wrote:

> trace.  What is causing that?  Not VACUUM I don't think.  It doesn't have
> any huge memory demand.  But swapping out processes could account for

What about if you've set vacuum_mem too high?

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: vacuum locking

From
Tom Lane
Date:
Andrew Sullivan <andrew@libertyrms.info> writes:
> On Wed, Oct 22, 2003 at 09:27:47PM -0400, Tom Lane wrote:
>> trace.  What is causing that?  Not VACUUM I don't think.  It doesn't have
>> any huge memory demand.  But swapping out processes could account for

> What about if you've set vacuum_mem too high?

Maybe, but only if it actually had reason to use a ton of memory ---
that is, it were recycling a very large number of tuples in a single
table.  IIRC that didn't seem to be the case here.

            regards, tom lane

Re: vacuum locking

From
Tom Lane
Date:
Mario Weilguni <mweilguni@sime.com> writes:
> I think Oracle does not do garbage collection; it overwrites the tuples directly
> and stores the old tuples in undo buffers. Since most transactions
> commit, this is a big win.

... if all tuples are the same size, and if you never have any
transactions that touch enough tuples to overflow your undo segment
(or even just sit there for a long time, preventing you from recycling
undo-log space; this is the dual of the VACUUM-can't-reclaim-dead-tuple
problem).  And a few other problems that any Oracle DBA can tell you about.
I prefer our system.

            regards, tom lane

Re: vacuum locking

From
Andrew Sullivan
Date:
On Thu, Oct 23, 2003 at 09:17:41AM -0400, Tom Lane wrote:
>
> Maybe, but only if it actually had reason to use a ton of memory ---
> that is, it were recycling a very large number of tuples in a single
> table.  IIRC that didn't seem to be the case here.

Ah, that's what I was trying to ask.  I didn't know if the memory was
actually taken by vacuum at the beginning (like shared memory is) or
what-all happened.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: vacuum locking

From
Rob Nagler
Date:
Tom Lane writes:
> ... if all tuples are the same size, and if you never have any

Incorrect.  If the tuples are smaller, Oracle does the right thing.  If
there's enough space in the page, it shifts the tuples to make room.
That's what pctfree, pctused and pctincrease allow you to control.
It's all in memory so it's fast, and I don't think it has to update any
indices.

> transactions that touch enough tuples to overflow your undo segment

That's easily configured, and hasn't been a problem in the databases
I've managed.

> (or even just sit there for a long time, preventing you from recycling

That's probably bad software or a batch system--which is tuned
differently.  Any OLTP system has to be able to partition its problems
to keep transactions short and small.  If it doesn't, it will not be
usable.

> undo-log space; this is the dual of the VACUUM-can't-reclaim-dead-tuple
> problem).  And a few other problems that any Oracle DBA can tell you
> about.  I prefer our system.

Oracle seems to make the assumption that data changes, which is why it
manages free space within each page as well as within free lists.  The
database will be bigger but you get much better performance on DML.
It is very good at caching so reads are fast.

Postgres seems to make the assumption that updates and deletes are
rare.  A delete/insert policy for updates means that a highly indexed
table requires lots of disk I/O when the update happens, plus the
concomitant garbage collection when vacuum runs.  But then MVCC makes
the assumption that there's lots of DML.  I don't understand the
philosophical split here.

I guess I don't understand what application profiles/statistics make
you prefer Postgres' approach over Oracle's.

> The increased I/O activity is certainly to be expected, but what I find
> striking here is that you've got substantial swap activity in the second
> trace.  What is causing that?  Not VACUUM I don't think.  It doesn't have
> any huge memory demand.  But swapping out processes could account for
> the perceived slowdown in interactive response.

The box is a bit memory starved, and we'll be addressing that
shortly.  I don't think it accounts for 3-minute queries, but it
might.  vacuum_mem is 32mb, btw.

Rob



Re: vacuum locking

From
Vivek Khera
Date:
>>>>> "RN" == Rob Nagler <nagler@bivio.biz> writes:

RN> Vivek Khera writes:
>> AMI or Adaptec based?

RN> Adaptec, I think.  AIC-7899 LVD SCSI is what dmidecode says, and
RN> Red Hat/Adaptec aacraid driver, Aug 18 2003 is what comes up when it

Cool.  No need to diddle with it, then.  The Adaptecs work quite well,
especially if you have battery backup.

Anyhow, it seems that as Tom mentioned, you are going into swap when
your vacuum runs, so I'll suspect you're just at the edge of total
memory utilization, and then you go over the top.

Another theory is that the disk capacity is near saturation, the
vacuum causes it to slow down just a smidge, and then your application
opens additional connections to handle the incoming requests which
don't complete fast enough, causing more memory usage with the
additional postmasters created.  Again, you suffer the slow spiral of
death due to resource shortage.

I'd start by getting a full diagnosis of what your system is doing
overall during the vacuum (e.g., additional processes created), then see if
adding RAM will help.

Also, how close are you to the capacity of your disk bandwidth?  I
don't see that in your numbers.  I know in freebsd I can run "systat
-vmstat" and it gives me a percentage of utilization that lets me know
when I'm near the capacity.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: vacuum locking

From
Mario Weilguni
Date:
On Thursday, 23 October 2003 15:26, Tom Lane wrote:
> ... if all tuples are the same size, and if you never have any
> transactions that touch enough tuples to overflow your undo segment
> (or even just sit there for a long time, preventing you from recycling
> undo-log space; this is the dual of the VACUUM-can't-reclaim-dead-tuple
> problem).  And a few other problems that any Oracle DBA can tell you about.
> I prefer our system.

Of course both approaches have advantages; it simply depends on the usage
pattern. A case where Oracle really rules over PostgreSQL is m<-->n
connection tables where each record consists of two foreign keys; the
overwrite approach is a big win here.


Re: vacuum locking

From
Greg Stark
Date:
Rob Nagler <nagler@bivio.biz> writes:

> Incorrect.  If the tuples are smaller, Oracle does the right thing.  If
> there's enough space in the page, it shifts the tuples to make room.
> That's what pctfree, pctused and pctincrease allow you to control.
> It's all in memory so it's fast, and I don't think it has to update any
> indices.

Note that pctfree/pctused are a big performance drain on the usual case. Try
setting them to 0/100 on a table that doesn't get updates (like a many-many
relation table) and see how much faster it is to insert and scan.

> > transactions that touch enough tuples to overflow your undo segment
>
> That's easily configured, and hasn't been a problem in the databases
> I've managed.

Judging by the number of FAQ lists out there that explain various quirks of
rollback segment configuration I wouldn't say it's so easily configured.

> > (or even just sit there for a long time, preventing you from recycling
>
> That's probably bad software or a batch system--which is tuned
> differently.  Any OLTP system has to be able to partition its problems
> to keep transactions short and small.  If it doesn't, it will not be
> usable.

Both DSS style and OLTP style databases can be accommodated with rollback
segments though it seems to me that DSS style databases lose most of the
advantage of rollback segments and optimistic commit.

The biggest problem is on systems where there's a combination of both users.
You need tremendous rollback segments to deal with the huge volume of oltp
transactions that can occur during a single DSS query. And the DSS query
performance is terrible as it has to check the rollback segments for a large
portion of the blocks it reads.

> Oracle seems to make the assumption that data changes,

Arguably it's the other way around. Postgres's approach wins whenever most of
the tuples in a table have been updated, in that case it just has to scan the
whole table ignoring old records not visible to the transaction. Oracle has to
consult the rollback segment for any recently updated tuple. Oracle wins in
the case where most of the tuples haven't changed so it can just scan the
table without consulting lots of rollback segments.

--
greg

Re: vacuum locking

From
"Stephen"
Date:
I ran into the same problem with VACUUM on my Linux box. If you are running
Linux, take a look at "elvtune" or read this post:

http://groups.google.com/groups?q=stephen+vacuum+linux&hl=en&lr=&ie=UTF-8&selm=gRdjb.7484%241o2.77%40nntp-post.primus.ca&rnum=3

Regards, Stephen


"Rob Nagler" <nagler@bivio.biz> wrote in message
news:16272.30527.120343.547492@jump.bivio.com...
> Manfred Koizar writes:
> > ISTM you are VACUUMing too aggressively.  You are reclaiming less than
> > 1% and 0.005%, respectively, of tuples.  I would increase FSM settings
> > to ca. 1000 fsm_relations, 100000 fsm_pages and VACUUM *less* often,
> > say every two hours or so.
>
> I did this.  We'll see how it goes.
>
> > ... or configure autovacuum to VACUUM a table when it has 10% dead
> > tuples.
>
> This solution doesn't really fix the fact that VACUUM consumes the
> disk while it is running.  I want to avoid the erratic performance on
> my web server when VACUUM is running.
>
> mfg,
> Rob



Re: vacuum locking

From
Rob Nagler
Date:
Stephen writes:
> I ran into the same problem with VACUUM on my Linux box. If you are running
> Linux, take a look at "elvtune" or read this post:

The default values were -r 64 -w 8192.  The article said this was
"optimal".  I just futzed with different values anywere from -w 128 -r
128 to -r 16 -w 8192.  None of these mattered much when vacuum is
running.

This is a RAID1 box with two disks.  Even with vacuum and one other
postmaster running, it's still got to get a lot of blocks through the
I/O system.

Rob



Re: vacuum locking

From
Rob Nagler
Date:
Mario Weilguni writes:
> Of course both approaches have advantages; it simply depends on the usage
> pattern. A case where Oracle really rules over PostgreSQL is m<-->n
> connection tables where each record consists of two foreign keys; the
> overwrite approach is a big win here.

That's usually our case.  My company almost always has "groupware"
problems to solve.  Every record has a "realm" (security) foreign key
and typically another key.  The infrastructure puts the security
key on queries to avoid returning the wrong realm's data.

Rob



Re: vacuum locking

From
Rob Nagler
Date:
Vivek Khera writes:
> Also, how close are you to the capacity of your disk bandwidth?  I
> don't see that in your numbers.  I know in freebsd I can run "systat
> -vmstat" and it gives me a percentage of utilization that lets me know
> when I'm near the capacity.

The vacuum totally consumes the system.  It's in a constant "D" state.  As
near as I can tell, it's hitting all blocks in the database.

The problem is interactive performance when vacuum is in a D state.
Even with just two processes doing "stuff" (vacuum and a select, let's
say), the select is very slow.

My understanding of the problem is that if a query hits the disk hard
(and many of my queries do) and vacuum is hitting the disk hard, they
contend for the same resource and nobody wins.  The query optimizer
has lots of problems with my queries and ends up doing silly sorts.
As a simple example, one query goes like this:

        select avg(f1) from t1 group by f2;

This results in a plan like:

 Aggregate  (cost=171672.95..180304.41 rows=115086 width=32)
   ->  Group  (cost=171672.95..177427.26 rows=1150862 width=32)
         ->  Sort  (cost=171672.95..174550.10 rows=1150862 width=32)
               Sort Key: f2
               ->  Seq Scan on t1  (cost=0.00..39773.62 rows=1150862 width=32)

This is of course stupid, because it sorts 1M rows, which probably
means it has to hit disk (sort_mem can only be so large).  Turns out
there are only about 20 different values of f2, so it would be much
better to aggregate without sorting.  This is the type of query which
runs while vacuum runs and I'm sure the two are just plain
incompatible.  vacuum is read intensive and this query is write
intensive.

Rob



Re: vacuum locking

From
Rob Nagler
Date:
Greg Stark writes:
> Note that pctfree/pctused are a big performance drain on the usual case. Try
> setting them to 0/100 on a table that doesn't get updates (like a many-many
> relation table) and see how much faster it is to insert and scan.

Right.  You can optimize each table independently.  The "usual" case
doesn't exist in most databases, I've found, which is why Oracle does
better.

> Judging by the number of FAQ lists out there that explain various quirks of
> rollback segment configuration I wouldn't say it's so easily configured.

Maybe we just got lucky. :-)

> The biggest problem is on systems where there's a combination of both users.

As is ours.

> You need tremendous rollback segments to deal with the huge volume of oltp
> transactions that can occur during a single DSS query. And the DSS query
> performance is terrible as it has to check the rollback segments for a large
> portion of the blocks it reads.

The DSS issues only come into play, I think, if the queries are long.
This is our problem.  Postgres does a bad job with DSS, I believe.  I
mentioned the select avg(f1) from t1 group by f2 query in another message.
If it were optimized for "standard" SQL, such as avg, sum, etc., I
think it would do a lot better with DSS-type problems.  Our problem
seems to be that the DSS queries almost always hit disk to sort.

> Arguably it's the other way around. Postgres's approach wins whenever most of
> the tuples in a table have been updated, in that case it just has to scan the
> whole table ignoring old records not visible to the transaction. Oracle has to
> consult the rollback segment for any recently updated tuple. Oracle wins in
> the case where most of the tuples haven't changed so it can just scan the
> table without consulting lots of rollback segments.

I see what you're saying.  I'm not a db expert, just a programmer
trying to make his queries go faster, so I'll acknowledge that the
design is theoretically better.

In practice, I'm still stuck.  As a simple example, this query
        select avg(f1) from t1 group by f2

Takes 33 seconds (see explain analyze in another note in this thread)
to run on idle hardware with about 1GB available in the cache.  It's
clearly hitting disk to do the sort.  Being a dumb programmer, I
changed the query to:

    select f1 from t1;

And wrote the rest in Perl.  It takes 4 seconds to run.  Why?  The
Perl doesn't sort to disk, it aggregates in memory.  There are 18 rows
returned.  What I didn't mention is that I originally had:

    select avg(f1), t2.name from t1, t2 where t2.f2 = t1.f2 group by t2.name;

Which is much worse:

 Aggregate  (cost=161046.30..162130.42 rows=8673 width=222) (actual time=72069.10..87455.69 rows=18 loops=1)
   ->  Group  (cost=161046.30..161479.95 rows=86729 width=222) (actual time=71066.38..78108.17 rows=963660 loops=1)
         ->  Sort  (cost=161046.30..161263.13 rows=86729 width=222) (actual time=71066.36..72445.74 rows=963660 loops=1)
               Sort Key: t2.name
               ->  Merge Join  (cost=148030.15..153932.66 rows=86729 width=222) (actual time=19850.52..27266.40 rows=963660 loops=1)
                     Merge Cond: ("outer".f2 = "inner".f2)
                     ->  Sort  (cost=148028.59..150437.74 rows=963660 width=58) (actual time=19850.18..21750.12 rows=963660 loops=1)
                           Sort Key: t1.f2
                           ->  Seq Scan on t1  (cost=0.00..32479.60 rows=963660 width=58) (actual time=0.06..3333.39 rows=963660 loops=1)
                     ->  Sort  (cost=1.56..1.60 rows=18 width=164) (actual time=0.30..737.59 rows=931007 loops=1)
                           Sort Key: t2.f2
                           ->  Seq Scan on t2  (cost=0.00..1.18 rows=18 width=164) (actual time=0.05..0.08 rows=18 loops=1)
 Total runtime: 87550.31 msec

Again, there are about 18 values of f2.  The optimizer even knows this
(it's a foreign key to t2.f2), but instead it does the query plan in
exactly the wrong order.  It hits disk probably 3 times as much as the
simpler query judging by the amount of time this query takes (33 vs 88
secs).  BTW, adding an index to t1.f2 has seriously negative effects
on many other DSS queries.

I'm still not sure that the sort problem is our only problem when
vacuum runs.  It's tough to pin down.  We'll be adding more memory to
see if that helps with the disk contention.

Rob



Re: vacuum locking

From
Greg Stark
Date:
Rob Nagler <nagler@bivio.biz> writes:

> Mario Weilguni writes:
> > Of course both approaches have advantages; it simply depends on the usage
> > pattern. A case where Oracle really rules over PostgreSQL is m<-->n
> > connection tables where each record consists of two foreign keys; the
> > overwrite approach is a big win here.

I don't understand why you would expect overwriting to win here.
What types of updates do you do on these tables?

Normally I found using update on such a table was too awkward to contemplate
so I just delete all the relation records that I'm replacing for the key I'm
working with and insert new ones. This always works out to be cleaner code. In
fact I usually leave such tables with no UPDATE grants on them.

In that situation I would have actually expected Postgres to do as well as or
better than Oracle since that makes them both functionally equivalent.

--
greg

Re: vacuum locking

From
Greg Stark
Date:
Rob Nagler <nagler@bivio.biz> writes:

> Greg Stark writes:
> > Note that pctfree/pctused are a big performance drain on the usual case. Try
> > setting them to 0/100 on a table that doesn't get updates (like a many-many
> > relation table) and see how much faster it is to insert and scan.
>
> Right.  You can optimize each table independently.  The "usual" case
> doesn't exist in most databases, I've found, which is why Oracle does
> better.

Sorry I was unclear. By "usual case" I meant reading, as opposed to updates.
The size of the on-disk representation turns out to be a major determinant in
a lot of database applications, since the dominant resource is i/o bandwidth.
Try doing a fresh import of a large table with pctfree 0 pctused 100 and
compare how long a select takes on it compared to the original table.



> In practice, I'm still stuck.  As a simple example, this query
>         select avg(f1) from t1 group by f2
>
> Takes 33 seconds (see explain analyze in another note in this thread)
> to run on idle hardware with about 1GB available in the cache.  It's
> clearly hitting disk to do the sort.  Being a dumb programmer, I
> changed the query to:

I didn't see the rest of the thread so forgive me if you've already seen these
suggestions.

Firstly, that type of query will be faster in 7.4 due to implementing a new
method for doing groups called hash aggregates.

Secondly you could try raising sort_mem. Postgres can't know how much memory
it really has before it swaps, so there's a parameter to tell it. And swapping
would be much worse than doing disk sorts.

You can raise sort_mem to tell it how much memory it's allowed to use before
it goes to disk sorts. You can even use ALTER SESSION to raise it in a few DSS
sessions but leave it low in the many OLTP sessions. If it's high in OLTP
sessions then you could quickly hit swap when they all happen to decide to use
the maximum amount at the same time. But then you don't want to be doing big
sorts in OLTP sessions anyways.

Unfortunately there's no way to tell how much memory it thinks it's going to
use. I used to use a script to monitor the pgsql_tmp directory in the database
to watch for usage.

>     select f1 from t1;
>
> And wrote the rest in Perl. It takes 4 seconds to run. Why? The Perl doesn't
> sort to disk, it aggregates in memory. There are 18 rows returned. What I
> didn't mention is that I originally had:

Oof. I expect if you convinced 7.3 to do the sort in memory by a suitable
value of sort_mem it would be close, but still slower than perl. 7.4 should be
very close since hash aggregates would be more or less equivalent to the perl
method.


>     select avg(f1), t2.name from t1, t2 where t2.f2 = t1.f2 group by t2.name;
>
> Which is much worse:
>
>  Aggregate  (cost=161046.30..162130.42 rows=8673 width=222) (actual time=72069.10..87455.69 rows=18 loops=1)
>    ->  Group  (cost=161046.30..161479.95 rows=86729 width=222) (actual time=71066.38..78108.17 rows=963660 loops=1)
>          ->  Sort  (cost=161046.30..161263.13 rows=86729 width=222) (actual time=71066.36..72445.74 rows=963660 loops=1)
>                Sort Key: t2.name
>                ->  Merge Join  (cost=148030.15..153932.66 rows=86729 width=222) (actual time=19850.52..27266.40 rows=963660 loops=1)
>                      Merge Cond: ("outer".f2 = "inner".f2)
>                      ->  Sort  (cost=148028.59..150437.74 rows=963660 width=58) (actual time=19850.18..21750.12 rows=963660 loops=1)
>                            Sort Key: t1.f2
>                            ->  Seq Scan on t1  (cost=0.00..32479.60 rows=963660 width=58) (actual time=0.06..3333.39 rows=963660 loops=1)
>                      ->  Sort  (cost=1.56..1.60 rows=18 width=164) (actual time=0.30..737.59 rows=931007 loops=1)
>                            Sort Key: t2.f2
>                            ->  Seq Scan on t2  (cost=0.00..1.18 rows=18 width=164) (actual time=0.05..0.08 rows=18 loops=1)
>  Total runtime: 87550.31 msec
>
> Again, there are about 18 values of f2.  The optimizer even knows this
> (it's a foreign key to t2.f2), but instead it does the query plan in
> exactly the wrong order.  It hits disk probably 3 times as much as the
> simpler query judging by the amount of time this query takes (33 vs 88
> secs).  BTW, adding an index to t1.f2 has seriously negative effects
> on many other DSS queries.

Well, first of all it doesn't really because you said to group by t2.name not
f2. You might expect it to at least optimize something like this:

select avg(f1),t2.name from t1 join t2 using (f2) group by f2

but even then I don't think it actually is capable of using foreign keys as a
hint like that. I don't think Oracle does either actually, but I'm not sure.

To convince it to do the right thing you would have to do either:

SELECT a, t2.name
  FROM (SELECT avg(f1) AS a, f2 FROM t1 GROUP BY f2) AS t1
  JOIN t2 USING (f2)

Or use a subquery:

SELECT avg(f1) AS a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
  FROM t1
 GROUP BY f2


Oh, incidentally, my use of the "JOIN" syntax is a personal preference.
Ideally it would produce identical plans but unfortunately that's not always
true yet, though 7.4 is closer. I think in the suggestion above it actually
would.

--
greg

Re: vacuum locking

From
Rob Nagler
Date:
Greg Stark writes:
> Sorry I was unclear. By "usual case" I meant reading, as opposed to updates.
> The size of the on-disk representation turns out to be a major determinant in
> a lot of database applications, since the dominant resource is i/o bandwidth.
> Try doing a fresh import of a large table with pctfree 0 pctused 100 and
> compare how long a select takes on it compared to the original table.

BTW, I greatly appreciate your support on this stuff.  This list is a
fantastic resource.

I think we agree.  The question is what is the workload.  On tables
without updates, postgres will be fast enough.  However, postgres is
slow on tables with updates afaict.  I think of OLTP as a system with
updates.  One can do DSS on an OLTP database with Oracle, at least it
seems to work for one of our projects.

> Firstly, that type of query will be faster in 7.4 due to implementing a new
> method for doing groups called hash aggregates.

We'll be trying it as soon as it is out.

> Secondly you could try raising sort_mem. Postgres can't know how much memory
> it really has before it swaps, so there's a parameter to tell it. And swapping
> would be much worse than doing disk sorts.

It is at 8000.  This is probably as high as I can go with multiple
postmasters.  The sort area is shared in Oracle (I think :-) in the
UGA.

> You can raise sort_mem to tell it how much memory it's allowed to
> use before it goes to disk sorts. You can even use ALTER SESSION to
> raise it in a few DSS sessions but leave it low in the many OLTP
> sessions. If it's high in OLTP sessions then you could quickly hit
> swap when they all happen to decide to use the maximum amount at the
> same time. But then you don't want to be doing big sorts in OLTP
> sessions anyways.

This is a web app.  I can't control what the user wants to do.
Sometimes they update data, and other times they simply look at it.

I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
set sort_mem in the conf file to 512000 and restarted postgres.  Reran
the simpler query (no name) 3 times, and it was still 27 secs.

> Unfortunately there's no way to tell how much memory it thinks it's
> going to use. I used to use a script to monitor the pgsql_tmp
> directory in the database to watch for usage.

I don't have to.  The queries that run slow are hitting disk.
Anything that takes a minute has to be writing to disk.

> Well, first of all it doesn't really because you said to group by t2.name not
> f1. You might expect it to at least optimize something like this:

I put f2 in the group by, and it doesn't matter.  That's the point.
It's the on-disk sort before the aggregate that's killing the query.

> but even then I don't think it actually is capable of using foreign keys as a
> hint like that. I don't think Oracle does either actually, but I'm not sure.

I'll be finding out this week.

> To convince it to do the right thing you would have to do either:
>
> SELECT a, t2.name
> >   FROM (SELECT avg(f1) AS a, f2 FROM t1 GROUP BY f2) AS t1
>   JOIN t2 USING (f2)
>
> Or use a subquery:
>
> > SELECT avg(f1) AS a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
>   FROM t1
>  GROUP BY f2

This doesn't solve the problem.  It's the GROUP BY that is doing the
wrong thing.  It's grouping, then aggregating.

Rob



Re: vacuum locking

From
Rob Nagler
Date:
Greg Stark writes:
> I don't understand why you would expect overwriting to win here.
> What types of updates do you do on these tables?

These are statistics that we're adjusting.  I think that's pretty
normal stuff.  The DSS component is the avg() of these numbers on
particular groups.  The groups are related to foreign keys to
customers and other things.

> Normally I found using update on such a table was too awkward to
> contemplate so I just delete all the relation records that I'm
> replacing for the key I'm working with and insert new ones. This
> always works out to be cleaner code. In fact I usually leave such
> tables with no UPDATE grants on them.

In accounting apps, we do this, too.  It's awkward with all the
relationships to update all the records in the right order.  But
Oracle wins on delete/insert, too, because it reuses the tuples it
already has in memory, and it can reuse the same foreign key index
pages, too, since the values are usually the same.

The difference between Oracle and postgres seems to be optimism.
postgres assumes the transaction will fail and/or that a transaction
will modify lots of data that is used by other queries going on in
parallel.  Oracle assumes that the transaction is going to be
committed, and it might as well make the changes in place.

> In that situation I would have actually expected Postgres to do as well as or
> better than Oracle since that makes them both functionally
> equivalent.

I'll find out soon enough. :-)

Rob



Re: vacuum locking

From
Greg Stark
Date:
Rob Nagler <nagler@bivio.biz> writes:

> I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
> set sort_mem in the conf file to 512000 and restarted postgres.  Reran
> the simpler query (no name) 3 times, and it was still 27 secs.

Sorry, I don't know how that bubbled up from the depths of my Oracle memory.
In postgres it's just "SET"

db=> set sort_mem = 512000;
SET

> > To convince it to do the right thing you would have to do either:
> >
> > SELECT a, t2.name
> >   FROM (SELECT avg(f1) AS a, f2 FROM t1 GROUP BY f2) AS t1
> >   JOIN t2 USING (f2)
> >
> > Or use a subquery:
> >
> > SELECT avg(f1) AS a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
> >   FROM t1
> >  GROUP BY f2
>
> This doesn't solve the problem.  It's the GROUP BY that is doing the
> wrong thing.  It's grouping, then aggregating.

But at least in the form above it will consider using an index on f2, and it
will consider using indexes on t1 and t2 to do the join.

It's unlikely to go ahead and use the indexes though because normally sorting
is faster than using the index when scanning the whole table. You should
compare the "explain analyze" results for the original query and these two.
And check the results with "set enable_seqscan = off" as well.
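
Something like this, for instance (a sketch, using the table and column names
from earlier in the thread):

    EXPLAIN ANALYZE SELECT avg(f1) FROM t1 GROUP BY f2;
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT avg(f1) FROM t1 GROUP BY f2;
    SET enable_seqscan = on;    -- restore the default for the session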

I suspect you'll find your original query uses sequential scans even when
they're disabled because it has no alternative. With the two above it can use
indexes but I suspect you'll find they actually take longer than the
sequential scan and sort -- especially if you have sort_mem set large enough.

--
greg

Re: vacuum locking

From
Rob Nagler
Date:
Greg Stark writes:
> > > SELECT avg(f1) AS a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
> > >   FROM t1
> > >  GROUP BY f2
> >
> > This doesn't solve the problem.  It's the GROUP BY that is doing the
> > wrong thing.  It's grouping, then aggregating.
>
> But at least in the form above it will consider using an index on f2, and it
> will consider using indexes on t1 and t2 to do the join.

There are 20 rows in t2, so an index actually slows down the join.
I had to drop the index on t1.f2, because it was trying to use it
instead of simply sorting 20 rows.

I've got preliminary results for a number of "hard" queries between
oracle and postgres (seconds):

 PG ORA
  0   5 q1
  1   0 q2
  0   5 q3
  2   1 q4
219   7 q5
217   5 q6
 79   2 q7
 31   1 q8

These are averages of 10 runs of each query.  I didn't optimize
pctfree, etc., but I did run analyze after the oracle import.

One of the reasons postgres is faster on q1-4 is that postgres
supports OFFSET/LIMIT, and oracle doesn't.  q7 and q8 are the queries
that I've referred to recently (avg of group by).

q5 and q6 are too complex to discuss here, but the fundamental issue
is the order in which postgres decides to do things.  The choice for
me is clear: the developer time trying to figure out how to make the
planner do the "obviously right thing" has been too high with
postgres.  These tests demonstrate to me that for even complex queries,
oracle wins for our problem.

It looks like we'll be migrating to oracle for this project from these
preliminary results.  It's not just the planner problems.  The
customer is more familiar with oracle, and the vacuum performance is
another problem.

Rob

Re: vacuum locking

From
Tom Lane
Date:
Rob Nagler <nagler@bivio.biz> writes:
> q5 and q6 are too complex to discuss here,

How do you expect us to get better if you don't show us the problems?

BTW, have you tried any of this with a 7.4beta release?  Another project
that I'm aware of saw several bottlenecks in their Oracle-centric code
go away when they tested 7.4 instead of 7.3.  For instance, there is
hash aggregation capability, which would probably solve the aggregate
query problem you were complaining about in
http://archives.postgresql.org/pgsql-performance/2003-10/msg00640.php

            regards, tom lane

Re: vacuum locking

From
Josh Berkus
Date:
Rob,

> q5 and q6 are too complex to discuss here, but the fundamental issue
> is the order in which postgres decides to do things.  The choice for
> me is clear: the developer time trying to figure out how to make the
> planner do the "obviously right thing" has been too high with
> postgres.  These tests demonstrate to me that for even complex queries,
> oracle wins for our problem.
>
> It looks like we'll be migrating to oracle for this project from these
> preliminary results.  It's not just the planner problems.  The
> customer is more familiar with oracle, and the vacuum performance is
> another problem.

Hey, we can't win 'em all.   If we could, Larry would be circulating his
resumé.

I hope that you'll stay current with PostgreSQL developments so that you can
do a similarly thorough evaluation for your next project.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: vacuum locking

From
Greg Stark
Date:
Rob Nagler <nagler@bivio.biz> writes:

> One of the reasons postgres is faster on q1-4 is that postgres
> supports OFFSET/LIMIT, and oracle doesn't.  q7 and q8 are the queries
> that I've referred to recently (avg of group by).

Well the way to do offset/limit in Oracle is:

SELECT *
  FROM (
        SELECT ... , rownum AS n
         WHERE rownum <= OFFSET+LIMIT
       )
 WHERE n > OFFSET

That's basically the same thing Postgres does anyways. It actually has to do
the complete query and fetch and discard the records up to the OFFSET and then
stop when it hits the LIMIT.
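
For reference, the Postgres spelling of the same thing (a sketch; the ORDER BY
column and the offset/limit values are illustrative):

    SELECT *
      FROM t1
     ORDER BY f2
     LIMIT 20 OFFSET 100;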

> q5 and q6 are too complex to discuss here, but the fundamental issue
> is the order in which postgres decides to do things.

That's true for pretty much 99% of all query optimization, whether it's on Postgres or
Oracle. I'm rather curious to see the query and explain analyze output from q5
and q6.

--
greg

Re: vacuum locking

From
"scott.marlowe"
Date:
On Wed, 29 Oct 2003, Rob Nagler wrote:

> Greg Stark writes:
> > > > SELECT avg(f1) AS a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
> > > >   FROM t1
> > > >  GROUP BY f2
> > >
> > > This doesn't solve the problem.  It's the GROUP BY that is doing the
> > > wrong thing.  It's grouping, then aggregating.
> >
> > But at least in the form above it will consider using an index on f2, and it
> > will consider using indexes on t1 and t2 to do the join.
>
> There are 20 rows in t2, so an index actually slows down the join.
> I had to drop the index on t1.f2, because it was trying to use it
> instead of simply sorting 20 rows.

t2 was 'vacuum full'ed and analyzed, right?  Just guessing.


Re: vacuum locking

From
Rob Nagler
Date:
Josh Berkus writes:
> I hope that you'll stay current with PostgreSQL developments so that you can
> do a similarly thorough evaluation for your next project.

Oh, no worries.  This project just happens to be a tough one.  We're
heavily invested in Postgres.  Other projects we maintain that use
Postgres are zoescore.com, colosla.org, and paintedsnapshot.com.

I am currently working on a very large project where the customer is
very committed to Postgres/open source.  We're in discussions about
what to do about the scalability problems we saw in the other project.
You can help by addressing a dilemma we (my new customer and I) see.
I apologize for the length of what follows, but I'm trying to be as
clear as possible about our situation.

I have had a lot of push back from the core Postgres folks on the idea of
planner hints, which would go a long way toward solving the performance
problems we are seeing.  I presented an alternative approach: have a
"style sheet" (Scribe, LaTex) type of solution in the postmaster,
which can be customized by end users.  That got no response so I
assume it wasn't in line with the "Postgres way" (more below).

The vacuum problem is very serious for the problematic database to the
point that one of my customer's customers said:

    However, I am having a hard time understanding why the system is so
    slow... from my perspective it seems like you have some fundamental
    database issues that need to be addressed.

This is simply unacceptable, and that's why we're moving to Oracle.
It's very bad for my business reputation.

I don't have a ready solution to vacuuming, and none on the list have
been effective.  We'll be adding more memory, but it seems to be a disk
bandwidth problem.  I run Oracle on a much slower system, and I've never
noticed problems of this kind, even when a database-wide validation is
running.  When vacuum is running, it's going through the entire
database, and that pretty much trashes all other queries, especially
DSS queries.  As always it is just software, and there's got to be
an 80/20 solution.

Our new project is large, high-profile, but not as data intensive as
the problematic one.  We are willing to commit significant funding and
effort to make Postgres faster.  We are "business value" driven.  That
means we solve problems practically instead of theoretically.  This
seems to be in conflict with "the Postgres way", which seems to be
more theoretical.  Our business situation comes ahead of theories.

My customer (who monitors this list) and I believe that our changes
would not be accepted back into the Postgres main branch.  That
presents us with a difficult situation, because we don't want to own a
separate branch.  (Xemacs helped push emacs, and maybe that's what has
to happen here, yet it's not a pretty situation.)

We'll be meeting next week to discuss the situation, and how we'll go
forward.   We have budget in 2003 to spend on this, but only if the
situation can be resolved.  Otherwise, we'll have to respect the data
we are seeing, and think about our choice of technologies.

Thanks for the feedback.

Rob



Re: vacuum locking

From
Rob Nagler
Date:
scott.marlowe writes:
> t2 was 'vacuum full'ed and analyzed, right?  Just guessing.

Fresh import.  I've been told this includes an ANALYZE.

Rob



Re: vacuum locking

From
Tom Lane
Date:
Rob Nagler <nagler@bivio.biz> writes:
> When vacuum is running, it's going through the entire
> database, and that pretty much trashes all other queries, especially
> DSS queries.  As always it is just software, and there's got to be
> an 80/20 solution.

One thing that's been discussed but not yet tried is putting a tunable
delay into VACUUM's per-page loop (ie, sleep N milliseconds after each
heap page is processed, and probably each index page too).  This might
be useless or it might be the 80/20 solution you want.  Want to try it
and report back?

            regards, tom lane

Re: vacuum locking

From
"scott.marlowe"
Date:
On Thu, 30 Oct 2003, Rob Nagler wrote:

> The vacuum problem is very serious for the problematic database to the
> point that one of my customer's customers said:
>
>     However, I am having a hard time understanding why the system is so
>     slow... from my perspective it seems like you have some fundamental
>     database issues that need to be addressed.
>
> This is simply unacceptable, and that's why we're moving to Oracle.
> It's very bad for my business reputation.
>
> I don't have a ready solution to vacuuming, and none offered on the
> list has been effective.  We'll be adding more memory, but it seems
> to be a disk bandwidth problem.  I run Oracle on a much slower
> system, and I've never noticed problems of this kind, even when a
> database-wide validation is running.  When vacuum is running, it's
> going through the entire database, and that pretty much trashes all
> other queries, especially DSS queries.  As always, it is just
> software, and there's got to be an 80/20 solution.

Have you looked at the autovacuum daemon?  Was it found wanting, or
what?  I've had good luck with it so far, so I was just wondering if it
might work for your needs as well.  It's quite intelligent about which
tables, etc., it vacuums.


Re: vacuum locking

From
"scott.marlowe"
Date:
On Thu, 30 Oct 2003, Rob Nagler wrote:

> scott.marlowe writes:
> > t2 was 'vacuum full'ed and analyzed, right?  Just guessing.
>
> Fresh import.  I've been told this includes an ANALYZE.

You should probably run analyze by hand just to be sure.  If the planner
is using an index scan on a table with 20 rows, then it's likely it has
the default statistics for the table, not real ones.
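
For example (using the t2 table from your test; a minimal sketch):

    -- refresh the statistics by hand
    ANALYZE t2;

    -- then sanity-check what the planner now believes about the table
    SELECT relname, relpages, reltuples
      FROM pg_class
     WHERE relname = 't2';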


Re: vacuum locking

From
Josh Berkus
Date:
Rob,

> I have had a lot of push back from the core Postgres folks on the
> idea of planner hints, which would go a long way toward solving the
> performance problems we are seeing.

I can tell you that the general reaction you'll get is "let's fix the
problems with the planner instead of giving the user a workaround."
Not that that helps people running on older versions, but it stems from
a "let's heal the illness, not the symptoms" attitude that is one of
our project's strengths.

> I presented an alternative approach: a
> "style sheet" (Scribe, LaTeX) type of solution in the postmaster,
> which could be customized by end users.  That got no response, so I
> assume it wasn't in line with the "Postgres way" (more below).

Or you may just have posted it in a bad week.   I don't remember your
post; how about we try it out on Hackers again and argue it out?

> running.  When vacuum is running, it's going through the entire
> database, and that pretty much trashes all other queries, especially
> DSS queries.  As always, it is just software, and there's got to be
> an 80/20 solution.

See Tom's post.

> Our new project is large, high-profile, but not as data intensive as
> the problematic one.  We are willing to commit significant funding and
> effort to make Postgres faster.  We are "business value" driven.  That
> means we solve problems practically instead of theoretically.  This
> seems to be in conflict with "the Postgres way", which seems to be
> more theoretical.  Our business situation comes ahead of theories.

As always, it's a matter of balance.   Our "theoretical purity" has given
PostgreSQL a reliability and recoverability level only otherwise obtainable
from Oracle for six figures.   And has allowed us to build an extensibility
system that lets users define their own datatypes, operators, aggregates,
etc., in a way that is not possible on *any* other database.  This is what
you're up against when you suggest changes to some of the core components ...
people don't want to break what's not broken unless there are substantial,
proven gains to be made.

> My customer (who monitors this list) and I believe that our changes
> would not be accepted back into the Postgres main branch.

If you haven't posted, you don't know.   A *lot* of suggestions get
rejected because the suggester wants Tom, Bruce, Peter, Joe, and Jan to
do the actual work, or isn't willing to follow through with testing and
maintenance.  As I said before, *I* don't remember earlier posts from
you offering patches; perhaps it's time to try again?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: vacuum locking

From
Bruce Momjian
Date:
Josh Berkus wrote:
> > Our new project is large, high-profile, but not as data intensive as
> > the problematic one.  We are willing to commit significant funding and
> > effort to make Postgres faster.  We are "business value" driven.  That
> > means we solve problems practically instead of theoretically.  This
> > seems to be in conflict with "the Postgres way", which seems to be
> > more theoretical.  Our business situation comes ahead of theories.
>
> As always, it's a matter of balance.   Our "theoretical purity" has given
> PostgreSQL a reliability and recoverability level only otherwise obtainable
> from Oracle for six figures.   And has allowed us to build an extensibility
> system that lets users define their own datatypes, operators, aggregates,
> etc., in a way that is not possible on *any* other database.  This is what
> you're up against when you suggest changes to some of the core components ...
> people don't want to break what's not broken unless there are substantial,
> proven gains to be made.

Let me add a little historical perspective here --- the PostgreSQL code
base is almost 20 years old, and code size has doubled in the past 7
years.  We are into PostgreSQL for the long haul --- that means we want
code that will be working and maintainable 7 years from now.  If your
solution doesn't fit that case, well, you might be right, it might get
rejected.  However, we find that it is worth the time and effort to
make our code sustainable, and it is possible your solution could be
set up to do that.  But that requires you to go beyond your "business
situation" logic and devote time to contributing something that will
make PostgreSQL better 5 years in the future, as well as in the next
release.

We have found very few companies that are not willing to work within
that long-term perspective.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: vacuum locking

From
Rob Nagler
Date:
Tom Lane writes:
> Rob Nagler <nagler@bivio.biz> writes:
> > q5 and q6 are too complex to discuss here,
>
> How do you expect us to get better if you don't show us the problems?

With all due respect and thanks for the massive amount of help, I have
presented the problems.  q5 and q6 are a subset of the following
general problems:

    * Multiple ORDER BY columns result in no index being used.
      Solution: drop the extra ORDER BY columns and use only the first
      (sketched below)

    * Vacuum locks out interactive users.
      Solution: don't run vacuum full, and only run plain vacuum at night

    * A low-cardinality index on a large table confuses the planner.
      Solution: drop the (foreign key) index, which hurts other
      performance

    * Grouped aggregates result in a disk sort (sketched below).
      Solution: wait for 7.4 (may work), or write it in Perl (works today)

    * Extreme non-linear performance (crossing a magic number in the
      optimizer drops performance by three orders of magnitude).
      Solution: don't cross the magic number, or code it in Perl
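
As a concrete sketch of the first and fourth problems (table, column,
and index names here are made up for illustration):

    -- hypothetical schema
    CREATE TABLE events (
        state    text,
        modified timestamp,
        payload  text
    );
    CREATE INDEX events_state ON events (state);

    -- On a large table, a sort on two columns can't be satisfied by
    -- the single-column index alone, so the plan gets an explicit
    -- sort step:
    EXPLAIN SELECT * FROM events ORDER BY state, modified;

    -- A composite index matching the full sort order lets the planner
    -- skip the sort (when it judges an index scan cheaper):
    CREATE INDEX events_state_modified ON events (state, modified);
    EXPLAIN SELECT * FROM events ORDER BY state, modified;

    -- The grouped-aggregate case: in 7.3 this is done with a sort
    -- (spilling to disk when it exceeds sort_mem); 7.4's hash
    -- aggregation often avoids the sort entirely:
    EXPLAIN SELECT state, count(*) FROM events GROUP BY state;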

The general problem is that our system generates 90% of the SQL we
need.  There are many advantages to this, such as being able to add
OFFSET/LIMIT support with a few lines of code in a matter of hours.
Every time we have to custom code a query, or worse, code it in Perl,
we lose many benefits.  I understand the need to optimize queries, but
my general experience with Oracle is that I don't have to do this very
often.  When the 80/20 rule inverts, there's something fundamentally
wrong with the model.  That's where we feel we're at.  It's cost us a
tremendous amount of money to deal with these query optimizations.

The solution is not to fix the queries, but to address the root
causes.  That's what my other note in this thread is about.  I hope
you understand the spirit of my suggestion, and will work with us to
find an acceptable approach to the general problems.

> BTW, have you tried any of this with a 7.4beta release?

I will, but for my other projects, not this one.  I'll run this data,
because it's a great test case.

We have a business decision to make: devote more time to Postgres or
go with Oracle.  I spent less than a day getting the data into Oracle
and creating the benchmark.  The payoff is clear now.  The risk of 7.4
is still very high: the vacuum problem still looms, and, put simply,
past performance is a good indicator of future performance.
Going forward, there's no choice.  We've had to limit end-user
functionality to get Postgres working as well as it does, and that's
way below where Oracle is without those same limits and without any
effort put into tuning.

Thanks again for all your support.

Rob



Re: vacuum locking

From
Christopher Kings-Lynne
Date:
> Fresh import.  I've been told this includes an ANALYZE.

Uh - no it doesn't.
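
So until you run something like the following after the import, the
planner is working from default statistics (a minimal example):

    ANALYZE;           -- or: VACUUM ANALYZE;  to do the whole database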

Chris