Thread: PG 7.0 is 2.5 times slower running a big report
I have a large report that I run once a day. Under 6.5.3 it took just over 3hrs to run. Under 7.0 it is now taking 8 hours to run. No other changes were made.

This is on RedHat Linux 6.2, a PIII 733 with 384MB RAM and 2 IDE 7200 RPM disks. One disk contains the Postgres directory including the data directory, and the other disk has everything else.

The Postmaster is started with these options: -i -B 1024 -N 256 -o -F

The report is being run on a backup server just after the database has been loaded from a dump and 'vacuum analyze'd. There is practically nothing else running on the box during the run.

The report creates four separate concurrent cursors. Each of the queries is sorted by a customer number, which is indexed. The index is unique on the customer table but not the others. For the other cursors it pops values as needed to process data for the current customer number. There are also other selects that are run for each customer order processed to retrieve its line items. The report does not update the database at all; it is just accumulating totals that will be written out when the report finishes.

Top tells me the front end process is using 5 to 10 percent of the CPU and the back end is using 10 to 20 percent. The load average is about 1.0 and the CPU is about 80% idle. I am pretty certain that on 6.5.3 the CPU usage was much higher. It's almost like the new version has some sort of throttle to keep one backend from saturating the system. Indeed the box is much more responsive than it used to be while running this report.

Suggestions?
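[The four-cursor pattern described above could be sketched roughly as follows. Column lists are abbreviated, and the FETCH-driven merge logic lives in the report client; only the queries themselves appear later in this thread, so the exact structure is an assumption.]

```sql
BEGIN;  -- cursors must live inside a transaction block

-- One cursor per table, all ordered by the same key so the client
-- can merge them stream-wise, one customer at a time.
DECLARE cust_cur CURSOR FOR
    SELECT custid, bouncecount FROM customer ORDER BY custid;
DECLARE ord_cur CURSOR FOR
    SELECT custid, orderid FROM orders ORDER BY custid;
DECLARE con_cur CURSOR FOR
    SELECT custid, action FROM contact ORDER BY custid;
DECLARE list_cur CURSOR FOR
    SELECT custid, listid FROM custlist ORDER BY custid;

-- The client drives the merge from the customer cursor, popping
-- matching rows from the other cursors as each custid comes up.
FETCH 1 FROM cust_cur;
FETCH 1 FROM ord_cur;

COMMIT;
```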
"Bryan White" <bryan@arcamax.com> writes:
> Top tells me the front end process is using 5 to 10 percent of the CPU and
> the back end is using 10 to 20 percent. The load average is about 1.0 and
> the CPU is about 80% idle.

It's probably waiting for disk I/O ...

What does EXPLAIN tell you about how the queries are being executed? Do you by any chance have the 6.5.3 system still available to compare its EXPLAIN output?

			regards, tom lane
Try putting 'nofsync' in your pg_options.

Bryan White wrote:
> I have a large report that I run once a day. Under 6.5.3 it took just over
> 3hrs to run. Under 7.0 it is now taking 8 hours to run. No other changes
> were made.
> [...]
> Suggestions?
> Try putting 'nofsync' in your pg_options.

I am already starting the postmaster with -o -F which I believe has the same effect. Besides, this process does not alter the database so there should be nothing to sync.
I might have missed it, but did you submit the EXPLAIN results of the query you are attempting, as Tom Lane requested?

On Wed, 24 May 2000, Bryan White wrote:
> > Try putting 'nofsync' in your pg_options.
>
> I am already starting the postmaster with -o -F which I believe has the same
> effect. Besides, this process does not alter the database so there should be
> nothing to sync.

Marc G. Fournier                  ICQ#7615664             IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org          secondary: scrappy@{freebsd|postgresql}.org
> "Bryan White" <bryan@arcamax.com> writes:
> > Top tells me the front end process is using 5 to 10 percent of the CPU and
> > the back end is using 10 to 20 percent. The load average is about 1.0 and
> > the CPU is about 80% idle.
>
> It's probably waiting for disk I/O ...
>
> What does EXPLAIN tell you about how the queries are being executed?
> Do you by any chance have the 6.5.3 system still available to compare
> its EXPLAIN output?

explain select custid, poregdate, firstcontactdate,
    mastersubscribed, offersubscribed, bouncecount
    from customer order by custid;

Sort  (cost=598354.56..598354.56 rows=2446621 width=40)
  ->  Seq Scan on customer  (cost=0.00..75939.21 rows=2446621 width=40)

explain select custid, orderid, date, leadsource,
    paymentstatus, shipping + tax
    from orders order by custid;

Sort  (cost=167945.80..167945.80 rows=588242 width=60)
  ->  Seq Scan on orders  (cost=0.00..31399.42 rows=588242 width=60)

explain select custid, action, offer, date, source
    from contact order by custid;

Index Scan using iconcus4 on contact  (cost=0.00..1446338.62 rows=6462635 width=44)

explain select custid, listid
    from custlist order by custid;

Index Scan using iclcust3 on custlist  (cost=0.00..334501.73 rows=2738543 width=8)

I find the 'Sort's on customer and orders surprising.
Here are the index creates from a dump:

CREATE INDEX "icusln" on "customer" using btree ( "lname" "text_ops" );
CREATE UNIQUE INDEX "icusem2" on "customer" using btree ( "email" "text_ops" );
CREATE INDEX "icusph" on "customer" using btree ( "phone" "text_ops" );
CREATE UNIQUE INDEX "icusid" on "customer" using btree ( "custid" "int4_ops" );
CREATE INDEX "iordldsrc3" on "orders" using btree ( "leadsource" "text_ops" );
CREATE UNIQUE INDEX "iordid3" on "orders" using btree ( "orderid" "int4_ops" );
CREATE INDEX "iordcus3" on "orders" using btree ( "custid" "int4_ops", "date" "date_ops" );
CREATE INDEX "iorddate3" on "orders" using btree ( "date" "date_ops" );

The iordcus3 index has a second component that is irrelevant to this operation. Is the optimizer intelligent enough to still use it?
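[As it happens, a btree index on (custid, date) can satisfy an ORDER BY on just its leading column, since entries are already sorted by custid first. A quick way to check, as a sketch only; whether the planner actually picks the index depends on its cost estimates, as discussed later in the thread:]

```sql
-- The planner may use the two-column index iordcus3 for either of
-- these, because custid is its leading column:
EXPLAIN SELECT custid, orderid FROM orders ORDER BY custid;
EXPLAIN SELECT custid, orderid FROM orders ORDER BY custid, date;

-- An ORDER BY on the trailing column alone cannot be satisfied by
-- iordcus3 this way; iorddate3 would be the candidate instead:
EXPLAIN SELECT custid, orderid FROM orders ORDER BY date;
```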
> What does EXPLAIN tell you about how the queries are being executed?
> Do you by any chance have the 6.5.3 system still available to compare
> its EXPLAIN output?

I can shuffle directories and reload under 6.5.3 but I need to wait until today's run of the report finishes. I got bit by the change in the default date format and had to run it again. As it is I won't be able to do this until the morning.
"Bryan White" <bryan@arcamax.com> writes:
>> What does EXPLAIN tell you about how the queries are being executed?
>> Do you by any chance have the 6.5.3 system still available to compare
>> its EXPLAIN output?

> explain select custid, poregdate, firstcontactdate,
>     mastersubscribed, offersubscribed, bouncecount
>     from customer order by custid;
> Sort  (cost=598354.56..598354.56 rows=2446621 width=40)
>   ->  Seq Scan on customer  (cost=0.00..75939.21 rows=2446621 width=40)

> explain select custid, orderid, date, leadsource,
>     paymentstatus, shipping + tax
>     from orders order by custid;
> Sort  (cost=167945.80..167945.80 rows=588242 width=60)
>   ->  Seq Scan on orders  (cost=0.00..31399.42 rows=588242 width=60)

> explain select custid, action, offer, date, source
>     from contact order by custid;
> Index Scan using iconcus4 on contact  (cost=0.00..1446338.62 rows=6462635 width=44)

> explain select custid, listid
>     from custlist order by custid;
> Index Scan using iclcust3 on custlist  (cost=0.00..334501.73 rows=2738543 width=8)

> I find the 'Sort's on customer and orders surprising.

Well, in fact you should not, because 6.5.* didn't know how to use an index to implement ORDER BY at all ;-). 7.0 knows how to do it both ways, and so the real question here is whether the planner's cost estimates for the two ways are accurate or not.

My guess is that the index-based ORDER BYs are actually a lot slower than sort-based for your tables, and that's why 7.0 is slower than 6.5. But the planner thinks they will be faster in two of these cases. We need to figure out why it's making the wrong choice.

You can force sort-based or index-based ordering by issuing
	SET enable_indexscan = OFF;
or
	SET enable_sort = OFF;
respectively. I suggest that the next step should be to see what EXPLAIN says for all four queries in both cases (so we can see what the planner's estimates for the two cases actually are for each table), and then to measure the actual runtimes of each of the SELECTs both ways.
regards, tom lane
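[The experiment Tom suggests might look like this in a psql session. This is a sketch using one of the four queries; actual runtimes would be measured with an external clock, since EXPLAIN in 7.0 shows only the planner's estimates.]

```sql
-- Plan A: force sort-based ordering (seq scan + sort)
SET enable_indexscan = OFF;
EXPLAIN SELECT custid, listid FROM custlist ORDER BY custid;

-- Plan B: force index-based ordering
SET enable_indexscan = ON;
SET enable_sort = OFF;
EXPLAIN SELECT custid, listid FROM custlist ORDER BY custid;

-- Restore the default before running anything else
SET enable_sort = ON;
```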
"Bryan White" <bryan@arcamax.com> writes:
>> What does EXPLAIN tell you about how the queries are being executed?
>> Do you by any chance have the 6.5.3 system still available to compare
>> its EXPLAIN output?

> I can shuffle directories and reload under 6.5.3 but I need to wait until
> today's run of the report finishes. I got bit by the change in the default
> date format and had to run it again. As it is I won't be able to do this
> until the morning.

Don't worry about it. Now that I've seen the queries, I know that there is only one query plan 6.5.3 could have used: seq-scan and sort. So there's no need to reload 6.5.3 just to see its EXPLAIN output.

			regards, tom lane
> You can force sort-based or index-based ordering by issuing
>	SET enable_indexscan = OFF;
> or
>	SET enable_sort = OFF;
> respectively. I suggest that the next step should be to see what
> EXPLAIN says for all four queries in both cases (so we can see what
> the planner's estimates for the two cases actually are for each
> table), and then to measure the actual runtimes of each of the
> SELECTs both ways.

Here is the comparison:

explain select ... from customer order by custid;
  Sort  (cost=598354.56..598354.56 rows=2446621 width=40)
    ->  Seq Scan on customer  (cost=0.00..75939.21 rows=2446621 width=40)
  Index Scan using icusid on customer  (cost=0.00..823287.37 rows=2446621 width=40)
  Choice was sort

explain select ... from orders order by custid;
  Sort  (cost=167945.80..167945.80 rows=588242 width=60)
    ->  Seq Scan on orders  (cost=0.00..31399.42 rows=588242 width=60)
  Index Scan using iordcus3 on orders  (cost=0.00..326167.12 rows=588242 width=60)
  Choice was sort

explain select ... from contact order by custid;
  Sort  (cost=1874247.64..1874247.64 rows=6462635 width=44)
    ->  Seq Scan on contact  (cost=0.00..141404.35 rows=6462635 width=44)
  Index Scan using iconcus4 on contact  (cost=0.00..1446338.62 rows=6462635 width=44)
  Choice was index

explain select ... from custlist order by custid;
  Sort  (cost=469342.83..469342.83 rows=2738543 width=8)
    ->  Seq Scan on custlist  (cost=0.00..42109.43 rows=2738543 width=8)
  Index Scan using iclcust3 on custlist  (cost=0.00..334501.73 rows=2738543 width=8)
  Choice was index
> "Bryan White" <bryan@arcamax.com> writes:
> >> What does EXPLAIN tell you about how the queries are being executed?
> >> Do you by any chance have the 6.5.3 system still available to compare
> >> its EXPLAIN output?
>
> > I can shuffle directories and reload under 6.5.3 but I need to wait until
> > today's run of the report finishes. I got bit by the change in the default
> > date format and had to run it again. As it is I won't be able to do this
> > until the morning.
>
> Don't worry about it. Now that I've seen the queries, I know that there
> is only one query plan 6.5.3 could have used: seq-scan and sort. So
> there's no need to reload 6.5.3 just to see its EXPLAIN output.

I have set index_scan off for tomorrow morning's run. I will let you know what happens.
> Well, when you have 2.7 million records in a database, the code might be as
> good as it can be.

I have recovered the performance lost when I moved to Postgres 7.0 by executing SET enable_indexscan = OFF before creating my cursors and turning it back on for the inner loop query. It may even be faster than before, so I am happy.

> So each run of the log takes 2.7 million queries.
>
> Is there no way to use a where clause to limit the scope of your queries?

No, the point is that the resulting report covers the entire database.

> An explanation of the purpose of these queries would help me think about
> what you are after. Maybe it isn't a code problem, maybe it's a business
> rules problem. At the moment, I don't understand why you need to go through
> all the items in the db on every day. Can't you just go through the items
> that were updated on a given day? So, in general, what is the goal of this
> report?

The report summarizes a lot of different aspects of our database. It breaks down totals by a variety of variables. For example, orders are broken down by source, date, status, and sku (most orders have only one line item). I could come up with quicker methods to do any one of the breakdowns, but to do them all it is more efficient to make a single pass over all the data.

> I'm new to the cursor method (and to PostgreSQL, though I've done a lot of
> work with MySQL), but it is interesting, so I'm certainly going to look
> into it.

Part of the advantage of the cursor is in memory management. In a non-cursor select, the database builds and transfers the entire result set to the front end application. For large queries this can be quite a chunk of memory. With a sequential scan and sort of the database I think it is still buffering the result set on the back end, but one copy is better than two.

Keep in mind I am running this report on a backup database server, so I don't have to worry much about other processes being hurt by the load.
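[The fix described above amounts to toggling the planner setting around cursor creation, roughly like this. This is a sketch: the per-order line-item query and the orderitems table name are hypothetical, since the actual inner-loop query never appears in the thread.]

```sql
BEGIN;

-- Force seq-scan-plus-sort plans while the big ordered cursors
-- are being created
SET enable_indexscan = OFF;
DECLARE cust_cur CURSOR FOR
    SELECT custid FROM customer ORDER BY custid;

-- Re-enable index scans so the small per-order lookup in the
-- inner loop can use its index
SET enable_indexscan = ON;

-- Hypothetical inner-loop query: this one should use an index scan
SELECT * FROM orderitems WHERE orderid = 12345;

COMMIT;
```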
It seems that with index scans the cursors start producing data right away (but the overall rate is slower). With sequential scan and sort, the report gets no data for the first 30 minutes and then runs at about 4 times the rate of the index scan.

> What difference do you want to capture in distinguishing the customer table
> from the custlist table?

I am not certain I understand the question. The custlist table contains subscriptions to various services. It is a very simple table containing a listid and a custid.
> I have set index_scan off for tomorrow morning's run. I will let you know
> what happens.

I think my problem is fixed. By disabling index scan while creating the cursors and then turning it back on again for the small query that occurs in my inner loop, the performance has returned to normal (it may in fact be better than before).

Thanks a bunch.
Bryan
"Bryan White" <bryan@arcamax.com> writes:
> I have recovered the performance lost when I moved to Postgres 7.0 by
> executing SET enable_indexscan = OFF before creating my cursors and
> turning it back on for the inner loop query. It may even be faster
> than before so I am happy.

OK, so it was the indexscans that were hurting. (7.0 has new sorting code too, so I was a little afraid that the problem might be with the sorts. Evidently not.)

This suggests that at least on your setup, the default value of 4.0 for random_page_cost might still be too low. I have not tried to measure that number on a Linux machine, just on machines with BSD-derived filesystems. Maybe Linux does a lot worse with random accesses than BSD? Needs looking into.

> It seems that with index scans the cursors start producing data right away
> (but the overall rate is slower). With sequential scan and sort the report
> gets no data for the first 30 minutes and then runs at about 4 times the
> rate of the index scan.

Right, that's what you'd expect: the sort has to be completed before it knows which row to deliver first, but an indexscan has no such startup cost.

			regards, tom lane
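[The startup-cost difference Tom describes is also why a query that only needs the first few rows favors the index plan: an index scan can deliver them immediately, while a sort plan must order the whole table first. A sketch, using the thread's customer table; which plan is actually chosen depends on the planner's cost settings:]

```sql
-- First rows in custid order: an index scan on icusid can return
-- these at once; a sort plan must first sort all ~2.4 million rows.
SELECT custid FROM customer ORDER BY custid LIMIT 10;

-- The same trade-off shows up with a cursor: the first FETCH returns
-- quickly under an index-scan plan, slowly under a sort plan.
BEGIN;
DECLARE c CURSOR FOR SELECT custid FROM customer ORDER BY custid;
FETCH 10 FROM c;
COMMIT;
```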
> "Bryan White" <bryan@arcamax.com> writes:
> > I have recovered the performance lost when I moved to Postgres 7.0 by
> > executing SET enable_indexscan = OFF before creating my cursors and
> > turning it back on for the inner loop query. It may even be faster
> > than before so I am happy.
>
> OK, so it was the indexscans that were hurting. (7.0 has new sorting
> code too, so I was a little afraid that the problem might be with the
> sorts. Evidently not.)
>
> This suggests that at least on your setup, the default value of 4.0 for
> random_page_cost might still be too low. I have not tried to measure
> that number on a Linux machine, just on machines with BSD-derived
> filesystems. Maybe Linux does a lot worse with random accesses than
> BSD? Needs looking into.

As I remember, Linux turns off file system prefetch if a seek is done. BSD file systems turn off prefetch only if prefetched blocks remain unused.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Hi,

Bruce Momjian:
> As I remember, Linux turns off file system prefetch if a seek is done.

Correct (still so for Linux 2.4).

> BSD file systems turn off prefetch only if prefetched blocks remain
> unused.

"Unused" is decided when? Do any, all, or some percentage of prefetched blocks need to be unused?

--
Matthias Urlichs  |  noris network GmbH  |  smurf@noris.de  |  ICQ: 20193661
The quote was selected randomly. Really.  |  http://smurf.noris.de/
--
If hyperspace did not already exist, science fiction writers
would have had to invent it.
                                        -- Peter Oakley
* Matthias Urlichs <smurf@noris.de> [000526 02:47] wrote:
> Bruce Momjian:
> > As I remember, Linux turns off file system prefetch if a seek is done.
>
> Correct (still so for Linux 2.4).
>
> > BSD file systems turn off prefetch only if prefetched blocks remain
> > unused.
>
> "Unused" is decided when?
> Do any, all, or some percentage of prefetched blocks need to be unused?

The statement:

> > BSD file systems turn off prefetch only if prefetched blocks remain
> > unused.

is false for FreeBSD afaik.

FreeBSD tracks how often sequential access happens and will retry it as the file's access trend indicates. It can flip back and forth between readahead and non-readahead.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
> The statement:
>
> > > BSD file systems turn off prefetch only if prefetched blocks remain
> > > unused.
>
> is false for FreeBSD afaik.
>
> FreeBSD tracks how often sequential access happens and will retry it
> as the file's access trend indicates. It can flip back and forth between
> readahead and non-readahead.

Yes, read my later posting describing the actual algorithm. Let me know if it is wrong.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> Bruce Momjian:
> > As I remember, Linux turns off file system prefetch if a seek is done.
>
> Correct (still so for Linux 2.4).
>
> > BSD file systems turn off prefetch only if prefetched blocks remain
> > unused.
>
> "Unused" is decided when?
> Do any, all, or some percentage of prefetched blocks need to be unused?

It starts out prefetching blocks on first file read. In later reads, if the desired block is in the cache, it continues read-ahead and may even scale up the read-ahead window. If the desired block is not in the cache, it starts to think it is reading ahead too much or random reads are being done, and scales back the read-ahead window.

It knows if the current read is sequential from the previous read, and can scale up the read-ahead window in those cases. So the case where a file is being read sequentially but in two sections at the same time should work fine, because the read-aheads are being used.

It's basically a feedback loop: a read that is sequential from a previous read makes the read-ahead go up, and a read that is not in the cache makes the read-ahead go down. At least that is how I am reading the BSDI code. I could send a bit of the code over, but it is pretty confusing.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Might I suggest a change in the name of this thread, or starting a new one? The subject problem was resolved some time ago and this is just bad publicity. It almost scared me off till I read that the user with the problem was now happy.

Bill Barnes

>===== Original Message From Bruce Momjian <pgman@candle.pha.pa.us> =====
>> The statement:
>>
>> > > BSD file systems turn off prefetch only if prefetched blocks remain
>> > > unused.
>>
>> is false for FreeBSD afaik.
>>
>> FreeBSD tracks how often sequential access happens and will retry it
>> as the file's access trend indicates. It can flip back and forth from
>> readahead and non-readahead.
>
>Yes, read my later posting describing the actual algorithm. Let me know
>if it is wrong.
> Might I suggest a change in the name of this thread, or starting a new one?
> The subject problem was resolved some time ago and this is just bad
> publicity. It almost scared me off till I read that the user with the
> problem was now happy.

Yeah, I still find it scary when I see it, and I started the thread.