Thread: Surprise :-(

Surprise :-(

From

"Mihai Gheorghiu"

Date:

05 September 2002, 16:30:42

I ran a vacuum analyze on a database. Now the query I ran vacuum analyze for
takes twice as long, and all the other queries I tested take longer, too.
Please help.
Thank you all.

Re: Surprise :-(

From

Stephan Szabo

Date:

05 September 2002, 16:34:22

On Thu, 5 Sep 2002, Mihai Gheorghiu wrote:

> I ran a vacuum analyze on a database. Now the query I ran vacuum analyze for
> takes twice as long, and all the other queries I tested take longer, too.
> Please help.

What are the queries and explain output for the queries (preferably
including the old state if you have explain from that as well).

Re: Surprise :-(

From

"Mihai Gheorghiu"

Date:

05 September 2002, 18:07:10

explain select account, sum(amount) from tbas_transactions where isposted
and trxtype = 'MP' group by account;
psql:xx.txt:1: NOTICE:  QUERY PLAN:

Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
  ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
        ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
              ->  Index Scan using trx_trxtype_idx on tbas_transactions
(cost=0.00..10699.78 rows=3025 width=24)

EXPLAIN

Sorry, I do not have an explain from before vacuum analyze.
The table has ~700k rows and indices on account, trxtype and a few other
fields used in other queries.


-----Original Message-----
From: Stephan Szabo <sszabo@megazone23.bigpanda.com>
To: Mihai Gheorghiu <tanethq@earthlink.net>
Cc: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Date: Thursday, September 05, 2002 4:34 PM
Subject: Re: [GENERAL] Surprise :-(


>On Thu, 5 Sep 2002, Mihai Gheorghiu wrote:
>
>> I ran a vacuum analyze on a database. Now the query I ran vacuum analyze
for
>> takes twice as long, and all the other queries I tested take longer, too.
>> Please help.
>
>What are the queries and explain output for the queries (preferably
>including the old state if you have explain from that as well).
>

Re: Surprise :-(

From

Stephan Szabo

Date:

05 September 2002, 18:51:43

On Thu, 5 Sep 2002, Mihai Gheorghiu wrote:

>
> explain select account, sum(amount) from tbas_transactions where isposted
> and trxtype = 'MP' group by account;
> psql:xx.txt:1: NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
>   ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
>         ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
>               ->  Index Scan using trx_trxtype_idx on tbas_transactions
> (cost=0.00..10699.78 rows=3025 width=24)
>
> EXPLAIN
>
> Sorry, I do not have an explain from before vacuum analyze.
> The table has ~700k rows and indices on account, trxtype and a few other
> fields used in other queries.

If you set enable_indexscan=off; and then run explain/run the query
is it better?

For other questions, how many rows have trxtype='MP' really and what
version are you running?

Re: Surprise :-(

From

"Nigel J. Andrews"

Date:

05 September 2002, 19:07:04

On Thu, 5 Sep 2002, Mihai Gheorghiu wrote:

>
> explain select account, sum(amount) from tbas_transactions where isposted
> and trxtype = 'MP' group by account;
> psql:xx.txt:1: NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
>   ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
>         ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
>               ->  Index Scan using trx_trxtype_idx on tbas_transactions
> (cost=0.00..10699.78 rows=3025 width=24)
>
> EXPLAIN
>
> Sorry, I do not have an explain from before vacuum analyze.
> The table has ~700k rows and indices on account, trxtype and a few other
> fields used in other queries.
>
>
> -----Original Message-----
> From: Stephan Szabo <sszabo@megazone23.bigpanda.com>
> To: Mihai Gheorghiu <tanethq@earthlink.net>
> Cc: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
> Date: Thursday, September 05, 2002 4:34 PM
> Subject: Re: [GENERAL] Surprise :-(
>
>
> >On Thu, 5 Sep 2002, Mihai Gheorghiu wrote:
> >
> >> I ran a vacuum analyze on a database. Now the query I ran vacuum analyze
> for
> >> takes twice as long, and all the other queries I tested take longer, too.
> >> Please help.
> >
> >What are the queries and explain output for the queries (preferably
> >including the old state if you have explain from that as well).
> >

Okay, so, on the face of it it's a pretty good plan, it has chosen an index
scan returning only 3025 tuples out of 7000,000 after all. What does EXPLAIN
ANALYZE <your query> show?

Also you could try and replicate the previous plan by doing a SET
ENABLE_INDEXSCAN = OFF. Although this is a unlikely to give the previous plan
imo. A better approach would be to drop the index currently used, once sure of
being able to recreate it of course, and repeating as necessary to see which
index was giving the previous level performance.

What does SELECT * FROM pg_statistic WHERE starelid = (SELECT oid FROM pg_class
WHERE relname = 'tbas_transactions'); give? Don't forget to indicate which
column of your table has which attribute number in that output.

Assuming the name of the index used is descriptive the difference must be due
to isposted being true being more selective than the trxtype test. It starts to
get difficult at this point without knowing the explain analyze results and
something like SELECT isposted, count(1) FROM tbas_transactions GROUP BY
isposted.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

Re: Surprise :-(

From

"Mihai Gheorghiu"

Date:

06 September 2002, 13:56:59

PG7.1.3 on RH7.1 on Dell PowerEdge 2500SC P3-933, 1GB RAM, 18GB SCSI160
There are 350k rows with trxtype=MP
With indexscan=off:
Aggregate  (cost=22975.15..22990.27 rows=302 width=24)
  ->  Group  (cost=22975.15..22982.71 rows=3025 width=24)
        ->  Sort  (cost=22975.15..22975.15 rows=3025 width=24)
              ->  Seq Scan on tbas_transactions  (cost=0.00..22800.29
rows=3025 width=24)
Time: 25.9s
With indexscan=on:
Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
  ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
        ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
              ->  Index Scan using trx_trxtype_idx on tbas_transactions
(cost=0.00..10699.78 rows=3025 width=24)
Time: 24.9s
The point is I need to run this query in a fraction of the above time,
otherwise I'm in deep trouble.
Any suggestion is welcome.

-----Original Message-----
From: Stephan Szabo <sszabo@megazone23.bigpanda.com>
To: Mihai Gheorghiu <tanethq@earthlink.net>
Cc: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Date: Thursday, September 05, 2002 6:51 PM
Subject: Re: [GENERAL] Surprise :-(


>
>On Thu, 5 Sep 2002, Mihai Gheorghiu wrote:
>
>>
>> explain select account, sum(amount) from tbas_transactions where isposted
>> and trxtype = 'MP' group by account;
>> psql:xx.txt:1: NOTICE:  QUERY PLAN:
>>
>> Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
>>   ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
>>         ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
>>               ->  Index Scan using trx_trxtype_idx on tbas_transactions
>> (cost=0.00..10699.78 rows=3025 width=24)
>>
>> EXPLAIN
>>
>> Sorry, I do not have an explain from before vacuum analyze.
>> The table has ~700k rows and indices on account, trxtype and a few other
>> fields used in other queries.
>
>If you set enable_indexscan=off; and then run explain/run the query
>is it better?
>
>For other questions, how many rows have trxtype='MP' really and what
>version are you running?
>
>

Re: Surprise :-(

From

Stephan Szabo

Date:

06 September 2002, 14:11:41

On Fri, 6 Sep 2002, Mihai Gheorghiu wrote:

> PG7.1.3 on RH7.1 on Dell PowerEdge 2500SC P3-933, 1GB RAM, 18GB SCSI160
> There are 350k rows with trxtype=MP
> With indexscan=off:
> Aggregate  (cost=22975.15..22990.27 rows=302 width=24)
>   ->  Group  (cost=22975.15..22982.71 rows=3025 width=24)
>         ->  Sort  (cost=22975.15..22975.15 rows=3025 width=24)
>               ->  Seq Scan on tbas_transactions  (cost=0.00..22800.29
> rows=3025 width=24)
> Time: 25.9s
> With indexscan=on:
> Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
>   ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
>         ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
>               ->  Index Scan using trx_trxtype_idx on tbas_transactions
> (cost=0.00..10699.78 rows=3025 width=24)
> Time: 24.9s
> The point is I need to run this query in a fraction of the above time,
> otherwise I'm in deep trouble.
> Any suggestion is welcome.

Hmm, in general a multicolumn index on (trxtype,account) might enable it
to get rid of the sort step.

Re: Surprise :-(

From

"Nigel J. Andrews"

Date:

06 September 2002, 14:27:15

On Fri, 6 Sep 2002, Mihai Gheorghiu wrote:

> PG7.1.3 on RH7.1 on Dell PowerEdge 2500SC P3-933, 1GB RAM, 18GB SCSI160
> There are 350k rows with trxtype=MP
> With indexscan=off:
> Aggregate  (cost=22975.15..22990.27 rows=302 width=24)
>   ->  Group  (cost=22975.15..22982.71 rows=3025 width=24)
>         ->  Sort  (cost=22975.15..22975.15 rows=3025 width=24)
>               ->  Seq Scan on tbas_transactions  (cost=0.00..22800.29
> rows=3025 width=24)
> Time: 25.9s
> With indexscan=on:
> Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
>   ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
>         ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
>               ->  Index Scan using trx_trxtype_idx on tbas_transactions
> (cost=0.00..10699.78 rows=3025 width=24)
> Time: 24.9s
> The point is I need to run this query in a fraction of the above time,
> otherwise I'm in deep trouble.
> Any suggestion is welcome.
>

Yes, drop the index on trxtype so that it's selecting on isposted. Obviously,
knowing things like the actual number of rows returned by explain analyze (only
7.2.x?) as per my other message would be more usedul.

Your trxtype index is actually selecting 50% of the rows but thinks it only has
3025...hmmmm...typing that made me think something is wrong...ah, I see 7.1.3,
not 7.2.x...how many other values for this field are there? Even 7.1.3 should
have caught that as a common value right? However, in short, unless isposted is
true for lots of those 350k rows you dropping the index would hopefully prompt
a switch to an index on that column. Or, create a multicolumn index on
(isposted,trxtype).

BTW, there must be some sort of data clustering going because that index scan
is faster than the sequential scan even though it's fetching 50% of the
table. Perhaps it's a result of caching.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

Re: Surprise :-(

From

"Mihai Gheorghiu"

Date:

06 September 2002, 14:51:40

I created an index over account and trxtype.
Here is the result of explain (with indexscan=on):
Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
  ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
        ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
              ->  Index Scan using trx_trxtype_idx on tbas_transactions
(cost=0.00..10699.78 rows=3025 width=24)
I checked, and the two-column index is there, in the pd_indexes table.

-----Original Message-----
From: Stephan Szabo <sszabo@megazone23.bigpanda.com>
To: Mihai Gheorghiu <tanethq@earthlink.net>
Cc: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Date: Friday, September 06, 2002 2:11 PM
Subject: Re: [GENERAL] Surprise :-(


>
>On Fri, 6 Sep 2002, Mihai Gheorghiu wrote:
>
>> PG7.1.3 on RH7.1 on Dell PowerEdge 2500SC P3-933, 1GB RAM, 18GB SCSI160
>> There are 350k rows with trxtype=MP
>> With indexscan=off:
>> Aggregate  (cost=22975.15..22990.27 rows=302 width=24)
>>   ->  Group  (cost=22975.15..22982.71 rows=3025 width=24)
>>         ->  Sort  (cost=22975.15..22975.15 rows=3025 width=24)
>>               ->  Seq Scan on tbas_transactions  (cost=0.00..22800.29
>> rows=3025 width=24)
>> Time: 25.9s
>> With indexscan=on:
>> Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
>>   ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
>>         ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
>>               ->  Index Scan using trx_trxtype_idx on tbas_transactions
>> (cost=0.00..10699.78 rows=3025 width=24)
>> Time: 24.9s
>> The point is I need to run this query in a fraction of the above time,
>> otherwise I'm in deep trouble.
>> Any suggestion is welcome.
>
>Hmm, in general a multicolumn index on (trxtype,account) might enable it
>to get rid of the sort step.
>
>

Re: Surprise :-(

From

"Mihai Gheorghiu"

Date:

06 September 2002, 15:13:29

I ran select from pg_statistics... as you advised
The result is attached.
Col#   Name
5      account
10     trxtype
15     amount
28     isposted
I must admit I cannot make very much sense out of it. What does it tell?
Thank you very much.
P.S. I am running PG7.1.3. Is explain analyze an improvement in 7.2?

-----Original Message-----
From: Nigel J. Andrews <nandrews@investsystems.co.uk>
To: Mihai Gheorghiu <tanethq@earthlink.net>
Cc: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Date: Thursday, September 05, 2002 7:07 PM
Subject: Re: [GENERAL] Surprise :-(


>On Thu, 5 Sep 2002, Mihai Gheorghiu wrote:
>
>>
>> explain select account, sum(amount) from tbas_transactions where isposted
>> and trxtype = 'MP' group by account;
>> psql:xx.txt:1: NOTICE:  QUERY PLAN:
>>
>> Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
>>   ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
>>         ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
>>               ->  Index Scan using trx_trxtype_idx on tbas_transactions
>> (cost=0.00..10699.78 rows=3025 width=24)
>>
>> EXPLAIN
>>
>> Sorry, I do not have an explain from before vacuum analyze.
>> The table has ~700k rows and indices on account, trxtype and a few other
>> fields used in other queries.
>>
>>
>> -----Original Message-----
>> From: Stephan Szabo <sszabo@megazone23.bigpanda.com>
>> To: Mihai Gheorghiu <tanethq@earthlink.net>
>> Cc: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
>> Date: Thursday, September 05, 2002 4:34 PM
>> Subject: Re: [GENERAL] Surprise :-(
>>
>>
>> >On Thu, 5 Sep 2002, Mihai Gheorghiu wrote:
>> >
>> >> I ran a vacuum analyze on a database. Now the query I ran vacuum
analyze
>> for
>> >> takes twice as long, and all the other queries I tested take longer,
too.
>> >> Please help.
>> >
>> >What are the queries and explain output for the queries (preferably
>> >including the old state if you have explain from that as well).
>> >
>
>Okay, so, on the face of it it's a pretty good plan, it has chosen an index
>scan returning only 3025 tuples out of 7000,000 after all. What does
EXPLAIN
>ANALYZE <your query> show?
>
>Also you could try and replicate the previous plan by doing a SET
>ENABLE_INDEXSCAN = OFF. Although this is a unlikely to give the previous
plan
>imo. A better approach would be to drop the index currently used, once sure
of
>being able to recreate it of course, and repeating as necessary to see
which
>index was giving the previous level performance.
>
>What does SELECT * FROM pg_statistic WHERE starelid = (SELECT oid FROM
pg_class
>WHERE relname = 'tbas_transactions'); give? Don't forget to indicate which
>column of your table has which attribute number in that output.
>
>Assuming the name of the index used is descriptive the difference must be
due
>to isposted being true being more selective than the trxtype test. It
starts to
>get difficult at this point without knowing the explain analyze results and
>something like SELECT isposted, count(1) FROM tbas_transactions GROUP BY
>isposted.
>
>
>--
>Nigel J. Andrews
>Director
>
>---
>Logictree Systems Limited
>Computer Consultants
>
>
>
>

Attachment

o7.csv

Re: Surprise :-(

From

"Mihai Gheorghiu"

Date:

06 September 2002, 16:08:13

Only one of the 350k trxtype=MP records has isposted=false !!!
I am downloading 7.2.2 right now, and, God willing, on Monday I will be able
to apply your recommendations regarding explain analyze.
Which leads me to the following question:
I dump this database with pgdump in less than 15 minutes and restore it with
psql -d mydata -f mydata.dmp in 375 minutes.
Is there a faster way?
Thanks again.

-----Original Message-----
From: Nigel J. Andrews <nandrews@investsystems.co.uk>
To: Mihai Gheorghiu <tanethq@earthlink.net>
Cc: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Date: Friday, September 06, 2002 2:27 PM
Subject: Re: [GENERAL] Surprise :-(


>On Fri, 6 Sep 2002, Mihai Gheorghiu wrote:
>
>> PG7.1.3 on RH7.1 on Dell PowerEdge 2500SC P3-933, 1GB RAM, 18GB SCSI160
>> There are 350k rows with trxtype=MP
>> With indexscan=off:
>> Aggregate  (cost=22975.15..22990.27 rows=302 width=24)
>>   ->  Group  (cost=22975.15..22982.71 rows=3025 width=24)
>>         ->  Sort  (cost=22975.15..22975.15 rows=3025 width=24)
>>               ->  Seq Scan on tbas_transactions  (cost=0.00..22800.29
>> rows=3025 width=24)
>> Time: 25.9s
>> With indexscan=on:
>> Aggregate  (cost=10874.64..10889.76 rows=302 width=24)
>>   ->  Group  (cost=10874.64..10882.20 rows=3025 width=24)
>>         ->  Sort  (cost=10874.64..10874.64 rows=3025 width=24)
>>               ->  Index Scan using trx_trxtype_idx on tbas_transactions
>> (cost=0.00..10699.78 rows=3025 width=24)
>> Time: 24.9s
>> The point is I need to run this query in a fraction of the above time,
>> otherwise I'm in deep trouble.
>> Any suggestion is welcome.
>>
>
>Yes, drop the index on trxtype so that it's selecting on isposted.
Obviously,
>knowing things like the actual number of rows returned by explain analyze
(only
>7.2.x?) as per my other message would be more usedul.
>
>Your trxtype index is actually selecting 50% of the rows but thinks it only
has
>3025...hmmmm...typing that made me think something is wrong...ah, I see 7.1
.3,
>not 7.2.x...how many other values for this field are there? Even 7.1.3
should
>have caught that as a common value right? However, in short, unless
isposted is
>true for lots of those 350k rows you dropping the index would hopefully
prompt
>a switch to an index on that column. Or, create a multicolumn index on
>(isposted,trxtype).
>
>BTW, there must be some sort of data clustering going because that index
scan
>is faster than the sequential scan even though it's fetching 50% of the
>table. Perhaps it's a result of caching.
>
>
>--
>Nigel J. Andrews
>Director
>
>---
>Logictree Systems Limited
>Computer Consultants
>

Re: Surprise :-(

From

"Nigel J. Andrews"

Date:

06 September 2002, 18:44:17

On Fri, 6 Sep 2002, Mihai Gheorghiu wrote:

> I ran select from pg_statistics... as you advised
> The result is attached.
> Col#   Name
> 5      account
> 10     trxtype
> 15     amount
> 28     isposted
> I must admit I cannot make very much sense out of it. What does it tell?
> Thank you very much.
> P.S. I am running PG7.1.3. Is explain analyze an improvement in 7.2?
>
> >>
> >> explain select account, sum(amount) from tbas_transactions where isposted
> >> and trxtype = 'MP' group by account;
> >> psql:xx.txt:1: NOTICE:  QUERY PLAN:
> >>
> >> Sorry, I do not have an explain from before vacuum analyze.
> >> The table has ~700k rows and indices on account, trxtype and a few other
> >> fields used in other queries.

First, I have been assuming you're working on a non-essential and/or
non-production database where doing such things as deleting indexes is an
acceptable cost to determine and attempt to fix the speed problems you are
experiencing. I wouldn't have suggested such things otherwise.

Second, I believe you also stated that number of rows in this table with
trxtype = 'MP' is about 350k, i.e. 50%. The pg_statistic output you shown
shows 'RG' as the most common value with low and high values as thinks 'AS' and
'XP'. I think based on this information the statistics stand a chance of being
incorrect and you should try and confirm the distribution of values in this
column. Doing

    SELECT trxtype, count(1)
     FROM tbas_transactions
     GROUP BY trxtype
     ORDER BY trxtype DESC
     LIMIT 10

would be instructive. The limit number is somewhat arbitrary, the most
interesting results of that query will be the first and probably second row
returned and the row where trxtype is 'MP'.

However, as it stands I suggest you should do a

     VACUUM VERBOSE ANALYZE tbas_transactions

and retest your slow query. If no significant improvement it would be a good
idea to show us the output of that vacuum command and the same pg_statistic
entries as before but taken after this vacuum.

As for the data loading into 7.2.2 taking a long time. I can't really suggest
anything. You may find the 7.2.2 load does take less time than the 7.1.3 you
tested it on. I think everyone would also recommend doing the upgrade even with
this load time.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

Re: Surprise :-(

From

Jason Earl

Date:

06 September 2002, 19:33:32

"Mihai Gheorghiu" <tanethq@earthlink.net> writes:

> Only one of the 350k trxtype=MP records has isposted=false !!!
> I am downloading 7.2.2 right now, and, God willing, on Monday I will be able
> to apply your recommendations regarding explain analyze.
> Which leads me to the following question:
> I dump this database with pgdump in less than 15 minutes and restore it with
> psql -d mydata -f mydata.dmp in 375 minutes.
> Is there a faster way?
> Thanks again.

Uh... You might want to skip ahead and go right to the new 7.3beta1.
Otherwise you are probably going to *want* to do another dump reload
cycle in the very near future.

Just a thought.

Jason

Re: Surprise :-(

From

"Steve Wolfe"

Date:

09 September 2002, 10:08:33

  I've ran into similar oddnesses twice in the past two months.  In both
cases, a developper came to me and said that a particular query ran very
quickly the day the table was created and populated, and ran very slowly
starting the next day, after the DB had been VACUUM'ed.  In both cases,
their WHERE clause used operations such as concatenations on fields that
did not have indexes.  In both cases, creating the indexes speeded up the
query, dropping the cost from some obscene number to a very low one.  ( on
the order of 8000 to 60).

  I'm not sure why the queries worked well before the database was
VACUUM'ed, but it's happened to me twice now - and it sounds similar
enough to your situation that you may want to look at whether you have
appropriate indexes created on your table.

steve

Re: Surprise :-(

From

"Mihai Gheorghiu"

Date:

09 September 2002, 16:53:32

This is the result of the statistic/count query:
 trxtype | count
---------+--------
 MP      | 347529
 AS      |  92273
 PR      |  56664
 TS      |  37756
 RG      |  30438
 PK      |  24764
 UP      |  14930
 EX      |  10285
 PD      |   7817
 OT      |   4149
 WW      |   2948
 PO      |   2568
 VO      |    728
 XP      |      2
 LF      |      1
(15 rows)


-----Original Message-----
From: Nigel J. Andrews <nandrews@investsystems.co.uk>
To: Mihai Gheorghiu <tanethq@earthlink.net>
Cc: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Date: Friday, September 06, 2002 6:44 PM
Subject: Re: [GENERAL] Surprise :-(


>
>On Fri, 6 Sep 2002, Mihai Gheorghiu wrote:
>
>> I ran select from pg_statistics... as you advised
>> The result is attached.
>> Col#   Name
>> 5      account
>> 10     trxtype
>> 15     amount
>> 28     isposted
>> I must admit I cannot make very much sense out of it. What does it tell?
>> Thank you very much.
>> P.S. I am running PG7.1.3. Is explain analyze an improvement in 7.2?
>>
>> >>
>> >> explain select account, sum(amount) from tbas_transactions where
isposted
>> >> and trxtype = 'MP' group by account;
>> >> psql:xx.txt:1: NOTICE:  QUERY PLAN:
>> >>
>> >> Sorry, I do not have an explain from before vacuum analyze.
>> >> The table has ~700k rows and indices on account, trxtype and a few
other
>> >> fields used in other queries.
>
>First, I have been assuming you're working on a non-essential and/or
>non-production database where doing such things as deleting indexes is an
>acceptable cost to determine and attempt to fix the speed problems you are
>experiencing. I wouldn't have suggested such things otherwise.
>
>Second, I believe you also stated that number of rows in this table with
>trxtype = 'MP' is about 350k, i.e. 50%. The pg_statistic output you shown
>shows 'RG' as the most common value with low and high values as thinks 'AS'
and
>'XP'. I think based on this information the statistics stand a chance of
being
>incorrect and you should try and confirm the distribution of values in this
>column. Doing
>
> SELECT trxtype, count(1)
> FROM tbas_transactions
> GROUP BY trxtype
> ORDER BY trxtype DESC
> LIMIT 10
>
>would be instructive. The limit number is somewhat arbitrary, the most
>interesting results of that query will be the first and probably second row
>returned and the row where trxtype is 'MP'.
>
>However, as it stands I suggest you should do a
>
> VACUUM VERBOSE ANALYZE tbas_transactions
>
>and retest your slow query. If no significant improvement it would be a
good
>idea to show us the output of that vacuum command and the same pg_statistic
>entries as before but taken after this vacuum.
>
>As for the data loading into 7.2.2 taking a long time. I can't really
suggest
>anything. You may find the 7.2.2 load does take less time than the 7.1.3
you
>tested it on. I think everyone would also recommend doing the upgrade even
with
>this load time.
>
>
>--
>Nigel J. Andrews
>Director
>
>---
>Logictree Systems Limited
>Computer Consultants
>
>
>

Re: Surprise :-(

From

"Nigel J. Andrews"

Date:

09 September 2002, 19:35:30

On Mon, 9 Sep 2002, Mihai Gheorghiu wrote:

> This is the result of the statistic/count query:
>  trxtype | count
> ---------+--------
>  MP      | 347529
> ...
>  RG      |  30438
> ...

> >
> >On Fri, 6 Sep 2002, Mihai Gheorghiu wrote:
> >
> >> I ran select from pg_statistics... as you advised
> >> The result is attached.
> >> Col#   Name
> >> 5      account
> >> 10     trxtype
> >> 15     amount
> >> 28     isposted

starelid,staattnum,staop,stanullfrac,stacommonfrac,stacommonval,staloval,stahival

---------,----------,------,------------,--------------,--------------------------,--------------------------,-----------------------------------
55256329,10,1058,0,0.0479733,RG,AS,XP

RG fraction of table = 30 / 700 = 5%   (as pg_statistics shows)
MP fraction of table = 350 / 700 = 50% (expected in pg_statistics)

May be someone else can verify that 7.1.3 had problems with stats gathering but
all this just suggests to me that the table hasn't been analysed. Are you sure
it was done? Try what I suggest below, to do just the one table, when you have
time and if it's still appropiate.

Also you could run the command:

UPDATE pg_statistics SET sttcommonval = 'MP', stacommonfrac = 0.5
  WHERE statrelid = 55256329 AND staattnum = 10;

which will make pg_statistics reflect reality more accurate. However, that is
just going to make the planner choose the seqscan plan over the index scan and
doesn't explain why the statistics weren't updated properly by your vacuum
analyze.

Are you also sure that this same query on the same data ran faster before the
vacuum? Perhaps the fast query used a different value in the trxtype test?

It boils down to something must have changed and if it's not the data it must
be the query.

[as a reminder...]
> >> >>
> >> >> explain select account, sum(amount) from tbas_transactions where
> isposted
> >> >> and trxtype = 'MP' group by account;
> >> >>
> >> >> Sorry, I do not have an explain from before vacuum analyze.
> >> >> The table has ~700k rows and indices on account, trxtype and a few
> other
> >> >> fields used in other queries.
> >

And an interesting item to perform...

> >However, as it stands I suggest you should do a
> >
> > VACUUM VERBOSE ANALYZE tbas_transactions
> >
> >and retest your slow query. If no significant improvement it would be a
> good
> >idea to show us the output of that vacuum command and the same pg_statistic
> >entries as before but taken after this vacuum.
> >

I hope the 7.2.2 install is going well. It will be interesting see what you
make of it regarding this issue. I can't see it taking a significantly shorter
time for this query on this data.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants