Thread: slow query performance

slow query performance

From

"Dave Weaver"

Date:

30 October 2003, 05:13:06

I'm having severe performance issues with a conceptually simple
database.  The database has one table, containing weather observations.
The table currently has about 13.5 million rows, and is being updated
constantly. (The database is running on a dual 550MHz PIII with 512MB RAM.
I have PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC 2.96
on RedHat 7.2)

On the whole, queries are of the form:

    SELECT ? FROM obs WHERE station = ?
                    AND valid_time < ? AND valid_time > ?
or:
    SELECT ? FROM obs WHERE station IN (?, ?, ...)
                    AND valid_time < ? AND valid_time > ?

Queries like these are taking around 4 to 5 minutes each, which seems
excessively slow to me (or are my expectations far too optimistic?).

For instance:
    SELECT station, air_temp FROM obs
         WHERE station = 'EGBB'
            AND valid_time > '28/8/03 00:00'
     AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.

An EXPLAIN of the above query says:
  NOTICE:  QUERY PLAN:

  Index Scan using obs_pkey on obs  (cost=0.00..9.01 rows=1 width=20)

A simple "SELECT count(*) from obs" query takes around that sort of time
too.

I have run "vacuumdb --analyze obs", to little effect.

How can I speed this up? Where am I going wrong? Is there a problem with
the table structure, or the indexes? Does the continual updating of the
database (at the rate of somewhere between 1-3 entries per second) cause
problems?

The table and indexes are defined as follows:


                           Table "obs"
         Attribute          |           Type           | Modifier
----------------------------+--------------------------+----------
 valid_time                 | timestamp with time zone |
 metar_air_temp             | double precision         |
 relative_humidity          | double precision         |
 pressure_change            | double precision         |
 ceiling                    | double precision         |
 metar_dew_point            | double precision         |
 metar_gusts                | double precision         |
 wet_bulb_temperature       | double precision         |
 past_weather               | text                     |
 visibility                 | double precision         |
 metar_visibility           | double precision         |
 precipitation              | double precision         |
 station                    | character(10)            |
 pressure_msl               | double precision         |
 metar_min_temperature_6hr  | double precision         |
 precipitation_period       | double precision         |
 metar_wet_bulb             | double precision         |
 saturation_mixing_ratio    | double precision         |
 metar_pressure             | double precision         |
 metar_sky_cover            | text                     |
 dew_point                  | double precision         |
 wind_direction             | double precision         |
 actual_time                | timestamp with time zone |
 gust_speed                 | double precision         |
 high_cloud_type            | text                     |
 precipitation_24hr         | double precision         |
 metar_precipitation_24hr   | double precision         |
 pressure_tendency          | text                     |
 metar_relative_humidity    | double precision         |
 low_cloud_type             | text                     |
 metar_max_temperature_6hr  | double precision         |
 middle_cloud_type          | text                     |
 air_temp                   | double precision         |
 low_and_middle_cloud_cover | text                     |
 metar_wind_dir             | double precision         |
 metar_weather              | text                     |
 snow_depth                 | double precision         |
 metar_snow_depth           | double precision         |
 min_temp_12hr              | double precision         |
 present_weather            | text                     |
 wind_speed                 | double precision         |
 snow_cover                 | text                     |
 metar_wind_speed           | double precision         |
 metar_ceiling              | double precision         |
 max_temp_12hr              | double precision         |
 mixing_ratio               | double precision         |
 pressure_change_3hr        | double precision         |
 total_cloud                | integer                  |
 max_temp_24hr              | double precision         |
 min_temp_24hr              | double precision         |
 snow_amount_6hr            | double precision         |
Indices: obs_pkey,
         obs_station,
         obs_valid_time

           Index "obs_pkey"
 Attribute  |           Type
------------+--------------------------
 valid_time | timestamp with time zone
 station    | character(10)
unique btree

    Index "obs_station"
 Attribute |     Type
-----------+---------------
 station   | character(10)
btree

        Index "obs_valid_time"
 Attribute  |           Type
------------+--------------------------
 valid_time | timestamp with time zone
btree

(I suspect the obs_valid_time index is redundant, because of the
obs_pkey index - is that right?)

I'd be grateful for any advice and any clues to help speed this up.
Many thanks,
Dave

Re: slow query performance

From

Shridhar Daithankar

Date:

30 October 2003, 05:27:05

Dave Weaver wrote:
  too optimistic?).
>
> For instance:
>     SELECT station, air_temp FROM obs
>          WHERE station = 'EGBB'
>             AND valid_time > '28/8/03 00:00'
>      AND valid_time < '28/10/03 00:00'
>
> takes 4 mins 32 secs.
>
> An EXPLAIN of the above query says:
>   NOTICE:  QUERY PLAN:
>
>   Index Scan using obs_pkey on obs  (cost=0.00..9.01 rows=1 width=20)
>
> A simple "SELECT count(*) from obs" query takes around that sort of time
> too.

That is no test. Postgresql will always fetch the entire table.
>
> I have run "vacuumdb --analyze obs", to little effect.

Check http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html for general
tuning tips and do a vacuum full if table is updated frequently. Also reindex
the indexes after vacuum. It will take quite some time though.

See if that helps.

  Shridhar

Re: slow query performance

From

"Dave Weaver"

Date:

30 October 2003, 06:06:00

Shridhar Daithankar wrote:

> Dave Weaver wrote:

> > A simple "SELECT count(*) from obs" query takes around that sort of time

> > too.

> That is no test. Postgresql will always fetch the entire table.

OK, I wasn't aware of that.

> > I have run "vacuumdb --analyze obs", to little effect.

> Check http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html for general

> tuning tips

Thanks for the link - I will read and digest.

> and do a vacuum full if table is updated frequently. Also reindex

> the indexes after vacuum. It will take quite some time though.

The version of Postgres that I'm running (7.1.3) doesn't seem to have a "vacuum full".

I have reindexed after a vacuum, with liitle-to-no effect.

Dave.

Re: slow query performance

From

Shridhar Daithankar

Date:

30 October 2003, 06:42:11

Dave Weaver wrote:
> The version of Postgres that I'm running (7.1.3) doesn't seem to have a
> "vacuum full".
>
> I have reindexed after a vacuum, with liitle-to-no effect.

If possible you should upgrade. 7.1.3 is rather old. But that shouldn't mke that
much difference ( I guess. Wasn't a postgresql user back then..:-)

  Shridhar

Re: slow query performance

From

Jeff

Date:

30 October 2003, 09:20:45

> > For instance:
> >     SELECT station, air_temp FROM obs
> >          WHERE station = 'EGBB'
> >             AND valid_time > '28/8/03 00:00'
> >      AND valid_time < '28/10/03 00:00'
> >
> > takes 4 mins 32 secs.

How many rows should that return?
[explain analyze will tell you that]

and while that runs is your disk thrashing? vmstat's bi/bo columns will
tell you.

7.1 is quite old, but I do understand the pain of migraing to 7.3 [or
.4beta] with huge db's

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

Re: slow query performance

From

"Dave Weaver"

Date:

30 October 2003, 09:50:18

Jeff wrote:
> Dave Weaver wrote:
> > For instance:
> >     SELECT station, air_temp FROM obs
> >          WHERE station = 'EGBB'
> >             AND valid_time > '28/8/03 00:00'
> >      AND valid_time < '28/10/03 00:00'
> >
> > takes 4 mins 32 secs.
>
> How many rows should that return?
> [explain analyze will tell you that]

"explain analyze" doesn't seem to be part of this postgres version
(or I misunderstood something).
That particular query returned 24 rows.

> and while that runs is your disk thrashing? vmstat's bi/bo columns will
> tell you.

The machine's over the other side of the building, so I can't physically
see if the disk is thrashing.
I'm not sure how to interpret the vmstat output; running "vmstat 1" shows
me bi/bo both at zero (mostly) until I start the query. Then bi shoots up
to around 2500 (bo remains around zero) until the query finishes.

> 7.1 is quite old, but I do understand the pain of migraing to 7.3 [or
> .4beta] with huge db's

Is the upgrade likely to make a difference?
I'm still none-the-wiser wether the problem I have is due to:
    1 Postgres version
    2 Database size
    3 Table structure
    4 Configuration issues
    5 Slow hardware
    6 All of the above
    7 None of the above
    8 Something else

Thanks for the help,
Dave.

Re: slow query performance

From

Paul Thomas

Date:

30 October 2003, 09:53:51

On 30/10/2003 10:41 Shridhar Daithankar wrote:
> Dave Weaver wrote:
>> The version of Postgres that I'm running (7.1.3) doesn't seem to have a
>> "vacuum full".
>>
>> I have reindexed after a vacuum, with liitle-to-no effect.
>
> If possible you should upgrade. 7.1.3 is rather old. But that shouldn't
> mke that much difference ( I guess. Wasn't a postgresql user back
> then..:-)
>
>  Shridhar


IIRC, vacuum on older versions of PG locked tables and was effectively
what vacuum full is now. Looking at HISTORY, non-locking vacuum and vacuum
full were introduced in 7.2.

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller
Business |
| Computer Consultants         |
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

Re: slow query performance

From

Jeff

Date:

30 October 2003, 10:29:58

On Thu, 30 Oct 2003 13:49:46 -0000
"Dave Weaver" <davew@wsieurope.com> wrote:

> Jeff wrote:
> > Dave Weaver wrote:
> > > For instance:
> > >     SELECT station, air_temp FROM obs
> > >          WHERE station = 'EGBB'
> > >             AND valid_time > '28/8/03 00:00'
> > >      AND valid_time < '28/10/03 00:00'
> > >
> > > takes 4 mins 32 secs.
> >
> > How many rows should that return?
> > [explain analyze will tell you that]
>
> "explain analyze" doesn't seem to be part of this postgres version
> (or I misunderstood something).
> That particular query returned 24 rows.
>
You run explain analyze [insert query here]

Post that output


>
> The machine's over the other side of the building, so I can't
> physically see if the disk is thrashing.
> I'm not sure how to interpret the vmstat output; running "vmstat 1"
> shows me bi/bo both at zero (mostly) until I start the query. Then bi
> shoots up to around 2500 (bo remains around zero) until the query
> finishes.
>

The BI column means it is reading 2500 blocks / second. This is
typically kB/sec (Linux defaults to 1kB block size on filesystems)

That seems pretty low.. even for an older disk.
We'll need the explain analyze output to help further.


--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

Re: slow query performance

From

"Dave Weaver"

Date:

30 October 2003, 11:21:49

Jeff Wrote:
> "Dave Weaver" <davew@wsieurope.com> wrote:
> >
> > "explain analyze" doesn't seem to be part of this postgres version (or
> > I misunderstood something).
> > That particular query returned 24 rows.
> >
> You run explain analyze [insert query here]
>
> Post that output

The output is probably not what you were expecting! :-)

obs=> explain analyze select * from obs where station = 'EGBB'
obs-> and valid_time > '28/8/03 00:00' and valid_time < '28/10/03 00:00';
ERROR: parser: parse error at or near "analyze"
obs=> \h explain
Command:     EXPLAIN
Description: Shows statement execution plan
Syntax:
EXPLAIN [ VERBOSE ] query

As I said, "explain analyze" doesn't seem to be part of this postgres
version.

Dave.

Re: slow query performance

From

"scott.marlowe"

Date:

30 October 2003, 11:21:55

On Thu, 30 Oct 2003, Dave Weaver wrote:

> Jeff wrote:
> > Dave Weaver wrote:
> > > For instance:
> > >     SELECT station, air_temp FROM obs
> > >          WHERE station = 'EGBB'
> > >             AND valid_time > '28/8/03 00:00'
> > >      AND valid_time < '28/10/03 00:00'
> > >
> > > takes 4 mins 32 secs.
> >
> > How many rows should that return?
> > [explain analyze will tell you that]
>
> "explain analyze" doesn't seem to be part of this postgres version
> (or I misunderstood something).
> That particular query returned 24 rows.

Back then it was just explain.  explain analyze actually runs the query
and tells you how long each thing too etc...  i.e. it gives you the "I
imagine I'll get this many rows back and it'll cost this much" part, then
the cold hard facts of how many rows really came back, and how long it
really too.  Quite a nice improvement.

> > and while that runs is your disk thrashing? vmstat's bi/bo columns will
> > tell you.
>
> The machine's over the other side of the building, so I can't physically
> see if the disk is thrashing.
> I'm not sure how to interpret the vmstat output; running "vmstat 1" shows
> me bi/bo both at zero (mostly) until I start the query. Then bi shoots up
> to around 2500 (bo remains around zero) until the query finishes.

Your disk is likely trashing.

Can you set sort_mem on that old version of pgsql to something higher?

set sort_mem = 32768;

or something similar?

> > 7.1 is quite old, but I do understand the pain of migraing to 7.3 [or
> > .4beta] with huge db's
>
> Is the upgrade likely to make a difference?
> I'm still none-the-wiser wether the problem I have is due to:
>     1 Postgres version
>     2 Database size
>     3 Table structure
>     4 Configuration issues
>     5 Slow hardware
>     6 All of the above
>     7 None of the above
>     8 Something else

Yes, the upgrade is very likely to make a difference.  The average
performance gain for each version since then has been, in my experience,
anywhere from a few percentage points faster to many times faster,
depending on what you were trying to do.

Why not download 7.4beta5 and see if you can get it to import the data
from 7.1.3?  It's close to going production, and in my opinion, 7.4beta5
is probably at least as stable as 7.1.3 considering the number of unfixed
bugs likely to be hiding out there.  My guess is that you'll find your
workstation running 74beta5 with one IDE hard drive outrunning your server
with 7.1.3 on it.  Seriously.

We're running 7.2.4 where I work, and the change from 7.1 to 7.2 was huge
for us, especially the non-full vacuums.

Re: slow query performance

From

Stephan Szabo

Date:

30 October 2003, 12:01:28

On Thu, 30 Oct 2003, Dave Weaver wrote:

> Jeff wrote:
> > Dave Weaver wrote:
> > > For instance:
> > >     SELECT station, air_temp FROM obs
> > >          WHERE station = 'EGBB'
> > >             AND valid_time > '28/8/03 00:00'
> > >      AND valid_time < '28/10/03 00:00'
> > >
> > > takes 4 mins 32 secs.
> >
> > How many rows should that return?
> > [explain analyze will tell you that]
>
> "explain analyze" doesn't seem to be part of this postgres version
> (or I misunderstood something).
> That particular query returned 24 rows.
>
>
> > and while that runs is your disk thrashing? vmstat's bi/bo columns will
> > tell you.
>
> The machine's over the other side of the building, so I can't physically
> see if the disk is thrashing.
> I'm not sure how to interpret the vmstat output; running "vmstat 1" shows
> me bi/bo both at zero (mostly) until I start the query. Then bi shoots up
> to around 2500 (bo remains around zero) until the query finishes.
>
>
> > 7.1 is quite old, but I do understand the pain of migraing to 7.3 [or
> > .4beta] with huge db's
>
> Is the upgrade likely to make a difference?

Well, it's likely to get you better help.  Explain Analyze (added in 7.2
IIRC) gets us information on the real time spent in operations as well as
the real number of rows.

But, back to the question, what is the definition of the index it's using?
If you don't have already have an index on (station,valid_time does
making one help?

Re: slow query performance

From

Tom Lane

Date:

30 October 2003, 13:36:46

"Dave Weaver" <davew@wsieurope.com> writes:
> Is the upgrade likely to make a difference?

I'm not sure if it would help for this specific query, but in general
each major PG release has useful performance improvements over the
previous one.

What I'm wondering about is an index-bloat problem (see the
pgsql-performance archives for discussions).  Do you do a lot of updates
or deletes on this table, or is it just inserts?  What is the physical
size of the table and its index?  The output of VACUUM VERBOSE for this
table would be useful to show.

            regards, tom lane

Re: slow query performance

From

"Dave Weaver"

Date:

31 October 2003, 03:31:17

Tom Lane wrote:
> Do you do a lot of updates or deletes on this table, or is it just
> inserts?

Inserts and updates. No deletes.


> What is the physical size of the table and its index?

How do I find out this information?


> The output of VACUUM VERBOSE for this table would be useful to show.

obs=> vacuum verbose obs;
NOTICE:  --Relation obs--
NOTICE:  Pages 276896: Changed 2776, reaped 67000, Empty 0, New 0;s
  Tup 13739326: Vac 78031, Keep/VTL 3141/3141, Crash 0, UnUsed 303993,
  MinLen 72, MaxLen 476; Re-using: Free/Avail. Space 16174372/14995020;
  EndEmpty/Avail. Pages 0/18004. CPU 26.11s/3.78u sec.
NOTICE:  Index obs_pkey: Pages 114870; Tuples 13739326: Deleted 37445. CPU
12.33s/39.86u sec.
NOTICE:  Index obs_valid_time: Pages 45713; Tuples 13739326: Deleted 37445.
CPU 4.38s/37.65u sec.
NOTICE:  InvalidateSharedInvalid: cache state reset
NOTICE:  Index obs_station: Pages 53170; Tuples 13739326: Deleted 37445. CPU
6.46s/56.63u sec.
NOTICE:  Rel obs: Pages: 276896 --> 275200; Tuple(s) moved: 30899. CPU
33.94s/51.05u sec.
NOTICE:  Index obs_pkey: Pages 114962; Tuples 13739326: Deleted 30881. CPU
13.24s/19.80u sec.
NOTICE:  Index obs_valid_time: Pages 45819; Tuples 13739326: Deleted 30881.
CPU 4.51s/17.42u sec.
NOTICE:  Index obs_station: Pages 53238; Tuples 13739326: Deleted 30881. CPU
5.78s/18.33u sec.
NOTICE:  --Relation pg_toast_503832058--
NOTICE:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0,
  Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail.
  Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_toast_503832058_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u
sec.
VACUUM
obs=>


Cheers,
Dave.

Re: slow query performance

From

"Dave Weaver"

Date:

31 October 2003, 04:33:51

Stephan Szabo wrote:
> But, back to the question, what is the definition of the index it's using?
> If you don't have already have an index on (station,valid_time does
> making one help?

From my original post:

           Index "obs_pkey"
 Attribute  |           Type
------------+--------------------------
 valid_time | timestamp with time zone
 station    | character(10)
unique btree

    Index "obs_station"
 Attribute |     Type
-----------+---------------
 station   | character(10)
btree

        Index "obs_valid_time"
 Attribute  |           Type
------------+--------------------------
 valid_time | timestamp with time zone
btree

(I suspect the obs_valid_time index is redundant, because of the obs_pkey
index - is that right?)

Cheers,
Dave.

Re: slow query performance

From

Shridhar Daithankar

Date:

31 October 2003, 04:57:26

Dave Weaver wrote:

> Tom Lane wrote:
>
>>Do you do a lot of updates or deletes on this table, or is it just
>>inserts?
>
>
> Inserts and updates. No deletes.

Updates are insert/deletes under postgresql as it does not updates rows in place.

>>What is the physical size of the table and its index?
> How do I find out this information?

cd $PGDATA;du -h

This will give you size of each directory. Using utility oid2name in contrib
module in sources, you can find out what object is stored in which file. Same is
true for data files under it as well.

> obs=> vacuum verbose obs;
> NOTICE:  --Relation obs--
> NOTICE:  Pages 276896: Changed 2776, reaped 67000, Empty 0, New 0;s
>   Tup 13739326: Vac 78031, Keep/VTL 3141/3141, Crash 0, UnUsed 303993,
>   MinLen 72, MaxLen 476; Re-using: Free/Avail. Space 16174372/14995020;
>   EndEmpty/Avail. Pages 0/18004. CPU 26.11s/3.78u sec.
> NOTICE:  Index obs_pkey: Pages 114870; Tuples 13739326: Deleted 37445. CPU
> 12.33s/39.86u sec.
> NOTICE:  Index obs_valid_time: Pages 45713; Tuples 13739326: Deleted 37445.
> CPU 4.38s/37.65u sec.
> NOTICE:  InvalidateSharedInvalid: cache state reset
> NOTICE:  Index obs_station: Pages 53170; Tuples 13739326: Deleted 37445. CPU
> 6.46s/56.63u sec.
> NOTICE:  Rel obs: Pages: 276896 --> 275200; Tuple(s) moved: 30899. CPU
> 33.94s/51.05u sec.
> NOTICE:  Index obs_pkey: Pages 114962; Tuples 13739326: Deleted 30881. CPU
> 13.24s/19.80u sec.
> NOTICE:  Index obs_valid_time: Pages 45819; Tuples 13739326: Deleted 30881.
> CPU 4.51s/17.42u sec.
> NOTICE:  Index obs_station: Pages 53238; Tuples 13739326: Deleted 30881. CPU
> 5.78s/18.33u sec.
> NOTICE:  --Relation pg_toast_503832058--
> NOTICE:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0,
>   Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail.
>   Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
> NOTICE:  Index pg_toast_503832058_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u
> sec.
> VACUUM

You reindexed? Seems like this is after you have already run a vacuum. So not
much change is shown here.

  HTH

  Shridhar

Re: slow query performance

From

Mark Kirkwood

Date:

31 October 2003, 06:37:30

Dave,

Apologies if this has been suggested before, but maybe :

- interchanging the key order for the "obs_pkey" index and
- clustering the "obs" table on "station"

might make these queries go a bit better?

Alternatively if maintaining a cluster on station is infeasable, you
could consider a collection of partial indexes on valid_time for each
station:

create index obs_valid_time _stat1 on obs(valid_time) where station =
'station 1';

(etc for each station)...

regards

Mark


Dave Weaver wrote:

>On the whole, queries are of the form:
>
>    SELECT ? FROM obs WHERE station = ?
>                    AND valid_time < ? AND valid_time > ?
>or:
>    SELECT ? FROM obs WHERE station IN (?, ?, ...)
>                    AND valid_time < ? AND valid_time > ?
>
>An EXPLAIN of the above query says:
>  NOTICE:  QUERY PLAN:
>
>  Index Scan using obs_pkey on obs  (cost=0.00..9.01 rows=1 width=20)
>
>           Index "obs_pkey"
> Attribute  |           Type
>------------+--------------------------
> valid_time | timestamp with time zone
> station    | character(10)
>unique btree
>
>
>

Re: slow query performance

From

"Dave Weaver"

Date:

31 October 2003, 07:28:18

Mark wrote:
> - interchanging the key order for the "obs_pkey" index and

I will try this.

> - clustering the "obs" table on "station"

Sorry, I don't understand what you mean by this - can you explain?

> create index obs_valid_time _stat1 on obs(valid_time) where station =
'station 1';
>

Interesting. I didn't know you could do that (I have a lot to learn!).
There are about 13500 stations, so that's a lot of indexes.

Hang on - just tried it and I get a 'parse error at or near "where"', so
I suspect that's yet another reason to upgrade...

I think the one message that's come across loud and clear from this
thread is "Upgrade!", so I guess that's what I'll be doing!

Many thanks to all who have contributed to this thread so far.
I really appreciate al hte help.

Cheers,
Dave.

Re: slow query performance

From

Tom Lane

Date:

31 October 2003, 10:10:41

"Dave Weaver" <davew@wsieurope.com> writes:
>> The output of VACUUM VERBOSE for this table would be useful to show.

> NOTICE:  Index obs_pkey: Pages 114962; Tuples 13739326: Deleted 30881. CPU
> 13.24s/19.80u sec.

Lets see, 114962 pages at 8K apiece, divided by 13739326 entries, gives
about 68 bytes per index entry. The entries themselves (contents a
timestamp and a char(10)) take 8 + 4 + 10 bytes for data, plus 8 bytes
for the index tuple header, plus 2 bytes wasted for alignment (assuming
this is Intel hardware), plus a 4-byte line pointer; 36 bytes total.
So you have an index loading factor of about 52%, which is noticeably
less than the theoretical optimum of 70%, though not really bad yet.
I think you are seeing some index bloat --- especially if you
recently reindexed, meaning that the index hasn't had very long to
grow.  You could try keeping an eye on the size of obs_pkey over time
and see if it grows faster than the table itself.

If you are going to upgrade I'd counsel going to 7.4, which should solve
or at least greatly reduce the problem of index bloat.  See the
pgsql-performance archives for more discussion.

            regards, tom lane

Re: slow query performance

From

Dave Weaver

Date:

31 October 2003, 14:47:42

On 29 Oct 2003 09:40:43 GMT, I wrote:
>
>  I'm having severe performance issues with a conceptually simple
>  database.

Oops - forgot to mention, I'm running:
    PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC 2.96
on RedHat 7.2

--
Dave
email: zen13097 (AT) zen [DOT] co {DOT} uk

Re: slow query performance

From

Mark Kirkwood

Date:

31 October 2003, 17:24:07

Dave Weaver wrote:

>>- clustering the "obs" table on "station"
>>
>>
>
>Sorry, I don't understand what you mean by this - can you explain?
>
>
>
Supposing obs_pkey is on (station, valid_time):

cluster obs_pkey on obs

will re-order the rows in obs based on the index obs_pkey. (This is
clustering on 'station' and 'valid_time', to do just station you could
use an index on just 'station').

The down side is that the row ordering is slowly lost as rows are
updated, so periodic running of the cluster command is needed - this is
a pain as it will take a while for 13 million row table.

regards

Mark

Re: slow query performance

From

Alvaro Herrera

Date:

02 November 2003, 23:12:43

On Sat, Nov 01, 2003 at 10:18:14AM +1300, Mark Kirkwood wrote:
>
> Dave Weaver wrote:
>
> >>- clustering the "obs" table on "station"
> >
> >Sorry, I don't understand what you mean by this - can you explain?

> Supposing obs_pkey is on (station, valid_time):
>
> cluster obs_pkey on obs

Be aware that doing this on the 7.1.3 version you are running will drop
the other indexes on the table, and some other metadata about it (grants,
inheritance, foreign key relationships IIRC).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Coge la flor que hoy nace alegre, ufana. ¿Quién sabe si nacera otra mañana?"