Re: Comparative performance - Mailing list pgsql-performance

From Joe
Subject Re: Comparative performance
Date
Msg-id 433BE1A0.5000807@freedomcircle.net
Whole thread Raw
In response to Re: Comparative performance  (Gavin Sherry <swm@alcove.com.au>)
Responses Re: Comparative performance
Re: Comparative performance
List pgsql-performance
Gavin Sherry wrote:
> Please post the table definitions, queries and explain analyze results so
> we can tell you why the performance is poor.

I did try to post that last night but apparently my reply didn't make it to the
list.  Here it is again:

Matthew Nuzum wrote:

 > This is the right list. Post detail and I'm sure you'll get some suggestions.


Thanks, Matthew (and Chris and Gavin).

The main table used in the query is defined as follows:

CREATE TABLE entry (
   entry_id serial PRIMARY KEY,
   title VARCHAR(128) NOT NULL,
   subtitle VARCHAR(128),
   subject_type SMALLINT,
   subject_id INTEGER REFERENCES topic,
   actor_type SMALLINT,
   actor_id INTEGER REFERENCES topic,
   actor VARCHAR(64),
   actor_role VARCHAR(64),
   rel_entry_id INTEGER,
   rel_entry VARCHAR(64),
   description VARCHAR(255),
   quote text,
   url VARCHAR(255),
   entry_date CHAR(10),
   created DATE NOT NULL DEFAULT CURRENT_DATE,
   updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;
CREATE INDEX entry_actor_id ON entry (actor_id);
CREATE INDEX entry_subject_id ON entry (subject_id);

It has 3422 rows at this time.

The query for one of the pages is the following:

SELECT entry_id, subject_type AS type, subject_type, subject_id, actor_type,
actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle,
description, url, quote AS main_quote, NULL AS rel_quote, substring(entry_date
from 8) AS dom, substring(entry_date from 1) AS date_ymd, substring(entry_date
from 1 for 7) AS date_ym, substring(entry_date from 1 for 4) AS date_y, created,
updated FROM entry WHERE subject_id = 1079
UNION SELECT entry_id, actor_type AS type, subject_type, subject_id, actor_type,
actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle,
description, url, quote AS main_quote, NULL AS rel_quote, substring(entry_date
from 8) AS dom, substring(entry_date from 1) AS date_ymd, substring(entry_date
from 1 for 7) AS date_ym, substring(entry_date from 1 for 4) AS date_y, created,
updated FROM entry WHERE actor_id = 1079 ORDER BY type, title, subtitle;

The output of EXPLAIN ANALYZE is:

  Sort  (cost=158.98..159.14 rows=62 width=568) (actual time=16.000..16.000
rows=59 loops=1)
    Sort Key: "type", title, subtitle
    ->  Unique  (cost=153.57..157.14 rows=62 width=568) (actual
time=16.000..16.000 rows=59 loops=1)
          ->  Sort  (cost=153.57..153.73 rows=62 width=568) (actual
time=16.000..16.000 rows=59 loops=1)
                Sort Key: entry_id, "type", subject_type, subject_id,
actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry, title,
subtitle, description, url, main_quote, rel_quote, dom, date_ymd, date_ym,
date_y, created, updated
                ->  Append  (cost=0.00..151.73 rows=62 width=568) (actual
time=0.000..16.000 rows=59 loops=1)
                      ->  Subquery Scan "*SELECT* 1"  (cost=0.00..17.21 rows=4
width=568) (actual time=0.000..0.000 rows=3 loops=1)
                            ->  Index Scan using entry_subject_id on entry
(cost=0.00..17.17 rows=4 width=568) (actual time=0.000..0.000 rows=3 loops=1)
                                  Index Cond: (subject_id = 1079)
                      ->  Subquery Scan "*SELECT* 2"  (cost=0.00..134.52 rows=58
width=568) (actual time=0.000..16.000 rows=56 loops=1)
                            ->  Seq Scan on entry  (cost=0.00..133.94 rows=58
width=568) (actual time=0.000..16.000 rows=56 loops=1)
                                  Filter: (actor_id = 1079)
  Total runtime: 16.000 ms
(13 rows)

What I don't quite understand is why it's doing a sequential scan on actor_id
instead of using the entry_actor_id index.  Note that actor_id has 928 non-null
values (27%), whereas subject_id has 3089 non-null values (90%).

Note that the entry_date column was originally a MySQL date but it had partial
dates, i.e., some days and months are set to zero.  Eventually I hope to define
a PostgreSQL datatype for it and to simplify the substring retrievals.  However,
I don't think the extra computational time should affect the overall runtime
significantly.

Gavin, I'm using PostgreSQL 8.0.3, Apache 1.3.28, PHP 4.3.4, MySQL 4.0.16 and
I'm comparing both databases on XP (on a Pentium 4, 1.6 GHz, 256 MB RAM).

Thanks for any feedback.

Joe


pgsql-performance by date:

Previous
From: Joe
Date:
Subject: Re: Comparative performance
Next
From: Andreas Pflug
Date:
Subject: Re: Comparative performance