Re: Comparative performance - Mailing list pgsql-performance

From Joe
Subject Re: Comparative performance
Date
Msg-id 4341E328.2060305@freedomcircle.net
Whole thread Raw
In response to Re: Comparative performance  (PFC <lists@boutiquenumerique.com>)
Responses Re: Comparative performance  (PFC <lists@boutiquenumerique.com>)
List pgsql-performance
PFC wrote:
>     Even though this query isn't that optimized, it's still only 16
> milliseconds.
>     Why does it take this long for PHP to get the results ?
>
>     Can you try pg_query'ing this exact same query, FROM PHP, and timing
> it  with getmicrotime() ?

That query took about 27 msec in actual PHP execution time.  It turns out the
real culprit is the following query, which interestingly enough retrieves zero
rows in the case of the Economists page that I've been using for testing, yet it
uses up about 1370 msec in actual runtime:

SELECT topic_id1, topic_id2, topic_name, categ_id, list_name, t.title, url,
page_type, rel_type, inverse_id, r.description AS rel_descrip, r.created, r.updated
FROM relationship r, topic t, entry_type e
WHERE ((topic_id1 = topic_id AND topic_id2 = 1252) OR (topic_id2 = topic_id and
topic_id1 = 1252)) AND rel_type = type_id AND e.class_id = 2
ORDER BY rel_type, list_name;

The EXPLAIN ANALYZE output, after I ran VACUUM ANALYZE on the three tables, is:

  Sort  (cost=4035.55..4035.56 rows=1 width=131) (actual time=2110.000..2110.000
rows=0 loops=1)
    Sort Key: r.rel_type, t.list_name
    ->  Nested Loop  (cost=36.06..4035.54 rows=1 width=131) (actual
time=2110.000..2110.000 rows=0 loops=1)
          Join Filter: ((("inner".topic_id1 = "outer".topic_id) AND
("inner".topic_id2 = 1252)) OR (("inner".topic_id2 = "outer".topic_id) AND
("inner".topic_id1 = 1252)))
          ->  Seq Scan on topic t  (cost=0.00..38.34 rows=1234 width=90) (actual
time=0.000..15.000 rows=1234 loops=1)
          ->  Materialize  (cost=36.06..37.13 rows=107 width=45) (actual
time=0.000..0.509 rows=466 loops=1234)
                ->  Merge Join  (cost=30.31..35.96 rows=107 width=45) (actual
time=0.000..0.000 rows=466 loops=1)
                      Merge Cond: ("outer".type_id = "inner".rel_type)
                      ->  Index Scan using entry_type_pkey on entry_type e  (cost
=0.00..3.94 rows=16 width=4) (actual time=0.000..0.000 rows=15 loops=1)
                            Filter: (class_id = 2)
                      ->  Sort  (cost=30.31..31.48 rows=466 width=43) (actual
time=0.000..0.000 rows=466 loops=1)
                            Sort Key: r.rel_type
                            ->  Seq Scan on relationship r  (cost=0.00..9.66
rows=466 width=43) (actual time=0.000..0.000 rows=466 loops=1)
  Total runtime: 2110.000 ms
(14 rows)

The tables are as follows:

CREATE TABLE entry_type (
   type_id SMALLINT NOT NULL PRIMARY KEY,
   title VARCHAR(32) NOT NULL,
   rel_title VARCHAR(32),
   class_id SMALLINT NOT NULL DEFAULT 1,
   inverse_id SMALLINT,
   updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;

CREATE TABLE topic (
   topic_id serial PRIMARY KEY,
   topic_name VARCHAR(48) NOT NULL UNIQUE,
   categ_id SMALLINT NOT NULL,
   parent_entity INTEGER,
   parent_concept INTEGER,
   crossref_id INTEGER,
   list_name VARCHAR(80) NOT NULL,
   title VARCHAR(80),
   description VARCHAR(255),
   url VARCHAR(64),
   page_type SMALLINT NOT NULL,
   dark_ind BOOLEAN NOT NULL DEFAULT FALSE,
   ad_code INTEGER,
   created DATE NOT NULL DEFAULT CURRENT_DATE,
   updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;

CREATE TABLE relationship (
   topic_id1 INTEGER NOT NULL REFERENCES topic,
   topic_id2 INTEGER NOT NULL REFERENCES topic,
   rel_type INTEGER NOT NULL,
   description VARCHAR(255),
   created DATE NOT NULL DEFAULT CURRENT_DATE,
   updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (topic_id1, topic_id2, rel_type))
WITHOUT OIDS;

I'm thinking that perhaps I need to set up another index with topic_id2 first
and topic_id1 second.  In addition, an index on entry_type.class_id may improve
things.  Another possibility would be to rewrite the query as a UNION.

Of course, this doesn't explain how MySQL manages to execute the query in about
9 msec.  The only minor differences in the schema are:  entry_type.title and
rel_title are char(32) in MySQL, entry_type.class_id is a tinyint, and
topic.categ_id, page_type and dark_ind are also tinyints. MySQL also doesn't
have the REFERENCES.

A couple of interesting side notes from my testing.  First is that pg_connect()
took about 39 msec but mysql_connect() took only 4 msec, however, pg_pconnect()
took 0.14 msec while mysql_pconnect() took 0.99 msec (all tests were repeated
five times and the quoted results are averages).  Second, is that PostgreSQL's
performance appears to be much more consistent in certain queries.  For example,
the query that retrieves the list of subtopics (the names and description of
economists), took 17 msec in PG, with a low of 15 (three times) and a high of
21, whereas MySQL took 60 msec on average but had a low of 22 and a high of 102
msec.

Joe


pgsql-performance by date:

Previous
From: Ron Peacetree
Date:
Subject: Re: [HACKERS] A Better External Sort?
Next
From: PFC
Date:
Subject: Re: Comparative performance