postgresql does seqscan instead of using an existing index - Mailing list pgsql-general

From Jan Weerts
Subject postgresql does seqscan instead of using an existing index
Date
Msg-id B349BABAF9A92F4D9FBFCADF8D5FEDD508108B@ivsrv03.i-views.de
Whole thread Raw
Responses Re: postgresql does seqscan instead of using an existing  (Brian Hirt <bhirt@mobygames.com>)
Re: postgresql does seqscan instead of using an existing index  (Andrew Sullivan <andrew@libertyrms.info>)
List pgsql-general

Hi all!

I hope this is the right list for this question, if not, please
direct me to the appropriate one. This mail is rather longish, so
thanks in advance to all, who dare to read :-).

We are using PostgreSQL-7.1.2 (the upgrade is planned) to manage
the source code of a development team. We have read performance
issues, which increased over the time of the usage (and filling)
of this PostgreSQL database. Writing performance is acceptable.

Analyzing the typical query sequence, when loading a piece of
sourcecode, we found that a special query pattern consumes most
of the time. A sample query looks like this:

  SELECT * FROM TW_ClassRecord
  WHERE primaryKey IN (
    SELECT metaclassRef FROM TW_PkgClasses
    WHERE packageRef = 10047 AND metaclassRef NOT IN (
      SELECT metaclassRef FROM TW_PkgClasses
      WHERE packageRef = 10023
      )
    )

This query takes over 30 seconds to complete. There exist indexes
on TW_PkgClasses(packageRef) and naturally the primaryKey index on
TW_ClassRecord (both freshly vacuumed and analyzed).

an explain of the above query gives:
  Seq Scan on tw_classrecord  (cost=100000000.00..420280692.87
                               rows=310762 width=68)
    SubPlan
    -> Materialize (cost=1030.60..1030.60 rows=18 width=4)
       -> Index Scan using tw_pkgclasses_packageref_index on
          tw_pkgclasses  (cost=0.00..1030.60 rows=18 width=4)
            SubPlan
            -> Materialize  (cost=54.82..54.82 rows=18 width=4)
               -> Index Scan using tw_pkgclasses_packageref_index on
                  tw_pkgclasses (cost=0.00..54.82 rows=18 width=4)

As you see, the planner does not want to use the index on the
primaryKey of tw_classrecord for the outermost part of this query.
Setting enable_seqscan to off had no effect on the planner.

Breaking the query in two parts, gives a much better performance
(much less than a second each) and uses the index:
    SELECT metaclassRef FROM TW_PkgClasses
    WHERE packageRef = 10047 AND metaclassRef NOT IN (
      SELECT metaclassRef FROM TW_PkgClasses
      WHERE packageRef = 10023
      )
delivers 138 integers.

  SELECT * FROM TW_ClassRecord
  WHERE primaryKey IN ( <INSERT THE 138 INTEGERS HERE> )

explained, delivers:

  Index Scan using tw_classrecord_pkey, tw_classrecord_pkey, .....
  on tw_classrecord  (cost=0.00..698.67 rows=14 width=68)

The used tables:
  TW_PkgClasses   306692 rows
    4 integer columns, one multicolumn index, where packageRef
    is the first row
  TW_ClassRecord  310762 rows
    5 integers, 4 varchars, 3 single column indices on
    the primarykey and two of the varchar fields

My question is: How do I convince PostgreSQL to use the index
when executing the original query? Any other solution would
also be welcomed (we already thought about changing the
development environment query mechanisms...).

Before I get the question: The DB is located on a 700MHz Linux 2.2 box with a single fast IDE drive and 512MB RAM. A second PostgreSQL database on the same machine carries much less data and read response times are very good there. The machine itself is not loaded, all tests were performed off-hours, meaning just a single user of the database. I guess, that the machine is not the issue.

If you need any further data, which I did not provide here, please let me know.

Thanks in advance
  Jan

pgsql-general by date:

Previous
From: Oleg Bartunov
Date:
Subject: Re: "...integer[] references..." = error
Next
From: Brian Hirt
Date:
Subject: Re: postgresql does seqscan instead of using an existing