Re: postgresql does seqscan instead of using an existing - Mailing list pgsql-general

From Brian Hirt
Subject Re: postgresql does seqscan instead of using an existing
Date
Msg-id 1031249279.28232.1.camel@loopy.tr.berkhirt.com
Whole thread Raw
In response to postgresql does seqscan instead of using an existing index  ("Jan Weerts" <j.weerts@i-views.de>)
List pgsql-general
IN clause have historically performed very slowly with postgres.

Try rewriting the query to use exists and not exists or better yet, use
joins if your data allows you to.

On Thu, 2002-09-05 at 11:49, Jan Weerts wrote:
> Hi all!
>
> I hope this is the right list for this question, if not, please
> direct me to the appropriate one. This mail is rather longish, so
> thanks in advance to all, who dare to read :-).
>
> We are using PostgreSQL-7.1.2 (the upgrade is planned) to manage
> the source code of a development team. We have read performance
> issues, which increased over the time of the usage (and filling)
> of this PostgreSQL database. Writing performance is acceptable.
>
> Analyzing the typical query sequence, when loading a piece of
> sourcecode, we found that a special query pattern consumes most
> of the time. A sample query looks like this:
>
>   SELECT * FROM TW_ClassRecord
>   WHERE primaryKey IN (
>     SELECT metaclassRef FROM TW_PkgClasses
>     WHERE packageRef = 10047 AND metaclassRef NOT IN (
>       SELECT metaclassRef FROM TW_PkgClasses
>       WHERE packageRef = 10023
>       )
>     )
>
> This query takes over 30 seconds to complete. There exist indexes
> on TW_PkgClasses(packageRef) and naturally the primaryKey index on
> TW_ClassRecord (both freshly vacuumed and analyzed).
>
> an explain of the above query gives:
>   Seq Scan on tw_classrecord  (cost=100000000.00..420280692.87
>                                rows=310762 width=68)
>     SubPlan
>     -> Materialize (cost=1030.60..1030.60 rows=18 width=4)
>        -> Index Scan using tw_pkgclasses_packageref_index on
>           tw_pkgclasses  (cost=0.00..1030.60 rows=18 width=4)
>             SubPlan
>             -> Materialize  (cost=54.82..54.82 rows=18 width=4)
>                -> Index Scan using tw_pkgclasses_packageref_index on
>                   tw_pkgclasses (cost=0.00..54.82 rows=18 width=4)
>
>
> As you see, the planner does not want to use the index on the
> primaryKey of tw_classrecord for the outermost part of this query.
> Setting enable_seqscan to off had no effect on the planner.
>
> Breaking the query in two parts, gives a much better performance
> (much less than a second each) and uses the index:
>     SELECT metaclassRef FROM TW_PkgClasses
>     WHERE packageRef = 10047 AND metaclassRef NOT IN (
>       SELECT metaclassRef FROM TW_PkgClasses
>       WHERE packageRef = 10023
>       )
> delivers 138 integers.
>
>   SELECT * FROM TW_ClassRecord
>   WHERE primaryKey IN ( <INSERT THE 138 INTEGERS HERE> )
>
> explained, delivers:
>
>   Index Scan using tw_classrecord_pkey, tw_classrecord_pkey, .....
>   on tw_classrecord  (cost=0.00..698.67 rows=14 width=68)
>
> The used tables:
>   TW_PkgClasses   306692 rows
>     4 integer columns, one multicolumn index, where packageRef
>     is the first row
>   TW_ClassRecord  310762 rows
>     5 integers, 4 varchars, 3 single column indices on
>     the primarykey and two of the varchar fields
>
> My question is: How do I convince PostgreSQL to use the index
> when executing the original query? Any other solution would
> also be welcomed (we already thought about changing the
> development environment query mechanisms...).
>
> Before I get the question: The DB is located on a 700MHz Linux 2.2 box
> with a single fast IDE drive and 512MB RAM. A second PostgreSQL
> database on the same machine carries much less data and read response
> times are very good there. The machine itself is not loaded, all tests
> were performed off-hours, meaning just a single user of the database.
> I guess, that the machine is not the issue.
>
> If you need any further data, which I did not provide here, please let
> me know.
>
> Thanks in advance
>   Jan


pgsql-general by date:

Previous
From: "Jan Weerts"
Date:
Subject: postgresql does seqscan instead of using an existing index
Next
From: Steve
Date:
Subject: Literal '-' in regular expression bracket sets