Thread: Very slow query - why?

Very slow query - why?

From

"Ben"

Date:

31 January 2004, 23:47:59

My question, pulled out of my source code comments:

// This search is SLOW when WHERE is just zcustnum. This is inexplicable
// to me, as the WHERE and ON conditions only reference zcustnum directly
// in both tables, and in both instances there are btree indexes for them.
// It seems to me that such a search ought to be almost instantaneous; the
// ordered lists of zcustnum in the indexes should make the ON and WHERE
// clauses absolutely trivial to evaluate. In momcust, zcustnum is unique
// (but not enforced by the database - I just never create a duplicate.)
// In sonaddr, zcustnum is not always unique. Basically, this reflects
// the fact that customers may store more than one address per account.
//
// Conditions:
// ===========
// ------------- initial portion of statement, formatted for readability:
// SELECT    a.zcustnum,trim(a.zcompany),trim(a.zfirstname),
//            trim(a.zlastname),trim(a.zaddr),trim(a.zaddr2),
//            trim(a.zcity),trim(a.zstate),trim(a.zzipcode),
//            trim(a.zcountry),a.zbigphone,a.zbigaltph,trim(a.zemail),
//            a.zanumb,trim(a.zsalu),trim(a.ztitle),a.zoptin,
//            b.zodr_date,b.zbadcheck,trim(b.zcomment),trim(b.zcomment2)
//
//            FROM sonaddr AS a
//            LEFT OUTER JOIN momcust AS b
//            ON (a.zcustnum = b.zcustnum)

// ------------- Alternate WHERE clause timings:
//    a:        WHERE a.zcustnum=30538                            --  4 secs
//    b:        WHERE b.zcustnum=30538                            -- 12 secs
//    c:        WHERE a.zcustnum=30538 AND b.zcustnum=30538        --  4 secs

// ------------- Table sizes:
// 101679 sonaddr records
// 102653 momcust records

// ------------- Host conditions:
// PostgreSQL 4.3
// Dell dual 3 GHz Pentium 4 CPU
// Linux 2.4.20 SMP

// ------------- Other information:
// Number of result rows returned for test: 6.
// Using libpq interface through c language, over a network connection.
// The following select is almost immediate, perhaps .1 second.
// An additional indexed field, a.znumb, is used in the WHERE clause.
// It returns one result. Why would this be so much faster?
// -------------
// SELECT    a.zcustnum,trim(a.zcompany),trim(a.zfirstname),
//            trim(a.zlastname),trim(a.zaddr),trim(a.zaddr2),
//            trim(a.zcity),trim(a.zstate),trim(a.zzipcode),
//            trim(a.zcountry),a.zbigphone,a.zbigaltph,trim(a.zemail),
//            a.zanumb,trim(a.zsalu),trim(a.ztitle),a.zoptin,
//            b.zodr_date,b.zbadcheck,trim(b.zcomment),trim(b.zcomment2),
//            b.znomail,trim(b.zwebsite),trim(b.zpassword),trim(b.zquery),
//            trim(b.zanswer),trim(b.zfirstname),trim(b.zlastname)
//
//            FROM sonaddr AS a
//            LEFT OUTER JOIN momcust AS b
//            ON (a.zcustnum = b.zcustnum)
//
//            WHERE a.zcustnum=30538 AND a.zanumb=3

Thanks for any insight

--Ben

Re: Very slow query - why?

From

Eric Ridge

Date:

01 February 2004, 00:56:23

On Jan 28, 2004, at 5:20 PM, Ben wrote:

> My question, pulled out of my source code comments:
>
> // This search is SLOW when WHERE is just zcustnum. This is
> inexplicable
> // to me, as the WHERE and ON conditions only reference zcustnum
> directly
> // in both tables, and in both instances there are btree indexes for
> them.

using the EXPLAIN command would probably open your eyes to where the
problem is.

In addition, if this zcustnum column is a bigint, you need to either
quote the rhs of the filter or cast it to a bitint in order for
postgres to actually use the index you have.

SELECT ... WHERE zcustnum = '30538'
       or
SELECT ... WHERE zcustnum = 30538::bigint;

eric

> // It seems to me that such a search ought to be almost instantaneous;
> the
> // ordered lists of zcustnum in the indexes should make the ON and
> WHERE
> // clauses absolutely trivial to evaluate. In momcust, zcustnum is
> unique
> // (but not enforced by the database - I just never create a
> duplicate.)
> // In sonaddr, zcustnum is not always unique. Basically, this reflects
> // the fact that customers may store more than one address per account.
> //
> // Conditions:
> // ===========
> // ------------- initial portion of statement, formatted for
> readability:
> // SELECT    a.zcustnum,trim(a.zcompany),trim(a.zfirstname),
> //            trim(a.zlastname),trim(a.zaddr),trim(a.zaddr2),
> //            trim(a.zcity),trim(a.zstate),trim(a.zzipcode),
> //            trim(a.zcountry),a.zbigphone,a.zbigaltph,trim(a.zemail),
> //            a.zanumb,trim(a.zsalu),trim(a.ztitle),a.zoptin,
> //            b.zodr_date,b.zbadcheck,trim(b.zcomment),trim(b.zcomment2)
> //
> //            FROM sonaddr AS a
> //            LEFT OUTER JOIN momcust AS b
> //            ON (a.zcustnum = b.zcustnum)
>
> // ------------- Alternate WHERE clause timings:
> //    a:        WHERE a.zcustnum=30538                            --  4 secs
> //    b:        WHERE b.zcustnum=30538                            -- 12 secs
> //    c:        WHERE a.zcustnum=30538 AND b.zcustnum=30538        --  4 secs
>
> // ------------- Table sizes:
> // 101679 sonaddr records
> // 102653 momcust records
>
> // ------------- Host conditions:
> // PostgreSQL 4.3
> // Dell dual 3 GHz Pentium 4 CPU
> // Linux 2.4.20 SMP
>
> // ------------- Other information:
> // Number of result rows returned for test: 6.
> // Using libpq interface through c language, over a network connection.
> // The following select is almost immediate, perhaps .1 second.
> // An additional indexed field, a.znumb, is used in the WHERE clause.
> // It returns one result. Why would this be so much faster?
> // -------------
> // SELECT    a.zcustnum,trim(a.zcompany),trim(a.zfirstname),
> //            trim(a.zlastname),trim(a.zaddr),trim(a.zaddr2),
> //            trim(a.zcity),trim(a.zstate),trim(a.zzipcode),
> //            trim(a.zcountry),a.zbigphone,a.zbigaltph,trim(a.zemail),
> //            a.zanumb,trim(a.zsalu),trim(a.ztitle),a.zoptin,
> //            b.zodr_date,b.zbadcheck,trim(b.zcomment),trim(b.zcomment2),
> //            b.znomail,trim(b.zwebsite),trim(b.zpassword),trim(b.zquery),
> //            trim(b.zanswer),trim(b.zfirstname),trim(b.zlastname)
> //
> //            FROM sonaddr AS a
> //            LEFT OUTER JOIN momcust AS b
> //            ON (a.zcustnum = b.zcustnum)
> //
> //            WHERE a.zcustnum=30538 AND a.zanumb=3
>
> Thanks for any insight
>
> --Ben
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 8: explain analyze is your friend

Re: Very slow query - why?

From

Richard Schilling

Date:

02 February 2004, 05:52:38

Looking at your query, I notice that the longer query happens when you
search on b.zcustnum=30538 while the LEFT OUTER JOIN remains the same.
It could be that when testing a.zcustnum=30530 the server can short
circuit the logic - it only has to check a.zcustnum to see if the
entire tuple should be selected.  On the other hand when its selecting
on b.zcustnum, it has to go match up a.zcustnum with b.zcustnum first
before testing for the condition.  Extra processing.  Try clustering on
field zcustnum to see if that helps, or when selecting only on
b.zcustnum you might make b the first table specified in your join.

You might also look at clustering zcustnum in both tables.

Richard




On 2004.01.28 14:20 Ben wrote:
> My question, pulled out of my source code comments:
>
> // This search is SLOW when WHERE is just zcustnum. This is
> inexplicable
> // to me, as the WHERE and ON conditions only reference zcustnum
> directly
> // in both tables, and in both instances there are btree indexes for
> them.
> // It seems to me that such a search ought to be almost instantaneous;
> the
> // ordered lists of zcustnum in the indexes should make the ON and
> WHERE
> // clauses absolutely trivial to evaluate. In momcust, zcustnum is
> unique
> // (but not enforced by the database - I just never create a
> duplicate.)
> // In sonaddr, zcustnum is not always unique. Basically, this reflects
> // the fact that customers may store more than one address per
> account.
> //
> // Conditions:
> // ===========
> // ------------- initial portion of statement, formatted for
> readability:
> // SELECT    a.zcustnum,trim(a.zcompany),trim(a.zfirstname),
> //            trim(a.zlastname),trim(a.zaddr),trim(a.zaddr2),
> //            trim(a.zcity),trim(a.zstate),trim(a.zzipcode),
> //
> trim(a.zcountry),a.zbigphone,a.zbigaltph,trim(a.zemail),
> //            a.zanumb,trim(a.zsalu),trim(a.ztitle),a.zoptin,
> //
> b.zodr_date,b.zbadcheck,trim(b.zcomment),trim(b.zcomment2)
> //
> //            FROM sonaddr AS a
> //            LEFT OUTER JOIN momcust AS b
> //            ON (a.zcustnum = b.zcustnum)
>
> // ------------- Alternate WHERE clause timings:
> //    a:        WHERE
> a.zcustnum=30538
> --
>  4 secs
> //    b:        WHERE
> b.zcustnum=30538
> --
> 12 secs
> //    c:        WHERE a.zcustnum=30538 AND
> b.zcustnum=30538        --
>  4 secs
>
> // ------------- Table sizes:
> // 101679 sonaddr records
> // 102653 momcust records
>
> // ------------- Host conditions:
> // PostgreSQL 4.3
> // Dell dual 3 GHz Pentium 4 CPU
> // Linux 2.4.20 SMP
>
> // ------------- Other information:
> // Number of result rows returned for test: 6.
> // Using libpq interface through c language, over a network
> connection.
> // The following select is almost immediate, perhaps .1 second.
> // An additional indexed field, a.znumb, is used in the WHERE clause.
> // It returns one result. Why would this be so much faster?
> // -------------
> // SELECT    a.zcustnum,trim(a.zcompany),trim(a.zfirstname),
> //            trim(a.zlastname),trim(a.zaddr),trim(a.zaddr2),
> //            trim(a.zcity),trim(a.zstate),trim(a.zzipcode),
> //
> trim(a.zcountry),a.zbigphone,a.zbigaltph,trim(a.zemail),
> //            a.zanumb,trim(a.zsalu),trim(a.ztitle),a.zoptin,
> //
> b.zodr_date,b.zbadcheck,trim(b.zcomment),trim(b.zcomment2),
> //
> b.znomail,trim(b.zwebsite),trim(b.zpassword),trim(b.zquery),
> //
> trim(b.zanswer),trim(b.zfirstname),trim(b.zlastname)
> //
> //            FROM sonaddr AS a
> //            LEFT OUTER JOIN momcust AS b
> //            ON (a.zcustnum = b.zcustnum)
> //
> //            WHERE a.zcustnum=30538 AND a.zanumb=3
>
> Thanks for any insight
>
> --Ben
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 8: explain analyze is your friend
>
>

Re: Very slow query - why?

From

Tom Lane

Date:

02 February 2004, 13:53:59

Richard Schilling <rschi@rsmba.biz> writes:
> Looking at your query, I notice that the longer query happens when you
> search on b.zcustnum=30538 while the LEFT OUTER JOIN remains the same.
> It could be that when testing a.zcustnum=30530 the server can short
> circuit the logic - it only has to check a.zcustnum to see if the
> entire tuple should be selected.

Also note that the planner has gotten successively smarter about outer
joins in each of the past several releases.  Without knowing which PG
version this is (and it ain't "PostgreSQL 4.3" as alleged in the OP's
message...), and without seeing EXPLAIN ANALYZE results, it's hard to
speculate about what's going on.

            regards, tom lane

Re: Very slow query - why?

From

"Ben"

Date:

02 February 2004, 19:57:03

On Mon, 02 Feb 2004 09:53:50 -0500, Tom Lane wrote:

> Also note that the planner has gotten successively smarter about outer
> joins in each of the past several releases.  Without knowing which PG
> version this is (and it ain't "PostgreSQL 4.3" as alleged in the OP's
> message...),

Sorry, SHB PostgreSQL 7.3 - my apologies...

> and without seeing EXPLAIN ANALYZE results, it's hard to speculate about
> what's going on.

Here they are. Also, just to explain, I wasn't really asking why the 12
second timing was happening, I was asking why this took several seconds rather
than coming back in a fraction of a second - since there were indexes on
everything, it seemed to me that the rows involved in both tables could be
identified more or less instantly, then combined as required, then
returned. >1 second on a 3 GHz processor is a TON of computing time, and
most operations appear to happen instantly within this particular
database. This machine is actually a dual 3 GHz machine, but I get the
impression that PostgreSQL uses one at a time for a particular job, so
it's not effectively 6 GHz. Still, it's a pretty fast machine. :)

The details:

EXPLAIN ANALYZE
SELECT a.zcustnum,trim(a.zcompany),trim(a.zfirstname),
       trim(a.zlastname),trim(a.zaddr),trim(a.zaddr2),
       trim(a.zcity),trim(a.zstate),trim(a.zzipcode),
       trim(a.zcountry),a.zbigphone,a.zbigaltph,trim(a.zemail),
       a.zanumb,trim(a.zsalu),trim(a.ztitle),a.zoptin,
       b.zodr_date,b.zbadcheck,trim(b.zcomment),trim(b.zcomment2)
       FROM sonaddr AS a LEFT OUTER JOIN momcust AS b
       ON (a.zcustnum = b.zcustnum)
       WHERE a.zcustnum=30538;

 QUERY PLAN
 ----------
 Merge Join  (cost=34952.76..40696.70 rows=260923 width=464) (actual
 time=1492.00..1492.55 rows=6 loops=1)
   Merge Cond: ("outer".zcustnum = "inner".zcustnum) ->  Index Scan using
   acn_index on sonaddr a  (cost=0.00..1965.45 rows=508 width=359) (actual
   time=0.10..0.28 rows=6 loops=1)
         Index Cond: (zcustnum = 30538)
   ->  Sort  (cost=34952.76..35209.39 rows=102650 width=105) (actual
   time=1441.52..1467.60 rows=27367 loops=1)
         Sort Key: b.zcustnum
         ->  Seq Scan on momcust b  (cost=0.00..13858.50 rows=102650
         width=105) (actual time=0.03..521.36 rows=102657 loops=1)
 Total runtime: 1504.42 msec
(8 rows)