Thread: Why the difference in plans ?

Why the difference in plans ?

From
Dave Cramer
Date:
Below I have two almost identical queries.  Strangely enough the one
that uses the index is slower ???

explain analyze select uid from user_profile where
lower(firstname)='angie' and extract(year from age('2008-02-26
02:50:31.382', dob)) >= 18 and extract(year from age('2008-02-26
02:50:31.382', dob)) <= 68 and image1 is not null and profileprivacy=1
and isactive='t' order by name asc limit 250;

                                                            QUERY  
  PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=166423.90..166423.93 rows=11 width=17) (actual
time=1033.634..1034.137 rows=129 loops=1)
    ->  Sort  (cost=166423.90..166423.93 rows=11 width=17) (actual
time=1033.631..1033.811 rows=129 loops=1)
          Sort Key: name
          ->  Seq Scan on user_profile  (cost=0.00..166423.71 rows=11
width=17) (actual time=46.730..1032.994 rows=129 loops=1)
                Filter: ((lower((firstname)::text) = 'angie'::text)
AND (date_part('year'::text, age('2008-02-26 02:50:31.382'::timestamp
without time zone, dob)) >= 18::double precision) AND
(date_part('year'::text, age('2008-02-26 02:50:31.382'::timestamp
without time zone, dob)) <= 68::double precision) AND (image1 IS NOT
NULL) AND (profileprivacy = 1) AND isactive)
  Total runtime: 1034.334 ms
(6 rows)

jnj=# explain analyze select uid from user_profile where
lower(firstname)='angie' and dob <= '1990-03-05 15:17:29.537' and dob
 >= '1940-03-05 15:17:29.537' and image1 is not null and
profileprivacy=1 and isactive='t' order by name asc limit 250;

     QUERY  
  PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..113963.92 rows=250 width=17) (actual
time=230.326..4688.607 rows=129 loops=1)
    ->  Index Scan using user_profile_name_key on user_profile
(cost=0.00..460414.23 rows=1010 width=17) (actual
time=230.322..4688.174 rows=129 loops=1)
          Filter: ((lower((firstname)::text) = 'angie'::text) AND (dob
<= '1990-03-05 15:17:29.537'::timestamp without time zone) AND (dob >=
'1940-03-05 15:17:29.537'::timestamp without time zone) AND (image1 IS
NOT NULL) AND (profileprivacy = 1) AND isactive)
  Total runtime: 4688.906 ms
(4 rows)

Re: Why the difference in plans ?

From
Josh Berkus
Date:
Dave,

> Below I have two almost identical queries.  Strangely enough the one
> that uses the index is slower ???

My first guess would be that records are highly correlated by DOB and not at
all by name.  However, it would help if you supplied both the index
definitions and what changed between the two queries to cause the index to be
used.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: Why the difference in plans ?

From
Dave Cramer
Date:
On 6-Mar-08, at 12:26 PM, Josh Berkus wrote:

> Dave,
>
>> Below I have two almost identical queries.  Strangely enough the one
>> that uses the index is slower ???
>
> My first guess would be that records are highly correlated by DOB
> and not at
> all by name.  However, it would help if you supplied both the index
> definitions and what changed between the two queries to cause the
> index to be
> used.

The two queries were run 2 seconds apart, there were no changes
between. I'll get the index definitions.

Dave
>
>
> --
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: Why the difference in plans ?

From
"Stephen Denne"
Date:
Dave Cramer wrote:
> I have two almost identical queries.  Strangely enough the one
> that uses the index is slower ???

The index scan is being used so that it can retrieve the rows in the name order.
It expects that if it was to retrieve every row via the index, it would get about 1010 rows that matched the filter,
andit knows it can stop after 250, so assuming the matching rows are evenly distributed it thinks it can stop after
havingread only a quarter of the rows. 

However only 129 rows matched. Consequently it had to read every row in the table anyway, seeking a fair bit as the
readorder was specified by the index rather than in sequential order, and it also had to read the index. These extra
costswere much larger than reading the lot sequentially, and sorting 129 resulting rows. 

The first query picked a sequential scan as it thought it was only going to get 11 results, so was expecting that the
limitwasn't going to come into play, and that every row would have to be read anyway. 

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way. 

__________________________________________________________________
  This email has been scanned by the DMZGlobal Business Quality
              Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________



Re: Why the difference in plans ?

From
Dave Cramer
Date:
On 6-Mar-08, at 5:10 PM, Stephen Denne wrote:

> Dave Cramer wrote:
>> I have two almost identical queries.  Strangely enough the one
>> that uses the index is slower ???
>
> The index scan is being used so that it can retrieve the rows in the
> name order.
> It expects that if it was to retrieve every row via the index, it
> would get about 1010 rows that matched the filter, and it knows it
> can stop after 250, so assuming the matching rows are evenly
> distributed it thinks it can stop after having read only a quarter
> of the rows.
>
> However only 129 rows matched. Consequently it had to read every row
> in the table anyway, seeking a fair bit as the read order was
> specified by the index rather than in sequential order, and it also
> had to read the index. These extra costs were much larger than
> reading the lot sequentially, and sorting 129 resulting rows.
>
> The first query picked a sequential scan as it thought it was only
> going to get 11 results, so was expecting that the limit wasn't
> going to come into play, and that every row would have to be read
> anyway.
>
The strange thing of course is that the data is exactly the same for
both runs, the tables have not been changed between runs, and I did
them right after another. Even more strange is that the seq scan is
faster than the index scan.

Dave
> Regards,
> Stephen Denne.
>
> Disclaimer:
> At the Datamail Group we value team commitment, respect,
> achievement, customer focus, and courage. This email with any
> attachments is confidential and may be subject to legal privilege.
> If it is not intended for you please advise by reply immediately,
> destroy it and do not copy, disclose or use it in any way.
>
> __________________________________________________________________
>  This email has been scanned by the DMZGlobal Business Quality
>              Electronic Messaging Suite.
> Please see http://www.dmzglobal.com/services/bqem.htm for details.
> __________________________________________________________________
>
>


Re: Why the difference in plans ?

From
"Stephen Denne"
Date:
> The strange thing of course is that the data is exactly the same for
> both runs, the tables have not been changed between runs, and I did
> them right after another. Even more strange is that the seq scan is
> faster than the index scan.

It is not strange at all, since both queries read ALL the rows in your table, checking each and every row to see
whetherit matched your predicates. 

The sequential scan read them in the order they are on the disk, meaning your disk didn't have to seek as much
(assuminglow file fragmentation). 

The index scan again reads all the rows in your table, but reads them in the order they were in the index, which is
probablyquite different from the order that they are on the disk, so the disk had to seek a lot. In addition, it had to
readthe index. 

Taking some wild guesses about the distribution of your data, I'd hazard a guess that this specific query could be sped
upa great deal by creating an index on lower(firstname). 

Regards,
Stephen.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way. 

__________________________________________________________________
  This email has been scanned by the DMZGlobal Business Quality
              Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________



Re: Why the difference in plans ?

From
Dave Cramer
Date:
Josh,

On 6-Mar-08, at 12:26 PM, Josh Berkus wrote:

> Dave,
>
>> Below I have two almost identical queries.  Strangely enough the one
>> that uses the index is slower ???
>
> My first guess would be that records are highly correlated by DOB
> and not at
> all by name.  However, it would help if you supplied both the index
> definitions and what changed between the two queries to cause the
> index to be
> used.

Indexes:
     "user_profile_pkey" PRIMARY KEY, btree (uid) CLUSTER
     "user_profile_name_idx" UNIQUE, btree (name varchar_pattern_ops)
     "user_profile_name_key" UNIQUE, btree (name)
     "user_profile_uploadcode_key" UNIQUE, btree (uploadcode)
     "user_profile_active_idx" btree (isactive)
     "user_profile_areacode_index" btree (areacode)
     "user_profile_gender_idx" btree (gender)

and nothing changed between runs.

Dave
>
>
> --
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: Why the difference in plans ?

From
Dave Cramer
Date:
On 6-Mar-08, at 9:30 PM, Stephen Denne wrote:

>> The strange thing of course is that the data is exactly the same for
>> both runs, the tables have not been changed between runs, and I did
>> them right after another. Even more strange is that the seq scan is
>> faster than the index scan.
>
> It is not strange at all, since both queries read ALL the rows in
> your table, checking each and every row to see whether it matched
> your predicates.
>
> The sequential scan read them in the order they are on the disk,
> meaning your disk didn't have to seek as much (assuming low file
> fragmentation).
>
> The index scan again reads all the rows in your table, but reads
> them in the order they were in the index, which is probably quite
> different from the order that they are on the disk, so the disk had
> to seek a lot. In addition, it had to read the index.
>
OK, that makes sense.

So given that the predicates are essentially the same why would the
planner decide to use or not use the index ?

>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Re: Why the difference in plans ?

From
Josh Berkus
Date:
Dave,
>      "user_profile_pkey" PRIMARY KEY, btree (uid) CLUSTER
>      "user_profile_name_idx" UNIQUE, btree (name varchar_pattern_ops)
>      "user_profile_name_key" UNIQUE, btree (name)
>      "user_profile_uploadcode_key" UNIQUE, btree (uploadcode)
>      "user_profile_active_idx" btree (isactive)
>      "user_profile_areacode_index" btree (areacode)
>      "user_profile_gender_idx" btree (gender)

You need to change one of the name indexes to a functional index on
lower(firstname).  That'll speed the query up considerably.

I'm still puzzled as to why the index is being used at all in the 2nd
query, as it seems very unlikely to work out, but the above is the
practical solution to your problem.


--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco