Thread: Optimizing No matching record Queries

Optimizing No matching record Queries

From
Pallav Kalva
Date:
Hi,

   I am using Postgres 8.2.4, we have to regularly run some queries on
some big tables to see if we have any data for a particular request. But
sometimes we might not have any matching rows on a particular request as
in this case, when it cant find any matching rows it pretty much scans
the whole table and it takes too long to execute.

  As you can see from explain analyze output the response time is
horrible, Is there anything I can do to improve these queries ?

  Tables are autovacuumed regularly.


   select relname,relpages,reltuples from pg_class where relname in
('listing','listingstatus','listedaddress');

    relname    | relpages |  reltuples
---------------+----------+-------------
 listing       |   132725 | 9.22896e+06
 listingstatus |        1 |           6
 listedaddress |    63459 | 8.15774e+06
(3 rows)

helix_fdc=# select relname,last_autovacuum,last_autoanalyze from
pg_stat_user_tables where relname in ('listing','listedaddress');
    relname    |        last_autovacuum        |       last_autoanalyze
---------------+-------------------------------+-------------------------------
 listing       | 2008-02-12 10:57:54.690913-05 | 2008-02-12
10:57:54.690913-05
 listedaddress | 2008-02-09 14:12:44.038341-05 | 2008-02-12
11:17:47.822597-05
(3 rows)

Explain Analyze Output
================

explain analyze
select listing0_.listingid as listingid157_, listing0_.entrydate as
entrydate157_, listing0_.lastupdate as lastupdate157_,
       listing0_.sourcereference as sourcere4_157_, listing0_.start as
start157_, listing0_.stop as stop157_,
       listing0_.price as price157_, listing0_.updateHashcode as
updateHa8_157_, listing0_.fklistedaddressid as fklisted9_157_,
       listing0_.fklistingsubtypeid as fklisti10_157_,
listing0_.fkbestaddressid as fkbesta11_157_,
       listing0_.fklistingsourceid as fklisti12_157_,
listing0_.fklistingtypeid as fklisti13_157_,
       listing0_.fklistingstatusid as fklisti14_157_,
listing0_.fkpropertytypeid as fkprope15_157_
from listing.listing listing0_, listing.listingstatus listingsta1_,
listing.listedaddress listedaddr2_
where listing0_.fklistingstatusid=listingsta1_.listingstatusid
and listing0_.fklistedaddressid=listedaddr2_.listedaddressid
and listing0_.fklistingsourceid=5525
and listingsta1_.shortname='active'
and (listedaddr2_.fkverifiedaddressid is not null)
order by listing0_.entrydate desc limit 10;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..11191.64 rows=10 width=107) (actual
time=2113544.437..2113544.437 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..790129.94 rows=706 width=107) (actual
time=2113544.427..2113544.427 rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..783015.53 rows=853 width=107)
(actual time=2113544.420..2113544.420 rows=0 loops=1)
               ->  Index Scan Backward using idx_listing_entrydate on
listing listing0_  (cost=0.00..781557.28 rows=5118 width=107) (actual
time=2113544.412..2113544.412 rows=0 loops=1)
                     Filter: (fklistingsourceid = 5525)
               ->  Index Scan using pk_listingstatus_listingstatusid on
listingstatus listingsta1_  (cost=0.00..0.27 rows=1 width=4) (never
executed)
                     Index Cond: (listing0_.fklistingstatusid =
listingsta1_.listingstatusid)
                     Filter: (shortname = 'active'::text)
         ->  Index Scan using pk_listedaddress_listedaddressid on
listedaddress listedaddr2_  (cost=0.00..8.33 rows=1 width=4) (never
executed)
               Index Cond: (listing0_.fklistedaddressid =
listedaddr2_.listedaddressid)
               Filter: (fkverifiedaddressid IS NOT NULL)
 Total runtime: 2113544.580 ms
(12 rows)


Table Definitions
============

 \d listing.listing
                                               Table "listing.listing"
       Column       |            Type
|                            Modifiers
--------------------+-----------------------------+------------------------------------------------------------------
 listingid          | integer                     | not null default
nextval(('listing.listingseq'::text)::regclass)
 fklistingsourceid  | integer                     | not null
 fklistingtypeid    | integer                     | not null
 entrydate          | timestamp without time zone | not null
 lastupdate         | timestamp without time zone | not null
 fklistedaddressid  | integer                     |
 fkbestaddressid    | integer                     |
 sourcereference    | text                        |
 fkpropertytypeid   | integer                     | not null
 fklistingstatusid  | integer                     | not null
 start              | timestamp without time zone | not null
 stop               | timestamp without time zone |
 _entrydate         | timestamp without time zone | default
('now'::text)::timestamp(6) without time zone
 price              | numeric(14,2)               |
 fklistingsubtypeid | integer                     |
 updatehashcode     | text                        |
Indexes:
    "pk_listing_listingid" PRIMARY KEY, btree (listingid), tablespace
"indexdata"
    "idx_listing_entrydate" btree (entrydate), tablespace "indexdata"
    "idx_listing_fkbestaddressid" btree (fkbestaddressid), tablespace
"indexdata"
    "idx_listing_fklistingsourceid" btree (fklistingsourceid),
tablespace "indexdata"
    "idx_listing_fklistingtypeid" btree (fklistingtypeid), tablespace
"indexdata"
    "idx_listing_lastupdate" btree (lastupdate), tablespace "indexdata"
    "idx_listing_sourcereference" btree (sourcereference), tablespace
"indexdata"
    "idx_listing_stop" btree (stop), tablespace "indexdata"
    "idx_listing_updatehashcode" btree (updatehashcode), tablespace
"indexdata"
Foreign-key constraints:
    "fk_listing_address" FOREIGN KEY (fkbestaddressid) REFERENCES
listing.address(addressid)
    "fk_listing_listedaddress" FOREIGN KEY (fklistedaddressid)
REFERENCES listing.listedaddress(listedaddressid)
    "fk_listing_listingsource" FOREIGN KEY (fklistingsourceid)
REFERENCES listing.listingsource(listingsourceid)
    "fk_listing_listingstatus" FOREIGN KEY (fklistingstatusid)
REFERENCES listing.listingstatus(listingstatusid)
    "fk_listing_listingsubtype" FOREIGN KEY (fklistingsubtypeid)
REFERENCES listing.listingsubtype(listingsubtypeid)
    "fk_listing_listingtypes" FOREIGN KEY (fklistingtypeid) REFERENCES
listing.listingtype(listingtypeid)
    "fk_listing_propertytype" FOREIGN KEY (fkpropertytypeid) REFERENCES
listing.propertytype(propertytypeid)

\d listing.listedaddress
                                               Table "listing.listedaddress"
       Column        |            Type
|                               Modifiers

---------------------+-----------------------------+------------------------------------------------------------------------
 listedaddressid     | integer                     | not null default
nextval(('listing.listedaddressseq'::text)::regclass)
 fkaddressid         | integer                     |
 fkverifiedaddressid | integer                     |
 verifyattempt       | timestamp without time zone |
 _entrydate          | timestamp without time zone | default
('now'::text)::timestamp(6) without time zone
Indexes:
    "pk_listedaddress_listedaddressid" PRIMARY KEY, btree
(listedaddressid), tablespace "indexdata"
    "uk_listedaddress_fkaddressid" UNIQUE, btree (fkaddressid),
tablespace "indexdata"
    "idx_listedaddress_fkverifiedaddressid" btree (fkverifiedaddressid),
tablespace "indexdata"
Foreign-key constraints:
    "fk_listedaddress_address" FOREIGN KEY (fkaddressid) REFERENCES
listing.address(addressid)
    "fk_listedaddress_verifiedaddress" FOREIGN KEY (fkverifiedaddressid)
REFERENCES listing.verifiedaddress(verifiedaddressid)

 \d listing.listingstatus
                                             Table "listing.listingstatus"
     Column      |            Type
|                               Modifiers

-----------------+-----------------------------+------------------------------------------------------------------------
 listingstatusid | integer                     | not null default
nextval(('listing.listingstatusseq'::text)::regclass)
 shortname       | text                        |
 longname        | text                        |
 _entrydate      | timestamp without time zone | default
('now'::text)::timestamp(6) without time zone
Indexes:
    "pk_listingstatus_listingstatusid" PRIMARY KEY, btree
(listingstatusid), tablespace "indexdata"



TIA,
Pallav

Re: Optimizing No matching record Queries

From
"Stephen Denne"
Date:
Pallav Kalva asked
...
> and listing0_.fklistingsourceid=5525
...
> order by listing0_.entrydate desc limit 10;

>                ->  Index Scan Backward using idx_listing_entrydate on
> listing listing0_  (cost=0.00..781557.28 rows=5118 width=107) (actual
> time=2113544.412..2113544.412 rows=0 loops=1)
>                      Filter: (fklistingsourceid = 5525)

Would it help to have a combined index on fklistingsourceid, entrydate?

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way. 

__________________________________________________________________
  This email has been scanned by the DMZGlobal Business Quality
              Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________



Re: Optimizing No matching record Queries

From
"Dean Gibson (DB Administrator)"
Date:
On 2008-02-12 13:35, Pallav Kalva wrote:
> Hi,
>
> ...
> Table Definitions
> ============
>
> \d listing.listingstatus
>                                             Table "listing.listingstatus"
>     Column      |            Type
> |                               Modifiers
>
-----------------+-----------------------------+------------------------------------------------------------------------

>
> listingstatusid | integer                     | not null default
> nextval(('listing.listingstatusseq'::text)::regclass)
> shortname       | text                        |
> longname        | text                        |
> _entrydate      | timestamp without time zone | default
> ('now'::text)::timestamp(6) without time zone
> Indexes:
>    "pk_listingstatus_listingstatusid" PRIMARY KEY, btree
> (listingstatusid), tablespace "indexdata"
>
Since you are searching by "shortname", trying adding an index on that.
Although with that tiny a table, it might not matter.

The questions are:

1. Why in the planner scanning the entire idx_listing_entrydate, when
I'd think it should be scanning the entire
pk_listingstatus_listingstatusid ?
2. Why is "Index Scan using pk_listingstatus_listingstatusid on
listingstatus listingsta1_  (cost=0.00..0.27 rows=1 width=4) (never
executed)" ?

Note:  I'm new at this as well, and jumped in to learn as well as to help.

-- Dean

--
Mail to my list address MUST be sent via the mailing list.  All other mail will bounce.


Re: Optimizing No matching record Queries

From
Gregory Stark
Date:
"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:

> Pallav Kalva asked
> ...
>> and listing0_.fklistingsourceid=5525
> ...
>> order by listing0_.entrydate desc limit 10;
>
>>                ->  Index Scan Backward using idx_listing_entrydate on
>> listing listing0_  (cost=0.00..781557.28 rows=5118 width=107) (actual
>> time=2113544.412..2113544.412 rows=0 loops=1)
>>                      Filter: (fklistingsourceid = 5525)
>
> Would it help to have a combined index on fklistingsourceid, entrydate?

I think that would help. You already have a ton of indexes, you might consider
whether all your queries start with a listingsourceid and whether you can have
that as a prefix on the existing index.

Another thing to try is raising the stats target on fklistingsourceid and/or
entrydate. The estimate seems pretty poor. It could just be that the
distribution is highly skewed which is a hard case to estimate correctly.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: Optimizing No matching record Queries

From
Richard Huxton
Date:
Dean Gibson (DB Administrator) wrote:
> The questions are:
>
> 1. Why in the planner scanning the entire idx_listing_entrydate, when
> I'd think it should be scanning the entire
> pk_listingstatus_listingstatusid ?

It's looking at the ORDER BY and sees that the query needs the 10 most
recent, so tries searching by date. That's sensible where you are going
to have a lot of matches for fklistingsourceid.

Which suggests that statistics for "fklistingsourceid" aren't high
enough, like Greg suggested. If that doesn't help, the index on
(fklistingsourceid,entrydate) that Stephen might well do so.

> 2. Why is "Index Scan using pk_listingstatus_listingstatusid on
> listingstatus listingsta1_  (cost=0.00..0.27 rows=1 width=4) (never
> executed)" ?

Because nothing comes out of the first index-scan.

--
   Richard Huxton
   Archonet Ltd

Re: Optimizing No matching record Queries

From
Pallav Kalva
Date:
Thanks! for all your replies, I tried increasing the statistics on
fklistingsourceid to 1000 it made any difference.  Then I created an
index on (fklistingsourceid,entrydate) it helped and it was fast.

This index would fix this problem but in general I would like to know
what if  there are queries where it does "index scan backwards" and
there is no "order by clause" and the query is still bad ? Would there
be a case like that or the planner uses index scan backwards only when
use order by desc also.


Richard Huxton wrote:
> Dean Gibson (DB Administrator) wrote:
>> The questions are:
>>
>> 1. Why in the planner scanning the entire idx_listing_entrydate, when
>> I'd think it should be scanning the entire
>> pk_listingstatus_listingstatusid ?
>
> It's looking at the ORDER BY and sees that the query needs the 10 most
> recent, so tries searching by date. That's sensible where you are
> going to have a lot of matches for fklistingsourceid.
>
> Which suggests that statistics for "fklistingsourceid" aren't high
> enough, like Greg suggested. If that doesn't help, the index on
> (fklistingsourceid,entrydate) that Stephen might well do so.
>
>> 2. Why is "Index Scan using pk_listingstatus_listingstatusid on
>> listingstatus listingsta1_  (cost=0.00..0.27 rows=1 width=4) (never
>> executed)" ?
>
> Because nothing comes out of the first index-scan.
>


Re: Optimizing No matching record Queries

From
Gregory Stark
Date:
"Pallav Kalva" <pkalva@livedatagroup.com> writes:

> This index would fix this problem but in general I would like to know what if
> there are queries where it does "index scan backwards" and there is no "order
> by clause" and the query is still bad ? Would there be a case like that or the
> planner uses index scan backwards only when use order by desc also.

I think you're oversimplifying. Basically you were asking the planner for the
most recent record for a given user. The planner had the choice either of

a) going through all the records for a given user and picking the most recent,

or b) scanning the records from most recent to oldest and looking for the
given user.

It was a choice between two evils. If there are a lot of records for the user
then a) will be bad since it has to scan all of them to find the most recent
and if there are no records for the user then b) will be bad because it'll
have to go through all of the records to the beginning of time.

The suggested index lets it scan the records for the given user from most
recent to oldest without seeing any records for any other user.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!