Thread: Poor Query

From:
Pallav Kalva
Date:

Hi Everybody,

     I have a performance problem with this query , it takes lot of time
on the production database. is there a way to improve it ? i do vacuumdb
on this database and do anlyze on the users table separately daily


select userID, fname, lname, email, phone, dateEntered, dateCanceled,
dateSuspended, billAmount, billDate, dateBilled, datePaid, '?' as searches
from Users u
where 1=1 AND exists (select userID
            from bankaccount ba
            where ba.bankaccountID = u.bankaccountID
            and   ba.accountnumber = '12345678')
 AND exists (select userID
         from bankaccount ba
         where ba.bankaccountID = u.bankaccountID
         and ba.routingNumber = '12345678')
order by UserID desc
limit 500

                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..12752.61 rows=500 width=120)
   ->  Index Scan Backward using users_pkey on users u
(cost=0.00..2460462.79 rows=96469 width=120)
         Filter: ((subplan) AND (subplan))
         SubPlan
           ->  Index Scan using bankaccount_pkey on bankaccount ba
(cost=0.00..3.07 rows=1 width=0)
                 Index Cond: (bankaccountid = $1)
                 Filter: (routingnumber = '12345678'::text)
           ->  Index Scan using bankaccount_pkey on bankaccount ba
(cost=0.00..3.07 rows=1 width=0)
                 Index Cond: (bankaccountid = $1)
                 Filter: (accountnumber = '12345678'::text)

I tried changing it but it still takes lot of time


select userID, fname, lname, email, phone, dateEntered, dateCanceled,
dateSuspended, billAmount, billDate, dateBilled, datePaid, '?' as searches
from Users u
where 1=1 AND exists (select userID
            from bankaccount ba
            where ba.bankaccountID = u.bankaccountID
            and   ba.accountnumber = '12345678'
                   and ba.routingNumber = '12345678')
order by UserID desc
limit 500
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3309.62 rows=500 width=120)
   ->  Index Scan Backward using users_pkey on users u
(cost=0.00..1277101.86 rows=192938 width=120)
         Filter: (subplan)
         SubPlan
           ->  Index Scan using bankaccount_pkey on bankaccount ba
(cost=0.00..3.07 rows=1 width=0)
                 Index Cond: (bankaccountid = $1)
                 Filter: ((accountnumber = '12345678'::text) AND
(routingnumber = '12345678'::text))


   the users_pkey index on the primary key userid is on users table. it
seems to be using index but it still takes lot of time.
  here is the output from the pg_class for the users and bankaccount
table . Table doesnt have lot of records but this query take anywhere
from 3 to 5 min to run which is really bad for us. Can we improve the
performance on this query ?

relname | relpages | reltuples
---------+----------+-----------
 users   |    39967 |    385875
bankaccount |      242 |     16453


Thanks!
Pallav





From:
Pierre-Frédéric Caillaud
Date:

    How many rows do the following queries return :

select userID
              from bankaccount ba
              where ba.bankaccountID = u.bankaccountID
              and   ba.accountnumber = '12345678'

select userID
           from bankaccount ba
           where ba.bankaccountID = u.bankaccountID
           and ba.routingNumber = '12345678'

    Can you post EXPLAIN ANALYZE for these two queries ?
    Regards.

From:
Pallav Kalva
Date:

Pierre-Frédéric Caillaud wrote:

>
>     How many rows do the following queries return :
>
> select userID
>              from bankaccount ba
>              where ba.bankaccountID = u.bankaccountID
>              and   ba.accountnumber = '12345678'
>
> select userID
>           from bankaccount ba
>           where ba.bankaccountID = u.bankaccountID
>           and ba.routingNumber = '12345678'
>
>     Can you post EXPLAIN ANALYZE for these two queries ?
>     Regards.
>
Thanks! for the quick reply. It should usually return just one account
for that user so its only one record. Actually userid column doesnt
exist on bankaccount table it exists only on the user table and it is
joined with bankaccountid column, if i run this query separately i
wouldnt able to run it .


From:
Pierre-Frédéric Caillaud
Date:


    Just wanted to know the selectivity of the accountnumber and
routingNumber columns.
    I shoulda written :

>>     How many rows do the following queries return :
    One or few at most, or a lot ?
>>
>> select userID
>>              from bankaccount
>>              WHERE accountnumber = '12345678'
>>
>> select userID
>>           from bankaccount
>>           WHERE routingNumber = '12345678'
>>
>>     Can you post EXPLAIN ANALYZE for these two queries ?
>>     Regards.
>>
> Thanks! for the quick reply. It should usually return just one account
> for that user so its only one record. Actually userid column doesnt
> exist on bankaccount table it exists only on the user table and it is
> joined with bankaccountid column, if i run this query separately i
> wouldnt able to run it .
>
>




From:
Pallav Kalva
Date:

Pierre-Frédéric Caillaud wrote:

>
>
>     Just wanted to know the selectivity of the accountnumber and
> routingNumber columns.
>     I shoulda written :
>
>>>     How many rows do the following queries return :
>>
>     One or few at most, or a lot ?


Just One, user can i have only one bankaccount.

>
>>>
>>> select userID
>>>              from bankaccount
>>>              WHERE accountnumber = '12345678'
>>>
>>> select userID
>>>           from bankaccount
>>>           WHERE routingNumber = '12345678'
>>>
>>>     Can you post EXPLAIN ANALYZE for these two queries ?
>>>     Regards.
>>>
>> Thanks! for the quick reply. It should usually return just one
>> account  for that user so its only one record. Actually userid column
>> doesnt  exist on bankaccount table it exists only on the user table
>> and it is  joined with bankaccountid column, if i run this query
>> separately i  wouldnt able to run it .
>>
>>
>
>
>



From:
Pierre-Frédéric Caillaud
Date:


> Just One, user can i have only one bankaccount.

    Ah well, in that case :
    This is your query :

select userID, fname, lname, email, phone, dateEntered, dateCanceled,
dateSuspended, billAmount, billDate, dateBilled, datePaid, '?' as searches
 from Users u
where 1=1 AND exists (select userID
            from bankaccount ba
            where ba.bankaccountID = u.bankaccountID
            and   ba.accountnumber = '12345678')
AND exists (select userID
         from bankaccount ba
         where ba.bankaccountID = u.bankaccountID
         and ba.routingNumber = '12345678')
order by UserID desc
limit 500

    What it does is scan all users, and for each user, test if it has the
accountnumber or the routingNumber you seek. You're reversing the problem
: you should first look for accountnumber and routingNumber, THEN look for
the user :


SELECT * FROM Users WHERE bankaccountID IN
(SELECT bankaccountID FROM bankaccount WHERE accountnumber = '12345678'
OR/AND routingNumber = '12345678')

or :

SELECT * FROM Users WHERE userID IN
(SELECT userID FROM bankaccount WHERE accountnumber = '12345678' OR/AND
routingNumber = '12345678')

There is something very strange in your query, it seems that bankaccount
and Users both have a UserID column and a bankaccountID column. Is this
normal ? It looks denormalized to me...

From:
Pallav Kalva
Date:

Pierre-Frédéric Caillaud wrote:

>
>
>> Just One, user can i have only one bankaccount.
>
>
>     Ah well, in that case :
>     This is your query :
>
> select userID, fname, lname, email, phone, dateEntered, dateCanceled,
> dateSuspended, billAmount, billDate, dateBilled, datePaid, '?' as
> searches
> from Users u
> where 1=1 AND exists (select userID
>            from bankaccount ba
>            where ba.bankaccountID = u.bankaccountID
>            and   ba.accountnumber = '12345678')
> AND exists (select userID
>         from bankaccount ba
>         where ba.bankaccountID = u.bankaccountID
>         and ba.routingNumber = '12345678')
> order by UserID desc
> limit 500
>
>     What it does is scan all users, and for each user, test if it has
> the  accountnumber or the routingNumber you seek. You're reversing the
> problem  : you should first look for accountnumber and routingNumber,
> THEN look for  the user :
>
>
> SELECT * FROM Users WHERE bankaccountID IN
> (SELECT bankaccountID FROM bankaccount WHERE accountnumber =
> '12345678'  OR/AND routingNumber = '12345678')
>
> or :
>
> SELECT * FROM Users WHERE userID IN
> (SELECT userID FROM bankaccount WHERE accountnumber = '12345678'
> OR/AND  routingNumber = '12345678')
>
> There is something very strange in your query, it seems that
> bankaccount  and Users both have a UserID column and a bankaccountID
> column. Is this  normal ? It looks denormalized to me...
>
Userid column is only in users table not in bankaccounts table , based
on your suggestion i made changes to the query and here are the explain
plans :


select userID, fname, lname, email, phone, dateEntered, dateCanceled,
dateSuspended, billAmount, billDate, dateBilled, datePaid, '?' as searches
from Users u
where bankaccountid in  (select bankaccountid
            from bankaccount ba
            where ba.bankaccountID = u.bankaccountID
            and   ba.accountnumber = '12345678')
 AND bankaccountid in (select bankaccountid
         from bankaccount ba
         where ba.bankaccountID = u.bankaccountID
         and ba.routingNumber = '12345678')
order by UserID desc
limit 500


QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..6642.59 rows=500 width=121) (actual
time=40180.116..93650.837 rows=1 loops=1)
   ->  Index Scan Backward using users_pkey on users u
(cost=0.00..1087936.69 rows=81891 width=121) (actual
time=40180.112..93650.829 rows=1 loops=1)
         Filter: ((subplan) AND (subplan))
         SubPlan
           ->  Index Scan using bankaccount_pkey on bankaccount ba
(cost=0.00..3.08 rows=1 width=4) (actual time=0.019..0.019 rows=0 loops=3)
                 Index Cond: (bankaccountid = $0)
                 Filter: (routingnumber = '12345678'::text)
           ->  Index Scan using bankaccount_pkey on bankaccount ba
(cost=0.00..3.08 rows=1 width=4) (actual time=0.004..0.004 rows=0
loops=385914)
                 Index Cond: (bankaccountid = $0)
                 Filter: (accountnumber = '12345678'::text)
 Total runtime: 93684.307 ms



select userID, fname, lname, email, phone, dateEntered, dateCanceled,
dateSuspended, billAmount, billDate, dateBilled, datePaid, '?' as searches
from Users u
where bankaccountid in  (select bankaccountid
            from bankaccount ba
            where ba.bankaccountID = u.bankaccountID
            and   ba.accountnumber = '12345678'
            and ba.routingNumber = '12345678')
order by UserID desc
limit 500



QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1777.53 rows=500 width=121) (actual
time=18479.669..63584.437 rows=1 loops=1)
   ->  Index Scan Backward using users_pkey on users u
(cost=0.00..582250.93 rows=163781 width=121) (actual
time=18479.663..63584.428 rows=1 loops=1)
         Filter: (subplan)
         SubPlan
           ->  Index Scan using bankaccount_pkey on bankaccount ba
(cost=0.00..3.09 rows=1 width=4) (actual time=0.004..0.004 rows=0
loops=385914)
                 Index Cond: (bankaccountid = $0)
                 Filter: ((accountnumber = '12345678'::text) AND
(routingnumber = '12345678'::text))
 Total runtime: 63596.222 ms

What's wierd is even though there is a index on bankaccountid table it
doesnt use that index, it uses the index on the userid table and the
execution time is little better but it still takes over a minute to
execute .



From:
Pierre-Frédéric Caillaud
Date:

    Your suffering comes from the "where ba.bankaccountID = u.bankaccountID"
in the subselect. It means postgres has to run the subselect once for each
row in Users. You want the subselect to run only once, and return one (or
more?) bankaccountid's, then fetch the users from Users.

    Just remove the "where ba.bankaccountID = u.bankaccountID" !

> select userID, fname, lname, email, phone, dateEntered, dateCanceled,
> dateSuspended, billAmount, billDate, dateBilled, datePaid, '?' as
> searches
> from Users u
> where bankaccountid in  (select bankaccountid
>            from bankaccount ba
>            where ba.bankaccountID = u.bankaccountID
>            and   ba.accountnumber = '12345678'
>            and ba.routingNumber = '12345678')
> order by UserID desc
> limit 500

New version :

  select userID, fname, lname, email, phone, dateEntered, dateCanceled,
  dateSuspended, billAmount, billDate, dateBilled, datePaid, '?' as
  searches
  from Users u
  where bankaccountid in  (select bankaccountid
             from bankaccount ba
             WHERE    ba.accountnumber = '12345678'
             and ba.routingNumber = '12345678')

You could also do this :

  select u.* from Users u, bankaccount ba
    where u.bankaccountid = ba.bankaccountid
    and   ba.accountnumber = '12345678'
             and ba.routingNumber = '12345678')




From:
Pallav Kalva
Date:

Pierre-Frédéric Caillaud wrote:

>
>     Your suffering comes from the "where ba.bankaccountID =
> u.bankaccountID"  in the subselect. It means postgres has to run the
> subselect once for each  row in Users. You want the subselect to run
> only once, and return one (or  more?) bankaccountid's, then fetch the
> users from Users.
>
>     Just remove the "where ba.bankaccountID = u.bankaccountID" !
>
>> select userID, fname, lname, email, phone, dateEntered, dateCanceled,
>> dateSuspended, billAmount, billDate, dateBilled, datePaid, '?' as
>> searches
>> from Users u
>> where bankaccountid in  (select bankaccountid
>>            from bankaccount ba
>>            where ba.bankaccountID = u.bankaccountID
>>            and   ba.accountnumber = '12345678'
>>            and ba.routingNumber = '12345678')
>> order by UserID desc
>> limit 500
>
>
> New version :
>
>  select userID, fname, lname, email, phone, dateEntered, dateCanceled,
>  dateSuspended, billAmount, billDate, dateBilled, datePaid, '?' as
>  searches
>  from Users u
>  where bankaccountid in  (select bankaccountid
>             from bankaccount ba
>             WHERE    ba.accountnumber = '12345678'
>             and ba.routingNumber = '12345678')
>
> You could also do this :
>
>  select u.* from Users u, bankaccount ba
>     where u.bankaccountid = ba.bankaccountid
>     and   ba.accountnumber = '12345678'
>             and ba.routingNumber = '12345678')
>
>
>
Thanks! a lot that was it , it is way much better now.