Thread: [NOVICE] Queries with Joins before filtering taking too much time! Filter(where clause) *first* -- suggestions ?
[NOVICE] Queries with Joins before filtering taking too much time! Filter(where clause) *first* -- suggestions ?
From
Hursh Jain
Date:
Hi: Postgres 9.6.1 on a large linux box. I have 2 tables under consideration: 1. property (about 800,000 rows) This has a attribute called current_owner which is FK pointing to uid in users table. This attribute can be null. 2. users (about 400 rows) This has users in the system, each user has a uid. ===== Query: SELECT * FROM property p LEFT OUTER JOIN users ON (p.current_owner = users.uid) WHERE p.pid in (SELECT pid FROM reward WHERE reward_type = 'daily' ORDER BY reward_date DESC LIMIT 30) ; This query takes a looong time. I tried an EXPLAIN ANALYSE and this is what I got: QUERY PLAN ------------------------------------------------------------------------------ Merge Cond: (users.uid = p.current_owner) -> Index Scan using pk_users on users (cost=0.27..393.66 rows=414 width=198) (actual time=0.007..0.007 rows=1 loops=1) -> Index Scan using idx_property_current_owner on property p (cost=0.43..9889803.81 rows=3201653 width=775) (actual time=10785.283..10785.2..83 rows=0 loops=1) Filter: (SubPlan 1) Rows Removed by Filter: 760651 SubPlan 1 -> Limit (cost=1.31..1.32 rows=1 width=16) (actual time=0.011..0.011 rows=0 loops=760651) -> Sort (cost=1.31..1.32 rows=1 width=16) (actual time=0.008..0.008 rows=0 loops=760651) Sort Key: reward.reward_date DESC Sort Method: quicksort Memory: 25kB -> Index Scan using idx_reward_reward_type on reward (cost=0.29..1.30 rows=1 width=16) (actual time=0.005..0.005 rows=0.. loops=760651) Index Cond: (reward_type = 'daily'::text) Planning time: 5.575 ms Execution time: 10785.510 ms --------------------------------- You can see the index scan on property is done first, like so: Index Scan using idx_property_current_owner on property p (cost=0.43..9889803.81 rows=3201653 width=775) (actual time=10785.283..10785.2..83 rows=0 loops=1) This take 10+ seconds. But this is wasted, since the number of property rows are then filtered to less than 20 later on (via the WHERE clause). Any suggestions on how to do the filtering FIRST and then do the join ? I need to left outer join with users since the user (current_owner) might be null. Best, --J
[NOVICE] Re: Queries with Joins before filtering taking too much time! Filter(where clause) *first* -- suggestions ?
From
Hursh Jain
Date:
As a followup: I noticed that there isn't any "pid" column in the reward table. So that was an error and maybe that's why a sequential scan was being forced. To recap, the query was: SELECT * FROM property p LEFT OUTER JOIN users ON (p.current_owner = users.uid) WHERE p.pid in (SELECT pid FROM reward WHERE reward_type = 'daily' ORDER BY reward_date DESC LIMIT 30) But weirdly, the query still runs without any errors. Why ? === I tried this (a simpler version of the above): SELECT * from property WHERE pid in ( SELECT pid FROM reward WHERE reward_type = 'DAILYPROMO_WIN') ; And this also runs (again taking a long time) but without any errors. How can this be ? (there is no pid in the reward table, so why does the subquery not throw an error)? In fact, because there was no error, I didn't notice my typo (no pid in reward table) until right now... Best, --J
Re: [NOVICE] Queries with Joins before filtering taking too muchtime! Filter (where clause) *first* -- suggestions ?
From
"David G. Johnston"
Date:
On Sunday, January 15, 2017, Hursh Jain <hurshj@gmail.com> wrote:
SELECT * from property
WHERE pid in (
SELECT pid FROM reward WHERE reward_type = 'DAILYPROMO_WIN')
;
And this also runs (again taking a long time) but without any errors. How can this be ? (there is no pid in the reward table, so why does the
subquery not throw an error)?
The "pid" column is the one from the outer query property table - this is a correlated subquery that you've written.
As long as one record with that reward_type exists your query devolves to "where true" thus making indexes useless and returning every row in property.
David J.
Re: [NOVICE] Queries with Joins before filtering taking too muchtime! Filter (where clause) *first* -- suggestions ?
From
Hursh Jain
Date:
Thanks David. I didn't realize that the column names in the subquery would be scoped to the outer query table. I thought one *had* to use an alias for that and by default the column names were always scoped to the inner query (and would error out otherwise). Definitely learnt something new today..thanks again for the reply! Best, --H David G. Johnston wrote: > On Sunday, January 15, 2017, Hursh Jain <hurshj@gmail.com > <mailto:hurshj@gmail.com>> wrote: > > > SELECT * from property > WHERE pid in ( > SELECT pid FROM reward WHERE reward_type = 'DAILYPROMO_WIN') > ; > > And this also runs (again taking a long time) but without any > errors. How can this be ? (there is no pid in the reward table, so > why does the > subquery not throw an error)? > > > > The "pid" column is the one from the outer query property table - this > is a correlated subquery that you've written. > > As long as one record with that reward_type exists your query devolves > to "where true" thus making indexes useless and returning every row in > property. > > David J. >