Thread: Left joining table to setof function

Left joining table to setof function

From
Robert Fitzpatrick
Date:
I have a function that returns a set of a custom type...

trax=# \d client_search
  Composite type "public.client_search"
     Column     |          Type
----------------+------------------------
 clientno       | bigint
 client         | character varying(100)
 contactno      | bigint
 city           | character varying(50)
 state          | character varying(50)
 contact_info   | text
 contact_title  | character varying(100)
 contact_phone  | character varying(20)
 contact_cell   | character varying(20)
 clientnamesort | text
 lastnamesort   | text

...and need to left join this with a table that can tell me if each
record exists in that table depending on certain values. When I get to
tens of thousands of records returned by the function, while my runtime
of the function itself is little changed, I am struggling to lower the
runtime as it gets very high when there are more and more matches in the
join. I have gone through to make sure all my comparisons in the join
are of same type, here is the table I am joining...

trax=# \d tblsearch_selections                                               Table "public.tblsearch_selections"
       Column        |         Type          |                                     Modifiers
          

---------------------+-----------------------+------------------------------------------------------------------------------------
 search_selection_id | bigint                | not null default
nextval('tblsearch_selections_search_selection_id_seq'::regclass)
 search_id           | integer               |
 user_id             | character varying(12) |
 selection_value     | text                  |
 selected_value      | bigint                |
 selection_type      | character varying(10) | default 'client'::character varying
Indexes:
    "tblsearch_selections_pkey" PRIMARY KEY, btree (search_selection_id)
    "search_selection_unique_idx" UNIQUE, btree (search_id, user_id, selection_value, selection_type)
Foreign-key constraints:
    "tblsearch_selections_search_fk" FOREIGN KEY (search_id) REFERENCES tblclientsearch(search_id) ON UPDATE CASCADE ON
DELETECASCADE 
    "tblsearch_selections_user_fk" FOREIGN KEY (user_id) REFERENCES tbluser(flduserid) ON UPDATE CASCADE ON DELETE
CASCADE

Here is the analysis of the query with over 12000 join matches...

trax=# explain analyze select * from client_search_id_func(62) left join tblsearch_selections ON search_id = 62 and
user_id= 'RF' and ((selected_value = clientno and selection_type = 'client') or (selected_value = contactno and
selection_type= 'contact')); 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=301.98..9434.48 rows=1000 width=426) (actual time=288.130..234578.634 rows=18202 loops=1)
   Join Filter: (((tblsearch_selections.selected_value = client_search_id_func.clientno) AND
((tblsearch_selections.selection_type)::text= 'client'::text)) OR ((tblsearch_selections.selected_value =
client_search_id_func.contactno)AND ((tblsearch_selections.selection_type)::text = 'contact'::text))) 
   ->  Function Scan on client_search_id_func  (cost=0.00..12.50 rows=1000 width=382) (actual time=253.805..265.752
rows=18202loops=1) 
   ->  Materialize  (cost=301.98..305.02 rows=304 width=44) (actual time=0.001..4.931 rows=12747 loops=18202)
         ->  Seq Scan on tblsearch_selections  (cost=0.00..301.68 rows=304 width=44) (actual time=0.040..12.749
rows=12747loops=1) 
               Filter: ((search_id = 62) AND ((user_id)::text = 'RF'::text))
 Total runtime: 234673.875 ms
(7 rows)

Perhaps I need to use subselects or some other approach?

trax=# explain analyze select *, (select search_selection_id from tblsearch_selections where search_id = 62 and user_id
='RF' and ((selected_value = clientno and selection_type = 'client') or (selected_value = contactno and selection_type
='contact'))) from client_search_id_func(62); 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Function Scan on client_search_id_func  (cost=0.00..315837.50 rows=1000 width=382) (actual time=259.520..148524.590
rows=18202loops=1) 
   SubPlan
     ->  Seq Scan on tblsearch_selections  (cost=0.00..315.82 rows=1 width=8) (actual time=3.555..8.131 rows=1
loops=18202)
           Filter: ((search_id = 62) AND ((user_id)::text = 'RF'::text) AND (((selected_value = $0) AND
((selection_type)::text= 'client'::text)) OR ((selected_value = $1) AND ((selection_type)::text = 'contact'::text)))) 
 Total runtime: 148540.593 ms

And with no matches, the winner so far is the left join...

trax=# delete from tblsearch_selections where search_id = 62;
DELETE 12747
trax=# explain analyze select * from client_search_id_func(62) left join tblsearch_selections ON search_id = 62 and
user_id= 'RF' and ((selected_value = clientno and selection_type = 'client') or (selected_value = contactno and
selection_type= 'contact')); 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=513.90..384586.40 rows=1000 width=426) (actual time=253.248..308.741 rows=18202 loops=1)
   Join Filter: (((tblsearch_selections.selected_value = client_search_id_func.clientno) AND
((tblsearch_selections.selection_type)::text= 'client'::text)) OR ((tblsearch_selections.selected_value =
client_search_id_func.contactno)AND ((tblsearch_selections.selection_type)::text = 'contact'::text))) 
   ->  Function Scan on client_search_id_func  (cost=0.00..12.50 rows=1000 width=382) (actual time=251.167..258.764
rows=18202loops=1) 
   ->  Materialize  (cost=513.90..641.92 rows=12802 width=44) (actual time=0.001..0.001 rows=0 loops=18202)
         ->  Seq Scan on tblsearch_selections  (cost=0.00..501.10 rows=12802 width=44) (actual time=2.066..2.066 rows=0
loops=1)
               Filter: ((search_id = 62) AND ((user_id)::text = 'RF'::text))
 Total runtime: 319.371 ms
(7 rows)

Thanks for any help!

--
Robert


Re: Left joining table to setof function

From
Gregory Stark
Date:
"Robert Fitzpatrick" <lists@webtent.net> writes:

> Indexes:
>     "tblsearch_selections_pkey" PRIMARY KEY, btree (search_selection_id)
>     "search_selection_unique_idx" UNIQUE, btree (search_id, user_id, selection_value, selection_type)


   SELECT *
     FROM client_search_id_func(62)
LEFT JOIN tblsearch_selections ON search_id = 62
                              AND user_id = 'RF'
                              AND ((    selected_value = clientno
                                    AND selection_type = 'client'
                                   ) OR (
                                        selected_value = contactno
                                    AND selection_type = 'contact'))


According to the index the unique way to identify a record is (search_id,
user_id, selection_value, selection_type). But your join condition is matching
on (search_id, user_id, ***selected_value***, selection_type).

I'm not sure what these columns are but this looks like a bug. Certainly it
will make PostgreSQL less likely to use the index since it can only use the
first two columns of it which leaves it with few options.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Left joining table to setof function

From
Tom Lane
Date:
Robert Fitzpatrick <lists@webtent.net> writes:
> trax=# explain analyze select * from client_search_id_func(62) left join tblsearch_selections ON search_id = 62 and
user_id= 'RF' and ((selected_value = clientno and selection_type = 'client') or (selected_value = contactno and
selection_type= 'contact')); 

I don't think you're going to get anywhere until you get rid of the
OR-of-ANDs join condition; the OR defeats using any reasonably
intelligent join method such as a merge or hash join.  You should
redesign the set-returning function to emit a representation that
matches the table you want to search, or perhaps rethink the table's
representation.

            regards, tom lane