index on custom function; explain - Mailing list pgsql-performance

From jan.aerts@bbsrc.ac.uk
Subject index on custom function; explain
Date
Msg-id 1128352451.312090.37640@g44g2000cwa.googlegroups.com
Whole thread Raw
Responses Re: index on custom function; explain
List pgsql-performance
Hi,

I'm trying to include a custom function in my SQL-queries, which
unfortunately leaves the server hanging...

I basically search through two tables:
* TABLE_MAPPING: lists that 'abc' is mapped to 'def'
  id1 | name1 | id2 | name2
  -------------------------
  1   | abc   | 2   | def
  3   | uvw   | 4   | xyz
This data means that 'abc' is mapped_to 'def', and 'uvw' is mapped_to
'xyz'. About 1,500,000 records in total.

* TABLE ALIASES: lists different aliases of the same thing
  id1 | name1 | id2 | name2
  -------------------------
  3   | uvw   | 2   | def
This data means that 'uvw' and 'def' are essentially the same thing.
About 820,000 records in total.

I have indexes on all columns of the above tables.

Based on the two tables above, 'abc' is indirectly mapped_to 'xyz' as
well (through 'def' also-known-as 'uvw').

I wrote this little function: aliases_of
CREATE FUNCTION aliases_of(INTEGER) RETURNS SETOF integer
AS 'SELECT $1
    UNION
    SELECT id1 FROM aliases WHERE id2 = $1
    UNION
    SELECT id2 FROM aliases WHERE id1 = $1
    '
LANGUAGE SQL
STABLE;

A simple SELECT aliases_of(2) shows:
  aliases_of
  ----------
  2
  3

Now, when I want to traverse the aliases, I would write a query as
follows:
SELECT m1.name1, m1.name2, m2.name1, m2.name2
FROM mappings m1, mappings m2
WHERE m1.name1 = 'abc'
AND m2.name1 IN (SELECT aliases_of(m1.name2));

Unfortunately, this query seems to keep running and to never stop...


An EXPLAIN of the above query shows:
QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=0.00..118379.45 rows=1384837 width=80)
   Join Filter: (subplan)
   ->  Index Scan using ind_cmappings_object1_id on c_mappings m1
(cost=0.00..7.08 rows=2 width=40)
         Index Cond: (name1 = 'abc')
   ->  Seq Scan on c_mappings m2  (cost=0.00..35935.05 rows=1423805
width=40)
   SubPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
(7 rows)

Strangely enough, I _do_ get output when I type the following query:
SELECT m1.name1, m1.name2, m2.name1, m2.name2
FROM mappings m1, mappings m2
WHERE m1.name1 = 'abc'
AND m2.name1 IN (
  SELECT m1.name2
  UNION
  SELECT name2 FROM aliases WHERE name1 = m1.name2
  UNION
  SELECT name1 FROM aliases WHERE name2 = m2.name1
);

The EXPLAIN for this query is:
                                                       QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..36712030.90 rows=1384837 width=80)
   Join Filter: (subplan)
   ->  Index Scan using ind_cmappings_object1_id on c_mappings m1
(cost=0.00..7.08 rows=2 width=40)
         Index Cond: (object1_id = 16575564)
   ->  Seq Scan on c_mappings m2  (cost=0.00..35935.05 rows=1423805
width=40)
   SubPlan
     ->  Unique  (cost=13.21..13.23 rows=1 width=4)
           ->  Sort  (cost=13.21..13.22 rows=3 width=4)
                 Sort Key: object2_id
                 ->  Append  (cost=0.00..13.18 rows=3 width=4)
                       ->  Subquery Scan "*SELECT* 1"  (cost=0.00..0.01
rows=1 width=0)
                             ->  Result  (cost=0.00..0.01 rows=1
width=0)
                       ->  Subquery Scan "*SELECT* 2"  (cost=0.00..5.92
rows=1 width=4)
                             ->  Index Scan using
ind_caliases_object2_id on c_aliases  (cost=0.00..5.92 rows=1 width=4)
                                   Index Cond: (object2_id = $0)
                       ->  Subquery Scan "*SELECT* 3"  (cost=0.00..7.25
rows=1 width=4)
                             ->  Index Scan using
ind_caliases_object1_id on c_aliases  (cost=0.00..7.25 rows=1 width=4)
                                   Index Cond: (object1_id = $0)
(18 rows)

So my questions are:
* Does anyone have any idea how I can integrate a function that lists
all aliases for a given name into such a mapping query?
* Does the STABLE keyword in the function definition make the function
to read all its data into memory?
* Is there a way to let postgres use an "Index scan" on that function
instead of a "seq scan"?

Any help very much appreciated,
Jan Aerts


pgsql-performance by date:

Previous
From: "Dann Corbit"
Date:
Subject: Re: [HACKERS] A Better External Sort?
Next
From: "Kevin Grittner"
Date:
Subject: Re: Is There Any Way ....