domain cast in parameterized vs. non-parameterized query - Mailing list pgsql-hackers

From David Kamholz
Subject domain cast in parameterized vs. non-parameterized query
Date
Msg-id CAKuxgJ51c3k4JsB2uK8tS5_sWfSvTXY8tHOkwJwD23HZtTb1Vw@mail.gmail.com
Whole thread Raw
Responses Re: domain cast in parameterized vs. non-parameterized query  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
I've recently come across a query that produces different plans depending on whether it's parameterized or not. The parameterized query takes ~50ms while the non-parameterized query takes ~4s. The issue seems to be that the query contains a STABLE function (uid_langvar) whose parameter is a domain over text (uid).

The parameterized query is able to use the return value of uid_langvar to choose a better plan:

2017-12-19 23:13:21 GMT LOG:  duration: 0.063 ms  plan:
        Query Text:
        SELECT id FROM langvar WHERE uid(lang_code, var_code) = uid
       
        Index Scan using langvar_uid_idx on langvar  (cost=0.29..2.59 rows=1 width=4) (actual time=0.058..0.058 rows=1 loops=1)
          Index Cond: ((((((lang_code)::text || '-'::text) || lpad((var_code)::text, 3, '0'::text)))::uid)::text = ($1)::text)
2017-12-19 23:13:21 GMT CONTEXT:  SQL function "uid_langvar" statement 1
2017-12-19 23:13:21 GMT LOG:  duration: 150.634 ms  plan:
        Query Text: select expr.id, expr.langvar, expr.txt, expr.txt_degr, exprx.score as expr_score
        from expr
        inner join exprx on expr.id = exprx.id
        where expr.langvar = uid_langvar($1)
        order by exprx.score desc
        limit 10
       
        Limit  (cost=0.88..426.76 rows=10 width=45) (actual time=133.378..150.611 rows=10 loops=1)
          ->  Nested Loop  (cost=0.88..23706962.21 rows=556656 width=45) (actual time=133.376..150.598 rows=10 loops=1)
                ->  Index Scan Backward using exprx_score_langvar_idx on exprx  (cost=0.44..2973934.39 rows=25583602 width=8) (actual time=0.052..13.479 rows=5589 loops=1)
                ->  Index Scan using expr_pkey on expr  (cost=0.44..0.81 rows=1 width=41) (actual time=0.023..0.023 rows=0 loops=5589)
                      Index Cond: (id = exprx.id)
                      Filter: (langvar = uid_langvar('spa-000'::uid))
                      Rows Removed by Filter: 1

Note that "SELECT id FROM langvar..." is the body of the uid_langvar function. Also note that in the filter condition, 'spa-000' is cast directly to uid. However, the non-parameterized query, where 'spa-000' is passed directly, produces a different plan:

2017-12-19 23:18:01 GMT LOG:  duration: 0.066 ms  plan:
        Query Text: 
        SELECT id FROM langvar WHERE uid(lang_code, var_code) = uid
        
        Index Scan using langvar_uid_idx on langvar  (cost=0.29..2.59 rows=1 width=4) (actual time=0.062..0.062 rows=1 loops=1)
          Index Cond: ((((((lang_code)::text || '-'::text) || lpad((var_code)::text, 3, '0'::text)))::uid)::text = ($1)::text)
2017-12-19 23:18:01 GMT CONTEXT:  SQL function "uid_langvar" statement 1
2017-12-19 23:18:05 GMT LOG:  duration: 3950.817 ms  plan:
        Query Text: select expr.id, expr.langvar, expr.txt, expr.txt_degr, exprx.score as expr_score 
        from expr 
        inner join exprx on expr.id = exprx.id 
        where expr.langvar = uid_langvar('spa-000')
        order by exprx.score desc 
        limit 10
        ;
        Limit  (cost=12842.48..12842.51 rows=10 width=45) (actual time=3950.777..3950.793 rows=10 loops=1)
          ->  Sort  (cost=12842.48..12850.44 rows=3182 width=45) (actual time=3950.775..3950.780 rows=10 loops=1)
                Sort Key: exprx.score DESC
                Sort Method: top-N heapsort  Memory: 26kB
                ->  Nested Loop  (cost=1.13..12773.72 rows=3182 width=45) (actual time=1.524..3541.873 rows=561076 loops=1)
                      ->  Index Scan using expr_langvar_id_idx on expr  (cost=0.69..3823.68 rows=3183 width=41) (actual time=1.480..717.547 rows=561293 loops=1)
                            Index Cond: (langvar = uid_langvar(('spa-000'::text)::uid))
                      ->  Index Scan using exprx_id_idx on exprx  (cost=0.44..2.71 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=561293)
                            Index Cond: (id = expr.id)

Note that in the above plan, 'spa-000' is cast to text before it's cast to uid. This is apparently connected to why postgresql can't choose the better plan.

This difference between plans of parameterized and non-parameterized queries seems strange to me. Is it actually expected/correct or is it a bug?

Here's the definition of the domain and the functions, in case it's relevant:

CREATE DOMAIN uid AS text
CONSTRAINT uid_check CHECK ((VALUE ~ '^[a-z]{3}-\d{3}$'::text));

CREATE FUNCTION uid_langvar(uid uid) RETURNS integer
    LANGUAGE sql STABLE PARALLEL SAFE
    AS $$
SELECT id FROM langvar WHERE uid(lang_code, var_code) = uid
$$;

CREATE FUNCTION uid(lang_code alpha3, var_code smallint) RETURNS uid
    LANGUAGE sql IMMUTABLE PARALLEL SAFE
    AS $$select (lang_code || '-' || lpad(var_code::text, 3, '0'))::uid;$$;

pgsql-hackers by date:

Previous
From: Ildar Musin
Date:
Subject: General purpose hashing func in pgbench
Next
From: Thomas Munro
Date:
Subject: Re: [HACKERS] Parallel Hash take II