Thread: Query Planner Issue - View and regexp_replace?

Query Planner Issue - View and regexp_replace?

From
Brandon Kane
Date:
I'm not entirely sure where to ask this question, or what to include, so I'm hoping folks here can
help me out with that.

Basically, I have a query that joins a table to a view.  The view contains a CASE statement with a
regexp_replace call.  The query plan used involves a sequential scan on the table used in the
view.  However, when I run directly against the table, or include the view's SQL in my query
directly, I (correctly) get a nested loop scan with index scans.  When I remove the regexp_replace
clause from my CASE statement in the view, then the original query begins to use the "correct"
query plan.

I'm able to reproduce this behavior with simple test tables and absolutely no test data loaded, so
I'm including the simple reproduction case below, with all the SQL.

Any help in directing this into the correct format and to the correct place is appreciated.

Thanks,
-Brandon

-- Server Version String
-- PostgreSQL 8.2.5 on i386-redhat-linux-gnu,
-- compiled by GCC gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12)


--
--  Main Table
--
CREATE TABLE test_table
(
  id character(12) NOT NULL,
  data character varying(20),
  CONSTRAINT "PK_test_table" PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE test_table OWNER TO postgres;

--
-- Join Table - Filtering done here, then joined to main table
--
CREATE TABLE test_join_table
(
  joinid integer NOT NULL,
  refid character(12) NOT NULL,
  data integer NOT NULL,
  CONSTRAINT "PK_join_test" PRIMARY KEY (joinid),
  CONSTRAINT "FK_join_test_table" FOREIGN KEY (refid)
      REFERENCES test_table (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (OIDS=FALSE);
ALTER TABLE test_join_table OWNER TO postgres;

CREATE INDEX "IDX_join_data"
  ON test_join_table
  USING btree
  (data);



--
--  View on Main Table.  Used in join
--
CREATE OR REPLACE VIEW test_view AS
 SELECT test_table.id, test_table.data,
        CASE
            WHEN btrim(test_table.data::text) = ''::text THEN '0'::text::bpchar::character varying
-- Removing this condition from CASE statement in view corrects query planner problem.
            WHEN NOT isnumeric(test_table.data::text) THEN
                  regexp_replace(
                        test_table.data::text,
                        '[^0-9]'::text,
                        ''::text,
                        'g'::text)::bpchar::character varying
--
            ELSE test_table.data
        END AS datafixed
   FROM test_table;

ALTER TABLE test_view OWNER TO postgres;


--
-- Query #1 - This returns the correct query plan.  View is not involved.
--
select tt.* FROM test_table tt
INNER JOIN test_join_table jt
ON tt.id = jt.refid
AND jt.joinid = 3 AND jt.data = 4


--
-- Query #2 - This is the problem query.  It fails to return a plan involving a nested loop
--  with the two tables.  If the view is modified as mentioned above, this does return the
--  correct query plan.
--
select tt.* FROM test_view tt
INNER JOIN test_join_table jt
ON tt.id = jt.refid
AND jt.joinid = 3 AND jt.data = 4


--
-- Query #3 - Interestingly, this returns the correct query plan.  I've taken the view SQL, and
--  merged it into the larger query, and it seems to return the correct query plan, even with
--  the problematic line in the CASE statement.
--
SELECT tt.id, tt.data,
        CASE
            WHEN btrim(tt.data::text) = ''::text THEN '0'::text::bpchar::character varying
            WHEN NOT isnumeric(tt.data::text) THEN
                      regexp_replace(
                           tt.data::text,
                           '[^0-9]'::text,
                           ''::text,
                           'g'::text)::bpchar::character varying
            ELSE tt.data
        END AS datafixed
   FROM test_table tt
INNER JOIN test_join_table jt
ON tt.id = jt.refid
AND jt.joinid = 3 AND jt.data = 4


-- This query plan discrepancy occurs with absolutely no data loaded in the tables, as well as in
-- the actual tables with actual data loaded.  This suggests that the problem is not related to
-- statistics.  Since the field being operated in the case statement is not an index/search or
-- join column, I can't figure out why it affects the query plan.  I can't seem to find anything
-- on this through Google.
-- I expect that the query planner will return a nested loop scan with the join table being on
-- the outer loop.  In my practical scenario, this is the most efficient query, since the join
-- table filtering will remove most of the rows.  What I find in query #2 is that instead a
-- hash/hash join is performed, which does a sequential scan on the main table.  This is what
-- I'm trying to avoid.






      ____________________________________________________________________________________
Never miss a thing.  Make Yahoo your home page.
http://www.yahoo.com/r/hs

Re: Query Planner Issue - View and regexp_replace?

From
Tom Lane
Date:
Brandon Kane <brandonk98@yahoo.com> writes:
> -- Removing this condition from CASE statement in view corrects query planner problem.
>             WHEN NOT isnumeric(test_table.data::text) THEN
>                   regexp_replace(
>                         test_table.data::text,
>                         '[^0-9]'::text,
>                         ''::text,
>                         'g'::text)::bpchar::character varying
> --

What's isnumeric(), and is it by any chance marked volatile?
If so, that's your problem.

            regards, tom lane

Re: Query Planner Issue - View and regexp_replace?

From
Brandon Kane
Date:
Yes, this turns out to be a volatile user-defined function.  Is there some specific part of the
documentation you can point me to that helps explain why a volatile function might affect the plan
for a column not involved in select or join operations?  I happened to pull this function off a
posting somewhere, so after reviewing the volatility keywords, volatile is clearly not required,
so that will be an easy fix.  I'm still interested in knowing more about how volatility fits into
the query planner.

Thanks for your help.
-Brandon

--- Tom Lane wrote:

> What's isnumeric(), and is it by any chance marked volatile?
> If so, that's your problem.
>
>             regards, tom lane
>



      ____________________________________________________________________________________
Never miss a thing.  Make Yahoo your home page.
http://www.yahoo.com/r/hs

Re: Query Planner Issue - View and regexp_replace?

From
Tom Lane
Date:
Brandon Kane <brandonk98@yahoo.com> writes:
> Yes, this turns out to be a volatile user-defined function.  Is there
> some specific part of the documentation you can point me to that helps
> explain why a volatile function might affect the plan for a column not
> involved in select or join operations?

I don't believe it's documented anywhere outside the source code, but
the short explanation is that the planner won't flatten the view for
fear of introducing extra evaluations of the volatile function.
(The fact that it isn't actually used at all doesn't help, because
we don't know that at the point where this decision has to be made.)

You could search the archives for the discussions leading to this patch:
http://archives.postgresql.org/pgsql-committers/2006-08/msg00358.php

            regards, tom lane