Thread: View based upon function won't use index on joins

View based upon function won't use index on joins

From
Jonathan Foy
Date:
Hello,

I've inherited some very...interestingly...
designed tables, and am trying to figure out how to make them usable.  I've got an ugly hack in place, but it will not use an index properly, and I'm hoping someone will be able to point me in the right direction.

Production is running 8.1.3, but I'm testing in 8.3.3.  I know that's not good, but I'm seeing the exact same problem in both, so hopefully fixing it in one will fix the other.

All tables/functions/views are included at the bottom, somewhat truncated to reduce length/repetition.

The table in question (though not the only one with this problem) has a series of 24 column pairs per row, one holding a code and the other a value.  Any code/value combo could be populated in any of these fields (the codes identify the type of value).  The row is keyed into based upon an id number/qualifier pair.  So, for a single id number/qualifier, there can be from 0 to 24 populated pairs.  We need to go in for a single key and pull a list of all codes/values.  Hopefully that makes sense.

I created a set-returning function that would pull in the row for a specific number/qualifier combination, check each code to see if it was null/empty, and if not it would return a record containing the code/value.

For various reasons I needed to create a view based upon this.  Due to postgres not liking having set-returning pl/pgsql functions in select statements, the only way that I could get the view to work was to create a pl/sql wrapper that simply pulls the results of the prior pl/pgsql function.

I have the view working, and if I pull straight from the view it uses the index properly (on id_nbr, id_qfr).  However, if I try to join to another table, based upon the indexed fields, I get a sequential scan.  This is not ideal at all.  I know a lot of this is bad practice and ugly, but I need to get something that will work.

Any ideas?  I'm willing to rework any and all as far as views/functions are concerned, redesigning the tables is sadly not an option at this time.


Ugly table:

CREATE TABLE value_codes
(
  id_nbr integer NOT NULL,
  id_qfr character(1) NOT NULL,
  val_1_cd_1 character varying(30),
  val_1_amt_1 numeric(10,2),
  val_1_cd_2 character varying(30),
  val_1_amt_2 numeric(10,2),
  ...
  val_2_cd_12 character varying(30),
  val_2_amt_12 numeric(10,2),
  CONSTRAINT value_codes_pkey PRIMARY KEY (id_nbr, id_qfr)
)
WITH (
  OIDS=TRUE
);



Joined table:

CREATE TABLE main_table
(
  id_nbr integer NOT NULL,
  id_qfr character(1) NOT NULL,
  create_dt character(8),
  create_tm character(8),
  CONSTRAINT main_table_pkey PRIMARY KEY (id_nbr, id_qfr)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX main_table_create_dt_index
  ON main_table
  USING btree
  (create_dt);



Initial function:

CREATE OR REPLACE FUNCTION get_value_codes(IN fun_id_nbr integer,
    IN fun_id_qfr character,
    OUT value_code character varying,
    OUT value_amount numeric)
  RETURNS SETOF record AS
$BODY$
declare
    current_row    record;
begin

    select    val_1_cd_1,
        val_1_amt_1,
        val_1_cd_2,
        val_1_amt_2,
        ...
        val_2_cd_12,
        val_2_amt_12
    into     current_row
    from     value_codes
    where   id_nbr = fun_id_nbr
        and id_qfr = fun_id_qfr;

    if
        current_row.val_1_cd_1 is not null
        and current_row.val_1_cd_1 != ''
    then
        value_code := current_row.val_1_cd_1;
        value_amount := current_row.val_1_amt_1;

        return next;
    end if;
    ...
    if
        current_row.val_2_cd_12 is not null
        and current_row.val_2_cd_12 != ''
    then
        value_code := current_row.val_2_cd_12;
        value_amount := current_row.val_2_amt_12;

        return next;
    end if;

    return;
end;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE
  COST 100
  ROWS 10;



Wrapper function:

CREATE OR REPLACE FUNCTION get_value_codes_wrapper(IN id_nbr integer,
    IN id_qfr character,
    OUT value_code character varying,
    OUT value_amount numeric)
  RETURNS SETOF record AS
$BODY$
    SELECT * FROM get_value_codes($1, $2);
$BODY$
  LANGUAGE 'sql' VOLATILE
  COST 100
  ROWS 10;



View:

CREATE OR REPLACE VIEW value_codes_view AS
 SELECT value_codes.id_nbr,
      value_codes.id_qfr,
      (get_value_codes_wrapper(value_codes.id_nbr, value_codes.id_qfr)).value_code AS value_code,
      (get_value_codes_wrapper(value_codes.id_nbr, value_codes.id_qfr)).value_amount AS value_amount
   FROM value_codes;



Simple query Explained:

explain analyze select * from value_codes_view where id_nbr >= 90000000;

Index Scan using value_codes_pkey on value_codes  (cost=0.00..128.72 rows=53 width=6) (actual time=17.593..172.031 rows=15 loops=1)
  Index Cond: (id_nbr >= 90000000)
Total runtime: 172.141 ms


Join query explained:

explain analyze select * from main_table, value_codes_view
where create_dt >= '20091001'
and main_table.id_nbr = value_codes_view.id_nbr
and main_table.id_qfr = value_codes_view.id_qfr;

Hash Join  (cost=24.38..312425.40 rows=1 width=97) (actual time=220062.607..220295.870 rows=1 loops=1)
  Hash Cond: ((value_codes.id_nbr = main_table.id_nbr) AND (value_codes.id_qfr = main_table.id_qfr))
  ->  Seq Scan on value_codes  (cost=0.00..297676.77 rows=535427 width=6) (actual time=15.846..219553.511 rows=138947 loops=1)
  ->  Hash  (cost=21.47..21.47 rows=194 width=24) (actual time=0.455..0.455 rows=53 loops=1)
        ->  Index Scan using main_table_create_dt_index on main_table  (cost=0.00..21.47 rows=194 width=24) (actual time=0.033..0.243 rows=53 loops=1)
              Index Cond: (create_dt >= '20091001'::bpchar)
Total runtime: 220296.173 ms

Re: View based upon function won't use index on joins

From
Віталій Тимчишин
Date:
How about

CREATE OR REPLACE VIEW value_codes_view AS
select * from (
 SELECT value_codes.id_nbr,
      value_codes.id_qfr,
      (ARRAY[val_1_cd_1, ... , val_2_cd_12])[i] as value_code,
      (ARRAY[val_1_amt_1, ... , val_2_amt_12])[i] as value_amount,
   FROM value_codes, generate_series(1,24) i) a
where value_code is not null and value_code != '';
?

Re: View based upon function won't use index on joins

From
Jonathan Foy
Date:
This seems to result in the same problem; should I attempt to pull for a specific id_nbr/id_qfr, postgres uses the index without a problem. If I try to join the two tables/views however, it insists on doing a sequential scan (actually two in this case) and will not use the index.  Any other ideas/explanations?

That being said, I probably need to look into arrays more.  I haven't used them at all in my relatively brief experience with postgres.  More research!

2009/11/20 Віталій Тимчишин <tivv00@gmail.com>
How about


CREATE OR REPLACE VIEW value_codes_view AS
select * from (

 SELECT value_codes.id_nbr,
      value_codes.id_qfr,
      (ARRAY[val_1_cd_1, ... , val_2_cd_12])[i] as value_code,
      (ARRAY[val_1_amt_1, ... , val_2_amt_12])[i] as value_amount,
   FROM value_codes, generate_series(1,24) i) a
where value_code is not null and value_code != '';
?

Re: View based upon function won't use index on joins

From
Віталій Тимчишин
Date:


20 листопада 2009 р. 17:01 Jonathan Foy <thefoy@gmail.com> написав:
This seems to result in the same problem; should I attempt to pull for a specific id_nbr/id_qfr, postgres uses the index without a problem. If I try to join the two tables/views however, it insists on doing a sequential scan (actually two in this case) and will not use the index.  Any other ideas/explanations?

Have you tried to do same (join) when not using the viewes or converting columns into records? May be the problem is not in conversion, but in something simplier, like statistics or index bloat?

Best regards, Vitalii Tymchyshyn

Re: View based upon function won't use index on joins

From
Jonathan Foy
Date:
I don't think so. I actually dumped the tables involved into stripped down versions of themselves in a new database for testing, so the data involved should be completely fresh.  I ran a vacuum analyze after the dump of course.

Just for paranoia's sake though I did do the following:

explain analyze select id_nbr, id_qfr,
val_1_cd_1,
        val_1_cd_2,
        ...
        val_2_amt_12
from value_codes
where main_table.create_dt >= '20091001'
and main_table.id_nbr = value_codes.id_nbr
and main_table.id_qfr = value_codes.id_qfr

with the following results

"Nested Loop  (cost=0.00..1592.17 rows=132 width=150) (actual time=0.093..1.075 rows=4 loops=1)"
"  ->  Index Scan using main_table_create_dt_index on main_table  (cost=0.00..21.47 rows=194 width=6) (actual time=0.035..0.249 rows=53 loops=1)"
"        Index Cond: (create_dt >= '20091001'::bpchar)"
"  ->  Index Scan using value_codes_pkey on value_codes  (cost=0.00..8.08 rows=1 width=150) (actual time=0.007..0.007 rows=0 loops=53)"
"        Index Cond: ((value_codes.id_nbr = main_table.id_nbr) AND (value_codes.id_qfr = main_table.id_qfr))"
"Total runtime: 1.279 ms"


I'm stumped.  I'm starting to think that I'm trying to get postgres to do something that it just doesn't do.  Shy of just throwing a trigger in the table to actually populate a second table with the same data solely for reporting purposes, which I hate to do for obvious reasons, I don't know what else to do.  And this is only one example of this situation in the databases that I'm dealing with, I was hoping to come up with a more generic solution that I could apply in any number of locations.

I do very much appreciate the responses...I've been gradually getting deeper and deeper into postgres, and am still very much learning as I go.  All advice is very helpful.

Thanks..

2009/11/20 Віталій Тимчишин <tivv00@gmail.com>


20 листопада 2009 р. 17:01 Jonathan Foy <thefoy@gmail.com> написав:

This seems to result in the same problem; should I attempt to pull for a specific id_nbr/id_qfr, postgres uses the index without a problem. If I try to join the two tables/views however, it insists on doing a sequential scan (actually two in this case) and will not use the index.  Any other ideas/explanations?

Have you tried to do same (join) when not using the viewes or converting columns into records? May be the problem is not in conversion, but in something simplier, like statistics or index bloat?

Best regards, Vitalii Tymchyshyn

Re: View based upon function won't use index on joins

From
Robert Haas
Date:
2009/11/20 Jonathan Foy <thefoy@gmail.com>:
> Shy of just throwing a trigger in the
> table to actually populate a second table with the same data solely for
> reporting purposes,

That's what I would do in your situation, FWIW.  Query optimization is
a hard problem even under the best of circumstances; getting the
planner to DTRT with a crazy schema is - well, really hard.

...Robert