Thread: Composite fields and the query planner

Composite fields and the query planner

From
Steve Rogerson
Date:
I'm seeing a problem with the query planner not doing what's expected, and I
think it is because we are using composite fields. Here is a stripped down
example.

create type type1 as ( part1 varchar, part2 varchar);
create table table1 (field1 type1, field2 varchar);
create function get_part1(type1) returns varchar as $$ select ($1).part1 $$
language sql;
create index i1 on table1 (get_part1(field1));
create index i2 on table1 (field2);


If we do

select * from table1 where get_part1(field1) > 'val';

it correctly uses index i1, but if I do

select * from table1 where get_part1(field1) > 'val1' and field2 = 'val2';

It - essentially - gets the hits from both indexes and then "ands" them
together whatever the data distribution. In the case I am  interested in the
data is actually a timestamp, and I am typically looking from things that have
happened in the last little while. Typically I might get about 100 hits out of
about 20,000,000 records. "field2" is column that could return up to 10% of
the data. If we split field1 into two normal fields   and index the first part
  naturally, the query planner correctly returns the recent ones and filters
them out according to field2.

Naively, it looks to me that issue is that pg_stats table is holding data
about the columns ( it seems even if the are not indexed )  and not storing
information about the values actually stored in the index. As a result the
query planner has no information about the data distribution so can't do it's job.

For many reason's we don't want to have to split up the data into separate
parts, unless we really really have to.


Steve


Re: Composite fields and the query planner

From
Tom Lane
Date:
Steve Rogerson <steve.pg@yewtc.demon.co.uk> writes:
> I'm seeing a problem with the query planner not doing what's expected, and I
> think it is because we are using composite fields. Here is a stripped down
> example.

> create type type1 as ( part1 varchar, part2 varchar);
> create table table1 (field1 type1, field2 varchar);
> create function get_part1(type1) returns varchar as $$ select ($1).part1 $$
> language sql;
> create index i1 on table1 (get_part1(field1));
> create index i2 on table1 (field2);

I tested this example in HEAD and 9.0.x and didn't see any particular
problem with rowcount estimates for the get_part1() expression.  You
do have to have the i1 index in place when the table is analyzed, else
ANALYZE won't collect any stats about the expression.

            regards, tom lane


Re: Composite fields and the query planner

From
Steve Rogerson
Date:
On 05/05/13 15:06, Tom Lane wrote:
> Steve Rogerson <steve.pg@yewtc.demon.co.uk> writes:
>> I'm seeing a problem with the query planner not doing what's expected, and I
>> think it is because we are using composite fields. Here is a stripped down
>> example.
>
> I tested this example in HEAD and 9.0.x and didn't see any particular
> problem with rowcount estimates for the get_part1() expression.  You
> do have to have the i1 index in place when the table is analyzed, else
> ANALYZE won't collect any stats about the expression.
>
>             regards, tom lane

I should have said I am using 9.0.7 - also I have "ANALYZEd" all the relevant
tables. Having said that I am having problems re-creating a rich enough example.

Steve



Re: Composite fields and the query planner

From
Steve Rogerson
Date:
On 05/05/13 15:06, Tom Lane wrote:
> Steve Rogerson <steve.pg@yewtc.demon.co.uk> writes:
>> I'm seeing a problem with the query planner not doing what's expected, and I
>> think it is because we are using composite fields. Here is a stripped down
>> example.
>> create type type1 as ( part1 varchar, part2 varchar);
>> create table table1 (field1 type1, field2 varchar);
>> create function get_part1(type1) returns varchar as $$ select ($1).part1 $$
>> language sql;
>> create index i1 on table1 (get_part1(field1));
>> create index i2 on table1 (field2);
> I tested this example in HEAD and 9.0.x and didn't see any particular
> problem with rowcount estimates for the get_part1() expression.  You
> do have to have the i1 index in place when the table is analyzed, else
> ANALYZE won't collect any stats about the expression.
>
>             regards, tom lane
In the end it turned out to be another issue.  As best as I can see, what
happened was that if the value of field2 was not in the common values list,
then it would  use  the two indexes, otherwise it would use one index and
filter by field2. It takes about 10 times longer to get the two list and "AND"
them than doing one and filtering.  I think we've found a work around, using a
combined index.

Having said all that I did discover an unrelated problem with composite
fields. Basically it doesn't seem to consult the pg_stats data to decide if to
do a seq scan or an index scan. Always saying that it should do an index
scan.  I'm not quite sure how it could compare the value that is indexed with
the pg_stats data, so I guess it either decides it's all less than or all
greater than the value you are checking for.


Steve