Thread: Composite fields and the query planner
I'm seeing a problem with the query planner not doing what's expected, and I think it is because we are using composite fields. Here is a stripped down example. create type type1 as ( part1 varchar, part2 varchar); create table table1 (field1 type1, field2 varchar); create function get_part1(type1) returns varchar as $$ select ($1).part1 $$ language sql; create index i1 on table1 (get_part1(field1)); create index i2 on table1 (field2); If we do select * from table1 where get_part1(field1) > 'val'; it correctly uses index i1, but if I do select * from table1 where get_part1(field1) > 'val1' and field2 = 'val2'; It - essentially - gets the hits from both indexes and then "ands" them together whatever the data distribution. In the case I am interested in the data is actually a timestamp, and I am typically looking from things that have happened in the last little while. Typically I might get about 100 hits out of about 20,000,000 records. "field2" is column that could return up to 10% of the data. If we split field1 into two normal fields and index the first part naturally, the query planner correctly returns the recent ones and filters them out according to field2. Naively, it looks to me that issue is that pg_stats table is holding data about the columns ( it seems even if the are not indexed ) and not storing information about the values actually stored in the index. As a result the query planner has no information about the data distribution so can't do it's job. For many reason's we don't want to have to split up the data into separate parts, unless we really really have to. Steve
Steve Rogerson <steve.pg@yewtc.demon.co.uk> writes: > I'm seeing a problem with the query planner not doing what's expected, and I > think it is because we are using composite fields. Here is a stripped down > example. > create type type1 as ( part1 varchar, part2 varchar); > create table table1 (field1 type1, field2 varchar); > create function get_part1(type1) returns varchar as $$ select ($1).part1 $$ > language sql; > create index i1 on table1 (get_part1(field1)); > create index i2 on table1 (field2); I tested this example in HEAD and 9.0.x and didn't see any particular problem with rowcount estimates for the get_part1() expression. You do have to have the i1 index in place when the table is analyzed, else ANALYZE won't collect any stats about the expression. regards, tom lane
On 05/05/13 15:06, Tom Lane wrote: > Steve Rogerson <steve.pg@yewtc.demon.co.uk> writes: >> I'm seeing a problem with the query planner not doing what's expected, and I >> think it is because we are using composite fields. Here is a stripped down >> example. > > I tested this example in HEAD and 9.0.x and didn't see any particular > problem with rowcount estimates for the get_part1() expression. You > do have to have the i1 index in place when the table is analyzed, else > ANALYZE won't collect any stats about the expression. > > regards, tom lane I should have said I am using 9.0.7 - also I have "ANALYZEd" all the relevant tables. Having said that I am having problems re-creating a rich enough example. Steve
On 05/05/13 15:06, Tom Lane wrote: > Steve Rogerson <steve.pg@yewtc.demon.co.uk> writes: >> I'm seeing a problem with the query planner not doing what's expected, and I >> think it is because we are using composite fields. Here is a stripped down >> example. >> create type type1 as ( part1 varchar, part2 varchar); >> create table table1 (field1 type1, field2 varchar); >> create function get_part1(type1) returns varchar as $$ select ($1).part1 $$ >> language sql; >> create index i1 on table1 (get_part1(field1)); >> create index i2 on table1 (field2); > I tested this example in HEAD and 9.0.x and didn't see any particular > problem with rowcount estimates for the get_part1() expression. You > do have to have the i1 index in place when the table is analyzed, else > ANALYZE won't collect any stats about the expression. > > regards, tom lane In the end it turned out to be another issue. As best as I can see, what happened was that if the value of field2 was not in the common values list, then it would use the two indexes, otherwise it would use one index and filter by field2. It takes about 10 times longer to get the two list and "AND" them than doing one and filtering. I think we've found a work around, using a combined index. Having said all that I did discover an unrelated problem with composite fields. Basically it doesn't seem to consult the pg_stats data to decide if to do a seq scan or an index scan. Always saying that it should do an index scan. I'm not quite sure how it could compare the value that is indexed with the pg_stats data, so I guess it either decides it's all less than or all greater than the value you are checking for. Steve