Thread: [GENERAL] Planner statistics usage for composite type

[GENERAL] Planner statistics usage for composite type

From
Dmitry Lazurkin
Date:
Hello.

I am trying to find workaround for cross-column statistics. For example,
I have tags with similarity:
select version();
                                                      version
-------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10beta2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit

CREATE TABLE tags(
       t varchar(30) NOT NULL,
       s double precision
);
INSERT INTO tags (SELECT 'tag1'::text as t, 0.7 as s from
generate_series(0, 10000));

I think i can create index for cross-column statistics:
CREATE TYPE tag_sim AS (
       t varchar(30),
       s double precision
);
CREATE INDEX tags_composite ON tags USING btree ((ROW(t, s)::tag_sim));
ANALYZE tags;
SELECT * FROM pg_stats WHERE tablename = 'tags_composite';
-[ RECORD 1 ]----------+---------------
schemaname             | public
tablename              | tags_composite
attname                | row
inherited              | f
null_frac              | 0
avg_width              | 40
n_distinct             | 1
most_common_vals       | {"(tag1,0.7)"}
most_common_freqs      | {1}
histogram_bounds       | (null)
correlation            | 1
most_common_elems      | (null)
most_common_elem_freqs | (null)
elem_count_histogram   | (null)

OK, I have statistics. Search:
EXPLAIN SELECT * FROM tags WHERE ROW(t, s)::tag_sim >= ROW('tag1',
0.9)::tag_sim AND
ROW(t, s)::tag_sim <= ROW('tag1', 1.0)::tag_sim;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Index Scan using tags_composite on tags  (cost=0.29..9.29 rows=50 width=13)
   Index Cond: ((ROW(t, s)::tag_sim >= ROW('tag1', 0.9)) AND (ROW(t,
s)::tag_sim <= ROW('tag1', 1.0)))
(2 rows)

Worn estimate. Planner doesn't use statistics. In code I see usage of
function scalargtsel which returns default selectivity because
ROW('tag1', 0.9)::tag_sim is not Const.

May be someone known how to fix this issue?

Thanks.



Re: [GENERAL] Planner statistics usage for composite type

From
Tom Lane
Date:
Dmitry Lazurkin <dilaz03@gmail.com> writes:
> I am trying to find workaround for cross-column statistics.
> ...
> Worn estimate. Planner doesn't use statistics. In code I see usage of
> function scalargtsel which returns default selectivity because
> ROW('tag1', 0.9)::tag_sim is not Const.

Yeah, that's because eval_const_expressions doesn't know how to fold
a constant RowExpr to a simple Const.  I have a patch laying about
someplace to improve that, but I keep forgetting about it until
we're in beta phase :-(.  In the meantime you could perhaps do the
folding by hand:

regression=# EXPLAIN ANALYZE SELECT * FROM tags WHERE ROW(t, s)::tag_sim >= '(tag1,0.9)'::tag_sim AND
ROW(t, s)::tag_sim <= '(tag1,1.0)'::tag_sim;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Scan using tags_composite on tags  (cost=0.29..8.30 rows=1 width=13) (actual time=0.051..0.051 rows=0 loops=1)
   Index Cond: ((ROW(t, s)::tag_sim >= '(tag1,0.9)'::tag_sim) AND (ROW(t, s)::tag_sim <= '(tag1,1)'::tag_sim))
 Planning time: 0.230 ms
 Execution time: 0.110 ms
(4 rows)

            regards, tom lane


Re: [GENERAL] Planner statistics usage for composite type

From
Stephen Frost
Date:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Dmitry Lazurkin <dilaz03@gmail.com> writes:
> > I am trying to find workaround for cross-column statistics.
> > ...
> > Worn estimate. Planner doesn't use statistics. In code I see usage of
> > function scalargtsel which returns default selectivity because
> > ROW('tag1', 0.9)::tag_sim is not Const.
>
> Yeah, that's because eval_const_expressions doesn't know how to fold
> a constant RowExpr to a simple Const.  I have a patch laying about
> someplace to improve that, but I keep forgetting about it until
> we're in beta phase :-(.  In the meantime you could perhaps do the
> folding by hand:

Put it into the CF..?

Thanks!

Stephen

Attachment

Re: [GENERAL] Planner statistics usage for composite type

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Yeah, that's because eval_const_expressions doesn't know how to fold
>> a constant RowExpr to a simple Const.  I have a patch laying about
>> someplace to improve that, but I keep forgetting about it until
>> we're in beta phase :-(.  In the meantime you could perhaps do the
>> folding by hand:

> Put it into the CF..?

Actually, I did already:
https://commitfest.postgresql.org/14/1136/

            regards, tom lane


Re: [GENERAL] Planner statistics usage for composite type

From
Dmitry Lazurkin
Date:
On 07/19/2017 06:37 PM, Tom Lane wrote:
> Dmitry Lazurkin <dilaz03@gmail.com> writes:
>> I am trying to find workaround for cross-column statistics.
>> ...
>> Worn estimate. Planner doesn't use statistics. In code I see usage of
>> function scalargtsel which returns default selectivity because
>> ROW('tag1', 0.9)::tag_sim is not Const.
> regression=# EXPLAIN ANALYZE SELECT * FROM tags WHERE ROW(t, s)::tag_sim >= '(tag1,0.9)'::tag_sim AND
> ROW(t, s)::tag_sim <= '(tag1,1.0)'::tag_sim;
>                                                       QUERY PLAN

>
----------------------------------------------------------------------------------------------------------------------
>  Index Scan using tags_composite on tags  (cost=0.29..8.30 rows=1 width=13) (actual time=0.051..0.051 rows=0 loops=1)
>    Index Cond: ((ROW(t, s)::tag_sim >= '(tag1,0.9)'::tag_sim) AND (ROW(t, s)::tag_sim <= '(tag1,1)'::tag_sim))
>  Planning time: 0.230 ms
>  Execution time: 0.110 ms
> (4 rows)

Thank you. (: