Re: Allow to collect statistics on virtual generated columns - Mailing list pgsql-hackers
From | Yugo Nagata |
---|---|
Subject | Re: Allow to collect statistics on virtual generated columns |
Date | |
Msg-id | 20250902163341.c174d0e83eedfc54d68b8e9c@sraoss.co.jp Whole thread Raw |
In response to | Re: Allow to collect statistics on virtual generated columns (Yugo Nagata <nagata@sraoss.co.jp>) |
List | pgsql-hackers |
On Wed, 20 Aug 2025 14:10:28 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > On Fri, 8 Aug 2025 12:21:25 +0900 > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > On Fri, 1 Aug 2025 00:28:30 +0900 > > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > > Hi, > > > > > > On Tue, 24 Jun 2025 17:05:33 +0900 > > > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > > > > Instead, I'm thinking of an alternative approach: expanding the expression > > > > at the time statistics are collected. > > > > > > I've attached a new patch in this approache. > > > > > > This allows to collect statistics on virtual generated columns. > > > > > > During ANALYZE, generation expressions are expanded, and statistics are computed > > > using compute_expr_stats(). To support this, both compute_expr_stats() and AnlExprData > > > are now exported from extended_stats.c. However, since they are no longer specific > > > to extended statistics, it might be better to move them to analyze.c and vacuum.h. > > > > > > To enable the optimizer to make use of these statistics, a new field named > > > virtual_gencols is added to RelOptInfo. This field holds the expressions of > > > virtual generated columns in the table. In examine_variable(), if an expression > > > in a WHERE clause matches a virtual generated column, the corresponding statistics > > > are used for that expression. > > > > > > Example: > > > > > > - Before applying the patch, the cardinality estimate is erroneous. > > > > > > test=# create table t (i int, j int generated always as (i*10) virtual); > > > CREATE TABLE > > > test=# insert into t select generate_series(1,1000); > > > INSERT 0 1000 > > > test=# insert into t select 1 from generate_series(1,1000); > > > INSERT 0 1000 > > > test=# analyze t; > > > ANALYZE > > > test=# explain analyze select * from t where j = 10; > > > QUERY PLAN > > > ------------------------------------------------------------------------------------------------- > > > Seq Scan on t (cost=0.00..0.02 rows=1 width=8) (actual time=0.031..0.806 rows=1001.00 loops=1) > > > Filter: ((i * 10) = 10) > > > Rows Removed by Filter: 999 > > > Buffers: shared hit=9 > > > Planning: > > > Buffers: shared hit=10 > > > Planning Time: 0.299 ms > > > Execution Time: 0.948 ms > > > (8 rows) > > > > > > > > > - After applying the patch, the cardinality estimate is correct. > > > > > > test=# analyze t; > > > ANALYZE > > > test=# explain analyze select * from t where j = 10; > > > QUERY PLAN > > > ----------------------------------------------------------------------------------------------------- > > > Seq Scan on t (cost=0.00..41.50 rows=1001 width=8) (actual time=0.034..0.871 rows=1001.00 loops=1) > > > Filter: ((i * 10) = 10) > > > Rows Removed by Filter: 999 > > > Buffers: shared hit=9 > > > Planning: > > > Buffers: shared hit=6 > > > Planning Time: 0.374 ms > > > Execution Time: 1.028 ms > > > (8 rows) > > > > > > > > > Note that the patch is still a work in progress, so documentation and tests are not included. > > > > I've attached an updated patch. > > > > I modified the documentation to remove the statement that virtual generated columns > > do not have statistics. > > > > In addition, I added a test to ensure that statistics on virtual generated columns > > are available. I've attached a rebased patch. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
pgsql-hackers by date: