Thread: Re: pg_stat_advisor extension

Re: pg_stat_advisor extension

From
Ilia Evdokimov
Date:
Hi hackers,


I've encountered and addressed errors in the 
"0001-pg_stat_advisor-extension.patch" when applying it to the main 
branch, specifically trailing whitespace issues at lines 117 and 118:

```
0001-pg_stat_advisor-extension.patch:117: trailing whitespace.
                                                    QUERY PLAN
0001-pg_stat_advisor-extension.patch:118: trailing whitespace.

warning: 2 lines add whitespace errors.

```

An updated patch is attached for review.


I welcome your insights, feedback, and evaluations regarding the 
necessity of integrating this new extension into PostgreSQL.


Kind regards,

Ilia Evdokimov,

Tantor Labs LLC.

Attachment

Re: pg_stat_advisor extension

From
Ilia Evdokimov
Date:
Dear Team,

Firstly, I would like to extend my sincere apologies for the confusion 
and technical oversights in our previous discussions regarding the 
'pg_stat_advisor extension'. To address this and facilitate a clearer, 
more focused dialogue, I have initiated a new thread to consolidate our 
discussions on this matter.

For context, our previous conversation can be found here: 
https://www.postgresql.org/message-id/flat/4681151706615977%40mail.yandex.ru.

The 'pg_stat_advisor' extension is designed to help improve query
plans. It suggests when to create extended statistics, particularly for
queries where the current selectivity estimates fall short. This is
controlled by the GUC parameter
'pg_stat_advisor.suggest_statistics_threshold', which is compared
against the ratio of total tuples to planned rows. This helps identify
cases where the planner's estimates could be improved.

You can load the extension and set the threshold as follows:

```
LOAD 'pg_stat_advisor';
SET pg_stat_advisor.suggest_statistics_threshold = 1.0;
```


Example:

```
EXPLAIN ANALYZE SELECT * FROM t WHERE i = 100 AND j = 10;
NOTICE: pg_stat_advisor suggestion: CREATE STATISTICS t_i_j ON i, j FROM t
                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------
```

After the EXPLAIN ANALYZE command, a NOTICE suggests creating statistics
named 't_i_j' on columns 'i' and 'j' of table 't'.
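
For illustration only, the text of such a suggestion can be reproduced with a small Python sketch. The naming scheme (table name and column names joined with underscores) is inferred from the single 't_i_j' example, and the function `build_suggestion` is hypothetical, not part of the extension.

```python
def build_suggestion(table: str, columns: list[str]) -> str:
    """Build a CREATE STATISTICS command in the style of the NOTICE.

    Assumption: the statistics name is the table name followed by the
    column names, joined with underscores (inferred from 't_i_j').
    Identifiers are not quoted here; a real implementation would need to.
    """
    if len(columns) < 2:
        raise ValueError("a column list for extended statistics needs at least two columns")
    name = "_".join([table, *columns])
    return f"CREATE STATISTICS {name} ON {', '.join(columns)} FROM {table}"


print(build_suggestion("t", ["i", "j"]))
```

Called with table 't' and columns 'i' and 'j', this yields the same command text as the NOTICE in the example.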

Thank you for your understanding, patience, and continued support.



Best regards,
Ilia Evdokimov,
Tantor Labs LLC.



Re: pg_stat_advisor extension

From
Andrei Lepikhov
Date:
On 6/2/2024 22:27, Ilia Evdokimov wrote:
> 
> I welcome your insights, feedback, and evaluations regarding the 
> necessity of integrating this new extension into PostgreSQL.
Besides other issues that were raised immediately during the discovery
of the extension, let me emphasize two issues:
1. In the case of parallel workers the plan_rows value has a different 
semantics than the number of rows predicted. Just explore 
get_parallel_divisor().
2. The extension recommends new statistics immediately upon an error 
finding. But what if the reason for the error is stale statistics? Or 
this error may be raised for only one specific set of constants, and 
estimation will be done well in another 99.9999% of cases for the same 
expression.

According to No.2, it might make sense to collect and track clause 
combinations and cardinality errors found and let the DBA make decisions 
on their own.
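
To make No.1 concrete, here is a rough Python sketch of the logic of get_parallel_divisor() as I understand it from costsize.c; the source is the authoritative version. The helper `estimated_total_rows` is hypothetical and only shows how a per-worker plan_rows might be scaled back before it is compared with actual row counts.

```python
def parallel_divisor(parallel_workers: int, leader_participation: bool = True) -> float:
    """Approximate the divisor the planner applies to partial-path row counts."""
    divisor = float(parallel_workers)
    if leader_participation:
        # The leader also spends time servicing the workers, so its
        # contribution is assumed to shrink as workers are added.
        leader_contribution = 1.0 - 0.3 * parallel_workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor


def estimated_total_rows(plan_rows: float, parallel_workers: int) -> float:
    """Hypothetical helper: scale a per-worker estimate to a whole-node estimate."""
    if parallel_workers <= 0:
        return plan_rows  # not a parallel node, nothing to rescale
    return plan_rows * parallel_divisor(parallel_workers)
```

With two workers and a participating leader the divisor is about 2.4, so a plan_rows value on such a node cannot be compared directly with the total number of rows returned.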

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: pg_stat_advisor extension

From
Ilia Evdokimov
Date:

On Feb 8 2024 at 00:00:00 jian he wrote:

> INT MAX
> should be 1.0?

I don’t know why Konstantin Knizhnik used the ratio of actual tuples to planned ones, but most people who start testing my extension expect a coefficient from 0 to 1, that is, the ratio of estimated tuples to actual ones. I have therefore inverted the coefficient, so its value now ranges from 0 to 1. A patch with these changes is attached.
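
As a minimal sketch of the inverted coefficient, assuming a suggestion is emitted when the ratio falls below the threshold (the comparison direction and both function names are my assumptions for illustration, not the extension's code):

```python
def estimation_ratio(estimated_rows: float, actual_rows: float) -> float:
    """Ratio of estimated to actual rows, clamped to the range [0, 1]."""
    if actual_rows <= 0:
        return 1.0  # no rows were returned, so there is no underestimate
    return min(estimated_rows / actual_rows, 1.0)


def should_suggest_statistics(estimated_rows: float, actual_rows: float,
                              threshold: float) -> bool:
    # Assumed rule: suggest statistics only when the planner underestimated
    # badly enough that the ratio falls below the threshold.
    return estimation_ratio(estimated_rows, actual_rows) < threshold
```

Under this assumption, a threshold of 1.0 reports every underestimate, while smaller values report only larger estimation errors.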


> now CREATE STATISTICS, the statistics name is optional

I construct the statistics name so that the user can copy the 'CREATE STATISTICS' line with the mouse and execute the command quickly. A user who wants a different name can change it manually.


> here you can explicitly mention the statistics kind would be great

I agree with you. That is my next step, and I am working on it now.


> Also since the documentation is limited, more comments explaining SuggestMultiColumnStatisticsForNode would be great.

> overall the comments are very little, it should be more (that's my opinion).

Yes, certainly. I'll do it in the next patch.

I'm looking forward to your thoughts and feedback.

Regards,

Ilia Evdokimov,

Tantor Labs LLC.

Attachment

Re: pg_stat_advisor extension

From
Ilia Evdokimov
Date:
On Feb 08 2024 at 07:14:18, Andrei Lepikhov wrote:

> 1. In the case of parallel workers the plan_rows value has a different
> semantics than the number of rows predicted. Just explore
> get_parallel_divisor().

Yes, this is a very weighty and important issue. I need to think about this very carefully.

> 2. The extension recommends new statistics immediately upon an error
> finding. But what if the reason for the error is stale statistics? Or
> this error may be raised for only one specific set of constants, and
> estimation will be done well in another 99.9999% of cases for the same
> expression.
> According to No.2, it might make sense to collect and track clause
> combinations and cardinality errors found and let the DBA make decisions
> on their own.

Your proposal is very interesting. In my opinion, it is worth considering updating the extended statistics if they are truly stale, and reporting this in a separate message that suggests updating the statistics.

If I succeed, the next patch will add the kind of extended statistics to the message, deal with parallel workers, and update statistics if necessary.

If you have additional suggestions or thoughts, feel free to write them in this thread.


Regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: pg_stat_advisor extension

From
Ilia Evdokimov
Date:

>1. In the case of parallel workers the plan_rows value has a different semantics than the number of rows predicted. Just explore get_parallel_divisor().

>2. The extension recommends new statistics immediately upon an error finding. But what if the reason for the error is stale statistics? Or this error may be raised for only one specific set of constants, and estimation will be done well in another 99.9999% of cases for the same expression.

The new parameter, `pg_stat_advisor.analyze_scale_factor`, lets the extension suggest running the ANALYZE command on specific tables. The extension now evaluates the ratio of `n_live_tup` (number of live tuples) to `n_mod_since_analyze` (number of rows modified since the last analyze) from the `pg_stat_all_tables` view. If this ratio exceeds the value specified in `analyze_scale_factor`, the extension suggests updating the table's statistics.
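
A rough sketch of such a staleness check follows. It assumes the intended comparison is the fraction of modified rows (`n_mod_since_analyze` relative to `n_live_tup`), by analogy with `autovacuum_analyze_scale_factor`; the extension's actual formula may differ, and `should_suggest_analyze` is a hypothetical name.

```python
def should_suggest_analyze(n_live_tup: int, n_mod_since_analyze: int,
                           analyze_scale_factor: float) -> bool:
    """Return True when a table's statistics look stale.

    Assumption: staleness is measured as modified rows divided by live
    rows, compared against analyze_scale_factor.
    """
    if n_live_tup <= 0:
        # Empty (or never-counted) table: any modification makes it stale.
        return n_mod_since_analyze > 0
    return n_mod_since_analyze / n_live_tup > analyze_scale_factor
```

For example, with a scale factor of 0.1, a table with 1000 live tuples and 200 modifications since the last ANALYZE would be flagged, while one with 50 modifications would not.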

Many parameters influence the estimated row count, and statistics alone might not improve it. This feature is designed to give users the information they need to decide whether updating statistics with the ANALYZE command could improve query performance. Because the extension suggests statistics updates rather than executing them, the decision stays with you and can reflect the specific needs and conditions of your database environment.

I've developed an extension that provides suggestions on whether to update or create statistics for your PostgreSQL database, without executing any changes. This approach allows you to consider various parameters that influence row estimates and make informed decisions about optimizing your database's performance.

Your feedback is invaluable, and we look forward to hearing about your experiences and any improvements you might suggest.

Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachment