Thread: Re: pg_stat_advisor extension
Hi hackers,

I ran into errors when applying "0001-pg_stat_advisor-extension.patch" to the main branch, specifically trailing whitespace at lines 117 and 118, and have fixed them:

```
0001-pg_stat_advisor-extension.patch:117: trailing whitespace.
QUERY PLAN
0001-pg_stat_advisor-extension.patch:118: trailing whitespace.
warning: 2 lines add whitespace errors.
```

An updated patch is attached for review.

I welcome your insights, feedback, and evaluations regarding the necessity of integrating this new extension into PostgreSQL.

Kind regards,
Ilia Evdokimov,
Tantor Labs LLC.
Dear Team,

Firstly, I would like to apologize for the confusion and technical oversights in our previous discussions regarding the 'pg_stat_advisor' extension. To allow a clearer, more focused dialogue, I have started a new thread to consolidate the discussion. For context, our previous conversation can be found here: https://www.postgresql.org/message-id/flat/4681151706615977%40mail.yandex.ru.

The 'pg_stat_advisor' extension is designed to help optimize query plans. It suggests when to create extended statistics, particularly for queries where the current selectivity estimates fall short. This is controlled by the GUC parameter 'pg_stat_advisor.suggest_statistics_threshold', a threshold on the ratio of actual tuples to planned rows. This helps identify cases where the planner's estimates could be improved.

You can load the extension with:

```
LOAD 'pg_stat_advisor';
SET pg_stat_advisor.suggest_statistics_threshold = 1.0;
```

Example:

```
EXPLAIN ANALYZE SELECT * FROM t WHERE i = 100 AND j = 10;
NOTICE: pg_stat_advisor suggestion: CREATE STATISTICS t_i_j ON i, j FROM t
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
```

After the EXPLAIN ANALYZE command, you see a notice suggesting the creation of statistics named 't_i_j' on columns 'i' and 'j' of table 't'.

Thank you for your understanding, patience, and continued support.

Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
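For anyone who wants to reproduce the example in the message above, here is a minimal sketch. The table definition and data are assumptions made for illustration (the message does not say how 't' was populated): 'j' is derived from 'i', so the planner's independence assumption underestimates the row count for the combined predicate. The LOAD and SET lines come from the message; everything else is hypothetical and requires the patched extension to be built and installed.

```sql
-- Hypothetical test table: j is fully determined by i (j = i / 10),
-- so the two predicates in the query below are strongly correlated.
CREATE TABLE t (i int, j int);
INSERT INTO t
SELECT g % 1000, (g % 1000) / 10
FROM generate_series(1, 100000) AS g;
ANALYZE t;

-- From the message above: load the extension and set the threshold.
LOAD 'pg_stat_advisor';
SET pg_stat_advisor.suggest_statistics_threshold = 1.0;

-- 100 rows match, but without extended statistics the planner
-- multiplies the per-column selectivities and expects far fewer.
EXPLAIN ANALYZE SELECT * FROM t WHERE i = 100 AND j = 10;
```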
On 6/2/2024 22:27, Ilia Evdokimov wrote:
> I welcome your insights, feedback, and evaluations regarding the
> necessity of integrating this new extension into PostgreSQL.

Besides other issues that were immediately raised during the discovery of the extension, let me emphasize two issues:

1. In the case of parallel workers the plan_rows value has different semantics than the number of rows predicted. Just explore get_parallel_divisor().
2. The extension recommends new statistics immediately upon an error finding. But what if the reason for the error is stale statistics? Or this error may be raised for only one specific set of constants, and estimation will be done well in another 99.9999% of cases for the same expression.

According to No.2, it might make sense to collect and track clause combinations and cardinality errors found and let the DBA make decisions on their own.

--
regards,
Andrei Lepikhov
Postgres Professional
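To illustrate point 1: for a partial path, the planner divides the row estimate by a "parallel divisor", so plan_rows on a node below Gather is a per-process figure. The query below is a sketch that mirrors the arithmetic of get_parallel_divisor() in costsize.c, assuming parallel_leader_participation is on. The column aliases and the sample estimate of 100000 rows are made up for illustration.

```sql
-- Sketch of the planner's parallel divisor, assuming
-- parallel_leader_participation = on. The leader is assumed to
-- contribute 1.0 - 0.3 * workers of one worker's effort, but never
-- less than zero.
SELECT w AS parallel_workers,
       w + greatest(1.0 - 0.3 * w, 0) AS parallel_divisor,
       -- hypothetical total estimate of 100000 rows for the scan
       round(100000 / (w + greatest(1.0 - 0.3 * w, 0))) AS per_process_plan_rows
FROM generate_series(1, 4) AS w;
```

With two workers the divisor is 2.4, so a plan_rows value read from such a node would need to be scaled back up before it is compared with a total actual row count; otherwise the estimation error looks larger than it is.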
On Feb 8 2024 at 00:00:00 jian he
>INT MAX
>should be 1.0?
I don't know why Konstantin Knizhnik used the ratio of actual tuples to planned ones, but most people who start testing my extension expect a coefficient from 0 to 1, namely the ratio of estimated tuples to actual ones. Therefore, I inverted the coefficient, and its value can now range from 0 to 1. The patch with these changes is attached.
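As a sketch of the new semantics: the ratio is estimated rows divided by actual rows, so a perfect estimate gives 1 and a severe underestimate approaches 0. The comparison direction below (suggest when the ratio falls below the threshold), the threshold of 0.5, and the sample numbers are assumptions for illustration, not taken from the patch.

```sql
-- Hypothetical (estimated, actual) row counts for three plan nodes.
WITH nodes(estimated_rows, actual_rows) AS (
    VALUES (1, 100), (90, 100), (100, 100)
)
SELECT estimated_rows,
       actual_rows,
       estimated_rows::float8 / actual_rows AS ratio,
       -- assumed rule: suggest statistics when ratio < threshold (0.5 here)
       estimated_rows::float8 / actual_rows < 0.5 AS would_suggest
FROM nodes;
```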
> now CREATE STATISTICS, the statistics name is optional
I construct the statistics name so that the user can copy the 'CREATE STATISTICS' line with the mouse and run the command right away. If users want their own name, they can change it manually.
> here you can explicitly mention the statistics kind would be great
I agree with you. That is my next step, and I'm working on it now.
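For reference, a sketch of what a suggestion with an explicit statistics kind could look like. CREATE STATISTICS accepts the kinds ndistinct, dependencies, and mcv; which kind the extension would pick is not decided in this thread, so the choice of dependencies here is an assumption. The table 't' and the name 't_i_j' follow the earlier example.

```sql
-- Functional-dependency statistics on the two correlated columns.
CREATE STATISTICS t_i_j (dependencies) ON i, j FROM t;

-- Extended statistics are only populated by ANALYZE.
ANALYZE t;

-- Check which kinds were created for the table.
SELECT stxname, stxkind
FROM pg_statistic_ext
WHERE stxrelid = 't'::regclass;
```

If the kind list is left out, all supported kinds are built, which is what the suggestion in its current form would produce.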
> Also since the documentation is limited, more comments explaining SuggestMultiColumnStatisticsForNode would be great.
> overall the comments are very little, it should be more (that's my opinion).
Yes, certainly. I'll do it in the next patch.
I'm looking forward to your thoughts and feedback.
Regards,
Ilia Evdokimov,
Tantor Labs LLC.
> 1. In the case of parallel workers the plan_rows value has different semantics than the number of rows predicted. Just explore get_parallel_divisor().
> 2. The extension recommends new statistics immediately upon an error finding. But what if the reason for the error is stale statistics? Or this error may be raised for only one specific set of constants, and estimation will be done well in another 99.9999% of cases for the same expression.
> According to No.2, it might make sense to collect and track clause combinations and cardinality errors found and let the DBA make decisions on their own.
The new parameter, `pg_stat_advisor.analyze_scale_factor`, lets the extension suggest running the ANALYZE command on specific tables. The extension now evaluates the ratio of `n_live_tup` (number of live tuples) to `n_mod_since_analyze` (number of modifications since the last analyze) from the `pg_stat_all_tables` view. If this ratio exceeds the value specified in `analyze_scale_factor`, the extension suggests updating the table's statistics.
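To see the inputs to this check for a given table, the statistics view can be queried directly. This is only a sketch: the exact formula used by the patch is not shown in this message, so the fraction below assumes a comparison in the spirit of autovacuum_analyze_scale_factor (modifications since the last ANALYZE relative to live rows). The threshold value 0.1 and the table name 't' are arbitrary examples.

```sql
-- Example value only; the parameter name is from the message above.
SET pg_stat_advisor.analyze_scale_factor = 0.1;

SELECT schemaname,
       relname,
       n_live_tup,
       n_mod_since_analyze,
       -- assumed staleness measure: share of rows modified since last ANALYZE
       n_mod_since_analyze::float8 / NULLIF(n_live_tup, 0) AS modified_fraction,
       last_analyze,
       last_autoanalyze
FROM pg_stat_all_tables
WHERE relname = 't';
```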
Many parameters influence row estimates, and statistics alone might not improve them. This feature is meant to give users data-driven insight into whether updating statistics with the ANALYZE command could improve query performance. Because the extension only suggests statistics updates and never runs them automatically, you can decide based on the specific needs and conditions of your database environment.
I've developed an extension that provides suggestions on whether to update or create statistics for your PostgreSQL database, without executing any changes. This approach allows you to consider various parameters that influence row estimates and make informed decisions about optimizing your database's performance.
Your feedback is invaluable, and we look forward to hearing about your experiences and any improvements you might suggest.
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.