Re: Showing applied extended statistics in explain Part 2 - Mailing list pgsql-hackers

From Tatsuro Yamada
Subject Re: Showing applied extended statistics in explain Part 2
Date
Msg-id CAOKkKFtaevmhnNxKenXrv+7KRYUOu=oVSjreeuP5Smg5FzqGOw@mail.gmail.com
Whole thread Raw
In response to Re: Showing applied extended statistics in explain Part 2  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Showing applied extended statistics in explain Part 2
Re: Showing applied extended statistics in explain Part 2
List pgsql-hackers
Hi Tomas!

Thanks for the comments!

1) The patch is not added to the CF app, which I think is a mistake. Can
you please add it to the 2024-07 commitfest? Otherwise people may not be
aware of it, won't do reviews etc. It'll require posting a rebased
patch, but should not be a big deal.

I added the patch to the 2024-07 commitfest today.


2) Not having the patch in a CF also means cfbot is not running tests on
it. Which is unfortunate, because the patch actually has an a bug cfbot
would find - I've noticed it after running the tests through the github
CI, see [2].
3) The bug has this symptom:
  ERROR:  unrecognized node type: 268
  CONTEXT:  PL/pgSQL function check_estimated_rows(text) line 7 ...
  STATEMENT:  SELECT * FROM check_estimated_rows('SELECT a, b FROM ...
4) I can think of two basic ways to fix this issue - either allow
copying of the StatisticExtInto node, or represent the information in a
different way (e.g. add a new node for that purpose, or use existing
nodes to do that).

Thanks for the info. I'll investigate using cfbot.
To fix the problem, I understand we need to create a new struct like 
(statistics OID, list of clauses, flag is_or). 
 

5) In [3] Tom raised two possible issues with doing this - cost of
copying the information, and locking. For the performance concerns, I
think the first thing we should do is measuring how expensive it is. I
suggest measuring the overhead for about three basic cases:

Okay, I'll measure it once the patch is completed and check the overhead.
I read [3][4] and in my opinion I agree with Robert.
As with indexes, there should be a mechanism for determining whether 
extended statistics are used or not. If it were available, users would be able to 
tune using extended statistics and get better execution plans.

 
6) I'm not sure we want to have this under EXPLAIN (VERBOSE). It's what
I did in the initial PoC patch, but maybe we should invent a new flag
for this purpose, otherwise VERBOSE will cover too much stuff? I'm
thinking about "STATS" for example.

This would probably mean the patch should also add a new auto_explain
"log_" flag to enable/disable this.

I thought it might be better to do this, so I'll fix it.

 
7) The patch really needs some docs - I'd mention this in the EXPLAIN
docs, probably. There's also a chapter about estimates, maybe that
should mention this too? Try searching for places in the SGML docs
mentioning extended stats and/or explain, I guess.

I plan to create documentation after the specifications are finalized.

 
For tests, I guess stats_ext is the right place to test this. I'm not
sure what's the best way to do this, though. If it's covered by VERBOSE,
that seems it might be unstable - and that would be an issue. But maybe
we might add a function similar to check_estimated_rows(), but verifying
 the query used the expected statistics to estimate expected clauses.

As for testing, I think it's more convenient for reviewers to include it in the patch, 
so I'm thinking of including it in the next patch.
 


So there's stuff to do to make this committable, but hopefully this
review gives you some guidance regarding what/how ;-)

Thank you! It helps me a lot!

The attached patch does not correspond to the above comment.
But it does solve some of the issues mentioned in previous threads.

The next patch is planned to include:
6) Add stats option to explain command
8) Add regression test (stats_ext.sql)
4) Add new node (resolve errors in cfbot and prepared statement)

Regards,
Tatsuro Yamada

 
[1]
https://www.postgresql.org/message-id/TYYPR01MB82310B308BA8770838F681619E5E2%40TYYPR01MB8231.jpnprd01.prod.outlook.com

[2] https://cirrus-ci.com/build/6436352672137216

[3] https://www.postgresql.org/message-id/459863.1627419001%40sss.pgh.pa.us

[4]
https://www.postgresql.org/message-id/CA%2BTgmoZU34zo4%3DhyqgLH16iGpHQ6%2BQAesp7k5a1cfZB%3D%2B9xtsw%40mail.gmail.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Conflict Detection and Resolution
Next
From: Peter Smith
Date:
Subject: Re: Logical Replication of sequences