Re: Showing applied extended statistics in explain Part 2 - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Showing applied extended statistics in explain Part 2 |
Date | |
Msg-id | 6e44ed21-2c45-4069-be36-16bdb23e2fd5@vondra.me Whole thread Raw |
In response to | Re: Showing applied extended statistics in explain Part 2 (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>) |
List | pgsql-hackers |
On 11/18/24 13:52, Ilia Evdokimov wrote: > Hi everyone! > > Thank you for your work. > > 1) While exploring extended statistics, I encountered a bug that occurs > when using EXPLAIN (STATS) with queries containing OR conditions: > > CREATE TABLE t (a int, b int, c int, d int); > INSERT INTO t SELECT x/10+1, x, x + 10, x * 2 FROM > generate_series(1,10000) g(x); > CREATE STATISTICS ON a, b FROM t; > CREATE STATISTICS ON c, d FROM t; > ANALYZE; > > The following query works as expected: > > EXPLAIN (STATS) SELECT * FROM t WHERE a > 0 AND b > 0 AND c > 0 AND d > 0; > QUERY PLAN > ---------------------------------------------------------------- > Seq Scan on t (cost=0.00..255.00 rows=10000 width=16) > Filter: ((a > 0) AND (b > 0) AND (c > 0) AND (d > 0)) > Ext Stats: public.t_a_b_stat Clauses: ((a > 0) AND (b > 0)) > Ext Stats: public.t_c_d_stat Clauses: ((c > 0) AND (d > 0)) > (4 rows) > > However, when using OR conditions, the following query results in an error: > > EXPLAIN (ANALYZE, STATS) SELECT * FROM t WHERE a > 0 AND b > 0 OR c > 0 > AND d > 0; > ERROR: unrecognized node type: 314 > I believe this is the issue I mentioned when I first posted the original version of this patch: > 2) The deparsing is modeled (i.e. copied) from how we deal with index > quals, but it's having issues with nested OR clauses, because there > are nested RestrictInfo nodes and the deparsing does not expect that. In other words, the AND-clauses happens to be parsed like this: BoolExpr (boolop=and) RestrictInfo (clause=OpExpr, ...) RestrictInfo (clause=OpExpr, ...) And the deparse_expression() machinery does not expect RI nodes, which is where the error message comes from. An obvious solution would be to extend get_rule_expr() like this: case T_RestrictInfo: get_rule_expr((Node *) ((RestrictInfo *) node)->clause, context, showimplicit); break; But I think this would be wrong - AFAIK ruleutils.c is meant to handle these nodes. As the ruleutils.c header says: Functions to convert stored expressions/querytrees back to source text And RestrictInfo surely is not meant to be stored anywhere - it's a runtime only node, caching some data. So I think the correct solution is to not pass any expressions with RestrictInfo to deparse_expression(). Either by stripping the nodes, or by not adding them at all. The patch tries to do the stripping by maybe_extract_actual_clauses(), but that only looks at the top node, and that is not sufficient here. Maybe it would be possible to walk the whole tree, and remove all the RestrictInfos nodes - including intermediate ones, not just the top. But I'm not quite sure it wouldn't cause issues elsewhere (assuming it modifies the existing nodes). It still feels a bit like fixing a problem we shouldn't really have ... The only alternative approach I can think of is to make sure we never add any RestrictInfo nodes in these lists. But we build them for selectivity estimation, and the RestrictInfo is meant for that. So I'm a bit unsure what to do about this :-( In any case, I think this shows the patch needs more tests. > 2) It would be great if the STATS flag appeared as an option when > pressing Tab during query input in the psql command-line interface. > True. Tab autocomplete would be nice. regards -- Tomas Vondra
pgsql-hackers by date: