Re: Add support for (Var op Var) clause in extended MCV statistics - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Add support for (Var op Var) clause in extended MCV statistics
Msg-id a8e84732-9cb8-43a5-8386-97f637e6cb18@vondra.me
In response to Re: Add support for (Var op Var) clause in extended MCV statistics  (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>)
List pgsql-hackers
On 8/12/24 17:57, Ilia Evdokimov wrote:
> On 12.8.24 14:53, Tomas Vondra wrote:
> 
>> I agree, and I'm grateful someone picked up the original patch. I'll try
>> to help to keep it moving forward. If the thread gets stuck, feel free
>> to ping me to take a look.
> Good. Thank you!
>>> I started reviewing it and want to suggest some changes to improve the
>>> code: I think we should consider the case where the expression is
>>> neither an OpExpr nor a VarOpVar expression.
>>>
>> Do you have some specific type of clause in mind? Most of the extended
>> statistics code only really handles this type of clause, so I'm not
>> sure it's feasible to extend that - at least not in this patch.
> 
> I agree with Alena that we need to consider the following clauses: (Expr
> op Var), (Var op Expr) and (Expr op Expr). And we need to return false
> in these cases because we did it before my patch in
> 
>         /* Check if the expression has the right shape */
>         if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
>             return false;
> 
> In the is_opclause_var_op_var() function, the local Node *expr_left and
> *expr_right variables are really useless. However, we can't assign NULL
> to them at the beginning, because if non-NULL pointers are passed in, I
> have to return the values. Otherwise they remain NULL.
> 
> Nevertheless, thank you for review, Alena.
> 

Ah, right. I agree we should handle clauses with expressions.

I don't recall why I wrote is_opclause_var_op_var() like this, but I
believe this was before we allowed extended statistics on expressions
(which was added in 2021, the patch is from 2020). I don't see why it
could not return expressions, but I haven't tried.
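
To illustrate the idea, here is a minimal sketch of a shape check that
accepts either a Var or an expression on each side and hands both sides
back through optional output pointers. It is not the code from the patch:
ToyNode, ToyNodeTag and toy_clause_matches() are hypothetical stand-ins
for the real Node machinery, and the assumption that constants are
rejected is mine, made only to have a failing case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for PostgreSQL's Node types. */
typedef enum { TOY_VAR, TOY_CONST, TOY_EXPR } ToyNodeTag;

typedef struct ToyNode {
    ToyNodeTag tag;
    int        id;      /* arbitrary identifier, for illustration only */
} ToyNode;

/* Assumption: a Var or an expression is acceptable, a constant is not. */
static bool toy_is_stats_arg(const ToyNode *node)
{
    return node->tag == TOY_VAR || node->tag == TOY_EXPR;
}

/*
 * Returns true when both sides have an acceptable shape. The output
 * pointers are optional: each one is written only if it is non-NULL,
 * and only on success, so on failure the caller's variables keep
 * whatever value they had before the call.
 */
bool toy_clause_matches(const ToyNode *left, const ToyNode *right,
                        const ToyNode **left_out, const ToyNode **right_out)
{
    assert(left != NULL && right != NULL);

    if (!toy_is_stats_arg(left) || !toy_is_stats_arg(right))
        return false;

    if (left_out != NULL)
        *left_out = left;
    if (right_out != NULL)
        *right_out = right;

    return true;
}
```

A caller that only needs the yes/no answer can pass NULL for both output
pointers, which is the same way the existing check above calls
examine_opclause_args() with NULL for the outputs it does not need.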

>>> Have you tested this code with any benchmarks?
>>>
>> FWIW I think we need to test two things - that it (a) improves the
>> estimates and (b) does not have significant overhead.
> Yes, but only TPC-B. And the performance did not drop. In general, it'd
> be better to do more tests and those listed by Tomas with new attached
> patch.

Is TPC-B really interesting/useful for this patch? The queries are super
simple, with only a single clause (so it may not even get to the code
handling extended statistics). Did you create any extended stats?

I think you'll need to construct a custom test, with queries that have
multiple (var op var) clauses, extended stats created, etc. And
benchmark that.

FWIW I don't think it makes sense to benchmark the query execution - if
the estimate improves, it's possible to get arbitrary speedup, but
that's expected and mostly irrelevant I think.

What I'd focus on is benchmarking just the query planning - we need the
overhead to be negligible (or at least small) so that it does not hurt
people with already good plans.

BTW can you elaborate on why you are interested in this patch? Do you
just think it's interesting/useful, or do you have a workload where it
would actually help? I'm asking because my uncertainty about how
beneficial this is in practice (rather than just nice in theory) was one
of the reasons why I didn't do more work on this in 2021.


regards

-- 
Tomas Vondra


