Re: Additional improvements to extended statistics - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Additional improvements to extended statistics |
Date | |
Msg-id | 57eb142c-d5a4-cfca-675b-0dfd9ef7c2e2@2ndquadrant.com Whole thread Raw |
In response to | Re: Additional improvements to extended statistics (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: Additional improvements to extended statistics
Re: Additional improvements to extended statistics |
List | pgsql-hackers |
Hi, Here is an improved WIP version of the patch series, modified to address the issue with repeatedly applying the extended statistics, as discussed with Dean in this thread. It's a bit rough and not committable, but I need some feedback so I'm posting it in this state. (Note: The WIP patch is expected to fail regression tests. A couple stats_ext regression tests fail due to changed estimate - I've left it like that to make the changes more obvious for now.) Earlier in this thread I used this example: CREATE TABLE t (a int, b int); INSERT INTO t SELECT mod(i, 10), mod(i, 10) FROM generate_series(1,100000) s(i); CREATE STATISTICS s (mcv) ON a,b FROM t; ANALYZE t; EXPLAIN SELECT * FROM t WHERE a = 0 OR b = 0; which had this call graph with two statext_mcv_clauselist_selectivity calls (which was kinda the issue): clauselist_selectivity statext_clauselist_selectivity statext_mcv_clauselist_selectivity <--- (1) clauselist_selectivity_simple clause_selectivity clauselist_selectivity_or statext_clauselist_selectivity statext_mcv_clauselist_selectivity <--- (2) clauselist_selectivity_simple_or clause_selectivity clause_selectivity mcv_clauselist_selectivity clauselist_selectivity_simple_or mcv_clauselist_selectivity clauselist_selectivity_simple (already estimated) with the patches applied, the call looks like this: clauselist_selectivity_internal (use_extended_stats=1) statext_clauselist_selectivity statext_mcv_clauselist_selectivity (is_or=0) clauselist_selectivity_simple clause_selectivity_internal (use_extended_stats=0) clauselist_selectivity_or (use_extended_stats=0) clauselist_selectivity_simple_or clause_selectivity_internal (use_extended_stats=0) clause_selectivity_internal (use_extended_stats=0) mcv_clauselist_selectivity (is_or=0) clauselist_selectivity_simple The nested call is removed, which I think addresses the issue. As for the effect on estimates, there's a couple regression tests where the estimates change - not much though, an example is: SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 10'); estimated | actual -----------+-------- - 412 | 104 + 308 | 104 (1 row) This is on top of 0001, though. Interestingly enough, this ends up with the same estimate as current master, but I consider that a coincidence. As for the patches: 0001 is the original patch improving estimates of OR clauses 0002 adds thin wrappers for clause[list]_selectivity, with "internal" functions allowing to specify whether to keep considering extended stats 0003 does the same for the "simple" functions I've kept it like this to demonstrate that 0002 is not sufficient. In my response from March 24 I wrote this: > Isn't it the case that clauselist_selectivity_simple (and the OR > variant) should ignore extended stats entirely? That is, we'd need > to add a flag (or _simple variant) to clause_selectivity, so that it > calls causelist_selectivity_simple_or. But that's actually wrong, as 0002 shows (as it breaks a couple of regression tests), because of the way we handle OR clauses. At the top level, an OR-clause is actually just a single clause and it may get passed to clauselist_selectivity_simple. So entirely disabling extended stats for the "simple" functions would also mean disabling extended stats for a large number of OR clauses. Which is clearly wrong. So 0003 addresses that, by adding a flag to the two "simple" functions. Ultimately, this should probably do the same thing as 0002 and add thin wrappers, because the existing functions are part of the public API. Dean, does this address the issue you had in mind? Can you come up with an example of that issue in the form of a regression test or something? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: