Home > mailing lists

Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From	Dean Rasheed
Subject	Re: PoC/WIP: Extended statistics on expressions
Date	March 17, 2021 20:58:22
Msg-id	CAEZATCXcX0gm2LMb+xObroS-C+a5FXEqgR5RgQ9NWKjPC=9sNA@mail.gmail.com Whole thread
In response to	Re: PoC/WIP: Extended statistics on expressions (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses	Re: PoC/WIP: Extended statistics on expressions
List	pgsql-hackers

Tree view

On Wed, 17 Mar 2021 at 20:48, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> For reference, here is the test case I was using (which isn't really very good for
> catching dependence between columns):
>

And here's a test case with much more dependence between the columns:

DROP TABLE IF EXISTS foo;
CREATE TABLE foo (a int, b int, c int, d int);
INSERT INTO foo SELECT x%2, x%5, x%10, x%15 FROM generate_series(1,100000) x;
SELECT COUNT(DISTINCT a) FROM foo; -- 2
SELECT COUNT(DISTINCT b) FROM foo; -- 5
SELECT COUNT(DISTINCT c) FROM foo; -- 10
SELECT COUNT(DISTINCT d) FROM foo; -- 15
SELECT COUNT(DISTINCT (a+b)) FROM foo; -- 6
SELECT COUNT(DISTINCT (c+d)) FROM foo; -- 20
SELECT COUNT(DISTINCT ((a+b),c)) FROM foo; -- 10
SELECT COUNT(DISTINCT ((a+b),(c+d))) FROM foo; -- 30

-- First case: stats on [(a+b),c]
CREATE STATISTICS s1(ndistinct) ON (a+b),c FROM foo;
ANALYSE foo;
EXPLAIN ANALYSE
SELECT (a+b), (c+d) FROM foo GROUP BY (a+b), (c+d);
  -- Estimate = 150, Actual = 30
  -- This estimate is ndistinct((a+b),c) * ndistinct(d) = 10*15,
  -- which is much better than ndistinct((a+b)) * ndistinct(c) *
ndistinct(d) = 6*10*15 = 900
  -- Estimate with no stats = 1500

-- Second case: stats on (c+d) as well
CREATE STATISTICS s2 ON (c+d) FROM foo;
ANALYSE foo;
EXPLAIN ANALYSE
SELECT (a+b), (c+d) FROM foo GROUP BY (a+b), (c+d);
  -- Estimate = 120, Actual = 30
  -- This estimate is ndistinct((a+b)) * ndistinct((c+d)) = 6*20

Again, I'd say the current behaviour is pretty good.

Regards,
Dean

pgsql-hackers by date:

From: John Naylor
Date: 17 March 2021, 20:55:02
Subject: Re: WIP: BRIN multi-range indexes

From: Tomas Vondra
Date: 17 March 2021, 21:30:59
Subject: Re: PoC/WIP: Extended statistics on expressions

Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

Previous

Next