Home > mailing lists

Re: Use extended statistics to estimate (Var op Var) clauses - Mailing list pgsql-hackers

From	Mark Dilger
Subject	Re: Use extended statistics to estimate (Var op Var) clauses
Date	August 11, 2021 22:02:29
Msg-id	6F6757E6-73F1-44CC-9046-FF59DF37B16E@enterprisedb.com Whole thread Raw
In response to	Re: Use extended statistics to estimate (Var op Var) clauses (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: Use extended statistics to estimate (Var op Var) clauses
List	pgsql-hackers

Tree view

> On Aug 11, 2021, at 10:38 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> So I'm a bit puzzled about the claim that random data make the problems more extreme. Can you explain?

Hmm... you appear to be right.

I changed the gentest.pl script to fill the tables with randomized data, but the random data is being regenerated each
testrun (since the calls to random() are in the gentest.sql file). Adding an explicit setseed() call in the test to
makesure the data is the same before and after applying your patch eliminates the differences.

So there are three tests here. The first tests deterministic orderly data. The second tests deterministic random data
withoutrepeats and hence without meaningful mvc. The third tests deterministic random data with rounding into twenty
bucketsskewed towards lower numbered buckets and hence with both repeats and meaningful mvc.

The original test set:

TOTAL:
better: 77827
worse: 12317

The random test set, with setseed() calls to make it deterministic:

TOTAL:
better: 49708
worse: 19393

The random test set , with setseed() calls to make it deterministic plus rounding into buckets:

TOTAL:
better: 81764
worse: 19594

Once the data is made deterministic, the third set looks slightly better than the first, rather than slightly worse.
Butalmost 20% of the query types still look worse after applying the patch. I'm going to dig deeper into those to see
ifthat conclusion survives bumping up the size of the dataset. It will take quite some time to run the tests with a
hugedataset, but I don't see how else to investigate this.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Melanie Plageman
Date: 11 August 2021, 22:00:40
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Thomas Munro
Date: 11 August 2021, 22:24:09
Subject: Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?

Re: Use extended statistics to estimate (Var op Var) clauses - Mailing list pgsql-hackers

Previous

Next