Home > mailing lists

Re: [HACKERS] multivariate statistics (v25) - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: [HACKERS] multivariate statistics (v25)
Date	March 14, 2017 16:59:57
Msg-id	CAKJS1f80E=Vh+phdKNpLs82yaaZJuO+_-ABZTGBvZGs39ndXQw@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] multivariate statistics (v25) (David Rowley <david.rowley@2ndquadrant.com>)
List	pgsql-hackers

Tree view

On 13 March 2017 at 23:00, David Rowley <david.rowley@2ndquadrant.com> wrote:

0003:

No more time today. Will try and get to those soon.

0003:

I've now read this patch. My main aim here was to learn what it does and how it works. I need to spend much longer understanding how your calculating the functional dependencies.

In the meantime I've pasted the notes I took while reading over the patch.

+ default:

+ elog(ERROR, "unexcpected statistics type requested: %d", type);

"unexpected", but we generally use "unknown".

@@ -1293,7 +1294,8 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)

info->rel = rel;

/* built/available statistics */

- info->ndist_built = true;

+ info->ndist_built = stats_are_built(htup, STATS_EXT_NDISTINCT);

+ info->deps_built = stats_are_built(htup, STATS_EXT_DEPENDENCIES);

I don't really like how this function is shaping up. You're calling stats_are_built() potentially twice for each stats type. There must be a nicer way to do this. Are non-built stats common enough to optimize building a StatisticExtInfo regardless and throwing it away if it happens to be useless?

Can you also rename mvoid to become something more esoid or similar. I seem to always read it as m-void instead of mv-oid and naturally I expect a void pointer rather than an Oid.

+dependencies, and for each one count the number of rows rows consistent it.

duplicate word "rows"

+Apllying the functional dependencies is fairly simple - given a list of

Applying

+In this case the default estimation based on AVIA principle happens to work

hmm, maybe I should know what AVIA principles are, but I don't. Is there something I should be reading? I searched a bit around the internet for a few minutes it didn't seem have a great idea either.

2017

+ Assert(tmp <= ((char *) output + len));

Shouldn't you just Assert(tmp == ((char *) output + len)); at the end of the loop?

+ if (dependencies->magic != STATS_DEPS_MAGIC)

+ elog(ERROR, "invalid dependency magic %d (expected %dd)",

+ dependencies->magic, STATS_DEPS_MAGIC);

+ if (dependencies->type != STATS_DEPS_TYPE_BASIC)

+ elog(ERROR, "invalid dependency type %d (expected %dd)",

+ dependencies->type, STATS_DEPS_TYPE_BASIC);

%dd ?

+ Assert(dependencies->ndeps > 0);

Why Assert() and not elog() ? Wouldn't think mean that a corrupt dependency could fail an Assert

+ dependencies = (MVDependencies) palloc0(sizeof(MVDependenciesData));

Why palloc0() and not palloc()?

Can you not just read it into a variable on the stack, then check the exact size using tempdeps.ndeps * sizeof(MVDependency), then memcpy() it over? That'll save you the realloc()

+ /* what minimum bytea size do we expect for those parameters */

+ expected_size = offsetof(MVDependenciesData, deps) +

+ dependencies->ndeps * (offsetof(MVDependencyData, attributes) +

+ sizeof(AttrNumber) * 2);

Can't quite make sense of this yet. Why * 2?

+ /* is the number of attributes valid? */

+ Assert((k >= 2) && (k <= STATS_MAX_DIMENSIONS));

Seems like a bad idea to Assert() this. Wouldn't some bad data being deserialized cause an Assert failure?

+ d = (MVDependency) palloc0(offsetof(MVDependencyData, attributes) +

+ (k * sizeof(AttrNumber)));

Why palloc0(), you seem to write out all the fields right away. Seems like a waste to zero the memory.

+ /* still within the bytea */

+ Assert(tmp <= ((char *) data + VARSIZE_ANY(data)));

Any point? You're already Asserting that you've consumed the entire array at the end anyway.

+ appendStringInfoString(&str, "[");

appendStringInfoChar(&str. '['); would be better.

+ ret = pstrdup(str.data);

ret = pnstrdup(str.data, str.len);

+CREATE STATISTICS s1 WITH (dependencies) ON (a,a) FROM functional_dependencies;

+ERROR: duplicate column name in statistics definition

Is it worth mentioning which column here?

I'll try to spend more time understanding 0003 soon.

David Rowley http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Heikki Linnakangas
Date: 14 March 2017, 16:25:38
Subject: Re: [HACKERS] [PATCH] SortSupport for macaddr type

From: DEV_OPS
Date: 14 March 2017, 17:00:07
Subject: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflicttracking in serializable transactions

Re: [HACKERS] multivariate statistics (v25) - Mailing list pgsql-hackers

Previous

Next