Re: multivariate statistics (v19) - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: multivariate statistics (v19)
Date
Msg-id CAB7nPqSJyXmMregq2EfHMcj9hvnHAQwHw2zamz3+UkFevh5r4g@mail.gmail.com
Whole thread Raw
In response to Re: multivariate statistics (v19)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Thu, Aug 11, 2016 at 3:34 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 08/10/2016 02:23 PM, Michael Paquier wrote:
>>
>> On Wed, Aug 10, 2016 at 8:33 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> The idea is that the syntax should work even for statistics built on
>>> multiple tables, e.g. to provide better statistics for joins. That's why
>>> the
>>> schema may be specified (as each table might be in different schema), and
>>> so
>>> on.
>>
>>
>> So you mean that the same statistics could be shared between tables?
>> But as this is visibly not a concept introduced yet in this set of
>> patches, why not just cut it off for now to simplify the whole? If
>> there is no schema-related field in pg_mv_statistics we could still
>> add it later if it proves to be useful.
>>
>
> Yes, I think creating statistics on multiple tables is one of the possible
> future directions. One of the previous patch versions introduced ALTER TABLE
> ... ADD STATISTICS syntax, but that ran into issues in gram.y, and given the
> multi-table possibilities the CREATE STATISTICS seems like a much better
> idea anyway.
>
> But I guess you're right we may make this a bit more strict now, and relax
> it in the future if needed. For example as we only support single-table
> statistics at this point, we may remove the schema and always create the
> statistics in the schema of the table.

This would simplify the code the code a bit so I'd suggest removing
that from the first shot. If there is demand for it, keeping the
infrastructure open for this extension is what we had better do.

> But I don't think we should make the statistics names unique only within a
> table (instead of within the schema).

They could be made unique using (name, table_oid, column_list).

>>>> There is a lot of mumbo-jumbo regarding the way dependencies are
>>>> stored with mainly serialize_mv_dependencies and
>>>> deserialize_mv_dependencies that operates them from bytea/dep trees.
>>>> That's not cool and not portable because pg_mv_statistic represents
>>>> that as pure bytea. I would suggest creating a generic data type that
>>>> does those operations, named like pg_dependency_tree and then use that
>>>> in those new catalogs. pg_node_tree is a precedent of such a thing.
>>>> New features could as well make use of this new data type of we are
>>>> able to design that in a way generic enough, so that would be a base
>>>> patch that the current 0002 applies on top of.
>>>
>>>
>>>
>>> Interesting idea, haven't thought about that. So are you suggesting to
>>> add a
>>> data type for each statistics type (dependencies, MCV, histogram, ...)?
>>
>>
>> Yes that would be something like that, it would be actually perhaps
>> better to have one single data type, and be able to switch between
>> each model easily instead of putting byteas in the catalog.
>
> Hmmm, not sure about that. For example what about combinations of statistics
> - e.g. when we have MCV list on the most common values and a histogram on
> the rest? Should we store both as a single value, or would that be in two
> separate values, or what?

The same statistics can combine two different things, using different
columns may depend on how readable things get...
Btw, for the format we could get inspired from pg_node_tree, with pg_stat_tree:
{HISTOGRAM :arg {BUCKET :index 0 :minvals ... }}
{DEPENDENCY :arg {:elt "a => c" ...} ... }
{MVC :arg {:index 0 :values {0,0} ... } ... }
Please consider that as a tentative idea to make things more friendly.
Others may have a different opinion on the matter.
-- 
Michael



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Heap WARM Tuples - Design Draft
Next
From: Venkata Balaji N
Date:
Subject: Re: Surprising behaviour of \set AUTOCOMMIT ON