Home > mailing lists

Re: pg_upgrade and statistics - Mailing list pgsql-hackers

From	Ants Aasma
Subject	Re: pg_upgrade and statistics
Date	March 16, 2012 04:36:25
Msg-id	CA+CSw_tEYivSXMLqr7j0S298Mgg1FciLVfnSd=DgTEHt-UnDNQ@mail.gmail.com Whole thread Raw
In response to	Re: pg_upgrade and statistics (Alvaro Herrera <alvherre@commandprompt.com>)
List	pgsql-hackers

Tree view

On Thu, Mar 15, 2012 at 8:48 PM, Alvaro Herrera <alvherre@commandprompt.com>
> What Peter proposed seems to me pretty reasonable, in the sense that it
> should be possible to come up with a function that creates some text
> representation of whatever is in pg_statistic, and another function to
> load that data into the new catalog(s).  There's no need to keep
> pg_statistic binary-compatible, or even continue to have only
> pg_statistic (IIRC Zoltan/Hans-Jurgen patch for cross-column stats adds
> a new catalog, pg_statistic2 or similar).  If the upgrade target release
> has room for more/improved stats, that's fine -- they'll be unused after
> loading the stats from the dump.  And the old stats can be
> reacommodated, if necessary.

I have been reading up on selectivity estimation research for the last
few days. I must say that I also think that having a text
representation as an intermediate won't create a huge maintenance
burden. The basic concepts that are there are pretty solid.
Conceptually MCV's and histograms continue to be essential even with
the more complex approaches. Trying to maintain binary compatibility
is probably a bad idea, as Tom noted with the array selectivity patch
- encoding of the information could be better. But given a textual
format it won't be too hard to just massage the data to the new
format. Making it possible to dump and load stats has the additional
bonus of enabling more experimentation with custom stats collectors.
One could easily prototype the stats collection with R, scipy, etc. Of
course the proof will be in the pudding.

Re, the patch, current posted WIP cross-col patch doesn't create a new
catalog,. It repurposes the stat slots mechanism to store multiple
dimensions. But I'll most likely rewrite it to use a separate catalog
because the storage requirements are rather different. I'll post a
proposal in the appropriate thread when I have decently clear idea how
this should work. One thing that seems clear is that multi-dimensional
histograms will want this mechanism even more, optimal histogram
construction is NP-hard in the multi-dimensional case and so people
will want to try different algorithms, or make different tradeoffs on
effort spent on constructing the histogram. Or even build one by hand.

Cheers,
Ants Aasma

pgsql-hackers by date:

From: Daniel Farina
Date: 16 March 2012, 03:02:12
Subject: Re: pg_terminate_backend for same-role

From: Yeb Havinga
Date: 16 March 2012, 04:44:48
Subject: Re: [v9.2] Add GUC sepgsql.client_label

Re: pg_upgrade and statistics - Mailing list pgsql-hackers

Previous

Next