Thread: Inserting additional data into pg_statistics

Inserting additional data into pg_statistics

From

Teodor Macicas

Date:

19 June 2010, 11:52:00

Hello everyone,

I'm new here, but I have a problem and couldn't find a solution so I
hope you will help me.
I'm trying to reproduce the pg_statistics catalog from one database
server to another one. To retrieve the data from the first server was a
trivial job. The trouble was when I tryied to insert new record to
pg_statistic table on the second server.

To be more precise, I want to run this [0] query and I'm struggling with
this error [1]. The query is generated by pg_dump. Anyways, it will be
the same problem if I write the SQL statement by hand.
So, has anyone any ideas how can I insert new record to pg_statistics
system catalog ?

[0] INSERT INTO pg_statistic
      VALUES (
54655,1,0,26,-1,2,3,0,0,1058,1058,0,0,null,'{1}',null,null,'{"AFRICA
","AMERICA"}',null,null,null );

[1] SQL error:
      ERROR:  cannot accept a value of type anyarray


Thank you a lot.
Looking forward for your replies.
-Tedy

Re: Inserting additional data into pg_statistics

From

Tom Lane

Date:

19 June 2010, 14:38:15

Teodor Macicas <teodor.macicas@epfl.ch> writes:
> I'm trying to reproduce the pg_statistics catalog from one database
> server to another one.

You can't really, and manually fooling with system catalogs is usually
a bad idea anyway.  Just run ANALYZE.

            regards, tom lane

Re: Inserting additional data into pg_statistics

From

Teodor Macicas

Date:

19 June 2010, 14:49:54

Thank you for your reply, Tom.
Why I can't ? And for my purpose is not a bad idea. I mean, I have to do
this and somehow I should find a solution.

In order to use ANALYZE I need the same data on 2nd machine, but the
data is quite large and the only information I need are the statistics
from pg_statistic.
So, why I receive that ERROR and I run the INSERT statement ?

Can anyone tell me where to look into Postgresql source to see how the
engine updates this table ?

Looking forward for reply.
With kindest regards,
Tedy

Tom Lane wrote:
> Teodor Macicas <teodor.macicas@epfl.ch> writes:
>
>> I'm trying to reproduce the pg_statistics catalog from one database
>> server to another one.
>>
>
> You can't really, and manually fooling with system catalogs is usually
> a bad idea anyway.  Just run ANALYZE.
>
>             regards, tom lane
>

Re: Inserting additional data into pg_statistics

From

Tom Lane

Date:

19 June 2010, 15:04:29

Teodor Macicas <teodor.macicas@epfl.ch> writes:
> Why I can't ? And for my purpose is not a bad idea. I mean, I have to do
> this and somehow I should find a solution.

> In order to use ANALYZE I need the same data on 2nd machine, but the
> data is quite large and the only information I need are the statistics
> from pg_statistic.

Er, if you haven't got the data on the second machine, then you *don't*
need or want that stuff in its pg_statistic.  It won't do you any good
to have incorrect information in there.

            regards, tom lane

Re: Inserting additional data into pg_statistics

From

Teodor Macicas

Date:

19 June 2010, 15:26:01

Hi Tom,

Thank you again for your quick reply.
Shall we pass the reason for I want to do this ? I can explain, but it's
a long story. The idea is that *for sure* I need those metadata there. I
want to use them on the second machine.

Regards,
Tedy

Tom Lane wrote:
> Teodor Macicas <teodor.macicas@epfl.ch> writes:
>
>> Why I can't ? And for my purpose is not a bad idea. I mean, I have to do
>> this and somehow I should find a solution.
>>
>
>
>> In order to use ANALYZE I need the same data on 2nd machine, but the
>> data is quite large and the only information I need are the statistics
>> from pg_statistic.
>>
>
> Er, if you haven't got the data on the second machine, then you *don't*
> need or want that stuff in its pg_statistic.  It won't do you any good
> to have incorrect information in there.
>
>             regards, tom lane
>

Re: Inserting additional data into pg_statistics

From

Teodor Macicas

Date:

20 June 2010, 14:46:56

Sorry for being so insistent, but I need to solve this one and I think
that you can help me.
Is here anyone who faced the same situation / problem ? If it is, please
tell me how can I manually insert records into pg_statistics.

Thank you again.
Regards,
Tedy

Teodor Macicas wrote:
> Hello everyone,
>
> I'm new here, but I have a problem and couldn't find a solution so I
> hope you will help me.
> I'm trying to reproduce the pg_statistics catalog from one database
> server to another one. To retrieve the data from the first server was a
> trivial job. The trouble was when I tryied to insert new record to
> pg_statistic table on the second server.
>
> To be more precise, I want to run this [0] query and I'm struggling with
> this error [1]. The query is generated by pg_dump. Anyways, it will be
> the same problem if I write the SQL statement by hand.
> So, has anyone any ideas how can I insert new record to pg_statistics
> system catalog ?
>
> [0] INSERT INTO pg_statistic
>       VALUES (
> 54655,1,0,26,-1,2,3,0,0,1058,1058,0,0,null,'{1}',null,null,'{"AFRICA
> ","AMERICA"}',null,null,null );
>
> [1] SQL error:
>       ERROR:  cannot accept a value of type anyarray
>
>
> Thank you a lot.
> Looking forward for your replies.
> -Tedy
>
>
>

Re: Inserting additional data into pg_statistics

From

Teodor Macicas

Date:

21 June 2010, 08:08:35

---

Hi Tom,
    Modifying the pg_statistics is not a good idea for most practical purposes. The modification, however, becomes a
necessityto implement automatic physical design techniques. We are developing an automatic physical designer for
Postgres.The designer will add features that most commercial systems provide right now, such as automatically selecting
indexesfor queries. My colleagues recently demonstrated a prototype version of the system at SIGMOD, and the demo
descriptioncan be found at http://www.cs.cmu.edu/~ddash/parinda-sigmod.pdf 

    We want to extend the system by doing the physical design outside the production database, and hence need to
replicatethe pg_statistics of the production database in another standing database. This is the reason, we would like
tomove the pg_statistics across the database, and both direct sql/pg_dump-restore mechanisms fail us. 

-Dash Debabrata


Tom Lane wrote:
> Teodor Macicas <teodor.macicas@epfl.ch> writes:
>
>> Why I can't ? And for my purpose is not a bad idea. I mean, I have to do
>> this and somehow I should find a solution.
>>
>
>
>> In order to use ANALYZE I need the same data on 2nd machine, but the
>> data is quite large and the only information I need are the statistics
>> from pg_statistic.
>>
>
> Er, if you haven't got the data on the second machine, then you *don't*
> need or want that stuff in its pg_statistic.  It won't do you any good
> to have incorrect information in there.
>
>             regards, tom lane
>

Re: Inserting additional data into pg_statistics

From

Cédric Villemain

Date:

21 June 2010, 10:14:08

2010/6/21 Teodor Macicas <teodor.macicas@epfl.ch>:
> ---
> Hi Tom,         Modifying the pg_statistics is not a good idea for most
> practical purposes. The modification, however, becomes a necessity to
> implement automatic physical design techniques. We are developing an
> automatic physical designer for Postgres. The designer will add features
> that most commercial systems provide right now, such as automatically
> selecting indexes for queries. My colleagues recently demonstrated a
> prototype version of the system at SIGMOD, and the demo description can be
> found at http://www.cs.cmu.edu/~ddash/parinda-sigmod.pdf
>
>        We want to extend the system by doing the physical design outside the
> production database, and hence need to replicate the pg_statistics of the
> production database in another standing database. This is the reason, we
> would like to move the pg_statistics across the database, and both direct
> sql/pg_dump-restore mechanisms fail us.

If not already there, watch how to hook the statistics when they are
used/requested in the query planner, not modifying system catalog. So
you can provide false stats to the planner....stats that you can store
in another table, not in the pg_catalog.

It looks to me that you are doing something similar to that :
http://www.pgcon.org/2010/schedule/events/233.en.html (your REF 7)
but with the 'offline' option, right ?

May I suggest you to read on 'segment exclusion'  idea in the
postgresql wiki ? http://wiki.postgresql.org/wiki/Segment_Exclusion

....sometime....

I am pretty sure the hooks for stats are not there, but ... if you
provide a (good) way to hook them without performance impact when the
hook is not used, that should be good for more than only your project.


>
> -Dash Debabrata
>
>
> Tom Lane wrote:
>>
>> Teodor Macicas <teodor.macicas@epfl.ch> writes:
>>
>>>
>>> Why I can't ? And for my purpose is not a bad idea. I mean, I have to do
>>> this and somehow I should find a solution.
>>>
>>
>>
>>>
>>> In order to use ANALYZE I need the same data on 2nd machine, but the data
>>> is quite large and the only information I need are the statistics from
>>> pg_statistic.
>>>
>>
>> Er, if you haven't got the data on the second machine, then you *don't*
>> need or want that stuff in its pg_statistic.  It won't do you any good
>> to have incorrect information in there.
>>
>>                        regards, tom lane
>>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: Inserting additional data into pg_statistics

From

Tom Lane

Date:

21 June 2010, 11:23:21

Teodor Macicas <teodor.macicas@epfl.ch> writes:
>     Modifying the pg_statistics is not a good idea for most
> practical purposes.

That's what I've been telling you.

>     We want to extend the system by doing the physical design
> outside the production database, and hence need to replicate the
> pg_statistics of the production database in another standing database.

Well, leaving aside the question of whether that's actually anywhere
near useful enough to justify the work, I'd *still* not support putting
the information into the second database's pg_statistic.  pg_statistic
should contain the truth for that database's own tables.  Seems like
what you need here is a second table along the lines of
pg_hypothetical_statistic, and then your planner hacks can include the
knowledge to look there instead of pg_statistic when doing hypothetical
planning.

Not that that's going to solve your immediate problem: there just isn't
any way at the SQL level to insert data into pg_statistic's anyarray
columns.  You're going to need some specialized C function that inserts
the data, hopefully only after validating that the actual array type
matches the column that the stats are alleged to be for.

            regards, tom lane

Re: Inserting additional data into pg_statistics

From

Adrian Klaver

Date:

21 June 2010, 11:34:39

On Monday 21 June 2010 7:23:06 am Tom Lane wrote:
> Teodor Macicas <teodor.macicas@epfl.ch> writes:
> >     Modifying the pg_statistics is not a good idea for most
> > practical purposes.
>
> That's what I've been telling you.
>
> >     We want to extend the system by doing the physical design
> > outside the production database, and hence need to replicate the
> > pg_statistics of the production database in another standing database.
>
> Well, leaving aside the question of whether that's actually anywhere
> near useful enough to justify the work, I'd *still* not support putting
> the information into the second database's pg_statistic.  pg_statistic
> should contain the truth for that database's own tables.  Seems like
> what you need here is a second table along the lines of
> pg_hypothetical_statistic, and then your planner hacks can include the
> knowledge to look there instead of pg_statistic when doing hypothetical
> planning.
>
> Not that that's going to solve your immediate problem: there just isn't
> any way at the SQL level to insert data into pg_statistic's anyarray
> columns.  You're going to need some specialized C function that inserts
> the data, hopefully only after validating that the actual array type
> matches the column that the stats are alleged to be for.
>
>             regards, tom lane

Another idea that just came to mind is to use something like

dblink
http://www.postgresql.org/docs/current/static/dblink.html

or
dbi-link
http://pgfoundry.org/projects/dbi-link

to see the information from the production db in the second db.

--
Adrian Klaver
adrian.klaver@gmail.com