Thread: Inserting additional data into pg_statistics
Hello everyone, I'm new here, but I have a problem and couldn't find a solution so I hope you will help me. I'm trying to reproduce the pg_statistics catalog from one database server to another one. To retrieve the data from the first server was a trivial job. The trouble was when I tryied to insert new record to pg_statistic table on the second server. To be more precise, I want to run this [0] query and I'm struggling with this error [1]. The query is generated by pg_dump. Anyways, it will be the same problem if I write the SQL statement by hand. So, has anyone any ideas how can I insert new record to pg_statistics system catalog ? [0] INSERT INTO pg_statistic VALUES ( 54655,1,0,26,-1,2,3,0,0,1058,1058,0,0,null,'{1}',null,null,'{"AFRICA ","AMERICA"}',null,null,null ); [1] SQL error: ERROR: cannot accept a value of type anyarray Thank you a lot. Looking forward for your replies. -Tedy
Teodor Macicas <teodor.macicas@epfl.ch> writes: > I'm trying to reproduce the pg_statistics catalog from one database > server to another one. You can't really, and manually fooling with system catalogs is usually a bad idea anyway. Just run ANALYZE. regards, tom lane
Thank you for your reply, Tom. Why I can't ? And for my purpose is not a bad idea. I mean, I have to do this and somehow I should find a solution. In order to use ANALYZE I need the same data on 2nd machine, but the data is quite large and the only information I need are the statistics from pg_statistic. So, why I receive that ERROR and I run the INSERT statement ? Can anyone tell me where to look into Postgresql source to see how the engine updates this table ? Looking forward for reply. With kindest regards, Tedy Tom Lane wrote: > Teodor Macicas <teodor.macicas@epfl.ch> writes: > >> I'm trying to reproduce the pg_statistics catalog from one database >> server to another one. >> > > You can't really, and manually fooling with system catalogs is usually > a bad idea anyway. Just run ANALYZE. > > regards, tom lane >
Teodor Macicas <teodor.macicas@epfl.ch> writes: > Why I can't ? And for my purpose is not a bad idea. I mean, I have to do > this and somehow I should find a solution. > In order to use ANALYZE I need the same data on 2nd machine, but the > data is quite large and the only information I need are the statistics > from pg_statistic. Er, if you haven't got the data on the second machine, then you *don't* need or want that stuff in its pg_statistic. It won't do you any good to have incorrect information in there. regards, tom lane
Hi Tom, Thank you again for your quick reply. Shall we pass the reason for I want to do this ? I can explain, but it's a long story. The idea is that *for sure* I need those metadata there. I want to use them on the second machine. Regards, Tedy Tom Lane wrote: > Teodor Macicas <teodor.macicas@epfl.ch> writes: > >> Why I can't ? And for my purpose is not a bad idea. I mean, I have to do >> this and somehow I should find a solution. >> > > >> In order to use ANALYZE I need the same data on 2nd machine, but the >> data is quite large and the only information I need are the statistics >> from pg_statistic. >> > > Er, if you haven't got the data on the second machine, then you *don't* > need or want that stuff in its pg_statistic. It won't do you any good > to have incorrect information in there. > > regards, tom lane >
Sorry for being so insistent, but I need to solve this one and I think that you can help me. Is here anyone who faced the same situation / problem ? If it is, please tell me how can I manually insert records into pg_statistics. Thank you again. Regards, Tedy Teodor Macicas wrote: > Hello everyone, > > I'm new here, but I have a problem and couldn't find a solution so I > hope you will help me. > I'm trying to reproduce the pg_statistics catalog from one database > server to another one. To retrieve the data from the first server was a > trivial job. The trouble was when I tryied to insert new record to > pg_statistic table on the second server. > > To be more precise, I want to run this [0] query and I'm struggling with > this error [1]. The query is generated by pg_dump. Anyways, it will be > the same problem if I write the SQL statement by hand. > So, has anyone any ideas how can I insert new record to pg_statistics > system catalog ? > > [0] INSERT INTO pg_statistic > VALUES ( > 54655,1,0,26,-1,2,3,0,0,1058,1058,0,0,null,'{1}',null,null,'{"AFRICA > ","AMERICA"}',null,null,null ); > > [1] SQL error: > ERROR: cannot accept a value of type anyarray > > > Thank you a lot. > Looking forward for your replies. > -Tedy > > >
--- Hi Tom, Modifying the pg_statistics is not a good idea for most practical purposes. The modification, however, becomes a necessityto implement automatic physical design techniques. We are developing an automatic physical designer for Postgres.The designer will add features that most commercial systems provide right now, such as automatically selecting indexesfor queries. My colleagues recently demonstrated a prototype version of the system at SIGMOD, and the demo descriptioncan be found at http://www.cs.cmu.edu/~ddash/parinda-sigmod.pdf We want to extend the system by doing the physical design outside the production database, and hence need to replicatethe pg_statistics of the production database in another standing database. This is the reason, we would like tomove the pg_statistics across the database, and both direct sql/pg_dump-restore mechanisms fail us. -Dash Debabrata Tom Lane wrote: > Teodor Macicas <teodor.macicas@epfl.ch> writes: > >> Why I can't ? And for my purpose is not a bad idea. I mean, I have to do >> this and somehow I should find a solution. >> > > >> In order to use ANALYZE I need the same data on 2nd machine, but the >> data is quite large and the only information I need are the statistics >> from pg_statistic. >> > > Er, if you haven't got the data on the second machine, then you *don't* > need or want that stuff in its pg_statistic. It won't do you any good > to have incorrect information in there. > > regards, tom lane >
2010/6/21 Teodor Macicas <teodor.macicas@epfl.ch>: > --- > Hi Tom, Modifying the pg_statistics is not a good idea for most > practical purposes. The modification, however, becomes a necessity to > implement automatic physical design techniques. We are developing an > automatic physical designer for Postgres. The designer will add features > that most commercial systems provide right now, such as automatically > selecting indexes for queries. My colleagues recently demonstrated a > prototype version of the system at SIGMOD, and the demo description can be > found at http://www.cs.cmu.edu/~ddash/parinda-sigmod.pdf > > We want to extend the system by doing the physical design outside the > production database, and hence need to replicate the pg_statistics of the > production database in another standing database. This is the reason, we > would like to move the pg_statistics across the database, and both direct > sql/pg_dump-restore mechanisms fail us. If not already there, watch how to hook the statistics when they are used/requested in the query planner, not modifying system catalog. So you can provide false stats to the planner....stats that you can store in another table, not in the pg_catalog. It looks to me that you are doing something similar to that : http://www.pgcon.org/2010/schedule/events/233.en.html (your REF 7) but with the 'offline' option, right ? May I suggest you to read on 'segment exclusion' idea in the postgresql wiki ? http://wiki.postgresql.org/wiki/Segment_Exclusion ....sometime.... I am pretty sure the hooks for stats are not there, but ... if you provide a (good) way to hook them without performance impact when the hook is not used, that should be good for more than only your project. > > -Dash Debabrata > > > Tom Lane wrote: >> >> Teodor Macicas <teodor.macicas@epfl.ch> writes: >> >>> >>> Why I can't ? And for my purpose is not a bad idea. I mean, I have to do >>> this and somehow I should find a solution. >>> >> >> >>> >>> In order to use ANALYZE I need the same data on 2nd machine, but the data >>> is quite large and the only information I need are the statistics from >>> pg_statistic. >>> >> >> Er, if you haven't got the data on the second machine, then you *don't* >> need or want that stuff in its pg_statistic. It won't do you any good >> to have incorrect information in there. >> >> regards, tom lane >> > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Teodor Macicas <teodor.macicas@epfl.ch> writes: > Modifying the pg_statistics is not a good idea for most > practical purposes. That's what I've been telling you. > We want to extend the system by doing the physical design > outside the production database, and hence need to replicate the > pg_statistics of the production database in another standing database. Well, leaving aside the question of whether that's actually anywhere near useful enough to justify the work, I'd *still* not support putting the information into the second database's pg_statistic. pg_statistic should contain the truth for that database's own tables. Seems like what you need here is a second table along the lines of pg_hypothetical_statistic, and then your planner hacks can include the knowledge to look there instead of pg_statistic when doing hypothetical planning. Not that that's going to solve your immediate problem: there just isn't any way at the SQL level to insert data into pg_statistic's anyarray columns. You're going to need some specialized C function that inserts the data, hopefully only after validating that the actual array type matches the column that the stats are alleged to be for. regards, tom lane
On Monday 21 June 2010 7:23:06 am Tom Lane wrote: > Teodor Macicas <teodor.macicas@epfl.ch> writes: > > Modifying the pg_statistics is not a good idea for most > > practical purposes. > > That's what I've been telling you. > > > We want to extend the system by doing the physical design > > outside the production database, and hence need to replicate the > > pg_statistics of the production database in another standing database. > > Well, leaving aside the question of whether that's actually anywhere > near useful enough to justify the work, I'd *still* not support putting > the information into the second database's pg_statistic. pg_statistic > should contain the truth for that database's own tables. Seems like > what you need here is a second table along the lines of > pg_hypothetical_statistic, and then your planner hacks can include the > knowledge to look there instead of pg_statistic when doing hypothetical > planning. > > Not that that's going to solve your immediate problem: there just isn't > any way at the SQL level to insert data into pg_statistic's anyarray > columns. You're going to need some specialized C function that inserts > the data, hopefully only after validating that the actual array type > matches the column that the stats are alleged to be for. > > regards, tom lane Another idea that just came to mind is to use something like dblink http://www.postgresql.org/docs/current/static/dblink.html or dbi-link http://pgfoundry.org/projects/dbi-link to see the information from the production db in the second db. -- Adrian Klaver adrian.klaver@gmail.com