Thread: pg_advisor schema proof of concept
Dear hackers, please find attached a quick proof of concept for a 'pg_advisor' schema. Well, the name is 'xpg_advisor' at the time, because it is not a system schema hence it cannot starts with 'pg_'. It appears that some support functions would be useful. I've noticed some are available from pg_catalog, but I have not found yet what I was looking for. If you do not like some advices, just "DROP VIEW the_advice;" If you do not like advices at all, just "DROP SCHEMA xpg_advisor;" Have a nice day, -- Fabien Coelho - coelho@cri.ensmp.fr
> please find attached a quick proof of concept for a 'pg_advisor' schema. Here is a "larger" but nevertheless still quick proof of concept, alas with some buggy PL/pgSQL that I wrote with my little finger. It implements some foreign key type checks for which I submitted be patches some time ago. The more I think about it, the more I find it should be the way to go, rather than having a new external tool. -- Fabien Coelho - coelho@cri.ensmp.fr
Hello hackers, > please find attached a quick proof of concept for a 'pg_advisor' schema. I'm still pushing my agenda, despite lack of reaction on the list;-) I had time this week-end to improve my current 'pg_advisor' prototype schema. This new version is now less a proof of concept and more a preliminary implementation for discussion. If you want to test on an existing database, the scripts only creates an additionnal schema which may be removed quite simply. (1) if necessary: sh> createlang -d mybase plpgsql (2) sh> psql mybase < pg_catalog.sql (3) use: psql mybase> SELECT * FROM xpg_catalog.??????; (4) clean: psql mybase> DROP SCHEMA xpg_catalog CASCADE; also if (1) sh> droplang -d mybase plpgsql Some thoughts and questions about a "pg_advisor" schema design: (1) should it use pg_catalog.* or information_schema.*? - is portability desirable? - my initial version is based on pg_catalog.- information_schema could make it more portable? well, I'm not sure it would do the job. I need to know whatare the system schemas, and it is likely that this would differ? what about support functions? - should it be compatiblewith old versions of postgreSQL? if yes, what about support functions? (2) advices should be associated: - a kind (design/model, performance... what else?) - a severity (info, notice, warning,error... others? different?) - a title - an abstract - a description with examples - what about a "subject", suchas "referencial integrity" or "index"... if so, what could be the sujects? or maybe it is not needed? - should we usethe COMMENT infrastructure for that? I don't think so, but it could be done. (3) needed support function - should be added to pg_catalog? implemented in C? - can we use plpgsql? SQL? others? I wouldtry to avoid anything other that pg_catalog and sql functions, but I needed to add several functions that were missing. (4) advices implementations. - I implemented 11 basic "design" advices at the time. I tested them with existing databases,and I'm pretty happy with the result: I had very few comments on "good" design/model, and a lot of warningsor notice on badly designed tables. - what other "design" advices would be useful? how to grade them (from infoto error)? . "cross schema contraints/tables"? - what about "performance" advices? what support functions are usefulfor those? - others? (5) documentation - should include design notes for new advices? - how to make things more modular? - let us use commentsabout every view and columns... - how to 'localise' pg_advisor? a more general issue is how to 'localise' COMMENTS. (6) possible inclusion in postgresql? - among other contributions? what about contrib/advisor? - added to template1 on defaultinstallation? maybe not for a first release? or yes? it is easier to communicate about Have a nice day, -- Fabien Coelho - coelho@cri.ensmp.fr
On Monday 22 March 2004 09:38, Fabien COELHO wrote: > Hello hackers, > > > please find attached a quick proof of concept for a 'pg_advisor' schema. > > I'm still pushing my agenda, despite lack of reaction on the list;-) > I had time this week-end to improve my current 'pg_advisor' > prototype schema. Had a look, and it seems good to me - pretty much what I was thinking of. > This new version is now less a proof of concept and more a preliminary > implementation for discussion. > Some thoughts and questions about a "pg_advisor" schema design: > > (1) should it use pg_catalog.* or information_schema.*? > - is portability desirable? > - my initial version is based on pg_catalog. > - information_schema could make it more portable? Not sure portability is important, but using information_schema will presumably make it less likely that things will change between versions. > well, I'm not sure it would do the job. I need to know what are the > system schemas, and it is likely that this would differ? what about > support functions? > - should it be compatible with old versions of postgreSQL? > if yes, what about support functions? Not sure it's worth the trouble to support 7.3, and anything below that is going to be a lot of work. > (2) advices should be associated: > - a kind (design/model, performance... what else?) > - a severity (info, notice, warning, error... others? different?) > - a title > - an abstract > - a description with examples > - what about a "subject", such as "referencial integrity" or "index"... > if so, what could be the sujects? or maybe it is not needed? Might be useful to be able to run all relevant tests against a single table, especially if we end up with lots of tests. > - should we use the COMMENT infrastructure for that? > I don't think so, but it could be done. No - I think the separate table (advice_classification) is right. > (3) needed support function > - should be added to pg_catalog? implemented in C? > - can we use plpgsql? SQL? others? > I would try to avoid anything other that pg_catalog and sql functions, > but I needed to add several functions that were missing. If plpgsql works OK, I say stick with it. > (4) advices implementations. > - I implemented 11 basic "design" advices at the time. > I tested them with existing databases, and I'm pretty happy > with the result: I had very few comments on "good" design/model, > and a lot of warnings or notice on badly designed tables. Actually picked up a genuine mistake on one of my databases (mismatched pkey<=>fkey sizes). It's been worth the money already :-) > - what other "design" advices would be useful? > how to grade them (from info to error)? Probably a matter of opinion. It'll give people something to argue about, anyway. > . "cross schema contraints/tables"? > - what about "performance" advices? Well, I can see how you could examine the stats tables, but you'd probably need to be able to see the queries too. > what support functions are useful for those? > - others? > > (5) documentation > - should include design notes for new advices? I think so. > - how to make things more modular? We probably need a good list of tests before deciding what to make into "libraries" > - let us use comments about every view and columns... > - how to 'localise' pg_advisor? > a more general issue is how to 'localise' COMMENTS. Not sure we want any of the text in the comments. Put all the messages/titles in a description table like you already have and people can translate the text in that file. > (6) possible inclusion in postgresql? > - among other contributions? what about contrib/advisor? > - added to template1 on default installation? > maybe not for a first release? or yes? it is easier to communicate > about I think we're going to want a gborg project for developing/coordinating tests anyway. Having the schema included in contrib/ might help adoption, but so would pgadmin/phpgadmin. Any client-builders reading this? What do you think? -- Richard Huxton Archonet Ltd
Dear Richard, > > (1) should it use pg_catalog.* or information_schema.*? > > Not sure portability is important, but using information_schema will > presumably make it less likely that things will change between versions. Another issue I found is that, although all the contents of information_schema can be found in pg_catalog (as it derives from it!) not all of pg_catalog may be found in information_schema... In particular, for "performance" advices about indexes, operators and casts, I'm not sure the all information is available in information_schema, from the quick look I had about it. > Might be useful to be able to run all relevant tests against a single > table, especially if we end up with lots of tests. That could be done quite easily, I've added a feature in my working version about which schemas should be tested. It is easy to have both a schema/table names and to be able to filter those of interest to the user. I'll resend later an updated version for discussion. > If plpgsql works OK, I say stick with it. Hmmm. I'm not very happy with plpgsql, as I had an infinite recursion which is partly due to plpgsql, partly to a very bad plan by the optimiser, and partly to the fact that I want to do strange things with tables querying about tables, so it goes bad quickly if the table starts querying about itself to count it's own lines:-) > > (4) advices implementations. > > - I implemented 11 basic "design" advices at the time. [...] > > Actually picked up a genuine mistake on one of my databases (mismatched > pkey<=>fkey sizes). It's been worth the money already :-) That is a point. The other question is how many "false positive". That's why I put a grade, for things that are matters of opinions, as you say... So that "controversial" advices can be downgraded to notice or info. > > - what about "performance" advices? > > Well, I can see how you could examine the stats tables, but you'd probably > need to be able to see the queries too. I was thinking along the kind of missing index Tom was arguing about for RI checks, that may be helped if an appropriate index is available. I'm not sure what could be done, even with the query, in the general case. How to guess what index would help make a better plan? It depends on the optimiser itself, on what kind of indexes could be built, and so on. That's more human expect work than tool work. > > - let us use comments about every view and columns... > > - how to 'localise' pg_advisor? > > a more general issue is how to 'localise' COMMENTS. > > Not sure we want any of the text in the comments. Put all the > messages/titles in a description table like you already have and people > can translate the text in that file. Ok. but the system should be able to store several locales. I guess it is possible to know about the current locale within SQL, e.g. by querying lc_message in pg_settings for instance. > > (6) possible inclusion in postgresql? > > I think we're going to want a gborg project for developing/coordinating > tests anyway. Why not. How much work in the infrastructure ? What would be the added value ? better communication ? > Having the schema included in contrib/ might help adoption, > but so would pgadmin/phpgadmin. Sure. Thanks a lot for your comments. I'll "submitted" an updated version later. -- Fabien Coelho - coelho@cri.ensmp.fr
Fabien COELHO <coelho@cri.ensmp.fr> writes: >>> (1) should it use pg_catalog.* or information_schema.*? >> >> Not sure portability is important, but using information_schema will >> presumably make it less likely that things will change between versions. > Another issue I found is that, although all the contents of > information_schema can be found in pg_catalog (as it derives from it!) not > all of pg_catalog may be found in information_schema... This is necessarily so, as the information_schema by definition covers only concepts standardized by the SQL spec. Since the SQL spec considers things like indexes to be implementation details, it is simply not possible for information_schema to tell you everything you want to know to give performance advice. >> If plpgsql works OK, I say stick with it. > Hmmm. I'm not very happy with plpgsql, I don't know where you are planning on going with this. If it's only to be a contrib tool, it's okay to depend on plpgsql. But we couldn't incorporate it into the base system because plpgsql isn't part of the base system. regards, tom lane
Dear Tom, > This is necessarily so, as the information_schema by definition covers > only concepts standardized by the SQL spec. Since the SQL spec > considers things like indexes to be implementation details, it is simply > not possible for information_schema to tell you everything you want to > know to give performance advice. Well, it makes sense. As pg_catalog will be necessary for some advices, let us avoid "information_schema" for a greater homogeneity. > >> If plpgsql works OK, I say stick with it. > > > Hmmm. I'm not very happy with plpgsql, > > I don't know where you are planning on going with this. If it's only to > be a contrib tool, it's okay to depend on plpgsql. But we couldn't > incorporate it into the base system because plpgsql isn't part of the > base system. Well, the ultimate status of the tool basically depends on the patchers ("we" above) decision;-) If you veto the inclusion of advisor stuff into the base system because you do not want it there anyway, which may be perfectly legitimate, then I would not bother to port the plpgsql stuff just for the fun of it. On the otherhand, if you would be ready to consider it for inclusion in the base system some day, provided that the quality is fine and that there is no plpgsql in it, then it would make sense to discuss needed functions to be added to the base system. The current "preliminary" implementation requires plpgsql for : - array_index (find index of item in array, to deal with pg_constraint attribute lists) - array_ceq (whether two arrays contains the same values, possibly in a different order, idem) - count_tuples (count the number of tuples in a relation) I think these functions could be included in the base system, anyway. As for "performance advices", such as missing indexes for ri check that you suggested, I don't know. Some functions that already exists in the backend would be welcome to be called from sql, such as selecting an "=" operator variant given the oid of the expected types... but maybe they can be developped within SQL (i.e. without plpgsql). I haven't looked at it yet. As for what is not foreseen yet, who knows? ;-) Have a nice day, -- Fabien Coelho - coelho@cri.ensmp.fr
On Wednesday 24 March 2004 15:52, Tom Lane wrote: > > >> If plpgsql works OK, I say stick with it. > > > > Hmmm. I'm not very happy with plpgsql, > > I don't know where you are planning on going with this. If it's only to > be a contrib tool, it's okay to depend on plpgsql. But we couldn't > incorporate it into the base system because plpgsql isn't part of the > base system. So Tom, are you suggesting:1. A core in the base distribution (C / SQL)2. command-line tool in the base distro (pg_advisor)3.more open project (gborg?) to let people design/add tests, some of which will eventually end up in the standard set in the base distro. -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > On Wednesday 24 March 2004 15:52, Tom Lane wrote: >> I don't know where you are planning on going with this. If it's only to >> be a contrib tool, it's okay to depend on plpgsql. But we couldn't >> incorporate it into the base system because plpgsql isn't part of the >> base system. > So Tom, are you suggesting: > 1. A core in the base distribution (C / SQL) > 2. command-line tool in the base distro (pg_advisor) > 3. more open project (gborg?) to let people design/add tests, some of which > will eventually end up in the standard set in the base distro. I'm not suggesting anything ;-). Just pointing out a constraint that might affect you guys' choices of where to go with this. There has been some talk of installing plpgsql by default, in which case the constraint would vanish anyway. So I wouldn't put a huge amount of emphasis on it at this stage. I just wanted to point it out so you wouldn't paint yourselves into a corner without realizing it. regards, tom lane
On Wednesday 24 March 2004 18:02, Tom Lane wrote: > Richard Huxton <dev@archonet.com> writes: > > So Tom, are you suggesting: > > 1. A core in the base distribution (C / SQL) > > 2. command-line tool in the base distro (pg_advisor) > > 3. more open project (gborg?) to let people design/add tests, some of > > which will eventually end up in the standard set in the base distro. > > I'm not suggesting anything ;-). Just pointing out a constraint that > might affect you guys' choices of where to go with this. The direction (and it's all Fabien's work so far) will depend on where it will live. That depends on whether anyone else thinks it's worthwhile. So howabout some feedback on Fabian's work people? -- Richard Huxton Archonet Ltd
>>(6) possible inclusion in postgresql? >> - among other contributions? what about contrib/advisor? >> - added to template1 on default installation? >> maybe not for a first release? or yes? it is easier to communicate >> about > > I think we're going to want a gborg project for developing/coordinating tests > anyway. Having the schema included in contrib/ might help adoption, but so > would pgadmin/phpgadmin. Any client-builders reading this? What do you think? Both phpPgAdmin (me) and the pgAdmin team have added or have thought about adding some 'schema analysis' features to our products. If pg_advisor is available, I certainly won't bother and I will just recommend to people that they install it. I think it probably should live in userland... Chris
> I was thinking along the kind of missing index Tom was arguing about > for RI checks, that may be helped if an appropriate index is available. > > I'm not sure what could be done, even with the query, in the general case. > How to guess what index would help make a better plan? It depends > on the optimiser itself, on what kind of indexes could be built, and so > on. That's more human expect work than tool work. Also, if they have a partial index on the FK, it's not good enough! In CVS, IS NOT NULL partial indexes should be used, but in general all others still won't... Chris
Christopher Kings-Lynne wrote: >>> (6) possible inclusion in postgresql? >>> - among other contributions? what about contrib/advisor? >>> - added to template1 on default installation? >>> maybe not for a first release? or yes? it is easier to communicate >>> about >> >> >> I think we're going to want a gborg project for >> developing/coordinating tests anyway. Having the schema included in >> contrib/ might help adoption, but so would pgadmin/phpgadmin. Any >> client-builders reading this? What do you think? > > > Both phpPgAdmin (me) and the pgAdmin team have added or have thought > about adding some 'schema analysis' features to our products. If > pg_advisor is available, I certainly won't bother and I will just > recommend to people that they install it. > > I think it probably should live in userland... Yeah, this should live in userland. Maybe this could be implemented as set of some descriptions, which is interpreted by a standalone tool, or interpreted by the gui tools available. This way, we could include a set of them into the admin tool distributions, ensuring a basic set is noticed by the admins (subject to update from contrib). Currently, a check for old style fk triggers is hard-coded into pgadmin3 (to detect missing adddepend), because fk triggers are considered internal and thus suppressed. There are plans (and basic work) for a FK index tool, which wouldn't be obsolete if a pg_advisor would detect it because it's intended to have a checkbox "fix this" in the list of detected fks. Regards, Andreas
Fabien, Christopher: It would be nice for pgAdmin & PhpPgAdmin to have GUI interfaces to pg_advisor, though. Also, I would argue for this to be a GBorg/pgFoundry project rather than part of the core. It's the sort of thing that could easily be database-version agnostic, and that SQL jockeys who are not Hackers could contribute to. > Also, if they have a partial index on the FK, it's not good enough! In > CVS, IS NOT NULL partial indexes should be used, but in general all > others still won't... Whoa, there, partner! Keep in mind that there are *often* reasons for using a partial index on an FK, or even no index at all! The docs for pg_advisor need to reflect that it only catches little details the developer might otherwise have missed. It's not smarter than a DBA. -- Josh Berkus Aglio Database Solutions San Francisco
> > Also, if they have a partial index on the FK, it's not good enough! In > > CVS, IS NOT NULL partial indexes should be used, but in general all > > others still won't... > > Whoa, there, partner! Keep in mind that there are *often* reasons for using > a partial index on an FK, or even no index at all! The docs for pg_advisor > need to reflect that it only catches little details the developer might > otherwise have missed. It's not smarter than a DBA. Sure. That's why advices are "graded" from info to error in the current preliminary version. Advices that may or may not be good depending on undecidable elements have a lower grade. For instance, most attributes should be "NOT NULL" from a statistical point of view, but it is perfectly legitimate to have nullable attributes mostly anywhere, so the corresponding advices is just an "info". -- Fabien Coelho - coelho@cri.ensmp.fr
On Thu, 2004-03-25 at 11:31, Fabien COELHO wrote: > > > > Also, if they have a partial index on the FK, it's not good enough! In > > > CVS, IS NOT NULL partial indexes should be used, but in general all > > > others still won't... > > > > Whoa, there, partner! Keep in mind that there are *often* reasons for using > > a partial index on an FK, or even no index at all! The docs for pg_advisor > > need to reflect that it only catches little details the developer might > > otherwise have missed. It's not smarter than a DBA. > > Sure. > > That's why advices are "graded" from info to error in the current > preliminary version. > > Advices that may or may not be good depending on undecidable elements > have a lower grade. For instance, most attributes should be "NOT NULL" > from a statistical point of view, but it is perfectly legitimate to > have nullable attributes mostly anywhere, so the corresponding advices > is just an "info". > Are you planning on making some type of differentiation on advise that is performance based rather than advise that is theory based? I see both cases being hinted at and it seems like a subtle but important piece of information... Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Dear Josh, > > That's why advices are "graded" from info to error in the current > > preliminary version. > > > Advices that may or may not be good depending on undecidable elements > > have a lower grade. For instance, most attributes should be "NOT NULL" > > from a statistical point of view, but it is perfectly legitimate to > > have nullable attributes mostly anywhere, so the corresponding advices > > is just an "info". > > Are you planning on making some type of differentiation on advise that > is performance based rather than advise that is theory based? I see > both cases being hinted at and it seems like a subtle but important > piece of information... The current working status is that advices have a grade (info notice warning error) and a kind (design, performance, meta). More precise and subtle classification can be though of, but the interest depends on the total number of advices in the system. Now there is around a dozen of them, so there is no urge. It is easy to add some more classification if needed. Another place where such information can be given is within the description which illustrate the advice. I think maybe this would be a better place. > LAMP = Linux Apache {middleware} Postgres [JOKE] What about renaming postgreSQL myPostgres? ;-) -- Fabien Coelho - coelho@cri.ensmp.fr
On Thursday 25 March 2004 21:59, Robert Treat wrote: > On Thu, 2004-03-25 at 11:31, Fabien COELHO wrote: > > Are you planning on making some type of differentiation on advise that > is performance based rather than advise that is theory based? I see > both cases being hinted at and it seems like a subtle but important > piece of information... Fabien already has - there is an "advice_kind" table, values: misc, design, performance. -- Richard Huxton Archonet Ltd
Hello, > Both phpPgAdmin (me) and the pgAdmin team have added or have thought > about adding some 'schema analysis' features to our products. If > pg_advisor is available, I certainly won't bother and I will just > recommend to people that they install it. Ok. A more precise question is: on the client side, whether PHP or anything else, can you take advantage of the information available and provide some usable somehow "dedicated" interface that would make it easy to access the available informations? What would help for that purpose? The current proposal is that the advices would be tables in a schema, so just by browsing the tables one can access advices. However, the tables contain the advice data, but explanations about what these advices mean are in another table. So maybe there is an interface job that would be welcome to show both the explanations and the data of interest for these explanations? Just like in psql "\*" shortcuts query about pg_tables to show informations. -- Fabien.
Fabien COELHO wrote: >Ok. > >A more precise question is: on the client side, whether PHP or anything >else, can you take advantage of the information available and provide some >usable somehow "dedicated" interface that would make it easy to access the >available informations? What would help for that purpose? > >The current proposal is that the advices would be tables in a schema, so >just by browsing the tables one can access advices. However, the tables >contain the advice data, but explanations about what these advices mean >are in another table. > >So maybe there is an interface job that would be welcome to show both the >explanations and the data of interest for these explanations? Just like >in psql "\*" shortcuts query about pg_tables to show informations. > > No problem, as long as referencing data is contained in the advice tables (i.e. referencing the 'offending' object), not just text so the advice can be shown as attribute of each object. Regards, Andreas
Hello Andreas, > No problem, as long as referencing data is contained in the advice > tables (i.e. referencing the 'offending' object), not just text so the > advice can be shown as attribute of each object. What do you mean by 'referencing data'? Things like oid attributes referencing pg_class or pg_constraint or pg_index? -- Fabien Coelho - coelho@cri.ensmp.fr
Hi Fabien, Fabien COELHO wrote: >Hello Andreas, > > > >>No problem, as long as referencing data is contained in the advice >>tables (i.e. referencing the 'offending' object), not just text so the >>advice can be shown as attribute of each object. >> >> > >What do you mean by 'referencing data'? >Things like oid attributes referencing pg_class or pg_constraint or >pg_index? > > in a way. If you're advising about e.g. a table, the row should reference that table by oid. This way, admin tools can LEFT JOIN the advisory table to display that data. Regards, Andreas