Thread: Can I use PostgreSQL to develop a self-organizing database system?
Hi, at NC State University, my students and I are working on a project called "self-organizing databases," please see description below. I would like to use an open-source database system for implementation and would really appreciate your opinion on whether PostgreSQL is suitable for the project. In general, I am very impressed by the quality of PostgreSQL code and documentation, as well as by the support of the developer community. For the project, I need a cost-based query optimizer with exhaustive join enumeration and use of statistics on stored relations; PostgreSQL has that. I also need the ability to process SQL queries with aggregation, extensive indexing capabilities, view mechanisms, and possibly integrity constraints; it seems that PostgreSQL has all that. We will modify the query optimizer to incorporate rewriting queries using views, and we will create view-generating and view-manipulating modules. Please let me know if you have comments. Sincerely, Rada Chirkova ====================== Self-Organizing Databases The goal of this project is to develop new effective methods to improve the performance of sets of frequent and important queries on large relational databases at all times, which could improve the efficiency of user interactions with data-management systems. Solving the problem will have the most effect in query optimization, data warehousing, and information integration, which are important research topics with direct practical applications. The project focuses on the methodology of evaluating queries using views; views are relations that are defined by auxiliary queries and can be used to rewrite and answer user queries. One way to improve query performance is precompute and store (i.e., materialize) views. To truly optimize query performance, it is critical to materialize the "right" views. The current focus of the project is on demonstrating that, by designing and materializing views, it is possible to ensure optimal or near-optimal performance of frequent and important queries, for common and important query types. We consider this problem in the broader context of designing self-organizing databases: A self-organizing database periodically determines, without human intervention, a representative set of frequent and important queries on the data, and incrementally designs and precomputes the optimal (or near-optimal) views for that representative query workload. As the representative query workload and the stored data change over time, self-organizing databases adapt to the changes by changing the set of materialized views that are used to improve the query-answering performance in the database. For building self-organizing databases, we consider an end-to-end solution that is, we consider all aspects of handling and using views, including: · designing and materializing views and indexes to improve query performance; · exploring the effects of materialized views on the process of query optimization; · adapting view design to the changing query workload, including the process of retiring views that are no longer useful; · developing methods for auomatically updating existing materialized views over time, to reflect the changes in the stored data; · developing methods to collect database statistics to reliably estimate the sizes of the views the system considers for materialization; · analyzing the use of system resources and allocating an appropriate amount of resources to view management in the system.
"Rada Chirkova" <chirkova@csc.ncsu.edu> writes: > at NC State University, my students and I are working on a project called > "self-organizing databases," please see description below. I would like to > use an open-source database system for implementation and would really > appreciate your opinion on whether PostgreSQL is suitable for the project. FWIW, I think Postgres will do very well as a base for your project. But some might say I'm biased ;-) Postgres was originally designed and built as a research tool, and it still bears obvious traces of that heritage, even though the current developers are almost single-mindedly focused on turning it into a production tool. You might find this entertaining: http://archives.postgresql.org/pgsql-hackers/2002-06/msg00085.php I'm not personally convinced that materialized views would be a great thing to add to Postgres, but I look forward to being proven wrong. Please keep us posted on your results. regards, tom lane PS: BTW, pgsql-hackers is probably a better list than pgsql-general for discussing research work with Postgres.
> Please let me know if you have comments. > > The project focuses on the methodology of evaluating queries using views; > views are relations that are defined by auxiliary queries and can be used to > rewrite and answer user queries. One way to improve query performance is > precompute and store (i.e., materialize) views. > self-organizing -> automagically materializing the "right" views I also use views to simplify application code by moving complicated queries onto the backend. I wonder what, if any, place this aspect of using views has in your project. Thanks, Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Mon, 2003-09-08 at 15:22, Karsten Hilbert wrote: [snip] > I also use views to simplify application code by moving > complicated queries onto the backend. I wonder what, if any, Are you referring to Access' tendency to do the joins on the client? -- ----------------------------------------------------------------- Ron Johnson, Jr. ron.l.johnson@cox.net Jefferson, LA USA "What other evidence do you have that they are terrorists, other than that they trained in these (terrorist training) camps?" 17-Sep-2002 Katie Couric to an FBI agent regarding the 5 men arrested near Buffalo NY
Ron Johnson <ron.l.johnson@cox.net> writes: > On Mon, 2003-09-08 at 15:22, Karsten Hilbert wrote: > > > I also use views to simplify application code by moving > > complicated queries onto the backend. I wonder what, if any, > > Are you referring to Access' tendency to do the joins on the client? No, I think he doesn't mean move execution to the backend, just lexically hoisting the complex repeated code out of the front-end code. A self-organizing materialized view system might be able to use normal non-materialized views as hints to help it find useful views to materialize. It also might help the programmer avoid implementing functionally equivalent but non-identical forms of the same view in different queries, defeating the automated recognition of similar views. Materialized views are a godsend for DSS systems, and self-organized materialized views sound too good to be true. Is this really feasible? It sounds awfully ambitious considering postgres doesn't even have materialized views yet. -- greg
On Mon, Sep 08, 2003 at 05:55:43PM -0400, Greg Stark wrote: > Materialized views are a godsend for DSS systems, and self-organized > materialized views sound too good to be true. Is this really feasible? > It sounds awfully ambitious considering postgres doesn't even have > materialized views yet. There is also the question of how would one actually use self-organizing views... what would be the syntax for them? Should the parser recognize that some join you are giving to it is a previously self-organized view and contract the sentence? -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Puedes vivir solo una vez, pero si lo haces bien, una vez es suficiente"
>> I also use views to simplify application code by moving >> complicated queries onto the backend. I wonder what, if any, > Are you referring to Access' tendency to do the joins on the client? No. Usually, when I need some query quite often that joins and subselects a lot of tables I set up a view covering the constant parts of that query. Thus I can write much simpler queries in the frontend code selecting from the view. This may trade efficiency for convenience at times. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Mon, Sep 08, 2003 at 05:55:43PM -0400, Greg Stark wrote: > > > Materialized views are a godsend for DSS systems, and self-organized > > materialized views sound too good to be true. Is this really feasible? > > It sounds awfully ambitious considering postgres doesn't even have > > materialized views yet. > > There is also the question of how would one actually use self-organizing > views... what would be the syntax for them? Should the parser recognize > that some join you are giving to it is a previously self-organized view > and contract the sentence? In an ideal world the database would just magically notice that I do a lot of queries like: "SELECT count(*) FROM foo WHERE bar = ?" and decide to materialize: "SELECT count(*),bar FROM foo GROUP BY bar" (and figure out where to put the index.) Then it would optimize all my queries to be a simple indexed lookup. Of course it would also have to add rewrite rules or triggers to update the materialized view on every update/insert/delete. So the code to decide when to create the materialized view would have to keep statistics on these queries as well to decide when it's worthwhile to create the materialized view... Nobody said it was easy :) -- greg
> Materialized views are a godsend for DSS systems, and self-organized > materialized views sound too good to be true. Is this really feasible? > It sounds awfully ambitious considering postgres doesn't even have > materialized views yet. The concept of self-organization can be tested regardless of whether those views are materialized or just written automatically like common views, I suppose. This won't, of course, bring about the self-organizing efficiency but that's just a matter of adding materialized views in general and connecting both ends, unless I am mistaken. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
If anyone knows where I can get a self-organizing desk, please let me know. :-) --------------------------------------------------------------------------- Tom Lane wrote: > "Rada Chirkova" <chirkova@csc.ncsu.edu> writes: > > at NC State University, my students and I are working on a project called > > "self-organizing databases," please see description below. I would like to > > use an open-source database system for implementation and would really > > appreciate your opinion on whether PostgreSQL is suitable for the project. > > FWIW, I think Postgres will do very well as a base for your project. > But some might say I'm biased ;-) > > Postgres was originally designed and built as a research tool, and it > still bears obvious traces of that heritage, even though the current > developers are almost single-mindedly focused on turning it into a > production tool. You might find this entertaining: > http://archives.postgresql.org/pgsql-hackers/2002-06/msg00085.php > > I'm not personally convinced that materialized views would be a great > thing to add to Postgres, but I look forward to being proven wrong. > Please keep us posted on your results. > > regards, tom lane > > PS: BTW, pgsql-hackers is probably a better list than pgsql-general for > discussing research work with Postgres. > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073