Thread: Can I use PostgreSQL to develop a self-organizing database system?

Can I use PostgreSQL to develop a self-organizing database system?

From
"Rada Chirkova"
Date:
Hi,

at NC State University, my students and I are working on a project called
"self-organizing databases," please see description below. I would like to
use an open-source database system for implementation and would really
appreciate your opinion on whether PostgreSQL is suitable for the project.
In general, I am very impressed by the quality of PostgreSQL code and
documentation, as well as by the support of the developer community.

For the project, I need a cost-based query optimizer with exhaustive join
enumeration and use of statistics on stored relations; PostgreSQL has that.
I also need the ability to process SQL queries with aggregation, extensive
indexing capabilities, view mechanisms, and possibly integrity constraints;
it seems that PostgreSQL has all that. We will modify the query optimizer to
incorporate rewriting queries using views, and we will create
view-generating and view-manipulating modules.

Please let me know if you have comments.

Sincerely,

Rada Chirkova

======================
Self-Organizing Databases

The goal of this project is to develop new effective methods to improve the
performance of sets of frequent and important queries on large relational
databases at all times, which could improve the efficiency of user
interactions with data-management systems. Solving the problem will have the
most effect in query optimization, data warehousing, and information
integration, which are important research topics with direct practical
applications.

The project focuses on the methodology of evaluating queries using views;
views are relations that are defined by auxiliary queries and can be used to
rewrite and answer user queries. One way to improve query performance is
precompute and store (i.e., materialize) views.

To truly optimize query performance, it is critical to materialize the
"right" views. The current focus of the project is on demonstrating that, by
designing and materializing views, it is possible to ensure optimal or
near-optimal performance of frequent and important queries, for common and
important query types. We consider this problem in the broader context of
designing self-organizing databases: A self-organizing database periodically
determines, without human intervention, a representative set of frequent and
important queries on the data, and incrementally designs and precomputes the
optimal (or near-optimal) views for that representative query workload. As
the representative query workload and the stored data change over time,
self-organizing databases adapt to the changes by changing the set of
materialized views that are used to improve the query-answering performance
in the database.

For building self-organizing databases, we consider an end-to-end solution –
that is, we consider all aspects of handling and using views, including:
·    designing and materializing views and indexes to improve query
performance;
·    exploring the effects of materialized views on the process of query
optimization;
·    adapting view design to the changing query workload, including the process
of retiring views that are no longer useful;
·    developing methods for auomatically updating existing materialized views
over time, to reflect the changes in the stored data;
·    developing methods to collect database statistics to reliably estimate the
sizes of the views the system considers for materialization;
·    analyzing the use of system resources and allocating an appropriate amount
of resources to view management in the system.


Re: Can I use PostgreSQL to develop a self-organizing database system?

From
Tom Lane
Date:
"Rada Chirkova" <chirkova@csc.ncsu.edu> writes:
> at NC State University, my students and I are working on a project called
> "self-organizing databases," please see description below. I would like to
> use an open-source database system for implementation and would really
> appreciate your opinion on whether PostgreSQL is suitable for the project.

FWIW, I think Postgres will do very well as a base for your project.
But some might say I'm biased ;-)

Postgres was originally designed and built as a research tool, and it
still bears obvious traces of that heritage, even though the current
developers are almost single-mindedly focused on turning it into a
production tool.  You might find this entertaining:
http://archives.postgresql.org/pgsql-hackers/2002-06/msg00085.php

I'm not personally convinced that materialized views would be a great
thing to add to Postgres, but I look forward to being proven wrong.
Please keep us posted on your results.

            regards, tom lane

PS: BTW, pgsql-hackers is probably a better list than pgsql-general for
discussing research work with Postgres.

Re: Can I use PostgreSQL to develop a self-organizing database system?

From
Karsten Hilbert
Date:
> Please let me know if you have comments.
>
> The project focuses on the methodology of evaluating queries using views;
> views are relations that are defined by auxiliary queries and can be used to
> rewrite and answer user queries. One way to improve query performance is
> precompute and store (i.e., materialize) views.

> self-organizing -> automagically materializing the "right" views

I also use views to simplify application code by moving
complicated queries onto the backend. I wonder what, if any,
place this aspect of using views has in your project.

Thanks,
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Can I use PostgreSQL to develop a self-organizing

From
Ron Johnson
Date:
On Mon, 2003-09-08 at 15:22, Karsten Hilbert wrote:
[snip]
> I also use views to simplify application code by moving
> complicated queries onto the backend. I wonder what, if any,

Are you referring to Access' tendency to do the joins on the client?

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"What other evidence do you have that they are terrorists, other
than that they trained in these (terrorist training) camps?"
17-Sep-2002 Katie Couric to an FBI agent regarding the 5 men
arrested near Buffalo NY


Re: Can I use PostgreSQL to develop a self-organizing

From
Greg Stark
Date:
Ron Johnson <ron.l.johnson@cox.net> writes:

> On Mon, 2003-09-08 at 15:22, Karsten Hilbert wrote:
>
> > I also use views to simplify application code by moving
> > complicated queries onto the backend. I wonder what, if any,
>
> Are you referring to Access' tendency to do the joins on the client?

No, I think he doesn't mean move execution to the backend, just lexically
hoisting the complex repeated code out of the front-end code.

A self-organizing materialized view system might be able to use normal
non-materialized views as hints to help it find useful views to materialize.

It also might help the programmer avoid implementing functionally equivalent
but non-identical forms of the same view in different queries, defeating the
automated recognition of similar views.

Materialized views are a godsend for DSS systems, and self-organized
materialized views sound too good to be true. Is this really feasible?
It sounds awfully ambitious considering postgres doesn't even have
materialized views yet.

--
greg

Re: Can I use PostgreSQL to develop a self-organizing

From
Alvaro Herrera
Date:
On Mon, Sep 08, 2003 at 05:55:43PM -0400, Greg Stark wrote:

> Materialized views are a godsend for DSS systems, and self-organized
> materialized views sound too good to be true. Is this really feasible?
> It sounds awfully ambitious considering postgres doesn't even have
> materialized views yet.

There is also the question of how would one actually use self-organizing
views... what would be the syntax for them?  Should the parser recognize
that some join you are giving to it is a previously self-organized view
and contract the sentence?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Puedes vivir solo una vez, pero si lo haces bien, una vez es suficiente"

Re: Can I use PostgreSQL to develop a self-organizing

From
Karsten Hilbert
Date:
>> I also use views to simplify application code by moving
>> complicated queries onto the backend. I wonder what, if any,
> Are you referring to Access' tendency to do the joins on the client?
No. Usually, when I need some query quite often that joins and
subselects a lot of tables I set up a view covering the
constant parts of that query. Thus I can write much simpler
queries in the frontend code selecting from the view. This may
trade efficiency for convenience at times.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Can I use PostgreSQL to develop a self-organizing

From
Greg Stark
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> On Mon, Sep 08, 2003 at 05:55:43PM -0400, Greg Stark wrote:
>
> > Materialized views are a godsend for DSS systems, and self-organized
> > materialized views sound too good to be true. Is this really feasible?
> > It sounds awfully ambitious considering postgres doesn't even have
> > materialized views yet.
>
> There is also the question of how would one actually use self-organizing
> views... what would be the syntax for them?  Should the parser recognize
> that some join you are giving to it is a previously self-organized view
> and contract the sentence?

In an ideal world the database would just magically notice that I do a lot of
queries like: "SELECT count(*) FROM foo WHERE bar = ?"
and decide to materialize: "SELECT count(*),bar FROM foo GROUP BY bar"
(and figure out where to put the index.)

Then it would optimize all my queries to be a simple indexed lookup.

Of course it would also have to add rewrite rules or triggers to update the
materialized view on every update/insert/delete. So the code to decide when to
create the materialized view would have to keep statistics on these queries as
well to decide when it's worthwhile to create the materialized view... Nobody
said it was easy :)

--
greg

Re: Can I use PostgreSQL to develop a self-organizing

From
Karsten Hilbert
Date:
> Materialized views are a godsend for DSS systems, and self-organized
> materialized views sound too good to be true. Is this really feasible?
> It sounds awfully ambitious considering postgres doesn't even have
> materialized views yet.
The concept of self-organization can be tested regardless of
whether those views are materialized or just written
automatically like common views, I suppose. This won't, of
course, bring about the self-organizing efficiency but that's
just a matter of adding materialized views in general and
connecting both ends, unless I am mistaken.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Can I use PostgreSQL to develop a self-organizing database

From
Bruce Momjian
Date:
If anyone knows where I can get a self-organizing desk, please let me
know. :-)

---------------------------------------------------------------------------

Tom Lane wrote:
> "Rada Chirkova" <chirkova@csc.ncsu.edu> writes:
> > at NC State University, my students and I are working on a project called
> > "self-organizing databases," please see description below. I would like to
> > use an open-source database system for implementation and would really
> > appreciate your opinion on whether PostgreSQL is suitable for the project.
>
> FWIW, I think Postgres will do very well as a base for your project.
> But some might say I'm biased ;-)
>
> Postgres was originally designed and built as a research tool, and it
> still bears obvious traces of that heritage, even though the current
> developers are almost single-mindedly focused on turning it into a
> production tool.  You might find this entertaining:
> http://archives.postgresql.org/pgsql-hackers/2002-06/msg00085.php
>
> I'm not personally convinced that materialized views would be a great
> thing to add to Postgres, but I look forward to being proven wrong.
> Please keep us posted on your results.
>
>             regards, tom lane
>
> PS: BTW, pgsql-hackers is probably a better list than pgsql-general for
> discussing research work with Postgres.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073