Re: Questions on specifying table relationships - Mailing list pgsql-general

From Patrick Bakker
Subject Re: Questions on specifying table relationships
Date
Msg-id A9CE1D556F89DD4FBA4CF797215DF61A02F366@20svbl1.vanbelle.local
Whole thread Raw
In response to Questions on specifying table relationships  (Patrick Bakker <patrick@vanbelle.com>)
List pgsql-general

Comments are in-line.

> On Friday 01 Nov 2002 5:34 pm, Patrick Bakker wrote:
> > Thanks for the reply Richard.
> > I was thinking about your query templates suggestion and I
> > don't think that is a sufficient solution for what I'm hoping
> > to do. I have two purposes in mind for the query generator.
> >
> > My application allows the user to choose a starting point
> > (ie. item, order, customer, location) and then shows a list - on the
> > left-hand side of the screen, at the full height of the screen -
> > presenting identifying elements (ie. name, order #, etc.) specific to
> > whichever starting point the user has chosen. In this way, as the user
> > changes selections on this list the content on the right automatically
> > updates to reflect the new selection.
> >
> > Now the user can change any of the options on the query for each of these
> > selection lists, in any combination, to create arbitrary selection lists
> > which are more focused for their purposes. Presenting each possible query
> > is likely to get very involved and I'd have to repeat it for every starting
> > point.
> >
> > Additionally, I'm planning on using the same query system to describe
> > reports in external files. The entire report system will consist of
> > externally defined reports like this. So in effect the query system I'm
> > looking for here is the underlying organization which would make your query
> > template system possible and completely dynamic. ie. I can add a new report
> > to the system by dropping in a report definition file in the server's
> > shared network drive where the application reads its configuration from.
>
> OK - so how would I create something like:
>
> SELECT o.ord_date, p.prod_type, sum(o.sale_price) AS tot_sales
> FROM orders o, products p
> WHERE o.prod_id=p.prod_id
> GROUP BY o.ord_date, p.prod_type;

The definition of a static query template for this could be:

<displayable-fields>
        <displayable-field default="yes">
                <source>Order.orderDate</source>
        </displayable-field>
        <displayable-field default="yes">
                <source>Order.Product.type</source>
        </displayable-field>
        <displayable-field default="yes">
                <source>SUM(Order.salePrice)</source>
        </displayable-field>
</displayable-fields>
<groupable-fields>
        <groupable-field default="yes">
                <source>Order.orderDate</source>
        </groupable-field>
        <groupable-field default="yes">
                <source>Order.Product.type</source>
        </groupable-field>
</groupable-fields>

 -- The query generator parses the source fields and converts from an EJB-derived notation to the actual
 database tables and columns. In addition it recognizes expressions. The part of the generator that forms
 the actual SQL queries is database specific and converts from a standard list of functions, operators,
 etc. to the database specific ones (not a problem in this case).

> Does your LH column list *all* the possible fields resulting
> from all possible joins on the order table (products, customers,
> departments...)

No. It only resolves as far as is necessary. ie. if the user is looking at orders and has only used order fields in defining their selection list then only the order table will be involved in the query. If the user now tries to limit the displayable orders by a field in the customer's table (ie. salesman), then the query will now retrieve the unique orders which match this query. In the case of the selection lists the query template definition has a very limited number of displayable fields but has a great variety in the how it can be limited. (see end for more explanation).

> > Can you elaborate on your statement "without knowing what
> > it means to connect two tables via two columns I'm not sure the
> > automated system could decide between options."? Are you referring to
> > the autoquery generator being unable to guess the join because it depends on the
> > meaning of the data in the tables or are you saying that the type of join
> > will vary for each query and therefore cannot be known ahead of time?
>
> I'm saying that even if you specify all possible joins
> between 2 tables, you might have to choose between two ways connecting 3 tables.
> The only way to do that is if you know what the columns *mean* and what your
> query is trying to achieve.

Ok. Each query template would hopefully provide the context necessary to decide which
join is intended.

> > I have a few other options for providing more information
> > to the auto-query generator. Since I'm parsing all of the fields
> > and relationships from EJB descriptors (jbosscmp-jdbc.xml,
> > jaws.xml and ejb-jar.xml) I have the following information available for
> > describing relationships:
> >
> [sample information]
> >
> > If I do it this way and require every query to begin with
> > the same EJB bean then each query would fully describe the relationship
> > path needed for each field.
>
> Well, you have enough here to specify relationships, but I
> still think it will need a sentient being to decide on the details. I could be
> wrong though.

Hmm. I have the feeling you could be right, but I can't think of a case where this would be true.

> Are you familiar with MS-Access (or similar consumer DBs) and
> it's grid way of building queries? That's about as general purpose and
> graphical as I've seen. You identify the tables you want, and then the columns within.
> It automatically indicates joins that it has been told about, or
> you can make your own. You still need to know something about the structure of the
> database though.

Yes. However, I'm looking for more of a middle-ground. Access has two stages - Query Design and Results. The ability to specify parameters is sort of a middle-ground where you can adjust the query and get different results. I'm looking to make a query template where the user can make more extensive changes - ie. design, customize, report. The structure of the query template will have to be sufficiently precise so as to avoid ambiguity in joins but otherwise be flexible for a user's query.

So each query template defines the following (broken down by SQL syntax):
 SELECT
        - Which fields the user can choose to include/exclude.
        - Which fields are mandatory and the user cannot remove from the query.
                - Further, mandatory fields may also be hidden from the user.
 FROM
        - Fully determined by the query automater based on all of the fields used in SELECT, WHERE, GROUP BY, HAVING
        and ORDER BY statements.
 WHERE and HAVING
        - There is a list of limitable fields which the user can use to apply limits to the query if they wish.
        - Again some may be fixed and or invisible to the user.
        - Choice between a WHERE and HAVING is determined while parsing the source of the field. If its an aggregate
        function like SUM the limit will be put in the HAVING clause of the query.
        - Each field listed in the limitable list of the query also has an associated data type with it and there are
        a limited number of these (string, date, money, number, etc...).
        - The user sees limits based on the type of the field. Here are some examples:

        [ Customer Name    ] [ begins with   ] [                 ]
        [ Order Date       ] [ between dates ] [ 01/01/2002 ] and [ 01/31/2002 ]

 GROUP BY
        - Again there is a list of fields which the user can use to specify the grouping of the query.

 ORDER BY
        - A list of fields which the user can use to order the arrangement of the query.

I'm now convinced it should work fairly well, so I am going to begin coding this and try to flush out the remaining snags as I run into them.

Patrick

pgsql-general by date:

Previous
From: "Booth, Robert"
Date:
Subject: Update FROM clause?
Next
From: "Roderick A. Anderson"
Date:
Subject: [Q] Sequences, last_value and inserts