Thread: Patch for adding DATACUBE operator

Patch for adding DATACUBE operator

From
sumit
Date:
Hi!

    We have added the CUBE operator for PostgreSQL. Please find the
attached patch.

    Another thing to note is that the file datacube.c should
be placed in src/backend/tcop/ and datacube.h should be in src/include.

The syntax of the query is

SELECT <field list><aggregate list>
INTO <destination table>
FROM <table expression>
WHERE <search condition>
GROUP BY <aggregate list>
HAVING <search condition>
WITH CUBE;

    An example along with the output is provided in the
README.datacube file. Kindly have a look. Let us know your response.

Srikanth M
Sumit Kumar Mukherjee

Attachment

Re: Patch for adding DATACUBE operator

From
"Andrew Dunstan"
Date:
I'm curious to know why you implement this as a union of queries, since,
unless my understanding is badly awry, you have all the information
necessary for the ALL rows by running the base (ie. without cube) query, Why
not just run that query and then add the ALL rows from examining the
results? ISTM that would be more efficient, since the summary table is in
most real world situations likely to be far, far smaller than the base
table.

andrew

----- Original Message -----
From: "sumit" <sumit@gdit.iiit.net>
To: <pgsql-patches@postgresql.org>
Sent: Monday, June 30, 2003 6:04 AM
Subject: [PATCHES] Patch for adding DATACUBE operator


>
> Hi!
>
> We have added the CUBE operator for PostgreSQL. Please find the
> attached patch.
>
> Another thing to note is that the file datacube.c should
> be placed in src/backend/tcop/ and datacube.h should be in src/include.
>
> The syntax of the query is
>
> SELECT <field list><aggregate list>
> INTO <destination table>
> FROM <table expression>
> WHERE <search condition>
> GROUP BY <aggregate list>
> HAVING <search condition>
> WITH CUBE;
>
> An example along with the output is provided in the
> README.datacube file. Kindly have a look. Let us know your response.
>
> Srikanth M
> Sumit Kumar Mukherjee
>


----------------------------------------------------------------------------
----


>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>


Re: Patch for adding DATACUBE operator

From
sumit
Date:
    Well, it is true that the information for the ALL rows is present
in the base query. But then adding the ALL rows directly after examining
the result, we think, is not as efficient as it sounds.

    Consider the situation you are talking about. Suppose we decide to
add the ALL rows directly to the base result. A major drawback would be
that we would have to execute the whole WHERE clause and/or joins over the
original relations, which, in case of large databases, would cause a lot of
overhead. Whereas, executing the UNION of SELECT queries on the base table
obtained from the base query (WITHOUT CUBE) does not have to handle these
situations (WHERE clause, JOIN cases). More importantly, this is faster
(due to the relatively smaller size).

    It is true that a summary table is much smaller than the base
table. But a DATACUBE (as per definition) takes into account even the base
query (the one WITHOUT CUBE). We have referred some sites and articles
before planning the code. Since we are building a DATACUBE all the
information is required.

    Hope we have made ourselves clear.

Sumit
Srikanth

On Mon, 30 Jun 2003, Andrew Dunstan wrote:

>
> I'm curious to know why you implement this as a union of queries, since,
> unless my understanding is badly awry, you have all the information
> necessary for the ALL rows by running the base (ie. without cube) query, Why
> not just run that query and then add the ALL rows from examining the
> results? ISTM that would be more efficient, since the summary table is in
> most real world situations likely to be far, far smaller than the base
> table.
>
> andrew


Re: Patch for adding DATACUBE operator

From
Tom Lane
Date:
sumit <sumit@gdit.iiit.net> writes:
> In keeping with this, we first define the syntax of a CUBE statement as follows:

> SELECT <field list><aggregate list>
> INTO <destination table>
> FROM <table expression>
> WHERE <search condition>
> GROUP BY <aggregate list>
> HAVING <search condition>
> WITH CUBE;

SQL99 defines a GROUP BY CUBE(...) operation, which seems to do the same
thing as this.  Why are you using a nonstandard syntax?

            regards, tom lane

Re: Patch for adding DATACUBE operator

From
sumit
Date:
    Thanks for letting know. Could you also let me know the exact
syntax, I mean, we are not sure whether GROUP BY CUBE(...) is followed by
a HAVING clause. Kindly inform us soon so that we can make the changes and
send you the updated patch and files.

Sumit

On Mon, 30 Jun 2003, Tom Lane wrote:


> SQL99 defines a GROUP BY CUBE(...) operation, which seems to do the same
> thing as this.  Why are you using a nonstandard syntax?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


Re: Patch for adding DATACUBE operator

From
Tom Lane
Date:
sumit <sumit@gdit.iiit.net> writes:
>     Thanks for letting know. Could you also let me know the exact
> syntax, I mean, we are not sure whether GROUP BY CUBE(...) is followed by
> a HAVING clause.

Better look at the SQL99 spec for yourself.  There's a link to a draft
version in our developer's FAQ, IIRC.

            regards, tom lane