Thread: PostgreSQL - 'SKYLINE OF' clause added!

PostgreSQL - 'SKYLINE OF' clause added!

From
"ranbeer makin"
Date:
We at International Institute of Information Technology (IIIT) Hyderabad, India, have extended the Postgres database<br
/>systemwith the skyline operation. For this work, we were guided by our Prof. Kamalakar Karlapalem<br /> (<a
href="http://www.iiit.ac.in/~kamal/">http://www.iiit.ac.in/~kamal/</a>).<br/><br />We have extended SQL 'SELECT' clause
byan optional 'SKYLINE OF' clause in<br />version 8.0.3. The changes are done in parser, transformation, <br
/>planner/optimizer(a bit) and executor stages. For its execution, two novel algorithms  - BNL (Block Nested Loop) and
SFS<br/>(Sort Filter Skyline) - are also implemented.<br /><br />Can this piece of work contribute to PostgreSQL? If
yes,then we'll send out a detailed report of this project including changes <br />made, issues involved/need to be
solved,limitations, future work, and the source code etc.<br /><br />Thanks very much.<br /><br />Regards,<br
/>Nikita<br/>Ranbeer<br /><br />--<br /><a
href="http://students.iiit.ac.in/~nikita/">http://students.iiit.ac.in/~nikita/</a><br /><a
href="http://students.iiit.ac.in/~ranbeer/">http://students.iiit.ac.in/~ranbeer/</a><br/><br /> 

Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Martijn van Oosterhout
Date:
On Sat, Mar 03, 2007 at 07:02:41PM +0530, ranbeer makin wrote:
> We at International Institute of Information Technology (IIIT) Hyderabad,
> India, have extended the Postgres database
> system with the skyline operation. For this work, we were guided by our
> Prof. Kamalakar Karlapalem
> (http://www.iiit.ac.in/~kamal/).

<snip>

> Can this piece of work contribute to PostgreSQL? If yes, then we'll send out
> a detailed report of this project including changes
> made, issues involved/need to be solved, limitations, future work, and the
> source code etc.

Well, that kind of depends. I have no idea what "Skyline" means so
telling us what it is would be a good start

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Andrew Dunstan
Date:

Martijn van Oosterhout wrote:
>
> Well, that kind of depends. I have no idea what "Skyline" means so
> telling us what it is would be a good start
>
>
>   

Also, showing us the diffs (maybe on a web site) would give us an idea 
of how intrusive it would be.

But I agree with Martijn - the first thing is to describe the 
functionality properly.

cheers

andrew


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
"ranbeer makin"
Date:

Here is a description of what the SKYLINE operator is:
---
Suppose you wish to purchase books and you are looking for books with high rating and low price. However, both the criteria of selecting books are complementary since books of higher rating are generally more expensive. For finding such books, you'll query the database system of the book store which will return a set of interesting books. The word 'interesting' implies all the books which are as good or better in both the dimensions (rating and price) and better in at least one dimension. This set of interesting points forms the Skyline.

Skyline operator finds points which are not dominated by other data points. A point dominates another point if it is as good or better in all dimensions and better in at least one dimension.

For specifying the Skyline queries, we extend SQL SELECT statement by an optional SKYLINE OF clause as given below:           

SELECT ... FROM ... WHERE...

GROUP BY ... HAVING...

SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF],  .., dm [MIN | MAX | DIFF]

ORDER BY...

 
Where, d1, d2 ,…, dm denote the dimensions of the Skyline, and MIN, MAX, DIFF specify whether the value in that dimension should be minimized, maximized, or simply be different. When DIFF is specified, two tuples are compared only if the value of the attribute on which DIFF is applied is different.

When DISTINCT clause is specified and if there are two or more tuples with the same values of skyline attributes, then only one of them is retained in the skyline set. Otherwise, all of them are retained.

Let's consider the above example of purchasing books with high rating and low price.

 

Book Name

Rating (out of 5)

Price (Rs)

Prodigal Daughter

3

250

The city of Joy

5

400

Vanishing Acts

2

250

The Notebook

4

300

Fountain Head

5

350

Dear John

5

500

Table1. Sample of book database

 

Now, in order to get books with high rating and low price, you simply can issue the following query:

SELECT *

FROM Books

SKYLINE OF rating MAX, price MIN;

 

The Skyline set returned will be:

 

Book Name

Rating (out of 5)

Price (Rs)

Prodigal Daughter

3

250

The Notebook

4

300

Fountain Head

5

350

Table2. Skyline set

 

From this set, you can now make your choice of books, by weighing your personal preferences for price and rating of the books.

For more information, you can refer to:
S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. In ICDE, pages 421.430, 2001

---

Thanks.


On 3/3/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Sat, Mar 03, 2007 at 07:02:41PM +0530, ranbeer makin wrote:
> We at International Institute of Information Technology (IIIT) Hyderabad,
> India, have extended the Postgres database
> system with the skyline operation. For this work, we were guided by our
> Prof. Kamalakar Karlapalem
> (http://www.iiit.ac.in/~kamal/).

<snip>

> Can this piece of work contribute to PostgreSQL? If yes, then we'll send out
> a detailed report of this project including changes
> made, issues involved/need to be solved, limitations, future work, and the
> source code etc.

Well, that kind of depends. I have no idea what "Skyline" means so
telling us what it is would be a good start

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFF6XrkIB7bNG8LQkwRAqw8AJ0UKAy41OMxdgLUdY1G+e7R6/jGPwCZAQY4
9uCKFUW65UBIx7fpogR75Yo=
=6Yc0
-----END PGP SIGNATURE-----


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Shane Ambler
Date:
ranbeer makin wrote:
> We at International Institute of Information Technology (IIIT) Hyderabad,
> India, have extended the Postgres database
> system with the skyline operation. For this work, we were guided by our
> Prof. Kamalakar Karlapalem
> (http://www.iiit.ac.in/~kamal/).
> 
> We have extended SQL 'SELECT' clause by an optional 'SKYLINE OF' clause in
> version 8.0.3. The changes are done in parser, transformation,
> planner/optimizer (a bit) and executor stages. For its execution, two novel
> algorithms  - BNL (Block Nested Loop) and SFS
> (Sort Filter Skyline) - are also implemented.
From what I read on your web pages it sounds interesting and may be a 
worthwhile addition to PostgreSQL. I'll have a look at it when it is 
available.

> Can this piece of work contribute to PostgreSQL? If yes, then we'll send 
> out
> a detailed report of this project including changes
> made, issues involved/need to be solved, limitations, future work, and the
> source code etc.

I am not one making the choice of accepting your work but from what I 
know you would want to make your patch available so others can review 
the stability/quality of your work and decide whether there is enough 
demand for the feature to have it included in the main distribution 
either as part of the main code or within the contrib section.

One option you have is to start a project at pgfoundry.org so others can 
access and try your contribution. This will allow your work to be 
available and to be tested by those interested in this feature. If your 
work proves to be worthwhile and in demand it can progress from there 
into the main distribution.

You most probably want to look at porting your changes to the latest 
postgresql release as well.

> Thanks very much.
> 
> Regards,
> Nikita
> Ranbeer
> 
> -- 
> http://students.iiit.ac.in/~nikita/
> http://students.iiit.ac.in/~ranbeer/
> 


-- 

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
"Joshua D. Drake"
Date:
> You most probably want to look at porting your changes to the latest
> postgresql release as well.

I believe many people would be interested, but to get the feature
accepted we would need a patch against -head as that is the latest
version of PostgreSQL under development.

You can see more information about that here:

http://www.postgresql.org/developer/

The above link will also help you with our required coding styles,
acceptance policies etc...

Sincerely,

Joshua D. Drake




> 
>> Thanks very much.
>>
>> Regards,
>> Nikita
>> Ranbeer
>>
>> -- 
>> http://students.iiit.ac.in/~nikita/
>> http://students.iiit.ac.in/~ranbeer/
>>
> 
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Jim Nasby
Date:
FWIW, this sounds like a subset of the Query By Example stuff that
someone is working on. I don't have a URL handy since I'm on a plane,
but I think google can find it.

On Mar 3, 2007, at 8:12 AM, ranbeer makin wrote:

>
> Here is a description of what the SKYLINE operator is:
> ---
> Suppose you wish to purchase books and you are looking for books
> with high rating and low price. However, both the criteria of
> selecting books are complementary since books of higher rating are
> generally more expensive. For finding such books, you'll query the
> database system of the book store which will return a set of
> interesting books. The word 'interesting' implies all the books
> which are as good or better in both the dimensions (rating and
> price) and better in at least one dimension. This set of
> interesting points forms the Skyline.
> Skyline operator finds points which are not dominated by other data
> points. A point dominates another point if it is as good or better
> in all dimensions and better in at least one dimension.
>
> For specifying the Skyline queries, we extend SQL SELECT statement
> by an optional SKYLINE OF clause as given below:
>
> SELECT ... FROM ... WHERE...
>
> GROUP BY ... HAVING...
>
> SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF],  .., dm [MIN | MAX |
> DIFF]
>
> ORDER BY...
>
>
> Where, d1, d2 ,…, dm denote the dimensions of the Skyline, and MIN,
> MAX, DIFF specify whether the value in that dimension should be
> minimized, maximized, or simply be different. When DIFF is
> specified, two tuples are compared only if the value of the
> attribute on which DIFF is applied is different.
>
> When DISTINCT clause is specified and if there are two or more
> tuples with the same values of skyline attributes, then only one of
> them is retained in the skyline set. Otherwise, all of them are
> retained.
>
> Let's consider the above example of purchasing books with high
> rating and low price.
>
>
> Book Name
>
> Rating (out of 5)
>
> Price (Rs)
>
> Prodigal Daughter
>
> 3
>
> 250
>
> The city of Joy
>
> 5
>
> 400
>
> Vanishing Acts
>
> 2
>
> 250
>
> The Notebook
>
> 4
>
> 300
>
> Fountain Head
>
> 5
>
> 350
>
> Dear John
>
> 5
>
> 500
>
> Table1. Sample of book database
>
>
> Now, in order to get books with high rating and low price, you
> simply can issue the following query:
>
> SELECT *
>
> FROM Books
>
> SKYLINE OF rating MAX, price MIN;
>
>
> The Skyline set returned will be:
>
>
> Book Name
>
> Rating (out of 5)
>
> Price (Rs)
>
> Prodigal Daughter
>
> 3
>
> 250
>
> The Notebook
>
> 4
>
> 300
>
> Fountain Head
>
> 5
>
> 350
>
> Table2. Skyline set
>
>
> From this set, you can now make your choice of books, by weighing
> your personal preferences for price and rating of the books.
>
> For more information, you can refer to:
> S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. In
> ICDE, pages 421.430, 2001
>
> ---
>
> Thanks.
>
>
>
> On 3/3/07, Martijn van Oosterhout <kleptog@svana.org> wrote: On
> Sat, Mar 03, 2007 at 07:02:41PM +0530, ranbeer makin wrote:
> > We at International Institute of Information Technology (IIIT)
> Hyderabad,
> > India, have extended the Postgres database
> > system with the skyline operation. For this work, we were guided
> by our
> > Prof. Kamalakar Karlapalem
> > (http://www.iiit.ac.in/~kamal/).
>
> <snip>
>
> > Can this piece of work contribute to PostgreSQL? If yes, then
> we'll send out
> > a detailed report of this project including changes
> > made, issues involved/need to be solved, limitations, future
> work, and the
> > source code etc.
>
> Well, that kind of depends. I have no idea what "Skyline" means so
> telling us what it is would be a good start
>
> Have a nice day,
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/
> kleptog/
> > From each according to his ability. To each according to his
> ability to litigate.
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
>
> iD8DBQFF6XrkIB7bNG8LQkwRAqw8AJ0UKAy41OMxdgLUdY1G+e7R6/jGPwCZAQY4
> 9uCKFUW65UBIx7fpogR75Yo=
> =6Yc0
> -----END PGP SIGNATURE-----
>
>

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)




Re: PostgreSQL - 'SKYLINE OF' clause added!

From
David Fetter
Date:
On Mon, Mar 05, 2007 at 09:04:46PM -0600, Jim Nasby wrote:
> FWIW, this sounds like a subset of the Query By Example stuff that  
> someone is working on. I don't have a URL handy since I'm on a plane,  
> but I think google can find it.

It's now called ObelisQ <http://pgfoundry.org/projects/qbe>

Cheers,
D
> 
> On Mar 3, 2007, at 8:12 AM, ranbeer makin wrote:
> 
> >
> >Here is a description of what the SKYLINE operator is:
> >---
> >Suppose you wish to purchase books and you are looking for books  
> >with high rating and low price. However, both the criteria of  
> >selecting books are complementary since books of higher rating are  
> >generally more expensive. For finding such books, you'll query the  
> >database system of the book store which will return a set of  
> >interesting books. The word 'interesting' implies all the books  
> >which are as good or better in both the dimensions (rating and  
> >price) and better in at least one dimension. This set of  
> >interesting points forms the Skyline.
> >Skyline operator finds points which are not dominated by other data  
> >points. A point dominates another point if it is as good or better  
> >in all dimensions and better in at least one dimension.
> >
> >For specifying the Skyline queries, we extend SQL SELECT statement  
> >by an optional SKYLINE OF clause as given below:
> >
> >SELECT ... FROM ... WHERE...
> >
> >GROUP BY ... HAVING...
> >
> >SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF],  .., dm [MIN | MAX |  
> >DIFF]
> >
> >ORDER BY...
> >
> >
> >Where, d1, d2 ,…, dm denote the dimensions of the Skyline, and MIN,  
> >MAX, DIFF specify whether the value in that dimension should be  
> >minimized, maximized, or simply be different. When DIFF is  
> >specified, two tuples are compared only if the value of the  
> >attribute on which DIFF is applied is different.
> >
> >When DISTINCT clause is specified and if there are two or more  
> >tuples with the same values of skyline attributes, then only one of  
> >them is retained in the skyline set. Otherwise, all of them are  
> >retained.
> >
> >Let's consider the above example of purchasing books with high  
> >rating and low price.
> >
> >
> >Book Name
> >
> >Rating (out of 5)
> >
> >Price (Rs)
> >
> >Prodigal Daughter
> >
> >3
> >
> >250
> >
> >The city of Joy
> >
> >5
> >
> >400
> >
> >Vanishing Acts
> >
> >2
> >
> >250
> >
> >The Notebook
> >
> >4
> >
> >300
> >
> >Fountain Head
> >
> >5
> >
> >350
> >
> >Dear John
> >
> >5
> >
> >500
> >
> >Table1. Sample of book database
> >
> >
> >Now, in order to get books with high rating and low price, you  
> >simply can issue the following query:
> >
> >SELECT *
> >
> >FROM Books
> >
> >SKYLINE OF rating MAX, price MIN;
> >
> >
> >The Skyline set returned will be:
> >
> >
> >Book Name
> >
> >Rating (out of 5)
> >
> >Price (Rs)
> >
> >Prodigal Daughter
> >
> >3
> >
> >250
> >
> >The Notebook
> >
> >4
> >
> >300
> >
> >Fountain Head
> >
> >5
> >
> >350
> >
> >Table2. Skyline set
> >
> >
> >From this set, you can now make your choice of books, by weighing  
> >your personal preferences for price and rating of the books.
> >
> >For more information, you can refer to:
> >S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. In  
> >ICDE, pages 421.430, 2001
> >
> >---
> >
> >Thanks.
> >
> >
> >
> >On 3/3/07, Martijn van Oosterhout <kleptog@svana.org> wrote: On  
> >Sat, Mar 03, 2007 at 07:02:41PM +0530, ranbeer makin wrote:
> >> We at International Institute of Information Technology (IIIT)  
> >Hyderabad,
> >> India, have extended the Postgres database
> >> system with the skyline operation. For this work, we were guided  
> >by our
> >> Prof. Kamalakar Karlapalem
> >> (http://www.iiit.ac.in/~kamal/).
> >
> ><snip>
> >
> >> Can this piece of work contribute to PostgreSQL? If yes, then  
> >we'll send out
> >> a detailed report of this project including changes
> >> made, issues involved/need to be solved, limitations, future  
> >work, and the
> >> source code etc.
> >
> >Well, that kind of depends. I have no idea what "Skyline" means so
> >telling us what it is would be a good start
> >
> >Have a nice day,
> >--
> >Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/ 
> >kleptog/
> >> From each according to his ability. To each according to his  
> >ability to litigate.
> >
> >-----BEGIN PGP SIGNATURE-----
> >Version: GnuPG v1.4.1 (GNU/Linux)
> >
> >iD8DBQFF6XrkIB7bNG8LQkwRAqw8AJ0UKAy41OMxdgLUdY1G+e7R6/jGPwCZAQY4
> >9uCKFUW65UBIx7fpogR75Yo=
> >=6Yc0
> >-----END PGP SIGNATURE-----
> >
> >
> 
> --
> Jim Nasby                                            jim@nasby.net
> EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly

-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Chris Browne
Date:
decibel@decibel.org (Jim Nasby) writes:
> FWIW, this sounds like a subset of the Query By Example stuff that
> someone is working on. I don't have a URL handy since I'm on a plane,
> but I think google can find it.

The pgFoundry project is here...  http://pgfoundry.org/projects/qbe

And yes, indeed, this sounds quite a lot like what Meredith Patterson
presented at the Toronto conference.
-- 
(reverse (concatenate 'string "ofni.secnanifxunil" "@" "enworbbc"))
http://linuxfinances.info/info/finances.html
Rules of  the Evil Overlord #189. "I  will never tell the  hero "Yes I
was the one who  did it, but you'll never be able  to prove it to that
incompetent  old fool."  Chances  are, that  incompetent  old fool  is
standing behind the curtain."  <http://www.eviloverlord.com/>


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Josh Berkus
Date:
> And yes, indeed, this sounds quite a lot like what Meredith Patterson
> presented at the Toronto conference.

This would be good to have, though, since Meredith's work has some problematic 
IP encumbrances.

Question, though: is the SKYLINE syntax part of a standard anywhere?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Question, though: is the SKYLINE syntax part of a standard anywhere?

There's certainly not anything like that in SQL2003.

I'm also kind of wondering if the main use-cases couldn't be met with
suitable multi-input custom aggregates, which is something we already
have as of 8.2.
        regards, tom lane


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Josh Berkus
Date:
Tom,

> I'm also kind of wondering if the main use-cases couldn't be met with
> suitable multi-input custom aggregates, which is something we already
> have as of 8.2.

Actually, given that "skyline of" is *only* for aggregate sorting (as far as I 
can tell) it doesn't present the complications which QBE did for using a 
function interface.   

Ranbeer, would it be possible to use an aggregate function syntax instead of 
the SKYLINE OF syntax extension?   This allows us to sidestep the issue of 
non-standard additions to the reserved word list.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Alvaro Herrera
Date:
Josh Berkus wrote:
> Tom,
> 
> > I'm also kind of wondering if the main use-cases couldn't be met with
> > suitable multi-input custom aggregates, which is something we already
> > have as of 8.2.
> 
> Actually, given that "skyline of" is *only* for aggregate sorting (as far as I 
> can tell) it doesn't present the complications which QBE did for using a 
> function interface.   

There is people on a Venezuelan university working on SKYLINE OF and
other operators on Postgres.  I had some looks at their work because
they asked for help in the spanish list.

Not only they added the SKYLINE OF clause, but they also had some mods
to the ORDER BY clause, and a couple of other grammar changes as well.
While SKYLINE OF itself could probably be folded as aggregates, the
other stuff is not likely to be amenable to such treatment.

Also, keep in mind that there were plenty of changes in the executor.
This stuff is not likely to be very easy to implement efficiently using
our extant executor machinery; note that Ranbeer mentioned
implementation of "block nested loop" and other algorithms.  Not sure
how easy would be to fold that stuff into the optimizer for multi-input
aggregates, instead of hardwiring it to the SKYLINE OF syntax.

There's a certain group in the Venezuelan Uni that was about to finish
their thesis.  They promised me a look into their report; maybe I can
give further input from that and maybe merge Ranbeer's stuff with it.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Gavin Sherry
Date:
On Tue, 6 Mar 2007, Alvaro Herrera wrote:

> Also, keep in mind that there were plenty of changes in the executor.
> This stuff is not likely to be very easy to implement efficiently using
> our extant executor machinery; note that Ranbeer mentioned
> implementation of "block nested loop" and other algorithms.  Not sure
> how easy would be to fold that stuff into the optimizer for multi-input
> aggregates, instead of hardwiring it to the SKYLINE OF syntax.
>

Yes, there's been a lot of working on calculating skyline efficiently,
with different sorting techniques and so on. This is the most interesting
part of the idea. You could calculate the query Ranbeer gave using pure
SQL and, perhaps, use of some covariance aggregates or something already.
Of course, it gets harder when you want to calculate across many
dimensions.

Personally, I'd love to see some of these newer data analysis
capabilities added to PostgreSQL -- or at least put out there as
interesting patches.

Thanks,

Gavin


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Josh Berkus
Date:
Gavin,

> Personally, I'd love to see some of these newer data analysis
> capabilities added to PostgreSQL -- or at least put out there as
> interesting patches.

I think if the code is good enough, and we can avoid horrible non-standard 
syntax extensions, they should go in.   We have to defend our title as "most 
advanced database" and having stuff like Skyline first (before DB2 or MS) 
goes a long way for that.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> I think if the code is good enough, and we can avoid horrible non-standard 
> syntax extensions, they should go in.   We have to defend our title as "most 
> advanced database" and having stuff like Skyline first (before DB2 or MS) 
> goes a long way for that.

Well, whether it's horrible or not is in the eye of the beholder, but
this is certainly a non-standard syntax extension.

My questions about whether to adopt it have more to do with
cost/benefit.  I haven't seen the patch, but it sounds like it will be
large and messy; and it's for a feature that nobody ever heard of before,
let alone one that the community has developed a consensus it wants.
I'm not interested in adopting stuff just "because DB2 hasn't got it".

It's also worth noting that what we've got here is a large patch
developed, by students, completely outside our normal development
process; so the odds that it's going to be anywhere near acceptable are
low.  I think the last time we applied a patch that met that description
was the INTERSECT/EXCEPT patch in 1999 ... maybe you don't remember
what a fiasco that was, but I do.

Sorry to be a thrower of cold water, but I just don't see that this
comes anywhere near being something we should be eager to accept.
        regards, tom lane


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
David Fuhry
Date:
The query Ranbeer gave - as with any skyline query - can be solved with 
just pure SQL:

select * from books b where not exists(  select * from books b2 where    b2.rating >= b.rating and b2.price <= b.price
and   (b2.rating > b.rating or b2.price < b.price)
 
);     book_name     | rating | price
-------------------+--------+------- Prodigal Daughter |      3 |   250 The Notebook      |      4 |   300 Fountain
Head    |      5 |   350
 
(3 rows)

The idea of the BNL (block nested loop) skyline algorithm is to avoid 
the nested loop by storing "dominating" records as the query proceeds - 
in the above example, records which are relatively high in rating and 
low in price - and comparing each candidate record to those first.

BNL is the most reasonable skyline algorithm in the absence of a 
multidimensional (usually R-Tree) index on the columns.  For answering 
skyline queries where such an index exists over all query columns, the 
most broadly used generalized algorithm is BBS [1].

Thanks,

Dave Fuhry

[1] Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2005. Progressive 
skyline computation in database systems. ACM Trans. Database Syst. 30, 1 
(Mar. 2005), 41-82. DOI= http://doi.acm.org/10.1145/1061318.1061320

Gavin Sherry wrote:
> On Tue, 6 Mar 2007, Alvaro Herrera wrote:
> 
>> Also, keep in mind that there were plenty of changes in the executor.
>> This stuff is not likely to be very easy to implement efficiently using
>> our extant executor machinery; note that Ranbeer mentioned
>> implementation of "block nested loop" and other algorithms.  Not sure
>> how easy would be to fold that stuff into the optimizer for multi-input
>> aggregates, instead of hardwiring it to the SKYLINE OF syntax.
>>
> 
> Yes, there's been a lot of working on calculating skyline efficiently,
> with different sorting techniques and so on. This is the most interesting
> part of the idea. You could calculate the query Ranbeer gave using pure
> SQL and, perhaps, use of some covariance aggregates or something already.
> Of course, it gets harder when you want to calculate across many
> dimensions.
> 
> Personally, I'd love to see some of these newer data analysis
> capabilities added to PostgreSQL -- or at least put out there as
> interesting patches.
> 
> Thanks,
> 
> Gavin
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faq


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Josh Berkus
Date:
Tom,

> My questions about whether to adopt it have more to do with
> cost/benefit.  I haven't seen the patch, but it sounds like it will be
> large and messy; and it's for a feature that nobody ever heard of before,
> let alone one that the community has developed a consensus it wants.
> I'm not interested in adopting stuff just "because DB2 hasn't got it".

OK, to make it a clearer case: we have an increasing user base using 
PostgreSQL for decision support.  One of the reasons for this is that PG is 
the *only* OSDB which does a decent job of DSS.  Adding unique DSS features 
will make PostgreSQL attractive to a lot more DSS application developers, and 
help make up for the things which we don't have yet (parallel query, async 
I/O, windowing functions).  

"Approximate queries" is something with DSS users *want*.  Jim Grey addressed 
this in his ACM editiorial on the databases of the future.  It's something 
that *I* want, and if the Greenplum people aren't speaking up here, it's 
because they're not paying atttention.

Now, I don't know if this Skyline patch is our answer for approximate queries.  
Maybe I should pester Meredith about getting QBE free of its IP issues; it 
certainly looked more flexible than Skyline.  In either case, the code 
probably needs a complete refactor. 

But I think that approximate queries ought to be on our TODO list.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Mark Kirkwood
Date:
Josh Berkus wrote:
> 
> Now, I don't know if this Skyline patch is our answer for approximate queries.  
> Maybe I should pester Meredith about getting QBE free of its IP issues; it 
> certainly looked more flexible than Skyline.  In either case, the code 
> probably needs a complete refactor. 
> 
> But I think that approximate queries ought to be on our TODO list.
>

+1




Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Gavin Sherry
Date:
On Wed, 7 Mar 2007, Josh Berkus wrote:

> "Approximate queries" is something with DSS users *want*.  Jim Grey addressed
> this in his ACM editiorial on the databases of the future.  It's something
> that *I* want, and if the Greenplum people aren't speaking up here, it's
> because they're not paying atttention.
>
> Now, I don't know if this Skyline patch is our answer for approximate queries.
> Maybe I should pester Meredith about getting QBE free of its IP issues; it
> certainly looked more flexible than Skyline.  In either case, the code
> probably needs a complete refactor.

What people want from "approximate queries" is different to this: the
desire is usually to balance run time with level of accuracy/quality (some
times the desire is to have accurate results as well as similar results).
Neither skyline or QBE are about this. The only thing in the spec which
addresses this is 'tablesample'.

Thanks,

Gavin


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
"Luke Lonergan"
Date:
<p><font size="2">Yep - we're paying attention Josh!<br /><br /> I like the category being explored with skyline, I'm
notsure yet how it fits with existing 'soft data' models and applications that use them.<br /><br /> If SKYLINE is
interestingto app developers, maybe we should consider it for Bizgres?<br /><br /> - Luke<br /><br /> Msg is shrt cuz m
onma treo<br /><br />  -----Original Message-----<br /> From:   Gavin Sherry [<a
href="mailto:swm@alcove.com.au">mailto:swm@alcove.com.au</a>]<br/> Sent:   Wednesday, March 07, 2007 05:44 PM Eastern
StandardTime<br /> To:     Josh Berkus<br /> Cc:     Tom Lane; pgsql-hackers@postgresql.org; Alvaro Herrera; Chris
Browne<br/> Subject:        Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!<br /><br /> On Wed, 7 Mar 2007, Josh
Berkuswrote:<br /><br /> > "Approximate queries" is something with DSS users *want*.  Jim Grey addressed<br /> >
thisin his ACM editiorial on the databases of the future.  It's something<br /> > that *I* want, and if the
Greenplumpeople aren't speaking up here, it's<br /> > because they're not paying atttention.<br /> ><br /> >
Now,I don't know if this Skyline patch is our answer for approximate queries.<br /> > Maybe I should pester Meredith
aboutgetting QBE free of its IP issues; it<br /> > certainly looked more flexible than Skyline.  In either case, the
code<br/> > probably needs a complete refactor.<br /><br /> What people want from "approximate queries" is different
tothis: the<br /> desire is usually to balance run time with level of accuracy/quality (some<br /> times the desire is
tohave accurate results as well as similar results).<br /> Neither skyline or QBE are about this. The only thing in the
specwhich<br /> addresses this is 'tablesample'.<br /><br /> Thanks,<br /><br /> Gavin<br /><br />
---------------------------(endof broadcast)---------------------------<br /> TIP 9: In versions below 8.0, the planner
willignore your desire to<br />        choose an index scan if your joining column's datatypes do not<br />       
match<br/><br /></font> 

Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Shane Ambler
Date:
Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> I think if the code is good enough, and we can avoid horrible non-standard 
>> syntax extensions, they should go in.   We have to defend our title as "most 
>> advanced database" and having stuff like Skyline first (before DB2 or MS) 
>> goes a long way for that.
> 
> Well, whether it's horrible or not is in the eye of the beholder, but
> this is certainly a non-standard syntax extension.

Being non-standard should not be the only reason to reject a worthwhile 
feature. Do you really believe that the SQL standard covers every 
feature that a RDBMS could ever want to implement? Do you think that the 
current non-standard features of PostgreSQL should be removed?

> My questions about whether to adopt it have more to do with
> cost/benefit.  I haven't seen the patch, but it sounds like it will be
> large and messy; and it's for a feature that nobody ever heard of before,
> let alone one that the community has developed a consensus it wants.
> I'm not interested in adopting stuff just "because DB2 hasn't got it".

Partially agree but I do think it is worth looking at to see if some or 
all of the feature is worth implementing. The fact that several 
different groups have been mentioned to be working on this feature would 
indicate that it is worth considering. Maybe one of the other groups 
will have implemented it better than the first off the rank. Maybe our 
core developers can work out a better way to implement these features.

A few people on this list have said they are interested in this.

> It's also worth noting that what we've got here is a large patch
> developed, by students, completely outside our normal development
> process; so the odds that it's going to be anywhere near acceptable are
> low.  I think the last time we applied a patch that met that description
> was the INTERSECT/EXCEPT patch in 1999 ... maybe you don't remember
> what a fiasco that was, but I do.

True but the quals he has listed on his web pages look impressive and 
probably give him a little reason to have his work considered/looked at. 
He may just end up being a main PostgreSQL developer in the future.

> Sorry to be a thrower of cold water, but I just don't see that this
> comes anywhere near being something we should be eager to accept.

True we shouldn't just say "sounds good let's put it in" but with some 
indication that this feature is along the lines of what users want, 
would indicate that we should be asking -

Do we want this or a similar feature?
Is the theory behind this feature solid?
Can the same end results be gained with other existing methods?
Is the implementation offered worth considering?
Has it been developed to meet the PostgreSQL developer guidelines?
Is it reasonable to work on it to reach a level of quality/performance 
that we will be happy to include?
Can we implement this feature better ourselves?
Do we want to start this feature from scratch ourselves?



-- 

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Tom Lane
Date:
Shane Ambler <pgsql@Sheeky.Biz> writes:
> Tom Lane wrote:
>> Well, whether it's horrible or not is in the eye of the beholder, but
>> this is certainly a non-standard syntax extension.

> Being non-standard should not be the only reason to reject a worthwhile 
> feature.

No, but being non-standard is certainly an indicator that the feature
may not be of widespread interest --- if it were, the SQL committee
would've gotten around to including it; seems they've managed to include
everything but the kitchen sink already.  Add to that the complete lack
of any previous demand for the feature, and you have to wonder where the
market is.

> The fact that several 
> different groups have been mentioned to be working on this feature would 
> indicate that it is worth considering.

It looks to me more like someone published a paper that caught the
attention of a few profs looking for term projects for their students.

Now maybe it really is the best idea since sliced bread and will be seen
in the next SQL spec edition, but color me skeptical.  It seems to me
to be a very narrow-usage extension, as opposed to (eg) multi-input
aggregates or WITH/RECURSIVE, which provide general mechanisms applicable
to a multitude of problems.  Now even so it would be fine if the
implementation were similarly narrow in scope, but the published
description of the patch mentions a large chunk of additional executor
mechanisms.  If we're going to be adding as much code as that, I'd like
to see a wider scope of usage for it.

Basically, this patch isn't sounding like it has a reasonable
bang-to-the-buck ratio ...
        regards, tom lane


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Nikita
Date:
Few things from our side:

1. 'Skyline Of' is a new operator proposed in ICDE 2003, one of the topmost conferences of Data Engineering. Skyline operation is a hot area of research in query processing. Many of the database community people do know about this operator, and it is fast catching the attention.

2. The skyline operation is very useful in data analysis. Suppose, if we have a cricket database, and we want to find the bowlers who have taken maximum wickets in minimum overs, we can issue an easy-to-write query using 'Skyline of' syntax as follows:

Select * from Player_Match Skyline Of overs_bowled min, wickets_taken max;

This query gives 25 interesting tuples (result set) out of 24750 tuples in 0.0509 seconds. The same result is obtained in 0.8228 seconds if the following equivalent nested-query is issued:

select * from Player_Match p1 where not exists ( select * from Player_Match p2 where p2.overs_bowled <= p1.overs_bowled and p2.wickets_taken >= p1.wickets_taken and (p2.overs_bowled < p1.overs_bowled or p2.wickets_taken > p1.wickets_taken))

Note that the above time is the time elapsed between issuing a query and obtaining the result set.
As can be seen, the above query looks pretty cumbersome to write and is inefficient too. So, which query will the user prefer? As the number of dimensions increases, writing a nested-query will become a hedious task.
Btw, how can such a query be written using aggregate function syntax??

3. As far as optimizing the Skyline is concerned, it is still a research problem since it requires estimating the cardinality of the skyline result set.

4. Until and unless this operator is implemented in a popular database system, how can a user ever get to know about it and hence appreciate its usefulness?

Btw, it was our B.Tech final year project, and not a term project :-)

Regards.

On 3/8/07, Tom Lane <tgl@sss.pgh.pa.us > wrote:
Shane Ambler <pgsql@Sheeky.Biz> writes:
> Tom Lane wrote:
>> Well, whether it's horrible or not is in the eye of the beholder, but
>> this is certainly a non-standard syntax extension.

> Being non-standard should not be the only reason to reject a worthwhile
> feature.

No, but being non-standard is certainly an indicator that the feature
may not be of widespread interest --- if it were, the SQL committee
would've gotten around to including it; seems they've managed to include
everything but the kitchen sink already.  Add to that the complete lack
of any previous demand for the feature, and you have to wonder where the
market is.

> The fact that several
> different groups have been mentioned to be working on this feature would
> indicate that it is worth considering.

It looks to me more like someone published a paper that caught the
attention of a few profs looking for term projects for their students.

Now maybe it really is the best idea since sliced bread and will be seen
in the next SQL spec edition, but color me skeptical.  It seems to me
to be a very narrow-usage extension, as opposed to (eg) multi-input
aggregates or WITH/RECURSIVE, which provide general mechanisms applicable
to a multitude of problems.  Now even so it would be fine if the
implementation were similarly narrow in scope, but the published
description of the patch mentions a large chunk of additional executor
mechanisms.  If we're going to be adding as much code as that, I'd like
to see a wider scope of usage for it.

Basically, this patch isn't sounding like it has a reasonable
bang-to-the-buck ratio ...

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend



--
Pride sullies the noblest character

Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Naz Gassiep
Date:
I do see your points regarding the existence of use cases for this 
feature, and I agree that at worst, the implementation of this feature 
would provide a way to greatly simplify query design and at best provide 
a whole new method of obtaining decision supporting data from a 
relational database.

However I am strongly in disagreement with your fourth point, I.e., that 
users will only become aware of it once it has been implemented. This 
sort of mentality is what gave us the sad case of late 90s HTML in which 
browser vendors assumed that they could use the "if you build it they 
will come" argument for feature extension of the HTML spec. That is a 
debacle we are still suffering the effects of. Let us not do the same to 
SQL and implement SKYLINE on our own, only to have other DBMS vendors 
implement it in different ways and then finally when the SQL standard 
includes it they try to make some kind of average approximation of the 
implementations resulting in *none* of the DBs being compliant. Then 
we'll be between the rock of breaking backwards compatibility and the 
hard place of unwarranted standards non-compliance.

While Josh did point out that being in the leading group as far as 
implementing new functionality goes, I feel that it has to be weighed 
against the need to not strike out too aggressively, potentially 
isolating ourselves with excessive non-standard syntax or behavior.

While I am convinced there is a strong use case for this functionality 
and we should definitely start looking at it, I don't see why we should 
be in a rush to get it into core. People have survived without it up to 
now, I don't think our userbase will suffer if it is implemented 6 
months after <foo commercial DB> implements it, at least, not as much as 
it will suffer if we start drifting away from standards compliance.

Just my 2 rupees. :)

- Naz

Nikita wrote:
> Few things from our side:
>
> 1. 'Skyline Of' is a new operator proposed in ICDE 2003, one of the 
> topmost conferences of Data Engineering. Skyline operation is a hot 
> area of research in query processing. Many of the database community 
> people do know about this operator, and it is fast catching the 
> attention.
>
> 2. The skyline operation is very useful in data analysis. Suppose, if 
> we have a cricket database, and we want to find the bowlers who have 
> taken maximum wickets in minimum overs, we can issue an easy-to-write 
> query using 'Skyline of' syntax as follows:
>
> Select * from Player_Match Skyline Of overs_bowled min, wickets_taken max;
>
> This query gives 25 interesting tuples (result set) out of 24750 
> tuples in 0.0509 seconds. The same result is obtained in 0.8228 
> seconds if the following equivalent nested-query is issued:
>
> select * from Player_Match p1 where not exists ( select * from 
> Player_Match p2 where p2.overs_bowled <= p1.overs_bowled and 
> p2.wickets_taken >= p1.wickets_taken and (p2.overs_bowled < 
> p1.overs_bowled or p2.wickets_taken > p1.wickets_taken))
>
> Note that the above time is the time elapsed between issuing a query 
> and obtaining the result set.
> As can be seen, the above query looks pretty cumbersome to write and 
> is inefficient too. So, which query will the user prefer? As the 
> number of dimensions increases, writing a nested-query will become a 
> hedious task.
> Btw, how can such a query be written using aggregate function syntax??
>
> 3. As far as optimizing the Skyline is concerned, it is still a 
> research problem since it requires estimating the cardinality of the 
> skyline result set.
>
> 4. Until and unless this operator is implemented in a popular database 
> system, how can a user ever get to know about it and hence appreciate 
> its usefulness?
>
> Btw, it was our B.Tech final year project, and not a term project :-)
>
> Regards.
>
> On 3/8/07, *Tom Lane* <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> 
> wrote:
>
>     Shane Ambler <pgsql@Sheeky.Biz <mailto:pgsql@Sheeky.Biz>> writes:
>     > Tom Lane wrote:
>     >> Well, whether it's horrible or not is in the eye of the
>     beholder, but
>     >> this is certainly a non-standard syntax extension.
>
>     > Being non-standard should not be the only reason to reject a
>     worthwhile
>     > feature.
>
>     No, but being non-standard is certainly an indicator that the feature
>     may not be of widespread interest --- if it were, the SQL committee
>     would've gotten around to including it; seems they've managed to
>     include
>     everything but the kitchen sink already.  Add to that the complete
>     lack
>     of any previous demand for the feature, and you have to wonder
>     where the
>     market is.
>
>     > The fact that several
>     > different groups have been mentioned to be working on this
>     feature would
>     > indicate that it is worth considering.
>
>     It looks to me more like someone published a paper that caught the
>     attention of a few profs looking for term projects for their students.
>
>     Now maybe it really is the best idea since sliced bread and will
>     be seen
>     in the next SQL spec edition, but color me skeptical.  It seems to me
>     to be a very narrow-usage extension, as opposed to (eg) multi-input
>     aggregates or WITH/RECURSIVE, which provide general mechanisms
>     applicable
>     to a multitude of problems.  Now even so it would be fine if the
>     implementation were similarly narrow in scope, but the published
>     description of the patch mentions a large chunk of additional executor
>     mechanisms.  If we're going to be adding as much code as that, I'd
>     like
>     to see a wider scope of usage for it.
>
>     Basically, this patch isn't sounding like it has a reasonable
>     bang-to-the-buck ratio ...
>
>                             regards, tom lane
>
>     ---------------------------(end of
>     broadcast)---------------------------
>     TIP 6: explain analyze is your friend
>
>
>
>
> -- 
> Pride sullies the noblest character 


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Shane Ambler
Date:
Naz Gassiep wrote:

> Let us not do the same to 
> SQL and implement SKYLINE on our own, only to have other DBMS vendors 
> implement it in different ways and then finally when the SQL standard 
> includes it they try to make some kind of average approximation of the 
> implementations resulting in *none* of the DBs being compliant. Then 
> we'll be between the rock of breaking backwards compatibility and the 
> hard place of unwarranted standards non-compliance.
> 
> While Josh did point out that being in the leading group as far as 
> implementing new functionality goes, I feel that it has to be weighed 
> against the need to not strike out too aggressively, potentially 
> isolating ourselves with excessive non-standard syntax or behavior.
> 
> While I am convinced there is a strong use case for this functionality 
> and we should definitely start looking at it, I don't see why we should 
> be in a rush to get it into core. People have survived without it up to 
> now, I don't think our userbase will suffer if it is implemented 6 
> months after <foo commercial DB> implements it, at least, not as much as 
> it will suffer if we start drifting away from standards compliance.

And where did most of the SQL standard come from? A lot of it copies or 
is based on either the first db to implement a feature or the one to 
implement the best syntax.

And how much of the standard became standard because most of the db's 
had already implemented similar features?

Some things can syntactically be expressed more than one way, while 
others are limited in ways to coherently express what you want to achieve.

If we consider this thoroughly and compile a suitable syntax that covers 
all bases it could be used as the basis of the standard definition or be 
close to what ends up in the standard.


-- 

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz


Re: PostgreSQL - 'SKYLINE OF' clause added!

From
Tom Lane
Date:
Shane Ambler <pgsql@Sheeky.Biz> writes:
> If we consider this thoroughly and compile a suitable syntax that covers 
> all bases it could be used as the basis of the standard definition or be 
> close to what ends up in the standard.

I'll bet you a very good dinner that the word SKYLINE will never be seen
in the standard.

To me, the proposed feature seems an extremely narrow, special-purpose
thing.  The SQL committee have never been into that very much, and seem
even less interested in the last couple of revisions.  They like
mechanisms that can be used to solve a wide variety of problems, and
are not afraid to introduce conceptual complexity to get there.
Two examples for you: outer joins and recursive queries.  Oracle's
(+) syntax is more compact than what got into the spec, but less
precise and less functional.  For recursive queries, CONNECT BY is
way simpler than what got into the spec, but again doesn't cover as
much ground.  The SKYLINE clause seems to me to be right about on
par with CONNECT BY ... it does something useful, but only one thing.

I think the challenge for the SKYLINE authors is to recast what they did
as a general feature --- something feeling like, say, multi-argument
aggregates or window functions.  The mechanisms that they want to put
into the executor sound like they can do a whole lot more than SKYLINE;
find a way to expose that power.  And the fewer bespoke keywords needed,
the better ;-)
        regards, tom lane