Re: PostgreSQL - 'SKYLINE OF' clause added! - Mailing list pgsql-hackers

From Naz Gassiep
Subject Re: PostgreSQL - 'SKYLINE OF' clause added!
Date
Msg-id 45F41BC5.4040102@mira.net
Whole thread Raw
In response to Re: PostgreSQL - 'SKYLINE OF' clause added!  (Nikita <nikita.p@gmail.com>)
Responses Re: PostgreSQL - 'SKYLINE OF' clause added!
List pgsql-hackers
I do see your points regarding the existence of use cases for this 
feature, and I agree that at worst, the implementation of this feature 
would provide a way to greatly simplify query design and at best provide 
a whole new method of obtaining decision supporting data from a 
relational database.

However I am strongly in disagreement with your fourth point, I.e., that 
users will only become aware of it once it has been implemented. This 
sort of mentality is what gave us the sad case of late 90s HTML in which 
browser vendors assumed that they could use the "if you build it they 
will come" argument for feature extension of the HTML spec. That is a 
debacle we are still suffering the effects of. Let us not do the same to 
SQL and implement SKYLINE on our own, only to have other DBMS vendors 
implement it in different ways and then finally when the SQL standard 
includes it they try to make some kind of average approximation of the 
implementations resulting in *none* of the DBs being compliant. Then 
we'll be between the rock of breaking backwards compatibility and the 
hard place of unwarranted standards non-compliance.

While Josh did point out that being in the leading group as far as 
implementing new functionality goes, I feel that it has to be weighed 
against the need to not strike out too aggressively, potentially 
isolating ourselves with excessive non-standard syntax or behavior.

While I am convinced there is a strong use case for this functionality 
and we should definitely start looking at it, I don't see why we should 
be in a rush to get it into core. People have survived without it up to 
now, I don't think our userbase will suffer if it is implemented 6 
months after <foo commercial DB> implements it, at least, not as much as 
it will suffer if we start drifting away from standards compliance.

Just my 2 rupees. :)

- Naz

Nikita wrote:
> Few things from our side:
>
> 1. 'Skyline Of' is a new operator proposed in ICDE 2003, one of the 
> topmost conferences of Data Engineering. Skyline operation is a hot 
> area of research in query processing. Many of the database community 
> people do know about this operator, and it is fast catching the 
> attention.
>
> 2. The skyline operation is very useful in data analysis. Suppose, if 
> we have a cricket database, and we want to find the bowlers who have 
> taken maximum wickets in minimum overs, we can issue an easy-to-write 
> query using 'Skyline of' syntax as follows:
>
> Select * from Player_Match Skyline Of overs_bowled min, wickets_taken max;
>
> This query gives 25 interesting tuples (result set) out of 24750 
> tuples in 0.0509 seconds. The same result is obtained in 0.8228 
> seconds if the following equivalent nested-query is issued:
>
> select * from Player_Match p1 where not exists ( select * from 
> Player_Match p2 where p2.overs_bowled <= p1.overs_bowled and 
> p2.wickets_taken >= p1.wickets_taken and (p2.overs_bowled < 
> p1.overs_bowled or p2.wickets_taken > p1.wickets_taken))
>
> Note that the above time is the time elapsed between issuing a query 
> and obtaining the result set.
> As can be seen, the above query looks pretty cumbersome to write and 
> is inefficient too. So, which query will the user prefer? As the 
> number of dimensions increases, writing a nested-query will become a 
> hedious task.
> Btw, how can such a query be written using aggregate function syntax??
>
> 3. As far as optimizing the Skyline is concerned, it is still a 
> research problem since it requires estimating the cardinality of the 
> skyline result set.
>
> 4. Until and unless this operator is implemented in a popular database 
> system, how can a user ever get to know about it and hence appreciate 
> its usefulness?
>
> Btw, it was our B.Tech final year project, and not a term project :-)
>
> Regards.
>
> On 3/8/07, *Tom Lane* <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> 
> wrote:
>
>     Shane Ambler <pgsql@Sheeky.Biz <mailto:pgsql@Sheeky.Biz>> writes:
>     > Tom Lane wrote:
>     >> Well, whether it's horrible or not is in the eye of the
>     beholder, but
>     >> this is certainly a non-standard syntax extension.
>
>     > Being non-standard should not be the only reason to reject a
>     worthwhile
>     > feature.
>
>     No, but being non-standard is certainly an indicator that the feature
>     may not be of widespread interest --- if it were, the SQL committee
>     would've gotten around to including it; seems they've managed to
>     include
>     everything but the kitchen sink already.  Add to that the complete
>     lack
>     of any previous demand for the feature, and you have to wonder
>     where the
>     market is.
>
>     > The fact that several
>     > different groups have been mentioned to be working on this
>     feature would
>     > indicate that it is worth considering.
>
>     It looks to me more like someone published a paper that caught the
>     attention of a few profs looking for term projects for their students.
>
>     Now maybe it really is the best idea since sliced bread and will
>     be seen
>     in the next SQL spec edition, but color me skeptical.  It seems to me
>     to be a very narrow-usage extension, as opposed to (eg) multi-input
>     aggregates or WITH/RECURSIVE, which provide general mechanisms
>     applicable
>     to a multitude of problems.  Now even so it would be fine if the
>     implementation were similarly narrow in scope, but the published
>     description of the patch mentions a large chunk of additional executor
>     mechanisms.  If we're going to be adding as much code as that, I'd
>     like
>     to see a wider scope of usage for it.
>
>     Basically, this patch isn't sounding like it has a reasonable
>     bang-to-the-buck ratio ...
>
>                             regards, tom lane
>
>     ---------------------------(end of
>     broadcast)---------------------------
>     TIP 6: explain analyze is your friend
>
>
>
>
> -- 
> Pride sullies the noblest character 


pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Grouped Index Tuples / Clustered Indexes
Next
From: "Florian G. Pflug"
Date:
Subject: Re: Grouped Index Tuples / Clustered Indexes