Thread: proposal: Preference SQL

proposal: Preference SQL

From

Jan Urbański

Date:

29 May 2008, 20:09:55

This is a proposal of a non-standard, albeit useful functionality in
Postgres (it is also a quite long email message).

Background:
I'm currently working on a GSoC project for PostgreSQL
(http://wiki.postgresql.org/wiki/Gsoc08-tss). But at the same time, I'm
now at the point where I need to choose a subject for my thesis.
Naturally, I thought of PG.
I have an idea to add a feature, that no other database currently has (I
haven't heard of it, anyway) and that could be potentially quite useful.
If not, at
least it'd be a great bragging point ;) But seriously, this could be
quite nice to have.

I'll have two years to complete my thesis, maybe not dedicating 100% of
my time to it, but that's still plenty of time to carefully design and
implement
a complex functionality.

What I need is some kind of decision: is it possible to eventually get
this feature integrated into core? Is it at all wanted? If not, I think
I'll carry on with implementing is just for the sake of my thesis and of
course my coding standards will be much lower (just joking :]). But if
there will be interest in doing it properly and releasing for the
general public, I'd try much harder to make it fast, portable, etc. etc.
adding work, but forseeing glory.

Now about the idea itself:

http://www.informatik.uni-augsburg.de/de/lehrstuehle/dbis/db/publications/all_db_tech-reports/tr-2001-7_kie_koe/tr-2001-7_kie_koe.pdf
That's one of the basic papers about Preference SQL, explaining what
it's all about. For those, who don't feel like reading through it just
because I said it's interesting, here's some more info (warning, it's a
bit formal):

Preference SQL is an extension to regular SQL, that allows expressing
preferences in SQL queries. Preferences are like "soft" WHERE clauses. A
preference doesn't need to be satisfied by a tuple for it to appear in
the result set, but it's "preferred" it is. More strictly, a set of
preference clauses in a SQL query defines a partial order on the result
set as it would appear without any preference clauses and then returns
the maximal elements.
The partial order imposed by the set of preferences P[1], P[2], ...,
P[n] is such that tuple T1 > T2 iff T1 all preferences T2 satisfies
and there is a preference satisfied by T1 and not satisfied by T2 (or
there is a preference satisfied by T1 that is "better" satisfied by T2
and all others are "equaly" satisfied). As can be seen, there could be
an order defined on the degree of satisfyiness of a preference, and the
exact semantics are not all that well defined for all concievable cases.
Defining a complete semantics will be a part of my thesis.

An example of a preference query would be (quoting the linked PDF):

SELECT * FROM programmers PREFERRING exp IN (‘java’, ‘C++’);

SELECT * FROM computers
PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed);

I guess you get the idea when looking at these queries, but some cases
can be tricky, and also there's more to preference SQL that explained
earlier. For more, refer to the aforementioned PDF, it's not that scary
at all.

Desing, implementation:
Nothing done, nothing even thought about as of now. I'm concentrating on
my GSoC project ;) But I'll have lots and lots of time for this, and
even as it will not be my priority thing to do (it *is* a thesis
subject, after all) it still think it's possible to implement.
I have seen two people doing a preference XPath implementation by
hacking Saxon in about three months. I guess it didn't support all of
it, I guess it was slow and more of a PoC and I'm sure it had a ton of
bugs, but it was quite impressive nonetheless.

So what say you? Preferring?

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

Re: proposal: Preference SQL

From

"Stephen R. van den Berg"

Date:

31 May 2008, 06:04:55

Jan Urba??ski wrote:
>Preference SQL is an extension to regular SQL, that allows expressing 
>preferences in SQL queries. Preferences are like "soft" WHERE clauses. A 
>preference doesn't need to be satisfied by a tuple for it to appear in 
>the result set, but it's "preferred" it is. More strictly, a set of 
>preference clauses in a SQL query defines a partial order on the result 
>set as it would appear without any preference clauses and then returns 
>the maximal elements.

>An example of a preference query would be (quoting the linked PDF):

>SELECT * FROM programmers PREFERRING exp IN (???java???, ???C++???);
>or
>SELECT * FROM computers
>PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed);

Forgive my ignorance, but it appears that this can already be achieved
by using a properly weighted ORDER BY clause, as in:

SELECT * FROM computers
ORDER BY HIGHEST(main_memory) DESC, HIGHEST(cpu_speed) DESC;
-- 
Sincerely,                                                          srb@cuci.nl          Stephen R. van den Berg.

"Sleep: A completely inadequate substitute for caffeine."

Re: proposal: Preference SQL

From

Jan Urbański

Date:

31 May 2008, 08:34:51

Stephen R. van den Berg wrote:
> Jan Urbański wrote:
>> An example of a preference query would be (quoting the linked PDF):
>
>> SELECT * FROM programmers PREFERRING exp IN ('java', 'C++');
>> or
>> SELECT * FROM computers
>> PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed);
>
> Forgive my ignorance, but it appears that this can already be achieved
> by using a properly weighted ORDER BY clause, as in:
>
> SELECT * FROM computers
> ORDER BY HIGHEST(main_memory) DESC, HIGHEST(cpu_speed) DESC;

No, these are quite different. Consider a table with three columns: id,
main_memory, cpu_speed containing four tuples:  id          main_memory        cpu_speed
---------------------------------------------------
comp1             100                      80
comp2             80                      100
comp3             100                     70
comp4             60                        60

Now the result of a SELECT id FROM computers PREFERRING
HIGHEST(main_memory) AND HIGHEST(cpu_speed) would be:   id
---------
comp1
comp2

This is because comp1 and comp2 are incomparable under the partial order
defined by the preferences. comp1 has the largest main memory and comp2
the fastest CPU, but the preference states you like main memory just as
much as CPU speed, so you get both tuples in the result. On the other
hand, comp3 is not in the result set, because comp1 is greater than it
under the preference partial order. The main_memory preference is
satisfied by comp3 just as well as it is by comp1, but the cpu_speed
preference is worse. The same goes for comp4.

And all this is significantly different from an ORDER BY, because first
it doesn't throw away any rows and second it gives you a linear order,
where every tuple can be compared with another. The clause you proposed
(though it's not legal in PG, because there is no HIGHEST function,
right?) would, as I understand it, prefer main memory more than CPU speed.

There are still some issues about the exact meaning of a PREFERRING
clause, but it is very different from a simple ORDER BY (and it has more
options than just PREFERRING and AND).
Anyway, from what I've read most or all preference clauses can be
rewritten to standard clauses, but sometimes it's difficult, and many
times it's costly.

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

Re: proposal: Preference SQL

From

Kevin Walker

Date:

31 May 2008, 11:11:47

<div style="text-align: left;">Yes, the preference clause can be rewritten using standard SQL.  The syntax to duplicate
theexample result set is listed below.  The syntax is not very flexible or easy to read.   <br /><br />select id <br
/>fromcomputer<br />where (main_memory = (select max(main_memory) <br />                      from computer)<br
/>      and cpu_speed = (select max(cpu_speed) <br />                        from computer<br />                       
wherecpu_speed < (select max(cpu_speed) from computer)))<br />   or (cpu_speed = (select max(cpu_speed) <br /> 
;                      from computer)<br />       and   main_memory = (select max(main_memory) <br
/>                       from computer<br />                         where main_memory < (select max(main_memory)
fromcomputer)))<br />;<br />~ <br />Kevin Walker<br /><br />-----Original Message-----<br />From:
pgsql-hackers-owner@postgresql.org[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Jan Urbanski<br />Sent:
Saturday,May 31, 2008 7:34 AM<br />To: Stephen R. van den Berg<br />Cc: Postgres - Hackers<br />Subject: Re: [HACKERS]
proposal:Preference SQL<br /><br />Stephen R. van den Berg wrote:<br />> Jan Ur bański wrote:<br />>> An
exampleof a preference query would be (quoting the linked PDF):<br />> <br />>> SELECT * FROM programmers
PREFERRINGexp IN ('java', 'C++'); or <br />>> SELECT * FROM computers PREFERRING HIGHEST(main_memory) AND <br
/>>>HIGHEST(cpu_speed);<br />> <br />> Forgive my ignorance, but it appears that this can already be
achieved<br />> by using a properly weighted ORDER BY clause, as in:<br />> <br />> SELECT * FROM computers<br
/>>ORDER BY HIGHEST(main_memory) DESC, HIGHEST(cpu_speed) DESC;<br /><br />No, these are quite different. Consider a
tablewith three columns: id, main_memory, cpu_speed containing four tuples:<br />   id          main_memory       
cpu_speed<br/>---------------------------------------------------<br />comp1             100                      
80<br/>comp2             80                      100<br />comp3             100                     70<br
/>comp4            60                        60<br /><br />Now the result of a SELECT id FROM computers PREFERRING<br
/>HIGHEST(main_memory)AND HIGHEST(cpu_speed) would be:<br />    id<br />---------<br />comp1<br />comp2<br /><br />This
isbecause comp1 and comp2 are incomparable under the partial orde r defined by the preferences. comp1 has the largest
mainmemory and comp2 the fastest CPU, but the preference states you like main memory just as much as CPU speed, so you
getboth tuples in the result. On the other hand, comp3 is not in the result set, because comp1 is greater than it under
thepreference partial order. The main_memory preference is satisfied by comp3 just as well as it is by comp1, but the
cpu_speedpreference is worse. The same goes for comp4.<br /><br />And all this is significantly different from an ORDER
BY,because first it doesn't throw away any rows and second it gives you a linear order, where every tuple can be
comparedwith another. The clause you proposed (though it's not legal in PG, because there is no HIGHEST function,<br
/>right?)would, as I understand it, prefer main memory more than CPU speed.<br /><br />There are still some issues
aboutthe exact meaning of a PREFERRING clause, but it is very different from a simple ORDER BY (and it has more optio
nsthan just PREFERRING and AND).<br />Anyway, from what I've read most or all preference clauses can be rewritten to
standardclauses, but sometimes it's difficult, and many times it's costly.<br /><br />Cheers,<br />Jan<br /><br />--<br
/>JanUrbanski<br />GPG key ID: E583D7D2<br /><br />ouden estin<br /><br /><br />--<br />Sent via pgsql-hackers mailing
list(pgsql-hackers@postgresql.org) To make changes to your subscription:<br
/>http://www.postgresql.org/mailpref/pgsql-hackers<br/><br /></div><br /><hr />Make every e-mail and IM count. <a
href="http://im.live.com/Messenger/IM/Join/Default.aspx?source=EML_WL_MakeCount" target="_new">Join the i'm Initiative
fromMicrosoft.</a>

Re: proposal: Preference SQL

From

"Heikki Linnakangas"

Date:

31 May 2008, 11:46:37

Jan Urbański wrote:
> SELECT * FROM computers
> PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed);

This seems similar to the SKYLINE OF patch that was discussed a year (?) 
ago. Are you familiar with that project? Can you summarize the differences?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: proposal: Preference SQL

From

Jan Urbański

Date:

31 May 2008, 11:54:50

Kevin Walker wrote:
> Yes, the preference clause can be rewritten using standard SQL.  The syntax to duplicate the example result set is
listedbelow.  The syntax is not very flexible or easy to read.   

> 
> select id 
> from computer
> where (main_memory = (select max(main_memory) 
>                       from computer)
>        and cpu_speed = (select max(cpu_speed) 
>                         from computer
>                         where cpu_speed < (select max(cpu_speed) from computer)))
>    or (cpu_speed = (select max(cpu_speed) 
>                         from computer)
>        and   main_memory = (select max(main_memory) 
>                         from computer
>                          where main_memory < (select max(main_memory) from computer)))

Well, that's not 100% correct, but the idea is something like this. In 
particular, if you'd have only one entry in the table, then this query 
would not return any rows, which would be wrong. Also, if you had a 
computer that has larger main memory and a faster CPU than any other 
copmuter, it should be returned as the result, but the above query would 
fail to do that.

The point is not rewriting that particular preference query into a 
standard query. The point is whether it's worth having an automated 
mechanism for executing arbitrary preference queries with complex 
preferences (again: the syntax is richer, I didn't want to go into any 
detail about it before getting some feedback).

Let me give you a more sophisticated example.
You have a webpage that sells used cars. You have your typical search 
form with car make, colour, engine power and so on. Normally, you would 
make the search form input fields correspond to SQL WHERE clauses. So, 
if I want a white Honda with a 180 hp engine and about 40k kilometers of 
mileage I enter these parameters and hit the submit button. Now imagine 
I don't get any results for my search. That could mean that you have no 
Honda cars in stock, but it can also mean that you have my perfect Honda 
at a bargain price, it's only that it's black. Or maybe you have a 
Honda, but it has a 160 hp engine? Or is it just that the one perfect 
Honda you have has a mileage of just over 41k km, and that's why I 
didnt' get it in my result set? People seldom want a perfect match when 
they are searching for something. They want the best match, they can 
get. So, if I wanted to get a Honda with a decent engine and my 
favourite color is white, I'd say:

SELECT * FROM cars WHERE make = 'Honda' PREFERRING (power = '180' AND 
mileage AROUND 40000) CASCADE color = 'white';

Remember, that an AND in a preference clause constructs a partial order. 
The query says: I equally prefer having a 180 hp engine and having a car 
that has a mileage of 40k km. Tha CASCADE clause intrudoces a less 
important preference. It means that the color is not as important to me 
as power and mileage, but if I had a choice I'd take the white one.

I'd strongly recommend skimming through the paper I mentioned in my 
first email, it explains stuff much better than I do.

Preference SQL in Postgres could, as I see it, become one of the 
distinct features that no other widespread database system has and that 
could potentially be massively useful for online shops, social networks, 
etc. - you name it.
After hearing Bruce's keynote at PGCon and how Postgres now should be 
aiming at more that just catching up with the big guys I just thought: 
well, that's one neat feature that none of them has, that's useful, 
that's kind of sexy *and* I could get my degree out of it.

Cheers,
Jan

-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

Re: proposal: Preference SQL

From

Jan Urbański

Date:

31 May 2008, 12:30:13

Heikki Linnakangas wrote:
> Jan Urbański wrote:
>> SELECT * FROM computers
>> PREFERRING HIGHEST(main_memory) AND HIGHEST(cpu_speed);
>
> This seems similar to the SKYLINE OF patch that was discussed a year (?)
> ago. Are you familiar with that project? Can you summarize the differences?

Oh, I wasn't familiar with it. After reading through the archives - it
it very similar. I guess what some people are calling PREFERRING, others
are calling SKYLINE OF. There may be subtle differences though, I'll
post a comparision after getting some more info on it.

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

Re: proposal: Preference SQL

From

Kevin Walker

Date:

31 May 2008, 23:34:09

<div style="text-align: left;">Jan,<br /><br />;-)  ... agree with your points.  Note I wasn't trying to duplicate the
fullfunctionality of the preferences clause.  ...just pointing out that the example presented could be duplicated with
standardSQL with the result being both ugly and confusing.  <br /><br />My intent was show that the syntax you proposed
waseasier to read and more flexible. <br /><br />Kevin<br /></div><br /><br /><br /><hr id="stopSpelling" />> Date:
Sat,31 May 2008 16:53:54 +0200<br />> From: j.urbanski@students.mimuw.edu.pl<br />> To:
skywalkereast@hotmail.com<br/>> CC: srb@cuci.nl; pgsql-hackers@postgresql.org<br />> Subject: Re: [HACKERS]
proposal:Preference SQL<br />> <br />> Kevin Walker wrote:<br />> > Yes, the preference clause can be
rewrittenusing standard SQL. The syntax to duplicate the example result set is listed below. The syntax is not very
flexibleor easy to read. <br />> > <br />> > se lect id <br />> > from computer<br />> > where
(main_memory= (select max(main_memory) <br />> > from computer)<br />> > and cpu_speed = (select
max(cpu_speed)<br />> > from computer<br />> > where cpu_speed < (select max(cpu_speed) from
computer)))<br/>> > or (cpu_speed = (select max(cpu_speed) <br />> > from computer)<br />> > and
main_memory= (select max(main_memory) <br />> > from computer<br />> > where main_memory < (select
max(main_memory)from computer)))<br />> <br />> Well, that's not 100% correct, but the idea is something like
this.In <br />> particular, if you'd have only one entry in the table, then this query <br />> would not return
anyrows, which would be wrong. Also, if you had a <br />> computer that has larger main memory and a faster CP U
thanany other <br />> copmuter, it should be returned as the result, but the above query would <br />> fail to do
that.<br/>> <br />> The point is not rewriting that particular preference query into a <br />> standard query.
Thepoint is whether it's worth having an automated <br />> mechanism for executing arbitrary preference queries with
complex<br />> preferences (again: the syntax is richer, I didn't want to go into any <br />> detail about it
beforegetting some feedback).<br />> <br />> Let me give you a more sophisticated example.<br />> You have a
webpagethat sells used cars. You have your typical search <br />> form with car make, colour, engine power and so
on.Normally, you would <br />> make the search form input fields correspond to SQL WHERE clauses. So, <br />> if
Iwant a white Honda with a 180 hp engine and about 40k kilometers of <br />> mileage I enter these parameters and
hitthe submit button. Now imagine <br />> I don't get any re sults for my search. That could mean that you have no
<br/>> Honda cars in stock, but it can also mean that you have my perfect Honda <br />> at a bargain price, it's
onlythat it's black. Or maybe you have a <br />> Honda, but it has a 160 hp engine? Or is it just that the one
perfect<br />> Honda you have has a mileage of just over 41k km, and that's why I <br />> didnt' get it in my
resultset? People seldom want a perfect match when <br />> they are searching for something. They want the best
match,they can <br />> get. So, if I wanted to get a Honda with a decent engine and my <br />> favourite color is
white,I'd say:<br />> <br />> SELECT * FROM cars WHERE make = 'Honda' PREFERRING (power = '180' AND <br />>
mileageAROUND 40000) CASCADE color = 'white';<br />> <br />> Remember, that an AND in a preference clause
constructsa partial order. <br />> The query says: I equally prefer having a 180 hp engine and having a car <br
/>>that has a mileage of 40k km. Tha CASCADE clause intrudoces a less <br />> important preference. It means that
thecolor is not as important to me <br />> as power and mileage, but if I had a choice I'd take the white one.<br
/>><br />> I'd strongly recommend skimming through the paper I mentioned in my <br />> first email, it
explainsstuff much better than I do.<br />> <br />> Preference SQL in Postgres could, as I see it, become one of
the<br />> distinct features that no other widespread database system has and that <br />> could potentially be
massivelyuseful for online shops, social networks, <br />> etc. - you name it.<br />> After hearing Bruce's
keynoteat PGCon and how Postgres now should be <br />> aiming at more that just catching up with the big guys I just
thought:<br />> well, that's one neat feature that none of them has, that's useful, <br />> that's kind of sexy
*and*I could get my degree out of it.<br />> <br />> Cheers,<br />> Jan<br />> <br />> -- <br /> >
JanUrbanski<br />> GPG key ID: E583D7D2<br />> <br />> ouden estin<br /><br /><hr />E-mail for the greater
good.<a href="http://im.live.com/Messenger/IM/Join/Default.aspx?source=EML_WL_ GreaterGood" target="_new">Join the i'm
Initiativefrom Microsoft.</a>

Re: proposal: Preference SQL

From

Decibel!

Date:

03 June 2008, 19:09:10

On May 29, 2008, at 6:08 PM, Jan Urbański wrote:
> Now about the idea itself:
> http://www.informatik.uni-augsburg.de/de/lehrstuehle/dbis/db/
> publications/all_db_tech-reports/tr-2001-7_kie_koe/
> tr-2001-7_kie_koe.pdf
> That's one of the basic papers about Preference SQL, explaining
> what it's all about. For those, who don't feel like reading through
> it just because I said it's interesting, here's some more info
> (warning, it's a bit formal):
>
> Preference SQL is an extension to regular SQL, that allows
> expressing preferences in SQL queries. Preferences are like "soft"
> WHERE clauses. A preference doesn't need to be satisfied by a tuple
> for it to appear in the result set, but it's "preferred" it is.
> More strictly, a set of preference clauses in a SQL query defines a
> partial order on the result set as it would appear without any
> preference clauses and then returns the maximal elements.
> The partial order imposed by the set of preferences P[1], P
> [2], ..., P[n] is such that tuple T1 > T2  iff  T1 all preferences
> T2 satisfies and there is a preference satisfied by T1 and not
> satisfied by T2 (or there is a preference satisfied by T1 that is
> "better" satisfied by T2 and all others are "equaly" satisfied). As
> can be seen, there could be an order defined on the degree of
> satisfyiness of a preference, and the exact semantics are not all
> that well defined for all concievable cases. Defining a complete
> semantics will be a part of my thesis.


This seems like a subset of http://pgfoundry.org/projects/qbe/ ... or
do I misunderstand?
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Re: proposal: Preference SQL

From

Jan Urbański

Date:

03 June 2008, 19:27:26

Decibel! wrote:
> On May 29, 2008, at 6:08 PM, Jan Urbański wrote:
>> Preference SQL is an extension to regular SQL, that allows expressing
>> preferences in SQL queries. Preferences are like "soft" WHERE clauses.

> This seems like a subset of http://pgfoundry.org/projects/qbe/ ... or do
> I misunderstand?

I skimmed through the QBE howto, and I think it's actually far from it.
The thing that closely resembles preference clauses is the SKYLINE OF
operator, mentioned eariler in the thread - there is some archives
coverage on it.

I'm still working on producing a comparision of preference SQL and the
skyline operator, more to follow soon.

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

Re: proposal: Preference SQL

From

Josh Berkus

Date:

03 June 2008, 19:31:57

Jan,

> I'm still working on producing a comparision of preference SQL and the
> skyline operator, more to follow soon.

The big problem with all of these is that there's no standards on 
approximate queries yet.  So we're reluctant to support syntax extensions 
for them.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: proposal: Preference SQL

From

Jan Urbański

Date:

03 June 2008, 19:49:51

Josh Berkus wrote:
> Jan,
> 
>> I'm still working on producing a comparision of preference SQL and the
>> skyline operator, more to follow soon.
> 
> The big problem with all of these is that there's no standards on 
> approximate queries yet.  So we're reluctant to support syntax extensions 
> for them.
> 
Yes, I realized it after some thought - adding nonstandard syntax isn't 
so great after all. Right now I'm wondering if SQL standard window 
functions can do the things I though could be doable with preferences. 
Maybe I should talk to my thesis supervisor and find out if implementing 
window functions would be an equally good subject...
I suppose having window functions would be a nice thing? To be honest - 
I need a thesis subject and I like fiddling with Postgres. I'm trying to 
find an area in which my work would be useful to the community and 
enough of a standalone feature, that I can use it as the basis of my 
dissertation.

Also, going through the PG development process will ensure that the 
resulting code will be of topmost quality ;)

-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

Re: proposal: Preference SQL

From

Andrew Dunstan

Date:

03 June 2008, 20:15:56

Jan Urbański wrote:
> Maybe I should talk to my thesis supervisor and find out if 
> implementing window functions would be an equally good subject...
> I suppose having window functions would be a nice thing? To be honest 
> - I need a thesis subject and I like fiddling with Postgres.

Window functions are on the TODO list and a good implementation is 
something we really really want.

You will get far less resistance than for your original idea.

cheers

andrew