Thread: LIMITing number of results in a VIEW with global variables

LIMITing number of results in a VIEW with global variables

From
Thomas Girault
Date:
Hello,

I am writing an extension to easily execute queries with conditions
expressing constraints in fuzzy logics.

I wrote some C functions that get or set global variables in C.
The variables are MU (FLOAT : degree of a fuzzy predicate), ALPHA
(FLOAT : threshold for filtering predicates) and K (INTEGER : limits
the number of results).
Here is an example for the variable ALPHA :
 /*--- sqlf.c ---*/ static float8 ALPHA; Datum get_alpha(PG_FUNCTION_ARGS); Datum get_alpha(PG_FUNCTION_ARGS){
PG_RETURN_FLOAT8(ALPHA);} Datum set_alpha(PG_FUNCTION_ARGS); Datum set_alpha(PG_FUNCTION_ARGS){     ALPHA =
PG_GETARG_FLOAT8(0);    PG_RETURN_FLOAT8(ALPHA); } /*--- sqlf.sql ---*/ CREATE OR REPLACE FUNCTION set_alpha(alpha
FLOAT)RETURNS FLOAT   AS '$libdir/sqlf', 'set_alpha'   LANGUAGE C STRICT;
 

These variables are parameters for filtering and sorting results.
The following cast operations are using MU and ALPHA.
 CREATE OR REPLACE FUNCTION fuzzy2bool(FLOAT) RETURNS BOOLEAN LANGUAGE SQL AS 'SELECT set_mu($1);SELECT $1 >
get_alpha()';
 CREATE CAST (FLOAT AS BOOLEAN) WITH FUNCTION fuzzy2bool(FLOAT) AS IMPLICIT;

With this implicit cast, the query SELECT age, young(age) FROM set_alpha(0.1), employees WHERE young(age);
is equivalent to SELECT age, young(age) FROM set_alpha(0.1), employees WHERE
fuzzy2bool(young(age));

Here, young(age) is a fuzzy predicate returning a float value in [0,1].
The queries keep results satisfying young(age) > alpha :
 age    young(age) 16    1 24    0.6 26    0.4 21    0.9 26    0.4


I can sort the results in the view 'sorted_employees' according to
value MU of a fuzzy predicate thanks to fuzzy2bool cast function.

CREATE OR REPLACE VIEW sorted_employees AS   SELECT *, get_mu() as mu   FROM employees   ORDER BY mu DESC;

The query SELECT age, mu FROM set_alpha(0.1), sorted_employees WHERE young(age);
gives the following results : age    mu 16    1 21    0.899999976158142 24    0.600000023841858 26    0.400000005960464
26   0.400000005960464
 

I am now trying to limit the number of results in the view according
to the global value K :

CREATE OR REPLACE VIEW filtered_employees AS   SELECT *, get_mu() as mu   FROM employees   ORDER BY mu DESC   LIMIT K;

The following query SELECT age, mu FROM set_k(5), set_alpha(0.1), filtered_employees
WHERE young(age);
gives the results : age    mu 24    1 16    1
instead of : age    mu 16    1 21    0.899999976158142 24    0.600000023841858 26    0.400000005960464 26
0.400000005960464

It seems that the 'LIMIT K' instruction have side effects on the MU value.

Why is it not working ? How to fix this issue ?


Thanks by advance,

Thomas Girault


Re: LIMITing number of results in a VIEW with global variables

From
Robert Haas
Date:
On Fri, Oct 14, 2011 at 10:43 AM, Thomas Girault <toma.girault@gmail.com> wrote:
> I am now trying to limit the number of results in the view according
> to the global value K :
>
> CREATE OR REPLACE VIEW filtered_employees AS
>    SELECT *, get_mu() as mu
>    FROM employees
>    ORDER BY mu DESC
>    LIMIT K;

Well, SQL, our our dialect of it anyway, doesn't have global
variables.  So I think the above is going to throw a syntax error.
You may have global variables in your C code, but those won't be
visible from the SQL level.

In general, I think you'd be better off not relying on C global
variables either, and instead passing the values you need as function
arguments.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: LIMITing number of results in a VIEW with global variables

From
Florian Pflug
Date:
On Oct14, 2011, at 16:43 , Thomas Girault wrote:
>  CREATE OR REPLACE FUNCTION fuzzy2bool(FLOAT)
>  RETURNS BOOLEAN LANGUAGE SQL AS 'SELECT set_mu($1);SELECT $1 > get_alpha()';

It seems dangerous for a cast to modify global state such a MU. The evaluation
order of functions during query execute isn't always easy to guess, and may
change depending on the execution plan.

> With this implicit cast, the query
>  SELECT age, young(age) FROM set_alpha(0.1), employees WHERE young(age);
> is equivalent to
>  SELECT age, young(age) FROM set_alpha(0.1), employees WHERE
> fuzzy2bool(young(age));

Those set_alpha() calls seem equally dangerous. If this alpha is supposed
to be a global parameter, why not set it *before* issuing the query?

> I can sort the results in the view 'sorted_employees' according to
> value MU of a fuzzy predicate thanks to fuzzy2bool cast function.
>
> CREATE OR REPLACE VIEW sorted_employees AS
>    SELECT *, get_mu() as mu
>    FROM employees
>    ORDER BY mu DESC;

Are you aware that an ORDER BY clause in a VIEW is only going to work
if you do "SELECT .. .FROM view". If the outer query is more complex
than that, I wouldn't bet on the results coming back in the expected order.

Usually, you'd attach ORDER BY to the outermost query (or to subqueries
containing a LIMIT clause).

> The following query
>  SELECT age, mu FROM set_k(5), set_alpha(0.1), filtered_employees

Again, you seem to rely on these set_k(), set_alpha() calls being
done before the query is executed. But postgres doesn't guarantee
that.

> WHERE young(age);
> gives the results :
>  age    mu
>  24    1
>  16    1
> instead of :
>  age    mu
>  16    1
>  21    0.899999976158142
>  24    0.600000023841858
>  26    0.400000005960464
>  26    0.400000005960464
>
> It seems that the 'LIMIT K' instruction have side effects on the MU value.

The execution plan may very well vary depending on the limit. Off-hand,
I'd guess that with a small K, one of these set_whatever() calls in one
of your FROM clauses gets executed after the computation it'd supposed
to affect has already happened.

> Why is it not working ? How to fix this issue ?

Don't rely on the execution order of function calls in a SELECT statement.
Divide your functions into two classes.

The ones which have side-effects (i.e. change global state). These should
*never* be called from SQL statements, except in the trivial case of
"SELECT my_func(...);". Also, they should be marked with VOLATILE

And the ones without side effects. Those should be marked with IMMUTABLE,
STABLE of VOLATILE, depending on how they're influenced by global state
changes. Read the documentation on these flags.

Finally, don't assume that ORDER BY inside a view influences the output
order of queries using the view. (Except in the trivial case of
"SELECT * FROM view"). As a rule of thumb, ORDER BY in a view only makes
sense if there's also a LIMIT clause. You are then guaranteed that the
view returns the first <limit> rows according to the specified order.
Don't assume they're necessarily returned in ascending order, though.

best regards,
Florian Pflug



Re: LIMITing number of results in a VIEW with global variables

From
Thomas Girault
Date:
Hello,

Thank you for your answer Robert.

> Well, SQL, our our dialect of it anyway, doesn't have global
> variables.  So I think the above is going to throw a syntax error.
> You may have global variables in your C code, but those won't be
> visible from the SQL level.

I was wrong in the definition of  filtered_employees view. 
The correct one is :

CREATE OR REPLACE VIEW filtered_employees AS
SELECT *, get_mu() as mu
FROM employees
ORDER BY mu DESC
LIMIT get_k();

Note that the access to global C variables (K, MU and ALPHA) from SQL is working well with my definitions of get_k(), get_mu()...  There is no syntax error here, however the view is not working because it has side effects on the value of MU.

I have also tried two other alternatives to the LIMIT keyword but it doesn't work yet : 
1) SQL instruction :  RANK() OVER(ORDER BY get_mu()) as sqlf_rank ;
2) C instruction : SPI_exec(query, K) ... it leads to a segmentation fault.

> In general, I think you'd be better off not relying on C global
> variables either, 

I don't understand how I could avoid using global variables in some cases.
For instance, I must store the float value $1 corresponding to a fuzzy predicate degree in the following fuzzy2bool cast operation :

CREATE OR REPLACE FUNCTION fuzzy2bool(FLOAT) 
RETURNS BOOLEAN LANGUAGE SQL AS 
'SELECT set_mu($1); -- Here $1 must be stored in MU for each record observed in the view;
SELECT $1 > get_alpha()'; -- Then $1 is converted to Boolean according to ALPHA global value

> and instead passing the values you need as function
> arguments.
Do you mean that I should define a function filter(table_name TEXT, k INTEGER, alpha FLOAT) ?

Thanks again for your help,

Thomas Girault

Re: LIMITing number of results in a VIEW with global variables

From
Thomas Girault
Date:
Hello Florian,

It seems dangerous for a cast to modify global state such a MU. The evaluation
order of functions during query execute isn't always easy to guess, and may
change depending on the execution plan.

I supposed that fuzzy2bool is called just before the terminal evaluation of the WHERE clause which needs a Boolean. My first tests showed that this hypothesis is right but it might be wrong in the case of alternative execution plans.
 
> With this implicit cast, the query
>  SELECT age, young(age) FROM set_alpha(0.1), employees WHERE young(age);
> is equivalent to
>  SELECT age, young(age) FROM set_alpha(0.1), employees WHERE
> fuzzy2bool(young(age));

Those set_alpha() calls seem equally dangerous. If this alpha is supposed
to be a global parameter, why not set it *before* issuing the query?

Alternatively, we could also set the alpha value before the query :   

SELECT set_alpha(0.1); SELECT age, young(age) FROM employees WHERE young(age);

I would be very interested to know if there is smarter way to set global variables.

> I can sort the results in the view 'sorted_employees' according to
> value MU of a fuzzy predicate thanks to fuzzy2bool cast function.
>
> CREATE OR REPLACE VIEW sorted_employees AS
>    SELECT *, get_mu() as mu
>    FROM employees
>    ORDER BY mu DESC;

Are you aware that an ORDER BY clause in a VIEW is only going to work
if you do "SELECT .. .FROM view".

I It really the first time I am using views, I didn't know that ! 
 
If the outer query is more complex
than that, I wouldn't bet on the results coming back in the expected order.
 
I don't mind if the ordering is wrong : it is just a way to process filtering according to K and ALPHA.

Usually, you'd attach ORDER BY to the outermost query (or to subqueries
containing a LIMIT clause).

> The following query
>  SELECT age, mu FROM set_k(5), set_alpha(0.1), filtered_employees

Again, you seem to rely on these set_k(), set_alpha() calls being
done before the query is executed. But postgres doesn't guarantee
that.

> WHERE young(age);
> gives the results :
>  age  mu
>  24   1
>  16   1
> instead of :
>  age  mu
>  16   1
>  21   0.899999976158142
>  24   0.600000023841858
>  26   0.400000005960464
>  26   0.400000005960464
>
> It seems that the 'LIMIT K' instruction have side effects on the MU value.

The execution plan may very well vary depending on the limit. Off-hand,
I'd guess that with a small K, one of these set_whatever() calls in one
of your FROM clauses gets executed after the computation it'd supposed
to affect has already happened.

> Why is it not working ? How to fix this issue ?

Don't rely on the execution order of function calls in a SELECT statement.
Divide your functions into two classes.

The ones which have side-effects (i.e. change global state). These should
*never* be called from SQL statements, except in the trivial case of
"SELECT my_func(...);". Also, they should be marked with VOLATILE

And the ones without side effects. Those should be marked with IMMUTABLE,
STABLE of VOLATILE, depending on how they're influenced by global state
changes. Read the documentation on these flags.

Finally, don't assume that ORDER BY inside a view influences the output
order of queries using the view. (Except in the trivial case of
"SELECT * FROM view"). As a rule of thumb, ORDER BY in a view only makes
sense if there's also a LIMIT clause. You are then guaranteed that the
view returns the first <limit> rows according to the specified order.
Don't assume they're necessarily returned in ascending order, though.

best regards,
Florian Pflug
 
Thank you very much ! I will try to follow your advices.

Thomas

Re: LIMITing number of results in a VIEW with global variables

From
Florian Pflug
Date:
On Oct15, 2011, at 14:52 , Thomas Girault wrote:
> Alternatively, we could also set the alpha value before the query :
> 
> SELECT set_alpha(0.1); SELECT age, young(age) FROM employees WHERE
> young(age);

That's certainly much safer.

> I would be very interested to know if there is smarter way to set global
> variables.

The closest thing to global variables are GUC settings. These can be
set globally in postgres.conf, per user and/or per-database via
ALTER ROLE/DATABSE ... [IN DATABASE ...], per session with SET, per
function via CREATE FUNCTION ... SET and finally per subtransaction 
with SET LOCAL. Modules can add their own GUC settings - you should be
able to find an example in one of the modules in contrib/

But your getter/setter-based solution isn't bad either - in fact, the
trigram module contains something very similar. Just don't try to
modify the value mid-query.

>> I can sort the results in the view 'sorted_employees' according to
>>> value MU of a fuzzy predicate thanks to fuzzy2bool cast function.
>>> 
>>> CREATE OR REPLACE VIEW sorted_employees AS
>>>   SELECT *, get_mu() as mu
>>>   FROM employees
>>>   ORDER BY mu DESC;
>> 
>> Are you aware that an ORDER BY clause in a VIEW is only going to work
>> if you do "SELECT .. .FROM view".
> 
> I It really the first time I am using views, I didn't know that !

Hm, I think I didn't explain that to well, so to avoid giving you a false
impression here's another try.

A SELECT without an ORDER BY clause attached to the *outmost* level never
guarantees any particular ordering of the result, nor any particular
relationship between the ordering of the SELECT's data sources and the
ordering of the SELECT's result. The only exception are SELECT's of the
form "select ... from <view>" where <view> has an ORDER BY attached to
the outmost level. From that, it follows that an ORDER BY in views used
in SELECTs more complex than the above is usually useless.

Things are different for views that combine ORDER BY and LIMIT, of course.
Then, not only the order of the view's results changes, but also it's output
set. Which of course affects every statement which uses the view.

best regards,
Florian Pflug