Thread: Different results from view and from its defintion query [w/ windowing function]

Different results from view and from its defintion query [w/ windowing function]

From
Thalis Kalfigkopoulos
Date:

Sorry for the lack of a more appropriate title.
The summary of my problem is: i run a query and I get some results; then I create a view using this query, and I run the same query on the view, and get different results. Details follow.

On the original table the analytical data is as follows:
# SELECT id,experiment,insertedon,score FROM data WHERE id=1160;

  id |        experiment        |     insertedon      | score 
--------+----------------------+---------------------+--------
  1160 | alpha          | 2012-08-19 01:01:12 | 220.69
  1160 | alpha          | 2012-08-19 01:01:22 | 220.69
  1160 | beta          | 2012-08-19 01:01:31 |  220.7
  1160 | beta          | 2012-08-19 01:01:42 |  220.7
  1160 | beta          | 2012-08-19 01:01:54 |  220.7

My query of interest using windowing functions is:

# SELECT id, experiment, first_value(insertedon) OVER (PARTITION BY score, id) AS first_insertedon, score FROM data WHERE id=1160;
 id   |        experiment        |  first_insertedon   | score  |
--------+----------------------+---------------------+--------+----------
  1160 | alpha          | 2012-08-19 01:01:12 | 220.69 |
  1160 | alpha          | 2012-08-19 01:01:12 | 220.69 |
  1160 | beta          | 2012-08-19 01:01:31 |  220.7 |
  1160 | beta          | 2012-08-19 01:01:31 |  220.7 |
  1160 | beta          | 2012-08-19 01:01:31 |  220.7 |

So far so good. I then create the view on this last query without the WHERE condition:
# CREATE VIEW clustered_view AS SELECT id, experiment, first_value(insertedon) OVER (PARTITION BY score, id) AS first_insertedon, score FROM data;

I see the view created correctly and its definition is according to the mentioned SQL query. I now select from the view adding the WHERE condition:
#  SELECT * from clustered_view WHERE id=1160;

  id   |        experiment         |  first_insertedon   | score  |
--------+----------------------+---------------------+--------+
  1160 | alpha          | 2012-08-19 01:01:12 | 220.69 |
  1160 | alpha          | 2012-08-19 01:01:12 | 220.69 |
  1160 | beta          | 2012-08-19 01:01:54 |  220.7 |
  1160 | beta          | 2012-08-19 01:01:54 |  220.7 |
  1160 | beta          | 2012-08-19 01:01:54 |  220.7 |

As you see, the 'first_insertedon' timestamp for the experiment 'beta' is no longer the first of the timestamps i.e. '2012-08-19 01:01:31' as the original query's results gave correctly, but it's now the last one i.e. '2012-08-19 01:01:54'

Any ideas? Missing the obvious?


TIA,
Thalis K.
Thalis Kalfigkopoulos <tkalfigo@gmail.com> writes:
> # SELECT id, experiment, first_value(insertedon) OVER (PARTITION BY score,
> id) AS first_insertedon, score FROM data WHERE id=1160;

> [ versus ]

> # CREATE VIEW clustered_view AS SELECT id, experiment,
> first_value(insertedon) OVER (PARTITION BY score, id) AS first_insertedon,
> score FROM data;

> #  SELECT * from clustered_view WHERE id=1160;

One possible reason these produce different results is in that in the
first case, the WHERE condition eliminates rows from the window
function's consideration.  In the second case, it doesn't --- the WHERE
only filters the result rows from the view.  However, the fact that "id"
is part of the partition list may insulate you from that; not quite sure
without seeing a more complete example.

Another likely reason for trouble is that the window function seems
underspecified: without any ORDER BY clause, you are going to get a
random one of the insertedon values for the same score and id.  It's
entirely likely that moving the WHERE clause would change the plan
enough to change the ordering of the rows seen by the window function.
Possibly you should be using min() instead of first_value().

            regards, tom lane


Re: Different results from view and from its defintion query [w/ windowing function]

From
Thalis Kalfigkopoulos
Date:
Hi Tom,

and thanks for the reply (I had the pleasure of meeting you 11 years ago in Pittsburgh; still a pleasure seeing your concise and helpful replies.)

In the end I went for a change of window function. Using "min(insertedon)" instead of "first_value(insertedon)" works correctly.

Alternatively your suggestion of adding an "ORDER BY insertedon" clause also seems to work. It makes the first_value(insertedon) behave as min(insertedon).


thanks again,
Thalis K.

On Mon, Aug 20, 2012 at 12:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thalis Kalfigkopoulos <tkalfigo@gmail.com> writes:
> # SELECT id, experiment, first_value(insertedon) OVER (PARTITION BY score,
> id) AS first_insertedon, score FROM data WHERE id=1160;

> [ versus ]

> # CREATE VIEW clustered_view AS SELECT id, experiment,
> first_value(insertedon) OVER (PARTITION BY score, id) AS first_insertedon,
> score FROM data;

> #  SELECT * from clustered_view WHERE id=1160;

One possible reason these produce different results is in that in the
first case, the WHERE condition eliminates rows from the window
function's consideration.  In the second case, it doesn't --- the WHERE
only filters the result rows from the view.  However, the fact that "id"
is part of the partition list may insulate you from that; not quite sure
without seeing a more complete example.

Another likely reason for trouble is that the window function seems
underspecified: without any ORDER BY clause, you are going to get a
random one of the insertedon values for the same score and id.  It's
entirely likely that moving the WHERE clause would change the plan
enough to change the ordering of the rows seen by the window function.
Possibly you should be using min() instead of first_value().

                        regards, tom lane

Re: Different results from view and from its defintion query [w/ windowing function]

From
Achilleas Mantzios
Date:

first_value refers to the first row from the window frame. Unless you force some kind of ordering, you cannot expect

consistent results out of this.

 

See the PARTITION BY ... ORDER BY syntax in http://www.postgresql.org/docs/9.1/static/tutorial-window.html

 

On Δευ 20 Αυγ 2012 01:55:38 Thalis Kalfigkopoulos wrote:


Sorry for the lack of a more appropriate title.
The summary of my problem is: i run a query and I get some results; then I create a view using this query, and I run the same query on the view, and get different results. Details follow.

On the original table the analytical data is as follows:
# SELECT id,experiment,insertedon,score FROM data WHERE id=1160;

  id |        experiment        |     insertedon      | score 
--------+----------------------+---------------------+--------
  1160 | alpha          | 2012-08-19 01:01:12 | 220.69
  1160 | alpha          | 2012-08-19 01:01:22 | 220.69
  1160 | beta          | 2012-08-19 01:01:31 |  220.7
  1160 | beta          | 2012-08-19 01:01:42 |  220.7
  1160 | beta          | 2012-08-19 01:01:54 |  220.7

My query of interest using windowing functions is:

# SELECT id, experiment, first_value(insertedon) OVER (PARTITION BY score, id) AS first_insertedon, score FROM data WHERE id=1160;
 id   |        experiment        |  first_insertedon   | score  |
--------+----------------------+---------------------+--------+----------
  1160 | alpha          | 2012-08-19 01:01:12 | 220.69 |
  1160 | alpha          | 2012-08-19 01:01:12 | 220.69 |
  1160 | beta          | 2012-08-19 01:01:31 |  220.7 |
  1160 | beta          | 2012-08-19 01:01:31 |  220.7 |
  1160 | beta          | 2012-08-19 01:01:31 |  220.7 |

So far so good. I then create the view on this last query without the WHERE condition:
# CREATE VIEW clustered_view AS SELECT id, experiment, first_value(insertedon) OVER (PARTITION BY score, id) AS first_insertedon, score FROM data;

I see the view created correctly and its definition is according to the mentioned SQL query. I now select from the view adding the WHERE condition:
#  SELECT * from clustered_view WHERE id=1160;

  id   |        experiment         |  first_insertedon   | score  |
--------+----------------------+---------------------+--------+
  1160 | alpha          | 2012-08-19 01:01:12 | 220.69 |
  1160 | alpha          | 2012-08-19 01:01:12 | 220.69 |
  1160 | beta          | 2012-08-19 01:01:54 |  220.7 |
  1160 | beta          | 2012-08-19 01:01:54 |  220.7 |
  1160 | beta          | 2012-08-19 01:01:54 |  220.7 |

As you see, the 'first_insertedon' timestamp for the experiment 'beta' is no longer the first of the timestamps i.e. '2012-08-19 01:01:31' as the original query's results gave correctly, but it's now the last one i.e. '2012-08-19 01:01:54'

Any ideas? Missing the obvious?


TIA,
Thalis K.



-

Achilleas Mantzios

IT DEPT