Re: Row pattern recognition - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Row pattern recognition
Date
Msg-id 20230626.100520.2022209340496571373.t-ishii@sranhm.sra.co.jp
Whole thread Raw
In response to Re: Row pattern recognition  (Vik Fearing <vik@postgresfriends.org>)
Responses Re: Row pattern recognition
Re: Row pattern recognition
List pgsql-hackers
> I have been dreaming of and lobbying for someone to pick up this
> feature.  I will be sure to review it from a standards perspective and
> will try my best to help with the technical aspect, but I am not sure
> to have the qualifications for that.
> 
> THANK YOU!

Thank you for looking into my proposal.

>> (I know SQL:2023 is already out, but I don't have access to it)
> 
> If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead
> of just its technical specification.
> 
> https://www.iso.org/standard/78936.html

Thanks for the info.

> I don't understand this.  RPR in a window specification limits the
> window to the matched rows, so this looks like your rpr() function is
> just the regular first_value() window function that we already have?

No, rpr() is different from first_value(). rpr() returns the argument
value at the first row in a frame only when matched rows found. On the
other hand first_value() returns the argument value at the first row
in a frame unconditionally.

company  |   tdate    | price | rpr  | first_value 
----------+------------+-------+------+-------------
 company1 | 2023-07-01 |   100 |      |         100
 company1 | 2023-07-02 |   200 |  200 |         200
 company1 | 2023-07-03 |   150 |  150 |         150
 company1 | 2023-07-04 |   140 |      |         140
 company1 | 2023-07-05 |   150 |  150 |         150
 company1 | 2023-07-06 |    90 |      |          90
 company1 | 2023-07-07 |   110 |      |         110
 company1 | 2023-07-08 |   130 |      |         130
 company1 | 2023-07-09 |   120 |      |         120
 company1 | 2023-07-10 |   130 |      |         130

For example, a frame starting with (tdate = 2023-07-02, price = 200)
consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
with (tdate = 2023-07-03, price = 150). This frame satisfies the
pattern too (price = 150, 140, 150), and rpr retus 150... and so on.

> As in your example, you cannot have START.price outside of the window
> specification; it can only go in the MEASURES clause.  Only startprice
> is allowed outside and it gets its qualification from the OVER.  Using
> w.startprice might have been better but that would require window
> names to be in the same namespace as range tables.
> 
> This currently works in Postgres:
> 
>   SELECT RANK() OVER w
>   FROM (VALUES (1)) AS w (x)
>   WINDOW w AS (ORDER BY w.x);

Interesting.

>> o SUBSET is not supported
> 
> Is this because you haven't done it yet, or because you ran into
> problems trying to do it?

Because it seems SUBSET is not useful without MEASURES support. Thus
my plan is, firstly implement MEASURES, then SUBSET. What do you
think?

>> o Regular expressions other than "+" are not supported
> 
> This is what I had a hard time imagining how to do while thinking
> about it.  The grammar is so different here and we allow many more
> operators (like "?" which is also the standard parameter symbol).
> People more expert than me will have to help here.

Yes, that is a problem.

> In this case, we should require the user to specify AFTER MATCH SKIP
> TO NEXT ROW so that behavior doesn't change when we implement the
> standard default.  (Your patch might do this already.)

Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next
patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW.

>> o Aggregate functions associated with window clause using RPR do not
>> respect RPR
> 
> I do not understand what this means.

Ok, let me explain. See example below. In my understanding "count"
should retun the number of rows in a frame restriced by the match
condition. For example at the first line (2023-07-01 | 100) count
returns 10. I think this should be 0 because the "restriced" frame
starting at the line contains no matched row. On the other hand the
(restricted) frame starting at second line (2023-07-02 | 200) contains
4 rows, thus count should return 4, instead of 9.

SELECT company, tdate, price, rpr(price) OVER w, count(*) OVER w FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 PATTERN (START DOWN+ UP+)
 DEFINE
  START AS TRUE,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);

company  |   tdate    | price | rpr  | count 
----------+------------+-------+------+-------
 company1 | 2023-07-01 |   100 |      |    10
 company1 | 2023-07-02 |   200 |  200 |     9
 company1 | 2023-07-03 |   150 |  150 |     8
 company1 | 2023-07-04 |   140 |      |     7
 company1 | 2023-07-05 |   150 |  150 |     6
 company1 | 2023-07-06 |    90 |      |     5
 company1 | 2023-07-07 |   110 |      |     4
 company1 | 2023-07-08 |   130 |      |     3
 company1 | 2023-07-09 |   120 |      |     2
 company1 | 2023-07-10 |   130 |      |     1

>> It seems RPR in the standard is quite complex. I think we can start
>> with a small subset of RPR then we could gradually enhance the
>> implementation.
> 
> I have no problem with that as long as we don't paint ourselves into a
> corner.

Totally agreed.

>> Comments and suggestions are welcome.
> 
> I have not looked at the patch yet, but is the reason for doing R020
> before R010 because you haven't done the MEASURES clause yet?

One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
harder for me because modifying main SELECT clause could be a hard
work. Another reason is, I had no idea how to implement PREV/NEXT in
other than in WINDOW clause. Other people might feel differently
though.

> In any case, I will be watching this with a close eye, and I am eager
> to help in any way I can.

Thank you! I am looking forward to comments on my patch.  Also any
idea how to implement MEASURES clause is welcome.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Re: Speeding Up Bitmapset
Next
From: Alena Rybakina
Date:
Subject: Re: POC, WIP: OR-clause support for indexes