Re: Row pattern recognition - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: Row pattern recognition |
Date | |
Msg-id | 20230626.100520.2022209340496571373.t-ishii@sranhm.sra.co.jp Whole thread Raw |
In response to | Re: Row pattern recognition (Vik Fearing <vik@postgresfriends.org>) |
Responses |
Re: Row pattern recognition
Re: Row pattern recognition |
List | pgsql-hackers |
> I have been dreaming of and lobbying for someone to pick up this > feature. I will be sure to review it from a standards perspective and > will try my best to help with the technical aspect, but I am not sure > to have the qualifications for that. > > THANK YOU! Thank you for looking into my proposal. >> (I know SQL:2023 is already out, but I don't have access to it) > > If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead > of just its technical specification. > > https://www.iso.org/standard/78936.html Thanks for the info. > I don't understand this. RPR in a window specification limits the > window to the matched rows, so this looks like your rpr() function is > just the regular first_value() window function that we already have? No, rpr() is different from first_value(). rpr() returns the argument value at the first row in a frame only when matched rows found. On the other hand first_value() returns the argument value at the first row in a frame unconditionally. company | tdate | price | rpr | first_value ----------+------------+-------+------+------------- company1 | 2023-07-01 | 100 | | 100 company1 | 2023-07-02 | 200 | 200 | 200 company1 | 2023-07-03 | 150 | 150 | 150 company1 | 2023-07-04 | 140 | | 140 company1 | 2023-07-05 | 150 | 150 | 150 company1 | 2023-07-06 | 90 | | 90 company1 | 2023-07-07 | 110 | | 110 company1 | 2023-07-08 | 130 | | 130 company1 | 2023-07-09 | 120 | | 120 company1 | 2023-07-10 | 130 | | 130 For example, a frame starting with (tdate = 2023-07-02, price = 200) consists of rows (price = 200, 150, 140, 150) satisfying the pattern, thus rpr returns 200. Since in this example frame option "ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts with (tdate = 2023-07-03, price = 150). This frame satisfies the pattern too (price = 150, 140, 150), and rpr retus 150... and so on. > As in your example, you cannot have START.price outside of the window > specification; it can only go in the MEASURES clause. Only startprice > is allowed outside and it gets its qualification from the OVER. Using > w.startprice might have been better but that would require window > names to be in the same namespace as range tables. > > This currently works in Postgres: > > SELECT RANK() OVER w > FROM (VALUES (1)) AS w (x) > WINDOW w AS (ORDER BY w.x); Interesting. >> o SUBSET is not supported > > Is this because you haven't done it yet, or because you ran into > problems trying to do it? Because it seems SUBSET is not useful without MEASURES support. Thus my plan is, firstly implement MEASURES, then SUBSET. What do you think? >> o Regular expressions other than "+" are not supported > > This is what I had a hard time imagining how to do while thinking > about it. The grammar is so different here and we allow many more > operators (like "?" which is also the standard parameter symbol). > People more expert than me will have to help here. Yes, that is a problem. > In this case, we should require the user to specify AFTER MATCH SKIP > TO NEXT ROW so that behavior doesn't change when we implement the > standard default. (Your patch might do this already.) Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW. >> o Aggregate functions associated with window clause using RPR do not >> respect RPR > > I do not understand what this means. Ok, let me explain. See example below. In my understanding "count" should retun the number of rows in a frame restriced by the match condition. For example at the first line (2023-07-01 | 100) count returns 10. I think this should be 0 because the "restriced" frame starting at the line contains no matched row. On the other hand the (restricted) frame starting at second line (2023-07-02 | 200) contains 4 rows, thus count should return 4, instead of 9. SELECT company, tdate, price, rpr(price) OVER w, count(*) OVER w FROM stock WINDOW w AS ( PARTITION BY company ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING PATTERN (START DOWN+ UP+) DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price < PREV(price) ); company | tdate | price | rpr | count ----------+------------+-------+------+------- company1 | 2023-07-01 | 100 | | 10 company1 | 2023-07-02 | 200 | 200 | 9 company1 | 2023-07-03 | 150 | 150 | 8 company1 | 2023-07-04 | 140 | | 7 company1 | 2023-07-05 | 150 | 150 | 6 company1 | 2023-07-06 | 90 | | 5 company1 | 2023-07-07 | 110 | | 4 company1 | 2023-07-08 | 130 | | 3 company1 | 2023-07-09 | 120 | | 2 company1 | 2023-07-10 | 130 | | 1 >> It seems RPR in the standard is quite complex. I think we can start >> with a small subset of RPR then we could gradually enhance the >> implementation. > > I have no problem with that as long as we don't paint ourselves into a > corner. Totally agreed. >> Comments and suggestions are welcome. > > I have not looked at the patch yet, but is the reason for doing R020 > before R010 because you haven't done the MEASURES clause yet? One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked harder for me because modifying main SELECT clause could be a hard work. Another reason is, I had no idea how to implement PREV/NEXT in other than in WINDOW clause. Other people might feel differently though. > In any case, I will be watching this with a close eye, and I am eager > to help in any way I can. Thank you! I am looking forward to comments on my patch. Also any idea how to implement MEASURES clause is welcome. Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
pgsql-hackers by date: