Re: Row pattern recognition - Mailing list pgsql-hackers
From | Vik Fearing |
---|---|
Subject | Re: Row pattern recognition |
Date | |
Msg-id | 90bff6fa-01b7-cf60-f4f4-3412514a7cd2@postgresfriends.org Whole thread Raw |
In response to | Re: Row pattern recognition (Tatsuo Ishii <ishii@sraoss.co.jp>) |
Responses |
Re: Row pattern recognition
Re: Row pattern recognition |
List | pgsql-hackers |
On 6/26/23 03:05, Tatsuo Ishii wrote: >> I don't understand this. RPR in a window specification limits the >> window to the matched rows, so this looks like your rpr() function is >> just the regular first_value() window function that we already have? > > No, rpr() is different from first_value(). rpr() returns the argument > value at the first row in a frame only when matched rows found. On the > other hand first_value() returns the argument value at the first row > in a frame unconditionally. > > company | tdate | price | rpr | first_value > ----------+------------+-------+------+------------- > company1 | 2023-07-01 | 100 | | 100 > company1 | 2023-07-02 | 200 | 200 | 200 > company1 | 2023-07-03 | 150 | 150 | 150 > company1 | 2023-07-04 | 140 | | 140 > company1 | 2023-07-05 | 150 | 150 | 150 > company1 | 2023-07-06 | 90 | | 90 > company1 | 2023-07-07 | 110 | | 110 > company1 | 2023-07-08 | 130 | | 130 > company1 | 2023-07-09 | 120 | | 120 > company1 | 2023-07-10 | 130 | | 130 > > For example, a frame starting with (tdate = 2023-07-02, price = 200) > consists of rows (price = 200, 150, 140, 150) satisfying the pattern, > thus rpr returns 200. Since in this example frame option "ROWS BETWEEN > CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts > with (tdate = 2023-07-03, price = 150). This frame satisfies the > pattern too (price = 150, 140, 150), and rpr retus 150... and so on. Okay, I see the problem now, and why you need the rpr() function. You are doing this as something that happens over a window frame, but it is actually something that *reduces* the window frame. The pattern matching needs to be done when the frame is calculated and not when any particular function is applied over it. This query (with all the defaults made explicit): SELECT s.company, s.tdate, s.price, FIRST_VALUE(s.tdate) OVER w, LAST_VALUE(s.tdate) OVER w, lowest OVER w FROM stock AS s WINDOW w AS ( PARTITION BY s.company ORDER BY s.tdate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING EXCLUDE NO OTHERS MEASURES LAST(DOWN) AS lowest AFTER MATCH SKIP PAST LAST ROW INITIAL PATTERN (START DOWN+ UP+) DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price < PREV(price) ); Should produce this result: company | tdate | price | first_value | last_value | lowest ----------+------------+-------+-------------+------------+-------- company1 | 07-01-2023 | 100 | | | company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 | 140 company1 | 07-03-2023 | 150 | | | company1 | 07-04-2023 | 140 | | | company1 | 07-05-2023 | 150 | | | company1 | 07-06-2023 | 90 | | | company1 | 07-07-2023 | 110 | | | company1 | 07-08-2023 | 130 | 07-05-2023 | 07-05-2023 | 120 company1 | 07-09-2023 | 120 | | | company1 | 07-10-2023 | 130 | | | (10 rows) Or if we switch to AFTER MATCH SKIP TO NEXT ROW, then we get: company | tdate | price | first_value | last_value | lowest ----------+------------+-------+-------------+------------+-------- company1 | 07-01-2023 | 100 | | | company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 | 140 company1 | 07-03-2023 | 150 | 07-03-2023 | 07-05-2023 | 140 company1 | 07-04-2023 | 140 | | | company1 | 07-05-2023 | 150 | 07-05-2023 | 07-08-2023 | 90 company1 | 07-06-2023 | 90 | | | company1 | 07-07-2023 | 110 | | | company1 | 07-08-2023 | 130 | 07-08-2023 | 07-10-2023 | 120 company1 | 07-09-2023 | 120 | | | company1 | 07-10-2023 | 130 | | | (10 rows) And then if we change INITIAL to SEEK: company | tdate | price | first_value | last_value | lowest ----------+------------+-------+-------------+------------+-------- company1 | 07-01-2023 | 100 | 07-02-2023 | 07-05-2023 | 140 company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 | 140 company1 | 07-03-2023 | 150 | 07-03-2023 | 07-05-2023 | 140 company1 | 07-04-2023 | 140 | 07-05-2023 | 07-08-2023 | 90 company1 | 07-05-2023 | 150 | 07-05-2023 | 07-08-2023 | 90 company1 | 07-06-2023 | 90 | 07-08-2023 | 07-10-2023 | 120 company1 | 07-07-2023 | 110 | 07-08-2023 | 07-10-2023 | 120 company1 | 07-08-2023 | 130 | 07-08-2023 | 07-10-2023 | 120 company1 | 07-09-2023 | 120 | | | company1 | 07-10-2023 | 130 | | | (10 rows) Since the pattern recognition is part of the frame, the window aggregates should Just Work. >>> o SUBSET is not supported >> >> Is this because you haven't done it yet, or because you ran into >> problems trying to do it? > > Because it seems SUBSET is not useful without MEASURES support. Thus > my plan is, firstly implement MEASURES, then SUBSET. What do you > think? SUBSET elements can be used in DEFINE clauses, but I do not think this is important compared to other features. >>> Comments and suggestions are welcome. >> >> I have not looked at the patch yet, but is the reason for doing R020 >> before R010 because you haven't done the MEASURES clause yet? > > One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked > harder for me because modifying main SELECT clause could be a hard > work. Another reason is, I had no idea how to implement PREV/NEXT in > other than in WINDOW clause. Other people might feel differently > though. I think we could do this with a single tuplesort if we use backtracking (which might be really slow for some patterns). I have not looked into it in any detail. We would need to be able to remove tuples from the end (even if only logically), and be able to update tuples inside the store. Both of those needs come from backtracking and possibly changing the classifier. Without backtracking, I don't see how we could do it without have a separate tuplestore for every current possible match. >> In any case, I will be watching this with a close eye, and I am eager >> to help in any way I can. > > Thank you! I am looking forward to comments on my patch. Also any > idea how to implement MEASURES clause is welcome. I looked at your v2 patches a little bit and the only comment that I currently have on the code is you spelled PERMUTE as PREMUTE. Everything else is hopefully explained above. -- Vik Fearing
pgsql-hackers by date: