Analytic Function Bug - Mailing list pgsql-general

From Rumpi Gravenstein
Subject Analytic Function Bug
Date
Msg-id CAEpg1wDJU6DzUJxJdFAUZFzmw6bY+p17bXvCLupwsxc0V6vqrg@mail.gmail.com
Whole thread Raw
Responses Re: Analytic Function Bug
List pgsql-general
Experts,

I am running on 

PostgreSQL 14.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22), 64-bit

I have the following query which returns what I expect:


with

d(logical_partition_key, model_usage) as ( values( 'TEST_DATA' ,'F(T61)(EXPORT)' )

)

,

usg_txt as (

SELECT DISTINCT logical_partition_key,

MODEL_USAGE as usage_text,

REPLACE ( REPLACE ( REPLACE ( REPLACE ( PIU.MODEL_USAGE::text, '<t context="USAGE_TEXT">', ''), '<t context="FCN_NAME_MODFR">', ''), '<t context="FCN_USAGE_MODFR">', ''), '</t>', '') AS txt

FROM d /*CAO_CALLOUT_GHOST_COMB_W*/ piu

)

,

parse( logical_partition_key, usage_text,txt, rpo_txt, indx ) as

(

select d.logical_partition_key, d.usage_text,d.txt, coalesce(a.rpo[1],a.rpo[2]) as rpo_txt, a.pos

from usg_txt d

left join lateral regexp_matches( txt, '([ ,\-()/&])|([^ ,\-()/&]+)','g') with ordinality as a(rpo,pos) on true

)

,

prv_nxt_token( logical_partition_key, usage_text,txt, rpo_txt, indx, mx_indx, prev,nxt,nxt2, prv2,prv3) as

(

/* Get prior and next token to support later logic */

select p.logical_partition_key, p.usage_text,

p.txt, p.rpo_txt, indx,

max( indx) over ( partition by p.txt ) mx_indx,

lag( p.rpo_txt,1 ) over ( partition by p.logical_partition_key,p.txt order by indx ) prev,

lag( p.rpo_txt,-1 ) over ( partition by p.logical_partition_key,p.txt order by indx ) nxt,

lag( p.rpo_txt,-2) over ( partition by p.logical_partition_key,p.txt order by indx ) nxt2,

lag( p.rpo_txt,2 ) over ( partition by p.logical_partition_key,p.txt order by indx ) prv2,

lag( p.rpo_txt,3 ) over ( partition by p.logical_partition_key,p.txt order by indx ) prv3

from parse p

)

select * from prv_nxt_token;


logical_partition_keyusage_texttxtrpo_txtindxmx_indxprevnxtnxt2prv2prv3
"TEST_DATA""F(T61)(EXPORT)""F(T61)(EXPORT)""F""1""7""[NULL]""(""T61""[NULL]""[NULL]"
"TEST_DATA""F(T61)(EXPORT)""F(T61)(EXPORT)""(""2""7""F""T61"")""[NULL]""[NULL]"
"TEST_DATA""F(T61)(EXPORT)""F(T61)(EXPORT)""T61""3""7""("")""(""F""[NULL]"
"TEST_DATA""F(T61)(EXPORT)""F(T61)(EXPORT)"")""4""7""T61""(""EXPORT""(""F"
"TEST_DATA""F(T61)(EXPORT)""F(T61)(EXPORT)""(""5""7"")""EXPORT"")""T61""("
"TEST_DATA""F(T61)(EXPORT)""F(T61)(EXPORT)""EXPORT""6""7""("")""[NULL]"")""T61"
"TEST_DATA""F(T61)(EXPORT)""F(T61)(EXPORT)"")""7""7""EXPORT""[NULL]""[NULL]""("")"


When I run the same statement, except this time using a large table instead of a values statement, I get the wrong answer.  The difference in the SQL statement that produced the following data is that the large table (10M records) and there is a closing where condition used to limit the result to what is shown:

select * from prv_nxt_token;
is replaced by:

select * from prv_nxt_token where logical_partition_key='TEST_DATA' and usage_text='F(T61)(EXPORT)';


Which returns:
logical_partition_keyusage_texttxtrpo_txtindxmx_indxprvnxtnxt2prv2prv3
"TEST_DATA"
"F(T61)(EXPORT)""F(T61)(EXPORT)""F""1""7""F""(""(""[NULL]""[NULL]"
"TEST_DATA"
"F(T61)(EXPORT)""F(T61)(EXPORT)""(""2""7""F""(""T61""F""[NULL]"
"TEST_DATA"
"F(T61)(EXPORT)""F(T61)(EXPORT)""T61""3""7""T61"")"")""(""("
"TEST_DATA"
"F(T61)(EXPORT)""F(T61)(EXPORT)"")""4""7""T61"")""(""T61""("
"TEST_DATA"
"F(T61)(EXPORT)""F(T61)(EXPORT)""(""5""7""(""EXPORT""EXPORT"")"")"
"TEST_DATA"
"F(T61)(EXPORT)""F(T61)(EXPORT)""EXPORT""6""7""(""EXPORT"")""("")"
"TEST_DATA"
"F(T61)(EXPORT)""F(T61)(EXPORT)"")""7""7"")""[NULL]""[NULL]""EXPORT""EXPORT"

Notice that the prv column (lag - 1 ) is just wrong.  I've highlighted obvious bad values.  Other columns are wrong as well.

Is this a PostgreSQL bug?




--
Rumpi Gravenstein

pgsql-general by date:

Previous
From: Ron Johnson
Date:
Subject: Re: PgBackRest Full backup and N/W reliability
Next
From: Tom Lane
Date:
Subject: Re: Analytic Function Bug