Thread: regexp_replace failing on 9.0.4

regexp_replace failing on 9.0.4

From
Rob Sargent
Date:
On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2]
test machine and my 9.1.2[3] dev box all is fine

This is may statement

update cms.segment_data s
set text = regexp_replace(s.text,
      '(^.*)ns/acres/pathology/dx/1.5(.*$)',
     E'\\1ns/acres/pathology/dx/1.6\\2')
from
   dm.payload_segment p,
   support.fix_pathdx_namespace f
where s.id  = p.segment_data_id
and p.item_id = f.item_id
and p.content_version = f.maxversion
;

"UPDATE 76" reported, but zero changes persist



When I just select the regexp like so

select legacy_id,
regexp_replace( substring(s.text, 1, 150) ,
'(^.*)ns/acres/pathology/dx/1.5(.*$)',
E'\\1ns/acres/pathology/dx/1.6\\2')
from
   dm.payload_segment p,
   support.fix_pathdx_namespace f,
   cms.segment_data s
where s.id  = p.segment_data_id
       and p.item_id = f.item_id
       and p.content_version = f.maxversion
;

I get the corrected data (chiefly the "1.6")

Can anyone see where I've gone off track?



[1]PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292], 64-bit

[2]PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292], 64-bit

[3]PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit


Re: regexp_replace failing on 9.0.4

From
Tom Lane
Date:
Rob Sargent <robjsargent@gmail.com> writes:
> On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2]
> test machine and my 9.1.2[3] dev box all is fine

AFAICS from the commit logs, there were no changes affecting the regex
code between 9.0.3 and 9.0.4.  I'm suspicious that your data is
different on the different servers.

            regards, tom lane


Re: regexp_replace failing on 9.0.4

From
Rob Sargent
Date:
On 03/18/2013 01:19 PM, Tom Lane wrote:
> Rob Sargent <robjsargent@gmail.com> writes:
>> On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2]
>> test machine and my 9.1.2[3] dev box all is fine
>
> AFAICS from the commit logs, there were no changes affecting the regex
> code between 9.0.3 and 9.0.4.  I'm suspicious that your data is
> different on the different servers.
>
>             regards, tom lane
>
Good to hear, thought I might have glossed over the telling release note
- my usual mo

I restored my dev db from prod just for this run. test and prod are out
of sync by a couple of weeks.

What I had to do ultimately was as follows,

create table support.duplicate_pathdx_namespace
as select item_id, legacy_id, name, locked_by, maxversion,
    regexp_replace(substring(content,1,150),
       '(^.*)ns/acres/pathology/dx/1.5(.*$)',
        E'\\1ns/acres/pathology/dx/1.6\\2')
   || substring(content, 151) as content
from support.fix_pathdx_namespace;

and the update finally stuck, then copied that reconstructed textblock
over to the target production table in an regexp-less update.  Doing the
reg_exp_replace on the whole text blog as part of create temp table did
not work.

I'll double check my dev server from same pre-run dump.

Weird.





Re: regexp_replace failing on 9.0.4

From
Tom Lane
Date:
Rob Sargent <robjsargent@gmail.com> writes:
> On 03/18/2013 01:19 PM, Tom Lane wrote:
>> Rob Sargent <robjsargent@gmail.com> writes:
>>> On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2]
>>> test machine and my 9.1.2[3] dev box all is fine

>> AFAICS from the commit logs, there were no changes affecting the regex
>> code between 9.0.3 and 9.0.4.  I'm suspicious that your data is
>> different on the different servers.

> Good to hear, thought I might have glossed over the telling release note
> - my usual mo

Maybe we're barking up the wrong tree by suspecting the regex itself.
Perhaps the updates were suppressed by a trigger, or the transaction
rolled back instead of committing, or some such?

            regards, tom lane


Re: regexp_replace failing on 9.0.4

From
Rob Sargent
Date:
On 03/18/2013 02:40 PM, Tom Lane wrote:
> Rob Sargent <robjsargent@gmail.com> writes:
>> On 03/18/2013 01:19 PM, Tom Lane wrote:
>>> Rob Sargent <robjsargent@gmail.com> writes:
>>>> On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2]
>>>> test machine and my 9.1.2[3] dev box all is fine
>
>>> AFAICS from the commit logs, there were no changes affecting the regex
>>> code between 9.0.3 and 9.0.4.  I'm suspicious that your data is
>>> different on the different servers.
>
>> Good to hear, thought I might have glossed over the telling release note
>> - my usual mo
>
> Maybe we're barking up the wrong tree by suspecting the regex itself.
> Perhaps the updates were suppressed by a trigger, or the transaction
> rolled back instead of committing, or some such?
>
>             regards, tom lane
>
The work was all rolled into a function:
   o find the chapters;
   o copy the necessary data (mainly the text blob) into a back-out
table
   o "lock" the chapters (protect them from exposure to the client app)
   o perform the regexp_replace as the update to prod. table

The function was exec'd in a tx and committed, leaving the back-out
table and the programmatic locks in place, but the update itself had
been a no-op and continued to be with ad hoc update statements, until I
hit the final goofy answer ( rg_replace(string, start) || substring(end) )

Have not yet had a chance to re-create on dev.  Test worked like a charm.





Re: regexp_replace failing on 9.0.4

From
Rob Sargentg
Date:
> Maybe we're barking up the wrong tree by suspecting the regex itself.
> Perhaps the updates were suppressed by a trigger, or the transaction
> rolled back instead of committing, or some such?
>
>             regards, tom lane
Barking mad, more like it.  I had rolled  back the execution of my
function in my dev env. and running again produces the same result as
experienced in production system.

Why the update of the text type field view regexp_replace is failing
still confuses me, but that's pretty much my natural state.  The size of
the text field ranges from 7.5k to 24k char.


Here is the update (again)
   update cms.segment_data s
   set text = regexp_replace(f.content,
'(^.*)ns/acres/pathology/dx/1.5(.*$)', E'\\1ns/acres/pathology/dx/1.6\\2')
   from
     dm.payload_segment p,
     support.fix_pathdx_namespace f
   where s.id  = p.segment_data_id
   and p.item_id = f.item_id
   and p.content_version = f.maxversion
   ;



Re: regexp_replace failing on 9.0.4

From
Rob Sargent
Date:
On 03/18/2013 02:40 PM, Tom Lane wrote:
> Rob Sargent <robjsargent@gmail.com> writes:
>> On 03/18/2013 01:19 PM, Tom Lane wrote:
>>> Rob Sargent <robjsargent@gmail.com> writes:
>>>> On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2]
>>>> test machine and my 9.1.2[3] dev box all is fine
>
>>> AFAICS from the commit logs, there were no changes affecting the regex
>>> code between 9.0.3 and 9.0.4.  I'm suspicious that your data is
>>> different on the different servers.
>
>> Good to hear, thought I might have glossed over the telling release note
>> - my usual mo
>
> Maybe we're barking up the wrong tree by suspecting the regex itself.
> Perhaps the updates were suppressed by a trigger, or the transaction
> rolled back instead of committing, or some such?
>
>             regards, tom lane
>
For fun I decided to install 9.2 and thought I would try my luck there.
  Here's was I saw (apologies for the wide output).

#localhost:cms# select count(*) from pg_trigger;
+-------+
| count |
+-------+
|   364 |
+-------+
(1 row)

Time: 0.407 ms
#localhost:cms# select tgname from pg_trigger where tgname !~
'^RI_ConstraintTrigger';
+--------+
| tgname |
+--------+
+--------+
(0 rows)

#localhost:cms# select version();
+-------------------------------------------------------------------------------------------------------+
|                 version
     |
+---------------------------------------------------------------------------------------------------------
| PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit |
+-------------------------------------------------------------------------------------------------------+
(1 row)

select substring(text,1,150) from cms.segment_data
where id = 'c092880f-8484-4b29-b712-f3df12216701';
+----------------------------------------------------------------------------------------------------------------+
|                                                   substring
                                          |
+----------------------------------------------------------------------------------------------------------------+
| <?xml version="1.0" encoding="UTF-8"?>\r
                                         +|
| <pathology xmlns="http://amirsys.com/ns/acres/pathology/dx/1.5"
enable-tables="true"><metadata><id>diversion_c |
+----------------------------------------------------------------------------------------------------------------+
(1 row)

#localhost:cms# begin;
BEGIN

<< simple update in place>>
update cms.segment_data
set text = regexp_replace(text,'(^.*)ns/acres/pathology/dx/1.5(.*$)',
E'\\1ns/acres/pathology/dx/1.6\\2')
where id = 'c092880f-8484-4b29-b712-f3df12216701';
UPDATE 1

< check >>
select substring(text,1,150) from cms.segment_data
where id = 'c092880f-8484-4b29-b712-f3df12216701';
+----------------------------------------------------------------------------------------------------------------+
|                                                 substring
                                        |
+----------------------------------------------------------------------------------------------------------------+
| <?xml version="1.0" encoding="UTF-8"?>\r
                                         +|
| <pathology xmlns="http://amirsys.com/ns/acres/pathology/dx/1.5"
enable-tables="true"><metadata><id>diversion_c |
+----------------------------------------------------------------------------------------------------------------+
(1 row) NO CHANGE (still "1.5");

<< update in parts >>
update cms.segment_data set text =
regexp_replace(substring(text,1,150),
'(^.*)ns/acres/pathology/dx/1.5(.*$)',
E'\\1ns/acres/pathology/dx/1.6\\2') || substring(text, 151)
where id = 'c092880f-8484-4b29-b712-f3df12216701';
UPDATE 1

<<check>>
select substring(text,1,150) from cms.segment_data
where id = 'c092880f-8484-4b29-b712-f3df12216701';
+----------------------------------------------------------------------------------------------------------------+
|                                                   substring
                                          |
+----------------------------------------------------------------------------------------------------------------+
| <?xml version="1.0" encoding="UTF-8"?>\r
                                         +|
| <pathology xmlns="http://amirsys.com/ns/acres/pathology/dx/1.6"
enable-tables="true"><metadata><id>diversion_c |
+----------------------------------------------------------------------------------------------------------------+
(1 row)

<<CHANGED!! (now "1.6")>>

ROLLBACK



Re: regexp_replace failing on 9.0.4

From
Tom Lane
Date:
Rob Sargent <robjsargent@gmail.com> writes:
> For fun I decided to install 9.2 and thought I would try my luck there.
>   Here's was I saw (apologies for the wide output).

> << simple update in place>>
> update cms.segment_data
> set text = regexp_replace(text,'(^.*)ns/acres/pathology/dx/1.5(.*$)',
> E'\\1ns/acres/pathology/dx/1.6\\2')
> where id = 'c092880f-8484-4b29-b712-f3df12216701';
> UPDATE 1
> [ doesn't change the first 150 characters of the field ]

> << update in parts >>
> update cms.segment_data set text =
> regexp_replace(substring(text,1,150),
> '(^.*)ns/acres/pathology/dx/1.5(.*$)',
> E'\\1ns/acres/pathology/dx/1.6\\2') || substring(text, 151)
> where id = 'c092880f-8484-4b29-b712-f3df12216701';
> UPDATE 1
> [ does change the first 150 characters of the field ]

I'm suspicious that there is more than one match to the substring
in that field, with the extra match(es) coming beyond char 150.
The way that regexp is written, I think it would replace the last
match not the first.

            regards, tom lane


Re: regexp_replace failing on 9.0.4

From
Rob Sargent
Date:
On 03/21/2013 06:25 PM, Tom Lane wrote:
> Rob Sargent <robjsargent@gmail.com> writes:
>> For fun I decided to install 9.2 and thought I would try my luck there.
>>    Here's was I saw (apologies for the wide output).
>
>> << simple update in place>>
>> update cms.segment_data
>> set text = regexp_replace(text,'(^.*)ns/acres/pathology/dx/1.5(.*$)',
>> E'\\1ns/acres/pathology/dx/1.6\\2')
>> where id = 'c092880f-8484-4b29-b712-f3df12216701';
>> UPDATE 1
>> [ doesn't change the first 150 characters of the field ]
>
>> << update in parts >>
>> update cms.segment_data set text =
>> regexp_replace(substring(text,1,150),
>> '(^.*)ns/acres/pathology/dx/1.5(.*$)',
>> E'\\1ns/acres/pathology/dx/1.6\\2') || substring(text, 151)
>> where id = 'c092880f-8484-4b29-b712-f3df12216701';
>> UPDATE 1
>> [ does change the first 150 characters of the field ]
>
> I'm suspicious that there is more than one match to the substring
> in that field, with the extra match(es) coming beyond char 150.
> The way that regexp is written, I think it would replace the last
> match not the first.
>
>             regards, tom lane
>
Indeed there are (or at least may be) other instances of the namespace
string.  I was operating on the assumption that only the first would get
hit, but I fear greediness has gotten the better of my yet again.

And there's reason to believe the "9.0.3" test server db did not have
the proliferations of the string.

Thanks as always.