Thread: regexp_replace failing on 9.0.4
On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2] test machine and my 9.1.2[3] dev box all is fine This is may statement update cms.segment_data s set text = regexp_replace(s.text, '(^.*)ns/acres/pathology/dx/1.5(.*$)', E'\\1ns/acres/pathology/dx/1.6\\2') from dm.payload_segment p, support.fix_pathdx_namespace f where s.id = p.segment_data_id and p.item_id = f.item_id and p.content_version = f.maxversion ; "UPDATE 76" reported, but zero changes persist When I just select the regexp like so select legacy_id, regexp_replace( substring(s.text, 1, 150) , '(^.*)ns/acres/pathology/dx/1.5(.*$)', E'\\1ns/acres/pathology/dx/1.6\\2') from dm.payload_segment p, support.fix_pathdx_namespace f, cms.segment_data s where s.id = p.segment_data_id and p.item_id = f.item_id and p.content_version = f.maxversion ; I get the corrected data (chiefly the "1.6") Can anyone see where I've gone off track? [1]PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292], 64-bit [2]PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292], 64-bit [3]PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
Rob Sargent <robjsargent@gmail.com> writes: > On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2] > test machine and my 9.1.2[3] dev box all is fine AFAICS from the commit logs, there were no changes affecting the regex code between 9.0.3 and 9.0.4. I'm suspicious that your data is different on the different servers. regards, tom lane
On 03/18/2013 01:19 PM, Tom Lane wrote: > Rob Sargent <robjsargent@gmail.com> writes: >> On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2] >> test machine and my 9.1.2[3] dev box all is fine > > AFAICS from the commit logs, there were no changes affecting the regex > code between 9.0.3 and 9.0.4. I'm suspicious that your data is > different on the different servers. > > regards, tom lane > Good to hear, thought I might have glossed over the telling release note - my usual mo I restored my dev db from prod just for this run. test and prod are out of sync by a couple of weeks. What I had to do ultimately was as follows, create table support.duplicate_pathdx_namespace as select item_id, legacy_id, name, locked_by, maxversion, regexp_replace(substring(content,1,150), '(^.*)ns/acres/pathology/dx/1.5(.*$)', E'\\1ns/acres/pathology/dx/1.6\\2') || substring(content, 151) as content from support.fix_pathdx_namespace; and the update finally stuck, then copied that reconstructed textblock over to the target production table in an regexp-less update. Doing the reg_exp_replace on the whole text blog as part of create temp table did not work. I'll double check my dev server from same pre-run dump. Weird.
Rob Sargent <robjsargent@gmail.com> writes: > On 03/18/2013 01:19 PM, Tom Lane wrote: >> Rob Sargent <robjsargent@gmail.com> writes: >>> On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2] >>> test machine and my 9.1.2[3] dev box all is fine >> AFAICS from the commit logs, there were no changes affecting the regex >> code between 9.0.3 and 9.0.4. I'm suspicious that your data is >> different on the different servers. > Good to hear, thought I might have glossed over the telling release note > - my usual mo Maybe we're barking up the wrong tree by suspecting the regex itself. Perhaps the updates were suppressed by a trigger, or the transaction rolled back instead of committing, or some such? regards, tom lane
On 03/18/2013 02:40 PM, Tom Lane wrote: > Rob Sargent <robjsargent@gmail.com> writes: >> On 03/18/2013 01:19 PM, Tom Lane wrote: >>> Rob Sargent <robjsargent@gmail.com> writes: >>>> On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2] >>>> test machine and my 9.1.2[3] dev box all is fine > >>> AFAICS from the commit logs, there were no changes affecting the regex >>> code between 9.0.3 and 9.0.4. I'm suspicious that your data is >>> different on the different servers. > >> Good to hear, thought I might have glossed over the telling release note >> - my usual mo > > Maybe we're barking up the wrong tree by suspecting the regex itself. > Perhaps the updates were suppressed by a trigger, or the transaction > rolled back instead of committing, or some such? > > regards, tom lane > The work was all rolled into a function: o find the chapters; o copy the necessary data (mainly the text blob) into a back-out table o "lock" the chapters (protect them from exposure to the client app) o perform the regexp_replace as the update to prod. table The function was exec'd in a tx and committed, leaving the back-out table and the programmatic locks in place, but the update itself had been a no-op and continued to be with ad hoc update statements, until I hit the final goofy answer ( rg_replace(string, start) || substring(end) ) Have not yet had a chance to re-create on dev. Test worked like a charm.
> Maybe we're barking up the wrong tree by suspecting the regex itself. > Perhaps the updates were suppressed by a trigger, or the transaction > rolled back instead of committing, or some such? > > regards, tom lane Barking mad, more like it. I had rolled back the execution of my function in my dev env. and running again produces the same result as experienced in production system. Why the update of the text type field view regexp_replace is failing still confuses me, but that's pretty much my natural state. The size of the text field ranges from 7.5k to 24k char. Here is the update (again) update cms.segment_data s set text = regexp_replace(f.content, '(^.*)ns/acres/pathology/dx/1.5(.*$)', E'\\1ns/acres/pathology/dx/1.6\\2') from dm.payload_segment p, support.fix_pathdx_namespace f where s.id = p.segment_data_id and p.item_id = f.item_id and p.content_version = f.maxversion ;
On 03/18/2013 02:40 PM, Tom Lane wrote: > Rob Sargent <robjsargent@gmail.com> writes: >> On 03/18/2013 01:19 PM, Tom Lane wrote: >>> Rob Sargent <robjsargent@gmail.com> writes: >>>> On our 9.0.4[1] server my regexp_replace is a no-op, but on the 9.0.3[2] >>>> test machine and my 9.1.2[3] dev box all is fine > >>> AFAICS from the commit logs, there were no changes affecting the regex >>> code between 9.0.3 and 9.0.4. I'm suspicious that your data is >>> different on the different servers. > >> Good to hear, thought I might have glossed over the telling release note >> - my usual mo > > Maybe we're barking up the wrong tree by suspecting the regex itself. > Perhaps the updates were suppressed by a trigger, or the transaction > rolled back instead of committing, or some such? > > regards, tom lane > For fun I decided to install 9.2 and thought I would try my luck there. Here's was I saw (apologies for the wide output). #localhost:cms# select count(*) from pg_trigger; +-------+ | count | +-------+ | 364 | +-------+ (1 row) Time: 0.407 ms #localhost:cms# select tgname from pg_trigger where tgname !~ '^RI_ConstraintTrigger'; +--------+ | tgname | +--------+ +--------+ (0 rows) #localhost:cms# select version(); +-------------------------------------------------------------------------------------------------------+ | version | +--------------------------------------------------------------------------------------------------------- | PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit | +-------------------------------------------------------------------------------------------------------+ (1 row) select substring(text,1,150) from cms.segment_data where id = 'c092880f-8484-4b29-b712-f3df12216701'; +----------------------------------------------------------------------------------------------------------------+ | substring | +----------------------------------------------------------------------------------------------------------------+ | <?xml version="1.0" encoding="UTF-8"?>\r +| | <pathology xmlns="http://amirsys.com/ns/acres/pathology/dx/1.5" enable-tables="true"><metadata><id>diversion_c | +----------------------------------------------------------------------------------------------------------------+ (1 row) #localhost:cms# begin; BEGIN << simple update in place>> update cms.segment_data set text = regexp_replace(text,'(^.*)ns/acres/pathology/dx/1.5(.*$)', E'\\1ns/acres/pathology/dx/1.6\\2') where id = 'c092880f-8484-4b29-b712-f3df12216701'; UPDATE 1 < check >> select substring(text,1,150) from cms.segment_data where id = 'c092880f-8484-4b29-b712-f3df12216701'; +----------------------------------------------------------------------------------------------------------------+ | substring | +----------------------------------------------------------------------------------------------------------------+ | <?xml version="1.0" encoding="UTF-8"?>\r +| | <pathology xmlns="http://amirsys.com/ns/acres/pathology/dx/1.5" enable-tables="true"><metadata><id>diversion_c | +----------------------------------------------------------------------------------------------------------------+ (1 row) NO CHANGE (still "1.5"); << update in parts >> update cms.segment_data set text = regexp_replace(substring(text,1,150), '(^.*)ns/acres/pathology/dx/1.5(.*$)', E'\\1ns/acres/pathology/dx/1.6\\2') || substring(text, 151) where id = 'c092880f-8484-4b29-b712-f3df12216701'; UPDATE 1 <<check>> select substring(text,1,150) from cms.segment_data where id = 'c092880f-8484-4b29-b712-f3df12216701'; +----------------------------------------------------------------------------------------------------------------+ | substring | +----------------------------------------------------------------------------------------------------------------+ | <?xml version="1.0" encoding="UTF-8"?>\r +| | <pathology xmlns="http://amirsys.com/ns/acres/pathology/dx/1.6" enable-tables="true"><metadata><id>diversion_c | +----------------------------------------------------------------------------------------------------------------+ (1 row) <<CHANGED!! (now "1.6")>> ROLLBACK
Rob Sargent <robjsargent@gmail.com> writes: > For fun I decided to install 9.2 and thought I would try my luck there. > Here's was I saw (apologies for the wide output). > << simple update in place>> > update cms.segment_data > set text = regexp_replace(text,'(^.*)ns/acres/pathology/dx/1.5(.*$)', > E'\\1ns/acres/pathology/dx/1.6\\2') > where id = 'c092880f-8484-4b29-b712-f3df12216701'; > UPDATE 1 > [ doesn't change the first 150 characters of the field ] > << update in parts >> > update cms.segment_data set text = > regexp_replace(substring(text,1,150), > '(^.*)ns/acres/pathology/dx/1.5(.*$)', > E'\\1ns/acres/pathology/dx/1.6\\2') || substring(text, 151) > where id = 'c092880f-8484-4b29-b712-f3df12216701'; > UPDATE 1 > [ does change the first 150 characters of the field ] I'm suspicious that there is more than one match to the substring in that field, with the extra match(es) coming beyond char 150. The way that regexp is written, I think it would replace the last match not the first. regards, tom lane
On 03/21/2013 06:25 PM, Tom Lane wrote: > Rob Sargent <robjsargent@gmail.com> writes: >> For fun I decided to install 9.2 and thought I would try my luck there. >> Here's was I saw (apologies for the wide output). > >> << simple update in place>> >> update cms.segment_data >> set text = regexp_replace(text,'(^.*)ns/acres/pathology/dx/1.5(.*$)', >> E'\\1ns/acres/pathology/dx/1.6\\2') >> where id = 'c092880f-8484-4b29-b712-f3df12216701'; >> UPDATE 1 >> [ doesn't change the first 150 characters of the field ] > >> << update in parts >> >> update cms.segment_data set text = >> regexp_replace(substring(text,1,150), >> '(^.*)ns/acres/pathology/dx/1.5(.*$)', >> E'\\1ns/acres/pathology/dx/1.6\\2') || substring(text, 151) >> where id = 'c092880f-8484-4b29-b712-f3df12216701'; >> UPDATE 1 >> [ does change the first 150 characters of the field ] > > I'm suspicious that there is more than one match to the substring > in that field, with the extra match(es) coming beyond char 150. > The way that regexp is written, I think it would replace the last > match not the first. > > regards, tom lane > Indeed there are (or at least may be) other instances of the namespace string. I was operating on the assumption that only the first would get hit, but I fear greediness has gotten the better of my yet again. And there's reason to believe the "9.0.3" test server db did not have the proliferations of the string. Thanks as always.