Thread: Trouble with regexp_matches
Dear list,
Version string PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
I’m running the query below, and it is limiting results as if “regexp_matches” being in where clause.
IMHO, it is wrong: in case there is no match, shall return null or empty array – not remove the result from the set!!!
Is this a collateral effect of using regexp_matches in columns?
If yes, shall not this information be BOLD RED FLASHING in documentation (or it is already, and some kind sould would point me where)?
-- First query (that is limiting results) ---------------------------------------------------------------------
select codigoocorrencia, datahoraocorrencia, datahoraimportacao, observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')
from batchloteocorrencia
where codigoocorrencia = '091'
and observacao is not null
order by datahoraimportacao DESC
Total results = 59
--Second query (that is not limiting results, as I did expect)-------------------------------------------------------------------
select codigoocorrencia, datahoraocorrencia, datahoraimportacao, observacao
from batchloteocorrencia
where codigoocorrencia = '091'
and observacao is not null
order by datahoraimportacao DESC
Total results = 3826
Why is that?
Regards,
Edson Richter
On 11/05/2016 10:01 AM, Edson Richter wrote: > Dear list, > > > > Version string PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, > compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit > > > > I’m running the query below, and it is limiting results as if > “regexp_matches” being in where clause. > > IMHO, it is wrong: in case there is no match, shall return null or empty > array – not remove the result from the set!!! > > > > Is this a collateral effect of using regexp_matches in columns? > > If yes, shall not this information be BOLD RED FLASHING in documentation > (or it is already, and some kind sould would point me where)? https://www.postgresql.org/docs/9.4/static/functions-matching.html "It is possible to force regexp_matches() to always return one row by using a sub-select; this is particularly useful in a SELECT target list when you want all rows returned, even non-matching ones: SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; " > > > > -- First query (that is limiting results) > --------------------------------------------------------------------- > > select codigoocorrencia, datahoraocorrencia, datahoraimportacao, > observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d') > > from batchloteocorrencia > > where codigoocorrencia = '091' > > and observacao is not null > > order by datahoraimportacao DESC > > > > Total results = 59 > > > > --Second query (that is not limiting results, as I did > expect)------------------------------------------------------------------- > > select codigoocorrencia, datahoraocorrencia, datahoraimportacao, observacao > > from batchloteocorrencia > > where codigoocorrencia = '091' > > and observacao is not null > > order by datahoraimportacao DESC > > > > Total results = 3826 > > > > > > Why is that? > > > > Regards, > > > > Edson Richter > -- Adrian Klaver adrian.klaver@aklaver.com
Edson Richter <edsonrichter@hotmail.com> writes: > I’m running the query below, and it is limiting results as if “regexp_matches” being in where clause. > IMHO, it is wrong: in case there is no match, shall return null or empty array – not remove the result from the set!!! Well, no, because regexp_matches() returns a set. If there's no match, there's zero rows in the set. The standard workaround is to use a scalar sub-select, which has the effect of converting a zero-row result into a NULL: select codigoocorrencia, datahoraocorrencia, datahoraimportacao, observacao, (select regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')) from ... As of v10 there will be a less confusing solution: use regexp_match() instead. regards, tom lane
> -----Mensagem original----- > De: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Enviada em: sábado, 5 de novembro de 2016 15:21 > Para: Edson Richter <edsonrichter@hotmail.com> > Cc: pgsql-general@postgresql.org > Assunto: Re: [GENERAL] Trouble with regexp_matches > > Edson Richter <edsonrichter@hotmail.com> writes: > > I’m running the query below, and it is limiting results as if > “regexp_matches” being in where clause. > > IMHO, it is wrong: in case there is no match, shall return null or empty array > – not remove the result from the set!!! > > Well, no, because regexp_matches() returns a set. If there's no match, > there's zero rows in the set. For me, it is a strange behavior - putting something in select clause will restrict results as if it lies in join or whereclauses. > > The standard workaround is to use a scalar sub-select, which has the effect > of converting a zero-row result into a NULL: > > select codigoocorrencia, datahoraocorrencia, datahoraimportacao, > observacao, (select regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')) > from ... > > As of v10 there will be a less confusing solution: use regexp_match() instead. > > regards, tom lane Ok, for now, I've changed regexp_matches() to "... substring(observacao from '\d\d/\d\d/\d\d\d\d')" which does the job gracefully. I still believe that an alert shall be made in the docs page (String functions), because seems confusing and error prone. Thanks, Edson Richter
> -----Mensagem original----- > De: Adrian Klaver [mailto:adrian.klaver@aklaver.com] > Enviada em: sábado, 5 de novembro de 2016 15:13 > Para: Edson Richter <edsonrichter@hotmail.com>; pgsql- > general@postgresql.org > Assunto: Re: [GENERAL] Trouble with regexp_matches > > On 11/05/2016 10:01 AM, Edson Richter wrote: > > Dear list, > > > > > > > > Version string PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, > > compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit > > > > > > > > I’m running the query below, and it is limiting results as if > > “regexp_matches” being in where clause. > > > > IMHO, it is wrong: in case there is no match, shall return null or > > empty array – not remove the result from the set!!! > > > > > > > > Is this a collateral effect of using regexp_matches in columns? > > > > If yes, shall not this information be BOLD RED FLASHING in > > documentation (or it is already, and some kind sould would point me > where)? > > https://www.postgresql.org/docs/9.4/static/functions-matching.html I see - I always believed that this page was related to WHERE clause or using functions in the PostgreSQL way (which is,in your turn, a alternative to "from" syntax) - not for the select clause. But now that you mention it, and re-reading all the information, I can understand the implications. Nevertheless, would be nice to put a huge warning at the "String functions" page about this behavior... > > "It is possible to force regexp_matches() to always return one row by using a > sub-select; this is particularly useful in a SELECT target list when you want all > rows returned, even non-matching ones: > > SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; " > Thanks, this worked - as well removing the "regexp_matches" and using "SUBSTRING( text FROM pattern)". I really appreciate your help. Kind regards, Edson Richter > > > > > > > > > > -- First query (that is limiting results) > > --------------------------------------------------------------------- > > > > select codigoocorrencia, datahoraocorrencia, datahoraimportacao, > > observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d') > > > > from batchloteocorrencia > > > > where codigoocorrencia = '091' > > > > and observacao is not null > > > > order by datahoraimportacao DESC > > > > > > > > Total results = 59 > > > > > > > > --Second query (that is not limiting results, as I did > > expect)--------------------------------------------------------------- > > ---- > > > > select codigoocorrencia, datahoraocorrencia, datahoraimportacao, > > observacao > > > > from batchloteocorrencia > > > > where codigoocorrencia = '091' > > > > and observacao is not null > > > > order by datahoraimportacao DESC > > > > > > > > Total results = 3826 > > > > > > > > > > > > Why is that? > > > > > > > > Regards, > > > > > > > > Edson Richter > > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com
On 11/05/2016 11:56 AM, Edson Richter wrote: >> -----Mensagem original----- >> De: Adrian Klaver [mailto:adrian.klaver@aklaver.com] >> Enviada em: sábado, 5 de novembro de 2016 15:13 >> Para: Edson Richter <edsonrichter@hotmail.com>; pgsql- >> general@postgresql.org >> Assunto: Re: [GENERAL] Trouble with regexp_matches >> >> On 11/05/2016 10:01 AM, Edson Richter wrote: >>> Dear list, >>> >>> >>> >>> Version string PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, >>> compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit >>> >>> >>> >>> I’m running the query below, and it is limiting results as if >>> “regexp_matches” being in where clause. >>> >>> IMHO, it is wrong: in case there is no match, shall return null or >>> empty array – not remove the result from the set!!! >>> >>> >>> >>> Is this a collateral effect of using regexp_matches in columns? >>> >>> If yes, shall not this information be BOLD RED FLASHING in >>> documentation (or it is already, and some kind sould would point me >> where)? >> >> https://www.postgresql.org/docs/9.4/static/functions-matching.html > > > I see - I always believed that this page was related to WHERE clause or using functions in the PostgreSQL way (which is,in your turn, a alternative to "from" syntax) - not for the select clause. > But now that you mention it, and re-reading all the information, I can understand the implications. > > Nevertheless, would be nice to put a huge warning at the "String functions" page about this behavior... Well the above page is pointed to in the string functions section: https://www.postgresql.org/docs/9.4/static/functions-string.html "regexp_matches(string text, pattern text [, flags text]) setof text[] Return all captured substrings resulting from matching a POSIX regular expression against the string. See Section 9.7.3 for more information." Where 'Section 9.7.3' is a link to it. Not sure if that counts as a warning though:) > >> >> "It is possible to force regexp_matches() to always return one row by using a >> sub-select; this is particularly useful in a SELECT target list when you want all >> rows returned, even non-matching ones: >> >> SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; " >> > > Thanks, this worked - as well removing the "regexp_matches" and using "SUBSTRING( text FROM pattern)". > > > I really appreciate your help. > > > Kind regards, > > Edson Richter > > >> >> >>> >>> >>> >>> -- First query (that is limiting results) >>> --------------------------------------------------------------------- >>> >>> select codigoocorrencia, datahoraocorrencia, datahoraimportacao, >>> observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d') >>> >>> from batchloteocorrencia >>> >>> where codigoocorrencia = '091' >>> >>> and observacao is not null >>> >>> order by datahoraimportacao DESC >>> >>> >>> >>> Total results = 59 >>> >>> >>> >>> --Second query (that is not limiting results, as I did >>> expect)--------------------------------------------------------------- >>> ---- >>> >>> select codigoocorrencia, datahoraocorrencia, datahoraimportacao, >>> observacao >>> >>> from batchloteocorrencia >>> >>> where codigoocorrencia = '091' >>> >>> and observacao is not null >>> >>> order by datahoraimportacao DESC >>> >>> >>> >>> Total results = 3826 >>> >>> >>> >>> >>> >>> Why is that? >>> >>> >>> >>> Regards, >>> >>> >>> >>> Edson Richter >>> >> >> >> -- >> Adrian Klaver >> adrian.klaver@aklaver.com > > -- Adrian Klaver adrian.klaver@aklaver.com