Thread: Locale support for postgresql regex (src)
Hi,

I modified two files in postgresql-7.1.3/src/backend/regex/ and in postgresql-7.1.3/src/include/regex/ so that character classes (e.g. [[:alnum:]], [[:alpha:]], etc.) should now honor the locale settings.

http://galaxy.metacerca.it/~anto/pgslq_7_1_3_regex_locale.tar.gz (~14 KB, 2 files)

It is not a great piece of work and it does not support multibyte encodings, but for me it is sufficient to isolate, for example, an Italian word containing non-ASCII characters (codes above 127).

For example:

select T from tab where T ~* '(^|[^[:alnum:]]+)citt[[:alnum:]]*([^[:alnum:]]+|$)';

now matches the word 'città' in strings like 'vado in città', 'città', etc.

PS: excuse my poor English.

Regards
Antonello

--
Antonello Nocchi
CERCA.COM S.r.l
Via dello Stadio, 19
53045 - Montepulciano (Siena)
ITALY
Tel. +39-0578-75.77.77
Tel. +39-0578-71.67.09
Fax. +39-0578-71.51.89
antonello@cerca.com
http://www.cerca.com
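To illustrate why the character class matters for a word like 'città', here is a minimal Python sketch. It is not the patch and does not use PostgreSQL. It assumes that Python's `[^\W_]` is an acceptable stand-in for `[[:alnum:]]` (Python's `re` module has no POSIX classes), and it uses the `re.ASCII` flag to imitate a class that only knows 7-bit letters. The names `ALNUM`, `NON_ALNUM`, `PATTERN` and `find_word` are made up for this example, and a capture group is added around the word so the difference is visible.

```python
import re

# Stand-ins for the POSIX classes used in the query above (an assumption:
# Python's re module does not support [[:alnum:]] directly).
ALNUM = r"[^\W_]"      # roughly [[:alnum:]]: word characters minus "_"
NON_ALNUM = r"[\W_]"   # roughly [^[:alnum:]]

# Same shape as the pattern in the post, with the word itself captured.
PATTERN = rf"(?:^|{NON_ALNUM}+)(citt{ALNUM}*)(?:{NON_ALNUM}+|$)"


def find_word(text, ascii_only=False):
    """Return the word starting with 'citt' found in text, or None.

    With ascii_only=True the classes only treat 7-bit letters and digits
    as alphanumeric, so an accented letter is seen as a separator.
    """
    flags = re.IGNORECASE
    if ascii_only:
        flags |= re.ASCII
    match = re.search(PATTERN, text, flags)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(find_word("vado in citt\u00e0"))                   # accent-aware class
    print(find_word("vado in citt\u00e0", ascii_only=True))  # 7-bit-only class
```

With the accent-aware class the whole word is isolated. With the 7-bit-only class the final 'à' is treated as a non-alphanumeric character and the word is cut short, which is the kind of problem the locale-aware classes are meant to avoid.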
This has been saved for the 7.3 release:

http://candle.pha.pa.us/cgi-bin/pgpatches2

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Your patch has been added to the PostgreSQL unapplied patches list at:

http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

--
Bruce Momjian
This patch was rejected. Please continue discussion on the hackers list. Thanks.

--
Bruce Momjian