Thread: What's the best way to translate MS generated translations of user input to what the user actually typed prior to insert or update into PG backend table ?

psql (11.5, server 11.3) on linux

I'm using MS-Access as a Windows front-end to a PG DB table through ODBC (PostgreSQL Unicode ODBC Driver).  Seems to be working fine except for when users enter "..." as part of a string, MS (in it's infinite wisdom) decides to translate that to what emacs is describing as...

character: … (displayed as …) (codepoint 8230, #o20046, #x2026)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x2026

It makes this translation for some users but not others, so I'm assuming it has something to do with how individuals set up their Windows env.  No matter, assume we can't have them change their environment in whatever soln we come up with. 

I want to change the 0x2026 to the 3 periods which the user entered.

I was thinking of a pre-insert and pre-update trigger which could make the translation.  But I'd rather not try to do this one char at a time... translate "..." today to fix today's issue, then "--" tomorrow when that pops up, then the way MS translates double quotes the next day, etc... .  Is there an elegant way to do this ?

Thanks In Advance for any help !
On Fri, Oct 30, 2020 at 9:03 AM David Gauthier <davegauthierpg@gmail.com> wrote:
I was thinking of a pre-insert and pre-update trigger which could make the translation.  But I'd rather not try to do this one char at a time... translate "..." today to fix today's issue, then "--" tomorrow when that pops up, then the way MS translates double quotes the next day, etc... .  Is there an elegant way to do this ?

I think the trigger solution is probably the correct one (not sure where else you could inject yourself into the data stream).  It seems inefficient for the trigger to pull its knowledge base from a table - thus code the logic directly within the trigger.  The other side of that equation is that now you need to update schema to change rules instead of adding a row to a table.  Neither is more elegant than the other, rather there is a trade-off here that only you need to make.

Some things to consider:

Can you get a reference from MS on these conversions so you can code them in advance of seeing them appear in data?
Is there some kind of pattern for the ones you know about that you can check for more generally and warn you when the pattern appears but there is not specific rule?
What is the downside of not doing this at all?  Or missing new ones?
Do you need to reverse the process so MS-Access is getting the data it expects?

David J.


On 30/10/2020 16:03, David Gauthier wrote:
psql (11.5, server 11.3) on linux

I'm using MS-Access as a Windows front-end to a PG DB table through ODBC (PostgreSQL Unicode ODBC Driver).  Seems to be working fine except for when users enter "..." as part of a string, MS (in it's infinite wisdom) decides to translate that to what emacs is describing as...

character: … (displayed as …) (codepoint 8230, #o20046, #x2026)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x2026

It makes this translation for some users but not others, so I'm assuming it has something to do with how individuals set up their Windows env.  No matter, assume we can't have them change their environment in whatever soln we come up with. 

I want to change the 0x2026 to the 3 periods which the user entered.

I was thinking of a pre-insert and pre-update trigger which could make the translation.  But I'd rather not try to do this one char at a time... translate "..." today to fix today's issue, then "--" tomorrow when that pops up, then the way MS translates double quotes the next day, etc... .  Is there an elegant way to do this ?

Thanks In Advance for any help !


IMHO this is the spelling functionality from Office/Access "correcting" your typing, you can probably see it happening as you type and tapping the escape key immediately afterwards undoes the "suggestion". You're right in that it would need either a desktop change per user or a translation table. You can find all the changes though and do them all at once which saves your "next day, next day...." issue?

Tim Clarke

 

Telephone: Witham: +44(0)1376 503500 | London: +44 (0)20 3009 0853 | Frankfurt: +49 (0)69 7191 6000 | Hong Kong: +852 5803 1687 | Toronto: +1 647 503 2848
Web: https://www.manifest.co.uk/

 

Minerva Analytics Ltd - A Solactive Company
9 Freebournes Court | Newland Street | Witham | Essex | CM8 2BL | United Kingdom


Copyright: This e-mail may contain confidential or legally privileged information. If you are not the named addressee you must not use or disclose such information, instead please report it to admin@minerva.info
Legal: Minerva Analytics is the trading name of: Minerva Analytics Ltd: Registered in England Number 11260966 & The Manifest Voting Agency Ltd: Registered in England Number 2920820 Registered Office at above address. Please Click Here https://www.manifest.co.uk/legal/ for further information.

Next question, how does one actually replace the thing ?
replace(thestring,0x2026,'...') 
... isn't going to fly. 
Working with binary values in PG isn't at the top of my resume :-)


On Fri, Oct 30, 2020 at 12:20 PM Tim Clarke <tim.clarke@minerva.info> wrote:

On 30/10/2020 16:03, David Gauthier wrote:
psql (11.5, server 11.3) on linux

I'm using MS-Access as a Windows front-end to a PG DB table through ODBC (PostgreSQL Unicode ODBC Driver).  Seems to be working fine except for when users enter "..." as part of a string, MS (in it's infinite wisdom) decides to translate that to what emacs is describing as...

character: … (displayed as …) (codepoint 8230, #o20046, #x2026)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x2026

It makes this translation for some users but not others, so I'm assuming it has something to do with how individuals set up their Windows env.  No matter, assume we can't have them change their environment in whatever soln we come up with. 

I want to change the 0x2026 to the 3 periods which the user entered.

I was thinking of a pre-insert and pre-update trigger which could make the translation.  But I'd rather not try to do this one char at a time... translate "..." today to fix today's issue, then "--" tomorrow when that pops up, then the way MS translates double quotes the next day, etc... .  Is there an elegant way to do this ?

Thanks In Advance for any help !


IMHO this is the spelling functionality from Office/Access "correcting" your typing, you can probably see it happening as you type and tapping the escape key immediately afterwards undoes the "suggestion". You're right in that it would need either a desktop change per user or a translation table. You can find all the changes though and do them all at once which saves your "next day, next day...." issue?

Tim Clarke

 

Telephone: Witham: +44(0)1376 503500 | London: +44 (0)20 3009 0853 | Frankfurt: +49 (0)69 7191 6000 | Hong Kong: +852 5803 1687 | Toronto: +1 647 503 2848
Web: https://www.manifest.co.uk/

 

Minerva Analytics Ltd - A Solactive Company
9 Freebournes Court | Newland Street | Witham | Essex | CM8 2BL | United Kingdom


Copyright: This e-mail may contain confidential or legally privileged information. If you are not the named addressee you must not use or disclose such information, instead please report it to admin@minerva.info
Legal: Minerva Analytics is the trading name of: Minerva Analytics Ltd: Registered in England Number 11260966 & The Manifest Voting Agency Ltd: Registered in England Number 2920820 Registered Office at above address. Please Click Here https://www.manifest.co.uk/legal/ for further information.

On 2020-10-30 13:51:49 -0400, David Gauthier wrote:
> Next question, how does one actually replace the thing ?
> replace(thestring,0x2026,'...') 
> ... isn't going to fly. 
> Working with binary values in PG isn't at the top of my resume :-)

Either just type (or copy/paste) it:

    select replace('a … string …', '…', '...');

or use the chr() function:

    select replace('a … string …', chr(8230), '...');


I would prefer the former as it is easier to read (as long as the
characters are printable), but the latter may be easier to type.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment
David Gauthier <davegauthierpg@gmail.com> writes:
> Next question, how does one actually replace the thing ?
> replace(thestring,0x2026,'...')
> ... isn't going to fly.

If you don't want to write the actual UTF8 character, maybe the chr()
function would suit you better [1].

            regards, tom lane

[1] https://www.postgresql.org/docs/current/functions-string.html