Thread: pgsql2shp : Encoding headache

pgsql2shp : Encoding headache

From
Arnaud Lesauvage
Date:
Hi all!

I have an UTF8 encoded shapefile, and an UTF8 encoded postgis-enabled
database. I want my shapefile to be encoded in WIN1252, and a particular
field to be in uppercase.

Since I am on windows, I don't have an iconv executable. Therefore, I am
trying to :
- dump the shapefile with shp2pgsql to an sql text file
- load the resulting sql file into a postgresql table
- dump this table into a shapefile with pgsql2shp (in WIN1252)

To load the shapefile into postgresql, I had to dump it without
specifying a '-W' argument, set my client_encoding to UTF8, then load
the sql file into postgresql.

If I look at the data with pgAdmin (with client_encoding set to UTF8),
it looks good : accentuation is fine, special characters are fine.

To dump the data in a WIN1252-encoded shapefile, pgsql2shp has no
encoding argument, so I set my client encoding to WIN1252 thtough the
environment variable PGCLIENTENCODING.

If I just dump the file this way, it seems to be fine. So this command
works fine :
pgsql2shp -f myouput.shp -u postgres -g mygeom mydatabase "SELECT
mytext, mygeom FROM mytemptable"
->  [621679 rows]

But then, if I dump it through a query to have my field in uppercase, I
get an error 'character 0xc29f of encoding UTF8 has no equivalent in
WIN1252' (translated by myself, the message is in French)
The command is simply :
pgsql2shp -f myouput.shp -u postgres -g mygeom mydatabase "SELECT
upper(mytext) as mytext, mygeom FROM mytemptable"

So I guess there is a problem with my 'upper' conversion, but I have no
idea what this 0xc29f character could be.

Any help would be greatly appreciated.
Thanks a lot !

--
Arnaud Lesauvage

Re: pgsql2shp : Encoding headache

From
Raymond O'Donnell
Date:
On 16/10/2009 10:36, Arnaud Lesauvage wrote:

> I have an UTF8 encoded shapefile, and an UTF8 encoded postgis-enabled
> database. I want my shapefile to be encoded in WIN1252, and a particular
> field to be in uppercase.
>
> Since I am on windows, I don't have an iconv executable. Therefore, I am
> trying to :

If it's any help to you, you can get iconv (and a bunch of other helpful
stuff) from GnuWin32:

  http://gnuwin32.sourceforge.net/


Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

Re: [postgis-users] pgsql2shp : Encoding headache

From
InterRob
Date:
Does that last query (invoking the upper() function) actually run well when executed in pgsql console?


Rob

2009/10/16 Arnaud Lesauvage <arnaud.listes@codata.eu>
Hi all!

I have an UTF8 encoded shapefile, and an UTF8 encoded postgis-enabled database. I want my shapefile to be encoded in WIN1252, and a particular field to be in uppercase.

Since I am on windows, I don't have an iconv executable. Therefore, I am trying to :
- dump the shapefile with shp2pgsql to an sql text file
- load the resulting sql file into a postgresql table
- dump this table into a shapefile with pgsql2shp (in WIN1252)

To load the shapefile into postgresql, I had to dump it without specifying a '-W' argument, set my client_encoding to UTF8, then load the sql file into postgresql.

If I look at the data with pgAdmin (with client_encoding set to UTF8), it looks good : accentuation is fine, special characters are fine.

To dump the data in a WIN1252-encoded shapefile, pgsql2shp has no encoding argument, so I set my client encoding to WIN1252 thtough the environment variable PGCLIENTENCODING.

If I just dump the file this way, it seems to be fine. So this command works fine :
pgsql2shp -f myouput.shp -u postgres -g mygeom mydatabase "SELECT mytext, mygeom FROM mytemptable"
->  [621679 rows]

But then, if I dump it through a query to have my field in uppercase, I get an error 'character 0xc29f of encoding UTF8 has no equivalent in WIN1252' (translated by myself, the message is in French)
The command is simply :
pgsql2shp -f myouput.shp -u postgres -g mygeom mydatabase "SELECT upper(mytext) as mytext, mygeom FROM mytemptable"

So I guess there is a problem with my 'upper' conversion, but I have no idea what this 0xc29f character could be.

Any help would be greatly appreciated.
Thanks a lot !

--
Arnaud Lesauvage
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: pgsql2shp : Encoding headache

From
Arnaud Lesauvage
Date:
Raymond O'Donnell a écrit :
> If it's any help to you, you can get iconv (and a bunch of other helpful
> stuff) from GnuWin32:
>
>   http://gnuwin32.sourceforge.net/

Thanks for your help Raymond.

I tried iconv but I have other problems now.
I still have to load the file into postgresql because the shapefiles
datafile (.dbf) is associated with an index file, and I have to use
pgsql2shp to rebuild it.
I'll try some more though. Maybe iconv before shp2pgsql, then load with
client_encoding set to WIN1252, then dump.
I'll see how it goes.

Arnaud

Re: pgsql2shp : Encoding headache

From
Arnaud Lesauvage
Date:
Arnaud Lesauvage a écrit :
> But then, if I dump it through a query to have my field in uppercase, I
> get an error 'character 0xc29f of encoding UTF8 has no equivalent in
> WIN1252' (translated by myself, the message is in French)
> The command is simply :
> pgsql2shp -f myouput.shp -u postgres -g mygeom mydatabase "SELECT
> upper(mytext) as mytext, mygeom FROM mytemptable"

OK, I narrowed down the problem to the WIN1252 encoding.
Using LATIN1 or LATIN9 for instance works correctly.
Since my application seems to work with LATIN9, I'll go with it.

I am still perplex though. What is this 0xc29f character ? An internet
search tells me that this is some Kanju character. I am quite confident
that if this is true, it would not convert any better to LATIN9 than to
WIN1252.

Also, doing a search like :
SELECT * FROM mytable WHERE upper(myflied) ILIKE u&'%c29f%';
Gives me 0 result.
Am I wrong to think that the error 'character 0xc29f of UTF8' relates to
the character with code point C29F in UTF8 ?

Thanks again for your help/lightings on this matter.

Arnaud


Re: [postgis-users] pgsql2shp : Encoding headache

From
InterRob
Date:
I would do this last query searching for the 0xC29F character WITHOUT the upper() function on the source table, in the native (to table) UTF8 client encoding. No result either?


Rob

2009/10/16 Arnaud Lesauvage <arnaud.listes@codata.eu>
Arnaud Lesauvage a écrit :

But then, if I dump it through a query to have my field in uppercase, I get an error 'character 0xc29f of encoding UTF8 has no equivalent in WIN1252' (translated by myself, the message is in French)
The command is simply :
pgsql2shp -f myouput.shp -u postgres -g mygeom mydatabase "SELECT upper(mytext) as mytext, mygeom FROM mytemptable"

OK, I narrowed down the problem to the WIN1252 encoding.
Using LATIN1 or LATIN9 for instance works correctly.
Since my application seems to work with LATIN9, I'll go with it.

I am still perplex though. What is this 0xc29f character ? An internet search tells me that this is some Kanju character. I am quite confident that if this is true, it would not convert any better to LATIN9 than to WIN1252.

Also, doing a search like :
SELECT * FROM mytable WHERE upper(myflied) ILIKE u&'%c29f%';
Gives me 0 result.
Am I wrong to think that the error 'character 0xc29f of UTF8' relates to the character with code point C29F in UTF8 ?

Thanks again for your help/lightings on this matter.

Arnaud


_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

slightly off-topic: Central Auth

From
"Scot Kreienkamp"
Date:

Hey everyone,

 

I apologize in advance for going slightly off topic, but I have never setup a centralized authentication scheme under Linux.  My question is, what do most people do for centralized command line, X, and PG authentication?  From what I’ve read the main choices are NIS or LDAP.  LDAP would be problematic as I would have to embed a login and plain text password in the ldap.conf file for binding to the MS AD.  On the other hand, it seems like NIS is old, inflexible, outdated, and possibly nearing end of life.  We are a largely Windows shop with many app and database servers running Linux.  The Linux environment is growing too large not to do centralized authentication of some kind. 

 

At this point I’m open to suggestions or comments.  SSH and X are required, PG would be nice to be able to auth centrally as well while I’m at it.

 

Thanks,

 

Scot Kreienkamp

 

Re: slightly off-topic: Central Auth

From
Scott Mead
Date:

On Fri, Oct 16, 2009 at 2:38 PM, Scot Kreienkamp <SKreien@la-z-boy.com> wrote:

Hey everyone,

 

I apologize in advance for going slightly off topic, but I have never setup a centralized authentication scheme under Linux.  My question is, what do most people do for centralized command line, X, and PG authentication?  From what I’ve read the main choices are NIS or LDAP.  LDAP would be problematic as I would have to embed a login and plain text password in the ldap.conf file for binding to the MS AD. 


    It sounds like PAM would be useful for you.  That's really what is was built for.
--Scott

Re: slightly off-topic: Central Auth

From
"Scot Kreienkamp"
Date:

These are all RH4 and 5, so they do all have PAM.  I thought PAM had to interface with something else, which is where NIS and LDAP enter the picture, to authenticate to another server though.  Otherwise I’m not sure how it works?

 

Thanks,

 

Scot Kreienkamp

skreien@la-z-boy.com

 

From: Scott Mead [mailto:scott.lists@enterprisedb.com]
Sent: Friday, October 16, 2009 2:50 PM
To: Scot Kreienkamp
Cc: pgsql-general
Subject: Re: [GENERAL] slightly off-topic: Central Auth

 

 

On Fri, Oct 16, 2009 at 2:38 PM, Scot Kreienkamp <SKreien@la-z-boy.com> wrote:

Hey everyone,

 

I apologize in advance for going slightly off topic, but I have never setup a centralized authentication scheme under Linux.  My question is, what do most people do for centralized command line, X, and PG authentication?  From what I’ve read the main choices are NIS or LDAP.  LDAP would be problematic as I would have to embed a login and plain text password in the ldap.conf file for binding to the MS AD. 


    It sounds like PAM would be useful for you.  That's really what is was built for.
--Scott

Re: slightly off-topic: Central Auth

From
"Scot Kreienkamp"
Date:
On 16/10/2009 19:38, Scot Kreienkamp wrote:
> Hey everyone,
>
>
>
> I apologize in advance for going slightly off topic, but I have never
> setup a centralized authentication scheme under Linux.  My question
is,
> what do most people do for centralized command line, X, and PG
> authentication?  From what I've read the main choices are NIS or LDAP.
> LDAP would be problematic as I would have to embed a login and plain
> text password in the ldap.conf file for binding to the MS AD.  On the
> other hand, it seems like NIS is old, inflexible, outdated, and
possibly
> nearing end of life.  We are a largely Windows shop with many app and
> database servers running Linux.  The Linux environment is growing too
> large not to do centralized authentication of some kind.
>
>
>
> At this point I'm open to suggestions or comments.  SSH and X are
> required, PG would be nice to be able to auth centrally as well while
> I'm at it.

Does "PG" = PostgreSQL? If so, it can do LDAP, Kerberos and PAM, among
other things:

[Scot Kreienkamp]
But of course.  :)

So I guess what I see taking shape is setting up everything to auth
against PAM locally, then setting up local PAM to auth to a remote
source.


Thanks,

Scot Kreienkamp
skreien@la-z-boy.com


Re: slightly off-topic: Central Auth

From
Raymond O'Donnell
Date:
On 16/10/2009 19:38, Scot Kreienkamp wrote:
> Hey everyone,
>
>
>
> I apologize in advance for going slightly off topic, but I have never
> setup a centralized authentication scheme under Linux.  My question is,
> what do most people do for centralized command line, X, and PG
> authentication?  From what I've read the main choices are NIS or LDAP.
> LDAP would be problematic as I would have to embed a login and plain
> text password in the ldap.conf file for binding to the MS AD.  On the
> other hand, it seems like NIS is old, inflexible, outdated, and possibly
> nearing end of life.  We are a largely Windows shop with many app and
> database servers running Linux.  The Linux environment is growing too
> large not to do centralized authentication of some kind.
>
>
>
> At this point I'm open to suggestions or comments.  SSH and X are
> required, PG would be nice to be able to auth centrally as well while
> I'm at it.

Does "PG" = PostgreSQL? If so, it can do LDAP, Kerberos and PAM, among
other things:

  http://www.postgresql.org/docs/8.4/static/client-authentication.html

Ray.


--
--  Raymond O'Donnell
--  Tela Design Ltd, Craughwell, Co. Galway, Ireland.
--  Software & graphic design and consultancy
--  ray@teladesign.ie
--

Re: slightly off-topic: Central Auth

From
Tom Lane
Date:
"Scot Kreienkamp" <SKreien@la-z-boy.com> writes:
> On 16/10/2009 19:38, Scot Kreienkamp wrote:
>> ...  We are a largely Windows shop with many app and
>> database servers running Linux.  The Linux environment is growing too
>> large not to do centralized authentication of some kind.

> So I guess what I see taking shape is setting up everything to auth
> against PAM locally, then setting up local PAM to auth to a remote
> source.

What are you using for central auth in the Windows portions of your
shop?

What I'd suggest is that you standardize on Kerberos auth (that's what
it's called in the Unix world, MS might have another name for it).
You can definitely plug Linux into an Active Directory server for this,
and I believe that you have the option to switch it around in future
if you decide you'd rather have a Linux machine as your central auth
server.

If you decide to go with this approach and use PAM as intermediary,
you'll need the patch I just committed in response to bug #5121 --- it
turns out nobody had ever tried that with Postgres before :-(.  But
I think it's also possible to just use PG's native Kerberos support
with AD, which would explain why nobody had tried it.

            regards, tom lane

Re: slightly off-topic: Central Auth

From
Magnus Hagander
Date:
2009/10/17 Tom Lane <tgl@sss.pgh.pa.us>:
> If you decide to go with this approach and use PAM as intermediary,
> you'll need the patch I just committed in response to bug #5121 --- it
> turns out nobody had ever tried that with Postgres before :-(.  But
> I think it's also possible to just use PG's native Kerberos support
> with AD, which would explain why nobody had tried it.

It most definitely is possible.  Note that the recommendation is to
use GSSAPI with Kerberos and not the native "krb5" authentication for
tihs if possible.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: slightly off-topic: Central Auth

From
Raymond O'Donnell
Date:
On 16/10/2009 20:38, Raymond O'Donnell wrote:
> Does "PG" = PostgreSQL? If so, it can do LDAP, Kerberos and PAM, among
> other things:

Apologies for the noise - I thought I was replying to a non-PG list. :-)

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

Re: [postgis-users] pgsql2shp : Encoding headache

From
Arnaud Lesauvage
Date:
InterRob a écrit :
 >> Arnaud Lesauvage a écrit :
 >> Also, doing a search like :
 >> SELECT * FROM mytable WHERE upper(myflied) ILIKE u&'%c29f%';
 >> Gives me 0 result.
 >> Am I wrong to think that the error 'character 0xc29f of UTF8' relates
 >> to the character with code point C29F in UTF8 ?
 >
> I would do this last query searching for the 0xC29F character WITHOUT the
> upper() function on the source table, in the native (to table) UTF8 client
> encoding. No result either?

No, no result either.
What puzzles me is that this field contains names of French streets, and
I can't see what character could be in such a field that has no
equivalent in win1252 !


--
Arnaud Lesauvage
IT Executive

Codata Belgium SA
143-145, Avenue bourgmestre Jean Materne - 5100 Namur - Belgium
Tel : +32 81 21 53 48 - Fax : +32 81 21 54 24
Mel : arnaud.lesauvage@codata.eu
Web : http://www.codata.eu/