Re: [Bacula-users] Catastrophic changes to PostgreSQL 8.4 - Mailing list pgsql-general

From Frank Sweetser
Subject Re: [Bacula-users] Catastrophic changes to PostgreSQL 8.4
Date
Msg-id 4B17CA23.8010306@wpi.edu
In response to Re: Catastrophic changes to PostgreSQL 8.4  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-general
On 12/3/2009 3:33 AM, Craig Ringer wrote:
> Kern Sibbald wrote:
>> Hello,
>>
>> Thanks for all the answers; I am a bit overwhelmed by the number, so I am
>> going to try to answer everyone in one email.
>>
>> The first thing to understand is that it is *impossible* to know what the
>> encoding is on the client machine (FD -- or File daemon).  On say a

Or, even worse, which encoding the user or application had in mind when it
wrote a particular file.  There's no guarantee that any two files on a system
were intended to be read with the same encoding.

>> Unix/Linux system, the user could create filenames with non-UTF-8 then switch
>> to UTF-8, or restore files that were tarred on Windows or on Mac, or simply
>> copy a Mac directory.  Finally, using system calls to create a file, you can
>> put *any* character into a filename.
>
> While true in theory, in practice it's pretty unusual to have filenames
> encoded with an encoding other than the system LC_CTYPE on a modern
> UNIX/Linux/BSD machine.

Unless, of course, you're at a good sized school with lots of international
students, and have fileservers holding filenames created on desktops running
in Chinese, Turkish, Russian, and other locales.

In the end, a filename is (under Linux, at least) just a string of arbitrary
bytes containing anything except '/' and NUL.  If Bacula tries to get too
clever and munges or misinterprets those byte strings - or, worse yet, if
the database does it behind your back - then stuff _will_ end up breaking.
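To make the point concrete, here is a minimal Python sketch, assuming Linux filename rules. It shows that one legal filename (here, the word "privet" written in a KOI8-R Russian locale) is not valid UTF-8, decodes "successfully" but wrongly as Latin-1, and only survives intact if it is handled as raw bytes. The helper names is_valid_linux_name and decodes_as are hypothetical, made up for this illustration. In the PostgreSQL context of this thread, keeping the bytes untouched generally means a bytea column or a SQL_ASCII database rather than UTF8-validated text.

```python
from typing import Optional


def is_valid_linux_name(name: bytes) -> bool:
    """A single Linux path component may hold any byte except b'/' and NUL."""
    return len(name) > 0 and b"/" not in name and b"\x00" not in name


def decodes_as(name: bytes, encoding: str) -> Optional[str]:
    """Return the text this encoding would show for the name, or None if it fails."""
    try:
        return name.decode(encoding)
    except UnicodeDecodeError:
        return None


# A name created on a desktop running in a KOI8-R (Russian) locale.
raw = "привет".encode("koi8-r")

legal = is_valid_linux_name(raw)        # a perfectly legal filename
as_utf8 = decodes_as(raw, "utf-8")      # None: these bytes are not valid UTF-8
as_koi8 = decodes_as(raw, "koi8-r")     # the text the user actually saw
as_latin1 = decodes_as(raw, "latin-1")  # also decodes, but to the wrong text

# Lossless round trip: undecodable bytes become lone surrogates (PEP 383),
# which is what os.fsdecode()/os.fsencode() do on POSIX with a UTF-8 locale.
as_text = raw.decode("utf-8", "surrogateescape")
assert as_text.encode("utf-8", "surrogateescape") == raw

print(legal, as_utf8 is None)
```

Nothing in the bytes themselves says which decoding was intended, so the only safe thing a backup tool or its catalog can do is store and return exactly the bytes it was given.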

(A few years back, someone heavily involved in Linux kernel filesystem work
was talking about this exact issue, and remarked that many people doing
internationalization work secretly feel it would be easier to just teach
everyone English.  Impossible as this may be, I have since come to understand
what they were talking about...)

--
Frank Sweetser fs at wpi.edu  |  For every problem, there is a solution that
WPI Senior Network Engineer   |  is simple, elegant, and wrong. - HL Mencken
      GPG fingerprint = 6174 1257 129E 0D21 D8D4  E8A3 8E39 29E3 E2E8 8CEC
