Thread: Defining character sets for individual fields

Defining character sets for individual fields

From
"Ram Ravichandran"
Date:
Hi,

By default, my PostgreSQL server is set to use the UTF8 character set. I was wondering if there is any way to make sure that certain fields, such as URLs, only use ASCII. My main aim is to save space by using only 1 byte per character for URLs (some of the URLs are over 200 characters long). Is this possible? Or are all characters eventually converted to UTF8 during storage?

Thanks,

Ram

Re: Defining character sets for individual fields

From
Steve Atkins
Date:
On May 31, 2008, at 6:22 PM, Ram Ravichandran wrote:

> Hi,
>
> By default, my PostgreSQL server is set to use the UTF8 character set. I
> was wondering if there is any way to make sure that certain fields,
> such as URLs, only use ASCII. My main aim is to save space
> by using only 1 byte per character for URLs (some of the URLs are
> over 200 characters long). Is this possible? Or are all characters
> eventually converted to UTF8 during storage?

An ASCII string and its UTF8 representation take exactly the same
number of bytes, so if storage space is your concern, it's not an
issue.
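
A quick way to check this is to encode the same string both ways and
compare the byte counts. This is only a minimal sketch in Python, not
anything PostgreSQL-specific; the helper name byte_lengths and the
sample URL are made up for illustration.

```python
def byte_lengths(text: str) -> tuple[int, int]:
    """Return (ASCII byte count, UTF-8 byte count) for an ASCII-only string.

    Raises UnicodeEncodeError if the text contains non-ASCII characters.
    """
    return len(text.encode("ascii")), len(text.encode("utf-8"))


# Hypothetical sample URL, ASCII only.
url = "http://www.example.com/some/long/path?query=value"

ascii_len, utf8_len = byte_lengths(url)

# For pure ASCII input, both encodings use one byte per character.
print(ascii_len == utf8_len == len(url))
```

Only characters outside the ASCII range take more than one byte in
UTF8, so an all-ASCII URL column costs nothing extra.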

Cheers,
   Steve


Re: Defining character sets for individual fields

From
Tino Wildenhain
Date:
Hi,

Steve Atkins wrote:
>
> On May 31, 2008, at 6:22 PM, Ram Ravichandran wrote:
>
>> Hi,
>>
>> By default, my PostgreSQL server is set to use the UTF8 character set. I
>> was wondering if there is any way to make sure that certain fields,
>> such as URLs, only use ASCII. My main aim is to save space by
>> using only 1 byte per character for URLs (some of the URLs are over 200
>> characters long). Is this possible? Or are all characters eventually
>> converted to UTF8 during storage?
>
> An ASCII string and its UTF8 representation take exactly the same
> number of bytes, so if storage space is your concern, it's not an issue.

What's more, if you convert URLs from their URL-encoded form to clear
text, you can quickly leave the ASCII character range (think Punycode
for the FQDN, think UTF-8 for the path).
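
To illustrate, here is a rough sketch using only the Python standard
library. The host name and path are made-up examples, and the "idna"
codec is just one way to decode a Punycode label; treat it as an
illustration rather than a recommendation for how to store URLs.

```python
from urllib.parse import unquote

# Hypothetical examples: both strings are pure ASCII as transmitted.
encoded_host = "xn--bcher-kva.example"   # Punycode-encoded host name
encoded_path = "/b%C3%BCcher"            # percent-encoded UTF-8 path

# Decode the Punycode labels with Python's built-in "idna" codec.
decoded_host = encoded_host.encode("ascii").decode("idna")

# Decode the percent-escapes; unquote() assumes UTF-8 by default.
decoded_path = unquote(encoded_path)

for label, text in [("host", decoded_host), ("path", decoded_path)]:
    # str.isascii() needs Python 3.7 or later.
    print(label, text, "ASCII only:", text.isascii())
```

Both decoded strings contain "ü", so a column restricted to ASCII could
only hold the encoded forms.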

Cheers
Tino
