RESOLVED: varchar comparison and trim()

From: ta@lavabit.com
Date:
Considering Kevin’s answer, I have chosen to stick with “the Postgres way”
of resolving this, but here are some of my other ramblings that someone
might find useful:

This is all about converting part of a legacy app that uses ISAM-like
fixed-length data to PG (and having it interoperate with the database).
So, data and queries come in containing a lot of trailing spaces.

Possible solutions to this “trailing space” problem that I have come up with:

1. Use char(n) for all indexed/searchable columns
(+) as simple as it can be, imposes no compatibility risk
(-) will cause data pages and the already somewhat oversized btree indexes
to grow slightly; as PG does not feature compressed indexes yet, that would
mean more disk pages to read in…

2. Redefine (in the public schema) the “varchar to varchar” operators to
conform to the current “char to varchar” and “char to char” behavior (a
rough sketch is given after this list)
(+) easy and transparent. Both char and varchar columns are actually meant
to store human-readable text where no trailing space has any significance.
It’s highly unlikely this change would interfere with any other software,
as programmers in general (being humans) are not supposed to consider
trailing spaces “significant”; if one wants to store trailing spaces,
char(n) or character arrays might be much better suited
(-) “LANGUAGE internal” operators are supposed to run faster (I have done
no timings), so a slight slowdown in execution speed might be expected, as
these comparison operations are performed quite often
(-) one has to make sure that the “public” schema is always included in
search_path
(-) goes against the standard

3. Introduce a new datatype extension with its own set of operators (like
citext)
(+) would not interfere with anything
(-) requires some knowledge and skill, and it is doubtful whether it would
be accepted by the community

4. Modify the client application logic to prevent excess trailing spaces
from ever reaching the database. (One can put constraints/domains or
triggers in place to enforce rtrim() on the data, but the main problem
remains: the query arguments!)
(+) does not require any possibly incompatible database tweaks
(-) application logic might be too difficult to patch (ORMs etc.)

5. Use another database. (MySQL supports this out of the box; actually it’s
the default and only way it works, so this situation might arise as well if
you are converting a MySQL application to Postgres. MSSQL supports it via
the ANSI_PADDING setting, which was their default at the beginning too.)
(+) just switch to it
(-) would miss some nice PG features (like search_path, for instance)
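
For option 2, a rough and untested sketch of what such an operator
redefinition might look like (the function name is mine, and the interplay
with indexes and operator resolution would still need checking):

-- untested sketch: a varchar "=" in the public schema that ignores
-- trailing spaces; public must be present in search_path for it to be found
CREATE FUNCTION public.varchar_eq_rtrim(varchar, varchar) RETURNS boolean
    LANGUAGE sql IMMUTABLE STRICT
    AS 'SELECT rtrim($1) = rtrim($2)';

CREATE OPERATOR public.= (
    LEFTARG   = varchar,
    RIGHTARG  = varchar,
    PROCEDURE = public.varchar_eq_rtrim
);
-- note: without a matching operator class this "=" cannot use btree indexes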

I finally opted for the “client app” solution, inserting a custom-made query
parser into the app => database communication path, preventing trailing
spaces from ever leaving the application unless specifically instructed
otherwise.

Tom





Re: RESOLVED: varchar comparison and trim()

From: Michael Wood
Date:
Hi

On 21 December 2012 11:54, <ta@lavabit.com> wrote:
>
> Considering Kevin’s answer, I have chosen to stick with “the Postgres way”
> of resolving this, but here are some of my other ramblings that someone
> might find useful:
>
> This is all about converting part of a legacy app that uses ISAM-like
> fixed-length data to PG (and having it interoperate with the database).
> So, data and queries come in containing a lot of trailing spaces.
>
> Possible solutions to this “trailing space” problem that I have come up with:

I just skimmed your post, but how about this for another solution:

Create insert and update triggers to trim the data as it arrives in
the database?  Will that do what you want?
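
Something along these lines, say (untested, and the table/column names are
just placeholders):

-- trim trailing spaces from field1 before the row is stored
CREATE OR REPLACE FUNCTION rtrim_fields() RETURNS trigger
    LANGUAGE plpgsql AS $$
BEGIN
    NEW.field1 := rtrim(NEW.field1);
    RETURN NEW;
END;
$$;

CREATE TRIGGER rtrim_fields_trg
    BEFORE INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE rtrim_fields();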

--
Michael Wood <esiotrot@gmail.com>


Re: RESOLVED: varchar comparison and trim()

From: ta@lavabit.com
Date:
> I just skimmed your post, but how about this for another solution:
>
> Create insert and update triggers to trim the data as it arrives in
> the database?  Will that do what you want?
>
> --
> Michael Wood <esiotrot@gmail.com>
>

Unfortunately no. That would require a trigger for each and every table in
the database (there will be many), and all those triggers would have to be
maintained with every structure change. Although one can limit the number
of different trigger functions or write a generalized trigger function,
data entry is just one side of the problem; querying the data is the other:

create table mytable (field1 text);    -- or varchar

-- then comes the following statement from the client app:
insert into mytable (field1) values ('aa    ');
-- if the aforementioned trigger is in effect, then 'aa' is what actually gets stored

-- but when the client app asks for some data:
select * from mytable where field1 = 'aa    ';
-- no records will satisfy the condition, unless field1 is of type char(n)!

Well, I’m happy with my solution (so far at least :-) parsing all queries
coming out of the client app and rtrim()ing all ISAM field references and
expressions. It adds some processing time to every query, but that’s
negligible considering that accessing PG/SQL tables is at least 10x slower
than ISAM access, and it has the additional benefit of incurring no extra
processing cost in PG.
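
To illustrate with a made-up example (the real rewriting is of course a bit
more involved), the trimming happens on the query text before it leaves the
app:

-- what the legacy app produces:
select * from mytable where field1 = 'aa    ';
-- what actually gets sent to PG after the parser trims the argument:
select * from mytable where field1 = 'aa';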

As I said before, some sort of ANSI_PADDING option, possibly scoped to a
table or a column, would be nice…

And the fact that a redefined “text to text” operator gets ignored might be a bug.

A.Tom