Re: [Qgis-user] OCR - Mailing list pgsql-novice

From Mike Ellsworth
Subject Re: [Qgis-user] OCR
Msg-id BANLkTinrieWobY61d1nSjsZfUXJROye_Fg@mail.gmail.com
In response to Re: [Qgis-user] OCR  (Ramon Andinach <custard@westnet.com.au>)
List pgsql-novice
On Wed, May 4, 2011 at 6:04 AM, Ramon Andinach <custard@westnet.com.au> wrote:
>
> On 04/05/2011, at 17:18 , ALT SHN wrote:
>
>> Hello,
>>
>> This might seem a little off-topic, but maybe someone here can help me.
>>
>> I need to extract toponymic data from old digitized paper maps. I wish to explore optical character recognition (OCR).
>>
>> Does anyone have a suggestion or experience with this kind of challenge?
>>
>> Thank you,
>>
>> André Mano
>
> I'd argue about the "little". I've spent a lot of the last few weeks arguing with OCR software about tables of sample data from old reports, that become point data to plot. Perfectly relevant :)
>
> I've never tried getting data from digitized maps, but I'll offer the following generalisations in case it helps. Generally, I have OCR programmes pass me the results as plain text. I lose the formatting, but I don't have to fix stupid guesses about the formatting.
>
> This is from my experience, so you may find different.
>
> 1. OCR loves paragraphs.
> 2. Different OCR programmes handle column text differently. Some understand columns, some just assume L->R straight across both columns.
> 3. OCR does not get along with handwritten anything. (Unless the person was extra-extra neat and consistent in their writing, and even then it's a maybe.)
> 4. OCR on tabular data works best if the data is lined up in columns, and doesn't have random big gaps.
> 5. OCR will almost certainly be confused if there is a line on your map running through or near a word.
>  5a. Actually lines could confuse it quite a bit - I remember one that tried to recreate an in-line sketch map out of ASCII characters. Quite amusing.
> 6. You *will* need to check the results.
>
> I'd love to hear what OCR makes of maps. Very curious.
>
> -ramon.
> --

Going along with the poster above, I think you may have problems if
any of your text is skewed or 'wraps' in any way, so that some
characters are not horizontal in the scan.  Recognizing only 7 out of
10 and then cleaning up the rest may be more time-consuming than plain
data entry.
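
To put rough numbers on that trade-off, here is a small Python sketch.
Everything in it is an assumption for illustration: the function name,
the label count, and the per-label times for checking, fixing, and
typing are made up, so plug in your own estimates.

```python
def cleanup_vs_entry(n_labels, hits_per_10, check_s, fix_s, entry_s):
    """Return (ocr_seconds, manual_seconds) for n_labels place names.

    All timings are hypothetical per-label estimates in seconds.
    """
    # Every OCR result still has to be checked against the map.
    check_time = n_labels * check_s
    # The labels the OCR got wrong have to be found and retyped.
    misses = n_labels * (10 - hits_per_10) // 10
    ocr_seconds = check_time + misses * fix_s
    # Plain data entry: type every label once.
    manual_seconds = n_labels * entry_s
    return ocr_seconds, manual_seconds


# Assumed figures: 1000 labels, 7 out of 10 recognized,
# 5 s to check a label, 20 s to fix a miss, 10 s to type one by hand.
ocr_s, manual_s = cleanup_vs_entry(1000, 7, check_s=5, fix_s=20, entry_s=10)
print(ocr_s, manual_s)  # 11000 10000
```

With those assumed timings the OCR route comes out slightly slower
than typing everything; a better hit rate or cheaper checking would
tip it the other way.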

You may be able to 1) scan the map, 2) drag a 'box' over each data
point and assign an x,y to the box, 3) assign each box an identifier,
and 4) enter the data by hand, box by box.  At least you'll have a
loose representation of the data.
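
As a rough sketch of what the records from those four steps might look
like, here is a bit of Python that keeps one row per box and writes
them out as CSV. The class name, the field names, and the sample
values are all hypothetical, and the x,y values are assumed to be
pixel positions in the scan, so they would still need georeferencing
before they mean anything on a real map.

```python
import csv
import io
from dataclasses import dataclass, asdict


@dataclass
class LabelBox:
    box_id: str     # step 3: identifier for the box
    x: float        # step 2: box position in scan pixels (assumed)
    y: float
    text: str = ""  # step 4: typed in by hand later


def boxes_to_csv(boxes):
    """Serialize the boxes to CSV text, one row per box."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["box_id", "x", "y", "text"])
    writer.writeheader()
    for box in boxes:
        writer.writerow(asdict(box))
    return buf.getvalue()


# Made-up sample data: two boxes, only the first one transcribed so far.
boxes = [LabelBox("b001", 412.0, 958.5), LabelBox("b002", 1530.0, 221.0)]
boxes[0].text = "Example Town"
csv_text = boxes_to_csv(boxes)
print(csv_text)
```

A flat file like this should load into a database table or into QGIS
as a delimited text layer, and the empty text column shows at a glance
which boxes still need data entry.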

One thing I'd search for first is 'OCR skewed text', to see if there
is an app that can handle that potential problem, before going much
further.
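
If you end up drawing boxes anyway, you could at least measure how
skewed each label is before deciding whether OCR is worth trying on
it. A minimal sketch, assuming you click the two ends of a label's
baseline; the function names and the 5 degree tolerance are my own
guesses, not taken from any OCR package.

```python
import math


def label_skew_degrees(x1, y1, x2, y2):
    """Angle of the baseline (x1, y1) -> (x2, y2) from horizontal, in degrees.

    In image coordinates y usually grows downward, so the sign of the
    angle is flipped compared with ordinary map coordinates.
    """
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def needs_deskew(angle_deg, tolerance_deg=5.0):
    """True if the label is tilted more than the assumed tolerance."""
    return abs(angle_deg) > tolerance_deg


# Hypothetical baseline endpoints clicked on the scan.
flat = label_skew_degrees(100, 200, 180, 200)
tilted = label_skew_degrees(100, 200, 180, 280)
print(flat, needs_deskew(flat))
print(tilted, needs_deskew(tilted))
```

Labels that fail the check would need to be rotated back to
horizontal first, or simply typed in by hand.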


--
Mike Ellsworth
