Thread: automatic scan a table, report on data formats in columns

automatic scan a table, report on data formats in columns

From
Shaozhong SHI
Date:
Is it possible to do the following?

automatically scan a table of all text columns
produce a report on data formats in columns as indicated in the following:

Column A                               Column B                   Column C 
alphabetic words/phrases     digits like xxxxx.xx      alphanumeric identifiers
City of London                       5 digits followed by a    iso12345
                                               decimal point and 2
                                               digits indicating precision


It is a bit like detecting regular expression patterns automatically.

Is automatically detecting something like regular expression patterns possible?

Regards,

David

Re: automatic scan a table, report on data formats in columns

From
Jian He
Date:

You first want to check a certain text pattern exists in an text column or not. (you can use generated columns).
Not the other way around to check one text column have what kind of pattern. If there is an pattern, it's text.




On Mon, Feb 21, 2022 at 3:36 PM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Is it possible to do the following?

automatically scan a table of all text columns
produce a report on data formats in columns as indicated in the following:

Column A                               Column B                   Column C 
alphabetic words/phrases     digits like xxxxx.xx      alphanumeric identifiers
City of London                       5 digits followed by a    iso12345
                                               decimal point and 2
                                               digits indicating precision


It is a bit like detecting regular expression patterns automatically.

Is automatically detecting something like regular expression patterns possible?

Regards,

David

Re: automatic scan a table, report on data formats in columns

From
"David G. Johnston"
Date:
On Mon, Feb 21, 2022 at 3:06 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Is it possible to do the following?

automatically scan a table of all text columns
produce a report on data formats in columns as indicated in the following:

Column A                               Column B                   Column C 
alphabetic words/phrases     digits like xxxxx.xx      alphanumeric identifiers
City of London                       5 digits followed by a    iso12345
                                               decimal point and 2
                                               digits indicating precision


It is a bit like detecting regular expression patterns automatically.

Is automatically detecting something like regular expression patterns possible?


Yep, and the answer for any text column you give me is:

^.*$

If you want a classification system where you have more (already known) complex RegularExpressions and you want to choose the best fit that is also possible, and probably much more useful.

For anything else you need a better problem specification.  And I'd probably tend toward wanting to run some kind of AI system on the data - i.e., not something I'd perform in-database.

David J.

Re: automatic scan a table, report on data formats in columns

From
Steve Midgley
Date:


On Mon, Feb 21, 2022, 2:06 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Is it possible to do the following?

automatically scan a table of all text columns
produce a report on data formats in columns as indicated in the following:

Column A                               Column B                   Column C 
alphabetic words/phrases     digits like xxxxx.xx      alphanumeric identifiers
City of London                       5 digits followed by a    iso12345
                                               decimal point and 2
                                               digits indicating precision


It is a bit like detecting regular expression patterns automatically.

Is automatically detecting something like regular expression patterns possible?

Regards,

David

Depending on your definition of automatic, I think this is very do-able. 

First you find the table names (using system catalog or hard-coded values, depending) you're interested in and then use the columns view (https://www.postgresql.org/docs/current/infoschema-columns.html) to enumerate over the fields in each table to find ones with data types you want to analyze). From there you can query each record in each column using regex or similar to classify each column as to its contents. 

Of course you have to write all that code so it's not automatic as in built-in. But it's automatic in the sense that once written it would work against any set of tables and columns and can be run without any human intervention or analysis in the moment. 

Steve 

Re: automatic scan a table, report on data formats in columns

From
Shaozhong SHI
Date:


On Mon, 21 Feb 2022 at 16:59, Steve Midgley <science@misuse.org> wrote:


On Mon, Feb 21, 2022, 2:06 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Is it possible to do the following?

automatically scan a table of all text columns
produce a report on data formats in columns as indicated in the following:

Column A                               Column B                   Column C 
alphabetic words/phrases     digits like xxxxx.xx      alphanumeric identifiers
City of London                       5 digits followed by a    iso12345
                                               decimal point and 2
                                               digits indicating precision


It is a bit like detecting regular expression patterns automatically.

Is automatically detecting something like regular expression patterns possible?

Regards,

David

Depending on your definition of automatic, I think this is very do-able. 

First you find the table names (using system catalog or hard-coded values, depending) you're interested in and then use the columns view (https://www.postgresql.org/docs/current/infoschema-columns.html) to enumerate over the fields in each table to find ones with data types you want to analyze). From there you can query each record in each column using regex or similar to classify each column as to its contents. 

Of course you have to write all that code so it's not automatic as in built-in. But it's automatic in the sense that once written it would work against any set of tables and columns and can be run without any human intervention or analysis in the moment. 

Steve 

If there is a context and predicable possibilities, it could well be do-able.  I guess.

Regards,

David