Thread: automatic scan a table, report on data formats in columns
Is it possible to do the following?
automatically scan a table of all text columns
produce a report on data formats in columns as indicated in the following:
Column A Column B Column C
alphabetic words/phrases digits like xxxxx.xx alphanumeric identifiers
City of London 5 digits followed by a iso12345
decimal point and 2
digits indicating precision
It is a bit like detecting regular expression patterns automatically.
Is automatically detecting something like regular expression patterns possible?
Regards,
David
You first want to check a certain text pattern exists in an text column or not. (you can use generated columns).
Not the other way around to check one text column have what kind of pattern. If there is an pattern, it's text.
On Mon, Feb 21, 2022 at 3:36 PM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Is it possible to do the following?automatically scan a table of all text columnsproduce a report on data formats in columns as indicated in the following:Column A Column B Column Calphabetic words/phrases digits like xxxxx.xx alphanumeric identifiersCity of London 5 digits followed by a iso12345decimal point and 2digits indicating precisionIt is a bit like detecting regular expression patterns automatically.Is automatically detecting something like regular expression patterns possible?Regards,David
On Mon, Feb 21, 2022 at 3:06 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Is it possible to do the following?automatically scan a table of all text columnsproduce a report on data formats in columns as indicated in the following:Column A Column B Column Calphabetic words/phrases digits like xxxxx.xx alphanumeric identifiersCity of London 5 digits followed by a iso12345decimal point and 2digits indicating precisionIt is a bit like detecting regular expression patterns automatically.Is automatically detecting something like regular expression patterns possible?
Yep, and the answer for any text column you give me is:
^.*$
If you want a classification system where you have more (already known) complex RegularExpressions and you want to choose the best fit that is also possible, and probably much more useful.
For anything else you need a better problem specification. And I'd probably tend toward wanting to run some kind of AI system on the data - i.e., not something I'd perform in-database.
David J.
On Mon, Feb 21, 2022, 2:06 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Is it possible to do the following?automatically scan a table of all text columnsproduce a report on data formats in columns as indicated in the following:Column A Column B Column Calphabetic words/phrases digits like xxxxx.xx alphanumeric identifiersCity of London 5 digits followed by a iso12345decimal point and 2digits indicating precisionIt is a bit like detecting regular expression patterns automatically.Is automatically detecting something like regular expression patterns possible?Regards,David
Depending on your definition of automatic, I think this is very do-able.
First you find the table names (using system catalog or hard-coded values, depending) you're interested in and then use the columns view (https://www.postgresql.org/docs/current/infoschema-columns.html) to enumerate over the fields in each table to find ones with data types you want to analyze). From there you can query each record in each column using regex or similar to classify each column as to its contents.
Of course you have to write all that code so it's not automatic as in built-in. But it's automatic in the sense that once written it would work against any set of tables and columns and can be run without any human intervention or analysis in the moment.
Steve
On Mon, 21 Feb 2022 at 16:59, Steve Midgley <science@misuse.org> wrote:
On Mon, Feb 21, 2022, 2:06 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:Is it possible to do the following?automatically scan a table of all text columnsproduce a report on data formats in columns as indicated in the following:Column A Column B Column Calphabetic words/phrases digits like xxxxx.xx alphanumeric identifiersCity of London 5 digits followed by a iso12345decimal point and 2digits indicating precisionIt is a bit like detecting regular expression patterns automatically.Is automatically detecting something like regular expression patterns possible?Regards,DavidDepending on your definition of automatic, I think this is very do-able.First you find the table names (using system catalog or hard-coded values, depending) you're interested in and then use the columns view (https://www.postgresql.org/docs/current/infoschema-columns.html) to enumerate over the fields in each table to find ones with data types you want to analyze). From there you can query each record in each column using regex or similar to classify each column as to its contents.Of course you have to write all that code so it's not automatic as in built-in. But it's automatic in the sense that once written it would work against any set of tables and columns and can be run without any human intervention or analysis in the moment.Steve
If there is a context and predicable possibilities, it could well be do-able. I guess.
Regards,
David