I think the key word that will help you here is biocuration. It is an established field involving people with
scientific, computational, and linguistic backgrounds who are familiar with the problem space, so I would suggest
talking to people working in this area first to get an idea of what's feasible, what's already out there, etc., as
they will know this better than the Postgres community.
You can see an example of the sort of annotation that is fully automated at the moment here:
https://monarchinitiative.org/tools/text-annotate
Given the potential impact on human health, some level of manual involvement in annotation is frequently part of the
workflow.
Daniel
-----Original Message-----
From: Achilleas Mantzios <achill@matrix.gatewaynet.com>
Sent: 05 June 2021 10:49
To: pgsql-general@lists.postgresql.org
Subject: Ideas for building a system that parses medical research publications/articles [EXT]
Hello
I am imagining a system that can parse papers from various sources
(web/files/etc.) and in various formats (text, pdf, etc.) and can store metadata for each paper: some kind of global ID
if applicable, authors, areas of research, whether the paper is "new", "highlighted", or "historical", type (e.g. case
reports, clinical trials), symptoms (e.g. tics, GI pain, psychological changes, anxiety), and other key attributes
(I guess dynamic). It must also be full-text searchable, etc.
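
To make the requirements above concrete, here is a minimal sketch of what one such record and a naive search over it might look like. Everything in it is an assumption for illustration only: the `Paper` class, its field names, and the `search` helper are hypothetical and do not come from any existing tool. In PostgreSQL the "dynamic" attributes would probably map to a jsonb column and the search to the built-in full-text search (tsvector), rather than the in-memory token matching used here.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Paper:
    """Hypothetical metadata record for one publication."""
    title: str
    full_text: str
    global_id: Optional[str] = None          # some kind of global ID, if applicable
    authors: List[str] = field(default_factory=list)
    research_areas: List[str] = field(default_factory=list)
    status: str = "new"                      # "new", "highlighted" or "historical"
    paper_type: Optional[str] = None         # e.g. "case report", "clinical trial"
    symptoms: List[str] = field(default_factory=list)
    extra: Dict[str, str] = field(default_factory=dict)  # other, dynamic attributes


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(papers: List[Paper], query: str) -> List[Paper]:
    """Return the papers whose title or body contains every query term.

    This is a naive stand-in for real full-text search: no stemming,
    no ranking, no index.
    """
    terms = tokenize(query)
    return [p for p in papers if terms <= tokenize(p.title + " " + p.full_text)]
```

For example, `search(papers, "tics anxiety")` would return only the papers whose text mentions both words.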
I am at the very beginning in this and it is done on a fully volunteer basis.
Lots of questions: is there any scientific/scholarly analysis software already available? If yes, and it is really good
and open source, then this will influence the rest of the decisions. Otherwise, I'll have to form a team that can write
one, and in this case I'll have to decide on DB, language, etc. I have worked with pgsql for 20 years, so it is the
natural choice for any kind of data; I just ask this for the sake of completeness.
All ideas welcome.
--
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.