Thread: How to read an external pdf file from postgres?


From
Amine Tengilimoglu
Date:
  Hi;

     I want to read an external PDF file from Postgres. The PDF file will exist on disk; Postgres only knows its full path, stored as metadata. Is there any software or extension that can be used for this, or do we have to develop software for it? What is the best approach? I'd appreciate it if anyone with experience could make suggestions.

 Thanks.

Re: How to read an external pdf file from postgres?

From
Peter Eisentraut
Date:
On 12.01.22 12:16, Amine Tengilimoglu wrote:
>       I want to read an external pdf file from postgres. pdf file will 
> exist on the disk. postgres only know the disk full path as metadata. Is 
> there any software or extension that can be used for this? Or do we have 
> to develop software for it?  Or what is the best approach for this? I'd 
> appreciate it if anyone with experience could make suggestions.

You could write a function in PL/Perl or PL/Python to open and read the 
file and process the PDF data, using some third-party module that surely 
exists somewhere.
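
As a rough illustration of the "open and read the file" part, here is a minimal sketch in plain Python. The function name read_pdf_bytes and the strict header check are my own assumptions, not anything from this thread. In PL/Python the same body would sit inside a CREATE FUNCTION ... LANGUAGE plpython3u definition, which is an untrusted language, so a superuser has to create the function and the server's OS user needs read access to the path. Actual text extraction would still need a third-party PDF module, which is not shown here.

```python
from pathlib import Path

# Every PDF file is expected to begin with this marker, e.g. b"%PDF-1.7".
PDF_MAGIC = b"%PDF-"


def read_pdf_bytes(path: str) -> bytes:
    """Return the raw bytes of the file at `path`.

    Raises ValueError if the file does not start with a PDF header.
    This is a deliberately strict check (hypothetical helper): some PDF
    readers tolerate junk before the header, this sketch does not.
    """
    data = Path(path).read_bytes()
    if not data.startswith(PDF_MAGIC):
        raise ValueError(f"{path} does not start with a PDF header")
    return data
```

The returned bytes could then be handed to whatever PDF library is installed for the server's Python, or returned to SQL as bytea.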



Re: How to read an external pdf file from postgres?

From
Дмитрий Иванов
Date:
What are you going to do with the data?
If you want to analyze it in some way, I can't think of a better option than a Python function. Or do you just want to transfer the files? There are options for that too, but in this case I like Python better.
--
Regards, Dmitry!


Wed, 12 Jan 2022 at 16:16, Amine Tengilimoglu <aminetengilimoglu@gmail.com>:
>   Hi;
>
>      I want to read an external pdf file from postgres. pdf file will exist on the disk. postgres only know the disk full path as metadata. Is there any software or extension that can be used for this? Or do we have to develop software for it?  Or what is the best approach for this? I'd appreciate it if anyone with experience could make suggestions.
>
>  Thanks.

Re: How to read an external pdf file from postgres?

From
Ian Lawrence Barwick
Date:
Wed, 12 Jan 2022 at 20:16, Amine Tengilimoglu <aminetengilimoglu@gmail.com>:
>
>   Hi;
>
>      I want to read an external pdf file from postgres. pdf file will exist on the disk. postgres only know the disk full path as metadata. Is there any software or extension that can be used for this? Or do we have to develop software for it?  Or what is the best approach for this? I'd appreciate it if anyone with experience could make suggestions.

By "read" do you mean "open the file and extract meaningful data from it"? If
so, speaking from prior experience, don't. And if you really have to, make sure
the source PDF is guaranteed to be in a well-defined, predictable format
enforceable by contract law and/or people with sharp pointy sticks. I have
successfully suppressed the memories of whatever it is I once had to do with
reading data from PDFs, but though the data was eventually imported into
PostgreSQL, there was a lot of mangling probably involving a Perl module (other
languages are probably available) before it got anywhere near the database.
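
To make the "well-defined, predictable format" point concrete, here is a small sketch of the kind of strict check that could run outside the database, on text already extracted from a PDF, before anything is loaded. The line layout (an invoice number, an ISO date and an amount) and the names LINE_RE and parse_extracted_lines are purely hypothetical; the real format would be whatever the PDF producer has committed to.

```python
import re

# Hypothetical agreed line format, e.g. "INV-00042 2022-01-12 199.90".
LINE_RE = re.compile(r"^(INV-\d{5}) (\d{4}-\d{2}-\d{2}) (\d+\.\d{2})$")


def parse_extracted_lines(text: str) -> list[tuple[str, str, str]]:
    """Parse text extracted from a PDF into (invoice, date, amount) rows.

    Blank lines are skipped. Any other line that deviates from the
    expected format raises ValueError instead of being silently guessed at.
    """
    rows = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(
                f"line {number} does not match the expected format: {line!r}"
            )
        rows.append(match.groups())
    return rows
```

Failing loudly on the first unexpected line is a design choice in this sketch: it keeps malformed input out of the database instead of importing a best guess.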


Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com