Thread: How to insert data from a text file

How to insert data from a text file

From
Mike
Date:
Background:

Thousands of folders and files filled with client data on the linux
samba server.
Searching for data is difficult due to ever-changing directory
structures and naming schemes.

Example (exaggerating to make a point):

/air/water/vegetable/animal/broccoli/potato/crabby_client.doc
/animal/broccoli/air/vegetable/poise_and_honor.pdf

It goes on for about 650,000 files

I used the following command to gather all file and directory names.
"/abc" is the main directory on the linux samba server where all
company data is located:

root@acme:/# ls -Ralh /abc > /home/mike/file_output.txt

file_output.txt is 33 megs. and lists approximately 650,000 file names
and their directory paths.

I want to pull all the file name info. and directory path info. from
"file_output.txt" and place each file name into a postgresql database
along with the related file location information.
The ultimate goal is to be able to make an alpha-numeric query and be
presented with matching file names and their location on the samba
server.

How do I pull the data from "file_output.txt" and place it into the
postgres database?

Any guidance and pointed RTFM declarations greatly appreciated.
Thanks,
Mike

Re: How to insert data from a text file

From
Sean Davis
Date:


On Thu, Jun 18, 2009 at 1:51 PM, Mike <1100100@gmail.com> wrote:
[...]
> How do I pull the data from "file_output.txt" and place it into the
> postgres database?

With 650k lines, you can use python, perl, java, etc. to insert the records; pick your language of choice.  Alternatively, you can use the psql client and its \copy command.
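As a rough illustration of the scripting route, here is a minimal Python sketch that turns "ls -Ralh"-style output into (directory, file name) rows and writes them as CSV for \copy to load. It assumes the usual long-listing layout (mode, links, owner, group, size, month, day, time or year, name); other locales or ls options may produce a different date layout and would need a different split. The function names, the sample listing, and the "files" table mentioned afterwards are made up for the example.

```python
import csv
import io

# Made-up sample in the assumed `ls -Ralh` layout.
SAMPLE = """\
/abc:
total 8.0K
drwxr-xr-x 3 root root 4.0K Jun 18 13:40 .
drwxr-xr-x 3 root root 4.0K Jun 18 13:40 ..
drwxr-xr-x 2 root root 4.0K Jun 18 13:40 animal

/abc/animal:
total 12K
drwxr-xr-x 2 root root 4.0K Jun 18 13:40 .
drwxr-xr-x 3 root root 4.0K Jun 18 13:40 ..
-rw-r--r-- 1 mike users 1.2K Jun 18 13:39 crabby client.doc
-rw-r--r-- 1 mike users  88K Jan  5  2008 poise_and_honor.pdf
"""


def parse_ls_recursive(lines):
    """Yield (directory, filename) pairs from `ls -Ralh`-style lines."""
    current_dir = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("total "):
            continue
        # Section headers look like "/abc/some/dir:".
        if line.startswith("/") and line.endswith(":"):
            current_dir = line[:-1]
            continue
        # Assumed columns: mode, links, owner, group, size,
        # month, day, time-or-year, name (the name may contain spaces).
        parts = line.split(None, 8)
        if len(parts) < 9 or current_dir is None:
            continue
        mode, name = parts[0], parts[8]
        if mode.startswith("d"):
            continue  # skip directories, including "." and ".."
        if mode.startswith("l"):
            name = name.split(" -> ", 1)[0]  # symlinks read "name -> target"
        yield current_dir, name


def rows_to_csv(rows):
    """Render rows as CSV text with plain "\\n" line endings."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows = list(parse_ls_recursive(SAMPLE.splitlines()))
csv_text = rows_to_csv(rows)
```

With the CSV saved to a file, something along the lines of "\copy files (directory, filename) from 'files.csv' with csv" in psql should load it, assuming a table named "files" with those two text columns already exists.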

Just an aside, but on linux, have you looked into using either the find command or locate, or even google desktop?

Sean

Re: How to insert data from a text file

From
Mike
Date:
On Thu, Jun 18, 2009 at 2:04 PM, Sean Davis<sdavis2@mail.nih.gov> wrote:
> With 650k lines, you can use python, perl, java, etc. to insert the records;
> pick your language of choice.  Alternatively, you can use the psql client and
> its \copy command.

Sean,
Thanks for the quick response.
I'm barely literate with bash so there's probably quite an uphill
curve for me in python and perl.
Psql sounds very promising, though.  I think I'll start there.

> Just an aside, but on linux, have you looked into using either the find
> command or locate, or even google desktop?

The ultimate goal is to make the database available to the users.
So I'll give them a php page with a "search" field on an apache
webserver, and they can type in nouns and proper names to find
documents, etc. First, I need to crawl, then walk later.  <g>
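For the search step, a minimal sketch of how a user's term could be turned into a case-insensitive pattern match is shown here in Python rather than PHP, purely for illustration. The table "files" and its columns "directory" and "filename" are hypothetical, and the "%s" placeholder assumes a driver that uses that parameter style (psycopg2 does). The helper escapes the LIKE wildcards so that a term such as "poise_and" matches a literal underscore.

```python
# Hypothetical table and column names; "%s" is the driver's parameter marker.
SEARCH_SQL = (
    "SELECT directory, filename FROM files "
    "WHERE filename ILIKE %s ORDER BY directory, filename"
)


def like_pattern(term):
    """Wrap a search term in %...% and escape LIKE wildcards.

    Backslash is PostgreSQL's default LIKE escape character, so it is
    escaped first, followed by the "%" and "_" wildcards.
    """
    escaped = (
        term.replace("\\", "\\\\")
            .replace("%", "\\%")
            .replace("_", "\\_")
    )
    return "%" + escaped + "%"


# The pattern is passed as a query parameter, never pasted into the SQL text:
#   cursor.execute(SEARCH_SQL, (like_pattern(user_input),))
```

Passing the pattern as a parameter keeps user input out of the SQL string itself, which matters once the search box is exposed on a web page.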

Best regards,
Mike

Re: How to insert data from a text file

From
damien clochard
Date:
Mike wrote:
[...]
> How do I pull the data from "file_output.txt" and place it into the
> postgres database?
>

pgloader might help you:

http://pgfoundry.org/projects/pgloader/




Re: How to insert data from a text file

From
Michael Wood
Date:
2009/7/20 damien clochard <damien@dalibo.info>:
> Mike wrote:
[...]
>> I used the following command to gather all file and directory names.
>> "/abc" is the main directory on the linux samba server where all
>> company data is located:
>>
>> root@acme:/# ls -Ralh /abc > /home/mike/file_output.txt

If you just want the file paths, using "find" might make things easier
to deal with:

# find /abc -type f -print >/home/mike/file_output.txt

>> file_output.txt is 33 megs. and lists approximately 650,000 file names
>> and their directory paths.
>>
>> I want to pull all the file name info. and directory path info. from
>> "file_output.txt" and place each file name into a postgresql database
>> along with the related file location information.
>> The ultimate goal is to be able to make an alpha-numeric query and be
>> presented with matching file names and their location on the samba
>> server.
>>
>> How do I pull the data from "file_output.txt" and place it into the
>> postgres database?

If you use "find" as mentioned above, each line is a full path, so you can
get the directory and the file name by splitting on the last "/".
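A minimal Python sketch of that split, assuming the input has one absolute path per line as "find /abc -type f -print" produces (file names containing newlines would break this). The function names are made up for the example, and the output is CSV with the directory in the first column and the file name in the second.

```python
import csv
import posixpath


def split_paths(lines):
    """Yield (directory, filename) for each non-empty path line."""
    for line in lines:
        path = line.rstrip("\n")
        if path:
            # posixpath.split cuts at the last "/":
            # "/abc/animal/x.doc" -> ("/abc/animal", "x.doc")
            yield posixpath.split(path)


def write_csv(lines, outfile):
    """Write one "directory,filename" CSV row per input path."""
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerows(split_paths(lines))


# Example use with the files from this thread:
#   with open("/home/mike/file_output.txt") as src, \
#        open("/home/mike/files.csv", "w", newline="") as dst:
#       write_csv(src, dst)
```

The csv module quotes any name that contains a comma or a double quote, so unusual file names should still come through as a single column.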

--
Michael Wood <esiotrot@gmail.com>

Re: How to insert data from a text file

From
Mike
Date:
On Mon, Jul 20, 2009 at 3:27 PM, Michael Wood<esiotrot@gmail.com> wrote:
>
> If you just want the file paths, using "find" might make things easier
> to deal with:
>
> # find /abc -type f -print >/home/mike/file_output.txt
>
>
> If you use "find" as mentioned above, each line is a full path, so you can
> get the directory and the file name by splitting on the last "/".
>

Thanks Mr. Wood, this is what I've been focusing on for use with
different tables.
More trial and error needed on my part.
Thanks for your help.