Alvaro Herrera wrote:
> On Tue, Jun 22, 2004 at 03:52:03PM -0400, Madison Kelly wrote:
>
>
>> What is happening now is that the program does an 'ls' (system call)
>> to get a list of the files and directories starting at the root of a
>> mounted partition. These are read into an array which Perl then
>> processes one at a time. The 'ls' value is searched for in the database
>> and, if it doesn't exist, the values are inserted. If they do exist, they
>> are updated (at 1/10th the speed). If the file is in fact a directory,
>> Perl jumps into it, again reads its contents into another array and
>> processes them one at a time. It will do this until all files or
>> directories on the partition have been processed.
>
>
> So you read the entire filesystem again and again? Sounds like a
> horrible idea to me. Have you tried using the mtimes, etc?
I haven't heard of 'mtimes' before; I'll google it now.
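From a quick look, the idea seems to be to compare each file's stat() mtime
against the value recorded on the previous scan and only touch the database
when something has actually changed. Something roughly like this, using
File::Find in place of the 'ls' system calls (the mount point and the
%last_seen hash standing in for the stored values are just placeholders):

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Placeholder mount point; in the real program this would be the
# root of the mounted partition being scanned.
my $mount_point = shift @ARGV || '/mnt/data';

# %last_seen stands in for whatever the database already knows
# about each path (path => mtime from the previous scan).
my %last_seen;

find(sub {
    my $path  = $File::Find::name;
    my $mtime = (stat($_))[9];          # element 9 of stat() is mtime
    return unless defined $mtime;

    # Skip files that have not changed since the last scan; only
    # new or modified entries would need an INSERT or UPDATE.
    my $old = $last_seen{$path};
    if (!defined $old || $old < $mtime) {
        print "needs insert/update: $path (mtime $mtime)\n";
    }
}, $mount_point);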
>> My previous question was performance-based; now I just need to get
>> the darn thing working again. Like I said, after ~300 seconds Perl dies.
>
>
> Out of memory? If you save your whole filesystem in a Perl array you
> are going to consume a lot of memory. This is, of course, not Postgres
> related, so I'm not sure why you are asking here.
Running just the Perl portion, which reads and parses the file system,
works fine and fast. It isn't until I make the DB calls that everything
breaks. I know that the DB will slow things down, but the amount of
performance loss I am seeing and the outright breaking of the program
can't be reasonable.
Besides, PostgreSQL should be able to handle 250,000 SELECTs, each
followed by an UPDATE or INSERT, on an AMD Athlon 1700+ with 512MB RAM,
shouldn't it? And the program is dying after 5 minutes even though the
calls are being committed automatically, so the work being done
shouldn't be filling any memory, should it?
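For context, the loop is roughly equivalent to the following DBI sketch,
with prepared statements and one explicit transaction instead of a commit
per row (the database, table and column names here are made up):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical database and schema, just to show the access pattern.
my $dbh = DBI->connect('dbi:Pg:dbname=filedb', 'user', 'password',
                       { AutoCommit => 0, RaiseError => 1 });

my $sel = $dbh->prepare('SELECT file_size FROM file_info WHERE file_path = ?');
my $upd = $dbh->prepare('UPDATE file_info SET file_size = ? WHERE file_path = ?');
my $ins = $dbh->prepare('INSERT INTO file_info (file_path, file_size) VALUES (?, ?)');

# @files would come from the filesystem walk; two dummy rows here.
my @files = ( [ '/etc/hosts', 412 ], [ '/etc/passwd', 1833 ] );

for my $f (@files) {
    my ($path, $size) = @$f;
    $sel->execute($path);
    if ($sel->fetchrow_arrayref) {
        $upd->execute($size, $path);    # row exists: update it
    } else {
        $ins->execute($path, $size);    # row missing: insert it
    }
}

$dbh->commit;                           # one commit for the whole batch
$dbh->disconnect;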
Madison