Re: New to the list; would this be an okay question? - Mailing list pgsql-general

From Madison Kelly
Subject Re: New to the list; would this be an okay question?
Date
Msg-id 40D70001.1070903@alteeve.com
Whole thread Raw
In response to Re: New to the list; would this be an okay question?  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-general
Sorry; I didn't include details at first because I wanted to make sure
that was an approprate request for this list.

I have the program run 'VACUUM ANALYZE' after every major update/insert
job and I have in fact indexed the three columns that I search through
when I need to decide to update (if the record exists) or insert (if it
does not). I have read about the EXPLAIN option/tool but I haven't been
able to get my head around how to properly use it yet.

Here is what I am trying to do:

The program is a Linux backup program that uses a web front-end as it's
interface (so that a client can access it from any system, inc. MS
workstations). The program is also designed to allow the user to search
for a given file or file by spec (file size, date modified, etc) on
media that is offline. The program is designed with externally connected
USB2 and firewire drives so it is all partition-based.

In order to make the web front-end stateful and to allow for the ability
to search I needed to keep in the database detailed information on every
file and directory on a given partition. Some of the information also
needs to be maintained so I can't just clear and rescan a partition. For
example, I need to keep track of what directories and files a user has
selected or not selected to be backed up in a given partition. This
means that whenever I need to update the contents of a partition I need
to run 'ls' starting at the mount point for the partition and scanning
down through all sub directories.

As each file is scanned I check the database to see if the file name I
am looking at already exists in the database. I do this by searching for
the file_name, File_parent_dir (parent directory) and file_src_uuid (the
UUID [serial number] of the partition the file is on). If there is a
match I run an "UPDATE" where the backup state is not touched. If the
file is new then the file is added to the database along with all of
it's particular information such as owning user, group, permissions,
filesize and so on.

Given that some file systems have 250,000 files and directories I need
to make sure that the database calls are as optimized as I can make
them. I have verified that the lag is in the database by commenting out
the actual database calls and letting the program traverse the file
system. In that case a job that with the database calls in place and
took nearly 200 seconds finishes in roughly 2 seconds.

If this is an okay request I would be happy to post the schema I am
using and the perl code I am using to make the DB calls.

Thanks!!

Madison Kelly

Martijn van Oosterhout wrote:
> Standard questions:
> - Have you VACUUMed?
> - Have you VACUUM ANALYZEd?
> - Have you done EXPLAIN ANALYZE on the complex queries?
> - Have you put INDEXes on the appropriate columns.
>
> You need to give more details is you want more detailed answers.
>
> On Mon, Jun 21, 2004 at 09:38:14AM -0400, Madison Kelly wrote:
>
>>Hi all,
>>
>>  I am new to the list and I didn't want to seem rude at all so I
>>wanted to ask if this was okay first.
>>
>>  I have a program I have written in perl which uses a postgresSQL
>>database as the backend. The program works but the performance is really
>>bad. I have been reading as much as I can on optimizing performance but
>>still it isn't very reasonable. At one point I had my program able to
>>process 175,000 records in 16min 10sec on a Pentium3 650MHz, 448MB RAM
>>test machine. Since then I got a Pentium3 1GHz, 512MB system and I have
>>tried a lot of things to get the performance up but now it is
>>substantially slower and I can't seem to figure out what I am doing wrong.
>>
>>  Would it be appropriate to ask for help on my program on this list?
>>Full disclosure: The program won't be initially GPL'ed because it is for
>>my company but it will be released for free to home users and the source
>>code will be made available (similar to other split-license programs)
>>though once my company makes it's money back I think they will fully GPL
>>it (I am on my boss's case about it :p ).
>>
>>  Thanks all!
>>
>>Madison Kelly
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 9: the planner will ignore your desire to choose an index scan if your
>>     joining column's datatypes do not match
>
>


pgsql-general by date:

Previous
From: "Chris Ochs"
Date:
Subject: Re: Possible SET SESSION AUTHORIZATION bug
Next
From: Madison Kelly
Date:
Subject: Re: New to the list; would this be an okay question?