Re: New to the list; would this be an okay question? - Mailing list pgsql-general
From | Madison Kelly |
---|---|
Subject | Re: New to the list; would this be an okay question? |
Date | |
Msg-id | 40D70001.1070903@alteeve.com Whole thread Raw |
In response to | Re: New to the list; would this be an okay question? (Martijn van Oosterhout <kleptog@svana.org>) |
List | pgsql-general |
Sorry; I didn't include details at first because I wanted to make sure that was an approprate request for this list. I have the program run 'VACUUM ANALYZE' after every major update/insert job and I have in fact indexed the three columns that I search through when I need to decide to update (if the record exists) or insert (if it does not). I have read about the EXPLAIN option/tool but I haven't been able to get my head around how to properly use it yet. Here is what I am trying to do: The program is a Linux backup program that uses a web front-end as it's interface (so that a client can access it from any system, inc. MS workstations). The program is also designed to allow the user to search for a given file or file by spec (file size, date modified, etc) on media that is offline. The program is designed with externally connected USB2 and firewire drives so it is all partition-based. In order to make the web front-end stateful and to allow for the ability to search I needed to keep in the database detailed information on every file and directory on a given partition. Some of the information also needs to be maintained so I can't just clear and rescan a partition. For example, I need to keep track of what directories and files a user has selected or not selected to be backed up in a given partition. This means that whenever I need to update the contents of a partition I need to run 'ls' starting at the mount point for the partition and scanning down through all sub directories. As each file is scanned I check the database to see if the file name I am looking at already exists in the database. I do this by searching for the file_name, File_parent_dir (parent directory) and file_src_uuid (the UUID [serial number] of the partition the file is on). If there is a match I run an "UPDATE" where the backup state is not touched. If the file is new then the file is added to the database along with all of it's particular information such as owning user, group, permissions, filesize and so on. Given that some file systems have 250,000 files and directories I need to make sure that the database calls are as optimized as I can make them. I have verified that the lag is in the database by commenting out the actual database calls and letting the program traverse the file system. In that case a job that with the database calls in place and took nearly 200 seconds finishes in roughly 2 seconds. If this is an okay request I would be happy to post the schema I am using and the perl code I am using to make the DB calls. Thanks!! Madison Kelly Martijn van Oosterhout wrote: > Standard questions: > - Have you VACUUMed? > - Have you VACUUM ANALYZEd? > - Have you done EXPLAIN ANALYZE on the complex queries? > - Have you put INDEXes on the appropriate columns. > > You need to give more details is you want more detailed answers. > > On Mon, Jun 21, 2004 at 09:38:14AM -0400, Madison Kelly wrote: > >>Hi all, >> >> I am new to the list and I didn't want to seem rude at all so I >>wanted to ask if this was okay first. >> >> I have a program I have written in perl which uses a postgresSQL >>database as the backend. The program works but the performance is really >>bad. I have been reading as much as I can on optimizing performance but >>still it isn't very reasonable. At one point I had my program able to >>process 175,000 records in 16min 10sec on a Pentium3 650MHz, 448MB RAM >>test machine. Since then I got a Pentium3 1GHz, 512MB system and I have >>tried a lot of things to get the performance up but now it is >>substantially slower and I can't seem to figure out what I am doing wrong. >> >> Would it be appropriate to ask for help on my program on this list? >>Full disclosure: The program won't be initially GPL'ed because it is for >>my company but it will be released for free to home users and the source >>code will be made available (similar to other split-license programs) >>though once my company makes it's money back I think they will fully GPL >>it (I am on my boss's case about it :p ). >> >> Thanks all! >> >>Madison Kelly >> >>---------------------------(end of broadcast)--------------------------- >>TIP 9: the planner will ignore your desire to choose an index scan if your >> joining column's datatypes do not match > >
pgsql-general by date: