Re: New to the list; would this be an okay question? - Mailing list pgsql-general

From Madison Kelly
Subject Re: New to the list; would this be an okay question?
Date
Msg-id 40D70026.1020506@alteeve.com
Whole thread Raw
In response to Re: New to the list; would this be an okay question?  (Richard Huxton <dev@archonet.com>)
Responses Re: New to the list; would this be an okay question?  (Richard Huxton <dev@archonet.com>)
List pgsql-general
Richard Huxton wrote:
> Madison Kelly wrote:
>
>> Hi all,
>>
>>   I am new to the list and I didn't want to seem rude at all so I
>> wanted to ask if this was okay first.
>
>
> No problem. Reading your message below, you might want to try the
> performance list, but general is a good place to start.
>
>>   I have a program I have written in perl which uses a postgresSQL
>> database as the backend. The program works but the performance is
>> really bad. I have been reading as much as I can on optimizing
>> performance but still it isn't very reasonable. At one point I had my
>> program able to process 175,000 records in 16min 10sec on a Pentium3
>> 650MHz, 448MB RAM test machine. Since then I got a Pentium3 1GHz,
>> 512MB system and I have tried a lot of things to get the performance
>> up but now it is substantially slower and I can't seem to figure out
>> what I am doing wrong.
>
>
> A few places to start:
> 1. VACUUM FULL
>    This will make sure any unused space is reclaimed
> 2. ANALYZE
>    This will recalculate stats for the tables
> 3. Basic performce tuning:
>    http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
>    There's also a good guide to the postgresql.conf file on varlena.com
>
>>   Would it be appropriate to ask for help on my program on this list?
>> Full disclosure: The program won't be initially GPL'ed because it is
>> for my company but it will be released for free to home users and the
>> source code will be made available (similar to other split-license
>> programs) though once my company makes it's money back I think they
>> will fully GPL it (I am on my boss's case about it :p ).
>
>
> No problem - what you licence your software under is your concern. Once
> you've taken the basic steps described above, try to pick out a specific
> query that you think is too slow and provide:
>
> 1. PostgreSQL version
> 2. Basic hardware info (as you have)
> 3. Sizes of tables.
> 4. Output of EXPLAIN ANALYZE <query here>
>
> The EXPLAIN ANALYZE runs the query and shows how much work PG thought it
> would be and how much it actually turned out to be.
>
> HTH

    Thank you very much!! I am using Psql 7.4 on a stock install of Fedora
Core 2 on my IBM thinkpad a22m (P3 1GHz, 512MB RAM, not the fastest
HDD). The drive carrier I am using is connected via USB2 and uses a few
different hard drives with the fastest being a couple of Barracuda
7200.7 drives (2MB cache, 7,200rpm). I described the program in my reply
to Martijn so here is some of the code (code not related to psql
snipped, let me know if posting it would help - sorry for the wrapping...):

  =-[ Calling the database ]-=
# Open the connection to the database
my $DB = DBI->connect("DBI:Pg:dbname=$db_name","$user")|| die("Connect
error (Is PostgresSQL running?): $DBI::errstr");

# Prepare the select statements before using them for speed:
$select_sth = $DB->prepare("SELECT null FROM file_dir WHERE
file_src_uuid=? AND file_parent_dir=? AND file_name=?") || die
"$DBI::errstr";
$select_up = $DB->prepare("UPDATE file_dir SET file_perm=?,
file_own_user=?, file_own_grp=?, file_size=?, file_mod_date=?,
file_mod_time=?, file_mod_time_zone=?, file_exist=? WHERE
file_src_uuid=? AND file_parent_dir=? AND file_name=?") || die
"$DBI::errstr";
$select_in = $DB->prepare("INSERT INTO file_dir ( file_src_uuid,
file_name, file_dir, file_parent_dir, file_perm, file_own_user,
file_own_grp, file_size, file_mod_date, file_mod_time,
file_mod_time_zone, file_backup, file_restore, file_display, file_exist
) VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )") || die
"$DBI::errstr";

# Set the 'file_exist' flag to 'false' and reset exiting files to 'true'.
$DB->do("UPDATE file_dir SET file_exist='f' WHERE
file_src_uuid='$file_src_uuid'") || die "$DBI::errstr";

# Start scanning the drive
$num = $select_sth->execute($file_src_uuid,$relative_dir,$file_name) ||
die "$DBI::errstr";
if ( $num > 0 )
{

$select_up->execute($file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_exist,$file_src_uuid,$file_parent_dir,$file_name)

|| die "$DBI::errstr";
}
else
{


$select_in->execute($file_src_uuid,$file_name,$file_dir,$file_parent_dir,$file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_backup,$file_restore,$file_display,$file_exist)

|| die "$DBI::errstr";
}

# We need to grab the existing file settings for the special file '/.'
$DBreq=$DB->prepare("SELECT file_backup, file_restore, file_display FROM
file_dir WHERE file_parent_dir='/' AND file_name='.' AND
file_src_uuid='$file_src_uuid'") || die $DBI::errstr;
$file_backup=$DBreq->execute();
@file_backup=$DBreq->fetchrow_array();
$file_backup=@file_backup[0];
$file_restore=@file_backup[1];
$file_display=@file_backup[2];

# Jump into the re-entrant subroutine to scan directories and sub-dirs
&list_files($real_dir, $exclude_list_num, $relative_dir, $file_backup,
$file_restore, $file_display);

# Inside the sub routine

# Does the directory/file/symlink already exist? (there are three of
these for each file type)
$num = $select_sth->execute($file_src_uuid,$relative_dir,$file_name) ||
die "$DBI::errstr";
if ( $num > 0 )
{


$select_up->execute($file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_exist,$file_src_uuid,$file_parent_dir,$file_name)

|| die "$DBI::errstr";
}
else
{
     # The file did not exist so we will use the passed parent settings
for the 'file_backup' flag and leave the 'file_display' flag set to 'f'


$select_in->execute($file_src_uuid,$file_name,$file_dir,$file_parent_dir,$file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_backup,$file_restore,$file_display,$file_exist)

|| die "$DBI::errstr";
}

# If this was a file I would loop and process the next file in the
directory, if it was a directory itself I would now re-enter the
subroutine to process it's contents and when I fell back I would pick up
where I left off

# Returning from the final subroutine and finishing up

$DB->do("VACUUM ANALYZE");

  =-[ finished DB related source code ]-=

Here is the schema for the 'file_dir' table which I hit repeatedly here:

  =-[ file_dir table and index schemas ]-=

CREATE TABLE file_dir (                            -- Used to store info
on every file on source partitions
     file_id            serial        unique,                -- make
this 'bigserial' if there may be more than 2 billion files in the database
     file_src_uuid        varchar(40)    not null,            -- the
UUID of the source partition hosting the original file
     file_org_uuid        varchar(40),                    -- the UUID
that the file came from (when the file was moved by TLE-BU)
     file_name        varchar(255)    not null,            -- Name of
the file or directory
     file_dir        bool        not null,            -- t = is
directory, f = file
     file_parent_dir        varchar(255)    not null,            -- if
directory '/foo/bar', parent is '/foo', if file '/foo/bar/file', parent
is '/foo/bar'. The mount directory is treated as '/' so any directories
below it will be ignored for this record.
     file_perm        varchar(10)    not null,            -- file or
directory permissions
     file_own_user        varchar(255)    not null,            -- The
file's owning user (by name, not UID!!)
     file_own_grp        varchar(255)    not null,            -- The
file's owning group (by name, not GID!!)
     file_size        bigint        not null,            -- File size in
bytes
     file_mod_date        varchar(12)    not null,            -- File's
last edited date
     file_mod_time        varchar(20)    not null,            -- File's
last edited time
     file_mod_time_zone    varchar(6)    not null,            -- File's
last edited time zone
     file_backup        boolean        not null    default 'f',    --
't' = Include in backup jobs, 'f' = Do not include in backup jobs
     file_restore        boolean        not null    default 'f',    --
't' = Include in restore jobs, 'f' = Do not include in restore jobs
     file_display        boolean        not null    default 'f',    --
't' = display, 'f' = hide
     file_exist        boolean        default 't'            -- Used to
catch files that have been deleted since the last scan. Before rescan,
all files in a given src_uuid are set to 0 (deleted) and then as each
file is found or updated it is reset back to 1 (exists) and anything
left with a value of '0' at the end of the scan is deleted and we will
remove their record.
);

-- CREATE INDEX file_dir_idx ON file_dir
(file_src_uuid,file_name,file_parent_dir);

  =-[ Finish file_dir table and index schemas ]-=

Thanks so much!!

Madison


pgsql-general by date:

Previous
From: Madison Kelly
Date:
Subject: Re: New to the list; would this be an okay question?
Next
From: "Sailer, Denis (YBUSA-CDR)"
Date:
Subject: JDBC driver inserting into a table with Bytea type gets out of me mory error.