Re: Performance issues - Mailing list pgsql-general

From: Eric Lee Green
Subject: Re: Performance issues
Msg-id: 200203182040.00901@badtux.org
In response to: Re: Performance issues (Doug McNaught <doug@wireboard.com>)
List: pgsql-general
On Monday 18 March 2002 09:11 pm, Doug McNaught wrote:
> Eric Lee Green <eric@badtux.org> writes:
> > PostGreSQL 7.2, Red Hat Linux 6.2:
> >
> > I am having a very strange performance issue. I am using
> > postgresql-ruby, but I don't think this is a postgresql-ruby issue.
> > If I issue 1,000 inserts in a row as part of a transaction, I can
> > get 900 inserts per second. If I mix the inserts with a select in
> > between on the same row, I can insert at only 7 inserts per second.
>
> Hmm, my guess is that SELECT queries that return a null result set
> are what's slowing things down--they have to scan the whole table.

explain select * from migfiles where source = 'foo' and mtime = 5;

says it's using the index. Seat of the pants verifies this:

select * from migfiles where source = 'foo' and mtime = 5;

returns immediately (there is no file named 'foo' in the database), just as

select * from migfiles where source = 'bar' and mtime = 5;

returns immediately (there is a file named 'bar' in the database).

select * from migfiles where mtime = -1;

takes about 5 seconds to tell me there's no record (200,000 records, slow IDE
drive on a laptop), showing what scanning the whole database would cost (there
is no index on mtime). Same thing with

select count(*) from migfiles;

which takes about 5 seconds to tell me how many records are in the database.

> Is there any reason you have to do this?

Yeah, this is a two-phase commit protocol, and if I created a unique index on
(source, mtime) and then did the insert, I'd have to add an extra field for
'state' and then come back and do an update to "commit" the (non-SQL)
transaction. Right now I'm using the database record to signal the "commit",
and recording the initial phase of the (non-SQL) transaction elsewhere.
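For contrast, the unique-index variant I'm describing above would look
something like the following sketch. The migfiles table and the (source, mtime)
columns are from this thread; the 'state' column is the hypothetical extra
field I'd have to add:

```sql
-- Hypothetical sketch: enforce uniqueness up front and track the
-- non-SQL transaction's phase in an extra 'state' column.
CREATE UNIQUE INDEX migfiles_source_mtime ON migfiles (source, mtime);

-- Phase 1: record the pending work; a duplicate key error here
-- replaces the SELECT-before-INSERT check.
INSERT INTO migfiles (source, mtime, state) VALUES ('bar', 5, 'pending');

-- Phase 2: come back later and "commit" the non-SQL transaction.
UPDATE migfiles SET state = 'done' WHERE source = 'bar' AND mtime = 5;
```

The cost is the extra UPDATE per file, which is exactly the round trip I'm
trying to avoid by using the record's existence as the commit marker.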
In addition, if I recall right, an error aborts a (SQL) transaction, which I
decidedly do NOT want to have happen, because there is an enormous performance
gain from doing a bunch of inserts as a single transaction rather than as
separate individual transactions.

> You might try timing 1000 null-returning SELECTs against a populated
> table and see how long they take, just to see if my hypothesis is
> correct.

Near instantaneous.

> The INSERT-without-SELECT goes fast because PG just appends to the
> table without having to scan it.

Not quite. There's the index to bother with.

> If you do this, or even better, use
> COPY rather than INSERT for bulk loading, it'll go fast.

I'm thinking of doing this. But this will mean that I must have a more
complicated recovery mechanism. At the moment I have a single (non-SQL)
transaction outstanding. If I go to bulk loading, I could have a thousand
transactions outstanding. What currently consists of a single stat() call
would turn into a complicated loop and transaction-log walk.

> What usage patterns is this app going to have? If "record not there"
> is the common case, try putting a UNIQUE INDEX on 'source' and just
> catch INSERT errors when they happen.

The typical usage pattern is that on a daily run, 80% of the entries will
match the 'select' statement and be skipped; the other 20% will match no
record in the database, will have a transaction run against them, and will
then be inserted into the database as the final 'commit' phase of the
transaction. Typically they will be bunched together -- a thousand files that
match the 'select', a hundred files that don't and must be processed, etc.

My boss says we can't switch from MySQL unless I can make the beast insert at
least 25 records per second into the database as part of the transactional
process (we get more than that from MySQL, though MySQL on Linux has some
decided scaling problems as you approach 10,000,000 records that make me want
to switch).
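If I did go the bulk-loading route, the COPY step itself is simple enough; the
sketch below assumes a tab-delimited batch file staged on the server (the file
path is made up, and the column order would have to match the table):

```sql
-- Hypothetical sketch: bulk-load a batch of pre-screened rows in one shot.
-- COPY reads a server-side file; each line is one row, tab-delimited.
COPY migfiles FROM '/tmp/migfiles.batch';
```

All the complexity is in what happens around it: building the batch file,
and walking the transaction log on recovery if the load dies partway through.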
I will create some more test programs tomorrow and post them if I can
duplicate the problem in a way that won't cause Legal to have a heart attack
:-}. Geez, I wish it was possible to make a living in the Open Source world
nowadays; this business of NDAs and such sucks.

--
Eric Lee Green  GnuPG public key at http://badtux.org/eric/eric.gpg
mailto:eric@badtux.org  Web: http://www.badtux.org/eric
Join the Conspiracy: http://evidence-eliminator-sucks.com