Re: Performance question - Mailing list pgsql-admin
From | Benjamin Krajmalnik |
---|---|
Subject | Re: Performance question |
Date | |
Msg-id | F4E6A2751A2823418A21D4A160B68988029D7A@fletch.stackdump.local Whole thread Raw |
In response to | Performance question ("Benjamin Krajmalnik" <kraj@illumen.com>) |
List | pgsql-admin |
Unfortunately, the application doing the logging is not written by me, so I have to play with the deck of cards which I am dealt. The developer stated that he uses connection pooling, but apparently the pg odbc driver does not support it, since I can see a new connection being made for each statement execution. I have asked him to see if he can create one connection and maintain it - we'll see. I have another scenario to try out, whereby instead of making a stored procedure call via ODBC I will insert the parameters of the SP call into a table via simple insert statements. If I see significant improvements (I am not holding my breath), I will use that route and run a background peocess on the server to issue the SP calls off of that recordset. > -----Original Message----- > From: Jim C. Nasby [mailto:decibel@decibel.org] > Sent: Monday, December 04, 2006 8:25 PM > To: Benjamin Krajmalnik > Cc: pgsql-admin@postgresql.org > Subject: Re: [ADMIN] Performance question > > Please cc the list so others can learn and help. > > Yes, if ODBC is tearing the connection down after every call, > performance *WILL* suck. Setup some kind of persistent connection. > Better yet, don't make lots of tiny calls to the database if > you can avoid it. > > On Mon, Dec 04, 2006 at 06:28:03PM -0700, Benjamin Krajmalnik wrote: > > Hi Jim, > > > > Apples to apples (as best as I can tell). > > The test procedure was as follows: > > > > I captured the SQL statements which call the stored procedures from > > our monitoring box to a file. > > I copied the file to the FreeBSD data server. > > Executed psql, and pumped the file to it through \i filename. > > Execution speed for the code in question was sub 10 seconds. > > Running the query directly via ODBC is taking about 60 ms. > > > > The numbers I just stated above are after I removed the > code which was > > accessing the one partitioned table where the constraint > exclusion was > > not taking place (more on that later). This code happened to be > > updating a daily aggregate - so each SP call was selecting the test > > record for the day and aggregating data. I ended up adding > a few more > > fields to a records which keeps some data in arrays for > quick access > > to generate our graphs, and now I run a scheduled task at midnight > > which takes the previous days aggregates and creates our snapshot > > records (which we use for historical trending of up to 24 months). > > This process now takes about 5 seconds, whereas running it > through the > > stored procedure it was adding almost 200 miliseconds to > the execution time). > > I assume this was caused by the fact that there was a > select - and the > > execution planner was not isolating to the single > partitioned table, > > followed by either an insert or another update. > > The new code is much more efficient and loads the db much > less, so it > > will be the preferred method. > > Right now the datra path between the monitoring appliacne > and the data > > server is 100Mbit. We are moving it to Gig tonight - so at least I > > hope I will realize some speed advantages. I have been > monitoring the > > switch, and the actual throughput is only 150kbit/sec, so I do not > > have a data throughput bpttlenecg, but rather I suspect > latencies in > > the ODBC connection. > > > > The item I found interesting is that the statement duration > was much > > longer. > > What I am thinking is that when running through psql, it is using a > > local pipe. > > When running through ODBC, each stored procedure call is creating a > > connection and then disconnecting, so it has added a huge > overhead. I > > think the duration is including the time it is taking to > > connect/authenticate disconnect/teardown. > > > > Thanks for the tip on the constraint exclusion. > > I need to go back and see how we are querying it for the > reports, and > > will adjust it accordingly. > > Does it make a difference if it is timestamp or timestamp without > > timezone? > > > > > > > -----Original Message----- > > > From: Jim Nasby [mailto:decibel@decibel.org] > > > Sent: Monday, December 04, 2006 5:53 PM > > > To: Benjamin Krajmalnik > > > Cc: pgsql-admin@postgresql.org > > > Subject: Re: [ADMIN] Performance question > > > > > > Are you sure you're doing an apples-apples comparison? Is > the load > > > on both machines the same, or does production have extra stuff > > > running? > > > Have you tried your original test again in the same setup without > > > ODBC to eliminate that possibility? (or redone your original test > > > with ODBC). > > > > > > As for the constraint exclusion, try changing the query so that > > > '2006-12-01' is cast to a timestamp (or change the constraints so > > > that you're constraining dates... the check constraints should > > > really match the type of the underlying table). > > > > > > On Dec 1, 2006, at 9:18 PM, Benjamin Krajmalnik wrote: > > > > > > > I am battling a performance issue and was wondering if > > > someone could > > > > help. PostgreSQL 8.1.5, FreeBSD. > > > > > > > > I have a very intense stored procedure which performs real time > > > > aggregation of data. > > > > > > > > I captured the stored procedure calls from a production > system and > > > > pumped them through psql, logging duration. > > > > The stored procedure ran at about 30ms per stored > procedure call - > > > > which > > > > was significantly faster than the previous production system. > > > > Based on > > > > these results, I was extremely optimistic and migrated > to the new > > > > server. > > > > > > > > Now that we turned up the new server, the stored procedure > > > is taking > > > > over 250ms. > > > > These are the execution times as recorded by enable > "all" logging. > > > > > > > > The only difference is that when I pumped the data in the test > > > > environment, I pumped it by reading the script via psql, > > > whereas the > > > > current system is calling the stored procedure via ODBC > through a > > > > 100Mbit ethernet connection. > > > > > > > > I cannot explain the difference between the 2 > scenarios. Does the > > > > duration include the overhead of the ODBC layer? > > > > > > > > Now, the second part that was troubling relates to the > execution > > > > planner. > > > > > > > > > > > > Two of the tables in which data is being entered are > partitioned > > > > tables. > > > > Data is being routed to the partition by using rules. Each > > > partition > > > > has a check constraint on a date range. Constraint > exclusion is on. > > > > > > > > The first partitioned table is very simple, and the stored > > > procedure > > > > simply inserts a record for each test which is executed. These > > > > are later used by a procedure running in the background which > > > calculates > > > > statistical data. > > > > > > > > My problem is with the second partitioned table. As part of my > > > > troubleshooting, I omitted the code from the stored procedure > > > > which accesses this data. In this table, I attempt to retrieve > > > > the daily record for a specific test. If it is not > found, after > > > > aggregating data I insert it. If the record already > exists, data > > > > is aggregated and the record is updated. > > > > > > > > The key is composed of a date field (monthdate) and an > int4 field > > > > (kstestsysid). > > > > The check constraints are of the form of: > > > > > > > > ALTER TABLE tblkssnapshot12 > > > > ADD CONSTRAINT tblkssnapshot12_chk CHECK (monthdate >= > > > '2006-12-01 > > > > 00:00:00'::timestamp without time zone AND monthdate < > '2007-01-01 > > > > 00:00:00'::timestamp without time zone); > > > > > > > > I ran a sample query which would be issued by the > stored procedure > > > > with the explain option. > > > > The query was: > > > > > > > > select monthdate, kstestssysid from tblkssnapshotdaily > > > where monthdate > > > > = '2006-12-01' and kstestssysid = 3143 > > > > > > > > The explain is as follows: > > > > > > > > select monthdate, kstestssysid from tblkssnapshotdaily > > > where monthdate > > > > = '2006-12-01' and kstestssysid = 3143 Result > (cost=0.00..122.49 > > > > rows=25 width=8) > > > > -> Append (cost=0.00..122.49 rows=25 width=8) > > > > -> Index Scan using > tblkssnapshotdaily_idx_monthtest on > > > > tblkssnapshotdaily (cost=0.00..3.52 rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot01_idx_monthtest on > > > > tblkssnapshot01 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot02_idx_monthtest on > > > > tblkssnapshot02 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot03_idx_monthtest on > > > > tblkssnapshot03 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot04_idx_monthtest on > > > > tblkssnapshot04 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot05_idx_monthtest on > > > > tblkssnapshot05 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot06_idx_monthtest on > > > > tblkssnapshot06 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot07_idx_monthtest on > > > > tblkssnapshot07 tblkssnapshotdaily (cost=0.00..5.94 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot08_idx_monthtest on > > > > tblkssnapshot08 tblkssnapshotdaily (cost=0.00..4.73 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot09_idx_monthtest on > > > > tblkssnapshot09 tblkssnapshotdaily (cost=0.00..5.56 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot10_idx_monthtest on > > > > tblkssnapshot10 tblkssnapshotdaily (cost=0.00..5.75 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot11_idx_monthtest on > > > > tblkssnapshot11 tblkssnapshotdaily (cost=0.00..5.81 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot12_idx_monthtest on > > > > tblkssnapshot12 tblkssnapshotdaily (cost=0.00..4.33 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot13_idx_monthtest on > > > > tblkssnapshot13 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot14_idx_monthtest on > > > > tblkssnapshot14 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot15_idx_monthtest on > > > > tblkssnapshot15 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot16_idx_monthtest on > > > > tblkssnapshot16 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot17_idx_monthtest on > > > > tblkssnapshot17 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot18_idx_monthtest on > > > > tblkssnapshot18 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot19_idx_monthtest on > > > > tblkssnapshot19 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot20_idx_monthtest on > > > > tblkssnapshot20 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot21_idx_monthtest on > > > > tblkssnapshot21 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot22_idx_monthtest on > > > > tblkssnapshot22 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot23_idx_monthtest on > > > > tblkssnapshot23 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > -> Index Scan using tblkssnapshot24_idx_monthtest on > > > > tblkssnapshot24 tblkssnapshotdaily (cost=0.00..4.83 > rows=1 width=8) > > > > Index Cond: ((monthdate = '2006-12-01'::date) AND > > > > (kstestssysid = 3143)) > > > > > > > > If constraint exclusion is on, why are index scans taking > > > place on all > > > > of the tables and not only on the only partitioned table > > > for which the > > > > constraint exists? > > > > > > > > I went to a partitioned table thinking this would improve > > > performance > > > > from the previous schema in which all of the data was in the > > > > parent table, but the end result was significantly slower > > > > performance (by orders of magnitude). > > > > > > > > > > > > > > > > ---------------------------(end of > > > > broadcast)--------------------------- > > > > TIP 5: don't forget to increase your free space map settings > > > > > > > > > > -- > > > Jim Nasby jim@nasby.net > > > EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) > > > > > > > > > > > > > -- > Jim C. Nasby, Database Architect decibel@decibel.org > Give your computer some brain candy! www.distributed.net Team #1828 > > Windows: "Where do you want to go today?" > Linux: "Where do you want to go tomorrow?" > FreeBSD: "Are you guys coming, or what?" >
pgsql-admin by date: