Thread: Postgres refusing to use >1 core
I have a multi-threaded app. It uses ~22 threads to query Postgres.

Postgres won't use more than 1 CPU core. The 22-threaded app only has 3% CPU
utilization because it's mostly waiting on Postgres.

Here's the details:

The app has a "main" thread that reads table A's 11,000,000 rows, one at a
time. The main thread spawns a new thread for each row in table A's data.
This new thread:

- Opens a connection to the DB.
- Does some calculations on the data, including 1 to 102 SELECTs on table B.
- With an INSERT query, writes a new row to table C.
- Closes the connection.
- Thread dies. Its data is garbage collected eventually.

Physical/software details:

- Core i7 processor--4 physical cores, but OS sees 8 cores via hyper-threading
- 7200 RPM 500 GB HDD
- About 1/2 total RAM is free during app execution
- Windows 7 x64
- Postgres 9.0.4 32-bit (32-bit required for PostGIS)
- App is C# w/ .NET 4.0. PLINQ dispatches threads. Npgsql is the Postgres connection tool.

At first, the app pounds all 8 cores. But it quickly tapers off, and only 1
core is busy. The other 7 cores are barely doing a thing.

Postgres has 9 open processes. 1 process was slamming that 1 busy core. The
other 8 Postgres processes were alive but idle.

Each thread creates its own connection. It's not concurrently shared with
the main thread or any other threads. I haven't disabled connection pooling;
when a thread closes a connection, it's technically releasing it into a pool
for later threads to use.

Disk utilization is low. The HDD light is off much more than it is on, and a
review of total HDD activity put it between 0% and 10% of total capacity.
The HDD busy indicator LED would regularly flicker every 0.1 to 0.3 seconds.

The app runs 2 different queries on table B. The 1st query is run once, the
2nd query can be run up to 101 times. Table C has redundant indexes: every
column referenced in the SQL WHERE clauses for both queries is indexed
separately and jointly. E.g., if query X references columns Y and Z, there
are 3 indexes:

- An index for Y
- An index for Z
- An index for Y and Z

Table C is simple. It has four columns: two integers, a varchar(18), and a
boolean. It has no indexes. A primary key on the varchar(18) column is its
only constraint.

A generalized version of my INSERT command for table C is:

    INSERT INTO raw.C VALUES (:L, :M, :N, :P)

I am using parameters to fill in the 4 values.

I have verified table C manually, and correct data is being stored in it.

Several Google searches suggest Postgres should use multiple cores
automatically. I've consulted with Npgsql's developer, and he didn't see how
Npgsql itself could force Postgres to one core.
(See http://pgfoundry.org/pipermail/npgsql-devel/2011-May/001123.html.)

What can I do to improve this? Could I be inadvertently limiting Postgres to
one core?

Aren Cambre
On Mon, May 9, 2011 at 4:23 PM, Aren Cambre <aren@arencambre.com> wrote:
> I have a multi-threaded app. It uses ~22 threads to query Postgres.
> Postgres won't use more than 1 CPU core. The 22-threaded app only has 3% CPU
> utilization because it's mostly waiting on Postgres.
> [...]
> What can I do to improve this? Could I be inadvertently limiting Postgres to
> one core?

Are you sure you are really using > 1 connection? While your test is
running, log onto postgres with psql and grab the output of
pg_stat_activity a few times. What do you see?

merlin
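A minimal version of that check, using the 9.0 catalog column names (procpid and current_query; newer releases call them pid and query):

    -- One row per backend.  If all the "busy" activity shows up under a single
    -- procpid, the client really is funneling its work through one connection.
    SELECT procpid, usename, waiting, query_start, current_query
    FROM pg_stat_activity
    WHERE current_query <> '<IDLE>'
    ORDER BY procpid;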
Aren Cambre <aren@arencambre.com> wrote:
> Postgres won't use more than 1 CPU core.

One *connection* to PostgreSQL won't directly use more than one
core. As Merlin suggests, perhaps you're really only running one
query at a time? The other possibility is that you're somehow
acquiring locks which cause one process to block others.

> - Core i7 processor--4 physical cores, but OS sees 8 cores
> via hyper-threading

Most benchmarks I've seen comparing hyper-threading show that
PostgreSQL performs better if you don't try to convince it that one
core is actually two different cores. With HT on, you tend to see
context switching storms, and performance suffers.

> At first, the app pounds all 8 cores.

You really shouldn't let the marketers get to you like that. You
have four cores, not eight.

The most important information for finding your bottleneck is
probably going to be in pg_stat_activity and pg_locks.

-Kevin
> Are you sure you are really using > 1 connection? While your test is
> running, log onto postgres with psql and grab the output of
> pg_stat_activity a few times. What do you see?

Thanks. If a connection corresponds to a process, then this suggests I am
using 1 connection for my main thread, and all the threads it spawns are
sharing another connection.

Aren
>> Postgres won't use more than 1 CPU core.
>
> One *connection* to PostgreSQL won't directly use more than one
> core. As Merlin suggests, perhaps you're really only running one
> query at a time? The other possibility is that you're somehow
> acquiring locks which cause one process to block others.
>
>> - Core i7 processor--4 physical cores, but OS sees 8 cores
>> via hyper-threading
>
> Most benchmarks I've seen comparing hyper-threading show that
> PostgreSQL performs better if you don't try to convince it that one
> core is actually two different cores. With HT on, you tend to see
> context switching storms, and performance suffers.
>
>> At first, the app pounds all 8 cores.
>
> You really shouldn't let the marketers get to you like that. You
> have four cores, not eight.

I agree. :-) Just trying to express things as my OS sees and reports on them.

Aren
On Mon, May 9, 2011 at 4:50 PM, Aren Cambre <aren@arencambre.com> wrote:
>> Are you sure you are really using > 1 connection? While your test is
>> running, log onto postgres with psql and grab the output of
>> pg_stat_activity a few times. What do you see?
>
> Thanks. If a connection corresponds to a process, then this suggests I am
> using 1 connection for my main thread, and all the threads it spawns are
> sharing another connection.

Yes. However I can tell you with absolute certainty that postgres will
distribute work across cores. Actually the o/s does it -- each unique
connection spawns a single threaded process on the backend. As long as
your o/s of choice supports using more than one process at once, your
work will distribute.

So, given that, your problem is:

*) your code is actually using only one connection
*) you have contention on the server side (say, a transaction outstanding
   that is blocking everyone)
*) you have contention on the client side -- a lock in your code or inside
   npgsql
*) your measuring is not correct.

so follow the advice above. we need to see pg_stat_activity, and/or
pg_locks while your test is running (especially take note of pg_lock
records with granted=f)

merlin
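A sketch of that second check, joining pg_locks to pg_stat_activity (again with the 9.0 column names) so each ungranted lock is shown next to the query stuck behind it:

    -- granted = false rows are the interesting ones: those backends are waiting.
    SELECT l.pid, l.locktype, l.relation::regclass, l.mode, l.granted,
           a.current_query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.procpid = l.pid
    WHERE NOT l.granted;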
Aren Cambre <aren@arencambre.com> wrote:
>>> - Core i7 processor--4 physical cores, but OS sees 8 cores
>>> via hyper-threading
>>
>> Most benchmarks I've seen comparing hyper-threading show that
>> PostgreSQL performs better if you don't try to convince it that
>> one core is actually two different cores. With HT on, you tend
>> to see context switching storms, and performance suffers.
>>
>>> At first, the app pounds all 8 cores.
>>
>> You really shouldn't let the marketers get to you like that. You
>> have four cores, not eight.
>
> I agree. :-) Just trying to express things as my OS sees and
> reports on them.

Your OS won't *see* eight processors if you turn off HT. :-)

I'm going to pursue this digression just a little further, because
it probably will be biting you sooner or later. We make sure to
configure the BIOS on our database servers to turn off
hyperthreading. It really can make a big difference in performance.

-Kevin
On 05/09/2011 05:59 PM, Kevin Grittner wrote:
> I'm going to pursue this digression just a little further, because
> it probably will be biting you sooner or later. We make sure to
> configure the BIOS on our database servers to turn off
> hyperthreading. It really can make a big difference in performance.

You're using connection pooling quite aggressively though. The sort of
people who do actually benefit from hyperthreading are the ones who
don't, where there's lots of CPU time being burnt up in overhead you
don't see, and that even a virtual HT processor can help handle.

I'm not a big fan of the current hyperthreading implementation, but it's
not nearly as bad as the older ones, and there are situations where it
is useful. I am unsurprised you don't ever see them on your workload
though; you're well tweaked enough to probably be memory or disk limited
much of the time.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
> Your OS won't *see* eight processors if you turn off HT. :-)
>
> I'm going to pursue this digression just a little further, because
> it probably will be biting you sooner or later. We make sure to
> configure the BIOS on our database servers to turn off
> hyperthreading. It really can make a big difference in performance.

OK, OK, I need to admit that this is a Core i7 720QM on an HP Envy 14
laptop. :-) There is no BIOS option to disable HT.

I am a doctoral student (but married with kids, about 5-10 years over
traditional doctorate student age) and am trying to speed up some of my
data analysis with parallelism. Right now the current operation, if run
in series, takes 30 hours and only stresses one of the 8 (fake) cores.
I'd rather see something that maximizes CPU use, provided that it
doesn't overwhelm I/O.

Aren
On Mon, May 9, 2011 at 8:15 PM, Aren Cambre <aren@arencambre.com> wrote:
> OK, OK, I need to admit that this is a Core i7 720QM on an HP Envy 14
> laptop. :-) There is no BIOS option to disable HT.
>
> I am a doctoral student (but married with kids, about 5-10 years over
> traditional doctorate student age) and am trying to speed up some of my
> data analysis with parallelism. Right now the current operation, if run
> in series, takes 30 hours and only stresses one of the 8 (fake) cores.
> I'd rather see something that maximizes CPU use, provided that it
> doesn't overwhelm I/O.

The easiest way to use more cores is to just partition the data you want
to work on into 4 or more chunks and launch that many multi-threaded
processes at once.
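One hedged way to carve out those chunks purely in SQL, assuming "HA_Arrest_Key" (the varchar(18) key described later in the thread) is the arrests table's key column; each of four worker processes selects a different remainder:

    -- Worker 0 of 4; the other workers use % 4 = 1, 2, 3.
    SELECT *
    FROM "raw"."TxDPS all arrests"
    WHERE abs(hashtext("HA_Arrest_Key")) % 4 = 0;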
On Mon, May 9, 2011 at 10:15 PM, Aren Cambre <aren@arencambre.com> wrote:
> OK, OK, I need to admit that this is a Core i7 720QM on an HP Envy 14
> laptop. :-) There is no BIOS option to disable HT.
>
> I am a doctoral student (but married with kids, about 5-10 years over
> traditional doctorate student age) and am trying to speed up some of my
> data analysis with parallelism. Right now the current operation, if run
> in series, takes 30 hours and only stresses one of the 8 (fake) cores.
> I'd rather see something that maximizes CPU use, provided that it
> doesn't overwhelm I/O.

how are you reading through the table? if you are using OFFSET, you
owe me a steak dinner.

merlin
> how are you reading through the table? if you are using OFFSET, you
> owe me a steak dinner.

Nope. :-)

Below is my exact code for the main thread. The C# PLINQ statement is
highlighted. Let me know if I can help to explain this.
NpgsqlConnection arrestsConnection = new NpgsqlConnection(Properties.Settings.Default.dbConnectionString);
arrestsConnection.Open();

/// First clear out the geocoding table
NpgsqlCommand geocodingTableClear = new NpgsqlCommand("TRUNCATE raw.\"TxDPS geocoding\"", arrestsConnection);
geocodingTableClear.ExecuteNonQuery();

NpgsqlDataReader arrests = new NpgsqlCommand("SELECT * FROM \"raw\".\"TxDPS all arrests\"", arrestsConnection).ExecuteReader();

/// Based on the pattern defined at
/// http://social.msdn.microsoft.com/Forums/en-US/parallelextensions/thread/2f5ce226-c500-4899-a923-99285ace42ae.
foreach (IDataRecord arrest in
    from row in arrests.AsParallel().Cast<IDataRecord>()
    select row)
{
    Geocoder geocodeThis = new Geocoder(arrest);
    geocodeThis.Geocode();
}

arrestsConnection.Close();
On 10/05/11 10:40, Aren Cambre wrote:
>> how are you reading through the table? if you are using OFFSET, you
>> owe me a steak dinner.
>
> Nope. :-)
>
> Below is my exact code for the main thread. The C# PLINQ statement is
> highlighted. Let me know if I can help to explain this.

Looking at that code, I can't help but wonder why you're not doing it
server side in batches. In general, it's really inefficient to use this
pattern:

    rows = runquery("select * from something");
    for (item in rows) {
        // do something with item
    }

Adding implicit parallelism within the loop won't help you much if
client-side CPU use isn't your limitation. If each computation done on
"item" is very expensive in client-side CPU this pattern makes sense, but
otherwise it should be avoided in favour of grabbing big chunks of rows
and processing them all at once in batch SQL statements that let the
database plan I/O efficiently.

Even if you're going to rely on client-side looping - say, because of
complex or library-based computation that must be done for each record -
you must ensure that EACH THREAD HAS ITS OWN CONNECTION, whether that's a
new connection established manually or one grabbed from an appropriate
pool. Your code below shows no evidence of that at all; you're probably
sharing one connection between all the threads, achieving no real
parallelism whatsoever.

Try limiting your parallel invocation to 4 threads (since that's the
number of cores you have) and making sure each has its own connection. In
your case, that probably means having a new Geocoder instance grab a
connection from a pool that contains at least 5 connections (one per
Geocoder, plus the main connection).

It also looks - though I don't know C# and npgsql so I can't be sure -
like you're passing some kind of query result object to the Geocoder.
Avoid that, because they might be using the connection to progressively
read data behind the scenes, in which case you might land up having
locking issues, accidentally serializing your parallel work on the single
main connection, etc. Instead, retrieve the contents of the IDataRecord
(whatever that is) and pass that to the new Geocoder instance, so the new
Geocoder has *absolutely* *no* *link* to the arrestsConnection and cannot
possibly depend on it accidentally.

Even better, use a server-side work queue implementation like pgq, and
have each worker use its private connection to ask the server for the
next record to process when it's done with the previous one, so you don't
need a co-ordinating queue thread in your client side at all. You can
also optionally make your client workers independent processes rather
than threads that way, which simplifies debugging and resource
management.

--
Craig Ringer
On Mon, May 9, 2011 at 9:40 PM, Aren Cambre <aren@arencambre.com> wrote:
>> how are you reading through the table? if you are using OFFSET, you
>> owe me a steak dinner.
>
> Nope. :-)
>
> Below is my exact code for the main thread. The C# PLINQ statement is
> highlighted. Let me know if I can help to explain this.

hm. I'm not exactly sure. how about turning on statement level logging
on the server for a bit and seeing if any unexpected queries are being
generated and sent to the server.

merlin
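For reference, a minimal way to turn that on in 9.0 (which predates ALTER SYSTEM) is to edit postgresql.conf and reload:

    -- In postgresql.conf:
    --   log_statement = 'all'
    --   log_line_prefix = '%m [%p] '   -- timestamp plus backend PID, so each
    --                                  -- logged query is tied to a connection
    -- Then, from any superuser session:
    SELECT pg_reload_conf();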
On 05/11/2011 05:34 AM, Aren Cambre wrote:
> Using one thread, the app can do about 111 rows per second, and it's
> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
> 111 rows per second ~= 30 hours.
>
> I hoped to speed things up with some parallel processing.
>
> When the app is multithreaded, the app itself consumes about 3% CPU time
> waiting for Postgres, which is only hammering 1 core and barely
> exercising disk I/O (per two programs and HDD light).

OK, so before looking at parallelism, you might want to look at why
you're not getting much out of Pg and your app with even one thread. You
should be able to put a high load on the disk - or one cpu core -
without needing to split out work into multiple threads and parallel
workers.

I suspect your app is doing lots of tiny single-row queries instead of
efficiently batching things. It'll be wasting huge amounts of time
waiting for results. Even if every query is individually incredibly
fast, with the number of them you seem to be doing you'll lose a LOT of
time if you loop over lots of little SELECTs.

The usual cause of the kind of slow performance you describe is an app
that "chats" with the database continuously, so its pattern is:

    loop:
        ask for row from database using SELECT
        retrieve result
        do a tiny bit of processing
        continue loop

This is incredibly inefficient, because Pg is always waiting for the app
to ask for something or the app is waiting for Pg to return something.
During each switch there are delays and inefficiencies. It's actually:

    loop:
        ask for a single row from database using SELECT
        [twiddle thumbs while database plans and executes the query]
        retrieve result
        do a tiny bit of processing
        [Pg twiddles its thumbs]
        continue loop

What you want is your app and Pg working at the same time. Assuming that
CPU is the limitation rather than database speed and disk I/O I'd use
something like this:

    Thread 1:
        get cursor for selecting all rows from database
        loop:
            get 100 rows from cursor
            add rows to processing queue
            if queue contains over 1000 rows:
                wait until queue contains less than 1000 rows

    Thread 2:
        until there are no more rows:
            ask Thread 1 for 100 rows
            for each row:
                do a tiny bit of processing

By using a producer/consumer model like that you can ensure that thread
1 is always talking to the database, keeping Pg busy, and thread 2 is
always working the CPUs.

The two threads should share NOTHING except the queue to keep the whole
thing simple and clean. You must make sure that the "get 100 rows"
operation of the producer can happen even while the producer is in the
middle of getting some more rows from Pg (though not necessarily in the
middle of actually appending them to the queue data structure) so you
don't accidentally serialize on access to the producer thread.

If the single producer thread can't keep up, try reading in bigger
batches or adding more producer threads with a shared queue. If the
single consumer thread can't keep up with the producer, add more
consumers to use more CPU cores.

    [producer 1] [producer 2] [...] [producer n]
         |            |        |         |
    -----------------------------------------
    |                 queue                 |
    -----------------------------------------
         |            |        |         |
     [worker 1]   [worker 2]  [...]  [worker n]

... or you can have each worker fetch its own chunks of rows (getting
rid of the producer/consumer split) using its own connection and just
have lots more workers to handle all the wasted idle time. A
producer/consumer approach will probably be faster, though.

If the consumer threads produce a result that must be sent back to the
database, you can either have each thread write it to the database using
its own connection when it's done, or you can have them delegate that
work to another thread that's dedicated to INSERTing the results. If the
INSERTer can't keep up, guess what, you spawn more of them working off a
shared queue.

If the consumer threads require additional information from the database
to do their work, make sure they avoid the:

    loop:
        fetch one row
        do work on row

pattern, instead fetching sets of rows from the database in batches. Use
joins if necessary, or the IN() criterion.

--
Craig Ringer
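As a concrete sketch of the producer's "get cursor ... get 100 rows" step on the Postgres side, using the table name from the code above (batch size is arbitrary):

    BEGIN;
    DECLARE arrests_cur NO SCROLL CURSOR FOR
        SELECT * FROM "raw"."TxDPS all arrests";
    FETCH 100 FROM arrests_cur;   -- repeat until it returns no rows,
                                  -- handing each batch to the queue
    CLOSE arrests_cur;
    COMMIT;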
On 11/05/11 05:34, Aren Cambre wrote:
> Using one thread, the app can do about 111 rows per second, and it's
> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
> 111 rows per second ~= 30 hours.

I don't know how I missed that. You ARE maxing out one cpu core, so
you're quite right that you need more threads unless you can make your
single worker more efficient.

Why not just spawn more copies of your program and have them work on
ranges of the data, though? Might that not be simpler than juggling
threading schemes?

--
Craig Ringer
On Tue, May 10, 2011 at 7:35 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> I don't know how I missed that. You ARE maxing out one cpu core, so
> you're quite right that you need more threads unless you can make your
> single worker more efficient.
>
> Why not just spawn more copies of your program and have them work on
> ranges of the data, though? Might that not be simpler than juggling
> threading schemes?

I suggested that earlier. But now I'm wondering if there's efficiencies
to be gained by moving all the heavy lifting to the db as well as
splitting things into multiple partitions to work on. I.e. don't grab
1,000 rows and work on them on the client side and then insert data, do
the data mangling in the query in the database. My experience has been
that moving things like this into the database can result in performance
gains of several factors, taking hour long processes and making them run
in minutes.
> I suspect your app is doing lots of tiny single-row queries instead of
> efficiently batching things. It'll be wasting huge amounts of time
> waiting for results. Even if every query is individually incredibly
> fast, with the number of them you seem to be doing you'll lose a LOT of
> time if you loop over lots of little SELECTs.

Using unix sockets, you can expect about 10,000-20,000 queries/s on small
simple selects per core, which is quite a feat. TCP adds overhead, so
it's slower. Over a network, add ping time.

In plpgsql code, you avoid roundtrips, data serializing, and context
switches; it can be 2-4x faster. But a big SQL query can process millions
of rows/s, and it is much more efficient.
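A hedged sketch of that plpgsql point (the table and column names here are made up, not taken from the thread): one function call replaces a whole client-side probe loop, so there is exactly one round-trip no matter how many candidates get examined.

    CREATE OR REPLACE FUNCTION closest_match(p_route text, p_marker integer)
    RETURNS integer
    LANGUAGE plpgsql STABLE AS $$
    DECLARE
        v_id integer;
    BEGIN
        -- markers(route, marker, id) is hypothetical; one indexed scan stands
        -- in for up to 101 separate SELECTs issued from the client.
        SELECT id INTO v_id
          FROM markers
         WHERE route = p_route
           AND marker BETWEEN p_marker - 50 AND p_marker + 50
         ORDER BY abs(marker - p_marker)
         LIMIT 1;
        RETURN v_id;   -- NULL if nothing within 50
    END;
    $$;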
On 05/10/2011 11:26 PM, Scott Marlowe wrote:
> I.e. don't grab 1,000 rows and work on them on the client side and
> then insert data, do the data mangling in the query in the database.
> My experience has been that moving things like this into the database
> can result in performance gains of several factors, taking hour long
> processes and making them run in minutes.

This is a problem I encounter constantly wherever I go. Programmer
selects millions of rows from giant table. Programmer loops through
results one by one doing some magic on them. Programmer submits queries
back to the database. Even in batches, that's going to take ages.

Databases are beasts at set-based operations. If the programmer can
build a temp table of any kind and load that, they can turn their
update/insert/whatever into a simple JOIN that runs several orders of
magnitude faster. Going the route of parallelism will probably work too,
but I doubt it's the right solution in this case.

When there are tables with millions of rows involved, processing 111 per
second is a bug. Even with ten perfectly balanced threads, 30 hours only
becomes three. On decent hardware, you can probably drop, reload, and
index the entire table faster than that.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
> This is a problem I encounter constantly wherever I go. Programmer
> selects millions of rows from giant table. Programmer loops through
> results one by one doing some magic on them. Programmer submits queries
> back to the database. Even in batches, that's going to take ages.

Reminds me of a recent question on stackoverflow:
http://stackoverflow.com/questions/5952020/how-to-optimize-painfully-slow-mysql-query-that-finds-correlations

And the answer:
http://stackoverflow.com/questions/5952020/how-to-optimize-painfully-slow-mysql-query-that-finds-correlations/5954041#5954041

The OP was thinking "row-based", with subqueries in the role of "doing
some magic". Using a set-based solution with cascading WITH CTEs (and
using the previous CTE as a source in the next one for aggregation)
=> 100x speedup!
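A generic illustration of that cascading-CTE shape (hypothetical tables and columns): each CTE feeds the next, so the whole pipeline runs as one set-based statement instead of per-row subqueries.

    WITH raw_events AS (
        SELECT user_id, date_trunc('day', created_at) AS day
        FROM events                          -- hypothetical source table
        WHERE created_at >= DATE '2011-01-01'
    ),
    daily AS (
        SELECT user_id, day, count(*) AS n   -- aggregate the previous CTE
        FROM raw_events
        GROUP BY user_id, day
    )
    SELECT user_id, avg(n) AS avg_per_day    -- aggregate again, still one statement
    FROM daily
    GROUP BY user_id;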
On Wed, 11 May 2011 11:04:49 -0500, Shaun Thomas <sthomas@peak6.com> wrote:
> This is a problem I encounter constantly wherever I go. Programmer
> selects millions of rows from giant table. Programmer loops through
> results one by one doing some magic on them. Programmer submits queries
> back to the database. Even in batches, that's going to take ages.

So, the $64 question: how did you find an engagement where, to bend
Shakespeare, "first thing we do, is kill all the coders" isn't required?
This RBAR mentality, abetted by xml/NoSql/xBase, is utterly pervasive.
They absolutely refuse to learn anything different from the COBOL/VSAM
messes of their grandfathers; well, modulo syntax, of course. The mere
suggestion, in my experience, that doing things faster with fewer lines
of code/statements in the engine is met with overt hostility.

Regards,
Robert
On Wed, May 11, 2011 at 1:53 PM, <gnuoytr@rcn.com> wrote:
> So, the $64 question: how did you find an engagement where, to bend
> Shakespeare, "first thing we do, is kill all the coders" isn't required?
> This RBAR mentality, abetted by xml/NoSql/xBase, is utterly pervasive.

It really depends. For a lot of development scaling to large numbers of
users is never needed, and it's often more economical to develop quickly
with a less efficient database layer. In my last job all our main
development was against a large transactional / relational db. But some
quick and dirty internal development used some very inefficient MVC
methods, and it only had to handle 45 users at a time, max, and that was
45 users who accessed the system a few minutes at a time.

I've seen EAV systems that people tried to scale that were handling
thousands of queries a second that, when converted to real relational
dbs, needed dozens of queries a second to run, required a fraction of db
horsepower, and could scale to the same number of users with only 1/10th
to 1/100th the database underneath it. In those instances, you only have
to show the much higher efficiency to the people who pay for the database
servers.
On 05/11/2011 02:53 PM, gnuoytr@rcn.com wrote:
> So, the $64 question: how did you find an engagement where, to bend
> Shakespeare, "first thing we do, is kill all the coders" isn't
> required?

It's just one of those things you have to explain. Not just how to fix
it, but *why* doing so fixes it. It's also not really a fair expectation
in a lot of ways. Even when a coder uses all SQL, their inexperience in
the engine can still ruin performance. We spend years getting to know
PostgreSQL, or just general DB techniques. They do the same with coding.
And unless they're a developer for a very graphics intensive project,
they're probably not well acquainted with set theory.

Just today, I took a query like this:

    UPDATE customer c
       SET c.login_counter = a.counter
      FROM (SELECT session_id, count(*) as counter
              FROM session
             WHERE date_created >= CURRENT_DATE
             GROUP BY session_id) a
     WHERE c.process_date = CURRENT_DATE
       AND c.customer_id = a.session_id

And suggested this instead:

    CREATE TEMP TABLE tmp_login_counts AS
    SELECT session_id, count(1) AS counter
      FROM auth_token_arc
     WHERE date_created >= CURRENT_DATE
     GROUP BY session_id

    UPDATE reporting.customer c
       SET login_counter = a.counter
      FROM tmp_login_counts a
     WHERE c.process_date = CURRENT_DATE
       AND c.customer_id = a.session_id

The original query, with our very large tables, ran for over *two hours*
thanks to a nested loop iterating over the subquery. My replacement ran
in roughly 30 seconds. If we were using a newer version of PG, we could
have used a CTE. But do you get what I mean? Temp tables are a fairly
common technique, but how would a coder know about CTEs? They're pretty
new, even to *us*.

We hold regular Lunch'n'Learns for our developers to teach them the
good/bad of what they're doing, and that helps significantly. Even hours
later, I see them using the techniques I showed them. The one I'm
presenting soon is entitled '10 Ways to Ruin Performance' and they're
all specific examples taken from day-to-day queries and jobs here, all
from different categories of mistake. It's just a part of being a good
DBA.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
On Wed, 11 May 2011 17:04:50 -0500, Shaun Thomas <sthomas@peak6.com> wrote:
> We hold regular Lunch'n'Learns for our developers to teach them the
> good/bad of what they're doing, and that helps significantly. [...]
> It's just a part of being a good DBA.

You're (both) fortunate to have Suits and colleagues who are open to
doing this A Better Way. Bless you.

Regards,
Robert
On 5/11/11 3:04 PM, Shaun Thomas wrote:
> The original query, with our very large tables, ran for over *two hours*
> thanks to a nested loop iterating over the subquery. My replacement ran
> in roughly 30 seconds. If we were using a newer version of PG, we could
> have used a CTE. But do you get what I mean? Temp tables are a fairly
> common technique, but how would a coder know about CTEs? They're pretty
> new, even to *us*.

For that matter, it would be even better if PostgreSQL realized that a
materialize of the subquery was a better execution plan, and just did it
for you.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
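For comparison, the CTE form Shaun alludes to (available from 8.4 on), shown here as a plain SELECT since attaching WITH to an UPDATE only arrived in 9.1; in pre-12 releases the CTE acts as an optimization fence, so the aggregate is materialized once rather than re-run inside a nested loop:

    WITH a AS (
        SELECT session_id, count(1) AS counter
        FROM session
        WHERE date_created >= CURRENT_DATE
        GROUP BY session_id
    )
    SELECT c.customer_id, a.counter
    FROM customer c
    JOIN a ON c.customer_id = a.session_id
    WHERE c.process_date = CURRENT_DATE;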
> I suspect your app is doing lots of tiny single-row queries instead of
> efficiently batching things. It'll be wasting huge amounts of time
> waiting for results. Even if every query is individually incredibly
> fast, with the number of them you seem to be doing you'll lose a LOT of
> time if you loop over lots of little SELECTs.
- rte_nm = 'US71' AND county_num='206'
- rte_nm = 'SH71' AND county_num='206'
- rte_nm = 'UA71' AND county_num='206'
- rte_nm = 'UP71' AND county_num='206'
- ...
- rte_nm = 'SH71' AND rm = '256' (base marker)
- rte_nm = 'SH71' AND rm = '257' (+1)
- rte_nm = 'SH71' AND rm = '255' (-1)
- rte_nm = 'SH71' AND rm = '258' (+2)
- rte_nm = 'SH71' AND rm = '254' (-2)
- ...
- rte_nm = 'SH71' AND rm = '306' (+50)
- rte_nm = 'SH71' AND rm = '206' (-50)
- HA_Arrest_Key (varchar(18) that refers back to the TxDPS tickets table)
- gid (integer that refers to the unique identity of the reference marker in the TxDOT table)
- distance (integer that is the distance, in reference markers, between that noted in the TxDPS ticket and the nearest marker found in the TxDOT table)
- hasLatLong (Boolean that is true if TxDPS also recorded latitude and longitude for the ticket, presumably from an in-car device. These don't appear to be that accurate, plus a substantial portion of tickets have no lat/long.)
>> Using one thread, the app can do about 111 rows per second, and it's
>> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
>> 111 rows per second ~= 30 hours.
>
> I don't know how I missed that. You ARE maxing out one cpu core, so
> you're quite right that you need more threads unless you can make your
> single worker more efficient.

And the problem is my app already has between 20 and 30 threads.
Something about C#'s PLINQ may not be working as intended...

Aren
> Using unix sockets, you can expect about 10,000-20,000 queries/s on small
> simple selects per core, which is quite a feat. TCP adds overhead, so
> it's slower. Over a network, add ping time.

I'm talking to a Postgres on localhost, so in theory, I ought to be
getting really good throughput, but again, the problem may be with the
way C#'s PLINQ "multithreading in a can" is managing things.

Aren
On Wed, May 11, 2011 at 9:20 PM, Aren Cambre <aren@arencambre.com> wrote:
> I'm talking to a Postgres on localhost, so in theory, I ought to be
> getting really good throughput, but again, the problem may be with the
> way C#'s PLINQ "multithreading in a can" is managing things.

local tcp is gonna be slower than unix sockets, not faster. But the big
issue is that you need to explore doing the work in a large set, not
iteratively. Operations on sets are often much faster in aggregate.
On 5/11/2011 9:17 PM, Aren Cambre wrote:
> So here's what's going on.
> <snip>

If I were doing this, considering the small size of the data set, I'd
read all the data into memory. Process it entirely in memory (with
threads to saturate all the processors you have). Then write the results
to the DB.
>>> Using one thread, the app can do about 111 rows per second, and it's
>>> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
>>> 111 rows per second ~= 30 hours.
>>
>> I don't know how I missed that. You ARE maxing out one cpu core, so
>> you're quite right that you need more threads unless you can make your
>> single worker more efficient.
>
> And the problem is my app already has between 20 and 30 threads.
> Something about C#'s PLINQ may not be working as intended...

Have you checked that you are really doing fetch and processing in
parallel? Dunno about C#, but under Java you have to make specific
settings (e.g. setFetchSize) or the driver will fetch all the data when
the query is run. Check the time needed to fetch the first row from the
query.

Best regards,
Vitalii Tymchyshyn
On Wed, 2011-05-11 at 17:04 -0500, Shaun Thomas wrote:
> We hold regular Lunch'n'Learns for our developers to teach them the
> good/bad of what they're doing, and that helps significantly. Even
> hours later, I see them using the techniques I showed them. The one
> I'm presenting soon is entitled '10 Ways to Ruin Performance' and
> they're all specific examples taken from day-to-day queries and jobs
> here, all from different categories of mistake. It's just a part of
> being a good DBA.

Do you happen to produce slides for these lunch n learns or are they
more informal than that? I guess you can work out where I'm going with
this ;)

--
Michael Graham <mgraham@bloxx.com>
On 05/12/2011 03:30 AM, Michael Graham wrote:
> Do you happen to produce slides for these lunch n learns or are they
> more informal than that? I guess you can work out where I'm going with
> this ;)

Oh of course. I use rst2s5 for my stuff, so I have the slideshow and also
generate a PDF complete with several paragraphs of explanation I
distribute after the presentation itself. I have two of them now, but
I'll probably have a third in a couple months.

My next topic will probably be geared toward actual DBAs that might be
intermediate level. Things like, what happens to an OLAP server that
undergoes maintenance and experiences a rapid (temporarily exponential)
TPS increase. How that can affect the disk subsystem, how to recover, how
to possibly bootstrap as a temporary fix, etc. Certainly things I would
have liked to know before seeing them. I'm going to call it "Your
Database Probably Hates You." ;)

I have a tendency to enjoy "stories from the field," and I've got more
than a few where I've saved a database from certain death. Sometimes it's
tweaking a few config settings, sometimes it's new hardware based on
system monitoring or allocation tests. Little things Senior DBAs might
know after experiencing them, or reading lists like this one.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
Josh Berkus <josh@agliodbs.com> writes:
> On 5/11/11 3:04 PM, Shaun Thomas wrote:
>> The original query, with our very large tables, ran for over *two hours*
>> thanks to a nested loop iterating over the subquery. My replacement ran
>> in roughly 30 seconds.
>
> For that matter, it would be even better if PostgreSQL realized that a
> materialize of the subquery was a better execution plan, and just did it
> for you.

It does. I was a bit surprised that Shaun apparently got a plan that
didn't include a materialize step, because when I test a similar query
here, I get:

1. a hash join, until I turn off enable_hashjoin; then
2. a merge join, until I turn off enable_mergejoin; then
3. a nestloop with materialize on the subquery scan.

In 9.0 and up I can get a nestloop without materialize by also turning
off enable_material, but pre-9.0 there's no such option ...

			regards, tom lane
On 05/12/2011 09:51 AM, Tom Lane wrote:
> It does. I was a bit surprised that Shaun apparently got a plan that
> didn't include a materialize step, because when I test a similar query
> here, I get:

Remember when I said "old version" that prevented us from using CTEs?
We're still on 8.2 (basically, we're waiting for EnterpriseDB 9.0). It's
basically calculating the group aggregation wrong, but is that enough to
trigger it to go nuts?

    SELECT c.*
      FROM customer c
      JOIN (SELECT session_id, count(1) as counter
              FROM session
             WHERE date_created >= '2011-05-11 05:00'
               AND date_created < '2011-05-11 06:00'
               AND from_interface = 'UNKNOWN'
             GROUP BY 1) a ON (c.customer_id = a.session_id)
     WHERE c.process_date = CURRENT_DATE - interval '1 day'
       AND c.row_out IS NULL;

So sayeth the planner:

 Nested Loop  (cost=167.49..2354.62 rows=6 width=237) (actual time=43.949..166858.604 rows=168 loops=1)
   ->  GroupAggregate  (cost=167.49..176.97 rows=2 width=8) (actual time=1.042..2.827 rows=209 loops=1)
         ->  Sort  (cost=167.49..170.64 rows=1260 width=8) (actual time=1.037..1.347 rows=230 loops=1)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..102.61 rows=1260 width=8) (actual time=0.044..0.690 rows=230 loops=1)
                     Index Cond: ((date_created >= '11-MAY-11 05:00:00'::timestamp without time zone) AND (date_created < '11-MAY-11 06:00:00'::timestamp without time zone))
                     Filter: ((from_interface)::text = 'UNKNOWN'::text)
   ->  Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1088.78 rows=3 width=237) (actual time=19.820..798.348 rows=1 loops=209)
         Index Cond: (c.customer_id = a.session_id)
         Filter: ((process_date = (('now'::text)::date - '@ 1 day'::interval)) AND (row_out IS NULL))
 Total runtime: 166859.040 ms

That one hour extract is much, much slower than this:

    SELECT 1
      FROM customer c
      JOIN (SELECT session_id, count(*) as counter
              FROM session
             WHERE date_created >= '2011-05-08'
             GROUP BY 1) a ON (c.customer_id = a.session_id)
     WHERE c.process_date = CURRENT_DATE
       AND c.row_out IS NULL;

Which gives this plan:

 Merge Join  (cost=244565.52..246488.78 rows=377 width=0) (actual time=1958.781..2385.667 rows=22205 loops=1)
   Merge Cond: (a.session_id = c.customer_id)
   ->  GroupAggregate  (cost=19176.22..20275.99 rows=271 width=8) (actual time=1142.179..1459.779 rows=26643 loops=1)
         ->  Sort  (cost=19176.22..19541.68 rows=146184 width=8) (actual time=1142.152..1374.328 rows=179006 loops=1)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..6635.51 rows=146184 width=8) (actual time=0.020..160.339 rows=179267 loops=1)
                     Index Cond: (date_created >= '08-MAY-11 00:00:00'::timestamp without time zone)
   ->  Sort  (cost=225389.30..225797.47 rows=163267 width=8) (actual time=816.585..855.459 rows=155067 loops=1)
         Sort Key: c.customer_id
         ->  Index Scan using idx_customer_rpt on customer c  (cost=0.00..211252.93 rows=163267 width=8) (actual time=0.037..90.337 rows=155067 loops=1)
               Index Cond: (process_date = '10-MAY-11 00:00:00'::timestamp without time zone)
               Filter: (row_out IS NULL)

But make the inner query slightly smaller, and...

 Nested Loop  (cost=13755.53..223453.98 rows=276 width=0)
   ->  GroupAggregate  (cost=13755.53..14558.26 rows=198 width=8)
         ->  Sort  (cost=13755.53..14022.28 rows=106700 width=8)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..4844.37 rows=106700 width=8)
                     Index Cond: (date_created >= '09-MAY-11 00:00:00'::timestamp without time zone)
   ->  Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1055.01 rows=1 width=8)
         Index Cond: (c.customer_id = a.session_id)
         Filter: ((process_date = '10-MAY-11 00:00:00'::timestamp without time zone) AND (row_out IS NULL))

I didn't want to wait two hours for that to finish. ;)

But the stats are all pretty darn close, so far as I can tell. The only
thing that's off is the group aggregate... by about two orders of
magnitude. So I just chalked it up to 8.2 being relatively horrible, and
punted to just using a temp table to trick the optimizer into doing it
right.

But my greater point was that even doing it all in SQL doesn't always
work, which we all know. Use of EXPLAIN abounds, but that doesn't
necessarily mean a dev will know how to fix a bad plan.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
2. Not all TxDPS reference markers correspond to TxDOT reference markers.

Now, if I've matched a route, I have to find the reference marker.

The TxDOT database is pretty good but not 100% complete, so some TxDPS
tickets' reference markers may not exist in the TxDOT table. Plus, it's
possible that some TxDPS tickets have the wrong marker.

To compensate, I am looking for the closest reference marker along the
route that is not more than 50 marker units away, either direction. I've
again implemented that with multiple queries, where I don't stop until I
find a match. Suppose I am searching for reference marker 256 on TX 71.
The queries will be like this:
- rte_nm = 'SH71' AND rm = '256' (base marker)
- rte_nm = 'SH71' AND rm = '257' (+1)
- rte_nm = 'SH71' AND rm = '255' (-1)
- rte_nm = 'SH71' AND rm = '258' (+2)
- rte_nm = 'SH71' AND rm = '254' (-2)
- ...
- rte_nm = 'SH71' AND rm = '306' (+50)
- rte_nm = 'SH71' AND rm = '206' (-50)
Assuming a matching route name was found in the prior step, the app will have 1 to 101 of these queries for each ticket.
This is a perfect example of a place where you could push some work out of the application and into the database. You can consolidate your 1 to 101 queries into a single query. If you use:
WHERE rte_nm='SH71' AND rm >= 206 AND rm <= 306 ORDER BY abs(rm - 256), rm - 256 DESC LIMIT 1
it will always return the same value as the first matching query from your list, and will never have to make more than one trip to the database. Your one trip might be slightly slower than any one of the single trips above, but it will certainly be much faster in the case where you have to hit any significant % of your 101 potential queries.
-Eric
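Spelled out as one parameterized statement (the table name and returned column are assumptions based on the columns described earlier in the thread, and rm is assumed to be numeric; cast it if it is stored as text):

    SELECT gid
    FROM "TxDOT reference markers"           -- assumed table name
    WHERE rte_nm = :rte
      AND rm BETWEEN :rm - 50 AND :rm + 50
    ORDER BY abs(rm - :rm), rm - :rm DESC
    LIMIT 1;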
Shaun Thomas <sthomas@peak6.com> writes:
> Remember when I said "old version" that prevented us from using CTEs?
> We're still on 8.2 (basically, we're waiting for EnterpriseDB 9.0).
> It's basically calculating the group aggregation wrong, but is that
> enough to trigger it to go nuts?

Hmm. As you say, the mistake it's making is a drastic underestimate of
the number of groups in the subquery, leading to a bad choice of join
method. I find it odd that replacing the subquery with a temp table
helps, though, because (unless you stuck in an ANALYZE you didn't
mention) it would have no stats at all about the number of groups in the
temp table. Maybe the default guess just happens to produce the more
desirable plan.

			regards, tom lane
On 05/12/2011 11:07 AM, Tom Lane wrote:
> I find it odd that replacing the subquery with a temp table helps,
> though, because (unless you stuck in an ANALYZE you didn't mention)
> it would have no stats at all about the number of groups in the temp
> table.

I did have an analyze initially for exactly that reason. But what I found
odd is that in my rush to execute this for the end of day reports, I
forgot that step, and it still ran fine. I've found that the planner
tends to treat un-analyzed tables somewhat pessimistically, which is fine
by me.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
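For completeness, the temp-table pattern being discussed with the ANALYZE made explicit (statements taken from Shaun's earlier example):

    CREATE TEMP TABLE tmp_login_counts AS
    SELECT session_id, count(1) AS counter
      FROM auth_token_arc
     WHERE date_created >= CURRENT_DATE
     GROUP BY session_id;

    ANALYZE tmp_login_counts;  -- gives the planner real group counts before the join

    UPDATE reporting.customer c
       SET login_counter = a.counter
      FROM tmp_login_counts a
     WHERE c.process_date = CURRENT_DATE
       AND c.customer_id = a.session_id;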
> it will always return the same value as the first matching query from
> your list, and will never have to make more than one trip to the
> database. Your one trip might be slightly slower than any one of the
> single trips above, but it will certainly be much faster in the case
> where you have to hit any significant % of your 101 potential queries.

THANKS!! I've been obsessing so much about parallelism that I hadn't
spent much time finding better queries.

Aren
Just wanted to again say thanks for everyone's help.

The main problem was that my program was running in serial, not parallel,
even though I thought I used a textbook example of PLINQ. Your assistance
helped me get to the point where I could conclusively determine everything
was running in serial. It was more obvious than I realized.

Thanks to help through
http://stackoverflow.com/questions/6086111/plinq-on-concurrentqueue-isnt-multithreading,
I have switched to the .NET Framework's Task Parallel Library, and it's
slamming the 8 cores hard now! And there's a bunch of concurrent
connections to Postgres. :-)

Aren
On 23/05/11 12:09, Aren Cambre wrote:
> Also, thanks for the advice on batching my queries. I am now using very
> efficient bulk data read and write methods for Postgres.
>
> My program bulk reads 100,000 rows, processes those rows (during which
> it does a few SELECTs), and then writes 100,000 rows at a time.
>
> It cycles through this until it has processed all 12,000,000 rows.
>
> This, plus the parallelism fix, will probably convert this 30 hour
> program to a <2 hour program.

It's always good to hear when these things work out. Thanks for
reporting back.

Using the set-based nature of relational databases to your advantage,
writing smarter queries that do more work server-side with fewer
round-trips, and effective batching can make a huge difference.

--
Craig Ringer
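One hedged sketch of that 100,000-row batched read without OFFSET, using keyset pagination (assuming "HA_Arrest_Key" is the arrests table's primary key, per the column description earlier in the thread):

    SELECT *
    FROM "raw"."TxDPS all arrests"
    WHERE "HA_Arrest_Key" > :last_key   -- '' (empty string) for the first batch
    ORDER BY "HA_Arrest_Key"
    LIMIT 100000;
    -- remember the largest "HA_Arrest_Key" returned and pass it as :last_key next time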