Thread: Postgres refusing to use >1 core

Postgres refusing to use >1 core

From
Aren Cambre
Date:
I have a multi-threaded app. It uses ~22 threads to query Postgres.

Postgres won't use more than 1 CPU core. The 22-threaded app only has 3% CPU utilization because it's mostly waiting on Postgres.

Here are the details:

The app has a "main" thread that reads table A's 11,000,000 rows, one at a time. The main thread spawns a new thread for each row in table A's data. This new thread:
  1. Opens a connection to the DB.
  2. Does some calculations on the data, including 1 to 102 SELECTs on table B.
  3. With an INSERT query, writes a new row to table C.
  4. Closes the connection.
  5. Thread dies. Its data is garbage collected eventually.
Physical/software details:
  • Core i7 processor--4 physical cores, but OS sees 8 cores via hyper-threading
  • 7200 RPM 500 GB HDD
  • About 1/2 total RAM is free during app execution
  • Windows 7 x64
  • Postgres 9.0.4 32-bit (32-bit required for PostGIS)
  • App is C# w/ .NET 4.0. PLINQ dispatches threads. Npgsql is Postgres connection tool.
At first, the app pounds all 8 cores. But it quickly tapers off, and soon only 1 core is busy. The other 7 cores are barely doing a thing.

Postgres has 9 open processes. 1 process is slamming that 1 busy core. The other 8 Postgres processes are alive but idle.

Each thread creates its own connection. It's not concurrently shared with the main thread or any other threads. I haven't disabled connection pooling; when a thread closes a connection, it's technically releasing it into a pool for later threads to use.

Disk utilization is low. The HDD light is off much more than it is on, and a review of total HDD activity put it between 0% and 10% of total capacity. The HDD busy indicator LED would regularly flicker every 0.1 to 0.3 seconds.

The app runs 2 different queries on table B. The 1st query is run once, the 2nd query can be run up to 101 times. Table B has redundant indexes: every column referenced in the SQL WHERE clauses for both queries is indexed separately and jointly. E.g., if query X references columns Y and Z, there are 3 indexes:
  1. An index for Y
  2. An index for Z
  3. An index for Y and Z
Table C is simple. It has four columns: two integers, a varchar(18), and a boolean. It has no indexes apart from the one backing its primary key on the varchar(18) column, which is its only constraint.

A generalized version of my INSERT command for table C is:
INSERT INTO raw.C VALUES (:L, :M, :N, :P)

I am using parameters to fill in the 4 values.

I have verified table C manually, and correct data is being stored in it.

Several Google searches suggest Postgres should use multiple cores automatically. I've consulted with Npgsql's developer, and he didn't see how Npgsql itself could force Postgres to one core. (See http://pgfoundry.org/pipermail/npgsql-devel/2011-May/001123.html.)

What can I do to improve this? Could I be inadvertently limiting Postgres to one core?

Aren Cambre

Re: Postgres refusing to use >1 core

From
Merlin Moncure
Date:
On Mon, May 9, 2011 at 4:23 PM, Aren Cambre <aren@arencambre.com> wrote:
> I have a multi-threaded app. It uses ~22 threads to query Postgres.
> Postgres won't use more than 1 CPU core. The 22-threaded app only has 3% CPU
> utilization because it's mostly waiting on Postgres.
> Here's the details:
> The app has a "main" thread that reads table A's 11,000,000 rows, one at a
> time. The main thread spawns a new thread for each row in table A's data.
> This new thread:
>
> Opens a connection to the DB.
> Does some calculations on the data, including 1 to 102 SELECTs on table B.
> With an INSERT query, writes a new row to table C.
> Closes the connection.
> Thread dies. Its data is garbage collected eventually.
>
> Physical/software details:
>
> Core i7 processor--4 physical cores, but OS sees 8 cores via hyper-threading
> 7200 RPM 500 GB HDD
> About 1/2 total RAM is free during app execution
> Windows 7 x64
> Postgres 9.0.4 32-bit (32-bit required for PostGIS)
> App is C# w/ .NET 4.0. PLINQ dispatches threads. Npgsql is Postgres
> connection tool.
>
> At first, the app pounds all 8 cores. But it quickly tapers off, and only 1
> core that's busy. The other 7 cores are barely doing a thing.
> Postgres has 9 open processes. 1 process was slamming that 1 busy core. The
> other 8 Postgres processes were alive but idle.
> Each thread creates its own connection. It's not concurrently shared with
> the main thread or any other threads. I haven't disabled connection pooling;
> when a thread closes a connection, it's technically releasing it into a pool
> for later threads to use.
> Disk utilization is low. The HDD light is off much more than it is on, and a
> review of total HDD activity put it between 0% and 10% of total capacity.
> The HDD busy indicator LED would regularly flicker every 0.1 to 0.3 seconds.
> The app runs 2 different queries on table B. The 1st query is run once, the
> 2nd query can be run up to 101 times. Table B has redundant indexes: every
> column referenced in the SQL WHERE clauses for both queries is indexed
> separately and jointly. E.g., if query X references columns Y and Z, there
> are 3 indexes:
>
> An index for Y
> An index for Z
> An index for Y and Z
>
> Table C is simple. It has four columns: two integers, a varchar(18), and a
> boolean. It has no indexes. A primary key on the varchar(18) column is its
> only constraint.
> A generalized version of my INSERT command for table C is:
> INSERT INTO raw.C VALUES (:L, :M, :N, :P)
> I am using parameters to fill in the 4 values.
> I have verified table C manually, and correct data is being stored in it.
> Several Google searches suggest Postgres should use multiple cores
> automatically. I've consulted with Npgsql's developer, and he didn't see how
> Npgsql itself could force Postgres to one core.
> (See http://pgfoundry.org/pipermail/npgsql-devel/2011-May/001123.html.)
> What can I do to improve this? Could I be inadvertently limiting Postgres to
> one core?

Are you sure you are really using > 1 connection?  While your test is
running, log onto postgres with psql and grab the output of
pg_stat_activity a few times.  What do you see?
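
something like this, for example (a sketch; these are the 9.0 column
names, and idle backends show '<IDLE>' as the query text):

  SELECT procpid, usename, waiting, current_query
  FROM pg_stat_activity
  WHERE current_query <> '<IDLE>';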

merlin

Re: Postgres refusing to use >1 core

From
"Kevin Grittner"
Date:
Aren Cambre <aren@arencambre.com> wrote:

> Postgres won't use more than 1 CPU core.

One *connection* to PostgreSQL won't directly use more than one
core.  As Merlin suggests, perhaps you're really only running one
query at a time?  The other possibility is that you're somehow
acquiring locks which cause one process to block others.

>    - Core i7 processor--4 physical cores, but OS sees 8 cores
>    via hyper-threading

Most benchmarks I've seen comparing hyper-threading show that
PostgreSQL performs better if you don't try to convince it that one
core is actually two different cores.  With HT on, you tend to see
context switching storms, and performance suffers.

> At first, the app pounds all 8 cores.

You really shouldn't let the marketers get to you like that.  You
have four cores, not eight.

The most important information for finding your bottleneck is
probably going to be in pg_stat_activity and pg_locks.

-Kevin

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
> Are you sure you are really using > 1 connection?  While your test is
> running, log onto postgres with psql and grab the output of
> pg_stat_activity a few times.  What do you see?

Thanks. If a connection corresponds to a process, then this suggests I am using 1 connection for my main thread, and all the threads it spawns are sharing another connection.

Aren

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
>> Postgres won't use more than 1 CPU core.
>
> One *connection* to PostgreSQL won't directly use more than one
> core.  As Merlin suggests, perhaps you're really only running one
> query at a time?  The other possibility is that you're somehow
> acquiring locks which cause one process to block others.

The "one connection" theory appears correct per prior email, if correctly understood what I was reading.

I guess I need to head back over to the Npgsql folks and see what I am doing wrong?
 
>>    - Core i7 processor--4 physical cores, but OS sees 8 cores
>>    via hyper-threading
>
> Most benchmarks I've seen comparing hyper-threading show that
> PostgreSQL performs better if you don't try to convince it that one
> core is actually two different cores.  With HT on, you tend to see
> context switching storms, and performance suffers.
>
>> At first, the app pounds all 8 cores.
>
> You really shouldn't let the marketers get to you like that.  You
> have four cores, not eight.

I agree. :-) Just trying to express things as my OS sees and reports on them.

Aren

Re: Postgres refusing to use >1 core

From
Merlin Moncure
Date:
On Mon, May 9, 2011 at 4:50 PM, Aren Cambre <aren@arencambre.com> wrote:
>> Are you sure you are really using > 1 connection?  While your test is
>> running, log onto postgres with psql and grab the output of
>> pg_stat_activity a few times.  What do you see?
>
> Thanks. If a connection corresponds to a process, then this suggests I am
> using 1 connection for my main thread, and all the threads it spawns are
> sharing another connection.

Yes.  However I can tell you with absolute certainty that postgres
will distribute work across cores.  Actually the o/s does it -- each
unique connection spawns a single threaded process on the backend.  As
long as your o/s of choice supports running more than one process at
once, your work will distribute.  So, given that, your problem is:

*) your code is actually using only one connection
*) you have contention on the server side (say, an outstanding
transaction that is blocking everyone)
*) you have contention on the client side -- a lock in your code or
inside npgsql
*) your measurement is not correct.

so follow the advice above. we need to see pg_stat_activity, and/or
pg_locks while your test is running (especially take note of pg_lock
records with granted=f)
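
for example (a sketch; on 9.0, pg_locks.pid joins to
pg_stat_activity.procpid):

  SELECT l.locktype, l.relation::regclass, l.mode, l.pid, a.current_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
  WHERE NOT l.granted;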

merlin

Re: Postgres refusing to use >1 core

From
"Kevin Grittner"
Date:
Aren Cambre <aren@arencambre.com> wrote:

>>>    - Core i7 processor--4 physical cores, but OS sees 8 cores
>>>    via hyper-threading
>>
>> Most benchmarks I've seen comparing hyper-threading show that
>> PostgreSQL performs better if you don't try to convince it that
>> one core is actually two different cores.  With HT on, you tend
>> to see context switching storms, and performance suffers.
>>
>> > At first, the app pounds all 8 cores.
>>
>> You really shouldn't let the marketers get to you like that.  You
>> have four cores, not eight.
>>
>
> I agree. :-) Just trying to express things as my OS sees and
> reports on them.

Your OS won't *see* eight processors if you turn off HT.  :-)

I'm going to pursue this digression just a little further, because
it probably will be biting you sooner or later.  We make sure to
configure the BIOS on our database servers to turn off
hyperthreading.  It really can make a big difference in performance.

-Kevin

Re: Postgres refusing to use >1 core

From
Greg Smith
Date:
On 05/09/2011 05:59 PM, Kevin Grittner wrote:
> I'm going to pursue this digression just a little further, because
> it probably will be biting you sooner or later.  We make sure to
> configure the BIOS on our database servers to turn off
> hyperthreading.  It really can make a big difference in performance.
>

You're using connection pooling quite aggressively though.  The sort of
people who do actually benefit from hyperthreading are the ones who
don't, where there's lots of CPU time being burnt up in overhead you
don't see, and that even a virtual HT processor can help handle.  I'm
not a big fan of the current hyperthreading implementation, but it's not
nearly as bad as the older ones, and there are situations where it is
useful.  I am unsurprised you don't ever see them on your workload,
though; you're well tweaked enough to probably be memory or disk limited
much of the time.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
> so follow the advice above. we need to see pg_stat_activity, and/or
> pg_locks while your test is running (especially take note of pg_lock
> records with granted=f)

Attached.

The database is named de. The process with procpid 3728 has the SQL query for my "main" thread--the one that reads the 12,000,000 rows one by one. procpid 6272 was handling the queries from the ~22 threads, although at the time this was taken, it was idle. But if I monitor it, I can see the queries of tables B and C going through it.

I am not clear what to read into pg_locks except that the "main" thread (3728's query) sure has a lot of locks! But all 3728 is doing is reading rows from table A, nothing else.

Aren

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
> Your OS won't *see* eight processors if you turn off HT.  :-)
>
> I'm going to pursue this digression just a little further, because
> it probably will be biting you sooner or later.  We make sure to
> configure the BIOS on our database servers to turn off
> hyperthreading.  It really can make a big difference in performance.

OK, OK, I need to admit that this is a Core i7 720QM on an HP Envy 14 laptop. :-) There is no BIOS option to disable HT.

I am a doctoral student (but married with kids, about 5-10 years over traditional doctoral student age) and am trying to speed up some of my data analysis with parallelism. Right now the current operation, if run in series, takes 30 hours and only stresses one of the 8 (fake) cores. I'd rather see something that maximizes CPU use, provided that it doesn't overwhelm I/O.

Aren

Re: Postgres refusing to use >1 core

From
Scott Marlowe
Date:
On Mon, May 9, 2011 at 8:15 PM, Aren Cambre <aren@arencambre.com> wrote:
>> Your OS won't *see* eight processors if you turn off HT.  :-)
>> I'm going to pursue this digression just a little further, because
>> it probably will be biting you sooner or later.  We make sure to
>> configure the BIOS on our database servers to turn off
>> hyperthreading.  It really can make a big difference in performance.
>
> OK, OK, I need to admit that this is a Core i7 720QM on an HP Envy 14
> laptop. :-) There is no BIOS option to disable HT.
> I am a doctoral student (but married with kids, about 5-10 years over
> traditional doctorate student age) and am trying to speed up some of my data
> analysis with parallelism. Right now the current operation, if run in series,
> takes 30 hours and only stresses one of the 8 (fake) cores. I'd rather see
> something that maximizes CPU use, provided that it doesn't overwhelm I/O.

The easiest way to use more cores is to just partition the data you
want to work on into 4 or more chunks and launch that many
multi-threaded processes at once.
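
For example, each process could claim a disjoint slice of the table by
hashing the key (a sketch; table_a and key_col are placeholders for your
real names):

  SELECT * FROM table_a
  WHERE abs(hashtext(key_col)) % 4 = 0;   -- worker 0 of 4; others use 1, 2, 3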

Re: Postgres refusing to use >1 core

From
Merlin Moncure
Date:
On Mon, May 9, 2011 at 10:15 PM, Aren Cambre <aren@arencambre.com> wrote:
>> Your OS won't *see* eight processors if you turn off HT.  :-)
>> I'm going to pursue this digression just a little further, because
>> it probably will be biting you sooner or later.  We make sure to
>> configure the BIOS on our database servers to turn off
>> hyperthreading.  It really can make a big difference in performance.
>
> OK, OK, I need to admit that this is a Core i7 720QM on an HP Envy 14
> laptop. :-) There is no BIOS option to disable HT.
> I am a doctoral student (but married with kids, about 5-10 years over
> traditional doctorate student age) and am trying to speed up some of my data
> analysis with parallelism. Right now the current operation, if run in series,
> takes 30 hours and only stresses one of the 8 (fake) cores. I'd rather see
> something that maximizes CPU use, provided that it doesn't overwhelm I/O.
> Aren

how are you reading through the table? if you are using OFFSET, you
owe me a steak dinner.

merlin

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
> how are you reading through the table? if you are using OFFSET, you
> owe me a steak dinner.


Nope. :-)

Below is my exact code for the main thread. The C# PLINQ statement is the AsParallel() query inside the foreach. Let me know if I can help to explain this.

            NpgsqlConnection arrestsConnection = new NpgsqlConnection(Properties.Settings.Default.dbConnectionString);
            arrestsConnection.Open();

            /// First clear out the geocoding table
            NpgsqlCommand geocodingTableClear = new NpgsqlCommand("TRUNCATE raw.\"TxDPS geocoding\"", arrestsConnection);
            geocodingTableClear.ExecuteNonQuery();

            NpgsqlDataReader arrests = new NpgsqlCommand("SELECT * FROM \"raw\".\"TxDPS all arrests\"", arrestsConnection).ExecuteReader();

            /// Based on the pattern defined at
            /// http://social.msdn.microsoft.com/Forums/en-US/parallelextensions/thread/2f5ce226-c500-4899-a923-99285ace42ae.
            foreach (IDataRecord arrest in
                from row in arrests.AsParallel().Cast<IDataRecord>()
                select row)
            {
                Geocoder geocodeThis = new Geocoder(arrest);
                geocodeThis.Geocode();
            }

            arrestsConnection.Close();

Aren

Re: Postgres refusing to use >1 core

From
Craig Ringer
Date:
On 10/05/11 10:40, Aren Cambre wrote:
>     how are you reading through the table? if you are using OFFSET, you
>     owe me a steak dinner.
>
>
> Nope. :-)
>
> Below is my exact code for the main thread. The C# PLINQ statement is
> highlighted. Let me know if I can help to explain this.

Looking at that code, I can't help but wonder why you're not doing it
server side in batches. In general, it's really inefficient to use this
pattern:

rows = runquery("select * from something");
for (item in rows) {
  // do something with item
}

Adding implicit parallelism within the loop won't help you much if
client-side CPU use isn't your limitation. If each computation done on
"item" is very expensive in client-side CPU this pattern makes sense,
but otherwise should be avoided in favour of grabbing big chunks of rows
and processing them all at once in batch SQL statements that let the
database plan I/O efficiently.

Even if you're going to rely on client-side looping - say, because of
complex or library-based computation that must be done for each record -
you must ensure that EACH THREAD HAS ITS OWN CONNECTION, whether that's
a new connection established manually or one grabbed from an appropriate
pool. Your code below shows no evidence of that at all; you're probably
sharing one connection between all the threads, achieving no real
parallelism whatsoever.

Try limiting your parallel invocation to 4 threads (since that's the number
of cores you have) and making sure each has its own connection. In your
case, that probably means having a new Geocoder instance grab a
connection from a pool that contains at least 5 connections (one per
Geocoder, plus the main connection).
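
Concretely -- and treat this as a sketch, since C# isn't my language; the
two-argument Geocoder constructor taking plain values instead of the live
reader row is my assumption, not your API -- the loop might become:

using System.Collections.Generic;
using System.Threading.Tasks;
using Npgsql;

// ... replacing the PLINQ foreach in your main method:
var rows = new List<object[]>();
while (arrests.Read())
{
    var row = new object[arrests.FieldCount];
    arrests.GetValues(row);            // detach the data from the reader
    rows.Add(row);
}

Parallel.ForEach(rows,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },   // one per real core
    row =>
    {
        // Each worker rents its OWN connection from the Npgsql pool;
        // nothing is shared with the main thread's connection.
        using (var conn = new NpgsqlConnection(
            Properties.Settings.Default.dbConnectionString))
        {
            conn.Open();
            new Geocoder(conn, row).Geocode();
        }
    });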

It also looks - though I don't know C# and npgsql so I can't be sure -
like you're passing some kind of query result object to the Geocoder.
Avoid that, because they might be using the connection to progressively
read data behind the scenes in which case you might land up having
locking issues, accidentally serializing your parallel work on the
single main connection, etc. Instead, retrieve the contents of the
IDataRecord (whatever that is) and pass that to the new Geocoder
instance, so the new Geocoder has *absolutely* *no* *link* to the
arrestsConnection and cannot possibly depend on it accidentally.

Even better, use a server-side work queue implementation like pgq, and
have each worker use its private connection to ask the server for the
next record to process when it's done with the previous one, so you
don't need a co-ordinating queue thread in your client side at all. You
can also optionally make your client workers independent processes
rather than threads that way, which simplifies debugging and resource
management.

--
Craig Ringer

Re: Postgres refusing to use >1 core

From
Merlin Moncure
Date:
On Mon, May 9, 2011 at 9:40 PM, Aren Cambre <aren@arencambre.com> wrote:
>> how are you reading through the table? if you are using OFFSET, you
>> owe me a steak dinner.
>>
>
> Nope. :-)
> Below is my exact code for the main thread. The C# PLINQ statement is
> highlighted. Let me know if I can help to explain this.
>
>             NpgsqlConnection arrestsConnection = new
> NpgsqlConnection(Properties.Settings.Default.dbConnectionString);
>             arrestsConnection.Open();
>
>             /// First clear out the geocoding table
>             NpgsqlCommand geocodingTableClear = new NpgsqlCommand("TRUNCATE
> raw.\"TxDPS geocoding\"", arrestsConnection);
>             geocodingTableClear.ExecuteNonQuery();
>
>             NpgsqlDataReader arrests = new NpgsqlCommand("SELECT * FROM
> \"raw\".\"TxDPS all arrests\"", arrestsConnection).ExecuteReader();
>
>             /// Based on the pattern defined at
>             /// http://social.msdn.microsoft.com/Forums/en-US/parallelextensions/thread/2f5ce226-c500-4899-a923-99285ace42ae.
>             foreach (IDataRecord arrest in
>                 from row in arrests.AsParallel().Cast<IDataRecord>()
>                 select row)
>             {
>                 Geocoder geocodeThis = new Geocoder(arrest);
>                 geocodeThis.Geocode();
>             }
>
>             arrestsConnection.Close();


hm. I'm not exactly sure.  how about turning on statement level
logging on the server for a bit and seeing if any unexpected queries
are being generated and sent to the server.
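
something like this (or set log_statement = 'all' in postgresql.conf and
reload), then watch the log:

  ALTER DATABASE de SET log_statement = 'all';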

merlin

Re: Postgres refusing to use >1 core

From
Craig Ringer
Date:
On 05/11/2011 05:34 AM, Aren Cambre wrote:

> Using one thread, the app can do about 111 rows per second, and it's
> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
> 111 rows per second ~= 30 hours.
>
> I hoped to speed things up with some parallel processing.
>
> When the app is multithreaded, the app itself consumes about 3% CPU time
> waiting for Postgres, which is only hammering 1 core and barely
> exercising disk I/O (per two programs and HDD light).

OK, so before looking at parallelism, you might want to look at why
you're not getting much out of Pg and your app with even one thread. You
should be able to put a high load on the disk - or one cpu core -
without needing to split out work into multiple threads and parallel
workers.

I suspect your app is doing lots of tiny single-row queries instead of
efficiently batching things. It'll be wasting huge amounts of time
waiting for results. Even if every query is individually incredibly
fast, with the number of them you seem to be doing you'll lose a LOT of
time if you loop over lots of little SELECTs.

The usual cause of the kind of slow performance you describe is an app
that "chats" with the database continuously, so its pattern is:

loop:
  ask for row from database using SELECT
  retrieve result
  do a tiny bit of processing
  continue loop

This is incredibly inefficient, because Pg is always waiting for the app
to ask for something or the app is waiting for Pg to return something.
During each switch there are delays and inefficiencies. It's actually:


loop:
  ask for a single row from database using SELECT
  [twiddle thumbs while database plans and executes the query]
  retrieve result
  do a tiny bit of processing   [Pg twiddles its thumbs]
  continue loop

What you want is your app and Pg working at the same time.

Assuming that CPU is the limitation rather than database speed and disk
I/O I'd use something like this:

Thread 1:
  get cursor for selecting all rows from database
  loop:
     get 100 rows from cursor
     add rows to processing queue
     if queue contains over 1000 rows:
         wait until queue contains less than 1000 rows

Thread 2:
  until there are no more rows:
    ask Thread 1 for 100 rows
    for each row:
       do a tiny bit of processing


By using a producer/consumer model like that you can ensure that thread
1 is always talking to the database, keeping Pg busy, and thread 2 is
always working the CPUs. The two threads should share NOTHING except the
queue to keep the whole thing simple and clean. You must make sure that
the "get 100 rows" operation of the producer can happen even while the
producer is in the middle of getting some more rows from Pg (though not
necessarily in the middle of actually appending them to the queue data
structure) so you don't accidentally serialize on access to the producer
thread.

If the single producer thread can't keep up, try reading in bigger batches
or adding more producer threads with a shared queue. If the single
consumer thread can't keep up with the producer, add more consumers to
use more CPU cores.

[producer 1] [producer 2] [...] [producer n]
    |           |          |        |
    ---------------------------------
                     |
                   queue
                     |
    ---------------------------------
    |          |        |           |
[worker 1] [worker 2] [...]   [worker n]

... or you can have each worker fetch its own chunks of rows (getting
rid of the producer/consumer split) using its own connection and just
have lots more workers to handle all the wasted idle time. A
producer/consumer approach will probably be faster, though.
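
A skeleton of that split in C#, since that's what you're using -- though I
don't write C#, so treat the shape, not the details, as the point.
BlockingCollection is .NET 4's bounded queue, and ProcessRow stands in for
your per-ticket work:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using Npgsql;

class Pipeline
{
    public static void Run()
    {
        // Bounded queue: the producer blocks once 1000 rows are waiting.
        var queue = new BlockingCollection<object[]>(1000);

        var producer = Task.Factory.StartNew(() =>
        {
            using (var conn = new NpgsqlConnection(
                Properties.Settings.Default.dbConnectionString))
            {
                conn.Open();
                using (var cmd = new NpgsqlCommand(
                    "SELECT * FROM \"raw\".\"TxDPS all arrests\"", conn))
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Copy the row out so consumers never touch
                        // the producer's connection.
                        var row = new object[reader.FieldCount];
                        reader.GetValues(row);
                        queue.Add(row);
                    }
                }
            }
            queue.CompleteAdding();
        });

        // One consumer per physical core, each with its OWN connection.
        var consumers = new List<Task>();
        for (int i = 0; i < 4; i++)
        {
            consumers.Add(Task.Factory.StartNew(() =>
            {
                using (var conn = new NpgsqlConnection(
                    Properties.Settings.Default.dbConnectionString))
                {
                    conn.Open();
                    foreach (var row in queue.GetConsumingEnumerable())
                        ProcessRow(conn, row);
                }
            }));
        }

        producer.Wait();
        Task.WaitAll(consumers.ToArray());
    }

    static void ProcessRow(NpgsqlConnection conn, object[] row)
    {
        /* the per-ticket SELECTs and final INSERT go here */
    }
}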

If the consumer threads produce a result that must be sent back to the
database, you can either have each thread write it to the database using
its own connection when it's done, or you can have them delegate that
work to another thread that's dedicated to INSERTing the results. If the
INSERTer can't keep up, guess what, you spawn more of them working off a
shared queue.

If the consumer threads require additional information from the database
to do their work, make sure they avoid the:

loop:
  fetch one row
  do work on row

pattern, instead fetching sets of rows from the database in batches. Use
joins if necessary, or the IN() criterion.

--
Craig Ringer

Re: Postgres refusing to use >1 core

From
Craig Ringer
Date:
On 11/05/11 05:34, Aren Cambre wrote:

> Using one thread, the app can do about 111 rows per second, and it's
> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
> 111 rows per second ~= 30 hours.

I don't know how I missed that. You ARE maxing out one cpu core, so
you're quite right that you need more threads unless you can make your
single worker more efficient.

Why not just spawn more copies of your program and have them work on
ranges of the data, though? Might that not be simpler than juggling
threading schemes?

--
Craig Ringer

Re: Postgres refusing to use >1 core

From
Scott Marlowe
Date:
On Tue, May 10, 2011 at 7:35 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 11/05/11 05:34, Aren Cambre wrote:
>
>> Using one thread, the app can do about 111 rows per second, and it's
>> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
>> 111 rows per second ~= 30 hours.
>
> I don't know how I missed that. You ARE maxing out one cpu core, so
> you're quite right that you need more threads unless you can make your
> single worker more efficient.
>
> Why not just spawn more copies of your program and have them work on
> ranges of the data, though? Might that not be simpler than juggling
> threading schemes?

I suggested that earlier.  But now I'm wondering if there are
efficiencies to be gained by moving all the heavy lifting to the db as
well as splitting things into multiple partitions to work on.  I.e.
don't grab 1,000 rows and work on them on the client side and then
insert data, do the data mangling in the query in the database.  My
experience has been that moving things like this into the database can
result in performance gains of several factors, taking hour long
processes and making them run in minutes.

Re: Postgres refusing to use >1 core

From
"Pierre C"
Date:
> I suspect your app is doing lots of tiny single-row queries instead of
> efficiently batching things. It'll be wasting huge amounts of time
> waiting for results. Even if every query is individually incredibly
> fast, with the number of them you seem to be doing you'll lose a LOT of
> time if you loop over lots of little SELECTs.

Using unix sockets, you can expect about 10,000-20,000 queries/s on small
simple selects per core, which is quite a feat. TCP adds overhead, so it's
slower. Over a network, add ping time.

In plpgsql code, you avoid roundtrips, data serializing, and context
switches; it can be 2-4x faster.

But a big SQL query can process millions of rows/s, it is much more
efficient.

Re: Postgres refusing to use >1 core

From
Shaun Thomas
Date:
On 05/10/2011 11:26 PM, Scott Marlowe wrote:

> I.e. don't grab 1,000 rows and work on them on the client side and
> then insert data, do the data mangling in the query in the database.
> My experience has been that moving things like this into the database
> can result in performance gains of several factors, taking hour long
> processes and making them run in minutes.

This is a problem I encounter constantly wherever I go. Programmer
selects millions of rows from giant table. Programmer loops through
results one by one doing some magic on them. Programmer submits queries
back to the database. Even in batches, that's going to take ages.

Databases are beasts at set-based operations. If the programmer can
build a temp table of any kind and load that, they can turn their
update/insert/whatever into a simple JOIN that runs several orders of
magnitude faster. Going the route of parallelism will probably work too,
but I doubt it's the right solution in this case.

When there are tables with millions of rows involved, processing 111 per
second is a bug. Even with ten perfectly balanced threads, 30 hours only
becomes three. On decent hardware, you can probably drop, reload, and
index the entire table faster than that.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Postgres refusing to use >1 core

From
"Pierre C"
Date:
> This is a problem I encounter constantly wherever I go. Programmer
> selects millions of rows from giant table. Programmer loops through
> results one by one doing some magic on them. Programmer submits queries
> back to the database. Even in batches, that's going to take ages.

Reminds me of a recent question on stackoverflow:

http://stackoverflow.com/questions/5952020/how-to-optimize-painfully-slow-mysql-query-that-finds-correlations

And the answer:

http://stackoverflow.com/questions/5952020/how-to-optimize-painfully-slow-mysql-query-that-finds-correlations/5954041#5954041

OP was thinking "row-based", with subqueries in the role of "doing some
magic".
Using a set-based solution with cascading WITH CTEs (and using the
previous CTE as a source in the next one for aggregation) => 100x speedup!
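
The shape of it, as a generic sketch (not the actual SO schema): each CTE
feeds the next, so the whole computation stays one set-based statement:

WITH per_user AS (
    SELECT user_id, count(*) AS n
    FROM events
    GROUP BY user_id
), stats AS (
    SELECT avg(n) AS avg_n, stddev(n) AS sd_n
    FROM per_user
)
SELECT p.user_id, p.n
FROM per_user p, stats s
WHERE p.n > s.avg_n + 2 * s.sd_n;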

Re: Postgres refusing to use >1 core

From
Date:
---- Original message ----
>Date: Wed, 11 May 2011 11:04:49 -0500
>From: pgsql-performance-owner@postgresql.org (on behalf of Shaun Thomas <sthomas@peak6.com>)
>Subject: Re: [PERFORM] Postgres refusing to use >1 core
>To: Scott Marlowe <scott.marlowe@gmail.com>
>Cc: Craig Ringer <craig@postnewspapers.com.au>,Aren Cambre <aren@arencambre.com>,<pgsql-performance@postgresql.org>
>
>On 05/10/2011 11:26 PM, Scott Marlowe wrote:
>
>> I.e. don't grab 1,000 rows and work on them on the client side and
>> then insert data, do the data mangling in the query in the database.
>> My experience has been that moving things like this into the database
>> can result in performance gains of several factors, taking hour long
>> processes and making them run in minutes.
>
>This is a problem I encounter constantly wherever I go. Programmer
>selects millions of rows from giant table. Programmer loops through
>results one by one doing some magic on them. Programmer submits queries
>back to the database. Even in batches, that's going to take ages.
>
>Databases are beasts at set-based operations. If the programmer can
>build a temp table of any kind and load that, they can turn their
>update/insert/whatever into a simple JOIN that runs several orders of
>magnitude faster. Going the route of parallelism will probably work too,
>but I doubt it's the right solution in this case.
>
>When there are tables with millions of rows involved, processing 111 per
>second is a bug. Even with ten perfectly balanced threads, 30 hours only
>becomes three. On decent hardware, you can probably drop, reload, and
>index the entire table faster than that.
>
>--
>Shaun Thomas
>OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
>312-676-8870
>sthomas@peak6.com
>

So, the $64 question:  how did you find an engagement where, to bend
Shakespeare, "first thing we do, is kill all the coders" isn't required?
This RBAR mentality, abetted by xml/NoSql/xBase, is utterly pervasive.
They absolutely refuse to learn anything different from the COBOL/VSAM
messes of their grandfathers; well, modulo syntax, of course.  The mere
suggestion, in my experience, that doing things faster with fewer lines
of code/statements in the engine is met with overt hostility.

Regards,
Robert

Re: Postgres refusing to use >1 core

From
Scott Marlowe
Date:
On Wed, May 11, 2011 at 1:53 PM,  <gnuoytr@rcn.com> wrote:

> So, the $64 question:  how did you find an engagement where, to bend
> Shakespeare, "first thing we do, is kill all the coders" isn't required?
> This RBAR mentality, abetted by xml/NoSql/xBase, is utterly pervasive.
> They absolutely refuse to learn anything different from the COBOL/VSAM
> messes of their grandfathers; well, modulo syntax, of course.  The mere
> suggestion, in my experience, that doing things faster with fewer lines
> of code/statements in the engine is met with overt hostility.

It really depends.  For a lot of development scaling to large numbers
of users is never needed, and it's often more economical to develop
quickly with a less efficient database layer.  In my last job all our
main development was against a large transactional / relational db.
But some quick and dirty internal development used some very
inefficient MVC methods but it only had to handle 45 users at a time,
max, and that was 45 users who accessed the system a few minutes at a
time.

I've seen EAV (entity-attribute-value) systems that people tried to scale
that were handling thousands of queries a second which, when converted to
real relational dbs, needed only dozens of queries a second to run,
required a fraction of the db horsepower, and could scale to the same
number of users with only 1/10th to 1/100th the database underneath them.
In those instances, you only have to show the much higher efficiency to
the people who pay for the database servers.

Re: Postgres refusing to use >1 core

From
Shaun Thomas
Date:
On 05/11/2011 02:53 PM, gnuoytr@rcn.com wrote:

> So, the $64 question:  how did you find an engagement where, to bend
> Shakespeare, "first thing we do, is kill all the coders" isn't
> required?

It's just one of those things you have to explain. Not just how to fix
it, but *why* doing so fixes it. It's also not really a fair expectation
in a lot of ways. Even when a coder uses all SQL, their inexperience in
the engine can still ruin performance. We spend years getting to know
PostgreSQL, or just general DB techniques. They do the same with coding.
And unless they're a developer for a very graphics intensive project,
they're probably not well acquainted with set theory.

Just today, I took a query like this:

   UPDATE customer c
      SET c.login_counter = a.counter
     FROM (SELECT session_id, count(*) as counter
             FROM session
            WHERE date_created >= CURRENT_DATE
            GROUP BY session_id) a
    WHERE c.process_date = CURRENT_DATE
      AND c.customer_id = a.session_id

And suggested this instead:

   CREATE TEMP TABLE tmp_login_counts AS
   SELECT session_id, count(1) AS counter
     FROM auth_token_arc
    WHERE date_created >= CURRENT_DATE
    GROUP BY session_id

   UPDATE reporting.customer c
      SET login_counter = a.counter
     FROM tmp_login_counts a
    WHERE c.process_date = CURRENT_DATE
      AND c.customer_id = a.session_id

The original query, with our very large tables, ran for over *two hours*
thanks to a nested loop iterating over the subquery. My replacement ran
in roughly 30 seconds. If we were using a newer version of PG, we could
have used a CTE. But do you get what I mean? Temp tables are a fairly
common technique, but how would a coder know about CTEs? They're pretty
new, even to *us*.
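
For the record, the CTE version would look about like this -- though
hanging a WITH on an UPDATE needs 9.1 by my reading, one more reason the
temp table wins on our 8.2:

   WITH a AS (
       SELECT session_id, count(1) AS counter
         FROM auth_token_arc
        WHERE date_created >= CURRENT_DATE
        GROUP BY session_id
   )
   UPDATE reporting.customer c
      SET login_counter = a.counter
     FROM a
    WHERE c.process_date = CURRENT_DATE
      AND c.customer_id = a.session_id;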

We hold regular Lunch'n'Learns for our developers to teach them the
good/bad of what they're doing, and that helps significantly. Even hours
later, I see them using the techniques I showed them. The one I'm
presenting soon is entitled '10 Ways to Ruin Performance' and they're
all specific examples taken from day-to-day queries and jobs here, all
from different categories of mistake. It's just a part of being a good DBA.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Postgres refusing to use >1 core

From
Date:
---- Original message ----
>Date: Wed, 11 May 2011 17:04:50 -0500
>From: pgsql-performance-owner@postgresql.org (on behalf of Shaun Thomas <sthomas@peak6.com>)
>Subject: Re: [PERFORM] Postgres refusing to use >1 core
>To: <gnuoytr@rcn.com>
>Cc: Scott Marlowe <scott.marlowe@gmail.com>,Craig Ringer <craig@postnewspapers.com.au>,Aren Cambre
<aren@arencambre.com>,<pgsql-performance@postgresql.org>
>
>On 05/11/2011 02:53 PM, gnuoytr@rcn.com wrote:
>
>> So, the $64 question:  how did you find an engagement where, to bend
>> Shakespeare, "first thing we do, is kill all the coders" isn't
>> required?
>
>It's just one of those things you have to explain. Not just how to fix
>it, but *why* doing so fixes it. It's also not really a fair expectation
>in a lot of ways. Even when a coder uses all SQL, their inexperience in
>the engine can still ruin performance. We spend years getting to know
>PostgreSQL, or just general DB techniques. They do the same with coding.
>And unless they're a developer for a very graphics intensive project,
>they're probably not well acquainted with set theory.
>
>Just today, I took a query like this:
>
>   UPDATE customer c
>      SET c.login_counter = a.counter
>     FROM (SELECT session_id, count(*) as counter
>             FROM session
>            WHERE date_created >= CURRENT_DATE
>            GROUP BY session_id) a
>    WHERE c.process_date = CURRENT_DATE
>      AND c.customer_id = a.session_id
>
>And suggested this instead:
>
>   CREATE TEMP TABLE tmp_login_counts AS
>   SELECT session_id, count(1) AS counter
>     FROM auth_token_arc
>    WHERE date_created >= CURRENT_DATE
>    GROUP BY session_id
>
>   UPDATE reporting.customer c
>      SET login_counter = a.counter
>     FROM tmp_login_counts a
>    WHERE c.process_date = CURRENT_DATE
>      AND c.customer_id = a.session_id
>
>The original query, with our very large tables, ran for over *two hours*
>thanks to a nested loop iterating over the subquery. My replacement ran
>in roughly 30 seconds. If we were using a newer version of PG, we could
>have used a CTE. But do you get what I mean? Temp tables are a fairly
>common technique, but how would a coder know about CTEs? They're pretty
>new, even to *us*.
>
>We hold regular Lunch'n'Learns for our developers to teach them the
>good/bad of what they're doing, and that helps significantly. Even hours
>later, I see them using the techniques I showed them. The one I'm
>presenting soon is entitled '10 Ways to Ruin Performance' and they're
>all specific examples taken from day-to-day queries and jobs here, all
>from different categories of mistake. It's just a part of being a good DBA.
>
>--
>Shaun Thomas
>OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
>312-676-8870
>sthomas@peak6.com
>

You're (both) fortunate to have Suits and colleagues who are open to doing this A Better Way.  Bless you.

Regards,
Robert

Re: Postgres refusing to use >1 core

From
Josh Berkus
Date:
On 5/11/11 3:04 PM, Shaun Thomas wrote:
> The original query, with our very large tables, ran for over *two hours*
> thanks to a nested loop iterating over the subquery. My replacement ran
> in roughly 30 seconds. If we were using a newer version of PG, we could
> have used a CTE. But do you get what I mean? Temp tables are a fairly
> common technique, but how would a coder know about CTEs? They're pretty
> new, even to *us*.

For that matter, it would be even better if PostgreSQL realized that a
materialize of the subquery was a better execution plan, and just did it
for you.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
> I suspect your app is doing lots of tiny single-row queries instead of
> efficiently batching things. It'll be wasting huge amounts of time
> waiting for results. Even if every query is individually incredibly
> fast, with the number of them you seem to be doing you'll lose a LOT of
> time if you loop over lots of little SELECTs.

So here's what's going on.

I have a table of about 12,000,000 traffic tickets written by the Texas Department of Public Safety (TxDPS). Each ticket has a route name and a reference marker. On Interstate highways, reference marker = mile post. On all other roads, from US highways down to Farm to Market roads, the reference marker is based on a grid superimposed over the state. Basically that reference marker increments as the road crosses a grid line, so unless the road is perfectly N-S or E-W, these reference markers are more than a mile apart.

I have a separate table with data from the Texas Department of Transportation (TxDOT). It is a database of almost all the state's reference markers, along with latitude and longitude for each.

I am trying to geolocate each ticket by correlating the ticket's route/reference marker to the same in the TxDOT database. And it's not straightforward for a few reasons:

1. TxDPS and TxDOT formats are different.

TxDPS uses 1-5 to denote route type. 1 = Interstate. 2 = US or any state highway except Farm to Market. 3 = Farm to Market, 4 = county road, 5 = local road. So if the route name is 0071 and route type is 2, it could mean US 71 or TX 71, both of which really exist in Texas but are in different parts of the state.

I haven't proven it yet, but it is possible that no two routes of the same number are in the same county. You wouldn't find both TX 71 and US 71 in the same county.

For now, I am looking up the TxDOT database based on route type, name, and county, and I may need to repeat the lookup until I get a match.

In the above example, if the ticket is written for route_name = 0071, route_type = 2, and county = 206, then I need to do searches against the TxDOT database for:
  1. rte_nm = 'US71' AND county_num='206'
  2. rte_nm = 'SH71' AND county_num='206'
  3. rte_nm = 'UA71' AND county_num='206'
  4. rte_nm = 'UP71' AND county_num='206'
  5. ...
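In principle those probes could collapse into a single query with an IN list -- a sketch, with txdot_markers standing in for my real TxDOT table name:

  SELECT *
  FROM txdot_markers
  WHERE county_num = '206'
    AND rte_nm IN ('US71', 'SH71', 'UA71', 'UP71');

But for now I repeat the single-route lookup until something matches.
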
2. Not all TxDPS reference markers correspond to TxDOT reference markers.

Now, if I've matched a route, I have to find the reference marker.

The TxDOT database is pretty good but not 100% complete, so some TxDPS tickets' reference markers may not exist in the TxDOT table. Plus, it's possible that some TxDPS tickets have the wrong marker.

To compensate, I am looking for the closest reference marker along the route that is not more than 50 marker units away, either direction. I've again implemented that with multiple queries, where I don't stop until I find a match. Suppose I am searching for reference marker 256 on TX 71. The queries will be like this:
  1. rte_nm = 'SH71' AND rm = '256' (base marker)
  2. rte_nm = 'SH71' AND rm = '257' (+1)
  3. rte_nm = 'SH71' AND rm = '255' (-1)
  4. rte_nm = 'SH71' AND rm = '258' (+2)
  5. rte_nm = 'SH71' AND rm = '254' (-2)
  6. ...
  7. rte_nm = 'SH71' AND rm = '306' (+50)
  8. rte_nm = 'SH71' AND rm = '206' (-50)
Assuming a matching route name was found in the prior step, the app will have 1 to 101 of these queries for each ticket.

Assuming steps 1 and 2 above worked out, now I have a reference marker. So I write to a third table that has four columns:
  1. HA_Arrest_Key (varchar(18) that refers back to the TxDPS tickets table)
  2. gid (integer that refers to the unique identity of the reference marker in the TxDOT table)
  3. distance (integer that is the distance, in reference markers, between that noted in the TxDPS ticket and the nearest marker found in the TxDOT table)
  4. hasLatLong (Boolean that is true if TxDPS also recorded latitude and longitude for the ticket, presumably from an in-car device. These don't appear to be that accurate, plus a substantial portion of tickets have no lat/long.)
Right now, I am doing a separate INSERT for each of the 12,000,000 rows inserted into this table.
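
If the inserts turn out to matter, they could presumably be batched with multi-row VALUES instead -- a sketch with made-up values, in the column order above:

  INSERT INTO raw.C VALUES
      ('AR000000000000001', 58210, 0, false),
      ('AR000000000000002', 58211, 3, true);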

I guess the app is chatty like you suggest? HOWEVER, if I am reading system activity correctly, the master thread that is going through the 12,000,000 tickets appears to have its own Postgres process, and based on how quickly RAM usage shoots up during the first ~60 seconds or so that the app runs, it may be reading all these rows into memory. But I am consulting with Npgsql developers separately to make sure I am really understanding correctly. They suspect that the PLINQ stuff (basically "multithreading in a can") may not be dispatching threads as expected because it may be misreading things.

> By using a producer/consumer model like that you can ensure that thread
> 1 is always talking to the database, keeping Pg busy, and thread 2 is
> always working the CPUs.

Thanks for the example and illustration.

> ... or you can have each worker fetch its own chunks of rows (getting
> rid of the producer/consumer split) using its own connection and just
> have lots more workers to handle all the wasted idle time. A
> producer/consumer approach will probably be faster, though.

That's what PLINQ is supposed to do. In theory. :-) Working with Npgsql folks to see if something is tripping it up.

Aren

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
>> Using one thread, the app can do about 111 rows per second, and it's
>> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
>> 111 rows per second ~= 30 hours.
>
> I don't know how I missed that. You ARE maxing out one cpu core, so
> you're quite right that you need more threads unless you can make your
> single worker more efficient.

And the problem is my app already has between 20 and 30 threads. Something about C#'s PLINQ may not be working as intended...

Aren

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:

>> I suspect your app is doing lots of tiny single-row queries instead of
>> efficiently batching things. It'll be wasting huge amounts of time
>> waiting for results. Even if every query is individually incredibly
>> fast, with the number of them you seem to be doing you'll lose a LOT of
>> time if you loop over lots of little SELECTs.
>
> Using unix sockets, you can expect about 10,000-20,000 queries/s on small simple selects per core, which is quite a feat. TCP adds overhead, so it's slower. Over a network, add ping time.

I'm talking to a Postgres on localhost, so in theory, I ought to be getting really good throughput, but again, the problem may be with the way C#'s PLINQ "multithreading in a can" is managing things.

Aren 

Re: Postgres refusing to use >1 core

From
Scott Marlowe
Date:
On Wed, May 11, 2011 at 9:20 PM, Aren Cambre <aren@arencambre.com> wrote:
>> Using unix sockets, you can expect about 10-20.000 queries/s on small
>> simple selects per core, which is quite a feat. TCP adds overhead, so it's
>> slower. Over a network, add ping time.
>
> I'm talking to a Postgres on localhost, so in theory, I ought to be getting
> really good throughput, but again, the problem may be with the way C#'s
> PLINQ "multithreading in a can" is managing things.

local tcp is gonna be slower than unix sockets, not faster.
 But the big issue is that you need to explore doing the work in a
large set, not iteratively.  Operations on sets are often much faster
in aggregate.

Re: Postgres refusing to use >1 core

From
David Boreham
Date:
On 5/11/2011 9:17 PM, Aren Cambre wrote:
>
> So here's what's going on.
>
<snip>

If I were doing this, considering the small size of the data set, I'd
read all the data into memory, process it entirely in memory (with
threads to saturate all the processors you have), and then write the
results to the DB.



Re: Postgres refusing to use >1 core

From
Vitalii Tymchyshyn
Date:
12.05.11 06:18, Aren Cambre wrote:
>>> Using one thread, the app can do about 111 rows per second, and it's
>>> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
>>> 111 rows per second ~= 30 hours.
>>
>> I don't know how I missed that. You ARE maxing out one cpu core, so
>> you're quite right that you need more threads unless you can make your
>> single worker more efficient.
>
> And the problem is my app already has between 20 and 30 threads. Something about C#'s PLINQ may not be working as intended...

Have you checked that you are really doing fetch and processing in parallel? Dunno about C#, but under Java you have to make specific settings (e.g. setFetchSize) or the driver will fetch all the data when the query runs. Check the time needed to fetch the first row from the query.

Best regards, Vitalii Tymchyshyn

Re: Postgres refusing to use >1 core

From
Michael Graham
Date:
On Wed, 2011-05-11 at 17:04 -0500, Shaun Thomas wrote:
> We hold regular Lunch'n'Learns for our developers to teach them the
> good/bad of what they're doing, and that helps significantly. Even
> hours later, I see them using the techniques I showed them. The one
> I'm presenting soon is entitled '10 Ways to Ruin Performance' and
> they're all specific examples taken from day-to-day queries and jobs
> here, all from different categories of mistake. It's just a part of
> being a good DBA.

Do you happen to produce slides for these lunch n learns or are they
more informal than that?  I guess you can work out where I'm going with
this ;)

--
Michael Graham <mgraham@bloxx.com>



Re: Postgres refusing to use >1 core

From
Shaun Thomas
Date:
On 05/12/2011 03:30 AM, Michael Graham wrote:

> Do you happen to produce slides for these lunch n learns or are they
> more informal than that?  I guess you can work out where I'm going with
> this ;)

Oh of course. I use rst2s5 for my stuff, so I have the slideshow and
also generate a PDF complete with several paragraphs of explanation I
distribute after the presentation itself. I have two of them now, but
I'll probably have a third in a couple months.

My next topic will probably be geared toward actual DBAs that might be
intermediate level. Things like, what happens to an OLAP server that
undergoes maintenance and experiences rapid (temporarily exponential)
TPS increase. How that can affect the disk subsystem, how to recover,
how to possibly bootstrap as a temporary fix, etc. Certainly things I
would have liked to know before seeing them. I'm going to call it "Your
Database Probably Hates You." ;)

I have a tendency to enjoy "stories from the field," and I've got more
than a few where I've saved a database from certain death. Sometimes
it's tweaking a few config settings, sometimes it's new hardware based
on system monitoring or allocation tests. Little things Senior DBAs
might know after experiencing them, or reading lists like this one.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Postgres refusing to use >1 core

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 5/11/11 3:04 PM, Shaun Thomas wrote:
>> The original query, with our very large tables, ran for over *two hours*
>> thanks to a nested loop iterating over the subquery. My replacement ran
>> in roughly 30 seconds. If we were using a newer version of PG, we could
>> have used a CTE. But do you get what I mean? Temp tables are a fairly
>> common technique, but how would a coder know about CTEs? They're pretty
>> new, even to *us*.

> For that matter, it would be even better if PostgreSQL realized that a
> materialize of the subquery was a better execution plan, and just did it
> for you.

It does.  I was a bit surprised that Shaun apparently got a plan that
didn't include a materialize step, because when I test a similar query
here, I get:
1. a hash join, until I turn off enable_hashjoin; then
2. a merge join, until I turn off enable_mergejoin; then
3. a nestloop with materialize on the subquery scan.
In 9.0 and up I can get a nestloop without materialize by also turning
off enable_material, but pre-9.0 there's no such option ...

            regards, tom lane

Re: Postgres refusing to use >1 core

From
Shaun Thomas
Date:
On 05/12/2011 09:51 AM, Tom Lane wrote:

> It does.  I was a bit surprised that Shaun apparently got a plan that
> didn't include a materialize step, because when I test a similar query
> here, I get:

Remember when I said "old version" that prevented us from using CTEs?
We're still on 8.2 (basically, we're waiting for EnterpriseDB 9.0).
It's basically calculating the group aggregation wrong, but is that
enough to trigger it to go nuts?

SELECT c.*
  FROM customer c
  JOIN (SELECT session_id, count(1) as counter
          FROM session
         WHERE date_created >= '2011-05-11 05:00'
           AND date_created < '2011-05-11 06:00'
           AND from_interface = 'UNKNOWN'
         GROUP BY 1) a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE - interval '1 day'
   AND c.row_out IS NULL;

So sayeth the planner:

 Nested Loop  (cost=167.49..2354.62 rows=6 width=237) (actual time=43.949..166858.604 rows=168 loops=1)
   ->  GroupAggregate  (cost=167.49..176.97 rows=2 width=8) (actual time=1.042..2.827 rows=209 loops=1)
         ->  Sort  (cost=167.49..170.64 rows=1260 width=8) (actual time=1.037..1.347 rows=230 loops=1)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..102.61 rows=1260 width=8) (actual time=0.044..0.690 rows=230 loops=1)
                     Index Cond: ((date_created >= '11-MAY-11 05:00:00'::timestamp without time zone) AND (date_created < '11-MAY-11 06:00:00'::timestamp without time zone))
                     Filter: ((from_interface)::text = 'UNKNOWN'::text)
   ->  Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1088.78 rows=3 width=237) (actual time=19.820..798.348 rows=1 loops=209)
         Index Cond: (c.customer_id = a.session_id)
         Filter: ((process_date = (('now'::text)::date - '@ 1 day'::interval)) AND (row_out IS NULL))
 Total runtime: 166859.040 ms

That one hour extract is much, much slower than this:

SELECT 1
  FROM customer c
  JOIN (SELECT session_id, count(*) as counter
            FROM session
           WHERE date_created >= '2011-05-08'
           GROUP BY 1) a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE
   AND c.row_out IS NULL;

Which gives this plan:

 Merge Join  (cost=244565.52..246488.78 rows=377 width=0) (actual time=1958.781..2385.667 rows=22205 loops=1)
   Merge Cond: (a.session_id = c.customer_id)
   ->  GroupAggregate  (cost=19176.22..20275.99 rows=271 width=8) (actual time=1142.179..1459.779 rows=26643 loops=1)
         ->  Sort  (cost=19176.22..19541.68 rows=146184 width=8) (actual time=1142.152..1374.328 rows=179006 loops=1)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..6635.51 rows=146184 width=8) (actual time=0.020..160.339 rows=179267 loops=1)
                     Index Cond: (date_created >= '08-MAY-11 00:00:00'::timestamp without time zone)
   ->  Sort  (cost=225389.30..225797.47 rows=163267 width=8) (actual time=816.585..855.459 rows=155067 loops=1)
         Sort Key: c.customer_id
         ->  Index Scan using idx_customer_rpt on customer c  (cost=0.00..211252.93 rows=163267 width=8) (actual time=0.037..90.337 rows=155067 loops=1)
               Index Cond: (process_date = '10-MAY-11 00:00:00'::timestamp without time zone)
               Filter: (row_out IS NULL)

But make the inner query slightly smaller, and...

 Nested Loop  (cost=13755.53..223453.98 rows=276 width=0)
   ->  GroupAggregate  (cost=13755.53..14558.26 rows=198 width=8)
         ->  Sort  (cost=13755.53..14022.28 rows=106700 width=8)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..4844.37 rows=106700 width=8)
                     Index Cond: (date_created >= '09-MAY-11 00:00:00'::timestamp without time zone)
   ->  Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1055.01 rows=1 width=8)
         Index Cond: (c.customer_id = a.session_id)
         Filter: ((process_date = '10-MAY-11 00:00:00'::timestamp without time zone) AND (row_out IS NULL))

I didn't want to wait two hours for that one to finish. ;) But the
stats are all pretty darn close, so far as I can tell. The only thing
that's off is the group-aggregate row estimate... by about two orders
of magnitude. So I chalked it up to 8.2 being relatively horrible and
punted to a temp table to trick the optimizer into doing it right, as
sketched below.
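
A rough sketch of that workaround, splitting the query above in two
(the temp-table name is invented for the example; the ANALYZE is there
because the temp table otherwise has no stats at all):

CREATE TEMP TABLE session_counts AS
SELECT session_id, count(1) AS counter
  FROM session
 WHERE date_created >= '2011-05-11 05:00'
   AND date_created < '2011-05-11 06:00'
   AND from_interface = 'UNKNOWN'
 GROUP BY 1;

ANALYZE session_counts;

SELECT c.*
  FROM customer c
  JOIN session_counts a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE - interval '1 day'
   AND c.row_out IS NULL;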

But my greater point was that even doing it all in SQL doesn't
always work, which we all know. Use of EXPLAIN abounds, but that
doesn't necessarily mean a dev will know how to fix a bad plan.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

Re: Postgres refusing to use >1 core

From
Eric McKeeth
Date:
On Wed, May 11, 2011 at 9:17 PM, Aren Cambre <aren@arencambre.com> wrote:
2. Not all TxDPS reference markers correspond to TxDOT reference markers.

Now, if I've matched a route, I have to find the reference marker.

The TxDOT database is pretty good but not 100% complete, so some TxDPS tickets' reference markers may not exist in the TxDOT table. Plus, it's possible that some TxDPS tickets have the wrong marker.

To compensate, I am looking for the closest reference marker along the route that is not more than 50 marker units away, in either direction. I've again implemented that with multiple queries, issued one at a time until a match is found. Suppose I am searching for reference marker 256 on TX 71. The queries will look like this:
  1. rte_nm = 'SH71' AND rm = '256' (base marker)
  2. rte_nm = 'SH71' AND rm = '257' (+1)
  3. rte_nm = 'SH71' AND rm = '255' (-1)
  4. rte_nm = 'SH71' AND rm = '258' (+2)
  5. rte_nm = 'SH71' AND rm = '254' (-2)
  6. ...
  7. rte_nm = 'SH71' AND rm = '306' (+50)
  8. rte_nm = 'SH71' AND rm = '206' (-50)
Assuming a matching route name was found in the prior step, the app will run 1 to 101 of these queries for each ticket.

This is a perfect example of a place where you could push some work out of the application and into the database. You can consolidate your 1 to 101 queries into a single query. If you use:

WHERE rte_nm='SH71' AND rm >= 206 AND rm <= 306 ORDER BY abs(rm - 256), rm - 256 DESC LIMIT 1

it will always return the same value as the first matching query from your list, and will never have to make more than one trip to the database. Your one trip might be slightly slower than any one of the single trips above, but it will certainly be much faster in the case where you have to hit any significant % of your 101 potential queries.
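
Spelled out in full, that looks something like the following -- the table name here is a placeholder, since only your WHERE clause appears above:

SELECT *
  FROM reference_marker    -- placeholder name for your markers table
 WHERE rte_nm = 'SH71'
   AND rm BETWEEN 206 AND 306
 ORDER BY abs(rm - 256),   -- closest marker first
          rm - 256 DESC    -- on ties, prefer +N over -N, matching your query order
 LIMIT 1;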

-Eric

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
Everyone,

Just wanted to say thanks for your help with my performance question. You have given me plenty of things to investigate. Further, I think the problem is almost certainly with my app, so I need to do more work there!

I really like the idea of just loading everything into memory and then dumping it all out later. I have 6 GB of RAM, which should be plenty to handle this.

Aren Cambre

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
This is a perfect example of a place where you could push some work out of the application and into the database. You can consolidate your 1 to 101 queries into a single query. If you use:

WHERE rte_nm='SH71' AND rm >= 206 AND rm <= 306 ORDER BY abs(rm - 256), rm - 256 DESC LIMIT 1

it will always return the same value as the first matching query from your list, and will never have to make more than one trip to the database. Your one trip might be slightly slower than any one of the single trips above, but it will certainly be much faster in the case where you have to hit any significant % of your 101 potential queries.

THANKS!! I've been obsessing so much about parallelism that I hadn't spent much time finding better queries.

Aren

Re: Postgres refusing to use >1 core

From
Tom Lane
Date:
Shaun Thomas <sthomas@peak6.com> writes:
> On 05/12/2011 09:51 AM, Tom Lane wrote:
>> It does.  I was a bit surprised that Shaun apparently got a plan that
>> didn't include a materialize step, because when I test a similar query
>> here, I get:

> Remember when I said "old version" that prevented us from using CTEs?
> We're still on 8.2 (basically, we're waiting for EnterpriseDB 9.0).
> It's basically calculating the group aggregation wrong, but is that
> enough to trigger it to go nuts?

Hmm.  As you say, the mistake it's making is a drastic underestimate of
the number of groups in the subquery, leading to a bad choice of join
method.  I find it odd that replacing the subquery with a temp table
helps, though, because (unless you stuck in an ANALYZE you didn't
mention) it would have no stats at all about the number of groups in the
temp table.  Maybe the default guess just happens to produce the more
desirable plan.
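
The effect is easy to see in miniature (the estimates below are typical
defaults, not guaranteed):

CREATE TEMP TABLE t AS
  SELECT (g % 100) AS grp FROM generate_series(1, 50000) g;

EXPLAIN SELECT grp, count(*) FROM t GROUP BY grp;
-- with no stats, the number of groups is a default guess (usually 200)
ANALYZE t;
EXPLAIN SELECT grp, count(*) FROM t GROUP BY grp;
-- now the estimate lands near the true 100 groups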

            regards, tom lane

Re: Postgres refusing to use >1 core

From
Shaun Thomas
Date:
On 05/12/2011 11:07 AM, Tom Lane wrote:

> I find it odd that replacing the subquery with a temp table helps,
> though, because (unless you stuck in an ANALYZE you didn't mention)
> it would have no stats at all about the number of groups in the temp
> table.

I did have an ANALYZE initially, for exactly that reason. But what I
found odd is that in my rush to run this for the end-of-day reports, I
forgot that step, and it still ran fine. I've found that the planner
tends to treat un-analyzed tables somewhat pessimistically, which is
fine by me.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
Just want to again say thanks for this query. It seriously sped up part of my program.

Aren

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
Just wanted to again say thanks for everyone's help.

The main problem was that my program was running serially, not in parallel, even though I thought I was using a textbook example of PLINQ. Your assistance helped me get to the point where I could conclusively determine that everything was running serially. It was more obvious than I realized.

Thanks to help through http://stackoverflow.com/questions/6086111/plinq-on-concurrentqueue-isnt-multithreading, I have switched to the .NET Framework's Task Parallel Library, and it's slamming all 8 cores hard now! And there are a bunch of concurrent connections to Postgres. :-)

Aren

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
Also, thanks for the advice on batching my queries. I am now using very efficient bulk read and write methods for Postgres.

My program bulk-reads 100,000 rows, processes those rows (doing a few SELECTs along the way), and then writes 100,000 rows at a time.

It cycles through this until it has processed all 12,000,000 rows.

This, plus the parallelism fix, will probably convert this 30-hour program into a <2-hour program.
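
On the database side, each cycle boils down to something like this sketch (table and column names are placeholders; Npgsql drives the actual commands):

-- bulk read: one round trip per 100,000-row slice, keyed on the last id seen
SELECT *
  FROM raw.tickets           -- placeholder
 WHERE ticket_id > 1500000   -- last id from the previous batch
 ORDER BY ticket_id
 LIMIT 100000;

-- bulk write: one multi-row INSERT (or COPY) per batch instead of
-- 100,000 single-row INSERTs
INSERT INTO raw.results (a, b, c, d)   -- placeholder
VALUES (1, 2, 'X', true),
       (3, 4, 'Y', false);             -- ...one row per processed ticket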

Aren

Re: Postgres refusing to use >1 core

From
Craig Ringer
Date:
On 23/05/11 12:09, Aren Cambre wrote:
> Also, thanks for the advice on batching my queries. I am now using a
> very efficient bulk data read and write methods for Postgres.
>
> My program bulk reads 100,000 rows, processes those rows (during which
> it does a few SELECTs), and then writes 100,000 rows at a time.
>
> It cycles through this until it has processed all 12,000,000 rows.
>
> This, plus the parallelism fix, will probably convert this 30 hour
> program to a <2 hour program.

It's always good to hear when these things work out. Thanks for
reporting back.

Using the set-based nature of relational databases to your advantage,
writing smarter queries that do more work server-side with fewer
round-trips, and effective batching can make a huge difference.

--
Craig Ringer

Re: Postgres refusing to use >1 core

From
Aren Cambre
Date:
It's always good to hear when these things work out. Thanks for
reporting back.

Using the set-based nature of relational databases to your advantage,
writing smarter queries that do more work server-side with fewer
round-trips, and effective batching can make a huge difference.

Glad I could be a good digital citizen! :-)

Correction: it's going to run for significantly more than 2 hours, but far less than 30 hours!

I'm loving seeing the CPU meter showing all 8 of my (fake) cores being pounded mercilessly!

Aren