Thread: Six PostgreSQL questions from a pokerplayer

Six PostgreSQL questions from a pokerplayer

From
Patvs
Date:
I use poker software (HoldemManager) to keep track of the statistics (and
show nice graphs) of millions of poker hand histories.
This software (also PokerTracker 3) imports all the poker hands in
PostgreSQL. The software runs on Windows) only.
All of its users have NORMAL PCs. From single-core laptops, to a quadcore
desktop at best.

Questions:

-1 [quote] "POSTGRESQL uses a multi-process model. Because of this, all
multi-cpu operating systems can spread multiple database connections among
the available CPUs.
However, if only a single database connection is active, it can only use one
CPU. POSTGRESQL does not use multi-threading to allow a single process to
use multiple CPUs."[/quote]

I can see two databases in my pgAdmin: postgres and HoldemManager. All the
poker data (about 30 GB of data) is in the HoldemManager database.
Does the quote above (if true?) means, having a 2 Ghz single core or a Xeon
2x quadcore (8x 2 Ghz cores) will make no real difference for my
performance?
And the real performance increase is only for professional servers running
multiple databases? Will I greatly benefit from having quad instead of a
single-core system?

-2 In the recent 8.3 vs 8.4 benchmarks, 8.4. was much faster than 8.3
running on a 16 and 32 core server (with 64GB RAM).
With 8 cores, they were about the same speed. Does this mean on a normal
single core computer, there will be NO NOTICABLE performance increase in 8.3
vs 8.4 and even 8.2?

-3 [quote] "With PostgreSQL, you could easily have more than 1GB per backend
(if necessary) without running out of memory, which significantly pushes
away the point when you need to go to 64-bit.
In some cases it may actually be better to run a 32-bit build of PostgreSQL
to reduce memory usage. In a 64-bit server, every pointer and every integer
will take twice as much space as in a 32bit server. That overhead can be
significant, and is most likely unnecessary." [/quote]

I have no idea what the maximum amount of RAM is, my database uses. But what
exactly "will take twice as much space"?
Does this mean a simple database uses double the amount of RAM on a 64 bit
system? And it's probably better for my 30 GB database to
run a 32-bit build of PostgreSQL to reduce memory usage?

-4 One a scale from 1 to 10, how significant are the following on
performance increase:
-[ ] Getting a faster harddisk (RAID or a SSD)
-[ ] Getting a faster CPU
-[ ] Upgrading PostgreSQL (8.2 and 8.3) to 8.4
-[ ] Tweaking PostgreSQL (increasing # shared_buffers, wal_buffers,
effective_cache_size, etc.)
-[10!] Something else?
-[ ] Does NOT effect me, but I was wondering what a switch from Windows to
LINUX/Solaris does for professional server users in terms of performance.


-5 The IO operations/s performance of your harddisk vs read/write speeds vs
access time? What is more important?
With 4 regular harddisks in RAID0 you get great read/write speeds, but the
SSDs excel in IO/s and a 0.1ms access time.
What is the most usefull for which situations?


-6 The 8.4.0-1 one-click installer automatically set the encoding to UTF8.
With the other installers, I was able to
change the encoding to SQL_ASCII during the installation process. How do I
solve this after I've installed 8.4.0-1?
(I was unable to delete the postgres database, so I couldn't create a new
one with the right encoding in 8.4.0-1)
--
View this message in context:
http://www.nabble.com/Six-PostgreSQL-questions-from-a-pokerplayer-tp24337072p24337072.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: Six PostgreSQL questions from a pokerplayer

From
Craig Ringer
Date:
On Sat, 2009-07-04 at 11:51 -0700, Patvs wrote:

> I can see two databases in my pgAdmin: postgres and HoldemManager. All the
> poker data (about 30 GB of data) is in the HoldemManager database.
> Does the quote above (if true?) means, having a 2 Ghz single core or a Xeon
> 2x quadcore (8x 2 Ghz cores) will make no real difference for my
> performance?

What matters isn't the number of databases, but the number of
connections. Any given connection can use at most one full core.

If you have only one actively working connection you will still gain a
bit of performance from having a second core that can do other misc work
for the OS, I/O management and general housekeeping so that the first
core can be fully dedicated to the active pg backend. More than that
probably won't gain you anything.

If you want to improve performance, first learn about where your code is
bottlenecked. Is it even CPU-limited? Often databases are really limited
by disk I/O performance rather than CPU time.

If it is CPU-limited, you might gain from having fewer faster cores,
and/or significantly faster RAM. If it's not CPU-limited, you'd be
wasting time effort and money upgrading those parts.

> -2 In the recent 8.3 vs 8.4 benchmarks, 8.4. was much faster than 8.3
> running on a 16 and 32 core server (with 64GB RAM).
> With 8 cores, they were about the same speed. Does this mean on a normal
> single core computer, there will be NO NOTICABLE performance increase in 8.3
> vs 8.4 and even 8.2?

Benchmark it and see. It'll be rather workload-dependent.

> I have no idea what the maximum amount of RAM is, my database uses. But what
> exactly "will take twice as much space"?
> Does this mean a simple database uses double the amount of RAM on a 64 bit
> system?

Absolutely not. Certain data structures take up more room because of
alignment/padding concerns, pointer size increases, etc. That does mean
that you can fit fewer of them into a given amount of memory, but it's
not a simple doubling by any stretch.

What that does mean, though, is that if you don't have significantly
more RAM than a 32-bit machine can address (say, 6 to 8 GB), you should
stick with 32-bit binaries.

> -4 One a scale from 1 to 10, how significant are the following on
> performance increase:
> -[ ] Getting a faster harddisk (RAID or a SSD)
> -[ ] Getting a faster CPU
> -[ ] Upgrading PostgreSQL (8.2 and 8.3) to 8.4
> -[ ] Tweaking PostgreSQL (increasing # shared_buffers, wal_buffers,
> effective_cache_size, etc.)
> -[10!] Something else?

Very workload dependent. Analyse what parts of your system are busiest
and which are largely idle while Pg is working hard, then consider
upgrading the busy bits.

Tweaking Pg again depends a lot on workload. Sometimes you won't gain
much, sometimes you'll see incredible gains (say, if you increase
sort/working memory\ so a sort that used to spill to disk can instead be
done in RAM).

If you have very few connections and they do really complex queries, you
might benefit from dramatically increasing work mem etc.

> -[ ] Does NOT effect me, but I was wondering what a switch from Windows to
> LINUX/Solaris does for professional server users in terms of performance.

Not a bad plan, honestly. Pg is just more mature on UNIX/Linux at this
point.

> -5 The IO operations/s performance of your harddisk vs read/write speeds vs
> access time? What is more important?

Depends on workload. If you're doing lots of sequential scans, you want
really fast sequential reads. If you're doing lots of index scans etc,
you will benefit from both sequential read speed and access time.

If you have particular queries you note are slow, consider running them
with EXPLAIN ANALYZE to see what their query plans are. What disk access
patterns are the queries resulting in? Do they have sorts spilling to
disk? etc.

> With 4 regular harddisks in RAID0 you get great read/write speeds, but the
> SSDs excel in IO/s and a 0.1ms access time.

... but are often really, really, really, really slow at writing. The
fancier ones are fast at writing but generally slow down over time.

> What is the most usefull for which situations?

Depends on your workload, see above.

--
Craig Ringer


Re: Six PostgreSQL questions from a pokerplayer

From
Mark Kirkwood
Date:
Craig Ringer wrote:
> On Sat, 2009-07-04 at 11:51 -0700, Patvs wrote:
>
>
>
>> With 4 regular harddisks in RAID0 you get great read/write speeds, but the
>> SSDs excel in IO/s and a 0.1ms access time.
>>
>
> ... but are often really, really, really, really slow at writing. The
> fancier ones are fast at writing but generally slow down over time.
>
>

Also, (probably pointing out the obvious here) to be on the safe side
you should avoid RAID0 for any data that is important to you - as it's
pretty easy to get one bad disk straight from new!

With respect to SSD's one option for a small sized database is 2xSSD in
RAID1 - provided they are the *right* SSD that is, which at this point
in time seems to be the Intel X25E. Note that I have not benchmarked
this configuration, so no guarantees that it (or the Intel SSDs
themselves) are as good as the various on-the-web tests indicate!

regards

Mark

Re: Six PostgreSQL questions from a pokerplayer

From
Scott Carey
Date:

On 7/5/09 11:13 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote:

> Craig Ringer wrote:
>> On Sat, 2009-07-04 at 11:51 -0700, Patvs wrote:
>>
>>
>>
>>> With 4 regular harddisks in RAID0 you get great read/write speeds, but the
>>> SSDs excel in IO/s and a 0.1ms access time.
>>>
>>
>> ... but are often really, really, really, really slow at writing. The
>> fancier ones are fast at writing but generally slow down over time.
>>
>>
>
> Also, (probably pointing out the obvious here) to be on the safe side
> you should avoid RAID0 for any data that is important to you - as it's
> pretty easy to get one bad disk straight from new!
>
> With respect to SSD's one option for a small sized database is 2xSSD in
> RAID1 - provided they are the *right* SSD that is, which at this point
> in time seems to be the Intel X25E. Note that I have not benchmarked
> this configuration, so no guarantees that it (or the Intel SSDs
> themselves) are as good as the various on-the-web tests indicate!

There is no reason to go RAID 1 with SSD's if this is an end-user box and
the data is recoverable.   Unlike a hard drive, a decent SSD isn't expected
to go bad.  I have deployed over 150 Intel X25-M's and they all work
flawlessly.  Some had the 'slowdown' problem due to how they were written
to, but the recent firmware fixed that.  At this point, I consider a single
high quality SSD as more fault tolerant than software raid-1.

Unless there are lots of writes going on (I'm guessing its mostly read,
given the description) a single X25-M will make the DB go very fast
regardless of random or sequential access.

If the system is CPU bound, then getting a SSD like that won't help as much.
But I'd be willing to bet that in a normal PC or workstation I/O is the
limiting factor.   Some tuning of work_mem and shared_buffers might help
some too.

Use some monitoring tools (PerfMon 'Physical Disk' stats on windows) to see
if normal use is causing a lot of disk access.  If so, and especially if its
mostly reads, an Intel X-25M will make a huge difference.  If there is lots
of writes, an X-25E will do but its 40% the space for the same price.

>
> regards
>
> Mark
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: Six PostgreSQL questions from a pokerplayer

From
Scott Carey
Date:
On 7/6/09 1:43 AM, "Scott Carey" <scott@richrelevance.com> wrote:

>
>
>
> On 7/5/09 11:13 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote:
>
>> Craig Ringer wrote:
>>> On Sat, 2009-07-04 at 11:51 -0700, Patvs wrote:
>>>
> There is no reason to go RAID 1 with SSD's if this is an end-user box and
> the data is recoverable.   Unlike a hard drive, a decent SSD isn't expected
> to go bad.

Clarification -- normal hard drives are expected to have a chance of dying
within the first few months, or days.  SSD's are expected to wear down
slowly and die eventually -- but better ones will do so by entering a
read-only state.


Re: Six PostgreSQL questions from a pokerplayer

From
Greg Stark
Date:
On Sat, Jul 4, 2009 at 7:51 PM, Patvs<patvs@chello.nl> wrote:
> -4 One a scale from 1 to 10, how significant are the following on
> performance increase:
> -[ ] Getting a faster harddisk (RAID or a SSD)
> -[ ] Getting a faster CPU
> -[ ] Upgrading PostgreSQL (8.2 and 8.3) to 8.4
> -[ ] Tweaking PostgreSQL (increasing # shared_buffers, wal_buffers,
> effective_cache_size, etc.)
> -[10!] Something else?

It sounds like you have specific performance problems you're trying to
address. Given the use case it seems surprising that you're looking at
such heavy-duty hardware. It seems more likely that
PokerTracker/Holdem Manager is missing some indexes in its schema or
that some queries could be tweaked to run more efficiently.

Perhaps if you set log_statement_duration and send any slow queries
here we would find a problem that could be fixed.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: Six PostgreSQL questions from a pokerplayer

From
Stephen Frost
Date:
* Craig Ringer (craig@postnewspapers.com.au) wrote:
> What that does mean, though, is that if you don't have significantly
> more RAM than a 32-bit machine can address (say, 6 to 8 GB), you should
> stick with 32-bit binaries.

I'm not sure this is always true since on the amd64/em64t platforms
you'll get more registers and whatnot in 64-bit mode which can offset
the pointer size increases.

    Thanks,

        Stephen

Attachment

Re: Six PostgreSQL questions from a pokerplayer

From
Greg Smith
Date:
On Sat, 4 Jul 2009, Patvs wrote:

> I use poker software (HoldemManager) to keep track of the statistics (and
> show nice graphs) of millions of poker hand histories.
> This software (also PokerTracker 3) imports all the poker hands in
> PostgreSQL.

I've got about 200MB of PokerTracker data myself in a PostgreSQL database,
pretty familiar with what you're doing.

1) I don't think there's much that software does that will take advantage
of multiple cores.  You might get better real-time performance while
playing in that case, because you can have database/hand history
program/table processes all doing their own thing at once, but the
database itself isn't going to benefit from more cores.

2) The main performance benefit of 8.4 kicks in when you're deleting data.
Since that's not happening in your hand history database, I wouldn't
expect that to run any better than 8.3.  Eventually you might see the
software rewritten to take advantage of the new programming features added
in 8.4, that might give the newer version a significant advantage
eventually; until then, 8.3 will run at about the same speed.

3) There's not much reason for you to consider running in 64 bits, you
would need to be on something other than Windows to fully take advantage
of that.  The database server doesn't support it yet on that platform
partly because there's so little to gain:
http://wiki.postgresql.org/wiki/64bit_Windows_port

4) None of your options are the right first step.  The best thing you
could do to improve performance here is add significantly more RAM to your
server, so that more hand data could be stored there.  That will help you
out more than adding more cores, and you'll need a 64-bit Windows to fully
take advantage of it--but you don't need to give that memory directly to a
64-bit database to see that gain.  If you're not running with at least 8GB
or RAM, nothing else you can do will give you as much bang for your buck
as upgrading to there (pretty easy on a lot of desktops, harder to get
into a portable).  Along with that, you might as well follow the basic
tuning guide at
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server and get some
of the basics done correctly.  You may find correctly setting
effective_cache_size, default_statistics_target, and work_mem in
particular could give you better results when running queries against the
database; a modest bump to shared_buffers might help too, but you can't go
crazy there on Windows.  The defaults really aren't set well for as much
data as you've got in a small number of tables.

5) It's hard to imagine your use case involving anything but random I/O,
particularly if you have a decent amount of memory in the system, so a SSD
should be significantly better than your other disk options here.  That
would be the third area for improvement after getting the memory and basic
database parameters are set correctly if I were tuning your system.

6) Normally to change the locale you have to shutdown the database, delete
its data directory, and then run the "initdb" command with appropriate
options to use an alternate locale.  I thought the one-click installer
handled that though--the screen shots at
http://www.enterprisedb.com/learning/pginst_guide.do show the "Advanced
Options" page allowing one to set the locale.  This is really the wrong
list for that questions--if you still have trouble there, try sending
something with *just* that one to the pgsql-general list instead.  From
the replies you've gotten here you can see everyone is fixed on the
performance questions, and this one is buried at the bottom of your long
message.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Six PostgreSQL questions from a pokerplayer

From
Dave Page
Date:
On Mon, Jul 6, 2009 at 2:26 PM, Greg Smith<gsmith@gregsmith.com> wrote:

> 6) Normally to change the locale you have to shutdown the database, delete
> its data directory, and then run the "initdb" command with appropriate
> options to use an alternate locale.  I thought the one-click installer
> handled that though--the screen shots at
> http://www.enterprisedb.com/learning/pginst_guide.do show the "Advanced
> Options" page allowing one to set the locale.  This is really the wrong list
> for that questions--if you still have trouble there, try sending something
> with *just* that one to the pgsql-general list instead.  From the replies
> you've gotten here you can see everyone is fixed on the performance
> questions, and this one is buried at the bottom of your long message.

On Windows, the installer will always use utf-8, as it's the only
encoding we know should work with any locale on that platform (and
there's no easy way of figuring out other combinations without trying
them). We intentionally don't make SQL_ASCII available, as we consider
that to be an 'expert' choice which regularly gets misused. To get
round that if you really need to, either manually init a new cluster
using initdb, or do something like:

CREATE DATABASE foo WITH ENCODING 'SQL_ASCII' TEMPLATE template0;

to get a single database in SQL_ASCII.

--
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Six PostgreSQL questions from a pokerplayer

From
Mark Mielke
Date:
On 07/06/2009 06:23 AM, Stephen Frost wrote:
* Craig Ringer (craig@postnewspapers.com.au) wrote: 
What that does mean, though, is that if you don't have significantly
more RAM than a 32-bit machine can address (say, 6 to 8 GB), you should
stick with 32-bit binaries.   
I'm not sure this is always true since on the amd64/em64t platforms
you'll get more registers and whatnot in 64-bit mode which can offset
the pointer size increases. 

Which leads to other things like faster calling conventions...

Even if you only have 4 GB of RAM, the 32-bit kernel needs to fight with "low memory" vs "high memory", whereas 64-bit has a clean address space.

All things being equal, I recommend 64-bit.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>

Re: Six PostgreSQL questions from a pokerplayer

From
Craig Ringer
Date:
On Mon, 2009-07-06 at 15:27 -0400, Mark Mielke wrote:

> Even if you only have 4 GB of RAM, the 32-bit kernel needs to fight
> with "low memory" vs "high memory", whereas 64-bit has a clean address
> space.

That's a good point. The cutoff is probably closer to 2G or at most 3G.
Certainly it's madness to use hacks like PAE to gain access to the RAM
behind the PCI address space rather than just going 64-bit ... unless
you have a really pressing reason, at least.

It's also nice that on a 64 bit machine, there's no 2G/2G or 3G/1G
userspace/kernelspace address mapping split to limit your app's memory
use. I seem to recall that Windows uses 2G/2G which can be painfully
limiting for memory-hungry applications.

Personally, I'd probably go 64-bit on any reasonably modern machine that
could be expected to have more than 2 or 3 GB of RAM. Then again, I
can't imagine willingly building a production database server for any
non-trivial (ie > a couple of gigs) database with less than 8GB of RAM
with RAM prices so absurdly low. Skip-lunch-to-afford-more-RAM low.

--
Craig Ringer


Re: Six PostgreSQL questions from a pokerplayer

From
Scott Marlowe
Date:
On Mon, Jul 6, 2009 at 10:51 PM, Craig
Ringer<craig@postnewspapers.com.au> wrote:
>
> Personally, I'd probably go 64-bit on any reasonably modern machine that
> could be expected to have more than 2 or 3 GB of RAM. Then again, I
> can't imagine willingly building a production database server for any
> non-trivial (ie > a couple of gigs) database with less than 8GB of RAM
> with RAM prices so absurdly low. Skip-lunch-to-afford-more-RAM low.

Exactly, I was pricing out a new db server at work, and the difference
in cost on a $7000 or so machine was something like $250 or so to go
from 16G to 32G of RAM.

I also can't imagine running a large pgsql server on windows, even 64
bit windows.