Thread: Six PostgreSQL questions from a pokerplayer
I use poker software (HoldemManager) to keep track of the statistics (and show nice graphs) of millions of poker hand histories. This software (also PokerTracker 3) imports all the poker hands in PostgreSQL. The software runs on Windows) only. All of its users have NORMAL PCs. From single-core laptops, to a quadcore desktop at best. Questions: -1 [quote] "POSTGRESQL uses a multi-process model. Because of this, all multi-cpu operating systems can spread multiple database connections among the available CPUs. However, if only a single database connection is active, it can only use one CPU. POSTGRESQL does not use multi-threading to allow a single process to use multiple CPUs."[/quote] I can see two databases in my pgAdmin: postgres and HoldemManager. All the poker data (about 30 GB of data) is in the HoldemManager database. Does the quote above (if true?) means, having a 2 Ghz single core or a Xeon 2x quadcore (8x 2 Ghz cores) will make no real difference for my performance? And the real performance increase is only for professional servers running multiple databases? Will I greatly benefit from having quad instead of a single-core system? -2 In the recent 8.3 vs 8.4 benchmarks, 8.4. was much faster than 8.3 running on a 16 and 32 core server (with 64GB RAM). With 8 cores, they were about the same speed. Does this mean on a normal single core computer, there will be NO NOTICABLE performance increase in 8.3 vs 8.4 and even 8.2? -3 [quote] "With PostgreSQL, you could easily have more than 1GB per backend (if necessary) without running out of memory, which significantly pushes away the point when you need to go to 64-bit. In some cases it may actually be better to run a 32-bit build of PostgreSQL to reduce memory usage. In a 64-bit server, every pointer and every integer will take twice as much space as in a 32bit server. That overhead can be significant, and is most likely unnecessary." [/quote] I have no idea what the maximum amount of RAM is, my database uses. But what exactly "will take twice as much space"? Does this mean a simple database uses double the amount of RAM on a 64 bit system? And it's probably better for my 30 GB database to run a 32-bit build of PostgreSQL to reduce memory usage? -4 One a scale from 1 to 10, how significant are the following on performance increase: -[ ] Getting a faster harddisk (RAID or a SSD) -[ ] Getting a faster CPU -[ ] Upgrading PostgreSQL (8.2 and 8.3) to 8.4 -[ ] Tweaking PostgreSQL (increasing # shared_buffers, wal_buffers, effective_cache_size, etc.) -[10!] Something else? -[ ] Does NOT effect me, but I was wondering what a switch from Windows to LINUX/Solaris does for professional server users in terms of performance. -5 The IO operations/s performance of your harddisk vs read/write speeds vs access time? What is more important? With 4 regular harddisks in RAID0 you get great read/write speeds, but the SSDs excel in IO/s and a 0.1ms access time. What is the most usefull for which situations? -6 The 8.4.0-1 one-click installer automatically set the encoding to UTF8. With the other installers, I was able to change the encoding to SQL_ASCII during the installation process. How do I solve this after I've installed 8.4.0-1? (I was unable to delete the postgres database, so I couldn't create a new one with the right encoding in 8.4.0-1) -- View this message in context: http://www.nabble.com/Six-PostgreSQL-questions-from-a-pokerplayer-tp24337072p24337072.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On Sat, 2009-07-04 at 11:51 -0700, Patvs wrote: > I can see two databases in my pgAdmin: postgres and HoldemManager. All the > poker data (about 30 GB of data) is in the HoldemManager database. > Does the quote above (if true?) means, having a 2 Ghz single core or a Xeon > 2x quadcore (8x 2 Ghz cores) will make no real difference for my > performance? What matters isn't the number of databases, but the number of connections. Any given connection can use at most one full core. If you have only one actively working connection you will still gain a bit of performance from having a second core that can do other misc work for the OS, I/O management and general housekeeping so that the first core can be fully dedicated to the active pg backend. More than that probably won't gain you anything. If you want to improve performance, first learn about where your code is bottlenecked. Is it even CPU-limited? Often databases are really limited by disk I/O performance rather than CPU time. If it is CPU-limited, you might gain from having fewer faster cores, and/or significantly faster RAM. If it's not CPU-limited, you'd be wasting time effort and money upgrading those parts. > -2 In the recent 8.3 vs 8.4 benchmarks, 8.4. was much faster than 8.3 > running on a 16 and 32 core server (with 64GB RAM). > With 8 cores, they were about the same speed. Does this mean on a normal > single core computer, there will be NO NOTICABLE performance increase in 8.3 > vs 8.4 and even 8.2? Benchmark it and see. It'll be rather workload-dependent. > I have no idea what the maximum amount of RAM is, my database uses. But what > exactly "will take twice as much space"? > Does this mean a simple database uses double the amount of RAM on a 64 bit > system? Absolutely not. Certain data structures take up more room because of alignment/padding concerns, pointer size increases, etc. That does mean that you can fit fewer of them into a given amount of memory, but it's not a simple doubling by any stretch. What that does mean, though, is that if you don't have significantly more RAM than a 32-bit machine can address (say, 6 to 8 GB), you should stick with 32-bit binaries. > -4 One a scale from 1 to 10, how significant are the following on > performance increase: > -[ ] Getting a faster harddisk (RAID or a SSD) > -[ ] Getting a faster CPU > -[ ] Upgrading PostgreSQL (8.2 and 8.3) to 8.4 > -[ ] Tweaking PostgreSQL (increasing # shared_buffers, wal_buffers, > effective_cache_size, etc.) > -[10!] Something else? Very workload dependent. Analyse what parts of your system are busiest and which are largely idle while Pg is working hard, then consider upgrading the busy bits. Tweaking Pg again depends a lot on workload. Sometimes you won't gain much, sometimes you'll see incredible gains (say, if you increase sort/working memory\ so a sort that used to spill to disk can instead be done in RAM). If you have very few connections and they do really complex queries, you might benefit from dramatically increasing work mem etc. > -[ ] Does NOT effect me, but I was wondering what a switch from Windows to > LINUX/Solaris does for professional server users in terms of performance. Not a bad plan, honestly. Pg is just more mature on UNIX/Linux at this point. > -5 The IO operations/s performance of your harddisk vs read/write speeds vs > access time? What is more important? Depends on workload. If you're doing lots of sequential scans, you want really fast sequential reads. If you're doing lots of index scans etc, you will benefit from both sequential read speed and access time. If you have particular queries you note are slow, consider running them with EXPLAIN ANALYZE to see what their query plans are. What disk access patterns are the queries resulting in? Do they have sorts spilling to disk? etc. > With 4 regular harddisks in RAID0 you get great read/write speeds, but the > SSDs excel in IO/s and a 0.1ms access time. ... but are often really, really, really, really slow at writing. The fancier ones are fast at writing but generally slow down over time. > What is the most usefull for which situations? Depends on your workload, see above. -- Craig Ringer
Craig Ringer wrote: > On Sat, 2009-07-04 at 11:51 -0700, Patvs wrote: > > > >> With 4 regular harddisks in RAID0 you get great read/write speeds, but the >> SSDs excel in IO/s and a 0.1ms access time. >> > > ... but are often really, really, really, really slow at writing. The > fancier ones are fast at writing but generally slow down over time. > > Also, (probably pointing out the obvious here) to be on the safe side you should avoid RAID0 for any data that is important to you - as it's pretty easy to get one bad disk straight from new! With respect to SSD's one option for a small sized database is 2xSSD in RAID1 - provided they are the *right* SSD that is, which at this point in time seems to be the Intel X25E. Note that I have not benchmarked this configuration, so no guarantees that it (or the Intel SSDs themselves) are as good as the various on-the-web tests indicate! regards Mark
On 7/5/09 11:13 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote: > Craig Ringer wrote: >> On Sat, 2009-07-04 at 11:51 -0700, Patvs wrote: >> >> >> >>> With 4 regular harddisks in RAID0 you get great read/write speeds, but the >>> SSDs excel in IO/s and a 0.1ms access time. >>> >> >> ... but are often really, really, really, really slow at writing. The >> fancier ones are fast at writing but generally slow down over time. >> >> > > Also, (probably pointing out the obvious here) to be on the safe side > you should avoid RAID0 for any data that is important to you - as it's > pretty easy to get one bad disk straight from new! > > With respect to SSD's one option for a small sized database is 2xSSD in > RAID1 - provided they are the *right* SSD that is, which at this point > in time seems to be the Intel X25E. Note that I have not benchmarked > this configuration, so no guarantees that it (or the Intel SSDs > themselves) are as good as the various on-the-web tests indicate! There is no reason to go RAID 1 with SSD's if this is an end-user box and the data is recoverable. Unlike a hard drive, a decent SSD isn't expected to go bad. I have deployed over 150 Intel X25-M's and they all work flawlessly. Some had the 'slowdown' problem due to how they were written to, but the recent firmware fixed that. At this point, I consider a single high quality SSD as more fault tolerant than software raid-1. Unless there are lots of writes going on (I'm guessing its mostly read, given the description) a single X25-M will make the DB go very fast regardless of random or sequential access. If the system is CPU bound, then getting a SSD like that won't help as much. But I'd be willing to bet that in a normal PC or workstation I/O is the limiting factor. Some tuning of work_mem and shared_buffers might help some too. Use some monitoring tools (PerfMon 'Physical Disk' stats on windows) to see if normal use is causing a lot of disk access. If so, and especially if its mostly reads, an Intel X-25M will make a huge difference. If there is lots of writes, an X-25E will do but its 40% the space for the same price. > > regards > > Mark > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
On 7/6/09 1:43 AM, "Scott Carey" <scott@richrelevance.com> wrote: > > > > On 7/5/09 11:13 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote: > >> Craig Ringer wrote: >>> On Sat, 2009-07-04 at 11:51 -0700, Patvs wrote: >>> > There is no reason to go RAID 1 with SSD's if this is an end-user box and > the data is recoverable. Unlike a hard drive, a decent SSD isn't expected > to go bad. Clarification -- normal hard drives are expected to have a chance of dying within the first few months, or days. SSD's are expected to wear down slowly and die eventually -- but better ones will do so by entering a read-only state.
On Sat, Jul 4, 2009 at 7:51 PM, Patvs<patvs@chello.nl> wrote: > -4 One a scale from 1 to 10, how significant are the following on > performance increase: > -[ ] Getting a faster harddisk (RAID or a SSD) > -[ ] Getting a faster CPU > -[ ] Upgrading PostgreSQL (8.2 and 8.3) to 8.4 > -[ ] Tweaking PostgreSQL (increasing # shared_buffers, wal_buffers, > effective_cache_size, etc.) > -[10!] Something else? It sounds like you have specific performance problems you're trying to address. Given the use case it seems surprising that you're looking at such heavy-duty hardware. It seems more likely that PokerTracker/Holdem Manager is missing some indexes in its schema or that some queries could be tweaked to run more efficiently. Perhaps if you set log_statement_duration and send any slow queries here we would find a problem that could be fixed. -- greg http://mit.edu/~gsstark/resume.pdf
* Craig Ringer (craig@postnewspapers.com.au) wrote: > What that does mean, though, is that if you don't have significantly > more RAM than a 32-bit machine can address (say, 6 to 8 GB), you should > stick with 32-bit binaries. I'm not sure this is always true since on the amd64/em64t platforms you'll get more registers and whatnot in 64-bit mode which can offset the pointer size increases. Thanks, Stephen
Attachment
On Sat, 4 Jul 2009, Patvs wrote: > I use poker software (HoldemManager) to keep track of the statistics (and > show nice graphs) of millions of poker hand histories. > This software (also PokerTracker 3) imports all the poker hands in > PostgreSQL. I've got about 200MB of PokerTracker data myself in a PostgreSQL database, pretty familiar with what you're doing. 1) I don't think there's much that software does that will take advantage of multiple cores. You might get better real-time performance while playing in that case, because you can have database/hand history program/table processes all doing their own thing at once, but the database itself isn't going to benefit from more cores. 2) The main performance benefit of 8.4 kicks in when you're deleting data. Since that's not happening in your hand history database, I wouldn't expect that to run any better than 8.3. Eventually you might see the software rewritten to take advantage of the new programming features added in 8.4, that might give the newer version a significant advantage eventually; until then, 8.3 will run at about the same speed. 3) There's not much reason for you to consider running in 64 bits, you would need to be on something other than Windows to fully take advantage of that. The database server doesn't support it yet on that platform partly because there's so little to gain: http://wiki.postgresql.org/wiki/64bit_Windows_port 4) None of your options are the right first step. The best thing you could do to improve performance here is add significantly more RAM to your server, so that more hand data could be stored there. That will help you out more than adding more cores, and you'll need a 64-bit Windows to fully take advantage of it--but you don't need to give that memory directly to a 64-bit database to see that gain. If you're not running with at least 8GB or RAM, nothing else you can do will give you as much bang for your buck as upgrading to there (pretty easy on a lot of desktops, harder to get into a portable). Along with that, you might as well follow the basic tuning guide at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server and get some of the basics done correctly. You may find correctly setting effective_cache_size, default_statistics_target, and work_mem in particular could give you better results when running queries against the database; a modest bump to shared_buffers might help too, but you can't go crazy there on Windows. The defaults really aren't set well for as much data as you've got in a small number of tables. 5) It's hard to imagine your use case involving anything but random I/O, particularly if you have a decent amount of memory in the system, so a SSD should be significantly better than your other disk options here. That would be the third area for improvement after getting the memory and basic database parameters are set correctly if I were tuning your system. 6) Normally to change the locale you have to shutdown the database, delete its data directory, and then run the "initdb" command with appropriate options to use an alternate locale. I thought the one-click installer handled that though--the screen shots at http://www.enterprisedb.com/learning/pginst_guide.do show the "Advanced Options" page allowing one to set the locale. This is really the wrong list for that questions--if you still have trouble there, try sending something with *just* that one to the pgsql-general list instead. From the replies you've gotten here you can see everyone is fixed on the performance questions, and this one is buried at the bottom of your long message. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Mon, Jul 6, 2009 at 2:26 PM, Greg Smith<gsmith@gregsmith.com> wrote: > 6) Normally to change the locale you have to shutdown the database, delete > its data directory, and then run the "initdb" command with appropriate > options to use an alternate locale. I thought the one-click installer > handled that though--the screen shots at > http://www.enterprisedb.com/learning/pginst_guide.do show the "Advanced > Options" page allowing one to set the locale. This is really the wrong list > for that questions--if you still have trouble there, try sending something > with *just* that one to the pgsql-general list instead. From the replies > you've gotten here you can see everyone is fixed on the performance > questions, and this one is buried at the bottom of your long message. On Windows, the installer will always use utf-8, as it's the only encoding we know should work with any locale on that platform (and there's no easy way of figuring out other combinations without trying them). We intentionally don't make SQL_ASCII available, as we consider that to be an 'expert' choice which regularly gets misused. To get round that if you really need to, either manually init a new cluster using initdb, or do something like: CREATE DATABASE foo WITH ENCODING 'SQL_ASCII' TEMPLATE template0; to get a single database in SQL_ASCII. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
On 07/06/2009 06:23 AM, Stephen Frost wrote:
Which leads to other things like faster calling conventions...
Even if you only have 4 GB of RAM, the 32-bit kernel needs to fight with "low memory" vs "high memory", whereas 64-bit has a clean address space.
All things being equal, I recommend 64-bit.
Cheers,
mark
* Craig Ringer (craig@postnewspapers.com.au) wrote:What that does mean, though, is that if you don't have significantly more RAM than a 32-bit machine can address (say, 6 to 8 GB), you should stick with 32-bit binaries.I'm not sure this is always true since on the amd64/em64t platforms you'll get more registers and whatnot in 64-bit mode which can offset the pointer size increases.
Which leads to other things like faster calling conventions...
Even if you only have 4 GB of RAM, the 32-bit kernel needs to fight with "low memory" vs "high memory", whereas 64-bit has a clean address space.
All things being equal, I recommend 64-bit.
Cheers,
mark
-- Mark Mielke <mark@mielke.cc>
On Mon, 2009-07-06 at 15:27 -0400, Mark Mielke wrote: > Even if you only have 4 GB of RAM, the 32-bit kernel needs to fight > with "low memory" vs "high memory", whereas 64-bit has a clean address > space. That's a good point. The cutoff is probably closer to 2G or at most 3G. Certainly it's madness to use hacks like PAE to gain access to the RAM behind the PCI address space rather than just going 64-bit ... unless you have a really pressing reason, at least. It's also nice that on a 64 bit machine, there's no 2G/2G or 3G/1G userspace/kernelspace address mapping split to limit your app's memory use. I seem to recall that Windows uses 2G/2G which can be painfully limiting for memory-hungry applications. Personally, I'd probably go 64-bit on any reasonably modern machine that could be expected to have more than 2 or 3 GB of RAM. Then again, I can't imagine willingly building a production database server for any non-trivial (ie > a couple of gigs) database with less than 8GB of RAM with RAM prices so absurdly low. Skip-lunch-to-afford-more-RAM low. -- Craig Ringer
On Mon, Jul 6, 2009 at 10:51 PM, Craig Ringer<craig@postnewspapers.com.au> wrote: > > Personally, I'd probably go 64-bit on any reasonably modern machine that > could be expected to have more than 2 or 3 GB of RAM. Then again, I > can't imagine willingly building a production database server for any > non-trivial (ie > a couple of gigs) database with less than 8GB of RAM > with RAM prices so absurdly low. Skip-lunch-to-afford-more-RAM low. Exactly, I was pricing out a new db server at work, and the difference in cost on a $7000 or so machine was something like $250 or so to go from 16G to 32G of RAM. I also can't imagine running a large pgsql server on windows, even 64 bit windows.