Thread: PG-related ACM Article: "The Pathologies of Big Data"
Just stumbled across this recent article published in the Communications of the ACM:

http://cacm.acm.org/magazines/2009/8/34493-the-pathologies-of-big-data/fulltext

The author shares some insights relating to difficulties processing a 6.75 billion-row table, a dummy table representing census-type data for everyone on earth, in Postgres.

I'd really like to replicate the author's experiment, but it's not clear from the article what his table definition looks like. He claims to be using a 16-byte record to store the several columns he needs for each row, so perhaps he's using a user-defined type?

The author implies with his definition of "big data" that the dataset he analyzed is "... too large to be placed in a relational database ...". From Fig. 2, the SELECT query he ran took just under 10^5 seconds (~28 hours) when run on 6.75 billion rows. That didn't seem surprising to me given how many rows he has to process, but in a recent post on comp.databases.ingres someone claimed that, on a far inferior PC, Ingres ran the same SELECT query in 105 minutes. That would be very impressive (roughly a 16-fold improvement over Postgres) if true.

The author complained that "on larger tables [Postgres' planner] switched to sorting by grouping columns", which he blamed for the slow query execution. I don't personally see why this plan is a problem, but maybe someone can enlighten me.

One intriguing tidbit I picked up from the article: "in modern systems, as demonstrated in the figure, random access to memory is typically slower than sequential access to disk." In hindsight, this seems plausible, since modern disks can sustain sequential reads at well over 100MB/sec.

Anyway, it would be very interesting to attempt to speed up the author's query if at all possible.
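For anyone who wants to try replicating this, here's a rough guess at what a 16-bytes-per-row, census-style table and the kind of grouping query being timed might look like. The column names and types are pure invention on my part (the article never shows the DDL), so treat this only as a starting point:

-- Hypothetical layout: 16 bytes of user data per row, census-style columns.
-- None of these names come from the article; they are only a guess.
CREATE TABLE census (
    country  smallint,   -- 2 bytes
    age      smallint,   -- 2 bytes
    sex      smallint,   -- 2 bytes
    year     smallint,   -- 2 bytes
    income   integer,    -- 4 bytes
    category integer     -- 4 bytes
);

-- A representative aggregate of the sort the article seems to be timing:
SELECT country, age, sex, count(*), avg(income)
FROM census
GROUP BY country, age, sex;

Four smallints plus two integers come to exactly 16 bytes of user data, which at least matches the figure quoted from the article.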
On Fri, Aug 7, 2009 at 9:17 PM, Josh Kupershmidt<schmiddy@gmail.com> wrote:
> Just stumbled across this recent article published in the
> Communications of the ACM:
>
> http://cacm.acm.org/magazines/2009/8/34493-the-pathologies-of-big-data/fulltext
>
> The author shares some insights relating to difficulties processing a
> 6.75 billion-row table, a dummy table representing census-type data for
> everyone on earth, in Postgres.
>
> I'd really like to replicate the author's experiment, but it's not clear from
> the article what his table definition looks like. He claims to be using a
> 16-byte record to store the several columns he needs for each row, so perhaps
> he's using a user-defined type?

Or four integers, or who knows. Postgres's per-row overhead is 24 bytes for the tuple header plus a 4-byte line pointer, so you're talking about roughly 44 bytes per row. There's per-page overhead and alignment on top of that, but in this case it shouldn't amount to much.

> The author implies with his definition of "big data" that the dataset he
> analyzed is "... too large to be placed in a relational database ...". From
> Fig. 2, the SELECT query he ran took just under 10^5 seconds (~28 hours) when
> run on 6.75 billion rows. That didn't seem surprising to me given how many
> rows he has to process, but in a recent post on comp.databases.ingres someone
> claimed that, on a far inferior PC, Ingres ran the same SELECT query in 105
> minutes. That would be very impressive (roughly a 16-fold improvement over
> Postgres) if true.

6.75 billion * 44 bytes is about 297GB. Assuming you stick that on a single spindle capable of 100MB/s:

You have: 297GB / (100MB/s)
You want: min
        * 49.5

So something's not adding up.

> One intriguing tidbit I picked up from the article: "in modern systems, as
> demonstrated in the figure, random access to memory is typically slower than
> sequential access to disk." In hindsight, this seems plausible, since modern
> disks can sustain sequential reads at well over 100MB/sec.

Sure, but even a modest PCIe link can sustain on the order of 1GB/s, and your memory bandwidth is probably at least 8GB/s.

--
greg
http://mit.edu/~gsstark/resume.pdf
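If someone does load such a table, there's no need to guess at the on-disk footprint; a rough check with the built-in size functions (using the hypothetical census table sketched above) looks like this:

-- Rough bytes/row for a loaded table, including tuple headers, line
-- pointers, and page overhead ('census' is the hypothetical table above):
SELECT pg_size_pretty(pg_relation_size('census')) AS table_size,
       pg_relation_size('census')::numeric
           / NULLIF(count(*), 0)                  AS bytes_per_row
FROM census;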
On Fri, Aug 7, 2009 at 2:17 PM, Josh Kupershmidt<schmiddy@gmail.com> wrote:
> Just stumbled across this recent article published in the
> Communications of the ACM:
>
> http://cacm.acm.org/magazines/2009/8/34493-the-pathologies-of-big-data/fulltext
>
> The author shares some insights relating to difficulties processing a
> 6.75 billion-row table, a dummy table representing census-type data for
> everyone on earth, in Postgres.
>
> I'd really like to replicate the author's experiment, but it's not clear from
> the article what his table definition looks like. He claims to be using a
> 16-byte record to store the several columns he needs for each row, so perhaps
> he's using a user-defined type?
>
> The author implies with his definition of "big data" that the dataset he
> analyzed is "... too large to be placed in a relational database ...". From
> Fig. 2, the SELECT query he ran took just under 10^5 seconds (~28 hours) when
> run on 6.75 billion rows. That didn't seem surprising to me given how many
> rows he has to process, but in a recent post on comp.databases.ingres someone
> claimed that, on a far inferior PC, Ingres ran the same SELECT query in 105
> minutes. That would be very impressive (roughly a 16-fold improvement over
> Postgres) if true.

Well, from the article, I got the feeling he never showed up here on the list to ask for help, and he just assumed he knew enough about postgresql to say it couldn't scale well. I just checked the archives, and his name doesn't show up.

When you look at his slides, this one makes me wonder about a few points:

http://deliveryimages.acm.org/10.1145/1540000/1536632/figs/f3.jpg

He was using eight 15k SAS drives in RAID-5. Just the fact that he's testing on RAID-5 makes me wonder, but for his mostly-read workload it's useful enough. But on that machine he was only getting 53MB/second on sequential reads? That makes no sense. I was getting 50MB/s from a 4-disk SATA RAID of older 120GB drives years ago. SAS drives haven't been around that long, so I can't imagine having seven disks' worth of data (one drive's worth going to parity) and only getting 53/7, or about 7.5MB/second, from each of them. That's horrible. I had 9GB 5.25-inch full-height drives faster than that back in the day, on eight-bit SCSI controllers.

His memory read speed was pretty bad too, at only 350MB/s. I have a 12-drive RAID-10 that can outrun his memory reads. So I tend to think his OS was set up poorly, or his hardware was broken, or something like that.

> The author complained that "on larger tables [Postgres' planner] switched to
> sorting by grouping columns", which he blamed for the slow query execution. I
> don't personally see why this plan is a problem, but maybe someone can
> enlighten me.

I'm sure that on properly working hardware it would have been quite a bit faster. I'd love to try his test on a real server with RAID-10 and lots of memory; I'm certain I could get the run time down by a factor of a few. I wonder if he cranked up work_mem? I wonder if he even upped shared_buffers? (A sketch of the kind of settings I mean follows below.)

> One intriguing tidbit I picked up from the article: "in modern systems, as
> demonstrated in the figure, random access to memory is typically slower than
> sequential access to disk." In hindsight, this seems plausible, since modern
> disks can sustain sequential reads at well over 100MB/sec.

This is generally true, but his numbers are off by large factors for a modern system. Even Pentium II-era machines could sequentially read several hundred megabytes per second from memory.
Any modern piece of kit, including my laptop, can do much, much better than 350MB/second from memory. I wonder if he'd make his work available to mess with, as it seems he did a pretty poor job setting up his database server / OS for this test. At the very least I wonder if he has a colleague on this list who might point him to us, so we can try to help him improve the dismal performance he seems to be getting. Or maybe he could just google "postgresql performance tuning" and take it from there...
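To make the work_mem / shared_buffers question above concrete, here's a minimal sketch of what I'd check before timing a scan-and-aggregate like his. The actual values are illustrative for a dedicated box with plenty of RAM, not anything taken from the article:

-- Per-session sort/hash memory; raising it makes the planner more willing
-- to use a HashAggregate instead of Sort + GroupAggregate.  Illustrative value.
SET work_mem = '512MB';

-- shared_buffers is set in postgresql.conf (often a few GB on a dedicated
-- 2009-era server) and needs a restart; here we just look at the current value.
SHOW shared_buffers;

-- The plan itself shows whether you got the dreaded sort-by-grouping-columns
-- ('census' is the hypothetical table from earlier in the thread):
EXPLAIN SELECT country, age, sex, count(*)
FROM census
GROUP BY country, age, sex;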
Oh, I just noticed his graphic is in "values per second", but he had originally said they were 16-bit values. Even if they were 32- or 64-bit values, I'd expect way more than what he's getting there.

On Fri, Aug 7, 2009 at 6:40 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
> Well, from the article, I got the feeling he never showed up here on
> the list to ask for help, and he just assumed he knew enough about
> postgresql to say it couldn't scale well. I just checked the
> archives, and his name doesn't show up.
>
> When you look at his slides, this one makes me wonder about a few points:
>
> http://deliveryimages.acm.org/10.1145/1540000/1536632/figs/f3.jpg
Well, there is CPU overhead for reading postgres pages and tuples. On a disk subsystem that does 1GB/sec sequential reads, I can't get more than about 700MB/sec of I/O on a select count(*) query against very large tables with large rows (600 bytes), and it's closer to 300MB/sec if the rows are smaller (75 bytes). In both cases the query is CPU bound, with little I/O wait and disk utilization under 65% in iostat.

I also get over 13GB/sec to RAM from a single thread (Nehalem processor).

I don't see how, on any recent hardware, random access to RAM is slower than sequential access from disk. RAM access, random or not, is measured in GB/sec...

On 8/7/09 5:42 PM, "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
> Oh, I just noticed his graphic is in "values per second", but he had
> originally said they were 16-bit values. Even if they were 32- or 64-bit
> values, I'd expect way more than what he's getting there.
>
> On Fri, Aug 7, 2009 at 6:40 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
>> Well, from the article, I got the feeling he never showed up here on
>> the list to ask for help, and he just assumed he knew enough about
>> postgresql to say it couldn't scale well. I just checked the
>> archives, and his name doesn't show up.
>>
>> When you look at his slides, this one makes me wonder about a few points:
>>
>> http://deliveryimages.acm.org/10.1145/1540000/1536632/figs/f3.jpg
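For anyone who wants to reproduce that kind of measurement, effective MB/sec through Postgres can be backed out by timing a scan-bound query in psql and dividing the table size by the elapsed time; \timing is a psql meta-command and the size functions are built in (again using the hypothetical census table from earlier in the thread):

-- Time a pure sequential-scan aggregate...
\timing on
SELECT count(*) FROM census;

-- ...then see how many bytes that scan had to read.  Table size divided by
-- the elapsed time is the effective scan rate, which can be compared against
-- raw dd/iostat numbers for the same disks.
SELECT pg_size_pretty(pg_relation_size('census')) AS table_size;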
On Fri, Aug 7, 2009 at 7:34 PM, Scott Carey<scott@richrelevance.com> wrote:
> Well, there is CPU overhead for reading postgres pages and tuples. On a
> disk subsystem that does 1GB/sec sequential reads, I can't get more than
> about 700MB/sec of I/O on a select count(*) query against very large tables
> with large rows (600 bytes), and it's closer to 300MB/sec if the rows are
> smaller (75 bytes). In both cases the query is CPU bound, with little I/O
> wait and disk utilization under 65% in iostat.
>
> I also get over 13GB/sec to RAM from a single thread (Nehalem processor).
>
> I don't see how, on any recent hardware, random access to RAM is slower
> than sequential access from disk. RAM access, random or not, is measured
> in GB/sec...

I don't think anybody's arguing that.
>> I don't see how, on any recent hardware, random access to RAM is slower
>> than sequential access from disk. RAM access, random or not, is measured
>> in GB/sec...
>
> I don't think anybody's arguing that.

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2795&p=5

These guys mention about 50 ns of memory latency; that translates into roughly 20 million dependent memory "seeks" per second, which is in the same ballpark as the numbers given in the article... And if you assume 10GB/s of bandwidth, 50 ns is the time it takes to fetch 500 bytes.
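Just to spell the arithmetic out (50 ns is the latency figure from the AnandTech page above; 10GB/s is an assumed bandwidth, not a measurement), the same back-of-envelope numbers fall out of a one-line query:

-- 1 / 50ns      -> dependent memory "seeks" per second that the latency allows
-- 10GB/s * 50ns -> bytes transferred during one latency period
SELECT 1 / 50e-9    AS seeks_per_second,   -- 20,000,000
       10e9 * 50e-9 AS bytes_per_latency;  -- 500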