Thread: PG-related ACM Article: "The Pathologies of Big Data"

PG-related ACM Article: "The Pathologies of Big Data"

From: Josh Kupershmidt
Just stumbled across this recent article published in the
Communications of the ACM:

http://cacm.acm.org/magazines/2009/8/34493-the-pathologies-of-big-data/fulltext

The author shares some insights relating to difficulties processing a
6.75 billion-row
table, a dummy table representing census-type data for everyone on earth, in
Postgres.

I'd really like to replicate the author's experiment, but it's not clear from
the article what his table definition looks like. He claims to be using a
16-byte record to store the several columns he needs for each row, so perhaps
he's using a user-defined type?
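
For what it's worth, a 16-byte payload doesn't require anything exotic; four
plain 4-byte integers would do it. Something along these lines is what I'd try
first (the schema is purely my guess, not anything shown in the article):

    -- hypothetical reconstruction: four int4 columns = 16 bytes of data per row
    CREATE TABLE census (
        age     integer,
        sex     integer,
        region  integer,
        income  integer
    );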

The author implies with his definition of "big data" that the dataset he
analyzed is "... too large to be placed in a relational database... ". From
Fig. 2, the SELECT query he ran took just under 10^5 seconds (~28 hours) when
run on 6.75 billion rows. This amount of time for the query didn't seem
surprising to me given how many rows he has to process, but in a recent post
on comp.databases.ingres someone claimed that on a far-inferior PC, Ingres
ran the same SELECT query in 105 minutes! This would be very impressive
(better than a 10-fold improvement over Postgres) if true.

The author complained that "on larger tables [Postgres' planner] switched to
sorting by grouping columns", which he blamed for the slow query execution. I
don't personally see this plan as a problem, but maybe someone can enlighten
me.
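
As far as I can tell, that just means the planner switched from a HashAggregate
to a sort-based GroupAggregate once it estimated that the hash table over the
distinct groups wouldn't fit in work_mem. Something along these lines, though
the query and columns here are my invention since the article doesn't show his:

    -- With the default work_mem (1MB on 8.3/8.4), a GROUP BY with lots of
    -- estimated groups tends to plan as Sort + GroupAggregate:
    EXPLAIN SELECT region, age, income, count(*)
      FROM census
     GROUP BY region, age, income;

    -- Give the session enough memory for the hash table and the same query
    -- can plan as a HashAggregate, skipping the sort of billions of rows:
    SET work_mem = '1GB';
    EXPLAIN SELECT region, age, income, count(*)
      FROM census
     GROUP BY region, age, income;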

One intriguing tidbit I picked up from the article: "in modern systems, as
demonstrated in the figure, random access to memory is typically slower than
sequential access to disk." In hindsight, this seems plausible (since modern
disks can sustain sequential reads at well over 100MB/sec).

Anyway, it would be very interesting to attempt to speed up the author's query
if at all possible.

Re: PG-related ACM Article: "The Pathologies of Big Data"

From: Greg Stark
On Fri, Aug 7, 2009 at 9:17 PM, Josh Kupershmidt<schmiddy@gmail.com> wrote:
> Just stumbled across this recent article published in the
> Communications of the ACM:
>
> http://cacm.acm.org/magazines/2009/8/34493-the-pathologies-of-big-data/fulltext
>
> The author shares some insights relating to difficulties processing a
> 6.75 billion-row
> table, a dummy table representing census-type data for everyone on earth, in
> Postgres.
>
> I'd really like to replicate the author's experiment, but it's not clear from
> the article what his table definition looks like. He claims to be using a
> 16-byte record to store the several columns he needs for each row, so perhaps
> he's using a user-defined type?

or four integers, or who knows. Postgres's per-row overhead is 24
bytes of tuple header plus a 4-byte line pointer, so you're talking
about 44 bytes per row. There's per-page overhead and alignment on
top of that, but in this case it shouldn't be much.
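
If you want to check the actual on-disk footprint rather than doing the
arithmetic, something like this works against whatever table you end up
building (the table name here is just the hypothetical one from upthread):

    -- average bytes per row actually consumed on disk, including tuple
    -- headers, line pointers, page headers and alignment padding
    SELECT pg_relation_size('census') / count(*)::numeric AS bytes_per_row
      FROM census;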



> The author implies with his definition of "big data" that the dataset he
> analyzed is "... too large to be placed in a relational database... ". From
> Fig. 2, the SELECT query he ran took just under 10^5 seconds (~28 hours) when
> run on 6.75 billion rows. This amount of time for the query didn't seem
> surprising to me given how many rows he has to process, but in a recent post
> on comp.databases.ingres someone claimed that on a far-inferior PC, Ingres
> ran the same SELECT query in 105 minutes! This would be very impressive
> (better than a 10-fold improvement over Postgres) if true.

6.75 billion * 44 bytes is 297GB.

Assuming you stick that on a single spindle capable of 100MB/s:

You have: 297GB / (100MB/s)
You want: min
        * 49.5

So something's not adding up.

> One intriguing tidbit I picked up from the article: "in modern systems, as
> demonstrated in the figure, random access to memory is typically slower than
> sequential access to disk." In hindsight, this seems plausible (since modern
> disks can sustain sequential reads at well over 100MB/sec).

Sure, but the slowest PCIe bus can sustain 1GB/s and your memory
bandwidth is probably at least 8GB/s.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: PG-related ACM Article: "The Pathologies of Big Data"

From: Scott Marlowe
On Fri, Aug 7, 2009 at 2:17 PM, Josh Kupershmidt<schmiddy@gmail.com> wrote:
> Just stumbled across this recent article published in the
> Communications of the ACM:
>
> http://cacm.acm.org/magazines/2009/8/34493-the-pathologies-of-big-data/fulltext
>
> The author shares some insights relating to difficulties processing a
> 6.75 billion-row
> table, a dummy table representing census-type data for everyone on earth, in
> Postgres.
>
> I'd really like to replicate the author's experiment, but it's not clear from
> the article what his table definition looks like. He claims to be using a
> 16-byte record to store the several columns he needs for each row, so perhaps
> he's using a user-defined type?
>
> The author implies with his definition of "big data" that the dataset he
> analyzed is "... too large to be placed in a relational database... ". From
> Fig. 2, the SELECT query he ran took just under 10^5 seconds (~28 hours) when
> run on 6.75 billion rows. This amount of time for the query didn't seem
> surprising to me given how many rows he has to process, but in a recent post
> on comp.databases.ingres someone claimed that on a far-inferior PC, Ingres
> ran the same SELECT query in 105 minutes! This would be very impressive
> (better than a 10-fold improvement over Postgres) if true.

Well, from the article, I got the feeling he never showed up here on
the list to ask for help, and he just assumed he knew enough about
postgresql to say it couldn't scale well.  I just checked the
archives, and his name doesn't show up.

When you look at his slides, this one makes me wonder about a few points:

http://deliveryimages.acm.org/10.1145/1540000/1536632/figs/f3.jpg

He was using eight 15k SAS drives in RAID-5.  Just the fact that he's
testing on RAID-5 makes me wonder, though for his mostly-read workload
it's reasonable.  But on that machine he was only getting 53MB/second of
sequential reads?  That makes no sense.  I was getting 50MB/s from a
4-disk SATA RAID on older 120GB hard drives years ago.  SAS drives
haven't been around that long, so I can't imagine having 7 data disks
(1 for parity) and only getting 53/7, or about 7.5MB/second, from each
of them.  That's horrible.  I had 9GB 5.25-inch full-height drives
faster than that back in the day, on 8-bit SCSI controllers.  His
memory read speed was pretty bad too, at only 350MB/s; I have a
12-drive RAID-10 that can outrun his memory reads.  So I tend to think
his OS was set up poorly, or his hardware was broken, or something like
that.

> The author complained that "on larger tables [Postgres' planner] switched to
> sorting by grouping columns", which he blamed for the slow query execution. I
> don't personally see this plan as a problem, but maybe someone can enlighten
> me.

I'm sure that on properly performing hardware it would have been quite a
bit faster.  I'd love to try his test on a real server with RAID-10 and
lots of memory; I'm certain I could get the run time down by a factor of
several.

I wonder if he cranked up work_mem? I wonder if he even upped shared_buffers?
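
For anyone trying to reproduce this with sane settings, those two are where
I'd start.  The numbers below are just illustrative starting points for a
dedicated box with plenty of RAM, not anything taken from the article:

    -- work_mem can be raised per session; it governs how much memory a sort
    -- or hash step may use before the planner avoids it or spills to disk:
    SET work_mem = '512MB';

    -- shared_buffers has to be set in postgresql.conf and needs a restart;
    -- the out-of-the-box default (32MB on 8.3/8.4) is tiny for a test like his:
    SHOW shared_buffers;
    --   shared_buffers = 2GB           -- ~1/4 of RAM is a common starting point
    --   effective_cache_size = 12GB    -- roughly what the OS will cache anyway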

> One intriguing tidbit I picked up from the article: "in modern systems, as
> demonstrated in the figure, random access to memory is typically slower than
> sequential access to disk." In hindsight, this seems plausible (since modern
> disks can sustain sequential reads at well over 100MB/sec).

That's generally true.  But his numbers are off by whole factors for a
modern system.  Pentium II-era machines could already read several
hundred megabytes per second sequentially from memory.  Any modern piece
of kit, including my laptop, can do much, much better than 350MB/second
from memory.

I wonder if he'd make his work available to mess with, as it seems he
did a pretty poor job setting up his database server / OS for this
test.  At the very least I wonder if he has a colleague on this list
who might point him to us so we can try to help him improve the dismal
performance he seems to be getting.  Or maybe he could just google
"postgresql performance tuning" and take it from there...

Re: PG-related ACM Article: "The Pathologies of Big Data"

From: Scott Marlowe
Oh, I just noticed his graphic is in "values per second", but he had
originally said they were 16-bit values.  Even if they were 32- or
64-bit values, I'd expect way more than what he's getting there.

On Fri, Aug 7, 2009 at 6:40 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
> Well, from the article, I got the feeling he never showed up here on
> the list to ask for help, and he just assumed he knew enough about
> postgresql to say it couldn't scale well.  I just checked the
> archives, and his name doesn't show up.
>
> When you look at his slides, this one makes me wonder about a few points:
>
> http://deliveryimages.acm.org/10.1145/1540000/1536632/figs/f3.jpg
>

Re: PG-related ACM Article: "The Pathologies of Big Data"

From: Scott Carey
Well, there is CPU overhead for reading postgres pages and tuples.  On a
disk subsystem that gets 1GB/sec sequential reads, I can't get more than
about 700MB/sec of I/O from a select count(*) on very large tables with
large rows (600 bytes), and it's closer to 300MB/sec if the rows are
smaller (75 bytes).  In both cases the query is CPU bound, with little
I/O wait and disk utilization under 65% in iostat.
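
For reference, the measurement itself is nothing fancy: time a query that is
basically a pure sequential scan and watch iostat alongside it (the table
names here are mine, obviously):

    -- in psql:
    \timing on
    SELECT count(*) FROM wide_table;    -- ~600-byte rows: ~700MB/sec here
    SELECT count(*) FROM narrow_table;  -- ~75-byte rows: more tuples per MB,
                                        -- so more CPU per MB and ~300MB/sec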

I also get over 13GB/sec to RAM from a single thread (Nehalem processor).

I don't see how on any recent hardware, random access to RAM is slower than
sequential from disk.  RAM access, random or not, is measured in GB/sec...



On 8/7/09 5:42 PM, "Scott Marlowe" <scott.marlowe@gmail.com> wrote:

> Oh I just noticed his graphic is "values per second" but he had
> originally said they were 16 bit values.  Even if they were 32 or 64
> bit values, I'd expect way more than what he's getting there.
>
> On Fri, Aug 7, 2009 at 6:40 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
>> Well, from the article, I got the feeling he never showed up here on
>> the list to ask for help, and he just assumed he knew enough about
>> postgresql to say it couldn't scale well.  I just checked the
>> archives, and his name doesn't show up.
>>
>> When you look at his slides, this one makes me wonder about a few points:
>>
>> http://deliveryimages.acm.org/10.1145/1540000/1536632/figs/f3.jpg
>>
>


Re: PG-related ACM Article: "The Pathologies of Big Data"

From: Scott Marlowe
On Fri, Aug 7, 2009 at 7:34 PM, Scott Carey<scott@richrelevance.com> wrote:
> Well, there is CPU overhead for reading postgres pages and tuples.  On a
> disk subsystem that gets 1GB/sec sequential reads, I can't get more than
> about 700MB/sec of I/O and on a select count(*) query on very large tables
> with large rows (600 bytes) and its closer to 300MB/sec if the rows are
> smaller (75 bytes). In both cases it is CPU bound with little i/o wait and
> disk utilization under 65% in iostat.
>
> I also get over 13GB/sec to RAM from a single thread (Nehalem processor).
>
> I don't see how on any recent hardware, random access to RAM is slower than
> sequential from disk.  RAM access, random or not, is measured in GB/sec...

I don't think anybody's arguing that.

Re: PG-related ACM Article: "The Pathologies of Big Data"

From: Pierre Frédéric Caillaud
>> I don't see how on any recent hardware, random access to RAM is slower
>> than
>> sequential from disk.  RAM access, random or not, is measured in
>> GB/sec...
>
> I don't think anybody's arguing that.

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2795&p=5

These guys mention about 50 ns of memory latency; that translates into
20 million memory "seeks" per second, which is in the same ballpark as the
numbers given in the article...

And at 10GB/s of bandwidth, 50 ns is the time it takes to fetch 500 bytes.