Thread: Summaries on SSD usage?
Hi,

I'm looking for summaries (or best practices) on SSD usage with PostgreSQL.
My use case is mainly a "read-only" database.
Are there any around?

Yours, Stefan
On Aug 30, 2011, at 12:23 PM, Stefan Keller wrote:
> I'm looking for summaries (or best practices) on SSD usage with PostgreSQL.
> My use case is mainly a "read-only" database.
> Are there any around?

I'm not sure, but for read-only why not just put more memory in the server? It'll be a lot cheaper than SSDs.
--
Jim C. Nasby, Database Architect   jim@nasby.net
512.569.9461 (cell)   http://jim.nasby.net
You mean something like "Unlogged Tables" in PostgreSQL 9.1 (= in-memory database) or simply a large ramdisk?

Yours, Stefan

2011/9/1 Jim Nasby <jim@nasby.net>:
> On Aug 30, 2011, at 12:23 PM, Stefan Keller wrote:
>> I'm looking for summaries (or best practices) on SSD usage with PostgreSQL.
>> My use case is mainly a "read-only" database.
>> Are there any around?
>
> I'm not sure, but for read-only why not just put more memory in the server? It'll be a lot cheaper than SSDs.
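For reference, a sketch of the 9.1 feature Stefan mentions (table and columns hypothetical). Note that unlogged tables are not an in-memory database: they still live on disk and compete for shared_buffers like any other table. They merely skip WAL, are truncated after a crash, and are not replicated:

```sql
-- hstore lives in an extension (CREATE EXTENSION hstore; on 9.1+)
-- No WAL is written for this table, so writes are faster,
-- but its contents are truncated after a crash.
CREATE UNLOGGED TABLE osm_point_scratch (
    id    bigint PRIMARY KEY,
    name  text,
    tags  hstore
);
```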
On 2011-09-01 23:28, Jim Nasby wrote:
> On Aug 30, 2011, at 12:23 PM, Stefan Keller wrote:
>> I'm looking for summaries (or best practices) on SSD usage with PostgreSQL.
>> My use case is mainly a "read-only" database.
>> Are there any around?
> I'm not sure, but for read-only why not just put more memory in the server? It'll be a lot cheaper than SSDs.

It is "really expensive" to go over 512GB of memory, and the performance regression for just hitting disk in a system where you assume everything is in memory is really huge. SSD makes that "edge" a bit smoother than rotating drives do.

Jesper
--
Jesper
On Tue, Aug 30, 2011 at 11:23 AM, Stefan Keller <sfkeller@gmail.com> wrote:
> Hi,
>
> I'm looking for summaries (or best practices) on SSD usage with PostgreSQL.
> My use case is mainly a "read-only" database.
> Are there any around?

How big is your DB?
What kind of reads are most common, random access or sequential?
How big a dataset do you pull out at once with a query?

SSDs are usually not a big winner for read-only databases.
If the dataset is small (a dozen or so gigs), get more RAM to fit it in.
If it's big and sequentially accessed, then build a giant RAID-10 or RAID-6.
If it's big and randomly accessed, then buy a bunch of SSDs and RAID them.
On 09/01/2011 11:14 PM, Jesper Krogh wrote:
> It is "really expensive" to go over 512GB memory and the performance
> regression for just hitting disk in a system where you assume
> everything is in memory is really huge. SSD makes the "edge" be a bit
> smoother than rotating drives do.

Ironically, this is actually the topic of my presentation at Postgres Open. We transitioned to NVRAM PCI cards for exactly this reason. Having a giant database in cache is great, until a few reads come from your slow backing disks, or heaven forbid, you have to restart your database during a high transactional period.

Lemme tell ya... no RAID-10 in the world can supply 12k TPS with little to no warning. A good set of SSDs or PCI cards can.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
2011/9/2 Scott Marlowe <scott.marlowe@gmail.com>:
> How big is your DB?
> What kind of reads are most common, random access or sequential?
> How big a dataset do you pull out at once with a query?
>
> SSDs are usually not a big winner for read only databases.
> If the dataset is small (dozen or so gigs) get more RAM to fit it in
> If it's big and sequentially accessed, then build a giant RAID-10 or RAID-6
> If it's big and randomly accessed then buy a bunch of SSDs and RAID them

My dataset is a mirror of OpenStreetMap, updated daily. For Switzerland it's about 10 GB total disk space used (half for tables, half for indexes) based on 2 GB of raw XML input. Europe's raw input would be about 70 times larger (130 GB), and the world has 250 GB of raw input.

It's both randomly (= index scan?) and sequentially (= seq scan?) accessed, with queries like:

  SELECT * FROM osm_point WHERE tags @> hstore('tourism','zoo') AND name ILIKE 'Zoo%';

You can try it yourself online, e.g. http://labs.geometa.info/postgisterminal/?xapi=node[tourism=zoo]

So I'm still unsure what's better: SSD, NVRAM (PCI card) or plain RAM? And I'm eager to understand if unlogged tables could help anyway.

Yours, Stefan
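As an aside, whether that example query random-accesses via an index or falls back to a sequential scan is directly observable with EXPLAIN. A sketch, assuming a GIN index on the hstore column (the index itself is an assumption, not something Stefan states he has):

```sql
-- hstore containment (@>) is indexable with GiST or GIN
CREATE INDEX osm_point_tags_idx ON osm_point USING gin (tags);

EXPLAIN ANALYZE
SELECT *
FROM osm_point
WHERE tags @> hstore('tourism', 'zoo')
  AND name ILIKE 'Zoo%';
```

If the plan shows a bitmap index scan on osm_point_tags_idx, access is random; a plain Seq Scan means the whole table is read sequentially.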
On 2011-09-03 00:04, Stefan Keller wrote:
> My dataset is a mirror of OpenStreetMap updated daily. For Switzerland
> it's about 10 GB total disk space used (half for tables, half for
> indexes) based on 2 GB raw XML input. Europe would be about 70 times
> larger (130 GB) and world has 250 GB raw input.
>
> So I'm still unsure what's better: SSD, NVRAM (PCI card) or plain RAM?
> And I'm eager to understand if unlogged tables could help anyway

It's not that hard to figure out. Take some of your "typical" queries, say the one above, and change the search term to something you'd expect a user to enter in a minute, but that hasn't been run yet (could be "museum" instead of "zoo"). Then run it with \timing, and twice. If the two runs are "close" to each other in timing, then you only hit memory anyway, and neither SSD, NVRAM nor more RAM will buy you anything; faster memory and faster CPU cores will. If you get a significant speedup on the second run, then more RAM, NVRAM or SSD is a good fix.

Typically I have slow-query logging turned on, permanently set to around 250ms. If I find queries in the log that I didn't expect to take above 250ms, then I'd start to investigate whether the query plans are correct, and so on.

The above numbers are "raw-data" size and not how PG uses them, or? And you haven't told us anything about the size of your current system.

Jesper
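Jesper's cold-vs-warm test can be sketched as a psql session (query borrowed from upthread; "museum" stands in for a term that hasn't been run yet):

```sql
\timing on
-- First run: possibly cold, may have to hit disk
SELECT * FROM osm_point WHERE tags @> hstore('tourism', 'museum');
-- Second run: served from cache; a large gap vs. run 1 means you were I/O-bound
SELECT * FROM osm_point WHERE tags @> hstore('tourism', 'museum');

-- Permanent slow-query logging at the ~250 ms threshold mentioned above,
-- set in postgresql.conf:
--   log_min_duration_statement = 250
```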
2011/9/3 Jesper Krogh <jesper@krogh.cc>:
> It's not that hard to figure out.. take some of your "typical" queries.
> say the one above.. Change the search-term to something "you'd expect
> the user to enter in a minute, but hasn't been run" (could be "museum"
> instead of "zoo"). Then you run it with \timing and twice.. if the two
> queries are "close" to each other in timing, then you only hit memory
> anyway and neither SSD, NVRAM or more RAM will buy you anything.
>
> The above numbers are "raw-data" size and not how PG uses them.. or?
> And you haven't told anything about the size of your current system.

It's definitely the case that the second query run is much faster (first runs go up to 30 seconds and more...).

PG uses the raw data for Switzerland like this: 10 GB total disk space based on 2 GB of raw XML input. Table osm_point is one of the four big tables and uses 984 MB for the table and 1321 MB for its indexes (of which the hstore index is the biggest, ahead of id, name and geometry).

Stefan
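The size figures Stefan quotes can be reproduced with PostgreSQL's built-in size functions; a sketch:

```sql
-- Heap vs. index/TOAST footprint for one table
SELECT pg_size_pretty(pg_relation_size('osm_point'))       AS table_size,
       pg_size_pretty(pg_total_relation_size('osm_point')
                      - pg_relation_size('osm_point'))     AS indexes_and_toast;

-- Per-index breakdown (id, name, geometry and hstore indexes)
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'osm_point'
ORDER BY pg_relation_size(indexrelid) DESC;
```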
Shaun,

2011/9/2 Shaun Thomas <sthomas@peak6.com>:
> Ironically, this is actually the topic of my presentation at Postgres Open.

Do you think my problem would now be solved with an NVRAM PCI card?

Stefan
On 09/06/2011 08:45 AM, Stefan Keller wrote:
> Do you think my problem would now be solved with NVRAM PCI card?

That's a tough call. Part of the reason I'm doing the presentation is that there are a lot of other high-OLTP databases out there which have (or will have) reached critical mass, where cache can't fulfill generic database requests anymore. As an example, we were around 11k database transactions per second on 250GB of data with 32GB of RAM. The first thing we tried was bumping it up to 64GB, and that kinda worked. But what you'll find is that an autovacuum, or a nightly vacuum, will occasionally hit a large table and flush all of that handy cached data down the tubes, and then your database starts choking trying to keep up with the requests.

Even a large, well-equipped RAID can only really offer 2500-ish TPS before you start getting into the larger and more expensive SANs, so you either have to pre-load your memory with dd or pgfincore, or, if your random access patterns actually exceed your RAM, you need a bigger disk pool or tiered storage. And by tiered storage, I mean tablespaces, with critical high-TPS tables located on a PCIe card or a pool of modern (capacitor-backed, firmware-GC) SSDs.

Your case looks more like you have just a couple of big-ass queries/tables that occasionally give you trouble. If optimizing the queries, index tweaks, and other sundry tools can't help anymore, you may have to start dragging out the bigger guns. But if you can afford it, having some NVRAM storage around as a top-tier tablespace for critical-need data is probably good practice these days.

They're expensive, though. Even the cheap ones start around $5k. Just remember you're paying for performance in this case, and not storage capacity. Some vendors have demo hardware they'll let you use to determine if it applies to your case, so you might want to contact FusionIO, RAMSAN, Virident, or maybe OCZ.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
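A rough sketch of the tiered-storage approach Shaun describes — a tablespace on fast storage for the hot relations. The mount point, tablespace name and index name below are all hypothetical:

```sql
-- Tablespace backed by an SSD / PCIe-flash mount
CREATE TABLESPACE fast_flash LOCATION '/mnt/fusionio/pgdata';

-- Move the critical high-TPS table and its hottest index onto it
ALTER TABLE osm_point SET TABLESPACE fast_flash;
ALTER INDEX osm_point_pkey SET TABLESPACE fast_flash;

-- Pre-warming alternative mentioned above, if the pgfincore
-- extension is installed:
-- SELECT * FROM pgfadvise_willneed('osm_point');
```

Note that ALTER TABLE ... SET TABLESPACE physically rewrites the relation and holds an exclusive lock for the duration, so it's best done in a maintenance window.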