Thread: Postgres for a "data warehouse", 5-10 TB

Postgres for a "data warehouse", 5-10 TB

From
Igor Chudov
Date:
I have been a MySQL user for years, including owning a few
multi-gigabyte databases for my websites, and using it to host
algebra.com (about 12 GB database).

I have had my ups and downs with MySQL. The ups were ease of use and
decent performance for small databases such as algebra.com. The downs
were things like twenty hour REPAIR TABLE operations on a 35 GB
table, etc.

Right now I have a personal (one user) project to create a 5-10
Terabyte data warehouse. The largest table will consume the most space
and will take, perhaps, 200,000,000 rows.

I want to use it to obtain valuable business intelligence and to make
money.

I expect it to grow, never shrink, and to be accessed via batch
queries. I do not care for batch queries to be super fast, for example
an hour per query would be just fine.

However, while an hour is fine, two weeks per query is NOT fine.

I have a server with about 18 TB of storage and 48 GB of RAM, and 12
CPU cores.

My initial plan was to use MySQL, InnoDB, and deal with problems as
they arise. Perhaps, say, I would implement my own joining
procedures.

After reading some disparaging stuff about InnoDB performance on large
datasets, however, I am getting cold feet. I have a general feeling
that, perhaps, I will not be able to succeed with MySQL, or, perhaps,
with either MySQL or Postgres.

I do not know much about Postgres, but I am very eager to learn and
see if I can use it for my purposes more effectively than MySQL.

I cannot shell out $47,000 per CPU for Oracle for this project.

To be more specific, the batch queries that I would do, I hope,
would either use small JOINS of a small dataset to a large dataset, or
just SELECTS from one big table.

So... Can Postgres support a 5-10 TB database with the use pattern
stated above?

Thanks!

i

Re: Postgres for a "data warehouse", 5-10 TB

From
Scott Marlowe
Date:
On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov <ichudov@gmail.com> wrote:
> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> CPU cores.

1 or 2 fast cores is plenty for what you're doing.  But the drive
array and how it's configured etc are very important.  There's a huge
difference between 10 2TB 7200RPM SATA drives in a software RAID-5 and
36 500G 15kRPM SAS drives in a RAID-10 (SW or HW would both be ok for
data warehouse.)

> I do not know much about Postgres, but I am very eager to learn and
> see if I can use it for my purposes more effectively than MySQL.
> I cannot shell out $47,000 per CPU for Oracle for this project.
> To be more specific, the batch queries that I would do, I hope,

Hopefully, if need be, you can spend some small percentage of that on
a fast IO subsystem.

> would either use small JOINS of a small dataset to a large dataset, or
> just SELECTS from one big table.
> So... Can Postgres support a 5-10 TB database with the use pattern
> stated above?

I use it on a ~3TB DB and it works well enough.  Fast IO is the key
here.  Lots of drives in RAID-10 or HW RAID-6 if you don't do a lot of
random writing.

Re: Postgres for a "data warehouse", 5-10 TB

From
pasman pasmański
Date:
For a 10 TB table and 3 hours, the disks should have a transfer rate of about 1 GB/s for a seqscan (10 TB / 3 h = 10,000 GB / 10,800 s, or roughly 0.9 GB/s).

2011/9/11, Scott Marlowe <scott.marlowe@gmail.com>:
> On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov <ichudov@gmail.com> wrote:
>> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
>> CPU cores.
>
> 1 or 2 fast cores is plenty for what you're doing.  But the drive
> array and how it's configured etc are very important.  There's a huge
> difference between 10 2TB 7200RPM SATA drives in a software RAID-5 and
> 36 500G 15kRPM SAS drives in a RAID-10 (SW or HW would both be ok for
> data warehouse.)
>
>> I do not know much about Postgres, but I am very eager to learn and
>> see if I can use it for my purposes more effectively than MySQL.
>> I cannot shell out $47,000 per CPU for Oracle for this project.
>> To be more specific, the batch queries that I would do, I hope,
>
> Hopefully if needs be you can spend some small percentage of that for
> a fast IO subsystem is needed.
>
>> would either use small JOINS of a small dataset to a large dataset, or
>> just SELECTS from one big table.
>> So... Can Postgres support a 5-10 TB database with the use pattern
>> stated above?
>
> I use it on a ~3TB DB and it works well enough.  Fast IO is the key
> here.  Lots of drives in RAID-10 or HW RAID-6 if you don't do a lot of
> random writing.
>


--
------------
pasman

Re: Postgres for a "data warehouse", 5-10 TB

From
Igor Chudov
Date:


On Sun, Sep 11, 2011 at 7:52 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov <ichudov@gmail.com> wrote:
> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> CPU cores.

1 or 2 fast cores is plenty for what you're doing.

I need those cores to perform other tasks, like image manipulation with ImageMagick, XML forming and parsing, etc.
 
 But the drive
array and how it's configured etc are very important.  There's a huge
difference between 10 2TB 7200RPM SATA drives in a software RAID-5 and
36 500G 15kRPM SAS drives in a RAID-10 (SW or HW would both be ok for
data warehouse.)

Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6 configuration.

They are managed by a 3WARE 9750 RAID CARD.
 
I would say that I am not very concerned with linear relationship of read speed to disk speed. If that stuff is somewhat slow, it is OK with me. 

What I want to avoid is severe degradation of performance due to size (time complexity greater than O(1)), disastrous REPAIR TABLE operations etc. 


> I do not know much about Postgres, but I am very eager to learn and
> see if I can use it for my purposes more effectively than MySQL.
> I cannot shell out $47,000 per CPU for Oracle for this project.
> To be more specific, the batch queries that I would do, I hope,

Hopefully if needs be you can spend some small percentage of that for
a fast IO subsystem is needed.



I am actually open for suggestions here.
 
> would either use small JOINS of a small dataset to a large dataset, or
> just SELECTS from one big table.
> So... Can Postgres support a 5-10 TB database with the use pattern
> stated above?

I use it on a ~3TB DB and it works well enough.  Fast IO is the key
here.  Lots of drives in RAID-10 or HW RAID-6 if you don't do a lot of
random writing.

I do not plan to do a lot of random writing. My current design is that my Perl scripts write to a temporary table every week, and then I do INSERT ... ON DUPLICATE KEY UPDATE.

By the way, does that INSERT UPDATE functionality or something like this exist in Postgres?

i

Re: Postgres for a "data warehouse", 5-10 TB

From
Igor Chudov
Date:


2011/9/11 pasman pasmański <pasman.p@gmail.com>
For 10 TB table and 3hours, disks should have a transfer about 1GB/s (seqscan).



I have 6 Gb/s disk drives, so it should not be too far off; maybe 5 hours for a seqscan.

i
  
2011/9/11, Scott Marlowe <scott.marlowe@gmail.com>:
> On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov <ichudov@gmail.com> wrote:
>> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
>> CPU cores.
>
> 1 or 2 fast cores is plenty for what you're doing.  But the drive
> array and how it's configured etc are very important.  There's a huge
> difference between 10 2TB 7200RPM SATA drives in a software RAID-5 and
> 36 500G 15kRPM SAS drives in a RAID-10 (SW or HW would both be ok for
> data warehouse.)
>
>> I do not know much about Postgres, but I am very eager to learn and
>> see if I can use it for my purposes more effectively than MySQL.
>> I cannot shell out $47,000 per CPU for Oracle for this project.
>> To be more specific, the batch queries that I would do, I hope,
>
> Hopefully if needs be you can spend some small percentage of that for
> a fast IO subsystem is needed.
>
>> would either use small JOINS of a small dataset to a large dataset, or
>> just SELECTS from one big table.
>> So... Can Postgres support a 5-10 TB database with the use pattern
>> stated above?
>
> I use it on a ~3TB DB and it works well enough.  Fast IO is the key
> here.  Lots of drives in RAID-10 or HW RAID-6 if you don't do a lot of
> random writing.
>


--
------------
pasman


Re: Postgres for a "data warehouse", 5-10 TB

From
Claudio Freire
Date:
On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov <ichudov@gmail.com> wrote:
> Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6
> configuration.
> They are managed by a 3WARE 9750 RAID CARD.
>
> I would say that I am not very concerned with linear relationship of read
> speed to disk speed. If that stuff is somewhat slow, it is OK with me.

With Raid 6 you'll have abysmal performance on write operations.
In data warehousing, there's lots of writes to temporary files, for
sorting and stuff like that.

You should either migrate to raid 10, or set up a separate array for
temporary files, perhaps raid 0.
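
Just as a sketch of how you'd point PG's temp traffic at such an array (the
mount point and tablespace name are only placeholders):

-- tablespace on the dedicated temp array
create tablespace fast_temp location '/mnt/raid0_temp/pgtemp';

-- then per session, or via temp_tablespaces in postgresql.conf:
set temp_tablespaces = 'fast_temp';

Temporary tables and the sort/hash spill files then land there instead of on
the main RAID 6 array.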

Re: Postgres for a "data warehouse", 5-10 TB

From
Andy Colson
Date:
On 09/11/2011 07:35 AM, Igor Chudov wrote:
> I have been a MySQL user for years, including owning a few
> multi-gigabyte databases for my websites, and using it to host
> algebra.com (about 12 GB database).
>
> I have had my ups and downs with MySQL. The ups were ease of use and
> decent performance for small databases such as algebra.com. The downs
> were things like twenty hour REPAIR TABLE operations on a 35 GB
> table, etc.
>
> Right now I have a personal (one user) project to create a 5-10
> Terabyte data warehouse. The largest table will consume the most space
> and will take, perhaps, 200,000,000 rows.
>
> I want to use it to obtain valuable business intelligence and to make
> money.
>
> I expect it to grow, never shrink, and to be accessed via batch
> queries. I do not care for batch queries to be super fast, for example
> an hour per query would be just fine.
>
> However, while an hour is fine, two weeks per query is NOT fine.
>
> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> CPU cores.
>
> My initial plan was to use MySQL, InnoDB, and deal with problems as
> they arise. Perhaps, say, I would implement my own joining
> procedures.
>
> After reading some disparaging stuff about InnoDB performance on large
> datasets, however, I am getting cold feet. I have a general feeling
> that, perhaps, I will not be able to succeed with MySQL, or, perhaps,
> with either MySQL and Postgres.
>
> I do not know much about Postgres, but I am very eager to learn and
> see if I can use it for my purposes more effectively than MySQL.
>
> I cannot shell out $47,000 per CPU for Oracle for this project.
>
> To be more specific, the batch queries that I would do, I hope,
> would either use small JOINS of a small dataset to a large dataset, or
> just SELECTS from one big table.
>
> So... Can Postgres support a 5-10 TB database with the use pattern
> stated above?
>
> Thanks!
>
> i
>

That is a scale or two larger than I have experience with.  I converted my website database from mysql to PG, and it
has several db's between 1 and 10 gig.  There are parts of the website that were faster with mysql, and there are parts
faster with PG.  One spot, because PG has superior join support on select statements, I was able to change the code to
generate a single more complicated sql statement vs. mysql that had to fire off several simpler statements.  It's a
search screen where you can type in 15'ish different options.  I was able to generate a single sql statement which joins
8 some odd tables and plenty of where statements.  PG runs it in the blink of an eye.  It's astonishing compared to the
pain of mysql.  If you ever have to write your own join, or your own lookup function, that's a failure of your database.

One spot that was slower was a batch insert of data.  It's not so much slower that it was a problem.  I use COPY on PG
vs prepared inserts on mysql.  It was pretty close, but mysql still won.

Seeing as you can set up and test both databases, have you considered a trial run?

Things to watch for:


I think the same amount of data will use more disk space in PG than in mysql.

Importing data into PG should use COPY and multiple connections at the same time.

PG will only use multiple cores if you use multiple connections (each connection uses one core).

Huge result sets (like a select statement that returns 1,000,000 rows) will be slow.

PG is a much fuller database than mysql, and as such you can influence its join types and function calls (table scan
vs index, immutable function vs stable, perl function vs sql).  So if at first it appears slow, you have a million
options.  I think the only option you have in mysql is to pull the data back and code it yourself.

Upgrading to major versions of PG may or may not be painful.  (mysql sometimes works seamlessly between versions, it
appears brilliant.  But I have had problems with an update, and when it goes bad, you don't have a lot of options.)  In
the past PG's only method of upgrade was a full backup of old, restore in new.  Things have gotten better: there is new
pg_upgrade support (still kinda new though), and there is some 3rd party replication support where you replicate your
9.0 database to a new 9.1 database, and at some point you promote the new 9.1 database as the new master.  Or something
like that.  I've only read posts about it, never done it.  But with that much data, you'll need an upgrade plan.

All in all, if I can summarize my personal view: mysql is fast at the expense of safety and usability (mysql still
cannot do updates with subselects).  PG is safe and usable at the expense of speed, and you won't be disappointed by the
speed.

-Andy

Re: Postgres for a "data warehouse", 5-10 TB

From
Igor Chudov
Date:


On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov <ichudov@gmail.com> wrote:
> Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6
> configuration.
> They are managed by a 3WARE 9750 RAID CARD.
>
> I would say that I am not very concerned with linear relationship of read
> speed to disk speed. If that stuff is somewhat slow, it is OK with me.

With Raid 6 you'll have abysmal performance on write operations.
In data warehousing, there's lots of writes to temporary files, for
sorting and stuff like that.

You should either migrate to raid 10, or set up a separate array for
temporary files, perhaps raid 0.

Thanks. I will rebuild the RAID array early next week and I will see if I have a Raid 10 option with that card.

Quantitatively, what would you say is the write speed difference between RAID 10 and RAID 6?

Re: Postgres for a "data warehouse", 5-10 TB

From
Andy Colson
Date:
On 09/11/2011 08:59 AM, Igor Chudov wrote:
>
>
> I do not plan to do a lot of random writing. My current design is that my perl scripts write to a temporary table
> every week, and then I do INSERT..ON DUPLICATE KEY UPDATE.
>
> By the way, does that INSERT UPDATE functionality or something like this exist in Postgres?
>
> i

You have two options:

1) write a function like:
create function doinsert(_id integer, _value text) returns void as $$
begin
   update thetable set value = _value where id = _id;
   if not found then
      insert into thetable(id, value) values (_id, _value);
   end if;
end;
$$ language plpgsql;

2) use two sql statements:
-- update the existing
update realTable set value = (select value from tmp where tmp.id = realTable.id)
where exists (select value from tmp where tmp.id = realTable.id);

-- insert the missing
insert into realTable(id, value)
select id, value from tmp where not exists(select 1 from realTable where tmp.id = realTable.id);
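
For the weekly batch load you described, a rough sketch of how the pieces fit
together (the file path is only a placeholder; COPY FROM a server-side file
needs superuser, otherwise use psql's \copy):

begin;

-- stage this week's extract
create temporary table tmp (id integer, value text);
copy tmp from '/path/to/weekly_extract.csv' with csv;

-- update the existing
update realTable set value = (select value from tmp where tmp.id = realTable.id)
where exists (select value from tmp where tmp.id = realTable.id);

-- insert the missing
insert into realTable(id, value)
select id, value from tmp where not exists(select 1 from realTable where tmp.id = realTable.id);

commit;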


-Andy

Re: Postgres for a "data warehouse", 5-10 TB

From
Claudio Freire
Date:
On Sun, Sep 11, 2011 at 4:16 PM, Andy Colson <andy@squeakycode.net> wrote:
> Upgrading to major versions of PG may or may not be painful.  (mysql
> sometimes works seamlessly between versions, it appears brilliant.  But I
> have had problems with an update, and when it goes bad, you dont have a lot
> of options).  In the past PG's only method of upgrade was a full backup of
> old, restore in new.  Things have gotten better, there is new pg_upgrade
> support (still kinda new though), and there is some 3rd party replication
> support where you replicate your 9.0 database to a new 9.1 database, and at
> some point you promote the new 9.1 database as the new master.  Or something
> like that.  I've only read posts about it, never done it.  But with that
> much data, you'll need an upgrade plan.

I have used slony to do database migration. It is a pain to set up,
but it saves you hours of downtime.
Basically, you replicate your 9.0 database into a 9.1 slave while the
9.0 is still hot and working, so you only have a very small downtime.
It's an option, but it's a lot of work to set up, only warranted if
you really cannot afford the downtime.

Re: Postgres for a "data warehouse", 5-10 TB

From
Andy Colson
Date:
On 09/11/2011 09:21 AM, Igor Chudov wrote:
>
>
> On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire <klaussfreire@gmail.com <mailto:klaussfreire@gmail.com>> wrote:
>
>     On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov <ichudov@gmail.com <mailto:ichudov@gmail.com>> wrote:
>      > Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6
>      > configuration.
>      > They are managed by a 3WARE 9750 RAID CARD.
>      >
>      > I would say that I am not very concerned with linear relationship of read
>      > speed to disk speed. If that stuff is somewhat slow, it is OK with me.
>
>     With Raid 6 you'll have abysmal performance on write operations.
>     In data warehousing, there's lots of writes to temporary files, for
>     sorting and stuff like that.
>
>     You should either migrate to raid 10, or set up a separate array for
>     temporary files, perhaps raid 0.
>
>
> Thanks. I will rebuild the RAID array early next week and I will see if I have a Raid 10 option with that card.
>
> Quantitatively, what would you say is the write speed difference between RAID 10 and RAID 6?
>

Note that using RAID 10, while faster, cuts your usable space in half: 12 2TB drives in RAID 10 == 6 drives * 2TB == 12
TB total space.  That's not big enough, is it?

-Andy

Re: Postgres for a "data warehouse", 5-10 TB

From
Claudio Freire
Date:
On Sun, Sep 11, 2011 at 4:21 PM, Igor Chudov <ichudov@gmail.com> wrote:
> Quantitatively, what would you say is the write speed difference between
> RAID 10 and RAID 6?

https://support.nstein.com/blog/archives/73

There you can see a comparison with 4 drives, and raid 10 is twice as fast.
Since raid 5/6 doesn't scale write performance at all (it performs as
a single drive), it's quite expected. 12 drives would probably be
around 6 times as fast as raid 6.

You definitely should do some benchmarks to confirm, though.

And Andy is right, you'll have a lot less space. If raid 10 doesn't
give you enough room, just leave two spare drives for a raid 0
temporary partition. That will be at least twice as fast as doing
temporary tables on the raid 6.

You'll obviously have to get creative, tons of options.

Re: Postgres for a "data warehouse", 5-10 TB

From
Marti Raudsepp
Date:
On Sun, Sep 11, 2011 at 17:23, Andy Colson <andy@squeakycode.net> wrote:
> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>> By the way, does that INSERT UPDATE functionality or something like this exist in Postgres?
> You have two options:
> 1) write a function like:
> create function doinsert(_id integer, _value text) returns void as
> 2) use two sql statements:

Unfortunately both of these options have caveats. Depending on your
I/O speed, you might need to use multiple loader threads to saturate
the write bandwidth.

However, neither option is safe from race conditions. If you need to
load data from multiple threads at the same time, they won't see each
other's inserts (until commit) and thus cause unique violations. If
you could somehow partition their operation by some key, so threads
are guaranteed not to conflict with each other, then that would be perfect.
The 2nd option given by Andy is probably faster.

You *could* code a race-condition-safe function, but that would be a
no-go on a data warehouse, since each call needs a separate
subtransaction which involves allocating a transaction ID.
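
For reference, the race-safe pattern is roughly the merge_db example from the
PG docs ("Trapping Errors" in the PL/pgSQL chapter); the BEGIN ... EXCEPTION
block below is exactly the per-call subtransaction I mean (table and function
names are illustrative):

create function merge_row(_id integer, _value text) returns void as $$
begin
    loop
        -- first try to update an existing row
        update thetable set value = _value where id = _id;
        if found then
            return;
        end if;
        -- not there, so try to insert; a concurrent insert of the same
        -- key raises unique_violation, in which case we retry the update
        begin
            insert into thetable(id, value) values (_id, _value);
            return;
        exception when unique_violation then
            null;  -- loop around and try the update again
        end;
    end loop;
end;
$$ language plpgsql;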

----

Which brings me to another important point: don't do lots of small
write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides
being inefficient, they introduce a big maintenance burden. In
PostgreSQL's MVCC, each tuple contains a reference to the 32-bit
transaction ID that inserted it (xmin). After hitting the maximum
32-bit value transaction ID, the number "wraps around". To prevent old
rows from appearing as new, a "vacuum freeze" process will run after
passing autovacuum_freeze_max_age transactions (200 million by
default) to update all old rows in your database. Using fewer
transaction IDs means it runs less often.

On small databases, this is usually not important. But on a 10TB data
warehouse, rewriting a large part of your database totally kills
performance for any other processes.
This is detailed in the documentation:
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
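
A quick way to keep an eye on this is to watch the transaction-ID age of each
database (age() here counts transactions, not wall-clock time):

select datname, age(datfrozenxid) as xid_age
from pg_database
order by xid_age desc;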

Regards,
Marti

Re: Postgres for a "data warehouse", 5-10 TB

From
Ogden
Date:

On Sep 11, 2011, at 9:21 AM, Igor Chudov wrote:



On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov <ichudov@gmail.com> wrote:
> Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6
> configuration.
> They are managed by a 3WARE 9750 RAID CARD.
>
> I would say that I am not very concerned with linear relationship of read
> speed to disk speed. If that stuff is somewhat slow, it is OK with me.

With Raid 6 you'll have abysmal performance on write operations.
In data warehousing, there's lots of writes to temporary files, for
sorting and stuff like that.

You should either migrate to raid 10, or set up a separate array for
temporary files, perhaps raid 0.

Thanks. I will rebuild the RAID array early next week and I will see if I have a Raid 10 option with that card.

Quantitatively, what would you say is the write speed difference between RAID 10 and RAID 6?


As someone who migrated a RAID 5 installation to RAID 10, I am getting far better read and write performance on heavy calculation queries. Writing on the RAID 5 really made things crawl. For lots of writing, I think RAID 10 is the best. It should also be noted that I changed my filesystem from ext3 to XFS - this is something you can look into as well. 

Ogden

Re: Postgres for a "data warehouse", 5-10 TB

From
J Sisson
Date:
On Sun, Sep 11, 2011 at 1:36 PM, Ogden <lists@darkstatic.com> wrote:
As someone who migrated a RAID 5 installation to RAID 10, I am getting far better read and write performance on heavy calculation queries. Writing on the RAID 5 really made things crawl. For lots of writing, I think RAID 10 is the best. It should also be noted that I changed my filesystem from ext3 to XFS - this is something you can look into as well. 

Ogden

RAID 10 on XFS here, too, both in OLTP and Data-warehousing scenarios.  Our largest OLTP is ~375 GB, and PostgreSQL performs admirably (we converted from MSSQL to PostgreSQL, and we've had more issues with network bottlenecks since converting (where MSSQL was always the bottleneck before)).  Now that we have fiber interconnects between our two main datacenters, I'm actually having to work again haha.

But yeah, we tried quite a few file systems, and XFS **for our workloads** performed better than everything else we tested, and RAID 10 is a given if you do any significant writing.

Re: Postgres for a "data warehouse", 5-10 TB

From
J Sisson
Date:
Sorry, meant to send this to the list.

For really big data-warehousing, this document really helped us:

http://pgexperts.com/document.html?id=49

Re: Postgres for a "data warehouse", 5-10 TB

From
Scott Marlowe
Date:
2011/9/11 pasman pasmański <pasman.p@gmail.com>:
> For 10 TB table and 3hours, disks should have a transfer about 1GB/s (seqscan).

Random data point.  Our primary servers were built for OLTP with 48
cores and 32 15kSAS drives.  We started out on Arecas but the
Supermicro 1Us we were using didn't provide enough cooling and the
Arecas were burning out after 2 to 4 months, so on those machines, we
pulled the Arecas and replaced them with simple LSI SAS non-RAID
cards.  Both were RAID-10, the latter with linux software RAID.

With the Arecas the OLTP performance is outstanding, garnering us
~8500tps at 40 to 50 threads.  However, sequential performance was
just so-so at around read / write speeds of 500/350MB/s.  The SW
RAID-10 can read AND write at right around 1GB/s.  What it lacks in
transactional throughput it more than makes up for in sequential read
/ write performance.

Another data point.  We had a big Oracle installation at my last job,
and OLAP queries were killing it midday, so I built a simple
replication system to grab rows from the big iron Oracle SUN box and
shove into a single core P IV 2.xGHz machine with 4 120G SATA drives
in SW RAID-10.

That machine handily beat the big iron Oracle machine at OLAP queries,
running in 20 minutes what was taking well over an hour for the big
Oracle machine to do, even during its (Oracle machine) off peak load
times.

Re: Postgres for a "data warehouse", 5-10 TB

From
Stephen Frost
Date:
* Igor Chudov (ichudov@gmail.com) wrote:
> Right now I have a personal (one user) project to create a 5-10
> Terabyte data warehouse. The largest table will consume the most space
> and will take, perhaps, 200,000,000 rows.

I run data-warehouse databases on that order (current largest single
instance is ~4TB running under 9.0.4).  If the largest table is only
200M rows, PG should handle that quite well.  Our data is partitioned by
month and each month is about 200M records and simple queries can run in
15-20 minutes (with a single thread), with complex windowing queries
(split up and run in parallel) finishing in a couple of hours.

> However, while an hour is fine, two weeks per query is NOT fine.

What's really, really, really useful are two things: EXPLAIN, and this
mailing list. :)  Seriously, run EXPLAIN on your queries before you run
them and see if how the query is going to be executed makes sense.
Here's a real easy hint: if it says "External Sort" and has big numbers,
come talk to us here- that's about one of the worst things you can
possibly do.  Of course, PG's going to avoid doing that, but you may
have written a query (unintentionally) which forces PG to do a sort, or
something else.
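
Just as a sketch of the habit (the table names here are made up), something
like:

explain
select s.category, count(*)
from big_fact_table f
join small_lookup s on s.id = f.lookup_id
where f.event_date >= '2011-01-01'
group by s.category;

For a small-to-large join like that you generally want to see the small
table hashed and the big one read with a single sequential scan, rather
than millions of index probes.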

> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> CPU cores.

If you partition up your data and don't mind things running in different
transactions, you can definitely get a speed boost with PG by running
things in parallel.  PG will handle that very well, in fact, if two
queries are running against the same table, PG will actually combine
them and only actually read the data from disk once.

> I cannot shell out $47,000 per CPU for Oracle for this project.

The above data warehouse was migrated from an Oracle-based system. :)

> To be more specific, the batch queries that I would do, I hope,
> would either use small JOINS of a small dataset to a large dataset, or
> just SELECTS from one big table.

Make sure that you set your 'work_mem' correctly- PG will use that to
figure out if it can hash the small table (you want that to happen,
trust me..).  If you do end up having sorts, it'll also use the work_mem
value to figure out how much memory to use for sorting.

> So... Can Postgres support a 5-10 TB database with the use pattern
> stated above?

Yes, certainly.

    Thanks,

        Stephen


Re: Postgres for a "data warehouse", 5-10 TB

From
Igor Chudov
Date:


On Sun, Sep 11, 2011 at 6:01 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Igor Chudov (ichudov@gmail.com) wrote:
> Right now I have a personal (one user) project to create a 5-10
> Terabyte data warehouse. The largest table will consume the most space
> and will take, perhaps, 200,000,000 rows.

I run data-warehouse databases on that order (current largest single
instance is ~4TB running under 9.0.4).  If the largest table is only
200M rows, PG should handle that quite well.  Our data is partitioned by
month and each month is about 200M records and simple queries can run in
15-20 minutes (with a single thread), with complex windowing queries
(split up and run in parallel) finishing in a couple of hours.



Which brings up a question. 

Can I partition data by month (or quarter), without that month being part of PRIMARY KEY?

If this question sounds weird, I am asking because MySQL enforces this, which does not fit my data. 

If I can keep my primary key to be the ID that I want (which comes with data), but still partition it by month, I will be EXTREMELY happy.

> However, while an hour is fine, two weeks per query is NOT fine.

What's really, really, really useful are two things: EXPLAIN, and this
mailing list. :)  Seriously, run EXPLAIN on your queries before you run
them and see if how the query is going to be executed makes sense.
Here's a real easy hint: if it says "External Sort" and has big numbers,
come talk to us here- that's about one of the worst things you can
possibly do.  Of course, PG's going to avoid doing that, but you may
have written a query (unintentionally) which forces PG to do a sort, or
something else.


Very good, thanks
 
> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> CPU cores.

If you partition up your data and don't mind things running in different
transactions, you can definitely get a speed boost with PG by running
things in parallel.  PG will handle that very well, in fact, if two
queries are running against the same table, PG will actually combine
them and only actually read the data from disk once.

> I cannot shell out $47,000 per CPU for Oracle for this project.

The above data warehouse was migrated from an Oracle-based system. :)


I am wondering, why?
 
> To be more specific, the batch queries that I would do, I hope,
> would either use small JOINS of a small dataset to a large dataset, or
> just SELECTS from one big table.

Make sure that you set your 'work_mem' correctly- PG will use that to
figure out if it can hash the small table (you want that to happen,
trust me..).  If you do end up having sorts, it'll also use the work_mem
value to figure out how much memory to use for sorting.


I could, say, set work_mem to 30 GB? (64 bit linux)
 
> So... Can Postgres support a 5-10 TB database with the use pattern
> stated above?

Yes, certainly.


that's great to know.

i
 
       Thanks,

               Stephen



Re: Postgres for a "data warehouse", 5-10 TB

From
Claudio Freire
Date:
On Mon, Sep 12, 2011 at 1:16 AM, Igor Chudov <ichudov@gmail.com> wrote:
> I could, say, set work_mem to 30 GB? (64 bit linux)

I don't think you'd want that. Remember, work_mem is the amount of
memory *per sort*.
Queries can request several times that much memory, once for each sort
they need to perform.

You can set it really high, but not 60% of your RAM - that wouldn't be wise.

Re: Postgres for a "data warehouse", 5-10 TB

From
Stephen Frost
Date:
* Igor Chudov (ichudov@gmail.com) wrote:
> Can I partition data by month (or quarter), without that month being part of
> PRIMARY KEY?

The way partitioning works in PG is by using CHECK constraints.  Not
sure if you're familiar with those (not sure if MySQL has them), so
here's a quick example:

Create a parent table.  Then create two tables which inherit from that
parent table (this is more of an implementation detail than anything
else, the parent table is always empty, it's just there to be the
single, combined, table that you run your select queries against).  On
each of the two 'child' tables, create a CHECK constraint.  On table1,
you do:
  alter table table1 add check (date < '2000-01-01');
On table2, you do:
  alter table table2 add check (date >= '2000-01-01');

Once those are done, you can query against the 'parent' table with
something like:
select * from parent where date = '2010-01-01';

And PG will realize it only has to look at table2 to get the results for
that query.  This means the partitioning can be more-or-less any check
constraint that will be satisfied by the data in the table (and PG will
check/enforce this) and that PG can figure out will eliminate a partition
from possibly having the data that matches the request.

Technically, this means that you could have all kinds of different ways
your data is split across the partitions, but remember that all the
constraints have to actually be TRUE. :)  Typically, people do split
based on the PK, but it's not required (realize that PG doesn't support
cross-table PKs, so if you don't have CHECK constraints which make sure
that the tables don't cover the same PK value, you could end up with
duplicate values across the tables...).
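
Spelled out, a bare-bones version of the above looks something like this
(the names are purely for illustration):

create table parent (
    id    bigint not null,
    date  date   not null,
    value text
);

create table table1 (
    check (date < '2000-01-01')
) inherits (parent);

create table table2 (
    check (date >= '2000-01-01')
) inherits (parent);

-- the PK has to be declared on each child, since PG won't enforce it
-- across the partition set
alter table table1 add primary key (id);
alter table table2 add primary key (id);

-- with constraint_exclusion enabled (the default is 'partition' in 8.4+),
-- this only has to look at table2:
select * from parent where date = '2010-01-01';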

> If this question sounds weird, I am asking because MySQL enforces this,
> which does not fit my data.

That part is a little strange..

> If I can keep my primary key to be the ID that I want (which comes with
> data), but still partition it by month, I will be EXTREMELY happy.

As I said above, the actual PK is going to be independent and in the
base/child tables.  That said, yes, you could have the PK in each table
be whatever you want and you use month to partition the 'main' table.
You then have to come up with some other way to make sure your PK is
enforced, however, or figure out a way to deal with things if it's not.
Based on what you've been describing, I'm afraid you'd have to actually
search all the partitions for a given ID on an update, to figure out if
you're doing an UPDATE or an INSERT...  Unless, of course, the month is
included in the PK somewhere, or is in the incoming data and you can be
100% confident that the incoming data is never wrong.. :)

> I am wondering, why?

Cost, and we had a real hard time (this was a while ago..) getting
Oracle to run decently on Linux, and the Sun gear was just too damn
expensive.  Also, ease of maintenance- it takes a LOT less effort to
keep a PG database set up and running smoothly than an Oracle one, imv.

> I could, say, set work_mem to 30 GB? (64 bit linux)

You can, but you have to be careful with it, because PG will think it
can use 30GB for EACH sort in a given query, and in EACH hash in a given
query.  What I would recommend is setting the default to something like
256MB and then looking at specific queries and bumping it up for those
queries when it's clear that it'll help the query and won't cause the
system to go into swap.  Note that you can set work_mem for a given
session after you connect to the database, just do:

set work_mem = '1GB';

in your session before running other queries.  Doing that won't impact
other sessions.

    Thanks,

        Stephen


Re: Postgres for a "data warehouse", 5-10 TB

From
Stephen Frost
Date:
* Claudio Freire (klaussfreire@gmail.com) wrote:
> I don't think you'd want that. Remember, work_mem is the amount of
> memory *per sort*.
> Queries can request several times that much memory, once per sort they
> need to perform.
>
> You can set it really high, but not 60% of your RAM - that wouldn't be wise.

Oh, I dunno..  It's only used by the planner, so sometimes you have to
bump it up, especially when PG thinks the number of rows returned from
something will be a lot more than it really will be. :)

/me has certain queries where it's been set to 100GB... ;)

I agree that it shouldn't be the default, however.  That's asking for
trouble.  Do it for the specific queries that need it.

    Thanks,

        Stephen


Re: Postgres for a "data warehouse", 5-10 TB

From
Ondrej Ivanič
Date:
Hi,

On 12 September 2011 12:28, Stephen Frost <sfrost@snowman.net> wrote:
> Once those are done, you can query against the 'parent' table with
> something like:
> select * from parent where date = '2010-01-01';
>
> And PG will realize it only has to look at table2 to get the results for
> that query.  This means the partitioning can be more-or-less any check
> constraint that will be satisfied by the data in the table (and PG will
> check/enforce this) and that PG can figure out will eliminate a partition
> from possibly having the data that matches the request.

Theory is nice, but there are a few gotchas (in 8.4):

- the planner can use constant expressions only. You will get scans across
all partitions when you use a function (like now(), or an immutable function
with constant arguments), a subquery (like part_col = (select x from
...)), or anything which can't be evaluated to a constant during query
planning.

- partition constraints are not "pushed to joins" (assuming tables
partitioned by primary key):

select ... from X left join Y on X.primary_key = Y.primary_key
where X.primary_key >= ... and X.primary_key < ...

must be rewritten like

select ... from X
left join Y on X.primary_key = Y.primary_key
  and Y.primary_key >= ... and Y.primary_key < ...
where X.primary_key >= ... and X.primary_key < ...

in order to avoid a scan of the entire Y table (not only the relevant
partitions)

- the ORDER BY / LIMIT X issue is fixed in 9.1 ("Allow inheritance table scans
to return meaningfully-sorted results").

Moreover, all queries should have a 'WHERE' clause on the column used for
partitioning, otherwise partitioning is not very useful (yes, it could
simplify data management -- drop partition vs delete from X where
part_col between A and B).

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

Re: Postgres for a "data warehouse", 5-10 TB

From
Shaun Thomas
Date:
On 09/11/2011 09:44 AM, Claudio Freire wrote:

> And Andy is right, you'll have a lot less space. If raid 10 doesn't
> give you enough room, just leave two spare drives for a raid 0
> temporary partition. That will be at least twice as fast as doing
> temporary tables on the raid 6.

Alternatively, throw a lot of memory at the system and point the temp
space at /dev/shm. We've had really good luck doing that here, to avoid
excessive writes to our NVRAM PCIe cards. Make sure the transaction logs
(and any archives) get written to a separate LUN (ideally on a separate
controller) for even more win.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Postgres for a "data warehouse", 5-10 TB

From
Shaun Thomas
Date:
On 09/11/2011 12:02 PM, Marti Raudsepp wrote:

> Which brings me to another important point: don't do lots of small
> write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides
> being inefficient, they introduce a big maintenance burden.

I'd like to second this. Before a notable application overhaul, we were
handling about 300-million transactions per day (250M of that was over a
6-hour period). To avoid the risk of mid-day vacuum-freeze, we disabled
autovacuum and run a nightly vacuum over the entire database. And that
was *after* bumping  autovacuum_freeze_max_age to 600-million.

You do *not* want to screw with that if you don't have to, and a setting
of 600M is about 1/3 of the reasonable boundary there. If not for the
forced autovacuums, a database with this much traffic would be corrupt
in less than a week. We've managed to cut that transaction traffic by
60%, and it greatly improved the database's overall health.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Postgres for a "data warehouse", 5-10 TB

From
Shaun Thomas
Date:
On 09/11/2011 09:27 AM, Claudio Freire wrote:

> I have used slony to do database migration. It is a pain to set up,
> but it saves you hours of downtime.

I've had to shoot this option down in two separate upgrade scenarios in
two different companies. Theoretically it's possible, but slony is based
on triggers. If you have an OLTP database with frequent writes, that
overhead (firing the trigger, storing the replicated data, reading the
replication log, traffic to the upgrade node) can literally kill your
application. Downtime is one thing, but maintenance windows can be
planned. Ruining application performance for an undetermined length of
time is probably worse.

Thankfully 8.4 added pg_migrator/pg_upgrade, so that kind of pain is
probably over for the most part. But even without it, 8.4 and above have
parallel restore, which can drop upgrade times down to a fraction of
their former length.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Postgres for a "data warehouse", 5-10 TB

From
Robert Klemme
Date:
On 11.09.2011 22:10, Scott Marlowe wrote:

> Another data point.  We had a big Oracle installation at my last job,
> and OLAP queries were killing it midday, so I built a simple
> replication system to grab rows from the big iron Oracle SUN box and
> shove into a single core P IV 2.xGHz machine with 4 120G SATA drives
> in SW RAID-10.
>
> That machine handily beat the big iron Oracle machine at OLAP queries,
> running in 20 minutes what was taking well over an hour for the big
> Oracle machine to do, even during its (Oracle machine) off peak load
> times.

Um, that sounds as if the SUN setup was really bad.  Do you remember any
details about the drive configuration there?

Kind regards

    robert



Re: Postgres for a "data warehouse", 5-10 TB

From
Robert Klemme
Date:
On 11.09.2011 19:02, Marti Raudsepp wrote:
> On Sun, Sep 11, 2011 at 17:23, Andy Colson<andy@squeakycode.net>  wrote:
>> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>>> By the way, does that INSERT UPDATE functionality or something like this exist in Postgres?
>> You have two options:
>> 1) write a function like:
>> create function doinsert(_id integer, _value text) returns void as
>> 2) use two sql statements:
>
> Unfortunately both of these options have caveats. Depending on your
> I/O speed, you might need to use multiple loader threads to saturate
> the write bandwidth.
>
> However, neither option is safe from race conditions. If you need to
> load data from multiple threads at the same time, they won't see each
> other's inserts (until commit) and thus cause unique violations. If
> you could somehow partition their operation by some key, so threads
> are guaranteed not to conflict each other, then that would be perfect.
> The 2nd option given by Andy is probably faster.
>
> You *could* code a race-condition-safe function, but that would be a
> no-go on a data warehouse, since each call needs a separate
> subtransaction which involves allocating a transaction ID.

Wouldn't it be sufficient to reverse order for race condition safety?
Pseudo code:

begin
   insert ...
catch
   update ...
   if not found error
end

Speed is another matter though...

Kind regards

    robert


Re: Postgres for a "data warehouse", 5-10 TB

From
Andy Colson
Date:
On 9/12/2011 12:15 PM, Robert Klemme wrote:
> On 11.09.2011 19:02, Marti Raudsepp wrote:
>> On Sun, Sep 11, 2011 at 17:23, Andy Colson<andy@squeakycode.net> wrote:
>>> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>>>> By the way, does that INSERT UPDATE functionality or something like
>>>> this exist in Postgres?
>>> You have two options:
>>> 1) write a function like:
>>> create function doinsert(_id integer, _value text) returns void as
>>> 2) use two sql statements:
>>
>> Unfortunately both of these options have caveats. Depending on your
>> I/O speed, you might need to use multiple loader threads to saturate
>> the write bandwidth.
>>
>> However, neither option is safe from race conditions. If you need to
>> load data from multiple threads at the same time, they won't see each
>> other's inserts (until commit) and thus cause unique violations. If
>> you could somehow partition their operation by some key, so threads
>> are guaranteed not to conflict each other, then that would be perfect.
>> The 2nd option given by Andy is probably faster.
>>
>> You *could* code a race-condition-safe function, but that would be a
>> no-go on a data warehouse, since each call needs a separate
>> subtransaction which involves allocating a transaction ID.
>
> Wouldn't it be sufficient to reverse order for race condition safety?
> Pseudo code:
>
> begin
> insert ...
> catch
> update ...
> if not found error
> end
>
> Speed is another matter though...
>
> Kind regards
>
> robert
>
>
>

No, I don't think so: if you had two loaders, both would start a
transaction, and then neither could see what the other was doing.  There are
transaction isolation levels, but they are like playing with fire (in
my opinion).

-Andy

Re: Postgres for a "data warehouse", 5-10 TB

From
Scott Marlowe
Date:
On Mon, Sep 12, 2011 at 11:04 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:
> On 11.09.2011 22:10, Scott Marlowe wrote:
>
>> Another data point.  We had a big Oracle installation at my last job,
>> and OLAP queries were killing it midday, so I built a simple
>> replication system to grab rows from the big iron Oracle SUN box and
>> shove into a single core P IV 2.xGHz machine with 4 120G SATA drives
>> in SW RAID-10.
>>
>> That machine handily beat the big iron Oracle machine at OLAP queries,
>> running in 20 minutes what was taking well over an hour for the big
>> Oracle machine to do, even during its (Oracle machine) off peak load
>> times.
>
> Um, that sounds as if the SUN setup was really bad.  Do you remember any
> details about the drive configuration there?

It was actually set up quite well.  A very fast SAN with individual
drive arrays etc.  It was VERY fast at transactional throughput.  BUT
it was not setup for massive OLAP work.  The drives that housed the
statistical data we were running OLAP against were the slowest in the
set, since they were made to mostly just take in a small amount of
data each minute from the java servers.  Originally the stats had been
on a pg server in production and very fast, but some political
decision moved it onto the Oracle server.  The Oracle DBA wasn't any
happier with this move than me, btw.

Re: Postgres for a "data warehouse", 5-10 TB

From
Scott Marlowe
Date:
On Mon, Sep 12, 2011 at 10:22 AM, Shaun Thomas <sthomas@peak6.com> wrote:
> On 09/11/2011 12:02 PM, Marti Raudsepp wrote:
>
>> Which brings me to another important point: don't do lots of small
>> write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides
>> being inefficient, they introduce a big maintenance burden.
>
> I'd like to second this. Before a notable application overhaul, we were
> handling about 300-million transactions per day (250M of that was over a
> 6-hour period). To avoid the risk of mid-day vacuum-freeze, we disabled
> autovacuum and run a nightly vacuum over the entire database. And that was
> *after* bumping  autovacuum_freeze_max_age to 600-million.
>
> You do *not* want to screw with that if you don't have to, and a setting of
> 600M is about 1/3 of the reasonable boundary there. If not for the forced
> autovacuums, a database with this much traffic would be corrupt in less than
> a week. We've managed to cut that transaction traffic by 60%, and it greatly
> improved the database's overall health.

I put it to you that your hardware has problems if you have a pg db
that's corrupting from having too much vacuum activity.  I've had
exactly one pg corruption problem in the past, and it was a bad SATA
hard drive on a stats server.  I have four 48 core opterons running
quite hard during the day, have autovacuum on and VERY aggressively
tuned, and have had zero corruption issues in over 3 years of hard
running.

Re: Postgres for a "data warehouse", 5-10 TB

From
Shaun Thomas
Date:
On 09/12/2011 02:48 PM, Scott Marlowe wrote:

> I put it to you that your hardware has problems if you have a pg db
> that's corrupting from having too much vacuum activity.

What? No. We optimized by basically forcing autovacuum to never run
during our active periods. We never actually encountered wrap-around
corruption. I was just saying that 600M is a relatively high setting for
autovacuum_freeze_max_age. :)

I was alluding to the fact that if a DBA had his system running for a
week at our transaction level, and PG didn't have forced auto vacuum,
and their maintenance lapsed even slightly, they could end up with a
corrupt database. Not too far-fetched for someone coming from MySQL, really.

Our problem is we run a financial site, and the front-end very
aggressively monitors network and database timeouts. The limit is
sufficiently low that a vacuum would cause enough IO to trigger
application timeouts, even with vacuum_cost_delay. And of course,
setting vacuum_cost_delay too high quickly triples or quadruples vacuum
times. Now that we're using FusionIO cards, I've been thinking about
turning autovacuum back on, but I want to run some tests first.

My point stands, though. Don't go crazy with transactions until you know
your config can stand up to it, and reduce if possible. We found some
tweak points that drastically reduced transaction count with no
detrimental effect on the app itself, so we jumped on them.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Postgres for a "data warehouse", 5-10 TB

From
Marti Raudsepp
Date:
On Mon, Sep 12, 2011 at 23:04, Shaun Thomas <sthomas@peak6.com> wrote:
> I was alluding to the fact that if a DBA had his system running for a week
> at our transaction level, and PG didn't have forced auto vacuum, and their
> maintenance lapsed even slightly, they could end up with a corrupt database.

It doesn't actually corrupt your database. If you manage to hit the
wraparound age, PostgreSQL disallows new connections and tells you to
run a VACUUM from a standalone backend. (But that should never happen
due to the forced vacuum freeze processes)

Regards,
Marti

Re: Postgres for a "data warehouse", 5-10 TB

From
Scott Marlowe
Date:
On Mon, Sep 12, 2011 at 2:04 PM, Shaun Thomas <sthomas@peak6.com> wrote:
> On 09/12/2011 02:48 PM, Scott Marlowe wrote:
>
>> I put it to you that your hardware has problems if you have a pg db
>> that's corrupting from having too much vacuum activity.
>
> What? No. We optimized by basically forcing autovacuum to never run during
> our active periods. We never actually encountered wrap-around corruption. I
> was just saying that 600M is a relatively high setting for
> autovacuum_freeze_max_age. :)

You don't get corruption from wrap around, you get a database that
stops and tells you to run a vacuum by hand on a single user backend
and won't come up until you do.  You throw around the word corruption
a lot.  The PostgreSQL team works REALLY hard to prevent any kind of
corruption scenario from rearing its ugly head, so when the word
corruption pops up I start to wonder about the system (hardware wise)
someone is using, since killing the postmaster by hand, then
deleting the interlock file and starting a new postmaster while old
postgres children are still active is just about the only way to
corrupt pgsql, short of using vi on one of the files in
/data/base/xxx/yyy etc.

>
> I was alluding to the fact that if a DBA had his system running for a week
> at our transaction level, and PG didn't have forced auto vacuum, and their
> maintenance lapsed even slightly, they could end up with a corrupt database.
> Not too far-fetched for someone coming from MySQL, really.
>
> Our problem is we run a financial site, and the front-end very aggressively
> monitors network and database timeouts. The limit is sufficiently low that a
> vacuum would cause enough IO to trigger application timeouts, even with
> vacuum_cost_delay. And of course, setting vacuum_cost_delay too high quickly
> triples or quadruples vacuum times. Now that we're using FusionIO cards,
> I've been thinking about turning autovacuum back on, but I want to run some
> tests first.
>
> My point stands, though. Don't go crazy with transactions until you know
> your config can stand up to it, and reduce if possible. We found some tweak
> points that drastically reduced transaction count with no detrimental effect
> on the app itself, so we jumped on them.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
> 312-676-8870
> sthomas@peak6.com
>



--
To understand recursion, one must first understand recursion.

Re: Postgres for a "data warehouse", 5-10 TB

From
Shaun Thomas
Date:
On 09/12/2011 03:44 PM, Scott Marlowe wrote:

> The PostgreSQL team works REALLY hard to prevent any kind of
> corruption scenario from rearing its ugly head, so when the word
> corruption pops up I start to wonder about the system (hardware
> wise) someone is using,


You've apparently never used early versions of EnterpriseDB. ;)

Kidding aside, it's apparently been a while since I read that particular
part of the manual. The error I *was* familiar with was from the 8.0 manual:

"WARNING:  some databases have not been vacuumed in 1613770184 transactions
HINT:  Better vacuum them within 533713463 transactions, or you may have
a wraparound failure."

Ever since the early days, I've been so paranoid about regular
vacuuming, I'm probably still a little overcautious.

So, my bad. Having a database down for a few hours isn't exactly
desirable, but it's certainly not corruption. :)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Postgres for a "data warehouse", 5-10 TB

From
Scott Marlowe
Date:
On Mon, Sep 12, 2011 at 2:55 PM, Shaun Thomas <sthomas@peak6.com> wrote:
> On 09/12/2011 03:44 PM, Scott Marlowe wrote:
>
>> The PostgreSQL team works REALLY hard to prevent any kind of
>> corruption scenario from rearing its ugly head, so when the word
>> corruption pops up I start to wonder about the system (hardware
>> wise) someone is using,
>
>
> You've apparently never used early versions of EnterpriseDB. ;)
>
> Kidding aside, it's apparently been a while since I read that particular
> part of the manual. The error I *was* familiar with was from the 8.0 manual:
>
> "WARNING:  some databases have not been vacuumed in 1613770184 transactions
> HINT:  Better vacuum them within 533713463 transactions, or you may have a
> wraparound failure."
>
> Ever since the early days, I've been so paranoid about regular vacuuming,
> I'm probably still a little overcautious.
>
> So, my bad. Having a database down for a few hours isn't exactly desirable,
> but it's certainly not corruption. :)

No biggie, more a question of semantics.  Just a trigger word for me.
I started with pgsql 6.5.2 so I know ALL ABOUT corruption.  hehe.

Re: Postgres for a "data warehouse", 5-10 TB

From
Robert Klemme
Date:
On 12.09.2011 19:22, Andy Colson wrote:
> On 9/12/2011 12:15 PM, Robert Klemme wrote:
>> On 11.09.2011 19:02, Marti Raudsepp wrote:
>>> On Sun, Sep 11, 2011 at 17:23, Andy Colson<andy@squeakycode.net> wrote:
>>>> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>>>>> By the way, does that INSERT UPDATE functionality or something like
>>>>> this exist in Postgres?
>>>> You have two options:
>>>> 1) write a function like:
>>>> create function doinsert(_id integer, _value text) returns void as
>>>> 2) use two sql statements:
>>>
>>> Unfortunately both of these options have caveats. Depending on your
>>> I/O speed, you might need to use multiple loader threads to saturate
>>> the write bandwidth.
>>>
>>> However, neither option is safe from race conditions. If you need to
>>> load data from multiple threads at the same time, they won't see each
>>> other's inserts (until commit) and thus cause unique violations. If
>>> you could somehow partition their operation by some key, so threads
>>> are guaranteed not to conflict each other, then that would be perfect.
>>> The 2nd option given by Andy is probably faster.
>>>
>>> You *could* code a race-condition-safe function, but that would be a
>>> no-go on a data warehouse, since each call needs a separate
>>> subtransaction which involves allocating a transaction ID.
>>
>> Wouldn't it be sufficient to reverse order for race condition safety?
>> Pseudo code:
>>
>> begin
>> insert ...
>> catch
>> update ...
>> if not found error
>> end
>>
>> Speed is another matter though...

> No, I dont think so, if you had two loaders, both would start a
> transaction, then neither could see what the other was doing.

It depends.  But the point is that the two INSERTs cannot both succeed.
The one which fails will attempt the UPDATE and - depending on isolation
level and DB implementation - will either be blocked or fail.

In the case of PG this particular example will work:

1. TX inserts new PK row
2. TX tries to insert same PK row => blocks
1. TX commits
2. TX fails with PK violation
2. TX does the update (if the error is caught)
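
Spelled out as plain SQL - with a hypothetical table t (id integer primary
key, dat text) standing in for the real one - that sequence is roughly:

  -- session 1
  BEGIN;
  INSERT INTO t (id, dat) VALUES (1, 'a');

  -- session 2, concurrently
  BEGIN;
  INSERT INTO t (id, dat) VALUES (1, 'b');  -- blocks, waiting on session 1

  -- session 1
  COMMIT;

  -- session 2 now gets
  --   ERROR:  duplicate key value violates unique constraint "t_pkey"
  -- and, if that error is caught, falls back to
  UPDATE t SET dat = 'b' WHERE id = 1;
  COMMIT;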

> There are
> transaction isolation levels, but they are like playing with fire. (in
> my opinion).

You make them sound like witchcraft.  But they are clearly defined -
even standardized.  Granted, different RDBMS might implement them in
different ways - here's PG's view of TX isolation:

http://www.postgresql.org/docs/8.4/interactive/transaction-iso.html

In my opinion anybody working with RDBMS should make himself familiar
with this concept - at least know about it - because it is one of the
fundamental features of RDBMS and certainly needs consideration in
applications with highly concurrent DB activity.
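
For the record, in PG the level is picked per transaction (or as a session
default), e.g.:

  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  -- ... statements that must behave as if run one after the other ...
  COMMIT;

  SET default_transaction_isolation = 'repeatable read';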

Kind regards

    robert

Re: Postgres for a "data warehouse", 5-10 TB

From
Marti Raudsepp
Date:
On Tue, Sep 13, 2011 at 00:26, Robert Klemme <shortcutter@googlemail.com> wrote:
> In the case of PG this particular example will work:
> 1. TX inserts new PK row
> 2. TX tries to insert same PK row => blocks
> 1. TX commits
> 2. TX fails with PK violation
> 2. TX does the update (if the error is caught)

That goes against the point I was making in my earlier comment. In
order to implement this error-catching logic, you'll have to allocate
a new subtransaction (transaction ID) for EVERY ROW you insert. If
you're going to be loading billions of rows this way, you will invoke
the wrath of the "vacuum freeze" process, which will seq-scan all
older tables and re-write every row that it hasn't touched yet. You'll
survive it if your database is a few GB in size, but in the terabyte
land that's unacceptable. Transaction IDs are a scarce resource there.
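
(How much headroom is left can be watched with something like

  SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;

before the anti-wraparound vacuum becomes urgent.)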

In addition, such blocking will limit the parallelism you will get
from multiple inserters.

Regards,
Marti

Re: Postgres for a "data warehouse", 5-10 TB

From
Robert Klemme
Date:
On Tue, Sep 13, 2011 at 5:13 PM, Marti Raudsepp <marti@juffo.org> wrote:
> On Tue, Sep 13, 2011 at 00:26, Robert Klemme <shortcutter@googlemail.com> wrote:
>> In the case of PG this particular example will work:
>> 1. TX inserts new PK row
>> 2. TX tries to insert same PK row => blocks
>> 1. TX commits
>> 2. TX fails with PK violation
>> 2. TX does the update (if the error is caught)
>
> That goes against the point I was making in my earlier comment. In
> order to implement this error-catching logic, you'll have to allocate
> a new subtransaction (transaction ID) for EVERY ROW you insert.

I don't think so.  You only need to catch the error (see attachment).
Or does this create a sub transaction?

> If
> you're going to be loading billions of rows this way, you will invoke
> the wrath of the "vacuum freeze" process, which will seq-scan all
> older tables and re-write every row that it hasn't touched yet. You'll
> survive it if your database is a few GB in size, but in the terabyte
> land that's unacceptable. Transaction IDs are a scarce resource there.

Certainly.  But it's not needed as far as I can see.

> In addition, such blocking will limit the parallelism you will get
> from multiple inserters.

Yes, I mentioned the speed issue.  But regardless of the solution for
MySQL's "INSERT..ON DUPLICATE KEY UPDATE" which Igor mentioned, you
will have the locking problem anyhow if you plan to insert
concurrently into the same table and be robust.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Attachment

Re: Postgres for a "data warehouse", 5-10 TB

From
"Marc Mamin"
Date:

Hi,

> (see attachment)

Under high concurrency you may expect that your data is already in.
In such a case it is better to do nothing at all:

-- sketched as a complete function; the name upsert_t and the table
-- t (id integer primary key, dat text) are assumed here
create or replace function upsert_t(a_id integer, a_dat text)
returns integer as $$
declare
  test boolean;
begin

  -- null if no row exists (or if either value is null)
  select dat = a_dat into test from t where id = a_id;

  if test is null then

    begin
      insert into t (id, dat) values (a_id, a_dat);
    exception
      when unique_violation then
        -- somebody else inserted it meanwhile: fall back to an update
        update t set dat = a_dat where id = a_id and dat <> a_dat;
        return 0;
    end;

  elsif not test then

    update t set dat = a_dat where id = a_id;
    return 0;

  end if;

  return 1;

end;
$$ language plpgsql;


best regards,

Marc Mamin


Re: Postgres for a "data warehouse", 5-10 TB

From
Marti Raudsepp
Date:
On Tue, Sep 13, 2011 at 19:34, Robert Klemme <shortcutter@googlemail.com> wrote:
> I don't think so.  You only need to catch the error (see attachment).
> Or does this create a sub transaction?

Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a
SAVEPOINT it can roll back to in case of an error.

> Yes, I mentioned the speed issue.  But regardless of the solution for
> MySQL's "INSERT..ON DUPLICATE KEY UPDATE" which Igor mentioned you
> will have the locking problem anyhow if you plan to insert
> concurrently into the same table and be robust.

In a mass-loading application you can often divide the work between
threads in a manner that doesn't cause conflicts.

For example, if the unique key is foobar_id and you have 4 threads,
thread 0 will handle rows where (foobar_id%4)=0, thread 1 takes
(foobar_id%4)=1 etc. Or potentially hash foobar_id before dividing the
work.
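
As a sketch - the table and column names big_table, staging and payload are
made up here - each loader sticks to its own slice:

  -- loader k of 4 (k = 0, 1, 2, 3)
  INSERT INTO big_table (foobar_id, payload)
  SELECT foobar_id, payload
  FROM   staging
  WHERE  foobar_id % 4 = 0;  -- loader 0; loaders 1-3 use = 1, = 2, = 3
  -- (or partition on a hash of foobar_id if plain modulo skews the slices)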

I already suggested this in my original post.

Regards,
Marti

Re: Postgres for a "data warehouse", 5-10 TB

From
Igor Chudov
Date:
I do not need to do insert updates from many threads. I want to do it from one thread. 

My current MySQL architecture is that I have a table with the same layout as the main one, to hold new and updated objects.

When there are enough objects, I run a big INSERT SELECT ... ON DUPLICATE KEY UPDATE and stuff that into the master table.
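
In Postgres, a rough equivalent of that staged merge - with made-up names,
a master table objects and a staging table objects_new sharing the layout
(id, dat) - might be two statements in one transaction:

  BEGIN;

  -- refresh the rows that already exist in the master table
  UPDATE objects o
     SET dat = n.dat
    FROM objects_new n
   WHERE o.id = n.id;

  -- then add the genuinely new rows
  INSERT INTO objects (id, dat)
  SELECT n.id, n.dat
    FROM objects_new n
   WHERE NOT EXISTS (SELECT 1 FROM objects o WHERE o.id = n.id);

  TRUNCATE objects_new;

  COMMIT;

With a single loader thread this avoids the race conditions discussed above.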

i

On Tue, Sep 13, 2011 at 1:11 PM, Marti Raudsepp <marti@juffo.org> wrote:
On Tue, Sep 13, 2011 at 19:34, Robert Klemme <shortcutter@googlemail.com> wrote:
> I don't think so.  You only need to catch the error (see attachment).
> Or does this create a sub transaction?

Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a
SAVEPOINT it can roll back to in case of an error.

> Yes, I mentioned the speed issue.  But regardless of the solution for
> MySQL's "INSERT..ON DUPLICATE KEY UPDATE" which Igor mentioned you
> will have the locking problem anyhow if you plan to insert
> concurrently into the same table and be robust.

In a mass-loading application you can often divide the work between
threads in a manner that doesn't cause conflicts.

For example, if the unique key is foobar_id and you have 4 threads,
thread 0 will handle rows where (foobar_id%4)=0, thread 1 takes
(foobar_id%4)=1 etc. Or potentially hash foobar_id before dividing the
work.

I already suggested this in my original post.

Regards,
Marti


Re: Postgres for a "data warehouse", 5-10 TB

From
Stefan Keller
Date:
Interesting debate.

2011/9/13 Marti Raudsepp <marti@juffo.org>:
> Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a
> SAVEPOINT it can roll back to in case of an error.

Are you sure? In theory I always understood that there are no
"subtransactions".

In fact, when looking at the docs, there is chapter 39.6.6, which says: "By
default, any error occurring in a PL/pgSQL function aborts execution
of the function, and indeed of the surrounding transaction as well.
You can trap errors and recover from them by using a BEGIN block with
an EXCEPTION clause."
(http://www.postgresql.org/docs/current/interactive/plpgsql-control-structures.html)

So the doc isn't totally explicit about this. But whatever: What would
be the function of a subtransaction? To give the possibility to
recover and continue within the surrounding transaction?

Stefan


Re: Postgres for a "data warehouse", 5-10 TB

From
Scott Marlowe
Date:
On Tue, Sep 13, 2011 at 12:57 PM, Stefan Keller <sfkeller@gmail.com> wrote:
> Are you sure? In theory I always understood that there are no
> "subtransactions".

"subtransaction" is just another way of saying save points / rollback.

Re: Postgres for a "data warehouse", 5-10 TB

From
Robert Klemme
Date:
On 13.09.2011 20:11, Marti Raudsepp wrote:
> On Tue, Sep 13, 2011 at 19:34, Robert Klemme<shortcutter@googlemail.com>  wrote:
>> I don't think so.  You only need to catch the error (see attachment).
>> Or does this create a sub transaction?
>
> Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a
> SAVEPOINT it can roll back to in case of an error.

Ouch!  Learn something new every day.  Thanks for the update!

http://www.postgresql.org/docs/8.4/interactive/plpgsql-structure.html

Side note: it seems that Oracle handles this differently (i.e. no
subtransaction, but the INSERT would be rolled back), making the pattern
pretty usable for this particular situation.  Also, I have never heard
that TX ids are such a scarce resource over there.

Would anybody think it a good idea to optionally have a BEGIN EXCEPTION
block without the current TX semantics?  In the absence of that, what would
be a better pattern (other than UPDATE first and INSERT if not found)?
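
The UPDATE-first variant, sketched in PL/pgSQL with the same hypothetical
t (id, dat) and parameters a_id, a_dat as before, would be roughly:

  UPDATE t SET dat = a_dat WHERE id = a_id;
  IF NOT FOUND THEN
    -- can still raise unique_violation if another session inserts in between
    INSERT INTO t (id, dat) VALUES (a_id, a_dat);
  END IF;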

>> Yes, I mentioned the speed issue.  But regardless of the solution for
>> MySQL's "INSERT..ON DUPLICATE KEY UPDATE" which Igor mentioned you
>> will have the locking problem anyhow if you plan to insert
>> concurrently into the same table and be robust.
>
> In a mass-loading application you can often divide the work between
> threads in a manner that doesn't cause conflicts.

Yeah, but concurrency might not be the only reason to optionally update.
If the data is already there you might rather want to overwrite it instead
of failing.

Kind regards

    robert

Re: Postgres for a "data warehouse", 5-10 TB

From
Robert Klemme
Date:
On 13.09.2011 20:57, Stefan Keller wrote:
> Interesting debate.

Indeed.

> 2011/9/13 Marti Raudsepp<marti@juffo.org>:
>> Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a
>> SAVEPOINT it can roll back to in case of an error.
>
> Are you sure? In theory I always understood that there are no
> "subtransactions".

What theory are you referring to?

> In fact, when looking at the docs, there is chapter 39.6.6, which says: "By
> default, any error occurring in a PL/pgSQL function aborts execution
> of the function, and indeed of the surrounding transaction as well.
> You can trap errors and recover from them by using a BEGIN block with
> an EXCEPTION clause."
> (http://www.postgresql.org/docs/current/interactive/plpgsql-control-structures.html)
>
> So the doc isn't totally explicit about this. But whatever: What would
> be the function of a subtransaction? To give the possibility to
> recover and continue within the surrounding transaction?

I find this pretty explicit:

It is important not to confuse the use of BEGIN/END for grouping
statements in PL/pgSQL with the similarly-named SQL commands for
transaction control. PL/pgSQL's BEGIN/END are only for grouping; they do
not start or end a transaction. Functions and trigger procedures are
always executed within a transaction established by an outer query —
they cannot start or commit that transaction, since there would be no
context for them to execute in. However, a block containing an EXCEPTION
clause effectively forms a subtransaction that can be rolled back
without affecting the outer transaction. For more about that see Section
38.6.5.

http://www.postgresql.org/docs/8.4/interactive/plpgsql-structure.html

Cheers

    robert




Re: Postgres for a "data warehouse", 5-10 TB

From
"Kevin Grittner"
Date:
Robert Klemme  wrote:
> On 12.09.2011 19:22, Andy Colson wrote:

>> There are transaction isolation levels, but they are like playing
>> with fire. (in my opinion).

> You make them sound like witchcraft. But they are clearly defined
> - even standardized.

Yeah, for decades.  Developing concurrency control from scratch at
the application level over and over again is more like playing with
fire, in my book.

> Granted, different RDBMS might implement them in different ways -
> here's PG's view of TX isolation:


> http://www.postgresql.org/docs/8.4/interactive/transaction-iso.html

Oh, that link is *so* day-before-yesterday!  Try this one:

http://www.postgresql.org/docs/9.1/interactive/transaction-iso.html


> In my opinion anybody working with RDBMS should make himself
> familiar with this concept - at least know about it - because it
> is one of the fundamental features of RDBMS and certainly needs
> consideration in applications with highly concurrent DB activity.

+1

Understanding what levels of transaction isolation are available,
and what the implications of each are, is fundamental.  Just as
there are cases where a foreign key constraint doesn't exactly work
for what you need to enforce, there are cases where serializable
transactions don't fit.  But where they do fit, developing the
equivalent from scratch all over again is not as safe or productive
as using the built-in feature.

-Kevin



Re: Postgres for a "data warehouse", 5-10 TB

From
Gianni Ciolli
Date:
On Mon, Sep 12, 2011 at 11:26:10PM +0200, Robert Klemme wrote:
> You make them sound like witchcraft.  But they are clearly defined -
> even standardized.  Granted, different RDBMS might implement them in
> different ways - here's PG's view of TX isolation:
>
> http://www.postgresql.org/docs/8.4/interactive/transaction-iso.html

Even better: PostgreSQL 9.1 (Released yesterday! Fresher than milk...)
ships an improved algorithm for serializable transaction isolation
level:

  http://www.postgresql.org/docs/9.1/interactive/transaction-iso.html

More info:

  http://wiki.postgresql.org/wiki/Serializable

Cheers,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it