Thread: Large Database Design Help
Hello All,

I've inherited a PostgreSQL database that I would like to refactor. It was
originally designed for Postgres 7.0 on a PIII 500MHz, and some design
decisions were made that don't make sense any more. Here's the problem:

1) The database is very large; the largest table has 40 million tuples.

2) The database needs to import tens of thousands of tuples each night,
quickly. The current method is VERY slow.

3) I can't import new records with a COPY or drop my indexes, because some of
them are new records (INSERTs) and some are altered records (UPDATEs), and the
only way I can think of to identify these records is to perform a select for
each record.

Here is how the database is currently laid out, and you'll see why I have a
problem with it:

1) The data is easily partitionable by client ID. In an attempt to keep the
indexes small and the inserts fast, one table was made per client ID. Thus the
primary table in the database (the one with 40 million tuples) is really 133
tables, each ending with a three-digit suffix. The largest of these client
tables has 8 million of the 40 million tuples. The system started with around
half a dozen clients and is now a huge pain to manage with so many tables. I
was hoping new hardware and new Postgres features would allow this data to be
merged safely into a single table.

2) The imports are not done inside of transactions. I'm assuming the system
designers excluded this for a reason. Will I run into problems performing tens
of thousands of inserts and updates inside a single transaction?

3) The current code that bulk loads data into the database is a loop that
looks like this:

    $result = exe("INSERT INTO $table ($name_str) SELECT $val_str
                   WHERE NOT EXISTS (SELECT 1 FROM $table WHERE $keys)");
    if ($result == 0)
    {
        $result = exe("UPDATE $table SET $non_keys WHERE $keys");
    }

Is there a faster way to bulk load data when it's not known ahead of time
whether it's a new record or an updated record?

What I would LIKE to do, but am afraid I will hit a serious performance wall
(or am missing an obvious / better way to do it):

1) Merge all 133 client tables into a single new table, add a client_id
column, and do the data partitioning on the indexes, not the tables, as seen
here:

    CREATE INDEX actioninfo_order_number_XXX_idx
        ON actioninfo ( order_number ) WHERE client_id = XXX;
    CREATE INDEX actioninfo_trans_date_XXX_idx
        ON actioninfo ( transaction_date ) WHERE client_id = XXX;

(Aside question: if I were to find a way to use COPY and I were loading data
for a single client_id, would dropping just the indexes for that client_id
accelerate the load?)

2) Find some way to make the bulk loads faster or more efficient (help!)

3) Wrap each load into a transaction (tens of thousands of records per load).

Is this a good plan? Is there a better way? Am I walking into a trap? Should I
leave well enough alone and not try to fix something that's not broken?
FWIW, here's the hardware and the profile of the current uber table:

       Column       |  Type   | Modifiers
 -------------------+---------+-----------
  order_number      | integer | not null
  order_line_number | integer | not null
  action_number     | integer | not null
  transaction_date  | date    |
  code              | text    |
  trans_group_code  | text    |
  quantity          | integer |
  extension         | money   |
  sales_tax         | money   |
  shipping          | money   |
  discount          | money   |

Dual Opteron 246, 4-disk SCSI RAID5, 4GB of RAM

    # du -sh /var/lib/postgres/data/
    16G     /var/lib/postgres/data/

( the current database is PG 7.4 - I intend to upgrade it to 8.1 if and when I
do this refactoring )
( the current OS is Debian Unstable, but I intend to be running RHEL 4.0 if
and when I do this refactoring )
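[For reference, a minimal sketch of the merged table proposed in plan (1)
above, built from the column profile just posted. The client_id type, the lack
of a primary key, and the use of client 42 in the partial indexes are
illustrative assumptions, not something taken from the existing schema.]

    -- Hypothetical merged version of the 133 per-client tables;
    -- client_id is new, everything else mirrors the profile above.
    CREATE TABLE actioninfo (
        client_id          integer NOT NULL,
        order_number       integer NOT NULL,
        order_line_number  integer NOT NULL,
        action_number      integer NOT NULL,
        transaction_date   date,
        code               text,
        trans_group_code   text,
        quantity           integer,
        extension          money,
        sales_tax          money,
        shipping           money,
        discount           money
    );

    -- One pair of partial indexes per client, as proposed (client 42 shown):
    CREATE INDEX actioninfo_order_number_042_idx
        ON actioninfo ( order_number ) WHERE client_id = 42;
    CREATE INDEX actioninfo_trans_date_042_idx
        ON actioninfo ( transaction_date ) WHERE client_id = 42;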
On Thu, Feb 09, 2006 at 11:07:06AM -0800, Orion Henry wrote:
> Hello All,
>
> I've inherited a PostgreSQL database that I would like to refactor. It
> was originally designed for Postgres 7.0 on a PIII 500MHz, and some
> design decisions were made that don't make sense any more. Here's the
> problem:
>
> 1) The database is very large; the largest table has 40 million tuples.
>
> 2) The database needs to import tens of thousands of tuples each night,
> quickly. The current method is VERY slow.
>
> 3) I can't import new records with a COPY or drop my indexes, because some
> of them are new records (INSERTs) and some are altered records (UPDATEs),
> and the only way I can think of to identify these records is to perform
> a select for each record.
>
> Here is how the database is currently laid out, and you'll see why I have
> a problem with it:
>
> 1) The data is easily partitionable by client ID. In an attempt to keep
> the indexes small and the inserts fast, one table was made per client ID.
> Thus the primary table in the database (the one with 40 million tuples) is
> really 133 tables, each ending with a three-digit suffix. The largest of
> these client tables has 8 million of the 40 million tuples. The system
> started with around half a dozen clients and is now a huge pain to manage
> with so many tables. I was hoping new hardware and new Postgres features
> would allow this data to be merged safely into a single table.

If all the clients are equally active then partitioning by client probably
makes little sense. If some clients are much more active than others then
leaving this as-is could be a pretty big win. If the partitioning is done with
either a view and rules, or inherited tables and rules, it shouldn't be too
hard to manage.

> 2) The imports are not done inside of transactions. I'm assuming the
> system designers excluded this for a reason. Will I run into problems
> performing tens of thousands of inserts and updates inside a single
> transaction?

Never attribute to thoughtful design that which can be fully explained by
ignorance. :) I'd bet they just didn't know any better.

> 3) The current code that bulk loads data into the database is a loop
> that looks like this:
>
>     $result = exe("INSERT INTO $table ($name_str) SELECT $val_str
>                    WHERE NOT EXISTS (SELECT 1 FROM $table WHERE $keys)");
>     if ($result == 0)
>     {
>         $result = exe("UPDATE $table SET $non_keys WHERE $keys");
>     }
>
> Is there a faster way to bulk load data when it's not known ahead of
> time if it's a new record or an updated record?

Uuuugly. :) Instead, load everything into a temp table using COPY, then do an

    UPDATE real_table ... FROM temp_table t WHERE real_table.key = t.key

followed by an INSERT ... SELECT ... WHERE NOT EXISTS. But take note that this
is a race condition, so you can only do it if you know nothing else will be
inserting into the real table at the same time.

You might want to look at the stats-proc code at http://cvs.distributed.net;
it does exactly this type of thing.
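[A minimal sketch of the COPY-into-a-temp-table load described above, assuming
the merged actioninfo table and a hypothetical key of (client_id, order_number,
order_line_number, action_number); the real key columns, the full column list,
and the file path would come from the actual schema and loader.]

    -- Staging table with the same shape as the target (names are assumed).
    CREATE TEMP TABLE actioninfo_load AS
        SELECT * FROM actioninfo LIMIT 0;

    -- Server-side path shown; from psql you could use \copy instead.
    COPY actioninfo_load FROM '/path/to/nightly_load.dat';

    -- Update rows that already exist in the target.
    UPDATE actioninfo
       SET quantity  = l.quantity,
           extension = l.extension      -- remaining non-key columns likewise
      FROM actioninfo_load l
     WHERE actioninfo.client_id         = l.client_id
       AND actioninfo.order_number      = l.order_number
       AND actioninfo.order_line_number = l.order_line_number
       AND actioninfo.action_number     = l.action_number;

    -- Insert rows that are genuinely new.
    INSERT INTO actioninfo
    SELECT l.*
      FROM actioninfo_load l
     WHERE NOT EXISTS (SELECT 1 FROM actioninfo a
                        WHERE a.client_id         = l.client_id
                          AND a.order_number      = l.order_number
                          AND a.order_line_number = l.order_line_number
                          AND a.action_number     = l.action_number);

[As noted above, this is only safe if nothing else inserts into the real table
while the load runs.]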
> What I would LIKE to do, but am afraid I will hit a serious performance
> wall (or am missing an obvious / better way to do it):
>
> 1) Merge all 133 client tables into a single new table, add a client_id
> column, and do the data partitioning on the indexes, not the tables, as
> seen here:
>
>     CREATE INDEX actioninfo_order_number_XXX_idx
>         ON actioninfo ( order_number ) WHERE client_id = XXX;
>     CREATE INDEX actioninfo_trans_date_XXX_idx
>         ON actioninfo ( transaction_date ) WHERE client_id = XXX;
>
> (Aside question: if I were to find a way to use COPY and I were loading
> data for a single client_id, would dropping just the indexes for that
> client_id accelerate the load?)

Hrm, I believe it would...

> 2) Find some way to make the bulk loads faster or more efficient (help!)

Don't do things row-by-row. If you can't ensure that there will be only one
process inserting (to eliminate the race condition I mentioned above), then
reply back and I'll point you at code that should still be much faster than
what you're doing now.

> 3) Wrap each load into a transaction (tens of thousands of records per
> load).

Getting rid of row-by-row processing will be your biggest win. If you do have
to go row-by-row, at least wrap it in a transaction. As long as the
transaction doesn't take *too* long it won't be an issue.

> Is this a good plan? Is there a better way? Am I walking into a trap?
> Should I leave well enough alone and not try to fix something that's
> not broken?
>
> FWIW, here's the hardware and the profile of the current uber table:
>
>        Column       |  Type   | Modifiers
>  -------------------+---------+-----------
>   order_number      | integer | not null
>   order_line_number | integer | not null
>   action_number     | integer | not null
>   transaction_date  | date    |
>   code              | text    |
>   trans_group_code  | text    |
>   quantity          | integer |
>   extension         | money   |
>   sales_tax         | money   |
>   shipping          | money   |
>   discount          | money   |
>
> Dual Opteron 246, 4-disk SCSI RAID5, 4GB of RAM

Remember that the write performance of RAID5 normally stinks.

> # du -sh /var/lib/postgres/data/
> 16G /var/lib/postgres/data/
>
> ( the current database is PG 7.4 - I intend to upgrade it to 8.1 if and
> when I do this refactoring )

Going to 8.1 would help in a large number of ways even if you don't refactor.
The stats-proc code I mentioned runs 2x faster under 8.1 than it does under
7.4.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
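[For the row-by-row fallback, "wrap it in a transaction" just means bracketing
the existing statements with BEGIN/COMMIT, so the WAL is flushed once per load
instead of once per row. A schematic example using the merged actioninfo
layout and made-up key values; in the real loader the UPDATE is only issued
when the INSERT reports 0 rows.]

    BEGIN;

    -- One pair of statements per incoming record, as in the current loader.
    INSERT INTO actioninfo (client_id, order_number, order_line_number,
                            action_number, quantity)
        SELECT 42, 1001, 1, 1, 3
         WHERE NOT EXISTS (SELECT 1 FROM actioninfo
                            WHERE client_id = 42 AND order_number = 1001
                              AND order_line_number = 1 AND action_number = 1);

    UPDATE actioninfo
       SET quantity = 3
     WHERE client_id = 42 AND order_number = 1001
       AND order_line_number = 1 AND action_number = 1;

    -- ... thousands more record pairs ...

    COMMIT;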
Orion Henry <lab@orangekids.org> writes:

> What I would LIKE to do, but am afraid I will hit a serious performance wall
> (or am missing an obvious / better way to do it):
>
> 1) Merge all 133 client tables into a single new table, add a client_id
> column, and do the data partitioning on the indexes, not the tables, as
> seen here:
>
>     CREATE INDEX actioninfo_order_number_XXX_idx
>         ON actioninfo ( order_number ) WHERE client_id = XXX;
>     CREATE INDEX actioninfo_trans_date_XXX_idx
>         ON actioninfo ( transaction_date ) WHERE client_id = XXX;

The advantages of the partitioned scheme are: a) you can drop a client quickly
in a single operation, b) the indexes are only half as wide since they don't
include client_id, and c) you can do a sequential scan of an entire client
without using the index at all.

Unless any of these are overwhelming, I would say to go ahead and merge them.
If you frequently scan all the records of a single client, or frequently drop
entire clients, then the current scheme may be helpful.

> (Aside question: if I were to find a way to use COPY and I were loading
> data for a single client_id, would dropping just the indexes for that
> client_id accelerate the load?)

Dropping indexes would accelerate the load, but unless you're loading a large
number of records relative to the current size, I'm not sure it would be a
win, since you would then have to rebuild the index for the entire segment.

> 2) Find some way to make the bulk loads faster or more efficient (help!)

If your existing data isn't changing while you're doing the load (and if it
is, then your existing load process has a race condition, btw) then you could
do it in a couple of big queries:

    COPY ${table}_new FROM '...';

    CREATE TABLE ${table}_exists AS
        SELECT * FROM ${table}_new
         WHERE EXISTS (SELECT 1 FROM ${table}
                        WHERE ${table}_new.key = ${table}.key);

    CREATE TABLE ${table}_insert AS
        SELECT * FROM ${table}_new
         WHERE NOT EXISTS (SELECT 1 FROM ${table}
                            WHERE ${table}_new.key = ${table}.key);

    UPDATE ${table} SET ... FROM ${table}_exists
     WHERE ${table}_exists.key = ${table}.key;

    INSERT INTO ${table} (SELECT * FROM ${table}_insert);

Actually, you could skip the whole ${table}_insert step there and just do the
insert directly, I guess. There are also other approaches you could use, like
adding a new column to ${table}_new instead of creating new tables, etc.

> 3) Wrap each load into a transaction (tens of thousands of records per load)

Yes, Postgres is faster if you do more operations in a single transaction.
Every COMMIT means waiting for an fsync.

The only disadvantage to batching them into a large transaction is that if it
lasts a *long* time, it could create problems with your vacuum strategy. Any
vacuum that runs while the transaction is still running won't be able to
vacuum anything.

You might consider running VACUUM FULL or CLUSTER on the table when you're
done with the loading process. It will lock the table while it runs, though.

--
greg
On 2/9/06, Orion Henry <lab@orangekids.org> wrote:
> Hello All,
>
> I've inherited a PostgreSQL database that I would like to refactor. It
> was originally designed for Postgres 7.0 on a PIII 500MHz, and some
> design decisions were made that don't make sense any more. Here's the
> problem:
>
> 1) The database is very large; the largest table has 40 million tuples.
>
> 2) The database needs to import tens of thousands of tuples each night,
> quickly. The current method is VERY slow.
>
> 3) I can't import new records with a COPY or drop my indexes, because some
> of them are new records (INSERTs) and some are altered records (UPDATEs),
> and the only way I can think of to identify these records is to perform
> a select for each record.

[snip]

> 3) The current code that bulk loads data into the database is a loop
> that looks like this:
>
>     $result = exe("INSERT INTO $table ($name_str) SELECT $val_str
>                    WHERE NOT EXISTS (SELECT 1 FROM $table WHERE $keys)");
>     if ($result == 0)
>     {
>         $result = exe("UPDATE $table SET $non_keys WHERE $keys");
>     }
>
> Is there a faster way to bulk load data when it's not known ahead of
> time if it's a new record or an updated record?

I experimented with something like this and was able to progressively decrease
the amount of time needed for an import. The final solution, which took my
import down from approximately 24 hours to about 30 minutes, was to use a
C#/Java hashtable or a Python dictionary. For example, the unique data in one
particular table was "User_Agent", so I made it the key in my hashtable. I
added a method to the hashtable so that when I added a new record to it, it
would do the insert into the db.

The downside to this is that it used *GOBS* of RAM. Using Python, I was able
to dramatically decrease the RAM usage by switching to a GDB-based dictionary
instead of the standard dictionary. It only increased the time by about 50%,
so the total processing time was about 45 minutes vs. the previous 30 minutes.

I only had about 35 million records, and my technique was getting to the point
where it was unwieldy, so with your 40 million and counting records you would
probably want to start with the GDB technique unless you have a ton of
available RAM.

You might interpret this as a knock against PostgreSQL since I pulled the data
out of the db, but it's not; you'd be hard pressed to find anything as fast as
an in-memory hashtable or an on-disk GDB. However, its usefulness is very
limited, and for anything more complex than key=>value lookups, moving to
PostgreSQL is likely a big win.

--
Matthew Nuzum
www.bearfruit.org
Hi, Greg,

Greg Stark wrote:

>> (Aside question: if I were to find a way to use COPY and I were loading
>> data for a single client_id, would dropping just the indexes for that
>> client_id accelerate the load?)

> Dropping indexes would accelerate the load, but unless you're loading a
> large number of records relative to the current size, I'm not sure it
> would be a win, since you would then have to rebuild the index for the
> entire segment.

And, additionally, rebuilding a partial index with "WHERE client_id=42" needs
a full table scan, which is very slow, so temporarily dropping the indices
will not be useful if you merge the tables.

Btw, I don't know whether PostgreSQL can make use of partial indices when
building other partial indices. If yes, you could temporarily drop all but one
of the partial indices for a specific client.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Hi, Henry,

Orion Henry wrote:

> 1) The database is very large; the largest table has 40 million tuples.

I'm afraid this doesn't qualify as '_very_ large' yet, but it definitely is
large enough to have some deep thoughts about it. :-)

> 1) The data is easily partitionable by client ID. In an attempt to keep
> the indexes small and the inserts fast, one table was made per client ID.
> Thus the primary table in the database (the one with 40 million tuples) is
> really 133 tables, each ending with a three-digit suffix. The largest of
> these client tables has 8 million of the 40 million tuples. The system
> started with around half a dozen clients and is now a huge pain to manage
> with so many tables. I was hoping new hardware and new Postgres features
> would allow this data to be merged safely into a single table.

It possibly is a good idea to merge them.

If you decide to keep them separated for whatever reason, you might want to
use schemas instead of three-digit suffixes. Together with appropriately named
users or 'set search_path', this may help you to simplify your software.

In case you want to keep separate tables, but need some reports touching all
tables from time to time, table inheritance may help you. Just create a base
table, and then inherit all user-specific tables from that base table. Of
course, this can be combined with the schema approach by having the child
tables in their appropriate schemas.

> 2) The imports are not done inside of transactions. I'm assuming the
> system designers excluded this for a reason. Will I run into problems
> performing tens of thousands of inserts and updates inside a single
> transaction?

No, you shouldn't; quite the opposite, it should give you a huge boost. Every
commit has to flush the WAL out to disk, which takes at least one disk spin.
So on a simple 7200 RPM disk, you cannot have more than 120
transactions/second.

It may make sense to split such a bulk load into transactions of some tens of
thousands of rows each, but that depends on how easy it is for your
application to resume in the middle of the bulk if the connection aborts, and
how much concurrent access you have on the backend.

> 3) The current code that bulk loads data into the database is a loop
> that looks like this:
>
>     $result = exe("INSERT INTO $table ($name_str) SELECT $val_str
>                    WHERE NOT EXISTS (SELECT 1 FROM $table WHERE $keys)");
>     if ($result == 0)
>     {
>         $result = exe("UPDATE $table SET $non_keys WHERE $keys");
>     }
>
> Is there a faster way to bulk load data when it's not known ahead of
> time if it's a new record or an updated record?

Perhaps the easiest way might be to issue the update first. UPDATE returns a
row count of the updated rows. If it is 0, you have to insert the row.

This can even be encapsulated into a "before insert" trigger on the table,
which tries the update and ignores the insert if the update succeeded (a
minimal sketch follows below). This way, you can even use COPY on the client
side. We're using this approach for one of our databases, where a client-side
crash can result in occasional duplicates being COPYed to the table.

> Dual Opteron 246, 4-disk SCSI RAID5, 4GB of RAM

For lots of non-read-only database workloads, RAID5 is a performance killer.
RAID 1/0 might be better, or having two mirrors of two disks each: the first
mirror holding system, swap, and the PostgreSQL WAL files, the second one
holding the data.

Don't forget to tune the PostgreSQL settings appropriately. :-)
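[A minimal sketch of such an update-first trigger, assuming the merged
actioninfo table, a hypothetical key of (client_id, order_number,
order_line_number, action_number), 8.x dollar-quoting, and that plpgsql is
installed; only two payload columns are shown to keep it short.]

    CREATE OR REPLACE FUNCTION actioninfo_upsert() RETURNS trigger AS $$
    BEGIN
        -- Try the update first; if a row was touched, suppress the insert.
        UPDATE actioninfo
           SET quantity  = NEW.quantity,
               extension = NEW.extension   -- remaining non-key columns likewise
         WHERE client_id         = NEW.client_id
           AND order_number      = NEW.order_number
           AND order_line_number = NEW.order_line_number
           AND action_number     = NEW.action_number;
        IF FOUND THEN
            RETURN NULL;   -- row existed and was updated; skip the INSERT
        END IF;
        RETURN NEW;        -- genuinely new row; let the INSERT proceed
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER actioninfo_upsert_trg
        BEFORE INSERT ON actioninfo
        FOR EACH ROW EXECUTE PROCEDURE actioninfo_upsert();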
> # du -sh /var/lib/postgres/data/
> 16G /var/lib/postgres/data/

Your database seems to be small enough to fit on a single disk, so the
two-mirrors approach I described above will be the best, IMHO.

> ( the current database is PG 7.4 - I intend to upgrade it to 8.1 if and
> when I do this refactoring )

This is a very good idea; 8.1 is miles ahead of 7.4 in many aspects.

> ( the current OS is Debian Unstable, but I intend to be running RHEL 4.0
> if and when I do this refactoring )

This should not make too much difference.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
> was originally designed for Postgres 7.0 on a PIII 500MHz and some

Argh.

> 1) The database is very large; the largest table has 40 million tuples.

Is this simple types (like a few ints, text...)? How much space does it use on
disk? Can it fit in RAM?

> 2) The database needs to import tens of thousands of tuples each night,
> quickly. The current method is VERY slow.

You bet, COMMITing after each insert or update is about the worst that can be
done. It works fine on MySQL/MyISAM (which doesn't know about commit...), so
I'd guess the system designer had previous experience with MySQL.

My advice would be:

- get a decent machine with some RAM (I guess you already knew this)...

Now, the update. I would tend to do this:

- Generate a text file with your update data, using whatever tool you like
  best (perl, php, python, java...)
- CREATE TEMPORARY TABLE blah ...
- COPY blah FROM your update file.

COPY is super fast. I think temporary tables don't write to the xlog, so they
are also very fast. This should not take more than a few seconds for a few
tens of thousands of simple rows on modern hardware. It actually takes a
fraction of a second on my PC for about 9K rows with 5 INTEGERs in them.

You can also add constraints on your temporary table, to sanitize your data,
in order to be reasonably sure that the following updates will work. The data
you feed to COPY should be correct, or it will roll back; it is your script's
job to escape everything.

Now you've got your data in the database. You have several options:

- You are confident that the UPDATE will work without being rolled back by
  some constraint violation. Therefore, you issue a big joined UPDATE to
  update all the rows in your main table which are also in your temp table.
  Then you issue an INSERT INTO ... SELECT ... to insert the ones which were
  not already in the big table.

  Joined updates can be slow if your RAM is too small and the machine has to
  thrash the disk looking for every tuple. You can cheat and CLUSTER your main
  table (say, once a week), so it is all in index order. Then you arrange your
  update data so it is in the same order (for instance, you SELECT INTO
  another temp table, with an ORDER BY corresponding to the CLUSTER on the
  main table). Having both in the same order will help reduce random disk
  accesses.

- If you don't like this method, then you might want to use the same strategy
  as before (i.e. a zillion queries), but write it in PL/pgSQL instead.
  PL/pgSQL is a lot faster, because everything is already parsed and planned
  beforehand. So you could do the following, for each row in the temporary
  update table (see the sketch below):

  - UPDATE the corresponding row in the main table.
  - IF FOUND, then cool, it was updated, nothing more to do. You don't need to
    SELECT in order to know if the row is there; UPDATE does it for you,
    without the race condition.
  - IF NOT FOUND, then insert. This has a race condition. You know your
    application, so you'll know if it matters or not.

What do you think?

> 3) I can't import new records with a COPY or drop my indexes, because some
> of them are new records (INSERTs) and some are altered records (UPDATEs),
> and the only way I can think of to identify these records is to perform
> a select for each record.

Yes and no; if you must do this, then use PL/pgSQL, it's a lot faster. And
skip the SELECT.

Also, use the latest version. It really rocks. Like many said on the list, put
pg_xlog on its own physical disk, with ext2fs.

> 3) Wrap each load into a transaction (tens of thousands of records per
> load).

That's the idea.
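[A minimal sketch of that second, row-at-a-time strategy as a PL/pgSQL
function walking a staging table. The names (actioninfo, actioninfo_load), the
key columns, and the two payload columns are assumptions; 8.x dollar-quoting
and an installed plpgsql are also assumed.]

    CREATE OR REPLACE FUNCTION merge_actioninfo_load() RETURNS integer AS $$
    DECLARE
        r       RECORD;
        n_rows  integer := 0;
    BEGIN
        FOR r IN SELECT * FROM actioninfo_load LOOP
            UPDATE actioninfo
               SET quantity  = r.quantity,
                   extension = r.extension     -- other non-key columns likewise
             WHERE client_id         = r.client_id
               AND order_number      = r.order_number
               AND order_line_number = r.order_line_number
               AND action_number     = r.action_number;
            IF NOT FOUND THEN
                INSERT INTO actioninfo (client_id, order_number,
                                        order_line_number, action_number,
                                        quantity, extension)
                VALUES (r.client_id, r.order_number, r.order_line_number,
                        r.action_number, r.quantity, r.extension);
            END IF;
            n_rows := n_rows + 1;
        END LOOP;
        RETURN n_rows;   -- number of staged rows processed
    END;
    $$ LANGUAGE plpgsql;

    -- The whole merge then runs as a single statement (one transaction):
    --   SELECT merge_actioninfo_load();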
The first strategy here (the big joined update) uses one transaction anyway.
For the other one, it's your choice. You can either do it all in one
transaction, or in bunches of 1000 rows... but one row at a time would be
horrendously slow.

Regards,
P.F.C
On Fri, 2006-02-10 at 11:24 +0100, Markus Schaber wrote:

> For lots of non-read-only database workloads, RAID5 is a performance
> killer. RAID 1/0 might be better, or having two mirrors of two disks
> each: the first mirror holding system, swap, and the PostgreSQL WAL
> files, the second one holding the data.

I was under the impression that it is preferable to keep the WAL on its own
spindles with no other activity there, to take full advantage of the
sequential nature of the WAL writes.

That would mean one mirror for the WAL, and one for the rest. This, of course,
may sometimes be too much wasted disk space, as the WAL typically will not use
a whole disk, so you might partition this mirror into a small ext2 filesystem
for the WAL, and use the rest for files seldom accessed, such as backups.

gnari
On Fri, 2006-02-10 at 16:39, Ragnar wrote:

> On Fri, 2006-02-10 at 11:24 +0100, Markus Schaber wrote:
>
> > For lots of non-read-only database workloads, RAID5 is a performance
> > killer. RAID 1/0 might be better, or having two mirrors of two disks
> > each: the first mirror holding system, swap, and the PostgreSQL WAL
> > files, the second one holding the data.
>
> I was under the impression that it is preferable to keep the WAL on
> its own spindles with no other activity there, to take full advantage
> of the sequential nature of the WAL writes.
>
> That would mean one mirror for the WAL, and one for the rest.
> This, of course, may sometimes be too much wasted disk space, as the WAL
> typically will not use a whole disk, so you might partition this mirror
> into a small ext2 filesystem for the WAL, and use the rest for files
> seldom accessed, such as backups.

Well, on most database servers, the actual access to the OS and swap drives
should drop to about zero over time, so this is a workable solution if you've
only got enough drives / drive slots for two mirrors.
I just wanted to thank everyone for your input on my question. You've given me
a lot of tools to solve my problem here.

Orion