Re: perf problem with huge table - Mailing list pgsql-performance

From Jon Lewison
Subject Re: perf problem with huge table
Date
Msg-id 7ac84c51002101530q59fa87c1w205fdf5243d9d69a@mail.gmail.com
Whole thread Raw
In response to Re: perf problem with huge table  (Dave Crooke <dcrooke@gmail.com>)
Responses Re: perf problem with huge table  (Dave Crooke <dcrooke@gmail.com>)
List pgsql-performance


On Wed, Feb 10, 2010 at 4:16 PM, Dave Crooke <dcrooke@gmail.com> wrote:
Hi Rama

I'm actually looking at going in the other direction ....

I have an app using PG where we have a single table where we just added a lot of data, and I'm ending up with many millions of rows, and I'm finding that the single table schema simply doesn't scale.

In PG, the table partitioning is only handled by the database for reads, for insert/update you need to do quite a lot of DIY (setting up triggers, etc.) so I am planning to just use named tables and generate the necessary DDL / DML in vanilla SQL the same way that your older code does.

My experience is mostly with Oracle, which is not MVCC, so I've had to relearn some stuff:

Just a nit, but Oracle implements MVCC.  90% of the databases out there do.
 
- Oracle often answers simple queries (e.g. counts and max / min) using only the index, which is of course pre-sorted. PG has to go out and fetch the rows to see if they are still in scope, and if they are stored all over the place on disk it means an 8K random page fetch for each row. This means that adding an index to PG is not nearly the silver bullet that it can be with some non-MVCC databases.

- PG's indexes seem to be quite a bit larger than Oracle's, but that's gut feel, I haven't been doing true comparisons ...  however, for my app I have limited myself to only two indexes on that table, and each index is larger (in disk space) than the table itself ... I have 60GB of data and 140GB of indexes :-)

- There is a lot of row turnover in my big table (I age out data) .... a big delete (millions of rows) in PG seems a bit more expensive to process than in Oracle, however PG is not nearly as sensitive to transaction sizes as Oracle is, so you can cheerfully throw out one big "DELETE from FOO where ..." and let the database chew on it .

I find partitioning pretty useful in this scenario if the data allows is.  Aging out data just means dropping a partition rather than a delete statement.

pgsql-performance by date:

Previous
From: "Jorge Montero"
Date:
Subject: Re: Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
Next
From: Dave Crooke
Date:
Subject: Re: perf problem with huge table