Re: Should I partition this table? - Mailing list pgsql-general

From Bill Moran
Subject Re: Should I partition this table?
Date
Msg-id 20140710112030.0af75456ac95f41277ecc670@potentialtech.com
Whole thread Raw
In response to Should I partition this table?  (AlexK <alkuzo@gmail.com>)
Responses Re: Should I partition this table?  (AlexK <alkuzo@gmail.com>)
Re: Should I partition this table?  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
On Thu, 10 Jul 2014 07:59:20 -0700 (PDT) AlexK <alkuzo@gmail.com> wrote:

> My table currently uses up 62 GB of storage, and it has 450 M rows. This
> narrow table has a PK on (ParentID, ChildNumber), and it has between 20K and
> 50K of child rows per parent.
>
> The data is inserted daily, rarely modified, never deleted. The performance
> of modifications is not an issue. The only select from it is as follows:
>
> SELECT <column_lis> FROM MyChildTable WHERE ParentID=?
> ORDER BY ChildNumber;
>
> The selects are frequent, and their performance is essential.
>
> Would you advise me to partition this table?

In general, yes, given the information you provided.  A parition on
ParentID % $something should improve performance.  Exactly what
$something is will take some experimenting on your part to determine.
In my experience, the scenario you describe is likely to see SELECT
performance improve nearly linerally to the number of partitions, up
to some point that will take exerpimenting to determine.

However, there may other methods of partition and/or rearranging the
data that would be even better, depending on a lot of information you
did not provide.

As an example, how are your selects distributed?  Are they fairly even
across the entire data set?  Or do ParentID become less accessed the
older they get?  If the latter, you'll probably be better served by
lazily archiving infrequently accessed ParentID rows to an archive
table and adjusting the application to search that table only if the
rows weren't found in the primary table.  This is an improvement over
modulous partitioning becuase it's more likely that the frequently
accessed data will be in memory and stay there.

Going even further, since the data is infrequently modified, you might be
better served by putting some sort of cache (memcache, or a custom in-
app cache) in front of the DB and checking it first.  The in-app cache
is always the best because it incurs no network traffic to access, but
the feasibility of doing that depends on the exact nature of the
application.  Say it with me: An RDBMS is not RAM, and trying to use
it like RAM will probably lead to disappointing performance.

--
Bill Moran <wmoran@potentialtech.com>


pgsql-general by date:

Previous
From: AlexK
Date:
Subject: Should I partition this table?
Next
From: Kevin Grittner
Date:
Subject: Re: Should I partition this table?