Re: Clustered index to preserve data locality in a multitenant application? - Mailing list pgsql-general

From Eduardo Morras
Subject Re: Clustered index to preserve data locality in a multitenant application?
Date
Msg-id 20160830150543.33fa542c425da70c22eb277f@yahoo.es
Whole thread Raw
In response to Clustered index to preserve data locality in a multitenant application?  (Nicolas Grilly <nicolas@gardentechno.com>)
List pgsql-general
On Tue, 30 Aug 2016 13:12:33 +0200
Nicolas Grilly <nicolas@gardentechno.com> wrote:

> Hello,
>
> We are developing a multitenant application which is currently based
> on MySQL, but we're thinking of migrating to PostgreSQL.
>
> We rely on clustered indexes to preserve data locality for each
> tenant. Primary keys start with the tenant ID. This way, rows
> belonging to the same tenant are stored next to each other. Because
> all requests hit only one tenant, this is a great performance
> improvement.
>
> PostgreSQL doesn't have clustered indexes ? I'm aware of the CLUSTER
> command but it's a one-time operation ? and I'm wondering if this can
> be a problem or not.
>
> Let's say we have a table containing data for 10,000 tenants and
> 10,000 rows per tenant, for a total of 100,000,000 rows. Let's say
> each 8 KB block contains ~10 rows. Let's way we want to compute the
> sum of an integer column for all rows belonging to a given tenant ID.
>
> In MySQL/InnoDB, rows are stored in the leaf nodes of a B-tree. To
> compute the sum, MySQL has to read at least 1,000 blocks (each block
> containing ~10 rows). I deliberately neglect the cost of walking the
> B-tree intermediate nodes.
>
> By comparison, PostgreSQL has to read at least 10,000 blocks (each
> block containing ~10 rows, but most of the time, only one row will
> match the tenant ID, other rows belonging to other tenants).
>
> A few questions:
>
> - Am I missing something?
> - Am I overestimating the benefit of a clustered index in our case,
> and the cost of not having one in PostgreSQL?
> - Is there another technical solution to this problem?

Check BRIN indexs, they are "designed for handling very large tables in
which certain columns have some natural correlation with their physical
location within the table", I think they fit your needs.

(see https://www.postgresql.org/docs/current/static/brin-intro.html)


> Thanks,
> Nicolas Grilly
> Managing Partner
> +33 6 03 00 25 34
> www.vocationcity.com  ?  Recruitment & application tracking software
> www.gardentechno.com  ?  Consulting & software engineering


---   ---
Eduardo Morras <emorrasg@yahoo.es>


pgsql-general by date:

Previous
From: "Mike Sofen"
Date:
Subject: Re: Re: Clustered index to preserve data locality in a multitenant application?
Next
From: Luke Gordon
Date:
Subject: UUIDs & Clustered Indexes