Home > mailing lists

Clustered index to preserve data locality in a multitenant application? - Mailing list pgsql-general

From	Nicolas Grilly
Subject	Clustered index to preserve data locality in a multitenant application?
Date	August 30, 2016 19:53:24
Msg-id	CAG3yVS5JFT1PB0fydw7BKn09joP10H1obargHnThwkob9TyimQ@mail.gmail.com Whole thread Raw
Responses	Re: Clustered index to preserve data locality in a multitenant application? (Vick Khera <vivek@khera.org>) Re: Clustered index to preserve data locality in a multitenant application? (Nicolas Grilly <nicolas@vocationcity.com>) Re: Clustered index to preserve data locality in a multitenant application? (Nicolas Grilly <nicolas@gardentechno.com>) Re: Clustered index to preserve data locality in a multitenant application? (Nicolas Grilly <nicolas@vocationcity.com>)
List	pgsql-general

Tree view

Hello,

We are developing a multitenant application which is currently based on MySQL, but we're thinking of migrating to PostgreSQL.

We rely on clustered indexes to preserve data locality for each tenant. Primary keys start with the tenant ID. This way, rows belonging to the same tenant are stored next to each other. Because all requests hit only one tenant, this is a great performance improvement.

PostgreSQL doesn't have clustered indexes — I'm aware of the CLUSTER command but it's a one-time operation — and I'm wondering if this can be a problem or not.

Let's say we have a table containing data for 10,000 tenants and 10,000 rows per tenant, for a total of 100,000,000 rows. Let's say each 8 KB block contains ~10 rows. Let's way we want to compute the sum of an integer column for all rows belonging to a given tenant ID.

In MySQL/InnoDB, rows are stored in the leaf nodes of a B-tree. To compute the sum, MySQL has to read at least 1,000 blocks (each block containing ~10 rows). I deliberately neglect the cost of walking the B-tree intermediate nodes.

By comparison, PostgreSQL has to read at least 10,000 blocks (each block containing ~10 rows, but most of the time, only one row will match the tenant ID, other rows belonging to other tenants).

A few questions:

- Am I missing something?

- Am I overestimating the benefit of a clustered index in our case, and the cost of not having one in PostgreSQL?

- Is there another technical solution to this problem?

Thanks,

Nicolas Grilly

Garden / Vocation City - Web Recruitment Software

Managing Partner

+33 6 03 00 25 34

www.vocationcity.com

pgsql-general by date:

From: "Mike Sofen"
Date: 30 August 2016, 19:40:55
Subject: Re: UUIDs & Clustered Indexes

From: Vick Khera
Date: 30 August 2016, 20:26:39
Subject: Re: Clustered index to preserve data locality in a multitenant application?

Clustered index to preserve data locality in a multitenant application? - Mailing list pgsql-general

Previous

Next