Re: Millions of tables - Mailing list pgsql-performance

From Simon Riggs
Subject Re: Millions of tables
Date
Msg-id CANP8+jKE0A1s6UK3F+EmPEmv9b_zWJ6T-qsMGvUJyEfLCZ7HTg@mail.gmail.com
In response to Re: Millions of tables  (Greg Spiegelberg <gspiegelberg@gmail.com>)
List pgsql-performance
On 26 September 2016 at 05:19, Greg Spiegelberg <gspiegelberg@gmail.com> wrote:
> I did look at PostgresXL and CitusDB.  Both are admirable however neither
> could support the need to read a random record consistently under 30ms.
> It's a similar problem Cassandra and others have: network latency.  At this
> scale, to provide the ability to access any given record amongst trillions
> it is imperative to know precisely where it is stored (system & database)
> and read a relatively small index.  I have other requirements that prohibit
> use of any technology that is eventually consistent.

Then XL is exactly what you need, since it does allow you to calculate
exactly where the record is via hash and then access it, which makes
the request just a single datanode task.

XL is not the same as CitusDB.
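A minimal sketch of what that looks like in Postgres-XL DDL (the table and column names here are illustrative, not from the thread):

```sql
-- Rows are spread across datanodes by a hash of the key column.
-- A point lookup on "id" hashes the value, identifies the single
-- owning datanode, and executes there alone -- no cluster-wide scan.
CREATE TABLE records (
    id      bigint PRIMARY KEY,
    payload text
) DISTRIBUTE BY HASH (id);

-- Routed to exactly one datanode by the coordinator:
SELECT payload FROM records WHERE id = 12345;
```

Because the coordinator can compute the owning node from the key alone, the per-request cost is one small index read on one node, which is what keeps point lookups inside a tight latency budget.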

> I liken the problem to fishing.  To find a particular fish of length, size,
> color &c in a data lake you must accept the possibility of scanning the
> entire lake.  However, if all fish were in barrels where each barrel had a
> particular kind of fish of specific length, size, color &c then the problem
> is far simpler.

The task of putting the fish in the appropriate barrel is quite hard.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

