Re: Millions of tables - Mailing list pgsql-performance

From Álvaro Hernández Tortosa
Subject Re: Millions of tables
Date
Msg-id 7413c596-f2e0-77eb-ad6c-0ce456fc2665@8kdata.com
In response to Millions of tables  (Greg Spiegelberg <gspiegelberg@gmail.com>)
Responses Re: Millions of tables
List pgsql-performance

On 26/09/16 05:50, Greg Spiegelberg wrote:
> Hey all,
>
> Obviously, everyone who's been around PostgreSQL or almost any other
> RDBMS for a while has said not to have millions of tables.  I too long
> believed it, until recently.
>
> My test rig is an AWS d2.8xlarge instance running 9.5, using XFS on
> EBS (io1) for PGDATA.  Over the weekend, I created 8M tables with 16M
> indexes on those tables.  Table creation initially took 0.018031 secs,
> averaged 0.027467 secs, and after tossing out outliers (qty 5) the
> maximum creation time found was 0.66139 seconds.  Total time: 30 hours,
> 31 minutes and 8.435049 seconds.  Tables were created by a single
> process.  Do note that table creation is done via a plpgsql function,
> as there are other housekeeping tasks necessary, though minimal.
>
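For illustration, a creation function along the lines Greg describes might
look roughly like the following; the column layout, naming scheme, index
choices and the "housekeeping" here are assumptions, not his actual function:

    -- Hypothetical per-shard creation function (PostgreSQL 9.5, plpgsql).
    CREATE OR REPLACE FUNCTION create_shard(shard_id bigint)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        -- One narrow table per shard; the layout is a guess.
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS t_%s (
                 id    bigint      NOT NULL,
                 ts    timestamptz NOT NULL,
                 value double precision
             )', shard_id);
        -- Two indexes per table matches the 8M tables / 16M indexes ratio.
        EXECUTE format('CREATE INDEX IF NOT EXISTS t_%s_id_idx ON t_%s (id)',
                       shard_id, shard_id);
        EXECUTE format('CREATE INDEX IF NOT EXISTS t_%s_ts_idx ON t_%s (ts)',
                       shard_id, shard_id);
        -- Housekeeping (bookkeeping rows, comments, grants, ...) omitted.
    END;
    $$;
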
> No system tuning but here is a list of PostgreSQL knobs and switches:
> shared_buffers = 2GB
> work_mem = 48 MB
> max_stack_depth = 4 MB
> synchronous_commit = off
> effective_cache_size = 200 GB
> pg_xlog is on its own file system
>
> There are still some obvious problems.  General DBA functions such as
> VACUUM and ANALYZE should not be done.  Each will run forever and
> cause much grief.  Backups are problematic in the traditional pg_dump
> and PITR space.  Large JOINs by VIEW, SELECT or via table inheritance
> (I am abusing it in my test case) are no-nos.  A system or database
> crash could potentially take hours to days to recover from.  There are
> likely other issues ahead.
>
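One existing knob relevant to the VACUUM/ANALYZE concern, if autovacuum must
be kept away from individual tables, is the per-table storage parameter
(shown here against the hypothetical table from the sketch above; whether
doing this across millions of tables is advisable is another question):

    -- Disable autovacuum for a single table via its storage parameters.
    ALTER TABLE t_12345 SET (autovacuum_enabled = false);
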
> You may wonder, "why is Greg attempting such a thing?"  I looked at
> DynamoDB, BigTable, and Cassandra.  I like Greenplum but, let's face
> it, it's antiquated, and don't get me started on "Hadoop".  I looked at
> many others, and ultimately each vendor's recommendation was to have
> one table for all data.  That overcomes the millions of tables
> problem, right?
>
> The problem with the "one big table" solution is that I anticipate
> 1,200 trillion records.  Random access is expected, and the customer
> expects <30ms reads for a single-record fetch.
>
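For a sense of scale: 1,200 trillion rows spread over the 8M tables above
would average roughly 150 million rows per table, so the <30ms target
presumably rests on each fetch being a short index scan on one modest table.
A hypothetical single-record fetch against the assumed layout:

    -- Hypothetical point lookup; table name and key value are made up.
    SELECT id, ts, value
      FROM t_12345
     WHERE id = 9876543210;
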
> No data is loaded... yet.  Table and index creation only.  I am
> interested in the opinions of all, including tests I may perform.  I
> have a job running preparing data.  I did this on a much smaller scale
> (50k tables), and data load via function allowed close to 6,000
> records/second.  The schema has been simplified since, and the last
> test reached just over 20,000 records/second with 300k tables.
>
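The load function mentioned would presumably route each record to its table
with dynamic SQL; a minimal sketch against the same assumed layout (the real
function and schema are not shown in the thread):

    -- Hypothetical single-record insert routed to a per-shard table.
    CREATE OR REPLACE FUNCTION insert_record(shard_id bigint,
                                             rec_id   bigint,
                                             rec_ts   timestamptz,
                                             rec_val  double precision)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        EXECUTE format('INSERT INTO t_%s (id, ts, value) VALUES ($1, $2, $3)',
                       shard_id)
        USING rec_id, rec_ts, rec_val;
    END;
    $$;
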
> I'm not looking for alternatives yet, but for input on my test.  Takers?
>
> I can't promise immediate feedback but will do my best to respond with
> results.
>
> TIA,
> -Greg

     Hi Greg.

     This is a problem (creating a large number of tables; really large
indeed) that we researched at my company a while ago. You might want to
read about it: https://www.pgcon.org/2013/schedule/events/595.en.html

     Cheers,

     Álvaro


--

Álvaro Hernández Tortosa


-----------
8Kdata


