Re: Millions of tables - Mailing list pgsql-performance

From Greg Spiegelberg
Subject Re: Millions of tables
Msg-id CAEtnbpU23--36ea-HoECUBysvtYOMCobd9wEY6b2bL88ugOTXg@mail.gmail.com
In response to Re: Millions of tables  ("Mike Sofen" <msofen@runbox.com>)
List pgsql-performance
On Tue, Sep 27, 2016 at 9:42 AM, Mike Sofen <msofen@runbox.com> wrote:

From: Mike Sofen   Sent: Tuesday, September 27, 2016 8:10 AM

From: Greg Spiegelberg   Sent: Monday, September 26, 2016 7:25 AM
I've gotten more responses than anticipated, answered some questions, and gained some insight, but my challenge remains: what should I capture along the way to prove or disprove this storage pattern?  Alternatives to the storage pattern aside, I need ideas for the test rig, metrics to capture, and suggestions for tuning it.

In the next 24 hours, I will be sending ~1 trillion records to the test database.  Given the setup time involved, I'd rather have things set up properly on the first go.

Thanks!

-Greg

---------------------

Greg, I ran another quick test on a wider table than you’ve described, but this time with 80 million rows, with core counts, RAM and SSD storage similar to what you’d have on that AWS EC2 instance.  The table had 7 columns (3 integers, 3 text, 1 timestamptz) with an average width of 157 chars and one btree index on the integer PK column.  Using explain analyze, I picked one id value out of the 80m and ran a select * where id = x.  It did an index scan, with a planning time of 0.077 ms and an execution time of 0.254 seconds.  I ran the query for a variety of widely spaced values (so the data was uncached) and the timing never changed.  This mirrors my general experience with PG – very fast reads on indexed queries.
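For reference, a minimal sketch of the kind of test described above (the table name, column names, and sample id value here are illustrative assumptions, not Mike's actual schema):

```sql
-- Hypothetical table mirroring the shape described: 7 columns
-- (3 integers, 3 text, 1 timestamptz), btree index via the PK.
CREATE TABLE bucket_test (
    id          integer PRIMARY KEY,  -- btree index created with the PK
    col_b       integer,
    col_c       integer,
    note_a      text,
    note_b      text,
    note_c      text,
    created_at  timestamptz
);

-- Point lookup against the PK.  EXPLAIN ANALYZE reports planning and
-- execution time and confirms whether an Index Scan was used.
EXPLAIN ANALYZE
SELECT * FROM bucket_test WHERE id = 12345678;
```

Running the same query with widely spaced id values, as Mike did, helps avoid measuring only cached pages.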

Summary:  I think your buckets can be WAY bigger than you are envisioning for the simple table design you’ve described.  I’m betting you can easily do 500 million rows per bucket before approaching anything close to the 30ms max query time.

Mike Sofen (Synthetic Genomics)

Totally typo’d the execution time: it was 0.254 MILLISECONDS, not SECONDS.  Hence my comment about going up 10x in bucket size rather than appearing to be right at the limit.  Sorry!



I figured.  :)

Haven't ruled it out, but the expectation for this implementation is to perform at worst 3X slower than memcache or Redis.

Bigger buckets mean a wider spread of response times.  Some buckets may contain 140k records, and some 100X more.

-Greg
