Re: Storing large documents - one table or partition by doc? - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Storing large documents - one table or partition by doc?
Date
Msg-id CAMkU=1ze4PZrgv6YQWzMMZpd7mi12K_bqNWVmkvZDYin5MOSWw@mail.gmail.com
In response to Storing large documents - one table or partition by doc?  (Dev Nop <devnop0@gmail.com>)
List pgsql-performance
On Fri, Sep 23, 2016 at 3:12 AM, Dev Nop <devnop0@gmail.com> wrote:

I’m storing thousands of independent documents each containing around 20k rows. The larger the document, the more likely it is to be active with inserts and updates (1000s/day). The most common read query is to get all the rows for a single document (100s/day).


How can the reads be an order of magnitude less frequent than the writes? Wouldn't anything doing an insert or update want to see the results of other people's inserts/updates about as frequently as they happen?

It will be supporting real-time collaboration, but with strong consistency for a simple schema, so it is not well suited to dedicated "document databases" that assume schemaless data and eventual consistency. I won’t have great hardware or budget, so I need to squeeze the most out of the least.

My question is whether to put all documents into a single huge table or partition by document?

The documents are independent, so it's purely a performance question. It's too many tables for PostgreSQL's partitioning support, and I don't get any benefit from a master table and constraints. Handling partitioning in application logic is effectively zero cost.
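
For illustration only, application-side routing could look something like this; the table layout and naming scheme are assumptions for the sketch, not details from the post:

    -- Hypothetical layout: one table per document, named by document id.
    CREATE TABLE doc_12345 (
        row_id   bigint PRIMARY KEY,
        position integer NOT NULL,
        payload  jsonb
    );

    -- The application builds the table name from the document id, so the
    -- common "fetch the whole document" read is a plain scan of one table.
    SELECT * FROM doc_12345 ORDER BY position;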

I know that 1000s of tables is regarded as an anti-pattern, but I can only see performance and maintenance benefits in one table per independent document, e.g. fast per-table vacuums, incremental schema updates, and easy future sharding. A monster table would require additional key columns and indexes that have no value beyond allowing the documents to sit in the same table.
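
For contrast, a hypothetical sketch of that monster-table layout, where the extra key column and composite index exist only to keep documents apart:

    -- All documents in one table: every row carries doc_id, and the
    -- composite primary key is the "additional" index mentioned above.
    CREATE TABLE doc_rows (
        doc_id   bigint  NOT NULL,
        row_id   bigint  NOT NULL,
        position integer NOT NULL,
        payload  jsonb,
        PRIMARY KEY (doc_id, row_id)
    );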


If you go the partitioned route, I would add the extra column anyway (but not an index on it), so that it is there if/when you need it.
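
Concretely, on the per-document layout sketched above (names hypothetical), that could be as simple as carrying the column with no index:

    -- Add the future partition key now, but skip the index until it is
    -- needed; within a per-document table the value is constant anyway.
    ALTER TABLE doc_12345 ADD COLUMN doc_id bigint NOT NULL DEFAULT 12345;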

The only downside seems to be the system-level per-table overhead, but I only see that as a problem if I have a very long tail of tiny documents. I'd rather solve that problem if it occurs than manage an all-eggs-in-one-basket monster table.
Is there anything significant I am missing in my reasoning?


If you use a reasonably modern version of PostgreSQL (say, >= 9.4), the overhead of having 1000s of tables should not be too large a problem. When you get into the 100,000 range, it is likely to start being a problem. If you get to 1,000,000, you almost certainly have a problem.
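
If you want to watch for those thresholds, the system catalogs make the count easy to check, e.g.:

    -- Count ordinary tables in the current database.
    SELECT count(*) FROM pg_class WHERE relkind = 'r';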

Cheers,

Jeff
