Re: typical active table count? - Mailing list pgsql-general

From Jeremy Schneider
Subject Re: typical active table count?
Date
Msg-id 03a01fb1-e0be-24f2-eadb-8e1a4b4981bf@ardentperf.com
Whole thread Raw
In response to Re: typical active table count?  (Ron <ronljohnsonjr@gmail.com>)
List pgsql-general
On 6/27/23 12:08 PM, Ron wrote:
> On 6/27/23 13:47, Jeremy Schneider wrote:
>> On 6/27/23 9:32 AM, Ben Chobot wrote:
>>> We certainly have databases where far more than 100 tables are updated
>>> within a 10 second period. Is there a specific concern you have?
>>>
>> Thank Ben, not a concern but I'm trying to better understand how common
>> this might be. And I think sharing general statistics about how people
>> use PostgreSQL is a great help to the developers who build and maintain it.
>>
>> One really nice thing about PostgreSQL is that two quick copies of
>> pg_stat_all_tables and you can easily see this sort of info.
>>
>> If you have a database where more than 100 tables are updated within a
>> 10 second period - this seems really uncommon to me - I'm very curious
>> about the workload.
> 
> 100 tables updates just means /possibly complicated schema/, not
> necessarily high volume.
> 
> ...
> 
> And honestly, 100 tables in 10 seconds is 10 tables/second.  If each
> gets one insert, that's a laughably slow transaction rate.  (Unless of
> course there's 85 indices per table, and foreign keys don't have
> supporting indices.)

I don't think the math actually works this way on highly concurrent
systems. In fact, this morning I connected with a coworker who works on
Amazon fulfillment center DBs and there was almost no difference in the
number of tables with insert/update/delete regardless of whether you
looked at a 10 second window or a 2 second window. I was also able to
chat with another coworker at Amazon who got numbers from a couple of
their PG databases, and connected w one person on slack at a different
company who passed along numbers, and got a few emails from Oracle folks.

The numbers reported back to me ranged from 29 to over a hundred.
Obviously there are also lots of small databases behind wordpress
websites with much less activity, but I found this to be an interesting
measure of some respectably busy systems.

The original context was a conversation related to logical replication
of DB changes.

But then I got interested in the general question and topic - and
someone on the Oracle side mentioned system tables which is a really
good point that hadn't occurred to me yet. The original conversation was
concerned with user tables and not system ones, but there would be a
fair amount of ongoing system table activity too.

Besides partitioning, another interesting dimension of the conversation
has been thinking about different categories of workloads. For example:
SaaS or multitenant applications with many copies of a similar schema,
ISVs, ERPs, or large enterprise databases with lots of development
history. All of these categories can easily ramp up the counts.

I'm still interested in more data - if anyone reading this can grab a
couple snapshots of pg_stat_all_tables and report back numbers for a 10
second window and a 2 second window, that would be amazing!

-Jeremy


-- 
http://about.me/jeremy_schneider




pgsql-general by date:

Previous
From: Marc Millas
Date:
Subject: Re: pb with join plan
Next
From: Atul Kumar
Date:
Subject: connect postgres using url