Re: Thousands of tables versus one table? - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: Thousands of tables versus one table?
Date
Msg-id 466598E6.5020501@g2switchworks.com
In response to Re: Thousands of tables versus one table?  (Thomas Andrews <tandrews@soliantconsulting.com>)
List pgsql-performance
Thomas Andrews wrote:
>
>
> On 6/5/07 12:48 PM, "Scott Marlowe" <smarlowe@g2switchworks.com> wrote:
>
>
>> david@lang.hm wrote:
>>
>>> On Mon, 4 Jun 2007, Scott Marlowe wrote:
>>>
>>>
>>>> Gregory Stark wrote:
>>>>
>>>>>  "Thomas Andrews" <tandrews@soliantconsulting.com> writes:
>>>>>
>>>>>
>>>>>
>>>>>>  I guess my real question is, does it ever make sense to create
>>>>>>  thousands of tables like this?
>>>>>>
>>>>>  Sometimes. But usually it's not a good idea.
>>>>>  What you're proposing is basically partitioning, though you may not
>>>>>  actually need to put all the partitions together for your purposes.
>>>>>  Partitioning's main benefit is in the management of the data. You can
>>>>>  drop and load partitions in chunks rather than have to perform large
>>>>>  operations on millions of records.
>>>>>
>>>>>  Postgres doesn't really get any faster by breaking the tables up like
>>>>>  that. In fact it probably gets slower as it has to look up which of
>>>>>  the thousands of tables you want to work with.
>>>>>
>>>>>
>>>> That's not entirely true.  PostgreSQL can be markedly faster using
>>>> partitioning as long as you always access it by referencing the
>>>> partitioning key in the where clause.  So, if you partition the table
>>>> by date, and always reference it with a date in the where clause, it
>>>> will usually be noticeably faster.  OTOH, if you access it without
>>>> using a where clause that lets it pick partitions, then it will be
>>>> slower than one big table.
>>>>
>>>> So, while this poster might originally think to have one table for
>>>> each user, resulting in thousands of tables, maybe a compromise where
>>>> you partition on userid ranges would work out well, and keep each
>>>> partition table down to some 50-100 thousand rows, with smaller
>>>> indexes to match.
>>>>
>>>>
>>> what if he doesn't use the postgres internal partitioning, but instead
>>> makes his code access the tables named responsesNNNNN where NNNNN is
>>> the id of the customer?
>>>
>>> this is what it sounded like he was asking initially.
>>>
>> Sorry, I think I initially read your response as "Postgres doesn't
>> really get any faster by breaking the tables up" without the "like that"
>> part.
>>
>> I've found that as long as the number of tables is > 10,000 or so,
>>
That should have been: as long as the number of tables is < 10,000 or so...

>> having a lot of tables doesn't seem to really slow pgsql down a lot.
>> I'm sure that the tipping point is dependent on your db machine.  I
>> would bet that if he's referring to individual tables directly, and each
>> one has hundreds instead of millions of rows, the performance would be
>> better.  But the only way to be sure is to test it.
>>
>
>
Please stop top posting.  This is my last reply until you stop top posting.

> So, partitioning in PSQL 8 is workable, but breaking the table up into
> actual separate tables is not?
>
Ummm, that's not what I said.  They're similar in execution.  However,
partitioning might let you put, say, 100 customers into a given table
if you partitioned on customer ID ranges or something else that would
let you group a few together.
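
For reference, a minimal sketch of what that could look like with the
inheritance-based partitioning scheme available in 8.x; the table and
column names here (responses, customer_id) are illustrative, not taken
from the actual schema:

-- Parent table, plus one child per block of 100 customers.
CREATE TABLE responses (
    customer_id  integer NOT NULL,
    response     text,
    created_at   timestamp DEFAULT now()
);

CREATE TABLE responses_c00000 (
    CHECK (customer_id >= 0 AND customer_id < 100)
) INHERITS (responses);

CREATE TABLE responses_c00100 (
    CHECK (customer_id >= 100 AND customer_id < 200)
) INHERITS (responses);

-- Index each child on the partitioning key so per-partition lookups stay small.
CREATE INDEX responses_c00000_cid ON responses_c00000 (customer_id);
CREATE INDEX responses_c00100_cid ON responses_c00100 (customer_id);

-- With constraint_exclusion on, a query that names the key in the WHERE
-- clause only scans the matching child table.
SET constraint_exclusion = on;
SELECT * FROM responses WHERE customer_id = 142;

Inserts have to be routed to the right child table, either directly from
the application or via a rule/trigger on the parent.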
> Another solution we have proposed is having 'active' and 'completed' tables.
> So, rather than thousands, we'd have four tables:
>
>    responders_active
>    responders_completed
>    responses_active
>    responses_completed
>
That's not a bad idea.  Just keep up on your vacuuming.
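
A rough sketch of the periodic move that keeps the active tables small,
assuming a status column and the table names proposed above (all
illustrative):

BEGIN;
-- Copy finished rows over, then remove them from the active table.
-- (A production version would also guard against rows changing status
-- between the two statements.)
INSERT INTO responses_completed
    SELECT * FROM responses_active WHERE status = 'completed';
DELETE FROM responses_active WHERE status = 'completed';
COMMIT;

-- The deletes leave dead tuples behind in responses_active, which is why
-- frequent vacuuming matters here.
VACUUM ANALYZE responses_active;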
