Re: Thousands of tables versus one table? - Mailing list pgsql-performance
From | Scott Marlowe
---|---
Subject | Re: Thousands of tables versus one table?
Date |
Msg-id | 466598E6.5020501@g2switchworks.com
In response to | Re: Thousands of tables versus one table? (Thomas Andrews <tandrews@soliantconsulting.com>)
List | pgsql-performance
Thomas Andrews wrote:
> On 6/5/07 12:48 PM, "Scott Marlowe" <smarlowe@g2switchworks.com> wrote:
>> david@lang.hm wrote:
>>> On Mon, 4 Jun 2007, Scott Marlowe wrote:
>>>> Gregory Stark wrote:
>>>>> "Thomas Andrews" <tandrews@soliantconsulting.com> writes:
>>>>>> I guess my real question is, does it ever make sense to create
>>>>>> thousands of tables like this?
>>>>> Sometimes. But usually it's not a good idea.
>>>>> What you're proposing is basically partitioning, though you may not
>>>>> actually need to put all the partitions together for your purposes.
>>>>> Partitioning's main benefit is in the management of the data. You can
>>>>> drop and load partitions in chunks rather than have to perform large
>>>>> operations on millions of records.
>>>>>
>>>>> Postgres doesn't really get any faster by breaking the tables up like
>>>>> that. In fact it probably gets slower as it has to look up which of
>>>>> the thousands of tables you want to work with.
>>>> That's not entirely true. PostgreSQL can be markedly faster using
>>>> partitioning as long as you always access it by referencing the
>>>> partitioning key in the where clause. So, if you partition the table
>>>> by date, and always reference it with a date in the where clause, it
>>>> will usually be noticeably faster. OTOH, if you access it without
>>>> using a where clause that lets it pick partitions, then it will be
>>>> slower than one big table.
>>>>
>>>> So, while this poster might originally think to have one table for
>>>> each user, resulting in thousands of tables, maybe a compromise where
>>>> you partition on userid ranges would work out well, and keep each
>>>> partition table down to some 50-100 thousand rows, with smaller
>>>> indexes to match.
>>> what if he doesn't use the postgres internal partitioning, but instead
>>> makes his code access the tables named responsesNNNNN where NNNNN is
>>> the id of the customer?
>>>
>>> this is what it sounded like he was asking initially.
>> Sorry, I think I initially read your response as "Postgres doesn't
>> really get any faster by breaking the tables up" without the "like
>> that" part.
>>
>> I've found that as long as the number of tables is > 10,000 or so,

That should have been as long as the number of tables is < 10,000 or so...

>> having a lot of tables doesn't seem to really slow pgsql down a lot.
>> I'm sure that the tipping point is dependent on your db machine. I
>> would bet that if he's referring to individual tables directly, and
>> each one has hundreds instead of millions of rows, the performance
>> would be better. But the only way to be sure is to test it.

Please stop top posting. This is my last reply until you stop top posting.

> So, partitioning in PSQL 8 is workable, but breaking the table up into
> actual separate tables is not?

Ummm, that's not what I said. They're similar in execution. However,
partitioning might let you put 100 customers into a given table, if, say,
you partitioned on customer ID or something that would allow you to group
a few together.

> Another solution we have proposed is having 'active' and 'completed'
> tables. So, rather than thousands, we'd have four tables:
>
> responders_active
> responders_completed
> responses_active
> responses_completed

That's not a bad idea. Just keep up on your vacuuming.
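To make the partitioning suggestion above concrete: on the 8.x releases under discussion, partitioning is built from table inheritance plus CHECK constraints, and the planner only skips partitions when constraint_exclusion is enabled and the partitioning key appears in the WHERE clause. Below is a minimal sketch along the lines of the "partition on userid ranges" idea; the responses table, its columns, and the 1000-customer range width are illustrative assumptions, not the poster's actual schema.

```sql
-- Hypothetical parent table; the real schema isn't given in the thread.
CREATE TABLE responses (
    customer_id  integer NOT NULL,
    response_id  integer NOT NULL,
    answer       text
);

-- Each child holds one range of customer ids; the CHECK constraint is
-- what lets the planner skip the other partitions (constraint exclusion).
CREATE TABLE responses_00000 (
    CHECK (customer_id >= 0 AND customer_id < 1000)
) INHERITS (responses);

CREATE TABLE responses_01000 (
    CHECK (customer_id >= 1000 AND customer_id < 2000)
) INHERITS (responses);

-- Index each child on the partitioning key; each index stays small.
CREATE INDEX responses_00000_cust_idx ON responses_00000 (customer_id);
CREATE INDEX responses_01000_cust_idx ON responses_01000 (customer_id);

-- Partition pruning only happens when this is on (off by default in 8.x).
SET constraint_exclusion = on;

-- Because the WHERE clause names the partitioning key, only the child
-- covering id 1234 (plus the empty parent) is scanned.
SELECT count(*) FROM responses WHERE customer_id = 1234;
```

On 8.x the parent table stays empty and inserts have to be routed to the right child with a rule, a trigger, or application code; that routing is not shown here.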
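The 'active'/'completed' split at the end of the message can be sketched the same way. The column list and the move-on-completion step below are assumptions about the poster's workflow, not something stated in the thread; the point about vacuuming is that the active table takes the deletes, so it is the one that needs regular VACUUM to reclaim dead rows and stay small.

```sql
-- Hypothetical column lists for the active/completed pair.
CREATE TABLE responses_active (
    customer_id  integer NOT NULL,
    response_id  integer NOT NULL,
    answer       text
);
CREATE TABLE responses_completed (
    customer_id  integer NOT NULL,
    response_id  integer NOT NULL,
    answer       text
);

-- When a customer's survey is finished, move their rows out of the
-- active table in one transaction.
BEGIN;
INSERT INTO responses_completed
    SELECT * FROM responses_active WHERE customer_id = 1234;
DELETE FROM responses_active WHERE customer_id = 1234;
COMMIT;

-- The deletes leave dead rows behind, so vacuum the active table often.
VACUUM ANALYZE responses_active;
```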