Re: Performance considerations for very heavy INSERT traffic - Mailing list pgsql-performance
From: Greg Stark
Subject: Re: Performance considerations for very heavy INSERT traffic
Msg-id: 87slw91x8a.fsf@stark.xeocode.com
In response to: Performance considerations for very heavy INSERT traffic (Brandon Black <blblack@gmail.com>)
Responses: Re: Performance considerations for very heavy INSERT traffic
List: pgsql-performance
Brandon Black <blblack@gmail.com> writes:

> The vast, overwhelming majority of our database traffic is pretty much a
> non-stop stream of INSERTs filling up tables.

That part Postgres should handle pretty well. It should be pretty much limited by your I/O bandwidth, so big RAID 1+0 arrays are ideal. Putting the WAL on a dedicated array would also be critical.

The WAL parameters like commit_delay and commit_siblings are a bit of a mystery. Nobody has done any extensive testing of them. It would be quite helpful if you find anything conclusive and post it. It would also be surprising if they had a very large effect. They almost got chopped recently because they weren't believed to be useful.

You might also ponder whether you really need to be issuing a commit for every datum. If you only need to commit periodically you can get much better throughput. I suppose that's the same idea as commit_siblings. It would be interesting to know whether you can get those parameters to perform as well as batching up records yourself (see the sketch further down).

> There will of course be users using a web-based GUI to extract data from
> these tables and display them in graphs and whatnot, but the SELECT query
> traffic will always be considerably less frequent and intensive than the
> incessant INSERTs, and it's not that big a deal if the large queries take a
> little while to run.

I do fear these queries. Even if they're mostly not terribly intensive, if you're pushing the edges of your write I/O bandwidth then a single seek to satisfy one of these selects could really hurt your throughput.

That said, as long as your WAL is on a dedicated drive, Postgres's architecture should in theory be ideal and allow you to run these things with impunity. The WAL is purely write-only and it's the only thing your inserts will be blocking on.

> This data also expires - rows with timestamps older than X days will be
> DELETEd periodically (once an hour or faster), such that the tables will
> reach a relatively stable size (pg_autovacuum is handling vacuuming for now,
> but considering our case, we're thinking of killing pg_autovacuum in favor
> of having the periodic DELETE process also do a vacuum of affected tables
> right after the DELETE, and then have it vacuum the other low traffic tables
> once a day while it's at it).

Ay, there's the rub. Taking this approach means you have vacuums running which have to scan your entire table, and your inserts are being sprayed all over the disk.

An alternative you may want to consider is partitioned tables. Then when you want to expire old records you simply drop the oldest partition. Or in your case you might rotate through the partitions, so you just truncate the oldest one and start inserting into it again.

Unfortunately there's no built-in support for partitioned tables in Postgres. You get to roll your own using UNION ALL or using inherited tables. Various people have described success with each option, though they both have downsides too.

Using partitioned tables you would never need to delete any records except when you delete all of them. So you would never need to run vacuum except on newly emptied partitions. That avoids having to scan through all those pages that you know will never have holes. If you use TRUNCATE (or VACUUM FULL or CLUSTER) that would also mean your inserts are always sequential (though it also means lots of block allocations along the way, possibly not an advantage).

This may be a lot more work to set up and maintain, but I think it would be a big performance win. It would directly speed up the WAL writes by avoiding those big full-page dumps, and it would also cut out all the I/O traffic vacuum generates.
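
Just to make the inherited-tables idea concrete, here's a rough sketch. Untested, and the table and column names are made up:

    -- Parent table; the reporting queries run against this.
    CREATE TABLE samples (
        ts      timestamptz NOT NULL,
        src     integer     NOT NULL,
        value   float8      NOT NULL
    );

    -- One child per day; the insert stream goes straight into the
    -- current day's child.
    CREATE TABLE samples_day0 () INHERITS (samples);
    CREATE TABLE samples_day1 () INHERITS (samples);
    -- ... one per day in your retention window ...

    -- Expiring the oldest day is then a truncate, not a DELETE plus
    -- a vacuum:
    TRUNCATE samples_day0;
    -- and the insert process starts writing into samples_day0 again
    -- when its slot comes back around.

A select on the parent automatically scans all the children, so the GUI queries don't need to know which partition is current.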
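
And to be concrete about the commit-batching point from earlier, the difference is between committing each row as it arrives and something like this (again just a sketch, using the hypothetical samples table above):

    BEGIN;
    INSERT INTO samples VALUES (now(), 1, 42.0);
    INSERT INTO samples VALUES (now(), 2, 17.3);
    -- ... a few hundred more rows ...
    COMMIT;  -- one WAL fsync for the whole batch instead of one per row

Every COMMIT has to wait for the WAL to hit disk, so the fewer commits the better, as long as you can afford to lose the uncommitted tail of the stream in a crash.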
> Increasing shared_buffers seems to always help, even out to half of the dev
> box's ram (2G).

Half should be a pessimal setting. It means virtually everything is buffered twice, once in the kernel and once in Postgres, effectively halving your memory. If that's really helping, try raising it even further, to something like 90% of your memory. But the conventional dogma is that shared_buffers should never be more than about 10k buffers no matter how much memory you have; let the kernel do the bulk of the buffering.

That said, it's even more mysterious in your situation. Why would a large shared_buffers help insert-only performance much at all? My guess is that it's letting the vacuums complete quicker. Perhaps try raising work_mem?
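For reference, the knobs mentioned in this mail all live in postgresql.conf. The numbers here are only to illustrate the units, not recommendations:

    # shared_buffers is a count of 8k pages, so the conventional
    # ~10k buffers is about 80MB:
    shared_buffers  = 10000
    # work_mem is in kB, and applies per sort, so mind your concurrency:
    work_mem        = 16384
    # commit_delay is in microseconds; it only kicks in when at least
    # commit_siblings other transactions are open at commit time:
    commit_delay    = 10
    commit_siblings = 5

-- greg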