Thread: Tuning for a tiny database
Hello,

I couldn't find much information in the archives on this -- perhaps this is a bit of a specialized need, but I was hoping for some input from some experienced postgres admins.

I'm moving some DNS servers from djbdns/tinydns to PowerDNS. While it supports many backends, postgresql seems like the best choice for us since it's what is used elsewhere (in larger configurations). As a bit of background, PowerDNS does not query the db for every incoming DNS query; it caches at various levels (both a "packet cache" and a db cache), so its database needs are quite modest.

Some raw numbers: We're only looking at a total of about six tables in one db. In total there are going to be well under 10,000 records in ALL tables. That might increase to at most 100,000 in the next few years. Our raw DNS queries/second tops out around 50 qps across three distinct servers. Keeping in mind that PowerDNS is doing heavy caching, we should never really see more than a few db queries per second. There will be one "master" pgsql db and three slaves using streaming replication.

Now, given the number of records and the frequency of queries, how should I be tuning for such a small setup? Ideally PowerDNS with its huge amount of caching should get as much RAM and CPU as I can give it, but I do want to ensure the tiny bit of data postgres has is stuck in physical memory as well.

What would you suggest for this scenario?

Thanks,

Charles
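Measuring how big the database actually is makes it easy to confirm the data can stay resident in memory. A minimal sketch; the database name "powerdns" is a placeholder:

    -- Total on-disk size of the database
    SELECT pg_size_pretty(pg_database_size('powerdns'));

    -- Per-table sizes, indexes and TOAST included
    SELECT relname,
           pg_size_pretty(pg_total_relation_size(pg_class.oid)) AS total
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY pg_total_relation_size(pg_class.oid) DESC
    LIMIT 10;

A six-table schema holding under 10,000 rows will typically measure a few megabytes at most, far below even a small shared_buffers setting.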
On Tuesday, June 21, 2011, at 01:49 -0400, CSS wrote:

> I couldn't find much information in the archives on this -- perhaps this
> is a bit of a specialized need, but I was hoping for some input from
> some experienced postgres admins.

Hi,

I am not an experienced postgres admin, but I am an experienced reader of this list, and from what I gather, the figures you quote are small enough that you can probably work out of the box without doing anything, unless DNS servers have special needs I am not aware of.

> I'm moving some DNS servers from djbdns/tinydns to PowerDNS. While it
> supports many backends, postgresql seems like the best choice for us
> since it's what is used elsewhere (in larger configurations). As a bit
> of background, PowerDNS does not query the db for every incoming DNS
> query; it caches at various levels (both a "packet cache" and a db
> cache), so its database needs are quite modest.
>
> Some raw numbers: We're only looking at a total of about six tables in
> one db. In total there are going to be well under 10,000 records in ALL
> tables. That might increase to at most 100,000 in the next few years.

Unless those DNS records are large (I guess not), the db should reside entirely in memory, at least at the beginning. I am also guessing, though you should validate it, that your tables, with roughly 1,500 records each on average, probably don't even need an index, as the engine normally does a sequential scan over such small datasets.

--
Vincent Veyron
http://marica.fr/
Claims and litigation management software for legal departments
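The no-index guess is easy to check with EXPLAIN. A minimal sketch, using the records table from the standard PowerDNS schema (the queried name is illustrative):

    EXPLAIN ANALYZE
    SELECT content, ttl FROM records WHERE name = 'www.example.com';

If the plan shows a Seq Scan completing in well under a millisecond, an index buys nothing at this size; the check is worth re-running if the tables grow toward the 100,000-row mark.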
On 06/21/2011 01:49 AM, CSS wrote:
> Some raw numbers: We're only looking at a total of about six tables in
> one db. In total there are going to be well under 10,000 records in ALL
> tables. That might increase to at most 100,000 in the next few years.
> Our raw DNS queries/second tops out around 50 qps across three distinct
> servers. Keeping in mind that PowerDNS is doing heavy caching, we
> should never really see more than a few db queries per second.

I doubt you really need to do any tuning for this scenario. I would set shared_buffers to a modest value--maybe 256MB--and stop further tuning until there's some evidence it's necessary.

If presented with the same problem but with the much harder twist "I need to support >10,000 queries/second", I would recommend:

-Populate a prototype with a representative amount of data
-Measure the database size
-Set shared_buffers to that
-Whenever the database is restarted, run a series of queries that forces all the regularly used data into the database's cache
-Use pg_buffercache to confirm what's in there is what you expect

Getting all the data into cache is sometimes harder than expected. As one example that complicates things, an optimization in PostgreSQL keeps a sequential scan of a large table from evicting everything else in shared_buffers, which also means such a scan won't leave that table fully cached. But if you get to where this is necessary, building such a tool isn't difficult, and there are projects out there that address this particular need: filling the cache back up with relevant data after a restart. This is the main one:

http://pgfoundry.org/projects/pgfincore/
http://www.pgcon.org/2010/schedule/events/261.en.html

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
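For the pg_buffercache step, a minimal sketch of the kind of check Greg describes, assuming the contrib module has been installed in the database (via the pg_buffercache.sql script on 9.0, or CREATE EXTENSION on later versions):

    -- Count how many 8kB buffers each relation occupies in shared_buffers
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = c.relfilenode
    WHERE b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;

If the PowerDNS tables and their indexes show buffer counts that roughly match their on-disk sizes, the whole working set is resident.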
On Jun 20, 2011, at 10:49 PM, CSS wrote:

> [...]
>
> Now, given the number of records and the frequency of queries, how should
> I be tuning for such a small setup? Ideally PowerDNS with its huge
> amount of caching should get as much RAM and CPU as I can give it, but I
> do want to ensure the tiny bit of data postgres has is stuck in physical
> memory as well.
>
> What would you suggest for this scenario?

I'm doing pretty much the same thing. I installed postgresql 8.4.something and powerdns from the ubuntu repository on a small VM and hit go. It's running just fine, at comparable query rates and maybe 5,000 records. At some point I'll tune it, just out of tidiness, but it's fine out of the box.

You'll also need replication, probably from a stealth master. I found bucardo very easy to set up for that, and it works nicely to replicate to multiple public servers from a single master server inside the firewall.

Cheers,
Steve
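For reference, the "hit go" setup amounts to little more than pointing PowerDNS's generic PostgreSQL backend at the database. A minimal pdns.conf sketch; host, database name, and credentials are placeholders:

    # pdns.conf -- generic PostgreSQL (gpgsql) backend
    launch=gpgsql
    gpgsql-host=127.0.0.1
    gpgsql-dbname=powerdns
    gpgsql-user=pdns
    gpgsql-password=secret

PowerDNS's packet and query caches sit in front of this, which is why the database sees so little of the raw query traffic.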
Greg Smith wrote:
> On 06/21/2011 01:49 AM, CSS wrote:
>> Some raw numbers: We're only looking at a total of about six tables in
>> one db. In total there are going to be well under 10,000 records in ALL
>> tables. That might increase to at most 100,000 in the next few years.
>> Our raw DNS queries/second tops out around 50 qps across three distinct
>> servers. Keeping in mind that PowerDNS is doing heavy caching, we
>> should never really see more than a few db queries per second.
>
> I doubt you really need to do any tuning for this scenario. I would set
> shared_buffers to a modest value--maybe 256MB--and stop further tuning
> until there's some evidence it's necessary.

Sounds good. I'm at... wait for it... 8MB now. This seems to work.

Since I originally posted this I've been benchmarking powerdns, and I've found that with it only hitting one core I can do about 50,000 queries/second. During that time, Postgres is not even a blip. I think with the heavy caching powerdns does, it just doesn't really hit postgres until something in the cache reaches its maximum TTL. In short, heavy DNS query traffic is not in turn causing heavy database traffic. This might change if someone pummels it with queries for domains or hosts that don't exist, but I believe it handles negative caching of records as well.

> If presented with the same problem but with the much harder twist "I
> need to support >10,000 queries/second", I would recommend:
>
> -Populate a prototype with a representative amount of data
> -Measure the database size
> -Set shared_buffers to that

Thank you. I'm fairly green with Postgres, so this is very helpful.

> -Whenever the database is restarted, run a series of queries that
> forces all the regularly used data into the database's cache
> -Use pg_buffercache to confirm what's in there is what you expect

I need to read up on that module, but it looks quite intriguing.

> Getting all the data into cache is sometimes harder than expected. As
> one example that complicates things, an optimization in PostgreSQL
> keeps a sequential scan of a large table from evicting everything else
> in shared_buffers, which also means such a scan won't leave that table
> fully cached. But if you get to where this is necessary, building such
> a tool isn't difficult, and there are projects out there that address
> this particular need: filling the cache back up with relevant data
> after a restart. This is the main one:
>
> http://pgfoundry.org/projects/pgfincore/
> http://www.pgcon.org/2010/schedule/events/261.en.html

Thanks again, I appreciate all your input on what's currently looking like a non-problem. But there are other places I can certainly use this info.

Charles
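For the cache-warming idea, at this data size a minimal sketch is just a full read of each table after a restart; table names are from the standard PowerDNS schema:

    -- Touch every row so both tables land in shared_buffers. The
    -- large-table optimization Greg mentions only kicks in for tables
    -- bigger than about a quarter of shared_buffers, so tables this
    -- small are cached in full by a plain sequential scan.
    SELECT count(*) FROM domains;
    SELECT count(*) FROM records;

pg_buffercache (as in the earlier sketch) can then confirm the blocks actually stayed resident.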
Steve Atkins wrote:
> On Jun 20, 2011, at 10:49 PM, CSS wrote:
>
>> [...]
>>
>> What would you suggest for this scenario?
>
> I'm doing pretty much the same thing. I installed postgresql 8.4.something and
> powerdns from the ubuntu repository on a small VM and hit go. It's running just
> fine, at comparable query rates and maybe 5,000 records. At some point I'll
> tune it, just out of tidiness, but it's fine out of the box.

That's pretty much what I'm seeing. As I mentioned in my last email in this thread, the only other test case I need to poke at is hitting the powerdns server with a bunch of queries it's not authoritative for (forcing a lookup in the domains table) and with queries for non-existent records in domains it is authoritative for (forcing a lookup in the records table).

> You'll also need replication, probably from a stealth master. I found bucardo
> very easy to set up for that, and it works nicely to replicate to multiple public
> servers from a single master server inside the firewall.

Already there -- I have a hidden master db server, and the three public powerdns instances all slave from it. I'm using the "streaming" replication option in 9.0, and so far it seems to be working well. I still need to look more closely at the whole log-shipping thing in case a slave somehow gets cut off from the master, but in initial testing it looks really good. My only problem was that I initially put some of these on older hosts and ran into the 32- vs. 64-bit incompatibility when copying the data directory across.

Thanks for your input!

Charles
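For the streaming-replication piece, a minimal 9.0-era sketch of the two halves involved; hostname, user, and paths are placeholders:

    # postgresql.conf on the hidden master
    wal_level = hot_standby
    max_wal_senders = 3        # one per public slave
    wal_keep_segments = 32     # retain WAL so a briefly disconnected
                               # slave can catch up without a re-sync

    # recovery.conf on each slave
    standby_mode = 'on'
    primary_conninfo = 'host=master.internal user=replication'
    # Optional log-shipping fallback for a slave that falls further
    # behind than wal_keep_segments (requires WAL archiving on the master):
    # restore_command = 'cp /mnt/wal_archive/%f %p'

Raising wal_keep_segments, or adding the archive-based restore_command, is what covers the "slave somehow gets cut off from the master" case.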