Thread: Tuning for a tiny database

Tuning for a tiny database

From
CSS
Date:
Hello,

I couldn't find much information in the archives on this -- perhaps this
is a bit of a specialized need, but I was hoping for some input from
some experienced postgres admins.

I'm moving some DNS servers from djbdns/tinydns to PowerDNS.  While it
supports many backends, postgresql seems like the best choice for us
since it's what is used elsewhere (in larger configurations).  As a bit
of background, PowerDNS does not query the db for every incoming DNS
query, it caches at various levels (both a "packet cache" and a db
cache), so its database needs are quite modest.

Some raw numbers: We're only looking at a total of about six tables in
one db.  In total there are going to be well under 10,000 records in ALL
tables.  That might increase to at most 100,000 in the next few years.
Our raw DNS queries/second tops out around 50 qps over three distinct
servers.  Keeping in mind that PowerDNS is doing heavy caching, we
should never really see more than a few db queries per second.  There
will be one "master" pgsql db and three slaves using streaming replication.

Now given the number of records and the frequency of queries, how should
I be tuning for such a small setup?  Ideally PowerDNS with its huge
amount of caching should get as much RAM and CPU as I can give it, but I
do want to ensure the tiny bit of data postgres has is stuck in physical
memory as well.

What would you suggest for this scenario?

Thanks,

Charles

Re: Tuning for a tiny database

From
Vincent Veyron
Date:
On Tuesday, June 21, 2011 at 01:49 -0400, CSS wrote:

> I couldn't find much information in the archives on this -- perhaps this
> is a bit of a specialized need, but I was hoping for some input from
> some experienced postgres admins.
>
Hi,

I am not an experienced postgres admin, but I am an experienced reader
of this list, and from what I gather, the figures you quote are small
enough that things will probably work out of the box without any tuning,
unless DNS servers have special needs I am not aware of.

> I'm moving some DNS servers from djbdns/tinydns to PowerDNS.  While it
> supports many backends, postgresql seems like the best choice for us
> since it's what is used elsewhere (in larger configurations).  As a bit
> of background, PowerDNS does not query the db for every incoming DNS
> query, it caches at various levels (both a "packet cache" and a db
> cache), so its database needs are quite modest.
>
> Some raw numbers: We're only looking at a total of about six tables in
> one db.  In total there are going to be well under 10,000 records in ALL
> tables.  That might increase to at most 100,000 in the next few years.


Unless those DNS records are large (I guess not), the db should reside
entirely in memory, at least at the beginning.  I am also guessing, but
you should validate it, that your tables, with 1,500 records on average,
probably don't even need an index, as the engine normally does a
sequential scan over small datasets.
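
That guess is easy to check directly with EXPLAIN.  This is only a
sketch against a hypothetical "records" table from a PowerDNS-style
schema, but with row counts this small the planner will typically
choose a sequential scan even when an index exists:

```sql
-- Hypothetical table name; with only a few thousand rows the planner
-- usually judges a sequential scan cheaper than an index scan.
EXPLAIN ANALYZE
SELECT * FROM records WHERE name = 'www.example.com';
```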


--
Vincent Veyron
http://marica.fr/
Logiciel de gestion des sinistres et des contentieux pour le service juridique


Re: Tuning for a tiny database

From
Greg Smith
Date:
On 06/21/2011 01:49 AM, CSS wrote:
> Some raw numbers: We're only looking at a total of about six tables in
> one db.  In total there are going to be well under 10,000 records in ALL
> tables.  That might increase to at most 100,000 in the next few years.
> Our raw DNS queries/second tops out around 50 qps over three distinct
> servers.  Keeping in mind that PowerDNS is doing heavy caching, we
> should never really see more than a few db queries per second.

I doubt you really need to do any tuning for this scenario.  I would set
shared_buffers to a modest value--maybe 256MB--and stop further tuning
until there's some evidence it's necessary.
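
In postgresql.conf terms that is a one-line change (256MB here is only
the modest starting point suggested above; everything else can stay at
its default):

```
# postgresql.conf
shared_buffers = 256MB    # default is far lower; raise only if evidence demands it
```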

If presented with the same problem but with the much harder twist "I
need to support >10,000 queries/second", I would recommend:

-Populate a prototype with a representative amount of data
-Measure the database size
-Set shared_buffers to that
-Whenever the database is restarted, construct a series of queries that
forces all the data used regularly into the database's cache
-Use pg_buffercache to confirm what's in there is what you expect
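
In psql, the measurement steps above might look like this (the database
name "pdns" is hypothetical, and the pg_buffercache contrib module must
be installed first):

```sql
-- Measure the database size
SELECT pg_size_pretty(pg_database_size('pdns'));

-- After warming the cache, see which relations occupy shared_buffers
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```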

Getting all the data into cache is sometimes harder than expected.  Some
optimizations in PostgreSQL keep it from caching large amounts of a
table when you do a sequential scan of its contents, as one example that
complicates things.  But if you get to where this is necessary, building
such a tool isn't difficult, and there are some projects out there that
address this particular need:  filling the cache back up with relevant
data after restart.  This is the main one:

http://pgfoundry.org/projects/pgfincore/
http://www.pgcon.org/2010/schedule/events/261.en.html
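
Lacking such a tool, the crudest warm-up is just a handful of queries
run after each restart that touch the hot tables (table names here are
hypothetical, and per the seq-scan caveat above this may not fully
populate shared_buffers for big tables):

```sql
-- Crude cache warm-up after a restart: touch every row of the hot tables.
SELECT count(*) FROM domains;
SELECT count(*) FROM records;
```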

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Tuning for a tiny database

From
Steve Atkins
Date:
On Jun 20, 2011, at 10:49 PM, CSS wrote:

> Hello,
>
> I couldn't find much information in the archives on this -- perhaps this
> is a bit of a specialized need, but I was hoping for some input from
> some experienced postgres admins.
>
> I'm moving some DNS servers from djbdns/tinydns to PowerDNS.  While it
> supports many backends, postgresql seems like the best choice for us
> since it's what is used elsewhere (in larger configurations).  As a bit
> of background, PowerDNS does not query the db for every incoming DNS
> query, it caches at various levels (both a "packet cache" and a db
> cache), so its database needs are quite modest.
>
> Some raw numbers: We're only looking at a total of about six tables in
> one db.  In total there are going to be well under 10,000 records in ALL
> tables.  That might increase to at most 100,000 in the next few years.
> Our raw DNS queries/second tops out around 50 qps over three distinct
> servers.  Keeping in mind that PowerDNS is doing heavy caching, we
> should never really see more than a few db queries per second.  There
> will be one "master" pgsql db and three slaves using streaming replication.
>
> Now given the number of records and the frequency of queries, how should
> I be tuning for such a small setup?  Ideally PowerDNS with its huge
> amount of caching should get as much RAM and CPU as I can give it, but I
> do want to ensure the tiny bit of data postgres has is stuck in physical
> memory as well.
>
> What would you suggest for this scenario?

I'm doing pretty much the same thing. I installed postgresql 8.4.something and
powerdns from the ubuntu repository on a small VM and hit go. It's running just
fine, at comparable query rates and maybe 5,000 records. At some point I'll
tune it, just out of tidiness, but it's fine out of the box.

You'll also need replication, probably from a stealth master. I found bucardo
very easy to set up for that, and it works nicely to replicate to multiple public
servers from a single master server inside the firewall.

Cheers,
  Steve


Re: Tuning for a tiny database

From
CSS
Date:
Greg Smith wrote:
> On 06/21/2011 01:49 AM, CSS wrote:
>> Some raw numbers: We're only looking at a total of about six tables in
>> one db.  In total there are going to be well under 10,000 records in ALL
>> tables.  That might increase to at most 100,000 in the next few years.
>> Our raw DNS queries/second tops out around 50 qps over three distinct
>> servers.  Keeping in mind that PowerDNS is doing heavy caching, we
>> should never really see more than a few db queries per second.
>
> I doubt you really need to do any tuning for this scenario.  I would set
> shared_buffers to a modest value--maybe 256MB--and stop further tuning
> until there's some evidence it's necessary.

Sounds good.  I'm at... wait for it... 8MB now.  This seems to work.
Since I originally posted this I've been benchmarking powerdns and I've
found that with it only hitting one core I can do about 50,000
queries/second.  During that time, Postgres is not even a blip.  I think
with the heavy caching powerdns does, it just doesn't really hit
postgres until it sees something in the cache has reached the max TTL.

In short, heavy DNS query traffic is not in turn causing heavy db
traffic.  This might change with someone pummeling it with queries for
domains or hosts that don't exist, but I believe it handles negative
caching of records as well.

> If presented with the same problem but with the much harder twist "I
> need to support >10,000 queries/second", I would recommend:
>
> -Populate a prototype with a representative amount of data
> -Measure the database size
> -Set shared_buffers to that

Thank you.  I'm fairly green with Postgres, so this is very helpful.

> -Whenever the database is restarted, construct a series of queries that
> forces all the data used regularly into the database's cache
> -Use pg_buffercache to confirm what's in there is what you expect

I need to read up on that module, but it looks quite intriguing.

> Getting all the data into cache is sometimes harder than expected.  Some
> optimizations in PostgreSQL keep it from caching large amounts of a
> table when you do a sequential scan of its contents, as one example that
> complicates things.  But if you get to where this is necessary, building
> such a tool isn't difficult, and there are some projects out there that
> address this particular need:  filling the cache back up with relevant
> data after restart.  This is the main one:
>
> http://pgfoundry.org/projects/pgfincore/
> http://www.pgcon.org/2010/schedule/events/261.en.html

Thanks again, I appreciate all your input on what's currently looking
like a non-problem.  But there are other places I can certainly use this
info.

Charles


Re: Tuning for a tiny database

From
CSS
Date:
Steve Atkins wrote:
> On Jun 20, 2011, at 10:49 PM, CSS wrote:
>
>> Hello,
>>
>> I couldn't find much information in the archives on this -- perhaps this
>> is a bit of a specialized need, but I was hoping for some input from
>> some experienced postgres admins.
>>
>> I'm moving some DNS servers from djbdns/tinydns to PowerDNS.  While it
>> supports many backends, postgresql seems like the best choice for us
>> since it's what is used elsewhere (in larger configurations).  As a bit
>> of background, PowerDNS does not query the db for every incoming DNS
>> query, it caches at various levels (both a "packet cache" and a db
>> cache), so its database needs are quite modest.
>>
>> Some raw numbers: We're only looking at a total of about six tables in
>> one db.  In total there are going to be well under 10,000 records in ALL
>> tables.  That might increase to at most 100,000 in the next few years.
>> Our raw DNS queries/second tops out around 50 qps over three distinct
>> servers.  Keeping in mind that PowerDNS is doing heavy caching, we
>> should never really see more than a few db queries per second.  There
>> will be one "master" pgsql db and three slaves using streaming replication.
>>
>> Now given the number of records and the frequency of queries, how should
>> I be tuning for such a small setup?  Ideally PowerDNS with its huge
>> amount of caching should get as much RAM and CPU as I can give it, but I
>> do want to ensure the tiny bit of data postgres has is stuck in physical
>> memory as well.
>>
>> What would you suggest for this scenario?
>
> I'm doing pretty much the same thing. I installed postgresql 8.4.something and
> powerdns from the ubuntu repository on a small VM and hit go. It's running just
> fine, at comparable query rates and maybe 5,000 records. At some point I'll
> tune it, just out of tidiness, but it's fine out of the box.

That's pretty much what I'm seeing.  As I mentioned in my last email in
this thread, the only other test case I need to poke around with is
hitting the powerdns server with a bunch of stuff it's not authoritative
for (forcing a lookup in the domains table) and non-existent records in
domains it is authoritative for (forcing a lookup in the records table).

> You'll also need replication, probably from a stealth master. I found bucardo
> very easy to set up for that, and it works nicely to replicate to multiple public
> servers from a single master server inside the firewall.

Already there - I have a hidden master db server and then the three
public powerdns instances all slave from that.  I'm using the
"streaming" replication option in 9.0 and so far it seems to be working
well.  I need to look more closely at the whole log-shipping thing in
case a slave somehow gets cut off from the master, but in initial
testing, it looks really good.  My only problem was that I initially
put some of these on older hosts and ran into the 32- vs. 64-bit
incompatibility when copying the data directory across.

Thanks for your input!

Charles

> Cheers,
>   Steve
>
>