Streaming Replication clusters and load balancing - Mailing list pgsql-hackers

From James Sewell
Subject Streaming Replication clusters and load balancing
Date
Msg-id CANkGpBu1hTEijzYRTo-RG02k=wNPJi7bvT2kyR_YxR3jFoaMDg@mail.gmail.com
Whole thread Raw
Responses Re: Streaming Replication clusters and load balancing  (Dmitry Vasilyev <d.vasilyev@postgrespro.ru>)
List pgsql-hackers
Hello all,

I have recently been working with PostgreSQL and HAProxy to provide seamless load balancing to a group of database servers. This on it's own isn't a hard thing: I have an implementation finished and am now thinking about the best way to bring it to a production ready state which could be used by others, and used in load-balancers other than HAProxy with minimal config changes.

My initial requirements were:

Given a group of PostgreSQL servers check each x seconds and:
  • Allow read/write access only to the master server (via IPA / portA)
    • Disallow access if there are multiple master servers
    • Allow read access to all servers (via IPB / portB) as long as the following holds:
      • They are attached to the current master server via streaming replication (or they are the current master server)
      • They can currently contact the master server (safest option, disallow all access when master-less)
      • They are in the same timeline as the master server (do I need this check?)
      • The master server reports that they have less than x bytes lag
      HAProxy can talk to PostgreSQL for a health check via TCP or PSQL (connection check only). Neither of these allow the logic above - therefore this logic has to be hosted outside of HAProxy. This might change in the future if HAProxy gets the ability to send SQL statements (like an F5 can).

      Today the best way to provide this information to  HAProxy (and many other load balancers, application frameworks, proxies, monitoring solutions) is via HTTP, where HTTP 200 is pass the check and HTTP 503 is fail the check (and don't load balance to this node). In my case I have a script which accepts HTTP requests to /read to check if this node is available for read only and /write which checks if this node is available for read/writes.

      The options as I see them are:
      • Implement a script / small app which connects to PostgreSQL and executes these checks
        • Implemented and proven today at many independent sites
        • Should it run on HAProxy server or PSQL server? 
        • Integrated HTTP server  or x.inetd script?
        • Platform independence?
        • What if it dies?
      • Implement a custom PostgreSQL BGworker which provides this information over HTTP
        • No outside of PostgreSQL config needed
        • No reliance on another daemon / interface being up
        • libmicrohttpd or similar should help with  platform independence
        • Security / acceptance by community?
        • Only newer versions of PostgreSQL
      • Spend the time working on getting SQL checks into HAProxy
        • What about other platforms which only support HTTP?
      I think all of the options would benefit from a PSQL extension which does the following:
      • Encapsulates the check logic (easier to upgrade, manipulate)
      • Stores historic check data for a number of hours / days / months
      • Stores defaults (override via HTTP could be possible for things like lag)
      Does anyone else have any thoughts on this topic? 

      Eventually many cool features could flow out of this kind of work:
      • Integration with High Availability products - I have this working with EnterpriseDB EFM now.
        • Locate the current master using the HA product 
          • more than one master doesn't cause loss of service as long as HA state is sane
        • Locate all clustered standby servers using the HA product
          • if a standby is removed from the HA cluster, it is removed from load balancing
          • if a standby is not part of the cluster, it is removed from load balancing (even if it is part of streaming replication)
      • HTTP replication status requests which facilitate dynamically managing HAProxy (or other) PostgreSQL server pools
        • Add a node to streaming replication, it automatically shows up in the pool and starts being checked to see if it can service reads
        • Great for cloud scale out
      • Allocation of additional load balancer groups based on some criteria (?), for example
        • read/write (as above)
        • read only (as above)
        • data warehouse (reporting reads only)
        • DR (replica with no reads or writes - until it becomes a master)
      Keen to hear comments.

      Cheers,

      James Sewell,
      Solutions Architect
      ______________________________________
       

      Level 2, 50 Queen St, Melbourne VIC 3000

      (+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
       


      The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

      pgsql-hackers by date:

      Previous
      From: Jeevan Chalke
      Date:
      Subject: TEXT vs VARCHAR join qual push down diffrence, bug or expected?
      Next
      From: Marcin Mańk
      Date:
      Subject: Re: [patch] Proposal for \rotate in psql