shardman : Documentation: 14: 3.4. Connecting and Working with a Shardman Cluster : Postgres Professional

3.4. Connecting and Working with a Shardman Cluster
Prev	Up	Chapter 3. Develop	Home	Next

3.4. Connecting and Working with a Shardman Cluster #

3.4.1. SQL
3.4.2. psql/libpq
3.4.3. Python
3.4.4. Java
3.4.5. Go

As explained in Section 3.1.2, the cluster considered consists of four shards. This is how the data partitions of the main sharded table are distributed across shards.

For the ticket_no sharding key:

tickets_0 — shard-1 (cluster node node1)
tickets_1 — shard-2 (cluster node node2)
tickets_2 — shard-3 (cluster node node3)
tickets_3 — shard-4 (cluster node node4)

For the book_ref sharding key:

bookings_0 — shard-1 (cluster node node1)
bookings_1 — shard-2 (cluster node node2)
bookings_2 — shard-3 (cluster node node3)
bookings_3 — shard-4 (cluster node node4)

The examples below are provided for the book_ref sharding key, but the code in the subsections is suitable for the ticket_no sharding key.

Do not treat this code as optimal or use it in a production environment. It only shows how to implement creation of a connection pull to work with a Shardman cluster.

What is common for all the examples is the cluster connection string, which must contain node names, TCP port numbers, user name and password, database name for connection and a set of session parameters.

You can get this string using the shardmanctl utility. In the simplest case, the string looks like this:

$ shardmanctl getconnstr

dbname=postgres host=node1,node2,node3,node4 port=5432,5432,5432,5432

You can get this string to connect to cluster nodes or to create the connection pool in applications.

3.4.1. SQL #

A few convenient functions and views are implemented in Shardman that add cluster observability by:

Listing global tables
Listing sharded tables
Listing global sequences
Finding the shard number from the value of the sharding key
Perfroming ANALYZE for all the global and sharded tables in the cluster

3.4.1.1. Listing Global Tables #

To display all global tables in the cluster, use the shardman.global_tables view:

postgres=# select
    relname as table_name,
    nspname as schema
from shardman.global_tables;

 table_name |  schema
------------+----------
 aircrafts  | bookings
 seats      | bookings
 airports   | bookings
 flights    | bookings
(4 rows)

3.4.1.2. Listing Sharded Tables #

To display information on all the sharded tables in the cluster, query the shardman.sharded_tables view as follows:

postgres=# select
    relname as table_name,
    nparts  as partitions,
    colocated_with::oid::regclass::text as colocated_with,
    nspname as schema
from shardman.sharded_tables;

   table_name    | partitions | colocated_with |  schema
-----------------+------------+----------------+----------
 bookings        |          4 |                | bookings
 ticket_flights  |          4 | bookings       | bookings
 tickets         |          4 | bookings       | bookings
 boarding_passes |          4 | bookings       | bookings
(4 rows)

3.4.1.3. Listing Global Sequences #

To display all the global sequences in the cluster, use the shardman.sequence view:

postgres=# select
    seqns   as schema,
    seqname as sequence_name,
    seqmin  as min_value,
    seqmax  as max_value,
    seqblk  as bulk_size
from shardman.sequence;

  schema  |      sequence_name      | min_value |      max_value      | bulk_size
----------+-------------------------+-----------+---------------------+-----------
 bookings | flights_flight_id_seq   |    262145 | 9223372036854775807 |     65536
(1 rows)

3.4.1.4. Finding the Shard Number from the Sharding Key Value #

To display the name of the partition that contains data and the replication group name, call the shardman.get_partition_for_value() function. For example, for book_ref = 0369E5:

postgres=# select * from shardman.get_partition_for_value(
            'bookings'::regclass,
            '0369E5'::character(6));

 rgid | local_nspname | local_relname | remote_nspname | remote_relname
------+---------------+---------------+----------------+----------------
    1 | bookings      | bookings_0    | bookings       | bookings_0

This output shows that the data is in the bookings_0 partition of the bookings table and is located on the node where the query was executed.

Let's create a query to display the name of the server where the partition with data is located. If we connect to the server that contains the partition, the server name is displayed as “current server”. If the data is on a different server, the hostname of the shard master is displayed:

SELECT p.rgid,
      local_relname AS partition_name,
      CASE
          WHEN r.srvid IS NULL THEN 'current server'
          ELSE (SELECT (SELECT split_part(kv, '=', 2)
                        FROM (SELECT unnest(fs.srvoptions) as kv) x
                        WHERE split_part(kv, '=', 1) = 'host')
                FROM shardman.repgroups rg
                         JOIN pg_catalog.pg_foreign_server AS fs ON fs.oid = rg.srvid
                WHERE rg.id = p.rgid)
          END       AS server_name
FROM shardman.get_partition_for_value('bookings'::regclass, '0369E5'::character(6)) p
        JOIN shardman.repgroups AS r ON
   r.id = p.rgid;


 rgid | partition_name |  server_name
------+----------------+----------------
    1 | bookings_0     | current server
(1 row)

Execution of this query with another value of the sharding key, 0369E6, produces the output:

 rgid | partition_name | server_name
------+----------------+-------------
    4 | bookings_3_fdw | node4
(1 row)

It is clear that the partition is on the node4 node.

Also note that the shardman.rgid parameter allows you to find the number of the node with the connection session. To do this, execute the query:

SELECT pg_catalog.current_setting('shardman.rgid');

You can use this value to determine the location of connection sessions for queries like discussed in this section.

The shardman.get_partition_for_value() is mainly designed for administration purposes, to better understand the data topology.

As a rule, do not use administration functions when writing SQL code for data access.

3.4.1.5. Understanding How Partitions of Sharded Tables Are Distributed Across Shards #

You can get the list of all sharded tables in the bookings schema, together with the number of partitions and their distribution across servers (shards) from Shardman metadata on any cluster node.

Consider the following query:

SELECT p.rel::regclass::text AS table_name,
      p.pnum,
      p.rgid,
      r.srvid,
      fs.srvname
FROM shardman.parts p
JOIN shardman.repgroups r
  ON p.rgid = r.id
LEFT OUTER JOIN pg_foreign_server fs
             ON r.srvid = fs.oid;

To learn how the data is distributed, let's combine this query with a subquery from Section 3.4.1.4:

SELECT p.rel::regclass AS table_name,
      st.nparts AS total_parts,
      p.pnum AS num_part,
      CASE
         WHEN r.srvid IS NULL THEN 'connected server'
      ELSE
       (SELECT split_part(kv, '=', 2)
       FROM (SELECT unnest(fs.srvoptions) AS kv) x
       WHERE split_part(kv, '=', 1) = 'host')
       END AS server_name
FROM shardman.parts p
       JOIN shardman.repgroups r
         ON p.rgid = r.id
       LEFT JOIN shardman.sharded_tables st
              ON p.rel = st.rel
       LEFT JOIN pg_foreign_server fs
              ON r.srvid = fs.oid
WHERE st.nspname = 'bookings'
ORDER BY table_name, num_part, server_name;

The output format is the table name, number of table partitions, partition number and server name:

        table_name        | total_parts | num_part |   server_name
--------------------------+-------------+----------+------------------
 bookings.bookings        |           4 |        0 | connected server
 bookings.bookings        |           4 |        1 | node2
 bookings.bookings        |           4 |        2 | node3
 bookings.bookings        |           4 |        3 | node4
 bookings.ticket_flights  |           4 |        0 | connected server
 bookings.ticket_flights  |           4 |        1 | node2
 bookings.ticket_flights  |           4 |        2 | node3
 bookings.ticket_flights  |           4 |        3 | node4
 bookings.tickets         |           4 |        0 | connected server
 bookings.tickets         |           4 |        1 | node2
 bookings.tickets         |           4 |        2 | node3
 bookings.tickets         |           4 |        3 | node4
 bookings.boarding_passes |           4 |        0 | connected server
 bookings.boarding_passes |           4 |        1 | node2
 bookings.boarding_passes |           4 |        2 | node3
 bookings.boarding_passes |           4 |        3 | node4

3.4.1.6. Collecting Statistics #

To collect statistics for sharded and global tables, call the shardman.global_analyze() function. This function first collects statistics for all local partitions of sharded tables on each node and then broadcasts this statistics to other nodes. For a global table, the function first collects statistics on a certain node and then the statistics is broadcast to all the other nodes.

3.4.2. psql/libpq #

To connect to a Shardman cluster and successfully work with it, it is sufficient to connect to one cluster node. To do this, first get the connection string.

The PostgreSQL documentation contains the description of the cluster connection string. The string can be specified using two formats: a keyword/value string and URI. Any of them can be used to connect to a Shardman cluster.

Some parameters must also be specified. The list of parameters is also available in the PostgreSQL documentation.

The value of target_session_attrs must be set to read-write. Only connections that allow read/write transactions are acceptable. If the connection to a cluster node is a success, the request “SHOW transaction_read_only;” is sent. If it returns on, the connection is closed. If several servers are specified in the connection string, other servers will be iterated through, the same way as with the failed connection attempt. The target_session_attrs parameter allows you to specify both masters and replicas of the Shardman cluster.

The following examples illustrate the connection:

psql -d  "dbname=postgres host=node3,node4,node2,node1 port=5432,5432,5432,5432 user=username password=password target_session_attrs=read-write"

psql postgres://username:password@node1:5432,node2:5432,node3:5432,node4:5432/postgres?target_session_attrs=read-write

3.4.3. Python #

Connection to a Shardman cluster using the psycopg2 library looks like this:

import psycopg2
from psycopg2 import pool

pool = psycopg2.pool.SimpleConnectionPool(
                             min_size=1,
                             max_size=5,
                             user="pguser",
                             password="*****",
                             host="node1,node2,node3,node4",
                             port="5432,5432,5432,5432",
                             database="postgres",
                             target_session_attrs="read-write")

connection = pool.getconn()

A connection pool with the following parameters is created: the minimum and maximum number of connections min_size=1 and max_size=5. Then a specific connection to the cluster is selected, the user login and password are specified, as well as the list of nodes and TCP ports, database and connection parameters (see Section 3.4.2 for more information).

3.4.4. Java #

Connection to a Shardman cluster using JDBC looks like this:

String url = "jdbc:postgresql://node1:5432,node2:5432,node3:5432,node4:5432/postgres?loadBalanceHosts=true&targetServerType=primary";
Properties props = new Properties();


props.setProperty("user","postgres");
props.setProperty("password","********");

Connection conn = DriverManager.getConnection(url, props);

url contains the connection string, where all the available shard masters are listed. If no additional connection parameters of the JDBC driver are specified, connection to the cluster is performed through the first node available for connection. This is not always convenient. Therefore, connection string settings are added that allow using different cluster shards for different connections.

loadBalanceHosts=true allows iterating through nodes connecting to one of them, and targetServerType=primary indicates a need to only choose masters, then replicas can be added to the connection string.

3.4.5. Go #

Ways to connect to a Shardman cluster for Go are pretty much the same as those accepted in Java or Python. You need to specify lists of nodes, their TCP ports, as well as connection parameters and choose a suitable driver.

One of these drivers for Go is pgx version 4 or 5.

The following is an example of a connection string and creation of a pool for connecting to a cluster:

dbURL := "postgres://username:password@node1:5432,node2:5432,node3:5432,node4:5432/postgres?target_session_attrs=read-write")
dbPool, err := pgxpool.New(context.Background(), dbURL)

Also pay attention to the description of the target_session_attrs parameter.

Prev	Up	Next
3.3. Queries	Home	Chapter 4. Additional Features

epub pdf