shardmand
shardmand — Shardman configuration daemon
Synopsis
shardmand [common_options] [--system-bus] [--user user_name]
Here common_options are:
[--cluster-name cluster_name] [--log-level error | warn | info | debug] [--retries retries_number] [--session-timeout seconds] [--store-endpoints store_endpoints] [--store-ca-file store_ca_file] [--store-cert-file store_cert_file] [--store-key client_private_key] [--store-timeout duration] [--version] [-h | --help] [--log-format]
Description
shardmand is a Shardman configuration daemon. It runs on each node in a Shardman cluster, subscribes to changes of the shardman/cluster0/data/ladle and shardman/cluster0/data/cluster keys in the etcd store (cluster0 is the default cluster name used by Shardman utilities), and manages Shardman processes on the node where it is running according to the configuration described in these JSON documents.
shardmand manages integrated keepers and sentinels. On startup and when one of the monitored etcd keys changes, shardmand reconfigures them as follows:
1. It calculates the expected node configuration, i.e., the list of keepers and sentinels expected to run and their configurations, from the shardman/cluster0/data/ladle and shardman/cluster0/data/cluster values.
2. It receives the list of running keepers and sentinels with their configurations from the internal process manager.
3. It stops processes that are not expected to run. This can be a process that belongs to a cluster with the same name, but a different UUID, or a process whose description is no longer present in the expected node configuration. For keeper processes, shardmand purges their data directory.
4. If a process should be running, but its settings are different from the expected ones, shardmand updates the configuration and restarts the process. If a process should be running, but it is not running, shardmand starts it.
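The reconfiguration steps above can be sketched as a pure planning function. This is only an illustration under stated assumptions, not the shardmand implementation: the Proc record, its fields, and the action names are hypothetical, and configurations are treated as opaque strings compared for equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """Hypothetical description of a managed process (not a shardmand type)."""
    name: str          # unique process name on the node
    kind: str          # "keeper" or "sentinel"
    cluster_uuid: str  # UUID of the cluster the process belongs to
    config: str        # opaque configuration, compared for equality


def plan(expected, running):
    """Return an ordered list of (action, process_name) pairs."""
    want = {p.name: p for p in expected}
    have = {p.name: p for p in running}
    actions = []

    # Stop processes that are not expected to run: either the description
    # is gone or the process belongs to a cluster with a different UUID.
    for name, proc in have.items():
        target = want.get(name)
        if target is None or target.cluster_uuid != proc.cluster_uuid:
            actions.append(("stop", name))
            if proc.kind == "keeper":
                # Stopped keepers also get their data directory purged.
                actions.append(("purge_data_dir", name))

    # Start missing processes and restart those whose settings differ.
    for name, target in want.items():
        proc = have.get(name)
        if proc is None or proc.cluster_uuid != target.cluster_uuid:
            actions.append(("start", name))
        elif proc.config != target.config:
            actions.append(("update_and_restart", name))
    return actions
```

Separating planning from execution keeps the comparison logic easy to test; a real daemon would then hand each action to its process manager.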
Also, a separate thread of shardmand periodically updates the shardman/cluster0/data/shardmand/NODENAME etcd key with the ClusterUUID of the last cluster to which the configuration was applied. This way, before the shardmanctl nodes add command tries to initialize new stolon clusters for a clover, it can ensure that no live stolon threads from a previous cluster configuration are left on any node in the clover.
Additionally, shardmand starts two HTTP servers in separate threads. If the server ports match, a single server that performs both roles is started. The first server provides the following metrics: shardmand_etcd_unavailable_time_seconds, shardmand_healthy_keepers, shardmand_sentinels, shardmand_uptime, shardmand_etcd_errors_total, shardmand_reconfigurations_number_total, shardmand_demotions_number_total. This server also provides a /healthz endpoint for shardmand health checks. The second server provides the following endpoints:
/shardmand/v1/replica: returns the 200 status code if a secondary instance is running on the node and the 500 status code if a master instance is running on the node.
/shardmand/v1/master: returns the 200 status code if a master instance is running on the node and the 500 status code if a secondary instance is running on the node. If both master and secondary instances are running on the node, the /shardmand/v1/replica and /shardmand/v1/master endpoints return the 404 status code.
/shardmand/v1/status: returns information about the shardmand status.
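As an illustration of how a client might interpret the /shardmand/v1/master and /shardmand/v1/replica status codes described above, here is a minimal sketch. The function names probe and node_role and the role labels are hypothetical, and any combination of codes not described above is reported as "unknown".

```python
import urllib.error
import urllib.request


def probe(base_url, path, timeout=2.0):
    """Return the HTTP status code of a GET request to base_url + path."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still what we need.
        return err.code


def node_role(master_status, replica_status):
    """Map the two endpoint status codes to a role label (hypothetical names)."""
    if master_status == 200 and replica_status == 500:
        return "master"
    if master_status == 500 and replica_status == 200:
        return "replica"
    if master_status == 404 and replica_status == 404:
        return "mixed"  # both master and secondary instances run on the node
    return "unknown"
```

For example, assuming a node named n1 with the API server on the default port 15432, node_role(probe("http://n1:15432", "/shardmand/v1/master"), probe("http://n1:15432", "/shardmand/v1/replica")) yields the role label for that node.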
All Shardman services are managed by shardmand@cluster0.service, so when it is started, stopped or restarted, it also starts, stops or restarts all other Shardman processes (including DBMS instances).
Command-line Reference
This section describes shardmand-specific command-line options. For Shardman common options used by the commands, see the section called “Common Options”.
Common Options
shardmand common options are optional parameters that are not specific to the utility. They specify etcd connection settings, cluster name and a few more settings. By default shardmand tries to connect to the etcd store 127.0.0.1:2379 and use the cluster0 cluster name. The default log level is info.
-h, --help
    Show brief usage information.

--cluster-name cluster_name
    Specifies the name for a cluster to operate on. The default is cluster0.

--log-level level
    Specifies the log verbosity. Possible values of level are (from minimum to maximum): error, warn, info and debug. The default is info.

--retries number
    Specifies how many times shardmand retries a failing etcd request. If an etcd request fails, most likely due to a connectivity issue, shardmand retries it the specified number of times before reporting an error. The default is 5.

--session-timeout seconds
    Specifies the session timeout for shardmand locks. If there is no connectivity between shardmand and the etcd store for the specified number of seconds, the lock is released. The default is 30.

--store-endpoints string
    Specifies the etcd address in the format: http[s]://address[:port](,http[s]://address[:port])*. The default is http://127.0.0.1:2379.

--store-ca-file string
    Verify the certificate of the HTTPS-enabled etcd store server using this CA bundle.

--store-cert-file string
    Specifies the certificate file for client identification by the etcd store.

--store-key string
    Specifies the private key file for client identification by the etcd store.

--store-timeout duration
    Specifies the timeout for an etcd request. The default is 5 seconds.

--monitor-port number
    Specifies the port for the shardmand HTTP server for metrics and probes. The default is 15432.

--api-port number
    Specifies the port for the shardmand HTTP API server. The default is 15432.

--version
    Show shardman-utils version information.
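The --store-endpoints format given above can be checked before the value is passed to shardmand. The following is a minimal sketch, assuming plain host names or IPv4 addresses (bracketed IPv6 literals are not handled); parse_store_endpoints is a hypothetical helper, not part of Shardman.

```python
import re

# One endpoint: http[s]://address[:port]
_ENDPOINT = re.compile(r"(https?)://([^\s,:/]+)(?::(\d+))?")


def parse_store_endpoints(value):
    """Split an endpoints string into (scheme, address, port) tuples.

    The port is None when it is omitted in the input.
    """
    endpoints = []
    for item in value.split(","):
        match = _ENDPOINT.fullmatch(item)
        if match is None:
            raise ValueError(f"invalid endpoint: {item!r}")
        scheme, address, port = match.groups()
        endpoints.append((scheme, address, int(port) if port else None))
    return endpoints
```

For instance, parse_store_endpoints("http://127.0.0.1:2379") returns a single tuple for the default endpoint, while a value with a missing scheme raises ValueError.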
Environment
A shardmand service reads the environment from /etc/shardman/shardmand-cluster0.env. The following environment variables affect the behavior of shardmand.

SDM_CLUSTER_NAME
    An alternative to setting the --cluster-name option

SDM_LOG_LEVEL
    An alternative to setting the --log-level option

SDM_RETRIES
    An alternative to setting the --retries option

SDM_SYSTEM_BUS
    An alternative to setting the --system-bus option

SDM_STORE_ENDPOINTS
    An alternative to setting the --store-endpoints option

SDM_STORE_CA_FILE
    An alternative to setting the --store-ca-file option

SDM_STORE_CERT_FILE
    An alternative to setting the --store-cert-file option

SDM_STORE_KEY
    An alternative to setting the --store-key option

SDM_STORE_TIMEOUT
    An alternative to setting the --store-timeout option

SDM_SESSION_TIMEOUT
    An alternative to setting the --session-timeout option

SDM_USER
    An alternative to setting the --user option
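The variables listed above all follow one naming pattern: the SDM_ prefix, then the long option name in upper case with hyphens replaced by underscores. The sketch below uses that observed pattern to render env file lines. Both helper names are hypothetical, and the pattern is only known to hold for the options listed above, so a generated name for any other option may not be recognized by shardmand.

```python
def env_name(option):
    """Map a long option such as --store-ca-file to SDM_STORE_CA_FILE."""
    if not option.startswith("--") or len(option) == 2:
        raise ValueError(f"expected a long option, got {option!r}")
    return "SDM_" + option[2:].replace("-", "_").upper()


def render_env_file(settings):
    """Render {option: value} pairs as NAME=value lines, one per line."""
    return "".join(f"{env_name(opt)}={val}\n" for opt, val in settings.items())
```

The rendered text could then be written to the env file that the shardmand service reads.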
Examples
Configuring a shardmand Service
shardmand settings are usually specified in the /etc/shardman/shardmand-cluster0.env file. If you want shardmand to connect to an etcd cluster at hosts n1-n3 using port 2379 and all Shardman services to use the debug log level, you can use the following env file:
SDM_STORE_ENDPOINTS=http://n1:2379,http://n2:2379,http://n3:2379
SDM_LOG_LEVEL=debug
Note that you need to restart the shardmand@cluster0 service to apply new settings from the env file.
Showing shardmand Logs
To look at shardmand logs, you can use a journalctl command:
$ journalctl -u shardmand@cluster0.service
Restarting Shardman Services
You can restart all Shardman services on a node using a systemctl command:
$ systemctl restart shardmand@cluster0.service