Details
Description
HA-C consists of nodes, each node being a Nexus instance.
It is common practice to put a load balancer in front of an HA-C cluster. It is anticipated that one node may be deliberately brought offline while the others remain functioning.
A load balancer needs to reliably determine the health of an individual node in order to reroute requests to available nodes.
Our documentation currently suggests using:
http://<serveripaddress>:<port>/service/metrics/data
but that endpoint
- requires authentication, unless the anonymous user is granted access
- returns a lot of data irrelevant to server health and not appropriate for an anonymous user
Expected
Each node in an HA-C cluster should expose an endpoint that a load balancer can use to determine the "health" of the node with regard to its ability to take a share of the incoming cluster load.
As a guideline, the endpoint should meet the requirements for "health checks" as defined by common load balancers such as ELB, nginx, and Apache httpd:
https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-healthchecks.html
https://docs.nginx.com/nginx/admin-guide/load-balancer/http-health-check/
https://httpd.apache.org/docs/2.4/mod/mod_proxy_hcheck.html
Example
- anything other than an HTTP 200 status code indicates the node is not ready to do work
- require no authentication by default, but MAY have a privilege specific to the endpoint
- the endpoint MAY provide a response body that provides information about health, but in order to determine health, it MUST NOT be required that the client actually parse this response body
- HTTP should be the primary protocol used to check node health, served on the main HTTP(S) connector of the Nexus instance, since this greatly simplifies navigating firewalls
- if the cluster/node is read-only, document whether this affects the status code as far as a load balancer is concerned
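To illustrate the requirements above from the load balancer's point of view, here is a minimal Python sketch of a probe that treats a node as healthy only when the endpoint answers with HTTP 200, and never reads the response body. The function name `node_is_healthy`, the host, the port, and the two-second timeout are assumptions made for illustration; the path `/service/rest/v1/status` is taken from the linked issue and may not be the endpoint that is finally chosen.

```python
import http.client
import urllib.error
import urllib.request


def node_is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200.

    The response body is deliberately never read or parsed.
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Any other success code (for example 204) counts as "not ready".
            return response.status == 200
    except urllib.error.HTTPError:
        # 4xx/5xx responses raise HTTPError: the node is not ready for work.
        return False
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, malformed response.
        return False


if __name__ == "__main__":
    # Hypothetical node address; replace with a real node of the cluster.
    print(node_is_healthy("http://nexus-node-1.example.com:8081/service/rest/v1/status"))
```

Note that `urllib` follows redirects by default, so a redirect that ends in a 200 response would count as healthy in this sketch; a real load balancer may behave differently.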
Issue Links
- causes NEXUS-18949 /service/rest/v1/status returns 200 status code when node is read-only (Closed)