  1. Dev - Nexus Repo
  2. NEXUS-18234

provide a health endpoint for an individual node in an HA-C cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 3.14.0
    • Fix Version/s: 3.15.0
    • Component/s: HA
    • Story Points: 3

      Description

      HA-C consists of nodes, each node being a Nexus instance.

      It is common practice to put a load balancer in front of an HA-C cluster. It is anticipated that one node may be deliberately taken offline while the others keep functioning.

      A load balancer needs to reliably determine the health of an individual node in order to reroute requests to available nodes.
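
      To illustrate the rerouting described above, here is a minimal sketch of round-robin selection that skips nodes marked unhealthy. The function name, the node names, and the `health` mapping are hypothetical and only for illustration; real load balancers implement this internally.

```python
def next_available(nodes, health, start=0):
    """Pick the next healthy node in round-robin order.

    nodes  -- ordered list of node identifiers (hypothetical names)
    health -- dict mapping node identifier to True (healthy) or False
    start  -- index at which to begin the search

    Returns (node, next_start), or (None, start) if no node is healthy.
    """
    total = len(nodes)
    for offset in range(total):
        index = (start + offset) % total
        node = nodes[index]
        # A node with no recorded health state is treated as unavailable.
        if health.get(node, False):
            return node, (index + 1) % total
    return None, start


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    health = {"node-a": True, "node-b": False, "node-c": True}
    cursor = 0
    for _ in range(4):
        node, cursor = next_available(nodes, health, cursor)
        print(node)
```

      The quality of this selection depends entirely on how accurately `health` reflects each node, which is why a reliable per-node health signal is needed.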

      Our documentation currently suggests using:

      http://<serveripaddress>:<port>/service/metrics/data

      but that endpoint

      • requires authentication, unless the anonymous user is granted access
      • returns a lot of data irrelevant to server health and not appropriate for an anonymous user

      Expected

      Each node in an HA-C cluster should expose an endpoint that a load balancer can use to determine the "health" of the node with regard to its ability to take a share of the incoming cluster load.

      As a guideline, the endpoint should meet the requirements for "health checks" as defined by common load balancers such as ELB, nginx, and Apache httpd:

      https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-healthchecks.html
      https://docs.nginx.com/nginx/admin-guide/load-balancer/http-health-check/
      https://httpd.apache.org/docs/2.4/mod/mod_proxy_hcheck.html

      Example

      • anything other than an HTTP 200 status code indicates the node is not ready to do work
      • the endpoint should require no authentication by default, but MAY have a privilege specific to the endpoint
      • the endpoint MAY provide a response body that provides information about health, but in order to determine health, it MUST NOT be required that the client actually parse this response body
      • HTTP, on the main HTTP(S) connector of the Nexus instance, should be the primary protocol used to check node health, since this greatly simplifies navigating firewalls
      • if the cluster/node is read-only, document whether this affects the status code as far as a load balancer is concerned
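
      As a rough sketch of a client-side check that follows the rules above: only HTTP 200 counts as healthy, no credentials are sent, and the response body is never parsed. The function name, the `opener` parameter, and the URL in the usage line are assumptions for illustration; this ticket does not specify the endpoint path.

```python
import urllib.error
import urllib.request


def is_node_healthy(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True only when the endpoint answers with HTTP 200.

    url     -- health endpoint of a single node (path is an assumption)
    timeout -- seconds to wait before treating the node as unavailable
    opener  -- callable with the urlopen signature, injectable for testing
    """
    try:
        with opener(url, timeout=timeout) as response:
            # The body is deliberately ignored; the status code alone decides.
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # HTTP error statuses, refused connections, DNS failures and
        # timeouts all mean "do not route work to this node".
        return False


if __name__ == "__main__":
    # Hypothetical host, port and path, shown only to illustrate usage.
    print(is_node_healthy("http://localhost:8081/health-endpoint-placeholder"))
```

      Because every failure mode collapses to a single "not healthy" result, the same check works regardless of whether the endpoint returns a body.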

              People

              Assignee: Matt Johnson (mjohnson)
              Reporter: Peter Lynch (plynch)
              Last Updated By: Peter Lynch
              Team: NXRM - Morpheus
