Details
-
Bug
-
Resolution: Fixed
-
Major
-
None
-
5
Description
If you restart a node while it's taking a backup then the node will preserve the read-only state, despite the backup task having been aborted. The following log entry during restart is where it restores the frozen state:
2018-05-28 02:17:08,146+0100 INFO [FelixStartLevel] 127.0.0.1 *SYSTEM org.sonatype.nexus.orient.internal.freeze.DatabaseFreezeServiceImpl - Restoring database frozen state on startup
The first thing to determine is whether this is acceptable behaviour - while it leaves NXRM in a degraded state (only returning previously cached content, disallowing write operations) it could be considered the safe option given NXRM was shutdown during backup. It is also not difficult to return NXRM to a writable state, using the UI or REST. If we decide this is "working-as-designed" then this ticket will verify that manual intervention can quickly return NXRM to normal service. We should also verify that the behaviour is the same whether NXRM is clustered or non-clustered.
However, if it's determined that this is not acceptable behaviour then this ticket will look at how to avoid leaving the freeze state if the backup task was aborted during shutdown. This may involve making sure we update the freeze state regardless of how the backup task ends. Note if the process is forcibly killed (eg. power-cut) then we won't get any chance to update it, but in that case it might be best to start as read-only as there may be data integrity issues. Another option would be to try and detect when backup was aborted and ignore restoring the freeze state - but that sounds more fragile. Other suggestions are welcome
While recreating these scenarios (restarting while non-clustered backup is in progress and restarting while clustered backup is in progress) also consider whether extra logging would be useful.
Summary:
- Confirm restarting while non-clustered backup is in progress leaves NXRM as read-only
- Confirm restarting while clustered backup is in progress leaves NXRM as read-only
- Get input from team / PO about desired behaviour
- Implement any change in behaviour (if necessary)
- Consider additional logging / recording reason for freeze