NEXUS-17609

Startup fails if a task was interrupted and NXRM is read-only or lacks quorum

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.13.0
    • Fix Version/s: 3.14.0
    • Component/s: HA, Scheduled Tasks
    • Labels:
      None

      Description

      NEXUS-9605 introduced a feature that detects tasks without a valid last run state and corrects them, using the time NXRM last shut down to estimate the run duration, etc. Unfortunately, if NXRM is in read-only mode or lacks quorum (in HA), the attempt to persist the updated last run state fails and aborts the startup process.
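
      As a rough illustration of the estimate described above, the sketch below derives a run duration from a task's recorded start time and the last shutdown time. The class and method names are hypothetical, and clamping negative results to zero is an assumption; this is not the NXRM implementation.

```java
/** Hypothetical helper; the names here are illustrative and not part of NXRM. */
final class InterruptedRunEstimate {
  private InterruptedRunEstimate() {
    // static helper only
  }

  /**
   * Estimates how long an interrupted task ran, in milliseconds, by treating
   * the last shutdown time as the moment the task stopped.
   * Assumption: the result is clamped to zero when the shutdown timestamp
   * precedes the recorded start.
   */
  static long estimateDurationMillis(long runStartedMillis, long lastShutdownMillis) {
    return Math.max(0L, lastShutdownMillis - runStartedMillis);
  }
}
```

      The estimated value still has to be written back to the task configuration, and that write is the step that fails in the two scenarios below.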

      When NXRM is read-only:

      2018-07-13 15:07:18,076+0100 WARN  [FelixStartLevel] *SYSTEM org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI - Updating lastRunState to interrupted for nexus.01e53b8b-869a-4866-859c-46a83c874919 taskConfig: {multinode=true, .name=Test, lastRunState.runStarted=1531490752646, .id=230c933c-df4e-4e16-a0e1-f7c3d5f158d5, .typeName=Admin - Execute script, language=groovy, source=while (true) {
        println 'ping'
        sleep(3000)
      }, .visible=true, .typeId=script, lastRunState.endState=INTERRUPTED, .updated=2018-07-12T21:44:24.353+01:00, .enabled=true, .message=Execute script, lastRunState.runDuration=0, .created=2018-07-12T20:54:55.533+01:00}
      2018-07-13 15:07:18,098+0100 WARN  [FelixStartLevel] *SYSTEM org.sonatype.nexus.quartz.internal.orient.JobStoreImpl - Execution failed
      com.orientechnologies.common.concur.lock.OModificationOperationProhibitedException: Modification requests are prohibited
      	DB name="config"
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.throwFreezeExceptionIfNeeded(OAtomicOperationsManager.java:358)
      	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.startAtomicOperation(OAtomicOperationsManager.java:197)
      	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.startStorageTx(OAbstractPaginatedStorage.java:3910)
      	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:1799)
      	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:541)
      	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:99)
      	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2908)
      	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2870)
      	at org.sonatype.nexus.orient.transaction.OrientTransaction.commit(OrientTransaction.java:83)
      	at org.sonatype.nexus.transaction.TransactionalWrapper.proceedWithTransaction(TransactionalWrapper.java:67)
      	at org.sonatype.nexus.transaction.Operations.transactional(Operations.java:200)
      	at org.sonatype.nexus.transaction.Operations.call(Operations.java:146)
      	at org.sonatype.nexus.orient.transaction.OrientOperations.call(OrientOperations.java:56)
      	at org.sonatype.nexus.quartz.internal.orient.JobStoreImpl.execute(JobStoreImpl.java:202)
      	at org.sonatype.nexus.quartz.internal.orient.JobStoreImpl.storeJob(JobStoreImpl.java:339)
      	at org.quartz.core.QuartzScheduler.addJob(QuartzScheduler.java:938)
      	at org.quartz.impl.StdScheduler.addJob(StdScheduler.java:273)
      	at org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI.updateLastRunStateInfo(QuartzSchedulerSPI.java:236)
      	at org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI.doStart(QuartzSchedulerSPI.java:201)
      	at org.sonatype.nexus.common.stateguard.StateGuardLifecycleSupport.start(StateGuardLifecycleSupport.java:67)
      	at org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI$$EnhancerByGuice$$8d457f36.CGLIB$start$25(<generated>)
      	at org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI$$EnhancerByGuice$$8d457f36$$FastClassByGuice$$dd138cd.invoke(<generated>)
      	at com.google.inject.internal.cglib.proxy.$MethodProxy.invokeSuper(MethodProxy.java:228)
      	at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:76)
      	at org.sonatype.nexus.common.stateguard.MethodInvocationAction.run(MethodInvocationAction.java:39)
      	at org.sonatype.nexus.common.stateguard.StateGuard$TransitionImpl.run(StateGuard.java:191)
      	at org.sonatype.nexus.common.stateguard.TransitionsInterceptor.invoke(TransitionsInterceptor.java:56)
      	at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:77)
      	at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:55)
      	at org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI$$EnhancerByGuice$$8d457f36.start(<generated>)
      	at org.sonatype.nexus.extender.NexusLifecycleManager.startComponent(NexusLifecycleManager.java:155)
      	at org.sonatype.nexus.extender.NexusLifecycleManager.to(NexusLifecycleManager.java:95)
      	at org.sonatype.nexus.extender.NexusContextListener.frameworkEvent(NexusContextListener.java:191)
      	at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1429)
      	at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:308)
      	at java.lang.Thread.run(Thread.java:748)
      

      When quorum is missing:

      2018-07-12 21:09:33,230+0100 WARN  [FelixStartLevel]  *SYSTEM org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI - Updating lastRunState to interrupted for nexus.01e53b8b-869a-4866-859c-46a83c874919 taskConfig: {multinode=true, .name=Test, lastRunState.runStarted=1531425302119, .id=230c933c-df4e-4e16-a0e1-f7c3d5f158d5, .typeName=Admin - Execute script, language=groovy, source=while (true) {
        println 'ping'
        sleep(3000)
      }, .visible=true, .typeId=script, lastRunState.endState=INTERRUPTED, .updated=2018-07-12T20:54:55.533+01:00, .enabled=true, lastRunState.runDuration=100881, .created=2018-07-12T20:54:55.533+01:00}
      2018-07-12 21:09:33,302+0100 ERROR [FelixStartLevel]  *SYSTEM com.orientechnologies.orient.core.db.OPartitionedDatabasePool$DatabaseDocumentTxPooled - Error on transaction commit `165EE0AD`
      com.orientechnologies.orient.server.distributed.ODistributedException: Quorum (2) cannot be reached on server 'E776F470-EFCEF342-6732907B-3327B80D-4B260AAF' database 'config' because it is major than available nodes (1)
              at com.orientechnologies.orient.server.distributed.impl.ODistributedDatabaseImpl.calculateQuorum(ODistributedDatabaseImpl.java:1055)
              at com.orientechnologies.orient.server.distributed.impl.ODistributedDatabaseImpl.send2Nodes(ODistributedDatabaseImpl.java:430)
              at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.sendRequest(ODistributedAbstractPlugin.java:589)
              at com.orientechnologies.orient.server.distributed.impl.ODistributedTransactionManager.commit(ODistributedTransactionManager.java:162)
              at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage.commit(ODistributedStorage.java:1426)
              at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:541)
              at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:99)
              at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2908)
              at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2870)
              at org.sonatype.nexus.orient.transaction.OrientTransaction.commit(OrientTransaction.java:83)
              at org.sonatype.nexus.transaction.TransactionalWrapper.proceedWithTransaction(TransactionalWrapper.java:67)
              at org.sonatype.nexus.transaction.Operations.transactional(Operations.java:200)
              at org.sonatype.nexus.transaction.Operations.call(Operations.java:146)
              at org.sonatype.nexus.orient.transaction.OrientOperations.call(OrientOperations.java:56)
              at org.sonatype.nexus.quartz.internal.orient.JobStoreImpl.execute(JobStoreImpl.java:202)
              at org.sonatype.nexus.quartz.internal.orient.JobStoreImpl.storeJob(JobStoreImpl.java:339)
              at org.quartz.core.QuartzScheduler.addJob(QuartzScheduler.java:938)
              at org.quartz.impl.StdScheduler.addJob(StdScheduler.java:273)
              at org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI.updateLastRunStateInfo(QuartzSchedulerSPI.java:232)
              at org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI.doStart(QuartzSchedulerSPI.java:200)
              at org.sonatype.nexus.common.stateguard.StateGuardLifecycleSupport.start(StateGuardLifecycleSupport.java:67)
              at org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI$$EnhancerByGuice$$8d0b0af4.CGLIB$start$25(<generated>)
              at org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI$$EnhancerByGuice$$8d0b0af4$$FastClassByGuice$$974bccb.invoke(<generated>)
              at com.google.inject.internal.cglib.proxy.$MethodProxy.invokeSuper(MethodProxy.java:228)
              at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:76)
              at org.sonatype.nexus.common.stateguard.MethodInvocationAction.run(MethodInvocationAction.java:39)
              at org.sonatype.nexus.common.stateguard.StateGuard$TransitionImpl.run(StateGuard.java:191)
              at org.sonatype.nexus.common.stateguard.TransitionsInterceptor.invoke(TransitionsInterceptor.java:56)
              at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:77)
              at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:55)
              at org.sonatype.nexus.quartz.internal.QuartzSchedulerSPI$$EnhancerByGuice$$8d0b0af4.start(<generated>)
              at org.sonatype.nexus.extender.NexusLifecycleManager.startComponent(NexusLifecycleManager.java:155)
              at org.sonatype.nexus.extender.NexusLifecycleManager.to(NexusLifecycleManager.java:95)
              at org.sonatype.nexus.extender.NexusContextListener.frameworkEvent(NexusContextListener.java:191)
              at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1429)
              at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:308)
              at java.lang.Thread.run(Thread.java:748)
      

      Expected:

      If we can't persist the new last run state, log a warning and continue, so the admin can resolve the problem via the UI (either by unfreezing the instance or resetting the quorum).
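
      A minimal sketch of the expected behaviour, assuming a hypothetical `TaskStateStore` interface in place of the real job store: each failed write is caught, logged as a warning, and skipped so that startup can proceed. This illustrates the idea only and is not the change shipped in 3.14.0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; class, interface, and method names are illustrative only. */
final class LastRunStateRecovery {
  private static final Logger log = Logger.getLogger(LastRunStateRecovery.class.getName());

  private LastRunStateRecovery() {
    // static helper only
  }

  /** Hypothetical stand-in for the scheduler's job store; not the real Nexus API. */
  interface TaskStateStore {
    void persistInterrupted(String taskId) throws Exception;
  }

  /**
   * Tries to persist the INTERRUPTED state for each task. A failed write
   * (for example because the database is frozen or quorum is missing) is
   * logged as a warning and skipped instead of aborting the caller.
   *
   * @return the ids of tasks whose state could not be persisted
   */
  static List<String> recover(List<String> taskIds, TaskStateStore store) {
    List<String> failed = new ArrayList<>();
    for (String taskId : taskIds) {
      try {
        store.persistInterrupted(taskId);
      }
      catch (Exception e) {
        log.log(Level.WARNING,
            "Unable to persist lastRunState for " + taskId + "; continuing startup", e);
        failed.add(taskId);
      }
    }
    return failed;
  }
}
```

      Returning the failed ids is a design choice of this sketch: the caller could retry them once the instance is writable again.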

              People

              Assignee:
              wwannemacher Wes Wannemacher
              Reporter:
              mcculls Stuart McCulloch
              Last Updated By:
              Peter Lynch
              Team:
              Nexus - Platform
              Votes:
              0
              Watchers:
              4