Dev - Nexus Repo / NEXUS-31000

Possible race condition in reconcile component database from blob store task

    Details

    • Story Points:
      2
    • Sprint:
      NXRM MadMax Sprint 30, NXRM MadMax Sprint 31
    • Notability:
      3
    • InvestmentLayer:
      support-escalated
    • Aha Concept:
      non-concept

      Description

      The "Reconcile component database from blob store" task has been observed to fail on multiple occasions with a NullPointerException (NPE):

      2022-01-22 01:20:32,827+0000 WARN [quartz-17-thread-11] *SYSTEM org.sonatype.nexus.quartz.internal.task.QuartzTaskJob - Task a514e305-2841-4ad9-8ebb-8f2685c7c72c : 'Repair component database' [blobstore.rebuildComponentDB] execution failure
      java.lang.NullPointerException: null
      at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:880)
      at org.sonatype.nexus.repository.manager.internal.RepositoryImpl.getConfiguration(RepositoryImpl.java:101)
      at org.sonatype.nexus.repository.manager.internal.RepositoryManagerImpl.lambda$1(RepositoryManagerImpl.java:317)
      at java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
      at java.util.Spliterators$ArraySpliterator.tryAdvance(Unknown Source)
      at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(Unknown Source)
      at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(Unknown Source)
      at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(Unknown Source)
      at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(Unknown Source)
      at java.util.Spliterators$1Adapter.hasNext(Unknown Source)
      at java.util.Iterator.forEachRemaining(Unknown Source)
      at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Unknown Source)
      at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
      at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
      at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
      at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
      at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
      at java.util.stream.ReferencePipeline.forEach(Unknown Source)
      at org.sonatype.nexus.blobstore.restore.orient.OrientRestoreMetadataTask.blobStoreIntegrityCheck(OrientRestoreMetadataTask.java:222)
      at org.sonatype.nexus.blobstore.restore.orient.OrientRestoreMetadataTask.execute(OrientRestoreMetadataTask.java:121)
      at org.sonatype.nexus.blobstore.restore.orient.OrientRestoreMetadataTask.execute(OrientRestoreMetadataTask.java:1)
      at org.sonatype.nexus.scheduling.TaskSupport.call(TaskSupport.java:100)
      at org.sonatype.nexus.quartz.internal.task.QuartzTaskJob.doExecute(QuartzTaskJob.java:143)
      at org.sonatype.nexus.quartz.internal.task.QuartzTaskJob.execute(QuartzTaskJob.java:106)
      at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
      at org.sonatype.nexus.quartz.internal.QuartzThreadPool.lambda$0(QuartzThreadPool.java:145)
      at org.sonatype.nexus.thread.internal.MDCAwareRunnable.run(MDCAwareRunnable.java:40)
      at org.apache.shiro.subject.support.SubjectRunnable.doRun(SubjectRunnable.java:120)
      at org.apache.shiro.subject.support.SubjectRunnable.run(SubjectRunnable.java:108)
      at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)

      This is not always reproducible: a run can fail while a subsequent run succeeds.

      It looks like there may be a race condition here:

      https://github.com/sonatype/nexus-public/blob/release-3.37.3-02/plugins/nexus-blobstore-tasks/src/main/java/org/sonatype/nexus/blobstore/restore/orient/OrientRestoreMetadataTask.java#L222

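      We iterate over all repositories and perform very expensive, long-running operations. If one of those repositories is deleted or changed during this process, I think that may trigger this NPE. The stack trace appears consistent with that: the exception is thrown by the checkNotNull in RepositoryImpl.getConfiguration() (RepositoryImpl.java:101), which is called lazily from a stream lambda in RepositoryManagerImpl (line 317) each time the forEach in blobStoreIntegrityCheck advances. Repositories are therefore checked one at a time while the task is already running, not all up front.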
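      A minimal, single-threaded Java sketch of the suspected race, under stated assumptions. Config, Repo, reposUsingBlobStore, reposUsingBlobStoreTolerant, blobStoreNameOf and reconcile are hypothetical stand-ins, not Nexus classes, and the repository names are made up. Because the trace shows the RepositoryManagerImpl lambda being evaluated inside the task's forEach over an array-backed spliterator, the sketch models the repository list as an array snapshot filtered lazily, and it simulates the deletion deterministically from inside the per-repository work. The tolerant variant shows one possible mitigation (skipping repositories deleted mid-run); it is not necessarily how this issue was or should be fixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class ReconcileRaceSketch {

  /** Hypothetical configuration; only the blob store name matters here. */
  static final class Config {
    final String blobStoreName;

    Config(String blobStoreName) {
      this.blobStoreName = blobStoreName;
    }
  }

  /** Hypothetical repository whose configuration is cleared when it is deleted. */
  static final class Repo {
    final String name;
    // volatile because, in the real scenario, deletion would happen on another thread.
    private volatile Config configuration;

    Repo(String name, String blobStoreName) {
      this.name = name;
      this.configuration = new Config(blobStoreName);
    }

    /** Throws NPE once deleted, like the checkNotNull seen in the stack trace. */
    Config getConfiguration() {
      return Objects.requireNonNull(configuration);
    }

    void delete() {
      configuration = null;
    }
  }

  /** Lazy filter over an array snapshot: the predicate runs only as the caller advances. */
  static Stream<Repo> reposUsingBlobStore(Repo[] snapshot, String blobStore) {
    return Arrays.stream(snapshot)
        .filter(r -> r.getConfiguration().blobStoreName.equals(blobStore));
  }

  /** Same filter, but skips repositories that were deleted after the snapshot was taken. */
  static Stream<Repo> reposUsingBlobStoreTolerant(Repo[] snapshot, String blobStore) {
    return Arrays.stream(snapshot)
        .filter(r -> blobStoreNameOf(r).map(blobStore::equals).orElse(false));
  }

  // Catch rather than check first: an "is it deleted?" test followed by
  // getConfiguration() would leave the same race window open.
  private static Optional<String> blobStoreNameOf(Repo r) {
    try {
      return Optional.of(r.getConfiguration().blobStoreName);
    } catch (NullPointerException deletedMeanwhile) {
      return Optional.empty();
    }
  }

  /** Runs the (normally long) per-repository work and returns the names processed. */
  static List<String> reconcile(Stream<Repo> repos, Consumer<Repo> work) {
    List<String> processed = new ArrayList<>();
    repos.forEach(r -> {
      work.accept(r);
      processed.add(r.name);
    });
    return processed;
  }

  public static void main(String[] args) {
    // Unsafe run: a deletion lands while the first repository is being processed.
    Repo[] first = {new Repo("maven-releases", "default"), new Repo("npm-proxy", "default")};
    try {
      reconcile(reposUsingBlobStore(first, "default"), r -> {
        if (r == first[0]) first[1].delete();
      });
      System.out.println("unsafe run completed");
    } catch (NullPointerException e) {
      System.out.println("unsafe run failed with NullPointerException");
    }

    // Tolerant run: fresh snapshot, same mid-run deletion.
    Repo[] second = {new Repo("maven-releases", "default"), new Repo("npm-proxy", "default")};
    List<String> done = reconcile(reposUsingBlobStoreTolerant(second, "default"), r -> {
      if (r == second[0]) second[1].delete();
    });
    System.out.println("tolerant run processed " + done);
  }
}
```

      Running main prints "unsafe run failed with NullPointerException" followed by "tolerant run processed [maven-releases]". The failure only appears when a deletion happens to land between the snapshot and the filter reaching that repository, which would also fit the intermittent behaviour described above.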

      Please investigate and fix as needed.

            People

            Assignee:
            Maksym Kalachov (mkalachov) (Inactive)
            Reporter:
            Rich Seddon (rseddon)
            Last Updated By:
            Michael Oliverio
            Team:
            NXRM - Mad Max
            Votes:
            0
            Watchers:
            6