Cleanup policies may exhaust the database connection pool and lead to product instability, requiring DBA intervention to terminate running queries in Postgresql before NXRM can be safely restarted.
A customer was running 3.38.1 with the default database connection pool size of 100 connections. NXRM was running in a pod inside Kubernetes.
The instance was performing OK for many months, without any cleanup policies configured.
Then a new cleanup policy was added to a Maven 2 "snapshots" version policy hosted repository. The criteria was only this one criteria:
Component Usage: Remove components that haven't been downloaded in 30 days.
The cleanup task that processes cleanup policies ran. A few hours later builds started to fail because GET requests to download components were not completing in a timely manner and user authentications were failing. other database operations requiring a database connection were not completing
In an attempt to restore stability to the instance, the pod was killed where NXRM was running and multiple restarts attempted because startup would never complete. NXRM would begin to start but always stop logging anything at this log line during startup:
Examining the pg_activity table on PostgreSQL for active queries from NXRM during the problem showed the following:
One query like this:
99 other queries doing this:
9 of these queries initiated from each of 9 restart attempts of NXRM after the initial incident:
Recall the default NXRM DB connection pool size is defaulted to 100.
The nexus.log at time of incident starting reporting a variety of different queries failing because of:
Not until the original NXRM database session/queries were terminated at the database level could NXRM be successfully restarted.
Cleanup policy delete statements need to perform in such a way that
- there is little to no risk of exhausting the NXRM database connection pool
- be short running and help PostgreSQL eventually terminate related queries on its own or better detect when NXRM has been terminated itself or terminated the task performing queries - although if PostgreSQL can do this depends
- statements do not block for long period of times other update or table modification statements related to components or assets