Details
Description
When RPMs are uploaded to a yum hosted repository, a metadata rebuild task is scheduled to run in the future (by default 60 seconds in the future, or 60 seconds from the end of the last rebuild).
When the metadata rebuild runs, it attempts to process as many repodata "levels" as it detects as needing a rebuild.
When a repository contains many levels to rebuild, the task can take a noticeable amount of time to complete, e.g. 20+ seconds.
At a high level, the current processing order of the task is (see the sketch after this list):
1. Detect repodata levels needing rebuild (short amount of time)
2. Iterate over and rebuild metadata for all levels to temporary files (majority of task time)
3. After all levels are processed, write out all new metadata for all levels (short amount of time)
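For illustration, here is a minimal sketch of that flow. All helper names (detect_stale_levels, rebuild_level_to_temp, publish_level) are hypothetical placeholders, not the actual implementation:

```python
# Sketch of the CURRENT task flow. Helper names are hypothetical placeholders.

def detect_stale_levels(repo):
    """Return the repodata levels whose metadata is out of date (placeholder)."""
    return []

def rebuild_level_to_temp(repo, level):
    """Regenerate one level's metadata into temporary files (placeholder)."""
    return {}

def publish_level(repo, level, temp_files):
    """Move one level's freshly built metadata into place (placeholder)."""

def rebuild_metadata_current(repo):
    # 1. Detect repodata levels needing a rebuild (fast).
    stale_levels = detect_stale_levels(repo)

    # 2. Rebuild metadata for every level into temporary files
    #    (the majority of the task time).
    staged = {level: rebuild_level_to_temp(repo, level) for level in stale_levels}

    # 3. Only after ALL levels are rebuilt, publish the new metadata for every
    #    level. A client that fetched repomd.xml for a level during step 2 may,
    #    after this point, request referenced files that have just been replaced.
    for level, temp_files in staged.items():
        publish_level(repo, level, temp_files)
```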
Problem
After the task has started, a client may download the repodata/repomd.xml file for one level. That file references the other metadata files at that level.
After the task finishes, the same client may request that level's repodata filelists.xml.gz file, which will by then have been replaced by step 3 of the current design. This fails builds.
Expected
Design the task so that it writes out the metadata for each level as soon as that level's metadata is known. At a high level this looks like the following (see the sketch below):
1. Detect repodata levels needing rebuild (short amount of time)
2. Iterate over the levels needing rebuild - for each level: (long relative to the total time)
- rebuild metadata for the level to temporary files (short amount of time)
- write out the new metadata for the level immediately (very short time)
- continue to the next level
This design will help minimize the chance of build failures when a build happens while the task is running.
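For comparison, a minimal sketch of the proposed per-level flow, reusing the same hypothetical placeholder helpers as the sketch above:

```python
# Sketch of the PROPOSED task flow, reusing the placeholder helpers from the
# sketch above (detect_stale_levels, rebuild_level_to_temp, publish_level).

def rebuild_metadata_per_level(repo):
    # 1. Detect repodata levels needing a rebuild (fast).
    stale_levels = detect_stale_levels(repo)

    # 2. For each level: rebuild to temporary files, then write the new metadata
    #    out immediately, so a level's repomd.xml and the files it references are
    #    replaced together before the task moves on to the next level.
    for level in stale_levels:
        temp_files = rebuild_level_to_temp(repo, level)  # short per level
        publish_level(repo, level, temp_files)           # very short
```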