  1. Dev - Nexus Repo
  2. NEXUS-5698

S3 scraper fails to retrieve bucket list if response is truncated

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.5
    • Component/s: Routing
    • Labels:
      None

      Description

      Sadly, this makes S3 scraping unusable for almost all S3-hosted repositories.

      S3 buckets return their Contents in a "paged" way, with at most 1000 entries per response (a hard limit set by S3, with no way to circumvent it). Hence, for the S3 scraper to list all the contents of the repository when the first page it receives has isTruncated=true (which happens with almost all S3 repositories), it has to issue subsequent requests using the prefix and marker parameters.
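
      The paging behaviour described above can be sketched as a loop that keeps requesting pages until a response is no longer truncated. This is a minimal illustration, not the scraper's actual code: `list_all_keys`, `PageFetcher`, and `make_fake_bucket` are hypothetical names, and the in-memory fake bucket stands in for the HTTP GET and XML parsing a real scraper would perform. It assumes no delimiter is used, in which case the marker for the next page is the last key of the current one.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page fetcher: takes (prefix, marker) and returns
# (keys_on_this_page, is_truncated). A real scraper would issue an HTTP GET
# and parse the bucket listing XML instead.
PageFetcher = Callable[[str, Optional[str]], Tuple[List[str], bool]]


def list_all_keys(fetch_page: PageFetcher, prefix: str = "") -> List[str]:
    """Collect every key under `prefix`, following truncated responses."""
    keys: List[str] = []
    marker: Optional[str] = None
    while True:
        page, is_truncated = fetch_page(prefix, marker)
        keys.extend(page)
        # Stop on the last page; also stop on an empty page to avoid looping.
        if not is_truncated or not page:
            break
        # Assumption: no delimiter, so the next page starts after the last key.
        marker = page[-1]
    return keys


def make_fake_bucket(all_keys: List[str], page_size: int) -> PageFetcher:
    """In-memory stand-in for a bucket, for illustration only."""
    ordered = sorted(all_keys)

    def fetch_page(prefix: str, marker: Optional[str]) -> Tuple[List[str], bool]:
        matching = [
            k for k in ordered
            if k.startswith(prefix) and (marker is None or k > marker)
        ]
        return matching[:page_size], len(matching) > page_size

    return fetch_page
```

      With a fake bucket of 2500 matching keys and a page size of 1000, the loop makes three requests. A scraper that stops after the first page would see only 1000 of them.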

      The current code in 2.4 does this incorrectly: it builds the request URL by duplicating the prefix query parameter:

      http://somes3repo.com/?prefix=foo/bar?prefix=foo/bar
      

      (the prefix=... part is duplicated). Amazon S3 ends up interpreting this GET as a query for the prefix "foo/bar?prefix=foo/bar", and naturally the second response is empty, as there are no files with such a prefix.
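
      To make the misinterpretation concrete, the sketch below parses the malformed URL with Python's standard `urllib.parse` and then shows one way to build a follow-up URL without appending a second `?`. It is an illustration only, not the fix that went into 2.5: `with_query_params` is a hypothetical helper, and the marker value in the usage note is made up.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit

BAD_URL = "http://somes3repo.com/?prefix=foo/bar?prefix=foo/bar"

# Everything after the first '?' is the query string, so the second
# '?prefix=...' becomes part of the prefix value itself.
bad_query = parse_qs(urlsplit(BAD_URL).query)


def with_query_params(url: str, **params: str) -> str:
    """Return `url` with `params` merged into its existing query string.

    Hypothetical helper: existing parameters are replaced, not duplicated,
    and values are percent-encoded by urlencode.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

      Here `bad_query` holds a single `prefix` entry whose value is `foo/bar?prefix=foo/bar`, matching what S3 sees. Calling `with_query_params("http://somes3repo.com/?prefix=foo/bar", prefix="foo/bar", marker="foo/bar/a.jar")` yields a URL with exactly one `?` and separate `prefix` and `marker` parameters.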

      Hence, the "scraped" prefixes file will be incomplete, containing prefixes from the first page only. This also means that a proxy of an S3 bucket will not ask for artifacts that are actually there.

              People

              Assignee:
              Unassigned
              Reporter:
              cstamas Tamás Cservenák
              Last Updated By:
              Peter Lynch
              Votes:
              1
              Watchers:
              3

