Dev - Nexus / NEXUS-3915

When artifacts are blocked if they fail content validation - no entry is made in the RSS feeds

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.0
    • Fix Version/s: 1.9.1-RC1, 1.9.1
    • Component/s: System Feeds
    • Labels:
      None
    • Global Rank:
      5958

      Description

      Our Websense decided that some jars were adult content (Yay!) and Nexus detected that the jar was not a jar.
      However, Nexus made the following entries in the log files:

      2010-11-04 17:42:11 INFO [tp-31596370-427] - o.s.n.i.c.MavenArch~ - Failed to parse Maven artifact /home/software/nexus/nexus-professional-webapp-1.8.0/./../sonatype-work/nexus/storage/central/org/eclipse/text/3.3.0-v20070606-0010/text-3.3.0-v20070606-0010.jar due to error in opening zip file
      2010-11-04 17:42:11 INFO [tp-31596370-427] - o.s.n.i.c.MavenArch~ - Failed to parse Maven artifact /home/software/nexus/nexus-professional-webapp-1.8.0/./../sonatype-work/nexus/storage/central/org/eclipse/text/3.3.0-v20070606-0010/text-3.3.0-v20070606-0010.jar due to error in opening zip file

      But there was no entry in any of the following feeds:
      • Error and Warning events
      • Broken artifacts in all Nexus repositories (...)
      • Broken files in all Nexus repositories (...)

      So if you are monitoring these feeds, everything looks healthy, and it is not until you resort to the logs that you see something is wrong.
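Until such failures reach a feed, one stopgap is to watch the log itself. The following is a minimal sketch and is not part of Nexus. The class name `IndexerFailureScanner` is hypothetical, and the regular expression is an assumption inferred only from the two log lines quoted above, so other message formats would be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stopgap: pick out the indexer's "Failed to parse Maven artifact"
// INFO lines, which (per this issue) never show up in any RSS feed.
class IndexerFailureScanner {
    // Assumed format, based only on the two log lines quoted in this issue.
    private static final Pattern FAILED_PARSE =
        Pattern.compile("Failed to parse Maven artifact (\\S+) due to (.+)$");

    // Returns the storage path of every file the indexer could not parse.
    static List<String> failedPaths(List<String> logLines) {
        List<String> paths = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FAILED_PARSE.matcher(line);
            if (m.find()) {
                paths.add(m.group(1));
            }
        }
        return paths;
    }
}
```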

          Activity

          Marvin Herman Froeder added a comment - edited

          Hi James,

          This is not really a Nexus bug. By default, Nexus doesn't validate the content; it serves whatever it gets from the remote repo.
          This INFO log is just Maven Indexer's way of telling the world that it won't be able to index that jar because it isn't a jar. That affects only the indexing of the jar's contents.

          If you want content to be validated, you need to enable it:
          https://issues.sonatype.org/browse/NEXUS-3709
          http://www.sonatype.com/people/2010/11/nexus-1-8-0-1-functionality-and-fixes/

          VELO

          James Nord added a comment -

          Hi Marvin,

          I disagree; this is a Nexus bug.

          nexus detected that the jar was not a jar

          It did that because it is validating the content: we have enabled validation (the fix for that was done partly for us; see SUPPORT-288).

          So if you are saying that the only message telling me something failed content validation is an Indexer message, then I would say this is far more important than I originally thought.

          I do see log/RSS errors for MD5 failures, and I would expect validation failures to be treated the same as far as logging/RSS is concerned. Basically, if Maven cannot get an artifact from Nexus, we must be able to find out why (and when), and the sources for that information are the logs and RSS.
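The request here amounts to a single reporting path shared by every kind of rejected download. This is a hypothetical sketch only. `ProxyFailureEvents`, `Reason`, and the in-memory list standing in for the RSS feed are invented for illustration and are not Nexus classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: checksum failures and content-validation failures go
// through the same method, so both end up in the log and in the "feed".
class ProxyFailureEvents {
    enum Reason { CHECKSUM_MISMATCH, CONTENT_VALIDATION_FAILED }

    static final class Event {
        final String repositoryId;
        final String path;
        final Reason reason;

        Event(String repositoryId, String path, Reason reason) {
            this.repositoryId = repositoryId;
            this.path = path;
            this.reason = reason;
        }
    }

    // In-memory stand-in for an RSS feed (assumption for illustration).
    private final List<Event> feed = new ArrayList<>();

    // Single entry point: every rejected download produces a feed entry and a
    // WARN line, whatever the reason for the rejection.
    void rejected(String repositoryId, String path, Reason reason) {
        feed.add(new Event(repositoryId, path, reason));
        System.err.printf("WARN rejected %s in %s: %s%n", path, repositoryId, reason);
    }

    List<Event> feedEntries() {
        return feed;
    }
}
```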

          Tamás Cservenák added a comment -

          I don't follow the previous comments:

          If "content validation" was turned on and the "adult JARs" were still not caught by it, then it is a bug in the "content validation" feature.

          If this is the case, could you please explain what your Websense does? Content validation relies on the MIME content of the response... does Websense respond with an HTML page but use the wrong Content-Encoding header?
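A check that relies only on the response's MIME type can be sketched as shown below. It assumes, hypothetically, that the validator sees only the request path and the `Content-Type` header. `MimeCheck` is an invented name, and this is not Nexus' actual validator.

```java
import java.util.Locale;

// Hypothetical MIME-based check: reject an HTML response for a path that
// should not be HTML (e.g. a .jar or .pom served as a proxy's block page).
class MimeCheck {
    static boolean looksValid(String path, String contentType) {
        if (contentType == null) {
            return true; // no header to judge by; leave it to other checks
        }
        // Drop parameters such as "; charset=iso-8859-1".
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        boolean htmlResponse = mime.equals("text/html");
        boolean htmlExpected = path.endsWith(".html") || path.endsWith(".htm");
        return !htmlResponse || htmlExpected;
    }
}
```

A check like this is only as good as the header. If a filtering proxy labelled its block page with a non-HTML content type, the page would pass, and that is the failure mode the question above is probing.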

          If "content validation" was not turned on, then nothing helps: Nexus will blindly cache whatever it gets. The log line you have comes from Maven Indexer, which tries to index already-cached JARs by blindly opening them up, and it obviously fails to open these JARs because their content is actually an HTML page, if I understand correctly. The Indexer does not do content validation; it assumes the content to be indexed is okay.
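The hypothesis in this comment (a cached "jar" that is really HTML) can be reproduced with the JDK alone. Opening such a file with `java.util.zip.ZipFile` fails with a `ZipException`. The sketch assumes nothing about Maven Indexer's internals. `HtmlAsJar` and its helpers are hypothetical names, and the exact exception message depends on the JDK version.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

// Sketch: open an HTML page saved under a .jar name as if it were a ZIP archive.
class HtmlAsJar {
    // True if the file opens as a ZIP archive, false if the JDK rejects it.
    static boolean opensAsZip(File file) throws IOException {
        try (ZipFile zip = new ZipFile(file)) {
            return true;
        } catch (ZipException e) {
            // JDKs of that era reported "error in opening zip file";
            // newer JDKs use different messages for the same condition.
            return false;
        }
    }

    // Writes the given HTML to a temporary file with a .jar suffix.
    static File writeFakeJar(String html) throws IOException {
        File f = File.createTempFile("text-3.3.0", ".jar");
        f.deleteOnExit();
        Files.write(f.toPath(), html.getBytes(StandardCharsets.ISO_8859_1));
        return f;
    }
}
```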

          Tamás Cservenák added a comment -

          Just to clarify: the issue title, "When artifacts are blocked if they fail content validation", suggests that Nexus' Content Validation did kick in ("they fail"). In that case, however, the artifacts are not cached in the Nexus cache (they should not be, unless a bug lurks in there).

          But according to your logs, they are in the cache and Maven Indexer chokes on them. And yes, Indexer errors do not show up in RSS feeds.

          James Nord added a comment -

          This is what the system appeared to do from a user perspective (content validation is on):

          1) maven install requests the artifact from Nexus
          2) Nexus requests the content from the remote repo (via the proxy)
          3) something happens on the proxy/Nexus (assumed to be a content validation failure, as downloading the jar via a web browser resulted in an adult-content HTML page from the proxy)
          4) maven install gets no artifact returned, and Maven throws an error
          5) investigating the Nexus storage locations showed no artifact in the folder where it is expected to be.
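The five steps above can be modelled as a tiny validating proxy. This is a hypothetical sketch, not Nexus code. `ValidatingProxy`, `Remote`, and the in-memory cache are invented, and the caller supplies the validator. The rejection branch returns silently, which mirrors the missing log/feed entry this issue is about.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of steps 1-5: fetch from the remote, validate the bytes,
// and cache/serve only content that passes.
class ValidatingProxy {
    interface Remote {
        byte[] fetch(String path);
    }

    private final Remote remote;
    private final Predicate<byte[]> validator;
    private final Map<String, byte[]> cache = new HashMap<>();

    ValidatingProxy(Remote remote, Predicate<byte[]> validator) {
        this.remote = remote;
        this.validator = validator;
    }

    // Returns the content, or null (a "not found" for the Maven client).
    byte[] get(String path) {
        byte[] cached = cache.get(path);
        if (cached != null) {
            return cached;
        }
        byte[] body = remote.fetch(path);            // step 2
        if (body == null || !validator.test(body)) { // step 3: validation fails
            return null;                             // steps 4-5: nothing served or cached, and nothing reported
        }
        cache.put(path, body);
        return body;
    }

    boolean isCached(String path) {
        return cache.containsKey(path);
    }
}
```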

          I will try to get a trace of what happens, but this involves changes to the production proxy/filtering, so it may take a while.

          James Nord added a comment -

          It is hard to get back to this exact scenario, but this is what the HTTP request looks like from a browser:

          > GET http://www.playboy.com/sometest/foo.jar HTTP/1.1
          > Host: www.playboy.com
          > Proxy-Connection: keep-alive
          > Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
          > User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.84 Safari/534.13
          > Accept-Encoding: gzip,deflate,sdch
          > Accept-Language: en-GB,en;q=0.8
          > Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
          
          < HTTP/1.1 302 Moved Temporarily
          < Date: Wed, 09 Feb 2011 13:17:27 GMT
          < Proxy-Connection: close
          < Via: 1.1 localhost.localdomain
          < Location: http://xxx.xxx.xxx.xxx:yyyy/cgi-bin/blockpage.cgi?ws-session=123456789
          < Content-Length: 0
          
          
          > GET /cgi-bin/blockpage.cgi?ws-session=123456789 HTTP/1.1
          > Host: xxx.xxx.xxx.xxx:yyyy
          > Connection: keep-alive
          > Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
          > User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.84 Safari/534.13
          > Accept-Encoding: gzip,deflate,sdch
          > Accept-Language: en-GB,en;q=0.8
          > Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
          
          < HTTP/1.0 200 OK
          < Content-Length: 1650
          < Content-Type: text/html; charset=iso-8859-1
          < 
          ...
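The block page arrives as a `200 OK` with an HTML body, so a check on the bytes themselves would also catch it. This is a minimal sketch that assumes content sniffing is acceptable; whether Nexus does this is not stated here. A JAR is a ZIP archive, and a non-empty ZIP normally begins with the local file header signature `PK\x03\x04`. `JarSniffer` is a hypothetical name, and the check is a heuristic rather than a full ZIP parse.

```java
// Hypothetical content-sniffing check: does the body start like a ZIP/JAR?
class JarSniffer {
    // ZIP local file header signature 0x04034b50, stored little-endian.
    private static final byte[] ZIP_LOCAL_HEADER = {0x50, 0x4B, 0x03, 0x04};

    static boolean looksLikeJar(byte[] body) {
        if (body == null || body.length < ZIP_LOCAL_HEADER.length) {
            return false;
        }
        for (int i = 0; i < ZIP_LOCAL_HEADER.length; i++) {
            if (body[i] != ZIP_LOCAL_HEADER[i]) {
                return false; // e.g. '<' from "<html>" on a block page
            }
        }
        return true;
    }
}
```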
          
          Brian Demers added a comment -

          If I am reading the comments correctly, it seems that we might just be missing a feed entry when a file is blocked due to content validation?

          I think what everyone is confused about is that the file /home/software/nexus/nexus-professional-webapp-1.8.0/./../sonatype-work/nexus/storage/central/org/eclipse/text/3.3.0-v20070606-0010/text-3.3.0-v20070606-0010.jar is already in your system, and either passed content validation or was downloaded before you enabled it? You should be able to check the dates in the Artifact Info panel to verify this.

          Tamás Cservenák added a comment -

          I think it's now clear what happened:

          • Nexus' content validation did its job, but it was quiet about doing it. You would like to have at least an entry in the RSS feed. This was missed, and a fix is on the way.
          • The build probably did pull the artifact's POM successfully (Websense did not trigger on it), but failed on the JAR. Internally, the Indexer tries to look up the POM's corresponding JAR, but it does so in a clumsy way (NEXUS-3638).

          And the (indexer) log lines pasted by James do not appear because the JAR is in the cache but invalid (an HTML page). They appear because the JAR file is not there at all, and the indexer "blindly" tries to open it anyway.
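The missing guard can be sketched like this: derive the JAR path from the POM path, and open the JAR only if it is actually present. Two assumptions apply. First, the JAR sits next to the POM with the same base name (main artifact, no classifier). Second, `PomJarLookup` is a hypothetical class; this is not the actual NEXUS-3638 change.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.zip.ZipFile;

// Hypothetical sketch: look up a POM's JAR and open it only if it is cached.
class PomJarLookup {
    // Assumes Maven layout: foo-1.0.pom -> foo-1.0.jar in the same directory.
    static Path jarForPom(Path pom) {
        String name = pom.getFileName().toString();
        String base = name.endsWith(".pom") ? name.substring(0, name.length() - 4) : name;
        return pom.resolveSibling(base + ".jar");
    }

    // Number of entries in the JAR, or empty if the JAR is not in storage.
    static Optional<Integer> countEntries(Path pom) throws IOException {
        Path jar = jarForPom(pom);
        if (!Files.isRegularFile(jar)) {
            return Optional.empty(); // not cached: nothing to index, nothing misleading to log
        }
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            return Optional.of(zip.size());
        }
    }
}
```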

          James Nord added a comment -

          the Nexus' content validation did its job, but it was quiet about doing it. You would like to have at least an entry in RSS feed. This was missed and fix is on the way.

          Yes.

          The build probably did pull the artifact's POM successfully (websense did not trigger on it), but failed with JAR.

          I believe this was the case at the time.

          If I am reading the comments correctly. It seems that we just might be missing a feed entry when a file is blocked due to content validation?

          A feed entry and a log file entry (it appears I had mistaken the indexer log message for a content validation log).

          I think what everyone is confused about is that the file: /home/software/nexus/nexus-professional-webapp-1.8.0/./../sonatype-work/nexus/storage/central/org/eclipse/text/3.3.0-v20070606-0010/text-3.3.0-v20070606-0010.jar is already in your system. And either passed content validation or was downloaded before you enabled it? You should be able to check the dates in the Artifact Info panel to verify this.

          When I checked just now, the jar was not present (we used to expire proxied items, so I'm assuming the jar was cleaned up but the pom was not).
          After re-requesting the jar, I looked at the "Artifact Information":
          For the jar:
          Uploaded Date:
          Tue Nov 27 2007 07:14:42 GMT+0000 (GMT Standard Time)
          Last Modified:
          Tue Nov 27 2007 07:14:42 GMT+0000 (GMT Standard Time)

          For the pom:
          Uploaded Date:
          Tue Nov 27 2007 07:14:41 GMT+0000 (GMT Standard Time)
          Last Modified:
          Tue Nov 27 2007 07:14:41 GMT+0000 (GMT Standard Time)

          I'm not sure what that tells you/me apart from when it was uploaded to the remote (Maven Central), not when it was cached.

          Tamás Cservenák added a comment -

          Validated: the RSS feed does contain an entry about the artifact not passing content validation.

          On a side note: Maven Indexer also got some enhancements and will no longer produce misleading logs like those in this issue. Before, it "blindly" tried to open the missing JAR, misleading us into believing the JAR had been downloaded when it was actually not present in the local cache.

          Tested using the teaser servlet:

          https://github.com/cstamas/teaser

          Just create a proxy against the /echo resource (it accepts and processes any path below /echo by simply dumping the response as text/plain). It triggered content validation (I requested a POM): the request was banned, no POM (or plain-text response) was cached, and an RSS entry was created: "...the artifact /log4j/log4j/1.2.13/log4j-1.2.13.pom content is invalid in repository echo-proxy!".
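Readers without the teaser project can build a rough stand-in for the `/echo` resource on the JDK's built-in `com.sun.net.httpserver` package. This is a hypothetical reimplementation, not the teaser code. `EchoServer` is an invented name, and what exactly it dumps (here, the request line) is an assumption. A request for a POM under `/echo` gets a `text/plain` body, which is the kind of response that, per the comment above, triggered content validation.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Rough stand-in for an "/echo" resource: any path below /echo is answered
// with a text/plain dump of the request line (assumed behaviour).
class EchoServer {
    // Port 0 lets the OS pick a free port; read it back via getAddress().
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/echo", exchange -> {
            String dump = exchange.getRequestMethod() + " " + exchange.getRequestURI() + "\n";
            byte[] body = dump.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```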


            People

            • Assignee:
              Tamás Cservenák
              Reporter:
              James Nord
            • Votes:
              0
              Watchers:
              1
