While investigating 10087, I found a slight improvement in Tika's behaviour:
Tika 1.14 still wrongly identifies the content as text/html if the XML contains any tag starting with "<html". However, if the XML file starts with a comment, as the POM in question does, it is correctly identified as text/xml. This works even if the comment is empty (<!-- -->).
To test:
- Create a proxy repository called "RSO" and point it to https://repository.sonatype.org/service/local/repositories/sonatype-internal/content/
- Request http://localhost:8081/repository/RSO/com/sonatype/insight/ci/insight-ci-parent/2.14.4/insight-ci-parent-2.14.4.pom
- You should be presented with a POM rather than a 404.
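The request and check in the last two steps can also be scripted. Below is a minimal Python sketch using only the standard library. It assumes Nexus is listening on localhost:8081 with the "RSO" proxy set up as above. The helpers fetch and looks_like_pom are hypothetical, and treating "a 200 response whose body parses as XML with a <project> root" as "a POM was served" is an assumption for illustration. The Content-Type header is printed only for comparison.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Assumed local Nexus URL for the "RSO" proxy repository from the steps above.
POM_URL = ("http://localhost:8081/repository/RSO/com/sonatype/insight/ci/"
           "insight-ci-parent/2.14.4/insight-ci-parent-2.14.4.pom")

# Maven POMs normally declare this default namespace on the <project> root.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def looks_like_pom(status: int, body: bytes) -> bool:
    """Hypothetical check: True if status is 200 and the body parses as XML
    whose root element is <project> (with or without the POM namespace)."""
    if status != 200:
        return False
    try:
        root = ET.fromstring(body)  # leading comments are skipped by the parser
    except ET.ParseError:
        return False
    return root.tag in ("project", POM_NS + "project")


def fetch(url: str) -> tuple[int, str, bytes]:
    """Fetch url and return (status code, Content-Type header, body bytes)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.headers.get("Content-Type", ""), resp.read()
    except urllib.error.HTTPError as err:
        # Error responses such as a 404 are raised as HTTPError; report them too.
        return err.code, err.headers.get("Content-Type", ""), err.read()


if __name__ == "__main__":
    status, content_type, body = fetch(POM_URL)
    print(f"HTTP {status}, Content-Type: {content_type!r}")
    print("POM served" if looks_like_pom(status, body) else "no POM served")
```

Parsing the body, rather than trusting the Content-Type header, keeps the check independent of whatever MIME type the server reports.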