Finding the right version of the right JAR in a maven repository

I'm converting a build that has 71 .jar files in its global lib/ directory to use Maven. Of course, these have been pulled from the web by lots of developers over the past ten years of this project's history, and weren't always added to VCS with all the necessary version info, etc.

Is there an easy, automated way to go from that set of .jar files to the corresponding <dependency/> elements for use in my pom.xml files? I'm hoping for a web page where I can submit the checksum of a jar file and get back an XML snippet. The google hits for 'maven repository search' are basically just finding name-based searches. And http://repo1.maven.org/ has no search whatsoever, as far as I can see.

Update: GrepCode looks like it can find projects given an MD5 checksum. But it doesn't provide the particular details (groupId, artifactId) that Maven needs.

Here's the script I came up with based on the accepted answer:

#!/bin/bash

for f in *.jar; do
    s=`md5sum "$f" | cut -d ' ' -f 1`;
    p=`wget -q -O - "http://www.jarvana.com/jarvana/search?search_type=content&content=${s}&filterContent=digest" | grep inspect-pom | cut -d \" -f 4`;
    pj="http://www.jarvana.com${p}";
    rm -f tmp;
    wget -q -O tmp "$pj";

    g=`grep groupId tmp | head -n 1 | cut -d \> -f 3 | cut -d \< -f 1`;
    a=`grep artifactId tmp | head -n 1 | cut -d \> -f 3 | cut -d \< -f 1`;
    v=`grep version tmp | head -n 1 | cut -d \> -f 3 | cut -d \< -f 1`;
    rm -f tmp;

    echo '<dependency> <!--' $f $s $pj '-->';
    echo "  <groupId>$g</groupId>";
    echo "  <artifactId>$a</artifactId>";
    echo "  <version>$v</version>";
    echo "</dependency>";
    echo;
done

I was in the same situation as OP, but as mentioned in later answers Jarvana is no longer up.

I used the search-by-checksum functionality of Maven Central Search and its search API to achieve the same results.

First create a file with the sha1sums:

sha1sum *.jar > jar-sha1sums.txt

Then use the following Python script to check whether there is any information on the jars in question:

import json
import urllib2

f = open('./jar-sha1sums.txt','r')
pom = open('./pom.xml','w')
for line in f.readlines():
    sha = line.split("  ")[0]
    jar = line.split("  ")[1]
    print("Looking up "+jar)
    searchurl = 'http://search.maven.org/solrsearch/select?q=1:%22'+sha+'%22&rows=20&wt=json'
    page = urllib2.urlopen(searchurl)
    data = json.loads("".join(page.readlines()))
    if data["response"] and data["response"]["numFound"] == 1:
        print("Found info for "+jar)
        jarinfo = data["response"]["docs"][0]
        pom.write('<dependency>\n')
        pom.write('\t<groupId>'+jarinfo["g"]+'</groupId>\n')
        pom.write('\t<artifactId>'+jarinfo["a"]+'</artifactId>\n')
        pom.write('\t<version>'+jarinfo["v"]+'</version>\n')
        pom.write('</dependency>\n')
    else:
        print("No info found for "+jar)
        pom.write('<!-- TODO Find information on this jar file -->\n')
        pom.write('<dependency>\n')
        pom.write('\t<groupId></groupId>\n')
        pom.write('\t<artifactId>'+jar.replace(".jar\n","")+'</artifactId>\n')
        pom.write('\t<version></version>\n')
        pom.write('</dependency>\n')
pom.close()
f.close()

YMMV
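
Where sha1sum isn't available (a stock Windows box, say), the first step could probably be done in Python as well. This is only a sketch: the names sha1_of and write_sha1sums are made up for illustration, and the output simply imitates the "hash, two spaces, file name" layout that the script above splits on.

```python
import glob
import hashlib
import os


def sha1_of(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sha1sums(lib_dir, out_path):
    """Write one 'hash  name' line per jar in lib_dir; return the jar count."""
    jars = sorted(glob.glob(os.path.join(lib_dir, "*.jar")))
    with open(out_path, "w") as out:
        for jar in jars:
            # Two spaces between hash and name, like sha1sum's text mode.
            out.write("%s  %s\n" % (sha1_of(jar), os.path.basename(jar)))
    return len(jars)
```

Calling write_sha1sums("lib", "jar-sha1sums.txt") should leave a file the lookup script can read unchanged.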

Jarvana can search on a digest (select digest next to the Content input field).

For example, a search on d1dcb0fbee884bb855bb327b8190af36 will return commons-collections-3.1.jar.md5. Then just click on the icon to get the details (including maven coordinates).

One can imagine automating this.

Jarvana no longer exists. However, you can use this Groovy script, which iterates through a directory and looks up the SHA1 hash of each jar in Nexus: https://github.com/myspotontheweb/ant2ivy/blob/master/ant2ivy.groovy

It will create a pom.xml for maven users and an ivy.xml for Ivy users.
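
To make the "pom.xml for Maven, ivy.xml for Ivy" idea concrete, here is a rough Python sketch of how one set of coordinates maps onto each format. It is not taken from the linked Groovy script; the function names are hypothetical, and junit 4.12 is used only as sample input.

```python
from xml.sax.saxutils import escape, quoteattr


def maven_dependency(group, artifact, version):
    """Render a Maven <dependency> element for the given coordinates."""
    return (
        "<dependency>\n"
        "  <groupId>%s</groupId>\n"
        "  <artifactId>%s</artifactId>\n"
        "  <version>%s</version>\n"
        "</dependency>" % (escape(group), escape(artifact), escape(version))
    )


def ivy_dependency(group, artifact, version):
    """Render the equivalent Ivy <dependency> element (org/name/rev attributes)."""
    # quoteattr() adds the surrounding quotes and escapes the value.
    return "<dependency org=%s name=%s rev=%s/>" % (
        quoteattr(group), quoteattr(artifact), quoteattr(version))


print(maven_dependency("junit", "junit", "4.12"))
print(ivy_dependency("junit", "junit", "4.12"))
```

The same three values go into both files; only the element shape differs.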

Borrowed the code and idea from @Karl Tryggvason but couldn't get the python script working. Being a Windows monkey I did something similar in Powershell (v3 required), not so sophisticated (doesn't generate you a pom, just dumps results) but I thought it might save someone a few minutes.

$log = 'c:\temp\jarfind.log'

Get-Date | Tee-Object -FilePath $log

$jars = gci d:\source\myProject\lib -Filter *.jar

foreach ($jar in $jars)
{
    $sha = Get-FileHash -Algorithm SHA1 -Path $jar.FullName | select -ExpandProperty hash
    $name = $jar.Name
    $json = Invoke-RestMethod "http://search.maven.org/solrsearch/select?q=1:%22$($sha)%22&rows=20&wt=json"
    "Found $($json.response.numfound) jars with sha1 matching that of $($name)..." | Tee-Object -FilePath $log -Append
    $jarinfo = $json.response.docs
    $jarinfo | Tee-Object -FilePath $log -Append
}
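
The Python scripts in this thread only accept a result when numFound is exactly 1, while this one just reports the count. A possible middle ground, sketched in Python, is to separate the three cases so that multiple matches get listed instead of being treated as a failure. classify_matches is a hypothetical helper; it works on the already-decoded JSON and assumes each doc carries the g, a and v fields that the other scripts read.

```python
def classify_matches(data):
    """Return (status, coordinates) for a decoded search response.

    status is 'none', 'unique' or 'ambiguous'; coordinates is a list of
    (groupId, artifactId, version) tuples built from the g/a/v fields.
    """
    response = data.get("response") or {}
    docs = response.get("docs", [])
    coords = [(doc["g"], doc["a"], doc["v"]) for doc in docs]
    found = response.get("numFound", 0)
    if found == 0:
        return "none", []
    if found == 1:
        return "unique", coords
    # More than one artifact shares this checksum: let a human choose.
    return "ambiguous", coords


# Hand-written sample, not captured output from the real service.
sample = {"response": {"numFound": 1,
                       "docs": [{"g": "junit", "a": "junit", "v": "4.12"}]}}
print(classify_matches(sample))
```

An 'ambiguous' result could then be written into the pom as a TODO comment listing every candidate.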

You can use mvnrepository to search for artifacts. Alternatively, in Eclipse, the add-dependency dialog has a search that uses the index of Maven Central.

If you also want to fall back on the artifactId and version read from the jar file name when the checksum lookup finds nothing, you can use the following code. It's an improved version of Karl's.

import os
import sys
from subprocess import check_output

import requests

def searchByShaChecksum(sha):
  searchurl = 'http://search.maven.org/solrsearch/select?q=1:%22' + sha + '%22&rows=20&wt=json'
  resp = requests.get(searchurl)
  data = resp.json()
  return data


def searchAsArtifact(artifact, version):
  searchurl = 'http://search.maven.org/solrsearch/select?q=a:"' + artifact + '" AND v:"' + version.strip() + '"&rows=20&wt=json'
  resp = requests.get(searchurl)
  # print(searchurl)
  data = resp.json()
  return data


def processAsArtifact(file: str):
  data = {'response': {'start': 0, 'docs': [], 'numFound': 0}}
  jar = file.replace(".jar", "")
  splits = jar.split("-")
  if (len(splits) < 2):
    return data
  for i in range(1, len(splits)):
    artifact = "-".join(splits[0:i])
    version = "-".join(splits[i:])
    data = searchAsArtifact(artifact, version)
    if data["response"] and data["response"]["numFound"] == 1:
      return data
  return data


def writeToPom(pom: object, grp: str = None, art: str = None, ver: str = None):
  if grp is not None and ver is not None:
    pom.write('<dependency>\n')
  else:
    pom.write('<!-- TODO Find information on this jar file -->\n')
    pom.write('<dependency>\n')
  grp = grp if grp is not None else ""
  art = art if art is not None else ""
  ver = ver if ver is not None else ""
  pom.write('\t<groupId>' + grp + '</groupId>\n')
  pom.write('\t<artifactId>' + art + '</artifactId>\n')
  pom.write('\t<version>' + ver + '</version>\n')
  pom.write('</dependency>\n')


def main(argv):
  if len(argv) == 0:
    print('Syntax : findPomJars.py <lib_dir_path>')
    return
  # Resolve to an absolute path so listdir() still works after chdir()
  lib_home = os.path.abspath(str(argv[0]))
  if os.path.exists(lib_home):
    os.chdir(lib_home)

    pom = open('./auto_gen_pom_list.xml', 'w')
    successList = []
    failedList = []
    jarCount = 0
    for lib in sorted(os.listdir(lib_home)):
      if lib.endswith(".jar"):
        jarCount += 1
        sys.stdout.write("\rProcessed Jar Count: %d" % jarCount)
        sys.stdout.flush()
        checkSum = check_output(["sha1sum", lib]).decode()
        sha = checkSum.split("  ")[0]
        jar = checkSum.split("  ")[1].strip()
        data = searchByShaChecksum(sha)
        # No checksum match: fall back to guessing from the file name
        if data["response"] and data["response"]["numFound"] == 0:
          data = processAsArtifact(jar)

        if data["response"] and data["response"]["numFound"] == 1:
          successList.append("Found info for " + jar)
          jarinfo = data["response"]["docs"][0]
          writeToPom(pom, jarinfo["g"], jarinfo["a"], jarinfo["v"])
        else:
          failedList.append("No info found for " + jar)
          # jar was already stripped above, so only the extension is left to drop
          writeToPom(pom, art=jar.replace(".jar", ""))
    pom.close()

    print("\n")
    print("Success : %d" % len(successList))
    print("Failed : %d" % len(failedList))

    for entry in successList:
      print(entry)
    for entry in failedList:
      print(entry)

  else:
    print(lib_home + " directory doesn't exist")


if __name__ == "__main__":
  main(sys.argv[1:])

Code can also be found on GitHub
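
To see what the file-name fallback actually tries, here is a small standalone sketch of the hyphen-splitting step on its own. candidate_splits is a hypothetical helper that mirrors the loop in processAsArtifact, with one small difference: it strips only a trailing ".jar" instead of replacing it anywhere in the name.

```python
def candidate_splits(file_name):
    """List every (artifact, version) pair the hyphen-splitting loop would try."""
    suffix = ".jar"
    stem = file_name[:-len(suffix)] if file_name.endswith(suffix) else file_name
    parts = stem.split("-")
    # Split point i puts the first i parts in the artifact, the rest in the version.
    return [("-".join(parts[:i]), "-".join(parts[i:]))
            for i in range(1, len(parts))]


print(candidate_splits("commons-collections-3.1.jar"))
# [('commons', 'collections-3.1'), ('commons-collections', '3.1')]
```

Each pair is sent to the search in order, and the first one that yields exactly one hit wins, so a name without any hyphen produces no candidates at all.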

Path where the jar is available:

jar_name=junit-4.12.jar

sha1sum $jar_name > jar-sha1sums.txt

shaVal=`cat jar-sha1sums.txt | cut -d " " -f1`

response=$(curl -s 'http://search.maven.org/solrsearch/select?q=1:%22'$shaVal'%22&rows=20&wt=json')

formatted_response=`echo $response | grep -Po '"response":*.*'`

versionId=`echo $formatted_response | grep -Po '"v":"[^"]*"' | cut -d ":" -f2 | xargs`

artifactId=`echo $formatted_response | grep -Po '"a":"[^"]*"' | cut -d ":" -f2 | xargs`

groupId=`echo $formatted_response | grep -Po '"g":"[^"]*"' | cut -d ":" -f2 | xargs`

To find the latest available version:

lat_ver_response=$(curl -s "https://search.maven.org/solrsearch/select?q=g:$groupId+AND+a:$artifactId&core=gav&rows=20&wt=json")

format_lat_ver_response=`echo $lat_ver_response | grep -Po '"response":*.*'`

latestVersionId=`echo $format_lat_ver_response | grep -Po '"latestVersion":"[^"]*"' | cut -d ":" -f2 | xargs`

gist created from ant2maven script @ https://github.com/sachinsshetty/ant2Maven.git

https://gist.github.com/sachinsshetty/bab6ca24671cafe2cb63daaab47103f3
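
Pulling fields out of JSON with grep is sensitive to the exact characters in each value. As a hedged alternative, the same extraction might be done with a real JSON parser, sketched here in Python. first_doc_fields is a hypothetical helper, and the sample string is hand-written to resemble the g/a/v fields the shell script greps for; it is not captured output from the service.

```python
import json


def first_doc_fields(raw_json, *fields):
    """Decode a search response and return the requested fields of its first doc.

    Returns None when the response contains no docs.
    """
    docs = json.loads(raw_json).get("response", {}).get("docs", [])
    if not docs:
        return None
    return tuple(docs[0].get(name) for name in fields)


sample = ('{"response": {"numFound": 1, "docs": '
          '[{"g": "junit", "a": "junit", "v": "4.12"}]}}')
print(first_doc_fields(sample, "g", "a", "v"))
# ('junit', 'junit', '4.12')
```

Asking for "latestVersion" in the same way would cover the second query, assuming the response carries that field.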

This is the same script from @karl-tryggvason's answer, but using Python 3:

import json
from urllib.request import urlopen
f = open('./jar-sha1sums.txt','r')
pom = open('./pom.xml','w')
for line in f.readlines():
    # sha1sum separates the hash and the file name with two spaces
    sha = line.split("  ")[0]
    jar = line.split("  ")[1]
    print("Looking up "+jar)
    searchurl = 'http://search.maven.org/solrsearch/select?q=1:%22'+sha+'%22&rows=20&wt=json'
    page = urlopen(searchurl)
    data = json.loads(b"".join(page.readlines()))
    if data["response"] and data["response"]["numFound"] == 1:
        print("Found info for "+jar)
        jarinfo = data["response"]["docs"][0]
        pom.write('<dependency>\n')
        pom.write('\t<groupId>'+jarinfo["g"]+'</groupId>\n')
        pom.write('\t<artifactId>'+jarinfo["a"]+'</artifactId>\n')
        pom.write('\t<version>'+jarinfo["v"]+'</version>\n')
        pom.write('</dependency>\n')
    else:
        print("No info found for "+jar)
        pom.write('<!-- TODO Find information on this jar file -->\n')
        pom.write('<dependency>\n')
        pom.write('\t<groupId></groupId>\n')
        pom.write('\t<artifactId>'+jar.replace(".jar\n","")+'</artifactId>\n')
        pom.write('\t<version></version>\n')
        pom.write('</dependency>\n')
pom.close()
f.close()
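
The query URL in these scripts is assembled by string concatenation with a hand-encoded %22. A slightly more defensive version might look like the sketch below. sha1_search_url is a hypothetical helper; the endpoint and parameters are the ones used above, while the validation and the lowercasing (matching what sha1sum prints) are my own additions.

```python
import re
from urllib.parse import quote

SEARCH_ENDPOINT = "http://search.maven.org/solrsearch/select"


def sha1_search_url(sha1, rows=20):
    """Build the checksum query URL used by the scripts above."""
    sha1 = sha1.strip().lower()
    # Reject anything that is not 40 hex characters, e.g. an MD5 digest.
    if not re.fullmatch(r"[0-9a-f]{40}", sha1):
        raise ValueError("not a SHA-1 hex digest: %r" % sha1)
    # quote() turns the double quotes into %22; the colon is left as-is.
    query = quote('1:"%s"' % sha1, safe=":")
    return "%s?q=%s&rows=%d&wt=json" % (SEARCH_ENDPOINT, query, rows)
```

Passing a 32-character MD5 such as the one from the Jarvana example raises ValueError instead of sending a query that cannot match.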