MiniProject2 part 1 assignment submission #19

Open: wants to merge 16 commits into base: master
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
.ipynb_checkpoints
jdunca51.ipynb
45 changes: 41 additions & 4 deletions README.md
@@ -12,6 +12,9 @@ where XX is between 0 and 33: to find your number look at the list below.
### Goal:
1. Download npm data for all your packages and store it in the MongoDB database
fdac18mp2, collection npm_yourutkid. Example code is in readNpm.py:
```
zcat /data/NPMvulnerabilities/NPMpkglist/NPMpkglist_XX.gz | python3 readNpm.py
```
1. Identify the packages that have GH repos (based on the stored info)
```
import pymongo, json, sys
@@ -25,23 +28,57 @@ for r in coll.find():
        if 'metadata' in r:
            r = r['metadata']
            if 'repository' in r:
                r = r['url']
                getReleases('url')
                r = r['repository']
                if 'url' in r:
                    r = r['url']
                    print (r)
```
Suppose the above code is in extrNpm.py. To output the urls:
```
python3 extrNpm.py > myurls
```
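
The URLs printed above come straight from npm metadata, so they may carry prefixes such as `git+https://` or a trailing `.git`. A minimal sketch of reducing them to the `owner/repo` form used by the GitHub API follows; the helper name `github_slug` and the regular expression are illustrative assumptions, not part of the example files.

```python
import re

# Hypothetical helper (not in readNpm.py or extrNpm.py): reduce an npm
# "repository.url" value to the "owner/repo" form used by the GitHub API.
_GH = re.compile(r'github\.com[/:]([^/\s]+)/([^/\s#]+?)(?:\.git)?/?(?:#.*)?$')

def github_slug(url):
    """Return 'owner/repo' for a GitHub repository URL, or None otherwise."""
    m = _GH.search(url.strip())
    if m is None:
        return None
    return m.group(1) + '/' + m.group(2)

if __name__ == '__main__':
    for u in ['git+https://github.com/webpack-contrib/html-loader.git',
              'git@github.com:git/git.git',
              'https://gitlab.com/foo/bar.git']:
        print(u, '->', github_slug(u))
```

URLs that do not point at a GitHub repository root return `None`, so they can be skipped before any API call is made.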
2. For each such package, get a list of all releases. Example file is readGit.py (you can use it with the snippet above to get releases). Reference to Github API:

2. For each such package, get a list of all releases. An example file is readGit.py (you can use it with the snippet above to get releases). It reads from standard input and populates
the releases_yourutkid collection. GitHub API reference:
```
https://developer.github.com/v3/repos/releases/
```
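
As a rough illustration of the releases endpoint referenced above, the sketch below builds the request URL and pulls `tag_name` values out of a response body. The function names are assumptions made for this example, and the sample body is hand-written rather than real API output; readGit.py may do this differently.

```python
import json

API = 'https://api.github.com/repos'

def releases_url(slug, page=1, per_page=100):
    """Build the 'list releases' URL for a repository given as 'owner/repo'."""
    return '%s/%s/releases?per_page=%d&page=%d' % (API, slug, per_page, page)

def tag_names(body):
    """Extract tag names from the JSON text of a releases response."""
    releases = json.loads(body)
    # Error responses are JSON objects, not arrays, so they yield no tags.
    if not isinstance(releases, list):
        return []
    return [r['tag_name'] for r in releases if 'tag_name' in r]

if __name__ == '__main__':
    # Hand-written sample body, for illustration only.
    sample = '[{"tag_name": "v0.5.5"}, {"tag_name": "v0.5.4"}]'
    print(releases_url('webpack-contrib/html-loader'))
    print(tag_names(sample))
```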
3. Find no. of commits between the latest and other releases.
3. Extract releases from mongodb
```
import pymongo, json, sys
client = pymongo.MongoClient (host="da1")
db = client ['fdac18mp2']
id = "audris"
coll = db [ 'releases_' + id]
for r in coll.find():
    n = r['name']
    if 'values' in r:
        for v in r['values']:
            if 'tag_name' in v:
                print (n+';'+v['tag_name'])
```
Suppose the above code is in extrRels.py. To output the releases:
```
cat myurls | python3 extrRels.py > myrels
```


4. Find no. of commits between the latest and other releases.

For example: https://api.github.com/repos/webpack-contrib/html-loader/compare/v0.5.4...master or https://api.github.com/repos/git/git/compare/v2.2.0-rc1...v2.2.0-rc2
More resources: https://stackoverflow.com/questions/26925312/github-api-how-to-compare-2-commits (look for the part of the answer that compares tags)
In the returned JSON, look for fields like the following to get the number of commits between releases:
```
"status": "ahead",
"ahead_by": 24,
"behind_by": 0,
"total_commits": 24,
```
For example
```
cat myrels | python3 compareRels.py
```
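
To make the comparison step concrete, here is a small sketch that builds a compare URL from an `owner/repo` name and two tags, then reads the commit counts from a response body. The function names are hypothetical and the sample body is the JSON fragment quoted above, not a live response; it shows one possible shape for compareRels.py, not the required one.

```python
import json

def compare_url(slug, base, head):
    """Build the compare URL for two refs of a repository ('owner/repo')."""
    return 'https://api.github.com/repos/%s/compare/%s...%s' % (slug, base, head)

def commit_counts(body):
    """Return (ahead_by, behind_by) from a compare response, or None."""
    data = json.loads(body)
    if not isinstance(data, dict):
        return None
    if 'ahead_by' not in data or 'behind_by' not in data:
        # Error responses (for example an unknown tag) lack these fields.
        return None
    return data['ahead_by'], data['behind_by']

if __name__ == '__main__':
    sample = '{"status": "ahead", "ahead_by": 24, "behind_by": 0, "total_commits": 24}'
    print(compare_url('webpack-contrib/html-loader', 'v0.5.4', 'master'))
    print(commit_counts(sample))
```

Returning `None` for responses without the two fields lets the caller log the failed pair and move on to the next one.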

| number | GitHub Username | NetID | Name |
|:-:|:-:|:-:|---|
82 changes: 82 additions & 0 deletions compareRels_rdabbs1.py
@@ -0,0 +1,82 @@
import sys, re, pymongo, json, time
import datetime
from requests.auth import HTTPBasicAuth
import requests
gleft = 1500

#client = pymongo.MongoClient ()
client = pymongo.MongoClient (host="da1.eecs.utk.edu")
login = sys.argv[1]
passwd = sys.argv[2]

baseurl = 'https://api.github.com/repos'
headers = {'Accept': 'application/vnd.github.v3.star+json'}
headers = {'Accept': 'application/vnd.github.hellcat-preview+json'}

db = client['fdac18mp2'] # added in class
collName = 'releases_rdabbs1'
coll = db [collName]
def wait (left):
    # Block while fewer than 20 API requests remain, until the rate limit resets
    while (left < 20):
        l = requests .get('https://api.github.com/rate_limit', auth=(login,passwd))
        if (l.ok):
            left = int (l.headers.get ('X-RateLimit-Remaining'))
            reset = int (l.headers.get ('x-ratelimit-reset'))
            now = int (time.time ())
            dif = reset - now
            if (dif > 0 and left < 20):
                sys.stderr.write ("waiting for " + str (dif) + "s; requests left: " + str (left) + "\n")
                time .sleep (dif)
        time .sleep (0.5)
    return left

def get (url):
    global gleft
    gleft = wait (gleft)
    try:
        r = requests .get (url, headers=headers, auth=(login, passwd))
        time .sleep (0.5)
        if (r.ok):
            gleft = int(r.headers.get ('X-RateLimit-Remaining'))
        # Parse inside the try so a failed request or a non-JSON body is reported
        return json.loads(r.text)
    except Exception as e:
        sys.stderr.write ("Could not get:" + url + ". Exception:" + str(e) + "\n")
    # An empty result tells the caller that nothing could be compared
    return {}

def chunks(l, n):
    if n < 1: n = 1
    return [l[i:i + n] for i in range(0, len(l), n)]

def cmp_rel (url):
    v = []
    size = 0
    try:
        v = get (url)
    except Exception as e:
        sys.stderr.write ("Could not get:" + url + ". Exception:" + str(e) + "\n")
    if 'ahead_by' in v and 'behind_by' in v:
        print (url+';'+str(v['ahead_by'])+';'+str(v['behind_by']))
    else:
        sys.stderr.write ("Could not compare releases for: " + url + "; There exists no common ancestor between the two versions." + "\n")


p2r = {}
for l in sys.stdin.readlines():
    l = l.rstrip()
    p, r = l.split(';')
    if p in p2r:
        p2r[p] .append (r)
    else:
        p2r[p] = [r]

for p in p2r:
    rs = p2r[p]
    if len (rs) > 1:
        for i in range(1,len (rs)):
            url = 'https://api.github.com/repos/'+p+'/compare/' + rs[i-1] + '...' + rs[i]
            cmp_rel (url)
15 changes: 15 additions & 0 deletions extrNpm_rdabbs1.py
@@ -0,0 +1,15 @@
import pymongo, json, sys
client = pymongo.MongoClient (host="da1")
db = client ['fdac18mp2']
id = "rdabbs1"
coll = db [ 'npm_' + id]
for r in coll.find():
    if 'collected' in r:
        r = r['collected']
        if 'metadata' in r:
            r = r['metadata']
            if 'repository' in r:
                r = r['repository']
                if 'url' in r:
                    r = r['url']
                    print (r)
11 changes: 11 additions & 0 deletions extrRels_rdabbs1.py
@@ -0,0 +1,11 @@
import pymongo, json, sys
client = pymongo.MongoClient (host="da1")
db = client ['fdac18mp2']
id = "rdabbs1"
coll = db [ 'releases_' + id]
for r in coll.find():
    n = r['name']
    if 'values' in r:
        for v in r['values']:
            if 'tag_name' in v:
                print (n+';'+v['tag_name'])