Completed Part 1 #7

Open: wants to merge 13 commits into base branch master

91 changes: 91 additions & 0 deletions .ipynb_checkpoints/dbarry-checkpoint.ipynb
@@ -0,0 +1,91 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Inserted all SourceForge URLs.\n",
"Inserted all GitLab URLs.\n"
]
}
],
"source": [
"########################################\n",
"# Name: Daniel Barry\n",
"# Professor: Dr. Audris Mockus\n",
"# Date: 10/15/2018\n",
"# MiniProject2\n",
"# The purpose of this code is to store\n",
"# the URLs from the project discovery\n",
"# in the class database.\n",
"########################################\n",
"\n",
"import pymongo\n",
"\n",
"## Database information for storage of discovery results.\n",
"dbname = \"fdac18mp2\"\n",
"sfcollname = \"sfprj_dbarry\"\n",
"glcollname = \"glprj_dbarry\"\n",
"\n",
"client = pymongo.MongoClient(host='da1')\n",
"db = client[dbname]\n",
"\n",
"## Switch to SourceForge database.\n",
"coll = db[sfcollname]\n",
"\n",
"## Read-in the URLs from the SourceForge list.\n",
"f = open(\"dbarry_sf_list.txt\", 'r')\n",
"lines = f.readlines()\n",
"f.close()\n",
"\n",
"## Store the URLs from the SourceForge List.\n",
"for line in lines:\n",
" word = line.split()\n",
" coll.insert_one({\"url\": word[2]})\n",
"\n",
"print(\"Inserted all SourceForge URLs.\")\n",
"\n",
"## Switch to GitLab database.\n",
"coll = db[glcollname]\n",
"\n",
"## Read-in the URLs from the GitLab list.\n",
"f = open(\"dbarry_gl_list.txt\", 'r')\n",
"lines = f.readlines()\n",
"f.close()\n",
"\n",
"## Store the URLs from the GitLab List.\n",
"for line in lines:\n",
" word = line.split()\n",
" coll.insert_one({\"url\": word[0]})\n",
"\n",
"print(\"Inserted all GitLab URLs.\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
74 changes: 73 additions & 1 deletion README.md
@@ -1,4 +1,76 @@
# MiniProject2: Discover a list of projects on SourceForge.net and GitLab.com
# MiniProject2: Phase2: Store info on NPM packages in MongoDB

## Task: Getting Release info from GitHub on NPM packages

### Resources:
NPM package list

The list of packages is unique to each of you:
`/data/NPMvulnerabilities/NPMpkglist/NPMpkglist_XX.gz`
where XX is a number from 0 to 33; find your number in the table below.
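
A minimal sketch of loading the list in Python, assuming each line of the `.gz` file holds one package name (the file's layout is not described here); `parse_package_lines` and `read_package_list` are hypothetical helper names.

```python
import gzip


def parse_package_lines(lines):
    """Return the stripped, non-empty package names from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def read_package_list(path):
    """Read a gzipped package list, assuming one package name per line."""
    # Mode "rt" decompresses the stream and yields text lines rather than bytes.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return parse_package_lines(f)
```

For example, student number 11 would call `read_package_list("/data/NPMvulnerabilities/NPMpkglist/NPMpkglist_11.gz")`.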

### Goal:
1. Download the npm data for all your packages and store it in the MongoDB database
`fdac18mp2`, collection `ghrel_yourutkid`.
2. Identify the packages that have GitHub repositories (based on the stored info):
```
# it has to contain a value in
record["collected"]["metadata"]["repository"]["url"]
"git+https://github.com//0-.git"
```
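
One way to identify the GitHub-hosted packages is to parse that stored URL into an `owner/repo` pair. This is a sketch: `github_repo` is a hypothetical helper, and the URL shapes it accepts (`git+https://github.com/...`, `git@github.com:...`, an optional `.git` suffix or `#fragment`) are assumptions about common npm metadata, not an exhaustive list.

```python
import re

# owner and repo are captured from URLs such as
# "git+https://github.com/owner/repo.git" or "git@github.com:owner/repo.git".
GITHUB_RE = re.compile(
    r"(?:^|[/@.])github\.com[/:]([^/]+)/([^/#?]+?)(?:\.git)?/?(?:[#?].*)?$"
)


def github_repo(record):
    """Return "owner/repo" if the record's repository URL is on GitHub, else None."""
    try:
        url = record["collected"]["metadata"]["repository"]["url"]
    except (KeyError, TypeError):
        # Missing fields, or "repository" stored as a plain string.
        return None
    if not isinstance(url, str):
        return None
    m = GITHUB_RE.search(url)
    return "{}/{}".format(m.group(1), m.group(2)) if m else None
```
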
3. For each such package, get a list of all releases using the GitHub API:
```
https://developer.github.com/v3/repos/releases/
```
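
The releases endpoint is paginated (the `Link` response header points to the next page), so collecting all releases means following those links. A standard-library sketch; `release_tags`, `fetch_json`, the optional token and `per_page=100` are assumptions, and the `fetch` parameter lets the pagination logic run against canned pages instead of the network.

```python
import json
import urllib.request

API = "https://api.github.com/repos"


def parse_next_link(link_header):
    """Return the rel="next" URL from a GitHub Link header, or None on the last page."""
    if not link_header:
        return None
    for part in link_header.split(","):
        pieces = part.split(";")
        url = pieces[0].strip().strip("<>")
        if any(p.strip() == 'rel="next"' for p in pieces[1:]):
            return url
    return None


def fetch_json(url, token=None):
    """GET url; return (decoded JSON, next-page URL or None). Raises on HTTP errors."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github.v3+json"})
    if token:
        req.add_header("Authorization", "token " + token)
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))
        return body, parse_next_link(resp.headers.get("Link"))


def release_tags(owner_repo, fetch=fetch_json):
    """Collect the tag_name of every release of owner/repo, following pagination."""
    url = "{}/{}/releases?per_page=100".format(API, owner_repo)
    tags = []
    while url:
        page, url = fetch(url)
        tags.extend(rel["tag_name"] for rel in page)
    return tags
```

To authenticate, pass a wrapper such as `fetch=lambda u: fetch_json(u, token=MY_TOKEN)`, where `MY_TOKEN` is a placeholder for your own token.
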
4. Find the number of commits between the latest release and each of the other releases.

For example:
https://api.github.com/repos/webpack-contrib/html-loader/compare/v0.5.4...master or https://api.github.com/repos/git/git/compare/v2.2.0-rc1...v2.2.0-rc2

More resources: https://stackoverflow.com/questions/26925312/github-api-how-to-compare-2-commits (see the part of the answer about comparing tags).

In the returned JSON, look for fields like these to get the number of commits between releases:
`"status": "ahead"`, `"ahead_by": 24`, `"behind_by": 0`, `"total_commits": 24`

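The comparison step can be sketched as follows, assuming the tag list is ordered newest first (check this against the order your release list actually has); `compare_url`, `latest_vs_others` and `commits_between` are hypothetical names. With the older release as base and the latest as head, `ahead_by` counts the commits reachable from the latest release but not from the older one.

```python
API = "https://api.github.com/repos"


def compare_url(owner_repo, base, head):
    """Build the GitHub compare URL for base...head (tags, branches, or SHAs)."""
    return "{}/{}/compare/{}...{}".format(API, owner_repo, base, head)


def latest_vs_others(owner_repo, tags):
    """Compare URLs pairing each older tag (base) with the latest tag (head).

    Assumes tags[0] is the latest release.
    """
    if len(tags) < 2:
        return []
    latest = tags[0]
    return [compare_url(owner_repo, older, latest) for older in tags[1:]]


def commits_between(compare_json):
    """Pull the commit counts out of a compare response; None if they are absent."""
    keys = ("status", "ahead_by", "behind_by", "total_commits")
    if not all(k in compare_json for k in keys):
        return None  # e.g. an error body such as {"message": "Not Found"}
    return {k: compare_json[k] for k in keys}
```
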
| number | GitHub Username | NetID | Name |
|:-:|:-:|:-:|---|
| 0 | 3PIV | pprovins | Provins IV, Preston |
| 1 | BrettBass13 | bbass11 | Bass, Brett Czech |
| 2 | CipherR9 | gyj992 | Johnson, Rojae Antonio |
| 3 | Colsarcol | cmawhinn | Mawhinney, Colin Joseph |
| 4 | EvanEzell | eezell3 | Ezell, Evan Collin |
| 5 | MikeynJerry | jdunca51 | Duncan, Jerry |
| 6 | Tasmia | trahman4 | Rahman, Tasmia |
| 7 | awilki13 | awilki13 | Wilkinson, Alex Webb |
| 8 | bryanpacep1 | jpace7 | Pace, Jonathan Bryan |
| 9 | caiwjohn | cjohn3 | John, Cai William |
| 10 | cflemmon | cflemmon | Flemmons, Cole |
| 11 | dbarry9 | dbarry | Barry, Daniel Patrick |
| 12 | desai07 | adesai6 | Desai, Avie |
| 13 | gjones1911 | gjones2 | Jones, Gerald Leon |
| 14 | herronej | eherron5 | Herron, Emily Joyce |
| 15 | hossain-rayhan | rhossai2 | Hossain, Rayhan |
| 16 | jdong6 | jdong6 | Dong, Jeffrey Jing |
| 17 | jyu25utk | jyu25 | Yu, Jinxiao |
| 18 | mkramer6 | mkramer6 | Kramer, Matthew S |
| 19 | mmahbub | mmahbub | Mahbub, Maria |
| 20 | nmansou4 | nmansou4 | Mansour, Nasib |
| 21 | nschwerz | nschwerz | Schwerzler, Nicolas Winfield William |
| 22 | rdabbs42 | rdabbs1 | Dabbs, Rosemary |
| 23 | saramsv | mousavi | Mousavicheshmehkaboodi, Sara |
| 24 | spaulsteinberg | ssteinb2 | Steinberg, Samuel Paul |
| 25 | zol0 | akarnauc | Karnauch, Andrey |
| 26 | zrandall | zrandall | Randall, Zachary Adams |
| 27 | lpassarella | lpassare | Passarella, Linsey Sara |
| 28 | tgoedecke | pgoedec1 | Goedecke, Trish |
| 29 | ray830305 | hchang13 | Chang, Hsun Jui |
| 30 | ssravali | ssadhu2 | Sadhu, Sri Ravali |
| 31 | diadoo | jpovlin | Povlin, John P |
| 32 | mander59 | mander59 | Anderson, Matt Mcguffee |
| 33 | iway1 | iway1 | Way, Isaac Caldwell |

# MiniProject2: Phase1: Discover a list of projects on SourceForge.net and GitLab.com


These two forges present two different types of data discovery challenges.
83 changes: 83 additions & 0 deletions compareRels.py
@@ -0,0 +1,83 @@
import sys, time
import pymongo
import requests
gleft = 1500

#client = pymongo.MongoClient ()
client = pymongo.MongoClient ('da1')
login = sys.argv[1]
passwd = sys.argv[2]

baseurl = 'https://api.github.com/repos'
headers = {'Accept': 'application/vnd.github.hellcat-preview+json'}

db = client['fdac18mp2'] # added in class
collName = 'releases_dbarry'
coll = db [collName]

def wait (left):
    # Block until at least 20 API requests remain: poll /rate_limit and,
    # if the quota is nearly used up, sleep until the reset time.
    while (left < 20):
        l = requests .get('https://api.github.com/rate_limit', auth=(login,passwd))
        if (l.ok):
            left = int (l.headers.get ('X-RateLimit-Remaining'))
            reset = int (l.headers.get ('X-RateLimit-Reset'))
            now = int (time.time ())
            dif = reset - now
            if (dif > 0 and left < 20):
                sys.stderr.write ("waiting for " + str (dif) + "s; " + str(left) + " requests left\n")
                time .sleep (dif)
        time .sleep (0.5)
    return left

def get (url):
    # Fetch a GitHub API URL and return the decoded JSON body
    # (an error object for non-2xx replies), or {} if the request fails.
    global gleft
    gleft = wait (gleft)
    try:
        r = requests .get (url, headers=headers, auth=(login, passwd))
        time .sleep (0.5)
        remaining = r.headers.get ('X-RateLimit-Remaining')
        if remaining is not None:
            gleft = int (remaining)
        return r.json ()
    except Exception as e:
        sys.stderr.write ("Could not get:" + url + ". Exception:" + str(e) + "\n")
        return {}

def chunks(l, n):
    # Split list l into consecutive pieces of at most n items (not used below).
    if n < 1: n = 1
    return [l[i:i + n] for i in range(0, len(l), n)]

def cmp_rel (url):
    # Print "url;ahead_by;behind_by" for one base...head comparison.
    v = get (url)
    if 'ahead_by' in v and 'behind_by' in v:
        print (url+';'+str(v['ahead_by'])+';'+str(v['behind_by']))
    else:
        sys.stderr.write ("Could not compare releases for: " + url + "; the request failed or the two versions share no common ancestor.\n")

# Read "owner/repo;tag" lines from stdin and group the tags by repository,
# keeping them in input order.
p2r = {}
for l in sys.stdin.readlines():
    l = l.rstrip()
    if ';' not in l:
        continue
    p, r = l.split(';', 1)
    if p in p2r:
        p2r[p] .append (r)
    else:
        p2r[p] = [r]

# Compare each release with the one listed just before it.
for p in p2r:
    rs = p2r[p]
    if len (rs) > 1:
        for i in range(1,len (rs)):
            url = baseurl + '/' + p + '/compare/' + rs[i-1] + '...' + rs[i]
            cmp_rel (url)
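
The waiting rule in `wait` can be pulled out into a pure function so it can be checked without calling the API. A sketch: `seconds_to_wait` is a hypothetical helper that accepts any mapping of header names, so a plain dict must use the capitalisation shown here (the `headers` of a `requests` response is case-insensitive, so `r.headers` can be passed directly).

```python
def seconds_to_wait(headers, now, threshold=20):
    """Seconds to sleep before the next request, given rate-limit response headers.

    Same rule as wait(): sleep until X-RateLimit-Reset (a Unix timestamp) only
    when fewer than `threshold` requests remain; otherwise don't sleep.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", threshold))
    reset = int(headers.get("X-RateLimit-Reset", now))
    if remaining < threshold and reset > now:
        return reset - now
    return 0
```

In the script this would be used as `time.sleep(seconds_to_wait(r.headers, int(time.time())))` after each request.
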

91 changes: 91 additions & 0 deletions dbarry.ipynb
@@ -0,0 +1,91 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Inserted all SourceForge URLs.\n",
"Inserted all GitLab URLs.\n"
]
}
],
"source": [
"########################################\n",
"# Name: Daniel Barry\n",
"# Professor: Dr. Audris Mockus\n",
"# Date: 10/15/2018\n",
"# MiniProject2\n",
"# The purpose of this code is to store\n",
"# the URLs from the project discovery\n",
"# in the class database.\n",
"########################################\n",
"\n",
"import pymongo\n",
"\n",
"## Database information for storage of discovery results.\n",
"dbname = \"fdac18mp2\"\n",
"sfcollname = \"sfprj_dbarry\"\n",
"glcollname = \"glprj_dbarry\"\n",
"\n",
"client = pymongo.MongoClient(host='da1')\n",
"db = client[dbname]\n",
"\n",
"## Switch to SourceForge database.\n",
"coll = db[sfcollname]\n",
"\n",
"## Read-in the URLs from the SourceForge list.\n",
"f = open(\"dbarry_sf_list.txt\", 'r')\n",
"lines = f.readlines()\n",
"f.close()\n",
"\n",
"## Store the URLs from the SourceForge List.\n",
"for line in lines:\n",
" word = line.split()\n",
" coll.insert_one({\"url\": word[2]})\n",
"\n",
"print(\"Inserted all SourceForge URLs.\")\n",
"\n",
"## Switch to GitLab database.\n",
"coll = db[glcollname]\n",
"\n",
"## Read-in the URLs from the GitLab list.\n",
"f = open(\"dbarry_gl_list.txt\", 'r')\n",
"lines = f.readlines()\n",
"f.close()\n",
"\n",
"## Store the URLs from the GitLab List.\n",
"for line in lines:\n",
" word = line.split()\n",
" coll.insert_one({\"url\": word[0]})\n",
"\n",
"print(\"Inserted all GitLab URLs.\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
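
The notebook's two loops differ only in which whitespace-separated field holds the URL (index 2 for `dbarry_sf_list.txt`, index 0 for `dbarry_gl_list.txt`), so they could share one helper. A sketch; `url_documents` is a hypothetical name, and unlike the notebook it skips blank or too-short lines instead of raising `IndexError`.

```python
def url_documents(lines, field):
    """Build one {"url": ...} document per line from the given whitespace-separated field.

    Lines that are blank or have too few fields are skipped.
    """
    docs = []
    for line in lines:
        words = line.split()
        if len(words) > field:
            docs.append({"url": words[field]})
    return docs
```

With a collection `coll`, the GitLab loop could become `with open("dbarry_gl_list.txt") as f: docs = url_documents(f, 0)` followed by `coll.insert_many(docs)` when `docs` is non-empty (`insert_many` rejects an empty list). This sends the documents in one batch instead of one `insert_one` call per line.
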
51 changes: 51 additions & 0 deletions dbarry_gl_list.txt
@@ -0,0 +1,51 @@
https://gitlab.com/jsnanigans/library-ts-rollup.git
https://gitlab.com/KawaiiDE/LxQt/libqtxdg.git
https://gitlab.com/KawaiiDE/LxQt/libfm-qt.git
https://gitlab.com/KawaiiDE/LxQt/liblxqt.git
https://gitlab.com/KawaiiDE/LxQt/libsysstat.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-config.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-archiver.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-admin.git
https://gitlab.com/KawaiiDE/LxQt/lximage-qt.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-about.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-panel.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-runner.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-powermanagement.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-session.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-globalkeys.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-notificationd.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-policykit.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-qtplugin.git
https://gitlab.com/KawaiiDE/LxQt/LXQt-graphics.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-themes.git
https://gitlab.com/KawaiiDE/LxQt/lxqt-build-tools.git
https://gitlab.com/UbikGames/LibNPengine.git
https://gitlab.com/UbikGames/Tools/libmpq.git
https://gitlab.com/UbikGames/Tools/libKotOR.git
https://gitlab.com/UbikGames/Tools/libdts.git
https://gitlab.com/UbikGames/Tools/LibNPA.git
https://gitlab.com/planktos/libraries.git
https://gitlab.com/zee220/libdocx.git
https://gitlab.com/wardlem/lambdash.git
https://gitlab.com/HeLuchao/lodash.git
https://gitlab.com/HeLuchao/lib-flexible.git
https://gitlab.com/worldofpeace/libsignon-glib.git
https://gitlab.com/Markkano/libs.git
https://gitlab.com/noahs-arc/libc-rust.git
https://gitlab.com/inivation/libcaer.git
https://gitlab.com/empower-stack/lib-go.git
https://gitlab.com/Strikhol/libmy.git
https://gitlab.com/etofigh/libexpr.git
https://gitlab.com/Ma_124/libdcbot.git
https://gitlab.com/chaos-siegen/project/bob3/libbob3.git
https://gitlab.com/bjmuld/libpsf-python.git
https://gitlab.com/bjmuld/libpsf-core.git
https://gitlab.com/sireciotti/library.git
https://gitlab.com/BangZ/lib-algorithm.git
https://gitlab.com/jorge.suit/libstudxml.git
https://gitlab.com/faneder/libcs50.git
https://gitlab.com/UbikBSD/System/libgdiplus.git
https://gitlab.com/levindoneto/libadm.git
https://gitlab.com/UbikBSD/System/libdds.git
https://gitlab.com/splash.jalj/libreria.git
https://gitlab.com/zcash-git/librustzcash.git