This package is developed by Greendeck
https://pypi.org/project/greendeck-proxygrabber/
pip install greendeck-proxygrabber
WHATS NEW?
Added proxy grabbing support of 4 new regions to proxy service, proxy grabber and proxy scraper.
Proxy service is a service that keeps and updates a Mongo Database with latest up and running proxies.
from greendeck_proxygrabber import ProxyService
service = ProxyService(MONGO_URI = 'mongodb://127.0.0.1:27017',
                       update_time = 300,
                       pool_limit = 1000,
                       update_count = 200,
                       database_name = 'proxy_pool',
                       collection_name_http = 'http',
                       collection_name_https = 'https',
                       country_code = 'ALL'
                       )This creates a service object.
- update_time = Time after which proxies will be updated (in seconds)
- pool_limit = Limit after which insertion will change to updating
- update_count = Number of proxies to request grabber at a time
- database_name = Mongo Database name to store proxies in
- collection_name_http = Collection name to store http proxies in
- collection_name_https = Collection name to store https proxies in
- country_code = ISO code of one of regions supported
List of supported regions is:
- Combined Regions: ALL
- United States: US
- Germany: DE
- Great Britain: GB
- France: FR
- Czech Republic: CZ
- Netherlands: NL
- India: IN
service.start()Starting service gives the following output:
MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Service...
This will run forever and will push/update proxies in mongodb after every {update_time} seconds.
Proxy to mongo is a functionality that lets you grab a set of valid proxies from the Internet and store it to the desired MongoDB database. You can schedule this to update or insert a given set of proxies to your database of pool, i.e. put it on airflow or any task scheduler.
from greendeck_proxygrabber import ProxyService
service = ProxyToMongo( MONGO_URI = MONGO_URI,
                        pool_limit = 1000,
                        length_proxy = 200,
                        database_name='proxy_pool',
                        collection_name_http='http',
                        collection_name_https='https',
                        country_code='DE'
                        )This creates a service object.
- pool_limit = Total number of proxies to keep in mongo/pass None if you don't want to update
- length_proxy = Number of proxies to fetch at once
- database_name = Mongo Database name to store proxies in
- collection_name_http = Collection name to store http proxies in
- collection_name_https = Collection name to store https proxies in
- country_code = ISO code of one of regions supported
List of supported regions is:
- Combined Regions: ALL
- United States: US
- Germany: DE
- Great Britain: GB
- France: FR
- Czech Republic: CZ
- Netherlands: NL
- India: IN
service.get_quick_proxy()Starting Grabber gives the following output:
MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Grabber...
This will run forever and will push/update proxies in mongodb after every {update_time} seconds.
from greendeck_proxygrabber import ProxyGrabbergrabber = ProxyGrabber(len_proxy_list, country_code, timeout)Here default values of some arguments are,
len_proxy_list = 10
country_code = 'ALL'
timeout = 2
Currently the program only supports proxies of combined regions
The grab_proxy grab_proxy() function helps to fetch the proxies.
grabber.grab_proxy()This returns a dictionary of the following structure:
{
    'https': [< list of https proxies >],
    'http': [< list of http proxies >],
    'region': 'ALL' # default for now
}The grab_proxy proxy_scraper() method of ScrapeProxy helps to fetch the proxies.
This returns a list of 200 proxies of both type http and https.
from greendeck_proxygrabber import ScrapeProxy
proxies_http, proxies_https = ScrapeProxy.proxy_scraper()This returns list of proxies of type http proxies followed by https proxies.
http_proxies = [< list of http proxies >]
https_proxies = [< list of https proxies >]
The proxy_checker_https and proxy_checker_http methods from ProxyChecker class helps to validate the proxies.
Given a list of proxies, it checks each of them to be valid or not, and returns a list of valid proxies from the proxies feeded to it.
from greendeck_proxygrabber import ProxyChecker
valid_proxies_http = ProxyChecker.proxy_checker_http(proxy_list = proxy_list_http, timeout = 2)
valid_proxies_https = ProxyChecker.proxy_checker_https(proxy_list = proxy_list_https, timeout = 2)π How to build your own pip package
- open an account here https://pypi.org/
In the parent directory
- python setup.py sdist bdist_wheel
- twine upload dist/*
references
MADE WITH π BY Gagan
