Hey there Devvers! Long time, no see! I have been on vacation for the past while and thought I would return with my first open-source project, Slither. Slither is a basic anonymizing framework for adding elite, anonymous HTTPS proxy IPs and pseudo-random User-Agents to your web scraping and/or pen-testing projects! It's finally up and running and is based heavily on the Bye Bye 403 series that I've been writing. Slither is my first foray into OSS, something I deeply care about, and I hope folks find it useful!
The GitHub repo can be found here and I really hope you all enjoy it! A lot of requests and questions have come from around the web about this topic and how to make it easier to avoid the dreaded 403 (or worse, 503) when you're trying to scrape and/or aggregate data! The framework is dead simple and supports concurrent scraping as well as parallel processes.
Whenever an instance of the Slither class is declared, a list of IPs and User-Agents is pulled from proxy sites around the web and assigned to the Slither().ip and Slither().ua variables. Simply plug those two variables into your project's headers and you're off and running!
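To make "plug those two variables in" a bit more concrete, here is a minimal sketch of one way the values could be wired into a request. It does not import Slither itself: `FakeSlither` is a hypothetical stand-in that only mimics the `.ip` and `.ua` attributes described above, and both the `host:port` format of `.ip` and the `build_request_kwargs` helper are assumptions made for illustration, so check the repo for what the real class returns.

```python
from dataclasses import dataclass


@dataclass
class FakeSlither:
    """Hypothetical stand-in exposing the two attributes the post describes."""
    # Assumption: the proxy comes back as "host:port". This address is from a
    # reserved documentation range, not a working proxy.
    ip: str = "203.0.113.10:8080"
    ua: str = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def build_request_kwargs(s):
    """Turn .ip and .ua into keyword arguments shaped for requests.get()."""
    proxy_url = f"http://{s.ip}"
    return {
        # The User-Agent travels as an ordinary request header.
        "headers": {"User-Agent": s.ua},
        # The proxy is passed separately, keyed by URL scheme.
        "proxies": {"http": proxy_url, "https": proxy_url},
        "timeout": 10,
    }


kwargs = build_request_kwargs(FakeSlither())
print(kwargs["headers"]["User-Agent"])
```

With the third-party `requests` library installed, the result could then be used as `requests.get("https://example.com", **kwargs)`. Note that in `requests` the User-Agent goes in `headers` while the proxy address goes in the separate `proxies` argument.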
I really hope this helps some newcomers to web scraping and the emerging field of collecting data for data scientists and ML engineers to use! Please give it a try and leave a comment here or on the repo. Be gentle as this is my first OSS project despite years of working in software 😝. Enjoy and happy scraping!
Great! Yet another tool to train my IP blacklisting app with. Machine learning model against spam generator... Perfect!