Attention Web Scrapers and Pen Testers: Slither is now a PyPI package! 🎉

Publish Date: Jul 12 '19

Hey data science, web automation, web scraping, and data aggregation folks. Are you tired of purchasing proxy IP addresses that get blocked by your target web asset within a couple of days at most? Do you not yet have your own solution for cycling IP addresses and/or user agents? Do you like super salesy pitches like this one and tend to buy things from QVC after being asked silly rhetorical questions?! Well then, have I got great news for you!

All kidding aside, I've finally gotten around to uploading my proxy IP and user-agent cycling library, Slither, to PyPI! To check out the GitHub repo, go here; for the PyPI page, head here.

Only Python 3 is supported, and no support for Python 2 is planned. This is my small way of doing my part to encourage Python 3 use over Python 2. To install it in your next project in a Python-3-only environment:

pip install slitherlib

For an environment with both Python 2 and Python 3 installed:

pip3 install slitherlib

To actually use the library in your scraping projects:


from slitherlib.slither import Snake
from random import choice

import requests

# Snake() exposes two lists of strings: s.ips (proxy "ip:port" pairs) and s.uas (user agents)
s = Snake()
ip_address = choice(s.ips)
user_agent = choice(s.uas)

headers = {
    "User-Agent": user_agent
}

# Route the request through the chosen proxy and send the spoofed user agent
r = requests.get('https://www.google.com',
                 proxies={'https': ip_address,
                          'http': ip_address},
                 headers=headers)

At this time, Slither pulls IP addresses and User-Agents from free sources around the web and dumps them into two variables, ips and uas. We add new proxy ip:port sources as we find them and verify, to the best of our ability, that they are not run by hackers looking to steal IP address information.
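Because these are free, public proxies, a fair number of them will be dead or slow at any given moment, so it's worth retrying with a fresh pick whenever a request fails. Here's a minimal sketch of that pattern; the fetch_with_retries helper below is just my own wrapper around the ips and uas lists, not part of Slither itself:

from random import choice

import requests

from slitherlib.slither import Snake

s = Snake()

def fetch_with_retries(url, retries=5, timeout=10):
    """Try up to `retries` random proxy/user-agent pairs until one succeeds."""
    for _ in range(retries):
        proxy = choice(s.ips)
        headers = {"User-Agent": choice(s.uas)}
        try:
            return requests.get(url,
                                proxies={"http": proxy, "https": proxy},
                                headers=headers,
                                timeout=timeout)
        except requests.RequestException:
            continue  # dead or blocked proxy -- pick a new one and try again
    raise RuntimeError("No working proxy found after {} attempts".format(retries))

In practice you would probably also want to drop proxies that fail repeatedly, but a random re-pick like this is enough to get started.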

As this project grows, we hope to build it into a full web-scraping suite with easy concurrency and multiprocessing, robots.txt support, webdriver browser automation, dynamic mouse movements, and other goodies that will keep the data-collection enthusiast collecting data more and fighting 403 and 404 codes less!

If you like it, please give us a star on GitHub! I welcome bug reports, feature requests, and any comments or concerns you have so that I can make this library the best it can be! And, as always, I LOVE to collaborate, so feel free to open a PR if you have improvements or ideas!

Comments (4 total)

  • Mpho Mphego · Aug 13, 2019

    Does this package work with Selenium Firefox proxy authentication?

    • kaelscion · Aug 13, 2019

      The framework returns proxy ip:port combinations as a list of strings. Yes, it can be used with Selenium Firefox, since the IP and User-Agent overrides/arguments accept string values.

      Basically, treat the Snake() object as a curated list of IP and UA choices that can be used anywhere a string object is accepted as an IP and/or UA argument.
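      For example, here's a rough sketch of plugging a Slither proxy and user agent into a Firefox profile. This uses Selenium's standard FirefoxProfile preference API rather than anything Slither provides, and assumes the unauthenticated ip:port proxies Slither collects:

      from random import choice

      from selenium import webdriver
      from slitherlib.slither import Snake

      s = Snake()
      host, port = choice(s.ips).split(":")

      profile = webdriver.FirefoxProfile()
      # Point Firefox at the chosen proxy for both HTTP and HTTPS traffic
      profile.set_preference("network.proxy.type", 1)
      profile.set_preference("network.proxy.http", host)
      profile.set_preference("network.proxy.http_port", int(port))
      profile.set_preference("network.proxy.ssl", host)
      profile.set_preference("network.proxy.ssl_port", int(port))
      # Override the default user agent with one pulled from Slither
      profile.set_preference("general.useragent.override", choice(s.uas))

      driver = webdriver.Firefox(firefox_profile=profile)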

      Were you running into a particular issue using Slither in your Selenium Firefox project?

      • Mpho Mphego · Aug 13, 2019

        Beautiful, haven't used it yet, but looking forward to it. I stumbled upon your YouTube video on Reddit.

        Great content, keep it up.

        • kaelscion · Aug 13, 2019

          Thanks so much! I've got a few videos up and always love to hear when people enjoy my content!
