Set User Agent on pandas read_csv
Waylon Walker

Publish Date: Mar 28 '22

I keep a small cars.csv on my website for quickly trying out different pandas operations. It's very handy to keep around to figure out what an unfamiliar method does, or to give a teammate an example they can replicate.

Hosts switched

I recently switched hosting from Netlify over to Cloudflare. Cloudflare does some work to block requests that it does not think come from a real user. One of these checks is to ensure there is a real user agent on the request.

Not my go-to dataset 😭

This breaks my go-to example dataset.

import pandas as pd

pd.read_csv("https://waylonwalker.com/cars.csv")

# HTTPError: HTTP Error 403: Forbidden

But requests works???

What's weird is that requests still works just fine! I'm not sure why using urllib the way pandas does breaks the request, but it does.

import requests

requests.get("https://waylonwalker.com/cars.csv")

<Response [200]>
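One plausible difference is the default User-Agent each library sends. I have not confirmed that this is what Cloudflare keys on, so treat the following as a sketch for inspecting the defaults rather than an explanation. It only assumes the standard library; the requests part is skipped if requests is not installed.

```python
import urllib.request

# urllib's default opener carries one default header: its User-Agent.
default_headers = dict(urllib.request.build_opener().addheaders)
urllib_user_agent = default_headers["User-agent"]
print(urllib_user_agent)  # something like Python-urllib/3.x

# requests is optional here; if installed, it reports its own default.
try:
    import requests

    print(requests.utils.default_user_agent())
except ImportError:
    pass

# An explicit header on a Request replaces urllib's default User-Agent.
# urllib stores header names capitalized, hence "User-agent" on lookup.
req = urllib.request.Request(
    "https://waylonwalker.com/cars.csv",
    headers={"User-Agent": "Mozilla/5.0"},
)
print(req.get_header("User-agent"))
```

Nothing here touches the network; it only shows which User-Agent string would be sent.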

Setting the User Agent in pandas.read_csv

This fixed the issue for me!

After a bit of googling I realized that this is a common thing, and that setting the user agent fixes it. This is when I remembered seeing in the Cloudflare dashboard that they protect against a lot of different attacks. Apparently it treats pd.read_csv as an attack on my Cloudflare Pages site.

pd.read_csv("https://waylonwalker.com/cars.csv", storage_options={"User-Agent": "Mozilla/5.0"})

# success

Now my data is back

Now this works again, but it feels like a bit more effort than I want to type out by hand every time. I might need to look into my Cloudflare settings to see if I can allow this dataset to be accessed by pd.read_csv.
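Until then, one way to cut down on the typing is to bake the header into a reusable function. This is a minimal sketch, not something from my actual setup: the names BROWSER_HEADERS and read_csv_ua are made up for illustration, and it assumes a pandas version recent enough to forward storage_options as headers for HTTP(S) URLs (1.3 or newer, as far as I know).

```python
from functools import partial

import pandas as pd

# Hypothetical names for illustration; the header value matches the fix above.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}

# A read_csv variant that always sends the browser-like User-Agent.
# Passing storage_options explicitly at call time still overrides this default.
read_csv_ua = partial(pd.read_csv, storage_options=BROWSER_HEADERS)

# Usage (makes a network request, so it is left commented out here):
# cars = read_csv_ua("https://waylonwalker.com/cars.csv")
```

Every other read_csv keyword argument passes straight through, so read_csv_ua(url, usecols=[...]) behaves like the original call with the header added.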
