Web Scraping 101 with Python March 27, 2015

Posted by sandyclaus in Academic Technology, Programming, Python.
Now that we have the packages we need, we can start scraping. But first, a few rules.

Check a site’s terms and conditions before you scrape it. It’s their data, and they likely have rules governing its use.

Be nice – A computer can send web requests far faster than a person can. Space out your requests so that you don’t hammer the site’s server.

Scrapers break – Sites change their layout all the time. If that happens, be prepared to rewrite your code.

Web pages are inconsistent – Even after you’ve gotten your data, some manual cleanup is often still needed.
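
To make a few of these rules concrete, here is a minimal sketch using only the Python standard library. It is not the code from the original post: the names `Throttle`, `clean_text`, and `fetch_all`, and the choice of delay, are all assumptions made for illustration. It spaces out requests, tolerates pages that fail to load, and does a first pass of whitespace cleanup.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum gap (in seconds) between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def clean_text(raw):
    """Treat non-breaking spaces as ordinary spaces and collapse whitespace runs."""
    return " ".join(raw.replace("\xa0", " ").split())


def fetch_all(urls, throttle, timeout=10):
    """Fetch each URL in turn, pausing between requests; failed pages map to None."""
    pages = {}
    for url in urls:
        throttle.wait()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                pages[url] = resp.read().decode("utf-8", errors="replace")
        except OSError:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            pages[url] = None
    return pages
```

Calling `fetch_all(urls, Throttle(2.0))` would leave at least two seconds between requests. The right delay depends on the site, so treat that number as a placeholder rather than a recommendation.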

via Web Scraping 101 with Python.
