Thursday, November 27, 2014

Here I Go Again

As I recently said on Twitter, web scraping with Python has replaced QGIS as my latest digital problem. At least the maps showed me that I was making progress.

Little has changed in 2 weeks.

I spent about an hour and a half in the latest DH Anonymous group with Justin and Sarah trying to figure out how to go about this whole Python gig. My first disconnect came from the tutorials. They let you work online in a simulated command line that doesn't seem to replicate what I regularly see on my computer or how Python works on my machine. I've realized that many things will differ from the tutorial when working with the program directly, but I already needed help figuring out the basics. Thankfully, Justin got me through most of that.

I installed all the necessities: BeautifulSoup, pip, Python itself. After some searching, video watching, and tutorial reading, I found some code that seemed to accomplish the beginning steps of the process. The website I'm planning to scrape is relatively simple (www.sacred-texts.com/neu/ascp), and I was able to get the link for each item on the landing page, along with the text of each list entry. However, I couldn't find a way to change my code so that Python would follow those links and scrape the contents of the pages behind them. I had spent several hours on Tuesday night just trying to work with the code, add things to it (that probably didn't make much sense), and continue to research the use of Python with web scraping. It just wasn't working. Well, lo and behold, Wednesday's class (and Dr. Gibbs) revealed to me that I wasn't using the proper area to work with Python (Command prompt rather than Command line) and I was making all of this far too complicated.
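For anyone stuck at the same point, here is a rough sketch of the two-step idea: collect the links from the landing page, then visit each link and keep what comes back. It is not the code from the tutorials or from class. To stay runnable without extra installs it uses Python's built-in `html.parser` in place of BeautifulSoup, and the names `LinkCollector`, `extract_links`, `fetch`, and `scrape_linked_pages` are made up for illustration. It also assumes every `<a href>` on the landing page is a link worth following, which a real page may not satisfy.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects (absolute_url, link_text) pairs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> tag we are currently inside, if any
        self._text = []     # pieces of text seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into full URLs.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Step 1: return every link on a page as (url, text)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch(url):
    """Download a page and return its HTML as a string."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_linked_pages(index_url, fetch_page=fetch):
    """Step 2: follow each link found on the index page and keep its HTML."""
    pages = {}
    for url, title in extract_links(fetch_page(index_url), index_url):
        pages[url] = (title, fetch_page(url))
    return pages
```

Calling `scrape_linked_pages("http://www.sacred-texts.com/neu/ascp/")` would be the starting point for the site above; the `http://` prefix and the trailing slash are assumptions added here so that relative links resolve inside that folder. The `fetch_page` parameter exists only so the function can be tried against saved HTML without touching the network.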

After about an hour's worth of Q&A with Gibbs after class, I had a grasp on what I was doing and where I needed to go. I haven't gone back to it just yet (mostly because it's Thanksgiving), but I'm determined to figure this out. It's the last thing I've got left to check off my list for my digital portfolio, so it's going to get done. Lesson of the fail: quit over-thinking!
