Friday December 22nd

Scraping the Internet

Space things…

  • I attended the German club last night, and it was Amaaaaazing! (nope, cat didn’t walk over my keyboard!). Specifically, my friend who works on electronic component testing was there; he’s been hired full-time at JPL! I also discovered that a lady I’ve been talking to is someone I may have met before, in 2016, when I was at JPL for my four-day workshop. We gave her a ride home yesterday evening, and I put two and two together. Hmm!
  • I also met a roboticist who is working on co-axial drones for Titan! He was amazing to speak with, and I immediately lit up, and before we knew it, the entire table was talking about ways to solve problems with drone technologies, and challenges, and different planetary atmospheres. It was really incredible! I mentioned a geodesic drone I had seen once, and he immediately wanted to know what it looked like. He was just really cool and he just happened to be there at another table and had stopped by to say hello! It was a really fun night!

Oh yeah… Hi all!

  • Been working through my Nanodegree today, and went through the Python module. It was actually pretty rough, but worth it. I say “rough”, because it seemed to hit upon all my weaknesses, so I know what I need to practice now, particularly for data analysis.
  • I’ve sort of been cheating by using lists everywhere to avoid dictionaries, and that sort of thing. But for data analysis, you def need to know your dictionaries (hash maps) and tuples.
  • That being said, I got to the end of the Python module and came across web scraping, which is amazing. I don’t know why, in the past, I filed it in my mind under “chatbot”, which lives in a part of my mind labelled “refuse (aka barf) knowledge”, because man, it is fun!
  • I enjoyed it so much (even though it was just a taste, and we’ll be using it for the project!) that I searched and found a book on web scraping in Python, and decided to go through all 256 pages!
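To make the lists-versus-dictionaries point concrete, here is a minimal sketch of the same counting job done both ways. The sample sightings and both function names are made up for illustration; they are not from the course. The keys are tuples, which works because tuples are hashable and lists are not.

```python
# Hypothetical sample data: (planet, moon) pairs stored as tuples.
observations = [
    ("Saturn", "Titan"),
    ("Jupiter", "Europa"),
    ("Saturn", "Enceladus"),
    ("Saturn", "Titan"),
]


def count_with_lists(pairs):
    """List-only version: two parallel lists and a linear search per item."""
    keys, counts = [], []
    for pair in pairs:
        if pair in keys:
            # Find the position of the pair, then bump the matching count.
            counts[keys.index(pair)] += 1
        else:
            keys.append(pair)
            counts.append(1)
    return list(zip(keys, counts))


def count_with_dict(pairs):
    """Dictionary version: the tuple itself is the key, so lookup is direct."""
    counts = {}
    for pair in pairs:
        counts[pair] = counts.get(pair, 0) + 1
    return counts


print(count_with_lists(observations))
print(count_with_dict(observations))
```

Both give the same tallies, but the dictionary version is shorter and does not rescan a list for every item, which matters once the data gets big.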

So far

  • I’m learning how to parse the HTML on pages and find text, tags and other such useful information. For example, this is a little script I made to find all the images on one of the author’s webpages and output the number of images on that page.

Code
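A minimal sketch of that kind of image counter, not the exact script. It assumes a small inline HTML string instead of a live page, and it uses only the standard library's `html.parser` so it runs without installing anything (with BeautifulSoup, `soup.find_all("img")` would do the same job). The names `ImageCounter`, `count_images` and `SAMPLE_HTML` are made up for illustration.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs for the tag.
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


def count_images(html):
    """Return (number of images, list of their src values)."""
    parser = ImageCounter()
    parser.feed(html)
    parser.close()
    return len(parser.sources), parser.sources


# Assumed sample markup standing in for a real page.
SAMPLE_HTML = """
<html><body>
  <h1>Gifts</h1>
  <img src="img/logo.jpg">
  <p>Some text</p>
  <img src="img/gift1.jpg" alt="gift">
  <img src="img/gift2.jpg"/>
</body></html>
"""

total, sources = count_images(SAMPLE_HTML)
print("Images found:", total)
for src in sources:
    print(" ", src)
```

For a real page, the HTML string would come from fetching the URL first (for example with `urllib.request.urlopen`) and decoding the response before passing it to `count_images`.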

Output

  • I had to watch out at first, because both Python 2 and Python 3 are on Cloud9.
    So I’d install BeautifulSoup for Python 2, switch back to Python 3, and then it would say module not found. Awesome :D
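A quick way to see which interpreter is actually running and whether it can find a module is sketched below. This is a hedged example for Python 3: the function name `describe_environment` is made up, and `bs4` is assumed to be the import name of the BeautifulSoup package.

```python
import importlib.util
import sys


def describe_environment(module_name="bs4"):
    """Report the running interpreter and whether it can import module_name."""
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "executable": sys.executable,
        # find_spec returns None when the top-level module is not installed
        # for this particular interpreter.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


info = describe_environment()
for key, value in info.items():
    print(key, "=", value)
```

If `module_found` comes back `False`, installing with `python3 -m pip install beautifulsoup4` ties the install to the same interpreter that runs the script, which avoids the Python 2 versus Python 3 mix-up.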

  • I’ve been using Cloud9 mainly, because I jump between so many different workstations, and my most frequently used laptop has close to no storage, to the point that I have to clear my cache on Chrome and restart it to get a little bit more storage (yeah, you can laugh; I won’t judge you LOL).
  • I’d like to get a more powerful laptop, but maybe in 2018. My goal this past year was to get my green card, so I have a bit more freedom and can get an opportunity where I’m more valued and therefore more financially secure, which I can definitely do in 2018 (and I’ll have the time to search and be pickier, which is often a luxury when you’re on a work or student visa).
  • I won’t stop my side hustles, though, or generally working to be a better programmer. And, of course, I won’t stop being thrifty me. I’m definitely not one of those females who buy a lot of clothes… or shoes… or anything like that. I’d probably buy a book or a new laptop before any of that stuff (or a bus ticket to some conference). A great teacher and friend once told me “if you have the opportunity, always invest in yourself”. I still believe this with every fibre of my being, and so far, so good.
  • Also, yeah… mentorship and involvement. My Rust group moved to a closer location, so I’m tempted to go to some of those meetings in the New Year, and continue to mentor and do more things like Hackathons, if I can. We’ll see. Deepening my understanding of Python and Haskell (and continuing to learn things like Data Structures and Discrete Structures in C++) comes first, though.

Upcoming

  • I’m working on project 1 this weekend, which involves some SQL and possibly Excel.
  • Next week, since I’ve already completed the Python module, I’ll either finish up that project (if I don’t complete it this weekend) or work on project 2. I’ll also dig into the next module, which is Data Analysis. After that, there’s a project and Statistics coursework.
  • We’re using Jupyter substantially after that, which is great! Jupyter is like LaTeX to me. It’s just so beautiful and organized as a format.
  • I’m also working on another Python Data Analysis course at the same time, which is just about four weeks long. It doesn’t officially open until tomorrow, but I have done week one’s quiz already.
  • There’s an online PureScript meeting tomorrow. I’ll be sure to attend that. I keep getting roped into these functional-ly things. I actually saw a Quora question a while ago that a student posted, where they couldn’t decide between Python, Haskell and JavaScript, and asked which two they should learn. I secretly said “Python and Haskell”, and one of the people answering the OP said “if you want to be a data scientist, Python and Haskell”. And I went “huh…”, and smiled (those happen to be my two dream languages, with some C++ thrown in!) :D
  • Oh, and I have to follow up with about three or four companies in the New Year that I’ve been speaking with, to see if there is something out there for me. I’m really grateful for everything, and we’ll see what 2018 brings!

And…that’s about it!

Written on December 22, 2017