Friday December 29th

Data Continued…

This is really quite fun!

  • I’m really enjoying the Data Science work. It’s challenging, rewarding, but it also makes sense.

  • Today, I analyzed some wine data

We started out with two CSV files

  • The values were initially separated by semicolons rather than commas, so I had to fix that!

  • There were two separate data files with similar columns: one for red wine and one for white. We were tasked with adding a column called “color” to each file to record which wine it came from, and then merging the two datasets together, which makes sense.

  • Observed the data, added the “color” column (red or white) to each dataset, and merged the two into a new edited file containing both datasets

  • I verified that both datasets were, in fact, appended by looking at the head and tail of the dataset
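The steps above might look roughly like this in pandas. This is only a minimal sketch, not the exact notebook code: the two tiny in-memory CSVs, their column names, and the variable names are all made up to stand in for the real red and white files.

```python
import io

import pandas as pd

# Stand-ins for the two semicolon-separated files (made-up sample rows)
red_csv = io.StringIO("fixed_acidity;quality\n7.4;5\n7.8;6\n")
white_csv = io.StringIO("fixed_acidity;quality\n7.0;6\n6.3;7\n8.1;5\n")

# sep=";" tells pandas the delimiter is a semicolon, not a comma
red_df = pd.read_csv(red_csv, sep=";")
white_df = pd.read_csv(white_csv, sep=";")

# Add a "color" column so each row remembers which file it came from
red_df["color"] = "red"
white_df["color"] = "white"

# Stack the white rows under the red rows and renumber the index
wine_df = pd.concat([red_df, white_df], ignore_index=True)

# The head should show red rows and the tail should show white rows
print(wine_df.head(2))
print(wine_df.tail(2))
```

Since the red rows go in first, seeing "red" at the head and "white" at the tail is a quick sanity check that both files made it into the combined frame.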

I made a mistake!

  • On the first try, I accidentally set the header to false, so the edited file had no header row for the columns. Yikes!

  • Fixed it! That meant updating the edited file! I also had to rename a column before I could combine everything into the one file. I was silly and didn’t notice at first that combining had left behind a column of NaN values under the old name, so I had to delete that column, which brought me back to the 13 columns of data, properly named.

  • So far, so good!
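Both slip-ups are easy to reproduce on a toy example. The sketch below assumes pandas; the column names `total-so2` and `total_so2` are hypothetical, since the real mismatched column isn't named above, and it shows renaming before combining rather than deleting the stray column afterwards.

```python
import pandas as pd

# Made-up frames: same kind of data, but one column name is spelled differently
red_df = pd.DataFrame({"total-so2": [34.0, 67.0], "color": ["red", "red"]})
white_df = pd.DataFrame({"total_so2": [170.0, 132.0], "color": ["white", "white"]})

# Combining without renaming keeps both spellings and pads the gaps with NaN
messy_df = pd.concat([red_df, white_df], ignore_index=True)
print(messy_df.columns.tolist())

# Renaming first makes the columns line up, so no extra NaN column appears
red_df = red_df.rename(columns={"total-so2": "total_so2"})
wine_df = pd.concat([red_df, white_df], ignore_index=True)

# header=False leaves the column names out of the CSV; the default keeps them
no_header = wine_df.to_csv(index=False, header=False)
with_header = wine_df.to_csv(index=False)
print(with_header.splitlines()[0])
```

If the stray column is already there, `messy_df.drop(columns=["total-so2"])` would remove it, but only after the values have been moved across to the correctly named column.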

Visualizations

  • Learned how to do some simple diagrams / visualizations using Seaborn

Common Functions

  • This uses a different dataset, one associated with EPA data and carbon emissions.

  • Checking for non-null values

  • Checking for dupes
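For reference, those two checks might look something like this in pandas. The rows and column names here are invented for illustration and are not taken from the EPA dataset.

```python
import pandas as pd

# Made-up rows standing in for the emissions data
df = pd.DataFrame({
    "model": ["A", "B", "B", "C"],
    "co2": [250.0, 310.0, 310.0, float("nan")],
})

# Count of non-null values per column
non_null_counts = df.notnull().sum()
print(non_null_counts)

# Count of missing values per column
null_counts = df.isnull().sum()
print(null_counts)

# Number of rows that exactly repeat an earlier row
dupe_count = df.duplicated().sum()
print(dupe_count)

# Drop the repeats, keeping the first occurrence of each row
deduped = df.drop_duplicates()
```

`df.info()` also prints the non-null count for every column in one go, which is handy for a first look.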

What I like so far

  • The cells in Jupyter are great in that they allow you to focus on single, specified tasks rather than looking at pages of an intimidating code-base.

  • It also is quite functional; you’re chaining functions together (e.g. .sum().mean()). Haven’t written a single self.ihatemylife yet :D
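A tiny example of that kind of chaining, assuming pandas and a made-up two-column frame:

```python
import pandas as pd

# Made-up numbers, just to show the chaining
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# .sum() collapses each column to its total and returns a Series;
# .mean() then averages those column totals
result = df.sum().mean()
print(result)
```

Each call hands its result straight to the next one, so there is no intermediate variable to name or keep track of.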

To do

  • Finish up chapter, which includes a lot of SQL and more Case Studies
  • Finish up project 1, project 2
  • Finish up Data Analysis coursework from other shorter course (project 3 and 4)
  • Finish up application
  • Prepare for interviews (I have two on 1/1)! bites nails

Katas

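  • Count consecutive pairs given a list: pair up the elements, e.g. [1,2,3,4,5] becomes [(1,2), (3,4), 5], and count the pairs whose values differ by 1; count = 2 (Python)
def pairs(ar):
  # Pair up elements at even/odd positions; zip drops a trailing unpaired element
  count = 0
  for a, b in zip(ar[0::2], ar[1::2]):
    # A pair is consecutive when its two values differ by exactly 1
    if abs(a - b) == 1:
      count += 1
  return count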
  • Return the elements that are multiples of their own index (JavaScript)
function multipleOfIndex(array) {
  var arr = [];
  for (var i = 0; i < array.length; i++) {
    // At i = 0, array[0] % 0 is NaN, so the first element is never kept
    if (array[i] % i === 0) {
      arr.push(array[i]);
    }
  }
  return arr;
}

And, that’s about it

Written on December 29, 2017