## Designing My Own “Data Science & Machine Learning” Program

I have already completed a postgraduate degree in business administration with a specialization in finance, which included some exposure to analytics. Three years after finishing it, I got the itch to learn data science. That said, I can't imagine going back for another PG program, but that should not stop me from learning the subject.

In the age of MOOCs, online courses, YouTube and blogs, most learning resources are available to everyone. Yes, it is a pain to identify the good ones, but they are out there. I decided to go through the curricula of some reputed Data Science programs and derive my own Data Science program from freely available resources. I considered the following programs:

• Master of Science in Business Analytics by UT at Austin
• MS in Data Science by NYU
• Master of Science in Data Science by Columbia University
• MS in Computational Data Science by Carnegie Mellon University
• Certificate Program in Business Analytics by ISB

• http://datasciencemasters.org/
• http://www.datasciguide.com/

These road-maps are huge. In fact, their authors mention that it is not humanly possible to go through every resource, so whoever follows them should do so selectively.

The list of machine learning topics is huge. If you try to learn everything, you will lose momentum, so when you start learning machine learning, keep the following points in mind.

First and most importantly, choose your niche and start working on it. If you have not selected a niche yet, read more about the different subtopics, but don't delay the decision.

Second, don't wait until you have finished all the learning; get your hands dirty as early as possible.

Data Science in Python by University of Michigan

Month#1

Python basics, NumPy, SciPy, Pandas and Matplotlib, plus basic statistics, using the following courses:

Introduction to Data Science in Python

Basic Statistics

Applied Plotting, Charting & Data Representation in Python

You can take the Applied Plotting, Charting & Data Representation course at a later stage, but I recommend finishing the Python and Basic Statistics courses before you start the machine learning course.

Month#2

Applied Machine Learning in Python

I am still working out the next steps and will add those sections once I have figured them out. Since we now have basic programming skills and a basic understanding of machine learning, this is the right time to learn the required mathematics, statistics and probability before we venture further.


## Important blogs and Websites for Data Science

### Free Books (All legit)

• Mathematics and Statistics
• Programming

### Data Science Blogs to Follow

## fuzzywuzzy: string matching in Python

Many times, when dealing with text analytics, we need to compare pieces of text. There are multiple algorithms and approaches for the job. Let's have a look at the fuzzywuzzy library.
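A classic algorithm for this kind of comparison is the Levenshtein edit distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The plain-Python dynamic-programming sketch below only illustrates that idea; the function name `levenshtein` is my own, and this is not the code fuzzywuzzy runs internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] = distance between the previous prefix of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("ABCD", "ABCDE"))      # 1
print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of `b` rather than with the product of both lengths.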

##### fuzzywuzzy
###### Installation
```shell
pip install fuzzywuzzy
pip install python-Levenshtein
```


fuzzywuzzy works even if you don't install python-Levenshtein, but installing it improves performance.
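If you are not sure whether the optional package is available in your environment, a quick check is to ask Python whether the `Levenshtein` module (the module name that python-Levenshtein installs) can be found. The helper name `has_python_levenshtein` is hypothetical and only used for illustration. Without the package, fuzzywuzzy falls back to the standard library's `difflib.SequenceMatcher` and prints a warning when it is imported.

```python
import importlib.util


def has_python_levenshtein() -> bool:
    """Return True if the optional python-Levenshtein extension can be imported."""
    # The pip package "python-Levenshtein" provides a top-level module named "Levenshtein"
    return importlib.util.find_spec("Levenshtein") is not None


if has_python_levenshtein():
    print("python-Levenshtein found: fuzzywuzzy can use the faster matcher")
else:
    print("python-Levenshtein missing: fuzzywuzzy falls back to difflib")
```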

###### Using fuzz.ratio

This is the most basic comparison; the output is shown below:

```python
>>> from fuzzywuzzy import fuzz
>>> from fuzzywuzzy import process
>>> fuzz.ratio("ABCD", "ABCD")
100
>>> fuzz.ratio("ABCD", "ABCDE")
89
>>> fuzz.ratio("ABCD", "ABCDEF")
80
>>> fuzz.ratio("ABCD", "ABCDEFG")
73
>>> fuzz.ratio("ABCD", "ABCDEFGH")
67
>>> fuzz.ratio("ABCD", "ABCDEFGHI")
62
>>> fuzz.ratio("ABCD", "ABCDEFGHIJ")
57
```
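The scores above follow a simple pattern: the ratio is about 100 × 2M / T, where M is the number of matching characters and T is the combined length of both strings. For `"ABCD"` vs `"ABCDE"` that is 100 × 8 / 9 ≈ 89. As a sketch, the standard library's `difflib.SequenceMatcher` (which fuzzywuzzy uses when python-Levenshtein is not installed) reproduces the table; `simple_ratio` is my own helper name, and with python-Levenshtein installed the scores may occasionally differ slightly on other inputs.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100, computed as round(100 * 2*M / T) with difflib."""
    # SequenceMatcher.ratio() returns 2*M/T, with M = matched chars and T = len(a) + len(b)
    return round(100 * SequenceMatcher(None, a, b).ratio())


for other in ["ABCD", "ABCDE", "ABCDEF", "ABCDEFG", "ABCDEFGH", "ABCDEFGHI", "ABCDEFGHIJ"]:
    print(other, simple_ratio("ABCD", other))
```

Each extra character in the second string only increases T, which is why the score drops steadily even though all four letters of `"ABCD"` still match.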

###### Using partial_ratio

ratio is a very simple comparison; you can use partial_ratio for substring matching.

```python
>>> fuzz.partial_ratio("ABCD", "ABCDEFGHIJ")
100
```


But even partial_ratio fails when the words are scrambled.

```python
>>> fuzz.partial_ratio("India Vs Aus","Aus Vs India")
42
```
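The idea behind partial_ratio is to score the shorter string against pieces of the longer one and keep the best match. The brute-force sketch below slides a window the length of the shorter string across the longer one and keeps the best difflib score. It is a simplified stand-in, not fuzzywuzzy's algorithm (which only tries windows suggested by matching blocks), and `sliding_partial_ratio` is a hypothetical name. It also shows why scrambled words defeat this approach: two strings of equal length leave only one window, the whole string, so word order still counts.

```python
from difflib import SequenceMatcher


def sliding_partial_ratio(a: str, b: str) -> int:
    """Best ratio between the shorter string and any same-length window of the longer one."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0
    best = 0.0
    # Try every window of the long string that has the same length as the short one
    for start in range(len(long_) - len(short) + 1):
        window = long_[start:start + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
    return round(100 * best)


print(sliding_partial_ratio("ABCD", "ABCDEFGHIJ"))            # 100
print(sliding_partial_ratio("India Vs Aus", "Aus Vs India"))  # 42
```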

###### Using token_sort_ratio

Basically, “India Vs Aus” and “Aus Vs India” mean the same thing; people write it either way and both are valid. When the word order might differ, you can use token_sort_ratio:

```python
>>> fuzz.token_sort_ratio("India Vs Aus","Aus Vs India")
100
>>> fuzz.token_sort_ratio("India cricket team Vs Aus cricket team","Aus Vs India")
48
```


Now let's add a complication: if I add ‘cricket team’ to one of the strings, as in the second example, the match no longer works.
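token_sort_ratio works by normalising both strings, sorting their words alphabetically and only then comparing them, so the original word order no longer matters. Below is a rough standard-library stand-in; the helpers `normalise` and `sorted_token_ratio` are hypothetical, and fuzzywuzzy's own preprocessing may differ in the details. For the two examples above it gives the same scores. The second example also shows the weakness: sorting does nothing about the extra words, which inflate the total length while adding no matches.

```python
import re
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lower-case the text, turn runs of non-alphanumeric characters into spaces, trim."""
    return re.sub(r"\W+", " ", text).lower().strip()


def sorted_token_ratio(a: str, b: str) -> int:
    """Sort the words of each string alphabetically, then compare them with difflib."""
    sa = " ".join(sorted(normalise(a).split()))
    sb = " ".join(sorted(normalise(b).split()))
    return round(100 * SequenceMatcher(None, sa, sb).ratio())


print(sorted_token_ratio("India Vs Aus", "Aus Vs India"))                            # 100
print(sorted_token_ratio("India cricket team Vs Aus cricket team", "Aus Vs India"))  # 48
```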

###### Using fuzz.token_set_ratio
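token_set_ratio goes a step further and treats each string as a set of words. Roughly, it compares the words both strings share against each side's shared-plus-unique words and keeps the best of those comparisons, so words that appear on only one side (like ‘cricket team’) cannot drag the score down when the shared words line up. The sketch below models that idea with difflib; `set_token_ratio` is a hypothetical name, and this is my approximation rather than fuzzywuzzy's source.

```python
import re
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> float:
    """difflib similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def set_token_ratio(a: str, b: str) -> int:
    """Compare the shared words against each side's full word set and keep the best score."""
    tokens_a = set(re.sub(r"\W+", " ", a).lower().split())
    tokens_b = set(re.sub(r"\W+", " ", b).lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    # Each side = shared words followed by the words only that side has
    with_a = (common + " " + " ".join(sorted(tokens_a - tokens_b))).strip()
    with_b = (common + " " + " ".join(sorted(tokens_b - tokens_a))).strip()
    best = max(_ratio(common, with_a), _ratio(common, with_b), _ratio(with_a, with_b))
    return round(100 * best)


print(set_token_ratio("India Vs Aus", "Aus Vs India"))
print(set_token_ratio("India cricket team Vs Aus cricket team", "Aus Vs India"))
```

The trade-off is that a short string whose words all appear inside a much longer one will score very high, which may or may not be what you want.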

>>> fuzz.token_set_ratio("India cricket team Vs Aus cricket team","Aus Vs India")