During the third week of classes in my first semester at Syracuse University, my data science professor, Prof. J. Saltz, told us about a big data project he wanted us to pursue. I had never used R or RStudio prior to this course (Introduction to Data Science).
We were given a large set of data in 11 disparate Excel files, each exceeding 1 GB and each with at least 270 variables and 100,000 observations. As the course progressed, I learned how to effectively clean data: removing empty spaces, missing values, and unstructured columns and rows. My team and I cleaned the data and then conducted multiple types of statistical analysis on it. The insights we derived led us to understand that all of our data belonged to a hotel corporation, with a majority of it concentrated in the U.S. We then tried to understand how the other variables depended on the primary variable, the Net Promoter Score (NPS).
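The project itself was done in R, but the core cleaning step, treating blank cells as missing and dropping rows that lack key fields, can be sketched in Python with pandas. The column names and values here are hypothetical stand-ins for the hotel survey data, not the actual dataset:

```python
import pandas as pd
import numpy as np

# Hypothetical toy frame standing in for one of the survey files.
df = pd.DataFrame({
    "NPS_Type": ["Promoter", " ", "Detractor", None],
    "Guest_Country": ["US", "US", None, "UK"],
})

# Treat whitespace-only strings as missing values, then drop any row
# that lacks either of the key columns.
df = df.replace(r"^\s*$", np.nan, regex=True)
clean = df.dropna(subset=["NPS_Type", "Guest_Country"]).reset_index(drop=True)

print(len(clean))  # only the fully populated rows survive
```

The same idea in R would typically use `dplyr::filter` with `is.na` checks or `tidyr::drop_na`.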
I was personally responsible for generating heat maps, running association rule mining algorithms on the data to uncover relationships between variables, and training a Support Vector Machine on the cleaned data to predict the Net Promoter Score. As a team, we presented our findings to a panel at the end of the semester; in a clear, non-technical manner, we conveyed our results, our analysis, and the methods, tools, and techniques we adopted to achieve the outcome.
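Association rule mining boils down to two quantities: support (how often an itemset occurs) and confidence (how often the consequent occurs given the antecedent). The sketch below illustrates those two measures in plain Python; the survey traits are invented labels for illustration, not values from the actual hotel dataset, which was mined in R:

```python
# Toy "transactions": each set holds one guest's survey traits
# (hypothetical labels, for illustration only).
surveys = [
    {"business_trip", "high_nps"},
    {"business_trip", "high_nps"},
    {"leisure_trip", "low_nps"},
    {"business_trip", "low_nps"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as joint support over antecedent support."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Rule {business_trip} -> {high_nps}: 2 of the 3 business trips scored high.
print(confidence({"business_trip"}, {"high_nps"}, surveys))
```

In R, the `arules` package computes these same measures at scale with the Apriori algorithm.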