Data Science Breakdown

You have undoubtedly heard that Data Science is one of the fastest-growing fields in the data industry and one of the best jobs in America. While many people are interested in a career in data science, they are afraid it might take more than they have to offer. I was one of these people. I was afraid that I didn’t have the knowledge or mental (or mathematical) aptitude needed for such a career. Being the unwavering person I am, I set a goal to learn more and then went on a search for information. I found the Microsoft Professional Program (MPP) in Data Science and thought, “Well, I can at least give it a try.”

*Let me pause here and applaud Microsoft for partnering with edX.org to assemble this training and make it available to anyone and everyone for FREE. You can take and complete these classes for free. The only payment needed is for verified certificates, which are required to complete the MPP certification.

What it takes

Are you interested in studying data science?  Ask yourself these questions:

  • Do you have an interest in exploring abstract ideas?
  • Are you a curious person?
  • Do you feel comfortable seeking answers in unique ways?
  • Do you love exploring new programs and technology?
  • Are you interested in finding the story within the story?
  • Are you good at finding patterns where there seem to be only random ideas and images?
  • Are you interested in working with data?

What is a Data Scientist?

If you answered yes to the above questions, you just might be the next great Data Scientist! Let’s break down what a Data Scientist does. The role of a data scientist is a unique one, as it requires an ability to think on your feet, think outside the box, be creative with technology, and be somewhat of an entrepreneur. Data science walks the fine line between technology and creative storytelling. A data scientist is one who knows how to use various means to pull narratives from data to create a great story. You see, data is not merely a static table of letters and numbers. No, it is much more than just digits in a row. Data is a living, breathing, ever-evolving collection of information that is searching for a way to tell its story. Data scientists are curious, technically equipped storytellers exploring the data landscape for the next great story to share. Sound interesting? If so, keep reading!

Data Science Tools

On my journey to becoming a Microsoft MPP in Data Science, I started where we all start… at the beginning. The very first class in the MPP program is Introduction to Data Science. This is your typical intro class. It is easy, but very important. It will guide you through what to expect and how to navigate the classes, and it provides an overview of the basic concepts and principles on which data science is based.

There are a number of tools in the data science repertoire.  For the purpose of this blog, we will focus on the tools one can learn through the Microsoft MPP courses. 

Analyzing & Visualizing Data

The first tool we look at is for analyzing and visualizing data. The MPP course gives you a choice between working with Power BI or Excel. As I have previous experience with Excel and feel pretty confident there, I chose to learn something new and went with Power BI. I found Power BI to be a super fun tool that felt more like a video game and less like work. I love a good visual! This class easily walked me through setup and through a variety of use-case scenarios. I found it very fun and easy to learn. In fact, what struck me the most about these classes is how concise yet easy to follow they are.

Communicate Data Insights

Now that you understand the basics of analyzing and visualizing data, it is important to know how to master data communication. It is one thing to be able to look at data and understand it; it takes a completely different set of skills to convey the stories the data has to tell. In the next course, Analytics Storytelling for Impact, you will learn how to fully explore a story to find what a great story is, and what it is not. This course really dives into how to make an impact through storytelling, shows you how to create impact through presentations and reports, and teaches you how to apply these skills to your data analytics. I thoroughly enjoyed this class as it spoke to the theater major in me. I do love to tell a good story, and this class gave me new ways to look at data. It has me questioning things I see every day, like political polls, job descriptions, and advertisements.

Apply Ethics and Law in Analytics

Ethics? What does ethics have to do with being a data scientist? Admittedly, when I first saw that the program had been updated with Ethics and Law in Data and Analytics, I was a bit taken aback. I thought I had left the legal field and was on the way to a technical role. Why learn ethics? Data science and data collection have changed wildly and quickly over the last few years. It is my firm belief that every data professional needs to take this course. Only through taking this course did I learn that data can be accidentally prejudiced! Certainly ethics should be considered when collecting and analyzing data! A data scientist would be remiss not to exercise due diligence!

Query Relational Data

The data scientist must know how to query databases in order to get the data needed to analyze. The MPP program offers Querying Data with Transact-SQL, where you will learn to query and modify data in SQL Server or Azure SQL using T-SQL. If you are not familiar, SQL is commonly pronounced “sequel” in the industry rather than “S-Q-L”… it is a pet peeve of mine to hear someone say S-Q-L when talking to me about SQL. This course was very thorough and a great way to step into learning how to query and program using T-SQL. This class will take some effort; I found it to be one of the more intensive classes in the program. SQL is no easy task, and SQL Server has many versions out there in practical use, each version with different hurdles to jump. This particular class is a fantastic place to start to learn a great deal about SQL.
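To give a flavor of what querying relational data looks like, here is a minimal sketch. It is an assumption-heavy stand-in: it uses Python's built-in sqlite3 module with an in-memory database rather than SQL Server, and the table name, columns, and rows are made up for illustration. The SELECT itself is plain SQL of the kind the course starts with; T-SQL adds its own extensions on top.

```python
import sqlite3

# An in-memory SQLite database stands in for SQL Server here (an assumption
# made so the example runs anywhere); the table and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
    [
        ("East", "Widget", 120.0),
        ("East", "Gadget", 80.0),
        ("West", "Widget", 200.0),
        ("West", "Gadget", 50.0),
    ],
)

# Total sales per region, largest first.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for region, total in totals:
    print(region, total)
```

The same GROUP BY and ORDER BY pattern carries over to SQL Server, though connection code and some syntax details differ.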

Explore Data with Code

The next step in the program is to explore data with code. You are given two options here: one path is Introduction to R for Data Science and the other is Introduction to Python for Data Science. For my interests, I chose Python since it is widely used in many areas, especially advanced analytics and AI. To my surprise, Python was a lot of fun to learn. I did more research into the uses of Python and found it to be a very useful tool in my toolkit. I can even design and program holiday lights for my house using Python!
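For readers who have never seen Python, here is a small sketch of what exploring data with code can look like. It uses only the standard library, and the cities and temperatures are made-up sample values, not data from the course.

```python
import csv
import io
import statistics

# Made-up sample data; in practice this would come from a file or database.
raw = """city,temp_f
Austin,95
Boston,72
Chicago,78
Denver,85
"""

# Parse the CSV text into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(raw)))
temps = [float(row["temp_f"]) for row in rows]

# A few basic summary numbers about the temperature column.
summary = {
    "count": len(temps),
    "mean": statistics.mean(temps),
    "median": statistics.median(temps),
    "max_city": max(rows, key=lambda r: float(r["temp_f"]))["city"],
}
print(summary)
```

A handful of lines is enough to load a table and start asking questions of it, which is a big part of why Python feels approachable.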

Apply Math and Statistics to Data Analysis

Whoa, wait… math? Math is involved??? Yes, absolutely! Remember back in school when you thought “When will I EVER use this again in real life?” The answer is “Now, and always, honestly.” There are three classes offered here, so you can choose the one you want to take.

I chose the Python Edition to continue using Python from the last class. I was not a great math student, so I was really afraid I would not be smart enough to get through this class. If you are feeling that way, stop that now. Like I have said before, these classes are designed so well that not only was I able to learn and grow, I made a great grade! Don’t let fear of failure keep you from trying something new.

Plan and Conduct Data Studies

Again you are given a choice, this time between Data Science Research Methods: R Edition and Data Science Research Methods: Python Edition. No matter which path you choose, this class teaches the fundamentals of the research process. You will learn to develop and design sound data collection strategies and how to put those results in context.

Build Machine Learning Models

To be honest, I faced this particular class with dread. Much to my surprise, I really and truly enjoyed learning about building machine learning models. You can choose between Principles of Machine Learning: R Edition and Principles of Machine Learning: Python Edition. If you have previously chosen Python as I did, continue on with that path. This class offers a clear explanation of machine learning theory through hands-on experience in the labs. You will use Python or R to build, validate, and deploy machine learning models using Azure Notebooks.

I will make one suggestion, though: before completing this class, I would recommend completing Developing Big Data Solutions with Azure Machine Learning. As a more visual-based person, I found that I understood the machine learning models much better after completing the course using Azure Machine Learning.
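To make the build-and-validate idea a little more concrete, here is a rough sketch under stated assumptions. It is not taken from the course labs, which use Python or R libraries in Azure Notebooks. It uses only the Python standard library, fits a one-feature linear model, and checks it against held-out rows. The data is synthetic and the function names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(model, xs, ys):
    """Average absolute gap between predictions and actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic data: y = 3x + 2 plus a little noise (invented for illustration).
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [3.0 * x + 2.0 + rng.uniform(-1.0, 1.0) for x in xs]

# Hold out 20% of the rows so the model is validated on data it never saw.
indices = list(range(len(xs)))
rng.shuffle(indices)
split = int(0.8 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]

model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
test_error = mean_abs_error(
    model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]
)
print("slope, intercept:", model)
print("mean absolute error on held-out data:", test_error)
```

The key habit is the split: train on one portion of the data and judge the model on another portion it has never seen.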

Build Predictive Solutions at Scale

Okay, now we are getting to some really fun stuff! I think this was my absolute favorite of all the classes. You can choose one of three classes here.

I chose Developing Big Data Solutions with Azure Machine Learning (AML) and what a blast I had! I can say that working with AML and with Azure Data Studio was like opening up presents on my birthday! The final projects were a lot of work, but I got a real sense of what working in data science and machine learning is all about… trial and error. It was a lot of fun trying to use insights, hunches, best guesses, and technology all together to create and train a model that makes accurate predictions!

Final Project

After all the courses are completed and passed, you can only earn the MPP in Data Science if you successfully pass the Microsoft Professional Capstone: Data Science. As of the writing of this blog, I am slated to begin the Capstone on December 31, 2018, and I cannot think of a better way to ring in the new year!

Final Thoughts

I have researched many ways to become a Data Scientist. Most universities offer degrees in data science. I have found that on the majority of their sites, they tout a Master’s or PhD in Data Science (with a heavy prerequisite of extensive math and stats classes) as what you need in order to become a data scientist. Must you have an advanced degree in mathematics or engineering to become a data scientist? Absolutely not. You don’t even have to hold a degree to work as a data scientist! Take a look at this article published on Forbes: 4 Reasons Not To Get That Masters In Data Science

My advice is to take a look at the Microsoft MPP program and try on a few of the free classes.  If you are truly interested in a data science career and are willing to put forth the time and attention needed to learn, you already qualify as a good candidate.  Don’t let your past dictate your future.  Make the investment in yourself and grow along with the technology as it comes.  You can do this!  

Power BI Data Gateway

What is a Data Gateway in Power BI?

When creating reports in Power BI, the end goal is to make them useful to many users. To share reports created in Power BI, you must publish them to the cloud-based Power BI service (PowerBI.com). Once nestled in the cloud, the data in the reports will either remain static or need to be updated on a regular basis. In order to refresh data and keep end users up to date, the cloud must have access to the data sources. This is where you need a Data Gateway. Think of a data gateway as a bridge between your on-premises data sources and the cloud.

A gateway should be installed on a machine that is always on and connected to the internet. The gateway cannot access information from a machine that is powered off or has lost its internet connection.

  •  Before installing, take into consideration that if you are installing on a laptop and it is turned off, not connected to the internet, or asleep, the gateway won’t work and the data in the cloud will not sync with your on-prem data. Also, if the machine on which the gateway is installed is connected to a wireless network, the gateway may perform more slowly, and it will take longer for the data in the cloud to sync with your on-prem data.

Power BI Gateway can be installed in two ways:

  • On-premises data gateway – This gateway can be used by any user that has access to the server on which the gateway is installed. It can be used for scheduling refreshes and live queries.
  • On-premises data gateway (Personal mode) – This gateway can only be used by the person who set it up. This mode is only used for scheduling refreshes in Power BI. At the time of writing, Live Connection, DirectQuery, Power Apps, Logic Apps, and Microsoft Flow are not supported.

Only one gateway in each mode can be installed on one machine. That is, you may install one gateway in personal mode, and another in regular mode. You cannot install two or more personal mode gateways on one machine. You can, however, manage multiple gateways from the same interface on Power BI.

Installing a Gateway

To install a gateway, you will first need to sign in to PowerBI.com. Take note that this is NOT the desktop app; it is the cloud-based service. At the top right of the menu bar, click the icon that looks like an arrow pointing down. The dropdown will reveal several actions. Choose ‘Data Gateway’.

This will take you to a new webpage where you will be able to start your gateway download. Click on the DOWNLOAD GATEWAY button and wait for the download to begin. Once the installer has finished downloading, open the .exe and follow the instructions.

When the installer opens, you will be ready to start setting up your gateway.

Click NEXT to choose the type of gateway you need. Before you choose, take into consideration the role of each. Remember that Personal mode is only useful for on-demand and scheduled refresh in Power BI and cannot be used for Live Connection or DirectQuery. The On-premises data gateway can be used by multiple users and supports both scheduled refresh and DirectQuery.

Please note the following in regard to installing either mode:

  •  Both gateways require a 64-bit Windows operating system.
  •  Gateways can’t be installed on a domain controller.
  •  You can install up to two On-premises data gateways on the same computer, one running in each mode (personal and standard).
  •  You cannot have more than one gateway running in the same mode on the same computer.
  •  You can install multiple On-premises data gateways on different computers and manage them all from the same Power BI gateway management interface (not including Personal mode).
  •  You can only have one Personal mode gateway running for each Power BI user. If you install another Personal mode gateway for the same user, even on a different computer, the most recent installation replaces the previous one.
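If it helps to see the one-gateway-per-mode rule from the list above spelled out, here is a tiny sketch in Python. It is purely a model of the rule for illustration: `can_install` is a hypothetical helper, not part of Power BI or the gateway installer, and the mode names "standard" and "personal" are labels chosen for this example.

```python
def can_install(installed_modes, new_mode):
    """Return True if a gateway in new_mode may be added to one machine.

    installed_modes is the set of modes already present on that machine.
    Hypothetical helper: it encodes only the one-per-mode rule.
    """
    if new_mode not in {"standard", "personal"}:
        raise ValueError(f"unknown mode: {new_mode!r}")
    return new_mode not in installed_modes

# Try to install three gateways on the same machine, in this order.
machine = set()
results = []
for mode in ["standard", "personal", "personal"]:
    ok = can_install(machine, mode)
    results.append((mode, ok))
    if ok:
        machine.add(mode)

print(results)
```

The first two installs succeed and the second personal mode install is refused, which mirrors the rule that a machine can hold at most one gateway in each mode.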

Once you have chosen your mode and clicked NEXT, it will take a few seconds for the installer to download and get ready to install your gateway.

The next step is to point the installer to the drive on which you want the gateway installed. You will want the gateway positioned as close to your data source as possible. Be sure to read and accept the terms of use and privacy statement.

Upon successful installation, you will need to add an email address to use with this gateway. Next you will need to sign in.

We have a successful installation of our gateway! Now you will have the option to configure a new gateway, or to migrate, restore, or take over an existing gateway. Here we will register the data gateway.

Register On-Prem

To configure a new gateway, you will need to enter a name for the gateway, enter a recovery key (minimum 8 characters) and finally, select Configure. Be sure to store your recovery key in a safe place. You will need it in the future if you ever need to migrate, restore, or take over a gateway.
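If you would rather not invent a recovery key by hand, one option is to generate a random one and store it in a password manager. The sketch below is only an illustration: the helper names are hypothetical, and the only rule it checks is the 8-character minimum mentioned above (the installer may enforce more).

```python
import secrets

MIN_RECOVERY_KEY_LENGTH = 8  # minimum length stated in the step above

def is_valid_recovery_key(key):
    """Check only the length rule; the installer may enforce more."""
    return len(key) >= MIN_RECOVERY_KEY_LENGTH

def generate_recovery_key(num_bytes=18):
    """Generate a random, URL-safe key to keep in a password manager."""
    return secrets.token_urlsafe(num_bytes)

key = generate_recovery_key()
print(is_valid_recovery_key(key))
```

Whatever key you choose, the important part is keeping it somewhere you can find it again.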

Congratulations, you have now successfully installed and configured your gateway! Now you will be able to connect to on-prem data sources! For use with Power BI, you will need to add your data sources to the gateway within the Power BI service. This is done by going to the menu bar, clicking on the gear icon, and choosing MANAGE GATEWAYS from the dropdown. We will cover adding data sources in the next blog!

*For a more in-depth look at gateway installation, see Microsoft Docs.