MPP Data Science Final Project

Recently I wrote about my experience working through the Microsoft Professional Program in Data Science in my blog post Data Science Breakdown. I am so happy to announce that I successfully completed the capstone! Even better, I earned the MPP Data Science Certificate!

“Was it difficult?”

The number one question I get asked is whether the capstone was difficult. The short answer is “yes.” The capstone is designed to test the knowledge and skills gained from the classes in the course, as well as expand your horizons as to how you would handle “real-world” projects. The next question I get is “Do you think I can pass it?” Absolutely… if you give it your all.

The capstone I completed was divided into three parts:
* analyzing data
* building and testing a machine learning model
* writing a professional report

The Sum of Three Parts

Part 1, Analyzing Data

The capstone really put me to work. Reviewing and analyzing data is very easy and straightforward for me. I really enjoy reading between the lines and finding the patterns that emerge. The capstone allows the student to use any analytics program they desire. For this portion, Excel was my choice, and it proved to be a wise one. Part 1 took mere minutes for me to complete. It was the second part of the capstone where the majority of my time and effort was spent.

Part 2, Machine Learning

In order to create a successful machine learning model, you will need to be proficient in your Google-Fu. Building the machine learning model took more knowledge and experience than I had gained in the coursework alone. Many hours were spent exploring various approaches, researching algorithms, and trying to figure out the best ways to go about training the unruly model.

I was frustrated, tired, and admittedly ready to give up. REALLY AND TRULY. A thought occurred to me: “This is why they call it ‘Data Science’. I am sitting here trying to find answers along an untrodden path.” In that moment, I imagined this is what it feels like to be a true scientist. Just as in physical science, data science requires time, patience, research, and trial and error.

Not being one to give in, I persisted until the best combination of algorithms was found. It was a profound relief to test the model against my data and to see it be successful.

Giving Thanks

So many people in the community helped guide me in the direction to find answers. Thank you to all who wrote blogs, tweets, whitepapers, and produced videos. Sharing your unique views and understanding with others makes us a stronger community.

Part 3, Professional Report

Armed with my data, my experiences, and my successful machine learning model, the final step in the capstone is to put it all together in a written report! Holy goodness. This was much more difficult than I had expected it to be. Your grade for the third part is dependent on other students and their assessment of your report. Yet another real-life experience to get you ready for a career as a Data Scientist!

The report took a few days to complete; toward the end I worked a solid 30 hours straight on it. I could not sleep; I was excited, terrified, stressed, and also just REALLY ready to get this completed. Once my report was submitted, I began reading the other students’ reports as assigned. The reports were outstanding. Those students brought to light information that had not shown itself in my project. Each report was vastly different; the stories the data told the other three students varied from one another’s and from mine as well. How interesting that we all had the same data sets and somehow all four of us presented completely different stories!

Lessons Learned

Undoubtedly, the classes in the MPP Data Science course taught me a number of valuable skills while also having the unintended consequence of teaching me some things about…well…me.

When faced with something new and extremely difficult, I learned that I have the ability to rise up, learn, and be successful, and that the willingness to learn something new can set you apart from others.

At the end of the capstone, while reviewing other students’ reports, I saw opportunities for more learning as their approaches to the same material differed from mine.

Finally, I learned that when we work together for a common goal, we are stronger and smarter together.


Return To Where It All Started

They say you can’t go back home again. That may be true, but this week I will be traveling back to where it all started. I am headed back to speak at SQLSaturday Nashville! Last year at this time (January 2018) I nervously arrived at SQLSatNash to present for the very first time.

With a ton of support from my friends and family, I created a session on Soft Skills and I was ready to present! I had prepared. I had obsessively gone over each and every word on the slides. I had practiced and practiced and had even given a test run at my husband’s company. I was ready. I WAS READY!

Is it time to panic?

No sleep was to be had the night before my session. I was excited and nervous… and very scared. The day was here and I could not be any more prepared. I walked in to set up my laptop and the room was freezing. I swear I could see my breath when I talked! You see, the university had been closed the day before due to ice and snow, and the rooms had not yet warmed up. This gave me a new panic. How was I going to keep people’s attention in such a cold room? Would they even stay?

My knees were knocking, and it wasn’t just because the room was a balmy 40 degrees (F)! I was nervous and was constantly questioning my decision to step into the public speaking realm.

I loved seeing it snow at #SQLSatNash 2018

Calm down, you are amongst friends!

In order to calm my nerves, I started playing Can’t Stop the Feeling. This is one of my go-to feel-great songs. People started filing in, and I kept it playing. I started dancing (to keep warm and also because I had no idea what to do with all that nervous energy!) and I was surprised to see others start joining me!!! Much to my amazement and pure enjoyment, Aunt Kathi was in the front row to see ME and she started dancing as well! Actually, I think she led the dancing! Ha ha. (Seeing Aunt Kathi there in my session made me feel like a superstar. Having her join me in dancing made me feel even more special. Thank you, Aunt Kathi!)

I looked out over my audience and saw some amazing people. John Morehouse, Duke (#JohnsDad), Rie Irish, Allen Kinsel, Lance Tidwell, Jim Donahoe, my son Robert, Chris Hyde, and many others were there to support me! What an amazing group of #SQLFamily! The time was now, and I did not want to disappoint them! So here we go: the song was over and it was time to start!

I am thankful for my #SQLFamily

If you have not attended my sessions, they are… anything but boring. I feel that my job as a speaker is not only to educate, but to entertain. So, sort of edutainment, I suppose. I was so full of excitement and energy, I felt like I was buzzing all over the place! I had a lot of great audience interaction and participation, and before I knew it, the presentation was complete. WOW! That time sure flew by! I was shaking, I was excited, I was on a new-found high that I had never experienced before! I was totally bit by the Speaker Bug!

SQL Saturday?

I went on to speak at several SQL Saturdays in 2018 and had a wonderful time at every event. If you are wondering where the next or nearest SQL Saturday is, you can visit the SQLSaturday website and find out! Remember that the networking at SQL Saturdays can be just as beneficial to you and your career as the training you receive. Don’t be afraid to approach a speaker and ask questions. We are there to share knowledge and to network as well.

I am super excited to be chosen to speak in Nashville again. It is such a great event put on by great people. This year I will be discussing something near and dear to my heart, Ethics in Data. I would love to see you at SQL Saturday Nashville 2019! You can find out more and register here!

And hey, if you want to start presenting but don’t know where to start, check out this blog (Just Start at the Beginning) and send me a message! I would love to help grow new speakers! You can do this.

Data Science Breakdown

You have undoubtedly heard that Data Science is one of the fastest growing fields in the data industry and one of the best jobs in America.  While many people are interested in a career in data science, they are afraid it might take more than they have to offer.  I was one of these people.  I was afraid that I didn’t have the knowledge or mental (or mathematical) aptitude needed for such a career.  Being the unwavering person I am, I set a goal to learn more and then went on a search for information.  I found the Microsoft MPP in Data Science program and thought “Well, I can at least give it a try.”

          *Let me pause here and applaud Microsoft for partnering with EdX.org to assemble and bring us this training… and for making it available to anyone and everyone for FREE.  You can take and complete these classes for free.  The only payment needed is if you decide to complete the classes for verified certificates (needed to complete the MPP Certification).

What it takes

Are you interested in studying data science?  Ask yourself these questions:

  • Do you have an interest in exploring abstract ideas?
  • Are you a curious person?
  • Do you feel comfortable seeking answers in unique ways?
  • Do you love exploring new programs and technology?
  • Are you interested in finding the story within the story?
  • Are you good at finding patterns where there seem to be only random ideals and images?
  • Are you interested in working with data?

What is a Data Scientist?

If you answered yes to the above questions, you just might be the next great Data Scientist!  Let’s break down what a Data Scientist does.  The role of a data scientist is a unique one, as it requires an ability to think on your feet, think outside the box, be creative with technology, and be somewhat of an entrepreneur.  Data science walks the fine line between technology and creative storytelling.  A data scientist is one who knows how to use various means to pull narratives from data to create a great story.  You see, data is not merely a static table of letters and numbers.  No, it is much more than just digits in a row.  Data is a living, breathing, ever-evolving collection of information that is searching for a way to tell its story.  Data scientists are curious, technically equipped storytellers exploring the data landscape for the next great story to share.  Sound interesting?  If so, keep reading!

Data Science Tools

On my journey to becoming a Microsoft MPP in Data Science, I started where we all start… at the beginning.  The very first class in the MPP course is Introduction to Data Science.  This is your typical intro class.  It is easy, but very important.  It will guide you through what to expect and how to navigate the classes, as well as provide an overview of the basic concepts and principles on which data science is based.

There are a number of tools in the data science repertoire.  For the purpose of this blog, we will focus on the tools one can learn through the Microsoft MPP courses. 

  Analyzing & Visualizing Data

The first tool we look at is for analyzing and visualizing data.  The MPP course gives you a choice between working with Power BI or Excel.  As I have previous experience with Excel and feel pretty confident there, I chose to learn something new and went with Power BI.  I found Power BI to be a super fun tool that felt more like a video game and less like work.  I love a good visual!  This class easily walked me through setup and through a variety of use-case scenarios.  I found it very fun and easy to learn.  In fact, what struck me the most about these classes is how concise yet easy to follow they are.

Communicate Data Insights

Now that you understand the basics of analyzing and visualizing data, it is important to know how to master data communication.  It is one thing to be able to look at data and understand it; it takes a completely different set of skills to convey the stories the data has to tell.  In the next course, Analytics Storytelling for Impact, you will learn how to fully explore a story to find what a great story is, and what it is not.  This course really dives into how to make an impact through storytelling, gives you an idea of how to create impact through presentations and reports, and shows how to apply these skills to your data analytics.  I thoroughly enjoyed this class as it spoke to the theater major in me.  I do love to tell a good story, and this class gave me new ways to look at data and has resulted in me questioning things I see every day like political polls, job descriptions, and advertisements.

Apply Ethics and Law in Analytics

Ethics?  What does ethics have to do with being a data scientist?  Admittedly, when I first saw that the program had been updated with Ethics and Law in Data and Analytics I was a bit taken aback.  I thought I had left the legal field and was on the way to a technical role.  Why learn ethics?  Data science and data collection have changed wildly and quickly over the last few years.  It is my firm belief that every data professional needs to take this course.  Only by taking this course did I learn that data can be accidentally prejudiced!  Certainly ethics should be considered when collecting and analyzing data!  The data scientist would be remiss not to exercise due diligence!

Query Relational Data

The data scientist must know how to query databases in order to get the data needed to analyze.  The MPP program offers Querying Data with Transact-SQL, where you will learn to query and modify data in SQL Server or Azure SQL using T-SQL.  If you are not familiar, SQL is pronounced in the industry as “See-Quil,” not “Es-Que-El”… it is a pet peeve of mine to hear someone say S-Q-L when talking to me about SQL.  This course was very thorough and a great way to step into learning how to query and program using T-SQL.  This class will take some effort; I found it to be one of the more intensive classes in this course.  SQL is no small subject, and SQL Server has many versions out there in practical use, each version with different hurdles to jump.  This particular class is a fantastic place to start to learn a great deal about SQL.
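To give you a small taste of what that looks like, here is the flavor of T-SQL you write early on.  This is just an illustrative sketch of my own; the table and column names are hypothetical, not an excerpt from the class.

-- Return the ten most expensive products priced over $100,
-- using a hypothetical dbo.Product table.
SELECT TOP (10) ProductName, ListPrice
FROM dbo.Product
WHERE ListPrice > 100
ORDER BY ListPrice DESC;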

Explore Data with Code

The next step in the program is to explore data with code.  You are given two options here, one path is Introduction to R for Data Science and the other is Introduction to Python for Data Science.  For my interests, I chose Python since it is widely used in many areas, especially advanced analytics and AI.  To my surprise, Python was a lot of fun to learn.  I did more research into the uses of Python and found it to be a very useful tool in my toolkit.  I can even design and program holiday lights for my house using Python!

Apply Math and Statistics to Data Analysis

Whoa, wait… math?  Math is involved???  Yes, absolutely!  Remember back in school when you thought “When will I EVER use this again in real life?”  The answer is “Now, and always, honestly.”  There are three classes offered here, so you can choose which you want to learn:

I chose the Python Edition to continue my usage of Python from the last class.  I was not a great math student, so I was really afraid I would not be smart enough to get through this class.  If you are feeling that way, stop that now.  Like I have said before, these classes are designed in such a great way that not only was I able to learn and grow, I made a great grade!  Don’t let fear of failure keep you from trying something new.

Plan and Conduct Data Studies

Again you are given a choice, this time between Data Science Research Methods: R Edition and Data Science Research Methods: Python Edition.  No matter which path you choose, this class teaches the fundamentals of the research process.  You will learn to develop and design sound data collection strategies and how to put those results in context.

Build Machine Learning Models

To be honest, I faced this particular class with dread.  Much to my surprise, I really and truly enjoyed learning about building machine learning models.  You can choose between Principles of Machine Learning: R Edition and Principles of Machine Learning: Python Edition.  If you have previously chosen Python as I did, continue on with that path.  This class offers a clear explanation of machine learning theory through hands-on experience in the labs.  You will use Python or R to build, validate, and deploy machine learning models using Azure Notebooks.

I will make one suggestion, though: before completing this class, I would recommend completing Developing Big Data Solutions with Azure Machine Learning.  As a more visual person, I found that I understood the machine learning models much better after completing the course using Azure Machine Learning.

Build Predictive Solutions at Scale

Okay, now we are getting to some really fun stuff!  I think this was my absolute favorite of all the classes.  You can choose from one of these three:

I chose Developing Big Data with Azure Machine Learning (AML), and what a blast I had!  I can say that working with AML and with Azure Data Studio was like opening up presents on my birthday!  The final projects were a lot of work, but I got a real sense of what working in the field of data science and machine learning is all about… trial and error.  It was a lot of fun trying to use insights, hunches, best guesses, and technology all together to create and train a model in order to accurately predict solutions!

Final Project

After all the courses are completed and passed, you can only gain the MPP in Data Science if you successfully pass the Microsoft Professional Capstone: Data Science.  As of the writing of this blog, I am slated to begin the Capstone on December 31, 2018 and I cannot think of a better way to ring in the new year!

Final Thoughts

I have researched many ways to become a Data Scientist.  Many universities offer degrees in data science.  I have found that on the majority of their sites, they tout that a Master’s or PhD in Data Science is what you need (with a heavy prerequisite of extensive math and stats classes) in order to become a data scientist.  Must you have an advanced degree in mathematics or engineering to become a data scientist?  Absolutely not.  You don’t even have to hold a degree to work as a data scientist!  Take a look at this article published on Forbes: 4 Reasons Not To Get That Masters In Data Science

My advice is to take a look at the Microsoft MPP program and try out a few of the free classes.  If you are truly interested in a data science career and are willing to put forth the time and attention needed to learn, you already qualify as a good candidate.  Don’t let your past dictate your future.  Make the investment in yourself and grow along with the technology as it comes.  You can do this!

Switching Gears and Moving Cheese

In 2016 I decided to switch careers.  It was a big decision for me, and a bit scary.  I had been working in the legal field since I graduated college.  It was an area in which I grew and excelled, but it had become stale for me.  I knew I wasn’t happy and that my heart truly belonged in a more technically challenging role; however, I had no idea where to start.  In walked the Microsoft Professional Program in Data Science.  Knowing that I learn best from structured programs, I felt confident that the classroom-style lecture and lab would be a great fit.  I found it interesting and so very helpful that the program allowed me to first dip my toe into tech without too much investment.  The courses are designed so that you can audit them and not pay a thing!  Soon, I was hooked!

Jumping Into the Deep End

Standing on the precipice of greatness is often an exciting and terrifying experience.  I was staring at the data field trying to decide which path to take.  I was fortunate to be offered a position as a Marketing DBA with a SQL Server consulting company.  I began learning and growing as a website designer, marketing specialist, CRM specialist, project manager, event coordinator, and database administrator.  Wow, there is SO much to learn!  So many paths to take, so many areas in which to specialize.  It can be very overwhelming to the new student.  After a while I found that I needed to make a move, I needed to take a big leap into the well of knowledge, so I left my position to go back and finish my data science courses.

Data Science

What is “data science?” you may be asking.  I feel that the term is both widely over-used and wholly underestimated.  Think about the term “scientist” and what that entails.  A scientist uses knowledge and technology to aid in finding an answer or truth.  So naturally, a data scientist does the same thing.  We use various means of technology, finesse, hypotheses, and trial and error to draw from the data the story that is waiting to be told.  My personal journey took me along the path of learning Python and Azure Machine Learning, two things I never knew I would have any desire to learn.  I faced these two courses with extreme skepticism.  I resigned myself to the thought that I would more than likely barely get by (and that “D stands for diploma,” as we used to say in college).  Much to my surprise, I found that not only did I understand both, I absolutely fell in love with coding in Python and predictive analysis with Azure Machine Learning!

Data science really is so much more than exploring with new technology and programs.  Data science also draws on the tried and true, and this is where my background in the legal field comes in really handy.  For instance, can pure, raw data be racist?  (Absolutely, and that is a topic we will save for an upcoming blog.)  When we are given data, it is never in a perfect form: information is missing or duplicated, in the wrong format, or is unusable for many other reasons.  Because of this, a data scientist must know how to clean, manipulate, and finesse the data in order to pull out the story.  Much like a sculptor who starts with a block of marble and creates a beautiful statue, data scientists must know the curves and lines to manipulate to bring out the best form of the story the data is designed to tell.  In the legal field, we used words and phrases to convey law, tell stories of guilt or innocence, and create the narrative we desired.  It all takes an art form and a knack for finesse.


Moving Cheese

There will come a time in life, perhaps many times, when you find your cheese has been moved.  There will also be times when you find you need to move your own cheese.  “Why the cheese talk, Angela?” you might be pondering.  Back in my first job out of college, an attorney I worked with gave me a copy of a book he wanted me to read.  It was truly an eye-opening and life-changing moment for me.  If you have not read the book Who Moved My Cheese?, I suggest getting a copy.  It is a good reminder that no matter where you are in life, if you are not living happy and feeling fulfilled, you CAN make a change.  You can wait for someone else to move your cheese, or you can just move it yourself.  So this is what I did: I moved my cheese.

An Invitation

I invite you to come along on this data science journey with me as we explore new programs and technologies.  Over the next few weeks, I will be diving into my experience in each section of the Microsoft Professional Program in Data Science.  I will NOT be sharing answers to test or lab questions; that would be cheating, and you know that as well as I do!

Tail Log Backup

In my SQL Server Recovery Models blog, I touched a bit on my experience with recovery using Tail Log Backup.  In this post we will take an in-depth look at the Tail Log Backup: what it is, why it should be in your toolbelt, and lastly, the steps to successfully take one.

What is a Tail Log Backup?

Simply put, a Tail Log Backup contains log records that were not yet backed up at the time of failure.  So if Transaction Log Backups occur every 15 minutes, and you suffered loss at the 11-minute mark, the Tail Log Backup includes all data changes during the time span between the last successful Transaction Log Backup and minute 11.  Recovery using Tail Log Backups can be performed under either the Full or Bulk Logged Recovery Model, but not under the Simple Recovery Model.

Why are these important?

Is it possible to recover with no data loss?  YES*!  This is where our new friend comes into action!   Taking a Tail Log backup is done to prevent data loss and helps recover right up to the point of disaster. (This is also referred to as Point In Time Restore.)

Keep in mind: in order to recover your database to its most recent point, you must have a valid Full Backup and valid Transaction Log Backup sequence!

After a disaster, if you can take a Tail Log Backup, have all the preceding Transaction Log Backups, have a valid Full Backup, and you are in Full Recovery mode, it is possible to recover with NO DATA LOSS!  For this to be possible in Bulk-Logged Recovery mode, no minimally logged operations can have occurred since the last log backup.
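As a rough sketch, the backup routine that makes this possible looks something like the following.  The database name, file paths, and schedule here are illustrative, not prescriptive.

-- A full backup starts the chain.
BACKUP DATABASE [TestDB]
TO DISK = 'G:\DBA\Backups\TestDB_Full.bak';

-- Transaction Log Backups then repeat on a schedule (e.g., every 15 minutes)
-- and must form an unbroken sequence for recovery to the point of failure.
BACKUP LOG [TestDB]
TO DISK = 'G:\DBA\Backups\TestDB_Log_1.trn';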

In what case would you ever need a Tail Log Backup?

Any time you have a damaged database and need to restore, it is best to check whether you need a Tail Log Backup.  The first question to ask is “Do I have Transaction Log Backups?”  If the answer is yes, your recovery will be much faster!  Another question to ask is “Is the server still available?”
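One quick way to answer the first question is to check the backup history SQL Server keeps in msdb.  A simple sketch (swap in your own database name):

-- List recent backups for the database.
-- type: D = full, I = differential, L = log
SELECT database_name, type, backup_finish_date
FROM msdb.dbo.backupset
WHERE database_name = 'TestDB'
ORDER BY backup_finish_date DESC;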

Server Still Available

If the database is damaged but the server is still available, it is pretty easy to take a Tail Log Backup.  When the data files are damaged or missing, you will get an error if you try to take a normal log backup.  But if you use NO_TRUNCATE, you will be able to take a log backup.

BACKUP LOG [TestDB] TO DISK = 'G:\DBA\Backups\TestDB_Log_Tail.bck'
WITH INIT,
NO_TRUNCATE;

*Note:  In order to successfully take a Tail Log Backup, you must use NO_TRUNCATE; that allows the log backup even if the database files are damaged or missing.  Using INIT overwrites any existing backup set, so if the command is run twice you will still end up with only one backup.

Server Not Available

Let’s say the server has crashed and cannot be brought back online.  If you are lucky enough to still have access to all the Full Backup and Log Backup files, you can restore them on another server and recover.

If the database is damaged and the server is not available, taking a Tail Log Backup becomes a little more difficult. Rest assured, there is still an option to try.

You will need to:

  1. Create a dummy database with the same name as the damaged one.
  2. Set the database offline.
  3. Delete the files from the dummy database.
  4. Drop in the log file from the real database.

--Create a dummy database with the same name
CREATE DATABASE [TestDB];
GO
--Set the database offline
ALTER DATABASE [TestDB] SET OFFLINE;
GO
--Delete the dummy database's files and drop in the log file
--from the original database (at the OS level), then take the Tail Log Backup
BACKUP LOG [TestDB] TO DISK = 'G:\DBA\Backups\TestDB_Log_Tail.bck'
WITH INIT,
NO_TRUNCATE;
GO

Now you are ready to take a Tail Log Backup as detailed above.  This will allow you to recover to the point of failure!  In my next post, we will do a deep dive into Recovery Using Tail Log Backups.
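As a small preview of that post, a minimal restore sequence using the Tail Log Backup might look like the following; the full and log backup file names are the illustrative ones from above.

-- Restore the full backup and each log backup WITH NORECOVERY...
RESTORE DATABASE [TestDB]
FROM DISK = 'G:\DBA\Backups\TestDB_Full.bak'
WITH NORECOVERY;

RESTORE LOG [TestDB]
FROM DISK = 'G:\DBA\Backups\TestDB_Log_1.trn'
WITH NORECOVERY;

-- ...then finish with the Tail Log Backup and bring the database online.
RESTORE LOG [TestDB]
FROM DISK = 'G:\DBA\Backups\TestDB_Log_Tail.bck'
WITH RECOVERY;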

Thank you for reading!

 

*Originally posted at Procure SQL:
https://www.procuresql.com/blog/2018/09/19/tail-log-backups/

 

Power BI Data Gateway

What is a Data Gateway in Power BI?

When creating reports in Power BI, the end goal is to make them useful to many users. In order to share reports created in Power BI, they must be published to the cloud (known as PowerBI.com). Once nestled in the cloud, the data in the reports will either remain static or will need to be updated on a regular basis. In order to refresh data and keep end users up to date, the cloud must have access to data sources. This is where you need a Data Gateway. Think of a data gateway as a bridge between your on-premises data sources and the cloud.

A gateway should be installed on a machine that is always on and connected to the internet. The gateway cannot access information from a machine that is powered off or has lost its internet connection.

  •  Before installing, take into consideration that if you are installing on a laptop and it is turned off, not connected to the internet, or asleep, the gateway won’t work and the data in the cloud will not sync with your on-prem data. Also, if the machine on which the gateway is installed is connected to a wireless network, the gateway may perform more slowly, and it will take longer for your on-prem data to sync with the cloud.

Power BI Gateway can be installed in two ways:

  • On-premises data gateway – This gateway can be used by any user that has access to the server on which the gateway is installed. It can be used for scheduling refreshes and live queries.
  • On-premises data gateway (Personal mode) – This gateway can only be used by the person setting up the gateway. This mode is only used for scheduling refreshes in Power BI. At the time of writing, Live Connection, DirectQuery, Power Apps, Logic Apps, and Microsoft Flow are not supported.

Only one gateway in each mode can be installed on one machine. That is, you may install one gateway in personal mode, and another in regular mode. You cannot install two or more personal mode gateways on one machine. You can, however, manage multiple gateways from the same interface on Power BI.

Installing a Gateway

To install a gateway, you will first need to sign in to PowerBI.com. Take note that this is NOT the desktop app; this is the cloud-based service. In the menu bar at the top right, click on the icon that looks like an arrow pointing down. The dropdown will reveal several actions. You will want to choose ‘Data Gateway’.


This will take you to a new webpage where you will be able to start your gateway download. Click on the DOWNLOAD GATEWAY button and wait for the download to begin. Once the download has finished, open the exe and follow the instructions.


When the installer opens, you will be ready to start setting up your gateway.


Click NEXT to choose the type of gateway you need. Before you choose, take into consideration the role of each. Remember that Personal mode is only useful for on-demand and scheduled refresh in Power BI and cannot be used for Live Connection or DirectQuery. The standard On-premises data gateway can be used by multiple users and supports both scheduled refresh and DirectQuery.

Please note the following in regard to installing either mode:

  •  both gateways require 64-bit Windows operating systems
  •  gateways can’t be installed on a domain controller
  •  you can install up to two On-premises data gateways on the same computer, one running in each mode (personal and standard)
  •  you cannot have more than one gateway running in the same mode on the same computer
  •  you can install multiple On-premises data gateways on different computers, and manage them all from the same Power BI gateway management interface (not including Personal mode)
  •  You can only have one Personal mode gateway running for each Power BI user. If you install another Personal mode gateway for the same user, even on a different computer, the most recent installation replaces the existing previous installation.


Once you have chosen your mode and clicked next, it will take a few seconds to download and get ready to install your gateway.

The next step is to point the installer to the drive on which you want the install to go. You will want the gateway positioned as close to your data source as possible. Be sure to read and accept the terms of use and privacy statement.

Upon successful installation, you will need to add an email address to use with this gateway. Next you will need to sign in.


We have a successful installation of our gateway! Now you will have the option to configure a new gateway, or migrate, restore, or take over an existing gateway. Here we will register the data gateway.


To configure a new gateway, you will need to enter a name for the gateway, enter a recovery key (minimum 8 characters) and finally, select Configure. Be sure to store your recovery key in a safe place. You will need it in the future if you ever need to migrate, restore, or take over a gateway.


Congratulations, you now have a successful installation and configuration of your gateway! Now you will be able to connect to on-prem data sources! For use with Power BI, you will need to add your data sources to the gateway within the Power BI service. This is done by going to the menu bar, clicking on the gear icon, and choosing MANAGE GATEWAYS from the dropdown. We will cover adding data sources in the next blog!

*For a more in-depth look at Gateway installation, information can be found on Microsoft Docs.

Deep Dive into Bulk Logged Recovery Model

Full disclosure time: the Bulk Logged Recovery Model is quite confusing to me and, as it seems, to many others.  I wrote a bit about it in SQL Server Recovery Models and decided that it was so complex, I really wanted to learn more and to explore what works and what doesn’t.  Let’s take a deep dive into bulk logged recovery!


Why would you choose Bulk Logged Recovery?

Switching from full recovery to bulk logged recovery does have its perks when you have a very large amount of data to insert.  Most notably in a data warehouse setting, switching to bulk logged recovery to perform bulk inserts makes perfect sense, as you are dealing with very large amounts of data being updated at one time.  Likewise, when doing an index rebuild, switching to bulk logged recovery can improve performance while performing operations on large amounts of data at once.
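As a minimal sketch of that pattern (the database, table, and backup path are illustrative assumptions, not from a real system):

-- Switch to bulk logged recovery just for the heavy operation.
ALTER DATABASE [DataWarehouseDB] SET RECOVERY BULK_LOGGED;
GO

-- The large operation, e.g., rebuilding all indexes on a big fact table.
ALTER INDEX ALL ON dbo.FactSales REBUILD;
GO

-- Switch back to full recovery...
ALTER DATABASE [DataWarehouseDB] SET RECOVERY FULL;
GO

-- ...and take a log backup immediately so point in time recovery resumes from here.
BACKUP LOG [DataWarehouseDB]
TO DISK = 'G:\DBA\Backups\DataWarehouseDB_Log.trn';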

Are there better ways to insert a large amount of data at once?

Bulk Logged Recovery uses minimal logging for bulk-logged operations, which reduces log space usage.  I must add a caveat here: it makes the operation faster and reduces usage in the log file, but it results in a very large log backup.

“Under the bulk-logged recovery model, if a log backup covers any bulk operations, the log backup contains both log records and the data pages that were changed by bulk operations. This is necessary to capture the results of the bulk-logged operations. The incorporated data extents can make a log backup very large.” Reference Myths & Truths

However, there is a risk of data loss for the bulk operations because these operations prevent capturing changes on a transactional basis.  A point in time (PIT) recovery while using bulk logged recovery is not possible because the minimally logged operations cannot be restored to a specific point in time.  This can be an issue.  So, if you have a bulk operation that needs to be handled, but you want to ensure point in time restore of each transaction in that operation, what is an alternative solution?  It is important to note that you can indeed recover a transaction log containing bulk logged operations, just not to a particular point in time.  Instead, you can take a transaction log backup as soon as the bulk operation is finished and regain PIT recovery.

You can still perform bulk operations in the full recovery model; it just means that they will be fully logged and that you will not see the performance gains from minimal logging.  It is the price you have to pay: you sacrifice performance for the PIT restore ability of the transactions within the bulk operation.  Ultimately your decision will have to be based on weighing what your company demands for I/O and RPO/RTO (Recovery Point Objective/Recovery Time Objective).  Do you know what your company’s RPO/RTO plans entail?  Now is a good time to find out!

Feeling uneasy?  Wondering if there are other ways to process bulk operations?

There are different methods one can utilize for optimizing bulk operations.  These methods include using minimal logging, batching, disabling triggers and constraints, and many others that can be found here.
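For example, batching breaks one massive transaction into many small ones so the log never has to hold the entire operation at once, even in full recovery.  A minimal sketch, with a hypothetical staging table and batch size:

-- Delete old staging rows in chunks of 10,000 so each transaction stays small.
WHILE 1 = 1
BEGIN
    DELETE TOP (10000)
    FROM dbo.StagingData
    WHERE LoadDate < '2018-01-01';

    IF @@ROWCOUNT = 0 BREAK;  -- stop once no rows are left to delete
END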

How best to ensure limited data loss using Bulk Logged Recovery

So you have decided to use bulk logged recovery and you want to make sure that you are set up for success; there are a few things to keep in mind.  It is recommended that you perform bulk inserts using bulk logged recovery when there is the least amount of activity on your database.  Also take into consideration how difficult or easy it will be to recreate data if there is a failure during the bulk insert.  There is no PIT restore of the bulk operation using bulk logged recovery.  If the bulk operation is interrupted at any point, the entire operation must be performed again in its entirety.

Still want to proceed?

Wait!   

First, before you switch from full recovery, take an extra log backup.  If all things go badly, at least you will be able to get your database back to the point before you switched recovery models.  This is highly recommended!  Skipping it is what we call an RGE (resume generating event).

Let’s walk through the process of getting ready and switching recovery models.  Our first step in this exercise is to create a table.  We then go on to taking a full backup and a log backup, inserting data manually, taking another log backup, and then switching to bulk logged recovery.


 


--Step 1--

USE BLRDB
GO

DROP TABLE IF EXISTS dbo.BulkDataTest;

CREATE TABLE dbo.BulkDataTest
(Price money NULL,
ProductID int PRIMARY KEY NOT NULL,
ProductName varchar (25) NOT NULL,
ProductDescription text NULL)
GO

--A full backup must exist before log backups can be taken;
--this is also the backup the restore scripts below start from
BACKUP DATABASE [BLRDB]
TO DISK ='C:\DBA\Backups\BLRDB_Full_02232018.bak'

BACKUP LOG [BLRDB]
TO DISK ='C:\DBA\Backups\BLRDB_Log_02232018_0800.bak'

-- Step 2--
Insert into dbo.BulkDataTest
(Price, ProductID, ProductName, ProductDescription)
VALUES ('456', '456123', 'HeroBike', 'Red Bike with Hero Cape Handles');

-- Step 3 --

BACKUP LOG [BLRDB]
TO DISK ='C:\DBA\Backups\BLRDB_Log_02232018_0815.bak'

-- Step 4 --
-- Switch to Bulk Logged Recovery Model --
USE [master]
GO
ALTER DATABASE [BLRDB] SET RECOVERY BULK_LOGGED WITH NO_WAIT
GO
--Verify the recovery model; this confirms the database is in the desired mode--
SELECT name, recovery_model_desc FROM sys.databases;

Our next steps will be to insert our bulk data, insert manual data, take log backups, switch back to Full Recovery, and take an additional log backup.

-- Step 5 --
-- Commit both transactions below at the same time--
USE [BLRDB]
GO

BULK INSERT BulkDataTest
FROM 'C:\DBA\TestDocs\demo_bulk_insert_26.csv'
With (FIELDTERMINATOR = ',' ,
ROWTERMINATOR = '\n' ,
ROWS_PER_BATCH = 100000,
TABLOCK
);
GO

INSERT INTO BulkDataTest
(Price, ProductID, ProductName, ProductDescription)
VALUES ('1099', '1111111', 'HoverCraft', 'BippityBoppityBoop');

Select *
From dbo.BulkDataTest

-- Step 6 --
--take log backup--

BACKUP LOG [BLRDB]
TO DISK ='C:\DBA\Backups\BLRDB_Log_02232018_0900.bak'

-- Step 7 --
--Insert more rows manually--

INSERT INTO dbo.BulkDataTest
(Price, ProductID, ProductName, ProductDescription)
VALUES ('56', '111117', 'TheCheap', 'One of the cheapest bikes ever made'),
('58' , '111118', 'NewerModel', 'This one is for beginners'),
('591' , '111119', 'ABetterOne', 'Okay this one is good') ;

-- Step 8 --
-- Switch back to Full Recovery Mode--
USE [master]
GO
ALTER DATABASE [BLRDB] SET RECOVERY FULL WITH NO_WAIT
GO

Use BLRDB
GO
-- Step 9 --
--Insert more rows manually--
INSERT INTO dbo.BulkDataTest
(Price, ProductID, ProductName, ProductDescription)
VALUES ('36', '111120', 'BoyBike', 'This is a bike for tall 8yo'),
('136', '111121', 'ManBike', 'This is a bike for tall men'),
('236', '111122', 'ShortBike', 'This is a bike for under 5foot');

-- Step 10 --
--Take Log Backup--
BACKUP LOG [BLRDB]
TO DISK ='C:\DBA\Backups\BLRDB_Log_02232018_0930.bak'

Ready for a challenge?

Now let’s simulate a dropped database and walk through the restore!  The following scripts will walk you through answering the questions below.

Question 1: Restore backup to step 6.  What is missing?  Do you have the single row inserts?  Is the data from the bulk insert there?


-- For Question 1: drop the database, then restore the chain through Step 6 --

--Drop Database--
USE Master
GO

DROP DATABASE [BLRDB]
GO

--Restore the Full Backup, then each log backup in sequence--

RESTORE DATABASE BLRDB
FROM DISK ='C:\DBA\Backups\BLRDB_Full_02232018.bak'
WITH NORECOVERY;

RESTORE LOG BLRDB
FROM DISK ='C:\DBA\Backups\BLRDB_Log_02232018_0800.bak'
WITH NORECOVERY;

RESTORE LOG BLRDB
FROM DISK ='C:\DBA\Backups\BLRDB_Log_02232018_0815.bak'
WITH NORECOVERY;

RESTORE LOG BLRDB
FROM DISK ='C:\DBA\Backups\BLRDB_Log_02232018_0900.bak'
WITH NORECOVERY;

RESTORE DATABASE BLRDB
WITH RECOVERY;

USE [BLRDB]
GO

SELECT *
FROM dbo.BulkDataTest


Question 2: Restore to backup at step 10.  What is missing?  Do you have everything?


-- For Question 2, Restore to Step 10 --
--Drop DB--
USE Master
GO

DROP DATABASE [BLRDB]
GO

RESTORE DATABASE BLRDB
FROM DISK ='C:\DBA\Backups\BLRDB_Full_02232018.bak'
WITH NORECOVERY;
GO

RESTORE LOG BLRDB
FROM DISK ='C:\DBA\Backups\BLRDB_Log_02232018_0800.bak'
WITH NORECOVERY;
GO

RESTORE LOG BLRDB
FROM DISK ='C:\DBA\Backups\BLRDB_Log_02232018_0815.bak'
WITH NORECOVERY;
GO

RESTORE LOG BLRDB
FROM DISK ='C:\DBA\Backups\BLRDB_Log_02232018_0900.bak'
WITH NORECOVERY;
GO

RESTORE LOG BLRDB
FROM DISK = 'C:\DBA\Backups\BLRDB_Log_02232018_0930.bak'
WITH RECOVERY;

USE BLRDB
GO

SELECT *
FROM dbo.BulkDataTest

Swimming Back Up

We have taken the plunge, and now it is time to swim back to the surface and discuss what we have found.  Working through this exercise and answering these questions should show you how the bulk logged recovery model works, why it is important to switch back to full recovery, and most importantly, why full and log backups are a must!

What are your answers?  Your answers should show that if you are only able to restore to step 6 (the point at which you took a log backup but had not yet switched back to full recovery and taken another log backup), all data operations performed after that log backup have to be recreated!  The second answer should show you that it is imperative to take a log backup before switching to bulk logged recovery, take a log backup after your bulk insert, and take another log backup after reverting back to full recovery.  If you follow those steps, your answers show that all of your data operations were in fact logged and can be restored up to the last log backup without data loss.

*Originally posted on Procure SQL at  https://www.procuresql.com/blog/2018/07/18/deep-dive-into-bulk-logged-recovery-model/