Via Rod Colledge
... In this series of blog posts, I'll examine DBA best practices through the lens of their very opposite: worst practices ...
In the last post, we discussed three backup-related worst practices, one of which was failing to anticipate and simulate various restoration scenarios. That practice fits within the framework of a broader disaster recovery strategy, which will be the focus of this post.
There's a fantastic feature article on CIO.com titled When Bad Things Happen to Good Projects. While SQL Server is not involved in this particular case, the lessons learned are applicable to any project. It tells the story of how a series of seemingly small events combined to create the perfect storm and almost destroyed an SAP project within HP. The big takeaway from this article, for me, was the importance of pessimism.
Nobody likes a pessimist. Negativity, at the office or at home, is hardly an attractive personality trait. It's human nature to be attracted to optimists, and one needs to look no further than Barack Obama for confirmation of that. Regardless of your political persuasion, his gift for words and positive outlook are an intoxicating combination that is difficult to deny. The problem, of course, is that it's far too early to tell whether his actions will help or harm the recovery ... I digress ...
In the context of SQL Server administration, optimism is, at best, irrelevant, and at worst, a catalyst for catastrophe. DBAs have a proud reputation for being difficult to deal with. We've all heard the jokes about DBA being an acronym for "Don't bother asking". In my experience, the best DBAs tend to be the most pessimistic, and for good reason. Let's examine a number of worst practices for disaster recovery, common attributes of the "Optimistic DBA".
Not having a disaster recovery plan;
DBAs without a disaster recovery plan of some sort are either ignorant, lazy, or dangerously optimistic. Either way, they're in the wrong job. Data is everything. In the top 10 list of organizational assets to be concerned about in a crisis, data recoverability and availability would occupy positions 1 through 9.
Fortunately, SQL Server 2008 has a number of features which can play a key role in the development of a disaster recovery plan. As this series of blog posts concentrates on worst practices, we won't spend time addressing such features. Instead, I'd like to pose a series of questions. If you're a DBA, how would you answer these?
- If my server, and all of its data, was physically destroyed or stolen, what would the recovery process look like, how long would it take, and how much data would be lost?
- Would the system be recovered in line with the Service Level Agreement (SLA)?
- What dependencies, interfaces and people do I rely on for recovery purposes?
Knowing the answer to these questions (among others) is absolutely crucial. If you can't answer them, or even if you hesitate, then you need to have a serious think about what the ramifications would be in a disaster situation.
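Part of being able to answer those questions is knowing, before a disaster strikes, that your backup files are actually restorable. As a minimal first check (the database name and file path below are hypothetical), SQL Server can inspect and verify a backup file without restoring it:

```sql
-- List the backup sets contained in the file (hypothetical path).
RESTORE HEADERONLY FROM DISK = N'G:\Backups\Sales.bak';

-- Verify the backup media is complete and readable, without restoring it.
-- WITH CHECKSUM validates page checksums, but only if the backup was
-- originally taken WITH CHECKSUM.
RESTORE VERIFYONLY FROM DISK = N'G:\Backups\Sales.bak' WITH CHECKSUM;
```

Note that VERIFYONLY is a sanity check, not a substitute for periodically performing a full restore, which is the only way to prove the backup, the process, and the people all work.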
In the last post, we discussed the importance of simulating restore scenarios. Our next worst practice takes that to the next level.
Not simulating and practicing the disaster recovery plan;
Although I list this as a worst practice, I'm well aware of the difficulties of actually implementing this one. Unlike a simple restore simulation in a test environment, testing an entire disaster recovery plan (for example, destroying and rebuilding a server, or failing over to an offsite location) is a lot more involved. The primary impediments here are time and cost, both of which typically involve a lot more people than the DBA group. As such, management support is crucial in understanding the importance and value of such an exercise.
Disaster recovery simulation should be tightly coupled to the development of SLAs. If a business demands zero data loss and five nines availability, then they must be prepared to spend the time and money developing and simulating an appropriate recovery plan. It's unrealistic to expect such a service level without a mechanism to ensure it's achievable. In a similar manner, a DBA that signs off on an SLA without a means to ensure it's achievable is either ignorant, lazy, or dangerously optimistic, three attributes we covered earlier.
Defining disaster too narrowly;
Finally, what's your definition of disaster? Beyond the obvious cataclysmic events, do you plan for smaller events such as these?
- Someone accidentally (or maliciously) dropping a production table,
- An intruder breaching security and accessing production data. What did they see, and what are the implications?
- A storage controller producing physical corruption across a number of disks,
- Your lead DBA being off sick for a number of weeks.
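The dropped-table case is a good illustration of why the recovery model and log backups matter. Assuming a database in the FULL recovery model, a point-in-time restore can recover to just before the accidental DROP. The database name, file paths, and timestamp below are hypothetical, and this is a sketch of the general technique, not a ready-to-run runbook:

```sql
-- 1. Capture the tail of the log so no committed work since the last
--    log backup is lost; NORECOVERY leaves the database in a restoring state.
BACKUP LOG Sales TO DISK = N'G:\Backups\Sales_tail.trn' WITH NORECOVERY;

-- 2. Restore the most recent full backup, without recovering yet.
RESTORE DATABASE Sales FROM DISK = N'G:\Backups\Sales.bak'
    WITH NORECOVERY, REPLACE;

-- 3. Roll the log forward, stopping just before the DROP TABLE was issued.
RESTORE LOG Sales FROM DISK = N'G:\Backups\Sales_tail.trn'
    WITH STOPAT = '2009-03-10 14:55:00', RECOVERY;
```

If you can't state the timestamp of the mistake, or the log chain is broken, this technique fails, which is exactly the sort of thing a DR simulation should flush out in advance.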
These are obviously only a very small selection of a massive number of potential "small" disasters. As the CIO article I mentioned earlier makes clear, "small" disasters have a nasty habit of combining into much larger ones, and in such situations, the well-prepared pessimist will almost always be the person you'll want in the driver's seat.
In the next post, we'll take a look at worst practices around change control processes ...