Strategy is a Contact Sport

March 22, 2008

Quality Engineering Throughout the SDLC with Gomez

IT monitoring tools are usually consumed by techies deep within IT shops. They are rarely discussed across IT and rarer still with the business owners. I’m going to discuss a tool, however, that will change the role monitoring plays in your organization and transform your business.

We’ve been using HP/Mercury Business Availability Center for two years at my current company. At AOL, we used everything from OpenView to NetCool to understand our infrastructure. In about ten years of being involved in the evaluation and selection of infrastructure monitoring systems, I’ve never come across a monitoring tool as big a game changer as Gomez’s monitoring software (www.gomez.com).

Depending on where you are on the three-stage monitoring evolutionary scale, this tool may be more advanced than your company is ready for. I’ll discuss the three types of IT monitoring and then spend the remainder of the article describing synthetic transactions and why I think Gomez is a unique tool that helps your entire company, not just the IT department.

Three Types of IT Monitoring

  1. Link State: This is the most basic form of monitoring. The monitoring system sends a packet (typically an ICMP echo request, or ping) every five minutes or so to determine if a physical device is still running. If the device responds to the request, the monitoring system assumes that everything is OK and does not report a fault state for that device. If a device, however, does not respond to the ICMP request, the monitoring system assumes that there is a problem and sends an alert to the administrator that a system or network device is unavailable. The problem with link state monitoring is that it does not attempt to understand whether the operating system and the services running on it are working properly. The next level of monitoring addresses this deficiency.
  2. Process & Services: The next level of monitoring builds upon the link state monitoring capability and provides the IT department a more robust tool for service support. Anybody who has told a customer “the server is up but we’re not sure why the application isn’t responding” knows the value of process and services monitoring. Link state is great for telling IT people that a server is up. Too bad that is pretty worthless for understanding if applications are available. What employees and customers care about is the application running on the server that they are trying to reach. Applications running on Unix/Linux use an operating system component called a process. Applications that run on Microsoft Windows servers use a component called a service. For monitoring purposes, the two play the same role. If an employee is attempting to use Microsoft Exchange and it isn’t responding, the MS Exchange service is probably not running or has hung. Process & Services Monitoring looks at these OS components and sends an alert to the administrator when they fail to run properly. In our example, if we are monitoring the Exchange service, we will know quickly when the service that delivers email has stopped or hung.
  3. Synthetic Transaction: The previous two monitoring states are important and form the foundation of reactive monitoring in IT infrastructure. They are great at helping IT NOCs and engineers respond quickly to support issues but fail to help IT and the business proactively look for trends and potential problems before they arise. Synthetic monitoring helps IT and the business come to a common understanding of how well the end-to-end transaction is working. Synthetic transactions simulate actual transactions such as reading a database, retrieving a webpage or authenticating a user session. Each transaction mirrors a real user transaction. Because you can use technology to simulate these transactions, you can perform them 24x7x365 and use the data to understand when you are having problems. We can look at critical information that was previously unavailable, such as average webpage load time, time to pull a record or the duration of other customer transactions. If the baseline for a webpage load is normally four seconds and it increases to eight seconds, you know that something has happened in your environment that has doubled the time it takes to load a page. That’s a very bad thing. The IT team can start looking at recent code or infrastructure changes that may have introduced latency into the environment.
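
The baseline comparison described in the synthetic transaction stage can be sketched in a few lines of Python. This is only a minimal illustration, not how Gomez or BAC implement it: the function names, the 1.5x alert factor and the sample timings are all assumptions made for the example.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def time_transaction(transaction: Callable[[], object]) -> float:
    """Run one synthetic transaction and return its duration in seconds."""
    start = time.perf_counter()
    transaction()
    return time.perf_counter() - start


def exceeds_baseline(samples: Sequence[float], baseline_s: float,
                     factor: float = 1.5) -> bool:
    """Return True when the average sample is slower than baseline * factor."""
    return mean(samples) > baseline_s * factor


# Hypothetical page-load timings (seconds) collected by a scheduled probe.
normal_day = [3.8, 4.1, 4.0, 4.2]
after_release = [7.9, 8.2, 8.0, 7.8]

print(exceeds_baseline(normal_day, baseline_s=4.0))     # False
print(exceeds_baseline(after_release, baseline_s=4.0))  # True
```

In practice the probe would call something like `time_transaction` on a schedule and feed the results into the comparison; the right alert factor depends on how noisy your own baseline is.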

Synthetic Monitoring Evolution

Now that we understand the three monitoring stages that IT organizations go through, we have to look at some of the vendors in the synthetic monitoring space and understand how they differ. I have implemented various tools that performed these functions. HP Mercury Business Availability Center is the most recent tool that I have used to baseline our applications and infrastructure. BAC is an excellent tool and provides important insight into how apps and the underlying infrastructure are doing. With BAC you can baseline your apps and look at trends over time to determine the effect that technology and webpage decisions have on site performance. But the fundamental problem with BAC is that it depends on the IT team that manages the tool to collect and distribute information about performance, as shown in Figure 1.

Figure 1: IT Infrastructure creates and distributes reports of limited value

At my current company, our organization has implemented the BAC tool and is also the team responsible for collecting and presenting the information to our stakeholders. We become the gatekeeper and the bottleneck for getting information to people who need to make technical and business decisions. Another problem with this tool is that the development and QA organizations cannot make decisions directly from the information they receive. Any actions that arise from the tool are based on inferences from data compiled and disseminated by the infrastructure team. An additional problem is that any requests or changes to the information generated must go through the IT infrastructure team. The Dev, Business Intelligence and Marketing departments have to wait for their requests to be prioritized. This can lead to unnecessary delays that may translate into lost revenue opportunities or poor website performance.

Quality Engineering throughout the SDLC

I once had a client who complained that the environment we managed for them had lower availability than our Mercury BAC tools were telling us. They were using Gomez. Since “the customer is always right”, we worked with them to understand what Gomez was telling them and how it differed from our tools. We successfully addressed a number of internal issues, which made our client very happy. In the process, we discovered that their tool was significantly better than ours. There are many tools that provide baselining and benchmarking of your applications. What is different with Gomez is that it looks at the collection of objects on the page and determines which objects are causing the delay in load time. Whereas Mercury BAC did a great job of providing our overall transaction time and availability, it didn’t decompose each object, each call, and each jpg in a way that would tell us which calls were hampering the customer experience. Gomez measures the transaction times for each step in the sequence so we know where to look immediately for performance improvements.
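
To make the per-object idea concrete, here is a small sketch that ranks the objects on a page by load time. The object names and timings are invented for illustration and `slowest_objects` is a hypothetical helper; it does not reflect Gomez’s actual data format or reports.

```python
from typing import Dict, List, Tuple


def slowest_objects(timings_s: Dict[str, float],
                    top_n: int = 3) -> List[Tuple[str, float, float]]:
    """Return the top_n slowest objects as (name, seconds, share of summed time)."""
    total = sum(timings_s.values())
    if total == 0:
        return []
    ranked = sorted(timings_s.items(), key=lambda item: item[1], reverse=True)
    return [(name, secs, secs / total) for name, secs in ranked[:top_n]]


# Hypothetical per-object load times (seconds) for a single page.
page = {"index.html": 0.4, "hero.jpg": 2.6, "app.js": 0.7, "ads.js": 1.3}
for name, secs, share in slowest_objects(page, top_n=2):
    print(f"{name}: {secs:.1f}s ({share:.0%} of summed object time)")
```

A single overall transaction time would hide the fact that, in this made-up page, one image accounts for roughly half of the summed load time.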

How would you like to know more about your competitors’ webpages than they do? As part of the benchmarking capabilities, you can compare your website performance to theirs and see if they are more efficient in their page caching or object calling methodologies. Does your business think that site speed is affecting revenue? Find out by baselining your site against your competitors’. If you’re faster than your competition, maybe throwing money at website performance won’t solve your problem. Maybe you have design issues that focus groups can help you uncover.

Most synthetic monitoring tools have nodes in data centers throughout the world that reside on the internet backbone. These nodes poll your sites and report response times through their synthetic monitoring. Gomez does this but it does a lot more. In addition to backbone measurements, they have more than 40,000 desktops throughout the world at various internet speeds that also perform monitoring of your site.

In this example, the traditional backbone monitoring would lead you to believe that your site is performing reasonably. It takes 9.7 seconds to load a webpage with an availability of 98.05%. But this is high-speed backbone performance (>100 Mbps throughput) that few people in the world will actually experience. When you look at the traditional DSL customer, who is defined as low broadband, we see that the response time is more than twice as slow. And if you hope to sell to that person still on dial-up, they will have to wait over a minute for your page to download. Developers can now work with the business on defining how graphics-intensive they want the website to be based on actual load times. You no longer have to guess how your website will perform because Gomez tells you by bandwidth, geography, browser and OS.

Dev and QA can use the BrowserCam feature to select the operating system, browser, and screen resolution that they would like to review before they release a new website or page into the production environment. Want to know how customers will view your page running on Windows XP with IE 7 in Los Angeles? No problem. That information and an actual screen rendering are only a few clicks away. Your IT department has the information it needs to know exactly what the finished product will look like when it is delivered to the business owner. And the business owner knows exactly what it will look like when it is released to their customers.

To say that Gomez has the potential to better align the business and IT would be an understatement. In a company that uses Gomez, the consumption model changes from the IT Infrastructure team acting as the gatekeeper and potential bottleneck of information to the many departments being able to directly access the information. Now everyone can access a tool for the purposes unique to their organizational needs. Infrastructure teams can still use it for SLAs and alerting like they always have. But now the QA and Dev teams can use it to develop and test code before it gets to the business or the customer. They will have a wide variety of test tools available to look at how their code will work on various operating systems and browsers. They will be able to test their applications and see how geography affects performance. They will be able to see how bandwidth affects their app also. Marketing and Sales can work with their Business Intelligence or Data Warehouse teams to match the marketing campaigns to a specific geographic market and the most popular browser. Are most of the users in your target market dial up users? If so, you can select to view only performance data based on those users. Dev and the business can work together to design a lightweight website that targets low bandwidth users. There isn’t any guesswork as to what it will look like or how long it will take those pages to load. Gomez tells you exactly what it will look like.

Nice!

A Disruptive Technology

I’ve only scratched the surface of the power of Gomez. But the truth is that technology is only one component in successful IT alignment with the business. The other two components are people and processes. This is what potential customers will have to evaluate when they look at Gomez. It is much easier to embrace the past and consume reports from systems like Mercury BAC because it doesn’t require a change in your company’s culture or business processes. Gomez changes everything. Used effectively, it lets people in departments from marketing and business analytics to QA and Dev learn how to do their jobs better because of the wealth of information Gomez provides.

Once you get your people committed to the tool, you have to look at your internal processes. If you are a developer or QA manager, you just can’t do things the same once this tool is deployed. You have to embrace it and change your existing SDLC. This will require a reevaluation of your software development lifecycle, your quality assurance processes and your release management function. You’ll be able to deliver with better quality and faster if you do. But you have to change. You just can’t ignore the information that Gomez presents or you are failing to do your job. And that’s why ultimately this is a disruptive technology.

People will have to embrace working together differently than they did in the past. People will have to look at their processes and see how they can leverage the tool. If you are the type of person who doesn’t want to change your existing way of doing business, or if you think your corporate culture won’t accept reengineering how you design, develop and manage your infrastructure, you probably should pass on this tool. A successful Gomez deployment requires strong leadership from your IT and business executives because it will change the way your company does business.

And for most companies, that’s a very good thing.

More information on Gomez can be found on their website at www.gomez.com

November 17, 2007

John Maxwell Meets Pete Carroll

Filed under: Operational Excellence — rontevans @ 7:17 pm

During the dark days of USC football in the 90s, we had a string of knucklehead coaches. Ted Tollner, John Robinson’s second stint, and to borrow from Harry Potter, “he-who-must-not-be-named” (Paul Hackett) all held the reins of the USC football program. Our talent was always amazing but we could never even sniff a Pac-10 championship, much less a national championship, because something was missing. It was the team’s coach.

It was leadership.

In Pete Carroll’s first season, the team started off 1-5. He then got everyone focused and the team finished 5-1 for a six and six record. In 2002, the team finished #4 in the nation. In 2003 and 2004 they won national championships. They have finished no lower than fourth every year since 2002.

Last year, in Urban Meyer’s second season as coach, Florida won the national championship.

John Maxwell has written a number of books on leadership. One of my favorites is “The 21 Irrefutable Laws of Leadership.” He comments, “People don’t buy into a vision until they buy into the leader. Once they believe in the leader, they will believe in the vision.”

People have to believe that tomorrow is going to be better than today or why bother? In IT, we often are in reactive mode because we are responding to the demands of the business and the marketplace. Leaders are required to help guide people through the dark days and evenings by offering them a hope that it will get better.

IT is a great career but it is filled with frustrations and challenges. Leaders have to make the right investments in people, processes, and technology to ensure that their teams win in the workplace. Leaders get the right players playing the right positions and then develop a game plan to use the talent on the field.

Pete Carroll, Urban Meyer, Jim Tressel, and Bob Stoops are all premier college coaches because they practice John Maxwell’s lesson on being the leader that people want to follow.

October 1, 2007

Two Great Quotes on IT Measurement

Filed under: Operational Excellence — rontevans @ 2:06 am

I’m getting ready to send out our August Balanced Scorecard Operational Review deck to our IT organization today. It got me to thinking about a couple of my favorite quotes that I wish other parts of the organization would use to manage their teams. The first quote is on the first page of every monthly scorecard. If I could force people to sleep with this quote under their pillow at night I would.

“Trying to improve something when you don’t have a means of measurement and performance standards is like setting out on a cross-country trip in a car without a fuel gauge. You can make calculated guesses and assumptions based on experience and observations, but without hard data, conclusions are based on insufficient evidence.”

Mikel Harry, an author of a good book on “Six Sigma.”

We can definitely debate the impact Six Sigma has had on corporations but the quote is fantastic. His book is the origin of my phrase “we are a facts-based, math-based organization” that I try to instill into the team. His second quote that I love states that “you don’t truly understand a problem until you can express it mathematically.” Both are good quotes and I definitely agree with them. As for what you are left with when you have no facts, my nine-year-old daughter expressed it best when she said “Opinions are like armpits. Everyone has them and sometimes they are smelly.” As she gets older she will change the wording slightly but the message and its meaning will stay the same.

September 17, 2007

Application Portfolio Management: An Overview

Filed under: Operational Excellence — rontevans @ 1:58 am

One of the challenges of introducing fancy terms is people will think that you are using a 25 cent word when a five cent word would do.

That was my initial concern when I brought the term Application Portfolio Management to my company. APM was a dream in November and December when I started with my current company so I just kept it on the shelf and hoped that one day we could get to it.

In March of this year, the daylight savings patch project hit. The first thing we had to do was work with the existing team to identify how many servers we had. We had no clue. The list of servers and locations created the first authoritative list of systems. We plowed through the DST project but a bigger accomplishment occurred. We now had a list of servers.

It would be easy to call the initiative “server consolidation” because technically we are reducing the number of servers we have. But server consolidation is a tactical, operations-only view of the real business problem that needs to be addressed. We don’t have a problem of too many servers; we have a problem of too many systems and applications that the business has asked us to support on their behalf.

APM forces IT to engage the business in agreeing on the right level of hosting necessary to support the business goals. APM looks not at servers and physical boxes but at the applications that run on them in service of some larger revenue or expense goal. If we are hosting an application, APM demands that the application have an owner in the business who has signed up for the value said application is delivering.

But the first step is to build a list. The second step is to pull together the list of systems and departments assigned to those devices. Once we have the business signed up for their servers, we can then help them rationalize the number of applications and systems it takes to run their business units.

Phase one of APM is gathering the list of devices and making the easy decisions regarding how many prod, dev, and stage servers we have and if any of them are candidates for decommissioning. For a small company that has grown rapidly, we have built a large list of candidates for decommissioning over the years.

The second phase is to gain finance support for allocating the cost back to each department. This is where we really gain traction on the APM initiative. We are spending real dollars that can be saved through APM phase 2. Right now, IT eats that bill. The business is not incented to reduce the number of systems we have in our three environments. Presenting a hosting cost by business unit helps the company bring down its expenses by determining if systems and their applications are really worth the money we are spending to maintain them.

We’re on our way, and with support from IT, Finance, and the business, we’ll be able to lower our company’s operating expenses while improving IT’s responsiveness at the same time.
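
As a rough sketch of what a hosting cost by business unit could look like, the snippet below rolls up a server inventory by owning department. The server names, owners and dollar figures are made up for illustration, and `hosting_cost_by_unit` is a hypothetical helper rather than part of any real chargeback tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (server name, owning business unit, monthly hosting cost in dollars)
Server = Tuple[str, str, float]


def hosting_cost_by_unit(servers: Iterable[Server]) -> Dict[str, float]:
    """Sum monthly hosting cost per business unit."""
    totals: Dict[str, float] = defaultdict(float)
    for _name, unit, monthly_cost in servers:
        totals[unit] += monthly_cost
    return dict(totals)


# Hypothetical inventory; "Unassigned" marks servers with no business owner yet.
inventory = [
    ("web-prod-01", "Marketing", 900.0),
    ("web-stage-01", "Marketing", 450.0),
    ("dw-prod-01", "Finance", 1200.0),
    ("legacy-dev-07", "Unassigned", 300.0),
]
for unit, cost in sorted(hosting_cost_by_unit(inventory).items()):
    print(f"{unit}: ${cost:,.2f} per month")
```

Anything that lands in the “Unassigned” bucket is a natural first candidate for the decommissioning conversation, since no one in the business has signed up for it.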

 

June 5, 2007

Common Characteristics of High Performing IT Organizations

Filed under: Operational Excellence — rontevans @ 1:56 pm
  • High service levels and availability
  • High throughput of effective change
  • Higher investment early in the IT lifecycle
  • Early and consistent process integration between IT operations and IT security
  • Posture of compliance
  • Collaborative working relationships between functions
  • Low amounts of unplanned work
  • Server to system administrator ratio greater than 100:1

Source: The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps

May 17, 2007

Change Control Questions

Filed under: Operational Excellence — rontevans @ 1:50 pm

“Who” Questions

  •  Who will be affected by the change? Ensure that there is appropriate representation on the CAB to make decisions.
  • Who could be affected by the change if it fails?
  • Who from the potentially affected group(s) has signed off on the change?
  • Who is performing the change (the “change builder”)?
  • Who has reviewed the proposed change?
  • Who is the project manager if this change involves more than one step?

“What” Questions

  • What assets are the targets of the proposed change?
  • What is the change timeline?
  • What is the change review priority based on the associated risk and urgency?
    • Urgent
    • High
    • Medium
    • Low
  • What assets or processes depend on the targeted assets?
  • What will the successful change look like when implemented?
  • What business processes need to be verified after making the change?
  • What is the business or technical reason for the change?
  • What will happen if the change is not made?

“When” Questions

  • When will the change be performed?
  • When will it be finished?
  • When will the benefits of the change be realized?

“How” Questions

  • How will the change be implemented (in waves, one at a time, etc.)?
  • How will we verify success?
  • How will issues be escalated?
  • How successful were similar changes in the past?  (i.e. change success rate)

“What if” Questions

  • What is the back out (rollback) plan if the change should fail for some reason?
  • What is the worst possible outcome associated with this change?
  • What will the worst case service outage be?
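
One way to put these questions to work is to treat them as required fields on a change request and flag whatever is still blank before it goes to the CAB. The sketch below covers only a subset of the questions; the `ChangeRequest` class, its field names and the sample request are assumptions for illustration, not taken from any particular change management tool.

```python
from dataclasses import dataclass, field
from typing import List

VALID_PRIORITIES = {"Urgent", "High", "Medium", "Low"}


@dataclass
class ChangeRequest:
    """Hypothetical record holding answers to a subset of the questions above."""
    summary: str
    affected_groups: List[str] = field(default_factory=list)  # "Who" is affected
    signed_off_by: List[str] = field(default_factory=list)    # "Who" signed off
    target_assets: List[str] = field(default_factory=list)    # "What" assets
    priority: str = ""                                         # "What" priority
    scheduled_for: str = ""                                    # "When"
    verification_plan: str = ""                                # "How"
    backout_plan: str = ""                                     # "What if"


def open_questions(cr: ChangeRequest) -> List[str]:
    """List the questions that still lack an answer before CAB review."""
    checks = [
        (bool(cr.affected_groups), "Who will be affected by the change?"),
        (bool(cr.signed_off_by), "Who has signed off on the change?"),
        (bool(cr.target_assets), "What assets are the targets?"),
        (cr.priority in VALID_PRIORITIES, "What is the change review priority?"),
        (bool(cr.scheduled_for.strip()), "When will the change be performed?"),
        (bool(cr.verification_plan.strip()), "How will we verify success?"),
        (bool(cr.backout_plan.strip()), "What is the back out (rollback) plan?"),
    ]
    return [question for answered, question in checks if not answered]


draft = ChangeRequest(summary="Apply DST patch to web tier", priority="High",
                      target_assets=["web-prod-01"])
for question in open_questions(draft):
    print("Unanswered:", question)
```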

May 16, 2007

Ron’s IT Operations Golden Rules

Filed under: Operational Excellence — rontevans @ 1:46 pm
  1. Protecting the production environment is our number one job.
  2. The first part of protecting the environment is protecting the data. Backups are critical. Everything can be interrupted to ensure we get a clean backup.
  3. The most important part of your change control is the back-out plan. If it all goes to heck in a handbasket, you can quickly get back to a good state if you have already planned your recovery in advance.
  4. There is no such thing as a minor change.
  5. Nobody is indispensable…but all of the DBAs are forbidden to cross the street at the same time.
  6. It’s only money. Don’t let that get in the way of doing what’s right.
  7. A policy without compliance and auditing is a wish. As in “I wish they would do what I said.”
  8. What gets measured gets improved.
  9. You don’t understand the problem until you can express it mathematically.
  10. Never say no to a user - just put a price tag on yes.

May 7, 2007

The Toolkit methodology

Filed under: Operational Excellence — rontevans @ 2:30 pm

Step One: Hunt & Gather

  • Interview business customers and stakeholders
  • Interview staff. Conduct 1:1s and meet and greets with your organization.
  • Interview your boss.

Step Two: Synthesize

  • From what you have learned, identify top 5 frequent themes from interviews.
  • Communicate themes back in the form of top objectives for your department for consensus.
  • Communicate themes to your boss for consensus and alignment.

Step Three: Communicate the Vision

  • Use the top 5 themes as your organizational strategies for the department.
  • Use the four perspectives of the balanced scorecard to ensure you are covering financial, customer, internal process and people in your strategy.
  • Create metrics that would be leading or lagging indicators of how well your department is achieving its goals for each of the five strategies. Ensure they are in a BSC perspective so you can look for balance in your scorecard.

Step Four: Operationalizing the Strategy

  • Use predefined templates to create a clear understanding of the definition of each metric, along with its actual baseline if known.
  • Agree internally within IT first on an end-of-year target. Use this as a starting point for discussions with stakeholders.
  • Produce bi-weekly or monthly scorecard. Ensure each metric has an owner who is responsible for improving performance.
  • Approve projects and initiatives that directly support improving the metrics. How will this initiative help us reach our target for the year?
  • Allocate resources to projects based on impact to the metrics not based on organizational politics and alignment. Do the right thing not the easy thing!
  • Hold Monthly System Quality Meetings with your direct reports to let them know that you are overseeing the progress.
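
A metric template along the lines of Step Four might look like the sketch below: each metric carries a balanced scorecard perspective, an owner, a baseline, an end-of-year target and the latest actual value. The `Metric` class, the `progress` calculation and the sample numbers are assumptions made for illustration, not the templates referred to above.

```python
from dataclasses import dataclass

PERSPECTIVES = ("financial", "customer", "internal process", "people")


@dataclass
class Metric:
    name: str
    perspective: str   # one of PERSPECTIVES
    owner: str         # person responsible for improving the metric
    baseline: float    # value when measurement started
    target: float      # agreed end-of-year target
    actual: float      # latest measured value

    def __post_init__(self) -> None:
        if self.perspective not in PERSPECTIVES:
            raise ValueError(f"unknown perspective: {self.perspective}")


def progress(metric: Metric) -> float:
    """Fraction of the baseline-to-target gap closed so far (can be negative)."""
    gap = metric.target - metric.baseline
    if gap == 0:
        return 1.0 if metric.actual == metric.target else 0.0
    return (metric.actual - metric.baseline) / gap


# Hypothetical scorecard entries.
scorecard = [
    Metric("Unplanned work (hours/month)", "internal process", "Ops manager", 400, 200, 350),
    Metric("Hosting cost ($K/month)", "financial", "Infrastructure lead", 120, 100, 110),
]
for m in scorecard:
    print(f"{m.name} [{m.perspective}] owner={m.owner}: {progress(m):.0%} of the way to target")
```

Because progress is measured as a share of the gap between baseline and target, the same calculation works whether the goal is to push a number up or bring it down.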

March 24, 2007

IT is like pulling babies out of the water

Filed under: Operational Excellence — rontevans @ 7:56 pm

I had a mentor once that described life in IT in the following manner. I think it’s a great story about the difference between being a tactical thinker and a strategic thinker.

A man was sitting by the edge of the stream one day and noticed a baby floating helplessly down the stream. He dropped the book he was reading and immediately jumped in the water, swam out, and got the baby and brought it safely back to the shore. As he put the baby on his blanket, he noticed that another baby was also floating down the stream. He jumped back in the water and swam out and saved this baby also. As he placed the second child safely on his blanket, he noticed a third baby in the water. Tired and exhausted the man jumped in the water and struggled, but successfully pulled the child out. As he turned around and saw the fourth baby in the water, he took several steps, clutched his heart, and fell dead on the side of the stream.

Sadly, the man never journeyed the 100 yards to the top of the stream, where he would have found the person who was putting the babies in the water and stopped him.

What does this have to do with IT?

Like the man who was too busy doing his job, IT leaders don’t stop long enough to look at what independent variables are causing the workload to grow within their organizations. We are constantly trying to be the heroes who save projects, write applications, and deploy systems even when it is killing our teams. Instead of going the 100 yards upstream to stop or at least retard the demand on IT services, we stay in the water waiting for the next baby to come down the stream. We’ve been in the water so long that it seems a normal way to behave.

What can you do about it?

Unfettered demand for IT resources is not sustainable. The costs are measured in sick time, attrition, and recruiting costs. To retain your staff, your leadership team, and your sanity, you need to inject a governance process into your organization. One thing you can implement quickly is a “Plan of Record.” A plan of record is the authoritative list of IT projects that have business sponsors. This list is critical because it lists, at a minimum, the name of the project, the expected completion date, the business owner, and a rough estimate of hours and dollars.

Once you have a PoR, you can engage your business units in a discussion of their projects and their competing demands on your resources. When you sit down with your business owner, you can have a fact-based discussion like the following: “Jim, I am working on four projects for your division. This new one is going to use the same resources as the other four. Can you help me prioritize which ones I should be working on first?”

As IT leaders we can take control of how many babies are being put in the water by going upstream and solving the problem on the front end. Using a Plan of Record as a tool for face-to-face meetings with your business partners helps everyone reduce the frustration and the sense of being a hero pulling projects out of the water.
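
A Plan of Record does not need special software; a list with the minimum fields named above is enough to start the conversation. The sketch below is one possible shape, with invented projects, owners and a made-up capacity figure; the `Project` class and helper functions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Project:
    name: str
    business_owner: str
    expected_completion: str   # e.g. "2007-06-30"
    est_hours: int
    est_dollars: int


def hours_by_owner(plan: List[Project]) -> Dict[str, int]:
    """Total estimated hours requested by each business owner."""
    totals: Dict[str, int] = defaultdict(int)
    for project in plan:
        totals[project.business_owner] += project.est_hours
    return dict(totals)


def over_capacity(plan: List[Project], capacity_hours: int) -> bool:
    """True when the plan asks for more hours than the team has available."""
    return sum(p.est_hours for p in plan) > capacity_hours


# Hypothetical Plan of Record: four projects for Jim, one for Dana.
plan_of_record = [
    Project("CRM upgrade", "Jim", "2007-06-30", 120, 15000),
    Project("Sales dashboard", "Jim", "2007-07-15", 200, 30000),
    Project("Lead import tool", "Jim", "2007-05-31", 80, 5000),
    Project("Pricing engine", "Jim", "2007-09-01", 160, 25000),
    Project("Payroll interface", "Dana", "2007-08-01", 240, 40000),
]
new_request = Project("Partner portal", "Jim", "2007-08-15", 150, 20000)

print(hours_by_owner(plan_of_record))
print(over_capacity(plan_of_record + [new_request], capacity_hours=900))
```

With numbers like these on the table, the prioritization question for Jim rests on the hours already committed to his division rather than on opinion.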

March 9, 2007

The Idea Gap

Filed under: Operational Excellence — rontevans @ 2:24 am

When I talk to fellow IT leaders regionally and internationally, there seems to be a common frustration among my peers. The problem is not now, nor has it ever been, the difficulty of finding people to perform IT jobs. I can stand outside with a hiring req and yell a salary and I’ll get tons of candidates. The problem comes when you actually talk to them or bring them into your organization.

One of the things that helps me identify tomorrow’s leaders very quickly is how they phrase problems. A technologist, be it individual contributor through VP, who looks at a technical problem as, well, just a technical problem is firmly planted on a career treadmill going nowhere fast.

The technologists who distinguish themselves from their contemporaries will be those who translate technology into business value. For a certain audience that reads this, it will sound obvious. Sadly, there will also be people who think that they are in technology for the sake of technology. Dude, those days have passed!

If technology purchases are not redefined as “IT investments” and looked at in that light, we will never align our department with the business direction. Sorry folks but if IT investments (capital, expense & labor) are not solving business problems by either enabling increased revenue or reduced expenses then it’s probably not an IT project aligned with any business results.

And it shouldn’t have been funded.

Ideas on how we drive business value are where people separate themselves. It’s certainly what I look for in potential candidates and existing staff. It should be just as much a part of the interview process as asking candidates technical questions.
