Quality Engineering Throughout the SDLC with Gomez
IT monitoring tools are usually consumed by techies deep within IT shops. They are rarely discussed within IT and rarer still with the business owners. I’m going to discuss a tool, however, that will change the role monitoring plays in your organization and that will transform your business.
We’ve been using HP/Mercury Business Availability Center for two years at my current company. At AOL, we used everything from OpenView to NetCool to understand our infrastructure. In about ten years of being involved in the evaluation and selection of infrastructure monitoring systems, I’ve never come across a monitoring tool as big a game changer as Gomez’s monitoring software (www.gomez.com).
Depending on where you are on the three stages of the monitoring evolutionary scale, this tool may be more advanced than your company is ready for. I’ll discuss the three types of IT monitoring and then spend the remainder of the article describing synthetic transactions and why I think Gomez is a unique tool that helps your entire company, not just the IT department.
Three Types of IT Monitoring
- Link State: This is the most basic form of monitoring. The monitoring system sends a packet, typically an ICMP echo request, every five minutes or so to determine if a physical device is still running. If the device responds to the request, the monitoring system assumes that everything is ok and does not report a fault state for that device. If a device, however, does not respond to the ICMP request, the monitoring system assumes that there is a problem and sends an alert to the administrator that a system or network device is unavailable. The problem with link state monitoring is that it does not attempt to understand if the operating system and the services running on it are working properly. The next level of monitoring addresses this deficiency.
- Process & Services: The next level of monitoring builds upon the link state monitoring capability and provides the IT department a more robust tool for service support. Anybody who has told a customer “the server is up but we’re not sure why the application isn’t responding” knows the value of process and services monitoring. Link state is great for telling IT people that a server is up. Too bad that is pretty worthless for understanding if applications are available. What employees and customers care about is the application that is running on the server that they are trying to reach. Applications running on Unix/Linux use an operating system component called a process. Applications that run on Microsoft Windows servers use a component called a service. Functionally there is no difference between the two. If an employee is attempting to use Microsoft Exchange and it isn’t responding, the MS Exchange service is probably not running or has hung. Process & Services Monitoring looks at these OS components and sends an alert to the administrator when they fail to run properly. In our example, if we are monitoring the Exchange service, we will know immediately when the server has a problem delivering email.
- Synthetic Transaction: The previous two monitoring states are important and form the foundation of reactive monitoring in IT infrastructure. They are great in helping IT NOCs and engineers respond quickly to support issues but fail to help IT and the business proactively look for trends and potential problems before they arise. Synthetic monitoring actually helps IT and the business come to a common understanding of how well the end-to-end transaction is working. Synthetic transactions simulate actual transactions such as reading a database, retrieving a webpage or authenticating a user session. Each transaction mirrors a real user transaction. Because you can use technology to simulate these transactions, you can now perform them 24x7x365 and use the data to understand when you are having problems. We can look at critical information that was previously unavailable such as average webpage load time, time to pull a record or other customer transactions. If the baseline for a webpage load is normally four seconds and it increases to eight seconds, you know that something has happened in your environment that has doubled the time it takes to load a page. That’s a very bad thing. The IT team can start looking at recent code or infrastructure changes that may have introduced latency into the environment.
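The baseline comparison in the synthetic transaction item can be sketched in a few lines of Python. This is only an illustration of the idea, not how Gomez or BAC implement it: the names time_transaction and compare_to_baseline, the CheckResult type and the 1.5x tolerance are all assumptions made for this example, and the transaction itself is any scripted step you supply (a page fetch, a login, a database read).

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class CheckResult:
    average_seconds: float    # mean of the measured samples
    ratio_to_baseline: float  # 2.0 means twice as slow as the baseline
    degraded: bool            # True when the ratio reaches the tolerance


def time_transaction(transaction: Callable[[], None]) -> float:
    """Run one scripted transaction and return its elapsed wall-clock seconds."""
    start = time.perf_counter()
    transaction()
    return time.perf_counter() - start


def compare_to_baseline(samples: List[float], baseline_seconds: float,
                        tolerance: float = 1.5) -> CheckResult:
    """Compare recent timings with the known baseline.

    The tolerance of 1.5 is a hypothetical threshold, not a vendor default.
    """
    average = mean(samples)
    ratio = average / baseline_seconds
    return CheckResult(average, ratio, ratio >= tolerance)


# The article's example: a four second baseline that has drifted to eight.
result = compare_to_baseline([7.5, 8.5], baseline_seconds=4.0)
print(result.ratio_to_baseline, result.degraded)
```

In practice the samples would come from calling time_transaction on a schedule, and a degraded result would be the cue to review recent code or infrastructure changes.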
Synthetic Monitoring Evolution
Now that we understand the three monitoring stages that IT organizations go through, we have to look at some of the vendors in the synthetic monitoring space and understand how they differ. I have implemented various tools that performed these functions. HP Mercury Business Availability Center is the most recent tool that I have used to baseline our applications and infrastructure. BAC is an excellent tool and provides important insight into how apps and the underlying infrastructure are doing. With BAC you can baseline your apps and look at trends over time to determine the effect that technology and webpage decisions have on site performance. But the fundamental problem with BAC is that it is dependent on the IT team that manages the tool to collect and distribute information about performance as shown in figure 1.
Figure 1: IT Infrastructure creates and distributes reports of limited value
At my current company, our organization has implemented the BAC tool and is also the team responsible for collecting and presenting the information to our stakeholders. We become the gatekeeper and the bottleneck for getting information to people who need to make technical and business decisions. Another problem with this tool is that the development and QA organizations cannot make decisions based directly on the information they receive. Any actions that arise from the tool are based on inferences from data compiled and disseminated by the infrastructure team. An additional problem is that any requests or changes to the information generated must go through the IT infrastructure team. The Dev, Business Intelligence and Marketing departments have to wait for their requests to be prioritized. This can lead to unnecessary delays that may translate into lost revenue opportunities or poor website performance.
Quality Engineering throughout the SDLC
I once had a client who complained that the environment we managed for them had lower availability than our Mercury BAC tools were telling us. They were using Gomez. Since “the customer is always right”, we worked with them to understand what Gomez was telling them and how it differed from our tools. We successfully addressed a number of internal issues, which made our client very happy. In the process, we discovered that their tool was significantly better than ours. There are many tools that provide baselining and benchmarking of your applications. What is different with Gomez is that it looks at the collection of objects on the page and determines which objects are causing the delay in load time. Whereas Mercury BAC did a great job of providing our overall transaction time and availability, it didn’t decompose each object, each call, and each jpg in a way that would tell us which calls were hampering the customer experience. Gomez measures the transaction times for each step in the sequence so we know where to look immediately for performance improvements.
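To make the value of per-object timing concrete, here is a minimal Python sketch that ranks the objects on a page by how long each took to load. It does not use any Gomez API: the function name rank_objects, the object names and the timings are all hypothetical, and the share it reports is a share of summed object time, which is not the same as page load time when a browser fetches objects in parallel.

```python
from typing import Dict, List, Tuple


def rank_objects(timings: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (object, seconds, share of summed time), slowest first."""
    total = sum(timings.values())
    if total == 0:
        return []
    ranked = sorted(timings.items(), key=lambda item: item[1], reverse=True)
    return [(name, seconds, seconds / total) for name, seconds in ranked]


# Hypothetical per-object measurements for a single page load.
page_timings = {"index.html": 0.5, "hero.jpg": 2.5, "app.js": 1.0}

for name, seconds, share in rank_objects(page_timings):
    print(f"{name}: {seconds:.1f} s ({share:.0%} of object time)")
```

An overall transaction time would report only the total for this page. The ranked view points straight at the image as the first place to look.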
How would you like to know more about your competitors’ webpage than they do? As part of the benchmarking capabilities, you can compare your website performance to theirs and see if they are more efficient in their page caching or object calling methodologies. Does your business think that site speed is affecting revenue? Find out by baselining your site against your competitors. If you’re faster than your competition, maybe throwing money at website performance won’t solve your problem. Maybe you have design issues that focus groups can help you uncover.
Most synthetic monitoring tools have nodes in data centers throughout the world that reside on the internet backbone. These nodes poll your sites and report response times through their synthetic monitoring. Gomez does this but it does a lot more. In addition to backbone measurements, they have more than 40,000 desktops throughout the world at various internet speeds that also perform monitoring of your site.
In this example, the traditional backbone monitoring would lead you to believe that your site is performing reasonably. It takes 9.7 seconds to load a webpage with an availability of 98.05%. But this is high speed backbone performance (>100 Mbps throughput) that few people in the world will actually experience. When you look at the traditional DSL customer, who is defined as low broadband, we see that the response time is more than twice as slow. And if you hope that you can sell to that person still on dial up, they will have to wait over a minute for your page to download.
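A rough piece of arithmetic shows why bandwidth matters so much at the low end. The Python sketch below computes only the raw transfer time for a page, which is a lower bound: it ignores latency, server time and rendering, so measured load times like those quoted above will be higher. The 500,000 byte page weight and the DSL speed are hypothetical values picked for illustration, not figures from the measurements in this article.

```python
# Hypothetical page weight and link speeds, chosen only for illustration.
PAGE_BYTES = 500_000
LINKS_BITS_PER_SECOND = {
    "backbone": 100_000_000,
    "low broadband (DSL)": 1_500_000,
    "dial-up": 56_000,
}


def transfer_seconds(page_bytes: int, link_bits_per_second: float) -> float:
    """Lower bound on download time: payload bits divided by link speed."""
    return page_bytes * 8 / link_bits_per_second


for name, speed in LINKS_BITS_PER_SECOND.items():
    print(f"{name}: {transfer_seconds(PAGE_BYTES, speed):.1f} s")
```

Under these assumptions the dial-up user spends over a minute on transfer alone, while the backbone node spends a small fraction of a second, which is why a backbone-only measurement can hide what low bandwidth customers experience.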
Developers can now work with the business on defining how graphics intensive they want the website to be based on actual load times. You no longer have to guess how your website will perform because Gomez tells you by bandwidth, geography, browser and OS.
Dev and QA can use the BrowserCam feature to select the operating system, browser, and screen resolution that they would like to review before they release a new website or page into the production environment. Want to know how customers will view your page running on Windows XP with IE 7 in Los Angeles? No problem. That information and an actual screen rendering are only a few clicks away. Your IT department has the information it needs to know exactly what the finished product will look like when it is delivered to the business owner. And the business owner knows exactly what it will look like when it is released to their customers.
To say that Gomez has the potential to better align the business and IT would be an understatement. In a company that uses Gomez, the consumption model changes from the IT Infrastructure team acting as the gatekeeper and potential bottleneck of information to the many departments being able to directly access the information. Now everyone can access a tool for the purposes unique to their organizational needs. Infrastructure teams can still use it for SLAs and alerting like they always have. But now the QA and Dev teams can use it to develop and test code before it gets to the business or the customer. They will have a wide variety of test tools available to look at how their code will work on various operating systems and browsers. They will be able to test their applications and see how geography affects performance. They will be able to see how bandwidth affects their app also. Marketing and Sales can work with their Business Intelligence or Data Warehouse teams to match the marketing campaigns to a specific geographic market and the most popular browser. Are most of the users in your target market dial up users? If so, you can select to view only performance data based on those users. Dev and the business can work together to design a lightweight website that targets low bandwidth users. There isn’t any guesswork as to what it will look like or how long it will take those pages to load. Gomez tells you exactly what it will look like.
Nice!
A Disruptive Technology
I’ve only scratched the surface of the power of Gomez. But the truth is that technology is only one component in successful IT alignment with the business. The other two parts are people and processes. This is what potential customers will have to evaluate when they look at Gomez. It is much easier to embrace the past and consume reports from systems like Mercury BAC because it doesn’t require a change in your company’s culture or business processes. Gomez changes everything. Used effectively, it lets people in departments from marketing and business analytics to QA and Dev learn how to do their jobs better because of the wealth of information Gomez provides.
Once you get your people committed to the tool, you have to look at your internal processes. If you are a developer or QA manager, you just can’t do things the same once this tool is deployed. You have to embrace it and change your existing SDLC. This will require a reevaluation of your software development lifecycle, your quality assurance processes and your release management function. You’ll be able to deliver with better quality and faster if you do. But you have to change. You just can’t ignore the information that Gomez presents or you are failing to do your job. And that’s why ultimately this is a disruptive technology.
People will have to embrace working together differently than they did in the past. People will have to look at their processes and see how they can leverage the tool. If you are the type of person who doesn’t want to change your existing way of doing business, or if you think your corporate culture won’t accept reengineering how you design, develop and manage your infrastructure, you probably should pass on this tool. A successful Gomez deployment requires strong leadership from your IT and business executives because it will change the way your company does business.
And for most companies, that’s a very good thing.
More information on Gomez can be found on their website at www.gomez.com
That is a very interesting article on IT Monitoring, especially the three types of IT Monitoring.
NTT Com Asia also provides IT Monitoring Service, and from their site, I gather there’s a fourth type of IT Monitoring which is the Security Scanning Service, which help to ensure the high level of security for the network.
Comment by April — April 17, 2008 @ 8:53 am