Saturday 20 March, 2010


Beware of VM Sprawl



Virtualization was supposed to solve server sprawl and offer a simple route to data center consolidation. It hasn't. Virtual machines are so easy to deploy that many organizations are now suffering VM sprawl. They have reached the point where they do not know how many VMs they have and have no way to manage them. Rampant virtualization can turn out to be a curse, not a blessing. How can it be controlled?

There is no doubt that virtualization can save money for organizations large and small. Kelly Services, a $5.5 billion temporary staffing and outsourcing firm headquartered in Troy, Mich., is in the midst of a VMware deployment. By the time it is finished, the company will end up decommissioning 269 older servers without having to purchase new hardware.

Similarly, the City of Alexandria, La., consolidated 32 1U and 2U IBM xSeries servers onto two IBM System x366 servers. "Continued maintenance agreements on older servers were expensive, as were the high power consumption and the cooling issues," says Blake Rachal, system analyst for the City of Alexandria. "It was like a snake eating itself so we had to do something to break the cycle." Going virtual slashed power and maintenance costs while allowing the city to provision a new server in 15 minutes rather than three weeks. "VMware has got to be the most cost-effective solution I have ever seen," he says.

But unless virtualization is properly planned and managed, it can grow into a tangled morass that consumes more personnel and money than the previous operating structure.

"Because it is so easy to stand up a new VM, they tend to proliferate, resulting in what is called VM sprawl," says Andi Mann, an analyst with Enterprise Management Associates. "So you multiply the volume of systems you need to manage, increase the depth of management that you need, and yet you have insufficient management tools to deal with that."

To help data center managers cope with expanding numbers of VMs, vendors are coming up with various ways of managing virtual and physical environments as a single entity.

Spreading Sprawl


Maybe it is a regular 10-year cycle. A decade ago, low-cost x86 boxes and Linux clusters were destined to finally put mainframes out of business. But as the number of servers grew, so did the headaches.

Take the case of Kelly Services. It reached the point where it had 800 servers in its 6,000-square-foot data center—19 different types of Dell 1U, 2U and 3U servers and an Oracle/Linux grid running on two dozen IBM 3850 servers. "Our floor space was becoming cramped because of the server sprawl," says Bob Brachulis, Kelly's director of infrastructure and services. "Each application someone dreamt up would require a server."

Server consolidation and virtualization offer a path to eliminate such server sprawl. Companies have jumped on board with this strategy, installing approximately six million virtual servers with many more on the way, according to International Data Corp. (IDC) estimates. Although virtualization can provide a solution to physical server sprawl, the virtual servers themselves can get out of control and cause costs to skyrocket.

"Organizations are quickly realizing that virtualization without proper planning, management and service optimization capabilities can lead to performance headaches that can stymie attempts to leverage the technology for real business advantage," says John Madden, an analyst for Ovum Research.

Enterprise Management Associates conducted a study of sprawl and found that most people cited the inability to see, manage and control VM sprawl as one of the biggest problems. "Virtualization, by definition, is a way of hiding reality, so getting virtual machine sprawl is an inexorable consequence," says Mann. "More virtual machines just means you need more people to manage them and there is more that can go wrong."

Mann's research showed that the difference between organizations that take control of their virtual machines and those that do not can be as much as $3,300 per virtual machine just in staff costs. And that is only the beginning. Each virtual server is also subject to licensing costs for the operating system and applications, even if it is not serving any business purpose. Zombie servers are an even bigger problem with virtual machines than they are with physical servers because a VM is so quick and easy to set up. A VM set up for a quick test or other short-term need can keep running indefinitely. Embotics, a VM management software vendor in Ottawa, conducted a survey and found that in a typical virtualized environment approximately 30 percent of the VMs are unused. An organization with 150 VMs is wasting $50,000 to $150,000.
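The survey figures above can be turned into a rough back-of-the-envelope check. The sketch below assumes the 30 percent unused share applies evenly and simply divides the quoted waste range by the number of idle VMs. The function name and the per-VM figures it derives are illustrative and do not come from the survey itself.

```python
def idle_vm_waste(total_vms, unused_share, waste_low, waste_high):
    """Return (idle VM count, implied low cost per idle VM, implied high cost)."""
    idle = round(total_vms * unused_share)
    if idle == 0:
        raise ValueError("no idle VMs to spread the waste across")
    return idle, waste_low / idle, waste_high / idle


# Figures quoted in the article: 150 VMs, 30 percent unused,
# $50,000 to $150,000 wasted.
idle, low, high = idle_vm_waste(150, 0.30, 50_000, 150_000)
print(f"{idle} idle VMs, roughly ${low:,.0f} to ${high:,.0f} each")
```

Under those assumptions, the quoted range works out to about 45 idle VMs at somewhere between $1,100 and $3,300 apiece.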

"You think of virtual machines as free—they just go on the hardware—but they are far from it," says Mann. "Your software vendor may end up holding you over the barrel for any license violations."

Getting an Early Start

Gaining control over virtual machines requires establishing policies over the full life cycle of the servers, from capacity planning through configuration and usage to decommissioning. "Putting in place a process that includes approval steps lets you prevent virtual machines from being deployed out of process," says Mann. "Then, together with discovery and configuration, you can see machines that have not been approved for provisioning and get rid of them."
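At its simplest, the check Mann describes is a comparison between two lists: what a discovery tool finds running and what the approval process has on record. The sketch below is a minimal illustration of that idea. The VM names and the `find_unapproved` helper are hypothetical, and a real environment would pull both lists from its own discovery and change-management tools.

```python
def find_unapproved(discovered, approved):
    """Return names of VMs that discovery found but no approval record covers."""
    return sorted(set(discovered) - set(approved))


# Hypothetical inventory data, for illustration only.
approved = {"web-01", "db-01", "mail-01"}
discovered = ["web-01", "db-01", "mail-01", "test-tmp-07", "demo-old"]

for name in find_unapproved(discovered, approved):
    print(f"not approved for provisioning: {name}")
```

Anything the comparison flags is a candidate for review or decommissioning rather than automatic deletion, since an approval record can also simply be missing.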

The best time to start managing VMs is before they are deployed, through the use of capacity planning and modeling tools. This includes selecting the mix of virtual machines on a box to take full advantage of the capabilities of that server. A single application that is processor- or I/O-intensive can saturate the CPU or network interface, crippling the performance of other VMs trying to share those resources.
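A very simple form of this kind of planning is to add up the peak demand of a proposed VM mix and compare it with what the host can supply, leaving some headroom. The sketch below does only that. The resource names, the capacity figures and the 20 percent headroom are assumptions chosen for illustration, and commercial capacity planning tools model far more than a straight sum.

```python
def saturated_resources(vms, host_capacity, headroom=0.2):
    """Return the resources whose combined peak demand exceeds usable capacity."""
    over = []
    for resource, capacity in host_capacity.items():
        demand = sum(vm.get(resource, 0) for vm in vms)
        # Usable capacity is what remains after reserving the headroom.
        if demand > capacity * (1 - headroom):
            over.append(resource)
    return over


# Hypothetical host and VM mix, for illustration only.
host = {"cpu_ghz": 24.0, "net_mbps": 1000}
mix = [
    {"cpu_ghz": 2.0, "net_mbps": 100},
    {"cpu_ghz": 3.0, "net_mbps": 150},
    {"cpu_ghz": 4.0, "net_mbps": 700},  # the I/O-intensive application
]
print(saturated_resources(mix, host))
```

Here the CPU has room to spare, but the third VM pushes network demand past the usable limit, which is the kind of mismatch worth catching before deployment.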

"The beauty of capacity management is that it allows you to do what-if analysis for use-case scenarios, determine the impact of changes on system performance and build mathematical models to calculate the impact of the changes," says Tim Grieser, program vice president for the enterprise system management software group at IDC. "It is invaluable in supporting ongoing sizing and provisioning initiatives."

That is the approach Kelly Services is taking with its VMware deployment. Every server is investigated and tested in a lab environment before migrating to a VM in the production environment. "We found it is essential to have strong processes up front: change control and monitoring, planning what we are going to do, knowing everything about that server and the application before moving it, and then monitoring to make sure there is no impact to the service," says Brachulis. "It is very important to us that the team using the server and application don't notice the difference."

Gaining Visibility

The problem with gaining control, however, is the masking feature of the hypervisors. To manage virtual servers, both the hardware statistics and the metrics for each virtual machine running on each physical server must be visible. Otherwise there can be a mismatch between the properties of the physical and virtual machines. "VMs exert a performance overhead compared to physical servers, though we expect this to diminish over time," says Grieser. "In addition, some workloads don't run well in such an environment, and network traffic can degrade under virtualization."

Grabbing control of VMs requires software that can work across both physical and virtual resources. The main hypervisor vendors (VMware, Sun, Microsoft, IBM and HP) have management software for their own VMs; however, these tools are not adequate for complete management of a virtual, heterogeneous environment. "The basic bundled tools do not have a broad view of performance across multiple hosts and subnets, do not understand physical performance issues and are not really aware of applications, let alone the interactions of multiple components in a composite application or business services and priorities," says Mann. "Basic tools are not enough to guarantee SLAs based on business performance objectives."

Other commercial and open source applications may not meet the needs of a virtualized environment either unless they are specifically designed for that purpose. As a result, according to Gartner, Inc., organizations spent more than $900 million last year on virtualization management software. Gartner expects that figure to hit $1.3 billion in 2009. While mergers and acquisitions are likely to consolidate the market over the next few years, there are currently more than 100 vendors providing solutions for one or more portions of the virtual server management stack.

"More and more, these toolsets are working with physical and virtual environments in a way that they didn't a year ago," says Mann. "We now have very strong capacity planning tools and configuration tools that really dig into the virtual machines rather than just the physical."

Finding the Right Tools

As part of its virtualization process, Kelly Services opted to upgrade its management software. It had been using the open source Nagios management software, but switched to a set of tools from BMC Software, which includes discovery, change management, event management and performance management.

"With Nagios, when a server was having a problem, we would still have to go in and figure out what service was being impacted," says Brachulis. "Now we can look at the BMC impact manager, determine what service is being impacted and what component failure is causing the problem. Instead of the help desk alerting us when there is a problem, now we let them know they will start getting phone calls and that we are already working on it."

Interactive Data Pricing and Reference Data, a division of Interactive Data Corp., uses TeamQuest Performance Software to monitor systems performance on its virtual and physical servers. The company provides global securities pricing, evaluations and reference data to financial institutions and investment funds. It collects, edits and maintains data on more than six million securities, delivering the data through Web-based applications, direct data feeds and via requests from third-party software. "Our typical customer would have Web-based tools or PC-based applications that make use of our applications," says Steve Amichetti, Interactive Data's manager of VM systems.

While Kelly Services is virtualizing small Dell servers, Interactive Data uses UNIX (Solaris, AIX and HP-UX) servers, with some Linux added into the mix. The company had been getting by with basic UNIX tools such as VMstat, TOP and other freeware to monitor system performance. But Amichetti found that the tools did not scale well beyond a small number of servers. "While these tools have value, they didn't allow us to monitor and analyze a large number of systems effectively," he says. This led him to look for a commercial product that would better enable them to meet service level agreements.

"We looked at a half-dozen products but found that TeamQuest could do it all when it came to capacity planning and service management," he says. "After a successful proof of concept, we decided to purchase the product and began to introduce it to our systems gradually." He now has it loaded on all the production systems, most of the preproduction machines and some of the systems in development. TeamQuest gathers the data at one-minute intervals, while VMstat gave it every 15 seconds. He still uses VMstat when he needs higher granularity, but for most uses he relies on TeamQuest's greater functionality.

"TeamQuest is a complete toolset that enables us to really dig in to find out what's going on," says Amichetti. "We can compare the good days against bad days to zero in on the exact problems, isolate bottlenecks and determine the proper corrective action."

He gives the example of when a system administrator noticed a failed server within a cluster. Previously, Amichetti might have spent days of trial and error trying to solve the problem. Instead, he just pulled up TeamQuest View and noticed an especially high jump in the number of processes started for a particular workload. He drilled down to pinpoint a specific userID that was initiating tens of thousands of processes and causing the cluster node to fail over.
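The drill-down described above amounts to counting started processes per user and flagging any account far above the rest. The sketch below shows that idea in isolation. It is not how TeamQuest works internally. The `noisy_users` helper, the sample data and the threshold are all hypothetical.

```python
from collections import Counter


def noisy_users(process_owners, threshold):
    """Return {user: count} for users who started more than `threshold` processes."""
    counts = Counter(process_owners)
    return {user: n for user, n in counts.items() if n > threshold}


# Hypothetical sample: one entry per started process, recording its owner.
sample = ["batch"] * 12 + ["alice"] * 3 + ["bob"] * 2
print(noisy_users(sample, threshold=10))
```

In practice the list of owners would come from process accounting or monitoring data collected over an interval, and the threshold would be set from what a normal day looks like.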

"VMstat and TOP wouldn't have caught this problem," he says. "TeamQuest showed us in minutes what was causing the problem. We immediately tracked down the guilty party and took care of it."


About the author
Drew Robb is a freelance writer based in Los Angeles, Calif.