APM SLAs in the Cloud: Is there a Rainbow on the Horizon?

At a recent IT trade show where I was on a panel discussing Cloud providers, an audience member asked an interesting question: how can customers hold their Cloud provider responsible for slow performance of their applications? They wanted to know how to avoid the next catastrophic outage and how to secure SLAs that penalize the cloud provider for sloppy application performance.

Reflecting on this a bit further, the answer might not be as simple as it seems. Is your Cloud provider really responsible for your application’s performance in their Cloud? Maybe it’s your network or Internet provider? Perhaps the DNS service provided by your friendly domain registrar is sloppy? Or could it be the slow CRM API provided by yet another SaaS provider that your application interfaces with? Current generation applications are distributed and multi-layered, and the end services they provide depend on an even more distributed set of applications.

Trying to create an SLA with so many inter-dependent components and so many providers is not trivial. Worse – there is no obligation today for the different providers to share any data on their IaaS or PaaS performance, even if this data is available to them internally.

When selecting a Cloud provider of any kind, you will need to start factoring in operational transparency and the availability of their service performance metrics via APIs. The performance of their infrastructure is less relevant than the performance of the service they provide: redundancy and other design elements may keep the service unaffected even when part of the infrastructure fails, so infrastructure-level metrics alone can be misleading. You will also need a monitoring platform with open APIs that can aggregate the performance data from different Cloud providers and give you a composite metric that maps to the performance of your service. And finally, your monitoring platform has to be able to provide a service-oriented view of these performance metrics and not just traditional metrics like CPU and memory.
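
To make the idea of a composite metric concrete, here is a minimal Python sketch. It assumes each provider reports an availability fraction for the same measurement window, that your service needs every one of them at once, and that failures are independent. The provider names and figures are invented for illustration.

```python
from math import prod

def composite_availability(availabilities):
    """Availability of a service that needs every dependency up at once.

    Assumes failures are independent, which is a simplification.
    """
    return prod(availabilities.values())

def weakest_dependency(availabilities):
    """Name of the dependency with the lowest reported availability."""
    return min(availabilities, key=availabilities.get)

# Hypothetical monthly availability reported by each provider's metrics API.
reported = {
    "iaas_compute": 0.9995,
    "dns": 0.9990,
    "crm_saas_api": 0.9950,
    "internet_transit": 0.9990,
}

print(f"composite: {composite_availability(reported):.4f}")
print(f"weakest link: {weakest_dependency(reported)}")
```

The composite figure is always lower than the weakest single provider, which is why an SLA negotiated with only one of them says little about the service your users see.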

As more enterprises move to Cloud based services and infrastructure, their desire to work with a single vendor will force them to gravitate towards “cloud service aggregators” – a single vendor aggregating services from various cloud providers. However, enterprises will demand SLAs from the aggregate providers, and this will require some way to identify the responsible partner for failed SLAs or outages. There will be a need to get automated performance and SLA metrics from different downstream partners and correlate this data to provide aggregate SLAs for the enterprise. This requires transparency in operations, open APIs and uniform SLA measurements, and even though not prevalent today, this will become a necessity in the near future.
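
As a rough sketch of what identifying the responsible partner could look like once downstream providers do publish uniform measurements, the Python below compares each partner's measured response time with its agreed target. The partner names, targets and measurements are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PartnerSla:
    partner: str
    target_ms: float    # agreed response-time ceiling
    measured_ms: float  # value reported by the partner for the period

def breaches(slas):
    """Return partners whose measured time exceeded the target, worst overrun first."""
    over = [s for s in slas if s.measured_ms > s.target_ms]
    return sorted(over, key=lambda s: s.measured_ms - s.target_ms, reverse=True)

# Hypothetical figures collected by an aggregator from its downstream partners.
period = [
    PartnerSla("storage-provider", target_ms=50, measured_ms=42),
    PartnerSla("crm-saas", target_ms=300, measured_ms=540),
    PartnerSla("dns-provider", target_ms=30, measured_ms=35),
]

for sla in breaches(period):
    print(sla.partner, sla.measured_ms - sla.target_ms)
```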

Posted in Uncategorized | Leave a comment

Recession driven Innovations: Agility, Automation & Predictive Analytics

Agility and velocity became the new success necessities of organizations in the past few years, driven largely by recession realities and unpredictable business environments. Recession, viewed positively, boosted innovation and increased the pace of new process development and its adoption. The IT customer is increasingly global, the realm of IT services grows larger every day, and the sprawling, distributed IT components demand intelligent ways to manage and monitor this infrastructure.

The administrative burden of managing this burgeoning infrastructure will only increase for IT departments, unless they adopt processes and software to automate most of it. To automate processes, you need to integrate the different workflows seamlessly, which requires the software products to have flexible APIs. The order entry, provisioning, monitoring & billing workflows are all candidates for integration and automation. There have been significant advances even within cloud monitoring and management solutions to reduce the administrative burden through the use of templates, threshold baselining and the creation of service models.

The other innovation has been in the field of data analytics. The IT customer’s demands have always been dynamic, and IT departments have reacted by provisioning for the peak demand, resulting in wasted idle compute resources. Even the usage of application resources varies by the hour and day of week, and it is increasingly important for IT departments to understand the behavior pattern of their network and applications in addition to the computing resources. The number of users, the response times, the queued messages and the database query rate all vary by time of day. Understanding the usage pattern and deviations from it helps isolate the root cause of IT service performance degradation much faster and, ultimately, leads to higher customer satisfaction. More importantly, using APM behavior patterns greatly reduces the number of false alarms for IT Operations and lowers TCO.
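
A minimal Python sketch of the behavior-pattern idea follows. It assumes you keep historical samples tagged with the hour of day and treats anything more than three standard deviations from that hour's mean as a deviation. The sample values and the tolerance are illustrative assumptions, not a recommendation.

```python
from collections import defaultdict
from statistics import mean, pstdev

def build_baseline(samples):
    """Group historical (hour, value) samples into a per-hour mean and std deviation."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}

def is_anomalous(baseline, hour, value, tolerance=3.0):
    """Flag a value more than `tolerance` std deviations from that hour's mean."""
    if hour not in baseline:
        return False  # no history for this hour, so do not alarm
    avg, spread = baseline[hour]
    if spread == 0:
        return value != avg
    return abs(value - avg) > tolerance * spread

# Hypothetical response times in ms: quiet at 03:00, busy at 14:00.
history = [(3, 110), (3, 120), (3, 115), (14, 400), (14, 420), (14, 410)]
baseline = build_baseline(history)

print(is_anomalous(baseline, 14, 415))  # typical afternoon load
print(is_anomalous(baseline, 3, 415))   # the same value at 03:00 is a deviation
```

A single static threshold would either miss the 03:00 spike or raise a false alarm every afternoon; a per-hour baseline avoids both.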

Automation and Analytics are smart product features focused on reducing the administrative burden in today’s distributed Cloud environments. Keeping business necessities in mind, these innovative features are pragmatically relevant and a must for all IT departments in today’s business environment.


APM – The ‘A’ Dimension

As more applications migrate to the evolving private and public cloud infrastructure, and permeate through the sprawling distributed “new” IT environment, the Gartner-published “five dimensions” of Application Performance Monitoring become essential requirements for any APM monitoring platform. Being able to capture end-user experience, topology, deep dive monitoring of components and analytics will enable IT operations to isolate APM performance issues, reduce the MTTR for application services, and ultimately deliver higher user satisfaction.

With the arrival and rapid adoption of Virtualization and Cloud infrastructure, reduction in costs for ‘incremental units’ of computing power, the ability to more easily flex up and down as needed, and the lack of restrictions imposed by the traditional models, are all key drivers to migrate applications to this distributed cloud infrastructure. However, with these benefits comes the added burden of high administration overhead from managing a virally sprawling & dynamic IT environment.

As the number of discrete virtual servers, components and resident applications explodes, performance monitoring and the intelligent analytics needed to make rapid decisions will be critical for IT operations. Manually intensive legacy point monitoring tools will not be able to keep up in a dynamic and complex environment where applications can move in almost real time across the underlying IT infrastructure.

Of course, the better approach would be to utilize and adopt the APM “A” dimension – Automation in the monitoring platform to reduce the burden from routine administrative tasks for application monitoring. Implementing the right systems and processes and finding a monitoring solution which uses a good degree of automation is essential to gain back the efficiency lost from the increased complexity of distributed application infrastructure. Automation in the area of monitoring will ensure consistency in performance monitoring and benchmarking, enabling IT Operations to make better and proactive decisions for application performance. As JP Garbani at Forrester said recently, gaining the right level of productivity in IT operations will come from using better tools, and specifically, automation.


Cloud Monitoring Software: Automation and Intelligence Not Optional

The quest to improve the productivity and efficiency of IT organizations is an ongoing one. A number of technologies and processes have been adopted over the decades to make IT operations leaner and more effective. With the arrival and rapid adoption of Virtualization technology and Cloud infrastructures in the past few years, IT organizations worldwide are starting to realize significant economy-of-scale benefits. Reduction in costs for ‘incremental units’ of computing power, the ability to more easily flex up and down as needed, and the lack of restrictions imposed by the traditional models, will all drive a dramatic increase in the consumption of computing and application resources as organizations will be freed up to do more. On the flip side, steps will need to be taken to deal with the resulting increase in the administration burden, else the efficiency gains realized from shared, flexible IT infrastructure will be outstripped by the high cost of managing a more dynamic and complex environment.

Terms like “virtualization sprawl” have been coined to refer to the increase in the number of discrete virtual servers and related application components within the overall IT environment. This is no longer a hypothetical scenario, and organizations are already experiencing administration challenges because of the fundamental IT transformation driven by virtualization and cloud technologies. Consider the case of a leading educational institution in the Northeastern United States. Prior to embarking on an aggressive virtualization initiative, the operations team was responsible for ensuring the performance of approximately 1000 distinct physical servers. By the time the first phase of the server consolidation and virtualization initiative was completed, the team was tracking and managing the performance of over 7000 virtual servers!

As the number of discrete virtual servers, components and resident applications explodes, the performance monitoring and root-cause-analysis demands on IT administrators will multiply exponentially. Manually intensive legacy and point monitoring tools will not be able to keep up, and organizations will face significant challenges in detecting and resolving issues in a timely manner. In one recent case of an organization being overwhelmed, the IT team resorted to forced daily ‘proactive reboots’ of a large number of their servers. The team claimed that this workaround was the only way to keep the infrastructure performing, given the absence of a comprehensive monitoring and management solution to identify real issues and isolate problem sources. The IT team acknowledged that the organization’s users and business operations were being impacted by this daily reset cycle, but viewed this approach as the lesser evil compared to blind, reactive fire-fighting!

Of course, the better approach would be to take a more strategic stance and implement the right systems/processes to assure the performance of their IT infrastructure. Today’s cloud monitoring software solutions have to be capable of supporting automation of many of the routine administration tasks. More importantly, these systems need to have in-built intelligence to infer what is going on in the IT infrastructure and automate decision-making. The increased demands on the IT team will be partially offset by the automation capabilities of the monitoring solution, allowing IT personnel to focus on the deeper and more complex administration tasks. Furthermore, the overall efficiency and utilization of IT resources will be higher with the right capabilities in the IT monitoring software (see http://tiny.cc/cwytn to learn how).


IT Monitoring Software Can Improve Operations Efficiency

Over the past few years we have seen significant growth in MSP activity, fuelled by continued emphasis on outsourcing by enterprises, increased adoption of IT infrastructure services, and the emergence of specialist software service providers. In contrast, the overall global economic picture has recently become less rosy, given postponement of investments by organizations in light of uncertainty, the curtailment of spending by governments to rein in spiraling deficits, the sovereign debt crisis in Europe, and the gloomy consumer sentiment in the US. Although this will invariably have an impact on the IT industry, MSPs do not necessarily have to put the brakes on growth plans, and should explore how best to manage increased operational demands with existing resources. One area where MSPs can quickly and easily make some gains is in increasing the efficiency and effectiveness of IT operations, system administration, and network engineering personnel. And here, advanced network and cloud monitoring software solutions can directly provide a number of operational benefits.

One of the first places to focus on is reducing, and ideally eliminating, the time your team spends chasing down false alarms or false positives. Network monitoring software solutions that have smart notification engines that account for topological relationships (e.g. don’t generate an alarm for a downstream device if the upstream device is down), or apply intelligent rules to recognize short-duration flaps, are able to reduce the ‘noise’ and help reduce the time operations personnel spend responding to spurious events (to learn more see http://tiny.cc/vsiny). Additionally, Zyrion’s software, for example, has time-based or adaptive thresholds that automatically learn and apply time-period specific warning or critical thresholds, which allows setting alarm triggers that match varying patterns of use or load in the IT infrastructure. For example, if nightly back-up jobs increase the utilization levels of a server during the evening hours, then you can set higher utilization threshold levels for this time period so that unnecessary alarms are not generated. The daytime thresholds can be set lower to ensure that a quality end-user experience is provided.
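
The two noise-reduction techniques above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product implements them: the topology map, the backup window (22:00 to 04:00) and the threshold values are all made up.

```python
from datetime import time

# Hypothetical topology: each device maps to its upstream parent (None for the root).
PARENT = {"core-router": None, "dist-switch": "core-router", "web-01": "dist-switch"}

def should_alarm(device, down):
    """Alarm on a down device only if no upstream device is also down."""
    if device not in down:
        return False
    parent = PARENT.get(device)
    while parent is not None:
        if parent in down:
            return False  # the root cause is upstream, so suppress this alarm
        parent = PARENT.get(parent)
    return True

def cpu_threshold(now):
    """Higher CPU ceiling during the assumed nightly backup window (22:00 to 04:00)."""
    in_backup_window = now >= time(22, 0) or now < time(4, 0)
    return 95.0 if in_backup_window else 80.0

down = {"dist-switch", "web-01"}
print([d for d in PARENT if should_alarm(d, down)])
print(cpu_threshold(time(23, 30)), cpu_threshold(time(14, 0)))
```

With the switch and the server behind it both unreachable, only the switch raises an alarm, so the operator sees one event pointing at the likely root cause.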

Enhanced administration features, such as being able to define and manage maintenance schedules, can help reduce alarm floods for ‘offlined’ devices. Scheduled maintenance functionality allows defining in advance any number of time periods for automatically suspending device tests at the start of the time-period, and then automatically resuming the tests at the end of the time-period. This simplifies the process of performing maintenance tasks on devices and applications, by halting alerts while the IT component is offline. Once a device is suspended, the data collection for all the tests on the device is suspended, and thus no alarms or notifications will be generated.
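
Here is a minimal Python sketch of the scheduled-maintenance behavior described above, assuming windows are stored per device as start and end timestamps. The device name, the dates and the `probe` callable are placeholders.

```python
from datetime import datetime

# Hypothetical schedule: device name -> list of (start, end) maintenance windows.
MAINTENANCE = {
    "db-01": [(datetime(2024, 1, 6, 1, 0), datetime(2024, 1, 6, 3, 0))],
}

def is_suspended(device, at):
    """True if `at` falls inside any maintenance window defined for the device."""
    return any(start <= at < end for start, end in MAINTENANCE.get(device, []))

def run_test(device, at, probe):
    """Skip data collection while the device is suspended; otherwise run the probe."""
    if is_suspended(device, at):
        return None  # no data collected, so no alarm can be generated
    return probe(device)

print(run_test("db-01", datetime(2024, 1, 6, 2, 0), lambda d: "collected"))
print(run_test("db-01", datetime(2024, 1, 6, 3, 0), lambda d: "collected"))
```

Because the check happens before data collection, tests resume on their own at the end of the window without anyone having to re-enable them.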

IT teams often spend a significant amount of time analyzing and isolating sources of problems. Here again, the right monitoring tools can have a significant beneficial effect. For example, being able to rapidly drill down from a high-level dashboard view, to the device and test detail, all the way to network flow graphs, all with a few mouse clicks, allows you to instantly identify the ‘top talkers’ on the network to pinpoint potential causes of problems. When analyzing alarms and events, the ability to quickly see related events (e.g. occurred at the same time, similar device type, etc.) in the monitoring console gives you indicators of linked or correlated problem areas.

IT, network and cloud monitoring software should be viewed as more than just performance assurance tools. They can have a significant impact on the effectiveness of your operations and administration team as well, and can enable you to do more with less in these challenging economic times.


Dynamic Environment: Is Your IT Monitoring Software Built for Change?

Winston Churchill once said, “To improve is to change. To be perfect is to change often.” Whether the impetus for change is driven internally, or imposed on organizations by external factors, the reality is that business organizations are constantly changing and evolving. This includes Managed Service Providers (MSPs) as well. Whether the transformation is at a macro or micro level, the IT infrastructure will invariably be impacted given the critical role IT plays in enabling and executing business processes today.

This clearly has implications for your network and IT monitoring software, given that your management tools have to be capable of assuring the effective performance of your own and your customers’ infrastructure as changes occur. There are a number of factors that need to be accounted for to ensure that your IT monitoring and management systems are able to keep up as your business grows and evolves. Most importantly, you need to make sure you minimize the resource costs and lead-times to keep pace.

To begin with, your monitoring software platform has to be agnostic to the type of performance data being gathered and analyzed, and have mechanisms to capture data from a variety of sources. Although the initial deployment may be focused on monitoring the health of, say, a set of Windows servers using WMI, a need may arise down the road to capture metrics from a security appliance via SNMP Traps, and at some point processing of system logs for a new custom application may become necessary. Common event management, notification, visualization, dashboard and analysis capabilities will also enable seamless inclusion of new IT components.

The monitoring software should provide intuitive UIs and workflows for quickly supporting additional components as the IT infrastructure expands. The ability to define and use templates, leverage pre-existing monitoring profiles, as well as clone current configurations, all shorten the process of dealing with expansion, whether in your infrastructure or that of managed services customers. Having the facility to centrally handle configurations and push these out, or restore them, further simplifies and streamlines your administration processes within a dynamic infrastructure.

Supporting flexible rules and thresholds for generation of events / alarms is becoming increasingly important. As business needs may vary going forward, the criteria, conditions and context for what should be considered a performance degradation warning versus a critical actionable event may change. A “one-size-fits-all” approach that was okay early on may no longer be tenable, especially if your customer base is expected to become more heterogeneous in terms of size, characteristics or geography.

Assuring the performance of IT infrastructure is a key element of delivering quality, reliable and value-rich managed services. Given that business change is inevitable, and the knock-on effects on the IT infrastructure are unavoidable, you need to make sure your IT and network monitoring software is ‘built for change’.


Time for MSPs to Take Steps to Deploy Network Monitoring Software that can Seamlessly Support IPv6 and IPv4

On June 8th, hundreds of enterprises and service providers participated in a 24-hour, large-scale “test flight” of IPv6 technology. The event was dubbed World IPv6 Day, and was organized by the Internet Society. The purpose of the event was to energize, educate and motivate organizations across the IT and communications industry to prepare their services for IPv6 to enable a successful migration as IPv4 addresses begin running out.

Although much of the current focus on the migration to IPv6 is around the nuts and bolts of making external facing services such as DNS work cleanly in a hybrid world, as well as the use of IP addresses to interconnect distributed server, storage and network elements, organizations need to also be thinking about internal controls, management systems and frameworks as part of the transition.

A critical part of transitioning to IPv6 technology involves ensuring that the right network monitoring software systems are in place to assure the performance of complex networks, data centers and cloud infrastructures. For MSPs that deliver services that may be tied to customer-owned or remote IT infrastructure, the preparation to deal with a hybrid IPv4 and IPv6 world has to be done much more proactively. In some cases, MSP services may extend back into managing enterprise data center components and applications, which could be using different IP versions. If the MSP is on the hook to deliver against agreed to SLAs or performance levels, then it needs to have clear visibility into the health and performance of the entire IT infrastructure that is part of its scope of coverage.

Your Strategy and Opportunity

It is time to start taking steps to trial and implement network and IT monitoring software systems that can seamlessly monitor IPv6 and IPv4 servers and network devices in a hybrid environment. Given that hybrid environments will coexist for a while, these monitoring solutions will enable organizations to uniformly discover and provision IPv6 devices, and collect and analyze performance data, all within one integrated system that supports IPv4 devices as well.
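
As a small illustration of treating both address families uniformly, the Python sketch below uses the standard library's `ipaddress` module to sort a device inventory into IPv4 and IPv6 groups. The inventory itself is hypothetical and uses documentation-range addresses.

```python
import ipaddress
from collections import defaultdict

def group_by_ip_version(targets):
    """Split monitoring targets into IPv4 and IPv6 buckets; collect invalid entries."""
    groups = defaultdict(list)
    for name, address in targets.items():
        try:
            version = ipaddress.ip_address(address).version
        except ValueError:
            groups["invalid"].append(name)
            continue
        groups[f"ipv{version}"].append(name)
    return dict(groups)

# Hypothetical inventory; addresses come from documentation ranges.
inventory = {
    "web-01": "192.0.2.10",
    "web-02": "2001:db8::10",
    "legacy-fw": "198.51.100.1",
    "typo-host": "192.0.2.999",
}
print(group_by_ip_version(inventory))
```

The same parsing call handles both versions, so discovery and provisioning code does not need separate paths for IPv4 and IPv6 devices.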

Users can ignore the intricacies of managing different types of devices, and are able to benefit from a unified management and operational view of their entire IT infrastructure. Being able to capture performance metrics from the full IT and cloud infrastructure, and then correlating the data and linking this to supported business services is critical to ensure the effective delivery of services and assure business operations in the new dynamic environment. These systems address this need by providing a service-oriented, end-to-end, performance view, whether IPv6 based or otherwise.

I recently asked a service provider where they were in their overall strategy for migrating to IPv6. Although they are waiting for customer demand to pick up before making firm operational commitments, they have started testing a variety of internal IPv6 configurations, including the monitoring and management aspects. Although you may be taking only tentative steps towards embracing IPv6 infrastructure, being prepared in advance by having the management tools in place will ease the process as you make the transition from an all-IPv4 to a hybrid to a fully converted environment.


Network Monitoring: Combine ‘Bottom-up’ Analysis with ‘Top-Down’ User Experience Measurements

A recent survey conducted by Network World revealed that most IT managers were unable to measure end-user experience with their traditional network monitoring software tools.  Over 50 percent of the survey respondents identified page response time, server query response time and TCP transaction response time (key measures of end-user experience) as being important, yet were not able to measure these metrics with their existing management tools. The survey highlighted a need for IT and network management software that is able to monitor the performance of IT from a user perspective (e.g. end-user page response time), as well as monitor the performance of the various underlying network, server and application components that make up the layers of infrastructure that enable delivery of services.

Although there are specialist solutions that support end-user experience monitoring, these tools are generally not pre-integrated with management tools that monitor the health of the underlying IT infrastructure. Having the linked ‘top-down’ and ‘bottom-up’ views and integrated capability within one IT monitoring system allows tracking service performance and user experience metrics, and then if problems are detected, the solutions facilitate drilling down to view and analyze the technical performance metrics for the various enabling components (e.g. CPU utilization of the application server).  This capability allows rapid and context-specific identification of potential causes of degradation of end-user experience. Having unified, correlated, status views allows the IT team to not only better assure the real-time user experience, but also conduct detailed analysis on areas of performance issues and bottlenecks in the underlying IT infrastructure.

Organizations that are in the midst of exploring new network monitoring software solutions can look for the following types of capabilities to get an integrated view of performance. Does the solution monitor metrics, such as response time, for complete multi-step end-user transactions? Ideally, any number of multi-step test transactions should be definable, where these tests can be monitored alongside the other device or server specific tests to generate alarms when thresholds are violated.  As part of scripting a transaction step, the user should be able to select specific frames and links for navigating through a particular path for testing purposes. Additionally, secure pages should be accessible by providing the relevant authentication credentials. As part of the scripting process, when the user clicks through to the next step, the software needs to be capable of performing basic validation to ensure that the transaction being scripted can indeed be executed without application access errors.
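
A multi-step transaction test of the kind described above can be sketched in Python as follows. The step names are invented, and each step is a stand-in callable; a real test would issue authenticated HTTP requests and follow specific links or frames.

```python
import time

def run_transaction(steps, threshold_s):
    """Run named steps in order, timing each; stop at the first failing step.

    `steps` is a list of (name, callable) pairs; a callable returns True on success.
    """
    timings = {}
    started = time.perf_counter()
    for name, action in steps:
        step_start = time.perf_counter()
        ok = action()
        timings[name] = time.perf_counter() - step_start
        if not ok:
            return {"status": "failed", "failed_step": name, "timings": timings}
    total = time.perf_counter() - started
    status = "alarm" if total > threshold_s else "ok"
    return {"status": status, "total_s": total, "timings": timings}

# Stand-in steps for illustration only.
steps = [
    ("open_login_page", lambda: True),
    ("submit_credentials", lambda: True),
    ("load_account_summary", lambda: True),
]
result = run_transaction(steps, threshold_s=2.0)
print(result["status"])
```

Recording per-step timings alongside the total makes it possible to see which step of the user's path slowed down, not just that the transaction as a whole breached its threshold.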

Combining transaction monitoring and infrastructure monitoring in one system, and then taking this one step further by mapping services to the relevant top-down and bottom-up metrics (see example of service monitoring solution at http://tiny.cc/mpqxn ), allows organizations to monitor service performance from both technical and end-user perspectives. As the overall IT infrastructure becomes more dynamic and complex with adoption of new technologies such as virtualization and cloud, the ability to unify and tie infrastructure monitoring with end-user experience monitoring will allow organizations to better assure overall business performance and customer satisfaction.


Integration APIs: Connect Your Network Monitoring Software

The history of Enterprise Software is riddled with examples of organizations never having realized the value of purchased solutions given the high cost and complexity of the “integration hurdle.” Based on past experiences and today’s environment, MSPs and businesses want enterprise software solutions that can be made operational quickly, without dependence on lengthy integration or implementation projects, even if this requires forgoing some of the advanced functionality promised by the more complicated solutions.

Fortunately, most of today’s network monitoring software systems can quickly, and fairly inexpensively, start performing basic infrastructure management for industry standard devices. Unlike, say, a billing application, no complex integration with other enterprise systems, such as ordering or fulfillment, is required to get going out of the gate. That said, MSPs need to be careful that the monitoring software does not become stranded on an island, and that the solution does indeed have the flexible APIs and interfaces to connect with custom data sources and enterprise applications, and link into other IT service management processes (read whitepaper on ITIL alignment).

With a less complete tool, the immediate satisfaction from seeing metrics being gathered and alarms being displayed on status screens (for a relatively low starter price) can quickly give way to challenges further down the road as your business evolves. In a world where success is measured on a quarterly basis and where shorter horizons tend to favor decisions based on tactical factors, stepping back and taking a longer-term, strategic view in selecting your network monitoring software will pay dividends. I am not suggesting that you discount the ability to quickly operationalize the software, as that is table stakes. But, I am recommending that you consider the ability of the software to adapt to and interact with your changing IT environment over time.

How to Get Started

Let me share an illustrative example of the importance of considering a broader set of factors in selecting your network monitoring software solution. A few years back, a newly launched MSP’s immediate and somewhat moderate monitoring needs centered around ensuring the performance of the core IT infrastructure consisting of a number of switches and physical servers. Metrics needed to be captured and compared against thresholds, alarms were required to be displayed on an event management console, and notifications had to be emailed to the operations technicians.

Within three months of going live, the need arose to capture and process performance metrics from two custom applications. The MSP was able to utilize the monitoring software’s universal, external data-feed API to inject these metrics into the system, and then apply custom rules for event generation. The performance data and alarms were displayed alongside those of the other standard devices. As the MSP’s business accelerated rapidly, the operations team quadrupled in size and the IT team implemented a centralized user management application. The monitoring software had the integration framework to override the inbuilt authentication, and was able to utilize the new external authentication database to control access to the system.

Continued Evolution

The MSP’s service offerings continued to expand, and along with that the heterogeneity of the IT infrastructure increased. The built-in action profiles in the monitoring system that are triggered when events occur were no longer adequate. These standard action profiles were augmented using the supported custom plug-in framework, which was capable of running external programs as well. Device names and test information were easily passed to an external application to build highly flexible actions, some of which involved using the monitoring system’s API to query the state of another device before executing a corrective action. Additionally, certain performance data needed to be fed into an external web portal application, which again was facilitated by the network monitoring software’s data access API.
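
The pattern of checking another device before acting can be sketched in Python like this. The event fields, `get_state` and `restart_service` are hypothetical placeholders standing in for a monitoring system's API and an external program; they are not taken from any real product.

```python
def corrective_action(event, get_state, restart_service):
    """Restart the affected service only if its upstream dependency is healthy.

    `get_state` and `restart_service` are placeholders for calls into a
    monitoring system's API and an external program; neither is a real API.
    """
    upstream = event.get("depends_on")
    if upstream and get_state(upstream) != "up":
        return f"skipped: {upstream} is not up"
    restart_service(event["device"], event["test"])
    return f"restarted {event['test']} on {event['device']}"

# Hypothetical event passed in by the plug-in framework.
event = {"device": "app-01", "test": "http_check", "depends_on": "db-01"}
states = {"db-01": "down"}
restarts = []

print(corrective_action(event, states.get, lambda d, t: restarts.append((d, t))))
```

Here the restart is skipped because the database the application depends on is down, which avoids a corrective action that could not have fixed the problem.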

Although the immediate monitoring needs of the MSP were pretty basic when the software purchase decision was made, the team had considered extensibility and integration-support as part of their evaluation criteria. They recognized that these were important future requirements. Make sure your network monitoring software system is not stranded, and can indeed keep up with your evolving business.


Service Monitoring: Give Customers Real-time Visibility

During a recent training engagement with a new customer, we received a somewhat frantic call from our IT contact. She sounded a bit stressed and said that our consultant would need to remain on-site one more day for a follow-on training session. The IT contact emphasized that this was not something she had planned for, nor foreseen, and it had come up quite suddenly.

As it turned out, the IT group had given a presentation to the company’s executive team on their recently deployed network monitoring software. Upon seeing a demonstration of the top-level Business Service dashboards that displayed key metrics on the performance of critical business services, the VP of Customer Care immediately requested that he and his directors be trained on accessing and customizing the dashboards. Interestingly, this is not the first time we have experienced this kind of reaction from senior managers and business owners.

Whether an internal IT organization supports the business or whether MSPs manage the IT infrastructure and applications, the business and management constituents want real-time visibility on the performance of business services that are dependent on the underlying IT infrastructure. They are no longer content with after-the-fact reporting, and want direct access to the relevant data to view service performance and validate compliance against targets. The prevailing assumption has been that IT monitoring tools are not able to deliver actionable information that is relevant to business owners or senior managers. That is no longer the case.

Opportunities and Challenges

The demand for greater visibility presents both an opportunity and a challenge for MSPs. Being able to provide timely and relevant performance information to business owners and senior managers through mechanisms like service impact dashboards and real-time status reports will enhance MSP offerings. On the other hand, MSPs will need to make investments in the appropriate systems that enable providing a service-oriented view of the IT infrastructure and applications.

Monitoring software systems will need to have two primary underpinnings to meet the challenge:

  • The first, and the more critical one, is having built-in, pre-integrated Business Service Management (BSM) capability that links the underlying IT infrastructure to business services. Within a fully integrated BSM environment, information is presented in a way that is relevant to the user roles within an organization. The business owner can access a rolled-up dashboard view of the metrics on which the business services depend. The information in this view is described in business terms. An IT operations person can simultaneously view the detailed performance data plots for a given piece of the underlying IT infrastructure, where the data is defined in technical terms. To learn more about BSM solutions, read a BSM overview whitepaper.
  • The second requirement is that the monitoring system needs to have full multi-tenancy support. The application has to allow creation of read-only, read-write and admin users within a domain, and admin users across domains. The ability to enable look-and-feel that is driven by the customer login (e.g. custom logo, role-specific layouts, custom stylesheets, etc.) will be important to ensure an intuitive and high-quality user experience for business users and senior managers.
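
To illustrate the first underpinning, here is a minimal Python sketch of rolling component states up into a business service status and presenting it differently by role. It assumes a simple "worst component wins" rule; the service, component names and role names are invented for the example.

```python
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}

def service_status(component_states):
    """Roll component states up to a service: the worst component wins."""
    if not component_states:
        return "unknown"
    return max(component_states.values(), key=SEVERITY.__getitem__)

def view_for(role, service, component_states):
    """Business users see the rolled-up status; operations users also see components."""
    summary = {"service": service, "status": service_status(component_states)}
    if role == "operations":
        summary["components"] = dict(component_states)
    return summary

# Hypothetical service made of three monitored components.
order_entry = {"web-01": "ok", "app-01": "warning", "db-01": "ok"}
print(view_for("business_owner", "Order Entry", order_entry))
print(view_for("operations", "Order Entry", order_entry))
```

In a multi-tenant deployment the same role check would also be scoped to the customer's domain, so that each login sees only its own services.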

Given the appropriate tools, such as service health dashboards, MSPs can enable business owners to proactively monitor/verify the status and trending of business service performance. To get started, MSPs need to learn more about some of the organizations that are embracing BSM solutions.

With the right level of integration (and correlation) between the business service monitoring layer and the underlying network/IT monitoring technology, business owners can better understand the impact of the complex mesh of underlying infrastructure on the specific outcomes of interest in their business services. MSPs should consider adopting tools that deliver real-time business service assurance capability, and give business owners the visibility they desire.
