Big Data in Application and Cloud Performance – Why & How

Vikas Aggarwal

CEO, Zyrion Inc.

Long regarded as a non-critical part of day-to-day operations, Big Data and its delayed analysis were relegated to batch processing tools and monthly meetings. Today, as the IT industry has snowballed into a fast-moving avalanche of Cloud, virtualization, outsourcing and distributed computing, the science of extracting meaningful, intelligent metrics from Big Data has become an important, real-time component of IT Operations.

WHY BIG DATA IN CLOUD PERFORMANCE TOOLS

IT management systems no longer work in vertical or horizontal isolation as they did just a few years ago. The inter-dependence between IT Services, applications, servers, cloud services and network infrastructure has a direct and measurable impact on Business Services. The volume of data generated by these components is huge, and it is generated so fast that traditional tools cannot keep up with any kind of real-time correlation. The combined volume of data generated by this hybrid infrastructure can be overwhelming, but if it is correlated properly, it can give mission-critical insight into:

  • the response times and behavior of an IT service or application
  • the cause of performance degradation of an IT service
  • trend analysis and proactive capacity planning
  • whether SLAs are being met for business services

This data has to be analyzed and processed in real time in order to provide proactive responses and alerting for service degradation. The data being collected can be structured or unstructured, comes from a variety of systems which depend on each other for optimal performance, and has little to no obvious linkage or keys tying one source to another (the data coming from an application, for example, is completely independent of the data coming from the network it runs on). Some examples of data sources that need to be correlated are application logs, NetFlow, JMX, XML, SNMP, WMI, security logs, packet analysis, business service response times, weather, news, etc.

Enterprises are moving to hybrid cloud environments at an alarming rate, and customer surveys consistently indicate that the complexity of these platforms is their biggest concern. Enterprises must adopt monitoring systems that are flexible and can handle Big Data efficiently, so that they can respond to alarms in real time and get meaningful business impact analysis from all of the different data sources.

Contextual analytics and presentation of data from multiple sources is invaluable to IT Operations in troubleshooting poor application performance and user satisfaction. As a simple example, a user response-time monitor could send an alert that the response time of an application is too high. Application Performance Monitoring (APM) data could indicate that a database is responding slowly to queries because its buffers are starved and the number of transactions is abnormally high. Integrating with network NetFlow or packet data would then allow immediate drill-down to isolate which client IP address is the source of the high number of queries.
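
To make this drill-down concrete, here is a minimal sketch in Python. The alert and flow-record field names are hypothetical (real NetFlow or packet exports vary by vendor and collector); the point is simply how an APM alert and flow data can be joined on time and destination to rank the client IPs responsible for the query load.

```python
# Minimal sketch: given an APM alert for a slow database, rank client IPs by
# how much traffic they sent to the database server during the alert window.
# The record formats below are invented for illustration.
from collections import Counter
from datetime import datetime, timedelta

def top_talkers(alert, flow_records, window_minutes=5, top_n=5):
    """Return the client IPs sending the most bytes to the alerting server."""
    start = alert["timestamp"] - timedelta(minutes=window_minutes)
    counts = Counter()
    for flow in flow_records:
        if (flow["dst_ip"] == alert["server_ip"]
                and flow["dst_port"] == alert["service_port"]
                and start <= flow["timestamp"] <= alert["timestamp"]):
            counts[flow["src_ip"]] += flow["bytes"]
    return counts.most_common(top_n)

# Example usage with synthetic data
alert = {"timestamp": datetime(2012, 6, 1, 10, 5),
         "server_ip": "10.0.0.20", "service_port": 3306}
flows = [
    {"timestamp": datetime(2012, 6, 1, 10, 3), "src_ip": "10.0.1.7",
     "dst_ip": "10.0.0.20", "dst_port": 3306, "bytes": 48_000_000},
    {"timestamp": datetime(2012, 6, 1, 10, 4), "src_ip": "10.0.1.9",
     "dst_ip": "10.0.0.20", "dst_port": 3306, "bytes": 1_200_000},
]
print(top_talkers(alert, flows))  # [('10.0.1.7', 48000000), ('10.0.1.9', 1200000)]
```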

HOW TO HANDLE BIG DATA FOR CLOUD PERFORMANCE

Traditional monitoring or BI platforms are not designed to handle the volume and variety of data from this hybrid IT infrastructure. The management platforms need to be designed to correlate Big Data from the IT components in real-time and provide feedback to the operations team for proactive responses. As these monitoring systems evolve, their Big Data correlation components will become richer and more analytical and will position these enterprises for the IT environments of the future.

New-generation enterprise monitoring solutions that are scalable and multi-tenant, with predictive analytics and a granular security model, are now available from a small number of vendors. Single-use systems designed for just network data or just application data are trapped within the same boundaries that make Big Data meaningless – by its very nature, a Big Data system needs to be able to handle a very wide variety of data sources, providing greater uptime through faster troubleshooting and lower OpEx through correlated analysis.


Vikas Aggarwal is CEO of Zyrion Inc., a leading provider of Cloud and Network Monitoring software for large enterprises and Managed Service Providers. You can read more about Zyrion’s Enterprise Monitoring Solution and how it handles Big Data on their web site.

 

 


MSPs and Big Data – Why & How

Vikas Aggarwal

CEO, Zyrion Inc.

Long regarded as a non-critical part of day-to-day operations, Big Data and its delayed analysis were relegated to batch processing tools and monthly meetings. Today, as the IT industry has snowballed into a fast-moving avalanche of Cloud, virtualization, outsourcing and distributed computing, the science of extracting meaningful, intelligent metrics from Big Data has become an important, real-time component of IT Operations.

WHY BIG DATA IN CLOUD PERFORMANCE TOOLS

IT management systems no longer work in vertical or horizontal isolation as they did just a few years ago. The inter-dependence between IT Services, applications, servers, cloud services and network infrastructure has a direct and measurable impact on Business Services. The volume of data generated by these components is huge, and it is generated so fast that traditional tools cannot keep up with any kind of real-time correlation. The combined volume of data generated by this hybrid infrastructure can be overwhelming, but if it is correlated properly, it can give mission-critical insight into:

  • the response times and behavior of an IT service or application
  • the cause of performance degradation of an IT service
  • trend analysis and proactive capacity planning
  • whether SLAs are being met for business services

This data has to be analyzed and processed in real time in order to provide proactive responses and alerting for service degradation. The data being collected can be structured or unstructured, comes from a variety of systems which depend on each other for optimal performance, and has little to no obvious linkage or keys tying one source to another (the data coming from an application, for example, is completely independent of the data coming from the network it runs on). Some examples of data sources that need to be correlated are application logs, NetFlow, JMX, XML, SNMP, WMI, security logs, packet analysis, business service response times, weather, news, etc.

Managed Service Providers are themselves moving to hybrid cloud environments and offering services ranging from security, backup and VoIP to applications and compute resources. At the same time, enterprises are outsourcing more and more of their IT performance management, and they expect the MSP to handle any and all kinds of IT data. Managed Service Providers must adopt monitoring systems that are flexible and can handle Big Data efficiently. Such IT monitoring platforms will allow them to offer higher value-added services to enterprise customers. Once they have such versatile Big Data systems in place, they can offer real-time responses to alarms and alerts and give meaningful business impact analysis to these enterprise customers.

Contextual analytics and presentation of data from multiple sources is invaluable to IT Operations in troubleshooting poor application performance and user satisfaction. As a simple example, a user response-time monitor could send an alert that the response time of an application is too high. Application Performance Monitoring (APM) data could indicate that a database is responding slowly to queries because its buffers are starved and the number of transactions is abnormally high. Integrating with network NetFlow or packet data would then allow immediate drill-down to isolate which client IP address is the source of the high number of queries.

HOW TO HANDLE BIG DATA FOR CLOUD PERFORMANCE

Traditional monitoring or BI platforms are not designed to handle the volume and variety of data from this hybrid IT infrastructure. The management platforms need to be designed to correlate Big Data from the IT components in real-time and provide feedback to the operations team for proactive responses. As these monitoring systems evolve, their Big Data correlation components will become richer and more analytical and will position these MSPs for the IT environments of the future.

New-generation MSP monitoring solutions that are scalable and multi-tenant, with predictive analytics and a granular security model, are now available from a small number of vendors. Single-use systems designed for just network data or just application data are trapped within the same boundaries that make Big Data meaningless – by its very nature, a Big Data system needs to be able to handle a very wide variety of data sources, providing greater uptime through faster troubleshooting and lower OpEx through correlated analysis.


Vikas Aggarwal is CEO of Zyrion Inc., a leading provider of Cloud and Network Monitoring software for large enterprises and Managed Service Providers. You can read more about Zyrion’s MSP monitoring solution and how it handles Big Data on their web site.

 

 


IT Operations Need Integrated APM Dashboards & Packet Data

There are several approaches to Application Performance Management – one is to collect performance data from the application itself, while another is to derive application metrics from packet data captured on the network. Fetching metrics from the application process itself yields valuable data such as memory, buffers, cache and other application internals which cannot be obtained from the wire. On the other hand, performance metrics taken from the network give a good breakdown of the response times and delays contributed by the different components of the entire service.

However, one of the key users of all these tools is IT Operations – and isolating the root cause of a slow-performing application is of prime importance to them. Having all of this data in disjoint, disparate systems, each requiring its own skill set, is a battle that IT Operations has been fighting for decades, and presenting yet another set of cool tools does not bring them any closer to winning it.

[Figure: APM dashboard screenshot]

Presenting a unified dashboard of application process metrics and network or packet-level metrics increases the usability and value of both types of performance data. Contextual analytics and presentation of data from multiple sources is invaluable to IT Operations in troubleshooting poor application performance and user satisfaction. As a simple example, either of the two approaches above could indicate that a database is responding slowly to queries. The process-level metrics would show that the buffers are starved because the number of transactions is abnormally high. Integrating with the NetFlow or packet data would allow immediate drill-down to isolate which client IP address is the source of the high number of queries.

Because of the complexity of today’s distributed, virtualized datacenters, current-generation APM products also need to be flexible and capable of analyzing very large datasets in real time to provide meaningful results to IT. Demand for such products is growing as more and more enterprises migrate to hybrid cloud environments, where downtime or degraded performance is no longer an option.


Vikas Aggarwal is CEO of Zyrion Inc., a leading provider of Cloud and Application Monitoring software for large enterprises and Managed Service Providers. You can read more about Zyrion’s cloud monitoring solution here.


How Much Product Functionality Are You Really Using?

Most software products in the ITIL stack – monitoring, ticketing, etc. – perform their basic functionality equally well when compared to other products in their class. However, best-of-class products have a lot more functionality, and those features are typically forgotten in the craziness and urgency of operational deployments. In most cases, close to 60% of the product features are paid for but unused.

One might argue that these features are not important or needed in the enterprise. Interestingly, most of these unused features are precisely the reason for selecting the product in the first place and served as differentiators for selecting one product over another. Using these advanced features would in most cases give much better ROI to the customer, as is normally identified during the selection process. Yet the bulk of these differentiating features go unused.

Some of the key reasons for not deriving greater benefit from the software are:

  • Using these new features requires a change in Operational Processes – and this change is usually not factored in during the deployment.
  • The selection team is different from the end users of the deployed product, and the end users have not been exposed to the key differentiating features.
  • The desire to implement the software quickly and reduce implementation risk.
  • Lack of skills to use the advanced features in the product.

A very simple and effective way to address these issues is training and re-training. In most cases, the customer purchases training at the beginning of the engagement but fails to do advanced training after the product has been deployed and running for a while. Enterprises should make it a point to set aside budget and time for reviewing the product deployment a few months after implementation, and to ask the software vendors to demonstrate how their advanced feature set can help reduce expenses and deliver faster ROI. This second stage of training would need to focus on how the product can interact better with existing OSS tools and improve processes.

The deployment of any best-of-class product really happens in two stages – the initial stage uses the features that are the most common (and, interestingly, not the reason for selecting the product in the first place). The second stage, and the one most often overlooked, comes much later, after the initial deployment, when the best-of-class features are actually put to use.


Vikas Aggarwal is CEO of Zyrion Inc., a leading provider of Cloud and Network Monitoring software for large enterprises and Managed Service Providers. You can read more about Zyrion’s cloud monitoring solution here.

     

     


Climbing Up the Stack – MSP Customers Demand Higher Value in Monitoring Services

Managed Services is now a mature industry, and as it moves further down this maturity path, trying to differentiate or win customers based on a lower price per monitored device can only continue for a little while. As Managed Service Providers race to adopt automation and other workflows to reduce their operating expenses, the price gap between different providers can only narrow.

As one would expect, after having squeezed the last drop of overhead from their opex, Service Providers need to start looking at offering higher-value, differentiated services to their customers in order to stay ahead. When these businesses began life as VARs many years ago, the focus was on verticals and getting familiar with an industry. Today, having transformed into service providers offering varied services to those same customers, MSPs need to be able to monitor the custom applications and services within each industry. If you look beyond email and web services, every one of your customers has a unique IT application or service at the core of their business – whether it is medical billing, online gaming or a streaming video application. Today, most MSPs monitor the IT infrastructure and databases for these custom applications, but few have extended their services to monitoring the custom applications themselves.

Monitoring your customers’ ‘custom’ applications and services requires a higher-value sale – not only does the MSP have to understand the customer’s business and the applications that support it, but they also need to leverage their monitoring software’s APIs or custom monitoring abilities to collect relevant metrics from these custom applications. This requires not only a salesperson who can explain the benefits to the customer, but also a technical person with some programming-level skills to use the APIs.
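
As an illustration of what those programming-level skills might look like in practice, the sketch below pushes a custom application metric into a monitoring platform over a REST API. The endpoint, payload format and API key are hypothetical stand-ins rather than any particular vendor's API; a real integration would follow the monitoring product's documentation.

```python
# Hypothetical sketch: pushing a custom application metric into a monitoring
# platform over a REST API. The endpoint, payload format and API key are
# invented for illustration, not any specific vendor's API.
import time
import requests

MONITORING_API = "https://monitoring.example-msp.com/api/v1/metrics"  # hypothetical
API_KEY = "changeme"

def push_metric(device, metric_name, value, units=""):
    """Send one custom metric sample, e.g. pending medical-billing claims."""
    payload = {
        "device": device,
        "metric": metric_name,
        "value": value,
        "units": units,
        "timestamp": int(time.time()),
    }
    resp = requests.post(MONITORING_API, json=payload,
                         headers={"Authorization": f"Bearer {API_KEY}"},
                         timeout=10)
    resp.raise_for_status()

# Example: poll the customer's billing application and forward its queue depth
push_metric("billing-app-01", "claims_queue_depth", 42, units="claims")
```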

The benefits are obvious – the end customer now has an MSP who can monitor the IT services that directly impact their business bottom line, and the MSP now has a stronger relationship with the end customer because of the high value provided. Providing some of these service-oriented performance metrics on a rich dashboard also raises the visibility of the MSP with the end customer’s senior managers – something they probably could not do when monitoring only devices and applications instead of IT Services.

We are already seeing this transition and the demand for higher value among MSPs as the industry moves further into its mature phase. Software technology vendors are beginning to build technologies such as ITIL BSM into their offerings, since these best-practice technologies are essential in order to deliver customized monitoring services to end customers.



Top MSP Priority – Refocus on OpEx

Last month was spent meeting with a lot of our MSP customers and discussing how our new analytics & automation module fits into their operations. Each meeting had a common recurring theme – while the first phase of their business was customer acquisition, the current phase is focused on reducing operational costs. Deriving higher efficiencies in the NOC using better tools is once again a top priority for senior management.

A large number of these MSPs had been using free, open-source tools before; as they grew in size and operations, they realized that there is a cost to “free” which can add up pretty quickly, and switched to commercial products focused on ease of use and lower TCO. This was a first step towards higher efficiency in their NOC.

Interoperability between the different systems then becomes an important requirement. Looking at the ITIL cheat sheet, at the very least you need monitoring, service desk, inventory and billing systems. Having an open API – and better still, existing connectors between these different systems – to reduce human interaction and automate as much as possible is an important factor. Even more useful, and often overlooked, is workflow management: when a new customer is brought on board, where are the devices created, how do the monitoring and billing systems get provisioned, and how are notifications escalated between the monitoring and ticketing systems?
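
As a sketch of what that onboarding workflow could look like once the systems expose APIs, the Python below wires provisioning across monitoring, ticketing and billing. The three client classes are invented stand-ins for whatever connectors the deployed products actually provide.

```python
# Sketch of an automated onboarding workflow across monitoring, ticketing and
# billing systems. The client classes are stand-ins for real product APIs.
class MonitoringClient:
    def add_device(self, customer, hostname):
        print(f"[monitoring] {customer}: now polling {hostname}")
        return f"dev-{hostname}"

class TicketingClient:
    def set_escalation(self, customer, device_id):
        print(f"[ticketing] alarms on {device_id} open tickets for {customer}")

class BillingClient:
    def add_billable_item(self, customer, item, quantity):
        print(f"[billing] {customer}: {quantity} x {item}")

def onboard_customer(name, devices, monitoring, ticketing, billing):
    """Provision monitoring, escalation and billing for a new MSP customer."""
    for hostname in devices:
        device_id = monitoring.add_device(name, hostname)
        ticketing.set_escalation(name, device_id)
    billing.add_billable_item(name, "monitored device", len(devices))

onboard_customer("Acme Clinics", ["edge-router-1", "billing-app-01"],
                 MonitoringClient(), TicketingClient(), BillingClient())
```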

Automation and intelligence within each system is next on the list – how can each of the systems (monitoring, service desk, inventory, billing) provide more efficiency within its own functional area? In helpdesk systems, features such as alarm prioritization, auto-escalation and schedules are useful. In monitoring platforms, being able to reduce the noise and false alarms means faster time to resolution with fewer resources.
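
One concrete example of that noise reduction, sketched under simple assumptions: suppress repeat alarms for the same device and metric within a quiet period, so a flapping condition produces one notification instead of dozens. This illustrates the idea rather than any particular product's implementation.

```python
# Sketch of one noise-reduction technique: rate-limit repeat alarms for the
# same device and metric so a flapping condition produces one notification
# per quiet period instead of dozens.
from datetime import datetime, timedelta

class AlarmDeduplicator:
    def __init__(self, quiet_period=timedelta(minutes=15)):
        self.quiet_period = quiet_period
        self.last_notified = {}  # (device, metric) -> time of last notification

    def should_alert(self, device, metric, when):
        """Notify only on the first alarm within each quiet period."""
        key = (device, metric)
        previous = self.last_notified.get(key)
        if previous is None or when - previous > self.quiet_period:
            self.last_notified[key] = when
            return True
        return False

dedup = AlarmDeduplicator()
t0 = datetime(2012, 6, 1, 9, 0)
print(dedup.should_alert("core-sw-1", "if_down", t0))                         # True
print(dedup.should_alert("core-sw-1", "if_down", t0 + timedelta(minutes=5)))  # False
```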

While reducing opex has always been important to management, it seems to go through phases. But the current vendor and analyst focus on automation and analytics is a good indication that using smarter tools to reduce opex is at the top of everyone’s priority list again.


Vikas Aggarwal is CEO of Zyrion Inc., a leading provider of Cloud and Network Monitoring software for large enterprises and Managed Service Providers.


APM SLAs in the Cloud: Is There a Rainbow on the Horizon?

At a recent IT trade show where I was on a panel discussing Cloud providers, there was an interesting question from the audience about how to hold a Cloud provider responsible for slow application performance – how to avoid the next catastrophic outage, and how to ensure there are SLAs in place to penalize the cloud provider for sloppy application performance.

Reflecting on this a bit further, the answer might not be as simple as it seems. Is your Cloud provider really responsible for your application’s performance in their Cloud? Maybe it’s your network or Internet provider? Perhaps the DNS service provided by your friendly domain registrar is sloppy? Or could it be the slow CRM API provided by yet another SaaS provider that your application interfaces with? Current-generation applications are distributed and multi-layered, and the end services they provide depend on an even more distributed set of applications.

Trying to create an SLA with so many inter-dependent components and so many providers is not trivial. Worse – there is no obligation today for the different providers to share any data on their IaaS or PaaS performance, even if this data is available to them internally.

While selecting a Cloud provider of any kind, you will need to start factoring in operational transparency and the availability of their service performance metrics via APIs. The performance of their infrastructure is not as relevant as the performance of the service they are providing – redundancy and other design elements may keep their service unaffected even when their infrastructure fails, so raw infrastructure metrics might not be meaningful. You will also need a monitoring platform with open APIs that can aggregate the performance data from different Cloud providers and give you a composite metric mapping onto the performance of your service. And finally, your monitoring platform has to be able to provide a service-oriented view of these performance metrics, not just traditional metrics like CPU and memory.
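
As a rough sketch of such a composite metric, assume a business service that depends on several providers in series – IaaS, DNS and a SaaS CRM API. Its end-to-end availability is at best the product of the availabilities each provider reports; the provider names and figures below are illustrative placeholders.

```python
# Sketch of a composite availability metric for a service that depends on
# several providers in series. Provider names and figures are placeholders.
def composite_availability(component_availability):
    """Combine per-provider availability (0-1) for components in series."""
    result = 1.0
    for availability in component_availability.values():
        result *= availability
    return result

components = {
    "iaas-provider": 0.9995,   # as reported by the provider's status API
    "dns-provider":  0.9999,
    "crm-saas-api":  0.9990,
}
print(f"Composite availability: {composite_availability(components):.4%}")
# About 99.84%, even though every individual provider is above 99.9%
```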

As more enterprises move to Cloud-based services and infrastructure, their desire to work with a single vendor will force them to gravitate towards “cloud service aggregators” – a single vendor aggregating services from various cloud providers. However, enterprises will demand SLAs from these aggregators, and this will require some way to identify the partner responsible for failed SLAs or outages. There will be a need to get automated performance and SLA metrics from the different downstream partners and correlate this data to provide aggregate SLAs for the enterprise. This requires transparency in operations, open APIs and uniform SLA measurements; even though not prevalent today, this will become a necessity in the near future.


Recession-Driven Innovations: Agility, Automation & Predictive Analytics

Agility and velocity have become the new success necessities of organizations in the past few years, driven largely by recession realities and unpredictable business environments. The recession, viewed positively, boosted innovation and increased the pace of new process development and adoption. The IT customer is increasingly global, the realm of IT services grows larger every day, and the sprawling, distributed IT components demand intelligent ways to manage and monitor this infrastructure.

The administrative burden of managing this burgeoning infrastructure will only increase for IT departments unless they adopt processes and software to automate most of it. To automate processes, you need to integrate the different workflows seamlessly, which requires the software products to have flexible APIs. The order entry, provisioning, monitoring and billing workflows are all candidates for integration and automation. There have also been significant advances within cloud monitoring and management solutions to reduce the administrative burden through templates, threshold baselining and the creation of service models.

The other innovation has been in the field of data analytics. The IT customer’s demands have always been dynamic, and IT departments have reacted by provisioning for peak demand, resulting in wasted, idle compute resources. Even the usage of application resources varies by hour and day of week, and it is increasingly important for IT departments to understand the behavior pattern of their network and applications in addition to their computing resources. The number of users, the response times, the queued messages, the database query rate – all vary by time of day, and understanding the usage pattern and deviations from it helps isolate the root cause of IT service performance degradation much faster, which ultimately means higher customer satisfaction. More importantly, using APM behavior patterns greatly reduces the number of false alarms for IT Operations, lowering TCO.
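
A minimal sketch of how such behavior-based thresholds could work, under simple statistical assumptions: learn a per-hour baseline for a metric from history, then flag only values that deviate sharply from that hour's normal range rather than applying a single static threshold. The sample numbers are invented.

```python
# Sketch of hour-of-day baselining: learn the normal range of a metric for
# each hour from history, then flag only values that deviate sharply from
# that hour's baseline instead of using one static threshold.
from collections import defaultdict
from statistics import mean, stdev

def build_baseline(samples):
    """samples: iterable of (hour_of_day, value). Returns {hour: (mean, stdev)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) > 1}

def is_anomalous(baseline, hour, value, n_sigma=3):
    mu, sigma = baseline[hour]
    return sigma > 0 and abs(value - mu) > n_sigma * sigma

# Example: database queries/sec observed at 10:00 on previous days
history = [(10, q) for q in (120, 130, 125, 118, 127)]
baseline = build_baseline(history)
print(is_anomalous(baseline, 10, 126))   # False - within the normal pattern
print(is_anomalous(baseline, 10, 480))   # True  - worth an alarm
```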

Automation and Analytics are smart product features focused on reducing the administrative burden in today’s distributed Cloud environments. Keeping business necessities in mind, these innovative features are pragmatically relevant and a must for all IT departments in today’s business environment.


APM – The ‘A’ Dimension

As more applications migrate to the evolving private and public cloud infrastructure and permeate the sprawling, distributed “new” IT environment, the Gartner-published “five dimensions” of Application Performance Monitoring become essential requirements for any APM platform. Being able to capture end-user experience, topology, deep-dive monitoring of components and analytics will enable IT operations to isolate application performance issues, reduce the MTTR for application services, and ultimately deliver higher user satisfaction.

With the arrival and rapid adoption of Virtualization and Cloud infrastructure, the reduction in costs for ‘incremental units’ of computing power, the ability to more easily flex up and down as needed, and the lack of restrictions imposed by traditional models are all key drivers for migrating applications to this distributed cloud infrastructure. However, with these benefits comes the added burden of high administration overhead from managing a virally sprawling and dynamic IT environment.

As the number of discrete virtual servers, components and resident applications explodes, performance monitoring and the intelligent analytics needed to make rapid decisions will be critical for IT operations. Manually intensive legacy point monitoring tools will not be able to keep up in a dynamic and complex environment where applications can move across the underlying IT infrastructure in almost real time.

Of course, the better approach would be to utilize and adopt the APM “A” dimension – Automation in the monitoring platform to reduce the burden from routine administrative tasks for application monitoring. Implementing the right systems and processes and finding a monitoring solution which uses a good degree of automation is essential to gain back the efficiency lost from the increased complexity of distributed application infrastructure. Automation in the area of monitoring will ensure consistency in performance monitoring and benchmarking, enabling IT Operations to make better and proactive decisions for application performance. As JP Garbani at Forrester said recently, gaining the right level of productivity in IT operations will come from using better tools, and specifically, automation.


Cloud Monitoring Software: Automation and Intelligence Not Optional

The quest to improve the productivity and efficiency of IT organizations is an ongoing one. A number of technologies and processes have been adopted over the decades to make IT operations leaner and more effective. With the arrival and rapid adoption of Virtualization technology and Cloud infrastructures in the past few years, IT organizations worldwide are starting to realize significant economy-of-scale benefits. Reduction in costs for ‘incremental units’ of computing power, the ability to more easily flex up and down as needed, and the lack of restrictions imposed by the traditional models will all drive a dramatic increase in the consumption of computing and application resources as organizations are freed up to do more. On the flip side, steps will need to be taken to deal with the resulting increase in the administration burden, else the efficiency gains realized from shared, flexible IT infrastructure will be outstripped by the high cost of managing a more dynamic and complex environment.

Terms like “virtualization sprawl” have been coined to refer to the increase in the number of discrete virtual servers and related application components within the overall IT environment. This is no longer a hypothetical scenario, and organizations are already experiencing administration challenges because of the fundamental IT transformation driven by virtualization and cloud technologies. Consider the case of a leading educational institution in the Northeastern United States. Prior to embarking on an aggressive virtualization initiative, the operations team was responsible for ensuring the performance of approximately 1000 distinct physical servers. By the time the first phase of the server consolidation and virtualization initiative was completed, the team was tracking and managing the performance of over 7000 virtual servers!

As the number of discrete virtual servers, components and resident applications explodes, the performance monitoring and root-cause-analysis demands on IT administrators will multiply exponentially. Manually intensive legacy and point monitoring tools will not be able to keep up, and organizations will face significant challenges in detecting and resolving issues in a timely manner. In one recent case of an organization being overwhelmed, the IT team resorted to forced daily ‘proactive reboots’ of a large number of their servers. The team claimed that this workaround was the only way to keep the infrastructure performing, given the absence of a comprehensive monitoring and management solution to identify real issues and isolate problem sources. The IT team acknowledged that the organization’s users and business operations were being impacted by this daily reset cycle, but viewed this approach as the lesser evil compared to blind, reactive fire-fighting!

Of course, the better approach would be to take a more strategic stance and implement the right systems and processes to assure the performance of the IT infrastructure. Today’s cloud monitoring software solutions have to be capable of automating many of the routine administration tasks. More importantly, these systems need to have built-in intelligence to infer what is going on in the IT infrastructure and automate decision-making. The increased demands on the IT team will be partially offset by the automation capabilities of the monitoring solution, allowing IT personnel to focus on deeper and more complex administration tasks. Furthermore, the overall efficiency and utilization of IT resources will be higher with the right capabilities in the IT monitoring software (see http://tiny.cc/cwytn to learn how).
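
To illustrate one form this automation can take, here is a small sketch in which newly discovered virtual servers are matched against monitoring templates so that standard checks are applied without manual configuration. The template names, discovery hints and matching rules are invented for the example rather than taken from any specific product.

```python
# Illustrative sketch: newly discovered virtual servers are matched against
# monitoring templates so standard checks are applied without manual work.
# Template names, discovery hints and rules are invented for this example.
TEMPLATES = {
    "linux-vm":   ["ping", "cpu", "memory", "disk"],
    "web-server": ["ping", "cpu", "memory", "disk", "http-response-time"],
    "database":   ["ping", "cpu", "memory", "disk", "query-latency"],
}

def choose_template(discovered):
    """Pick a monitoring template from hints gathered during discovery."""
    if "mysql" in discovered.get("services", []):
        return "database"
    if {80, 443} & set(discovered.get("open_ports", [])):
        return "web-server"
    return "linux-vm"

def auto_provision(discovered_vms):
    for vm in discovered_vms:
        template = choose_template(vm)
        print(f"{vm['name']}: applying '{template}' -> {TEMPLATES[template]}")

auto_provision([
    {"name": "vm-0231", "open_ports": [22, 443], "services": []},
    {"name": "vm-0232", "open_ports": [22, 3306], "services": ["mysql"]},
])
```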
