3 Unique Challenges in Determining SAP Root-Cause Analysis
By: Bernd Engist on Oct 23, 2018 3:02:06 PM
Agentless SAP monitoring systems seem like the easy way to go: no installation and no deployment. However, a closer comparison of agent-based and agentless SAP monitoring reveals some important facts to consider before making a decision.
A network failure means lost data
Let's envision this scenario: your SAP or HANA environments are up and running and functioning properly. You then get a notification: ‘Monitored system cannot reach SAP’. The SAP instance seems to be down. What immediate steps would you take to troubleshoot the situation? You would start by checking the trace files of the instance's work processes locally on the operating system. With an agentless monitoring solution, once the network connection to the monitored system is down, monitoring is down too, and there is no way to troubleshoot the issue through it.
An intelligent agent, however, keeps monitoring even when it can’t send the info back to the central server. First, it will check the traces to further identify what's not working. Is it a dispatcher emergency shutdown, or is it an overload or networking issue?
When using an agentless monitoring system, during network downtime, data is not collected and there is no way to get the missing data back to the monitoring system. So you end up with gaps in statistics data: availability, check history, performance data, etc.
Even worse is the case of managed service providers (MSPs) managing remote systems hosted off-premise or at the customer location. In this case, the customer may never have had an outage or any real problem; there was only an issue with the network connectivity between the enterprise and the MSP. The MSP, however, is now in ‘high alert’ mode, losing time and resources trying to figure out what the true issue is and whether it is on their side or on the customer side. On top of that, the MSP will never have any insight into how the system was performing while they were troubleshooting, and the resulting data gap can never be filled for historical reference. Even more important, they have missed their committed availability (or service level agreement) and need to explain gaps in their service.
An intelligent agent-based solution can cache data while the network is down and deliver it to the monitoring solution once the network connection has been re-established. It also protects the system by ensuring that no unintended or malicious changes made during that time go unnoticed, something that an agentless system would miss but an auditor would not.
So once the network is back, not only is the source of the issue identified, but all the data collected during the downtime is also sent to the main repository as if the network had never been down.
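The caching behavior described above is essentially a store-and-forward buffer. The following is a minimal sketch of that idea, not Avantra's actual agent code: the `BufferedSender` class, the `send` callable, and the `max_items` limit are all hypothetical names chosen for illustration, and a real agent would most likely persist the queue to disk rather than hold it in memory.

```python
from collections import deque
from typing import Callable, Deque, Dict


class BufferedSender:
    """Hypothetical store-and-forward buffer for monitoring samples."""

    def __init__(self, send: Callable[[Dict], None], max_items: int = 10_000) -> None:
        # `send` is assumed to raise ConnectionError while the network is down.
        self._send = send
        # Bounded queue: when full, the oldest samples are dropped first.
        self._pending: Deque[Dict] = deque(maxlen=max_items)

    def record(self, sample: Dict) -> None:
        """Queue a new sample, then try to deliver everything that is pending."""
        self._pending.append(sample)
        self.flush()

    def flush(self) -> int:
        """Send queued samples in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # network still down: keep the remaining samples for later
            self._pending.popleft()
            delivered += 1
        return delivered

    @property
    def pending(self) -> int:
        """Number of samples still waiting to be delivered."""
        return len(self._pending)
```

Because samples are only removed from the queue after a successful send, the central server receives them in their original order once connectivity returns, which is what closes the gap in the history.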
Finding the root cause
Let’s take another example we often hear. An online redo log archive is stuck, which means the database is unusable and therefore SAP is completely frozen. In this scenario, the SAP host may respond to pings, but absolutely nothing can be done: no running background jobs, no moving between transaction codes, not even logging in. Remember, an agentless system relies on being able to access the SAP system through an RFC. So what can an agentless system do in this situation? Nothing.
An agent-based system, which runs on the operating system, can easily check the file system that holds the offline redo logs directly, identify the issue, and report the outage to the business. Furthermore, an agent-based monitoring solution can provide forecasting and early notification using different threshold levels, preventing the outage in the first place. Using data collected directly from both the database and the operating system, it sends proactive notifications and flags a stuck archive immediately.
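To make the threshold and forecasting idea concrete, here is a small sketch of a local file system check. It is only an illustration under stated assumptions: the function names, the 80% and 95% thresholds, and the simple two-sample linear forecast are hypothetical and not taken from any particular product.

```python
import shutil
from typing import Optional


def usage_percent(path: str) -> float:
    """Percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify(used_pct: float, warn: float = 80.0, crit: float = 95.0) -> str:
    """Map a usage percentage to a status; the thresholds are assumed values."""
    if used_pct >= crit:
        return "CRITICAL"
    if used_pct >= warn:
        return "WARNING"
    return "OK"


def hours_until_full(prev_pct: float, curr_pct: float,
                     hours_between: float) -> Optional[float]:
    """Linear forecast from two samples; None if usage is flat or shrinking."""
    growth_per_hour = (curr_pct - prev_pct) / hours_between
    if growth_per_hour <= 0:
        return None
    return (100.0 - curr_pct) / growth_per_hour
```

An agent could call `classify(usage_percent(archive_dir))` on a schedule, where `archive_dir` is whatever directory the database archives its redo logs to, and raise an early warning when `hours_until_full` drops below a chosen lead time.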
However, in real life, things do not always go as planned. Alerts get ignored or simply missed, and a database can get stuck. An agentless system will notify you, "System or Database Down", whereas intelligent agent-based monitoring can do much more and tell you, "Database is down, due to a full archive directory". It indicates the root cause, allowing you to get to work fixing the issue rather than waste time finding its source.
Root Cause Analysis in the context of system monitoring typically means finding the most relevant event in a series of events, i.e., pinpointing the one that caused all the other issues.
In Avantra, we created intelligent checks: the system was designed to work as an experienced administrator would and be your ‘Virtual Basis Assistant’. What’s next if something fails? Where else can I investigate? This is called intrinsic root cause analysis.
Avantra's Root Cause Analysis works somewhat differently from a regular monitoring system. By executing monitors hierarchically, the system verifies potential root causes in layers. If one monitor becomes critical, the system doesn’t assume that everything tied to it is down (as an agentless system might) but rather considers those components ‘suspects’ and runs checks on them to identify the source of the problem. This allows the system to pinpoint the source and lets the team focus on solving it.
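The layered approach can be illustrated with a short sketch. This is not Avantra's implementation; the `DEPENDS_ON` table, the monitor names, and the `find_root_causes` function are hypothetical and only show the general idea of treating dependencies as suspects and checking them before blaming them.

```python
from typing import Callable, Dict, List

# Hypothetical layering: each monitor lists the lower-level monitors it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "sap_instance": ["database"],
    "database": ["archive_filesystem", "host"],
    "archive_filesystem": ["host"],
    "host": [],
}


def find_root_causes(monitor: str, is_healthy: Callable[[str], bool]) -> List[str]:
    """Return the deepest failing monitors reachable from `monitor`."""
    if is_healthy(monitor):
        return []
    causes: List[str] = []
    # Dependencies are suspects: each one is checked rather than presumed down.
    for suspect in DEPENDS_ON[monitor]:
        for cause in find_root_causes(suspect, is_healthy):
            if cause not in causes:
                causes.append(cause)
    # If no dependency is failing, this monitor itself is the root cause.
    return causes or [monitor]
```

With the SAP instance, the database, and the archive file system all reporting critical but the host healthy, the function returns only `archive_filesystem`, which corresponds to the "Database is down, due to a full archive directory" style of message.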
End-to-end SAP monitoring in a distributed network
Most enterprises these days have complex structures, with multiple offices in different locations using systems that are hosted in one of those locations, in the cloud, or in a hybrid of multiple locations and clouds. Measuring the end-to-end user experience and performance is not simple when using an agentless monitoring solution. Practically no synthetic test run from a central location will really imitate the end-user experience. An agentless system works one way: it does everything from the central monitoring location. So it can monitor one network connection to the destination system, but it cannot simulate a login from a real user's perspective. This means it has no way of knowing whether something does not work or performance is bad at a given site.
However, an agent placed exactly where the user group is located simulates a real user working environment at that site. The agent runs on the users’ site, in exactly the same network location. It will, for example, continuously log in to the users’ destination system with the same login data and network settings as the users at that site. Each login traverses the same network architecture and SAP technical configuration, such as an SAP router or login group settings for classical ABAP systems, or web proxies for HTTPS-based services, and so on.
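A site-local probe of this kind can be sketched as a timed login attempt. In the sketch below, `probe`, the `login` callable, and the 2-second `slow_after_s` threshold are hypothetical; the actual login (through an SAP router, a login group, or a web proxy) is deliberately left to the caller because it depends on the landscape.

```python
import time
from typing import Callable, Dict


def probe(login: Callable[[], None], slow_after_s: float = 2.0) -> Dict[str, object]:
    """Run one synthetic login and classify the result by elapsed time."""
    start = time.perf_counter()
    try:
        login()  # assumed to raise an exception if the login fails
    except Exception as exc:
        # Any failure means users at this site cannot log in either.
        return {
            "status": "CRITICAL",
            "seconds": time.perf_counter() - start,
            "error": str(exc),
        }
    elapsed = time.perf_counter() - start
    status = "WARNING" if elapsed > slow_after_s else "OK"
    return {"status": status, "seconds": elapsed, "error": None}
```

Running the same probe from an agent at each office gives one result per location, so a slow or failing login can be tied to a specific site instead of to the system as a whole.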
The agent's monitoring results, presented in a real-time performance dashboard, provide meaningful, actionable insight into each end-user location and allow for fast and efficient root cause analysis and troubleshooting.
Conclusion
There are pros and cons to both agent-based and agentless monitoring solutions. In some cases, due to business decisions and technical restrictions, an agent may not be an option, leaving an agentless deployment as the only viable choice. In all other scenarios, agent-based solutions provide better and more meaningful results: they can take a deeper dive and offer the best possible root cause analysis, all while providing true end-user experience monitoring.
In Avantra we believe in giving you the best of both worlds. While we recommend an agent-based approach, each agent is capable of remotely monitoring one or multiple systems. This gives you the flexibility of installing agents where you can (for example in the DMZ) and letting those agents monitor the areas where you can’t or don’t want to install agents, with only minor tweaks to the firewall configuration.
To see how Avantra utilizes robust, yet slim agents to properly monitor SAP ecosystems and provide full health analysis all in a secure manner, watch this 15-minute technical demo.