Monitoring Edge Systems

May 09, 2023

Monitoring Edge Systems: Monitoring with Prometheus and Grafana

By Justin Faust, Principal Software Development Engineer

Monitoring computer systems is essential for maintaining the performance and reliability of a system. In today's digital age, the consequences of system failure or malfunction can be catastrophic for individuals and businesses alike. Monitoring tools such as Prometheus and Grafana can provide valuable insights into the health of a system, enabling users to identify potential issues before they become major problems. This blog post will serve as a beginner's guide to monitoring computer systems, discussing why it is important, which industries it is particularly relevant for, and how Prometheus and Grafana can be used to monitor your system. For a more in-depth discussion of monitoring, check out our white paper, "Monitoring Edge Systems with U-BMC" and the source code.

Monitoring is relevant to any industry where the performance and reliability of systems are critical to business operations. This includes industries such as finance, healthcare, and e-commerce. In these industries, hardware failure or malfunction can lead to significant financial losses, reputational damage, and even legal liability. Monitoring computer system hardware is crucial for identifying potential issues before they become major problems. Without monitoring, issues such as data loss or system downtime may go unnoticed, leading to costly consequences.

Monitoring can help improve a system's overall performance by providing insights into its health and identifying areas for optimization. In a previous blog post, we discussed the benefits of out-of-band management using the Redfish API (Application Programming Interface) implemented by our Unified-BMC (U-BMC). By leveraging Redfish, our customers can gain secure and efficient access to metrics even when the system is not connected to the internet. Monitoring tools provide real-time insight into any number of decentralized systems, which can number in the thousands. Monitoring that many systems can be a daunting task, but with the right tools, it can be done efficiently and effectively.

Prometheus and Grafana are popular open-source monitoring tools that can be used for collecting and displaying a wide range of metrics. Prometheus is a monitoring solution that collects metrics from various sources, such as systems, applications, and databases. It stores the collected data in a time-series database, making it easy to query and analyze. With Prometheus, we can collect metrics from the various sources available to us, including JSON APIs like our Redfish implementation on U-BMC, typically through an exporter that translates the data into a format Prometheus can read. Grafana is a visualization tool that can be used to create dashboards that display the collected data in a visually appealing and easy-to-understand way. Grafana lets us customize the dashboards according to our needs by displaying metrics in various formats, such as tables, graphs, and maps.
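
To make the idea of collecting metrics from a JSON API more concrete, here is a minimal sketch of the translation step an exporter performs: it takes a JSON document shaped like a Redfish Thermal resource and renders it in the Prometheus text exposition format. The sample payload, the sensor names, and the metric name `redfish_temperature_celsius` are assumptions made for illustration; they are not taken from U-BMC or from the white paper's source code.

```python
import json

# Hypothetical sample shaped like a Redfish Thermal resource. The sensor
# names and readings are invented for this illustration.
SAMPLE_THERMAL = json.loads("""
{
  "Temperatures": [
    {"Name": "CPU0 Temp", "ReadingCelsius": 54.0},
    {"Name": "Inlet Temp", "ReadingCelsius": 27.5},
    {"Name": "Absent Sensor", "ReadingCelsius": null}
  ]
}
""")


def escape_label(value: str) -> str:
    """Escape backslashes, quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def thermal_to_prometheus(thermal: dict,
                          metric: str = "redfish_temperature_celsius") -> str:
    """Render temperature readings as Prometheus text exposition lines.

    The metric name is an assumption chosen for this sketch.
    """
    lines = [
        f"# HELP {metric} Temperature sensor reading in degrees Celsius.",
        f"# TYPE {metric} gauge",
    ]
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        if reading is None:
            # Skip sensors that report no value instead of inventing one.
            continue
        name = escape_label(str(sensor.get("Name", "unknown")))
        lines.append(f'{metric}{{sensor="{name}"}} {float(reading)}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(thermal_to_prometheus(SAMPLE_THERMAL), end="")
```

A real exporter would fetch the JSON over HTTPS from the BMC and serve the rendered text on an HTTP endpoint for Prometheus to scrape; both steps are left out here to keep the sketch self-contained.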

When monitoring your system, it is important to track a range of metrics to gain a comprehensive understanding of its health. External metrics, such as customer workload and demand, should be tracked because they can help businesses better understand their customers' needs. Internal system metrics such as CPU, memory, networking, and processes should also be monitored to identify potential issues and optimize system performance. You may find it helpful to apply the USE Method as "... it directs the construction of a checklist, which for server analysis can be used for quickly identifying resource bottlenecks or errors." Read more about it here: https://www.brendangregg.com/usemethod.html.
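
As a rough illustration of the checklist the USE Method describes, the sketch below crosses a list of resources with the three USE questions (utilization, saturation, errors). The resource list and the wording of each question are assumptions chosen for this example; a real checklist would name the specific metric that answers each question on your hardware.

```python
from itertools import product

# Resources taken from the internal metrics mentioned above; adjust as needed.
RESOURCES = ["CPU", "memory", "network"]

# The three USE questions, paraphrased for this sketch.
CHECKS = {
    "utilization": "How busy is the resource on average?",
    "saturation": "How much extra work is queued or waiting?",
    "errors": "How many error events has it reported?",
}


def build_use_checklist(resources=RESOURCES):
    """Return one checklist entry per (resource, USE check) pair."""
    return [
        {"resource": resource, "check": check, "question": question}
        for resource, (check, question) in product(resources, CHECKS.items())
    ]


if __name__ == "__main__":
    for item in build_use_checklist():
        print(f"{item['resource']:8} {item['check']:12} {item['question']}")
```

Each entry can then be mapped to a dashboard panel or alert, so that no resource is left without a utilization, saturation, and error signal.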

Prometheus and Grafana can be used to gather these metrics and help us gain valuable insights into the health of a system. To get started using Prometheus and Grafana, follow these steps:

  1. Install Prometheus on the system to be monitored. Optionally, you may also install Prometheus on a separate system to monitor multiple Prometheus instances at once using the federation feature.
  2. Configure Prometheus to collect metrics from the system, such as CPU, memory, networking, and processes. You can also configure Prometheus to collect metrics from external sources, such as JSON APIs. The Prometheus documentation provides a list of available exporters, which gather metrics from a source and expose them for Prometheus to scrape.
  3. Install Grafana and connect it to Prometheus. Grafana can be installed on the same system as Prometheus or on a separate system. The data sources you configure in Grafana can include Prometheus, MySQL, PostgreSQL, and many others.
  4. Once you have added a data source, you can create a dashboard to display the collected metrics.
  5. Configure alerts in Prometheus and Grafana to be notified of potential issues.
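
Once the steps above are in place, the collected data can also be read programmatically. The sketch below builds an instant query against the Prometheus HTTP API (`/api/v1/query`) and parses the JSON response. The base URL `http://localhost:9090` and the canned payload are assumptions for illustration; substitute the address of your own Prometheus server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload: dict) -> list:
    """Extract (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {...labels...}, "value": [timestamp, "value"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def query_prometheus(base_url: str, promql: str, timeout: float = 5.0) -> list:
    """Run an instant query; requires a reachable Prometheus server."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))


if __name__ == "__main__":
    # Canned response used so the sketch runs without a server.
    canned = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"__name__": "up", "job": "node"},
                 "value": [1683590400, "1"]},
            ],
        },
    }
    print(build_query_url("http://localhost:9090", "up"))
    print(parse_instant_vector(canned))
```

With a server running, `query_prometheus("http://localhost:9090", "up")` would return one entry per scrape target, with a value of 1.0 for targets that are reachable and 0.0 for those that are not.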

Gathering metrics and visualizing them in Grafana dashboards can help you find issues such as CPU overload or high memory usage. For example, if a system is experiencing high CPU usage, it may indicate that the system is running too many processes or that the CPU is not powerful enough to handle the workload. The monitoring software makes this condition easy to spot, and we can create alerts that fire if the average CPU usage remains too high over an extended period. By identifying this issue early, steps can be taken to optimize the system's performance and prevent further issues from occurring.
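
The "average stays too high for too long" condition can be sketched in a few lines. The class below keeps a fixed-size window of recent CPU samples and reports an alert only once the window is full and its mean exceeds a threshold. The class name, the 90 percent threshold, and the three-sample window are assumptions for illustration; in practice this logic would normally be expressed as an alert rule in Prometheus or Grafana rather than in application code.

```python
from collections import deque


class SustainedThreshold:
    """Signal when the mean of the last `window` samples exceeds `threshold`."""

    def __init__(self, threshold: float, window: int):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record a sample and return True if the alert condition holds."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            # Not enough history yet to judge a sustained condition.
            return False
        return sum(self.samples) / len(self.samples) > self.threshold


if __name__ == "__main__":
    alert = SustainedThreshold(threshold=90.0, window=3)
    for cpu_percent in [95, 50, 97, 96, 98]:
        print(cpu_percent, alert.observe(cpu_percent))
```

Averaging over a window means a single brief spike does not trigger a notification, while a load that stays high does.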

Prometheus and Grafana are powerful tools that can be used to monitor a wide range of metrics, enabling users to identify potential issues before they become major problems. By implementing monitoring practices using these tools, businesses can prevent costly consequences and improve efficiency in their operations.

At One Stop Systems, we believe that collaboration and partnerships are essential to solving tough issues in the computing industry. We invite businesses to collaborate with us to develop innovative solutions that address their unique challenges. If you are looking for a partner to help you tackle the challenges of monitoring your edge systems, reach out to us today. Let’s work together to keep your systems running smoothly and efficiently.
