Monitoring Edge Systems

May 09, 2023

Monitoring Edge Systems: Monitoring with Prometheus and Grafana

By Justin Faust, Principal Software Development Engineer

Monitoring computer systems is essential for maintaining the performance and reliability of a system. In today's digital age, the consequences of system failure or malfunction can be catastrophic for individuals and businesses alike. Monitoring tools such as Prometheus and Grafana can provide valuable insights into the health of a system, enabling users to identify potential issues before they become major problems. This blog post will serve as a beginner's guide to monitoring computer systems, discussing why it is important, which industries it is particularly relevant for, and how Prometheus and Grafana can be used to monitor your system. For a more in-depth discussion of monitoring, check out our white paper, "Monitoring Edge Systems with U-BMC" and the source code.

Monitoring is relevant to any industry where the performance and reliability of systems are critical to business operations. This includes industries such as finance, healthcare, and e-commerce. In these industries, hardware failure or malfunction can lead to significant financial losses, reputational damage, and even legal liability. Monitoring computer system hardware is crucial for identifying potential issues before they become major problems. Without monitoring, issues such as data loss or system downtime may go unnoticed, leading to costly consequences.

Monitoring can help improve a system's overall performance by providing insights into its health and identifying areas for optimization. In a previous blog post, we discussed the benefits of out-of-band management using the Redfish API (Application Programming Interface) implemented by our Unified-BMC (U-BMC). By leveraging Redfish, our customers can gain secure and efficient access to metrics even when the system is not connected to the internet. Monitoring tools provide real-time insight into any number of decentralized systems, which can number in the thousands. Monitoring that many systems can be a daunting task, but with the right tools, it can be done efficiently and effectively.

Prometheus and Grafana are popular open-source monitoring tools that can be used for collecting a wide range of metrics. Prometheus is a monitoring solution that collects metrics from various sources, such as systems, applications, and databases. It stores the collected data in a time-series database, making it easy to query and analyze. With Prometheus, we can collect metrics from the various sources available to us, including JSON APIs like our Redfish implementation on U-BMC. Grafana is a visualization tool that can be used to create dashboards to display the collected data in a visually appealing and easy-to-understand way. Grafana lets us customize the dashboards according to our needs by displaying metrics in various formats, such as tables, graphs, and maps.
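To make the JSON-to-Prometheus path more concrete, here is a minimal sketch of how temperature readings from a Redfish-style JSON document could be rendered in the Prometheus text exposition format. It is an illustration under stated assumptions, not the U-BMC implementation: the sample payload is invented, the field names `Temperatures`, `Name`, and `ReadingCelsius` are assumed to follow the Redfish Thermal resource, and the metric name `bmc_temperature_celsius` is hypothetical.

```python
import json

# Hypothetical sample shaped like a Redfish Thermal resource. A real BMC
# response contains many more fields; these values are invented.
SAMPLE_THERMAL_JSON = """
{
  "Temperatures": [
    {"Name": "CPU1 Temp", "ReadingCelsius": 54},
    {"Name": "Inlet Temp", "ReadingCelsius": 27.5},
    {"Name": "Absent Sensor", "ReadingCelsius": null}
  ]
}
"""


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def thermal_to_prometheus(payload: dict, metric: str = "bmc_temperature_celsius") -> str:
    """Render temperature readings as Prometheus text exposition lines."""
    lines = [
        f"# HELP {metric} Temperature reported by the BMC.",
        f"# TYPE {metric} gauge",
    ]
    for sensor in payload.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        if reading is None:  # skip sensors that report no reading
            continue
        name = escape_label_value(str(sensor.get("Name", "unknown")))
        lines.append(f'{metric}{{sensor="{name}"}} {float(reading)}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(thermal_to_prometheus(json.loads(SAMPLE_THERMAL_JSON)), end="")
```

In practice, a small exporter process would serve text like this over HTTP for Prometheus to scrape. Sensors without a reading are skipped rather than reported as zero, so a missing value is not mistaken for a real measurement.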

When monitoring your system, it is important to track a range of metrics to gain a comprehensive understanding of its health. External metrics, such as customer workload and demand, should be tracked because they help businesses better understand their customers' needs. Internal system metrics such as CPU, memory, networking, and processes should also be monitored to identify potential issues and optimize system performance. You may find it helpful to apply the USE Method (utilization, saturation, and errors), as "... it directs the construction of a checklist, which for server analysis can be used for quickly identifying resource bottlenecks or errors." Read more about it here: https://www.brendangregg.com/usemethod.html.
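As a rough illustration of the checklist idea behind the USE Method, the sketch below checks one resource sample for utilization, saturation, and errors. The `ResourceSample` type, the `use_findings` helper, and the threshold values are all hypothetical and chosen only for the example; suitable thresholds depend on the resource and the workload.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One observation of a resource over a sampling interval."""
    name: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    saturation: float   # queued or waiting work, e.g. run-queue length
    errors: int         # error events counted in the interval


def use_findings(sample: ResourceSample,
                 max_utilization: float = 0.8,   # assumed threshold
                 max_saturation: float = 0.0) -> list:
    """Return one finding per USE check that the sample fails."""
    findings = []
    if sample.utilization > max_utilization:
        findings.append(
            f"{sample.name}: utilization {sample.utilization:.0%} "
            f"exceeds {max_utilization:.0%}"
        )
    if sample.saturation > max_saturation:
        findings.append(f"{sample.name}: saturated ({sample.saturation} waiting)")
    if sample.errors > 0:
        findings.append(f"{sample.name}: {sample.errors} error(s) reported")
    return findings


if __name__ == "__main__":
    samples = [
        ResourceSample("cpu", utilization=0.93, saturation=4.0, errors=0),
        ResourceSample("memory", utilization=0.40, saturation=0.0, errors=0),
    ]
    for sample in samples:
        for finding in use_findings(sample):
            print(finding)
```

Running the same three checks against every resource is what turns the method into a checklist: a resource with no findings can be ruled out quickly, which narrows the search for a bottleneck.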

Prometheus and Grafana can be used to gather these metrics and help us gain valuable insights into the health of a system. To get started using Prometheus and Grafana, follow these steps:

Monitoring with Prometheus and Grafana

1. Install Prometheus on the system to be monitored. Optionally, you may also install Prometheus on a separate system to monitor multiple Prometheus instances at once using the federation feature.
2. Configure Prometheus to collect metrics from the system, such as CPU, memory, networking, and processes. You can also configure Prometheus to collect metrics from external sources, such as JSON APIs. The Prometheus documentation provides a list of available exporters that can be used to collect metrics and export them to Prometheus.
3. Install Grafana and connect it to Prometheus. Grafana can be installed on the same system as Prometheus or on a separate system. The data sources you configure in Grafana can include Prometheus, MySQL, PostgreSQL, and many others.
4. Once you have added a data source, you can create a dashboard to display the collected metrics.
5. Configure alerts in Prometheus and Grafana to be notified of potential issues.
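Once the steps above are complete, a quick way to confirm that Prometheus is collecting data is to run an instant query against its HTTP API, which is the same API Grafana uses as a data source. The sketch below builds a request for the `/api/v1/query` endpoint and parses an instant-vector response. The base URL `http://localhost:9090` and the sample response are assumptions for illustration, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Invented example of an instant-vector response for the query "up".
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
                "value": [1683600000.0, "1"],
            }
        ],
    },
}


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def parse_instant_vector(body: dict) -> dict:
    """Map each series' 'instance' label to its sample value as a float."""
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample value is a [timestamp, "value-as-string"] pair.
    return {s["metric"].get("instance", ""): float(s["value"][1]) for s in data["result"]}


def query_prometheus(base_url: str, promql: str, timeout: float = 5.0) -> dict:
    """Run an instant query against a live server (needs network access)."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))


if __name__ == "__main__":
    print(build_query_url("http://localhost:9090", "up"))
    print(parse_instant_vector(SAMPLE_RESPONSE))
```

The `up` metric is recorded by Prometheus for every scrape target, with a value of 1 when the last scrape succeeded, so it is a convenient first query after configuring a new target.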

Gathering metrics and visualizing them in Grafana dashboards can help you find issues such as CPU overload or high memory usage. For example, if a system is experiencing high CPU usage, it may indicate that the system is running too many processes or that the CPU is not powerful enough to handle the workload. The monitoring software makes this condition obvious, and we can create alerts that fire if the average CPU usage remains too high over a long period of time. By identifying this issue early, steps can be taken to optimize the system's performance and prevent further issues from occurring.
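The idea of alerting only when the average stays high, rather than on a single spike, can be sketched as a sliding-window check. This is a simplified illustration, not how Prometheus or Grafana evaluate alerts internally; in those tools the same intent is normally expressed as an alert rule. The class name, the 90 percent threshold, and the three-sample window are hypothetical.

```python
from collections import deque


class SustainedAverageAlert:
    """Fire when the mean of the last `window` samples exceeds `threshold`."""

    def __init__(self, threshold: float, window: int):
        if window <= 0:
            raise ValueError("window must be a positive number of samples")
        self.threshold = threshold
        # A bounded deque drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record a sample and report whether the alert condition holds."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        average = sum(self.samples) / len(self.samples)
        # Require a full window so a single early spike cannot fire the alert.
        return window_full and average > self.threshold


if __name__ == "__main__":
    alert = SustainedAverageAlert(threshold=90.0, window=3)
    for cpu_percent in [95, 96, 50, 97, 98, 99]:
        print(cpu_percent, alert.observe(cpu_percent))
```

With these sample readings the alert fires only on the last value, when all three samples in the window are high. A longer window makes the alert less sensitive to short bursts at the cost of reacting more slowly.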

Prometheus and Grafana are powerful tools that can be used to monitor a wide range of metrics, enabling users to identify potential issues before they become major problems. By implementing monitoring practices using these tools, businesses can prevent costly consequences and improve efficiency in their operations.

At One Stop Systems, we believe that collaboration and partnerships are essential to solving tough issues in the computing industry. We invite businesses to collaborate with us to develop innovative solutions that address their unique challenges. If you are looking for a partner to help you tackle the challenges of monitoring your edge systems, reach out to us today. Let’s work together to keep your systems running smoothly and efficiently.

