
Monitoring Terminology and Metrics

Metrics

Monitoring uses a variety of metrics to track system health. This section covers each monitored resource, the units used to measure it, and how DigitalOcean Monitoring uses it.

CPU Utilization

CPU utilization measures how much of the available processing capacity is in use at a given time. It is expressed as a percentage.
On DigitalOcean, total use of all processors combined is indicated by 100%. This differs from some CPU usage tools which report 100% per CPU or core. For example, other tools might express metrics out of 200% on a machine with two CPUs, or 400% for a quad-core processor.
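To compare a figure from a per-core tool with the 100% scale described above, the per-core total can be divided by the number of cores. The following is a minimal sketch of that arithmetic; the function name `normalize_cpu_percent` is hypothetical and not part of any DigitalOcean tool.

```python
def normalize_cpu_percent(per_core_total: float, core_count: int) -> float:
    """Convert a per-core-summed CPU figure to a 0-100 scale.

    per_core_total: usage as reported by tools that count 100% per core
                    (for example, 150 out of a possible 200 on two cores).
    core_count:     number of CPUs or cores on the machine.
    """
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    return per_core_total / core_count


# 150% on a two-core machine is 75% of total capacity.
print(normalize_cpu_percent(150.0, 2))  # 75.0
# 100% on a quad-core machine (one core fully busy) is 25% of total capacity.
print(normalize_cpu_percent(100.0, 4))  # 25.0
```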
In the Droplet graphs, CPU usage is broken down into system time and user time, as Linux defines them. System time is time spent executing kernel-level instructions, while user time is time spent executing "userland" instructions, meaning anything that runs outside of the kernel.
DigitalOcean CPU graphs
Alert policies do not distinguish between user and system time.

Memory

Memory utilization is a measurement of the memory being consumed on the server. This is expressed as a percentage of the total available physical memory:
DigitalOcean memory graphs
DigitalOcean calculates memory consumption by evaluating memory information exposed in /proc/meminfo. Memory usage is calculated by subtracting free memory and memory used for caching from the total memory amount.
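The calculation can be sketched as follows. This is an illustration only: it assumes that "free memory" corresponds to the `MemFree` field and "memory used for caching" to the `Cached` field of /proc/meminfo, which the text above does not state explicitly. The function names and the sample values are made up for the example.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip blank or malformed lines
        values[key.strip()] = int(rest.split()[0])
    return values


def memory_used_percent(meminfo: dict) -> float:
    """Used memory = total - free - cached, as a percentage of total."""
    total = meminfo["MemTotal"]
    used = total - meminfo["MemFree"] - meminfo["Cached"]
    return 100.0 * used / total


# Sample input with invented numbers; on a real server you would read
# the text from /proc/meminfo instead.
SAMPLE = """MemTotal:        2048000 kB
MemFree:          512000 kB
Buffers:           64000 kB
Cached:           512000 kB
"""

print(memory_used_percent(parse_meminfo(SAMPLE)))  # 50.0
```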

Disk I/O

Disk I/O, or input/output, is a measure of how much read and write activity the server's disks are experiencing. This is expressed in terms of MB/s, or megabytes per second.
DigitalOcean breaks disk I/O down into read and write operations, which are handled separately. Droplet graphs show these as two separate lines within the Disk I/O graph:
DigitalOcean disk I/O graphs
Separate alert policies can be created to monitor disk read operations and disk write operations.
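A rate such as MB/s is typically derived from two readings of a cumulative byte counter taken some seconds apart. The sketch below shows that arithmetic under two assumptions that are not taken from the text above: that a megabyte means 1,000,000 bytes, and that reads and writes are each tracked by their own counter. The function name is hypothetical.

```python
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def throughput_mb_per_s(bytes_before: int, bytes_after: int, seconds: float) -> float:
    """Average throughput between two cumulative byte-counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_after - bytes_before) / seconds / BYTES_PER_MB


# Reads and writes are measured separately, one counter pair each.
read_rate = throughput_mb_per_s(0, 30_000_000, 10)
write_rate = throughput_mb_per_s(5_000_000, 10_000_000, 10)
print(read_rate)   # 3.0
print(write_rate)  # 0.5
```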

Disk Usage

Disk usage is a measurement of how much disk space is currently being used. This is expressed as a percentage of the total disk space available on the server.
This value takes into account the Droplet's root storage and any additional attached block storage devices. The values of each storage device are rolled up into a single value that represents the total storage space of the server:
DigitalOcean disk usage graphs
Alert policies are also interpreted in terms of total disk space.
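One way to picture the roll-up is to add the used space of every device and divide by the combined capacity, rather than averaging the per-device percentages. The sketch below assumes that interpretation; the function name and the device sizes are invented for the example.

```python
def rolled_up_disk_percent(devices) -> float:
    """Combine (used_bytes, total_bytes) pairs into one usage percentage."""
    devices = list(devices)
    used = sum(u for u, _ in devices)
    total = sum(t for _, t in devices)
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return 100.0 * used / total


GB = 10**9
devices = [
    (10 * GB, 25 * GB),   # root storage: 10 GB used of 25 GB
    (15 * GB, 100 * GB),  # attached block storage: 15 GB used of 100 GB
]
print(rolled_up_disk_percent(devices))  # 20.0
```

Note that the root disk alone is 40% full in this example, while the rolled-up value is 20%, so a nearly full small device can be masked by a large, mostly empty one.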

Bandwidth

Bandwidth is a measurement of the amount of incoming or outgoing traffic passing through the Droplet's network interfaces. This is expressed in terms of MBps, or megabytes per second.
In Droplet graphs, bandwidth is broken down between public and private traffic. Public bandwidth is bandwidth over the public interface that connects to the internet. Incoming traffic is represented by one line, and outgoing traffic by another.
DigitalOcean public bandwidth graphs
Private bandwidth is a measure of the traffic on the private interface that allows for communication within a data center. This graph will only be displayed if private networking is enabled and the interface has experienced traffic. Again, there are separate lines for incoming and outgoing traffic.
DigitalOcean private bandwidth graphs
In alert policies, there is no distinction between public and private interfaces, but the separation of inbound and outbound traffic remains. An alert policy can track either incoming traffic or outgoing traffic. Alert policies are also defined in terms of MBps.

Top Processes

DigitalOcean also reports the highest consumers of CPU and memory as a chart within Droplet graphs. The processes are sorted with the highest consumer of the selected resource first. Each process is accompanied by a usage percentage out of the total available resources.
The top CPU users:
DigitalOcean top CPU chart
The top memory users:
DigitalOcean top memory chart
Alert policies do not use these charts directly, but the charts can help identify which processes contributed to triggering an alert.

Terminology

When working with monitoring technology, some familiarity with common terminology is often helpful. Below, we will cover some of the most frequently used concepts that are relevant to DigitalOcean Monitoring:
  • Resource: In computing, a resource is a basic component with limited availability. Resources include CPU, memory, disk space, or available bandwidth.
  • Metric: In computing, a metric is a standard for measuring a computer resource. Metrics can either refer to the resource and unit with which to measure, or the data that is collected about that resource.
  • Units: Units are standard ways of comparing values.
  • Percentage units: Percentage units specify a value in relationship to the total available quantity, which is typically set at 100%. Percentages are useful for quantities with a known limit, like disk space.
  • Rate units: Rate units specify a value in relation to another measure (most frequently time). Rate units usually tell you frequency of occurrence over a set time period so that you can compare magnitude. Rate units are useful when there is no easy-to-understand upper boundary that indicates total use or when it is more helpful to examine usage, like incoming bandwidth.
  • Data point: A data point, or value, is a number and unit representing a single measurement.
  • Data set: A data set is a collection of related data points.
  • Time series data: Time series data is data collected at regular intervals and arranged chronologically in order to examine changes over time.
  • Trend: A trend indicates a general tendency in a data set over time. Trends are useful for recognizing changes and for predicting future behavior.
  • Monitoring: In computing, monitoring is the process of gathering and visualizing data to improve awareness of system health and minimize response time when usage is outside of expected levels.
  • System usage monitoring: System usage monitoring is a type of monitoring that involves tracking system resources.
  • Alerting: Alerting within a computer monitoring system is the ability to send notifications when certain metrics fall outside of expected ranges.
  • Threshold: In alerting, a threshold is a value that defines the boundary between normal and abnormal usage.
  • Alert interval: An alert interval is the period of time that average usage must exceed a threshold before triggering an alert.
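The threshold and alert interval definitions above can be combined into a small check: take the data points that fall inside the interval, average them, and compare the average with the threshold. This is a minimal sketch of that idea, not DigitalOcean's implementation; the function name, the sample values, and the one-minute sampling period are assumptions.

```python
from statistics import mean


def should_alert(samples, threshold, interval_seconds, now):
    """Return True if average usage over the alert interval exceeds the threshold.

    samples: list of (timestamp_seconds, value) data points (a time series).
    """
    window = [v for t, v in samples if now - interval_seconds < t <= now]
    if not window:
        return False  # no data in the interval, so nothing to alert on
    return mean(window) > threshold


# CPU usage in percent, sampled once per minute (invented values).
cpu = [(0, 60), (60, 85), (120, 90), (180, 95), (240, 80)]

# Five-minute interval: the average is 82, which exceeds an 80% threshold.
print(should_alert(cpu, threshold=80, interval_seconds=300, now=240))  # True
# The same data does not exceed a 90% threshold.
print(should_alert(cpu, threshold=90, interval_seconds=300, now=240))  # False
```

Averaging over the interval means that a single brief spike, such as the 95% reading here, does not trigger an alert on its own.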
