Compare commits: `954f607c5a` ... `main` (23 commits)

| SHA1 |
|---|
| `be3f8635b8` |
| `dce5a0ab85` |
| `9727d660d1` |
| `d828afdb53` |
| `11d497b4e3` |
| `5fdca2c30d` |
| `631fdd9389` |
| `902d60bbea` |
| `efc028c511` |
| `0e24bc09a0` |
| `1535c7c25b` |
| `ba2c552d24` |
| `0fa10c8e0b` |
| `7d8afee839` |
| `6254b1809a` |
| `a75840ea82` |
| `422353f640` |
| `78b7ea7146` |
| `030b8302c4` |
| `9de12e1b10` |
| `0fa8b49212` |
| `f24181ff79` |
| `d16b8ffa5a` |
.gitignore (vendored, 8 changes)
@@ -0,0 +1,8 @@
+.idea
+.venv
+/data/
+/configs/config.*
+/configs/metrics.*
+/configs/metrics_win.*
+/__pycache__/
+/metrics/__pycache__/
README.md (47 changes)
@@ -14,6 +14,8 @@ Create configurable lightweight application to collect some metrics
 - hot reload metrics if configuration changed
 - could be used as a [regular application](#StartRegular), as a [systemctl service](#StartService) or as a [Docker application](#StartDocker)
 - supports JSON, PROPERTIES and YAML configuration formats
+- [internal metrics](#Internal) to show how much time is spent updating every other metric
+- [metrics labels](#Labels) support

 ### 📌 Using
 _To use a specific configuration format, change the `app_config.CONFIG_FILE_NAME` variable. The default config file format is **JSON**_
@@ -55,7 +57,9 @@ There are some embedded metrics in the Exporter:
 - Chassis temperature
 - CPU temperature

-The Default Application config is (no metrics are configured):
+The default config is stored in the `./configs/config.json` file. To change it, change the `app_config.CONFIG_METRICS_FILE_NAME` variable.
+
+The Default Application config is (no custom metrics are configured):
 ```json
 {
   "monitor": {
@@ -78,6 +82,12 @@ The Default Application config is (no metrics are configured):
 - `name` - parameter used in every metric to identify it. **Required**.
 - `interval` - time interval in seconds at which the metric will be updated. **Required**.

+#### Metrics Labels<a id='Labels' />
+From version 2.0 the Application supports Labels. See [Metric Names](#MetricName) for details.
+
+#### Internal metrics<a id='Internal' />
+From version 2.0 the Application supports internal metrics that collect update times. See [Metric Names](#MetricName) for details.
+
 #### Disk (or mount point) Metrics<a id='DiscMetrics' />
 **_Monitors the Mount Point's sizes: `total`, `used`, `free` space in bytes_**
 ```json
@@ -171,6 +181,10 @@ The Default Application config is (no metrics are configured):
 - `result_path` - path to the result value in the response JSON, separated by the `app_config.RESPONSE_PATH_SEPARATOR` character. Could be configured in the [Application config](#AppConfig).
 - `timeout` - timeout to wait for a response

+#### REST value Binary Metrics
+**_Gets the response value from an http request to a REST service_**
+The same as [REST value Metrics] but works with 'ON/OFF' and 'TRUE/FALSE' values
+
 #### Shell value Metrics
 **_Gets the result value of an executed shell command_**
 ```json
@@ -185,11 +199,20 @@ The Default Application config is (no metrics are configured):
 - `args` - CLI arguments to be provided to the command
 In the example above the metric will return the integer value 3.

-<a id='MetricName' />**The metric name is built as follows:**
-- uses the Metric Prefix, currently `das_`
-- uses the `metric_text` given to every metric when it is created
-- uses the `instance_prefix` given in the metric configuration
-- uses the `name` given in the metric configuration
+<a id='MetricName' />**The metric names:**
+From version 2.0 the following metric names are used:
+- `das_collect_time_ms` - Total time spent collecting metric [name] on [server] in milliseconds; Labels: **name, server**
+- `das_disk_bytes` - Bytes (total, used, free) on [mount_point] for [server]; Labels: **name, mount, server, metric=(total|used|free)**
+- `das_service_health` - Service health; Labels: **name, url, method, server**
+- `das_rest_value` - Remote REST API Value; Labels: **name, url, method, server**
+- `das_shell_value` - Shell Value; Labels: **name, command, server**
+- `das_host_available` - Host availability; Labels: **name, ip, server**
+- `das_net_interface_bytes` - Network Interface bytes; Labels: **name, server, metric=(sent|receive)**
+- `das_exporter_uptime` - Exporter Uptime in seconds; Labels: **server**
+- `das_uptime_seconds` - System uptime; Labels: **server**
+- `das_cpu_percent` - CPU used percent; Labels: **server**
+- `das_memory_percent` - Memory used percent; Labels: **server**
+- `das_temperature` - Temperature overall; Labels: **server, metric=(CPU|Chassis)**
+**Note:** Prometheus does not allow duplicate metric names. If a duplicate occurs, an exception is raised and the application stops.

 ### 🚀 Launching the application
@@ -208,19 +231,19 @@ In application directory:
 python ./main.py
 ```

-##### System service (preferred option)<a id='StartService' />
+#### System service (preferred option)<a id='StartService' />
 Prepare the [dasExporter.service](dasExporter.service) file. Then launch the commands:
 ```shell
-sudo ln -s "$( cd -- $(dirname $0) >/dev/null 2>&1 ; pwd -P )/dasExporter.service" /etc/systemd/system/dasExporter.service
+sudo ln -s "$( pwd -P )/dasExporter.service" /etc/systemd/system/dasExporter.service
 sudo systemctl daemon-reload
 sudo systemctl start dasExporter
 sudo systemctl enable dasExporter
 ```
-To view the service status use `sudo systemctl status dasExporter`
-To restart the service use `sudo systemctl restart dasExporter`
-To stop the service use `sudo systemctl stop dasExporter`
+* `sudo systemctl status dasExporter` - to view the service status
+* `sudo systemctl restart dasExporter` - to restart the service
+* `sudo systemctl stop dasExporter` - to stop the service

-##### Docker application<a id='StartDocker' />
+#### Docker application<a id='StartDocker' />
 Use the provided [docker-compose.yaml](docker-compose.yaml) and [Dockerfile](Dockerfile) files to launch the Exporter in a Docker container.

 **_Make sure you provide all mounts you need monitored in the `volumes` section of the `docker-compose.yaml` file and make the corresponding changes in the [Disc Metrics Configuration](#DiscMetrics)._**
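For reference, the per-metric naming scheme that the labeled metrics replace can be sketched in plain Python. This is a hypothetical mirror of the removed naming logic described above (prefix, metric text, instance prefix, name), not code from the repository:

```python
METRIC_NAME_PREFIX = 'das_'

def old_metric_name(metric_text, name, instance_prefix=''):
    # Mirrors the pre-2.0 scheme: prefix + metric text +
    # optional instance prefix + the metric's configured name.
    return (METRIC_NAME_PREFIX + metric_text + '_' +
            (instance_prefix + '_' if instance_prefix else '') + name)

# One time series name per configured metric:
print(old_metric_name('disk_total_bytes', 'data', 'srv1'))  # das_disk_total_bytes_srv1_data
```

With labels, a single family such as `das_disk_bytes{name=..., server=...}` replaces all of these per-name series.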
@@ -1,5 +1,6 @@
 import os

+APP_VERSION="2.4"
 SCRIPT_PATH = os.path.dirname(__file__)
 CONFIGS_DIR = SCRIPT_PATH + "/configs"
 CONFIG_FILE_NAME = CONFIGS_DIR + "/config.json"
@@ -16,7 +16,7 @@ def read_config(name):
         raise Exception("Wrong file type")

 def read_json(name):
-    with open(name, 'r') as f:
+    with open(name, 'r', encoding='utf-8') as f:
         j_conf = json.load(f)
     conf = {}
     for key, value in j_conf.items():
@@ -25,7 +25,7 @@ def read_json(name):

 def read_prop(filepath, sep='=', comment_char='#'):
     conf = {}
-    with open(filepath, "rt") as f:
+    with open(filepath, "rt", encoding='utf-8') as f:
         for line in f:
             l = line.strip()
             if l and not l.startswith(comment_char):
@@ -35,9 +35,9 @@ def read_prop(filepath, sep='=', comment_char='#'):
                 conf[key] = value
     return conf

-def read_yaml(name): #ToDo: need to be tested!
+def read_yaml(name):
     conf = {}
-    with open(name, 'r') as f:
+    with open(name, 'r', encoding='utf-8') as f:
         y_conf = yaml.safe_load(f)
     for key, value in y_conf.items():
         conf[key] = value
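The PROPERTIES parsing shown in `read_prop` above can be sketched over an in-memory string instead of a file. `read_prop_lines` is a hypothetical stand-in with the same shape (strip each line, skip blanks and comments, split on the separator), not a function from the repository:

```python
def read_prop_lines(lines, sep='=', comment_char='#'):
    # Same shape as read_prop in the diff, but over an iterable of
    # strings so it can be shown without a file on disk.
    conf = {}
    for line in lines:
        l = line.strip()
        if l and not l.startswith(comment_char):
            key, _, value = l.partition(sep)
            conf[key.strip()] = value.strip()
    return conf

text = "# exporter settings\nmonitor.port = 9000\nmonitor.host = 0.0.0.0\n"
print(read_prop_lines(text.splitlines()))
# {'monitor.port': '9000', 'monitor.host': '0.0.0.0'}
```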
main.py (16 changes)
@@ -44,12 +44,13 @@ def parse_config(cfg):

 def init_metric_entities(data):
     return {
-        M.DiskMetric(data, app_config.INSTANCE_PREFIX),
-        M.HealthMetric(data, app_config.INSTANCE_PREFIX),
-        M.IcmpMetric(data, app_config.INSTANCE_PREFIX),
-        M.InterfaceMetric(data, app_config.INSTANCE_PREFIX),
-        M.RestValueMetric(data, app_config.INSTANCE_PREFIX),
-        M.ShellValueMetric(data, app_config.INSTANCE_PREFIX),
+        M.DiskMetric(data),
+        M.HealthMetric(data),
+        M.IcmpMetric(data),
+        M.InterfaceMetric(data),
+        M.RestValueMetric(data),
+        M.RestValueBMetric(data),
+        M.ShellValueMetric(data),
         M.UptimeMetric(app_config.UPTIME_UPDATE_SECONDS),
         M.SystemMetric(app_config.SYSTEM_UPDATE_SECONDS)
     }
@@ -59,6 +60,7 @@ def is_need_to_reload_config():

 def print_config_info_debug():
     print('-=: Debug Mode :=-')
+    print(f'\tAPP_VERSION={app_config.APP_VERSION}')
     print(f'\tSCRIPT_PATH={app_config.SCRIPT_PATH}')
     print(f'\tCONFIGS_DIR={app_config.CONFIGS_DIR}')
     print(f'\tCONFIG_FILE_NAME={app_config.CONFIG_FILE_NAME}')
@@ -74,7 +76,7 @@ def print_config_info_debug():
     print(f'\tIS_PRINT_INFO={app_config.IS_PRINT_INFO}')

 def main():
-    print("-=: Collector started :=-")
+    print(f'-=: Collector started (version {app_config.APP_VERSION}) :=-')
     if os.path.isfile(app_config.CONFIG_FILE_NAME):
         config = read_app_config()
         parse_config(config)
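The constructor change above (metric classes no longer take a `prefix` argument) relies on each class pulling its config section and instance prefix itself. A minimal sketch, with `instance_prefix` standing in for `app_config.INSTANCE_PREFIX` and the class name hypothetical:

```python
class MetricSketch:
    # Hypothetical mirror of AbstractMetric.__init__ after the refactor:
    # each metric class selects its own section from the shared config
    # dict by key, and the instance prefix comes from module-level
    # configuration instead of a constructor argument.
    def __init__(self, key, config, instance_prefix=''):
        self.metric_key = key
        self.prefix = instance_prefix
        self.config = config[key] if key and key in config else []
        self.data_array = []

cfg = {'disk': [{'path': '/', 'interval': 60, 'name': 'root'}]}
disk = MetricSketch('disk', cfg, 'srv1')
print(disk.config[0]['name'])  # root
```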
@@ -11,32 +11,57 @@ ENUM_UP_DN_STATES = ['up', 'dn']
 def get_metric(name):
     return REGISTRY._names_to_collectors.get(name)

-def get_gauge_metric(metric_name, descr):
+def get_gauge_metric(metric_name, descr, labels=None):
+    if labels is None:
+        labels = []
     metric = get_metric(metric_name)
     if metric is None:
-        metric = Gauge(metric_name, descr)
+        if labels:
+            metric = Gauge(metric_name, descr, labelnames=labels)
+        else:
+            metric = Gauge(metric_name, descr)
     return metric

-def get_counter_metric(metric_name, descr):
+def get_counter_metric(metric_name, descr, labels=None):
     metric = get_metric(metric_name)
     if metric is None:
-        metric = Counter(metric_name, descr)
+        if labels:
+            metric = Counter(metric_name, descr, labelnames=labels)
+        else:
+            metric = Counter(metric_name, descr)
     return metric

-def get_enum_metric(metric_name, descr, states):
+def get_enum_metric(metric_name, descr, states, labels=None):
     metric = get_metric(metric_name)
     if metric is None:
-        metric = Enum(metric_name, descr, states=states)
+        if labels:
+            metric = Enum(metric_name, descr, states=states, labelnames=labels)
+        else:
+            metric = Enum(metric_name, descr, states=states)
     return metric

+def get_time_millis():
+    return round(time.time() * 1000)


 class AbstractData:
     METRIC_NAME_PREFIX = 'das_'
+    g_collect: Gauge
     def __init__(self, name, interval, prefix=''):
         self.name = name
         self.interval = interval
         self.instance_prefix = prefix
         self.updated_at = int(time.time())
+        self.g_collect = get_gauge_metric('das_collect_time_ms',
+                                          'Total time spent collecting metrics [name] on [server] in milliseconds',
+                                          ['server', 'name'])
+        self.g_collect.labels(server=prefix, name=name)

     def set_update_time(self):
         self.updated_at = int(time.time())
@@ -44,11 +69,8 @@ class AbstractData:
     def is_need_to_update(self):
         return self.updated_at + self.interval <= int(time.time())

-    def get_metric_name(self, metric_text, name):
-        return (self.METRIC_NAME_PREFIX +
-                metric_text + '_' +
-                (self.instance_prefix + '_' if self.instance_prefix else '') +
-                name)
+    def set_collect_time(self, value=0):
+        self.g_collect.labels(server=self.instance_prefix, name=self.name).set(value)

     def print_trigger_info(self):
         if app_config.IS_PRINT_INFO:
@@ -56,24 +78,27 @@ class AbstractData:


 class DiskData(AbstractData):
-    g_total: Gauge
-    g_used: Gauge
-    g_free: Gauge
+    g_all: Gauge
     def __init__(self, mount_point='/', total=0, used=0, free=0, interval=60, name='', prefix=''):
         super().__init__(name, interval, prefix)
         self.mount_point = mount_point
         self.total = total
         self.used = used
         self.free = free
-        self.g_total = get_gauge_metric(self.get_metric_name('disk_total_bytes', name), 'Total bytes on disk')
-        self.g_used = get_gauge_metric(self.get_metric_name('disk_used_bytes', name), 'Used bytes on disk')
-        self.g_free = get_gauge_metric(self.get_metric_name('disk_free_bytes', name), 'Free bytes on disk')
+        self.g_all = get_gauge_metric('das_disk_bytes',
+                                      'Bytes [total, used, free] on [mount_point] for [server]',
+                                      ['name', 'mount', 'server', 'metric'])
+        self.g_all.labels(name=name, mount=mount_point, server=self.instance_prefix, metric='total')
+        self.g_all.labels(name=name, mount=mount_point, server=self.instance_prefix, metric='used')
+        self.g_all.labels(name=name, mount=mount_point, server=self.instance_prefix, metric='free')
         self.set_data(total, used, free)

     def set_data(self, total, used, free):
-        self.g_total.set(total)
-        self.g_used.set(used)
-        self.g_free.set(free)
+        time_ms = get_time_millis()
+        self.g_all.labels(name=self.name, mount=self.mount_point, server=self.instance_prefix, metric='total').set(total)
+        self.g_all.labels(name=self.name, mount=self.mount_point, server=self.instance_prefix, metric='used').set(used)
+        self.g_all.labels(name=self.name, mount=self.mount_point, server=self.instance_prefix, metric='free').set(free)
+        self.set_collect_time(get_time_millis() - time_ms)
         self.set_update_time()
         self.print_trigger_info()
@@ -91,15 +116,22 @@ class HealthData(AbstractData):
         self.user = user
         self.password = password
         self.headers = headers
-        metric_name = self.get_metric_name('service_health', name)
-        self.e_state = get_enum_metric(metric_name, 'Service health', ENUM_UP_DN_STATES)
-        self.set_status(is_up)
+        self.e_state = get_enum_metric('das_service_health',
+                                       'Service [name, url, method, server] health',
+                                       ENUM_UP_DN_STATES, ['name', 'url', 'method', 'server'])
+        self.e_state.labels(name=name, url=url, method=method, server=self.instance_prefix)
+        self.set_data(is_up)

-    def set_status(self, is_up):
+    def set_data(self, is_up, working_time=None):
+        time_ms = get_time_millis()
         self.is_up = is_up
-        self.e_state.state(ENUM_UP_DN_STATES[0] if is_up else ENUM_UP_DN_STATES[1])
+        self.e_state.labels(name=self.name, url=self.url, method=self.method, server=self.instance_prefix).state(ENUM_UP_DN_STATES[0] if is_up else ENUM_UP_DN_STATES[1])
         self.set_update_time()
         self.print_trigger_info()
+        if working_time:
+            self.set_collect_time(working_time)
+        else:
+            self.set_collect_time(get_time_millis() - time_ms)


 class RestValueData(AbstractData):
@@ -118,19 +150,64 @@ class RestValueData(AbstractData):
         self.value = value
         self.type = result_type
         self.path = result_path
-        metric_name = self.get_metric_name('rest_value', name)
-        self.g_value = get_gauge_metric(metric_name, 'Remote REST API Value ' + name)
-        self.set_value(value)
+        self.g_value = get_gauge_metric('das_rest_value',
+                                        'Remote REST API [name, url, method, server] Value',
+                                        ['name', 'url', 'method', 'server'])
+        self.g_value.labels(name=name, url=url, method=method, server=self.instance_prefix)
+        self.set_data(value)

-    def set_value(self, value):
+    def set_data(self, value, working_time=None):
+        time_ms = get_time_millis()
         self.value = value
         try:
-            self.g_value.set(int(value))
+            self.g_value.labels(name=self.name, url=self.url, method=self.method, server=self.instance_prefix).set(int(value))
         except:
-            self.g_value.set(0)
+            self.g_value.labels(name=self.name, url=self.url, method=self.method, server=self.instance_prefix).set(0)

         self.set_update_time()
         self.print_trigger_info()
+        if working_time:
+            self.set_collect_time(working_time)
+        else:
+            self.set_collect_time(get_time_millis() - time_ms)


+class RestValueBData(AbstractData):
+    g_value: Gauge
+    def __init__(self, name, url, interval, timeout, value=None, method='GET', user=None, password=None, headers=None, prefix='',
+                 result_type='single', result_path=''):
+        super().__init__(name, interval, prefix)
+        if headers is None:
+            headers = {}
+        self.url = url
+        self.timeout = timeout
+        self.method = method.upper()
+        self.user = user
+        self.password = password
+        self.headers = headers
+        self.value = value
+        self.type = result_type
+        self.path = result_path
+        self.g_value = get_gauge_metric('das_rest_value_b',
+                                        'Remote REST API [name, url, method, server] Binary Value',
+                                        ['name', 'url', 'method', 'server'])
+        self.g_value.labels(name=name, url=url, method=method, server=self.instance_prefix)
+        self.set_data(value)
+
+    def set_data(self, value, working_time=None):
+        time_ms = get_time_millis()
+        self.value = value
+        try:
+            self.g_value.labels(name=self.name, url=self.url, method=self.method, server=self.instance_prefix).set(1 if str(value).upper() in ['ON', 'TRUE'] else 0)
+        except:
+            self.g_value.labels(name=self.name, url=self.url, method=self.method, server=self.instance_prefix).set(-1)
+
+        self.set_update_time()
+        self.print_trigger_info()
+        if working_time:
+            self.set_collect_time(working_time)
+        else:
+            self.set_collect_time(get_time_millis() - time_ms)

 class ShellValueData(AbstractData):
@@ -142,19 +219,26 @@ class ShellValueData(AbstractData):
         self.command = command
         self.value = value
         self.args = args
-        metric_name = self.get_metric_name('shell_value', name)
-        self.g_value = get_gauge_metric(metric_name, 'Shell Value ' + name)
-        self.set_value(value)
+        self.g_value = get_gauge_metric('das_shell_value',
+                                        'Shell [name, command, server] Value',
+                                        ['name', 'command', 'server'])
+        self.g_value.labels(name=name, command=command, server=self.instance_prefix)
+        self.set_data(value)

-    def set_value(self, value):
+    def set_data(self, value, working_time=None):
+        time_ms = get_time_millis()
         self.value = value
         try:
-            self.g_value.set(int(value))
+            self.g_value.labels(name=self.name, command=self.command, server=self.instance_prefix).set(int(value))
         except:
-            self.g_value.set(0)
+            self.g_value.labels(name=self.name, command=self.command, server=self.instance_prefix).set(0)

         self.set_update_time()
         self.print_trigger_info()
+        if working_time:
+            self.set_collect_time(working_time)
+        else:
+            self.set_collect_time(get_time_millis() - time_ms)

 class IcmpData(AbstractData):
@@ -164,38 +248,47 @@ class IcmpData(AbstractData):
         self.ip = ip
         self.count = count
         self.is_up = is_up
-        metric_name = self.get_metric_name('host_available', name)
-        self.e_state = get_enum_metric(metric_name, 'Host availability', ENUM_UP_DN_STATES)
-        self.set_status(is_up)
+        self.e_state = get_enum_metric('das_host_available',
+                                       'Host [name, ip, server] availability',
+                                       ENUM_UP_DN_STATES, ['name', 'ip', 'server'])
+        self.e_state.labels(name=name, ip=ip, server=self.instance_prefix)
+        self.set_data(is_up)

-    def set_status(self, is_up):
+    def set_data(self, is_up, working_time=None):
+        time_ms = get_time_millis()
         self.is_up = is_up
-        self.e_state.state(ENUM_UP_DN_STATES[0] if is_up else ENUM_UP_DN_STATES[1])
+        self.e_state.labels(name=self.name, ip=self.ip, server=self.instance_prefix).state(ENUM_UP_DN_STATES[0] if is_up else ENUM_UP_DN_STATES[1])
         self.set_update_time()
         self.print_trigger_info()
+        if working_time:
+            self.set_collect_time(working_time)
+        else:
+            self.set_collect_time(get_time_millis() - time_ms)


 class InterfaceData(AbstractData):
-    g_sent: Counter
-    g_receive: Counter
+    g_all: Counter
     def __init__(self, name, iface, interval, sent, receive, prefix=''):
         super().__init__(name, interval, prefix)
         self.iface = iface
         self.sent = sent
         self.receive = receive
-        sent_metric_name = self.get_metric_name('net_interface_sent_bytes', name)
-        self.g_sent = get_counter_metric(sent_metric_name, 'Network Interface bytes sent')
-        receive_metric_name = self.get_metric_name('net_interface_receive_bytes', name)
-        self.g_receive = get_counter_metric(receive_metric_name, 'Network Interface bytes receive')
+        self.g_all = get_counter_metric('das_net_interface_bytes',
+                                        'Network Interface [name, server, metric=[sent,receive]] bytes',
+                                        ['name', 'server', 'metric'])
+        self.g_all.labels(name=name, server=self.instance_prefix, metric='sent')
+        self.g_all.labels(name=name, server=self.instance_prefix, metric='receive')
         self.set_data(sent, receive)

     def set_data(self, sent, receive):
+        time_ms = get_time_millis()
         sent_delta = sent - self.sent
         recv_delta = receive - self.receive
         self.sent = sent
         self.receive = receive
-        self.g_sent.inc(sent_delta)
-        self.g_receive.inc(recv_delta)
+        self.g_all.labels(name=self.name, server=self.instance_prefix, metric='sent').inc(sent_delta)
+        self.g_all.labels(name=self.name, server=self.instance_prefix, metric='receive').inc(recv_delta)
+        self.set_collect_time(get_time_millis() - time_ms)
         self.set_update_time()
         self.print_trigger_info()

@@ -206,14 +299,18 @@ class UptimeData(AbstractData):
     def __init__(self, interval, prefix=''):
         super().__init__('uptime', interval, prefix)
         self.uptime = 0
-        metric_name = self.get_metric_name('exporter', self.name)
-        self.c_uptime = get_counter_metric(metric_name, 'Exporter Uptime in seconds')
+        self.c_uptime = get_counter_metric('das_exporter_uptime',
+                                           'Exporter Uptime for [server] in seconds',
+                                           ['server'])
+        self.c_uptime.labels(server=self.instance_prefix)
         self.set_data()

     def set_data(self):
+        time_ms = get_time_millis()
         uptime = int(time.time()) - self.START_TIME
-        self.c_uptime.inc(uptime - self.uptime)
+        self.c_uptime.labels(server=self.instance_prefix).inc(uptime - self.uptime)
         self.uptime = uptime
+        self.set_collect_time(get_time_millis() - time_ms)
         self.set_update_time()
         self.print_trigger_info()

@@ -223,7 +320,7 @@ class SystemData(AbstractData):
     c_uptime: Counter
     g_cpu: Gauge
     g_memory: Gauge
-    g_chassis_temp: Gauge
-    g_cpu_temp: Gauge
+    g_tempr: Gauge
     def __init__(self, interval, prefix=''):
         super().__init__('system', interval, prefix)
@@ -232,23 +329,23 @@ class SystemData(AbstractData):
         self.set_data()

     def init_metrics(self):
-        uptime_metric_name = self.get_metric_name(self.name, 'uptime_seconds')
-        self.c_uptime = get_counter_metric(uptime_metric_name, 'System uptime')
-        cpu_metric_name = self.get_metric_name(self.name, 'cpu_percent')
-        self.g_cpu = get_gauge_metric(cpu_metric_name, 'CPU used percent')
-        mem_metric_name = self.get_metric_name(self.name, 'memory_percent')
-        self.g_memory = get_gauge_metric(mem_metric_name, 'Memory used percent')
-        chassis_temp_metric_name = self.get_metric_name(self.name, 'ChassisTemperature_current')
-        self.g_chassis_temp = get_gauge_metric(chassis_temp_metric_name, 'Current Chassis Temperature overall')
-        cpu_temp_metric_name = self.get_metric_name(self.name, 'CpuTemperature_current')
-        self.g_cpu_temp = get_gauge_metric(cpu_temp_metric_name, 'Current CPU Temperature overall')
+        self.c_uptime = get_counter_metric('das_uptime_seconds', 'System uptime on [server]', ['server'])
+        self.c_uptime.labels(server=self.instance_prefix)
+        self.g_cpu = get_gauge_metric('das_cpu_percent', 'CPU used percent on [server]', ['server'])
+        self.g_cpu.labels(server=self.instance_prefix)
+        self.g_memory = get_gauge_metric('das_memory_percent', 'Memory used percent on [server]', ['server'])
+        self.g_memory.labels(server=self.instance_prefix)
+        self.g_tempr = get_gauge_metric('das_temperature', 'Temperature of [type] overall on [server]', ['metric', 'server'])
+        self.g_tempr.labels(server=self.instance_prefix, metric='CPU')
+        self.g_tempr.labels(server=self.instance_prefix, metric='Chassis')

     def set_data(self):
+        time_ms = get_time_millis()
         uptime = int(time.time()) - self.BOOT_TIME
-        self.c_uptime.inc(uptime - self.uptime)
+        self.c_uptime.labels(server=self.instance_prefix).inc(uptime - self.uptime)
         self.uptime = uptime
         self.memory = psutil.virtual_memory().percent
-        self.g_memory.set(self.memory)
+        self.g_memory.labels(server=self.instance_prefix).set(self.memory)
         Thread(target=self.set_cpu_percent()).run()

         try:
@@ -272,20 +369,21 @@ class SystemData(AbstractData):
             else:
                 self.ch_temp = self.cpu_temp

-            self.g_chassis_temp.set(self.ch_temp)
-            self.g_cpu_temp.set(self.cpu_temp)
+            self.g_tempr.labels(server=self.instance_prefix, metric='Chassis').set(self.ch_temp)
+            self.g_tempr.labels(server=self.instance_prefix, metric='CPU').set(self.cpu_temp)
         except:
             self.ch_temp = -500
             self.cpu_temp = -500
-            self.g_chassis_temp.set(self.ch_temp)
-            self.g_cpu_temp.set(self.cpu_temp)
+            self.g_tempr.labels(server=self.instance_prefix, metric='Chassis').set(self.ch_temp)
+            self.g_tempr.labels(server=self.instance_prefix, metric='CPU').set(self.cpu_temp)

+        self.set_collect_time(get_time_millis() - time_ms)
         self.set_update_time()
         self.print_trigger_info()

     def set_cpu_percent(self):
         self.cpu = psutil.cpu_percent(1)
-        self.g_cpu.set(self.cpu)
+        self.g_cpu.labels(server=self.instance_prefix).set(self.cpu)


 if __name__ == '__main__':

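The `get_gauge_metric` / `get_counter_metric` / `get_enum_metric` helpers above all follow one pattern: look the name up in the registry first, construct only on a miss. A minimal sketch of that idea with a plain dict standing in for the Prometheus registry (names here are hypothetical, not from the repository):

```python
_registry = {}

def get_or_create_metric(name, factory):
    # Same idea as the helpers in the diff: returning the existing
    # collector for a known name (e.g. on config hot reload) avoids the
    # duplicate-name error Prometheus raises on re-registration.
    metric = _registry.get(name)
    if metric is None:
        metric = factory()
        _registry[name] = metric
    return metric

a = get_or_create_metric('das_cpu_percent', dict)
b = get_or_create_metric('das_cpu_percent', dict)
print(a is b)  # True
```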
@@ -12,13 +12,16 @@ import app_config

 from threading import Thread
 from metrics.DataStructures import DiskData, HealthData, IcmpData, ENUM_UP_DN_STATES, InterfaceData, UptimeData, \
-    SystemData, RestValueData, ShellValueData
+    SystemData, RestValueData, ShellValueData, RestValueBData


 class AbstractMetric:
     metric_key = ""
     config = {}
+    prefix = ""
     def __init__(self, key, config):
         self.metric_key = key
+        self.prefix = app_config.INSTANCE_PREFIX
         if key and key in config:
             self.config = config[key]
         self.data_array = []
@@ -33,6 +36,7 @@ class AbstractMetric:

 def is_health_check(url, timeout, method, user, pwd, headers, callback=None):
+    time_ms = get_time_millis()
     session = requests.Session()
     if user and pwd:
         session.auth = (user, pwd)
@@ -45,13 +49,15 @@ def is_health_check(url, timeout, method, user, pwd, headers, callback=None):
         )
         result = response.status_code == 200
         if callback is not None:
-            callback(result)
+            working_time = get_time_millis() - time_ms
+            callback(result, working_time)
         else:
             return result
     except (requests.ConnectTimeout, requests.exceptions.ConnectionError) as e:
         return False
 def get_rest_value(url, timeout, method, user, pwd, headers, callback=None, result_type='single', path=''):
+    time_ms = get_time_millis()
     session = requests.Session()
     if user and pwd:
         session.auth = (user, pwd)
@@ -64,10 +70,11 @@ def get_rest_value(url, timeout, method, user, pwd, headers, callback=None, resu
         )
         resp = json.loads(response.content.decode().replace("'", '"'))
         result = parse_response(resp, path)
-        if not result.isalnum():
+        if not str(result).isalnum():
             result = 0
         if callback is not None:
-            callback(result)
+            working_time = get_time_millis() - time_ms
+            callback(result, working_time)
         else:
             return result
     except (requests.ConnectTimeout, requests.exceptions.ConnectionError) as e:
@@ -95,6 +102,7 @@ def parse_response(resp, path):
         return ''
 def get_shell_value(command, args, callback=None):
+    time_ms = get_time_millis()
     cmd = [command, ' '.join(str(s) for s in args)]
     try:
         output = subprocess.check_output(cmd)
@@ -106,17 +114,26 @@ def get_shell_value(command, args, callback=None):
         result = 0

     if callback is not None:
-        callback(result)
+        working_time = get_time_millis() - time_ms
+        callback(result, working_time)
     else:
         return result

 def is_ping(ip, count, callback=None):
+    time_ms = get_time_millis()
     param = '-n' if platform.system().lower() == 'windows' else '-c'
     command = ['ping', param, str(count), ip]
-    output = subprocess.check_output(command)
-    result = 'unreachable' not in str(output) and 'could not find' not in str(output) and 'time out' not in str(output)
+    try:
+        output = subprocess.check_output(command)
+        result = ('unreachable'.upper() not in str(output).upper() and
+                  'could not find'.upper() not in str(output).upper() and
+                  'time out'.upper() not in str(output).upper())
+    except:
+        result = False

     if callback is not None:
-        callback(result)
+        working_time = get_time_millis() - time_ms
+        callback(result, working_time)
     else:
         return result
```diff
@@ -126,14 +143,16 @@ def get_net_iface_stat(name):
 def get_next_update_time(d):
     return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(d.updated_at + d.interval))
 
+def get_time_millis():
+    return round(time.time() * 1000)
+
 class DiskMetric(AbstractMetric):
-    def __init__(self, config, prefix=''):
+    def __init__(self, config):
         super().__init__('disk', config)
         for d in self.config:
             mount_point, interval, name = d['path'], d['interval'], d['name']
             total, used, free = shutil.disk_usage(mount_point)
-            self.data_array.append(DiskData(mount_point, total, used, free, interval, name, prefix))
+            self.data_array.append(DiskData(mount_point, total, used, free, interval, name, self.prefix))
 
     def proceed_metric(self):
         for d in self.data_array:
```
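`get_next_update_time` above formats the moment a metric next becomes due as `updated_at + interval` in local time. A quick sketch with a stand-in data object (`SimpleNamespace` is only used here for illustration; the exporter passes its own data classes):

```python
import time
from types import SimpleNamespace


def get_next_update_time(d):
    # Same formatting as in the diff: local time of the next refresh.
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(d.updated_at + d.interval))


# A metric updated "now" with a 30-second interval is due 30 seconds from now.
d = SimpleNamespace(updated_at=time.time(), interval=30)
print(get_next_update_time(d))
```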
```diff
@@ -148,7 +167,7 @@ class DiskMetric(AbstractMetric):
 
 
 class HealthMetric(AbstractMetric):
-    def __init__(self, config, prefix=''):
+    def __init__(self, config):
         super().__init__('health', config)
         for d in self.config:
             name, url, interval, timeout, method = d['name'], d['url'], d['interval'], d['timeout'], d['method']
```
```diff
@@ -163,12 +182,12 @@ class HealthMetric(AbstractMetric):
             else:
                 headers = ''
             result = is_health_check(url, timeout, method, user, pwd, headers)
-            self.data_array.append(HealthData(name, url, interval, timeout, result, method, user, pwd, headers, prefix))
+            self.data_array.append(HealthData(name, url, interval, timeout, result, method, user, pwd, headers, self.prefix))
 
     def proceed_metric(self):
         for d in self.data_array:
             if d.is_need_to_update():
-                thread = Thread(target=is_health_check, args=(d.url, d.timeout, d.method, d.user, d.password, d.headers, d.set_status))
+                thread = Thread(target=is_health_check, args=(d.url, d.timeout, d.method, d.user, d.password, d.headers, d.set_data))
                 thread.start()
 
     def print_debug_info(self):
```
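`HealthMetric` above schedules `is_health_check` on a background thread, now reporting through the renamed `set_data` callback. The exporter's real `is_health_check` also handles basic auth and custom headers; a minimal standard-library stand-in (not the project's actual implementation) might look like this:

```python
import urllib.error
import urllib.request


def is_health_check(url, timeout=5, method='GET'):
    # True when the endpoint answers with a non-error HTTP status,
    # False on any connection, DNS, or timeout failure.
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        return False


# The .invalid TLD is reserved, so this lookup always fails.
print(is_health_check('http://nonexistent.invalid/', timeout=1))  # → False
```

Returning a plain boolean keeps the threaded callback trivial: the thread target computes the value and the callback stores it.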
```diff
@@ -177,17 +196,17 @@ class HealthMetric(AbstractMetric):
 
 
 class IcmpMetric(AbstractMetric):
-    def __init__(self, config, prefix=''):
+    def __init__(self, config):
         super().__init__('ping', config)
         for d in self.config:
             name, ip, count, interval = d['name'], d['ip'], d['count'], d['interval']
             result = is_ping(ip, count)
-            self.data_array.append(IcmpData(name, ip, count, interval, result, prefix))
+            self.data_array.append(IcmpData(name, ip, count, interval, result, self.prefix))
 
     def proceed_metric(self):
         for d in self.data_array:
             if d.is_need_to_update():
-                thread = Thread(target=is_ping, args=(d.ip, d.count, d.set_status))
+                thread = Thread(target=is_ping, args=(d.ip, d.count, d.set_data))
                 thread.start()
 
     def print_debug_info(self):
```
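`IcmpMetric` schedules `is_ping`, whose command line (from the earlier hunk) is platform-dependent: Windows `ping` takes the repeat count as `-n`, Unix-like systems as `-c`. A sketch of just the command construction, isolated so it can be checked without sending any packets:

```python
import platform


def build_ping_command(ip, count):
    # Windows ping takes the repeat count as -n; Unix-like systems use -c.
    param = '-n' if platform.system().lower() == 'windows' else '-c'
    return ['ping', param, str(count), ip]


print(build_ping_command('127.0.0.1', 2))
```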
```diff
@@ -196,12 +215,12 @@ class IcmpMetric(AbstractMetric):
 
 
 class InterfaceMetric(AbstractMetric):
-    def __init__(self, config, prefix=''):
+    def __init__(self, config):
         super().__init__('iface', config)
         for d in self.config:
             name, iface, interval = d['name'], d['iface'], d['interval']
             result = get_net_iface_stat(iface)
-            self.data_array.append(InterfaceData(name, iface, interval, result.bytes_sent, result.bytes_recv, prefix))
+            self.data_array.append(InterfaceData(name, iface, interval, result.bytes_sent, result.bytes_recv, self.prefix))
 
     def proceed_metric(self):
         for d in self.data_array:
```
```diff
@@ -215,7 +234,7 @@ class InterfaceMetric(AbstractMetric):
 
 
 class RestValueMetric(AbstractMetric):
-    def __init__(self, config, prefix=''):
+    def __init__(self, config):
         super().__init__('rest_value', config)
         for d in self.config:
             name, url, interval, timeout, method = d['name'], d['url'], d['interval'], d['timeout'], d['method']
```
```diff
@@ -232,13 +251,45 @@ class RestValueMetric(AbstractMetric):
             result_type, result_path = d['result_type'], d['result_path']
             result = get_rest_value(url=url, timeout=timeout, method=method, user=user, pwd=pwd, headers=headers,
                                     result_type=result_type, path=result_path)
-            self.data_array.append(RestValueData(name, url, interval, timeout, result, method, user, pwd, headers, prefix, result_type, result_path))
+            self.data_array.append(RestValueData(name, url, interval, timeout, result, method, user, pwd, headers, self.prefix, result_type, result_path))
 
     def proceed_metric(self):
         for d in self.data_array:
             if d.is_need_to_update():
                 thread = Thread(target=get_rest_value, args=(d.url, d.timeout, d.method, d.user, d.password, d.headers,
-                                                             d.set_value, d.type, d.path))
+                                                             d.set_data, d.type, d.path))
                 thread.start()
 
     def print_debug_info(self):
         for d in self.data_array:
             print(f'[DEBUG] (next update at {get_next_update_time(d)}) on {d.url}: by {d.method} in {d.path} got value="{d.value}"')
+
+
+class RestValueBMetric(AbstractMetric):
+    def __init__(self, config):
+        super().__init__('rest_value_b', config)
+        for d in self.config:
+            name, url, interval, timeout, method = d['name'], d['url'], d['interval'], d['timeout'], d['method']
+            if 'auth' in self.config:
+                user = d['auth']['user']
+                pwd = d['auth']['pass']
+            else:
+                user = ''
+                pwd = ''
+            if 'headers' in self.config:
+                headers = d['headers']
+            else:
+                headers = ''
+            result_type, result_path = d['result_type'], d['result_path']
+            result = get_rest_value(url=url, timeout=timeout, method=method, user=user, pwd=pwd, headers=headers,
+                                    result_type=result_type, path=result_path)
+            self.data_array.append(RestValueBData(name, url, interval, timeout, result, method, user, pwd, headers, self.prefix, result_type, result_path))
+
+    def proceed_metric(self):
+        for d in self.data_array:
+            if d.is_need_to_update():
+                thread = Thread(target=get_rest_value, args=(d.url, d.timeout, d.method, d.user, d.password, d.headers,
+                                                             d.set_data, d.type, d.path))
+                thread.start()
+
+    def print_debug_info(self):
```
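The new `RestValueBMetric` reads optional `auth` and `headers` blocks, defaulting to empty values when they are absent (note that the committed code tests membership on `self.config` while the values are read from the entry `d`, which may be unintentional). A standalone sketch of per-entry optional-key handling via `dict.get` — the sample entry below is hypothetical, not taken from the project's configs:

```python
def extract_auth_and_headers(d):
    # Pull optional credentials and headers from a single config entry,
    # falling back to empty strings when the keys are missing.
    auth = d.get('auth', {})
    user = auth.get('user', '')
    pwd = auth.get('pass', '')
    headers = d.get('headers', '')
    return user, pwd, headers


entry = {'name': 'demo', 'auth': {'user': 'u', 'pass': 'p'}}  # hypothetical entry
print(extract_auth_and_headers(entry))
```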
```diff
@@ -247,17 +298,17 @@ class RestValueMetric(AbstractMetric):
 
 
 class ShellValueMetric(AbstractMetric):
-    def __init__(self, config, prefix=''):
+    def __init__(self, config):
         super().__init__('shell_value', config)
         for d in self.config:
             name, command, interval, args = d['name'], d['command'], d['interval'], d['args']
             result = get_shell_value(command, args)
-            self.data_array.append(ShellValueData(name, interval, command, result, args, prefix))
+            self.data_array.append(ShellValueData(name, interval, command, result, args, self.prefix))
 
     def proceed_metric(self):
         for d in self.data_array:
             if d.is_need_to_update():
-                thread = Thread(target=get_shell_value, args=(d.command, d.args, d.set_value))
+                thread = Thread(target=get_shell_value, args=(d.command, d.args, d.set_data))
                 thread.start()
 
     def print_debug_info(self):
```
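`ShellValueMetric` wraps `get_shell_value`, which shells out and parses a number from stdout. The exporter's version is shown only partially in the hunks above; a simplified, self-contained sketch of the same idea (using the Python interpreter as the child process so the example does not depend on any particular shell tool):

```python
import subprocess
import sys


def get_shell_value(command, args):
    # Run the command with its arguments and parse stdout as a number;
    # fall back to 0 if the command fails or prints something non-numeric.
    try:
        output = subprocess.check_output([command] + args, text=True)
        return float(output.strip())
    except (subprocess.SubprocessError, ValueError, OSError):
        return 0


value = get_shell_value(sys.executable, ['-c', 'print(6 * 7)'])
print(value)  # → 42.0
```

Falling back to a sentinel instead of raising keeps one misbehaving command from killing the update thread.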
```diff
@@ -268,7 +319,7 @@ class ShellValueMetric(AbstractMetric):
 class UptimeMetric(AbstractMetric):
     def __init__(self, interval):
         super().__init__(None, {})
-        self.data_array.append(UptimeData(interval))
+        self.data_array.append(UptimeData(interval, self.prefix))
 
     def proceed_metric(self):
         for d in self.data_array:
```
```diff
@@ -283,7 +334,7 @@ class UptimeMetric(AbstractMetric):
 class SystemMetric(AbstractMetric):
     def __init__(self, interval):
         super().__init__(None, {})
-        self.data_array.append(SystemData(interval, app_config.INSTANCE_PREFIX))
+        self.data_array.append(SystemData(interval, self.prefix))
 
     def proceed_metric(self):
         for d in self.data_array:
```