Skywalking-04:扩展Metric监控信息

Skywalking-04:扩展Metric监控信息

扩展 Metric 监控信息

官方文档

Source and Scope extension for new metrics

案例:JVM Thread 增加 Metrics

修改 Thread 的定义

apm-protocol/apm-network/src/main/proto/language-agent/JVMMetric.proto 协议文件中覆盖 message Thread 的定义

message Thread {
  int64 liveCount = 1;
  int64 daemonCount = 2;
  int64 peakCount = 3;
  int64 deadlocked = 4;
  int64 monitorDeadlocked = 5;
  int64 newThreadCount = 7;
  int64 runnableThreadCount = 8;
  int64 blockedThreadCount = 9;
  int64 waitThreadCount = 10;
  int64 timeWaitThreadCount = 11;
  int64 terminatedThreadCount = 12;
}

重新构建 apm-network 项目

cd apm-protocol/apm-network
mvn clean package -DskipTests=true

PS:可以安装 Protocol Buffer Editor 插件,支持 Protocol Buffer 语法

修改 agent core 中 Thread Metrics 的提供类

直接使用如下代码覆盖 org.apache.skywalking.apm.agent.core.jvm.thread.ThreadProvider 类

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 */

package org.apache.skywalking.apm.agent.core.jvm.thread;

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Optional;

import org.apache.skywalking.apm.network.language.agent.v3.Thread;

public enum ThreadProvider {
    INSTANCE;
    private final ThreadMXBean threadMXBean;
    private static final long [] EMPTY_DEADLOCKED_THREADS = new long[0];

    ThreadProvider() {
        this.threadMXBean = ManagementFactory.getThreadMXBean();
    }

    public Thread getThreadMetrics() {
        int newThreadCount = 0;
        int runnableThreadCount = 0;
        int blockedThreadCount = 0;
        int waitThreadCount = 0;
        int timeWaitThreadCount = 0;
        int terminatedThreadCount = 0;

        // 基于线程状态信息增加对应状态的线程数
        ThreadInfo[] threadInfos = threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds());
        if (threadInfos != null) {
            for (ThreadInfo threadInfo : threadInfos) {
                if (threadInfo != null) {
                    switch (threadInfo.getThreadState()) {
                        case NEW:
                            newThreadCount++;
                            break;
                        case RUNNABLE:
                            runnableThreadCount++;
                            break;
                        case BLOCKED:
                            blockedThreadCount++;
                            break;
                        case WAITING:
                            waitThreadCount++;
                            break;
                        case TIMED_WAITING:
                            timeWaitThreadCount++;
                            break;
                        case TERMINATED:
                            terminatedThreadCount++;
                            break;
                        default:
                            break;
                    }
                } else {
                    /*
                     * If a thread of a given ID is not alive or does not exist,
                     * the corresponding element in the returned array will,
                     * contain null,because is mut exist ,so the thread is terminated
                     */
                    terminatedThreadCount++;
                }
            }
        }

        // 当前存活线程数
        int threadCount = threadMXBean.getThreadCount();
        // deamon线程数
        int daemonThreadCount = threadMXBean.getDaemonThreadCount();
        // 峰值线程数
        int peakThreadCount = threadMXBean.getPeakThreadCount();
        int deadlocked = Optional.ofNullable(threadMXBean.findDeadlockedThreads())
                .orElse(EMPTY_DEADLOCKED_THREADS).length;
        int monitorDeadlocked = Optional.ofNullable(threadMXBean.findMonitorDeadlockedThreads())
                .orElse(EMPTY_DEADLOCKED_THREADS).length;
        // 构建一个Thread对象,用于发送Thread Metric信息至OAP
        return Thread.newBuilder().setLiveCount(threadCount)
                .setDaemonCount(daemonThreadCount)
                .setPeakCount(peakThreadCount)
                .setDeadlocked(deadlocked)
                .setMonitorDeadlocked(monitorDeadlocked)
                .setNewThreadCount(newThreadCount)
                .setRunnableThreadCount(runnableThreadCount)
                .setBlockedThreadCount(blockedThreadCount)
                .setWaitThreadCount(waitThreadCount)
                .setTimeWaitThreadCount(timeWaitThreadCount)
                .setTerminatedThreadCount(terminatedThreadCount)
                .build();
    }

}

修改 ServiceInstanceJVMThread

直接使用如下代码覆盖 org.apache.skywalking.oap.server.core.source.ServiceInstanceJVMThread 类,
ServiceInstanceJVMThread继承了 Source 抽象类, Source 类是 Skywalking 中 oal 体系,资源及范围的定义。

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 */

package org.apache.skywalking.oap.server.core.source;

import lombok.Getter;
import lombok.Setter;

import static org.apache.skywalking.oap.server.core.source.DefaultScopeDefine.SERVICE_INSTANCE_CATALOG_NAME;
import static org.apache.skywalking.oap.server.core.source.DefaultScopeDefine.SERVICE_INSTANCE_JVM_THREAD;

@ScopeDeclaration(id = SERVICE_INSTANCE_JVM_THREAD, name = "ServiceInstanceJVMThread", catalog = SERVICE_INSTANCE_CATALOG_NAME)
@ScopeDefaultColumn.VirtualColumnDefinition(fieldName = "entityId", columnName = "entity_id", isID = true, type = String.class)
public class ServiceInstanceJVMThread extends Source {
    @Override
    public int scope() {
        return SERVICE_INSTANCE_JVM_THREAD;
    }

    @Override
    public String getEntityId() {
        return String.valueOf(id);
    }

    @Getter
    @Setter
    private String id;
    @Getter
    @Setter
    @ScopeDefaultColumn.DefinedByField(columnName = "name", requireDynamicActive = true)
    private String name;
    @Getter
    @Setter
    @ScopeDefaultColumn.DefinedByField(columnName = "service_name", requireDynamicActive = true)
    private String serviceName;
    @Getter
    @Setter
    @ScopeDefaultColumn.DefinedByField(columnName = "service_id")
    private String serviceId;
    @Getter
    @Setter
    private long liveCount;
    @Getter
    @Setter
    private long daemonCount;
    @Getter
    @Setter
    private long peakCount;
    @Getter
    @Setter
    private long deadlocked;
    @Getter
    @Setter
    private long monitorDeadlocked;
    @Getter
    @Setter
    private long newThreadCount;
    @Getter
    @Setter
    private long runnableThreadCount;
    @Getter
    @Setter
    private long blockedThreadCount;
    @Getter
    @Setter
    private long waitThreadCount;
    @Getter
    @Setter
    private long timeWaitThreadCount;
    @Getter
    @Setter
    private long terminatedThreadCount;
}

修改 JVMSourceDispatcher

org.apache.skywalking.oap.server.analyzer.provider.jvm.JVMSourceDispatcher 是一个 Source 分发类,将从 agent 接收到的 JVM 相关 Metrics 拆分成对应的 Source 。例如: ServiceInstanceJVMMemory 、 ServiceInstanceJVMThread 。
修改方法org.apache.skywalking.oap.server.analyzer.provider.jvm.JVMSourceDispatcher#sendToThreadMetricProcess 为如下代码:

    private void sendToThreadMetricProcess(String service,
            String serviceId,
            String serviceInstance,
            String serviceInstanceId,
            long timeBucket,
            Thread thread) {
        ServiceInstanceJVMThread serviceInstanceJVMThread = new ServiceInstanceJVMThread();
        serviceInstanceJVMThread.setId(serviceInstanceId);
        serviceInstanceJVMThread.setName(serviceInstance);
        serviceInstanceJVMThread.setServiceId(serviceId);
        serviceInstanceJVMThread.setServiceName(service);
        serviceInstanceJVMThread.setLiveCount(thread.getLiveCount());
        serviceInstanceJVMThread.setDaemonCount(thread.getDaemonCount());
        serviceInstanceJVMThread.setPeakCount(thread.getPeakCount());
        serviceInstanceJVMThread.setTimeBucket(timeBucket);
        serviceInstanceJVMThread.setDeadlocked(thread.getDeadlocked());
        serviceInstanceJVMThread.setMonitorDeadlocked(thread.getMonitorDeadlocked());
        serviceInstanceJVMThread.setNewThreadCount(thread.getNewThreadCount());
        serviceInstanceJVMThread.setRunnableThreadCount(thread.getRunnableThreadCount());
        serviceInstanceJVMThread.setBlockedThreadCount(thread.getBlockedThreadCount());
        serviceInstanceJVMThread.setWaitThreadCount(thread.getWaitThreadCount());
        serviceInstanceJVMThread.setTimeWaitThreadCount(thread.getTimeWaitThreadCount());
        serviceInstanceJVMThread.setTerminatedThreadCount(thread.getTerminatedThreadCount());
        sourceReceiver.receive(serviceInstanceJVMThread);
    }

java-agent.oal 增加相关指标

oap-server/server-bootstrap/src/main/resources/oal/java-agent.oal 添加如下语句

// 参考oal语法
instance_jvm_thread_deadlocked = from(ServiceInstanceJVMThread.deadlocked).longAvg();
instance_jvm_thread_monitor_deadlocked = from(ServiceInstanceJVMThread.monitorDeadlocked).longAvg();
instance_jvm_thread_new_thread_count = from(ServiceInstanceJVMThread.newThreadCount).longAvg();
instance_jvm_thread_runnable_thread_count = from(ServiceInstanceJVMThread.runnableThreadCount).longAvg();
instance_jvm_thread_blocked_thread_count = from(ServiceInstanceJVMThread.blockedThreadCount).longAvg();
instance_jvm_thread_wait_thread_count = from(ServiceInstanceJVMThread.waitThreadCount).longAvg();
instance_jvm_thread_time_wait_thread_count = from(ServiceInstanceJVMThread.timeWaitThreadCount).longAvg();
instance_jvm_thread_terminated_thread_count = from(ServiceInstanceJVMThread.terminatedThreadCount).longAvg();

修改 apm.yml

oap-server/server-bootstrap/src/main/resources/ui-initialized-templates/apm.yml 文件的 APM 面板下的 Instance 项增加如下配置

{
  "width": 3,
  "title": "JVM Thread Count (Java Service)",
  "height": "250",
  "entityType": "ServiceInstance",
  "independentSelector": false,
  "metricType": "REGULAR_VALUE",
  "queryMetricType": "readMetricsValues",
  "chartType": "ChartLine",
  "metricName": "instance_jvm_thread_live_count, instance_jvm_thread_daemon_count, instance_jvm_thread_peak_count,instance_jvm_thread_deadlocked,instance_jvm_thread_monitor_deadlocked"
},
{
  "width": 3,
  "title": "JVM Thread State Count (Java Service)",
  "height": "250",
  "entityType": "ServiceInstance",
  "independentSelector": false,
  "metricType": "REGULAR_VALUE",
  "metricName": "instance_jvm_thread_new_thread_count,instance_jvm_thread_runnable_thread_count,instance_jvm_thread_blocked_thread_count,instance_jvm_thread_wait_thread_count,instance_jvm_thread_time_wait_thread_count,instance_jvm_thread_terminated_thread_count",
  "queryMetricType": "readMetricsValues",
  "chartType": "ChartBar"
}

如果不清楚添加位置,可以直接使用如下配置,覆盖 oap-server/server-bootstrap/src/main/resources/ui-initialized-templates/apm.yml

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# UI templates initialized file includes the default template when the SkyWalking OAP starts up at the first time.
#
# Also, SkyWalking would detect the existing templates in the database, once they are missing, all templates in this file
# could be added automatically.

templates:
  - name: "APM"
    # The type includes DASHBOARD, TOPOLOGY_INSTANCE, TOPOLOGY_ENDPOINT.
    # DASHBOARD type templates could have multiple definitions, by using different names.
    # TOPOLOGY_INSTANCE, TOPOLOGY_ENDPOINT type templates should be defined once, as they are used in the topology page only.
    type: "DASHBOARD"
    # Configuration could be defined through UI, and use `export` to format in the standard JSON.
    configuration: |-
      [
        {
          "name": "APM",
          "type": "service",
          "children": [
            {
              "name": "Global",
              "children": [
                {
                  "width": 3,
                  "title": "Services Load",
                  "height": "300",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_cpm",
                  "queryMetricType": "sortMetrics",
                  "chartType": "ChartSlow",
                  "parentService": false,
                  "unit": "CPM - calls per minute"
                },
                {
                  "width": 3,
                  "title": "Slow Services",
                  "height": "300",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_resp_time",
                  "queryMetricType": "sortMetrics",
                  "chartType": "ChartSlow",
                  "parentService": false,
                  "unit": "ms"
                },
                {
                  "width": 3,
                  "title": "Un-Health Services (Apdex)",
                  "height": "300",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_apdex",
                  "queryMetricType": "sortMetrics",
                  "chartType": "ChartSlow",
                  "parentService": false,
                  "aggregation": "/",
                  "aggregationNum": "10000",
                  "sortOrder": "ASC"
                },
                {
                  "width": 3,
                  "title": "Slow Endpoints",
                  "height": "300",
                  "entityType": "Endpoint",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "endpoint_avg",
                  "queryMetricType": "sortMetrics",
                  "chartType": "ChartSlow",
                  "parentService": false,
                  "unit": "ms"
                },
                {
                  "width": "6",
                  "title": "Global Response Latency",
                  "height": "280",
                  "entityType": "All",
                  "independentSelector": false,
                  "metricType": "LABELED_VALUE",
                  "metricName": "all_percentile",
                  "queryMetricType": "readLabeledMetricsValues",
                  "chartType": "ChartLine",
                  "metricLabels": "P50, P75, P90, P95, P99",
                  "labelsIndex": "0, 1, 2, 3, 4",
                  "unit": "percentile in ms"
                },
                {
                  "width": "6",
                  "title": "Global Heatmap",
                  "height": "280",
                  "entityType": "All",
                  "independentSelector": false,
                  "metricType": "HEATMAP",
                  "unit": "ms",
                  "queryMetricType": "readHeatMap",
                  "chartType": "ChartHeatmap",
                  "metricName": "all_heatmap"
                }
              ]
            },
            {
              "name": "Service",
              "children": [
                {
                  "width": 3,
                  "title": "Service Apdex",
                  "height": "200",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_apdex",
                  "queryMetricType": "readMetricsValue",
                  "chartType": "ChartNum",
                  "aggregation": "/",
                  "aggregationNum": "10000"
                },
                {
                  "width": 3,
                  "title": "Service Avg Response Time",
                  "height": "200",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_resp_time",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "ms"
                },
                {
                  "width": 3,
                  "title": "Successful Rate",
                  "height": "200",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_sla",
                  "queryMetricType": "readMetricsValue",
                  "chartType": "ChartNum",
                  "unit": "%",
                  "aggregation": "/",
                  "aggregationNum": "100"
                },
                {
                  "width": 3,
                  "title": "Service Load",
                  "height": "200",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_cpm",
                  "queryMetricType": "readMetricsValue",
                  "chartType": "ChartNum",
                  "unit": "CPM - calls per minute"
                },
                {
                  "width": 3,
                  "title": "Service Apdex",
                  "height": "200",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_apdex",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "aggregation": "/",
                  "aggregationNum": "10000"
                },
                {
                  "width": 3,
                  "title": "Service Response Time Percentile",
                  "height": "200",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "LABELED_VALUE",
                  "metricName": "service_percentile",
                  "queryMetricType": "readLabeledMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "ms",
                  "metricLabels": "P50, P75, P90, P95, P99",
                  "labelsIndex": "0, 1, 2, 3, 4"
                },
                {
                  "width": 3,
                  "title": "Successful Rate",
                  "height": "200",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_sla",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "%",
                  "aggregation": "/",
                  "aggregationNum": "100"
                },
                {
                  "width": 3,
                  "title": "Service Load",
                  "height": "200",
                  "entityType": "Service",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_cpm",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "CPM - calls per minute"
                },
                {
                  "width": "4",
                  "title": "Service Instances Load",
                  "height": "280",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_instance_cpm",
                  "queryMetricType": "sortMetrics",
                  "chartType": "ChartSlow",
                  "parentService": true,
                  "unit": "CPM - calls per minute"
                },
                {
                  "width": "4",
                  "title": "Slow Service Instance",
                  "height": "280",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_instance_resp_time",
                  "queryMetricType": "sortMetrics",
                  "chartType": "ChartSlow",
                  "parentService": true,
                  "unit": "ms"
                },
                {
                  "width": "4",
                  "title": "Service Instance Successful Rate",
                  "height": "280",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_instance_sla",
                  "queryMetricType": "sortMetrics",
                  "chartType": "ChartSlow",
                  "parentService": true,
                  "unit": "%",
                  "aggregation": "/",
                  "aggregationNum": "100",
                  "sortOrder": "ASC"
                }
              ]
            },
            {
              "name": "Instance",
              "children": [
                {
                  "width": "3",
                  "title": "Service Instance Load",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_instance_cpm",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "CPM - calls per minute"
                },
                {
                  "width": 3,
                  "title": "Service Instance Throughput",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_instance_throughput_received,service_instance_throughput_sent",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "Bytes"
                },
                {
                  "width": "3",
                  "title": "Service Instance Successful Rate",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_instance_sla",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "%",
                  "aggregation": "/",
                  "aggregationNum": "100"
                },
                {
                  "width": "3",
                  "title": "Service Instance Latency",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "service_instance_resp_time",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "ms"
                },
                {
                  "width": 3,
                  "title": "JVM CPU (Java Service)",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "instance_jvm_cpu",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "%",
                  "aggregation": "+",
                  "aggregationNum": ""
                },
                {
                  "width": 3,
                  "title": "JVM Memory (Java Service)",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "instance_jvm_memory_heap, instance_jvm_memory_heap_max,instance_jvm_memory_noheap, instance_jvm_memory_noheap_max",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "MB",
                  "aggregation": "/",
                  "aggregationNum": "1048576"
                },
                {
                  "width": 3,
                  "title": "JVM GC Time",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "instance_jvm_young_gc_time, instance_jvm_old_gc_time",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "ms"
                },
                {
                  "width": 3,
                  "title": "JVM GC Count",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartBar",
                  "metricName": "instance_jvm_young_gc_count, instance_jvm_old_gc_count"
                },
                {
                  "width": 3,
                  "title": "JVM Thread Count (Java Service)",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "metricName": "instance_jvm_thread_live_count, instance_jvm_thread_daemon_count, instance_jvm_thread_peak_count,instance_jvm_thread_deadlocked,instance_jvm_thread_monitor_deadlocked"
                },
                {
                  "width": 3,
                  "title": "JVM Thread State Count (Java Service)",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "instance_jvm_thread_new_thread_count,instance_jvm_thread_runnable_thread_count,instance_jvm_thread_blocked_thread_count,instance_jvm_thread_wait_thread_count,instance_jvm_thread_time_wait_thread_count,instance_jvm_thread_terminated_thread_count",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartBar"
                },
                {
                  "width": 3,
                  "title": "CLR CPU  (.NET Service)",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "instance_clr_cpu",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "%"
                },
                {
                  "width": 3,
                  "title": "CLR GC (.NET Service)",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "instance_clr_gen0_collect_count, instance_clr_gen1_collect_count, instance_clr_gen2_collect_count",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartBar"
                },
                {
                  "width": 3,
                  "title": "CLR Heap Memory (.NET Service)",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "instance_clr_heap_memory",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "MB",
                  "aggregation": "/",
                  "aggregationNum": "1048576"
                },
                {
                  "width": 3,
                  "title": "CLR Thread (.NET Service)",
                  "height": "250",
                  "entityType": "ServiceInstance",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "metricName": "instance_clr_available_completion_port_threads,instance_clr_available_worker_threads,instance_clr_max_completion_port_threads,instance_clr_max_worker_threads"
                }
              ]
            },
            {
              "name": "Endpoint",
              "children": [
                {
                  "width": "4",
                  "title": "Endpoint Load in Current Service",
                  "height": "280",
                  "entityType": "Endpoint",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "endpoint_cpm",
                  "queryMetricType": "sortMetrics",
                  "chartType": "ChartSlow",
                  "parentService": true,
                  "unit": "CPM - calls per minute"
                },
                {
                  "width": "4",
                  "title": "Slow Endpoints in Current Service",
                  "height": "280",
                  "entityType": "Endpoint",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "queryMetricType": "sortMetrics",
                  "chartType": "ChartSlow",
                  "metricName": "endpoint_avg",
                  "unit": "ms",
                  "parentService": true
                },
                {
                  "width": "4",
                  "title": "Successful Rate in Current Service",
                  "height": "280",
                  "entityType": "Endpoint",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "endpoint_sla",
                  "queryMetricType": "sortMetrics",
                  "chartType": "ChartSlow",
                  "aggregation": "/",
                  "aggregationNum": "100",
                  "parentService": true,
                  "unit": "%",
                  "sortOrder": "ASC"
                },
                {
                  "width": 3,
                  "title": "Endpoint Load",
                  "height": 350,
                  "entityType": "Endpoint",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "endpoint_cpm",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine"
                },
                {
                  "width": 3,
                  "title": "Endpoint Avg Response Time",
                  "height": 350,
                  "entityType": "Endpoint",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "endpoint_avg",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "ms"
                },
                {
                  "width": 3,
                  "title": "Endpoint Response Time Percentile",
                  "height": 350,
                  "entityType": "Endpoint",
                  "independentSelector": false,
                  "metricType": "LABELED_VALUE",
                  "metricName": "endpoint_percentile",
                  "queryMetricType": "readLabeledMetricsValues",
                  "chartType": "ChartLine",
                  "metricLabels": "P50, P75, P90, P95, P99",
                  "labelsIndex": "0, 1, 2, 3, 4",
                  "unit": "ms"
                },
                {
                  "width": 3,
                  "title": "Endpoint Successful Rate",
                  "height": 350,
                  "entityType": "Endpoint",
                  "independentSelector": false,
                  "metricType": "REGULAR_VALUE",
                  "metricName": "endpoint_sla",
                  "queryMetricType": "readMetricsValues",
                  "chartType": "ChartLine",
                  "unit": "%",
                  "aggregation": "/",
                  "aggregationNum": "100"
                }
              ]
            }
          ]
        }
      ]
    # Activated as the DASHBOARD type, makes this templates added into the UI page automatically.
    # False means providing a basic template, user needs to add it manually.
    activated: true
    # True means wouldn"t show up on the dashboard. Only keeps the definition in the storage.
    disabled: false

效果展示

file

file

代码贡献

  • Add some new thread metric and class metric to JVMMetric #7230
  • add some new thread metric and class metric to JVMMetric #52
  • Remove Terminated State and New State in JVMMetric (#7230) #53
  • Add some new thread metric and class metric to JVMMetric (#7230) #7243

总结

Metric 如何扩展,网上基本上没案例,都是看官方文档和源码了解。对应这种非常热门的开源项目,还是看官方文档和源码更稳。

参考文档

  • Java ManagementFactory解析
  • 编程中使用ThreadMXBean类来检测死锁
  • Source and Scope extension for new metrics
  • Observability Analysis Language

分享并记录所学所见

hmoban主题是根据ripro二开的主题,极致后台体验,无插件,集成会员系统
自学咖网 » Skywalking-04:扩展Metric监控信息