How to Monitor a Teamscale Instance
To check and monitor the state of a Teamscale instance, the following options are available:
Monitoring via Web UI
Teamscale provides many helpful metrics via the System Information view in the System perspective. In addition, the logs (e.g., Worker Log) can be helpful for diagnosis.
API Endpoint for Nagios
The URLs api/health-check and api/health-metrics provide check results and metrics in the Nagios format. They can be used with Nagios or compatible solutions, such as Sensu, to monitor the current health status of Teamscale. To integrate on the command line, see the monitoring directory in your Teamscale distribution.
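As an illustration of what consuming such output could look like, the following minimal Python sketch parses a single line in the general Nagios plugin output convention (status text, optionally followed by `|` and performance data). The sample line, the metric labels disk_free and workers, and the function name parse_nagios_line are hypothetical and chosen for illustration only; the exact text returned by api/health-check and api/health-metrics may differ, so treat this as a starting point rather than a description of Teamscale's actual response.

```python
import re

# Conventional Nagios plugin exit codes, keyed by the status word.
NAGIOS_STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

STATUS_RE = re.compile(r"\b(OK|WARNING|CRITICAL|UNKNOWN)\b")

# One perfdata item looks like: label=value[UOM];warn;crit;min;max
PERFDATA_RE = re.compile(
    r"(?P<label>'[^']+'|[^\s=]+)=(?P<value>-?[\d.]+)(?P<uom>[^\s;]*)"
)


def parse_nagios_line(line):
    """Split one Nagios-style output line into state, exit code, text, metrics."""
    text, _, perf = line.partition("|")
    match = STATUS_RE.search(text)
    # Fall back to UNKNOWN if no status word is present.
    state = match.group(1) if match else "UNKNOWN"
    metrics = {}
    for item in PERFDATA_RE.finditer(perf):
        label = item.group("label").strip("'")
        metrics[label] = (float(item.group("value")), item.group("uom"))
    return state, NAGIOS_STATES[state], text.strip(), metrics


# Hypothetical sample line, not actual Teamscale output.
sample = "OK - all checks passed | disk_free=12.5GB;10;5 workers=4;;;0"
state, exit_code, message, metrics = parse_nagios_line(sample)
```

The returned exit code follows the usual Nagios convention (0 for OK up to 3 for UNKNOWN), so a wrapper script could pass it straight to sys.exit when used as a check command.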
Exposed Metrics
General information about the instance
- State of the scheduler
- If the instance has run out of memory
- If the instance is running out of disk space
- If the Java VM is running out of memory
- If the license is valid or outdated
- If the server certificate is valid
- The version of Teamscale
- Number of workers
- The load of the workers
- Number of licensed users
- Number of users
- Number of active users in specific time frames
- Number of committers in specific time frames
- Number of projects
API Endpoint for Prometheus
The URL api/monitoring/prometheus exposes various metrics of Teamscale in the Prometheus format. Our reference documentation contains more information on how to enable and configure Teamscale for use with Prometheus.
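Normally a Prometheus server scrapes this endpoint directly. For ad hoc inspection, the minimal Python sketch below parses a response body in the Prometheus text exposition format into (name, labels, value) tuples. The metric names example_worker_count and example_project_files, as well as the function name parse_prometheus_text, are hypothetical placeholders and not the names Teamscale actually exposes. The parser is deliberately simplified and assumes, for example, that label values contain no closing brace.

```python
import re

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
)
LABEL_RE = re.compile(
    r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"'
)


def parse_prometheus_text(body):
    """Return a list of (metric name, labels dict, float value) tuples."""
    samples = []
    for line in body.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        labels = {
            lm.group("key"): lm.group("val")
            for lm in LABEL_RE.finditer(match.group("labels") or "")
        }
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical response body, not actual Teamscale output.
sample_body = """\
# HELP example_worker_count Hypothetical metric for illustration
# TYPE example_worker_count gauge
example_worker_count 4
example_project_files{project="demo"} 1280
"""
samples = parse_prometheus_text(sample_body)
```

For anything beyond a quick check, an established Prometheus client library or the Prometheus server itself is the more robust choice, since the full exposition format has more cases than this sketch handles.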
Exposed Metrics
General information about the instance
- Name of the instance
- Name of the process
- The version of Teamscale
- If the instance is in shadow mode
- State of the scheduler
- Number of workers
- The load of the workers
- The current runtime of each worker
- The size of the job queue
- Days left until license expires
- Number of licensed users
- Number of users
- Number of active users in specific time frames
- Number of committers in specific time frames
- Number of projects
- CPU load
- Number of logical CPU cores
- Size of the RAM
- Amount of used RAM
- Detailed state of the Java VM
  - Used RAM
  - Number of threads
  - Time spent in garbage collection
  - and more
- Statistics of the Internal String Abbreviator Cache
- Statistics on service calls
- Directory size of the working directory, storage directory, (Git) repository directory, and temp directory used by Teamscale
- Number of critical system events
Metrics for each project.
- Primary public ID of the project
- Analysis state of the project
- Number of connectors in specific states
- Number of files
- Number of lines of code
- Number of commits in specific states
- Number of the different kinds of log entries
- Number of rolled back commits within the last 24 hours
- Duration of rollback executions
- Number of currently postponed rollbacks
- Whether the project is paused
- Statistics on pre-commit usage
Storage performance metrics. These metrics are disabled by default because they are expensive to collect and are only useful for debugging. To enable these metrics, set the flag -Dcom.teamscale.storage-metrics.enabled=true. More information on how to set a flag can be found here.
- Number of opening operations for a store
- Number of storage operations
- Number of overall affected keys in storage operations
- Number of overall bytes in keys
- Number of overall bytes in values
- Duration in milliseconds of storage operations
Forwarding Teamscale Logs to Splunk
Teamscale uses the Log4J logging framework and provides support for forwarding the generated logs to a Splunk server. Log forwarding can be configured using the Splunk logging for Java integration. Teamscale fully supports HTTP Event Collector (recommended) and TCP data inputs. See the default log4j2.yaml Log4J configuration file in the Teamscale distribution ZIP for an example configuration. For further configuration options, refer to the official How to use Splunk logging for Java page. To reduce load on the Splunk server, consider adjusting the batch_interval, batch_size_bytes, or batch_size_count of the Log4J appender for Splunk so that logs are forwarded less frequently.