Task Manager health monitoring | Kibana Guide

Task Manager health monitoring

This functionality is in technical preview and may be changed or removed in a future release. Elastic will apply best effort to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

The Task Manager has an internal monitoring mechanism to keep track of a variety of metrics, which can be consumed with either the health monitoring API or the Kibana server log.

The health monitoring API provides a reliable endpoint that can be monitored. Querying this endpoint doesn’t trigger new health checks; it returns the latest health checks already made by the system, so external monitoring services can poll it at a regular cadence without adding load.

Each Kibana instance exposes its own endpoint at:

$ curl -X GET "<kibana host>:<port>/api/task_manager/_health"


Monitoring the _health endpoint of each Kibana instance in the cluster is the recommended way to ensure confidence in mission-critical services such as Alerting, Actions, and Reporting.

Configuring the monitored health statistics

The health monitoring API monitors the performance of Task Manager out of the box. However, certain performance considerations are deployment-specific, and you can configure them.

A health threshold is the threshold for failed task executions. Once a task type exceeds this threshold, a status of warn or error is set on the task type execution. To configure a health threshold, use the xpack.task_manager.monitored_task_execution_thresholds setting. You can apply this setting to all task types in the system, or to a custom task type.

By default, this setting marks the health of every task type as warning when it exceeds 80% failed executions, and as error at 90%. Set each threshold to a number between 0 and 100. A threshold is hit when the percentage of failed executions exceeds this number. To avoid a status of error, set the error threshold to 100. To hit error the moment any task fails, set the error threshold to 0.

Create a custom configuration to set lower thresholds for task types you consider critical, such as alerting tasks whose failures you want to detect sooner in an external monitoring service.

xpack.task_manager.monitored_task_execution_thresholds:
  default: 
    error_threshold: 70
    warn_threshold: 50
  custom:
    "alerting:.index-threshold": 
      error_threshold: 50
      warn_threshold: 0

In this example, the `default` block sets the system-wide `warn` threshold at a 50% failure rate and the `error` threshold at a 70% failure rate.
The `custom` block overrides these values for the `alerting:.index-threshold` task type only: `warn` at 0% (which sets a `warn` status the moment any task of that type fails) and `error` at a 50% failure rate.
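
The way these thresholds translate into a status can be sketched in a few lines of Python. This is only an illustration of the "exceeds the threshold" rule described above, not Kibana's implementation: the Thresholds class, the execution_status function, and the way failure counts are passed in are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    warn_threshold: float
    error_threshold: float


# Values copied from the example configuration; the Python layout is illustrative only.
DEFAULT = Thresholds(warn_threshold=50, error_threshold=70)
CUSTOM = {"alerting:.index-threshold": Thresholds(warn_threshold=0, error_threshold=50)}


def execution_status(task_type: str, failed: int, total: int) -> str:
    """Return "OK", "warn", or "error" for a task type's failure rate."""
    # Custom thresholds take precedence; every other task type uses the default.
    thresholds = CUSTOM.get(task_type, DEFAULT)
    failure_pct = 100.0 * failed / total if total else 0.0
    # A threshold is hit only when the failure rate strictly exceeds it.
    if failure_pct > thresholds.error_threshold:
        return "error"
    if failure_pct > thresholds.warn_threshold:
        return "warn"
    return "OK"
```

Because the comparison is strict, an error threshold of 100 can never be exceeded, and a threshold of 0 is exceeded by the first failure, which matches the behavior described for those two extremes.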

Consuming health stats

The health API is best consumed via the /api/task_manager/_health endpoint.

Additionally, there are two ways to consume these metrics:

Debug logging

The metrics are logged in the Kibana DEBUG logger at a regular cadence. To enable Task Manager debug logging in your Kibana instance, add the following to your kibana.yml:

logging:
  loggers:
      - context: plugins.taskManager
        appenders: [console]
        level: debug

These stats are logged at the interval, in milliseconds, set by your xpack.task_manager.poll_interval setting, which could add substantial noise to your logs. Only enable this level of logging temporarily.

Automatic logging

By default, the health API runs at a regular cadence, and each time it runs, it attempts to self-evaluate its performance. If this self-evaluation yields a potential problem, a message is written to the Kibana server log. In addition, the health API looks at how long tasks have waited to start, measured from when they were scheduled to start. If this delay exceeds a configurable threshold (xpack.task_manager.monitored_stats_health_verbose_log.warn_delayed_task_start_in_seconds), the same message is written to the Kibana server log.

This message looks like:

Detected potential performance issue with Task Manager. Set 'xpack.task_manager.monitored_stats_health_verbose_log.enabled: true' in your Kibana.yml to enable debug logging

If this message appears, set xpack.task_manager.monitored_stats_health_verbose_log.enabled to true in your kibana.yml. This will start logging the health metrics at either a warn or error log level, depending on the detected severity of the potential problem.
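
As a rough mental model, the choice of log level can be pictured as below. This is a minimal sketch under stated assumptions and does not reproduce Kibana's code: the function name verbose_log_level, the status strings, and the parameters are hypothetical, and warn_delay_s stands in for the value of warn_delayed_task_start_in_seconds.

```python
from typing import Optional


def verbose_log_level(status: str, start_delay_s: float, warn_delay_s: float) -> Optional[str]:
    """Pick a log level for the health metrics, or None when nothing looks wrong."""
    # An error-level self-evaluation takes priority over everything else.
    if status == "error":
        return "error"
    # A warn-level self-evaluation, or tasks starting later than the configured
    # delay threshold, is treated as a potential problem worth a warning.
    if status == "warn" or start_delay_s > warn_delay_s:
        return "warn"
    return None
```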

Making sense of Task Manager health stats

The health monitoring API exposes four sections: configuration, workload, runtime, and capacity estimation:

Configuration	This section summarizes the current configuration of Task Manager. This includes dynamic configurations that change over time, such as `poll_interval` and `max_workers`, which can adjust in reaction to changing load on the system.
Workload	This section summarizes the workload across the cluster, including the tasks in the system, their types, and current status.
Runtime	This section tracks the execution performance of Task Manager, including task drift, worker load, and execution stats broken down by type, such as duration and execution results.
Capacity Estimation	This section provides a rough estimate of whether Task Manager has sufficient capacity. As the name suggests, these are estimates based on historical data and should not be used as predictions. Use these estimations when following the Task Manager Scaling guidance.

Each section has a timestamp and a status that indicates when the last update to this section took place and whether the health of this section was evaluated as OK, Warning or Error.

The root status indicates the status of the system overall.

The Runtime status indicates whether task executions have exceeded any of the configured health thresholds. An OK status means none of the thresholds have been exceeded. A Warning status means that at least one warning threshold has been exceeded. An Error status means that at least one error threshold has been exceeded.

Some tasks (such as connectors) incorrectly report their status as successful even if the task failed. The success and failure data returned in the runtime and workload sections does not account for this.

To get a more accurate picture of action failures and successes, refer to the Event log index.

The Capacity Estimation status indicates the sufficiency of the observed capacity. An OK status means capacity is sufficient. A Warning status means that capacity is sufficient for the scheduled recurring tasks, but non-recurring tasks often cause the cluster to exceed capacity. An Error status means that there is insufficient capacity across all types of tasks.

By monitoring the status of the system overall, and the status of specific task types of interest, you can evaluate the health of the Kibana Task Management system.
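
An external monitor could reduce a health response to the statuses it cares about along these lines. This is a sketch only: the field names (status, stats, and the four section keys), the status strings, and the SAMPLE payload are assumptions made for illustration, so check them against a real response from your own instance before relying on them.

```python
SECTIONS = ("configuration", "workload", "runtime", "capacity_estimation")

# Hypothetical, heavily trimmed payload; a real response carries far more detail.
SAMPLE = {
    "status": "warn",
    "stats": {
        "configuration": {"status": "OK"},
        "workload": {"status": "OK"},
        "runtime": {"status": "warn"},
        "capacity_estimation": {"status": "OK"},
    },
}


def summarize_health(payload: dict) -> dict:
    """Collect the overall status and the status of each section."""
    stats = payload.get("stats", {})
    summary = {"overall": payload.get("status", "unknown")}
    for name in SECTIONS:
        # Missing sections are reported as "unknown" rather than raising.
        summary[name] = stats.get(name, {}).get("status", "unknown")
    return summary


def needs_attention(summary: dict) -> list:
    """Return the names whose status is anything other than OK."""
    return sorted(name for name, status in summary.items() if status.lower() != "ok")
```

Calling needs_attention(summarize_health(SAMPLE)) on the sample above flags the overall status and the runtime section, which is the kind of signal an alert in an external monitoring service could be built on.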
