Metrics

Prometheus metrics that can be used for monitoring and alerting.

Prometheus Metrics

Some Prow components expose Prometheus metrics that can be used for monitoring and alerting. The following table describes the metrics that are currently available.

Component	Type	Metric	Labels	Description
Tide	Gauge	`pooledprs`	org, repo, branch	The number of PRs in each Tide pool.
	Gauge	`updatetime`	org, repo, branch	The last time each Tide pool was synced.
	Gauge	`syncdur`		The Tide sync controller loop duration.
	Gauge	`statusupdatedur`		The Tide status controller loop duration.
	Histogram	`merges`	org, repo, branch	A histogram of the number of PRs in each merge.
	Counter	`tidepoolerrors`	org, repo, branch	Count of Tide pool sync errors.
	Counter	`tidequeryresults`	query_index, org_shard, result	Count of Tide queries by query index, org shard, and result (success/error).
	Counter	`tidesyncheartbeat`	controller	Count of Tide syncs per controller.
Hook	Counter	`prow_webhook_counter`	event_type	The number of GitHub webhooks received by Prow.
Plank/Jenkins-Operator	Gauge	`prowjobs`	job_name, type, state	The number of ProwJobs.
Jenkins-Operator	Counter	`jenkins_requests`	verb, handler, code	The number of Jenkins requests made by Prow.
	Counter	`jenkins_request_retries`		The number of Jenkins request retries Prow has made.
	Histogram	`jenkins_request_latency`	verb, handler	A histogram of round trip times between Prow and Jenkins.
	Histogram	`resync_period_seconds`		A histogram of the Jenkins controller loop duration.
Bugzilla	Histogram	`bugzilla_request_duration`	method, status	Bugzilla request duration by API path.
Sinker	Gauge	`sinker_pods_existing`		Number of existing pods in each sinker cleaning.
	Gauge	`sinker_loop_duration_seconds`		Time used in each sinker cleaning.
	Gauge	`sinker_pods_removed`	reason	Number of pods removed in each sinker cleaning.
	Gauge	`sinker_pod_removal_errors`	reason	Number of errors which occurred in each sinker pod cleaning.
	Gauge	`sinker_prow_jobs_existing`		Number of existing prow jobs in each sinker cleaning.
	Gauge	`sinker_prow_jobs_cleaned`	reason	Number of prow jobs cleaned in each sinker cleaning.
	Gauge	`sinker_prow_jobs_cleaning_errors`	reason	Number of errors which occurred in each sinker prow job cleaning.
Crier	Histogram	`crier_report_latency`	reporter	Histogram of time spent reporting, calculated by the time difference between job completion and end of reporting.
	Counter	`crier_reporting_results`	reporter, result	Count of successful and failed reporting attempts by reporter.
Flagutil	Counter	`kubernetes_failed_client_creations`	cluster	The number of clusters for which we failed to create a client.
Gerrit/Adapter	Counter	`gerrit_processing_results`	instance, repo, result	Count of change processing by instance, repo, and result.
	Histogram	`gerrit_trigger_latency`	instance	Histogram of seconds between triggering event and ProwJob creation time.
Gerrit/Client	Counter	`gerrit_query_results`	instance, repo, result	Count of Gerrit API queries by instance, repo, and result.
GitHub	Gauge	`github_user_info`	token_hash, login, email	Metadata about a user, tied to their token hash.
GitHub-Server	Counter	`prow_webhook_counter`	event_type	A counter of the webhooks made to Prow.
	Counter	`prow_webhook_response_codes`	response_code	A counter of the response codes hook has returned for webhooks.
	Histogram	`prow_plugin_handle_duration_seconds`	event_type, action, plugin, took_action	How long Prow took to handle an event by plugin, event type and action.
	Counter	`prow_plugin_handle_errors`	event_type, action, plugin, took_action	Prow errors handling an event by plugin, event type and action.
Jenkins	Counter	`jenkins_requests`	verb, handler, code	Number of Jenkins requests made from Prow.
	Counter	`jenkins_request_retries`		Number of Jenkins request retries made from Prow.
	Histogram	`jenkins_request_latency`	verb, handler	Time for a request to round-trip between Prow and Jenkins.
	Histogram	`resync_period_seconds`		Time the controller takes to complete one reconciliation loop.
Jira	Histogram	`jira_request_duration_seconds`	method, path, status
Kube	Gauge	`prowjobs`	job_namespace, job_name, type, state, org, repo, base_ref, cluster, retest	Number of prowjobs in the system.
	Counter	`prowjob_state_transitions`	job_namespace, job_name, type, state, org, repo, base_ref, cluster, retest	Number of prowjobs transitioning states.
Plugins	Gauge	`prow_configmap_size_bytes`	name, namespace	Size of data fields in ConfigMaps updated automatically by Prow in bytes.
Pubsub/Subscriber	Counter	`prow_pubsub_message_counter`	subscription	A counter of the Pub/Sub messages received by Prow.
	Counter	`prow_pubsub_error_counter`	subscription, error_type	A counter of the errors encountered while handling Pub/Sub messages.
	Counter	`prow_pubsub_ack_counter`	subscription	A counter of the messages acked by Prow.
	Counter	`prow_pubsub_nack_counter`	subscription	A counter of the messages nacked by Prow.
	Counter	`prow_pubsub_response_codes`	response_code, subscription	A counter of the different responses the server has returned for push events.
Version	Gauge	`prow_version`		Prow Version.
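
As a rough illustration of how these metrics might be consumed for monitoring, the sketch below parses a small sample in the Prometheus text exposition format and totals the `prowjobs` gauge by its `state` label. The sample data, label values, and helper names (`parse_samples`, `sum_by_label`) are invented for this example and are not part of Prow; in practice a Prometheus server and a PromQL query would normally do this work.

```python
import re
from collections import defaultdict

# Hypothetical scrape output: label values and numbers are made up.
SAMPLE = '''\
# HELP prowjobs Number of prowjobs in the system.
# TYPE prowjobs gauge
prowjobs{job_name="unit-test",type="presubmit",state="pending"} 3
prowjobs{job_name="unit-test",type="presubmit",state="success"} 40
prowjobs{job_name="e2e",type="periodic",state="pending"} 2
pooledprs{org="example",repo="demo",branch="main"} 5
'''

# name, optional {labels}, value, optional integer timestamp.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
# label="value" pairs; escape sequences in values are kept as-is, not decoded.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group('labels') or ''))
        yield match.group('name'), labels, float(match.group('value'))


def sum_by_label(text, metric, label):
    """Total the values of one metric, grouped by a single label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, '')] += value
    return dict(totals)


jobs_by_state = sum_by_label(SAMPLE, 'prowjobs', 'state')
print(jobs_by_state)
```

An alerting rule could then compare a total such as the number of pending jobs against a threshold chosen for the deployment.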

Pushgateway and Proxy

To support metric collection from ephemeral tasks such as request handling, and to provide a single scrape endpoint, Prow’s Prometheus metrics are pushed to a Prometheus Pushgateway, which is scraped instead of the metric source. Because Prometheus doesn’t provide any form of authentication, a proxy limits cluster-external requests to GET requests. The Pushgateway and proxy deployments are defined in pushgateway_deployment.yaml.
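
The push-then-scrape flow can be sketched as follows. This is a minimal illustration, not Prow's own implementation (Prow is written in Go): it renders one gauge in the text exposition format, builds an HTTP PUT request for the Pushgateway's `/metrics/job/<job>` endpoint, and shows the kind of method check a GET-only proxy performs. The gateway URL, the job name, the sample value, and the helper names are all assumptions made for the example.

```python
import urllib.request
from urllib.parse import quote


def format_gauge(name, help_text, value, labels=None):
    """Render one gauge in the Prometheus text exposition format.

    Label values are inserted as-is; a real client would escape them.
    """
    label_str = ''
    if labels:
        pairs = ','.join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = '{' + pairs + '}'
    return (f'# HELP {name} {help_text}\n'
            f'# TYPE {name} gauge\n'
            f'{name}{label_str} {value}\n')


def build_push_request(gateway_url, job, body):
    """Build (but do not send) a PUT that replaces the metrics for `job`."""
    url = f"{gateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=body.encode('utf-8'),
        method='PUT',
        headers={'Content-Type': 'text/plain; version=0.0.4'},
    )


def proxy_allows(method):
    """Mimic a proxy that only lets read-only GET requests through."""
    return method.upper() == 'GET'


# Hypothetical gateway address and value, for illustration only.
body = format_gauge('sinker_pods_existing',
                    'Number of existing pods in each sinker cleaning.', 12)
request = build_push_request('http://pushgateway.example:9091', 'sinker', body)

print(request.get_method(), request.full_url)
# Sending is left out on purpose: urllib.request.urlopen(request) would push it.
```

In this arrangement the component pushes from inside the cluster, while an external caller passing through the proxy can only read: `proxy_allows('GET')` is true and `proxy_allows('PUT')` is false.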

Kubernetes Prow Metrics

Prometheus metrics from the Kubernetes Prow instance are used to create the graphs at http://monitoring.prow.k8s.io

Last modified May 4, 2024: doc: Update outdated hyperlinks in website (#103) (c5e374dc7)