Observability
Projects our group is involved in.
kube-state-metrics (KSM) is a service that listens to the Kubernetes API server and generates metrics about the state of the objects. It’s an add-on agent to generate and expose cluster-level metrics.
README.md
/docs
/docs/design
/docs/developer
kube-state-metrics - Frederic Branczyk
kube-state-metrics on Google Cloud
kube-rbac-proxy, as the name suggests, is an HTTP proxy that sits in front of a workload and performs authentication and authorization of incoming requests using the TokenReview and SubjectAccessReview resources of the Kubernetes API.
The purpose of kube-rbac-proxy is to distinguish between calls made by the same or different users (or service accounts) to endpoints, and to protect those endpoints from unauthorized access based on the caller’s trusted identity (e.g. tokens, TLS certificates) or the RBAC permissions the caller holds, respectively. Once the request is authenticated and/or authorized, the proxy forwards the response from the server to the client unmodified.
kube-rbac-proxy can be configured with one of two mechanisms for authentication:
OpenID Connect, where kube-rbac-proxy validates the client-provided token against the configured OIDC provider. This mechanism isn’t used by the monitoring components.
Kubernetes API using bearer tokens or mutual TLS:
TokenReview request to verify the identity of the client.
In the case of a failed authentication, an HTTP 401 Unauthorized status code is returned (despite its name, this status signals failed authentication, not failed authorization). Note that anonymous access is always disabled, and the proxy doesn’t rely on HTTP headers to authenticate the request, but it can add them if started with --auth-header-fields-enabled.
Refer to this page for more information on authentication in Kubernetes.
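As a rough illustration of the Kubernetes API path, the object below sketches what a TokenReview exchange can look like. The token placeholder and the returned groups are illustrative assumptions; the username reuses the prometheus-k8s service account that appears elsewhere in this section.

```yaml
# Sketch of a TokenReview: "spec" is what the proxy submits to the API server,
# "status" is what the API server fills in. All values are illustrative.
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  # Bearer token taken from the client's "Authorization: Bearer ..." header.
  token: "<client-bearer-token>"
status:
  # "authenticated: false" (or an error) results in an HTTP 401 from the proxy.
  authenticated: true
  user:
    username: system:serviceaccount:openshift-monitoring:prometheus-k8s
    groups:
    - system:serviceaccounts
    - system:serviceaccounts:openshift-monitoring
    - system:authenticated
```

The user and groups returned in the status are the identity that the subsequent authorization step is evaluated against.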
Once authentication is done, kube-rbac-proxy must then decide whether to allow the user’s request to go through or not. A SubjectAccessReview request is created for the API server, which allows for the review of the subject’s access to a particular resource. Essentially, it checks whether the authenticated user or service account has sufficient permissions to perform the desired action on the requested resource, based on the RBAC permissions granted to it. If so, the request is forwarded to the endpoint, otherwise it is rejected. It is worth mentioning that the HTTP verbs are internally mapped to their corresponding RBAC verbs. Note that static authorization (as described in the downstream usage section) without SubjectAccessReview is also possible.
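To make the authorization step more concrete, here is a minimal sketch of a SubjectAccessReview for a metrics scrape. It assumes a non-resource request to /metrics made with HTTP GET by the prometheus-k8s service account; the group list is an illustrative assumption.

```yaml
# Sketch of a SubjectAccessReview: "spec" describes who wants to do what,
# "status" carries the API server's decision. All values are illustrative.
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: system:serviceaccount:openshift-monitoring:prometheus-k8s
  groups:
  - system:serviceaccounts:openshift-monitoring
  - system:authenticated
  nonResourceAttributes:
    path: /metrics
    verb: get        # the HTTP GET method expressed as the RBAC verb "get"
status:
  # "allowed: false" results in an HTTP 403 from the proxy.
  allowed: true
```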
Once the request is authenticated and authorized, it is forwarded to the endpoint. The response from the endpoint is then forwarded back to the client. If the request fails at any point, the proxy returns an error response to the client. If the authorization step fails, i.e., the client doesn’t have the required permissions to access the requested resource, kube-rbac-proxy returns an HTTP 403 Forbidden status code to the client and does not forward the request to the endpoint.
In the context of monitoring, we’re talking here about metric scrapes. These communications are usually secured using mutual TLS (mTLS), which is a two-way authentication mechanism (see configuring Prometheus to scrape metrics).
Initially, the server (kube-rbac-proxy) presents its certificate to the client (Prometheus), which validates the server’s identity. The process is then reciprocated, as the client shares its certificate for authentication by the server. Once both steps succeed, a secure channel for encrypted communication is established between the two.
apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    spec:
      containers:
      - name: kube-rbac-proxy
        image: quay.io/brancz/kube-rbac-proxy:v0.8.0
        args:
        - "--tls-cert-file=/etc/tls/private/tls.crt"
        - "--tls-private-key-file=/etc/tls/private/tls.key"
        - "--client-ca-file=/etc/tls/client/client-ca.crt"
        ...
CMO (the Cluster Monitoring Operator) specifies the aforementioned CA certificate in the metrics-client-ca ConfigMap, which is used to define client certificates for every kube-rbac-proxy container that’s safeguarding a component. The component’s Service endpoints are secured using a generated TLS Secret, requested by annotating the Service with service.beta.openshift.io/serving-cert-secret-name. Internally, this requests the service-ca controller to generate a Secret containing a certificate and key pair for ${service.name}.${service.namespace}.svc. These TLS manifests are then used in various component ServiceMonitors to define their TLS configurations, and within CMO to ensure a “mutual” acknowledgement between the two.
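The two manifests below sketch how these pieces might fit together for a single component. The component name, Secret name, port, labels and the certificate file paths inside the Prometheus container are all hypothetical placeholders; only the annotation key and the ${service.name}.${service.namespace}.svc naming come from the text above.

```yaml
# Service: the annotation asks the service-ca controller to generate the
# serving certificate and key into the named Secret (names are hypothetical).
apiVersion: v1
kind: Service
metadata:
  name: example-component
  namespace: openshift-monitoring
  annotations:
    service.beta.openshift.io/serving-cert-secret-name: example-component-tls
spec:
  selector:
    app.kubernetes.io/name: example-component
  ports:
  - name: https
    port: 8443
    targetPort: https
---
# ServiceMonitor: tells Prometheus to scrape over HTTPS, present a client
# certificate, and verify the server name (file paths are assumptions).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-component
  namespace: openshift-monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: example-component
  endpoints:
  - port: https
    scheme: https
    tlsConfig:
      caFile: /path/to/service-ca.crt       # CA that signed the serving certificate
      certFile: /path/to/client/tls.crt     # client certificate presented by Prometheus
      keyFile: /path/to/client/tls.key
      serverName: example-component.openshift-monitoring.svc
```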
Static authorization involves configuring kube-rbac-proxy to allow access to certain resources or non-resources which are evaluated against the Role or ClusterRole RBAC permissions the user or the service account has. The example below demonstrates how this can be employed to give a known ServiceAccount access to the /metrics endpoint. The /metrics endpoints exposed by various monitoring components are protected this way. Note that after the initial user or service account authentication, the request is matched against a comma-separated list of paths, as defined by the --allow-paths flag, like so.
apiVersion: v1
kind: Secret
...
stringData:
  # "path" is the path to match against the request path.
  # "resourceRequest" is a boolean indicating whether the request is for a resource or not.
  # "user" is the user to match against the request user.
  # "verb" is the verb to match against the corresponding request RBAC verb.
  config.yaml: |-
    "authorization":
      "static":
      - "path": "/metrics"
        "resourceRequest": false
        "user":
          "name": "system:serviceaccount:openshift-monitoring:prometheus-k8s"
        "verb": "get"
For more details, refer to kube-rbac-proxy’s static authorization example.
For more information on collecting metrics in such cases, refer to this section of the handbook.
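For orientation, the fragment below sketches how a static authorization configuration might be wired into the proxy container. The Secret name, mount path, listen address and upstream URL are hypothetical; the flags shown are kube-rbac-proxy command-line options, but the exact set used by any given component is not taken from this document.

```yaml
# Sketch of a pod spec fragment: the configuration Secret is mounted as a file
# and handed to the proxy, which only accepts requests for /metrics.
spec:
  template:
    spec:
      containers:
      - name: kube-rbac-proxy
        args:
        - "--secure-listen-address=0.0.0.0:8443"   # hypothetical HTTPS listen address
        - "--upstream=http://127.0.0.1:8080/"      # hypothetical local metrics endpoint
        - "--config-file=/etc/kube-rbac-proxy/config.yaml"
        - "--allow-paths=/metrics"
        volumeMounts:
        - name: kube-rbac-proxy-config
          mountPath: /etc/kube-rbac-proxy
          readOnly: true
      volumes:
      - name: kube-rbac-proxy-config
        secret:
          secretName: example-kube-rbac-proxy-config   # hypothetical Secret holding config.yaml
```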
kube-rbac-proxy is also used to secure the API endpoints of components such as Prometheus, Alertmanager and Thanos. In this case, the proxy is configured to authenticate requests based on bearer tokens and to perform authorization with SubjectAccessReview.
The following components use the same method in their kube-rbac-proxy configuration Secrets to authorize the /metrics endpoint and restrict it to GET requests only:
alertmanager-kube-rbac-proxy-metric (alertmanager)
openshift-user-workload-monitoring (alertmanager-user-workload)
kube-state-metrics-kube-rbac-proxy-config (kube-state-metrics)
node-exporter-kube-rbac-proxy-config (node-exporter)
openshift-state-metrics-kube-rbac-proxy-config (openshift-state-metrics)
kube-rbac-proxy (prometheus-k8s) (additionally the /federate endpoint, for the telemeter as well as its own client)
prometheus-operator-kube-rbac-proxy-config (prometheus-operator)
prometheus-operator-uwm-kube-rbac-proxy-config (prometheus-operator)
kube-rbac-proxy-metrics (prometheus-user-workload)
telemeter-client-kube-rbac-proxy-config (telemeter-client)
thanos-querier-kube-rbac-proxy-metrics (thanos-querier)
thanos-ruler-kube-rbac-proxy-metrics (thanos-ruler)
On the other hand, the example below depicts restricted access to a resource, i.e., monitoring.coreos.com/prometheusrules in the openshift-monitoring namespace.
apiVersion: v1
kind: Secret
...
stringData:
  # "resourceAttributes" describes attributes available for resource request authorization.
  # "rewrites" describes how SubjectAccessReview may be rewritten on a given request.
  # "rewrites.byQueryParameter" describes which HTTP URL query parameter is to be used to rewrite a SubjectAccessReview
  # on a given request.
  config.yaml: |-
    "authorization":
      "resourceAttributes":
        "apiGroup": "monitoring.coreos.com"
        "namespace": "{{ .Value }}"
        "resource": "prometheusrules"
      "rewrites":
        "byQueryParameter":
          "name": "namespace"
The following components use the same method in their kube-rbac-proxy configuration Secrets to authorize the respective resources:
alertmanager-kube-rbac-proxy (alertmanager): prometheusrules
alertmanager-kube-rbac-proxy-tenancy (alertmanager-user-workload): prometheusrules
kube-rbac-proxy-federate (prometheus-user-workload): namespaces
thanos-querier-kube-rbac-proxy-rules (thanos-querier): prometheusrules
thanos-querier-kube-rbac-proxy (thanos-querier): pods
Note that all applicable omitted configuration settings are interpreted as wildcards.
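To illustrate the query-parameter rewrite, suppose a client sends a GET request carrying ?namespace=my-app. Under the prometheusrules configuration above, the resulting check would look roughly like the SubjectAccessReview sketched below, and a Role such as the second object would be one way to satisfy it. The namespace my-app, the user name and the Role name are hypothetical, and the verb shown assumes the HTTP method is mapped to its RBAC equivalent.

```yaml
# Sketch of the rewritten SubjectAccessReview: "{{ .Value }}" has been replaced
# by the value of the "namespace" query parameter.
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: example-user                 # hypothetical authenticated identity
  resourceAttributes:
    group: monitoring.coreos.com
    resource: prometheusrules
    namespace: my-app                # taken from ?namespace=my-app
    verb: get                        # assumed mapping of HTTP GET
---
# One possible Role that would make the review above succeed once it is bound
# to the user with a RoleBinding in the same namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheusrules-reader       # hypothetical name
  namespace: my-app
rules:
- apiGroups: ["monitoring.coreos.com"]
  resources: ["prometheusrules"]
  verbs: ["get"]
```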
Details on configuring kube-rbac-proxy under different scenarios can be found in the repository’s /examples section.
In addition to enabling debug logs or compiling a custom binary with debugging capabilities (-gcflags="all=-N -l"), users can:
-v=12 (or higher), or,
kube-rbac-proxy.
Observatorium is an observability system designed to enable ingestion, storage (short- and long-term) and querying of three major observability signals: metrics, logging and tracing. It unifies horizontally scalable, multi-tenant systems like Thanos, Loki, and in the future, Jaeger, deploying them in a single stack with consistent APIs. On top of that, it’s designed to be managed as a service thanks to consistent tenancy, authorization and rate limiting across all three signals.
TBD (https://github.com/rhobs/handbook/issues/22)
https://github.com/observatorium/observatorium/issues
The CNCF Slack workspace’s (join here) channels:
#observatorium for user-related things.
#observatorium-dev for developer-related things.
TBD
We use Observatorium as a Service for our Red Hat Observability Service (RHOBS).
We also know of several other companies installing Observatorium on their own (as of 2021.07.07):
https://github.com/observatorium/observatorium/blob/main/docs/community/maintainers.md
Prometheus is a monitoring and alerting system which collects and stores metrics. In the broader sense, it is a collection of tools including (but not limited to) Alertmanager, node_exporter, etc.
The Prometheus Operator is a Kubernetes Operator which manages Prometheus, Alertmanager and ThanosRuler deployments.
Thanos is a horizontally scalable, multi-tenant monitoring system in the form of a distributed time series database that supports the Prometheus data format.
https://thanos.io/tip/thanos/getting-started.md
12.2020: Absorbing Thanos Infinite Powers for Multi-Cluster Telemetry
12.2020: Turn It Up to a Million: Ingesting Millions of Metrics with Thanos Receive
02.2019: FOSDEM + demo
03.2019: Alibaba Cloud user story
https://github.com/thanos-io/thanos/issues
The CNCF Slack workspace’s (join here) channels:
#thanos for user-related things.
#thanos-dev for developer-related things.
https://thanos.io/tip/contributing/contributing.md/#adding-new-features--components
We use Thanos in many places within Red Hat, notably:
https://thanos.io/tip/thanos/maintainers.md/#core-maintainers-of-this-repository