Accepted

This is a list of accepted proposals. This means the proposal was accepted but not yet implemented.

Internal Accepted Proposals

Evolution of the Observability Operator (fka MSO)

TL;DR: As a mid-to-long-term vision for the Monitoring Stack Operator, we propose renaming MSO to OO (Observability Operator). With the new name, we propose establishing OO as the open source, (single) cluster-side component for OpenShift observability needs, working in conjunction with Observatorium / RHOBS as the multi-cluster, multi-tenant, scalable observability backend component. OO is intended to manage different kinds of cluster-side monitoring, alerting, logging, and tracing (and potentially profiling) stack setups, covering the needs of OpenShift variants ranging from the client side of fully managed multi-cluster use cases to HyperShift and single-node air-gapped setups.

Why

With the rise of new OpenShift variants that have very different observability needs, the desire for a consistent way of providing differently configured monitoring stacks grows. Examples:

  • Traditional single-cluster on-prem OpenShift deployments need a self-contained monitoring stack where all components (scraping, storage, visualization, alerting, log collection) run on the same cluster. This is the kind of setup Cluster Monitoring Operator was designed for.
  • Multi-cluster deployments need a central (aggregated, federated) view on metrics, logs and alerts. Certain components of the stack don’t run on the workload clusters but in some central infrastructure.
  • Resource-constrained deployments need a stripped-down version of the monitoring stack, e.g. only forwarding signals to a central infrastructure or disabling certain parts of the stack completely.
  • Mixed deployments (e.g. OpenShift + OpenStack) are conceptually very similar to the multi-cluster use case, also needing a central pane of glass for observability signals.
  • Special-purpose deployments (e.g. OpenShift Data Foundation) have special monitoring requirements that are tricky to align with the existing CMO setup.
  • With a view to eventually correlating different observability signals, the cluster-side stack would also potentially benefit from a holistic approach to deploying monitoring, logging and tracing components and configuring them to work together.
  • Managed service deployments need a multi-tenancy-capable way of deploying many similarly built monitoring stacks to a single cluster. This is the short-term focus for OO.

The proposal is to combine all these (and more) use cases into one single (meta) operator (as recommended by the operator best practices) which can be configured with, e.g., presets to instruct lower-level operators (like prometheus-operator, and potentially the Loki or Jaeger operators) to deploy purpose-built monitoring stacks for different use cases. This is similar to the original CMO concept but with much higher flexibility and feature velocity in mind, thanks to not being tied to OpenShift Core versioning.

Additionally, supporting multiple different ways of deploying monitoring stacks (CMO as the standard OpenShift way, OO for managed services, something else for e.g. HyperShift or edge scenarios, …) is a burden for the team. Instead, eventually supporting only one way to deploy monitoring stacks (OO) that covers all these use cases makes things a lot simpler and far more consistent.

Pitfalls of the current solution

CMO is built for traditional self-operated, single-cluster-focused deployments of OpenShift. It intentionally lacks the flexibility for many other use cases (see above) in order to provide monitoring that is resilient against configuration drift. For example, the use case that motivated creating OO (MSO) in the first place, supporting managed services, can’t currently be covered by CMO. See the original MSO proposal for more details.

The results of this lack of flexibility can be readily observed: Red Hat teams have built their own solutions for their monitoring use cases, e.g. leveraging community operators or self-written deployments, with varying success, reliability and supportability.

Goals

Goals and use cases for the solution as proposed in How:

  • Widen the scope of OO to cover additional use cases besides managed services.
  • Replace existing ways of deploying monitoring stacks across Red Hat products with OO.
  • Focus on OpenShift use cases primarily but don’t exclude vanilla Kubernetes as a possible target.
  • Create an API that easily allows common configuration across observability signals.

Non-Goals

  • Create a multi-cluster capable observability operator.

How

  • Define use cases to be covered in detail.
  • Prioritize use cases and add needed features one by one.

Alternatives

  1. Tackle each monitoring use case across Red Hat products one by one and build a custom solution for them. This would lead to many different (but potentially simpler) implementations which need to be supported.
  2. Develop signal-specific operators that can handle the required use cases. This would likely require an API between those operators to apply common configuration.

Action Plan

Collection of requirements and prioritization of use cases is currently in progress (Q3 2022).

2021-06: Handbook

TL;DR: I propose putting all public documentation pieces related to the Monitoring Group (and not tied to a specific project) in a public GitHub repository called handbook. I propose reviewing all documents with a flow similar to code review and storing them as markdown files that can be read both in the GitHub UI and on the automatically served https://rhobs-handbook.netlify.app/ website.

The diagram below shows what fits into this handbook and what should be distributed to the relevant upstream project (e.g. developer documentation).

Why

Documentation is essential
  • Without good documentation of team processes, collaboration within the team can be challenging. Members have to figure out what to do on their own, or tribal knowledge has to be propagated. Surprises and conflicts can arise. Onboarding new team members is hard. This is vital given that our Red Hat teams are distributed around the world and work remotely.
    • Additionally, it’s hard for any internal or external team to discover how to reach us or escalate without noise.
  • Without a good overview of the team’s subject matter, it’s hard to wrap your head around the number of projects we participate in. In addition, each team member is proficient in a different area, and we lack an “index” overview of where to navigate for various project aspects (documentation, contributing, proposals, chats).
    • Even if documentation is created, it risks being placed in the wrong place.
  • Without a place for written design proposals (those in progress, accepted and rejected), the team risks iterating over the same ideas repeatedly or challenging old ideas that were already researched.
  • Without good operational or configuration knowledge, we keep asking the same questions, e.g. how to roll out service X or how to contribute to X.
Despite strong incentives, writing documentation has proven to be one of the most unwanted tasks among engineers

This demotivation stems from our (Google Docs based) process, which tends to create the following obstacles:

  • There are too many side decisions to make, e.g. where to put this documentation, what format to use, how long it should be, how to stay on topic, and whether this information is already recorded somewhere else. Every decision, even a small one, drains our energy and risks procrastination.
  • There is no review process, so it’s hard to maintain a high quality of those documents.
  • Created documentation is tough to consume and discover.
  • Because docs are hard to discover, documentation is often duplicated, has gaps, or is obsolete.
  • Documents used to be private, which brings extra demotivation. Some of the information is useful for the public audience. Some of this could be useful for external contributors. It’s hard to reuse such private docs without recreating them.

All of this makes people decide NOT to write documentation but rather schedule another meeting and repeat the same information over and over.

Similarly, anyone looking for information about our teams' work, proposals or projects is discouraged from looking for, finding and reading our documentation because it’s inconsistent, scattered, hard to discover or incomplete.

Pitfalls of the current solution

  • It mainly exists in Google Docs, which has the following issues:
    • Not everything is in our Team drive; there are docs not owned by us, created ad hoc.
    • It’s painful to organize them well, e.g. in directories, since it’s so easy to copy or create one.
    • Even if it’s organized well, it’s not easily discoverable.
  • Existing Google doc-based documents are hard to consume. The formatting is widely different. Naming is inconsistent.
  • Document creation is rarely actionable. There is no review process, so the effort of creating a relevant document might be wasted, as the document is lost. This also leaves docs in a half-completed state, discouraging readers from looking at them.
  • It’s hard to track previous discussions around docs and who approved them (e.g. proposals).
  • It’s not public, and it’s hard to share best practices with other external and internal teams.

Goals

Goals and use cases for the solution as proposed in How:

  • Single source of truth for Monitoring Group Team docs like processes, overviews, runbooks, links for internal content.
  • Have a consistent documentation format that is readable and understandable.
  • Searchable and easily discoverable.
  • Process of adding documents should be easy and consistent.
  • Automation and a normal review process should be in place to ensure high quality (e.g. link checking).
  • Allow public collaboration on processes and other docs.

NOTE: We would love to host Logging and Tracing Teams if they choose to follow our process, but we don’t want to enforce it. We are happy to extend this handbook from a Monitoring Group handbook to an Observability Group handbook, but it has to grow organically (if the Logging and Tracing teams see the value of joining us here).

Audience

The currently planned audience for the proposed documentation content is as follows (in order of importance):

  1. Monitoring Group Team Members.
  2. External Teams at Red Hat.
  3. Teams outside Red Hat, contributors to our projects, potential future hires, people interested in best practices, team processes etc.

Non-Goals

  • Support formats other than Markdown, e.g. AsciiDoc.
  • Replace official project or product documentation.
  • Define a precise design proposal process (this will come in a separate proposal).
  • Share team statuses; we use JIRA and GH issues for that.

How

The idea is simple:

Let’s keep the process of adding and editing documentation as easy and rewarding as possible. This will increase the chances that team members document things more often and adopt this as a habit. The resulting content will more likely be complete and up to date, increasing the chances it will be helpful to our audience, which will reduce the meeting burden. This will make writing docs much more rewarding, which creates a positive loop.

I propose using the handbook git repository for all related team documentation. Furthermore, I suggest reviewing all documents with a flow similar to code review and storing them as markdown files that can be read both in the GitHub UI and on the automatically served https://rhobs-handbook.netlify.app/ website.

Pros:

  • Matches our goals.
  • Sharing by default.
  • Low barriers to write documents in a consistent format, low barrier to consume it.
  • Ensures high quality with local CI and review process.

Cons:

  • Some website maintenance is needed, but we use the same heavily automated flow for the Prometheus-operator, Thanos and Observatorium websites etc.

The idea of a handbook is not new. Many organizations do this, e.g. GitLab.

NOTE: The website style might not be perfect (https://rhobs-handbook.netlify.app/). Feel free to open issues or propose fixes for the overall look and readability!

Flow of Adding/Consuming Documentation to Handbook

[Diagram: flow of adding/consuming documentation to the handbook]

If you want to add or edit markdown documentation, refer to our technical guide.

Alternatives

  1. Organize Team Google Drive with all Google docs we have.

Pros:

  • Great for initial collaboration

Cons:

  • Inconsistent format
  • Hard to track approvers
  • Never know when the doc is “completed.”
  • Hard to maintain over time
  • Hard to share and reuse outside
  2. Create a private, Red Hat-scoped handbook.

Pros:

  • No need to worry about accidentally sharing something internal.

Cons:

  • We currently have few internal things we don’t want to share. All our projects and products are public.
  • Sharing would mean duplicating the information in multiple places.
  • Harder to share with external teams
  • We can’t use open source tools, CIs etc.

Action Plan

  • Create handbook repo and website
  • Create website automation (done with mdox)
  • Move existing up-to-date public documents (team processes, project overviews, faqs, design docs) over to the handbook (deadline: End of July).
  • Clean up unused or irrelevant projects from the Team Drive (or move them to a trash directory).

Done

This is a list of implemented proposals.

Internal Implemented Proposals

2021-06: Proposal Process

TL;DR: We would like to propose an improved, official proposal process for Monitoring Group that clearly states when, where and how to create proposal/enhancement/design documents.

Why

More extensive architectural, process, or feature decisions are hard to explain, understand and discuss. It takes a lot of time to describe the idea and to motivate interested parties to review it, give feedback and approve it. That’s why it is essential to streamline the proposal process.

Given that we work in highly distributed teams and with multiple communities, we need to allow asynchronous discussions. This means it’s essential to structure discussions in shared documents. Persisting those decisions, once approved or rejected, is equally important, as it allows us to understand previous motivations.

There is a common saying "I've just been around long enough to know where the bodies are buried". We want to ensure team-related knowledge is accessible to everyone, every day, whether a team member is new or has been part of the team for ten years.

Pitfalls of the current solution

Currently, the Observability Platform team has a process defined here (internal), whereas the In-Cluster part has not defined any official process (as per here (internal)).

In practice, both teams had a somewhat similar flow:

  • For upstream: Follow the upstream project’s contributing guide, e.g. Thanos
  • For downstream:
    • Depending on the size:
      • Small features can be proposed during the bi-weekly team-sync or directly in Slack.
        • If the team can reach consensus in this time, then document the decision in writing, e.g. an email, a Slack message to which everyone can add an emoji reaction, etc.
        • Add a JIRA ticket to plan this work.
      • Large features might need a design doc:
        1. Add a JIRA ticket for creating the design doc
        2. Create a new Google Doc in the team folder based on this template
        3. Fill sections
        4. Announce it on the team mailing list and Slack channel
        5. Address comments / concerns
        6. Define what “done” means for this proposal, i.e. what is the purpose of this design document:
          • Knowledge sharing / Brain dump: This kind of document may not need a thorough review or any official approval
          • Long term vision and Execution & Implementation: If it is approved (with LGTM comments, or in an approved section) by a majority of the team and there are no major concerns, consider it approved. NOTE: The same applies to rejected proposals.
        7. If the document has no more offline comments and no consensus was reached, schedule a meeting with interested parties.
        8. When the document changes status, move it to the appropriate status folder in the design docs directory of the team folder. If an approved proposal concerns a component with its own directory, e.g. Telemeter, then create a shortcut to the proposal document in the component-specific directory. This helps us find design documents by topic and by status.

It served us well, but it had the following issues (very similar to the ones stated in the handbook proposal):

  • Even though our Google design docs are organized in our team drive, they are not easily discoverable.
  • Existing Google doc-based documents are hard to consume. The formatting is widely different. Naming is inconsistent.
  • Document creation is rarely actionable. There is no review process, so the effort of creating a relevant document might be wasted, as the document is lost. This also leaves docs in a half-completed state, discouraging readers from looking at them.
  • It’s hard to track previous discussions around proposals and who approved them.
  • It’s not public, and it’s hard to share good proposals with other external and internal teams.

Goals

Goals and use cases for the solution as proposed in How:

  • Allow easy collaboration and decision making on design ideas.
  • Have a consistent design style that is readable and understandable.
  • Ensure design docs are discoverable for better awareness and knowledge sharing about past decisions.
  • Define a clear review and approval process.

Non-Goals

How

We want to propose an improved, official proposal process for Monitoring Group that clearly states when, where and how to create proposal/enhancement/design documents.

Everything starts with a problem statement. It might be missing functionality, or existing functionality that is confusing or broken. It might be an annoying process, or a performance or security issue (actual or potential).

Where to Propose Changes/Where to Submit Proposals?

As defined in the handbook proposal, the Handbook is meant to be an index for our team resources and a linking point to other distributed projects we maintain or contribute to.

First, we need to identify whether our idea is something we can contribute to an upstream project, or whether it does not fit anywhere else, in which case we can use the Handbook Proposal directory and process. Use the algorithm below to find out:

[Diagram: algorithm for deciding where to submit a proposal]

Internal Team Drive for Public and Confidential Proposals

Templates

Handbook Proposal Process

If there is no problem, there is no need for changing anything, no need for a proposal. This might feel trivial, but we should first ask ourselves this question before even thinking about writing a proposal.

It takes time to propose an idea, find consensus and implement more significant concepts, so let’s not spend that time until it’s worth it. Unfortunately, even good ideas sometimes have to wait for the right moment to be discussed.

Let’s assume the idea sounds interesting to you. What should you do next, where should you propose it, and how should it be reviewed? Follow the algorithm below:

[Diagram: handbook proposal process algorithm]

Note: It’s totally ok to reject a proposal if a team member feels the idea is wrong. It’s better to explicitly oppose it than to ignore it and leave it in limbo.

NOTE: We would love to host Logging and Tracing Teams if they choose to follow our process, but we don’t want to enforce it. We are happy to extend this process from the Monitoring Group handbook to the Observability Group. Still, it has to grow organically (if the Logging and Tracing teams see the value of joining us here).

On Review Process

As the algorithm above shows, if the content relates to any upstream project, it should be proposed, reviewed and potentially implemented together with the community. This does not mean that you cannot involve other team members in this effort. Share the proposal with team members even if they are not part of a given project’s maintainer team; any feedback and voice is useful and can help move the idea forward.

Similarly, for proposals that touch only our team, anyone can give feedback, even though approval from leads is mandatory! Our process is in fact very similar to HashiCorp’s RFC process:

Once you’ve written the first draft of an RFC, share it with your team. They’re likely to have the most context on your proposal and its potential impacts, so most of your feedback will probably come at this stage. Any team member can comment on and approve an RFC, but you need explicit approval only from the appropriate team leads in order to move forward. Once the RFC is approved and shared with stakeholders, you can start implementing the solution. For major projects, also share the RFC to the company-wide email list. While most members of the mailing list will just read the email rather than the full RFC, sending it to the list gives visibility into major decisions being made across the company.

Summary

Overall, we want to build a culture where design docs are reviewed within a certain amount of time and authors (team members) receive feedback. This, coupled with recognizing the work and being able to add it to your list of achievements (even if the proposal was rejected), should motivate people and teams to assess ideas in a structured, sustainable way.

Alternatives

  1. Organize Team Google Drive with all Google docs we have.

Pros:

  • Great for initial collaboration

Cons:

  • Inconsistent format
  • Hard to track approvers
  • Never know when the doc is “completed.”
  • Hard to maintain over time
  • Hard to share and reuse outside

Action Plan

  • Explain process in Proposal Process Guide
  • Move existing up-to-date public design docs over to the Handbook (deadline: End of July).
  • Propose a similar process to upstream projects that do not have it.

Proposals Process

Where to Propose Changes/Where to Submit Proposals?

As defined in the handbook proposal, the Handbook is meant to be an index for our team resources and a linking point to other distributed projects we maintain or contribute to.

First, we need to identify whether our idea is something we can contribute to an upstream project, or whether it does not fit anywhere else, in which case we can use the Handbook Proposal directory and process. Use the algorithm below to find out:

[Diagram: algorithm for deciding where to submit a proposal]

Internal Team Drive for Public and Confidential Proposals

Proposal Process

If there is no problem, there is no need for changing anything, no need for a proposal. This might feel trivial, but we should first ask ourselves this question before even thinking about writing a proposal.

It takes time to propose an idea, find consensus and implement more significant concepts, so let’s not spend that time until it’s worth it. Unfortunately, even good ideas sometimes have to wait for the right moment to be discussed.

Let’s assume the idea sounds interesting to you. What should you do next, where should you propose it, and how should it be reviewed? Follow the algorithm below:

[Diagram: proposal process algorithm]

Note: It’s totally ok to reject a proposal if a team member feels the idea is wrong. It’s better to explicitly oppose it than to ignore it and leave it in limbo.

NOTE: We would love to host Logging and Tracing Teams if they choose to follow our process, but we don’t want to enforce it. We are happy to extend this process from the Monitoring Group handbook to the Observability Group. Still, it has to grow organically (if the Logging and Tracing teams see the value of joining us here).

On Review Process

As the algorithm above shows, if the content relates to any upstream project, it should be proposed, reviewed and potentially implemented together with the community. This does not mean that you cannot involve other team members in this effort. Share the proposal with team members even if they are not part of a given project’s maintainer team; any feedback and voice is useful and can help move the idea forward.

Similarly, for proposals that touch only our team, anyone can give feedback, even though approval from leads is mandatory! Our process is in fact very similar to HashiCorp’s RFC process:

Once you’ve written the first draft of an RFC, share it with your team. They’re likely to have the most context on your proposal and its potential impacts, so most of your feedback will probably come at this stage. Any team member can comment on and approve an RFC, but you need explicit approval only from the appropriate team leads in order to move forward. Once the RFC is approved and shared with stakeholders, you can start implementing the solution. For major projects, also share the RFC to the company-wide email list. While most members of the mailing list will just read the email rather than the full RFC, sending it to the list gives visibility into major decisions being made across the company.

Templates

Google Docs Template

Open Source Design Doc Template.

Markdown Template:

Your Proposal Title

  • Owners:

    • <@author: single champion for the moment of writing>
  • Related Tickets:

    • <JIRA, GH Issues>
  • Other docs:

    • <Links…>

TL;DR: Give a short summary here of what this document is proposing and what components it is touching. Outline the proposer’s rough view of the proposed changes.

For example: This design doc is proposing a consistent design template for “example.com” organization.

Why

Put the motivation behind the change proposed by this design document here, and give context.

For example: It’s important to clearly explain the reasons behind certain design decisions in order to have a consensus between team members, as well as external stakeholders. Such a design document can also be used for reference and knowledge-sharing purposes. That’s why we are proposing a consistent design document style that will be used for future designs.

Pitfalls of the current solution

What specific problems are we hitting with the current solution? Why is it not enough?

For example: We were missing a consistent design doc template, so each team/person was creating their own. Because of inconsistencies, those documents were harder to understand, and it was easy to miss important sections. This caused some engineering time to be wasted.

Goals

Goals and use cases for the solution as proposed in How:

  • Allow easy collaboration and decision making on design ideas.
  • Have a consistent design style that is readable and understandable.
  • Have a design style that is concise and covers all the essential information.

Audience

If not clear, the target audience that this change relates to.

Non-Goals

  • Move old designs to the new format.
  • Not doing X, Y, Z.

How

Explain the full overview of the proposed solution. Some guidelines:

  • Make it concise and simple; add diagrams; be concrete, avoid using “really”, “amazing” and “great” (:
  • How will you test and verify it?
  • How will you migrate users without downtime? How will we solve incompatibilities?
  • What open questions are left? (“Known unknowns”)

Alternatives

This section states potential alternatives. Highlight the objections the reader may have towards your proposal as they read it. Tell them why you still think you should take this path [ref].

  1. This is why not solution Z…

Action Plan

The tasks to do in order to migrate to the new idea.

  • Task one <GH issue/JIRA ticket>
  • Task two <GH issue/JIRA ticket> …

Rejected

This is a list of rejected proposals.

NOTE: This does not mean we cannot return to them and accept them later!

Internal Rejected Proposals