
StatusPage: Who should have access? #2304

@MattIPv4


Main issue: #2265

Creating a dedicated issue to house discussion around who should have access to the status page, either through the control panel provided by StatusPage or through a custom solution using its API.

Ref: #2265 (comment), #2265 (comment), #2299 (comment), #2299 (comment) & #2299 (comment)

My baseline opinion is that as many people as possible should have access to create an incident, with as little red tape as possible in their way, so that if something does go wrong, we can quickly & easily create an incident to communicate what's happening to the wider community.

I think there are a couple of overarching options here:

1. Use the StatusPage control panel

This limits how many folks we can give access to, as our StatusPage plan allows 25 email addresses (two are currently in use, for myself & ops@jsf). Note, though, that these addresses can belong to individuals or, if needed, be used as shared logins.

In addition, anyone with access can use everything on the control panel (changing the design, changing components, managing incidents, seeing who is subscribed, etc.).

However, this has the massive advantage of being able to use the StatusPage UI directly rather than having to emulate it for folks to use elsewhere.

Within this, I see two immediate solutions, both with an org team behind them:

a. Have an org team for membership, with an administrator who manually adds/removes people on the StatusPage account. This is definitely the simplest solution of them all.

b. Have an org team for membership, and use a custom script that talks to the StatusPage API to automate the addition & removal of folks on the StatusPage account.
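A minimal sketch of the sync logic for option (b), assuming the team's email addresses and the StatusPage account's current users have already been fetched. The fetching and the actual StatusPage API calls are deliberately left out, and `plan_sync`/`SyncPlan` are hypothetical names. It only computes a plan: who to add, who to remove, never touching reserved accounts (like the two existing logins), and refusing to exceed the 25-address limit.

```python
from dataclasses import dataclass

# Limit from our current StatusPage plan (25 email addresses).
SEAT_LIMIT = 25


@dataclass
class SyncPlan:
    to_add: list[str]
    to_remove: list[str]


def plan_sync(team_emails: set[str], current_users: set[str], reserved: set[str]) -> SyncPlan:
    """Diff the org team against the StatusPage account (hypothetical helper).

    `reserved` holds accounts the script must never remove, e.g. the
    existing admin & ops logins. Inputs would come from the GitHub and
    StatusPage APIs in a real script; that part is not shown here.
    """
    # Normalise case so "Foo@x.org" and "foo@x.org" count as one seat.
    team = {e.lower() for e in team_emails}
    current = {e.lower() for e in current_users}
    keep = {e.lower() for e in reserved}

    to_add = sorted(team - current)
    # Only remove users who left the team and aren't reserved.
    to_remove = sorted(current - team - keep)

    final_count = len(current) - len(to_remove) + len(to_add)
    if final_count > SEAT_LIMIT:
        raise ValueError(f"sync would use {final_count} seats, limit is {SEAT_LIMIT}")
    return SyncPlan(to_add=to_add, to_remove=to_remove)
```

Keeping the plan separate from the API calls means the script can be dry-run and its output reviewed before anything changes on the account.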

2. Emulate StatusPage control in a repository

A few different variants of this have been suggested, each with different access implications.

Any implementation here would need a way to effectively emulate all of the following parts of StatusPage incident management:

  • Incident title (creating, changing)
  • Incident status (setting, updating [investigating, identified, monitoring, resolved])
  • Incident severity (updating)
  • Update message (during creation & each subsequent update)
  • Components affected (for each update message)
    • Severity of each affected component (operational, degraded performance, partial outage, major outage, under maintenance)
  • Notifications to send (subscribers [global & component-specific], tweet)
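To make that list concrete, here's a sketch of the data a custom implementation would need to capture for each update. The incident statuses and component severities come from the list above, written in the snake_case form we believe the StatusPage API uses (worth verifying against its docs). The impact levels (none, minor, major, critical), the class names and the resolved-incident check are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class IncidentStatus(Enum):
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"


class ComponentStatus(Enum):
    OPERATIONAL = "operational"
    DEGRADED_PERFORMANCE = "degraded_performance"
    PARTIAL_OUTAGE = "partial_outage"
    MAJOR_OUTAGE = "major_outage"
    UNDER_MAINTENANCE = "under_maintenance"


class Impact(Enum):
    # Assumed incident severity levels; confirm against StatusPage.
    NONE = "none"
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


@dataclass
class IncidentUpdate:
    title: str
    status: IncidentStatus
    impact: Impact
    message: str
    # Component name -> its severity for this particular update.
    components: dict[str, ComponentStatus] = field(default_factory=dict)
    notify_subscribers: bool = True  # global & component-specific subscribers
    tweet: bool = False

    def validate(self) -> None:
        if not self.title.strip():
            raise ValueError("incident title is required")
        if not self.message.strip():
            raise ValueError("every update needs a message")
        # Example check we might choose (not a StatusPage rule): a resolved
        # incident shouldn't leave any component in a non-operational state.
        if self.status is IncidentStatus.RESOLVED:
            broken = [n for n, s in self.components.items() if s is not ComponentStatus.OPERATIONAL]
            if broken:
                raise ValueError(f"resolved incident still affects: {', '.join(broken)}")
```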

With this strategy, there is also the big question of what happens if something goes wrong with the custom implementation. Who still has access to the control panel to fix a broken incident? What's the timeframe on this, as we'd essentially be miscommunicating an incident to the wider community until it was fixed?

a. Issues in the main node.js repository to power the status page

Micheal suggested:

Ideally I think what would work best is if it was based on an issue in the node.js repo (for greatest visibility) and then approved by two Node.js collaborators. We trust our collaborators to push code so it quite likely makes sense to trust them with deciding what is a reportable incident and when it is resolved. Even better if after the approvals simply adding a tag (which any collaborator can do) would result in the incident being pushed to the status page. If this could not be automated, it could be done manually to start.
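As a rough sketch of that gating, assuming approvals and labels on the issue have already been collected (how approvals would be recorded on an issue, e.g. comments or reactions, is still an open question): publish only once two distinct collaborators have approved and the trigger label is present. The `status-page` label name and `ready_to_publish` are hypothetical, and ignoring the author's own approval is our assumption rather than part of the suggestion.

```python
REQUIRED_APPROVALS = 2
PUBLISH_LABEL = "status-page"  # hypothetical label name


def ready_to_publish(author: str, approvers: list[str], labels: set[str], collaborators: set[str]) -> bool:
    """Return True when the incident issue may be pushed to the status page."""
    # Count each collaborator once; assumed rule: the issue author's own
    # approval doesn't count towards the two required.
    valid = {a for a in approvers if a in collaborators and a != author}
    return len(valid) >= REQUIRED_APPROVALS and PUBLISH_LABEL in labels
```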

Sam suggested:

Ideally, a GH team could be used as an auth source, or even better, an issue in a specific repo would be enough to drive status page changes (with repo access controlled by a team).

To implement something like this, we'd likely want to use YAML frontmatter (or similar) in issue comments to control what each field of the incident is set to. Labels could be used to control the overall incident severity & status, though this would be harder for per-component severity.

Using a label as the approval step before anything gets posted on the status page is the only easy way to limit who can post to the page, as it requires someone with maintain access to that repo to add the label to the issue.
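A sketch of label-driven state, assuming a hypothetical label scheme (`status:<value>`, `severity:<value>` and an approval label `status-page:approved`, none of which exist today). Nothing is returned until the approval label is present, and the severity values are the same assumed none/minor/major/critical levels, not confirmed StatusPage values.

```python
from typing import Optional

# Hypothetical label scheme; none of these labels exist today.
APPROVED_LABEL = "status-page:approved"
STATUSES = {"investigating", "identified", "monitoring", "resolved"}
SEVERITIES = {"none", "minor", "major", "critical"}  # assumed levels


def incident_from_labels(labels: list[str]) -> Optional[dict]:
    """Read overall incident status & severity from issue labels.

    Returns None until the approval label has been added, so nothing is
    posted before someone with the right repo access signs off.
    """
    if APPROVED_LABEL not in labels:
        return None
    found = {"status": set(), "severity": set()}
    for label in labels:
        prefix, sep, value = label.partition(":")
        # Unrelated labels (no prefix, or another prefix) are ignored.
        if sep and prefix in found:
            found[prefix].add(value)
    # Exactly one valid status and one valid severity label must be present.
    if len(found["status"]) != 1 or not found["status"] <= STATUSES:
        raise ValueError(f"need exactly one valid status label, got {sorted(found['status'])}")
    if len(found["severity"]) != 1 or not found["severity"] <= SEVERITIES:
        raise ValueError(f"need exactly one valid severity label, got {sorted(found['severity'])}")
    return {key: next(iter(values)) for key, values in found.items()}
```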

Using a major repository, such as node.js, would give basically every collaborator there permission to post to the status page, which, whilst incredibly useful, might not be desirable, as it isn't a dedicated form of access for the status page.

b. Dedicated repository for status page administration

The alternative to using issues in an existing repository would be to create a new repository used solely for controlling the status page.

This would allow for a dedicated team to be created (with lots of members) that has access to the repo to be able to post incidents to the status page.

Using a dedicated repository also gives more options with how exactly the repository will integrate with the StatusPage API.

Maybe instead of an issue, each incident becomes a folder in the repo, with each update being a file in that folder. Then, PRs and approvals can be used to ensure incidents & updates are approved before being merged and posted.
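A sketch of how a post-merge job could find what still needs posting under that layout. It assumes a hypothetical structure of `incidents/<incident-slug>/<NNN>-<note>.md`, where the zero-padded `NNN` prefix orders updates within an incident, and that the job keeps its own record of which files it has already sent to StatusPage; `pending_updates` is a hypothetical name.

```python
from pathlib import Path


def pending_updates(repo_root: Path, posted: set[str]) -> list[Path]:
    """Return merged update files not yet posted, in order within each incident.

    Assumed layout: incidents/<incident-slug>/<NNN>-<note>.md.
    `posted` holds repo-relative paths already sent to StatusPage.
    """
    incidents_dir = repo_root / "incidents"
    if not incidents_dir.is_dir():
        return []
    pending = []
    for incident in sorted(p for p in incidents_dir.iterdir() if p.is_dir()):
        # Zero-padded prefixes mean a plain name sort gives update order.
        for update in sorted(incident.glob("*.md")):
            rel = update.relative_to(repo_root).as_posix()
            if rel not in posted:
                pending.append(update)
    return pending
```

Because updates only land on the default branch after a PR is approved & merged, anything this returns has already been through review.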

I welcome all thoughts and feedback on how access to StatusPage should be configured and how we should post incidents & updates to the page.
