trento-project · CDimonaco · Jan 27, 2025 · nelsonkopliku · Jan 27, 2025 · CDimonaco
diff --git a/adr/0017-agent-operations-framework.md b/adr/0017-agent-operations-framework.md
@@ -0,0 +1,124 @@
+# 17. Agent Operations Framework
+
+Date: 2025-01-27
+
+## Status
+
+Accepted
+
+## Context
+
+Following the new epics and product decisions, we need to build a framework capable of executing write operations on SAP machines.
+To achieve this, we have decided to use the agent, as it runs on the machines with root privileges, has all the necessary permissions to execute commands, and includes infrastructure to consume messages from other components of Trento.
+
+## Decision
+
+Unlike fact-gathering operations, we have chosen to delegate this new development to a dedicated library rather than embedding the code directly into the agent's codebase. This approach ensures better separation of concerns and allows the release of additional tools supporting operations without affecting the agent's codebase.
+
+The requirements for this new development are very specific:
+
+- Operations must be atomic and include rollback capabilities.
+- Operations must accept arguments to enable actions in different contexts without requiring dedicated code for each case.
+- Operations must be transactional, including distinct steps for prerequisites verification, commit, rollback, and validation of changes applied during the commit phase.
+- Operations must be versioned, with backward compatibility ensured for previous versions in the event of updates.
+- Operations must be idempotent.
+
+The agent will consume the library containing these operations and use it to fulfill operation requests from other components.
+
+The library handling operations is named [workbench](https://github.com/trento-project/workbench).
+
+### Example:
+
+- **Operator**: `saptuneapplysolution` - Applies a SAP Tune solution using the solution name as an argument.
+- **Operation**: Apply a SAP Tune solution if it has not already been applied. If already applied, no action is taken. In case of an error, revert the solution specified as an argument.
+
+## Operator
+
+An operator is a unit of code capable of performing write operations on target machines.
+A write operation can either succeed or fail. Upon success, it generates a diff showing changes made during the commit phase.
+
+The operator accepts arguments in the form of a `map[string]any` to specify operation parameters. Each operator is responsible for extracting and validating these arguments.
+
+The operator follows a transactional workflow, which includes the following distinct phases:
+
+- **PLAN**
+- **COMMIT**
+- **VERIFY**
+- **ROLLBACK**
+
+### PLAN
+
+The PLAN phase collects information about the operation and verifies prerequisites.
+This phase also gathers information for generating diffs by collecting the "before" state of the system.
+Backups are created for any resources modified during the COMMIT phase, ensuring restoration is possible in case of rollback or manual recovery if rollback fails.
+
+If an error occurs during the PLAN phase, no rollback is required; the operation is simply aborted.
+
+### COMMIT
+
+The COMMIT phase performs the actual write operations using the data collected during the PLAN phase.
+If an error occurs, rollback is triggered.
+
+The COMMIT phase must be idempotent. If a requested change has already been applied, the commit operation is skipped without error. Idempotency must be implemented based on the specific operation's requirements.
+
+### VERIFY
+
+The VERIFY phase ensures the actions applied during the COMMIT phase produced the expected results.
+If an error occurs, rollback is initiated.
+
+The VERIFY phase also collects the "after" state to generate the diff showing changes applied during the commit.
+
+### ROLLBACK
+
+The ROLLBACK phase reverts changes made during the COMMIT phase. It uses data collected during the PLAN phase to restore the system to its previous state.
+
+Rollback implementations may vary based on the type of operation. Clear error messages and appropriate logs must be provided.
+
+If rollback fails, an error is returned without further action.
+
+Each operator implements these phases by satisfying the `phaser` interface:
+
+```go
+type phaser interface {
+    plan(ctx context.Context) error
+    commit(ctx context.Context) error
+    rollback(ctx context.Context) error
+    verify(ctx context.Context) error
+    operationDiff(ctx context.Context) map[string]any
+}
+```
+
+
+These methods are invoked by the Executor, which wraps the operator. All operators are exposed through a constructor function, returning operators already wrapped in an Executor.
+
+## Executor
+
+The Executor is a wrapper around an operator that manages operations transactionally.
+For library users, the Executor is transparent to the users, operators are already wrapped within an Executor.
+
+Below is a flowchart illustrating the transactional flow:
+
+![flow_chart](https://github.com/trento-project/workbench/raw/main/flow_chart.png)
+
+## Registry
+
+The Registry manages all available operators.
+Each operator has a version. By default, the latest version is fetched if no specific version is requested.
+
+The operator naming convention is: `<operatorname>@<version>`
+
+The Registry returns an Operator Builder:
+
+```golang
+type OperatorBuilder func(operationID string, arguments OperatorArguments) Operator
+```
+
+- `operationID`: A unique identifier for the operation.
+- `arguments`: A `map[string]any` structure containing operation parameters.
+
+## Consequences
+
+This development will enable transactional write operations on target machines.
+
+Each operation is atomic.
+Coordination, ordering, and dependency management of multiple operations are not the agent's responsibility but are delegated to another component that orchestrates their execution.