Skip to content

Conversation

cloud-j-luna
Copy link
Member

@cloud-j-luna cloud-j-luna commented Apr 25, 2025

This PR adds the capability of observing a provider cluster state and performing verifications.
It adds several e2e tests to verify if workloads had unexpected impact from several scenarios.

Current tests

  • Deployment upgrade creates a new pod.
  • Deployment shell access doesn't trigger workload restarts.
  • Provider & Hostname Operator restarts don't impact deployed workloads.

@cloud-j-luna cloud-j-luna self-assigned this Apr 25, 2025
Copy link

coderabbitai bot commented May 21, 2025

Walkthrough

This update introduces Kubernetes pod state observation and verification into end-to-end deployment tests. New observer utilities are added to integration helpers, enabling tests to track pod creation and restarts during deployment lifecycle events, including updates and provider restarts. Several documentation and environment files for local development environments are removed.

Changes

File(s) Change Summary
integration/deployment_test.go Added two new end-to-end tests: TestE2EDeploymentCreateProviderShutdown and TestE2EDeploymentUpdate with pod state verification.
integration/deployment_update_test.go Enhanced tests to observe and verify Kubernetes pod states before and after deployment updates and lease shell operations.
integration/test_helpers.go Introduced Observer type and related methods for monitoring pod lifecycle and container restarts in Kubernetes namespaces.
integration/e2e_test.go Registered new test suite E2EDeploymentCreate and captured provider creation transaction result for validation.
_run/minikube/.envrc
_run/minikube/Makefile
_run/minikube/README.md
_run/minikube/deployment.yaml
_run/minikube/provider.yaml
Deleted Minikube environment configuration, deployment, provider, and documentation files.
_run/single/.envrc
_run/single/Makefile
_run/single/README.md
_run/single/deployment.yaml
_run/single/deployment2.yaml
_run/single/kind-config.yaml
_run/single/provider.yaml
Deleted single-node environment configuration, deployment, provider, and documentation files.

Sequence Diagram(s)

sequenceDiagram
    participant Test as E2E Test
    participant Kube as Kubernetes
    participant Provider as Provider Service
    participant Blockchain as Blockchain

    Test->>Blockchain: Create Deployment
    Blockchain-->>Test: Deployment Created
    Test->>Provider: Query for Bid
    Provider-->>Test: Bid Exists
    Test->>Blockchain: Create Lease
    Blockchain-->>Test: Lease Created
    Test->>Provider: Send Manifest
    Provider-->>Test: Manifest Accepted
    Test->>Kube: Query Pods (Observer.Observe)
    Kube-->>Test: Pod State Returned
    Test->>Provider: Restart Provider
    Provider-->>Test: Provider Restarted
    Test->>Kube: Query Pods (Observer.VerifyNoChangeOccurred/VerifyNewPodCreate)
    Kube-->>Test: Pod State Verified
Loading

Poem

In the warren where the tests all hop,
Observers peek at pods that pop.
Deployments change, providers spin,
Kubernetes keeps the state within.
Old docs and configs, now out of sight—
The code grows stronger, rabbits delight!
🐇✨


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f66b63d and 57dd1b9.

📒 Files selected for processing (16)
  • _run/minikube/.envrc (0 hunks)
  • _run/minikube/Makefile (0 hunks)
  • _run/minikube/README.md (0 hunks)
  • _run/minikube/deployment.yaml (0 hunks)
  • _run/minikube/provider.yaml (0 hunks)
  • _run/single/.envrc (0 hunks)
  • _run/single/Makefile (0 hunks)
  • _run/single/README.md (0 hunks)
  • _run/single/deployment.yaml (0 hunks)
  • _run/single/deployment2.yaml (0 hunks)
  • _run/single/kind-config.yaml (0 hunks)
  • _run/single/provider.yaml (0 hunks)
  • integration/deployment_test.go (1 hunks)
  • integration/deployment_update_test.go (5 hunks)
  • integration/e2e_test.go (2 hunks)
  • integration/test_helpers.go (3 hunks)
💤 Files with no reviewable changes (12)
  • _run/single/provider.yaml
  • _run/single/.envrc
  • _run/minikube/.envrc
  • _run/single/kind-config.yaml
  • _run/minikube/provider.yaml
  • _run/single/deployment2.yaml
  • _run/single/Makefile
  • _run/minikube/Makefile
  • _run/single/deployment.yaml
  • _run/single/README.md
  • _run/minikube/README.md
  • _run/minikube/deployment.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • integration/deployment_update_test.go
  • integration/test_helpers.go
🧰 Additional context used
🧬 Code Graph Analysis (2)
integration/e2e_test.go (1)
integration/deployment_test.go (1)
  • E2EDeploymentCreate (29-31)
integration/deployment_test.go (5)
integration/e2e_test.go (1)
  • IntegrationTestSuite (54-73)
testutil/provider/provider.go (4)
  • TestSendManifest (48-62)
  • RunLocalProvider (107-136)
  • RunLocalHostnameOperator (138-149)
  • RunLocalIPOperator (151-164)
tools/fromctx/context.go (2)
  • MustKubeClientFromCtx (176-183)
  • CtxKeyErrGroup (32-32)
cluster/util/lease_id_to_namespace.go (1)
  • LeaseIDToNamespace (12-20)
integration/test_helpers.go (1)
  • NewObserver (198-204)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build-bins
🔇 Additional comments (3)
integration/e2e_test.go (2)

229-229: Good fix for transaction result capture.

Properly capturing the provider creation transaction result enables correct validation on line 237. This addresses the issue mentioned in past reviews.


564-564: Proper integration of new test suite.

Adding the E2EDeploymentCreate test suite registration follows the established pattern and ensures the new deployment tests are executed as part of the integration test run.

integration/deployment_test.go (1)

33-214: Well-structured test implementation aligned with PR objectives.

The test provides comprehensive coverage of deployment lifecycle with provider restart scenarios, effectively verifying that workloads remain unaffected by provider restarts. The use of the observer pattern for pod state monitoring aligns well with the PR's goal of observing provider cluster state and checking workload side effects.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate Unit Tests
  • Create PR with Unit Tests
  • Post Copyable Unit Tests in a Comment
  • Commit Unit Tests in branch luna/e2e-workload-restarts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@cloud-j-luna cloud-j-luna marked this pull request as ready for review June 11, 2025 09:48
@cloud-j-luna cloud-j-luna requested review from boz and troian as code owners June 11, 2025 09:48
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
integration/e2e_test.go (1)

249-257: ⚠️ Potential issue

Fix incorrect transaction validation.

The res variable from line 229 (provider creation) is being incorrectly used to validate unrelated transactions on lines 257 and 278. Each transaction should have its result captured and validated independently.

Apply this diff to fix the issue:

 // Publish provider's certificate
-_, err = ccli.TxPublishServerExec(
+res, err = ccli.TxPublishServerExec(
 	context.Background(),
 	s.validator.ClientCtx,
 	s.keyProvider.GetAddress(),
 	cliGlobalFlags()...,
 )
 s.Require().NoError(err)
 s.Require().NoError(s.network.WaitForNextBlock())
 clitestutil.ValidateTxSuccessful(s.T(), s.validator.ClientCtx, res.Bytes())
 // Publish tenant's certificate
-_, err = ccli.TxPublishClientExec(
+res, err = ccli.TxPublishClientExec(
 	context.Background(),
 	s.validator.ClientCtx,
 	s.keyTenant.GetAddress(),
 	cliGlobalFlags()...,
 )
 s.Require().NoError(err)
 s.Require().NoError(s.network.WaitForNextBlock())
 clitestutil.ValidateTxSuccessful(s.T(), s.validator.ClientCtx, res.Bytes())

Also applies to: 270-278

🧹 Nitpick comments (1)
_run/.envrc (1)

8-10: Remove dead commented-out code
Since the tqdm check is no longer enforced, it’s clearer to delete these lines entirely rather than leave them commented out in .envrc. You can rely on VCS history to restore if needed.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04097f4 and 6e9f364.

📒 Files selected for processing (17)
  • _run/.envrc (1 hunks)
  • _run/minikube/.envrc (0 hunks)
  • _run/minikube/Makefile (0 hunks)
  • _run/minikube/README.md (0 hunks)
  • _run/minikube/deployment.yaml (0 hunks)
  • _run/minikube/provider.yaml (0 hunks)
  • _run/single/.envrc (0 hunks)
  • _run/single/Makefile (0 hunks)
  • _run/single/README.md (0 hunks)
  • _run/single/deployment.yaml (0 hunks)
  • _run/single/deployment2.yaml (0 hunks)
  • _run/single/kind-config.yaml (0 hunks)
  • _run/single/provider.yaml (0 hunks)
  • integration/deployment_test.go (1 hunks)
  • integration/deployment_update_test.go (6 hunks)
  • integration/e2e_test.go (2 hunks)
  • integration/test_helpers.go (3 hunks)
💤 Files with no reviewable changes (12)
  • _run/minikube/.envrc
  • _run/single/.envrc
  • _run/single/kind-config.yaml
  • _run/single/provider.yaml
  • _run/minikube/provider.yaml
  • _run/single/deployment.yaml
  • _run/minikube/deployment.yaml
  • _run/single/README.md
  • _run/single/deployment2.yaml
  • _run/minikube/Makefile
  • _run/minikube/README.md
  • _run/single/Makefile
🧰 Additional context used
🧬 Code Graph Analysis (2)
integration/deployment_test.go (5)
integration/e2e_test.go (1)
  • IntegrationTestSuite (54-73)
testutil/provider/provider.go (4)
  • TestSendManifest (48-62)
  • RunLocalProvider (107-136)
  • RunLocalHostnameOperator (138-149)
  • RunLocalIPOperator (151-164)
tools/fromctx/context.go (2)
  • MustKubeClientFromCtx (174-181)
  • CtxKeyErrGroup (31-31)
cluster/util/lease_id_to_namespace.go (1)
  • LeaseIDToNamespace (12-20)
integration/test_helpers.go (1)
  • NewObserver (198-204)
integration/e2e_test.go (1)
integration/deployment_test.go (1)
  • E2EDeploymentCreate (28-30)
🪛 GitHub Actions: tests
integration/deployment_test.go

[error] 195-195: undefined: ptestutil.RunProviderJWTServer

🔇 Additional comments (5)
integration/deployment_update_test.go (2)

117-137: LGTM! Good integration of pod state verification.

The Observer pattern is correctly integrated to verify that deployment updates result in new pod creation, which aligns with the PR objectives.


206-285: LGTM! Proper verification of pod stability during lease shell operations.

The Observer correctly verifies that lease shell operations don't trigger pod restarts or recreations, which fulfills one of the key PR objectives.

integration/test_helpers.go (3)

184-222: Well-designed Observer implementation for pod state tracking.

The Observer correctly tracks both pod UIDs and container restart counts, providing comprehensive coverage for detecting pod lifecycle changes. The implementation properly handles the label selector for web services.


242-271: Good timeout and retry logic for pod creation verification.

The method uses appropriate timeouts (30s) and polling intervals (5s) to wait for pod creation, with clear error messages.


273-295: Correct implementation for verifying pod stability.

The method properly waits for the timeout period while continuously checking for unexpected changes, returning nil if no changes are detected (the expected behavior).

@troian
Copy link
Member

troian commented Jul 2, 2025

@cloud-j-luna lets rebase on to latest main and knock failing e2e out

@cloud-j-luna cloud-j-luna force-pushed the luna/e2e-workload-restarts branch from 6e9f364 to f6d5dba Compare July 2, 2025 19:51
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
integration/deployment_test.go (1)

38-41: Consider using dynamic deployment sequence to avoid conflicts.

The hardcoded deployment sequence (102) could cause test failures if run concurrently or multiple times without cleanup.

	deploymentID := dtypes.DeploymentID{
		Owner: s.keyTenant.GetAddress().String(),
-		DSeq:  uint64(102),
+		DSeq:  uint64(time.Now().UnixNano() / 1000000), // Use timestamp for uniqueness
	}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6d5dba and b526869.

📒 Files selected for processing (1)
  • integration/deployment_test.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
integration/deployment_test.go (5)
integration/e2e_test.go (1)
  • IntegrationTestSuite (54-73)
testutil/provider/provider.go (4)
  • TestSendManifest (48-62)
  • RunLocalProvider (107-136)
  • RunLocalHostnameOperator (138-149)
  • RunLocalIPOperator (151-164)
tools/fromctx/context.go (2)
  • MustKubeClientFromCtx (176-183)
  • CtxKeyErrGroup (32-32)
cluster/util/lease_id_to_namespace.go (1)
  • LeaseIDToNamespace (12-20)
integration/test_helpers.go (1)
  • NewObserver (198-204)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: integration-tests / crd-e2e
  • GitHub Check: release-dry-run
  • GitHub Check: lint
  • GitHub Check: build-bins
  • GitHub Check: tests
🔇 Additional comments (1)
integration/deployment_test.go (1)

33-214: Excellent implementation of provider restart resilience test.

This test successfully implements the PR objective of verifying that provider restarts don't affect deployed workloads. The use of the Observer pattern to track pod state before and after restart is well-designed and provides robust validation.

troian
troian previously approved these changes Jul 4, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
integration/deployment_test.go (1)

62-62: Fix incorrect transaction validation - still using wrong result variable.

This is the same issue flagged in previous reviews. Line 62 validates the lease creation using res.Bytes(), but res contains the deployment creation result from line 35, not the lease creation result.

-	_, err = mcli.TxCreateLeaseExec(
+	leaseRes, err := mcli.TxCreateLeaseExec(
		s.validator.ClientCtx,
		bidID,
		s.keyTenant.GetAddress(),
		cliGlobalFlags()...,
	)
	s.Require().NoError(err)
	s.Require().NoError(s.waitForBlocksCommitted(2))
-	clitestutil.ValidateTxSuccessful(s.T(), s.validator.ClientCtx, res.Bytes())
+	clitestutil.ValidateTxSuccessful(s.T(), s.validator.ClientCtx, leaseRes.Bytes())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d84d3a9 and f66b63d.

📒 Files selected for processing (5)
  • _run/.envrc (1 hunks)
  • integration/deployment_test.go (1 hunks)
  • integration/deployment_update_test.go (6 hunks)
  • integration/test_helpers.go (3 hunks)
  • make/test-integration.mk (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • _run/.envrc
🚧 Files skipped from review as they are similar to previous changes (2)
  • integration/deployment_update_test.go
  • integration/test_helpers.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
integration/deployment_test.go (4)
testutil/provider/provider.go (2)
  • TestSendManifest (48-62)
  • RunLocalProvider (107-132)
tools/fromctx/context.go (1)
  • MustKubeClientFromCtx (172-179)
cluster/util/lease_id_to_namespace.go (1)
  • LeaseIDToNamespace (12-20)
integration/test_helpers.go (1)
  • NewObserver (198-204)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: integration-tests / crd-e2e
  • GitHub Check: release-dry-run
  • GitHub Check: lint
  • GitHub Check: tests
  • GitHub Check: build-bins
🔇 Additional comments (3)
integration/deployment_test.go (2)

24-41: Test implementation looks good overall.

The deployment creation logic follows the expected pattern with proper error handling, block waiting, and transaction validation. The test structure aligns well with the PR objectives of observing cluster state during deployment updates.


110-110: Ignore missing import for NewObserver

The NewObserver function is defined in integration/test_helpers.go within the same integration package, so it’s already in scope and requires no additional import.

  • integration/test_helpers.go:198–204 — defines func NewObserver(namespace string) *Observer

Likely an incorrect or invalid review comment.

make/test-integration.mk (1)

28-28: Consider if narrowing test scope is intentional for this PR.

The test execution has been narrowed from running the entire TestIntegrationTestSuite to only TestE2ELeaseShell. While this can be useful for focused development testing, it might not provide comprehensive coverage for CI/CD pipelines.

Verify that this change aligns with the testing strategy - if this is temporary for development, consider reverting it before merging.

@cloud-j-luna
Copy link
Member Author

This issue is stale until I get back. Changes were overwritten with old branch and work is on a machine I don't have access right now. Will be fixed and merged by 10th of July.

@cloud-j-luna cloud-j-luna force-pushed the luna/e2e-workload-restarts branch 2 times, most recently from d84d3a9 to 1b08a34 Compare July 4, 2025 19:50
cloud-j-luna and others added 8 commits July 4, 2025 20:52
Signed-off-by: Joao Luna <[email protected]>
…suer (#299)

refs akash-network/support#302

Signed-off-by: Artur Troian <[email protected]>
Co-authored-by: Artur Troian <[email protected]>
Signed-off-by: Joao Luna <[email protected]>
Signed-off-by: Artur Troian <[email protected]>
Co-authored-by: Artur Troian <[email protected]>
Signed-off-by: Joao Luna <[email protected]>
Signed-off-by: Artur Troian <[email protected]>
Co-authored-by: Artur Troian <[email protected]>
Signed-off-by: Joao Luna <[email protected]>
Signed-off-by: Joao Luna <[email protected]>
Signed-off-by: Joao Luna <[email protected]>
Signed-off-by: Joao Luna <[email protected]>
Signed-off-by: Joao Luna <[email protected]>
@cloud-j-luna cloud-j-luna force-pushed the luna/e2e-workload-restarts branch from 1b08a34 to e2a3d95 Compare July 4, 2025 19:52
@cloud-j-luna cloud-j-luna requested a review from troian July 4, 2025 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chore: add capability of e2e tests to observe provider cluster state and check workload side effects
2 participants