-
Notifications
You must be signed in to change notification settings - Fork 0
Yaron/eng 34182 change ib kubernetes to add tenant ports to weka ip over ib #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Yaron/eng 34182 change ib kubernetes to add tenant ports to weka ip over ib #3
Conversation
pkg/k8s-client/client.go
Outdated
|
||
// Update the Pod | ||
_, err = c.clientset.CoreV1().Pods(pod.Namespace).Update( | ||
context.Background(), currentPod, metav1.UpdateOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's updating the pod. Do it with RetryOnConflict?
I wasn't maintaining the values file - it's maintained in tcloud-infra
separately
…On Tue, Jul 29, 2025 at 1:20 PM syutogether ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In deployment/ib-kubernetes-configmap.yaml
<#3 (comment)>
:
> @@ -8,3 +8,5 @@ data:
DAEMON_PERIODIC_UPDATE: "5"
GUID_POOL_RANGE_START: "02:00:00:00:00:00:00:00"
GUID_POOL_RANGE_END: "02:FF:FF:FF:FF:FF:FF:FF"
+ # DEFAULT_LIMITED_PARTITION: "0x0001" # optional
+ ENABLE_IP_OVER_IB: "false" # default false
Both the optional DEFAULT_LIMITED_PARTITION and the default
ENABLE_IP_OVER_IB can be moved to the values.yaml. For the optional one,
we can comment it out in the values.yaml. For the default one, we keep it
uncommented in the values.yaml.
—
Reply to this email directly, view it on GitHub
<#3 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF6U6VED6VKXJOEQFCNYU5T3K7JPJAVCNFSM6AAAAACCU2KKR6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTANRZGA3TSMZTGI>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
tag_build_and_push.sh
Outdated
@@ -0,0 +1,47 @@ | |||
#! /bin/bash | |||
TAG=v1.1.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image tag is hardcoded. That does not look right. Maybe make it a parameter?
…ted GUIDs on restart
I see. It's not a chart in the ib-kubernetes repo. |
we don't have a proper build for this repo yet - I'm tagging manually
because that will be the stable release we'll be rolling out (also set in
tcloud-infra).
…On Tue, Jul 29, 2025 at 1:36 PM syutogether ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In tag_build_and_push.sh
<#3 (comment)>
:
> @@ -0,0 +1,47 @@
+#! /bin/bash
+TAG=ENG-34182
There is a scripts folder
<https://github.com/togethercomputer/ib-kubernetes/tree/master/scripts>
in the ib-kubernetes repo. Looks like that's the right place to put the
tag_build_and_push.sh.
And like in the deploy.sh
<https://github.com/togethercomputer/ib-kubernetes/blob/master/scripts/deploy.sh>,
we can TAG as a parameter to the script.
—
Reply to this email directly, view it on GitHub
<#3 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF6U6VEAVR33AJ3FVKJFBBL3K7LLFAVCNFSM6AAAAACCU2KKR6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTANRZGE2DANRYG4>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me, thanks Yaron!
1b19081
to
71d1aaa
Compare
In order to isolate tenant VMs in their own pkey, and also allow them VMs to access a Weka cluster using their IP-over-IB ports securely, we need to be able to add the ports as limited members of the Weka partition (configured as "default partition"), and remove them when VMs are terminated and GUIDs are deleted.
Also had to fix the case where a tenant cluster has multiple VMs - since the code added a finalizer to the NAD - and there's only 1 NAD per cluster - upon cluster deletion - the operator only had the chance to act on it and remove GUIDs from UFM for 1 of the VMs. When the other was terminated, it couldn't find a matching NAD and couldn't clean up the GUIDs.
So added finalizer to the pods, and a check whether a NAD has pods attached to it, before removing it.