fix some memleak #13306

Open: wants to merge 2 commits into base: main
6 changes: 0 additions & 6 deletions ompi/mca/op/avx/op_avx_component.c
@@ -298,12 +298,6 @@ avx_component_op_query(struct ompi_op_t *op, int *priority)
}
}
#endif
-if( NULL != module->opm_fns[i] ) {
-OBJ_RETAIN(module);
-}
-if( NULL != module->opm_3buff_fns[i] ) {
-OBJ_RETAIN(module);
-}
}
Member

This change is correct: the module is already retained once for each function used (during *_enable).
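
The counting rule behind this comment can be illustrated with a toy reference counter. This is only a sketch: `toy_module_t`, `toy_retain`, `toy_release`, and `toy_lifecycle` are hypothetical names, not the OPAL object system, and the teardown order (one release per enabled function, then one by the creator) is an assumption. It shows why a second retain per function in the query path would keep the module alive forever.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted module (not OPAL code). */
typedef struct {
    int refcount;
} toy_module_t;

static int toy_live_modules = 0;   /* modules allocated and not yet freed */

toy_module_t *toy_module_new(void)
{
    toy_module_t *m = calloc(1, sizeof(*m));
    if (NULL == m) {
        return NULL;
    }
    m->refcount = 1;               /* reference owned by the creator */
    toy_live_modules++;
    return m;
}

void toy_retain(toy_module_t *m)
{
    m->refcount++;
}

void toy_release(toy_module_t *m)
{
    if (0 == --m->refcount) {
        toy_live_modules--;
        free(m);
    }
}

/* Simulates query, enable, and teardown for a module that provides nfns
 * functions. Returns how many modules this call left alive: 0 means the
 * counts balanced, 1 means the module leaked. */
int toy_lifecycle(int nfns, int retain_in_query)
{
    int before = toy_live_modules;
    toy_module_t *m = toy_module_new();
    if (NULL == m) {
        return -1;
    }
    for (int i = 0; i < nfns; i++) {
        if (retain_in_query) {
            toy_retain(m);         /* the extra retain the patch removes */
        }
        toy_retain(m);             /* retain taken when the function is enabled */
    }
    for (int i = 0; i < nfns; i++) {
        toy_release(m);            /* assumed: one release per enabled function */
    }
    toy_release(m);                /* creator drops its own reference */
    return toy_live_modules - before;
}
```

With two functions, `toy_lifecycle(2, 0)` balances and frees the module, while `toy_lifecycle(2, 1)` leaves it allocated because two retains are never matched by a release.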

break;
case OMPI_OP_BASE_FORTRAN_LAND:
8 changes: 8 additions & 0 deletions opal/mca/common/ucx/common_ucx.c
@@ -73,6 +73,7 @@ OPAL_DECLSPEC void opal_common_ucx_mca_var_register(const mca_base_component_t *
{
char *default_tls = "rc_verbs,ud_verbs,rc_mlx5,dc_mlx5,ud_mlx5,cuda_ipc,rocm_ipc";
char *default_devices = "mlx*";
+char *old_str = NULL;
int hook_index;
int verbose_index;
int progress_index;
@@ -113,6 +114,7 @@ OPAL_DECLSPEC void opal_common_ucx_mca_var_register(const mca_base_component_t *
if (NULL == *opal_common_ucx.tls) {
*opal_common_ucx.tls = strdup(default_tls);
}
+old_str = *opal_common_ucx.tls;

tls_index = mca_base_var_register(
"opal", "opal_common", "ucx", "tls",
@@ -123,6 +125,7 @@ OPAL_DECLSPEC void opal_common_ucx_mca_var_register(const mca_base_component_t *
"please set to '^posix,sysv,self,tcp,cma,knem,xpmem'.",
MCA_BASE_VAR_TYPE_STRING, NULL, 0, MCA_BASE_VAR_FLAG_SETTABLE | MCA_BASE_VAR_FLAG_DWG,
OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_LOCAL, opal_common_ucx.tls);
+free(old_str);

if (NULL == opal_common_ucx.devices) {
opal_common_ucx.devices = (char**) malloc(sizeof(char*));
@@ -132,13 +135,15 @@ OPAL_DECLSPEC void opal_common_ucx_mca_var_register(const mca_base_component_t *
if (NULL == *opal_common_ucx.devices) {
*opal_common_ucx.devices = strdup(default_devices);
}
+old_str = *opal_common_ucx.devices;

devices_index = mca_base_var_register(
"opal", "opal_common", "ucx", "devices",
"List of device driver pattern names, which, if supported by UCX, will "
"bump its priority above ob1. Special values: any (any available)",
MCA_BASE_VAR_TYPE_STRING, NULL, 0, MCA_BASE_VAR_FLAG_SETTABLE | MCA_BASE_VAR_FLAG_DWG,
OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_LOCAL, opal_common_ucx.devices);
+free(old_str);

if (component) {
mca_base_var_register_synonym(verbose_index, component->mca_project_name,
@@ -206,6 +211,9 @@ OPAL_DECLSPEC void opal_common_ucx_mca_deregister(void)
}
opal_mem_hooks_unregister_release(opal_common_ucx_mem_release_cb);
opal_output_close(opal_common_ucx.output);
+if (opal_common_ucx.opal_mem_hooks) {
+mca_base_framework_close(&opal_memory_base_framework);
+}
}

#if HAVE_DECL_OPEN_MEMSTREAM
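
The `old_str` changes in this file only make sense if the registration call replaces the storage pointer with its own copy and leaves the caller's string untouched. That behavior is an assumption inferred from the patch, not something the diff states. Under that assumption, the ownership pattern can be sketched as follows; every `toy_*` name is hypothetical, and `toy_register_string` merely stands in for the real registration call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int toy_outstanding = 0;    /* strings allocated and not yet freed */

char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (NULL != copy) {
        memcpy(copy, s, n);
        toy_outstanding++;
    }
    return copy;
}

void toy_free(char *s)
{
    if (NULL != s) {
        toy_outstanding--;
        free(s);
    }
}

int toy_outstanding_strings(void)
{
    return toy_outstanding;
}

/* Hypothetical registry call: stores its own copy in *storage and does
 * NOT free the string the caller had placed there (assumed behavior). */
int toy_register_string(char **storage)
{
    char *copy = toy_strdup(*storage);
    if (NULL == copy) {
        return -1;
    }
    *storage = copy;
    return 0;
}

/* Mirrors the shape of the patched code: remember the default string,
 * register, then free the remembered pointer if free_old is set. */
int toy_setup(char **storage, const char *default_value, int free_old)
{
    char *old_str;

    if (NULL == *storage) {
        *storage = toy_strdup(default_value);
        if (NULL == *storage) {
            return -1;
        }
    }
    old_str = *storage;
    if (0 != toy_register_string(storage)) {
        return -1;
    }
    if (free_old) {
        toy_free(old_str);         /* the step the patch adds */
    }
    return 0;
}
```

Calling `toy_setup` with `free_old` set to 0 leaves two strings allocated for one variable, one of which is no longer reachable; with `free_old` set to 1 only the registry's copy remains.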
2 changes: 1 addition & 1 deletion opal/mca/patcher/base/patcher_base_frame.c
@@ -90,7 +90,7 @@ static int opal_patcher_base_close(void)
return opal_patcher->patch_fini();
}

-return OPAL_SUCCESS;
Member

I am really very skeptical about this change. While logically it sounds reasonable, the patcher is a very special module: it changes the way memory allocations and deallocations are tracked, and I'm definitely not sure we can unload it. I personally would be against this change without proper testing, but I defer to others for additional insights.

Author

It just releases component resources, and patch_list has already been released at line 86.

Member

That's exactly my concern. The patcher works by interposition, i.e., it interposes function calls to sbrk or similar APIs. If the shared library where these functions are defined goes out of scope and is unloaded, bad things will happen (as it will call a function in a memory area that has been released and possibly reused).

Member

@hjelmn is the one who should really chime in on this. Looking at the code, all the right bits are there such that unloading the component should work. But clearly we're not testing it.
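
The ordering hazard discussed in this thread can be sketched without any real dynamic loading. The example below is a simplified model, not the patcher's implementation: a function pointer plays the role of the patched call site, a flag plays the role of the component being mapped, and all `toy_*` names are hypothetical. It shows one conservative design, where closing the component is refused until the interposition has been undone.

```c
#include <assert.h>

typedef int (*toy_fn_t)(int);

/* The "real" function that callers reach when nothing is patched. */
int toy_original(int x)
{
    return x + 1;
}

/* Imagine this function lives inside the loadable component. */
int toy_interposed(int x)
{
    return toy_original(x) * 10;
}

static toy_fn_t toy_active = toy_original;  /* target every caller jumps to */
static int toy_loaded = 0;                  /* 1 while the component is "mapped" */

void toy_component_open(void)
{
    toy_loaded = 1;
}

void toy_patch_init(void)
{
    assert(toy_loaded);             /* can only patch with code that is mapped */
    toy_active = toy_interposed;
}

void toy_patch_fini(void)
{
    toy_active = toy_original;      /* undo the interposition */
}

/* Refuses to "unload" while callers would still jump into the component. */
int toy_component_close(void)
{
    if (toy_active == toy_interposed) {
        return -1;
    }
    toy_loaded = 0;
    return 0;
}

int toy_call(int x)
{
    return toy_active(x);
}
```

In this model, closing succeeds only after `toy_patch_fini` has restored the original target. In a real process the failure mode would be a jump into unmapped or reused memory rather than a clean error code, which is why the thread asks for testing before the component is allowed to unload.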

+return mca_base_framework_components_close(&opal_patcher_base_framework, NULL);
}

/* Use default register/open functions */