fix(skore): convert DataFrame column names to string type #2034
+75
−0
Closes #2029
CC: @thomass-dev
While debugging, I traced the problem to the `data_accessor.py` file:
`skore/skore/src/skore/_sklearn/_estimator/data_accessor.py`, lines 49 to 67 in eb0d6e9
This method only converts NumPy arrays to DataFrames with string column names; it doesn't handle the case where the input is already a DataFrame whose column names are integers.
The DataFrame is then passed down to the skrub functions (which expect string column names), and the final error
`TypeError: cannot use a string pattern on a bytes-like object`
occurs because `suggested_name` is an integer from the `RangeIndex` columns, while `tag_pattern` is a string regex pattern, and `re.findall(tag_pattern, suggested_name)` expects both arguments to be strings or both to be bytes-like objects.

I tried to overcome this by ensuring that an existing DataFrame also gets string column names, to avoid issues in skrub later when the data is passed down from `data_accessor.py`.

Alternatively, should I simply raise an exception if the computation fails in subsequent steps instead?
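
For reviewers, the idea behind the conversion can be sketched roughly as below. This is only a minimal illustration, not the exact code in this PR: `ensure_string_columns` is a hypothetical helper name, and the regex is a stand-in for skrub's actual `tag_pattern`.

```python
import re

import pandas as pd


def ensure_string_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` whose column labels are all `str`.

    Hypothetical helper for illustration only; not part of skore or skrub.
    """
    out = df.copy()
    out.columns = [str(col) for col in out.columns]
    return out


# A DataFrame built without explicit column names gets an integer RangeIndex.
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
fixed = ensure_string_columns(df)

# Once the labels are strings, regex functions accept them without a TypeError.
# The pattern here is a stand-in, not skrub's real `tag_pattern`.
matches = [re.findall(r"\d+", name) for name in fixed.columns]
```

Copying before renaming keeps the caller's DataFrame untouched, which seemed safer than mutating the user's input in place.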