Confused about how NullLevel works #2745
Hello! I am working on a linkage project using splink. In my data I can use fields for different levels of a person's location (street address, postcode, city, state and country). Because a match at a lower level implies a match at the higher levels (if two people share the same address, they also share the same postcode), I want to avoid over-inflating the match results. To do this, I am creating a custom comparison for location fields that resembles the following:

In the documentation for custom comparisons, the function NullLevel is referenced for situations where one or both of the matching fields are null, but I am confused about how this function works. Since the missingness of my location data is not consistent (a person may have a street value but not a postcode value, and vice versa), it would be useful to be able to do a null check for each field in the comparison, but I cannot tell whether this is supported.

Could someone please give a basic overview of how the NullLevel function works? I would specifically like to know:
Thank you!
The key things to know/remember with comparison levels are:
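So, in answer to your first question: yes, if you have a first level of … You can combine levels using `cll.And`. For your use case you would probably want a null level at the top, like:

```python
import splink.comparison_level_library as cll

comparison_levels = [
    # Null level: every one of the five location fields is missing
    # in at least one of the two records being compared
    cll.And(
        cll.NullLevel("street"),
        cll.NullLevel("postcode"),
        cll.NullLevel("city"),
        cll.NullLevel("state"),
        cll.NullLevel("country"),
    ),
    ...  # your remaining levels go here
]
```

This captures any record pair in which none of the five fields is present in both records. In other words, it captures all instances where, for every one of the five fields, at least one of the two records is missing a value.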
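To make the null level more concrete, here is a rough plain-Python sketch. As far as I know, `cll.NullLevel("street")` corresponds to a SQL condition along the lines of `street_l IS NULL OR street_r IS NULL`, and `cll.And` joins its inputs with `AND`. The helpers `null_level_sql`, `and_sql` and `is_null_pair` are hypothetical stand-ins for illustration only, not part of splink, and the exact SQL splink generates may differ.

```python
# Hypothetical helpers approximating the SQL behind the null level above;
# splink's actual generated SQL may differ in detail.
LOCATION_FIELDS = ["street", "postcode", "city", "state", "country"]


def null_level_sql(col):
    # Roughly what a null level on one column checks: either side is null
    return f"{col}_l IS NULL OR {col}_r IS NULL"


def and_sql(*conditions):
    # Combine conditions with AND, bracketing each so the inner ORs stay grouped
    return " AND ".join(f"({c})" for c in conditions)


combined = and_sql(*(null_level_sql(c) for c in LOCATION_FIELDS))
print(combined)


def is_null_pair(left, right, fields=LOCATION_FIELDS):
    # Same logic in Python: for every field, at least one record lacks a value
    return all(left.get(f) is None or right.get(f) is None for f in fields)


left = {"street": "1 High St", "city": "Leeds"}
right = {"postcode": "LS1 1AA", "country": "UK"}
print(is_null_pair(left, right))  # True: no field is populated in both records
print(is_null_pair(left, {**right, "city": "Leeds"}))  # False: city present in both
```

Note the brackets: in SQL, `AND` binds more tightly than `OR`, so without them the combined condition would mean something quite different.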
More complex configurations are possible, but unless you are building cross-field comparison levels they shouldn't be necessary. A couple of other points (apologies if you are already aware of these and they are just omissions in the example):
One thing to consider is that your levels do not distinguish between whether preceding fields are
and your 'else' level will capture record pairs where:
and similarly for the further levels. I'm not sure what your data is like, but you may want to consider adding more levels so that you can distinguish some of these cases. You probably wouldn't need every possible case, and it may well be that what you have works well enough for your data; it's just something to bear in mind.
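As a rough illustration of adding levels that separate "missing" from "different", here is a plain-Python sketch of how an ordered list of levels is evaluated: like a SQL `CASE WHEN` statement, each record pair is assigned to the first level whose condition is true. The predicates (`missing`, `exact`, `differs`, `both`), the level labels and `assign_level` are all hypothetical. In splink itself you would express conditions like these as comparison levels (for example by combining `cll.NullLevel` and `cll.ExactMatchLevel` with `cll.And`), so check the docs for the exact form.

```python
# Hypothetical sketch: mimics how an ordered list of comparison levels is
# evaluated (first matching level wins, like SQL CASE WHEN). Not splink's API.
FIELDS = ["street", "postcode", "city", "state", "country"]


def missing(field):
    # True if the value is null in at least one of the two records
    return lambda left, right: left.get(field) is None or right.get(field) is None


def exact(field):
    # True if both records have the value and it is identical
    return lambda left, right: (
        left.get(field) is not None and left.get(field) == right.get(field)
    )


def differs(field):
    # True if both records have the value but it differs
    return lambda left, right: (
        not missing(field)(left, right) and left[field] != right[field]
    )


def both(*preds):
    # Logical AND of several predicates
    return lambda left, right: all(p(left, right) for p in preds)


LEVELS = [
    ("all null", lambda left, right: all(missing(f)(left, right) for f in FIELDS)),
    ("street match", exact("street")),
    ("postcode match, street missing", both(missing("street"), exact("postcode"))),
    ("postcode match, street differs", both(differs("street"), exact("postcode"))),
    ("city match", exact("city")),
]


def assign_level(left, right, levels=LEVELS):
    # Return the label of the first level whose condition holds
    for label, condition in levels:
        if condition(left, right):
            return label
    return "else"


a = {"street": "1 High St", "postcode": "LS1 1AA", "city": "Leeds"}
b = {"street": None, "postcode": "LS1 1AA", "city": "Leeds"}
c = {"street": "2 Low Rd", "postcode": "LS1 1AA", "city": "Leeds"}
d = {"street": "9 Other Way", "postcode": "M1 1AA", "city": "Manchester"}

for other in (b, c, d):
    print(assign_level(a, other))
```

Here the two postcode-match cases land in separate levels, so they could be given different match weights. Whether that extra granularity is worth it depends on how your data is distributed.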