Description
I believe there are a handful of bugs in how unblob handles symlinks. I'm not too confident about my understanding of unblob internals so I wanted to talk through these instead of just proposing fixes.
I've encountered all these failures with this firmware image: https://legacyfiles.us.dlink.com/DCS-5009L/REVA/FIRMWARE/DCS-5009L_REVA_FIRMWARE_1.00.B1.zip. I'm comparing the unblob output to that generated by running binwalk.
Issue 1: CPIO extractor calls create_symlink
with backwards arguments - the source of a symlink is the filename (entry.path) while the destination is where the link points to (I think). Without this fix all the binaries in this firmware that symlink to /bin/busybox
are missing in the output archive (i.e., /sbin/arp
) as _get_checked_link
in file_utils.py sees that /bin/busybox
already exists and skips (checking the destination, not the source, since the arguments are swapped).
diff --git a/unblob/handlers/archive/cpio.py b/unblob/handlers/archive/cpio.py
index eacb30a..fecb5dc 100644
--- a/unblob/handlers/archive/cpio.py
+++ b/unblob/handlers/archive/cpio.py
@@ -223,7 +223,7 @@ class CPIOParserBase:
self.file[entry.start_offset : entry.start_offset + entry.size]
).decode("utf-8")
)
- fs.create_symlink(src=link_path, dst=entry.path)
+ fs.create_symlink(src=entry.path, dst=link_path)
elif (
stat.S_ISCHR(entry.mode)
or stat.S_ISBLK(entry.mode)
Issue 2: create_symlink
creates symlinks backwards, they currently point from dst
to src
. The call to _get_checked_link
has it right and correctly checks if the src
argument already exists. But, the code to actually create the link would previously create a link from dst -> src
instead of from src -> dst
. This would raise a FileExistsError
if Issue 1 is fixed with no other changes.
diff --git a/unblob/file_utils.py b/unblob/file_utils.py
index 167357d..c6b8677 100644
--- a/unblob/file_utils.py
+++ b/unblob/file_utils.py
@@ -593,12 +593,13 @@ class FileSystem:
# but they are relocatable
src = self._path_to_root(dst.parent) / chop_root(src)
- safe_link = self._get_checked_link(src=dst.parent / src, dst=dst)
+ safe_link = self._get_checked_link(src=src, dst=dst)
if safe_link:
- dst = safe_link.dst.absolute_path
- self._ensure_parent_dir(dst)
- dst.symlink_to(src)
+ src = safe_link.src.absolute_path
+ self._ensure_parent_dir(src)
+ logger.warning(f"Creating symlink {dst} -> {src}")
+ src.symlink_to(dst)
def create_hardlink(self, src: Path, dst: Path):
"""Create a new hardlink dst to the existing file src."""
Issue 3: False positives in path traversal detection. This firmware contains a CPIO archive with some valid, non-malicious symlinks, but they're incorrectly flagged as potential path traversals and skipped during extraction. This one has me pretty confused and I'm not sure what the best fix would be.
If I run cpio -itv
on the CPIO archive, I see a valid symlink for init to busybox:
lrwxrwxrwx 1 501 501 15 May 14 2014 /sbin/init -> ../bin/busybox
But during unblob extraction, I get a warning and the symlink is skipped:
2024-02-12 05:20.25 [warning ] Potential path traversal through link Skipped. link_path=sbin/init path=../bin/busybox
This happens for ~20 binaries in this firmware that have relative symlink to busybox
that involve ..
as part of the paths. I believe this is in part caused by the FSLink constructor failing to create self.dst
based on the self.src
path, which can maybe be fixed with:
diff --git a/unblob/file_utils.py b/unblob/file_utils.py
index c6b8677..e03d4ec 100644
--- a/unblob/file_utils.py
+++ b/unblob/file_utils.py
@@ -425,7 +425,7 @@ class _FSPath:
class _FSLink:
def __init__(self, *, root: Path, src: Path, dst: Path) -> None:
- self.dst = _FSPath(root=root, path=dst)
+ self.dst = _FSPath(root=root, path=root / src.parent / dst)
self.src = _FSPath(root=root, path=src)
self.is_safe = self.dst.is_safe and self.src.is_safe
With this fix, all the files except 2 (e.g., etc_ro/ppp/peers/3g -> /etc/3g
)are then created successfully - these remaining 2 have end up with ..
in the src field incorrectly after the body of the is_absolute
check in create_symlink
which incorrectly checks src
instead of dst
. This can be fixed with:
diff --git a/unblob/file_utils.py b/unblob/file_utils.py
index e03d4ec..5db58d4 100644
--- a/unblob/file_utils.py
+++ b/unblob/file_utils.py
@@ -587,11 +587,21 @@ class FileSystem:
"""Create a symlink dst with the link/content/target src."""
logger.debug("creating symlink", file_path=dst, link_target=src, _verbosity=3)
- if src.is_absolute():
- # convert absolute paths to dst relative paths
- # these would point to the same path if self.root would be the real root "/"
- # but they are relocatable
- src = self._path_to_root(dst.parent) / chop_root(src)
+ if dst.is_absolute():
+ # If the symlink destination is absolute, we need to make it relative to the root
+ # so it can be safely created in the extraction directory.
+ # If the resulting path points to outside of the extraction directory, we skip it.
+ dst = self.root / chop_root(dst)
+ if not is_safe_path(self.root, dst):
+ self.record_problem(
+ LinkExtractionProblem(
+ problem="Potential path traversal through symlink",
+ resolution="Skipped.",
+ path=str(dst),
+ link_path=str(src),
+ )
+ )
+ return