Description
Describe the bug
During extraction of at least 29 NETGEAR firmware images, unblob may try to create the same output file twice, which raises an exception. As a result, some files that should be extracted are not.
To Reproduce
Steps to reproduce the behavior:
- Download a sample firmware image that triggers the bug:
wget https://www.downloads.netgear.com/files/GDC/M4100/M4100-V10.0.2.20.zip
- Run unblob on it:
unblob -v M4100-V10.0.2.20.zip
- Observe the error:
2024-02-10 23:39.13 [error ] Unknown error happened while extracting chunk pid=2295991
Traceback (most recent call last):
  File "/unblob/unblob/processing.py", line 607, in _extract_chunk
    if result := chunk.extract(inpath, extract_dir):
  File "/unblob/unblob/models.py", line 115, in extract
    return self.handler.extract(inpath, outdir)
  File "/unblob/unblob/models.py", line 452, in extract
    return self.EXTRACTOR.extract(inpath, outdir)
  File "/unblob/unblob/handlers/archive/cpio.py", line 384, in extract
    parser.dump_entries(fs)
  File "/unblob/unblob/handlers/archive/cpio.py", line 215, in dump_entries
    fs.carve(entry.path, self.file, entry.start_offset, entry.size, mode=entry.mode & 0o777)
  File "/unblob/unblob/file_utils.py", line 511, in carve
    carve(safe_path, file, start_offset, size, mode=mode)
  File "/unblob/unblob/file_utils.py", line 294, in carve
    with carve_path.open("xb") as f:
  File "/usr/lib/python3.10/pathlib.py", line 1119, in open
    return self._accessor.open(self, mode, buffering, encoding, errors,
FileExistsError: [Errno 17] File exists: '/tmp/tmp1151iav4/M4100_V10.0.2.20.zip_extract/m4100v10.0.2.20.stk_extract/1201148-2097967.lzma_extract/lzma.uncompressed_extract/lib/libthread_db-1.0.so'
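The traceback ends in `Path.open("xb")`. Mode `x` means exclusive creation, so the second attempt to carve a path that already exists fails, presumably because two cpio entries resolve to the same output path. The stand-alone sketch below reproduces just that mechanism. `carve_bytes` and `carve_twice` are hypothetical helpers that mimic only the `open("xb")` step of unblob's `carve`; they are not unblob's actual code.

```python
import tempfile
from pathlib import Path


def carve_bytes(carve_path: Path, data: bytes) -> None:
    """Hypothetical stand-in for unblob's carve(): exclusive-create, then write."""
    carve_path.parent.mkdir(parents=True, exist_ok=True)
    # "x" = exclusive creation: raises FileExistsError if the path already exists.
    with carve_path.open("xb") as f:
        f.write(data)


def carve_twice(root: Path) -> str:
    """Carve the same relative path twice, as a duplicated archive entry would."""
    target = root / "lib" / "libthread_db-1.0.so"
    carve_bytes(target, b"first")
    try:
        carve_bytes(target, b"second")
    except FileExistsError as exc:
        return type(exc).__name__
    return "no error"


with tempfile.TemporaryDirectory() as tmp:
    print(carve_twice(Path(tmp)))  # prints "FileExistsError"
```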
Expected behavior
This error should not be raised; instead, the affected files should be extracted. I made a simple change to the carve function in file_utils.py (see below) so that it returns early if the target file already exists. With this change, an extra 75 files are created in [extract_dir]/m4100v10.0.2.20.stk_extract/1201148-2097967.lzma_extract/lzma.uncompressed_extract. I doubt this is the right fix, but it shows that this bug prevents some files from being extracted.
Environment information:
- OS: Ubuntu 22.04
- Docker
Linux b4935d734f27 6.2.2 #3 SMP PREEMPT_DYNAMIC Wed Mar 8 12:03:22 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
The following executables found installed, which are needed by unblob:
7z ✓
debugfs ✓
jefferson ✓
lz4 ✓
lziprecover ✓
lzop ✓
sasquatch ✓
sasquatch-v4be ✓
simg2img ✓
ubireader_extract_files ✓
ubireader_extract_images ✓
unar ✓
zstd ✓
Additional context
I found this bug while doing some large-scale evaluations of filesystems produced by binwalk and unblob using fw2tar.
My (likely incorrect) patch that results in additional files being created:
diff --git a/unblob/file_utils.py b/unblob/file_utils.py
index 21e887b..3db4b98 100644
--- a/unblob/file_utils.py
+++ b/unblob/file_utils.py
@@ -291,6 +291,9 @@ def carve(carve_path: Path, file: File, start_offset: int, size: int):
     """Extract part of a file."""
     carve_path.parent.mkdir(parents=True, exist_ok=True)
+    if carve_path.exists():
+        print(f"Warning not replacing {carve_path}")
+        return
     with carve_path.open("xb") as f:
         for data in iterate_file(file, start_offset, size):
             f.write(data)
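One detail of the patch above: `Path.exists()` follows symlinks, so if an earlier entry left a dangling symlink at `carve_path`, the check reports the path as free and `open("xb")` still raises `FileExistsError` (exclusive creation fails on any existing name, symlink or not). The sketch below is a minimal alternative that keeps the same "keep the first copy" policy but lets the exclusive open itself act as the existence check, which also removes the gap between checking and creating. `carve_skip_existing` is a hypothetical name, and the sketch takes raw bytes instead of unblob's `File`/offset/size arguments.

```python
import tempfile
from pathlib import Path


def carve_skip_existing(carve_path: Path, data: bytes) -> bool:
    """Hypothetical variant of the patch: keep the first copy, skip later ones.

    Returns True if the file was written, False if the path was already taken.
    """
    carve_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # The exclusive open is the existence check: no window between
        # "check" and "create", and it also trips on dangling symlinks,
        # which Path.exists() reports as missing.
        f = carve_path.open("xb")
    except FileExistsError:
        print(f"Warning: not replacing {carve_path}")
        return False
    with f:
        f.write(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "lib" / "libthread_db-1.0.so"
    print(carve_skip_existing(target, b"first"))   # True
    print(carve_skip_existing(target, b"second"))  # prints the warning, then False
    print(target.read_bytes())                     # b'first'
```

Whether keeping the first copy (rather than overwriting or reporting the collision) is the right policy remains the open question raised in the report; this sketch only makes the skip itself robust.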
After applying this, I got another error in the same vein in file_utils.py, which I patched with:
diff --git a/unblob/file_utils.py b/unblob/file_utils.py
index 21e887b..3db4b98 100644
--- a/unblob/file_utils.py
+++ b/unblob/file_utils.py
@@ -579,7 +582,8 @@ class FileSystem:
         if safe_link:
             dst = safe_link.dst.absolute_path
             self._ensure_parent_dir(dst)
-            dst.symlink_to(src)
+            if not dst.exists():
+                dst.symlink_to(src)
 
     def create_hardlink(self, src: Path, dst: Path):
         """Create a new hardlink dst to the existing file src."""
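The same symlink caveat applies to the second guard. If an earlier entry already created a symlink at `dst` whose target has not been extracted (yet), `dst.exists()` returns `False`, because it follows the link, and `symlink_to()` still raises `FileExistsError`. The sketch below catches the error from `symlink_to()` instead of checking first. `create_symlink_if_absent` is a hypothetical helper, not unblob's `FileSystem.create_symlink`, and it skips unblob's safe-link checks entirely.

```python
import os
import tempfile
from pathlib import Path


def create_symlink_if_absent(src: Path, dst: Path) -> bool:
    """Hypothetical helper: create dst -> src unless something already occupies dst."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        # symlink_to() raises FileExistsError for any existing name at dst,
        # including a dangling symlink that dst.exists() would miss.
        dst.symlink_to(src)
    except FileExistsError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    dst = Path(tmp) / "lib" / "libfoo.so"
    create_symlink_if_absent(Path("libfoo.so.1"), dst)   # target not extracted: dangling
    print(dst.exists())                                  # False: exists() follows the link
    print(os.path.lexists(dst))                          # True: lexists() does not
    print(create_symlink_if_absent(Path("other"), dst))  # False: first link is kept
```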