Skip to content

Add Check for empty parts to OpenXmlValidator #1920

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 37 commits into
base: main
Choose a base branch
from

Conversation

mikeebowen
Copy link
Collaborator

@mikeebowen mikeebowen commented Apr 23, 2025

The issue was not that there was no way to check for a minimally valid document, the issue was that the validator considered completely empty parts to be valid. So this issue was resolved by adding a check for empty parts to the OpenXmlValidator.

Copy link

github-actions bot commented Apr 23, 2025

Test Results

    58 files   -   1      58 suites   - 1   54m 55s ⏱️ + 3m 5s
 2 060 tests +  1   2 057 ✅ +  1   3 💤 ±0  0 ❌ ±0 
32 334 runs   - 164  32 298 ✅  - 164  36 💤 ±0  0 ❌ ±0 

Results for commit 31aec5a. ± Comparison against base commit b1d49f3.

♻️ This comment has been updated with latest results.

@mikeebowen mikeebowen requested review from twsouthwick, Copilot and tomjebo and removed request for Copilot April 23, 2025 19:19
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces the IsMinimumDocument method to WordprocessingDocument, SpreadsheetDocument, and PresentationDocument to validate that a file meets minimal structural and extension requirements before further processing.

  • Added IsMinimumDocument method for WordprocessingDocument with path, extension, and Body element checks
  • Added IsMinimumDocument method for SpreadsheetDocument with validations for sheets and sheet data and handling for unsupported document types
  • Added IsMinimumDocument method for PresentationDocument with validation for NotesSize element and restrictions on unsupported document types

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/DocumentFormat.OpenXml/Packaging/WordprocessingDocument.cs Added validation method ensuring file existence, correct extension, and Body element presence
src/DocumentFormat.OpenXml/Packaging/SpreadsheetDocument.cs Introduced method validation for SpreadsheetDocument with extension checks and element validations
src/DocumentFormat.OpenXml/Packaging/PresentationDocument.cs Added a validation method ensuring PresentationDocument contains valid NotesSize and proper extension

Copy link
Member

@twsouthwick twsouthwick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great idea. However, it would be great for it to have the following characteristics:

  • Be overrideable/replaceable
  • Be accessed off of OpenXmlPackage directly rather than implemented separately

Questions that would decide the API shape:

  • Should this be a separate API? Or should it be opt in while opening a package? I tend not to be a fan of APIs that check if something is valid then doing it - there's always the potential something gets changed underneath
  • Do we really want to validate the extension? What about if it's a stream/package? If it is overrideable, we could have a check if we've opened it with a path to validate that if needed
  • Are we OK with it throwing an exception if it is an ill-formed package (i.e. we already do that)?

@mikeebowen mikeebowen marked this pull request as draft April 29, 2025 23:25
@mikeebowen
Copy link
Collaborator Author

mikeebowen commented May 6, 2025

@twsouthwick

Be overrideable/replaceable

Created virtual method in OpenXmlPackage

Be accessed off of OpenXmlPackage directly rather than implemented separately

There is a default implementation in OpenXmlPackage but the individual Wordprocessing, Spreadsheet, and Presentation Documents have unique implementations for their different requirements.

Should this be a separate API? Or should it be opt in while opening a package? I tend not to be a fan of APIs that check if something is valid then doing it - there's always the potential something gets changed underneath

Added VerifyMinimumPackage option to OpenSettings when true, the package contents are examined and throw an exception if minimum package requirements are not met

Do we really want to validate the extension? What about if it's a stream/package? If it is overrideable, we could have a check if we've opened it with a path to validate that if needed

The minimum package is verified on all Open overloads with VerifyMinimumPackage true. The Open method handles exceptions for the file path such as invalid extension or invalid characters, so with the new API validating the path is not necessary.

Are we OK with it throwing an exception if it is an ill-formed package (i.e. we already do that)?

Removed file checks. The Open method already handles those.

@mikeebowen mikeebowen marked this pull request as ready for review May 6, 2025 16:42
@mikeebowen mikeebowen requested a review from twsouthwick May 6, 2025 16:43
@twsouthwick
Copy link
Member

@mikeebowen this would be a good use case for a feature so it can be overridden at runtime vs compile time. Here's an example of how to do it: fa893dd

@mikeebowen mikeebowen changed the title Add IsMinimumDocument method to WordprocessingDocument, SpreadsheetDocument, and PresentationDocument Add VerifyMinimumPackage to OpenSettings options May 12, 2025
@twsouthwick
Copy link
Member

A few questions about the direction/goal here

Only a few kinds of packages are supported (such as in the presentation doc) - it will fail if it's an unsupported one

Is this what we want? I'd expect it to validate it but fail if it is invalid. It appears to also fail if we don't have a check which seems a little unexpected.

Validate() throws and returns true/false

I'd expect it to do one or the other. Is the manual throwing due to custom messages? If so, let's make it consistent

Mixture of exception types

I think we should use an existing OpenXmlPackageException type or create a new one for this

How to think about this vs validation

At some point, how is this different from the existing OpenXmlValidator logic? Should we be hooking this up to that and performing a full validation?

@mikeebowen mikeebowen changed the title Add VerifyMinimumPackage to OpenSettings options Add Check for empty parts to OpenXmlValidator Jun 13, 2025
@@ -106,6 +107,20 @@ private void ValidatePart(OpenXmlPart part, ValidationContext context)
{
Validate(context);
}
else if (part.Uri.ToString().EndsWith(".xml", System.StringComparison.InvariantCultureIgnoreCase) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to dispose the stream from getpart. probably best to add an extension method of ".IsEmptyPart()" that can handle correctly disposing the stream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is it possible to check if a file is a valid document and not some other file type renamed to .docx or some other office extension?
3 participants