Add Check for empty parts to OpenXmlValidator #1920
base: main
Pull Request Overview
This PR introduces the IsMinimumDocument method to WordprocessingDocument, SpreadsheetDocument, and PresentationDocument to validate that a file meets minimal structural and extension requirements before further processing.
- Added IsMinimumDocument method for WordprocessingDocument with path, extension, and Body element checks
- Added IsMinimumDocument method for SpreadsheetDocument with validations for sheets and sheet data and handling for unsupported document types
- Added IsMinimumDocument method for PresentationDocument with validation for NotesSize element and restrictions on unsupported document types
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/DocumentFormat.OpenXml/Packaging/WordprocessingDocument.cs | Added validation method ensuring file existence, correct extension, and Body element presence |
| src/DocumentFormat.OpenXml/Packaging/SpreadsheetDocument.cs | Introduced validation method for SpreadsheetDocument with extension checks and element validations |
| src/DocumentFormat.OpenXml/Packaging/PresentationDocument.cs | Added a validation method ensuring PresentationDocument contains a valid NotesSize element and the proper extension |
This is a great idea. However, it would be great for it to have the following characteristics:
- Be overrideable/replaceable
- Be accessed off of OpenXmlPackage directly rather than implemented separately
Questions that would decide the API shape:
- Should this be a separate API? Or should it be opt-in while opening a package? I tend not to be a fan of APIs that check whether something is valid and then act on it; there's always the potential that something gets changed underneath
- Do we really want to validate the extension? What if it's a stream/package? If it is overrideable, we could check whether the package was opened with a path and validate the extension only in that case
- Are we OK with it throwing an exception if it is an ill-formed package (i.e. we already do that)?
Created virtual method in
There is a default implementation in
Added
The minimum package is verified on all Open overloads with
Removed file checks. The Open method already handles those.
@mikeebowen this would be a good use case for a feature, so it can be overridden at runtime rather than at compile time. Here's an example of how to do it: fa893dd
use feature for minimum document
A few questions about the direction/goal here:
- Only a few kinds of packages are supported (such as in the presentation doc); it will fail if it's an unsupported one
  - Is this what we want? I'd expect it to validate it but fail if it is invalid. It appears to also fail if we don't have a check, which seems a little unexpected.
- Validate() throws and returns true/false
  - I'd expect it to do one or the other. Is the manual throwing due to custom messages? If so, let's make it consistent.
- Mixture of exception types
  - I think we should use an existing OpenXmlPackageException type or create a new one for this.
- How to think about this vs validation
  - At some point, how is this different from the existing OpenXmlValidator logic? Should we be hooking this up to that and performing a full validation?
…OpenXmlPackage tests
src/DocumentFormat.OpenXml.Framework/Packaging/OpenXmlPackage.cs
@@ -106,6 +107,20 @@ private void ValidatePart(OpenXmlPart part, ValidationContext context)
{
    Validate(context);
}
else if (part.Uri.ToString().EndsWith(".xml", System.StringComparison.InvariantCultureIgnoreCase) &&
We need to dispose the stream from getpart. It's probably best to add an extension method, ".IsEmptyPart()", that handles disposing the stream correctly.
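For illustration, the suggested helper could look roughly like the sketch below. Only the name `IsEmptyPart` comes from the review comment; the class name `OpenXmlPartExtensions` and the method body are assumptions, not the code that was merged. It relies on `OpenXmlPart.GetStream(FileMode, FileAccess)` from the DocumentFormat.OpenXml package and treats a part as empty when its stream yields no bytes.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper class; the name is an assumption for this sketch.
public static class OpenXmlPartExtensions
{
    // Returns true when the part's underlying stream contains no bytes.
    // The using block guarantees the stream is disposed on every path,
    // which is the concern raised in the review comment.
    public static bool IsEmptyPart(this OpenXmlPart part)
    {
        using (Stream stream = part.GetStream(FileMode.Open, FileAccess.Read))
        {
            // ReadByte returns -1 at end of stream, so this also works
            // for streams that do not support Length.
            return stream.ReadByte() == -1;
        }
    }
}
```

Keeping the stream handling inside one method means callers such as the validator never hold the stream themselves, so there is nothing for them to forget to dispose.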
The issue was not that there was no way to check for a minimally valid document; the issue was that the validator considered completely empty parts to be valid. It was resolved by adding a check for empty parts to the OpenXmlValidator.
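As a rough sketch of how a caller would see the result of that check, the snippet below runs the existing `OpenXmlValidator` over a Word document and collects the reported errors. The class name `ValidationSketch` and the method `CollectErrors` are hypothetical names used only for this example; the exact error text reported for an empty part is not shown in this conversation and is not assumed here.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

// Hypothetical wrapper for illustration; not part of the SDK.
public static class ValidationSketch
{
    // Opens the document read-only and returns one line per validation error.
    public static List<string> CollectErrors(string path)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            var validator = new OpenXmlValidator();

            // ToList() forces enumeration while the package is still open.
            return validator.Validate(doc)
                .Select(e => $"{e.Part?.Uri}: {e.Description}")
                .ToList();
        }
    }
}
```

An empty list means the validator found nothing to report; any entries can be logged or surfaced to the user before further processing.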