
Commit 475c123

refactor!: Refactor field count handling in CsvReader
Replaced `ignoreDifferentFieldCount` with `allowExtraFields` and `allowMissingFields` for finer control over record validation. By default, neither extra nor missing fields are allowed anymore.
1 parent 04b37aa commit 475c123


16 files changed: 118 additions & 75 deletions


docs/src/content/docs/architecture/interpretation.md

Lines changed: 7 additions & 5 deletions
@@ -89,15 +89,17 @@ Consider the following CSV snippet as an illustration of varying field counts:
 ```
 header_a,header_bCRLF
 value_a_1CRLF
-value_a_2,value_b_2CRLF
+value_a_2,value_b_2,value_c_2CRLF
 ```
 
 In this example, `value_a_1` likely belongs to `header_a`, and `header_b` does not have a value for the first data
-record. However, this is just an assumption.
+record. However, this is just an assumption. Field `value_c_2` does not even have a corresponding header.
 
-By default, FastCSV handles scenarios with different field counts by ignoring them (see
-`CsvReaderBuilder.ignoreDifferentFieldCount(boolean)`). This is done to accommodate this frequently occurring case. To
-ensure no misinterpretation, you can disable this behavior, causing an exception to be thrown when reading such data.
+To ensure no misinterpretation, FastCSV does not allow extra or missing fields in a record by default.
+This means that the above example would result in a `CsvParseException` when reading it with FastCSV.
+
+However, this behavior can be changed by setting `CsvReaderBuilder.allowExtraFields(boolean)`
+and `CsvReaderBuilder.allowMissingFields(boolean)` to `true`.
 
 ### Empty lines
 
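The rule described in the updated text can be illustrated with a small, self-contained model. This is a hypothetical sketch, not FastCSV's implementation: `FieldCountRuleSketch` and `read` are made-up names, fields are split naively on `,` without quote handling, and the input is assumed to contain no empty lines. As the commit's tests suggest, the first record sets the expected field count, and the error text copies the message format those tests assert.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the field-count rule; NOT FastCSV's implementation.
// Simplifications: naive split on ',' (no quotes), no empty lines in the input.
class FieldCountRuleSketch {

    static List<String[]> read(final String data, final boolean allowExtraFields,
                               final boolean allowMissingFields) {
        final List<String[]> records = new ArrayList<>();
        int expected = -1; // field count of the first record, set once
        final String[] lines = data.split("\r\n|\n");
        for (int i = 0; i < lines.length; i++) {
            final String[] fields = lines[i].split(",", -1); // -1 keeps trailing empty fields
            if (expected == -1) {
                expected = fields.length;
            } else if ((fields.length > expected && !allowExtraFields)
                    || (fields.length < expected && !allowMissingFields)) {
                throw new IllegalStateException(String.format(
                    "Record %d has %d fields, but first record had %d fields",
                    i + 1, fields.length, expected));
            }
            records.add(fields);
        }
        return records;
    }

    public static void main(final String[] args) {
        final String data = "header_a,header_b\r\nvalue_a_1\r\nvalue_a_2,value_b_2,value_c_2\r\n";
        try {
            read(data, false, false);
        } catch (final IllegalStateException e) {
            // prints: Record 2 has 1 fields, but first record had 2 fields
            System.out.println(e.getMessage());
        }
        // With both options enabled, every record is accepted as-is.
        for (final String[] record : read(data, true, true)) {
            System.out.println(Arrays.toString(record));
        }
    }
}
```

Reading the documentation's example with both options `false` fails on record 2 (one field where two were expected); with both set to `true`, the sketch returns all three records unchanged.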
docs/src/content/docs/guides/Examples/reading-ambiguous-data.mdx

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ There are a few widespread ambiguities in CSV files:
 See the [Empty lines section of the CSV Interpretation](/architecture/interpretation/#empty-lines) page for more information.
 2. **Empty fields**: CSV files can contain empty fields, which are fields that contain no data.
 See the [Empty fields / null values section of the CSV Interpretation](/architecture/interpretation/#empty-fields--null-values) page for more information.
-3. **Missing fields**: CSV files can contain missing fields, which are fields that are not present in a record.
-See the [Different field count of the CSV Interpretation](/architecture/interpretation/#different-field-count) page for more information.
+3. **Extra or missing fields**: CSV files can contain a different number of fields in each record.
+See the [Extra fields section of the CSV Interpretation](/architecture/interpretation/#extra-fields) page for more information.
 
 FastCSV is very aware of these ambiguities and provides ways to handle them.
 

docs/src/content/docs/guides/Examples/skip-non-csv-head.mdx

Lines changed: 2 additions & 1 deletion
@@ -21,9 +21,10 @@ Strictly speaking, such a file **is not a valid CSV file** as defined by the CSV
 
 The main problem with those files is:
 
-- An exception would be thrown unless the options `ignoreDifferentFieldCount()` and `skipEmptyLines()` are set.
 - When working with named fields, the very first line (`This is an example of a CSV file that contains`)
 would be interpreted as the actual header line.
+- An exception would be thrown unless the options `allowExtraFields(true)` is set, as some lines have
+more fields than the first line.
 
 FastCSV comes with two features to handle such files:
 
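Why `allowExtraFields(true)` is needed here follows from the first-record rule: the prose line has no commas, so it counts as a single field, and every real CSV line after it appears to have extra fields. The hypothetical sketch below is not FastCSV code; it splits naively on `,`, and apart from the first line (taken from the guide) its sample lines are made up. It lists the offending lines and shows that the mismatch disappears once reading starts at the actual header line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not FastCSV code. Lines are split naively on ',' (no quote handling).
class NonCsvHeadSketch {

    // Returns the 1-based numbers of lines that have more fields than the first line.
    static List<Integer> linesWithExtraFields(final List<String> lines) {
        final List<Integer> result = new ArrayList<>();
        final int expected = lines.get(0).split(",", -1).length;
        for (int i = 1; i < lines.size(); i++) {
            if (lines.get(i).split(",", -1).length > expected) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        // Made-up file: two lines of prose, then the actual CSV data.
        final List<String> lines = List.of(
            "This is an example of a CSV file that contains",
            "some text before the actual data.",
            "header_a,header_b,header_c",
            "value_a,value_b,value_c");
        System.out.println(linesWithExtraFields(lines));               // prints: [3, 4]
        // Starting at the real header line removes the mismatch.
        System.out.println(linesWithExtraFields(lines.subList(2, 4))); // prints: []
    }
}
```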
docs/src/content/docs/guides/basic.md

Lines changed: 2 additions & 1 deletion
@@ -51,7 +51,8 @@ CsvReader.builder()
 .commentStrategy(CommentStrategy.SKIP)
 .commentCharacter('#')
 .skipEmptyLines(true)
-.ignoreDifferentFieldCount(false)
+.allowExtraFields(false)
+.allowMissingFields(false)
 .allowExtraCharsAfterClosingQuote(false)
 .detectBomHeader(false)
 .maxBufferSize(16777216);

docs/src/content/docs/guides/upgrading.md

Lines changed: 11 additions & 0 deletions
@@ -13,6 +13,17 @@ For a full list of changes, including new features, see the [changelog](https://
 - The minimum Java version has been raised from 11 to 17
 - This also raised the required Android API level from version 33 (Android 13) to 34 (Android 14)
 
+## Ignoring different field counts
+
+FastCSV 4.x no longer ignores different field counts by default, ensuring that data is not misinterpreted.
+
+You can change this behavior by calling `allowExtraFields(true)` and `allowMissingFields(true)` in the `CsvReaderBuilder`.
+These methods provide more control over how to handle different field counts in CSV data than the previous (now removed) `ignoreDifferentFieldCount()` method.
+
+:::caution
+As the default has changed, you may need to check your code and your desired behavior.
+:::
+
 ## Internal buffer flushing
 
 In FastCSV 2.x and 3.x, the CsvWriter instantiated via `CsvWriterBuilder.build(Writer)` flushed the internal buffer to the `Writer` after each record.
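The upgrade note implies a direct mapping from the removed option to the new pair. The old default, `ignoreDifferentFieldCount=true` (visible in the previous `toString` expectations in this commit), tolerated both extra and missing fields, whereas `ignoreDifferentFieldCount(false)` rejected both. The record below is a hypothetical helper that only encodes this mapping; FastCSV does not provide it.

```java
// Hypothetical helper, not part of FastCSV: encodes how the removed
// ignoreDifferentFieldCount(boolean) option maps onto the two new options.
record FieldCountPolicy(boolean allowExtraFields, boolean allowMissingFields) {

    // New default in 4.x: neither extra nor missing fields are accepted.
    static final FieldCountPolicy STRICT = new FieldCountPolicy(false, false);

    // true (the old default) tolerated both cases; false rejected both cases.
    static FieldCountPolicy fromIgnoreDifferentFieldCount(final boolean ignore) {
        return new FieldCountPolicy(ignore, ignore);
    }

    public static void main(final String[] args) {
        final FieldCountPolicy lenient = fromIgnoreDifferentFieldCount(true);
        // Code that relied on the old lenient default needs both new options set to true.
        System.out.println("allowExtraFields(" + lenient.allowExtraFields() + ")"
            + ".allowMissingFields(" + lenient.allowMissingFields() + ")");
        // prints: allowExtraFields(true).allowMissingFields(true)
        System.out.println(fromIgnoreDifferentFieldCount(false).equals(STRICT)); // prints: true
    }
}
```

Code that explicitly called `ignoreDifferentFieldCount(false)` can simply drop the call, since both new options already default to `false`.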

example/src/main/java/example/ExampleCsvReaderWithFaultyData.java

Lines changed: 12 additions & 19 deletions
@@ -8,34 +8,27 @@ public class ExampleCsvReaderWithFaultyData {
 
 private static final String DATA = """
 foo,bar
-only one field followed by some empty lines
-
-
-bar,foo
+foo
+foo,bar,baz
 """;
 
 public static void main(final String[] args) {
-System.out.println("Reading data with lenient (default) settings:");
-CsvReader.builder()
-.ofCsvRecord(DATA)
-.forEach(System.out::println);
-
-System.out.println("Reading data while not skipping empty lines:");
-CsvReader.builder()
-.skipEmptyLines(false)
-.ofCsvRecord(DATA)
-.forEach(System.out::println);
-
-System.out.println("Reading data while not ignoring different field counts:");
+System.out.println("Reading data with default settings:");
 try {
 CsvReader.builder()
-.ignoreDifferentFieldCount(false)
 .ofCsvRecord(DATA)
 .forEach(System.out::println);
 } catch (final CsvParseException e) {
-System.out.println(e.getMessage());
-System.out.println(e.getCause().getMessage());
+System.out.println("Exception expected due to different field counts:");
+e.printStackTrace(System.out);
 }
+
+System.out.println("Reading data while not ignoring different field counts:");
+CsvReader.builder()
+.allowExtraFields(true)
+.allowMissingFields(true)
+.ofCsvRecord(DATA)
+.forEach(System.out::println);
 }
 
 }
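The example's new `DATA` has one record with a missing field (`foo`) and one with an extra field (`foo,bar,baz`), which is why the second reader enables both options. The following hypothetical sketch (not FastCSV code; naive comma splitting, made-up class and method names) checks that data against all four option combinations to show that the two options act independently.

```java
// Hypothetical sketch, not FastCSV code: evaluates the example DATA under all four
// combinations of the two options, using a naive comma split (no quote handling).
class FlagCombinationSketch {

    // Returns the 1-based number of the first rejected record, or 0 if all records pass.
    static int firstRejectedRecord(final String data, final boolean allowExtra,
                                   final boolean allowMissing) {
        final String[] lines = data.split("\n");
        final int expected = lines[0].split(",", -1).length; // first record is the baseline
        for (int i = 1; i < lines.length; i++) {
            final int actual = lines[i].split(",", -1).length;
            if ((actual > expected && !allowExtra) || (actual < expected && !allowMissing)) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(final String[] args) {
        final String data = "foo,bar\nfoo\nfoo,bar,baz\n";
        for (final boolean extra : new boolean[] {false, true}) {
            for (final boolean missing : new boolean[] {false, true}) {
                System.out.printf("allowExtraFields=%b, allowMissingFields=%b -> first rejected record: %d%n",
                    extra, missing, firstRejectedRecord(data, extra, missing));
            }
        }
    }
}
```

Under these assumptions, only the combination with both options `true` accepts every record (result `0`); enabling just `allowMissingFields` still rejects record 3, and enabling just `allowExtraFields` still rejects record 2.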

lib/src/intTest/java/blackbox/reader/AbstractCsvReaderTest.java

Lines changed: 22 additions & 20 deletions
@@ -80,36 +80,38 @@ void immutableResponse() {
 .isInstanceOf(UnsupportedOperationException.class);
 }
 
-// different field count
+// allow extra fields
 
-@ParameterizedTest
-@ValueSource(strings = {
-"foo\nbar",
-"foo\nbar\n",
-"foo,bar\nfaz,baz",
-"foo,bar\nfaz,baz\n",
-"foo,bar\n,baz",
-",bar\nfaz,baz"
-})
-void differentFieldCountSuccess(final String s) {
-assertThatNoException().isThrownBy(() -> readAll(s));
+@Test
+void allowNoExtraFields() {
+assertThatThrownBy(() -> readAll("foo\nfoo,bar"))
+.isInstanceOf(CsvParseException.class)
+.hasMessage("Exception when reading record that started in line 2")
+.hasRootCauseInstanceOf(CsvParseException.class)
+.hasRootCauseMessage("Record 2 has 2 fields, but first record had 1 fields");
 }
 
 @Test
-void differentFieldCountSuccess2() {
-crb.ignoreDifferentFieldCount(false);
-assertThatNoException().isThrownBy(() -> readAll("foo\nbar"));
+void allowExtraFields() {
+crb.allowExtraFields(true);
+assertThatNoException().isThrownBy(() -> readAll("foo\nfoo,bar"));
 }
 
-@Test
-void differentFieldCountFail() {
-crb.ignoreDifferentFieldCount(false);
+// allow missing fields
 
-assertThatThrownBy(() -> readAll("foo\nbar,\"baz\nbax\""))
+@Test
+void allowNoMissingFields() {
+assertThatThrownBy(() -> readAll("foo,bar\nfoo"))
 .isInstanceOf(CsvParseException.class)
 .hasMessage("Exception when reading record that started in line 2")
 .hasRootCauseInstanceOf(CsvParseException.class)
-.hasRootCauseMessage("Record 2 has 2 fields, but first record had 1 fields");
+.hasRootCauseMessage("Record 2 has 1 fields, but first record had 2 fields");
+}
+
+@Test
+void allowMissingFields() {
+crb.allowMissingFields(true);
+assertThatNoException().isThrownBy(() -> readAll("foo,bar\nfoo"));
 }
 
 // allow extra characters after closing quotes
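The new tests assert on two messages: the outer `CsvParseException` names the line where the record started, and its root cause carries the field-count message. AssertJ's `hasRootCauseMessage` checks the deepest throwable in the cause chain. The sketch below mimics that structure with a made-up `SketchParseException` (standing in for `CsvParseException`) and a hypothetical `rootCause` helper; it only illustrates the exception chain and is not how FastCSV builds it.

```java
// Hypothetical sketch: SketchParseException stands in for FastCSV's CsvParseException,
// and the message texts are copied from the tests above.
class RootCauseSketch {

    static final class SketchParseException extends RuntimeException {
        SketchParseException(final String message) {
            super(message);
        }

        SketchParseException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    // Throws an outer exception for the starting line, wrapping the field-count violation.
    static void failRecord(final int line, final int record, final int actual, final int expected) {
        final SketchParseException cause = new SketchParseException(String.format(
            "Record %d has %d fields, but first record had %d fields", record, actual, expected));
        throw new SketchParseException(
            "Exception when reading record that started in line " + line, cause);
    }

    // Follows getCause() until the deepest throwable is reached.
    static Throwable rootCause(final Throwable throwable) {
        Throwable current = throwable;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(final String[] args) {
        try {
            failRecord(2, 2, 1, 2);
        } catch (final SketchParseException e) {
            System.out.println(e.getMessage());
            // prints: Exception when reading record that started in line 2
            System.out.println(rootCause(e).getMessage());
            // prints: Record 2 has 1 fields, but first record had 2 fields
        }
    }
}
```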

lib/src/intTest/java/blackbox/reader/AbstractSkipLinesTest.java

Lines changed: 2 additions & 2 deletions
@@ -54,15 +54,15 @@ void multipleRecordsNoSkipEmpty() {
 @ParameterizedTest
 @ValueSource(strings = {",\nfoo\n", ",,\nfoo\n", "''\nfoo\n", "' '\nfoo\n"})
 void notEmpty(final String input) {
-crb.quoteCharacter('\'');
+crb.allowMissingFields(true).quoteCharacter('\'');
 final CsvRecordHandler cbh = CsvRecordHandler.of(c -> c.fieldModifier(FieldModifiers.TRIM));
 assertThat(crb.build(cbh, input).stream()).hasSize(2);
 }
 
 @ParameterizedTest
 @ValueSource(strings = {",\nfoo\n", ",,\nfoo\n", "''\nfoo\n", "' '\nfoo\n"})
 void notEmptyCustomCallback(final String input) {
-crb.quoteCharacter('\'');
+crb.allowMissingFields(true).quoteCharacter('\'');
 final AbstractBaseCsvCallbackHandler<String[]> cbh = new AbstractBaseCsvCallbackHandler<>() {
 private final List<String> fields = new ArrayList<>();
 
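The tests in `AbstractSkipLinesTest` now enable `allowMissingFields(true)` because a line consisting only of separators is not an empty line: `,` holds two empty fields and `,,` holds three, so the following `foo` (one field) has fewer fields than the first record. The hypothetical counter below (not FastCSV's parser; it handles a single line and treats the quote character as a simple toggle) makes those counts concrete for the test inputs, using `'` as the quote character like the tests do.

```java
// Hypothetical field counter, not FastCSV's parser: counts the fields of a single line,
// treating `quote` as the quote character and ignoring separators inside quotes.
// Doubled quotes inside a quoted field toggle the state twice, so counting stays correct.
class FieldCountSketch {

    static int countFields(final String line, final char separator, final char quote) {
        int fields = 1; // even an empty line yields one (empty) field
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            final char c = line.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes;
            } else if (c == separator && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    public static void main(final String[] args) {
        // First lines of the parameterized inputs, followed by the second line "foo".
        for (final String first : new String[] {",", ",,", "''", "' '", "foo"}) {
            System.out.println(first + " -> " + countFields(first, ',', '\''));
        }
    }
}
```

It reports 2, 3, 1, 1 and 1 fields respectively, so under this counting only the first two inputs produce a missing-field violation; the quoted `''` and `' '` lines match the single field of `foo`.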
lib/src/intTest/java/blackbox/reader/CsvReaderBuilderTest.java

Lines changed: 3 additions & 2 deletions
@@ -79,7 +79,7 @@ void builderToString() {
 .isEqualTo("""
 CsvReaderBuilder[fieldSeparator=,, quoteCharacter=", \
 commentStrategy=NONE, commentCharacter=#, skipEmptyLines=true, \
-ignoreDifferentFieldCount=true, allowExtraCharsAfterClosingQuote=false, \
+allowExtraFields=false, allowMissingFields=false, allowExtraCharsAfterClosingQuote=false, \
 trimWhitespacesAroundQuotes=false, detectBomHeader=false, maxBufferSize=16777216]""");
 }
 
@@ -128,7 +128,8 @@ void chained() {
 .commentStrategy(CommentStrategy.NONE)
 .commentCharacter('#')
 .skipEmptyLines(true)
-.ignoreDifferentFieldCount(false)
+.allowExtraFields(false)
+.allowMissingFields(false)
 .allowExtraCharsAfterClosingQuote(false)
 .ofCsvRecord("foo");
 
lib/src/intTest/java/blackbox/reader/RelaxedCsvReaderTest.java

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ void readerToString() {
 assertThat(crb.ofCsvRecord(""))
 .asString()
 .isEqualTo("CsvReader[commentStrategy=NONE, skipEmptyLines=true, "
-+ "ignoreDifferentFieldCount=true, parser=RelaxedCsvParser]");
++ "allowExtraFields=false, allowMissingFields=false, parser=RelaxedCsvParser]");
 }
 
 }
