Commit 5ae17f5

feat!: enforce unique headers by default
FastCSV now rejects duplicate headers by default to prevent data misinterpretation. A new `allowDuplicateHeader` option was introduced, allowing users to override this behavior if needed.
1 parent 7a10e18 commit 5ae17f5

6 files changed (+100 -15 lines changed)

docs/src/content/docs/architecture/interpretation.md

Lines changed: 6 additions & 5 deletions
@@ -146,13 +146,14 @@ header_a,header_aCRLF
 value_1,value_2CRLF
 ```
 
-The `NamedCsvRecord` of FastCSV offers several options to handle this case:
+The `NamedCsvRecord` class in FastCSV offers several options to handle this scenario:
 
-- `getField("header_a")`, `findField("header_a")` and `getFieldsAsMap()` returns only the **first** value (`"value_1"`).
-- `findFields("header_a")` and `getFieldsAsMapList()` returns a List containing **all** values (`"value_1"`
-  and `"value_2"`).
+- By default, FastCSV does **not** allow duplicate headers to prevent misinterpretation of data.
+  This behavior can be changed by calling `allowDuplicateHeader(true)` on the `NamedCsvRecordHandlerBuilder`.
+- Methods like `getField("header_a")`, `findField("header_a")`, and `getFieldsAsMap()` return only the **first** value (`"value_1"`).
+- Methods like `findFields("header_a")` and `getFieldsAsMapList()` return a list containing **all** values (`"value_1"` and `"value_2"`).
 
-Regardless of the chosen option, FastCSV always handles the header as case-sensitive.
+Regardless of the option chosen, FastCSV always treats headers as case-sensitive.
 
 ### Spaces within fields
 

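The first-versus-all lookup rules documented above can be modelled with a small, self-contained Java 17 sketch. `DuplicateHeaderLookup` and `SimpleNamedRecord` are hypothetical names, not FastCSV types; the methods only mimic the documented behavior of `findField`, `findFields`, `getFieldsAsMap` and `getFieldsAsMapList`, and the sketch assumes the header and the record have the same number of fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the documented lookup rules; NOT FastCSV's NamedCsvRecord.
public class DuplicateHeaderLookup {

    // Assumes header and fields have the same size.
    public record SimpleNamedRecord(List<String> header, List<String> fields) {

        // Like findField(): the first value whose header equals name (case-sensitive).
        public Optional<String> findField(String name) {
            int idx = header.indexOf(name);
            return idx < 0 ? Optional.empty() : Optional.of(fields.get(idx));
        }

        // Like findFields(): all values whose header equals name, in column order.
        public List<String> findFields(String name) {
            List<String> result = new ArrayList<>();
            for (int i = 0; i < header.size(); i++) {
                if (header.get(i).equals(name)) {
                    result.add(fields.get(i));
                }
            }
            return result;
        }

        // Like getFieldsAsMap(): later duplicates never overwrite the first value.
        public Map<String, String> fieldsAsMap() {
            Map<String, String> map = new LinkedHashMap<>();
            for (int i = 0; i < header.size(); i++) {
                map.putIfAbsent(header.get(i), fields.get(i));
            }
            return map;
        }

        // Like getFieldsAsMapList(): every header name maps to all of its values.
        public Map<String, List<String>> fieldsAsMapList() {
            Map<String, List<String>> map = new LinkedHashMap<>();
            for (int i = 0; i < header.size(); i++) {
                map.computeIfAbsent(header.get(i), k -> new ArrayList<>()).add(fields.get(i));
            }
            return map;
        }
    }

    public static void main(String[] args) {
        var rec = new SimpleNamedRecord(List.of("header_a", "header_a"), List.of("value_1", "value_2"));
        System.out.println(rec.findField("header_a").orElseThrow()); // value_1
        System.out.println(rec.findFields("header_a"));              // [value_1, value_2]
        System.out.println(rec.fieldsAsMap());                       // {header_a=value_1}
        System.out.println(rec.fieldsAsMapList());                   // {header_a=[value_1, value_2]}
        System.out.println(rec.findFields("HEADER_A"));              // [] (lookups are case-sensitive)
    }
}
```

In this model, `putIfAbsent` is what makes the single-value map keep the first occurrence, while `computeIfAbsent(...).add(...)` accumulates every occurrence for the list-valued map.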
docs/src/content/docs/guides/upgrading.md

Lines changed: 17 additions & 0 deletions
@@ -13,6 +13,23 @@ For a full list of changes, including new features, see the [changelog](https://
 - The minimum Java version has been raised from 11 to 17
   - This also raised the required Android API level from version 33 (Android 13) to 34 (Android 14)
 
+## Duplicate header handling
+
+FastCSV 4.x rejects duplicate headers by default, ensuring that each header field is unique and preventing misinterpretation.
+
+You can change this behavior by calling `allowDuplicateHeader(true)` on the `NamedCsvRecordHandlerBuilder`.
+
+```java title="Example"
+var rh = NamedCsvRecordHandler.of(c -> c.allowDuplicateHeader(true));
+try (CsvReader<NamedCsvRecord> csv = CsvReader.builder().build(rh, csvFile)) {
+    // ...
+}
+```
+
+:::caution
+As the default has changed, you may need to check your code and your desired behavior.
+:::
+
 ## Ignoring different field counts
 
 FastCSV 4.x no longer ignores different field counts by default, ensuring that data is not misinterpreted.

docs/src/content/docs/index.mdx

Lines changed: 1 addition & 0 deletions
@@ -70,6 +70,7 @@ The main features of FastCSV include:
 - Supports single and multi-character field separators
 - Supports trimming of whitespaces around quoted fields
 - Supports optional header records (access fields by name)
+- Supports duplicate header names
 - Supports skipping empty lines
 - Supports skipping non-CSV header (either by a fixed number of lines or by peeking data)
 - Supports commented lines (skipping & reading) with configurable comment character

lib/src/intTest/java/blackbox/reader/NamedCsvReaderTest.java

Lines changed: 11 additions & 5 deletions
@@ -53,7 +53,9 @@ void findFieldByName() {
 
     @Test
     void findFieldsByName() {
-        assertThat(parse("foo,xoo,foo\nbar,moo,baz").stream())
+        final var cbh = NamedCsvRecordHandler
+            .of(c -> c.allowDuplicateHeader(true));
+        assertThat(CsvReader.builder().build(cbh, "foo,xoo,foo\nbar,moo,baz").stream())
             .singleElement(NamedCsvRecordAssert.NAMED_CSV_RECORD)
             .findFields("foo").containsExactly("bar", "baz");
     }
@@ -86,26 +88,30 @@ void findNonExistingFieldByName2() {
 
     @Test
     void headerToString() {
-        assertThat(parse("headerA,headerB,headerA\nfieldA,fieldB,fieldC\n").stream())
+        assertThat(parse("headerA,headerB,headerC\nfieldA,fieldB,fieldC\n").stream())
             .singleElement()
             .asString()
             .isEqualTo("NamedCsvRecord[startingLineNumber=2, "
                 + "fields=[fieldA, fieldB, fieldC], "
                 + "comment=false, "
-                + "header=[headerA, headerB, headerA]]");
+                + "header=[headerA, headerB, headerC]]");
     }
 
     @Test
     void fieldMap() {
-        assertThat(parse("headerA,headerB,headerA\nfieldA,fieldB,fieldC\n").stream())
+        final var cbh = NamedCsvRecordHandler
+            .of(c -> c.allowDuplicateHeader(true));
+        assertThat(CsvReader.builder().build(cbh, "headerA,headerB,headerA\nfieldA,fieldB,fieldC\n").stream())
             .singleElement(NamedCsvRecordAssert.NAMED_CSV_RECORD)
             .fields()
             .containsExactly(entry("headerA", "fieldA"), entry("headerB", "fieldB"));
     }
 
     @Test
     void allFieldsMap() {
-        assertThat(parse("headerA,headerB,headerA\nfieldA,fieldB,fieldC\n").stream())
+        final var cbh = NamedCsvRecordHandler
+            .of(c -> c.allowDuplicateHeader(true));
+        assertThat(CsvReader.builder().build(cbh, "headerA,headerB,headerA\nfieldA,fieldB,fieldC\n").stream())
             .singleElement(NamedCsvRecordAssert.NAMED_CSV_RECORD)
             .allFields()
             .containsOnly(entry("headerA", List.of("fieldA", "fieldC")), entry("headerB", List.of("fieldB")));

lib/src/intTest/java/blackbox/reader/NamedCsvRecordHandlerTest.java

Lines changed: 18 additions & 0 deletions
@@ -1,11 +1,13 @@
 package blackbox.reader;
 
 import static org.assertj.core.api.Assertions.assertThat;
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
 
 import java.util.Map;
 
 import org.junit.jupiter.api.Test;
 
+import de.siegmar.fastcsv.reader.CsvParseException;
 import de.siegmar.fastcsv.reader.CsvReader;
 import de.siegmar.fastcsv.reader.FieldModifiers;
 import de.siegmar.fastcsv.reader.NamedCsvRecordHandler;
@@ -56,4 +58,20 @@ void consumer() {
             .fields().containsExactly(Map.entry("col1", "foo"), Map.entry("col2", "bar"));
     }
 
+    @Test
+    void noDuplicateHeaderInit() {
+        assertThatThrownBy(() -> NamedCsvRecordHandler.of(c -> c.header("col1", "col2", "col1")))
+            .isInstanceOf(IllegalArgumentException.class)
+            .hasMessage("Header contains duplicate fields: [col1]");
+    }
+
+    @Test
+    void noDuplicateHeaderData() {
+        assertThatThrownBy(() -> CsvReader.builder().ofNamedCsvRecord("col1,col2,col1").stream().count())
+            .isInstanceOf(CsvParseException.class)
+            .hasMessage("Exception when reading first record")
+            .hasRootCauseExactlyInstanceOf(IllegalArgumentException.class)
+            .hasRootCauseMessage("Header contains duplicate fields: [col1]");
+    }
+
 }
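
The second test expects the duplicate-header `IllegalArgumentException` to surface wrapped in a `CsvParseException` when the header is read from data rather than preset. A caller that wants to react to that specific failure has to look past the wrapper. The following sketch shows the idea with hypothetical stand-ins (`RootCauseDemo`, `ParseFailure`, `readFirstRecord`); it does not use FastCSV and assumes only that the original exception is preserved as the cause.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins; NOT FastCSV's CsvReader or CsvParseException.
public class RootCauseDemo {

    // Stand-in for a parse exception that keeps the underlying problem as its cause.
    public static class ParseFailure extends RuntimeException {
        public ParseFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Follows getCause() until the innermost exception of the chain is reached.
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Simplified header check: reports only the first repeated name.
    static void checkHeader(String[] header) {
        Set<String> seen = new HashSet<>();
        for (String h : header) {
            if (!seen.add(h)) {
                throw new IllegalArgumentException("Header contains duplicate fields: [" + h + "]");
            }
        }
    }

    // Stand-in for reading the first record: the header problem is wrapped, not rethrown as-is.
    public static void readFirstRecord(String[] header) {
        try {
            checkHeader(header);
        } catch (IllegalArgumentException e) {
            throw new ParseFailure("Exception when reading first record", e);
        }
    }

    public static void main(String[] args) {
        try {
            readFirstRecord(new String[] {"col1", "col2", "col1"});
        } catch (ParseFailure e) {
            Throwable root = rootCause(e);
            System.out.println(e.getMessage());                  // Exception when reading first record
            System.out.println(root.getClass().getSimpleName()); // IllegalArgumentException
            System.out.println(root.getMessage());               // Header contains duplicate fields: [col1]
        }
    }
}
```

Inspecting the cause chain instead of the wrapper's message keeps the caller independent of the exact wording of the outer exception.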

lib/src/main/java/de/siegmar/fastcsv/reader/NamedCsvRecordHandler.java

Lines changed: 47 additions & 5 deletions
@@ -1,5 +1,7 @@
 package de.siegmar.fastcsv.reader;
 
+import java.util.HashSet;
+import java.util.LinkedHashSet;
 import java.util.List;
 import java.util.Objects;
 import java.util.function.Consumer;
@@ -18,11 +20,14 @@
 public final class NamedCsvRecordHandler extends AbstractInternalCsvCallbackHandler<NamedCsvRecord> {
 
     private static final String[] EMPTY_HEADER = new String[0];
+    private final boolean allowDuplicateHeader;
     private String[] header;
 
     private NamedCsvRecordHandler(final int maxFields, final int maxFieldSize, final int maxRecordSize,
-            final FieldModifier fieldModifier, final List<String> header) {
+            final FieldModifier fieldModifier,
+            final boolean allowDuplicateHeader, final List<String> header) {
         super(maxFields, maxFieldSize, maxRecordSize, fieldModifier);
+        this.allowDuplicateHeader = allowDuplicateHeader;
         if (header != null) {
             setHeader(header.toArray(new String[0]));
         }
@@ -50,7 +55,7 @@ public static NamedCsvRecordHandler of() {
     ///
     /// @param configurer the configuration, must not be `null`
     /// @return the new instance
-    /// @throws NullPointerException if `null` is passed
+    /// @throws NullPointerException if `null` is passed
     /// @throws IllegalArgumentException if argument constraints are violated
     /// @see #builder()
     public static NamedCsvRecordHandler of(final Consumer<NamedCsvRecordHandlerBuilder> configurer) {
@@ -60,14 +65,36 @@ public static NamedCsvRecordHandler of(final Consumer<NamedCsvRecordHandlerBuild
         return builder.build();
     }
 
-    private void setHeader(final String... header) {
+    @SuppressWarnings("PMD.UseVarargs")
+    private void setHeader(final String[] header) {
         Objects.requireNonNull(header, "header must not be null");
         for (final String h : header) {
             Objects.requireNonNull(h, "header element must not be null");
         }
+
+        if (!allowDuplicateHeader) {
+            checkForDuplicates(header);
+        }
+
         this.header = header.clone();
     }
 
+    @SuppressWarnings("PMD.UseVarargs")
+    private static void checkForDuplicates(final String[] header) {
+        final var duplicateHeaders = new LinkedHashSet<String>();
+        final var seen = new HashSet<String>();
+        for (final String h : header) {
+            if (!seen.add(h)) {
+                duplicateHeaders.add(h);
+            }
+        }
+
+        if (!duplicateHeaders.isEmpty()) {
+            throw new IllegalArgumentException("Header contains duplicate fields: "
+                + duplicateHeaders);
+        }
+    }
+
     @Override
     protected NamedCsvRecord buildRecord() {
         if (comment) {
@@ -83,15 +110,29 @@ protected NamedCsvRecord buildRecord() {
     }
 
     /// A builder for [NamedCsvRecordHandler].
-    @SuppressWarnings("PMD.AvoidFieldNameMatchingMethodName")
+    @SuppressWarnings({"checkstyle:HiddenField", "PMD.AvoidFieldNameMatchingMethodName"})
     public static final class NamedCsvRecordHandlerBuilder
         extends AbstractInternalCsvCallbackHandlerBuilder<NamedCsvRecordHandlerBuilder> {
 
+        private boolean allowDuplicateHeader;
         private List<String> header;
 
         private NamedCsvRecordHandlerBuilder() {
         }
 
+        /// Sets whether duplicate header fields are allowed.
+        ///
+        /// When set to `false`, an [IllegalArgumentException] is thrown if the header contains duplicate fields.
+        /// When set to `true`, duplicate fields are allowed. See [NamedCsvRecord] for details on how duplicate
+        /// headers are handled.
+        ///
+        /// @param allowDuplicateHeader whether duplicate header fields are allowed (default: `false`)
+        /// @return This updated object, allowing additional method calls to be chained together.
+        public NamedCsvRecordHandlerBuilder allowDuplicateHeader(final boolean allowDuplicateHeader) {
+            this.allowDuplicateHeader = allowDuplicateHeader;
+            return this;
+        }
+
         /// Sets a predefined header.
         ///
         /// When not set, the header is taken from the first record (that is not a comment).
@@ -133,7 +174,8 @@ protected NamedCsvRecordHandlerBuilder self() {
         /// @throws IllegalArgumentException if argument constraints are violated
         /// (see [AbstractInternalCsvCallbackHandler])
         public NamedCsvRecordHandler build() {
-            return new NamedCsvRecordHandler(maxFields, maxFieldSize, maxRecordSize, fieldModifier, header);
+            return new NamedCsvRecordHandler(maxFields, maxFieldSize, maxRecordSize, fieldModifier,
+                allowDuplicateHeader, header);
         }
 
 }
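
The core of the change is `checkForDuplicates`: a `HashSet` records every name already seen, and a `LinkedHashSet` collects the repeats in the order they are encountered, so the error message is deterministic. The standalone sketch below reproduces that technique outside the library; `HeaderDuplicateCheck`, `findDuplicates` and `validate` are illustrative names, and `validate` only approximates how `setHeader` consults the `allowDuplicateHeader` flag before storing a copy of the header.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Standalone sketch of the duplicate-header check; illustrative names, not FastCSV's API.
public class HeaderDuplicateCheck {

    // Returns every name that occurs more than once, ordered by when its first repeat is seen.
    public static Set<String> findDuplicates(String[] header) {
        Set<String> duplicates = new LinkedHashSet<>(); // insertion order keeps the message deterministic
        Set<String> seen = new HashSet<>();
        for (String h : header) {
            if (!seen.add(h)) { // add() returns false when h was already present
                duplicates.add(h);
            }
        }
        return duplicates;
    }

    // Approximates setHeader(): validate only when duplicates are not allowed, then store a copy.
    public static String[] validate(String[] header, boolean allowDuplicateHeader) {
        if (!allowDuplicateHeader) {
            Set<String> duplicates = findDuplicates(header);
            if (!duplicates.isEmpty()) {
                throw new IllegalArgumentException("Header contains duplicate fields: " + duplicates);
            }
        }
        return header.clone();
    }

    public static void main(String[] args) {
        String[] header = {"col1", "col2", "col1"};
        try {
            validate(header, false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Header contains duplicate fields: [col1]
        }
        System.out.println(validate(header, true).length); // 3
    }
}
```

Because both sets compare names with `String.equals`, `col1` and `Col1` count as different headers, consistent with the documented case-sensitive handling.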
