CsvStreamUtils is a CSV toolkit written in Scala to provide a functional way to handle CSV files or streams. It can be imported in Java or Scala. In order to get rid of the boilerplate to parse CSV file in a imperative way, it allows users to use more composable way to handle CSV parsing.
resolvers += "Sonatype OSS Releases" at "https://oss.sonatype.org/content/repositories/releases"
libraryDependencies ++= Seq(
"com.github.dannywe" % "csvstreamutils_2.10" % "1.03"
)
There are several features that can be used for the CSV parsing
- A composable way to handle CSV parsing line by line
- Error handling
- Modularity
List<ViewObject> voList = Lists.newArrayList();
try (CSVReader csvReader = csvFactory.createReader(csv)) {
String[] csvColumns;
int count = 0;
csvColumns = csvReader.readNext();
validateHeader(csvColumns, headers);
csvPersistentStrategy.validationContext();
//you want to have error limit not return all the errors
while (errors.size() <= ERROR_LIMIT
&& (csvColumns = csvReader.readNext()) != null) {
errors.addAll(csvLineValidator.validateCsvLine(csvColumns,
lineNo, validationContext));
lineNo++;
voList.add(factory.create(csvColumns));
} catch (IOException ex) {
//error handling
}
}
- The code is hard to reuse because of many division of the code like header validation, error limit and so on.
- It is not modular, so it is not composable.
- If we add more line computation such as take or drop, the implementation would be even worse.
CsvService service = new CsvService();
File file = new File("src/test/resources/users.csv");
Result<User> result = service.parse(file, User::new);
if (result.isSuccessful()) {
List<User> userList = result.getResult();
}
User::new
is Java 8 method reference syntax to the constructor. In this example it acts as a factory for CsvStreamUtils to constructUser
objects from arrays ofString
.- For use in Java 7, replace this reference with an instance of
Function<String[], User>
from Guava to transform aString[]
to aUser
.
File file = new File("src/test/resources/users_invalid_1_rows.csv");
Result<User> result = service.parse(file, User::new);
assertThat(result.isFailed(), is(true));
List<LineErrorConverter> failureResult = result.getFailureResult();
assertThat(failureResult.size(), is(1));
List<SimplifiedErrorContainer> errorContainer = failureResult.get(0).getViolations()
The SimplifiedErrorContainer
has 3 fields to specify errors:
lineNumber: Int
columnName: String
errorMessage: String
Given the CSV file:
username,company,interest,team
Cloud,AusRegistry,Swimming,dev
James,AusRegistry,, // error line
Andres,AusRegistry,,dev // error line
Andrew,AusRegistry,Table tennis, // error line
Jason,AusRegistry,Noisy,dev
Varol,AusRegistry,Table tennis,dev
And the User
class:
public class User {
@NotEmpty
private String name;
@NotEmpty
private String company;
@NotEmpty
private String interest;
@NotEmpty
private String team;
// ...
}
We can parse the file like so:
File file = new File("src/test/resources/users_invalid_3_rows.csv");
Result<User> result = service.parse(file, User::new);
And extract all errors:
List<LineErrorConverter> failureResult = result.getFailureResult();
String formattedErrorMessage = result.getFormattedErrorMessage();
System.out.println(formattedErrorMessage)
//Line 3 Column Interest may not be empty.
//Line 3 Column Team may not be empty.
//Line 4 Column Interest may not be empty.
//Line 5 Column Team may not be empty.
Dropping rows:
// the first 3 rows are part of the header
StreamOperationBuilder<User> builder = new StreamOperationBuilder<>();
Result<User> result = service.parse(file, User::new,
builder.drop(3)
);
Limiting rows parsed:
// the info is in the first line
builder.take(1);
Parsing rows whilst a predicate holds true:
// we don't want to know about James
builder.takeWhile(user -> !"James".equals(user.getName));
And finally, Composing operations to target specific rows:
// the info is on line 3
builder.drop(3).andThen(builder.takeWhile(t -> !"James".equals(t.getName))));
Limiting the number of errors:
File file = new File("src/test/resources/users_invalid_100_rows.csv");
Result<User> result = service.parse(file, User::new, 5); // Only return maximum 5 lines of error
List<LineErrorConverter> failureResult = result.getFailureResult();
assertThat(failureResult.size(), is(5)); //only return 5 lines error
Customise the format of error messages
//to implement ErrorLineFormatter
public class CustomizedErrorLineFormatter implements ErrorLineFormatter {
@Override
public String format(int lineNumber, String columnName, String errorMessage) {
return "Column " + columnName + " in Line " + lineNumber + " has error: " + errorMessage;
}
}
// usage
String customizedErrorMessage = result.getFormattedErrorMessage(new CustomizedErrorLineFormatter());
System.out.println(customizedErrorMessage);
//Column Interest in Line 3 has error: may not be empty
//Column Team in Line 3 has error: may not be empty
//Column Interest in Line 4 has error: may not be empty
//Column Team in Line 5 has error: may not be empty