Flipper is an open-source PDF library written in Scala, developed by the good people at Growin, that can be integrated into any Java/Scala environment. It has some really useful features, such as:
- Parse a PDF document and return a JSON object: Flipper can parse the text in a PDF document, recognize text in images inside the document, and return a JSON object with the extracted information. You simply specify the type of value you want to obtain for a given keyword (a noun, a verb, a number, etc.), and Flipper will do the rest!
- Convert JSON to PDF: Flipper does not content itself with just parsing a PDF file, that's easy! It also converts a given JSON object to a PDF document, and you can customize the generated document with CSS.
- Convert PDF to other file types: Flipper also supports conversion from PDF to other popular formats: .png, .jpeg/.jpg, .gif and .odt.
Current version: 0.3
Flipper is divided into 3 different modules that can be used individually: Reader, Generator and Converter.
Flipper/
├── converter ; PDF to other file types module
├── generator ; JSON to PDF module
├── reader ; PDF to JSON parser module
└── build.sbt ; Project config file
Each module has its own README.md file with examples and documentation.
Flipper is available on Maven Central, so to use it you simply need to add the lines below to your project.
If you are using SBT, add the following line to your build.sbt:
libraryDependencies += "com.growin" %% "flipper" % "0.3"
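If you are using Maven, add these lines to your pom.xml (the _2.12 suffix is the Scala binary version, which SBT's %% operator appends automatically):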
<dependency>
    <groupId>com.growin</groupId>
    <artifactId>flipper_2.12</artifactId>
    <version>0.3</version>
</dependency>
For other versions, see the Maven repository. There you will also find ways of including Flipper in your project without using SBT or Maven.
Download the eng.traineddata and por.traineddata files from here and place them in a directory named tessdata in the root of the project.
Flipper uses Tess4J (a Java wrapper for Tesseract) to extract text from images using optical character recognition (OCR). To improve the accuracy of this algorithm, we must provide Tess4J with a set of training data.
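As a quick sanity check for this setup, the shell sketch below creates the tessdata directory if it does not exist and reports which of the two files are still missing. The function name check_tessdata is hypothetical and not part of Flipper. The sketch assumes only the directory and file names given above, and it does not download anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flipper): verifies that a directory
# contains eng.traineddata and por.traineddata.
# Returns 0 when both files exist, 1 otherwise.
check_tessdata() {
    dir="${1:-tessdata}"
    missing=0
    for lang in eng por; do
        if [ ! -f "$dir/$lang.traineddata" ]; then
            echo "missing: $dir/$lang.traineddata"
            missing=1
        fi
    done
    return "$missing"
}

# Create the directory in the project root, then check its contents.
mkdir -p tessdata
check_tessdata tessdata || echo "Download the missing files before using OCR."
```

Run it from the root of the project. It prints nothing when both files are in place.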
If you want to see for yourself that Flipper is in fact amazing and working properly, you can follow the steps below to test it (using sbt):
- Start by cloning this repository
git clone https://github.com/GrowinScala/Flipper.git
- Then cd into it
cd Flipper
- And run the unit tests using sbt
sbt test
Flipper is an open-source project developed at Growin in our offices in Lisbon.
If you have any questions, you can contact:
- Valter Fernandes - [email protected]
- Margarida Reis - [email protected]
- Lucas Fischer - [email protected]
Or visit our website: www.growin.com
Flipper is open source, licensed under the MIT License.