PDF to JSON, JSON to PDF, and more.

Flipper

What is Flipper?

Flipper is an open-source PDF library written in Scala, developed by the good people at Growin, that can be integrated into any Java/Scala environment. It has some really useful features, such as:

  • Parse a PDF document and return a JSON object - Flipper parses the text in a PDF document, recognizes text in images embedded in it, and returns a JSON object with the extracted information. You simply specify the type of value you want to obtain for a given keyword (a noun, a verb, a number, etc.), and Flipper will do the rest!

  • Convert JSON to PDF - Flipper is not content with just parsing a PDF file, that's easy! It also converts a given JSON object to a PDF document, and you can customize the output document with CSS.

  • Convert PDF to other file types - Flipper also supports conversion from PDF to other popular formats: .png, .jpeg/.jpg, .gif, and .odt.

Current version: 0.3


Project structure

Flipper is divided into three modules that can be used independently: Reader, Generator, and Converter.

Flipper/
        ├── converter ; PDF to other file types module
        ├── generator ; JSON to PDF module
        ├── reader    ; PDF parser to JSON
        └── build.sbt ; Project config file

Each module has its own README.md file with examples and documentation.


How do I get set up?

This part of the documentation guides you through the simple process of setting up Flipper for yourself.


Configuration

Flipper is available on Maven Central, so to use it you simply need to add the lines below to your own project.

If you are using SBT:

libraryDependencies += "com.growin" %% "flipper" % "0.3"

Or Maven:

<dependency>
    <groupId>com.growin</groupId>
    <artifactId>flipper_2.12</artifactId>
    <version>0.3</version>
</dependency>

For other versions, see the Maven repository. There you will also find ways of including Flipper in your project with build tools other than SBT or Maven.


Dependencies

Download the eng.traineddata and por.traineddata files from here and place them in a directory named tessdata in the root of the project.

Flipper uses Tess4J (a Java wrapper for Tesseract) to extract text from images, using a technique known as optical character recognition (OCR). To improve the accuracy of this recognition, we must provide Tess4J with trained data for each language.
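As a quick sanity check of the step above, the following minimal sketch reports which of the two trained data files are missing from the tessdata directory. It is not part of Flipper: the object name TessdataCheck and the method missingFiles are hypothetical, and the sketch assumes it is run from the root of the project.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper, not part of Flipper's API.
object TessdataCheck {
  // The two language files this README asks you to download.
  val RequiredFiles: Seq[String] = Seq("eng.traineddata", "por.traineddata")

  /** Returns the names of the required files that are missing from `tessdataDir`. */
  def missingFiles(tessdataDir: Path): Seq[String] =
    RequiredFiles.filterNot(name => Files.isRegularFile(tessdataDir.resolve(name)))

  def main(args: Array[String]): Unit = {
    // Assumption: the working directory is the root of the project.
    val dir = Paths.get("tessdata")
    val missing = missingFiles(dir)
    if (missing.isEmpty)
      println(s"All trained data files found in ${dir.toAbsolutePath}")
    else
      println(s"Missing from ${dir.toAbsolutePath}: ${missing.mkString(", ")}")
  }
}
```

Running the check before the first OCR call makes a missing file easier to spot than an error raised later by the OCR engine.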


How to test Flipper

To be implemented



Who do I talk to?

Flipper is an open-source project developed at Growin in our Lisbon offices.
If you have any questions, you can contact:

Or visit our website: www.growin.com



License

Flipper is open source, licensed under the MIT License.