jjwtimmer / cloudsearch-query-validator

CloudSearch Structured Query Validator using FastParse

CloudSearch structured query validator

This is a CloudSearch structured query validator that parses queries using the FastParse library. It is useful for validating the syntax of structured queries when you accept input from users. In the wild, it is used for querying the Reddit API.

If you want to use the Reddit plain syntax, please check out this project: https://github.com/JJWTimmer/reddit-plain-query-validator

Usage

Add the dependency to your build.sbt

libraryDependencies += "com.github.jjwtimmer" %% "cloudsearch-query-validator" % "0.1"

Use it!

import com.github.jjwtimmer.cloudsearch.validation.CloudSearchQueryValidator
import scala.util.{Success, Failure}

// successful parse example
CloudSearchQueryValidator("(and (field author 'kafka') title:'I forgot')")

// failed parse example
CloudSearchQueryValidator("a the https")

// pattern matching example
CloudSearchQueryValidator("(not (or author:'jjwtimmer' author:'jeroenr'))") match {
  case Success(result) => println(s"Parsed: $result")
  case Failure(error) => println(s"Not a valid query: ${error.getMessage}")
}
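
When the query comes from a user, it can help to turn the result into a value you can return to the caller. The following is a minimal sketch, not part of the library: `QueryInput` and `check` are hypothetical names, and the helper accepts any function of type `String => Try[A]`, so it only assumes that the validator returns a `Try`, as the pattern matching example above suggests.

```scala
import scala.util.{Try, Success, Failure}

object QueryInput {
  // Hypothetical helper (not part of the library): runs any Try-returning
  // validator over raw user input and reports the outcome as an Either.
  def check[A](validate: String => Try[A])(raw: String): Either[String, A] = {
    val query = raw.trim
    if (query.isEmpty) Left("query is empty")
    else validate(query) match {
      case Success(parsed) => Right(parsed)
      case Failure(error)  => Left(s"invalid query: ${error.getMessage}")
    }
  }
}
```

Assuming `CloudSearchQueryValidator.apply` does return a `Try`, it could be plugged in as `QueryInput.check(q => CloudSearchQueryValidator(q))(userInput)`.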

Disclaimer

The documented part of the CloudSearch structured query syntax is now supported:

  1. VALUE: either single-quoted string, date, integer, fraction, boundingbox or range
    1. string: 'example string'
    2. date: '2016-05-23T23:34:33.324Z'
    3. integer: 345
    4. fraction: 234.435
    5. boundingbox: ['-50.4, 4.56', '45,-4.36']
    6. range: [,] {,} [,} {,]; both the left and right bounds are optional and can be a date, integer or fraction.
  2. fieldname:VALUE
  3. (field FIELD VALUE)
  4. (and OTHER1 OTHER2 ...)
  5. (or OTHER1 OTHER2 ...)
  6. (not OTHER); unsupported syntax: (not field=genres 'Sci-Fi')
  7. matchall
  8. (phrase boost=FRACTION field=FIELD 'string value')
  9. (prefix boost=FRACTION field=FIELD 'string value')
  10. (range field=FIELD {,'2016-05-23T23:34:33.324Z'])
  11. (term field=FIELD 2000)
  12. (near boost=FRACTION distance=INTEGER field=FIELD 'string')
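
A few of the forms listed above can also be assembled programmatically before they are validated. This is only a sketch: `StructuredQuery` and all of its methods are hypothetical helpers that build plain strings, not part of this library. The escaping in `str` follows the CloudSearch rule that single quotes and backslashes inside a string are escaped with a backslash; whether this validator accepts escaped quotes is not confirmed here.

```scala
object StructuredQuery {
  // 'example string': quote a value, escaping backslashes and single quotes.
  def str(value: String): String =
    "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

  // fieldname:VALUE, where value is an already rendered VALUE.
  def short(field: String, value: String): String = s"$field:$value"

  // (field FIELD VALUE)
  def field(name: String, value: String): String = s"(field $name $value)"

  // (and OTHER1 OTHER2 ...), (or OTHER1 OTHER2 ...), (not OTHER)
  def and(exprs: String*): String = exprs.mkString("(and ", " ", ")")
  def or(exprs: String*): String = exprs.mkString("(or ", " ", ")")
  def not(expr: String): String = s"(not $expr)"

  // Range value with optional bounds: a present bound is rendered as
  // inclusive, an omitted bound gets a curly brace.
  def range(lower: Option[String], upper: Option[String]): String = {
    val open  = if (lower.isDefined) "[" else "{"
    val close = if (upper.isDefined) "]" else "}"
    s"$open${lower.getOrElse("")},${upper.getOrElse("")}$close"
  }
}
```

For example, `StructuredQuery.and(StructuredQuery.field("author", StructuredQuery.str("kafka")), StructuredQuery.short("title", StructuredQuery.str("I forgot")))` yields the query used in the successful parse example.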

The implementation is very naive: it does not check, for example, whether an option is specified multiple times within an expression, or whether a date is a valid date.
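
If these gaps matter for your use case, the missing checks can be layered on top of the validator. The sketch below is not part of the library; `ExtraChecks` and its methods are hypothetical. It assumes dates use the ISO-8601 UTC form shown above, and `optionNames` only makes sense for a single, non-nested expression, because it does not track which sub-expression an option belongs to.

```scala
import java.time.Instant
import scala.util.Try

object ExtraChecks {
  // True when the text (without surrounding quotes) is a real ISO-8601 UTC
  // instant such as 2016-05-23T23:34:33.324Z; impossible dates are rejected.
  def isValidDate(text: String): Boolean = Try(Instant.parse(text)).isSuccess

  // Option names (boost, field, distance, ...) found in one flat expression.
  // Quoted strings are blanked first so an '=' inside a string is ignored.
  def optionNames(expr: String): List[String] = {
    val withoutStrings = expr.replaceAll("""'(\\.|[^'\\])*'""", "''")
    """(\w+)=""".r.findAllMatchIn(withoutStrings).map(_.group(1)).toList
  }

  // Names that occur more than once in the given list.
  def duplicateOptions(names: Seq[String]): Set[String] =
    names.groupBy(identity).collect { case (name, hits) if hits.size > 1 => name }.toSet
}
```

A caller could run the validator first and then reject the query when `duplicateOptions(optionNames(expr))` is non-empty or `isValidDate` fails for a date value.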

Contributing

Pull requests are always welcome

Not sure if that typo is worth a pull request? Found a bug and know how to fix it? Do it! We will appreciate it. Any significant improvement should be documented as a GitHub issue before anybody starts working on it.

I'm always thrilled to receive pull requests and will try to process them quickly. If your pull request is not accepted on the first try, don't get discouraged!

Thanks

Please check out this beautiful GNIP validator, made by my colleague Jeroen Rosenberg, which was the inspiration for this library: https://github.com/jeroenr/gnip-rule-validator