Microdata parsing library for Scala.
To install add the following line to your SBT configuration:
libraryDependencies += "me.gladwell.microtesia" %% "microtesia" % "0.5.1"
To use simply put the Microtesia API in scope and call the parseMicrodata
method as follows:
scala> import microtesia._
import microtesia._
scala> parseMicrodata("""<div
| itemscope
| itemtype="http://schema.org/Movie">
| <h1 itemprop="name">Avatar</h1>
| </div>""")
res0: scala.util.Try[microtesia.MicrodataDocument] = Success(MicrodataDocument(List(MicrodataItem(List((name,MicrodataString(Avatar))),Some(http://schema.org/Movie),None))))
See the API reference for more information.
Once the HTML has been parsed, you can extract microdata values using for-comprehensions:
scala> import microtesia._
import microtesia._
scala> val items = List(MicrodataItem(properties = Seq(("name", MicrodataString("Brian")))))
scala> for {
| MicrodataItem(properties, _, _) <- items
| MicrodataProperty("name", MicrodataString(string)) <- properties
| } yield string
res1: List[String] = List(Brian)
See MicrodataValueSpec.scala for more examples of microdata for-comprehensions.
You can also query microdata using an XPath-like syntax:
scala> import microtesia._
import microtesia._
scala> val item = MicrodataItem(properties = Seq(("name", MicrodataString("Brian"))))
scala> item \ "name"
res1: MicrodataQuery = QueryResults(List(MicrodataString(Brian)))
While you can use the for-comprehensions to write custom parsers, microtesia provides a formats
API (based on https://github.com/milessabin/shapeless) to automatically de-serialise MicrodataValue
instances into value types and case classes:
scala> import microtesia._, formats._
import microtesia._
import formats._
scala> case class Person(name: String, age: Int, adult: Boolean)
defined class Person
scala> MicrodataItem(
| Seq(
| ("name", MicrodataString("hello")),
| ("age", MicrodataString("13")),
| ("adult", MicrodataString("true"))
| )
| ).convertTo[Person]
res0: scala.util.Try[Person] = Success(Person(hello,13,true))
To release a new, tagged version of Microtesia, execute the following:
$ sbt +publish
$ sbt ghpages-push-site
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.
Copyright 2015-2018 Ricardo Gladwell.