bizreach / robotparser-scala

Apache License 2.0

robotparser-scala implements a parser for the robots.txt file format in Scala.

Setup

Add robotparser-scala as a dependency in build.sbt:

libraryDependencies += "jp.co.bizreach" %% "robotparser-scala" % "0.0.5"

Usage

Parse a robots.txt file as follows:

import java.io.InputStream
import jp.co.bizreach.robot._

// Any InputStream over the robots.txt content
val stream: InputStream = ...
val robotsTxt = RobotsTxtParser.parse(stream)

This returns a RobotsTxt instance. The default character encoding is UTF-8.
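Since RobotsTxtParser.parse takes an InputStream, content that is already in memory has to be wrapped in a stream first. The following is a minimal sketch using only the JDK. The object name RobotsTxtInput, the helper toStream, and the sample content are hypothetical names chosen for illustration and are not part of robotparser-scala.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets

object RobotsTxtInput {
  // Hypothetical sample content, for illustration only.
  val sample: String =
    """User-agent: *
      |Disallow: /private/
      |Sitemap: https://example.com/sitemap.xml
      |""".stripMargin

  // Encode as UTF-8 so the bytes match the parser's default encoding.
  def toStream(text: String): InputStream =
    new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))
}

// With robotparser-scala on the classpath, the stream can be passed on:
//   import jp.co.bizreach.robot._
//   val robotsTxt = RobotsTxtParser.parse(RobotsTxtInput.toStream(RobotsTxtInput.sample))
```

This is also a convenient way to feed fixed input to the parser in unit tests without touching the network or the file system.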

Parse a sitemap file as follows:

import java.io.InputStream
import jp.co.bizreach.robot._

val stream: InputStream = ...
SitemapParser.parse(stream) match {
  // Sitemap file
  case x: Urlset => ...

  // Sitemap Index file
  case x: Sitemapindex => ...
}

SitemapParser supports the following file types:

  • XML Sitemap
  • XML Sitemap Index
  • Text Sitemap
  • gz (gzip-compressed sitemaps)

The result is a Urlset or a Sitemapindex instance. The default character encoding is UTF-8.
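Both parsers read from an InputStream, and this README does not say whether parse closes the stream it is given. One cautious approach is a small loan-pattern helper that always closes the stream after use. The sketch below uses only the JDK. StreamLoan and withStream are hypothetical names and are not part of robotparser-scala.

```scala
import java.io.InputStream

object StreamLoan {
  // Opens a stream, passes it to `use`, and closes it afterwards,
  // even if `use` throws an exception.
  def withStream[A](open: => InputStream)(use: InputStream => A): A = {
    val in = open
    try use(in)
    finally in.close()
  }
}

// With robotparser-scala on the classpath (the file name is an assumption):
//   import java.io.FileInputStream
//   import jp.co.bizreach.robot._
//   val result = StreamLoan.withStream(new FileInputStream("sitemap.xml")) { in =>
//     SitemapParser.parse(in)
//   }
```

The result is returned from inside the block, so this assumes the parsed instance is fully built before the stream is closed.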