workingdog / scalapsl   1.2

BSD 3-clause "New" or "Revised" License GitHub

a scala library for the Public Suffix List

Scala versions: 2.12 2.11

Public Suffix List Scala API

This Scala library (scalapsl) presents a simple API to use the Public Suffix List[1]. With this library you can parse a URL into its component subdomains, such as: Top Level Domain (TLD), Second Level Domain (SLD), Third Level Domain (TRD), etc... registrable domain and public suffix parts.

From the Public Suffix List:

A "public suffix" is one under which Internet users can (or historically could) directly register names. Some examples of public suffixes are .com, .co.uk and pvt.k12.ma.us. The Public Suffix List is a list of all known public suffixes. The Public Suffix List is an initiative of Mozilla, but is maintained as a community resource. It is available for use in any software, but was originally created to meet the needs of browser manufacturers. It allows browsers to, for example:

  • Avoid privacy-damaging "supercookies" being set for high-level domain name suffixes
  • Highlight the most important part of a domain name in the user interface
  • Accurately sort history entries by site.

Since there was and remains no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain (the policies differ with each registry), the only method is to create a list. This is the aim of the Public Suffix List. This Scala library is a basic conversion of the Java code of reference 2.

Documentation

The Public Suffix List is a list of simple rules for all known public suffixes, see documentation.

Installation

Add the following dependency to your application build.sbt:

libraryDependencies += "com.github.workingDog" %% "scalaspl" % "1.2"

Note the typo in the library name, should have been scalapsl.

To compile and generate an independent fat jar file from source, use SBT:

sbt assembly

The jar file scalapsl-1.3-SNAPSHOT.jar will be in created in the ./target/scala-2.12 directory.

How to use

Very simple to use, see the Example.scala and TestApp.scala.

From Example.scala:

    val psl = PublicSuffixList()
    println("the public suffix of \"www.example.net\" is: " + psl.publicSuffix("www.example.net").get)
    println("\"www.example.net\" is a public suffix: " + psl.isPublicSuffix("www.example.net"))
    println("\"www.example.net\" is registrable: " + psl.isRegistrable("www.example.net"))
    println("the registrable domain of: www.example.net is: " + psl.registrable("www.example.net").get)
    println()
    val domain = "x.a.b.c.kobe.jp"
    println("domain: " + domain + "\n  tld: " + psl.tld(domain) + "\n  sld: " + psl.sld(domain) + "\n  trd: " + psl.trd(domain))
    println()
    println("domainLevel 1: " + psl.domainLevel(domain, 1))
    println("domainLevel 2: " + psl.domainLevel(domain, 2))
    println("domainLevel 3: " + psl.domainLevel(domain, 3))
    println("domainLevel 4: " + psl.domainLevel(domain, 4))
    println("domainLevel 5: " + psl.domainLevel(domain, 5))
    println()
    psl.domainLevels(domain).foreach(lvl => println("domainLevels: " + lvl))

To read the PSL directly from reference 1, you need to be connected to the internet and have the following property set in the configuration file application.conf:

psl.url="https://publicsuffix.org/list/public_suffix_list.dat"

If you do not want to read the list from the internet, but from a local file, use something like:

psl.url="file:///Users/.../scalaPSL/src/main/resources/public_suffix_list.dat"

See application.conf example in the ./src/main/resources directory.

References

  1. The Public Suffix List

  2. Original java code

  3. Other code

Dependencies

Uses scala 2.12.8 and TypeSafe Configuration library for JVM languages. The latest Public Suffix List can be obtained from The Public Suffix List on github.

Status

Passes all tests, see TestApp.scala and the test file