Read and convert subtitle (.srt) file to csv or List
Add the dependency in build.sbt:
libraryDependencies += "io.github.mdauthentic" % "sous-title_2.13" % "0.3.0"
Then import the package:
import io.github.mdauthentic.core._
Calling the open or readInLine method returns a List of SRT values, each containing id, startTime, endTime and sub (the subtitle lines).
scala> val reader = SRTReader.open("file.srt")
reader: List[SRT] = List(SRT(1, 00:00:33.599, 00:00:35.270, List(Soy Amelia Folch.)))
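An .srt file stores each cue as an index line, a timing line with comma-separated milliseconds (`00:00:33,599 --> 00:00:35,270`), and the subtitle text, with cues separated by blank lines. The REPL output above shows the timestamps coming back with a dot (`00:00:33.599`). The following is a minimal, self-contained sketch of that kind of parsing, not the library's implementation. The `SRT` case class is a hypothetical stand-in that only mirrors the field names listed above, and `SrtSketch.parse` is a hypothetical name.

```scala
import scala.util.matching.Regex

// Hypothetical stand-in for the library's record type: the field names follow
// the README (id, startTime, endTime, sub), but the real types may differ.
final case class SRT(id: Int, startTime: String, endTime: String, sub: List[String])

object SrtSketch {
  // Matches a timing line such as "00:00:33,599 --> 00:00:35,270",
  // capturing hh:mm:ss and the milliseconds of each timestamp separately.
  private val Timing: Regex =
    """(\d{2}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}),(\d{3})""".r

  /** Parses SRT text into records, rewriting "hh:mm:ss,mmm" as "hh:mm:ss.mmm". */
  def parse(text: String): List[SRT] =
    text
      .replace("\r\n", "\n")
      .split("\n\\s*\n") // cues are separated by blank lines
      .toList
      .map(_.linesIterator.map(_.trim).filter(_.nonEmpty).toList)
      .collect {
        // index line, timing line, then the remaining lines are the subtitle text
        case idLine :: Timing(s, sMs, e, eMs) :: textLines if idLine.forall(_.isDigit) =>
          SRT(idLine.toInt, s"$s.$sMs", s"$e.$eMs", textLines)
      }
}
```

With records in this shape, `SrtSketch.parse(text).map(_.sub)` keeps only the subtitle lines of each cue.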
readInLine parses SRT content held in a string and likewise returns a List of SRT values:
scala> val srt =
"""1
|00:00:33,599 --> 00:00:35,270
|(NARRA) Soy Amelia Folch.
|
|2
|00:00:36,199 --> 00:00:39,870
|Tengo 23 años y sin embargo
|he salvado la vida del Empecinado.""".stripMargin
scala> val inlineReader = SRTReader.readInLine(srt)
inlineReader: List(SRT(1,00:00:33.599,00:00:35.270,List((NARRA) Soy Amelia Folch.)), SRT(2,00:00:36.199,00:00:39.870,List(Tengo 23 años y sin embargo, he salvado la vida del Empecinado.)))
If you only need part of the result, for instance the subtitle text without the id, start time and end time, you can extract just the subtitles like this:
scala> inlineReader.map(_.sub)
List(List((NARRA) Soy Amelia Folch.), List(Tengo 23 años y sin embargo, he salvado la vida del Empecinado.))
There are two ways to write to a CSV file:
- without a header, from records that have already been read:
scala> val reader = SRTReader.open("file.srt")
reader: List[SRT] = List(SRT(1, 00:00:33.599, 00:00:35.270, List(Soy Amelia Folch.)))
scala> SRTWriter.write(reader, "output.csv")
or using the file path directly:
scala> SRTWriter.write("inputFileName.srt", "outputFileName.csv")
- with a user-defined header:
scala> val header = List("id", "start_time", "end_time", "subtitle")
header: List[String] = List(id, start_time, end_time, subtitle)
scala> SRTWriter.write("input.srt", "output.csv", header)
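To show what turning records into CSV involves, here is a minimal sketch, assuming one row per cue with the columns id, start time, end time and subtitle, and multi-line subtitles joined with a single space (the library may join them differently). `CsvSketch.toCsv` and the `SRT` case class are hypothetical stand-ins, not the library's SRTWriter. Fields containing commas, quotes or line breaks are quoted in RFC 4180 style, which matters here because subtitle lines often contain commas.

```scala
// Hypothetical stand-in for the library's record type (same fields as in the README).
final case class SRT(id: Int, startTime: String, endTime: String, sub: List[String])

object CsvSketch {
  // Quotes a field if it contains a comma, a double quote or a line break,
  // doubling any embedded quotes (RFC 4180 style).
  private def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  /** Renders records as CSV text, with an optional header row.
    * Assumption: multi-line subtitles are joined with a single space.
    */
  def toCsv(records: List[SRT], header: Option[List[String]] = None): String = {
    val rows = records.map { r =>
      List(r.id.toString, r.startTime, r.endTime, r.sub.mkString(" "))
    }
    (header.toList ++ rows).map(_.map(escape).mkString(",")).mkString("\n")
  }
}
```

Passing `None` or `Some(header)` mirrors the two ways of writing described above; saving the string to disk (for instance with `java.nio.file.Files.write`) is left out to keep the sketch easy to test.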
In the TV series Scandal, wine is mentioned many times, and I was curious how often the word is used across the entire series (seasons 1 to 7). I used this library to convert all the subtitle files for the series to CSV for further analysis.
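Once the subtitle text is extracted (the List[List[String]] shape shown in the extraction example above), counting a word is a short step. The following is a minimal sketch, assuming the subtitles are already in memory in that shape rather than in the CSV files; `WordCountSketch.countWord` is a hypothetical helper, not part of the library and not necessarily how the Scandal counts were produced. It counts case-insensitive whole-word matches, so "winery" is not counted as "wine".

```scala
import java.util.regex.Pattern

object WordCountSketch {
  /** Counts case-insensitive, whole-word occurrences of `word` across all subtitle lines. */
  def countWord(subs: List[List[String]], word: String): Int = {
    // (?iu): case-insensitive matching that also handles non-ASCII letters.
    // \b word boundaries stop "wine" from matching inside "winery";
    // Pattern.quote treats the search word literally.
    val pattern = ("(?iu)\\b" + Pattern.quote(word) + "\\b").r
    subs.flatten.map(line => pattern.findAllMatchIn(line).size).sum
  }
}
```

For example, `WordCountSketch.countWord(List(List("Red wine, please."), List("WINE? No, the winery.", "More wine")), "wine")` counts three matches.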
This library will come in handy in data analysis projects for parsing and extracting the contents of subtitle files.