sotastream.data module

class sotastream.data.Line(rawLine=None, fields=[])

Bases: object

A Line object represents a line containing fields. Its string representation is typically tab-delimited, while internally the class works with the individual fields. The fields can represent any parallel corpus; typically they are source, target, and metadata.
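To make the relationship between the tab-delimited string and the fields concrete, here is a minimal sketch. `SketchLine` is a hypothetical stand-in written for illustration, not the sotastream implementation; it assumes that a raw line is split on tabs and that the string representation joins the fields with tabs again.

```python
from typing import List, Optional


class SketchLine:
    """Hypothetical stand-in for sotastream.data.Line (not the library's code)."""

    def __init__(self, raw_line: Optional[str] = None,
                 fields: Optional[List[str]] = None):
        if raw_line is not None:
            # Assumption: the raw string is split on tabs into fields.
            self.fields = raw_line.rstrip("\n").split("\t")
        else:
            # Copy the list so instances never share a caller's list.
            self.fields = list(fields) if fields is not None else []

    def __str__(self) -> str:
        # Assumption: the string form is the fields joined by tabs.
        return "\t".join(self.fields)


line = SketchLine("Guten Morgen\tGood morning\tdoc_id=7")
source, target, metadata = line.fields
```

The sketch uses `None` rather than a list literal as the default for `fields`, a common Python idiom that avoids sharing one mutable default between instances.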

append(other: Line, fields: List[int] | None = None, separator=' ')

Appends other to this Line field by field, on the specified fields only. If the current Line has fewer fields than the Line being appended, it is padded to match.

Parameters:
  • other – the Line object to append.

  • fields – the list of fields to append (None means all fields).
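The field-wise append with padding described above can be sketched on plain lists of strings. `append_fields` is a hypothetical helper, not part of sotastream; how the library treats padded (empty) fields and out-of-range indices is not stated in this reference, so both behaviours below are assumptions and are marked as such in the comments.

```python
from typing import List, Optional


def append_fields(current: List[str], other: List[str],
                  fields: Optional[List[int]] = None,
                  separator: str = " ") -> List[str]:
    """Hypothetical sketch of a field-wise append; returns a new list."""
    result = list(current)
    # Pad with empty fields so `result` is at least as long as `other`.
    if len(result) < len(other):
        result.extend([""] * (len(other) - len(result)))
    # None means "all fields of `other`".
    indices = range(len(other)) if fields is None else fields
    for i in indices:
        if i >= len(other):
            continue  # Assumption: indices that `other` lacks are skipped.
        if result[i]:
            result[i] = result[i] + separator + other[i]
        else:
            # Assumption: an empty (padded) field takes the value as-is,
            # without a leading separator.
            result[i] = other[i]
    return result


all_fields = append_fields(["a", "b"], ["c", "d", "meta"])
first_only = append_fields(["a", "b"], ["c", "d"], fields=[0])
```

In this sketch `all_fields` has three entries because the two-field input is padded before appending, while `first_only` leaves the second field untouched.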

fields

static join(lines: List[Line], separator=' <eos>', end_range=2)

Joins the lines column by column using the specified separator. Only columns 0 through end_range - 1 are joined; columns from end_range onward are dropped.

Example input: join([Line("a b 1"), Line("d e 1")], separator="|", end_range=2)

Example output: Line("a|d b|e")

Parameters:
  • lines – the list of Line objects to join.

  • separator – the separator to use.

  • end_range – the column index to stop at (exclusive); columns 0 through end_range - 1 are joined.

Returns:

a new Line object.
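The column-wise join can be sketched on plain lists of fields, reproducing the documented example. `join_columns` is a hypothetical helper, not the sotastream implementation, and it assumes every row has at least end_range fields.

```python
from typing import List


def join_columns(rows: List[List[str]], separator: str = " <eos>",
                 end_range: int = 2) -> List[str]:
    """Hypothetical sketch: join column i of every row for i < end_range."""
    # Columns at index end_range and beyond are ignored.
    return [separator.join(row[i] for row in rows) for i in range(end_range)]


joined = join_columns([["a", "b", "1"], ["d", "e", "1"]],
                      separator="|", end_range=2)
print(joined)  # ['a|d', 'b|e']
```

The third column ("1" in both rows) does not appear in the result because end_range=2 stops the join after columns 0 and 1.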