sotastream.data module
- class sotastream.data.Line(rawLine=None, fields=[])
Bases: object
A Line object represents a line containing fields. Its string representation is typically tab-delimited; internally, the line is stored as a list of fields. The fields can represent any parallel corpus; typically, they are source, target, and metadata.
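The relationship between the tab-delimited string form and the internal fields can be sketched as follows. This is a minimal illustration, not sotastream's code: `split_fields` and `render_fields` are hypothetical helper names, and stripping the trailing newline is an assumption of the sketch.

```python
from typing import List


def split_fields(raw_line: str) -> List[str]:
    """Split a tab-delimited line into fields (hypothetical helper).

    Assumption: a trailing newline is not part of the last field.
    """
    return raw_line.rstrip("\n").split("\t")


def render_fields(fields: List[str]) -> str:
    """Render a list of fields back to the tab-delimited string form."""
    return "\t".join(fields)


# A typical parallel-corpus line: source, target, metadata.
fields = split_fields("Hallo Welt\tHello world\tdomain=news\n")
source, target, metadata = fields
print(fields)                 # ['Hallo Welt', 'Hello world', 'domain=news']
print(render_fields(fields))  # the same three fields, joined by tabs
```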
- append(other: Line, fields: List[int] | None = None, separator=' ')
Append field-wise, on the specified fields. If the current Line has fewer fields than the Line being appended, it is padded to match.
- Parameters:
other – the Line object to append.
fields – the list of field indices to append (None means all fields).
separator – the string placed between the existing field and the appended one.
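The field-wise append with padding described above might look roughly like this on plain lists of strings. It is a sketch under stated assumptions, not the library's implementation: `append_fields` is a hypothetical name, padding is assumed to use empty strings, and the choice to skip the separator when the existing field is empty is this sketch's own, so the real method may behave differently in that case.

```python
from typing import List, Optional


def append_fields(
    current: List[str],
    other: List[str],
    fields: Optional[List[int]] = None,
    separator: str = " ",
) -> List[str]:
    """Append `other` to `current` field by field (hypothetical helper)."""
    # Pad the current fields with empty strings if `other` has more fields.
    result = current + [""] * max(0, len(other) - len(current))
    # None means "append on every field of `other`".
    indices = range(len(other)) if fields is None else fields
    for i in indices:
        # Assumption of this sketch: no separator in front of text
        # that lands in an empty (for example, padded) field.
        result[i] = (result[i] + separator + other[i]) if result[i] else other[i]
    return result


print(append_fields(["a", "b"], ["c", "d", "meta"]))
# ['a c', 'b d', 'meta']
print(append_fields(["a", "b"], ["c", "d"], fields=[0]))
# ['a c', 'b']
```

Returning a new list keeps the sketch free of side effects; the documented method is called on a Line and its signature shows no return value, so it presumably updates that Line instead.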
- fields
- static join(lines: List[Line], separator=' <eos>', end_range=2)
Joins the columns of several lines using the specified separator, column by column. Stops before column end_range, so the last column joined is end_range - 1.
Example input: join([Line("a b 1"), Line("d e 1")], separator="|", end_range=2)
Example output: Line("a|d b|e")
- Parameters:
lines – the list of Line objects to join.
separator – the separator to use.
end_range – the column index to stop at (exclusive); columns from end_range onward are dropped.
- Returns:
a new Line object.
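The column-wise join can be illustrated on plain lists of fields. This is a minimal sketch rather than sotastream's implementation: `join_columns` is a hypothetical name, each row is assumed to have at least end_range fields, and the example from the docstring is reproduced with the fields already split.

```python
from typing import List


def join_columns(
    rows: List[List[str]],
    separator: str = " <eos>",
    end_range: int = 2,
) -> List[str]:
    """Join rows column by column, keeping columns 0 .. end_range - 1.

    Hypothetical helper; assumes every row has at least `end_range` fields.
    """
    return [
        separator.join(row[col] for row in rows)
        for col in range(end_range)
    ]


rows = [["a", "b", "1"], ["d", "e", "1"]]
print(join_columns(rows, separator="|", end_range=2))
# ['a|d', 'b|e']
```

The third column is dropped because end_range=2 limits the join to columns 0 and 1.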