sotastream.cli module

sotastream.cli.add_global_args(parser: ArgumentParser)

Add global arguments to the parser. These appear before the pipeline argument and are available to all pipelines.

Parameters:

parser – The parser to add the options to.
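As a rough illustration of options that "appear before the pipeline argument", the sketch below registers options on the top-level parser and then adds the pipeline as a subcommand. The function name `add_global_args_sketch`, the flags `--seed` and `--queue-buffer-size`, and the `example` pipeline are assumptions made for this example, not the actual sotastream options.

```python
import argparse


def add_global_args_sketch(parser: argparse.ArgumentParser) -> None:
    """Hypothetical stand-in for add_global_args; the flag names are assumed."""
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--queue-buffer-size", type=int, default=10000)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    # Global options live on the top-level parser, so they must be given
    # before the pipeline name on the command line.
    add_global_args_sketch(parser)
    subparsers = parser.add_subparsers(dest="pipeline", required=True)
    # "example" is a made-up pipeline with one positional argument.
    example = subparsers.add_parser("example")
    example.add_argument("data")
    return parser


args = build_parser().parse_args(["--seed", "7", "example", "train.tsv.gz"])
print(args.seed, args.pipeline, args.data)
```

Because the global options are attached to the top-level parser, every pipeline subcommand sees them in the same `args` namespace.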

sotastream.cli.adjustSeed(seed, local_num_instances, local_instance_rank)

Adjust the seed for infinibatch so that each instance gets a different one, based on the process number and MPI coordinates.
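One simple way to give each instance its own seed is to offset the base seed by a global rank. The sketch below is only that: the formula and the `node_rank` parameter (standing in for the MPI coordinates) are assumptions, and the real `adjustSeed` may compute the value differently.

```python
def adjust_seed_sketch(seed: int, local_num_instances: int,
                       local_instance_rank: int, node_rank: int = 0) -> int:
    """Hypothetical seed adjustment; not the sotastream formula."""
    # Combine the (assumed) node rank with the local rank into a unique
    # global rank, then shift the base seed by it.
    global_rank = node_rank * local_num_instances + local_instance_rank
    return seed + global_rank


# Two nodes with four local instances each yield eight distinct seeds.
seeds = [adjust_seed_sketch(1234, 4, rank, node)
         for node in range(2) for rank in range(4)]
print(seeds)
```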

sotastream.cli.main()

sotastream.cli.maybe_split_files(args)

Split data files into smaller files in a temporary directory.

This function updates args in place: it replaces .gz paths (if any) with the directories that hold the split files.

Parameters:

args – The CLI arguments object from argparse.
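The splitting step can be pictured roughly as follows. This is a minimal sketch, assuming that every string attribute ending in `.gz` is a data file and that splitting is done by line count; the helper names, the `lines_per_file` parameter, and the `part.NNNNN.gz` naming are all hypothetical.

```python
import argparse
import gzip
import tempfile
from pathlib import Path


def split_gz_sketch(path, out_dir, lines_per_file=2):
    """Write `path` into out_dir as gzip parts of at most lines_per_file lines."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    part, writer = 0, None
    with gzip.open(path, "rt", encoding="utf-8") as reader:
        for i, line in enumerate(reader):
            if i % lines_per_file == 0:
                # Start a new part file every lines_per_file lines.
                if writer:
                    writer.close()
                writer = gzip.open(out / f"part.{part:05d}.gz", "wt", encoding="utf-8")
                part += 1
            writer.write(line)
    if writer:
        writer.close()
    return str(out)


def maybe_split_files_sketch(args, tmp_root):
    """Replace each .gz path on args, in place, with its split directory."""
    for name, value in list(vars(args).items()):
        if isinstance(value, str) and value.endswith(".gz"):
            target = Path(tmp_root) / (Path(value).name + ".split")
            setattr(args, name, split_gz_sketch(value, target))


# Demo: build a five-line .gz file, then split it two lines per part.
tmp_root = tempfile.mkdtemp()
source = Path(tmp_root) / "train.tsv.gz"
with gzip.open(source, "wt", encoding="utf-8") as f:
    f.write("".join(f"src {i}\ttgt {i}\n" for i in range(5)))

args = argparse.Namespace(data=str(source), seed=0)
maybe_split_files_sketch(args, tmp_root)
parts = sorted(Path(args.data).iterdir())
print(args.data, [p.name for p in parts])
```

After the call, `args.data` points at a directory rather than the original file, which mirrors the in-place update described above.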

sotastream.cli.run_pipeline_process(conn, args, seed, worker_id, num_workers)

Runs a pipeline in a single subprocess. Each subprocess writes to the pipe (conn) after it has seen the specified number (args.queue_buffer_size) of lines.
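The buffering behaviour can be sketched with a `multiprocessing.Pipe`. The example below is an assumption-laden illustration: the generated lines, the `max_lines` bound, and the `None` end marker are invented for the demo, and the worker is called directly in the same process for portability, whereas in real use such a function would be the target of a `multiprocessing.Process`.

```python
import random
from multiprocessing import Pipe
from types import SimpleNamespace


def run_pipeline_process_sketch(conn, args, seed, worker_id, num_workers, max_lines=6):
    """Hypothetical worker: buffers lines and sends them in batches over conn."""
    rng = random.Random(seed + worker_id)  # per-worker seed (assumed scheme)
    buffer = []
    for i in range(max_lines):
        buffer.append(f"worker={worker_id}/{num_workers} line={i} r={rng.random():.3f}")
        # Write to the pipe only once queue_buffer_size lines have been seen.
        if len(buffer) == args.queue_buffer_size:
            conn.send(buffer)
            buffer = []
    if buffer:
        conn.send(buffer)  # flush the remainder of this bounded demo
    conn.send(None)  # hypothetical end-of-stream marker


parent_conn, child_conn = Pipe()
args = SimpleNamespace(queue_buffer_size=4)
run_pipeline_process_sketch(child_conn, args, seed=1234, worker_id=0, num_workers=1)

batches = []
while (batch := parent_conn.recv()) is not None:
    batches.append(batch)
print([len(b) for b in batches])
```

Sending whole batches instead of single lines keeps the number of pipe writes small, which is the likely motivation for a buffer size setting.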