sotastream.pipelines.multistream_pipeline module
- class sotastream.pipelines.multistream_pipeline.MultiStreamPipeline(paths: List[Path], ext: str, mix_weights: List = None, **kwargs)[source]
Bases: Pipeline
Pipeline for mixing a variable number of data sources.
This pipeline takes one or more data paths and mixes them according to the --mix-weights parameter (default: equal ratios, i.e. balance the sources). Example use case: a classification task where each data stream holds one class, so the default mix ratio balances the classes.
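The mixing behaviour described above can be illustrated with a small standalone sketch. This is not sotastream's implementation: `mix_streams`, its `seed` parameter, and the stop-on-first-exhausted-source rule are all assumptions made for illustration. It only shows the general idea of drawing each item from a source chosen in proportion to its weight, with equal weights as the default.

```python
import itertools
import random
from typing import Iterable, Iterator, List, Optional


def mix_streams(streams: List[Iterable[str]],
                weights: Optional[List[float]] = None,
                seed: int = 0) -> Iterator[str]:
    """Hypothetical helper: yield items from `streams`, choosing the source
    of each item at random in proportion to `weights`."""
    if weights is None:
        # Equal ratios by default, i.e. balance the sources.
        weights = [1.0] * len(streams)
    if len(weights) != len(streams):
        raise ValueError("need exactly one weight per stream")
    rng = random.Random(seed)
    iterators = [iter(s) for s in streams]
    while True:
        # Pick the index of the source to read from next.
        idx = rng.choices(range(len(iterators)), weights=weights, k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            # Simplification: stop as soon as any source runs dry.
            return


# One (infinite) stream per class, mixed roughly 3:1.
pos = itertools.repeat("pos")
neg = itertools.repeat("neg")
sample = list(itertools.islice(mix_streams([pos, neg], [3, 1]), 1000))
```

With weights `[3, 1]`, roughly three quarters of `sample` should come from the first stream; omitting the weights would give an approximately even split.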
- classmethod add_cli_args(parser: ArgumentParser)[source]
Add CLI arguments to the pipeline-specific subparser. These arguments are shared across all pipelines and appear after the pipeline name in the CLI. For global args that appear before the pipeline name, see sotastream.cli.add_cli_args.
- classmethod get_data_sources_default_weights()[source]
A list of floats corresponding to the number of data sources and specifying the mixture weights among them. These will be provided to the argparse subcommand as the default values for the --mix-weights argument. To get the actual instantiated values, use self.mix_weights. The function is named in an overly explicit way to avoid confusion between these two sources.
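The distinction between default weights and the values actually supplied on the command line can be sketched with plain argparse. The parser below is a hypothetical stand-in, not sotastream's CLI: the `paths` positional, the `resolve_mix_weights` helper, and the normalisation step are assumptions for illustration only.

```python
import argparse
from typing import List, Optional


def resolve_mix_weights(n_sources: int,
                        mix_weights: Optional[List[float]] = None) -> List[float]:
    """Hypothetical helper: fall back to equal weights, then normalise to sum 1."""
    if mix_weights is None:
        # Default: one equal weight per data source.
        mix_weights = [1.0] * n_sources
    if len(mix_weights) != n_sources:
        raise ValueError("expected one weight per data source")
    total = sum(mix_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in mix_weights]


# Assumed CLI shape: a variable number of paths plus an optional --mix-weights.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("paths", nargs="+", help="one or more data paths")
parser.add_argument("--mix-weights", type=float, nargs="+", default=None)

args = parser.parse_args(["a.tsv", "b.tsv", "--mix-weights", "3", "1"])
weights = resolve_mix_weights(len(args.paths), args.mix_weights)
```

Here `weights` holds the values derived from the command line, while calling `resolve_mix_weights(n)` without a second argument gives the equal-ratio fallback.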
- classmethod get_data_sources_for_argparse() → List[Tuple][source]
This returns a list of (name, description) pairs for each data source. This is used to instantiate the argparse subcommand with named positional arguments. These are not the actual instantiated data paths; for that, each class has The function name is quite verbose in order to minimize confusion.
- Returns:
List[Tuple]: List of (name, description)
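As a rough illustration of how (name, description) pairs could become named positional arguments, consider the sketch below. `DemoPipeline`, its two source names, and `add_data_source_args` are hypothetical and are not taken from sotastream; only the standard argparse calls are real.

```python
import argparse
from typing import List, Tuple


class DemoPipeline:
    """Hypothetical pipeline class, used only for this example."""

    @classmethod
    def get_data_sources_for_argparse(cls) -> List[Tuple]:
        # Each entry is (name, description); both names are made up here.
        return [
            ("parallel_data", "Path to the parallel data"),
            ("backtrans_data", "Path to the back-translated data"),
        ]


def add_data_source_args(parser: argparse.ArgumentParser, pipeline_cls) -> None:
    """Register one named positional argument per declared data source."""
    for name, description in pipeline_cls.get_data_sources_for_argparse():
        parser.add_argument(name, help=description)


parser = argparse.ArgumentParser(prog="demo")
add_data_source_args(parser, DemoPipeline)
args = parser.parse_args(["train.tsv", "bt.tsv"])
```

After parsing, `args.parallel_data` and `args.backtrans_data` hold the paths given on the command line, which are distinct from the (name, description) declarations themselves.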