
stemflow.utils.validation


Validation module. Most of these functions perform simple checks and are easy to understand.

check_mem_string(mem_str)

Check if a string is a valid memory specification like '8GB', '512MB', '1.5GB', etc.

Source code in stemflow/utils/validation.py
import re


def check_mem_string(mem_str: str) -> bool:
    """
    Check if a string is a valid memory specification like '8GB', '512MB', '1.5GB', etc.
    """
    if not isinstance(mem_str, str):
        return False

    pattern = r'^\s*(\d+(\.\d+)?)\s*(KB|MB|GB|TB)\s*$'
    return re.match(pattern, mem_str.upper()) is not None
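
A minimal usage sketch, assuming the function behaves exactly as in the listing above (it is repeated here only so the snippet runs on its own). The sample strings are illustrative; they show that matching ignores letter case and surrounding whitespace, and that non-strings and unknown units are rejected.

```python
import re


def check_mem_string(mem_str: str) -> bool:
    # Copied from the listing above so the example is self-contained.
    if not isinstance(mem_str, str):
        return False
    pattern = r'^\s*(\d+(\.\d+)?)\s*(KB|MB|GB|TB)\s*$'
    return re.match(pattern, mem_str.upper()) is not None


# Accepted: integer or decimal amount, optional whitespace, any letter case.
accepted = ['8GB', '512MB', '1.5gb', ' 2 tb ']
# Rejected: unknown or missing unit, missing leading digit, non-string input.
rejected = ['8G', '8', '.5GB', 'GB', 8]

for spec in accepted + rejected:
    print(f"{spec!r:>10} -> {check_mem_string(spec)}")
```

Note that the function only validates the format; it does not convert the value to a number of bytes.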

check_random_state(seed)

Turn seed into a np.random.Generator instance.

Parameters:

  • seed (Union[None, int, Generator]) –

    If seed is None, return a random generator. If seed is an int, return a random generator with that seed. If seed is already a random generator instance, return it. Otherwise raise ValueError.

Returns:

  • Generator

    The random generator object based on seed parameter.

Source code in stemflow/utils/validation.py
from typing import Union

import numpy as np


def check_random_state(seed: Union[None, int, np.random._generator.Generator]) -> np.random._generator.Generator:
    """Turn seed into a np.random.Generator instance.

    Args:
        seed:
            If seed is None, return a random generator.
            If seed is an int, return a random generator with that seed.
            If seed is already a random generator instance, return it.
            Otherwise raise ValueError.

    Returns:
        The random generator object based on `seed` parameter.
    """
    if seed is None:
        return np.random.default_rng(np.random.randint(0, 2**32 - 1))
    if isinstance(seed, int):
        return np.random.default_rng(seed)
    if isinstance(seed, np.random._generator.Generator):
        return seed
    raise ValueError("%r cannot be used to seed a np.random.default_rng instance" % seed)
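
A short sketch of the three non-trivial cases, assuming the logic shown in the listing above (repeated here so the snippet is self-contained, with the public alias `np.random.Generator` in place of the private path). The seed values are arbitrary. It illustrates that an int seed is reproducible, that an existing generator is returned as the same object, and that a legacy `np.random.RandomState` is rejected.

```python
from typing import Union

import numpy as np


def check_random_state(seed: Union[None, int, np.random.Generator]) -> np.random.Generator:
    # Same logic as the listing above; np.random.Generator is the public
    # alias of np.random._generator.Generator.
    if seed is None:
        return np.random.default_rng(np.random.randint(0, 2**32 - 1))
    if isinstance(seed, int):
        return np.random.default_rng(seed)
    if isinstance(seed, np.random.Generator):
        return seed
    raise ValueError("%r cannot be used to seed a np.random.default_rng instance" % seed)


# The same integer seed yields the same stream of draws.
draws_a = check_random_state(42).integers(0, 100, size=5)
draws_b = check_random_state(42).integers(0, 100, size=5)
same_stream = bool((draws_a == draws_b).all())

# An existing Generator is passed through unchanged (same object).
rng = np.random.default_rng(0)
passed_through = check_random_state(rng) is rng

# Anything else is rejected, including a legacy RandomState.
try:
    check_random_state(np.random.RandomState(0))
    rejected = False
except ValueError:
    rejected = True

print(same_stream, passed_through, rejected)
```

Because the check is `isinstance(seed, int)`, NumPy integer scalars such as `np.int64(42)` also fall through to the `ValueError`; converting with `int(...)` first avoids that.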

transform_y(X_train, y_train)

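If X_train is a DataFrame and y_train is a NumPy array or pandas Series, wrap y_train in a single-column DataFrame (column 'y_true') that shares X_train's index. Any other y_train, such as a str, is returned unchanged.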

Source code in stemflow/utils/validation.py
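import numpy as np
import pandas as pd


def transform_y(X_train, y_train):
    """Wrap y_train in a single-column DataFrame ('y_true') indexed like X_train.

    Only applies when X_train is a DataFrame and y_train is a NumPy array or
    a pandas Series; any other y_train (such as a str) is returned unchanged.
    """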
    if isinstance(X_train, pd.DataFrame):
        if isinstance(y_train, (np.ndarray, pd.Series)):
            return pd.DataFrame(y_train, columns=['y_true'], index=X_train.index)

    return y_train
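
A minimal sketch of the behaviour, assuming the function is as shown in the listing above (copied here so the snippet runs on its own). The data and the `feature` column name are hypothetical. It shows an array target being wrapped and aligned to the index of `X_train`, and two cases where the target is returned untouched.

```python
import numpy as np
import pandas as pd


def transform_y(X_train, y_train):
    # Copied from the listing above so the example is self-contained.
    if isinstance(X_train, pd.DataFrame):
        if isinstance(y_train, (np.ndarray, pd.Series)):
            return pd.DataFrame(y_train, columns=['y_true'], index=X_train.index)
    return y_train


# Hypothetical training data with a non-default index.
X = pd.DataFrame({'feature': [0.1, 0.2, 0.3]}, index=[10, 20, 30])
y = np.array([1.0, 2.0, 3.0])

y_df = transform_y(X, y)              # ndarray -> one-column DataFrame
y_list = transform_y(X, [1, 2, 3])    # a list is returned unchanged
y_raw = transform_y(X.to_numpy(), y)  # X is not a DataFrame -> unchanged

print(y_df)
```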