Protocols

Introduction

Protocols are a special kind of Bionic decorator: they specify the serialization protocol for the entity being defined. For example:

import bionic as bn
import pandas as pd

builder = bn.FlowBuilder('protocol_example')
builder.assign('name', 'Alexander')

# This entity should only have values equal to "short" or "long".
@builder
@bn.protocol.enum('short', 'long')
def name_length(name):
    if len(name) < 10:
        return 'short'
    else:
        return 'long'

# This entity's value will always be a pandas.DataFrame.
@builder
@bn.protocol.frame
def raw_df():
    from sklearn import datasets
    dataset = datasets.load_breast_cancer()
    # Use the dataset's feature names as column names.
    df = pd.DataFrame(
        data=dataset.data,
        columns=dataset.feature_names,
    )
    df['target'] = dataset.target
    return df

Protocols tell Bionic how to serialize, deserialize, and validate entity values. In most cases, Bionic’s default protocol can figure out an appropriate way to handle each value, so explicit protocol decorators are usually not required. However, they can be useful for data types that need special handling, or just to add clarity, safety, or documentation to an entity definition.

Protocols can also be passed when creating entities with the builder's declare or assign methods:

builder.assign('name_length', 'short', protocol=bn.protocol.enum('short', 'long'))
builder.declare('raw_df', protocol=bn.protocol.frame)

Custom Protocols

If you need to control how an entity is serialized, you can write your own custom protocol by subclassing BaseProtocol and implementing the methods shown below. (However, since Bionic is still at an early stage, future API changes may break your implementation.)

# BaseProtocol lives in the bionic.protocols module.
from bionic.protocols import BaseProtocol


class MyProtocol(BaseProtocol):
    def get_fixed_file_extension(self):
        """
        Returns a file extension identifying this protocol. This value will be appended
        to the name of any file written by the protocol, and may be used to determine
        whether a file can be read by the protocol.

        This string should be unique, not shared with any other protocol. By
        convention, it doesn't include an initial period, but may include periods in
        the middle. (For example, `"csv"` and `"csv.zip"` would both be sensible
        file extensions.)
        """
        raise NotImplementedError()

    def write(self, value, path):
        """Serializes the object ``value`` to the pathlib path ``path``."""
        raise NotImplementedError()

    def read(self, path):
        """Deserializes an object from the pathlib path ``path``, and returns it."""
        raise NotImplementedError()
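As a concrete illustration, the sketch below implements the same three methods for a simple format that stores a list of strings with one item per line. The class name LinesProtocol and the "lines.txt" extension are hypothetical, and the class deliberately does not subclass BaseProtocol so that it runs without Bionic installed; in a real flow you would subclass BaseProtocol as in the template above.

```python
import tempfile
from pathlib import Path


class LinesProtocol:
    """Hypothetical protocol storing a list of strings, one item per line.

    A real implementation would subclass Bionic's BaseProtocol; this class is
    standalone so the sketch runs without Bionic installed.
    """

    def get_fixed_file_extension(self):
        # No leading period, following the convention described above.
        return "lines.txt"

    def write(self, value, path):
        # Refuse values that could not be read back unchanged.
        for item in value:
            if not isinstance(item, str) or "\n" in item or "\r" in item:
                raise ValueError(f"Cannot store {item!r} as a single line")
        path.write_text("".join(item + "\n" for item in value), encoding="utf-8")

    def read(self, path):
        # Every item was written with a trailing newline, so the final element
        # produced by split() is always an empty string and is dropped.
        return path.read_text(encoding="utf-8").split("\n")[:-1]


protocol = LinesProtocol()
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ("names." + protocol.get_fixed_file_extension())
    protocol.write(["alpha", "beta"], path)
    print(protocol.read(path))  # ['alpha', 'beta']
```

Rejecting items that contain line breaks in write() keeps read() simple: any file this protocol produced can be split back into exactly the list that was written.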

Built-In Protocol Decorators

bionic.protocol.dask(func=None, **kwargs)

Decorator indicating that an entity’s values always have the dask.dataframe.DataFrame type.

These values will be serialized to a .dask.pq directory.

bionic.protocol.dillable(func=None, **kwargs)

Decorator indicating that an entity’s values can be serialized using the dill library.

This is useful for objects that the standard pickle module can’t serialize, such as lambdas and functions defined inside other functions.
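The standard-library sketch below shows the kind of value that motivates this decorator: pickle records functions by their importable name, so it rejects a lambda, whereas dill can serialize such objects by value. The helper is_picklable is hypothetical and only exercises plain pickle; it does not use dill.

```python
import pickle


def is_picklable(value):
    """Hypothetical helper: report whether the standard pickle module accepts value."""
    try:
        pickle.dumps(value)
    except (pickle.PicklingError, AttributeError, TypeError):
        # pickle stores functions by reference to an importable name; lambdas and
        # nested functions have no such name, so pickling them fails.
        return False
    return True


print(is_picklable({"scores": [1, 2, 3]}))  # True
print(is_picklable(lambda x: x + 1))        # False
```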

bionic.protocol.enum(*allowed_values)

Indicates that an entity will only have one of a specific set of values.

Parameters

allowed_values (Sequence of objects) – The expected possible values for this entity.

Returns

An entity decorator.

Return type

Function
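Conceptually, an enum protocol reduces to a membership check on each value. The standalone sketch below illustrates that idea; it is not Bionic's implementation, and the function name check_enum_value and the choice of ValueError are assumptions.

```python
def check_enum_value(value, allowed_values):
    """Hypothetical validator: return value if it equals one of allowed_values."""
    # `in` on a list or tuple compares by identity and then ==, so allowed
    # values do not need to be hashable.
    if value not in allowed_values:
        raise ValueError(
            f"{value!r} is not one of the allowed values {list(allowed_values)!r}"
        )
    return value


print(check_enum_value("short", ("short", "long")))  # short
```

Passing "medium" instead raises a ValueError that lists both allowed values.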

bionic.protocol.frame(func=None, file_format=None, check_dtypes=None)

Decorator indicating that an entity will always have a pandas DataFrame type.

The frame values will be serialized to either Parquet (default) or Feather. Parquet is more popular, but some types of data or frame structures are only supported by one format or the other. In particular, ordered categorical columns are supported by Feather and not Parquet.

This decorator can be used with or without arguments:

@bn.protocol.frame
def dataframe():
    ...

@bn.protocol.frame(file_format='feather')
def dataframe():
    ...

Parameters
  • file_format ({'parquet', 'feather'} (default: 'parquet')) – Which file format to use when saving values to disk.

  • check_dtypes (boolean (default: True)) – Whether to check for column types not supported by the file format. This check is best-effort and not guaranteed to catch all problems. If an unsupported data type is found, an exception will be raised at serialization time.

bionic.protocol.geodataframe(func=None, **kwargs)

Decorator indicating that an entity’s values always have the geopandas.geodataframe.GeoDataFrame type.

These values will be serialized to SHP files.

bionic.protocol.image(func=None, **kwargs)

Decorator indicating that an entity’s values always have the PIL.Image.Image type (from the Pillow library).

These values will be serialized to PNG files.

bionic.protocol.json(func=None, **kwargs)

Decorator indicating that an entity’s values are built-in types that are JSON-serializable: the supported types are int, float, str, bool, list, and dict. Note that dict keys must be strings, and each element of a list or dict must itself be a supported built-in type.

These values will be serialized to JSON files.
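The rules above can be written as a recursive check. This is a minimal sketch of those rules, not Bionic's own validation code, and the function name is_json_compatible is hypothetical. It also shows why the rules matter: json.dumps would silently turn an int key into a string and a tuple into a list, so such values would not survive a round trip unchanged.

```python
import json


def is_json_compatible(value):
    """Hypothetical check mirroring the supported types listed above."""
    if isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, list):
        return all(is_json_compatible(item) for item in value)
    if isinstance(value, dict):
        # Keys must be strings; values follow the same rules recursively.
        return all(
            isinstance(key, str) and is_json_compatible(item)
            for key, item in value.items()
        )
    # Anything else (tuples, sets, None, custom objects) is rejected.
    return False


value = {"name": "raw_df", "sizes": [1, 2.5], "ok": True}
print(is_json_compatible(value))                 # True
print(json.loads(json.dumps(value)) == value)    # True
print(is_json_compatible({1: "a"}))              # False: non-string key
print(json.dumps({1: "a"}))                      # {"1": "a"}
```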

bionic.protocol.numpy(func=None, **kwargs)

Decorator indicating that an entity’s values always have the numpy.ndarray type.

These values will be serialized to .npy files.

bionic.protocol.path(func=None, **kwargs)

Decorator indicating that an entity’s values are pathlib.Path objects referring to local files. When the Path is serialized, the underlying files are transferred to Bionic’s internal file cache; this means a Path can be serialized to a cloud cache and then deserialized on a different machine and still work. The Path can refer to either a file or a directory.

Parameters

operation ({"move", "copy"} (default: "copy")) – Indicates whether the underlying file should be moved or copied to Bionic’s internal cache. If the file is created by this entity function, it probably makes sense to use “move”, since no one else should be accessing the file anyway. If the file already existed, then “copy” is better.
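The difference between the two operations can be sketched with the standard library: "copy" leaves the original file or directory in place, while "move" relocates it. The helper transfer_to_cache and the flat cache directory are hypothetical and say nothing about how Bionic actually lays out its cache.

```python
import shutil
import tempfile
from pathlib import Path


def transfer_to_cache(src, cache_dir, operation="copy"):
    """Hypothetical helper: put a file or directory into cache_dir, return the new path."""
    dest = cache_dir / src.name
    if operation == "move":
        # shutil.move handles files and directories; the source no longer exists.
        shutil.move(str(src), str(dest))
    elif operation == "copy":
        # Directories need copytree; single files use copy2 (which keeps metadata).
        if src.is_dir():
            shutil.copytree(src, dest)
        else:
            shutil.copy2(src, dest)
    else:
        raise ValueError(f"operation must be 'move' or 'copy', not {operation!r}")
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cache = root / "cache"
    cache.mkdir()

    existing = root / "existing.txt"
    existing.write_text("shared")
    copied = transfer_to_cache(existing, cache, "copy")
    print(existing.exists(), copied.read_text())  # True shared

    fresh = root / "fresh.txt"
    fresh.write_text("mine")
    moved = transfer_to_cache(fresh, cache, "move")
    print(fresh.exists(), moved.read_text())  # False mine
```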

bionic.protocol.picklable(func=None, **kwargs)

Decorator indicating that an entity’s values can be serialized using the pickle library.

Parameters

pickle_protocol_version (int (default: 4)) – The pickle serialization protocol to use.
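The version number is the same protocol argument accepted by the standard library's pickle functions. As an illustration with plain pickle (independent of Bionic), the chosen version is recorded at the start of the serialized bytes and pickle.loads detects it automatically, so only the writer has to choose. Protocol 4 can be read by Python 3.4 and later, while protocol 5 requires Python 3.8.

```python
import pickle

value = {"weights": [0.1, 0.2], "label": "model-a"}

# Protocols 2 and above begin the byte stream with the PROTO opcode (0x80)
# followed by the protocol number.
data = pickle.dumps(value, protocol=4)
print(data[:2])  # b'\x80\x04'

# pickle.loads reads the protocol from the stream; no version argument is needed.
print(pickle.loads(data) == value)  # True
```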

bionic.protocol.type(type_)

Indicates that an entity’s values will always have a specific type.

Parameters

type_ (Type) – The expected type for this entity.

Returns

An entity decorator.

Return type

Function
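A check like this is most naturally an isinstance test. The standalone sketch below assumes that behavior rather than quoting Bionic's source, and the function name check_type is hypothetical. One consequence worth knowing: because bool is a subclass of int, a value of True passes a check for int.

```python
def check_type(value, type_):
    """Hypothetical validator: raise TypeError unless value is an instance of type_."""
    if not isinstance(value, type_):
        raise TypeError(
            f"Expected a value of type {type_.__name__}, got {type(value).__name__}"
        )
    return value


print(check_type(3, int))     # 3
print(check_type(True, int))  # True, because bool subclasses int
```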

bionic.protocol.yaml(func=None, **kwargs)

Decorator indicating that an entity’s values can be serialized using the PyYAML library.

Parameters

**kwargs (keyword arguments for yaml.dump) – For example, default_flow_style or encoding.