Module pysimt.datasets
A dataset in pysimt inherits from torch.utils.data.Dataset and is designed
to read and expose a specific type of corpus.
- A dataset class name should end with the Dataset suffix.
- The __init__ method should include **kwargs for other possible arguments.
- The __getitem__ and __len__ methods should be implemented.
- A static method to_torch(batch, **kwargs) is automatically used when preparing the batch tensor during forward-pass.
Please see pysimt.datasets.TextDataset for an example of how to implement
a new dataset.
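The conventions listed above can be sketched as a small class. This is only an illustration, not the actual pysimt implementation: the class name `ToyTokenDataset`, its `lines` and `vocab` arguments, the padding index 0 and the time-major output layout are all assumptions made for the example.

```python
import torch
from torch.utils.data import Dataset


class ToyTokenDataset(Dataset):
    """Hypothetical dataset exposing each corpus line as a list of token ids."""

    def __init__(self, lines, vocab, **kwargs):
        # **kwargs absorbs arguments that are meant for other dataset types.
        self.vocab = vocab
        # Unknown tokens map to the (assumed) '<unk>' index.
        unk = vocab['<unk>']
        self.data = [[vocab.get(tok, unk) for tok in line.split()]
                     for line in lines]

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)

    @staticmethod
    def to_torch(batch, **kwargs):
        # Pad every sequence to the longest one with index 0 (assumed padding
        # index) and return a (time, batch) LongTensor.
        max_len = max(len(seq) for seq in batch)
        padded = [seq + [0] * (max_len - len(seq)) for seq in batch]
        return torch.tensor(padded, dtype=torch.long).t()


vocab = {'<pad>': 0, '<unk>': 1, 'a': 2, 'b': 3, 'c': 4}
dataset = ToyTokenDataset(['a b c', 'b a'], vocab, unused_option=True)
batch = ToyTokenDataset.to_torch([dataset[0], dataset[1]])
```

Keeping `to_torch` static means batch conversion needs no dataset instance, so code that only has the class and a list of samples can still call it.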
"""
A dataset in `pysimt` inherits from `torch.utils.data.Dataset` and is designed
to read and expose a specific type of corpus.
* A dataset class name should end with the `Dataset` suffix.
* The `__init__` method should include `**kwargs` for other possible arguments.
* The `__getitem__` and `__len__` methods should be implemented.
* A static method `to_torch(batch, **kwargs)` is automatically used when
preparing the batch tensor during forward-pass.
Please see `pysimt.datasets.TextDataset` to get an idea on how to implement
a new dataset.
"""
from .numpy import NumpyDataset
from .text import TextDataset
from .objdet import ObjectDetectionsDataset
# Second the selector function
def get_dataset(type_):
    return {
        'numpy': NumpyDataset,
        'text': TextDataset,
        'objectdetections': ObjectDetectionsDataset,
    }[type_.lower()]
# Should always be at the end
from .multimodal import MultimodalDataset # noqa
Sub-modules
- pysimt.datasets.base
- pysimt.datasets.collate
- pysimt.datasets.imagefolder
- pysimt.datasets.kaldi
- pysimt.datasets.multimodal
- pysimt.datasets.numpy
- pysimt.datasets.objdet
- pysimt.datasets.text
Functions
def get_dataset(type_)
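The behaviour of the selector can be illustrated with a self-contained sketch. The three classes below are empty placeholders standing in for the real pysimt classes so that the snippet runs without pysimt installed; only the body of `get_dataset` mirrors the module source.

```python
# Placeholder classes (assumptions for this sketch, not the real datasets).
class NumpyDataset:
    pass


class TextDataset:
    pass


class ObjectDetectionsDataset:
    pass


def get_dataset(type_):
    # Same lookup as in the module source: the key is lower-cased first,
    # so 'Text', 'TEXT' and 'text' all resolve to the same class.
    return {
        'numpy': NumpyDataset,
        'text': TextDataset,
        'objectdetections': ObjectDetectionsDataset,
    }[type_.lower()]


# The function returns the class itself, not an instance.
text_cls = get_dataset('Text')

# A name missing from the mapping raises KeyError.
try:
    get_dataset('multimodal')
    unknown_type_raised = False
except KeyError:
    unknown_type_raised = True
```

Note that 'multimodal' is not a key of the mapping, even though the module imports MultimodalDataset after defining the function.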