I am working on a benchmark dataset created from the existing Snapshot Serengeti dataset. The dataset offers different types of novelties for novelty detection computer vision systems.
The annotations are split into training, validation and test sets, and they are stored as JSONL files. Each JSONL file contains the URLs of the images in that set and the label information, such as the class.
I am trying to create the Croissant metadata so that a specific split (train, valid or test) can be downloaded with the Python library, but I have not been able to figure out how from the docs.
import mlcroissant as mlc

distribution = [
    # NOVEL-SS annotations:
    mlc.FileObject(
        id="jsonl-files",
        name="jsonl-files",
        description="Zip archive containing the NOVEL-SS annotation files (train, valid and test).",
        content_url="https://raw.githubusercontent.com/Irenetema/NOVEL_SS/master/labels/croissant_jsonl.zip",
        encoding_format="application/zip",
        sha256="6265b65ce08acafc3cd55233d32135625857ad421927e8f3af71c789ad434a85",
    ),
    mlc.FileObject(
        id="train_annotations",
        name="train_annotations",
        description="NOVEL-SS training set image annotations.",
        contained_in=["jsonl-files"],
        content_url="train.jsonl",
        encoding_format="application/jsonlines",
    ),
    mlc.FileObject(
        id="valid_annotations",
        name="valid_annotations",
        description="NOVEL-SS validation set image annotations.",
        contained_in=["jsonl-files"],
        content_url="valid.jsonl",
        encoding_format="application/jsonlines",
    ),
    mlc.FileObject(
        id="test_annotations",
        name="test_annotations",
        description="NOVEL-SS test set image annotations.",
        contained_in=["jsonl-files"],
        content_url="test.jsonl",
        encoding_format="application/jsonlines",
    ),
]
record_sets = [
    # RecordSets contains records in the dataset.
    mlc.RecordSet(
        id="images_and_bbox",
        name="images_and_bbox",
        key="name",
        fields=[
            mlc.Field(
                id="images_and_bbox/image_path",
                name="image_path",
                description="Snapshot Serengeti image path (e.g. S6/P07/P07_R2/S6_P07_R2_IMAG0077.JPG)",
                data_types=mlc.DataType.TEXT,
                source=mlc.Source(
                    file_set="train_annotations",
                    extract=mlc.Extract(column="image_path"),
                ),
            ),
            mlc.Field(
                id="images_and_bbox/width",
                name="width",
                description="Image width (e.g., 2048)",
                data_types=mlc.DataType.INTEGER,
                source=mlc.Source(
                    file_set="train_annotations",
                    extract=mlc.Extract(column="width"),
                ),
            ),
            mlc.Field(
                id="images_and_bbox/height",
                name="height",
                description="Image height (e.g., 1536)",
                data_types=mlc.DataType.INTEGER,
                source=mlc.Source(
                    file_set="train_annotations",
                    extract=mlc.Extract(column="height"),
                ),
            ),
            mlc.Field(
                id="images_and_bbox/environment_id",
                name="environment_id",
                description="ID of the environment (lighting condition) of the image",
                data_types=mlc.DataType.INTEGER,
                source=mlc.Source(
                    file_set="train_annotations",
                    extract=mlc.Extract(column="environment_id"),
                ),
            ),
            mlc.Field(
                id="images_and_bbox/novelty_type",
                name="novelty_type",
                description="Integer identifying the type of novelty in the image",
                data_types=mlc.DataType.INTEGER,
                source=mlc.Source(
                    file_set="train_annotations",
                    extract=mlc.Extract(column="novelty_type"),
                ),
            ),
        ],
    ),
]
# Metadata contains information about the dataset.
metadata = mlc.Metadata(
    name="NOVEL-SS",
    # Descriptions can contain plain text or markdown.
    description=(
        ""
    ),
    cite_as=(
        ""
    ),
    distribution=distribution,
    record_sets=record_sets,
)
When I download only one set (see file_set="train_annotations" above) it works, but I don't see how to make the split a parameter.
Any idea how to do this?
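One possible workaround, offered as an assumption and not as a documented Croissant feature, is to declare one record set per split, so that the split becomes the name of the record set chosen at load time. The sketch below only builds the per-split description as plain Python data. The helper `record_set_spec`, the `<split>_images_and_bbox` naming scheme and the `COLUMNS` table are hypothetical; the source ids reuse the `train_annotations`, `valid_annotations` and `test_annotations` file objects from the metadata above.

```python
from typing import Any, Dict

# Splits that have a matching "<split>_annotations" FileObject in the metadata above.
SPLITS = ("train", "valid", "test")

# Column name -> data type name, copied from the fields in the metadata above.
COLUMNS = {
    "image_path": "TEXT",
    "width": "INTEGER",
    "height": "INTEGER",
    "environment_id": "INTEGER",
    "novelty_type": "INTEGER",
}


def record_set_spec(split: str) -> Dict[str, Any]:
    """Describe the record set for one split as plain data (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}, expected one of {SPLITS}")
    record_set_id = f"{split}_images_and_bbox"  # hypothetical naming scheme
    return {
        "id": record_set_id,
        "source": f"{split}_annotations",  # id of the FileObject to read from
        "fields": [
            {
                "id": f"{record_set_id}/{column}",
                "name": column,
                "data_type": data_type,
                "column": column,
            }
            for column, data_type in COLUMNS.items()
        ],
    }


# One spec per split; each one would become its own mlc.RecordSet.
specs = {split: record_set_spec(split) for split in SPLITS}
```

Each spec would then be turned into an `mlc.RecordSet` whose fields use `mlc.Source(file_set=spec["source"], extract=mlc.Extract(column=...))`, in the same way as the single record set above. A caller could then pick the split by asking for the record set named `f"{split}_images_and_bbox"`. Whether Croissant offers a more direct way to express splits is not settled by this sketch.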