Getting Started

Build high-quality datasets with VinLab's comprehensive labeling solutions.

What is VinLab?

VinLab is a web-based platform for medical image annotation. It was developed to remove the ground-truth barrier that AI teams face when building meaningful medical AI applications. VinLab provides a high-level web interface equipped with advanced annotation tools and project management features.

Basic labeling workflow

Start and finish a labeling project with VinLab by following these steps:

  1. Install VinLab

  2. Create accounts for VinLab

  3. Set up the labeling project

  4. Set up the labeling group

  5. Import data to the project

  6. Assign tasks to assignees

  7. Label the tasks

  8. Review tasks

  9. Export annotations

Glossary

The following table describes some terms you might encounter as you use VinLab:

Term

Description

DICOM

Digital Imaging and Communications in Medicine (DICOM) is the standard for the communication and management of medical imaging information and related data. DICOM is most commonly used for storing and transmitting medical images, enabling the integration of medical imaging devices such as scanners, servers, workstations, printers, network hardware, and PACS (picture archiving and communication systems) from multiple manufacturers.

DICOM files

A DICOM file contains a file header, a File Meta Information portion, and a single SOP instance. The header is made up of a 128-byte preamble followed by the characters DICM, all uppercase. If the preamble is not used, it must contain all zeroes (some applications use it for proprietary data).

Following the header is the File Meta Information. This portion follows a tagged file format, and contains information about the file, the series and study it belongs to, and the patient that it belongs to. This information is frequently parsed and used as indexing data by PACS and archive systems.

(Reference: A Very Basic DICOM Introduction)
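The header layout described above can be checked with a few lines of Python. This is a minimal sketch using only the standard library, not part of VinLab; the function names `has_dicom_header` and `preamble_is_unused` are hypothetical, and the sample bytes are synthetic rather than a real DICOM file.

```python
import io

PREAMBLE_LENGTH = 128   # bytes reserved before the magic characters
MAGIC = b"DICM"         # four uppercase characters that follow the preamble


def has_dicom_header(stream) -> bool:
    """Return True if the stream starts with a 128-byte preamble followed by DICM."""
    header = stream.read(PREAMBLE_LENGTH + len(MAGIC))
    if len(header) < PREAMBLE_LENGTH + len(MAGIC):
        return False  # too short to hold a full header
    return header[PREAMBLE_LENGTH:] == MAGIC


def preamble_is_unused(stream) -> bool:
    """Return True if the 128-byte preamble contains only zero bytes."""
    preamble = stream.read(PREAMBLE_LENGTH)
    return preamble == bytes(PREAMBLE_LENGTH)


# Synthetic in-memory example, not a real DICOM file.
fake_file = bytes(PREAMBLE_LENGTH) + MAGIC + b"...rest of file..."
print(has_dicom_header(io.BytesIO(fake_file)))
print(preamble_is_unused(io.BytesIO(fake_file)))
```

A check like this only confirms the header. It does not validate the File Meta Information or the SOP instance that follow.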

Study, Series, Image

In the DICOM model, a patient can have 1..n studies (sometimes referred to as exams or procedures). Each study consists of 1..n series. A series generally equates to a specific type (modality) of data, or the position of a patient on the acquisition device. Each series contains 1..n DICOM object instances (most commonly images, but also reports, waveform objects, etc.).
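To make the study, series, and instance hierarchy concrete, the sketch below groups a flat list of instance records into a nested structure. It assumes each record is a plain dictionary carrying the standard DICOM attributes StudyInstanceUID, SeriesInstanceUID, and SOPInstanceUID; the function name `group_instances` and the UID values are made up for illustration.

```python
from collections import defaultdict


def group_instances(instances):
    """Nest instance UIDs under their series, and series under their study."""
    studies = defaultdict(lambda: defaultdict(list))
    for inst in instances:
        study_uid = inst["StudyInstanceUID"]
        series_uid = inst["SeriesInstanceUID"]
        studies[study_uid][series_uid].append(inst["SOPInstanceUID"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {study: dict(series) for study, series in studies.items()}


# Illustrative records with placeholder UIDs (real UIDs are dotted numeric strings).
instances = [
    {"StudyInstanceUID": "study-1", "SeriesInstanceUID": "series-a", "SOPInstanceUID": "img-1"},
    {"StudyInstanceUID": "study-1", "SeriesInstanceUID": "series-a", "SOPInstanceUID": "img-2"},
    {"StudyInstanceUID": "study-1", "SeriesInstanceUID": "series-b", "SOPInstanceUID": "img-3"},
    {"StudyInstanceUID": "study-2", "SeriesInstanceUID": "series-c", "SOPInstanceUID": "img-4"},
]

tree = group_instances(instances)
for study_uid, series in tree.items():
    print(study_uid, {uid: len(images) for uid, images in series.items()})
```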

Project

A project is the workspace where you manage all the data, annotation, and labeling processes that serve a specific AI project.

Data

When you upload data (DICOM files), the files are organized by study, the same unit a PACS uses. VinLab calls each study-level entry Data.

Task

When you assign a project member to a Data row (a study), a new Task is created. Each Task has exactly one assignee, and a single study can have many Tasks belonging to different workers.
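The relationship between a study and its Tasks can be sketched as a small data model. This is an illustration of the rule described above, not VinLab's actual implementation; the `Task` class, the `assign_study` function, and the identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One unit of labeling work: a single study for a single assignee."""
    study_id: str
    assignee: str


def assign_study(study_id, assignees):
    """Create one Task per assignee, so each Task has exactly one assignee."""
    return [Task(study_id=study_id, assignee=name) for name in assignees]


# Hypothetical study and user names.
tasks = assign_study("study-1", ["annotator_a", "annotator_b"])
for task in tasks:
    print(task)
```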

Label

A label is one of the possible classes (for example, COVID-19, or whether an X-ray contains a tumor) that your machine learning algorithm will predict.

Through data labeling, the raw data are identified and labels are added to give the data more meaningful information so that a machine learning model can learn from it.

Label group

A label group is a structured collection of your object labels. It lets users reuse all or part of a set of label classes and their attributes across different projects.

Annotation

Annotation is the technique through which we label data to make objects recognizable by machines.

In this documentation, we use annotation and label interchangeably. Annotation can mean either the technique (labeling) or its result (the labeled data).

Related task

A task that is associated with the same study as another task.
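Finding related tasks then amounts to filtering by study. The sketch below assumes a hypothetical task record with a `task_id`, a `study_id`, and an `assignee`; none of these names come from VinLab itself.

```python
from typing import NamedTuple


class TaskRecord(NamedTuple):
    task_id: int
    study_id: str
    assignee: str


def related_tasks(task, all_tasks):
    """Return the other tasks that share the given task's study."""
    return [
        other for other in all_tasks
        if other.study_id == task.study_id and other.task_id != task.task_id
    ]


# Hypothetical tasks: two on the same study, one on a different study.
all_tasks = [
    TaskRecord(1, "study-1", "annotator_a"),
    TaskRecord(2, "study-1", "annotator_b"),
    TaskRecord(3, "study-2", "annotator_a"),
]

print(related_tasks(all_tasks[0], all_tasks))
```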

Storage

Each storage represents a path to the user's data storage on a specific cloud provider. Currently, Amazon S3 (AWS) is the main supported provider. Support for other providers will come soon.

Each storage has the following information: Name, Bucket, Data path, Region, Access key, and Secret key.
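As an illustration, the six fields could be held in a small record like the one below. This is a sketch under assumptions: the `StorageConfig` class, its field names, and the `uri` helper are hypothetical, and every value shown is a placeholder. The secret key is excluded from the printed representation so it does not leak into logs.

```python
from dataclasses import dataclass, field


@dataclass
class StorageConfig:
    """The six pieces of information that describe one storage."""
    name: str
    bucket: str
    data_path: str
    region: str
    access_key: str
    secret_key: str = field(repr=False)  # keep the secret out of repr() output

    def uri(self) -> str:
        """Build an s3:// style location from the bucket and data path."""
        return f"s3://{self.bucket}/{self.data_path.strip('/')}"


# Placeholder values only; substitute your own bucket and credentials.
storage = StorageConfig(
    name="chest-xray-storage",
    bucket="my-bucket",
    data_path="/studies/chest-xray/",
    region="us-east-1",
    access_key="EXAMPLE_ACCESS_KEY",
    secret_key="EXAMPLE_SECRET_KEY",
)

print(storage.uri())
print(storage)
```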

Dataset

A project can have many datasets. Each Dataset corresponds to a connection to a Storage. When a Storage is reconnected in a Project, a new Dataset is created.

Sync dataset

Imports data from Storage that is not already in the Project's Dataset.
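Conceptually, syncing is a set difference: only items present in Storage but missing from the Dataset are imported. The sketch below models both sides as sets of object keys. The `sync_dataset` function and the key names are hypothetical, and how VinLab tracks imported data internally is not specified here.

```python
def sync_dataset(dataset_keys, storage_keys):
    """Add keys found in storage but not yet in the dataset; return the new keys."""
    new_keys = set(storage_keys) - dataset_keys
    dataset_keys.update(new_keys)  # the dataset now includes the imported items
    return new_keys


# Hypothetical object keys.
dataset = {"study-1.dcm", "study-2.dcm"}
storage = {"study-1.dcm", "study-2.dcm", "study-3.dcm"}

first_sync = sync_dataset(dataset, storage)
second_sync = sync_dataset(dataset, storage)
print(first_sync)
print(second_sync)
```

Running the sync a second time imports nothing, because everything in Storage is already in the Dataset.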

Abbreviations

PACS

Picture archiving and communication system

MPR

Multiplanar reconstruction

MRI

Magnetic resonance imaging

CT

Computed tomography

CSV

A CSV (comma-separated values) file is a text file with a specific format that allows data to be saved in a table-structured layout.
