Getting Started
Build high-quality datasets with VinLab's comprehensive labeling solutions
What is VinLab?
VinLab is a web-based platform for medical image annotation. It was developed to remove the ground-truth barrier that AI teams face when building meaningful medical AI applications. VinLab provides a high-level web interface equipped with advanced annotation tools and project management features.
Basic labeling workflow
Start and finish a labeling project with VinLab by following these steps:
Install VinLab
Create accounts for VinLab
Set up the labeling project
Set up the labeling group
Import data to the project
Assign tasks to assignees
Label the tasks
Review the tasks
Export annotations
Glossary
The following table describes some terms you might encounter as you use VinLab:
Term
Description
DICOM
Digital Imaging and Communications in Medicine (DICOM) is the standard for the communication and management of medical imaging information and related data. DICOM is most commonly used for storing and transmitting medical images, which enables the integration of medical imaging devices such as scanners, servers, workstations, printers, network hardware, and PACS (picture archiving and communication systems) from multiple manufacturers.
DICOM files
DICOM files contain a file header portion, a File Meta Information portion, and a single SOP instance. The header is made up of a 128-byte preamble followed by the characters DICM, all uppercase. The preamble must contain all zeroes if it is not used (some applications use it for proprietary data).
Following the header is the File Meta Information. This portion follows a tagged file format, and contains information about the file, the series and study it belongs to, and the patient that it belongs to. This information is frequently parsed and used as indexing data by PACS and archive systems.
(Reference: A Very Basic DICOM Introduction)
Study, Series, Image
In the DICOM model, a patient can have 1..n studies (sometimes referred to as exams or procedures). Each study consists of 1..n series. A series generally equates to a specific type (modality) of data, or the position of a patient on the acquisition device. Each series contains 1..n DICOM object instances (most commonly images, but also reports, waveform objects, etc.).
Project
A project is the workspace where you manage all the data, annotation, and labeling processes that serve a specific AI project.
Data
When you upload data (DICOM files), the files are organized by Study, the same unit a PACS uses. Each such unit is called Data.
Task
Assigning a project member to a Data row (a study) creates a new Task. Each Task is assigned to exactly one assignee, and a study can have many Tasks assigned to different workers.
Label
A label is a class (e.g. COVID-19, or whether an X-ray contains a tumor) that your machine learning algorithm will predict.
Data labeling identifies the raw data and adds labels that give it more meaning, so that a machine learning model can learn from it.
Label group
A label group is a structured collection of your object labels. It lets users reuse some or all of its label classes and their attributes across different projects.
Annotation
Annotation is the technique through which we label data to make objects recognizable by machines.
In this documentation, annotation and label are used interchangeably. Depending on context, annotation can mean the technique (labeling) or its result (the labeled data).
Related task
A task that is associated with the same study as another task.
Storage
Each storage represents a path to the user's data storage on a specific cloud provider. Currently, we use AWS S3 as the main cloud provider. Other providers will come soon.
Each storage has the following information: Name, Bucket, Data path, Region, Access key, and Secret key.
Dataset
A project can have many datasets. Each Dataset corresponds to a connection to a Storage. Reconnecting a Storage in a Project creates a new Dataset.
Sync dataset
Imports data from Storage that is not already in the Project’s Dataset.
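The sync behaviour described in the glossary can be pictured as a set difference between what is in Storage and what is already in the Dataset. The sketch below is only an illustration of that idea, not VinLab's actual implementation: the function name find_unsynced and the example object paths are hypothetical, and it assumes items can be compared by their storage path.

```python
def find_unsynced(storage_keys, dataset_keys):
    """Return the storage items that are not yet in the dataset, keeping storage order.

    Hypothetical helper for illustration; VinLab's real sync logic may differ.
    """
    existing = set(dataset_keys)  # set membership checks are fast
    return [key for key in storage_keys if key not in existing]


# Hypothetical object paths in a Storage and in a Project's Dataset
storage = ["study-001/img1.dcm", "study-001/img2.dcm", "study-002/img1.dcm"]
dataset = ["study-001/img1.dcm"]

new_items = find_unsynced(storage, dataset)
print(new_items)
```

Only the two items missing from the Dataset would be imported in this example; running the sync again with nothing new in Storage would import nothing.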
Abbreviations
PACS
Picture archiving and communication system
MPR
Multiplanar reconstruction
MRI
Magnetic resonance imaging
CT
Computed tomography
CSV
Comma-separated values: a text file format that stores data in a tabular structure.
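The "DICOM files" glossary entry describes a header made of a 128-byte preamble followed by the characters DICM. As a minimal sketch of how that layout can be checked in Python, the function below reads the first 132 bytes of a binary stream and looks for the marker. The function name has_dicom_header is an assumption made for this example and is not part of VinLab; the check only inspects the header and does not validate the rest of the file.

```python
import io


def has_dicom_header(stream):
    """Return True if the stream starts with a 128-byte preamble followed by b"DICM"."""
    header = stream.read(132)  # 128-byte preamble + 4-byte marker
    return len(header) == 132 and header[128:132] == b"DICM"


# In-memory example: an unused (all-zero) preamble followed by the DICM marker
fake_file = io.BytesIO(b"\x00" * 128 + b"DICM" + b"rest of file")
print(has_dicom_header(fake_file))  # prints True
```

For a file on disk, the same function can be called with a file opened in binary mode, for example open(path, "rb").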