OVIC@NIPS description

THIS IS AN EXTERNAL DOC, ANY PARTNER CAN ACCESS

v2: 11/09/2018

ovic@google.com

Description

Building on the success of LPIRC at CVPR 2018, OVIC is hosting a winter installment with more categories and tasks. We are targeting NIPS 2018 to announce the winners for the three categories below:

  1. Real-time image classification. This is the original OVIC task, focusing on ImageNet classification models operating at 30 ms / image.
  2. (new) Interactive image classification. Similar to the category above, but the latency budget is extended to 100 ms / image.
  3. (new) Interactive object detection. This newly introduced category focuses on COCO detection models at 100 ms / image.

A participant / team can submit to and win prizes in multiple categories. Each submission is a single model in TensorFlow Lite format.

(new) Participants are also encouraged to contribute to the TensorFlow Lite codebase to support or speed up their models. Final scores will be computed using a stable build after submission closes[1].

Image Classification Challenges

Categories 1) and 2) are based on ImageNet classification. Training data are available at the ILSVRC 2012 website. Participants are encouraged to check out this tutorial for training quantized MobileNet models.

Submission

The models must expect input tensors with dimensions [1 x input_height x input_width x 3], where the first dimension is the batch size, the last dimension is the channel count, and input_height and input_width are the integer height and width expected by the model; each must be between 1 and 1000. The output must be a [1 x 1001] tensor encoding the probabilities of the classes, with the first value corresponding to the “background” class. The full list of labels is here.
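As an illustration of this input/output contract, the sketch below loads an already-converted model (conversion is described next) with the TFLite Python interpreter and checks the expected shapes. This is not part of the official SDK; the file name model.tflite is a placeholder, and depending on the TensorFlow version the interpreter may live under tf.lite or tf.contrib.lite.

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

(inp,) = interpreter.get_input_details()
(out,) = interpreter.get_output_details()

# Expect a [1 x input_height x input_width x 3] UINT8 input ...
assert inp["dtype"] == np.uint8
assert inp["shape"][0] == 1 and inp["shape"][3] == 3
assert 1 <= inp["shape"][1] <= 1000 and 1 <= inp["shape"][2] <= 1000

# ... and a [1 x 1001] probability output (index 0 is the "background" class).
assert list(out["shape"]) == [1, 1001]

# Run one dummy inference to surface obvious runtime errors early.
dummy = np.random.randint(0, 256, size=inp["shape"], dtype=np.uint8)
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
print("top-1 class index:", int(np.argmax(interpreter.get_tensor(out["index"])[0])))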

Participants can convert their TensorFlow model into a submission-ready model using the following command:

bazel-bin/tensorflow/lite/toco/toco -- \
  --input_file="${local_frozen}" \
  --output_file="${toco_file}" \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --inference_type="${inference_type}" \
  --inference_input_type=QUANTIZED_UINT8 \
  --input_shape="1,${input_height},${input_width},3" \
  --input_array="${input_array}" \
  --output_array="${output_array}" \
  --mean_value="${mean_value}" \
  --std_value="${std_value}"

where local_frozen is the frozen graph definition; inference_type is either FLOAT or QUANTIZED_UINT8; input_array and output_array are the names of the input and output in the TensorFlow graph; and mean_value and std_value are the mean and standard deviation of the input image.
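For reference, roughly the same conversion can be scripted from Python with the TFLiteConverter API instead of invoking toco directly. This is a hedged sketch, not part of the official instructions: attribute names and module paths (tf.lite vs. tf.contrib.lite) vary across TensorFlow 1.x releases, and all variable values below, including the tensor names, are placeholders.

import tensorflow as tf

# Placeholders standing in for the shell variables used above.
local_frozen = "frozen_graph.pb"
input_array = "input"                                # hypothetical input tensor name
output_array = "MobilenetV2/Predictions/Reshape_1"   # hypothetical output tensor name
input_height, input_width = 224, 224
mean_value, std_value = 128.0, 128.0

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    local_frozen,
    input_arrays=[input_array],
    output_arrays=[output_array],
    input_shapes={input_array: [1, input_height, input_width, 3]})
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8  # or tf.lite.constants.FLOAT
converter.inference_input_type = tf.lite.constants.QUANTIZED_UINT8
# (mean, std) pair used to dequantize UINT8 inputs, mirroring --mean_value / --std_value.
converter.quantized_input_stats = {input_array: (mean_value, std_value)}

with open("model.tflite", "wb") as f:
    f.write(converter.convert())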

Note that the input type is always QUANTIZED_UINT8, specifically RGB images with pixel values between 0 and 255. For floating-point models, this requirement implies that a Dequantize op will be automatically inserted at the beginning of the graph to convert the UINT8 input to floating point by subtracting mean_value and dividing by std_value.
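For example (illustration only; mean_value = std_value = 128 is just one common choice for models trained on inputs scaled to roughly [-1, 1], and your own values must match your training preprocessing):

import numpy as np

mean_value, std_value = 128.0, 128.0
uint8_pixels = np.array([0, 128, 255], dtype=np.uint8)

# What a floating-point model sees after the automatically inserted Dequantize op.
float_inputs = (uint8_pixels.astype(np.float32) - mean_value) / std_value
print(float_inputs)  # [-1.0, 0.0, ~0.992]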

Evaluation

Submissions are evaluated on classification accuracy versus latency, focusing on the real-time and interactive regimes (defined below), running on Google’s Pixel 2 phone.

  1. Each submission is evaluated using a single thread with a batch-size of 1 on a single big core of the Pixel 2 phone.
  2. Each submission will be evaluated on both the ImageNet validation set and a held-out test set freshly collected for the competition.
  3. (new) A submission will be mapped to one of two latency buckets: 24-36 ms (inclusive) for the real-time category, and 80-120 ms (inclusive) for the interactive category. The test metric is the accuracy improvement over the empirical Pareto frontier, established by state-of-the-art models from the MobileNet family (including quantized MobileNet V2) and submissions from the previous OVIC competition.
    Specifically, a formula
    a(t) = k log(t) + a0
    will be fitted to models on the Pareto frontier to correlate model accuracy a(t) with latency t. The test metric M of a model with latency T and accuracy A is:
    M(A, T) = A - a(T)
    For models that do not fit into these two buckets: if T < 24 ms, M(A, T) = M(A, 24 ms); if 36 ms < T < 80 ms, M(A, T) = M(A, 80 ms); if T > 120 ms, the submission is invalid. (A sketch of this computation is given at the end of this subsection.)

The figure above illustrates the Pareto frontiers estimated from baseline models (shown as dots) in the two latency buckets, and how the test metric is computed for a submission (star) as its offset from the estimated frontier. Illustration only; no real data points are used.

  4. (new) The participant only needs to submit the model once; it will automatically be assigned to the right category based on its latency.
  5. The LPIRC site has a leaderboard that shows the test metric on the ImageNet validation set. The winners will be determined by their accuracy on a held-out test set. Note that discrepancies are expected between these two accuracy scores.
  6. An SDK is provided to aid development. Check out the OVIC instruction page in TensorFlow. The SDK contains:
    a. A Java validator to catch runtime errors of the model. Submissions failing the validator will not be scored.
    b. Sample TFLite models (see full instructions here).
    c. A Java test to debug model performance.
    d. An Android benchmarker app to time a submission on any Android phone.

Items a) and b) allow participants to debug runtime errors. Submissions must pass the validator and the test, and must be compatible with the benchmarker app in order to be scored.

Item d) allows participants to measure the latency of their submissions on their own phones. Note that latency obtained via d) may differ from the latency reported by the competition’s server due to language differences, device specs, evaluation settings, etc. In all cases the latency reported by the competition’s server will be used.
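To make the scoring rule in point 3 concrete, below is a minimal sketch of the test-metric computation. The constants k and a0 are placeholders rather than the organizers' fitted values (which may also be fitted separately per bucket), and the submission numbers are made up.

import math

def pareto_accuracy(t_ms, k, a0):
    """Fitted frontier a(t) = k * log(t) + a0."""
    return k * math.log(t_ms) + a0

def test_metric(accuracy, latency_ms, k, a0):
    """M(A, T) = A - a(T), with the bucket rules for out-of-range latencies."""
    if latency_ms > 120.0:
        raise ValueError("latency above 120 ms: submission is invalid")
    if latency_ms < 24.0:
        latency_ms = 24.0     # scored as M(A, 24 ms)
    elif 36.0 < latency_ms < 80.0:
        latency_ms = 80.0     # scored as M(A, 80 ms)
    return accuracy - pareto_accuracy(latency_ms, k, a0)

# Hypothetical submission: 70% top-1 accuracy at 30 ms, against a placeholder frontier.
print(test_metric(accuracy=0.70, latency_ms=30.0, k=0.05, a0=0.55))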

Object Detection Challenge

Category 3) is based on COCO object detection. Training data are available from the COCO website. Participants are encouraged to check out TensorFlow’s Object Detection API tutorial for training detection models.

Submission

For category 3), the instructions for the inputs are the same: submissions should expect input tensors with dimensions [1 x input_height x input_width x 3], where the first dimension is the batch size, the last dimension is the channel count, and input_height and input_width are the integer height and width expected by the model; each must be between 1 and 1000. Inputs should contain RGB values between 0 and 255.
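A hedged example of preparing such an input tensor (assuming the Pillow library is available; the file name and target size are placeholders):

import numpy as np
from PIL import Image

input_height, input_width = 320, 320  # whatever your model expects
img = Image.open("example.jpg").convert("RGB").resize((input_width, input_height))

# [1 x input_height x input_width x 3] UINT8 tensor with RGB values in [0, 255].
input_tensor = np.asarray(img, dtype=np.uint8)[np.newaxis, ...]
assert input_tensor.shape == (1, input_height, input_width, 3)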

The output should contain four tensors (a sketch of reading them back from a converted model is given at the end of this subsection):

  1. Output locations of size [1 x 100 x 4], representing the coordinates of 100 detection boxes. Each box is represented by [start_y, start_x, end_y, end_x], where 0 <= start_x <= end_x <= 1 and 0 <= start_y <= end_y <= 1. The x’s correspond to the width dimension and the y’s to the height dimension.
  2. Output classes of size [1 x 100], representing the class indices of the 100 boxes. The index starts from 0.
  3. Output scores of size [1 x 100], representing the class probabilities of the 100 boxes.
  4. Number of detections: a scalar giving the number of detections, which must be 100.

The recommended way to produce these tensors is to use TensorFlow’s Object Detection API. Let config_path point to the TrainEvalPipelineConfig used to create and train the model, and checkpoint_path point to the checkpoint of the model. Participants can create a frozen TensorFlow model in directory output_dir using the following command:

bazel-bin/tensorflow_models/object_detection/export_tflite_ssd_graph \
  --pipeline_config_path="${config_path}" \
  --output_directory="${output_dir}" \
  --trained_checkpoint_prefix="${checkpoint_path}" \
  --max_detections=100 \
  --add_postprocessing_op=true \
  --use_regular_nms=${use_regular_nms}

where use_regular_nms is a binary flag that controls whether regular non-max suppression is used; the alternative is a faster but less accurate non-max suppression implementation.

Participants can convert their TensorFlow model into a submission-ready model using the following command:

bazel-bin/tensorflow/lite/toco/toco \
  --input_file="${local_frozen}" \
  --output_file="${toco_file}" \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --inference_type="${inference_type}" \
  --inference_input_type=QUANTIZED_UINT8 \
  --input_shapes="1,${input_height},${input_width},3" \
  --input_arrays="${input_array}" \
  --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
  --change_concat_input_ranges=false \
  --allow_custom_ops \
  --mean_values="${mean_value}" \
  --std_values="${std_value}"

where local_frozen is the frozen graph definition; inference_type is either FLOAT or QUANTIZED_UINT8; input_array is the name of the input in the TensorFlow graph (the output arrays are the four TFLite_Detection_PostProcess tensors shown in the command above); and mean_value and std_value are the mean and standard deviation of the input image.

Images will be resized to input_height x input_width; it is up to the participant to pick dimensions that are neither so small that they hurt accuracy nor so large that they hurt model run-time.

Note that the input type is always QUANTIZED_UINT8, specifically RGB images with pixel values between 0 and 255. For floating-point models, this requirement implies that a Dequantize op will be automatically inserted at the beginning of the graph to convert the UINT8 input to floating point by subtracting mean_value and dividing by std_value.
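Once converted, the four output tensors described earlier can be read back with the TFLite Python interpreter. The sketch below is illustrative only: the output ordering (boxes, classes, scores, count) is assumed to match the converted postprocessing op and should be verified on your own model, and detect.tflite is a placeholder.

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()

height, width = int(inp["shape"][1]), int(inp["shape"][2])
dummy = np.random.randint(0, 256, size=inp["shape"], dtype=np.uint8)
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()

boxes = interpreter.get_tensor(outs[0]["index"])[0]    # [100 x 4], normalized coordinates
classes = interpreter.get_tensor(outs[1]["index"])[0]  # [100], 0-based class indices
scores = interpreter.get_tensor(outs[2]["index"])[0]   # [100], class probabilities
count = interpreter.get_tensor(outs[3]["index"])       # number of detections (should be 100)

# Convert the first box from normalized [start_y, start_x, end_y, end_x] to pixels.
start_y, start_x, end_y, end_x = boxes[0]
print("box 0 (pixels):", start_x * width, start_y * height, end_x * width, end_y * height)
print("class:", int(classes[0]), "score:", float(scores[0]), "count:", count)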

Evaluation

Submissions are evaluated on detection mAP versus latency, focusing on the interactive regime (defined below), running on Google’s Pixel 2 phone.

  1. Each submission is evaluated using a single thread with a batch-size of 1 on a single big core of the Pixel 2 phone.
  2. Each submission will be evaluated on both the COCO minival set (detailed in point 4) and the COCO test set (2017 test-dev). The submission will run on each set for B x T ms, where B is the per-image latency budget of 100 ms and T is the number of images in the set. The test metric is defined as the COCO mAP over all images. Unprocessed images will not return any detection boxes and thus negatively impact the mAP (see the sketch at the end of this subsection).
  3. There is a public leaderboard on the LPIRC site that shows the test metric on the minival set. The winners will be determined by their model’s test metric on a held-out test set. Note that discrepancies are expected between the test metrics on these two datasets. The submission with the highest test metric wins. Ties will be broken by submission time.
  4. The image IDs of the minival dataset are found here.
    Note: make sure to use this minival set, as there are multiple minival sets in the literature. Participants are encouraged to exclude minival from training to get a less biased estimate of generalization performance, since the final model will be evaluated on the holdout set.
  5. The following source code is provided to aid participants’ development:
    a. A Java validator to catch runtime errors of the model. Submissions failing the validator will not be scored.
    b. A sample TFLite object detection model (see full instructions here).
    c. A Java test to debug model performance.
    d. An Android benchmarker app to time a submission on any Android phone.

Item d) allows participants to measure the latency of their submissions on their own phones. Note that latency obtained via d) may differ from the latency reported by the competition’s server due to language differences, device specs, evaluation settings, etc. In all cases the latency reported by the competition’s server will be used.
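As a concrete illustration of the time budget in point 2 above (the set size below is a placeholder, not the actual minival or test-dev size):

num_images = 8000            # hypothetical number of images T in the set
budget_per_image_ms = 100    # B, the interactive per-image budget

total_budget_s = num_images * budget_per_image_ms / 1000.0
print("total wall-clock budget: %.0f s" % total_budget_s)  # 800 s in this example

# A model averaging 125 ms/image finishes only part of the set within the budget;
# the remaining images return no detection boxes and drag down the overall mAP.
avg_latency_ms = 125.0
processed = min(num_images, int(total_budget_s * 1000.0 / avg_latency_ms))
print("images processed within budget:", processed)  # 6400 of 8000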

Disclaimers:

All submissions, along with the empirical Pareto frontier, will be re-computed after submission closes using the same codebase version. Regressions or improvements may happen as a result of versioning differences between the time of submission and the time of evaluation. In case of a significant regression, the organizers may consider using the better of the two measurements.

Participants should submit their own work and develop innovative solutions. Please do not submit released TFLite models or solutions from the previous OVIC competition; such submissions may be disqualified.

Latency measurements of all submissions, reference models and the empirical Pareto frontier will be recomputed using the codebase as of Nov 30, 2018. Regressions or improvements may happen as a result of versioning differences.

Prizes

The first and second place teams for each category will be awarded $1,500 and $500, respectively.

FAQs

Who can participate?

A participant must be at least 13 years old, must not be a citizen of a US-embargoed country, and must not be affiliated with the organizers or sponsors (Purdue University, Duke University, the University of North Carolina at Chapel Hill, Facebook, or Alphabet Inc.).

Timeline

Registration open: Oct 15, 2018

Submission open: Nov 1, 2018

Submission closed: Nov 30, 2018

Winners announced: Dec 5, 2018


[1]  In case of significant latency regression in the final build, the latency measured at the time of submission may be considered at the discretion of the organizing committee.