3D Model Construction for a target class using XAI


Team

Table of Contents

  1. Introduction
  2. Image Classification Model
  3. SHAP (SHapley Additive exPlanations)
  4. 3D reconstruction using Point E
  5. Links

Introduction

The aim of this project is to construct a general 3D skeleton for a target object class. Using XAI, we identify the unique, real-world features that a trained model relies on to recognize the target object, and then reconstruct those features in 3D.

Problem Domain

Project scope

Here we aim to uncover the model's underlying decision-making criteria and the features it relies on using explainable methods, and to reconstruct them in 3D for a better understanding.


Image Classification Model

As the image classification model, we use MobileNet through TensorFlow's Keras API. MobileNet is a deep learning architecture designed for efficient image classification on mobile and embedded devices. It was developed by Google and is widely used due to its small size and fast inference speed.

The MobileNet architecture employs depthwise separable convolutions, which significantly reduce the number of parameters and computations required compared to traditional convolutional neural networks (CNNs). This reduction allows MobileNet to strike a good balance between accuracy and model size.

Advantages

  - Small model size with a low memory footprint, suited to mobile and embedded devices
  - Fast inference speed
  - Depthwise separable convolutions cut parameters and computation compared to standard CNNs
  - A good trade-off between accuracy and model size
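As a quick illustration, here is a minimal sketch of loading a pre-trained MobileNet through the Keras API and classifying a single image. The file name `car.jpg` is a placeholder for any input image.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import mobilenet

# Load MobileNet with weights pre-trained on ImageNet
model = mobilenet.MobileNet(weights='imagenet')

# Load and preprocess one image ('car.jpg' is a placeholder path)
img = tf.keras.preprocessing.image.load_img('car.jpg', target_size=(224, 224))
x = tf.keras.preprocessing.image.img_to_array(img)
x = mobilenet.preprocess_input(np.expand_dims(x, axis=0))

# Predict and print the top-3 ImageNet classes
preds = model.predict(x)
print(mobilenet.decode_predictions(preds, top=3)[0])
```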

SHAP (SHapley Additive exPlanations)

SHAP (SHapley Additive exPlanations) is an advanced method for explaining the predictions of machine learning models. It is based on cooperative game theory and aims to assign an importance value to each input feature or variable in a model, indicating how much that feature contributes to the model’s prediction for a particular instance. It aids in understanding the factors driving the model’s decisions, enhances interpretability, and enables trust in the model’s outputs, making it valuable in real-world applications.

Key features

  - Grounded in cooperative game theory (Shapley values)
  - Assigns an importance value to each input feature for an individual prediction
  - Model-agnostic: applicable to any classifier
  - Improves interpretability and builds trust in model outputs
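The sketch below shows one way to apply SHAP's image masker to the classifier above. It assumes `model` and `mobilenet` from the previous snippet, plus a float image batch `X` of shape (n, 224, 224, 3) and a `class_names` list; those two names are our placeholders, not part of the SHAP API.

```python
import numpy as np
import shap

def predict_fn(images):
    # SHAP passes masked copies of the input; preprocess, then predict
    return model.predict(mobilenet.preprocess_input(images.copy()))

# The inpainting masker hides image regions so SHAP can score each
# region's contribution to the predicted class
masker = shap.maskers.Image("inpaint_telea", X[0].shape)
explainer = shap.Explainer(predict_fn, masker, output_names=class_names)

# Explain the first image; higher max_evals gives finer attributions.
# `outputs=...` keeps only the four highest-scoring classes.
shap_values = explainer(X[:1], max_evals=500, batch_size=50,
                        outputs=shap.Explanation.argsort.flip[:4])
shap.image_plot(shap_values)
```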

3D reconstruction using Point E

Typical 3D reconstruction models take a long time to produce a 3D model; state-of-the-art methods often require multiple GPU-hours to generate a single sample.

Here we use the Point·E method, introduced in the research paper "Point·E: A System for Generating 3D Point Clouds from Complex Prompts" by Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen. The paper proposes an alternative method for 3D object generation that produces 3D models in only 1-2 minutes on a single GPU.

Their method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image.

Compared with state-of-the-art methods, Point·E falls short in sample quality but is much faster at sampling. Since our application needs to sample thousands of images, sampling speed matters more than quality, so the trade-off is acceptable.
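As a rough sketch of how we invoke Point·E, the snippet below follows the image-to-point-cloud example from the openai/point-e repository: load the image-conditioned base model and the upsampler, then sample a point cloud conditioned on a single 2D view (`car.png` is a placeholder file name).

```python
import torch
from PIL import Image
from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Image-conditioned base model (40M parameters) plus the upsampler
base_model = model_from_config(MODEL_CONFIGS['base40M'], device)
base_model.eval()
base_model.load_state_dict(load_checkpoint('base40M', device))

upsampler = model_from_config(MODEL_CONFIGS['upsample'], device)
upsampler.eval()
upsampler.load_state_dict(load_checkpoint('upsample', device))

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler],
    diffusions=[diffusion_from_config(DIFFUSION_CONFIGS['base40M']),
                diffusion_from_config(DIFFUSION_CONFIGS['upsample'])],
    num_points=[1024, 4096 - 1024],  # coarse cloud, then upsample to 4096 points
    aux_channels=['R', 'G', 'B'],
    guidance_scale=[3.0, 3.0],
)

# Condition the diffusion on a single 2D view of the object
img = Image.open('car.png')
samples = None
for x in sampler.sample_batch_progressive(batch_size=1,
                                          model_kwargs=dict(images=[img])):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]  # final point cloud
```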

Image classification results

Result: the model correctly predicts the image as a car.

SHAP output


3D reconstruction results


Process & Progression

Machine Learning Section

2D to 3D Reconstruction Section

Software Engineering Section

Testing
