The rapid growth of Generative AI has captured the attention of organizations and researchers alike, with its potential to create novel and original content. Large Language Models (LLMs) have made a wide range of language tasks easy to automate, and OpenAI's DALL-E, a text-to-image model that creates realistic images from textual prompts, has already amassed over a million users. OpenAI has now expanded its portfolio with the release of Shap-E, a conditional generative model for 3D assets.
Shap-E stands apart from traditional models that generate a single output representation. Instead, it produces the parameters of implicit functions, which can be rendered either as textured meshes or as neural radiance fields (NeRFs), enabling versatile and realistic 3D asset generation.
Researchers first trained an encoder that takes 3D assets as input and deterministically maps each one to the parameters of an implicit function, giving the model a compact latent representation of the asset. A conditional diffusion model was then trained on the encoder's outputs: it learns the conditional distribution of implicit-function parameters given the conditioning input (such as a text description) and generates diverse, complex 3D assets by sampling from that distribution. The model was trained on a large dataset of 3D assets paired with corresponding textual descriptions.
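To make the two-stage recipe concrete, here is a minimal, hypothetical sketch in PyTorch. The module names, dimensions, and simple linear noising rule are illustrative assumptions for exposition only, not the actual Shap-E implementation (the paper uses transformer-based models for both stages).

```python
# Hypothetical sketch of the two-stage recipe described above.
# All names, shapes, and the noising rule are illustrative assumptions.
import torch
import torch.nn as nn

LATENT_DIM = 1024  # assumed size of the implicit-function parameter vector

class AssetEncoder(nn.Module):
    """Stage 1: map a 3D asset (here, a colored point cloud) to a latent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(6, 256), nn.ReLU(),
                                 nn.Linear(256, LATENT_DIM))

    def forward(self, points):                # (B, N, 6): xyz + rgb
        return self.net(points).mean(dim=1)   # pool to (B, LATENT_DIM)

class LatentDenoiser(nn.Module):
    """Stage 2: predict the noise added to a latent, given text features."""
    def __init__(self, text_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM + text_dim + 1, 2048),
                                 nn.ReLU(), nn.Linear(2048, LATENT_DIM))

    def forward(self, noisy_latent, t, text_emb):
        return self.net(torch.cat([noisy_latent, text_emb, t[:, None]], dim=-1))

encoder, denoiser = AssetEncoder(), LatentDenoiser()
points = torch.randn(2, 4096, 6)    # dummy batch of colored point clouds
text_emb = torch.randn(2, 512)      # dummy text embeddings (e.g. from CLIP)

with torch.no_grad():               # stage 1 is assumed already trained
    z0 = encoder(points)

t = torch.rand(2)                   # random diffusion timesteps in [0, 1]
noise = torch.randn_like(z0)
zt = (1 - t)[:, None] * z0 + t[:, None] * noise  # toy linear noising rule
loss = ((denoiser(zt, t, text_emb) - noise) ** 2).mean()
loss.backward()                     # gradients flow to the denoiser only
```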
Shap-E represents 3D assets with implicit neural representations (INRs), which provide a versatile and flexible framework for capturing the detailed geometric properties of 3D assets. The two types of INRs used in Shap-E are Neural Radiance Fields (NeRF) and DMTet together with its extension GET3D.
NeRF maps coordinates and viewing directions to densities and RGB colors, enabling realistic and high-fidelity rendering from arbitrary viewpoints. DMTet and GET3D represent textured 3D meshes by mapping coordinates to colors, signed distances, and vertex offsets, allowing the construction of 3D triangle meshes in a differentiable manner.
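The two INR types can be summarized by their input/output signatures. The sketch below is for illustration only: the tiny networks and output parameterizations are assumptions, not Shap-E's actual interfaces.

```python
# Illustrative signatures for the two INR types described above.
import torch
import torch.nn as nn

class NeRFField(nn.Module):
    """NeRF-style field: coordinates + viewing direction -> density, RGB."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(6, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))  # (density, r, g, b)

    def forward(self, coords, view_dirs):
        out = self.net(torch.cat([coords, view_dirs], dim=-1))
        return out[..., :1], torch.sigmoid(out[..., 1:])

class MeshField(nn.Module):
    """DMTet/GET3D-style field: coordinates -> SDF, color, vertex offset."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 7))  # (sdf, rgb, offset)

    def forward(self, coords):
        out = self.net(coords)
        return out[..., :1], torch.sigmoid(out[..., 1:4]), out[..., 4:]

x = torch.rand(1024, 3)             # query points
d = torch.randn(1024, 3)            # viewing directions
density, rgb = NeRFField()(x, d)    # volumetric rendering path
sdf, color, offsets = MeshField()(x)  # differentiable mesh-extraction path
```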
The Shap-E model has demonstrated its ability to produce high-quality outputs in seconds. Example results include 3D assets for textual prompts such as a bowl of food, a penguin, a voxelized dog, a campfire, and an avocado-shaped chair. Compared to Point·E, an explicit generative model over point clouds, Shap-E converged faster and achieved comparable or better sample quality, despite modeling a higher-dimensional, multi-representation output space.
For random samples on selected prompts, see samples.md.
Shap-E is a promising and significant addition to the world of Generative AI, offering an efficient and effective generative model for 3D assets. Its capacity to generate versatile and realistic 3D assets is poised to revolutionize the industry, opening up new possibilities for content creators and researchers alike.
To get started, see the example notebooks in the Shap-E repository.
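For a quick look outside the notebooks, the script below follows the text-to-3D flow of the repository's sampling utilities. It assumes the shap-e package is installed, and the sampling parameters (guidance scale, Karras step count, noise range) mirror the published example but are tunable defaults rather than requirements.

```python
import torch

from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the latent decoder ("transmitter"), the text-conditional
# diffusion model, and the diffusion schedule configuration.
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

prompt = 'a penguin'
batch_size = 1

# Sample implicit-function latents conditioned on the text prompt.
latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Render each latent from a ring of cameras and save a turntable GIF.
cameras = create_pan_cameras(64, device)  # 64x64 renders; larger is slower
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode='nerf')
    images[0].save(f'sample_{i}.gif', save_all=True,
                   append_images=images[1:], duration=100, loop=0)
```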