mirshad7 committed
Commit 8d9200a · 1 Parent(s): 5f1484d

initial commit

Files changed (1):
  1. README.md +67 -1

README.md CHANGED
@@ -58,6 +58,19 @@
 - **NeRF-MAE**: The first large-scale pretraining utilizing Neural Radiance Fields (NeRF) as an input modality. We pretrain a single Transformer model on thousands of NeRFs for 3D representation learning.
 - **NeRF-MAE Dataset**: A large-scale NeRF pretraining and downstream task finetuning dataset.

+ ## 🏷️ TODO 🚀
+
+ - [x] Release large-scale pretraining code 🚀
+ - [x] Release NeRF-MAE dataset comprising radiance and density grids 🚀
+ - [x] Release 3D object detection finetuning and eval code 🚀
+ - [x] Pretrained NeRF-MAE checkpoints and out-of-the-box model usage 🚀
+
+ ## NeRF-MAE Model Architecture
+ <p align="center">
+ <img src="demo/nerf-mae_architecture.jpg" width="90%">
+ </p>
+
+
 ## Citation

 If you find this repository or our dataset useful, please star ⭐ this repository and consider citing 📝:
@@ -102,15 +115,68 @@ cd ../../../..
 
 ## ⛳ Model Usage and Checkpoints

+ - [Hugging Face repo to download pretrained and finetuned checkpoints](https://huggingface.co/mirshad7/NeRF-MAE)
+
 NeRF-MAE is structured to provide easy access to pretrained NeRF-MAE models (and reproductions) to facilitate use in various downstream tasks. This makes it easy to extract good visual features from NeRFs if you don't have the resources for large-scale pretraining. Our pretraining provides an easy-to-access embedding of any NeRF scene, which can be used for a variety of downstream tasks in a straightforward way.

- We have released 1.pretrained and 2. finetuned checkpoints to start using our codebase out-of-the-box. Below is a sample useage of our model with spelled out comments in a few lines of code:
+ We have released pretrained and finetuned checkpoints so you can use our codebase out-of-the-box. There are two typical usages: 1. the most common is using the features directly in a downstream task, such as an FPN head for 3D object detection, and 2. reconstructing the original grid to enforce losses such as a masked reconstruction loss. Below is a sample usage of our model with spelled-out comments.
+
+
+ 1. Get the features to be used in a downstream task

+ ```
+ import torch
+
+ # Define Swin Transformer configurations
+ swin_config = {
+     "swin_t": {"embed_dim": 96, "depths": [2, 2, 6, 2], "num_heads": [3, 6, 12, 24]},
+     "swin_s": {"embed_dim": 96, "depths": [2, 2, 18, 2], "num_heads": [3, 6, 12, 24]},
+     "swin_b": {"embed_dim": 128, "depths": [2, 2, 18, 2], "num_heads": [3, 6, 12, 24]},
+     "swin_l": {"embed_dim": 192, "depths": [2, 2, 18, 2], "num_heads": [6, 12, 24, 48]},
+ }
+
+ # Set the desired backbone type
+ backbone_type = "swin_s"
+ config = swin_config[backbone_type]
+
+ # Initialize Swin Transformer model
+ model = SwinTransformer_MAE3D_New(
+     patch_size=[4, 4, 4],
+     embed_dim=config["embed_dim"],
+     depths=config["depths"],
+     num_heads=config["num_heads"],
+     window_size=[4, 4, 4],
+     stochastic_depth_prob=0.1,
+     expand_dim=True,
+     resolution=resolution,
+ )
+
+ # Load checkpoint and remove unused layers
+ checkpoint = torch.load(checkpoint_path, map_location="cpu")
+ model.load_state_dict(checkpoint["state_dict"])
+ for attr in ["decoder4", "decoder3", "decoder2", "decoder1", "out", "mask_token"]:
+     delattr(model, attr)
+
+ # Extract features using the Swin Transformer backbone. input_grid has sample shape torch.randn((1, 4, 160, 160, 160))
+ features = []
+ input_grid = model.patch_partition(input_grid) + model.pos_embed.type_as(input_grid).to(input_grid.device).clone().detach()
+ for stage in model.stages:
+     input_grid = stage(input_grid)
+     features.append(torch.permute(input_grid, [0, 4, 1, 2, 3]).contiguous())  # Format: [N, C, H, W, D]
+
+ # Multi-scale features have shapes: [torch.Size([1, 96, 40, 40, 40]), torch.Size([1, 192, 20, 20, 20]), torch.Size([1, 384, 10, 10, 10]), torch.Size([1, 768, 5, 5, 5])]
+
+ # Process features through FPN
+ ```
+
+ 2. Get the Original Grid Output
 ```
 import torch
 # Load data from the specified folder and filename with the given resolution.
 res, rgbsigma = load_data(folder_name, filename, resolution=args.resolution)

+ # rgbsigma has sample shape torch.randn((1, 4, 160, 160, 160))
+
 # Build the model using provided arguments.
 model = build_model(args)
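Usage 1 above stops at the `# Process features through FPN` comment. As a minimal sketch of that step, not taken from the repository (the `SimpleFPN3D` module, its layer choices, and the 256-channel output width are illustrative assumptions), the multi-scale features can be projected to a common channel width and fused top-down:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN3D(nn.Module):
    """Hypothetical FPN-style neck that fuses NeRF-MAE multi-scale features top-down."""

    def __init__(self, in_channels=(96, 192, 384, 768), out_channels=256):
        super().__init__()
        # 1x1x1 convs project every scale to a common channel width.
        self.lateral = nn.ModuleList(nn.Conv3d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3x3 convs smooth the fused maps.
        self.output = nn.ModuleList(
            nn.Conv3d(out_channels, out_channels, kernel_size=3, padding=1) for _ in in_channels
        )

    def forward(self, features):
        # features: list of [N, C_i, H_i, W_i, D_i] tensors, finest first, coarsest last.
        laterals = [lat(f) for lat, f in zip(self.lateral, features)]
        # Top-down pathway: upsample the coarser map and add it to the next finer one.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[2:], mode="nearest"
            )
        return [out(l) for out, l in zip(self.output, laterals)]

# Dummy inputs with the swin_s feature shapes printed in the snippet above.
feats = [torch.randn(1, c, s, s, s) for c, s in zip((96, 192, 384, 768), (40, 20, 10, 5))]
pyramid = SimpleFPN3D()(feats)  # four fused maps, each with 256 channels
```

The assumed input channels (96, 192, 384, 768) and spatial sizes (40, 20, 10, 5) match the multi-scale feature shapes listed in usage 1; a detection head would consume the fused pyramid.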
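Usage 2 shows loading the grid and building the model; how the reconstructed grid is supervised is not shown in this hunk. Purely as a hedged illustration, assuming a predicted grid with the same `[N, 4, 160, 160, 160]` shape and a binary voxel mask (both placeholders here, not the repository's API), a masked reconstruction loss could be computed as:

```
import torch
import torch.nn.functional as F

# Placeholder tensors: the RGB+density input grid and a reconstruction of the same shape.
rgbsigma = torch.randn(1, 4, 160, 160, 160)    # stand-in for the grid returned by load_data
pred_grid = torch.randn(1, 4, 160, 160, 160)   # stand-in for the model's reconstructed output

# Hypothetical voxel mask (1 = masked, 0 = visible), e.g. a 75% masking ratio.
mask = (torch.rand(1, 1, 160, 160, 160) < 0.75).float()

# Masked reconstruction loss: mean squared error over masked voxels only.
per_voxel = F.mse_loss(pred_grid, rgbsigma, reduction="none")   # [1, 4, 160, 160, 160]
recon_loss = (per_voxel * mask).sum() / (mask.sum() * rgbsigma.shape[1])
```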