Official implementation of SpecGen (WACV 2026).

SpecGen predicts a spectral BRDF (Bidirectional Reflectance Distribution Function) from a single input image using a HyperNetwork architecture based on the K-Planes representation.

```
Input Image → CNN Encoder → MLP generates 6 K-Planes → Bilinear Interpolation → SKNet Feature Fusion → Decoder → Spectral BRDF
```
```
core/
├── __init__.py       # Module initialization
├── model.py          # KPlaneField model definition (core)
├── train.py          # Training script
├── inference.py      # Inference/rendering script
├── ops.py            # Activation functions and interpolation
├── coords.py         # Rusinkiewicz coordinate transformation
├── config.py         # Configuration file
├── README.md         # This file
├── data/
│   └── README.md     # Detailed data format documentation
└── renderdata/       # Sample geometry data for inference
    ├── normals.npy
    ├── mask.npy
    └── L.txt
```
A CNN encodes the input image into a compact feature representation:

```
Conv2d(3→16) → ReLU → Conv2d(16→32) → ReLU → Conv2d(32→20) → ReLU
# Output: (batch, 20, 64, 64)
```

The encoded features are passed through 6 separate MLPs to generate 6 feature planes:
| MLP | Output Shape | Corresponding Dimensions |
|---|---|---|
| mlp0 | (64, 90, 90) | θ_h × θ_d |
| mlp1 | (64, 180, 90) | φ_d × θ_h |
| mlp2 | (64, 39, 90) | λ × θ_h |
| mlp3 | (64, 180, 90) | φ_d × θ_d |
| mlp4 | (64, 39, 90) | λ × θ_d |
| mlp5 | (64, 39, 180) | λ × φ_d |
where λ represents the wavelength dimension for spectral BRDF.
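The bilinear interpolation step samples each generated plane at the continuous coordinates of a query. A minimal NumPy sketch (illustrative only, not the repo's implementation; plane shapes follow the table above):

```python
import numpy as np

def bilerp(plane, u, v):
    """Bilinearly interpolate a (C, H, W) feature plane at
    normalized coordinates u, v in [0, 1] (u indexes W, v indexes H)."""
    C, H, W = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[:, y0, x0]
            + wx * (1 - wy) * plane[:, y0, x1]
            + (1 - wx) * wy * plane[:, y1, x0]
            + wx * wy * plane[:, y1, x1])

# e.g. sample the theta_h x theta_d plane (64 channels, 90 x 90) at its center
plane = np.random.rand(64, 90, 90)
feat = bilerp(plane, 0.5, 0.5)   # (64,) feature vector for this query
```

Each BRDF query is interpolated from all 6 planes this way, producing 6 feature vectors that are fused in the next stage.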
Attention-weighted fusion of interpolated features from all 6 planes using SKNet-style selective kernel mechanism.
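The selective-kernel idea can be sketched as follows; the projection `W` and the way attention logits are formed here are assumptions for illustration, not the repo's code. Per-plane scores are softmax-normalized and used to blend the 6 interpolated feature vectors:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sk_fuse(feats, W):
    """feats: (6, C) interpolated features from the 6 planes.
    W: (6, C) illustrative per-plane attention projection (assumed).
    Returns a (C,) fused feature vector."""
    logits = (feats * W).sum(axis=1)   # one score per plane
    attn = softmax(logits)             # (6,) weights, sum to 1
    return (attn[:, None] * feats).sum(axis=0)

feats = np.random.rand(6, 64)
W = np.random.rand(6, 64)
fused = sk_fuse(feats, W)              # (64,)
```

Because the weights are a convex combination, the fused vector always stays within the span of the per-plane features.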
The decoder consists of two small MLPs:

```
sigma_net: 64 → 32 dim features
color_net: 32 → 1 (BRDF value)
```

We use the Rusinkiewicz parameterization for the BRDF, which describes the lighting geometry with three angles:
- θ_h: Half-vector elevation angle [0°, 90°]
- θ_d: Difference-vector elevation angle [0°, 90°]
- φ_d: Difference-vector azimuthal angle [0°, 180°]
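The three angles can be computed from unit light and view directions expressed in the local shading frame (normal = +z). A minimal NumPy sketch of the standard Rusinkiewicz construction (the repo's own version lives in coords.py):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def rot(axis, ang):
    """Rotation matrix about the 'y' or 'z' axis by ang radians."""
    c, s = np.cos(ang), np.sin(ang)
    if axis == "z":
        return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])  # 'y'

def rusinkiewicz_angles(l, v):
    """l, v: unit vectors toward light/viewer, normal = +z.
    Returns (theta_h, theta_d, phi_d) in radians."""
    h = normalize(l + v)
    theta_h = np.arccos(np.clip(h[2], -1.0, 1.0))
    phi_h = np.arctan2(h[1], h[0])
    # rotate the frame so h aligns with +z; l becomes the difference vector d
    d = rot("y", -theta_h) @ rot("z", -phi_h) @ l
    theta_d = np.arccos(np.clip(d[2], -1.0, 1.0))
    phi_d = np.arctan2(d[1], d[0]) % np.pi  # reciprocity folds phi_d into [0, pi)
    return theta_h, theta_d, phi_d
```

For example, in the retroreflective configuration (l == v) the half vector coincides with l, so θ_d = 0.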
Install the dependencies:

```bash
pip install torch numpy pillow scipy tqdm tensorboard
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
```

To train:

```python
from core import train_main

train_main(
    config_path="core/config.py",
    train_file="train.txt",
    val_file="val.txt",
    image_dir="./images",
    data_dir="./data",
    checkpoint_dir="./checkpoints",
    batch_size=2048,
    num_epochs=15,
)
```

To run inference:

```python
import torch

from core import load_model, render_single_image

device = torch.device("cuda")
model = load_model("checkpoints/best.pth", "core/config.py", device)
render_single_image(
    model=model,
    img_path="input.png",
    N_map_file="normals.npy",
    mask_file="mask.npy",
    L_file="light.txt",
    out_dir="./output",
    obj_name="test",
    device=device,
)
```

For detailed data format documentation, see data/README.md.
| Data Type | Format | Shape/Content |
|---|---|---|
| Input Image | PNG | 256×256 RGB sphere render |
| BRDF Data | .npy | (N, 5) - [θh, θd, φd, λ, value] |
| Normal Map | .npy | (H, W, 3) normalized normals |
| Mask | .npy | (H, W) binary mask |
| Light Dirs | .txt | one x y z direction per line |
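A minimal loader sketch for the formats above (file names and the helper itself are illustrative, not part of the repo's API):

```python
import numpy as np

def load_sample(brdf_path, normal_path, mask_path, light_path):
    """Load one sample's BRDF table, geometry maps, and light directions."""
    brdf = np.load(brdf_path)        # (N, 5): [theta_h, theta_d, phi_d, lambda, value]
    assert brdf.ndim == 2 and brdf.shape[1] == 5
    normals = np.load(normal_path)   # (H, W, 3) unit normals
    mask = np.load(mask_path)        # (H, W) binary mask
    lights = np.loadtxt(light_path)  # one "x y z" per line
    return brdf, normals, mask, np.atleast_2d(lights)  # lights -> (M, 3)
```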
SpectralLoss consists of three components:
- MSE Loss: Mean squared error between prediction and ground truth
- Scale Loss: Scale-invariant constraint
- TV Loss: Total variation regularization for smoothness
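The three components combine as a weighted sum. A simplified NumPy sketch (the exact forms of the scale-invariant and TV terms here are assumptions; the actual SpectralLoss operates on model tensors):

```python
import numpy as np

def spectral_loss(pred, gt, mse_w=1.0, scale_w=0.1, tv_w=0.01):
    """pred, gt: (N,) predicted / ground-truth BRDF values ordered along
    the spectral axis. Weights are illustrative defaults, not the paper's."""
    mse = np.mean((pred - gt) ** 2)
    # scale-invariant term: error remaining after an optimal global scale
    s = np.dot(pred, gt) / max(np.dot(pred, pred), 1e-12)
    scale = np.mean((s * pred - gt) ** 2)
    # total variation along the spectral axis encourages smooth spectra
    tv = np.mean(np.abs(np.diff(pred)))
    return mse_w * mse + scale_w * scale + tv_w * tv
```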
```
loss = mse_weight * mse_loss + scale_weight * scale_loss + tv_weight * tv_loss
```

If you find this work useful, please cite:
```bibtex
@inproceedings{specgen2026,
  title={SpecGen: Neural Spectral BRDF Generation via Spectral-Spatial Tri-plane Aggregation},
  author={},
  booktitle={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2026}
}
```

This project builds upon the K-Planes framework:

```bibtex
@inproceedings{kplanes_2023,
  title={K-Planes: Explicit Radiance Fields in Space, Time, and Appearance},
  author={Sara Fridovich-Keil and Giacomo Meanti and Frederik Rahbæk Warburg
          and Benjamin Recht and Angjoo Kanazawa},
  booktitle={CVPR},
  year={2023}
}
```

This project is released under the MIT License.
- The model requires CUDA support via `tiny-cuda-nn`
- Mixed precision training (`autocast`) is recommended
- `avg_flag` switches between Spectral/RGB modes