Learning Unified Representation of 3D Gaussian Splatting
Abstract
A well-designed vectorized representation is crucial for learning systems built natively on 3D Gaussian Splatting (3DGS). While 3DGS enables efficient and explicit 3D reconstruction, its parameter-based representation is hard to consume directly as a learned feature, especially for neural-network-based models. Directly feeding raw Gaussian parameters into learning frameworks ignores the non-unique and heterogeneous nature of the Gaussian parameterization, yielding highly data-dependent models. This challenge motivates us to seek a more principled way to represent 3DGS in neural networks, one that preserves the underlying color and geometric structure while enforcing a unique mapping and channel homogeneity. In this paper, we propose an embedding representation of 3DGS based on continuous submanifold fields that encapsulate the intrinsic information of Gaussian primitives, thereby benefiting the learning of 3DGS.
Motivation
3D Gaussian Splatting (3DGS) enables high-fidelity, real-time view synthesis by parameterizing each primitive with \( \boldsymbol{\theta} = \{\boldsymbol{\mu}, \mathbf{q}, \mathbf{s}, \mathbf{c}, o\}\), where \(\boldsymbol{\mu}\) is the position, \(\mathbf{q}\) the rotation quaternion, \(\mathbf{s}\) the scale, \(\mathbf{c}\) the spherical harmonics (SH) coefficients, and \(o\) the opacity. However, this parametric space is ill-suited for learning:
- Non-uniqueness: distinct parameter sets (e.g., \(\mathbf{q}\) vs. \(-\mathbf{q}\)) produce identical renderings; see the sketch after this list.
- Numerical heterogeneity: parameters span different ranges and reside on distinct manifolds (\(\mathbb{R}^3\), \(SO(3)\), SH space).
- Instability: autoencoders trained on parameters collapse under domain shifts and noise.
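The non-uniqueness in the first item is easy to verify numerically. The following minimal NumPy sketch (our own illustration, not code from the paper) builds the 3DGS covariance \(\Sigma = R S S^\top R^\top\) from a quaternion and a scale vector and confirms that \(\mathbf{q}\) and \(-\mathbf{q}\) yield the same \(\Sigma\), and hence the same rendered Gaussian.

```python
# Illustrative sketch: a quaternion q and its negation -q define the same
# rotation, hence the same covariance Sigma = R diag(s)^2 R^T in 3DGS.
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance(q, s):
    """Sigma = R S S^T R^T with S = diag(s), as in 3DGS."""
    R, S = quat_to_rotmat(q), np.diag(s)
    return R @ S @ S.T @ R.T

q = np.array([0.7, 0.1, -0.5, 0.3])   # arbitrary quaternion
s = np.array([0.2, 0.05, 0.4])        # anisotropic scales

# Identical covariances from distinct parameter vectors.
print(np.allclose(covariance(q, s), covariance(-q, s)))  # True
```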

Submanifold Field Representation of 3DGS
Each Gaussian primitive is mapped to a submanifold field defined on its iso-probability ellipsoid: \( \mathcal{M} = \{\mathbf{x} \in \mathbb{R}^3 \mid (\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) = r^2 \}\), with color field \(F(\mathbf{x}) = \sigma(o)\cdot \text{Color}(\mathbf{d}_\mathbf{x})\), where \(\mathbf{d}_\mathbf{x} = (\mathbf{x}-\boldsymbol{\mu})/\|\mathbf{x}-\boldsymbol{\mu}\|\). Unlike the raw parameterization, this mapping is provably unique (rendering-equivalent parameter sets yield the same field) and injective.
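A minimal sketch of this discretization is given below, assuming our own naming conventions (`sample_submanifold_field`, `color_fn`); the paper does not publish this exact routine. It samples directions on the unit sphere, maps them onto the iso-probability ellipsoid via a Cholesky factor of \(\Sigma\), and evaluates the color field at each sample.

```python
# Hedged sketch of discretizing one Gaussian into a colored point cloud on
# its iso-probability ellipsoid; names are ours, not the paper's.
import numpy as np

def sample_submanifold_field(mu, Sigma, opacity, color_fn, r=1.0, n=1024):
    """Sample x on {(x-mu)^T Sigma^{-1} (x-mu) = r^2} and evaluate
    F(x) = sigmoid(opacity) * color_fn(d_x)."""
    # Random directions on the unit sphere.
    u = np.random.normal(size=(n, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # Map the sphere onto the ellipsoid: with Sigma = L L^T and
    # x = mu + r * L u, we get (x-mu)^T Sigma^{-1} (x-mu) = r^2.
    L = np.linalg.cholesky(Sigma)
    x = mu + r * (u @ L.T)
    # Direction from the Gaussian center to each surface point.
    d = (x - mu) / np.linalg.norm(x - mu, axis=1, keepdims=True)
    sigma_o = 1.0 / (1.0 + np.exp(-opacity))   # sigmoid(opacity)
    F = sigma_o * color_fn(d)                  # per-point RGB field values
    return x, F

# Toy usage with a direction-independent color (a degree-0 SH would look like this).
mu = np.zeros(3)
Sigma = np.diag([0.04, 0.01, 0.09])
pts, colors = sample_submanifold_field(
    mu, Sigma, opacity=2.0,
    color_fn=lambda d: np.tile([0.8, 0.2, 0.1], (len(d), 1)))
```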
We discretize \((\mathcal{M},F)\) as a colored point cloud and encode it using a Submanifold Field VAE (SF-VAE):
- Encoder: PointNet maps sampled points to a 32-D latent code.
- Decoder: reconstructs the field and recovers Gaussian parameters via PCA + SH fitting.
- Loss: a Wasserstein-2-based Manifold Distance (M-Dist) that jointly aligns spatial and color similarity: \( d^2((\mathbf{x},c_x),(\mathbf{y},c_y)) = \|\mathbf{x}-\mathbf{y}\|^2 + \lambda \|c_x - c_y\|^2 \); see the sketch after this list.
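To make the M-Dist concrete, the sketch below computes an exact Wasserstein-2 distance between two equal-size, uniformly weighted colored point clouds under the joint spatial/color ground cost above. The function name `m_dist`, the weight `lam`, and the assignment-based solver are our assumptions; the paper's implementation may use a different approximation.

```python
# Hedged sketch of an M-Dist-style loss: for equal-size point sets with
# uniform weights, Wasserstein-2 reduces to an optimal assignment over the
# pairwise ground cost d^2 = ||x - y||^2 + lam * ||c_x - c_y||^2.
import numpy as np
from scipy.optimize import linear_sum_assignment

def m_dist(x, cx, y, cy, lam=0.1):
    """W2^2 between colored point clouds {(x_i, cx_i)} and {(y_j, cy_j)}."""
    # Pairwise squared ground costs: spatial term + weighted color term.
    cost = (np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** 2
            + lam * np.linalg.norm(cx[:, None, :] - cy[None, :, :], axis=-1) ** 2)
    rows, cols = linear_sum_assignment(cost)  # optimal plan is a permutation
    return cost[rows, cols].mean()

# Toy usage: distance between a colored point cloud and a jittered copy.
rng = np.random.default_rng(0)
x, cx = rng.normal(size=(256, 3)), rng.uniform(size=(256, 3))
y, cy = x + 0.01 * rng.normal(size=(256, 3)), cx
print(m_dist(x, cx, y, cy))
```

Because the optimal assignment is a hard permutation, a smoothed transport (e.g., an entropic/Sinkhorn approximation) is the usual choice when the loss must be differentiated end-to-end during VAE training; whether the paper does this is not stated here.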

Evaluation
We evaluate on ShapeSplat and Mip-NeRF 360, comparing submanifold fields with parametric baselines under matched capacity. Key results:
- Higher fidelity: SF-VAE reaches a reconstruction PSNR of 63.4 dB, versus 44.7/37.5 dB for the parametric baselines on ShapeSplat.
- Generalization: superior cross-domain transfer (object-to-scene PSNR of 19.2 dB vs. 9.8 dB for the parametric baseline).
- Robustness: embeddings resilient to noise; interpolations smooth and semantically consistent.
- Better latent structure: unsupervised clustering on SF embeddings shows clearer semantic separation.


