Abstract
The emergence of visual autoregressive (AR) models has revolutionized image generation while presenting new challenges for synthetic image detection. Unlike previous GAN- or diffusion-based methods, AR models generate images through discrete token prediction, exhibiting both marked improvements in image synthesis quality and unique characteristics in their vector-quantized representations. In this paper, we propose to leverage Discrete Distribution Discrepancy-aware Quantization Error (D³QE) for autoregressive-generated image detection, exploiting the distinctive patterns and the frequency-distribution bias of codebook usage that differ between real and fake images. We introduce a discrete distribution discrepancy-aware transformer that integrates dynamic codebook frequency statistics into its attention mechanism, fusing semantic features with the quantization-error latent representation. To evaluate our method, we construct a comprehensive dataset termed ARForensics covering 7 mainstream visual AR models. Experiments demonstrate superior detection accuracy and strong generalization of D³QE across different AR models, along with robustness to real-world perturbations.
Introduction
Autoregressive (AR) models create forgeries in the discrete latent space via discrete token prediction, evading conventional detectors. We observe a strong Discrete Distribution Discrepancy: real images follow a long-tail token distribution, while fakes concentrate probability mass on a small set of high-frequency codebook entries, showing polarized codebook usage.
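This discrepancy can be made concrete by comparing empirical codebook-usage histograms of real and generated images. Below is a minimal sketch, not the paper's exact procedure: it assumes tokenized images are available, and the function names, `codebook_size`, and the signed-difference statistic are illustrative.

```python
import torch

def token_usage_histogram(token_ids: torch.Tensor, codebook_size: int) -> torch.Tensor:
    """Empirical usage frequency of each codebook entry over a batch of token maps."""
    counts = torch.bincount(token_ids.flatten(), minlength=codebook_size).float()
    return counts / counts.sum()

def distribution_discrepancy(p_real: torch.Tensor, p_fake: torch.Tensor) -> torch.Tensor:
    """A simple signed per-entry discrepancy between real and fake usage statistics;
    the paper's exact definition of the statistic may differ."""
    return p_fake - p_real
```

Under the observed discrepancy, `p_fake` places most of its mass on a few frequently used entries, while `p_real` stays closer to a long-tail distribution.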
Our Main Contributions:
- D³QE Framework: Analyzes the codebook distribution bias and quantization error from AR generation.
- D³AT Transformer: Features Discrepancy-Aware Self-Attention (D³ASA), integrating codebook statistics (ΔD) to fuse quantization error with semantic features.
- ARForensics Benchmark: The first dataset for AR-generated image detection, covering 7 mainstream models to test generalization.
Dataset Visualization
The ARForensics dataset contains samples from 7 mainstream AR models (LlamaGen, VAR, Infinity, Janus-Pro, RAR, Switti, Open-MAGVIT2), serving as a robust visual benchmark for testing the generalization of AI-generated image detection models.
Methodology

The D³QE framework fuses local discrete artifacts with global semantic features through four key components (see the code sketch after this list):
- Quantization Error Representation: A frozen VQVAE Encoder extracts the error between the continuous latent map z and its discrete representation z_q.
- Discrete Distribution Statistics: Computes the discrete distribution discrepancy (ΔD) from real vs. fake token usage statistics.
- D³AT Transformer: Its core Discrepancy-Aware Self-Attention (D³ASA) module processes the quantization error, guided by global ΔD.
- Semantic Feature Fusion: A frozen CLIP extracts semantic features, which are fused with the D³AT's output for final classification.
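To make the data flow concrete, here is a hedged, minimal sketch of how these components could fit together. The class name `D3QESketch`, the tensor shapes, and the way ΔD biases the attention are our illustrative assumptions, not the released implementation; the frozen VQ-VAE encoder and CLIP backbone are assumed to run upstream and supply `z`, `z_q`, and `clip_feat`.

```python
import torch
import torch.nn as nn

class D3QESketch(nn.Module):
    def __init__(self, codebook_size=16384, dim=256, clip_dim=768, num_classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.delta_d_proj = nn.Linear(codebook_size, dim)  # inject ΔD statistics
        self.classifier = nn.Linear(dim + clip_dim, num_classes)

    def forward(self, z, z_q, delta_d, clip_feat):
        # z, z_q: (B, dim, H, W) continuous latent and its quantized counterpart
        # delta_d: (B, codebook_size) discrepancy statistic; clip_feat: (B, clip_dim)
        err = (z - z_q).flatten(2).transpose(1, 2)       # (B, H*W, dim) error tokens
        bias = self.delta_d_proj(delta_d).unsqueeze(1)   # (B, 1, dim) global ΔD bias
        h, _ = self.attn(err + bias, err + bias, err)    # discrepancy-aware attention
        h = h.mean(dim=1)                                # pool over spatial tokens
        return self.classifier(torch.cat([h, clip_feat], dim=-1))

# Usage with random tensors (shapes only; a frozen VQ-VAE/CLIP would supply these):
model = D3QESketch()
logits = model(torch.randn(2, 256, 16, 16), torch.randn(2, 256, 16, 16),
               torch.randn(2, 16384), torch.randn(2, 768))
```

The key design point is that the discrepancy statistic ΔD enters as a global bias on the attention over quantization-error tokens, so the transformer attends to local error patterns in a distribution-aware way before fusing with semantics.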
Experiments
We evaluate D³QE extensively on our proposed ARForensics benchmark, demonstrating its superior performance in both intra-model testing and cross-model generalization.

Generalization beyond AR generators is further assessed on ForenSynths for GAN-generated images and on the GenImage dataset for diffusion-based images.

We also evaluate the model's robustness under common real-world corruptions, such as JPEG compression and Gaussian blur; a sketch of these perturbations follows.
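For reference, here is a minimal sketch of the two corruptions using Pillow; the quality and radius values are illustrative assumptions, not the paper's evaluation settings.

```python
import io
from PIL import Image, ImageFilter

def jpeg_compress(img: Image.Image, quality: int = 75) -> Image.Image:
    """Round-trip the image through JPEG encoding at the given quality."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def gaussian_blur(img: Image.Image, radius: float = 1.0) -> Image.Image:
    """Apply Gaussian blur with the given radius."""
    return img.filter(ImageFilter.GaussianBlur(radius=radius))
```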

Poster
BibTeX
@article{zhang2025d3qe,
title={$\bf{D^3}$QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection},
author={Zhang, Yanran and Yu, Bingyao and Zheng, Yu and Zheng, Wenzhao and Duan, Yueqi and Chen, Lei and Zhou, Jie and Lu, Jiwen},
journal={arXiv preprint arXiv:2510.05891},
year={2025}
}