Abstract: The neural codec language model (CLM) has demonstrated remarkable performance in text-to-speech (TTS) synthesis. However, troubled by "recency bias", CLM lacks sufficient attention to coarse ...