← All model leaderboards · Updated 2026-04-07

Text to Speech Model Leaderboard

Compare the top text-to-speech (TTS) models across Artificial Analysis and Design Arena: unified rankings, source-by-source scores, speed, and API pricing in one table.

Top consensus: Inworld TTS 1.5 Max

- Models ranked: 35 in this table
- Leading model: Inworld TTS 1.5 Max (100.0 consensus)
- Median consensus: 56.0 (the typical model in this list)
- Gap to 2nd: 4.8 consensus points (1st vs 2nd)

Showing 35 of 35 · Snapshot 2026-04-07 · Sources: Artificial Analysis, Design Arena
| # | Model | Provider | Consensus | AA rank | AA Elo | DA rank | DA Elo | Speed | Price ($/1M chars) |
|---|-------|----------|-----------|---------|--------|---------|--------|-------|--------------------|
| 1 | Inworld TTS 1.5 Max | Inworld | 100.0 | #1 | 1217 | — | — | 7.41s | $10.0 |
| 2 | Inworld TTS 1 Max | Inworld | 95.2 | #3 | 1173 | — | — | 9.52s | $10.0 |
| 3 | Inworld TTS 1.5 | Inworld | 92.9 | #4 | 1168 | — | — | 2.86s | $5.0 |
| 4 | MiniMax Speech 2.8 HD | MiniMax | 90.5 | #5 | 1162 | — | — | 5.88s | $100.0 |
| 5 | MiniMax Speech 2.8 Turbo | MiniMax | 88.1 | #6 | 1145 | — | — | 4.65s | $60.0 |
| 6 | ElevenLabs v3 | ElevenLabs | 87.7 | #2 | 1177 | #3 | 1239 | 20s | $206.0 |
| 7 | MiniMax Speech 2.6 Turbo | MiniMax | 83.3 | #7 | 1130 | — | — | 4.76s | $60.0 |
| 8 | MiniMax Speech 2.6 HD | MiniMax | 81.0 | #8 | 1127 | — | — | 5.13s | $100.0 |
| 9 | MiniMax Speech-02-HD | MiniMax | 78.6 | #9 | 1119 | — | — | 3.57s | $100.0 |
| 10 | ElevenLabs Multilingual v2 | ElevenLabs | 73.8 | #10 | 1106 | — | — | 9.09s | $206.0 |
| 11 | OpenAI TTS-1 | OpenAI | 71.4 | #11 | 1102 | — | — | 13.33s | $15.0 |
| 12 | MiniMax Speech-02-Turbo | MiniMax | 69.0 | #12 | 1100 | — | — | 5.71s | $60.0 |
| 13 | OpenAI GPT-4o Mini TTS | OpenAI | 66.7 | — | — | #4 | 1227 | 9.3s | — |
| 14 | OpenAI TTS-1 HD | OpenAI | 66.7 | #13 | 1099 | — | — | 16.67s | $30.0 |
| 15 | ElevenLabs Turbo v2.5 | ElevenLabs | 64.3 | #14 | 1097 | — | — | 2.04s | $103.0 |
| 16 | ElevenLabs Flash v2.5 | ElevenLabs | 59.5 | #15 | 1087 | — | — | 2.04s | $103.0 |
| 17 | Google Gemini 2.5 Flash Lite TTS | Google | 57.1 | #16 | 1081 | — | — | 18.18s | $10.0 |
| 18 | Google Gemini 2.5 Pro TTS | Google | 56.0 | #29 | 1022 | #1 | 1380 | 36.1s | — |
| 19 | Speechify SIMBA 1.6 | Speechify | 54.8 | #17 | 1069 | — | — | 8.33s | $10.0 |
| 20 | Google Gemini 2.5 Flash TTS | Google | 54.0 | #27 | 1032 | #2 | 1328 | 33.33s | — |
| 21 | Cartesia Sonic 3 | Cartesia | 50.0 | #18 | 1062 | — | — | 22.22s | $46.7 |
| 22 | Google Studio | Google | 47.6 | #19 | 1060 | — | — | 3.23s | $160.0 |
| 23 | MiniMax T2A-01-HD | MiniMax | 45.2 | #20 | 1058 | — | — | 5s | $50.0 |
| 24 | Murf AI Gen2 | Murf AI | 44.4 | — | — | #6 | 1087 | 4s | — |
| 25 | Microsoft Azure Neural | Microsoft Azure | 35.7 | #23 | 1047 | — | — | 2.7s | $15.0 |
| 26 | Hume AI Octave 2 | Hume AI | 33.3 | #24 | 1046 | — | — | 25s | $7.6 |
| 27 | Qwen3 TTS Flash | Alibaba | 33.3 | — | — | #7 | 1036 | 10s | — |
| 28 | Kokoro 82M v1.0 (Open Weights) | Kokoro | 32.5 | #21 | 1056 | #8 | 1025 | 3.45s | $0.7 |
| 29 | Hume AI Octave TTS | Hume AI | 31.3 | #31 | 1016 | #5 | 1091 | 20s | $93.8 |
| 30 | Cartesia Sonic English (Oct 2024) | Cartesia | 28.6 | #25 | 1045 | — | — | 7.69s | $46.7 |
| 31 | Google Chirp 3: HD | Google | 26.2 | #26 | 1043 | — | — | 8.33s | $30.0 |
| 32 | Amazon Polly Generative | Amazon | 20.2 | #22 | 1051 | #10 | 978 | 0.889s | $30.0 |
| 33 | Google Journey | Google | 16.7 | #28 | 1032 | — | — | 33.33s | $160.0 |
| 34 | Deepgram Aura v2 | Deepgram | 11.1 | — | — | #9 | 1003 | 19.9s | — |
| 35 | MiniMax T2A-01-Turbo | MiniMax | 9.5 | #30 | 1021 | — | — | 4.55s | $30.0 |

Methodology

Each source uses preference data to estimate skill scores. We map ranks to percentiles and average where a model appears on multiple lists. In the web view, the bar in the Consensus column is green; the purple and rose bars correspond to the Artificial Analysis and Design Arena columns. Speed is the approximate time to first audio.
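As a rough sketch of that averaging, assume a linear rank-to-percentile mapping (the function names and the exact mapping here are illustrative assumptions, not the site's published formula):

```python
def rank_to_percentile(rank: int, n_models: int) -> float:
    """Map a 1-based rank within one benchmark to a 0-100 percentile.

    Rank 1 of n maps to 100, rank n maps to 0, with linear spacing.
    """
    if n_models < 2:
        return 100.0
    return 100.0 * (n_models - rank) / (n_models - 1)

def consensus(ranks: dict[str, tuple[int, int]]) -> float:
    """Average percentile across only the benchmarks where the model appears.

    ranks maps source name -> (rank, list size); a missing source is simply
    absent from the dict, so it does not drag the average down.
    """
    pcts = [rank_to_percentile(r, n) for r, n in ranks.values()]
    return sum(pcts) / len(pcts)

# A model ranked #1 of 31 on one list and #3 of 10 on another:
score = consensus({"AA": (1, 31), "DA": (3, 10)})
```

Averaging only over the lists where a model appears is why a model missing from one source can still score well in the Consensus column.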

FAQ

Answers below use the same snapshot as the table above (as of 2026-04-07, 35 models). Figures come from our export, not the live pages at Artificial Analysis or Design Arena—those sites may have moved on since we built this snapshot. The Consensus column is our average of percentile ranks across the benchmarks where each model appears.

Which models in this snapshot are the newest?

We use each model's released field from the export. Among rows with a parseable date, the newest in this snapshot are MiniMax Speech 2.8 HD (2026-02-01), MiniMax Speech 2.8 Turbo (2026-02-01), and ElevenLabs v3 (2026-02-01).

What is the best TTS model right now?

By default we sort by Consensus, so Inworld TTS 1.5 Max leads this snapshot at 100.0 (the average percentile across benchmarks where the model appears). By Elo in the Artificial Analysis column alone, Inworld TTS 1.5 Max is also highest, at 1217. "Best" still depends on price, latency, and which benchmarks you care about—use the sortable table.
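For instance, re-sorting a few rows by either key is a one-liner (the dict field names are illustrative, not the export's actual schema; the values are taken from the table above):

```python
rows = [
    {"model": "Inworld TTS 1.5 Max", "consensus": 100.0, "aa_elo": 1217},
    {"model": "ElevenLabs v3", "consensus": 87.7, "aa_elo": 1177},
    {"model": "Inworld TTS 1 Max", "consensus": 95.2, "aa_elo": 1173},
]

# Default view: highest Consensus first.
by_consensus = sorted(rows, key=lambda r: r["consensus"], reverse=True)

# Alternative view: highest Artificial Analysis Elo first.
by_elo = sorted(rows, key=lambda r: r["aa_elo"], reverse=True)
```

Note the two orderings can disagree below the top spot: here ElevenLabs v3 out-Elos Inworld TTS 1 Max on Artificial Analysis but trails it on Consensus.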

Which models have the highest Elo on Artificial Analysis?

The Elo values below are the Artificial Analysis numbers in this export (2026-04-07), not necessarily what you see on Artificial Analysis today:
  1. Inworld TTS 1.5 Max — Elo 1217
  2. ElevenLabs v3 — Elo 1177
  3. Inworld TTS 1 Max — Elo 1173
  4. Inworld TTS 1.5 — Elo 1168
  5. MiniMax Speech 2.8 HD — Elo 1162

What counts as a text-to-speech model here?

Text-to-speech (TTS) models take written text and synthesize spoken audio. Our table aggregates TTS-specific leaderboards from the sources named in the header. Voice cloning, speech-to-speech, and speech-to-text (ASR) systems live on different benchmarks and are not included here.

How are these rankings computed?

Each upstream source runs preference tests and publishes ranks or scores. We map those to percentiles within each benchmark, then average across the benchmarks where a model appears—that is the Consensus column (see Methodology above the FAQ). Per-source columns show the ranks and scores stored in our snapshot for Artificial Analysis and Design Arena. To change upstream leaderboards, participate on those sites; our table updates when we refresh the export.

Which open-weights TTS models rank highest?

We flag open-weights rows by the "Open Weights" suffix in the export's model names. By Artificial Analysis Elo in this snapshot, the highest is:
  1. Kokoro 82M v1.0 (Open Weights) — Elo 1056
Treat naming as a signal only—confirm license terms with each provider before production use.
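That suffix check is just a string filter on the model names. A minimal sketch, assuming illustrative field names rather than the export's actual schema:

```python
rows = [
    {"model": "Kokoro 82M v1.0 (Open Weights)", "aa_elo": 1056},
    {"model": "Inworld TTS 1.5 Max", "aa_elo": 1217},
]

# Keep only rows whose name carries the "(Open Weights)" suffix,
# then rank them by Artificial Analysis Elo, highest first.
open_weights = sorted(
    (r for r in rows if r["model"].endswith("(Open Weights)")),
    key=lambda r: r["aa_elo"],
    reverse=True,
)
```

Because the filter keys on naming alone, a model whose open license is not reflected in its export name would be missed, which is why we treat the suffix as a signal rather than proof.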

Where do the Elo numbers come from?

Elo in our table is the value from the snapshot for the Artificial Analysis column (and similar skill estimates elsewhere). Margin-of-error intervals (e.g., in CI columns) come from that same export. For how Artificial Analysis computes Elo from votes, see their methodology; our numbers stay fixed until the next snapshot refresh.