← All model leaderboards · Updated 2026-04-07

Text to Speech Model Leaderboard

Compare the top text-to-speech (TTS) models across Artificial Analysis and Design Arena: unified rankings, source-by-source scores, speed, and API pricing in one table.

Top consensus: Inworld TTS 1.5 Max

- Models ranked: 35 in this table
- Leading model: Inworld TTS 1.5 Max (100.0 consensus)
- Median consensus: 56.0 (the typical model in this list)
- Gap to 2nd: 4.8 consensus points (1st vs 2nd)

Showing 35 of 35 · Snapshot 2026-04-07 · Sources: Artificial Analysis, Design Arena
| # | Model | Provider | Consensus | AA rank | AA Elo | DA rank | DA Elo | Speed | Price ($/1M chars) |
|---|-------|----------|-----------|---------|--------|---------|--------|-------|--------------------|
| 1 | Inworld TTS 1.5 Max | Inworld | 100.0 | #1 | 1217 | — | — | 7.41s | $10.0 |
| 2 | Inworld TTS 1 Max | Inworld | 95.2 | #3 | 1173 | — | — | 9.52s | $10.0 |
| 3 | Inworld TTS 1.5 | Inworld | 92.9 | #4 | 1168 | — | — | 2.86s | $5.0 |
| 4 | MiniMax Speech 2.8 HD | MiniMax | 90.5 | #5 | 1162 | — | — | 5.88s | $100.0 |
| 5 | MiniMax Speech 2.8 Turbo | MiniMax | 88.1 | #6 | 1145 | — | — | 4.65s | $60.0 |
| 6 | ElevenLabs v3 | ElevenLabs | 87.7 | #2 | 1177 | #3 | 1239 | 20s | $206.0 |
| 7 | MiniMax Speech 2.6 Turbo | MiniMax | 83.3 | #7 | 1130 | — | — | 4.76s | $60.0 |
| 8 | MiniMax Speech 2.6 HD | MiniMax | 81.0 | #8 | 1127 | — | — | 5.13s | $100.0 |
| 9 | MiniMax Speech-02-HD | MiniMax | 78.6 | #9 | 1119 | — | — | 3.57s | $100.0 |
| 10 | ElevenLabs Multilingual v2 | ElevenLabs | 73.8 | #10 | 1106 | — | — | 9.09s | $206.0 |
| 11 | OpenAI TTS-1 | OpenAI | 71.4 | #11 | 1102 | — | — | 13.33s | $15.0 |
| 12 | MiniMax Speech-02-Turbo | MiniMax | 69.0 | #12 | 1100 | — | — | 5.71s | $60.0 |
| 13 | OpenAI GPT-4o Mini TTS | OpenAI | 66.7 | — | — | #4 | 1227 | 9.3s | — |
| 14 | OpenAI TTS-1 HD | OpenAI | 66.7 | #13 | 1099 | — | — | 16.67s | $30.0 |
| 15 | ElevenLabs Turbo v2.5 | ElevenLabs | 64.3 | #14 | 1097 | — | — | 2.04s | $103.0 |
| 16 | ElevenLabs Flash v2.5 | ElevenLabs | 59.5 | #15 | 1087 | — | — | 2.04s | $103.0 |
| 17 | Google Gemini 2.5 Flash Lite TTS | Google | 57.1 | #16 | 1081 | — | — | 18.18s | $10.0 |
| 18 | Google Gemini 2.5 Pro TTS | Google | 56.0 | #29 | 1022 | #1 | 1380 | 36.1s | — |
| 19 | Speechify SIMBA 1.6 | Speechify | 54.8 | #17 | 1069 | — | — | 8.33s | $10.0 |
| 20 | Google Gemini 2.5 Flash TTS | Google | 54.0 | #27 | 1032 | #2 | 1328 | 33.33s | — |
| 21 | Cartesia Sonic 3 | Cartesia | 50.0 | #18 | 1062 | — | — | 22.22s | $46.7 |
| 22 | Google Studio | Google | 47.6 | #19 | 1060 | — | — | 3.23s | $160.0 |
| 23 | MiniMax T2A-01-HD | MiniMax | 45.2 | #20 | 1058 | — | — | 5s | $50.0 |
| 24 | Murf AI Gen2 | Murf AI | 44.4 | — | — | #6 | 1087 | 4s | — |
| 25 | Microsoft Azure Neural | Microsoft Azure | 35.7 | #23 | 1047 | — | — | 2.7s | $15.0 |
| 26 | Hume AI Octave 2 | Hume AI | 33.3 | #24 | 1046 | — | — | 25s | $7.6 |
| 27 | Qwen3 TTS Flash | Alibaba | 33.3 | — | — | #7 | 1036 | 10s | — |
| 28 | Kokoro 82M v1.0 (Open Weights) | Kokoro | 32.5 | #21 | 1056 | #8 | 1025 | 3.45s | $0.7 |
| 29 | Hume AI Octave TTS | Hume AI | 31.3 | #31 | 1016 | #5 | 1091 | 20s | $93.8 |
| 30 | Cartesia Sonic English (Oct 2024) | Cartesia | 28.6 | #25 | 1045 | — | — | 7.69s | $46.7 |
| 31 | Google Chirp 3: HD | Google | 26.2 | #26 | 1043 | — | — | 8.33s | $30.0 |
| 32 | Amazon Polly Generative | Amazon | 20.2 | #22 | 1051 | #10 | 978 | 0.889s | $30.0 |
| 33 | Google Journey | Google | 16.7 | #28 | 1032 | — | — | 33.33s | $160.0 |
| 34 | Deepgram Aura v2 | Deepgram | 11.1 | — | — | #9 | 1003 | 19.9s | — |
| 35 | MiniMax T2A-01-Turbo | MiniMax | 9.5 | #30 | 1021 | — | — | 4.55s | $30.0 |

Methodology

Each source uses preference data to estimate skill scores. We map ranks to percentiles and average where a model appears on multiple lists. In the web view, the bar in the Consensus column is green; the purple and rose bars correspond to the Artificial Analysis and Design Arena columns. Speed is the approximate time to first audio.
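As a rough sketch of that averaging, assume a linear rank-to-percentile mapping (the function names and the exact mapping here are illustrative assumptions, not the site's published formula):

```python
def rank_to_percentile(rank: int, n_models: int) -> float:
    """Map a 1-based rank within one benchmark to a 0-100 percentile.

    Rank 1 of n maps to 100, rank n maps to 0, with linear spacing.
    """
    if n_models < 2:
        return 100.0
    return 100.0 * (n_models - rank) / (n_models - 1)

def consensus(ranks: dict[str, tuple[int, int]]) -> float:
    """Average percentile across only the benchmarks where the model appears.

    ranks maps source name -> (rank, list size); a missing source is simply
    absent from the dict, so it does not drag the average down.
    """
    pcts = [rank_to_percentile(r, n) for r, n in ranks.values()]
    return sum(pcts) / len(pcts)

# A model ranked #1 of 31 on one list and #3 of 10 on another:
score = consensus({"AA": (1, 31), "DA": (3, 10)})
```

Averaging only over the lists where a model appears is why a model missing from one source can still score well in the Consensus column.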

FAQ

Answers below use the same snapshot as the table above (as of 2026-04-07, 35 models). Figures come from our export, not the live pages at Artificial Analysis or Design Arena—those sites may have moved on since we built this snapshot. The Consensus column is our average of percentile ranks across the benchmarks where each model appears.

Which models in this snapshot are the newest?

We use each model's released field from the export. Among rows with a parseable date, the newest in this snapshot are MiniMax Speech 2.8 HD (2026-02-01), MiniMax Speech 2.8 Turbo (2026-02-01), and ElevenLabs v3 (2026-02-01).

What is the best TTS model right now?

By default we sort by Consensus, so Inworld TTS 1.5 Max leads this snapshot at 100.0 (the average percentile across benchmarks where the model appears). By Elo in the Artificial Analysis column alone, Inworld TTS 1.5 Max is also highest, at 1217. "Best" still depends on price, latency, and which benchmarks you care about—use the sortable table.
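For instance, re-sorting a few rows by either key is a one-liner (the dict field names are illustrative, not the export's actual schema; the values are taken from the table above):

```python
rows = [
    {"model": "Inworld TTS 1.5 Max", "consensus": 100.0, "aa_elo": 1217},
    {"model": "ElevenLabs v3", "consensus": 87.7, "aa_elo": 1177},
    {"model": "Inworld TTS 1 Max", "consensus": 95.2, "aa_elo": 1173},
]

# Default view: highest Consensus first.
by_consensus = sorted(rows, key=lambda r: r["consensus"], reverse=True)

# Alternative view: highest Artificial Analysis Elo first.
by_elo = sorted(rows, key=lambda r: r["aa_elo"], reverse=True)
```

Note the two orderings can disagree below the top spot: here ElevenLabs v3 out-Elos Inworld TTS 1 Max on Artificial Analysis but trails it on Consensus.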

Which models have the highest Elo on Artificial Analysis?

The Elo values below are the Artificial Analysis numbers in this export (2026-04-07), not necessarily what you see on Artificial Analysis today:
  1. Inworld TTS 1.5 Max — Elo 1217
  2. ElevenLabs v3 — Elo 1177
  3. Inworld TTS 1 Max — Elo 1173
  4. Inworld TTS 1.5 — Elo 1168
  5. MiniMax Speech 2.8 HD — Elo 1162

What counts as a text-to-speech model here?

Text-to-speech (TTS) models take written text and synthesize spoken audio. Our table aggregates TTS-specific leaderboards from the sources named in the header. Voice cloning, speech-to-speech, and speech-to-text (ASR) systems live on different benchmarks and are not included here.

How are these rankings computed?

Each upstream source runs preference tests and publishes ranks or scores. We map those to percentiles within each benchmark, then average across the benchmarks where a model appears—that is the Consensus column (see Methodology above the FAQ). Per-source columns show the ranks and scores stored in our snapshot for Artificial Analysis and Design Arena. To change upstream leaderboards, participate on those sites; our table updates when we refresh the export.

Which open-weights TTS models rank highest?

We flag open-weights rows by the "Open Weights" suffix in the export's model names. By Artificial Analysis Elo in this snapshot, the highest is:
  1. Kokoro 82M v1.0 (Open Weights) — Elo 1056
Treat naming as a signal only—confirm license terms with each provider before production use.
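That suffix check is just a string filter on the model names. A minimal sketch, assuming illustrative field names rather than the export's actual schema:

```python
rows = [
    {"model": "Kokoro 82M v1.0 (Open Weights)", "aa_elo": 1056},
    {"model": "Inworld TTS 1.5 Max", "aa_elo": 1217},
]

# Keep only rows whose name carries the "(Open Weights)" suffix,
# then rank them by Artificial Analysis Elo, highest first.
open_weights = sorted(
    (r for r in rows if r["model"].endswith("(Open Weights)")),
    key=lambda r: r["aa_elo"],
    reverse=True,
)
```

Because the filter keys on naming alone, a model whose open license is not reflected in its export name would be missed, which is why we treat the suffix as a signal rather than proof.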

Where do the Elo numbers come from?

Elo in our table is the value from the snapshot for the Artificial Analysis column (and similar skill estimates elsewhere). Margin-of-error intervals (e.g., in CI columns) come from that same export. For how Artificial Analysis computes Elo from votes, see their methodology; our numbers stay fixed until the next snapshot refresh.