In phonetics, a diphone is an adjacent pair of phones in an utterance. For example, in [daɪfəʊn], the diphones are [da], [aɪ], [ɪf], [fə], [əʊ], [ʊn]. The term is usually used to refer to a recording of the transition between two phones.
If a stream of phones is represented by P1, P2, P3, etc., the corresponding diphones are D1-2 (spanning P1 and P2), D2-3 (spanning P2 and P3), and so on. A stream of n phones therefore contains n − 1 diphones.
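The pairing of adjacent phones can be illustrated with a minimal Python sketch. The function name `diphones` and the representation of each phone as a one-character string are assumptions made for this illustration only; the input is the [daɪfəʊn] example given above.

```python
def diphones(phones):
    """Return the adjacent pairs (diphones) in a sequence of phones."""
    # Pair each phone with the one that follows it: P1+P2, P2+P3, ...
    return [a + b for a, b in zip(phones, phones[1:])]


# The phones of [daɪfəʊn], one symbol per phone.
phones = ["d", "a", "ɪ", "f", "ə", "ʊ", "n"]
print(diphones(phones))
# ['da', 'aɪ', 'ɪf', 'fə', 'əʊ', 'ʊn']
```

Seven phones yield six diphones, one fewer than the number of phones.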
If the number of phones in a language is P, the theoretical number of possible diphones is P². However, since all languages have restrictions on which sounds can occur next to each other (see phonotactics), the number of diphones that actually occur in a given language is usually much smaller than P².
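The gap between the theoretical and the actual count can be sketched with a toy example. Everything here is hypothetical: the four-phone inventory does not describe any real language, and the rule "no two consonants in a row" is invented purely to show how a phonotactic restriction reduces the count.

```python
from itertools import product

# Hypothetical toy inventory (not a real language): P = 4 phones.
consonants = {"p", "t"}
vowels = {"a", "i"}
inventory = sorted(consonants | vowels)

# Every ordered pair of phones, including a phone followed by itself: P * P.
theoretical = list(product(inventory, repeat=2))

# Invented phonotactic rule, for illustration only: no two consonants in a row.
permitted = [(a, b) for a, b in theoretical
             if not (a in consonants and b in consonants)]

print(len(theoretical))  # 16
print(len(permitted))    # 12
```

With P = 4 the theoretical count is 16, and this single invented restriction already removes four pairs. Real languages combine many such restrictions over much larger inventories.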
Diphones are useful in speech synthesis. When pre-recorded diphones are combined to create synthesized speech, the result sounds much more natural than speech produced by combining recordings of individual phones. That is because the pronunciation of each phone varies depending on the surrounding phones, and a diphone recording captures the transition between the two.
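The idea of combining pre-recorded diphones can be sketched as a simple lookup and concatenation. This is a minimal illustration under stated assumptions, not a description of any particular synthesizer: the names `synthesize` and `bank` are hypothetical, and the "recordings" are short lists of made-up sample values standing in for audio. A real synthesizer would typically also smooth the joins and adjust pitch and duration, which is omitted here.

```python
def synthesize(phones, diphone_bank):
    """Concatenate the stored unit for each adjacent pair of phones."""
    samples = []
    for a, b in zip(phones, phones[1:]):
        unit = diphone_bank.get((a, b))
        if unit is None:
            # The bank has no recording for this transition.
            raise KeyError(f"no recording for diphone {a}-{b}")
        samples.extend(unit)
    return samples


# Placeholder "recordings": made-up sample values, not real audio.
bank = {
    ("d", "a"): [0.1, 0.2],
    ("a", "ɪ"): [0.3],
    ("ɪ", "f"): [0.4, 0.5],
}
print(synthesize(["d", "a", "ɪ", "f"], bank))
# [0.1, 0.2, 0.3, 0.4, 0.5]
```

Keying the bank by pairs of phones, rather than by single phones, is what lets each stored unit carry the transition between its two sounds.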