The human voice organ fits in a small space with a characteristic length of ~cm. Many complex physical phenomena combine in it to produce sounds. Despite the small dimensions of the voice organ, however, a complete numerical simulation of its physics is still out of reach, even with massively parallel supercomputers. This has led researchers to split the problem of voice generation into parts, focusing independently on, for instance, simulating the self-oscillation of the vocal folds to generate the glottal pulse, the propagation of acoustic waves in moving vocal tracts to produce diphthongs, or the diffraction of the glottal jet pressure by the teeth, which results in fricative sounds. This workshop will review the types of equations and difficulties encountered when trying to simulate these phenomena, and will show that, under some assumptions, the first unified simulations coupling the mechanics, aerodynamics and acoustics of the vocal folds and vocal tract may not be as far off as one might think. A workflow running from 3D biomechanical models, through the generation of vocal fold self-oscillations, flow and acoustic waves, to the radiated sound may be feasible in the short term.