PicoAudio2 Online Inference
Definition
TCC (Temporal Coarse Caption):
A brief text description for the overall audio scene.
Example: a dog barks
TDC (Temporal Detailed Caption):
A caption with timestamp information for each event.
It allows precise temporal control over when events happen in the generated audio.
Example: a dog barks(1.0-2.0, 3.0-4.0); a man speaks(5.0-6.0)
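A TDC string like the example above can be assembled from a list of events and time spans. The following is a minimal sketch, not part of PicoAudio2 itself: the helper name `format_tdc`, the dictionary layout, and the one-decimal formatting are assumptions made for illustration.

```python
def format_tdc(events):
    """Build a TDC string from {event_name: [(start, end), ...]}.

    Hypothetical helper: the name and input layout are illustrative only.
    """
    parts = []
    for name, spans in events.items():
        # Each span becomes "start-end"; spans of one event are comma-separated.
        times = ", ".join(f"{start:.1f}-{end:.1f}" for start, end in spans)
        parts.append(f"{name}({times})")
    # Events are separated by semicolons.
    return "; ".join(parts)


tdc = format_tdc({
    "a dog barks": [(1.0, 2.0), (3.0, 4.0)],
    "a man speaks": [(5.0, 6.0)],
})
print(tdc)  # a dog barks(1.0-2.0, 3.0-4.0); a man speaks(5.0-6.0)
```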
Input Requirements & Format
- TCC is required for audio generation.
- TDC is optional. If provided, it should follow the format:
  event1(start1-end1, start2-end2); event2(start1-end1, ...)
- Length (in seconds) is optional, but recommended for temporal control. The length defaults to 10.0 seconds.
- Enable Time Control: Tick to use TDC and length for precise event timing.
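Because a malformed TDC leads to generation without precise temporal control, it can help to check the string before submitting it. The sketch below is one possible validator for the format described above. It is not the demo's actual parser: the function name `parse_tdc`, the regular expressions, and the rule that every span must satisfy 0 <= start < end <= length are assumptions.

```python
import re

# Assumed grammar: "name(start-end, start-end)" with non-negative decimal times.
_EVENT_RE = re.compile(r"^\s*(?P<name>[^()]+?)\s*\((?P<spans>[^()]*)\)\s*$")
_SPAN_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*$")


def parse_tdc(tdc, length=10.0):
    """Return {event: [(start, end), ...]}, or None if the string is malformed.

    Hypothetical helper; the range check against `length` is an assumption.
    """
    events = {}
    for chunk in tdc.split(";"):
        if not chunk.strip():
            continue  # tolerate a trailing semicolon or empty input
        m = _EVENT_RE.match(chunk)
        if m is None:
            return None
        spans = []
        for piece in m.group("spans").split(","):
            s = _SPAN_RE.match(piece)
            if s is None:
                return None
            start, end = float(s.group(1)), float(s.group(2))
            if not (0.0 <= start < end <= length):
                return None
            spans.append((start, end))
        events.setdefault(m.group("name"), []).extend(spans)
    return events or None


print(parse_tdc("a dog barks(1.0-2.0, 3.0-4.0); a man speaks(5.0-6.0)"))
print(parse_tdc("a dog barks 1.0-2.0"))  # missing parentheses, so None
```

Returning None rather than raising keeps the caller free to fall back to generation without time control.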
Notes
- If the TDC format is incorrect or the length is missing, the model generates audio without precise temporal control.
- For general audio generation, it is recommended to input "random" for TDC.
- You may leave TDC blank to let the LLM generate timestamps automatically (subject to API quota).