Generating realistic and interactive dynamics of traffic
participants according to specific instructions is critical for
street scene simulation. However, there is currently no
comprehensive method that generates realistic dynamics for
different types of participants, including vehicles and
pedestrians, with different kinds of interactions between them. In
this paper, we introduce ChatDyn, the first system capable
of generating interactive, controllable, and realistic
participant dynamics in street scenes based on language
instructions. To achieve precise control through complex
language, ChatDyn employs a multi-LLM-agent role-playing
approach, which utilizes natural language inputs to plan the
trajectories and behaviors for different traffic participants.
To generate realistic fine-grained dynamics based on the
planning, ChatDyn introduces two novel executors: the
PedExecutor, a unified multi-task executor that generates realistic
pedestrian dynamics under different task plans; and the
VehExecutor, a physical-transition-based policy that
generates physically plausible vehicle dynamics. Extensive
experiments show that ChatDyn produces realistic results under
complex commands and validate the effectiveness of its
components.