Generating prompts automatically, making longer videos, and keeping characters consistent
As in the example above, I asked Gemini to write 3 prompts, one for each of the 3 scenes in a short horror-comedy video script about a haunted school. I described the style, the main characters, the setting, and so on.
Once you have the prompts, open Flow (LINK) to generate the video with Veo 3. Remember to choose Text to Video, and in the Settings panel on the right, set the number of output videos to 1 and pick the highest-end model, Veo 3. That's it: paste the prompt in and wait.
And here is the result.
But what if you want a video longer than 8 seconds, with several scenes and more complex content? The obstacle is that, for now, the feature that uses an image as a reference is only supported with Veo 2, which means you get video but no sound. Veo 3 does not support it yet. Even so, we can still use the prompt itself to maximize character consistency across separate text-to-video generations. My approach is to split the script into sub-prompts, each corresponding to one 8-second clip. The description of the main characters and the environment, however, is written up carefully on its own and then attached to every scene prompt, to try to get the same characters each time.
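The "write the description once, attach it to every scene" step can be sketched in a few lines of Python. This is only an illustration of the idea, not part of Flow or Veo: the function names `build_scene_prompts` and `total_seconds` are made up for this example, the recap and scene strings are placeholders, and the 8-second clip length is taken from the text above.

```python
CLIP_SECONDS = 8  # clip length per generation, as stated in the text


def build_scene_prompts(recap: str, scenes: list[str]) -> list[str]:
    """Prefix every scene prompt with the same character/setting recap."""
    header = "Character & Setting Details (Recap):\n" + recap.strip()
    return [f"{header}\n\n{scene.strip()}" for scene in scenes]


def total_seconds(scenes: list[str], clip_seconds: int = CLIP_SECONDS) -> int:
    """Total runtime if every scene becomes one clip of clip_seconds."""
    return len(scenes) * clip_seconds


# Placeholder text: replace with your own detailed descriptions and scenes.
recap = "TEACHER: <full description>\nCLASSROOM: <full description>"
scenes = [
    "TEACHER (as described above) stands behind the lectern.",
    "TEACHER (as described above) pauses. STUDENT A raises a hand.",
]

prompts = build_scene_prompts(recap, scenes)
for number, prompt in enumerate(prompts, start=1):
    print(f"--- Scene {number} ---")
    print(prompt)
```

Each resulting string is what you would paste into Flow for one generation. Keeping the recap in a single variable means any tweak to the character description reaches every scene at once.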
Below is a video I made from 6 different scenes, 48 seconds in total, using this approach. I've put the prompts further down for anyone interested.
As you can see, even though the prompts push the model to produce different clips with the same character, errors still show up: the character's voice and face differ from one generation to the next. Overall the video is passable, not perfect. By contrast, if we use the feature that takes a frame from one video as the reference for the next generated clip, the later clips are more consistent, but they have no sound. Of course this is only a beta, so I'm sure Google will update it before long to let us use this feature with Veo 3. Once that happens, making long videos with consistent characters will be much simpler.
Prompts for the short video above, for anyone who wants to try them:
Character & Setting Details (Recap):
Scene 1 Prompt
SPECS: Camera: ARRI Alexa Mini, Lens: Cooke S4/i 32mm.
NO CAPTIONS OR TEXT.
TEACHER (as described above) stands with imposing calm behind the worn wooden lectern, his posture erect as he surveys the sparsely filled room. CLASSROOM (as described above). The single fluorescent tube flickers erratically, casting long, uneasy shadows from the bizarre diagrams and the few students.
WIDE SHOT: Establishes the entire strange classroom: TEACHER at the lectern, the eerie wall decor harshly illuminated, and a few apprehensive STUDENTS scattered at old wooden desks. The sheet-draped mannequin is a silent, unsettling figure in a darker corner. MEDIUM CLOSE-UP: On the TEACHER. His face is partly in shadow due to the overhead light, his sunglasses reflecting the flickering room. A tiny, almost imperceptible smirk is present.
DIALOGUE: TEACHER (authoritative, clear Southern Vietnamese accent, translate to Vietnamese and say): “Welcome to ‘Survival When Encountering Supernatural Entities.’ First lesson: When a ghost scares you, absolutely do not scream.”
A loose ceiling tile visibly trembles, dislodging a small shower of dust. AUDIO: Persistent low electric hum and occasional sharp CRACKLE from the fluorescent light, the faint sound of falling dust, Teacher’s distinct voice. KEY ELEMENTS: Mysterious teacher, detailed eerie classroom, supernatural rules, unsettling atmosphere, strong opening.
Scene 2 Prompt
SPECS: Camera: RED Komodo, Lens: Zeiss Supreme Prime 29mm.
NO CAPTIONS OR TEXT.
TEACHER (as described above) pauses, letting his first rule sink in, then offers his peculiar reasoning. STUDENT A (20s, wearing a simple, slightly worn university jacket, looking genuinely anxious and pale) slowly raises a trembling hand. CLASSROOM (as described above). The flickering light seems to pulse, making the shadows writhe. One of the jars on a high shelf appears to subtly rattle.
MEDIUM SHOT: TEACHER leaning slightly over the lectern, his gloved hands (if wearing them, otherwise bare) pressing down on its surface as he explains. He then turns his head with deliberate slowness towards STUDENT A. CLOSE-UP: STUDENT A’s face, eyes wide with a mixture of fear and morbid curiosity. They gulp audibly before asking their question.
DIALOGUE: TEACHER (grave, Southern Vietnamese accent, translate to Vietnamese and say): “Why? Because it will make you… hoarse! Very bad for the vocal cords!” STUDENT A (timid, voice slightly shaky, translate to Vietnamese and say): “Teacher, if a ghost grabs my leg, what should I do?”
The fluorescent light emits a prolonged, louder BUZZ, then briefly dims before returning to its erratic flickering. AUDIO: Teacher’s emphatic voice, Student A’s hesitant voice, the distinct, prolonged BUZZ and dimming of the light fixture. KEY ELEMENTS: Dark humor, absurd logic, student interaction, building tension, consistent eerie environment, sensory details.
Scene 3 Prompt
SPECS: Camera: Sony Venice, Lens: Panavision Primo 40mm.
NO CAPTIONS OR TEXT.
TEACHER (as described above) straightens up from the lectern, a subtle shift in his posture suggesting he relishes this question. He steps out to the small open space before the desks. CLASSROOM (as described above). The shadows cast by the TEACHER elongate and distort dramatically as he moves. The mannequin in the corner seems, for a split second, to have its head tilted.
MEDIUM LONG SHOT: TEACHER, now center stage in the small clearing, addresses the class. He then begins his demonstration with surprisingly fluid and precise hand gestures, miming the act of tickling empty air with intense focus. CLOSE-UP: On the TEACHER’S face (from the nose down, sunglasses still prominent) showing the serious, almost scientific concentration he applies to the tickling mime.
DIALOGUE: TEACHER (assured, a hint of theatricality, Southern Vietnamese accent, translate to Vietnamese and say): “Very simple! Immediately… cù lét lại nó! Ma cũng biết nhột như ai thôi! Đảm bảo nó sẽ buông ra ngay và cười không nhặt được mồm!” (Accompanies this with the vigorous, precise tickling mime).
A faint, dry, rustling sound, like laughter made of dead leaves, is heard from a dark corner of the room. AUDIO: Teacher’s confident and slightly playful voice, the swish of his suit fabric, the unsettling, dry rustling laughter. KEY ELEMENTS: Physical comedy, absurd solution, Teacher’s unwavering bizarre confidence, unsettling subtle sound, focused character action.
Scene 4 Prompt
SPECS: Camera: Canon C300 Mark III, Lens: Canon CN-E 35mm T1.5.
NO CAPTIONS OR TEXT.
STUDENT B (20s, dressed in a dark, faded band t-shirt, arms crossed initially, now leaning forward on their desk with a challenging glint in their eye) interjects. TEACHER (as described above) turns smoothly to face Student B, listening with polite, unwavering attention. CLASSROOM (as described above). The minimal light from the grimy windows is now almost non-existent. The room is increasingly dependent on the single, failing fluorescent bulb.
MEDIUM SHOT: STUDENT B delivering their question with a clear, skeptical tone. The TEACHER stands patiently, his silhouette framed against a particularly grotesque diagram on the wall. MEDIUM CLOSE-UP: TEACHER, offering a slow, deliberate nod to Student B. His sunglasses reflect the student’s challenging face. A slight, almost condescending smile touches his lips before he speaks.
DIALOGUE: STUDENT B (challenging, firm voice, translate to Vietnamese and say): “Còn nếu ma hiện hình mặt đầy máu me thì sao thầy?” TEACHER (smooth, unperturbed, a tone of explaining something obvious to a child, Southern Vietnamese accent, translate to Vietnamese and say): “À, trường hợp này cần sự tinh tế. Hãy nhẹ nhàng hỏi: ‘Anh/chị ơi, mình xài app filter gì mà ‘real’ quá vậy? Chỉ em với!'”
A distant, mournful howl (dog or something more ambiguous) echoes from outside the building. AUDIO: Student B’s challenging voice, Teacher’s smooth, condescendingly patient tone, the distant, mournful HOWL. KEY ELEMENTS: Student challenge, more satirical advice, escalating absurdity, Teacher’s unshakable composure, ominous external sounds.
Scene 5 Prompt
SPECS: Camera: Panasonic Lumix S1H, Lens: Leica SL 50mm f/1.4.
NO CAPTIONS OR TEXT.
TEACHER (as described above) pushes off lightly from the lectern he had momentarily leaned against, beginning a slow, deliberate pace across the front of the classroom, addressing all students. CLASSROOM (as described above). The room is now very dim. The flickering fluorescent light casts stark, moving shadows, making the eerie diagrams seem to writhe on the walls. The air feels colder.
MEDIUM SHOT: TEACHER pacing, his dark suit making him almost blend into the deeper shadows at the edge of the light’s reach, then re-emerging. He makes a sharp, decisive zigzag motion with his hand as he speaks. CLOSE-UP: On a student’s notebook, where they have shakily scrawled “CHẠY ZÍC ZẮC???” next to a crude drawing of a ghost.
DIALOGUE: TEACHER (voice now brisk and commanding, a shift in energy, Southern Vietnamese accent, translate to Vietnamese and say): “Và nhớ nhé, khi bị ma rượt, đừng chạy đường thẳng! Hãy chạy theo đường ‘zíc zắc’. Ma nó chóng mặt là nó bỏ cuộc ngay!”
The building groans, a deep, structural sound, as if settling or under strain. AUDIO: Teacher’s firm, instructive voice, the sound of his footsteps on the old floorboards, the deep GROAN of the building. KEY ELEMENTS: Further absurd advice, building atmosphere, dynamic movement of Teacher, tangible student reaction (notebook), sense of environmental instability.
Scene 6 Prompt
SPECS: Camera: Blackmagic Pocket Cinema Camera 6K Pro, Lens: Sigma Cine 35mm T1.5.
NO CAPTIONS OR TEXT.
TEACHER (as described above) stops his pacing directly under the weakest point of the flickering fluorescent light. He clasps his hands behind his back, assuming a formal, almost final stance. CLASSROOM (as described above). The room is steeped in gloom. The faces of the students are pale and wide-eyed in the unsteady light. The mannequin seems to have its head turned directly towards the Teacher.
CLOSE-UP: On the TEACHER’S face, specifically his mouth and the lower rim of his sunglasses. His expression is serious, almost grave, as he delivers the homework. The failing light flickers intensely across his features. EXTREME CLOSE-UP: The filament inside the fluorescent tube sputtering violently, glowing erratically.
DIALOGUE: TEACHER (tone becoming slightly more conspiratorial, yet firm, Southern Vietnamese accent, translate to Vietnamese and say): “Bài tập về nhà: Tối nay mỗi người tự tắt đèn ở một mình 15 phút, nếu có gì ‘vui’ thì mai lên chia sẻ kinh nghiệm.”
The fluorescent light emits a final, loud POP and ZAP, then DIES COMPLETELY, plunging the room into absolute darkness. A collective, sharp GASP from the students. AUDIO: Teacher’s distinct voice delivering the ominous homework, the loud POP and ZAP of the light, the collective student GASP, followed by sudden, heavy silence and perhaps a single, terrified whimper. KEY ELEMENTS: Ominous homework assignment, dramatic lighting failure, cliffhanger ending, heightened sensory impact (sound and sudden darkness), peak suspense.
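The six prompts above all carry the same labelled parts (SPECS, NO CAPTIONS OR TEXT, DIALOGUE, AUDIO, KEY ELEMENTS). If you write your own set, a small check like the one below can catch a scene where a part was left out before you spend a generation on it. This is a hypothetical helper for illustration only: the marker list is simply read off the prompts above, and the name `missing_markers` is an assumption, not anything Flow or Veo requires.

```python
# Labels that appear in every scene prompt above (assumed to be the desired template).
REQUIRED_MARKERS = (
    "SPECS:",
    "NO CAPTIONS OR TEXT.",
    "DIALOGUE:",
    "AUDIO:",
    "KEY ELEMENTS:",
)


def missing_markers(prompt: str, required: tuple[str, ...] = REQUIRED_MARKERS) -> list[str]:
    """Return the labels from `required` that do not occur in `prompt`."""
    return [marker for marker in required if marker not in prompt]


# A shortened example prompt that forgot its KEY ELEMENTS line.
example = (
    "SPECS: Camera: RED Komodo.\n"
    "NO CAPTIONS OR TEXT.\n"
    "DIALOGUE: TEACHER: ...\n"
    "AUDIO: low electric hum."
)
print(missing_markers(example))  # ['KEY ELEMENTS:']
```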