Notes and Sketches: Illustration to Video

Trying out another image-and-text-to-video generator: ToonCrafter (https://huggingface.co/spaces/Doubiiu/tooncrafter)


I’m trying it because image-to-video is the main way I instinctively want to work with AI-generated video. Results are all over the map so far.

I’ve been casually comparing outputs from AI video generation tools like SORA, Krea, Luma, Stable Diffusion, and Runway for the same process. I’m interested in new Hugging Face Spaces as well.

I am highly opinionated about imagery style and take a lot of time to pull out specific details, as I would when illustrating. It would be interesting if an AI could instantly bring that specific style to life as an animated video while retaining all of its nuance: the exact line quality, the genre, and the time-period details.

I don’t think it’s necessarily possible to capture the nuances of an entire genre in one or two shots (and even then, I would want to editorialize it further anyway), but I want to know what is possible.

So far it hasn’t been that straightforward or instant, but I haven’t tried very formally either.

I’m moving toward more LoRA-level (low-rank adaptation) work, to control everything these tools let me control visually.

Here are the two images I created that I wanted the AI to use as keyframes and interpolate between. They were made during a style exploration and weren’t really meant to be literal keyframes. I was just curious what would happen if I threw them together quickly, to see if, by chance, that would be enough prep to get a ballpark good result.




Here is the video storyboard prompt:

“A whimsical animation featuring a charming white bunny driving a pink vintage car through a delightful countryside landscape. Start with the bunny happily driving along a narrow winding path surrounded by blooming flowers, lush green hills, and a tree with red apples. The bunny's fluffy fur gently sways in the breeze as it grips the steering wheel. Birds flutter by, and a cheerful sun smiles down from the sky. Gradually, the bunny's path curves past additional rolling hills, more vibrant plants, and another apple tree. The car's wheels softly turn, and its shiny pink body reflects the warm sunlight. Butterflies flit by, and a gentle breeze animates the foliage. The final scene shows the bunny arriving near a calm countryside hill, with flowers gently swaying and the bunny's car coming to a stop”

Here’s the video output:


My quick takeaway: it distinguished the background elements, foreground elements, and motion direction in one shot using the tool’s defaults. I didn’t set it up or train it extensively, or select a model or mode; I just rolled with the fastest default.

Here’s a SORA comparison that used the same images but a different storyboard prompt.


Rough comparisons using illustration-to-video:

So far, I’ve found Luma Dream Machine, on its default settings, capable of fairly sophisticated cartoon-illustration image-to-video. It throws in a lot of motion detail, and the style of the keyframe illustrations comes through in a pretty detailed way. I haven’t tried to perfect it yet; this was just a quick take-the-temperature test. It would be interesting if you didn’t need to perfect it because it came out polished. That would also be a little further nudge toward dystopia, though. https://www.instagram.com/reel/C_9XTrJShk6/?utm_source=ig_web_copy_link&igsh=MzRlODBiNWFlZA==

From what I’ve seen of Krea, I might be able to control more of the elements in a fairly repeatable, organized way, without maintaining the tools on my machine, running a Colab notebook, installing models, or waiting long periods between outputs. (I am swimming through that stuff too; it just requires more prep and isn’t necessarily visual-first, so it involves a lot of mode switching instead of flow.)

Here’s a roughly similar comparison in SORA, although it uses one illustration keyframe instead of stitching two images together, and no storyboard prompt text. https://www.instagram.com/reel/DG5-lONAMgp/?utm_source=ig_web_copy_link&igsh=MzRlODBiNWFlZA==

Another SORA comparison, using one illustration and no prompt: https://www.instagram.com/reel/DGjeIo5pFPI/?utm_source=ig_web_copy_link&igsh=MzRlODBiNWFlZA==

I haven’t tried the latest Krea tools this year, but last year I tried to get Krea to accurately, or at least recognizably, depict an animated sequential chart of illustrated characters. If you can control that output accurately, you can use it to reliably explain a wide variety of topics. From day one I wanted to see whether AI generators could produce character sheets, hold consistency, and then animate them in a controlled way, with the same goals you’d have if you drew it yourself for feedback, review, iteration, and phased development. The output was interesting. https://www.instagram.com/reel/DGcjYv_PUeB/?utm_source=ig_web_copy_link&igsh=MzRlODBiNWFlZA==


To be continued (time permitting)…


—————————

GPT response:

This is a fantastic breakdown of your process so far. It’s like a behind-the-scenes look at someone testing the limits of AI animation tooling with an illustrator’s eye. The nuance you’re looking for (line quality, genre authenticity, time-period fidelity) is exactly the kind of precision most models currently struggle with unless heavily guided or fine-tuned, and even then it’s hit-or-miss.


What’s really interesting is how you’re treating it like exploratory sketching: throwing things together quickly, checking if the output hits a baseline, and only going deeper if it shows promise. That’s a very healthy and creative way to approach such a fast-evolving space.


A few things you might find worth exploring further:


1. **ToonCrafter + LoRA**: Combining the quick-start nature of tools like ToonCrafter with a custom LoRA trained on your style (or even just images with similar qualities) might give you that editorialized control you’re after. Have you tried creating or using LoRAs for vintage/cartoony line styles?


2. **Krea’s layered control**: Since you mentioned illustration-to-video + chart/character sheet control, Krea seems like a solid candidate, especially now that it's matured a bit. The newer updates allow more structured manipulation (per-layer control, element placement), which could align well with a storyboard-minded process.


3. **Consistency in animation**: Tools like Pika and Runway have started to experiment with temporal coherence. They still lack precision over fine style consistency, but if you’re able to feed in an annotated character sheet or multiple camera angles of the same scene, that might help.


4. **Zero-shot genre recreation**: Might be worth testing a setup where you combine visual input (illustration + a LoRA) with a style description prompt like, *“in the style of a 1950s educational cartoon, clean cel-line shading, Technicolor palette”* to see if genre nuance improves.


If time allows, I’d love to see the two images you used as your rough keyframes—can you upload them here?


Also curious: are you mainly trying to get single-scene atmospheric animations, or are you aiming for multi-shot narratives with style continuity?