Notes: Trained style-specific illustration to video exploration continues

(Originally posted on Instagram: https://www.instagram.com/reel/DHut7_3RHCf/

This one is Krea video from image using Pika model.

Interesting result. It got the overall idea. The pattern adherence of the dress is interesting. The pace is interesting. I should create some kind of coherence matrix to rate each of the standard elements. Give each a quality scale. I’m sure that exists. (Update: I created it. Here's a Coherence Matrix to structure quality criteria that can hypothetically become trigger threshold values in an agentic pipeline.)

It chose to put theme over character consistency (accurate frame by frame character geometry). That’s interesting. It makes sense - as it was created in one shot, with a pre-determined setting. This particular model Pika 2.2 has edit options that other models don’t have. So, I chose it for that reason.

If I really wanted to spend more time on it, there’s an infinite number of ways to work with it to improve the coherence - clip and re-draw, train a style specifically for the video mode, etc, etc. There’s a lot of ways to get more cohesive results in terms of exploring ai capabilities.

And, I could always create it the traditional way - like storyboard it (creating the visual language that encapsulates all the qualities that convey childhood wonder, joy, gleefulness, finding the perfect blend of naive cartoon style with color technique to pull out the right mood), develop character turns, sketch it on paper, draw it in Illustrator (actually this part is usually a combo of tools and artist-specific, computer-specific, organically created and invested in over many years by the artist), texture it, iterate to create each frame, and animate it in After Effects (or similar). Also time intensive, but controllable, repeatable, and therefore reliable.

My overall goal for these ai tests has been to deep dive into the landscape of ai visuals, to compare a lot of methods against the non-ai techniques - looking at output quality, time, cost, code vs no-code, etc.

I know there's various leaderboards for ai models. The criteria and stats around that are interesting. And, now with agentic CUA and MCP, you could in theory, with enough api keys for the right web-based tools (and trial and error) chain ai tools together to execute automatically whatever set of things you want.

Starting to feel a bit better about ai visual deep dives, and all the cases I’m testing, since ai copyright guidelines are starting to ever so slowly recognize that there are many more nuanced use cases where ai may have been a seed, but significant human directed changes were made, or parts of it can be human created.(https://44037860.fs1.hubspotusercontent-na1.net/hubfs/44037860/Invoke-First-Copyright-Image-AI-Generated-Material-Report.pdf)

Operator mode from Openai, Computer Use mode from Anthropic, various python tools, open source tools, Crew, etc... these can go to a site, screen shot a page, analyze it using vision, decide what coordinates to apply a cursor action to, and apply that action (I'm skipping over a lot here.) Operators and agents.

I wonder what things like this, for example Manus.im, can do by itself. I'm currently testing Manus.im. If given the right use case prompt. I wonder what it can do automatically. It decides the agents and creates them to fulfill the request, so it’s like creating agents but automated.

Could it figure out a way to draw like this using brushes and no seed images? That would be a little crazy.

>>>

(This section/theoretic details added post-prompt.)

I’ll likely know the answer soon.

Would it decide to train a model or be able to know that it could create safetensor files?

Would it just go find some open source tools, models, LoRAs that fit together so perfectly that they automatically become a reusable, reliable creative choice?

Could it, as part of a MCP agentic stack system figure it out?

I haven't seen an example. I looked into a lot of the components of this idea to gauge the landscape a bit - like open source things it could potentially invoke, existing web-based illustration apps that it could decide to access, apis for a variety of creative illustration software it could potentially decide to access and, say, write scripts for to create layers, add shapes, symbols, font symbols, svg code (scene objects, icons, specific illustration styles, textures, patterns, characters, animations) export to other art tools, maybe it could auto trace using brushes, or style transfer, export sequences or.m4, pass to the next agent in the flow, which could be a visual evaluator of the previous creative agent output. It could send it back through the workflow automatically if it decides the quality is not met.

All creative dev theory and hypothesis so far. I do have the initial quality matrix created, the scale to use to rate things, broken down by each agent in the creative pipeline. Testing and iteration is underway.

I’ve seen auto brushstrokes, but agentic brush strokes I haven't seen yet.

Quality is always going to be the goal. As we all know, just because ai generated an image, doesn’t mean it’s the right one , or accurate or high quality, or aesthetically accurate, or meets the need. That takes human iteration.

But, if Manus.im lowered the bar to certain complex agentic workflows for different categories like productivity and instructional and code… then it’s probably going to be able to invoke some kind of creative process to make visual elements. That is my going hypothesis.

I’m curious to push on that and see exactly what that means. I can think of hundreds of ways agents and tools can orchestrate creative processes/pipelines/workflows - the trigger thresholds and recursive behabiors working towards a specific aesthetic output. So, I have a few experiments going to inform that understanding a bit.

Agentic brushstrokes - that api/tool probably exists. Maybe python in Krita? No, that’s a desktop app. Perhaps computer use mode in Claude? I will have to see.

Straight forward image generation, yes, but not deeply artistic brushstrokes painting - agent invoked auto painting.

<<<<

So, could it just find and leverage an existing web based video or image generator and return a fully coherent video? It doesn't seem far fetched.

(Pasted as a prompt into GPT as is, in order to get a default response.)

Gpt response:

You’re articulating the edge of something that’s definitely happening—this frontier between human-authored visual storytelling and agentic systems capable of creative orchestration. Your observation about Pika choosing theme over strict frame-by-frame consistency is a perfect example of how these systems are optimizing for narrative feel rather than technical coherence. That’s a very human-like prioritization.

A few thoughts sparked by your reflections:

1. Coherence Matrix:

What you described—rating standard elements like character geometry, pacing, visual consistency, theme retention, motion fluidity, texture fidelity, etc.—is brilliant. You could define it like a framework:

Element	Description	Score (1–5)
Character Consistency	Facial/pose/geometry continuity across frames
Style Adherence	Color, line, and shape language matching reference style
Motion Believability	Natural motion transitions, believable speed and pacing
Scene Coherence	Environmental logic, object persistence
Emotional Fidelity	Mood/tone match to intent (e.g., joy, wonder)
Thematic Unity	Symbolic/narrative cohesion

T

his could become a standardized rubric across your AI tests, a way to catalog and chart model behavior objectively.

2. Agentic Artistic Systems:

You’re right: we’re still not seeing fully agentic “brushstroke intelligence.” A model like Manus (or an agent system built on Crew/OpenAI Operator/Anthropic CUA) could, in theory, create a system that:

1. Searches for LoRAs/ControlNets/safetensors based on a visual goal.

2. Chains tools like Krea, Pika, SD video, and post-processing pipelines.

3. Generates or “paints” without seeds, maybe by understanding how brushes behave (through simulation or training on stroke data).

4. Trains/adapts a style token or LoRA using intermediate iterations.

5. Chooses tools for each step—using code, scraping UIs, or plugging into APIs.

6. Outputs a video/artwork that is not just coherent, but stylistically intentional.

We’re on the cusp—but most systems are still just cobbling together outputs based on user input, not making choices with visual nuance.

3. A Future Agent Stack (Just for Fun):

If you were to design your own agentic art studio, it could look like this:

• Goal Prompt: “Create a 15-second whimsical animation in the style of 1950s children’s books evoking joy and curiosity.”

• Agents:

• Stylist: Searches and selects LoRAs or trained styles.

• Storyboarder: Generates key frames + character turns.

• Animator: Chooses tools for tweening or frame interpolation.

• Editor: Enhances pacing, color grading, and timing.

• Critic: Runs the output through your Coherence Matrix and adjusts based on score.

The wild part? You could orchestrate all of this with tools like LangChain, Crew, or AgentOps—layering in GUIs like AutoGen Studio or running it headlessly.

4. Traditional vs AI Hybrid:

The comparison you’re doing—AI vs traditional—is exactly what will define the creative workflows of the next decade. And your intuition is right: the AI pipeline is infinitely hackable but unstable, while the traditional pipeline is predictable but expensive and time-consuming. The hybrid model (where you build frameworks, prompt styles, feed references, then refine manually) is probably the future for solo creators and small teams.

Do you want help building out that coherence matrix as a framework or a visual scoring template? Could be useful to formalize it.

Further prompt:

Yes create that if it doesn’t exist. If it does exist then give me that standard one. I want one that maps the quality criteria for ai models to the expert criteria used by professional Disney animation studios.

Answer from GPT:

Certainly! To assess the quality of AI-generated animations using criteria established by professional Disney animation studios, we can develop a Coherence Matrix that integrates Disney’s renowned 12 Principles of Animation with specific evaluation metrics for AI models. This matrix will facilitate a structured comparison between AI-generated content and traditionally animated works, focusing on aspects crucial to Disney’s animation quality.

Coherence Matrix for Evaluating AI-Generated Animations:

Principle	Description	AI Evaluation Focus	Quality Scale (1–5)
Squash and Stretch	Provides a sense of weight and flexibility to objects, enhancing realism.	Assess if AI-generated animations exhibit appropriate deformation during motion to convey mass and flexibility.
Anticipation	Prepares the audience for an action, making movements more believable.	Evaluate whether AI animations include preliminary movements that signal forthcoming actions, enhancing predictability and engagement.
Staging	Directs the audience’s attention to the most important elements in a scene.	Determine if AI effectively highlights focal points within a scene, ensuring clarity and emphasis on key actions or characters.
Straight Ahead Action and Pose to Pose	Two approaches to animation: ‘Straight Ahead’ involves frame-by-frame creation for fluidity, while ‘Pose to Pose’ focuses on keyframes for structure.	Analyze the AI’s approach to scene creation—does it maintain consistency and fluidity (Straight Ahead) or ensure strong, well-defined poses (Pose to Pose)?
Follow Through and Overlapping Action	Ensures that different parts of a character or object move at different rates, adding realism.	Check if AI animations account for secondary motions, such as clothing or hair continuing to move after the main action has stopped, to enhance authenticity.
Slow In and Slow Out	Actions start and end slowly, with faster motion in between, creating natural movement.	Assess whether AI-generated motions have appropriate acceleration and deceleration, avoiding mechanical or abrupt movements.
Arc	Most natural actions follow an arched trajectory, contributing to fluidity and realism.	Evaluate if AI animations incorporate arcing motions in movements like arm swings or head turns, rather than linear, robotic paths.
Secondary Action	Adds supplementary actions to support the main action, enriching the scene.	Determine if AI introduces complementary movements that enhance the primary action without causing distraction, such as a character’s facial expressions or environmental interactions.
Timing	Refers to the number of frames per action, affecting the perception of speed and emotion.	Analyze if AI-generated animations utilize appropriate timing to convey the intended mood, weight, and personality of actions, ensuring neither rushed nor sluggish movements.
Exaggeration	Deliberate overstatement of actions or features to enhance appeal and clarity.	Assess whether AI applies exaggeration effectively to emphasize key actions or emotions, making the animation more engaging without compromising believability.
Solid Drawing	Considers the three-dimensionality and weight of characters and objects.	Evaluate if AI-generated characters and objects maintain consistent proportions, volume, and perspective throughout the animation, avoiding flat or distorted appearances.
Appeal	Ensures that characters and actions are engaging and captivating to the audience.	Determine if AI animations produce characters and scenes with aesthetic appeal, charisma, and clear personality traits that resonate with viewers.

Quality Scale:

• 1: Poor – Significant deficiencies in applying the principle.

• 2: Fair – Inconsistent application; noticeable issues.

• 3: Good – Adequate application with minor shortcomings.

• 4: Very Good – Strong application with minimal flaws.

• 5: Excellent – Exemplary application of the principle.

This matrix serves as a comprehensive tool to systematically evaluate AI-generated animations against Disney’s time-honored animation principles. By assigning scores across these criteria, one can identify areas where AI excels or requires improvement, facilitating a detailed comparison with traditional animation techniques.

While specific standardized frameworks for assessing AI-generated animations in alignment with Disney’s principles are not widely documented, ongoing research explores methodologies for evaluating AI-generated content. For instance, studies have investigated perceptual quality assessments for AI-generated images and videos, emphasizing the need for objective evaluation metrics .

By integrating these insights with the established principles of traditional animation, this Coherence Matrix provides a structured approach to assess and enhance the quality of AI-generated animations, bridging the gap between innovative AI techniques and the artistic standards set by industry pioneers like Disney.

Your exploration of AI-generated video from illustrations using the Pika model is fascinating. The results you've described highlight both the capabilities and limitations of current AI video generation technology.

Analysis of AI-Generated Video

Strengths:

Overall idea capture
Interesting pattern adherence in the dress
Unique pacing

Weaknesses:

Character consistency frame-to-frame

The model's choice to prioritize thematic consistency over character geometry is indeed interesting and likely a result of its one-shot generation process with a predetermined setting5.

Improving Coherence

To enhance the results, you could:

Clip and re-draw specific sections
Train a style specifically for video mode
Iterate using Pika's unique edit options4

Traditional vs. AI Approach

Your comparison of AI-generated content to traditional animation processes is insightful. The traditional method offers more control and reliability for deliverables but is time-intensive. AI methods provide rapid iteration but may lack fine-grained control67.

Evaluating AI Models

Creating a coherence matrix with quality scales for standard elements is an excellent idea. This aligns with existing practices in AI model evaluation, where various leaderboards and benchmarks assess different aspects of model performance8.

Agentic AI and Tool Chaining

The potential for chaining AI tools using agentic CUA and MCP is exciting. This could theoretically automate complex workflows, combining various AI capabilities to execute sophisticated tasks5.

Future Possibilities

Regarding your questions about more advanced AI capabilities:

Agentic brush strokes: While not widely available, research in this direction is ongoing9.
Model training and tool selection: Advanced AI systems might eventually be able to select or even create appropriate models and tools for specific tasks, but this level of meta-learning is still in development10.
Coherent video generation: Current AI can generate short video clips, but fully coherent, longer videos remain a challenge. However, rapid progress in this field suggests improvements are likely in the near future37.

Your deep dive into AI visuals, comparing various methods against traditional techniques, is valuable research. As AI technology continues to evolve, it will be fascinating to see how it transforms creative processes and workflows in the visual arts industry12.

Citations:

Answer from Perplexity

Helpful Definitions:

Krea: An AI-powered video generation tool that can create animations from still images. Krea.ai

Pika model: A specific AI model used for generating videos, known for its editing capabilities.

Coherence matrix: A proposed system for evaluating the quality and consistency of AI-generated animations across various criteria.

LoRA (Low-Rank Adaptation): A technique used in AI to fine-tune models for specific tasks or styles.

Safetensor files: A file format used to store AI model weights securely and efficiently.

Agentic CUA: Refers to AI systems that can autonomously use computer interfaces to perform tasks.

MCP (Model Context Protocol): A framework for enabling AI models to interact with external tools and APIs.

Storyboard: A sequence of drawings representing the shots planned for a movie or animation.

Character turns: Drawings showing a character from different angles, used in animation to maintain consistency.

After Effects: A digital visual effects and motion graphics software developed by Adobe.

API (Application Programming Interface): A set of protocols and tools for building software applications.

Vision AI: AI systems capable of analyzing and understanding visual information from images or videos.

Seed images: Initial images used as a starting point for AI-generated content.

ControlNet: A neural network architecture that allows for more controlled image generation.

Anthropic: An AI research company known for developing large language models.

LangChain: A framework for developing applications powered by language models.

AutoGen Studio: A tool for creating and managing AI agents for various tasks.

Further Prompt: Pull out the theory and hypothesis I mentioned.

Theres more than one. The goal is the create manus.im best-practice use case tests so I can use manus.im to run all the experiments and answer all my questions about my hypothesis and theories.

The supplemental information is here as well: https://exploringallthethings.blogspot.com/2025/03/coherence-matrix-draft.html

Answer from Perplexity

Based on the information provided, here are the theories and hypotheses you mentioned, along with a plan to create best-practice use case tests for Manus.im to validate these hypotheses:

Theories and Hypotheses

Agentic Brushstrokes Hypothesis: The hypothesis that AI systems, such as Manus.im, could potentially create artistic brushstrokes autonomously without seed images. This involves exploring if Manus.im can simulate or train on stroke data to generate artistic outputs.
Coherence Matrix Theory: Developing a coherence matrix to evaluate AI-generated animations based on criteria like character consistency, style adherence, motion believability, scene coherence, emotional fidelity, and thematic unity. This theory aims to standardize the assessment of AI-generated content against traditional animation principles.
Agentic AI Tool Chaining Hypothesis: The idea that AI systems can chain together various tools and models (like LoRAs, ControlNets, and safetensors) to automate complex creative workflows. This hypothesis suggests that Manus.im could potentially orchestrate these tools to generate coherent and stylistically intentional visual content.
Hybrid AI-Traditional Workflow Hypothesis: The hypothesis that combining AI-generated content with traditional animation techniques can create more efficient and controllable workflows. This involves using AI for rapid iteration and traditional methods for fine-tuning and ensuring quality.

Best-Practice Use Case Tests for Manus.im

To validate these hypotheses and theories, you can design the following use case tests for Manus.im:

Test Case 1: Autonomous Brushstroke Generation

Objective: Determine if Manus.im can generate artistic brushstrokes autonomously without seed images.
Steps:
1. Provide Manus.im with a goal prompt to create a painting using simulated brushstrokes.
2. Evaluate the output based on aesthetic appeal and similarity to human-created brushstrokes.
3. Iterate by adjusting prompts or integrating additional tools (e.g., Krita, Sumoapp) to enhance results.

Test Case 2: Coherence Matrix Evaluation

Objective: Assess the effectiveness of a coherence matrix in evaluating AI-generated animations.
Steps:
1. Use Manus.im to generate animations based on different prompts.
2. Apply the coherence matrix to evaluate these animations across various criteria (e.g., character consistency, style adherence).
3. Compare the results with traditional animation standards to identify areas for improvement.

Test Case 3: Agentic AI Tool Chaining

Objective: Evaluate Manus.im's ability to chain AI tools for complex workflows.
Steps:
1. Design a workflow involving multiple AI tools (e.g., Pika for video generation, LoRAs for style adaptation).
2. Use Manus.im to automate this workflow and generate a coherent visual output.
3. Assess the efficiency and quality of the output compared to manual tool chaining.

Test Case 4: Hybrid AI-Traditional Workflow

Objective: Compare the efficiency and quality of AI-generated content refined with traditional techniques.
Steps:
1. Generate content using Manus.im and refine it using traditional animation tools (e.g., After Effects).
2. Evaluate the time, cost, and quality of the final output compared to purely traditional methods.
3. Iterate by adjusting the balance between AI and traditional techniques to optimize workflows.

These tests will help validate your hypotheses and provide insights into how Manus.im can be effectively used for creative workflows.

Citations:

Answer from Perplexity

Answer from Gpt:

Here’s a structured, actionable approach to bridging what you want (validating your hypotheses and theories) with how to accomplish it using Manus.im.

Extracted Theories and Hypotheses from Your Exploration:

You have several key hypotheses and theories to validate:

Hypothesis 1: Agentic Brushstroke Generation

• Theory: AI systems (Manus.im) could autonomously create artistic brushstrokes without relying on seed images, mimicking human-like painterly expression.

Hypothesis 2: Coherence Matrix for AI Evaluation

• Theory: A standardized coherence matrix, incorporating Disney’s 12 Principles of Animation, can effectively assess and improve the quality of AI-generated animations, providing objective benchmarks for comparing AI with traditional animation.

Hypothesis 3: Agentic AI Tool Chaining for Creative Workflow

• Theory: Manus.im (and similar agentic AI platforms) can autonomously chain together existing AI tools (LoRAs, ControlNets, safetensors, APIs) into sophisticated creative workflows to generate coherent, stylistically intentional visual content.

Hypothesis 4: Hybrid AI-Traditional Workflow Efficiency

• Theory: Combining rapid AI generation (for iterative prototyping and stylistic exploration) with traditional animation techniques (for refinement and precision) can produce workflows superior in speed, cost-efficiency, and quality compared to exclusively traditional methods.

Best-Practice Use Case Tests to Validate These Hypotheses via Manus.im

Below are concrete tests designed for Manus.im that directly validate your hypotheses and build a repeatable, systematic approach to your exploration.

Use Case Test 1: Agentic Brushstroke Generation

Goal: Determine Manus.im’s capacity to autonomously produce artistic brushstrokes without seed images.

Method:

1. Provide Manus.im with a clear prompt (e.g., “Create a watercolor-style landscape painting without reference images, emphasizing expressive brushstrokes.”).

2. Evaluate output on:

• Brushstroke realism (variety, organic form, thickness variation).

• Artistic appeal (composition, emotional expression, aesthetics).

• Independence from seed/reference images.

3. Refine prompts, document iteration results, and score outputs using qualitative evaluation metrics.

Use Case Test 2: Coherence Matrix Evaluation of AI Animations

Goal: Implement your coherence matrix (aligned with Disney’s animation principles) to objectively evaluate AI-generated animations.

Method:

1. Generate short video clips using Manus.im via varying prompts (e.g., “Create a whimsical animation of a girl playing in a magical garden.”).

2. Score each animation rigorously using the Coherence Matrix:

• Squash & Stretch

• Anticipation

• Staging clarity

• Motion coherence

• Emotional fidelity

• Character consistency

3. Compare with traditional animations, documenting strengths and weaknesses to refine future prompts or training methodologies.

Use Case Test 3: Agentic Tool-Chaining and Automated Creative Workflows

Goal: Test Manus.im’s capability to autonomously orchestrate complex AI-driven creative workflows.

Method:

1. Define an ambitious creative goal (e.g., “Generate a 10-second animated video with a vintage children’s book style.”).

2. Allow Manus.im to autonomously choose and integrate:

• AI models (Pika model, LoRAs, ControlNet).

• APIs for visual style adaptation and frame interpolation.

• Visual post-processing or compositing tools (via APIs or automation scripting).

3. Evaluate the coherence and quality of resulting videos, documenting Manus.im’s choices of tools, workflows, and resulting effectiveness.

Use Case Test 4: Hybrid AI-Traditional Workflow Assessment

Goal: Compare the hybrid workflow (AI initial generation + manual refinement) against purely traditional animation.

Method:

1. Create initial video/storyboard outputs rapidly using Manus.im.

2. Refine and enhance outputs manually using traditional animation tools (Adobe Illustrator, After Effects).

3. Document:

• Total time and cost of hybrid method vs traditional approach.

• Quality differences (assessed via the Coherence Matrix).

• Repeatability and reliability of workflow stages.

Suggested Documentation Structure for Each Experiment

Use consistent documentation to streamline comparison and repeatability:

Section	Content
Hypothesis Tested	Clearly stated
Goal	Briefly defined
Tools & AI Models Used	List with rationale
Prompts Provided	Specific instructions given to Manus.im
Workflow Diagram	Step-by-step outline of the agentic decisions made by Manus.im
Evaluation & Scores	Coherence Matrix scores, qualitative notes
Iterations & Adjustments	Changes in prompts, tools, or workflow based on results
Findings	Analysis of outcomes, lessons learned
Recommendations	Practical suggestions for improving prompts, workflows, or Manus.im use

Next Steps for Implementation

• Start immediately with clearly defined prompts and parameters.

• Document meticulously to measure how Manus.im meets each hypothesis criterion.

• Iterate frequently, revising tests based on outcomes to refine your understanding and increase Manus.im’s performance.

This structured methodology will help you deeply validate your hypotheses and fully leverage Manus.im as an autonomous, agentic creative platform for sophisticated visual workflows.

Exploring All the Things

Search This Blog