Aneya.ai
A platform for seamless, intuitive GIF generation: real-time magic through multimodal commands
Overview
Our platform rethinks GIF creation by seamlessly merging AI, a voice assistant, and an intuitive user interface, enabling users to generate dynamic GIFs in real time through any combination of text prompts, cursor tracking, and voice recognition. My role involved extensive research into generative AI trends, strategic mapping of the user journey, and crafting intuitive designs, synthesizing the research findings to guide our design strategy.
Duration
4 Months
Sep 2023 - Dec 2023
Advisor
Angus Forbes
Problem
In the fast-paced world of digital art creation, conventional text prompts present significant challenges, limiting users' ability to fully realize their creative vision. These prompts, often rigid and predefined, confine artists within fixed boundaries and discourage exploration and experimentation. For artists striving to push digital artistry further, traditional text prompts can hinder creative potential and inhibit the discovery of new techniques and styles.
Our goal
We wanted to revolutionize GIF creation by designing an immersive platform that seamlessly integrates AI while harnessing the creative expression of users.
The ultimate vision was to empower users with tools to unleash boundless creativity, resulting in GIFs that are not confined to the ordinary but instead become a canvas for individuality and personal expression.
Additionally, our aspiration was to foster a symbiotic relationship between human creativity and AI interaction, seamlessly blending the two to enhance the GIF creation process.
Solution
Multimodal inputs - Diversifying creative inputs
Users can get creative with several inputs at once: they can type or select ideas suggested by the text prompt, upload sketches or photos, draw on the canvas, or simply speak to Aneya's voice assistant.
These features work together seamlessly, letting users switch between sketching, getting suggestions via text, or using voice commands, making creative expression easy and dynamic.
Real-time output - Minimizing creative lag
With our real-time output feature, users can instantly see the impact of their inputs on the GIF in the adjacent output panel.
This seamless integration enhances user-friendliness, allowing quick adjustments based on the generated output. The dynamic, intuitive process ensures a smoother and more engaging creative experience by minimizing the delay between prompts and generated outputs.
Prompt through voice - "Hey Aneya, can you show the dog playing in the swimming pool?"
Aneya updates the GIF to show the dog playing in the swimming pool.
Cursor tracking - Enabling natural ways of interaction
Users can simply express their desired animation using voice commands while drawing on the canvas. For example, saying 'from here to here' guides Aneya to track the cursor and understand the animation direction.
This seamless integration of voice and cursor tracking enhances the creative process, making it more user-friendly and accessible for users to bring their animations to life.
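As an illustration of the idea, here is a minimal, hypothetical sketch of pairing the two spoken "here" tokens with timestamped cursor samples. The names and data structures are mine, not Aneya's actual implementation, and it assumes the speech recognizer returns word-level timestamps.

```python
from dataclasses import dataclass

@dataclass
class CursorSample:
    t: float  # timestamp in seconds
    x: float  # canvas x coordinate
    y: float  # canvas y coordinate

def animation_vector(samples: list[CursorSample],
                     here_times: tuple[float, float]) -> tuple[float, float]:
    """Match each spoken 'here' to the cursor sample nearest in time,
    then return the start-to-end direction of the requested animation."""
    def nearest(t: float) -> CursorSample:
        return min(samples, key=lambda s: abs(s.t - t))
    start, end = nearest(here_times[0]), nearest(here_times[1])
    return (end.x - start.x, end.y - start.y)
```

The resulting vector, together with the transcribed command, gives the generation backend both what to animate and which direction to move it in.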
Integration of Voice and cursor
Intuitive Experience
Process
Research
Secondary research
We conducted a thorough review of research papers and online articles, analyzing current trends in generative AI and text prompting to inform our approach.
In exploring how to enhance GIF creation, we identified key opportunities from this review of existing research and trend analysis: integrating voice commands for user convenience, enabling dynamic theme customization for personalization, introducing real-time collaboration tools for teamwork, and implementing enhanced feedback mechanisms for improved user guidance. This research was followed by the first phase of sketching to explore our initial ideas.
Contextual analysis
We conducted semi-structured contextual analysis to generate themes and identify opportunities based on all of our research. To understand user needs and behaviors, we performed contextual inquiries, observing users in their natural environments and engaging them in open-ended discussions, using three to four generative AI sites as examples. This approach allowed us to gather deep insights into user interactions and pain points, informing our design decisions for a more user-friendly GIF generation platform.
From these contextual inquiries, we synthesized insights and identified four key themes: irrelevant kickstart prompts, hindrance in creative flow due to prompt dependency, skill variation, and inspiration drought. By grouping similar responses from our inquiries, we were able to extract these recurring issues, supported by direct quotes from users. To refine these initial themes into more actionable insights, we employed thematic analysis. This process involved systematically coding the data to identify patterns and relationships between themes.
By examining the underlying issues and user needs in more depth, we were able to consolidate and reframe the initial themes into four refined themes: lack of intuitiveness, limited expression, language barriers, and cognitive load. This deeper analysis allowed us to move from broad, surface-level themes to more specific and actionable insights.
Finally, through a strategic brainstorming process, we formulated four key solutions to address these refined themes: implementing a diverse range of multimodal inputs, bridging the gap between prompt creation and desired output, providing customizable text prompts, and enabling real-time output. These solutions were designed to tackle the identified problems comprehensively, ensuring a seamless and intuitive GIF generation experience for our users.
Competitive analysis
Diving deeper into the space after the contextual inquiries, we began examining how the problem is currently being addressed. This analysis sharpened our feature ideation and provided further project inspiration. In the competitive analysis we focused on intuitiveness, cognitive load, and the manual or conversational nature of text prompts.
After analyzing the data, we mapped problems from our secondary research to each category:
~ Manual + Cognitive overload - Manual interactions contributed to elevated cognitive strain.
~ Manual + Intuitive - Despite the manual nature, the process remained intuitive.
~ Conversational + Cognitive overload - Websites adopted conversational elements but incurred heightened cognitive load, impacting user experience.
Narrowing down the scope
We aimed to gain deeper insights into user behaviors, preferences, and emerging trends, laying the groundwork for informed decisions in the final designs.
Expert Interviews
Next, we turned to experts in the field, not only to gain insights into emerging trends and innovative applications but also to understand the dynamics of human-AI interaction and the rationale behind it.
Through these discussions, we gained a deeper understanding of how users interact with AI-driven systems, including their expectations, preferences, and pain points. By examining the experts' perspectives on human-AI interaction, we identified key considerations for designing intuitive interfaces and seamless user experiences. Understanding the capabilities and limitations of generative AI models enabled us to implement features that enhance user engagement and satisfaction while mitigating potential sources of frustration or confusion.
Quotes from expert Interviews
Ideation and Design
We began sketching early, which helped narrow our scope and generate ideas. The first phase started right after the initial research, offering a unique opportunity; subsequent sketching phases were guided by primary research and testing. We went through a total of three design sprints.
In the first phase, we began by conceptualizing two distinct panels: an input panel and an output panel.
Moving into the second phase, our focus shifted towards enhancing the platform's intuitiveness.
In the third and final phase of design, we aimed to elevate the intuitiveness of the platform even further.
The input panel featured various drawing options and settings, akin to those found in standard drawing applications, including color, line thickness, and opacity adjustments.
Meanwhile, the output panel focused on settings related to audio, video, and keyframe adjustments, catering to users' needs for customization and control over their GIF creations.
We explored the concept of removing traditional settings options to streamline the user experience. Introducing a smart AI text prompt became a key feature, leveraging users' previous history to suggest ideas and streamline the creative process.
Additionally, we envisioned integrating text and voice inputs seamlessly, offering users multiple ways to interact with the platform and generate GIFs effortlessly.
We introduced cursor tracking, revolutionizing the way users interacted with the platform by allowing them to give commands casually while sketching or adding existing photos.
By combining cursor tracking with other input modalities, we aimed to create a fluid and natural user experience where users could express their creativity without being constrained by traditional input methods.
Design iterations
Iteration 1
Iteration 2
Iteration 3
Final Solution
Next steps
Future work could include more intuitive, gesture-based interactions. One such interaction could be tracking the movement of the cursor and mapping it to the user's informal voice commands, making the prompting process more embodied and intuitive.
Business possibilities
Aneya comes into play in different scenarios. Whether you're into social sharing, preserving history, animation work, or video projects – Aneya's got something for everyone.
Creative prototypes - Actual working designs
While we worked on the designs, our development team explored some really cool techniques: ControlNet morphing, frame interpolation for large motions, and ControlNet combined with DreamBooth. Let's check out what we pulled off and what we learned along the way.
- FILM could generate smooth interpolations between smaller movements
- ControlNet (canny edge) + DreamBooth could interpolate large motion if given a template
- ControlNet morphing could convert latent-space interpolations into a clean image
- The diffusion-based approaches generated higher-quality animations than GANs, with no ghosting
- These approaches could be combined to create a powerful animation tool (illustrative sketches of each technique follow below)
ControlNet morphing
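Naively interpolating between two keyframe latents yields blurry in-betweens; ControlNet morphing denoises each intermediate latent into a clean frame. As a hedged illustration of the interpolation half, here is a standard spherical interpolation (slerp) over latents; the ControlNet-guided cleanup pass is omitted.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor) -> torch.Tensor:
    """Spherical interpolation between two keyframe latents at fraction t."""
    a, b = v0.flatten(), v1.flatten()
    # Angle between the two latents, clamped for numerical safety
    omega = torch.acos(torch.clamp(torch.dot(a / a.norm(), b / b.norm()), -1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < 1e-6:  # nearly parallel latents: fall back to linear interpolation
        return (1.0 - t) * v0 + t * v1
    return (torch.sin((1.0 - t) * omega) / so) * v0 + (torch.sin(t * omega) / so) * v1

# intermediates = [slerp(i / 10, latent_a, latent_b) for i in range(11)]
# Each intermediate latent would then be denoised with ControlNet guidance
# before being decoded to an image.
```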
FILM - Frame interpolation for large motion
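A minimal sketch of generating an in-between frame with the publicly released FILM model on TensorFlow Hub, assuming frames are float32 RGB arrays scaled to [0, 1]. The input shapes follow the model's published signature as we understand it, so treat this as a starting point rather than our exact prototype code.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the published FILM model (frame interpolation for large motion)
model = hub.load("https://tfhub.dev/google/film/1")

def midpoint_frame(frame0: np.ndarray, frame1: np.ndarray) -> np.ndarray:
    """Synthesize the frame halfway in time between frame0 and frame1."""
    inputs = {
        "time": tf.constant([[0.5]], dtype=tf.float32),  # 0.5 = temporal midpoint
        "x0": tf.convert_to_tensor(frame0[np.newaxis], tf.float32),  # [1, H, W, 3]
        "x1": tf.convert_to_tensor(frame1[np.newaxis], tf.float32),
    }
    return model(inputs)["image"][0].numpy()

# Calling this recursively (midpoints of midpoints) yields a dense, smooth
# sequence of in-between frames for a GIF.
```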
ControlNet + DreamBooth
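And a rough sketch of the ControlNet (canny edge) setup using Hugging Face diffusers. The model IDs, template image, and prompt are placeholders; in the DreamBooth variant, the base checkpoint would be swapped for one fine-tuned on the subject.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # swap in a DreamBooth-tuned checkpoint here
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Canny edges extracted from a template frame constrain pose and layout,
# which keeps successive generated frames structurally consistent.
template = np.array(Image.open("template.png").convert("RGB"))
edges = cv2.Canny(template, 100, 200)
control_image = Image.fromarray(np.repeat(edges[..., None], 3, axis=-1))

frame = pipe(
    "a dog playing in a swimming pool",  # placeholder prompt
    image=control_image,
    num_inference_steps=20,
).images[0]
frame.save("frame_0.png")
```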
Reflection
This project opened doors to a new realm of UX design, unlike any I've encountered before. Exploring GIF creation introduced me to unique challenges and methodologies, especially in understanding generative AI trends and real-time collaboration tools. It highlighted the importance of intuitive interactions and personalized experiences, reshaping my approach to design and digital experiences. Participating in the creative AI project helped me explore how technology and human creativity intersected. It encouraged me to try new things beyond traditional user research methods.
Going forward, I would like to bring additional user perspectives into this work and run more rounds of user research to make it even more user-centered.