Beyond the Chat Box Wall of Text

An artistic depiction of a chatlog being deconstructed

You’ve probably at least tried out the likes of ChatGPT, Bard or Claude. There is a text box. You type in your remark or question. Hit enter, your text scrolls up, and after some sort of spinner action, the LLM’s response starts to appear below. You and the LLM together build a wall of text history upwards, your personal wing of the Tower of Babel.

Perhaps you speak these interactions instead, as Dr. Chandra did with SAL 9000 in 2010: Odyssey Two, and through some multimodal magic you get a response spoken back.

Hello Sal! Hello Siri! Alexa! Google Assistant! In any case, it’s a history of what was said and what came back. Under the hood, almost every application of AI language models probably needs such a wall-of-text structure, but should the user have to deal with all of it so directly? We may, 13 years past the fictional date, be behind Arthur C. Clarke in space travel, but perhaps we can sneak ahead in user experience (UX).
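To make that under-the-hood wall concrete, here is a minimal Python sketch, assuming a simple list of role-tagged messages. The `Message` class and `build_context` helper are hypothetical and not any vendor’s API; the point is only that each new question travels to the model together with the accumulated history as one block of text.

```python
# A minimal sketch of the "wall of text" a chat app keeps under the hood.
# Message and build_context are hypothetical illustrations, not a real API.
from dataclasses import dataclass


@dataclass
class Message:
    role: str   # who said it: "user" or "assistant"
    text: str   # what was said


def build_context(history: list[Message], new_question: str) -> str:
    """Flatten the whole conversation plus the new question into one text block."""
    lines = [f"{m.role}: {m.text}" for m in history]
    lines.append(f"user: {new_question}")
    return "\n".join(lines)


history = [
    Message("user", "Hello Sal!"),
    Message("assistant", "Hello! How can I help?"),
]
print(build_context(history, "What is the mission status?"))
```

In a real app each exchange would be appended to `history`, so the context sent to the model keeps growing, mirroring the wall the user scrolls through on screen.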

One of the biggest problems with the straightforward text wall is that it lays bare the challenge of feeding the LLM the right context. Prompt engineering is the new job that no one, not even prompt engineers, seems to think should be a job. Experts spend hours in trial and error coaxing useful responses out of the machine, never mind regular mortals.

As AI seeps into more and more areas of our lives, we have to consider making it easier for regular people to use. Let’s explore some of the generative AI (GenAI) interface ideas that designers and developers are creating to improve usability.

Graphologue and Sensecape by UCSD’s Creativity Lab

Here are two Human-AI Interaction concepts from a prominent research center, taking a visual-spatial approach to LLMs, and GPT-4 in particular. These projects aim to provide intuitive visual tools for mapping out ideas and relationships, extending interaction beyond mere text. Each concept includes mouse-based interactivity: you can move ideas or concepts around and arrange them, and you can highlight parts of AI-generated text to create a new concept or expand further on a section.

ContextMinds

With an interface not unlike Graphologue’s, ContextMinds helps organize content ideas for search engine optimization (SEO). It shows the practical application of such GUI concepts in a professional setting, when plugged into tools that are not AI-specific. By interacting with these tools, the user can collaborate with the AI through a graphical representation of knowledge and strategy. My marketing team has used this facility to produce content at scale, with a single person able to push through an entire workflow, an unprecedented level of productivity.

Embedded “Magic Wand” Functionalities

Applications such as MS Word, Gmail, and Notion are integrating AI-driven tools as virtual magic wands for streamlining common tasks. The new features help people expand bullet lists into fuller prose, draft emails, create summaries or conclusions for an article, rewrite it in a different tone, and much more. They also help overcome the natural terror of the blank page by providing a starting point. They’re still bound in text on all sides, but when cleverly designed they offer a few more dimensions of expression and interface than the straight-up wall. AI magic wand features remove prompting from the equation by setting up pre-coded functionality for one-click use, speeding up interactions while reducing the trial and error of open-ended prompting.
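In code, a magic wand button can be as simple as a pre-written prompt template filled in with whatever the user has selected. The sketch below uses hypothetical action names and template wording; it is not how Word, Gmail, or Notion actually implement their features, only an illustration of why no prompting is left for the user to do.

```python
# A minimal sketch of a "magic wand" menu: each one-click action is a
# pre-written prompt template, so the user never writes a prompt themselves.
# The action names and template wording are hypothetical examples.
WAND_ACTIONS = {
    "expand": "Expand these bullet points into full paragraphs:\n{text}",
    "summarize": "Write a short summary of the following text:\n{text}",
    "formal_tone": "Rewrite the following text in a formal tone:\n{text}",
}


def wand_prompt(action: str, selected_text: str) -> str:
    """Turn a button click plus the user's selection into a complete prompt."""
    if action not in WAND_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return WAND_ACTIONS[action].format(text=selected_text)


# Clicking "Summarize" on a selected sentence:
print(wand_prompt("summarize", "AI is seeping into more areas of our lives."))
```

The design choice is that the prompt wording lives with the developer, who can test and refine it once, rather than with every user at every click.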

Voice-Prompted Interactions

Even though I’ve made the point that talking to AI by voice is just a layer over the wall of text, it’s certainly a valuable layer. Following on from Alexa and Siri, applications like HumaneAI, Pi, and some initiatives from OpenAI are refining natural voice conversations with AI, enabling more human-like interaction. Humans use the spoken word far more intuitively than the written word, and for most of us voice conversations are easier than typing and reading text. Voice-based interfaces are also usable on the go, without needing a keyboard or a screen to read. That’s not so distant at all from Dr. Chandra and SAL 9000.

An AI-generated image of a robot talking to a human

Mining the Design Mind

Popular design software vendor Figma recently added a raft of AI features to its FigJam product. Some of the features address the usual blank page problem, providing templates through AI-assisted menu diving. Even more interesting are features that help order and group brainstorming outputs such as cards and virtual sticky notes. These, plus summarization features, can help with the tricky process of turning brainstorming sessions into actionable plans. In a tool for designers, the idea is to have the AI help with organizing and summarizing, leaving the humans to focus on creativity and ideation.

2100, a GenAI Interface Odyssey

Beyond the above efforts, I can imagine even more interesting future ideas for incorporating LLM technology into apps, to further reduce manual prompting and increase intuitive control.

AI as Discovery Agent

If AI is where we have processing resources to spare, it should take the lead and ask the questions in the first place. Over time, if we can turn the tables and give the AI more initiative, it can learn to better understand human needs and provide more targeted assistance. This turns things on their head, from the human prompting the computer to the computer prompting the human, opening up a fully two-way collaboration that improves productivity. The user can give the AI tasks and information targets to collect, and have the AI take the initiative on the search and processing path, tapping the user on the shoulder for further guidance and to deliver results.

Visual-Spatial Interfaces and Gesture Recognition

Think of gesture recognition technologies such as Apple’s Vision Pro, recent Apple Watch models, or even advancements on the classic Microsoft Kinect gaming experience. These suggest a move towards interfaces that combine traditional prompts with physical movements or visual cues, and even with the agent intelligently observing human pose and movement in order to anticipate needs. Presumably we would want this sort of thing to be strictly closed-circuit, with local processing, to avoid adding even more Big Brother concern than we already have.

Preset Functionalities and Guided Pathing

AI could offer more preset options to minimize manual prompts and utilize ‘thought trees’ or ‘idea graphs’, guiding users towards their end goals more intuitively. A bit like the magic wand features, but in ways more integrated into typical graphical user interface patterns.

Developers like to speak of “opinionated” UIs, meaning that the effort goes more into deciding default behavior than into giving the user sophisticated ways to tweak things. Highly opinionated models rooted in AI language processing might really open up UX possibilities, though I admit that we always have to make sure we’re not eroding human sovereignty.
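One way to picture guided pathing is as a small decision tree: each menu choice narrows the options until a finished prompt template is reached. This is a minimal sketch with a hypothetical `GUIDE_TREE` and `guided_prompt` helper, not a description of any shipping product.

```python
# A minimal sketch of "guided pathing": the user answers a few menu choices
# and the app assembles the prompt, instead of the user typing one.
# The tree contents and the guided_prompt helper are hypothetical.
GUIDE_TREE = {
    "question": "What do you want to write?",  # text a UI would display
    "options": {
        "email": {
            "question": "What tone?",
            "options": {
                "friendly": "Draft a friendly email about: {topic}",
                "formal": "Draft a formal email about: {topic}",
            },
        },
        "summary": "Summarize the following for a general reader: {topic}",
    },
}


def guided_prompt(choices: list[str], topic: str) -> str:
    """Follow the user's menu choices down the tree to a prompt template."""
    node = GUIDE_TREE
    for choice in choices:
        node = node["options"][choice]
        if isinstance(node, str):  # reached a leaf: a finished template
            return node.format(topic=topic)
    raise ValueError("choices did not reach a finished prompt")


print(guided_prompt(["email", "formal"], "the project deadline"))
```

Here the "opinionated" part is the tree itself: the designer decides which paths exist, and the user only ever picks from them.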

Point-Gesture Interfaces

Whether using a screen, a projected screen, augmented reality glasses or a headset, the user can point at and scroll through visualized concepts to assemble prompt material for the AI. Such an approach would be especially valuable for neuro-atypical users, or those with sensory or mobility differences.

This sort of interface is in heavy development at large companies, but a recent example that caught my eye was PromptInspirer, a tool posted on Reddit by an individual developer that helps build DALL-E or Midjourney image generation prompts by dragging and dropping icons. The demo lets you drag and drop any of over 10,000 items onto a stage, which turns them into prompt text. It’s a great example of how visual-spatial interfaces can enrich the prompting process.
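The core idea behind icon-based prompting can be sketched in a few lines: each item on the stage carries a label and a category, and the stage is assembled into prompt text. This is not PromptInspirer’s code; the `Item` fields, categories, and phrasing rules are assumptions for illustration only.

```python
# A toy sketch of icon-based prompting: items dropped on a "stage" are
# converted into prompt text. This is NOT PromptInspirer's code; the Item
# fields, categories, and phrasing rules are hypothetical.
from dataclasses import dataclass


@dataclass
class Item:
    label: str      # what the icon represents, e.g. "a lighthouse"
    category: str   # hypothetical grouping: "subject", "style", or "mood"


def stage_to_prompt(stage: list[Item]) -> str:
    """Assemble the dropped items into a single image-generation prompt."""
    subjects = [i.label for i in stage if i.category == "subject"]
    styles = [i.label for i in stage if i.category == "style"]
    moods = [i.label for i in stage if i.category == "mood"]
    # Fall back to a generic subject if nothing has been dropped yet.
    parts = [", ".join(subjects) or "an abstract scene"]
    if styles:
        parts.append("in the style of " + ", ".join(styles))
    if moods:
        parts.append("with a " + ", ".join(moods) + " atmosphere")
    return ", ".join(parts)


stage = [
    Item("a lighthouse", "subject"),
    Item("a stormy sea", "subject"),
    Item("a watercolor painting", "style"),
    Item("moody", "mood"),
]
print(stage_to_prompt(stage))
```

Because the user only arranges icons, the system controls the final wording, which is what lets a visual-spatial interface stand in for hand-written prompts.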

Brain-Computer Interfaces (BCIs)

If we push well into Space Odyssey-adjacent ideas of the future, why not wire the machine directly into our thoughts? The thing is, this idea might not be as far-out as it sounds. Active brain imaging is a rapidly evolving science, and researchers are working on having AI interpret such imaging for a variety of reasons. How precisely our working minds can be read remains a wide-open question. Of course, this concept pushes spookiness to the extreme, and there would always be significant safety concerns and barriers to adoption. Nevertheless, BCIs represent a notable frontier, and perhaps the final frontier of AI interfaces.

Hot-wired interfaces require warm, friendly hands

GenAI interfaces will continue to evolve from the simple wall of text to complex, multimodal interactions incorporating voice, gestures, and even thoughts. This evolution will see a deeper integration of AI into our daily lives, transforming how we think about and interact with machines. One perspective is that as these technologies mature, we can anticipate more personalized, intuitive, and efficient AI interactions, heralding a new era of human-computer interaction. On the other hand, as this research develops, there is high potential for strong human reactions against intrusion by AI. It may well be that the more AI strives to be intimate with our requirements, the more alienated we feel.

Our best chance of avoiding the dark side, or perceived dark side, of these possibilities is to go out of our way to incorporate and preserve human craft in the process. There is no formula for doing so, and retaining the human touch as AI interfaces evolve is likely to be every bit as much of a challenge as the technological barriers. It’s our role as AI professionals to handle our natural wall of adverse perceptions with care and empathy while we coax AI beyond the wall of text.