Multimodal Spring AI - Building voice, image generation, and vision into your Spring applications
Most of the buzz around Generative AI has been about how it can understand and respond to natural language prompts. But largely, those prompts have been typed in by a user, and the responses have come back in textual form.
While humans often communicate with each other in a similar way via text messages and emails, natural interaction takes place across many modes of communication, including talking, hearing, seeing, and showing.
In this session, you'll see how to add sight and sound to your Spring AI applications. You'll learn how to build applications that can talk and hear what your users say, as well as "see" what your users show them (via images) and produce responses in graphic form.
About Craig Walls
Craig Walls is a Principal Engineer, Java Champion, Alexa Champion, and the author of Spring AI in Action, Spring in Action, and Build Talking Apps. He's a zealous promoter of the Spring Framework, speaking frequently at local user groups and conferences and writing about Spring. When he's not slinging code, Craig is planning his next trip to Disney World or Disneyland and spending as much time as he can with his wife, two daughters, one bird, and two dogs.