APIs are (sort of) over; and the new era of work
Your workplace companion can do even more than ever
A conversation with a colleague today following the release of GPT-4o (more on that later) sparked a thought: APIs might be heading towards extinction. Multi-modal AIs could render traditional API integrations obsolete. When integrating with the myriad SaaS platforms in use at modern companies today, why bother wrestling with an API and lacklustre docs when an LLM can simply interact with a UI, gather the necessary data, and perform the needed tasks? And why bother building a dedicated API for your product when your clients can just throw an LLM at your UI and extract what they need?
We treat our UI as our API, making it the integration point for our product, with an AI as the “user”. Traditional access control still applies: we treat the AI as an actor that chooses to interact with a UI instead of an API, and has a role and permissions, just like a human user would. Does that mean we would make one UI for humans, and another for AIs? I suspect not; in fact I could imagine a world where catering for AI actors helps us to craft more impactful and accessible UIs – a boon for everyone!
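As a minimal sketch of that idea (all names here are hypothetical, not a real framework), an AI driving a UI could be modelled as just another actor with a role and permissions, passing through the same access-control gate a human user would:

```python
from dataclasses import dataclass, field

@dataclass
class Actor:
    """Any user of the UI - human or AI - carries a role and permissions."""
    name: str
    role: str
    permissions: set = field(default_factory=set)

    def can(self, action: str) -> bool:
        return action in self.permissions

def perform_ui_action(actor: Actor, action: str, target: str) -> str:
    # The same gate applies whether the actor is a person clicking a
    # button or an LLM operating the UI on someone's behalf.
    if not actor.can(action):
        return f"denied: {actor.name} ({actor.role}) may not {action} {target}"
    return f"ok: {actor.name} performed {action} on {target}"

# An AI agent scoped to read-only access, exactly like a human role.
agent = Actor("report-bot", role="analyst", permissions={"read"})
print(perform_ui_action(agent, "read", "sales dashboard"))
print(perform_ui_action(agent, "delete", "sales dashboard"))
```

The point is not the code itself but the shape: the AI never gets a special backdoor; it authenticates, holds a role, and is allowed or denied per action like any other user.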
APIs are dead, long live UIs.
—
This brings us to OpenAI’s spring announcement: GPT-4o, with its real-time voice and vision capabilities, integrated directly into ChatGPT and onto your desktop (thanks to the new desktop app!). Able to hold real-time, interruptible conversations, and observe what’s happening on your screen, it’s not just another tool; it’s your new workplace companion. Whether you’re an engineer or not, GPT-4o is yet another indication that the way we work may never look the same again.
Need to interpret a complex chart? GPT-4o has got you covered. Tidying up some messy graphics? It’s on it. Writing up some tickets or decoding a cryptic error message? Just speak to it and show it your screen, like you would to a colleague sitting with you. This AI can see what’s on your screen and assist you in real-time. All of this is amplified further when you consider it as a collaborative tool, not a one-to-one experience. Imagine GPT-4o embedded in your planning sessions, your executive meetings, or a mob programming session, and suddenly you’re getting the best of both worlds – your experienced and knowledgeable colleagues alongside the speed and power of an LLM. As a researcher, facilitator, or teacher, I believe multi-modal models will be even more powerful in group settings. Keeping your colleagues involved also helps maintain appropriate safeguards on AI use in the workplace: checking facts, handling anything sensitive with care, and preventing access to anything you’d rather it didn’t see.
—
We might be witnessing the next steps towards an AI OS, introducing a whole new set of interaction paradigms with LLMs as the operator, able to interact with software and extract information on your behalf. Imagine this deeply integrated into future generations of Apple Vision Pro or the Meta Quest, with real-time video and audio streams to assist you with your work. The possibilities are staggering. Everything is ripe for disruption.
Welcome to the next era of work.
This post was written by hand, and edited with the support of GPT-4o. The image was also generated with GPT-4o.