Can you hear it? That loud, whooshing sound? 

It’s the noise that happens when entire industry business models transform in front of our eyes. It is also the sound of the air being sucked out of our lungs as we stare, awestruck, trying to keep up with the disruption currently happening in technology, almost on a daily basis.


I am talking about the generative AI platform ChatGPT and, in this case, the GPT-4 update, which was announced yesterday.


While GPT-3.5 was based on text input alone, GPT-4 is now multimodal. Multimodality refers to the ability of an AI to understand and work with not only text but also other forms of data, such as images, audio, and video; GPT-4’s first step in this direction is accepting image inputs alongside text.


Earlier this week, Microsoft teased the release at “AI in Focus – Digital Kickoff”. Holger Kenn (Chief Technologist Business Development AI & Emerging Technologies at Microsoft Germany) alluded to the fact that GPT-4 was coming the following week and that it would, in fact, be multimodal. He explained that multimodal means that it "can translate text not only accordingly into images, but also into music and video."


In yesterday’s official announcement on the GPT-4 Developer Livestream, Greg Brockman, President and Co-Founder of OpenAI, showcased some of GPT-4’s new capabilities and limitations, including its new image input functionality. Needless to say, it is pretty impressive…


While there are many features that are still unavailable to the public, here are a few highlights that are in the works:


Visual Inputs:

According to OpenAI: "GPT-4 can accept a prompt of text and images, which - parallel to the text-only setting - lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images. Over a range of domains - including documents with text and photographs, diagrams, or screenshots - GPT-4 exhibits similar capabilities as it does on text-only inputs."


In the livestream, Greg Brockman held up a physical notebook showing a hand-drawn sketch of a joke website. He then took a photo of it with his phone and uploaded it to a Discord bot that he had also built on top of GPT-4.


He then instructed it to turn the image into HTML, and the result was a working website based on the sketch. In other words: it can take a hand-drawn UI sketch and turn it into a functional website UI!


In an additional example, Greg used the new ChatGPT Plus Playground functionality to instruct ChatGPT to act as a tax consultant. He then entered about 16 pages of tax code along with some complex information about a married couple and their current tax status. ChatGPT chewed through the data and presented their standard deduction, showing its work along the way.
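To make the demo concrete, here is a minimal sketch of the kind of deduction arithmetic the model walked through. The filing statuses and dollar amounts below are our own illustrative assumptions (the 2018 US standard deduction figures), not values taken from the livestream:

```python
# Illustrative standard deduction lookup (2018 US amounts, post-TCJA).
# These figures are assumptions for this sketch, not from the demo itself.
STANDARD_DEDUCTION_2018 = {
    "single": 12_000,
    "married_filing_jointly": 24_000,
    "head_of_household": 18_000,
}

def taxable_income(gross_income: float, filing_status: str) -> float:
    """Gross income minus the standard deduction, floored at zero."""
    deduction = STANDARD_DEDUCTION_2018[filing_status]
    return max(0.0, gross_income - deduction)

# Example: a married couple with $100,000 of gross income
print(taxable_income(100_000, "married_filing_jointly"))  # 76000.0
```

The point of the demo, of course, was that GPT-4 performed this reasoning directly from pages of raw tax code, with no such lookup table spelled out for it.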


In other words: ChatGPT just completed a fictional couple’s income tax return!


Advanced Reasoning Capabilities

GPT-4’s advanced reasoning capabilities represent a significant leap forward in solving complex problems with greater accuracy. After dedicating six months to refining its capabilities, OpenAI reports that it is now 82% less likely to respond to requests for disallowed content. In addition, GPT-4 outperforms its predecessor, GPT-3.5, and is 40% more likely to produce factual responses in internal evaluations.


As we have previously discussed, GPT-3.5 has been tested on multiple simulated industry exams, such as the Uniform Bar Exam, the Wharton MBA exam, the USMLE, and AP exams in Physics and Macroeconomics. In most cases, GPT-3.5 finished within a 40% to 60% performance range. With GPT-4, the results on some of these exams were in the 90% range.


What is Available Now and What is Coming:

GPT-4 is now accessible to paying users through ChatGPT Plus, albeit with usage restrictions, while developers can join a waitlist to gain access to the API. The API costs $0.03 per 1,000 "prompt" tokens and $0.06 per 1,000 "completion" tokens, where a token is a chunk of raw text, roughly a word fragment. Prompt tokens are the input sent to GPT-4, while completion tokens are the output generated by the model.
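To see what that pricing means in practice, here is a short sketch that estimates the bill for a single API call from the per-token prices quoted above. The token counts in the example are illustrative, not from the announcement:

```python
# Per-token prices quoted for the GPT-4 API at launch.
PROMPT_PRICE_PER_1K = 0.03      # USD per 1,000 prompt (input) tokens
COMPLETION_PRICE_PER_1K = 0.06  # USD per 1,000 completion (output) tokens

def gpt4_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated cost in USD for one API call at the launch prices."""
    return (prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
            + completion_tokens / 1000 * COMPLETION_PRICE_PER_1K)

# Example: a 2,000-token prompt that produces a 500-token answer
print(round(gpt4_cost(2000, 500), 4))  # 0.09
```

Note that completion tokens cost twice as much as prompt tokens, so verbose outputs dominate the bill faster than long inputs do.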


Interestingly, GPT-4 has already been integrated into various systems by early adopters, including Microsoft's Bing Chat, Stripe's website scanning feature, Duolingo's language learning subscription tier, Morgan Stanley's retrieval system for financial analysts, and Khan Academy's automated tutor.


While many of these new features are still in development, we are excited to see the rapid progress of this technology. 


Schedule a Workshop

If you are intrigued by the possibilities of GPT-4 and want to explore how this new technology can and will transform your business, don't hesitate to get in touch with us. 


We offer tailored workshops to help you understand the potential impact of GPT-4 on your industry and guide you in implementing it in your operations. Contact us today to schedule a workshop and take the first step towards unlocking the power of AI for your business.


More Questions?
We're here for you!
Sean Earley
CIO / Exec. Editor

KEMWEB goes robotspaceship!

Unleashing the power of Innovation