As AI grows more capable, the line between what’s human and what’s AI is blurring. Between the release of GPT-4 and its integrations and rapid advances in deepfake technology, AI has become unequivocally powerful. With its breadth of knowledge and sheer speed, AI now outperforms humans at many tasks, and continual improvements keep pushing the boundaries of what we consider true intelligence.
OpenAI recently released two major new integrations for ChatGPT, enhancing its abilities and pushing the boundaries of AI as a service in our everyday lives. ChatGPT is undeniably capable, but it can also be verbose and struggle to understand complex problems; the latest updates may finally address those shortcomings.
GPT-4 is OpenAI’s most advanced model, boasting deeper understanding and stronger problem-solving. GPT-4 performs about 40% better than GPT-3.5 on academic benchmarks, reaching results comparable to a human’s. For example, GPT-4 earns a 4 out of 5 on the AP Calculus BC exam, whereas GPT-3.5 earns a 1. GPT-4 also supports a myriad of integrations, including the ability to write and execute code, a feature OpenAI calls “Advanced Data Analysis”. The improved system can also use Bing search to retrieve information from the web, addressing earlier pain points around repetitive computation, visualization, and the knowledge cutoff. GPT-4 is now available through OpenAI’s API or ChatGPT Plus for $20/month.
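For developers, API access looks like an ordinary chat-completion request. Below is a minimal sketch using OpenAI’s official Python library; the prompt is illustrative, and it assumes an API key is set in the OPENAI_API_KEY environment variable.

```python
# Minimal sketch of a GPT-4 chat completion via OpenAI's Python library.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "In two sentences, how does GPT-4 differ from GPT-3.5?"},
    ],
)
print(response.choices[0].message.content)
```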
One of the fundamental differences between GPT-3.5 and GPT-4 is that GPT-4 is a multimodal model: it can accept more than one type of input, in this case images. ChatGPT can now take any image as input, allowing it to decipher handwriting, read diagrams, and much more. GPT-4V (GPT-4 Vision) can not only describe images but also reason about them practically. While not as adept as humans in real-world scenarios, Vision gives ChatGPT’s capabilities a major boost.
Why does this matter?
ChatGPT can now analyze things that may be hard to describe with text. This capability has massive implications for the everyday use of ChatGPT, allowing users to simply take a picture of a hand-drawn diagram and ask ChatGPT for help. Vision will roll out to ChatGPT Plus users in early October.
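As a concrete illustration, here is a rough sketch of what asking GPT-4V about a diagram could look like through the API. The model identifier and image URL are assumptions made for illustration; check OpenAI’s current documentation for the exact names available to you.

```python
# Sketch: asking GPT-4 Vision about a hand-drawn diagram.
# The model name and image URL below are assumptions for illustration.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed identifier; verify in current docs
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this hand-drawn diagram show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/diagram.png"},  # hypothetical
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```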
Another feature coming to ChatGPT is voice conversation. ChatGPT can hear and understand speech, and it responds with intonation and expression, letting users hold back-and-forth conversations with their assistant. The chatbot can read a bedtime story, for instance, or help users practice their foreign language skills. OpenAI states in a blog post: “The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.”
Why does this matter?
OpenAI keeps finding new ways to weave ChatGPT into users’ everyday lives, making it increasingly easy to pull out a phone and talk with ChatGPT as if it were an assistant standing right next to them. As ChatGPT becomes more thoughtful, competent, and conversational, some may find themselves befriending the chatbot, using it as a therapist without the labor of typing every word.
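To see how such a voice loop could be assembled from OpenAI’s API building blocks, here is a rough sketch: transcribe speech with Whisper, answer with GPT-4, then synthesize the reply as audio. The “tts-1” model and “alloy” voice names are assumptions based on OpenAI’s text-to-speech offering and may not match what powers the ChatGPT voice feature itself.

```python
# Sketch of a speech-in, speech-out loop from OpenAI API building blocks.
# Whisper transcribes, GPT-4 answers, a TTS model speaks the reply.
from openai import OpenAI

client = OpenAI()

# 1. Transcribe the user's spoken question (hypothetical local recording).
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Generate a reply with GPT-4.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Speak the reply aloud ("tts-1" and "alloy" are assumed names).
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
speech.write_to_file("answer.mp3")
```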
DALL·E, OpenAI’s image generation model, wowed the world as one of the first significant advancements in AI when DALL·E 2 was released. Despite the publicity, DALL·E 2 showed poor attention to detail: it mutilated fingers, produced unidentifiable objects, and failed badly at rendering text. The new and improved DALL·E 3 can generate highly stylized images in a matter of seconds. OpenAI states in a blog post about DALL·E 3: “Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide.” Furthermore, OpenAI has integrated DALL·E 3 with ChatGPT, allowing ChatGPT to refine users’ prompts and create multiple, more nuanced variations. Despite these improvements, the system still sometimes misinterprets prompts and reflects biases, yielding images warped from what the user had in mind.
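For those working through the API rather than the ChatGPT interface, generating an image might look like the sketch below. The “dall-e-3” model identifier is an assumption, so verify it against OpenAI’s current documentation.

```python
# Sketch of generating an image with DALL·E 3 through the API.
# The "dall-e-3" identifier is an assumption; verify in current docs.
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor fox reading under a lantern-lit oak tree",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```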
ChatGPT is becoming ever more prevalent in our everyday lives. While it may initially have been used to draft ideas or write emails, recent improvements have made it a truly impressive and revolutionary tool: GPT-4 brings advanced reasoning, Bing search adds the knowledge of the vast internet, DALL·E 3 lets it draw with stunning quality, and vision and voice allow ChatGPT to see and talk as if it were human.
As ChatGPT gains these abilities, one may question what defines human intelligence. With 54% of American adults reading below a sixth-grade level, ChatGPT is already more literate than a large portion of the population. AI is faster, cheaper, and will continue to improve. Does this mean humans will be replaced? Most likely not. For all its technical prowess, ChatGPT is only as good as the instructions it is given. ChatGPT is simply a tool for improving human efficiency, and the real value of human workers is their independent problem-solving ability. So while workers might not be replaced by ChatGPT itself, they will be replaced by someone who uses ChatGPT effectively enough to do both of their jobs in half the time. Those who lack critical thinking skills and initiative will be replaced by those who have them. Taking initiative, learning how to use ChatGPT, solving problems, and going the extra mile will always get you ahead, regardless of how smart AI becomes.