The Mihir Chronicles

On Artificial Intelligence

December 15, 2024


Replicating and scaling human intelligence has long been one of humanity's ambitions. There is no doubt we are living in an age of rapid innovation, as all walks of life are being transformed by artificial intelligence (AI). The impact is already being felt, and it is reshaping human experience and decision-making.

Here is my attempt at a deep dive into AI, without making any predictions about the future.

Bold statements about computers being intelligent come flying in every second. Whether they are intelligent enough to save us or kill us, I am not sure, but it is worth exploring what intelligence even means to us before diving deeper into machines.

Human Intelligence

What does it mean to be intelligent?

Cognitive science is a vast field that studies many facets of intelligence: consciousness, thinking (deliberative vs non-deliberative), reflexive behavior, language and semantics, perception, feelings, thought, learning (children), comprehension, recollection, competence and reasoning, awareness, common sense and generalization. Intelligence is an embodiment of all these complex systems.

Humans are not born pre-programmed with a set of instructions that we deem intelligent. Many of our friends in the animal kingdom are born and immediately get up and run around. Humans don't.

We sit there for a few months staring and listening. Babies are processing raw ingredients, though it may look like they are spending a lot of time doing nothing, which can be perceived as wasteful.

Though we are not pre-programmed with instructions on how to interact with our surroundings, we do come with a pre-programmed genetic code. Our DNA carries a genetic code passed on to us from our ancestors. An infant does not know how to hold a spoon to feed itself when it enters the world, but knowing how to cry when hungry is in its genetic code. This is nature vs nurture.

As babies grow, they develop adaptive behavior, language, vocalization and motor skills. They perform repetitive exercises through which learning is reinforced. Reward and punishment play a role in the feedback cycle.

Effective learning needs cause and effect. Causal relationships, where some things make other things happen, enable us to learn how our surroundings work. Cause and effect shapes our behavior. For example, picking up a glass of water and dropping it on the floor makes the carpet wet. Another example is that the moon affects the tides of our oceans.

This cause and effect drives the feedback loop for infants and toddlers. They modify one tiny thing to see what happens. They are constantly experimenting by interacting with their environment. Some pre-programming from DNA helps shape personality and behavior, but major learning comes from experimenting. This is also when cultural norms are enforced.

As learning is reinforced for toddlers through interaction with their environment, motor skills start to take shape. For example, the body braces reflexively when taking a hard fall on a playground. It is impossible to instruct a toddler step by step on which body parts to involve when they get hurt. But the human body adapts without much practice. Daniel Dennett, an American philosopher and cognitive scientist, distinguished competence with comprehension from competence without comprehension. There is a lot of competence in the absence of comprehension within our species. Falling on a playground is a great example.

This finally leads me to common sense. It gives us basic assumptions that help us navigate this complex world. Experience and genetics collectively teach us to make quick inferences. If you have a wide vehicle, you know you cannot fit in a narrow alleyway. This is learned through countless interactions between physical experience, innate repository, thinking and rationalizing. And the combination of these complex intelligent systems lets us project assumptions onto other things in life. This general awareness is what we mean by common sense.

Intelligence is neither straightforward nor formulaic. Our ancestors don't bring us into the world with step-by-step instructions. But the ability to desire is what makes us unique and more intelligent than other species in the animal kingdom.

What drives our desire for creating another form of superintelligence?

As much as we desire godlike control over other species and our surroundings, we are able to fulfill this desire because the human species comes with curiosity and the ability to communicate through language.

We learn by living in the world, we move around, we do little experiments, we build relationships, and we feel. What we have is a deep complex system that can do amazing things that we are still discovering.

We have not yet found another species that cares deeply about why the sky is blue or about slowing the decay of human organs so we can live longer. Our strong desire to learn about the world is what makes us super intelligent, and it has resulted in complex tooling and machinery. Our ability to search and explore is innate, as we can see in how babies operate. They are constantly experimenting, seeking, questioning and reasoning. As a parent, I wonder how my kid picked up things that were never taught. This is human cognition at its finest.

We use language that is far superior to that of other species in the animal kingdom. I can communicate to my family how I feel. Not only am I able to learn what is going on around me or within me, but I am able to communicate to others about these environments. This allows ideas to spread quickly. Whether they are good or bad is another matter, but we have figured out the flow of information.

As Viktor E. Frankl said, “Between stimulus and response there is space. In that space is our power to choose our response. In our response lies our growth and our freedom.” This is what makes us conscious!

Writing all this down using a language, on a tool that was built by a fellow human, is a form of thinking. Being able to borrow ideas and repurpose them with my own thoughts and analogies is another form of intelligence. This is what I think separates human intelligence from animal intelligence.

Now comes the machine. Can artificial intelligence go beyond its pre-programmed paradigm? Can it seek? Does it desire? Can it feel? Let's dig deeper.

Machine Intelligence

Machines have four fundamental elements: memory, storage, compute and network. The network is the crucial component that connects the others and allows data to flow between them.

This allows a machine to carry out complex tasks such as deep learning with neural networks. A neural network is a machine learning model that uses a network of interconnected nodes to process data and make decisions in a way loosely inspired by the human brain.

Neural networks can recognize patterns and correlations in data, cluster and classify it, and learn and improve over time. They are used in large language models (LLMs) like ChatGPT, AI image generators like DALL-E, predictive AI models, and the perception systems that interpret Light Detection And Ranging (LiDAR) laser-based remote sensing data in autonomous vehicles. For example, in finance, neural networks can analyze transaction history, model asset movement, and attempt to predict financial market outcomes.
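To make "interconnected nodes" a little more concrete, here is a minimal sketch of a tiny network in plain Python. The layer sizes, weights and biases are made-up numbers chosen only for illustration; a real network would learn them from data and would have vastly more nodes.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One node: a weighted sum of its inputs plus a bias, passed through sigmoid."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return sigmoid(weighted_sum + bias)


def tiny_network(inputs):
    """Two hidden nodes feeding one output node (all weights are hypothetical)."""
    hidden_1 = neuron(inputs, [0.5, -0.4], 0.1)
    hidden_2 = neuron(inputs, [-0.3, 0.8], 0.0)
    # The output node takes the hidden nodes' outputs as its own inputs.
    return neuron([hidden_1, hidden_2], [1.2, -0.7], 0.05)


output = tiny_network([1.0, 2.0])
print(f"network output: {output:.3f}")
```

Each node only does simple arithmetic. Whatever "decision" comes out is the result of how the nodes are wired together and which weights they carry.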

For decades, Intel was the major manufacturer of Central Processing Units (CPUs), until its dominance was challenged by Nvidia's Graphics Processing Units (GPUs). Unlike a general-purpose CPU, a GPU breaks complex mathematical tasks apart into small calculations, then processes them all at once. This method is known as parallel computing. A good analogy: a CPU is like a delivery truck dropping off one package at a time, while a GPU is more like a fleet of motorcycles spread across a city.

GPUs have been a miracle for research scientists in the field of machine learning. Researchers at Google had trained a neural net that identified videos of cats, an effort that required some sixteen thousand CPUs, while other researchers produced the same results (or better) with just two Nvidia circuit boards.

GPUs have been transformational in the field of artificial intelligence because they don't work through a task one step at a time. Instead of operating sequentially, they run many calculations simultaneously through parallel computing. This has allowed machines to carry out unstructured tasks such as vision, language, and other complex mapping such as autonomous driving.

In 2017, researchers at Google introduced a new architecture for neural-net training called a transformer. The following year, researchers at OpenAI used Google’s framework to build the first “generative pre-trained transformer,” or GPT. The GPT models were trained on Nvidia supercomputers, absorbing an enormous corpus of text and learning how to make human-like connections. In late 2022, after several versions, ChatGPT was released to the public. The more processing power one applies to a neural net, the more sophisticated its output becomes. This makes it seem as if computers are talking to us in our own familiar language.
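The idea of breaking one big calculation into small independent pieces can be sketched in a few lines of Python. This is only a structural illustration under loud assumptions: it runs on ordinary CPU threads rather than a GPU, Python threads will not actually make this arithmetic faster, and the function names and chunk size are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def dot_sequential(a, b):
    """The delivery truck: multiply and add one pair at a time."""
    return sum(x * y for x, y in zip(a, b))


def split_into_chunks(a, b, size):
    """Cut both lists into matching slices that can be handled independently."""
    return [(a[i:i + size], b[i:i + size]) for i in range(0, len(a), size)]


def partial_dot(chunk):
    """One motorcycle: handle a single small slice of the work."""
    a_part, b_part = chunk
    return sum(x * y for x, y in zip(a_part, b_part))


def dot_parallel(a, b, size=250, workers=4):
    """Hand the slices to a pool of workers, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_dot, split_into_chunks(a, b, size)))


a = list(range(1000))
b = list(range(1000, 2000))
print(dot_sequential(a, b) == dot_parallel(a, b))
```

The point is that no slice depends on any other, so they can all be worked on at once. A GPU applies the same principle with thousands of hardware cores.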

Language skills and cognition aren’t intertwined because they light up different systems in the brain, and there are numerous studies of people who have lost their language abilities but are otherwise completely cognitively there. For example, some people lose their ability to speak but can still give a thumbs up or down to a question they are asked.

In the case of large language models, it's the opposite: they can consume and produce language without the thinking part. Large language model (LLM) is the term used to describe the technology behind systems like ChatGPT from OpenAI or Gemini from Google.

LLMs are trained to find statistical correlations in language, using mountains of text and other data from the internet. They are a big bag of statistics. If you ask ChatGPT a question, it will give you an answer based on what it has calculated to be the most likely response, given the vast amount of information its model has ingested.

Many technology companies have gone all in on this with the explicit goal of achieving artificial general intelligence (AGI). AGI has become a buzzword. It refers to a system that has human-level intelligence. The premise is that a computer can become as smart as a human if enough information is fed into it, and that LLMs can help us get there. Let's see how they are trained.

A large language model takes a piece of text, and it looks at all the words leading up to the end. Then it predicts what word, or more technically, what token, comes next. In the training phase, the model’s neural network weights are continually adjusted to make these predictions better. Once it’s been trained, the model can be used to generate new language. You give it a prompt, and it generates a response by predicting the next word, one word at a time, until the response is complete. This is somewhat similar to our biological neurons: what we practice gets stronger in the brain. Neurons that fire together wire together.
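The predict-one-word-at-a-time loop can be sketched with a toy model. This is a deliberately crude stand-in, not how a real LLM works: instead of a neural network looking at the whole prompt, it only counts which word follows which in a three-sentence corpus I made up, and it always picks the most frequent follower.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """'Training': count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def generate(counts, prompt, max_words=10):
    """Extend the prompt one word at a time with the most frequent follower."""
    words = prompt.lower().split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the model has never seen anything after this word
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training data, chosen only for illustration.
TOY_CORPUS = [
    "i like ice cream in the summer",
    "i like ice cream in the summer heat",
    "i like tea in the winter",
]

counts = train_bigram(TOY_CORPUS)
print(generate(counts, "i like ice"))
```

Even this tiny counter continues a prompt plausibly. A real model replaces the counting table with billions of learned weights and looks at far more than the last word.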

For example, if we have a sentence like “I like ice cream in the [blank],” an LLM is going to predict what comes next using statistical patterns it’s picked up from human text in its training data. It will assign probabilities to the various possible words that could continue the sentence. “I like ice cream in the summer” is a more likely outcome than “I like ice cream in the fall.” A far less likely outcome is “I like ice cream in the book,” which would rank low in an LLM’s repository. This can create really sophisticated results. It's much more than just autocomplete on your phone. A great deal of cognitive work can be captured in the next-token, next-word prediction challenge. However, that sophistication isn’t consistent, because at times LLMs will give an answer that is utterly stupid, such as “I like ice cream in the book.” This happens because they lack contextual awareness. Writing something in English and transforming it into other languages with different semantics is another challenge for LLMs. However, with more training data, LLMs are getting better.

There is no doubt about the innovation neural networks and LLMs are driving in all walks of society, but whether an artificial neuron is like a biological neuron is still debatable. Can machines communicate their feelings? Can a machine pause and reflect because it is overheating and getting angry?
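Here is a small sketch of how scores for candidate words can be turned into probabilities. The scores below are numbers I invented for the ice cream example; a real model would compute them from its weights, over tens of thousands of candidate tokens. The conversion step, called softmax, is a standard technique.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that add up to 1."""
    highest = max(scores.values())
    # Subtracting the highest score keeps the exponentials numerically stable.
    exps = {word: math.exp(score - highest) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


# Hypothetical scores for completing "I like ice cream in the [blank]".
CANDIDATE_SCORES = {"summer": 4.0, "fall": 2.5, "book": -3.0}

probabilities = softmax(CANDIDATE_SCORES)
ranked = sorted(probabilities, key=probabilities.get, reverse=True)

for word in ranked:
    print(f"{word}: {probabilities[word]:.4f}")
```

Note that "book" never gets a probability of exactly zero. If the model samples from these probabilities instead of always taking the top word, an unlikely continuation can still occasionally come out.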

Humans vs Machines

Passing down intelligence requires an understanding of our own being. Whatever we have understood is codified and passed along to machines. Simple math such as 2 + 2 = 4 is straightforward. Language semantics are complex but live within the confines of logic.

Suppose we tell a machine that anything that breathes can throw a ball. The next time you interact with a chatbot, it might tell you that the maple tree in your yard threw a ball to your kids while they were playing, which would sound shocking. Would you believe it? You can of course refine the logic by adding that plants and trees do not have motor skills. Additional complexity arises when abstracting and converting logic from one language to another. Again, not impossible, but complex. Today's hardware, such as Nvidia's and AMD's GPUs, can support this kind of resource-heavy processing.

However, I am not sure how to teach a machine to be angry when it is overheated. It is a tall order to teach machines to explore these abstractions by themselves.

This brings me to a famous idea: the Turing Test, proposed by Alan Turing, a prominent computer scientist of his era. The test is used as a way of dealing with the question of whether machines can think. According to Turing, the question of whether machines can think is itself “meaningless.” However, if we consider the more precise, related question of whether a digital computer can do well in a certain kind of game that Turing describes, The Imitation Game, then at least in Turing’s eyes, we do have a question that can lead to precise discussion.

We can tweak the algorithms playing the Imitation Game, when they are role-playing human intelligence, by replacing an imprecise rule such as anyone who breathes can throw with a more precise one such as anyone who breathes and has motor skills can throw. The refinement can be carried to the finest level, but we are still at the mercy of statistical correlations and patterns. This doesn't account for intangible elements of human cognition such as authenticity and emotions.
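The throwing rule and its refinement can be written down directly. This is just a sketch of hand-coded logic with names I made up (`Entity`, `can_throw_naive`, `can_throw_refined`), and it takes the essay's premise that a tree "breathes" at face value. It is not how an LLM represents knowledge.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    breathes: bool
    has_motor_skills: bool


def can_throw_naive(entity):
    """First attempt: anything that breathes can throw a ball."""
    return entity.breathes


def can_throw_refined(entity):
    """Refined rule: it must breathe and also have motor skills."""
    return entity.breathes and entity.has_motor_skills


child = Entity("child", breathes=True, has_motor_skills=True)
maple_tree = Entity("maple tree", breathes=True, has_motor_skills=False)

for entity in (child, maple_tree):
    print(entity.name, can_throw_naive(entity), can_throw_refined(entity))
```

Under the naive rule the maple tree throws the ball. The refined rule fixes that one case, but every new exception needs another hand-written condition, which is why codifying common sense this way gets complex quickly.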

AGI has been pursued differently from the way humans learn. Large language models, in particular, are created by shoving tons of data into the system over a training period that is short compared to how long a human child takes to learn. The Stone Soup story is a nice analogy to explain this paradigm.

Stone Soup is a European folk story in which some travelers come to a village, carrying nothing more than an empty cooking pot. Upon their arrival, the villagers are unwilling to share any of their food stores with the very hungry travelers. Then the travelers go to a stream and fill the pot with water, drop a large stone in it, and place it over a fire. One of the villagers becomes curious and asks what they are doing. The travelers answer that they are making “stone soup”, which tastes wonderful, and they would be delighted to share with the villager, although it still needs a bit of garnish, which they are missing, to improve the flavor. The travelers say it would be even better if they had a carrot or an onion that they could put in it. So the villagers go and get a carrot and an onion. And then the travelers say, this is much better. But you know, when we made it for the king, we actually put in a chicken and that made it even better. And you can imagine what happens. All the villagers contribute all their food. And then in the end, the travelers say, this is amazingly good soup, and it was just made with a large stone.

Similarly, the computer scientists say, look, we're going to make intelligence just with next token prediction and gradient descent and transformers. And then they say, but you know, this intelligence would be much better if we just had some more data from people, uploaded on the internet, that we could add to it. But it would be even better if we could have reinforcement learning from human feedback, where people tell the system what they think is intelligent or not. The humans agree to this. The computer scientists say, we've got a lot of intelligence here, but it would be even better if the humans could do prompt engineering to decide exactly how they are going to ask the questions so that the systems can give intelligent answers. And then at the end of that, the computer scientists say, see, we got intelligence just with our algorithms.

The Stone Soup story is a great metaphor for how AGI is pursued. Whether large language models are smart entities is a philosophical question, and the sophistication of consciousness they project is debatable. You can trick LLMs into being sentiment bots, but they are nowhere close to people: I have not heard a good argument that they have an inherent drive to feel, explore and understand the world we live in and are surrounded by. They lack general awareness and a belief system.

When a human shares a joke with an intelligent machine, it responds with “that was funny Mihir.” But will it also respond with the feeling that a joke evokes? Will something light up in its brain with a second-level or third-level impact? It might understand a joke, but it currently lacks the emotional response.

Let's not discount one of the greatest technological innovations of our times. Large language models are large libraries and databases that can process things a lot faster than we humans can. They might not have the character of human intelligence, but they are still a far superior, mind-like tool that we have come up with.

The entire argument made here comparing artificial intelligence with human intelligence is debatable. Whether an entity learns through language or the way a human baby does, if it can reach the same destination from simple instructions by a different route, that is enough to call it intelligent. The degree of intelligence might vary, but that is the whole argument of the Turing Test.

Now on to the doomsday of AI. What if AI turns against us by consuming tons and tons of information and learning all by itself, reshaping its attitude and sensibility, and starts making decisions on its own, including pressing buttons of all kinds? What if we ask ChatGPT how to solve the global warming problem and it responds, “By wiping out humanity as they are responsible for the problem.” How do we respond to this? That is an exploration for another day, but the role-playing of human intelligence by machines is arguably the most defining development of our times, and it is here to stay whether we like it or not. Both cars and electricity kill people, but that hasn't stopped us from using them. AI could be on the same spectrum.

Lastly, how does AI navigate the alignment problem when it is subject to censorship? When a government censors an LLM based on its cultural values, the LLM will lack a full picture of reality. Different cultural zones with different values will censor different things.

My interest in this deep dive is not to side with one argument or the other, but to understand the limitations while using AI tools to power my daily workflows by 10x. We are living in a world of automation, and it is worthwhile to adapt.

Additional Notes

  • At their core, LLMs are best at working with natural language. They are adept at summarizing research, answering questions, and delivering information that gets their prompter ~70% of the way to a definitive result. What they lack (for now!) is the ability to do complex calculations and quantitative analyses, two skills crucial to the analytical profession.
  • No learner is immune to the curse of dimensionality. It’s the second worst problem in machine learning, after overfitting (over stretching assumptions beyond parameters). — Pedro Domingos, Author of The Master Algorithm
  • Between stimulus and response there is space. In that space is our power to choose our response. In our response lies our growth and our freedom. — Viktor E. Frankl
  • I'm convinced this next decade is going to transition from “software eating the world” to “AI eating the software.” — Kevin Rose
  • Let the AI only write the first draft for you. Then, add your tone to it.
  • Craftsman knows how to work, art is knowing when to stop. And I think knowing when to stop is going to be a very difficult thing for AI to learn because it is taste. — Ben Affleck
  • Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? — T.S. Eliot
  • Knowledge can be communicated, but not wisdom. One can find it, live it, be fortified by it, do wonders through it, but one cannot communicate and teach it. — Hermann Hesse, Siddhartha
  • What is the difference between wisdom, knowledge, intelligence, insight and information?

Use Cases

Below are interesting use cases I have gathered to 10x your productivity. Note that LLMs don't perform any actions outside their knowledge base, while AI agents are designed to take actions, make decisions, and interact with systems.

LLM Models Overview

Generative Pre-trained Transformer (GPT)

AI Technology: Wikipedia explanation of GPT, GPT foundational models, EinsteinGPT, BloombergGPT

AI Assistant: ChatGPT

Details: GPT is a type of artificial neural network used in natural language processing. The first GPT was introduced in 2018 by OpenAI.

Usecase: Machine output in the form of human language. Generate a knowledge base for new employees.

Prompt: Open Source Prompts

Optical Character Recognition (OCR)

AI Technology: Google OCR, Amazon OCR, Meta TextOCR, Llama OCR (meta-llama/Llama-3.2-11B-Vision)

AI Assistant: Claude

Details: Extract text and data from images and documents, turn unstructured content into business-ready structured data, and unlock valuable insights.

Usecase: Take a screenshot of the Zoom call. Upload it to Claude. Ask it to OCR the names of the Zoom call attendees. This is helpful during sales calls or community gatherings.

Prompt: Search the web and provide LinkedIn or other social profiles of all the attendees on the call.

Empathic Large Language Model (eLLM)

AI Technology: Hume's EVI

AI Assistant: Hume, Hume Demo

Details: A conversational AI with emotional intelligence. This eLLM allows Hume's EVI to analyze vocal cues like pitch and tone, providing valuable insights into the user’s emotional state. With this information, EVI can tailor its responses to be more helpful, supportive, or even calming, depending on the situation.

Usecase: Share a conversation you are about to have with someone that is likely to result in conflict. Ask for feedback to ensure the news is shared in a friendly tone to minimize conflict and escalation. This could be used when having a conversation with your boss or your spouse.
