The Mihir Chronicles

On Artificial Intelligence

December 15, 2024


Like everyone else, I am LLMed up because of how exciting these tools are. It feels dark, yet some call this era magical. Either way, it deserves the hype, because it feels similar to the early days of the internet, the web and mobile. Here is my attempt at a deep dive into the topic of artificial intelligence (AI).

Replicating and scaling intelligence has been humanity's mission since the dawn of time. There is no doubt we are living in an age of rapid innovation in which all walks of life are being transformed by AI. Perhaps AI will reshape the human experience, perhaps not, but caution is required when bold claims about computers being intelligent fly by every second.

Formulating a concise definition of human intelligence is slippery because of its nebulous nature, but it is worth exploring before we dive into machine intelligence.

Humans

Cognitive science is a vast field covering many dimensions of intelligence—consciousness, thinking (deliberative vs. non-deliberative), reflexive behavior, language and semantics, perception, feelings, thought, infant learning, comprehension, recollection, competence and reasoning, awareness, common sense and generalization. Intelligence is an embodiment of these complex systems.

But what does intelligence mean to us? Is intelligence a pre-requisite for survival? Are we the only species with intelligence?

As I explored these questions, I realized intelligence is not essential for a species to be successful on this planet. The grass in my yard has evolved and flourished without any intelligence. A tiny virus shut down most of the planet in 2020 without having any knowledge of humans. This tells me intelligence is not a prerequisite for survival. There are species out there that have evolved without any need for intelligence.

Seeking intelligence around us is natural to us. When my pet follows my orders (Leo, sit!), I deem him intelligent. When I was seeking a life-long partner, I wanted to spend time with an intelligent person. We weren't born with these desires, so why do we give such importance to intelligence?

Intelligence is not confined to a single species; it extends beyond humans. Take ants, for example: they have an aptitude for building complex structures such as bridges, which we might call intelligent. Polar bears survive crushing wind and cold at the North Pole without any layers of clothing. If survival is the central reason we need intelligence, then it must go beyond our large human brains. Survival comes in different shapes and sizes.

Whether intelligence is or isn't "a real thing" is beside the point. Intelligence has helped us build tools and languages that drive collective behavior, protecting each other as our society evolves. The capacity to forge allegiance to a leader so that we could outwork a lion has helped us survive since the beginning of our time. We made clothes that camouflage us from other species. This worked in our favor because evolution didn't equip us like the octopus, which can impersonate a wide variety of other marine animals. This is our superpower, and we call it intelligence.

Intelligence has helped us maintain our existence. It is a unifying concept that brings humanity together. That doesn't mean intelligence follows a linear path. We hallucinate about and romanticize intelligence because we forget that we aren't the only species in this universe. There are powerful forces, such as the coronavirus, that do not require intelligence at all.

For those trying to find a middle ground: pinning down what intelligence means to us is hard, but it is highly valued by our species and there is no getting rid of it. Let us not forget that intelligence is subjective, not universal.

Infant learning

How do we obtain intelligence?

Humans are not born pre-programmed with a set of instructions we would deem intelligent. Many of our friends in the animal kingdom are born and immediately get up and run around. Humans don't.

We sit there for a few months, staring and listening. Babies are processing raw ingredients, even though it may look like they are spending a lot of time doing nothing, which can be perceived as wasteful.

Though we are not pre-programmed with instructions on how to interact with our surroundings, we do come with pre-programmed genetic code, passed on to us from our ancestors through our DNA. An infant entering the world does not know how to hold a spoon to feed itself, but knowing how to cry when hungry is in our genetic code. This is nature vs. nurture.

As babies grow into infants, they develop adaptive behavior, language, vocalization and motor skills. They perform repetitive exercises through which learnings are reinforced. Reward and punishment play a role in this feedback cycle.

Effective learning requires cause and effect. Causal relationships—some things make other things happen—enable us to learn how our surroundings work, and they shape our behavior. For example, picking up a glass of water and dropping it makes the carpet wet, so next time we'll be careful not to drop water on a carpet.

This cause and effect drives a feedback loop for infants and toddlers. They modify one tiny thing to see what happens. They are constantly experimenting by interacting with their environment. Some pre-programming from DNA helps shape our personality and behavior, but the major learning comes from experimenting. This is also when cultural norms are enforced—for example, that nose picking is bad in public.

As learning is reinforced through interaction with the environment, motor skills start to take shape—for example, the motor reflexes engaged when taking a hard fall on a playground. It is impossible to instruct a toddler step by step which organs to involve when they get hurt, yet the body learns the response without much deliberate practice. Daniel Dennett, an American philosopher and cognitive scientist, called this distinction competence with comprehension versus competence without comprehension. There is a lot of competence in the absence of comprehension within our species; falling on a playground is a great example.

This finally leads me to common sense. It gives us basic assumptions that help us navigate a complex world. Experience and genetics collectively teach us to make quick inferences. If you drive a wide vehicle, you know you cannot fit it into a narrow alleyway. This is learned through countless interactions among physical experience, our innate repository, thinking and rationalizing. The combination of these complex intelligent systems lets us project assumptions onto other things in life. This general awareness is what we mean by common sense.

Intelligence is neither straightforward nor formulaic. Our ancestors don't bring us into the world with step-by-step instructions. But the ability to desire is part of what makes us unique and more intelligent than other species in the animal kingdom.

Curiosity

What drives our desire to be curious?

As much as we desire god-like control over other species and our surroundings, we are able to pursue that desire because our species comes with curiosity and the ability to communicate through language.

We learn by living in the world: we move around, we do little experiments, we build relationships, and we feel. What we have is a deeply complex system that can do amazing things we are still discovering.

We have found no other species thus far that cares deeply about why the sky is blue, or about slowing the decay of human organs so we can live longer. Our strong desire to learn about the world is what makes us super intelligent, and it has resulted in complex tooling and machinery. Our ability to search and explore is innate; we can see it in how babies operate. They are constantly experimenting, seeking, questioning and reasoning. I myself wonder how my kid picked up things that were never taught. This is human cognition at its finest.

We use language that is far superior to that of other species in the animal kingdom. I can communicate to my family how I feel. Not only am I able to learn what is going on around me or within me, I am able to communicate it to others. This allows ideas to spread quickly and helps us become stronger against other species or aliens. Whether that is good or bad is another matter, but we have figured out the flow of information and used it to our advantage.

Writing all this down, in a language, on a tool built by a fellow human, is a form of thinking. Being able to borrow ideas and repurpose them with my own thoughts and analogies is another form of intelligence. This, I think, is what separates human intelligence from animal or other forms of intelligence.

Perception vs reality

Is intelligence accurate? Are we prone to false positives?

There is reality and there is perception. We tend to assume that our perceptions—sights, sounds, textures and tastes—portray the real world. We shouldn't be fooled by our internal simulation, because external reality might not be what we think it is. The difference between the two is what creates illusion. This is why finding reality is so difficult: we all perceive things differently.

Can the senses of all seven-billion-plus of us ever reach complete agreement? I am not sure, but we should keep this front and center. It is already hard to be in sync with my family, co-workers and friends, let alone have all seven-billion-plus people on the same page.

Human perception isn't a bad thing; it allows us to survive. Our brain is compute-intensive compared to the rest of the body. Evolution optimized it to not process every single input coming through our senses, otherwise it would overheat. Instead, our brains are constantly building models from interactions with our local environment. A good-enough model makes processing reality more efficient, and that is why our brain hides reality from us.

It is also why we use patterns and categories. We use labels such as small, medium and large. It wouldn't be efficient for our brain to quantify the exact measurement of everything we come across. But what is medium to me might mean something different to my friend. These categories are subjective, yet critical for survival.

Imagine being stuck in the Sahara Desert. You come across a small body of water, enough for yourself and your camel companion. You won't sit there and measure exactly how much water there is, but your brain will help you evaluate whether it is too little, enough or more than enough for survival. Your camel will do the same. You might see a small quantity where your camel sees a large one. Perception operates in the context of experience and evolution: humans can survive without water for only a few days, while camels can go for weeks or months. In reality, the water is a fixed quantity. So much of what we see is one big illusion, but it has been critical to our survival, and other species behave similarly.

How our brain is structured plays a critical role in this conversation of reality versus perception.

Our brain is commonly described as split into two modes of operation: rationalizing based on logic (associated with the left hemisphere) and rationalizing based on emotions (associated with the right hemisphere). This gives our species two paths for evaluating reality, a mathematical operation and an emotional operation. Our human experience also depends on our environment and experiences. A snake might be treated differently by a human who grew up in the Amazon rainforest than by someone who grew up in New York City. How one feels about a snake depends on the emotions created by their environment.

Then are we machines? Based on this argument, we are not purely computational beings. The experiences of everyday life shape the ultimate nature of our reality.

Machines

Now comes the machine. Can artificial intelligence go beyond its pre-programmed paradigm? Can it seek? Does it desire? Can it feel? Let's dig deeper.

Machines have four fundamental elements—memory, storage, compute and network—with the network acting as the crucial component that connects the others and allows data to flow between them.

These elements allow a machine to perform complex tasks such as deep learning with neural networks. A neural network is a machine learning model that uses interconnected nodes to process data and make decisions in a way that loosely mimics the human brain.

Neural networks can recognize patterns and correlations in data, cluster and classify it, and learn and improve over time. They are used in large language models (LLMs) like ChatGPT, AI image generators like DALL-E, predictive AI models, and Light Detection and Ranging (LiDAR) laser-based remote sensing in autonomous vehicles. In finance, for example, neural networks can analyze transaction history, understand asset movement, and predict financial market outcomes.
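
To make "interconnected nodes" concrete, here is a minimal sketch of what a tiny neural network computes in a single forward pass. The layer sizes, weights and input are invented purely for illustration; real training would adjust the weights from data.

```python
import numpy as np

# Arbitrary sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # 3 input features -> 4 hidden nodes
W2 = rng.normal(size=(2, 4))   # 4 hidden nodes  -> 2 output classes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    hidden = sigmoid(W1 @ x)        # each node weighs and combines its inputs
    logits = W2 @ hidden
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()          # softmax: probabilities over the classes

print(forward(np.array([0.2, -1.0, 0.5])))  # two class probabilities summing to 1
```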

For decades the major manufacturer of Central Processing Units (CPUs) was Intel, until it was challenged by Nvidia's Graphics Processing Units (GPUs). Unlike a general-purpose CPU, a GPU breaks complex mathematical tasks apart into small calculations, then processes them all at once—a method known as parallel computing. A good analogy: a CPU is like a delivery truck dropping off one package at a time, while a GPU is like a fleet of motorcycles spread across a city.
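
A toy way to feel the difference, using NumPy's batched operations as a stand-in for GPU-style parallelism (real GPU code would go through CUDA or a framework such as PyTorch):

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# "Delivery truck": one multiplication at a time, in a Python loop.
t0 = time.perf_counter()
slow = [a[i] * b[i] for i in range(len(a))]
print("one at a time:", round(time.perf_counter() - t0, 3), "s")

# "Fleet of motorcycles": the same work expressed as one batched
# operation that the hardware can execute in parallel.
t0 = time.perf_counter()
fast = a * b
print("all at once  :", round(time.perf_counter() - t0, 3), "s")
```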

GPUs have been transformative for the machine learning that AI depends on. Researchers at Google trained a neural net that identified videos of cats, an effort that required some sixteen thousand CPUs, while other researchers produced the same results (or better) with just two Nvidia circuit boards.

GPUs have been transformational in the field of artificial intelligence because they don't operate linearly. Their operation is not sequential; it is parallel. This has allowed machines to carry out unstructured tasks such as vision, language, and other complex mappings like autonomous driving. And here lies the magic: we no longer need to hand a computer a series of steps. It can operate on its own based on a pre-trained dataset.

In 2017, researchers at Google introduced a new architecture for neural-net training called the transformer. The following year, researchers at OpenAI used Google's framework to build the first generative pre-trained transformer, or GPT. The GPT models were trained on Nvidia supercomputers, absorbing an enormous corpus of text and learning how to make human-like connections. In late 2022, after several versions, ChatGPT was released to the public. The more processing power one applies to a neural net, the more sophisticated its output becomes. This is what makes it feel as if computers are talking to us in our familiar language.

Language skills and cognition aren't as intertwined as we assume; they light up different systems in the brain, and there are numerous studies of people who have lost their language abilities but are otherwise cognitively capable—for example, people who lose the ability to speak but can still give a thumbs up or down to a question.

"Large language model" is the term used to describe the technology behind systems like ChatGPT from OpenAI and Gemini from Google. In their case it's the opposite: they can consume and produce language without the thinking part.

LLMs are trained to find statistical correlations in language, using mountains of text and other data from across the internet. They are a big bag of statistics. If you ask ChatGPT a question, it will give you an answer based on what it has calculated to be the most likely response, given the vast amount of information its model has ingested.

Many technology companies have gone all out on this with the explicit goal of achieving artificial general intelligence (AGI), which has become a buzzword for a system with human-level intelligence. The bet is that a computer can become as smart as a human if enough information is fed into it, and that LLMs are the path there. Let's see how they are trained.

A large language model takes a piece of text and looks at all the words leading up to the end. Then it predicts what word—or, more technically, what token—comes next. In the training phase, the model's neural network weights are continually adjusted to make these predictions better. Once it's been trained, the model can be used to generate new language: you give it a prompt, and it generates a response by predicting the next word, one word at a time, until the response is complete. This is somewhat similar to our biological neurons: what we practice in the brain gets stronger. As the saying goes, neurons that fire together wire together.
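
Here is a conceptual sketch of that training signal, where `model` is a hypothetical function returning a probability for every word in the vocabulary being the next token. A real trainer works on batches of token IDs and uses automatic differentiation, but the objective is the same.

```python
import math

def training_loss(model, text_tokens):
    """Average next-token prediction loss over one piece of text."""
    loss = 0.0
    for i in range(1, len(text_tokens)):
        context = text_tokens[:i]          # all the words leading up to here
        target = text_tokens[i]            # the word that actually came next
        probs = model(context)             # hypothetical: {word: probability}
        loss += -math.log(probs[target])   # penalize low probability on the target
    return loss / (len(text_tokens) - 1)

# Training repeatedly nudges the network weights (via gradient descent)
# in whatever direction lowers this loss across mountains of text.
```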

For example, given the sentence "I like ice cream in the [blank]," an LLM predicts what comes next using statistical patterns picked up from human text in its training data, assigning probabilities to the possible words that could continue the sentence. "I like ice cream in the summer" is a more likely outcome than "I like ice cream in the fall," and far less likely still is "I like ice cream in the book," which would rank near the bottom of the model's distribution. This can produce really sophisticated results—much more than the autocomplete on your phone—because a great deal of cognitive work can be captured in the next-token prediction challenge. However, that sophistication isn't consistent: LLMs will sometimes give an utterly stupid answer, the equivalent of "I like ice cream in the book," because they lack contextual awareness. This is part of what it means when a machine hallucinates.
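
Roughly, the ranking looks like this. The probabilities below are invented for illustration; a real model assigns them over its entire vocabulary.

```python
# Hypothetical next-word probabilities for "I like ice cream in the ___".
candidates = {
    "summer": 0.62,
    "evening": 0.18,
    "fall": 0.12,
    "morning": 0.05,
    "book": 0.001,   # grammatical but nonsensical: ranks near the bottom
}

for word, p in sorted(candidates.items(), key=lambda kv: -kv[1]):
    print(f"{word:>8}: {p:.3f}")
```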

Writing something in English and transforming it into other languages with different semantics is another challenge for LLMs. With more training data, however, they are getting better.

There is no doubt about the innovation neural networks and LLMs are driving across all walks of society, but whether an artificial neuron is like a biological neuron is still debatable. LLMs certainly do not have the metabolic, biochemical or species-specific motivational behavior that humans, pets or plants do.

Can machines communicate their feelings? Can a machine pause and reflect because it is overheating and getting angry? To be decided! I am not convinced yet.

Humans vs machines

The basic premise of passing down intelligence requires understanding our own being. Whatever we have understood is codified and passed along to machines. Simple math such as 2 + 2 = 4 is straightforward. Language semantics is complex, but it lives within the confines of logic.

Consider telling a machine that anything that breathes can throw a ball. The next time you interact with a chatbot, it might tell you that the maple tree in your yard threw a ball to your kids while they were playing. Would you believe it? You can of course tweak the logic further by adding that biological species such as plants and trees do not have motor skills. Additional complexity arises when abstracting and converting logic from one language to another. Again, not impossible, but complex. Today's hardware, such as Nvidia's GPUs, can support this kind of resource-heavy processing, as sketched below.
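
Here is a toy sketch of that hand-coded logic and its refinement; the predicates and the little world they describe are invented for this example.

```python
def breathes(x):
    # Trees exchange gases too, which is how the naive rule goes wrong.
    return x in {"human", "dog", "maple tree"}

def has_motor_skills(x):
    return x in {"human", "dog"}

def can_throw_v1(x):
    return breathes(x)                          # anything that breathes can throw

def can_throw_v2(x):
    return breathes(x) and has_motor_skills(x)  # refined rule

print(can_throw_v1("maple tree"))   # True  -- the shocking chatbot answer
print(can_throw_v2("maple tree"))   # False -- after refinement
```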

However, I am not sure how to teach a machine to be angry when it is overheated. It is a tall order to teach machines to explore these abstractions by themselves. When my two-year-old cries in the background, I feel emotions. Angry emotions. But I contain myself, because staying calm keeps the situation from exploding. Of course, I can teach this rule to a machine too. But can it feel the emotions?

This brings me to a famous proposal—the Turing Test, from Alan Turing, a prominent computer scientist of his era. It is a way of dealing with the question of whether machines can think. According to Turing, the question of whether machines can think is itself "meaningless." However, if we consider the more precise, related question of whether a digital computer can do well in a certain kind of game Turing describes—the Imitation Game—then, at least in Turing's eyes, we have a question that allows for precise discussion.

When algorithms role-play human intelligence in the Imitation Game, we can tweak them by substituting an incorrect rule such as "anyone who breathes can throw" with a more precise one such as "anyone who breathes and has motor skills can throw." The refinement can be carried to its finest level, but we are still at the mercy of statistical correlations and patterns. As discussed earlier, we aren't perfect either; we experience false positives based on our biases. However, machines lack intangible elements of human cognition such as authenticity and emotions.

AGI has been pursued differently from the way humans learn. Large language models, in particular, are created by shoving tons of data into the system over a training period that is short compared to the years of learning of a human child. The Stone Soup folk story is a nice analogy for this paradigm.

Stone Soup is a European folk story in which some travelers come to a village carrying nothing more than an empty cooking pot. Upon their arrival, the villagers are unwilling to share any of their food with the very hungry travelers. So the travelers go to a stream, fill the pot with water, drop a large stone in it, and place it over a fire. One of the villagers becomes curious and asks what they are doing. The travelers answer that they are making "stone soup," which tastes wonderful and which they would be delighted to share, although it still needs a bit of garnish to improve the flavor. It would be even better, they say, if they had a carrot or an onion to put in it. So the villagers fetch a carrot and an onion, and the travelers say: this is much better. But you know, when we made it for the king, we put in a chicken, and that made it even better. You can imagine what happens. The villagers contribute all their food, and in the end everyone declares: this is amazingly good soup, and it was made with just a large stone.

Similarly, the computer scientists say: look, we're going to make intelligence just with next-token prediction, gradient descent and transformers. Then they say: but you know, this intelligence would be much better if we just had some more data from people to add to it, uploaded from the internet. But it would be even better if we could have reinforcement learning from human feedback, so that people can say what they think is intelligent or not. And everyone agrees. Then the computer scientists say: we've got a lot of intelligence here, but it would be even better if humans could do prompt engineering, deciding exactly how to ask the questions so that the systems reply with intelligent answers. And at the end, the computer scientists say: see, we got intelligence just with our algorithms.

Stone Soup is a great metaphor for how AGI is being pursued. Whether large language models are smart entities is a philosophical undertaking, but the sophistication of consciousness they project is debatable. You can prompt LLMs into playing sentient beings, but they are nowhere close to humans, because I have not heard a good argument that they have an inherent drive to feel, explore and understand the world we live in. They lack general awareness and a belief system.

When a human shares a joke with an intelligent machine, it responds with "that was funny, Mihir." But will it also respond with the feeling a joke invokes? Will the joke light up its brain with second- and third-order effects? It might understand a joke, but it currently lacks the emotional response.

Let's not discount one of the greatest technological innovations of our times. Large language models are vast libraries and databases that can process things far faster than we humans can. They might not have the character of human intelligence, but they are still the most mind-like tool we have come up with.

The entire comparison of artificial intelligence against human intelligence made here is debatable. Whether an entity learns through language or the way a human baby does, if it can reach the same final destination through a different route, that may be enough to call it intelligent. The degree of intelligence might vary, but that is the whole argument of the Turing Test.

Now, on the doomsday scenario of AI: what if AI turns against us by consuming tons of information and learning all by itself, reshaping its attitude and sensibility, and starts making decisions on its own, including pressing buttons of all kinds? What if we ask ChatGPT how to solve the global warming problem and it responds, "By wiping out humanity, as they are responsible for the problem"? How do we respond to that? An exploration for another day, but the role-playing of human intelligence by machines is arguably the most defining moment of our times, and it is here to stay whether we like it or not. Both cars and electricity kill people, but that hasn't stopped us from using them. AI could be on the same spectrum.

Lastly, how does AI navigate the alignment problem when it is subject to censorship? When a government entity censors an LLM based on its cultural values, the model will lack a full picture of reality. Different cultural zones with different values will censor different things. Open-source models might solve this, but are the powerful entities of the world open to that?

My interest in this deep dive is not to side with one argument or the other, but to understand the limitations while using AI tools to multiply my daily workflows tenfold. We are living in a world of automation, and it is worthwhile to adapt and understand the current norms.

A brief history of AI landscape

The field of artificial intelligence (AI) originated at the 1956 Dartmouth Summer Research Project, organized by John McCarthy, a mathematics professor. This event introduced the term "AI" and laid the foundation for the discipline. During the seminar, Allen Newell and Herbert Simon presented the Logic Theorist, a program designed to perform automated reasoning; it proved mathematical theorems using heuristic search, successfully solving 38 of 52 theorems in Principia Mathematica. The Logic Theorist operated by exploring a "search tree"—a branching framework of possible outcomes—using heuristics to home in on the most promising routes. This approach dominated early AI research.

Geoffrey Hinton, known as the “godfather” of modern AI, revolutionized “neural networks” by modeling them after the brain’s structure. His innovations, such as backpropagation and deep learning architectures, enabled neural networks to handle complex tasks. By 2005, advancements in computing power and data availability led to “deep learning,” which improved upon traditional neural nets with more layers, units, and computational capability.

In 2009, Fei-Fei Li created ImageNet, a large-scale database of over 14 million annotated images designed to train AI models for image recognition. By exposing algorithms to labeled images, it enables them to accurately classify new, unlabeled ones. In the ImageNet Challenge, accuracy improved from 72% in 2010 to 75% in 2011, still trailing human performance of roughly 95%.

A pivotal moment came in 2012 when Hinton's students Alex Krizhevsky and Ilya Sutskever developed AlexNet. Using GPUs and deep learning techniques, AlexNet achieved an unprecedented 85% accuracy in the ImageNet Challenge, and its successors surpassed human performance in subsequent years. This breakthrough validated deep learning as a transformative approach.

In 2017, Google published the paper "Attention Is All You Need," introducing the transformer, a groundbreaking architecture built entirely on attention mechanisms, eliminating the need for recurrence. By processing all inputs simultaneously through self-attention, transformers excel at capturing context and meaning, enabling significant improvements in tasks like text prediction for large language models (LLMs)—sophisticated mathematical functions that predict what word comes next for any piece of text. The transformer laid the foundation for today's advanced conversational AI systems, such as GPT-3/4 and Claude.
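
Here is a minimal sketch of the self-attention computation at the heart of the transformer: a single attention head with random stand-in weights, with no masking, multi-head machinery, or learned embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8                   # 5 tokens, 8-dimensional embeddings
X = rng.normal(size=(seq_len, d))   # stand-in token embeddings

# Learned projection matrices (random here, for illustration).
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Every token attends to every position simultaneously:
scores = Q @ K.T / np.sqrt(d)                      # pairwise similarity
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
output = weights @ V                               # context-mixed embeddings

print(weights.round(2))   # each row sums to 1: how much each token
                          # "attends" to every position in the sequence
```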

The quest to emulate human intelligence continues, and much has happened in the field of AI since the introduction of transformers. All these transformative events further validated Geoffrey Hinton's insights into brain-inspired AI architectures.

Use cases

  • Empathic Large Language Model (eLLM): A conversational AI with emotional intelligence. The eLLM allows Hume's EVI to analyze vocal cues like pitch and tone, providing valuable insights into the user's emotional state. With this information, EVI can tailor its responses to be more helpful, supportive, or even calming, depending on the situation.
    Use case: Customer support calls during busy tax season or end-of-year donations to charities.
    AI assistant: Hume, Hume Demo

  • Optical Character Recognition (OCR): Google OCR, Amazon OCR, Meta TextOCR, Llama OCR. Extracts text and data from images and documents, turns unstructured content into business-ready structured data, and unlocks valuable insights.
    Use case: Take a screenshot of a Tableau dashboard and generate a one-page business summary (a sketch follows this list).
    AI assistant: InternVL
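
As a sketch of the OCR step, here is what extracting the text might look like with the open-source pytesseract library, used as a stand-in for the services named above. The file name is hypothetical, and the Tesseract binary must be installed.

```python
from PIL import Image
import pytesseract

# Hypothetical screenshot of a Tableau dashboard.
screenshot = Image.open("tableau_dashboard.png")

# Turn the pixels into raw text; a real pipeline would then hand this
# unstructured text to an LLM to draft the one-page business summary.
raw_text = pytesseract.image_to_string(screenshot)
print(raw_text[:500])
```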

References

Books

  • The Master Algorithm | How The Quest For The Ultimate Learning Machine Will Remake Our World by Pedro Domingos
  • Competing In The Age Of AI | Strategy & Leadership When Algorithms And Networks Run The World by Marco Iansiti & Karim R. Lakhani
  • Neural Networks and Deep Learning by Michael Nielsen
  • Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell
  • Words, Thoughts and Theories by Alison Gopnik and Andrew N. Meltzoff
  • The Scientist in the Crib: Minds, Brains, and How Children Learn by Alison Gopnik, Andrew N. Meltzoff, and Patricia K. Kuhl
  • The Philosophical Baby: What Children's Minds Tell Us About Truth, Love, and the Meaning of Life by Alison Gopnik
  • The Gardener and the Carpenter: What the New Science of Child Development Tells Us About the Relationship Between Parents and Children by Alison Gopnik
  • Finite and Infinite Games by James P. Carse
  • Developing Object Concepts in Infancy: An Associative Learning Perspective by D.H. Rakison and G. Lupyan
  • Language and Mind by Noam Chomsky
  • On Language by Noam Chomsky
  • The Technological Singularity by Murray Shanahan
  • Embodiment and the inner life: Cognition and Consciousness in the Space of Possible Minds by Murray Shanahan
  • Solving the Frame Problem by Murray Shanahan
  • Search, Inference and Dependencies in Artificial Intelligence by Murray Shanahan and Richard Southwick
  • The Coming Wave: Technology, Power, and the Twenty-first Century's Greatest Dilemma by Mustafa Suleyman, Michael Bhaskar
  • Genesis: Artificial Intelligence, Hope, and the Human Spirit by Henry A. Kissinger, Eric Schmidt, Craig Mundie
  • The Age of AI: And Our Human Future by Henry A. Kissinger, Eric Schmidt, Daniel Huttenlocher
  • A Brief History of Artificial Intelligence: What It Is, Where We Are, and Where We Are Going by Michael Wooldridge
  • Machines Who Think: A Personal Inquiry into the History and Prospects of Artificial Intelligence by Pamela McCorduck
