On Artificial Intelligence

March 21, 2025

Artificial Intelligence (AI) focuses on creating intelligent machines. Within AI, Machine Learning (ML) is a specialized area that emphasizes recognizing patterns in data and applying those patterns to make predictions or decisions for new inputs. Deep Learning, a subset of ML, works with unstructured data such as human language and images, utilizing neural networks modeled after the human brain. Large Language Models (LLMs), a type of Generative AI, are designed to process and generate human-like text.

Like everyone else, I am LLMed up because of how exciting these tools are. Despite the darkness, some call this era magical. It deserves the hype because it feels like the early days of the internet, web, and mobile. Here’s my attempt to take a deep dive into artificial intelligence.

Replicating and scaling intelligence has been humanity’s mission since time’s dawn. We live in an era of rapid innovation, where AI transforms all aspects of life. While AI may reshape human experience, caution is warranted when making bold claims about computer intelligence.

Formulating a concise definition of human intelligence is a slippery topic because of its nebulous nature, but worth exploring before discussing machine intelligence.

Humans

Cognitive science is a vast field with diverse intelligence concepts, including consciousness, thinking (deliberative vs. non-deliberative), reflexive behavior, language, perception, emotions, cognition, infant learning, comprehension, recollection, competence, reasoning, awareness, common sense, and generalization. Intelligence embodies these complex systems.

What does intelligence mean to us? Is intelligence a prerequisite for survival? Are we the only intelligent species?

Intelligence isn’t crucial for species’ success on Earth. My grass thrives without it. A tiny virus nearly wiped out the planet in 2020 without human knowledge. Therefore, knowledge isn’t a prerequisite for intelligence. Some species evolved without the need for intelligence..

Seeking intelligence around us is natural. When my pet follows my orders, I consider it intelligent. When I sought a lifelong partner, I wanted to spend time with an intelligent person. We weren't born with these desires, so why do we place such importance on intelligence?

Intelligence is not limited to single entity (humans). Ants build complex structures like bridge, and polar bears survive harsh conditions without clothing. If survival is the key to understanding intelligence, it must transcend human brains. Survival manifests in diverse forms.

Intelligence is a real thing that has helped us build tools and languages to protect ourselves and evolve. Our ability to form alliances and outwork predators has been crucial since our origins. Camouflage clothing has also been advantageous, as evolution didn’t support us like the octopus, which can mimic various marine animals. This is our superpower, which we call intelligence.

Intelligence has sustained our existence and united humanity. However, it’s not a linear path. We romanticize and hallucinate intelligence due to our belief in our uniqueness. But powerful forces like coronavirus exist without intelligence.

For those who are trying to find a middle-ground, pinning down what intelligence means is challenging but highly valued by our species. There is no getting rid of it. But let us not forget intelligence is subjective and not universal.

Infant learning

How do we obtain intelligence?

Humans aren’t born with pre-programmed instructions we consider intelligent. Many other animals are born and immediately start running around. Humans don’t. We sit there for a few months, staring and listening. Babies process raw ingredients, but it may seem like we’re wasting time.

Though we lack instructions for interacting with our surroundings, we possess pre-programmed genetic code inherited from our ancestors. While infants can’t feed themselves, they can cry when hungry. This is nature vs. nurture.

Babies develop adaptive behavior, language, vocalization, and motor skills through repetitive exercises that reinforce learning. Reward and punishment influence this feedback cycle.

For effective learning, cause-and-effect relationships are essential. These relationships help us understand how our surroundings work and shape our behavior. For instance, dropping water on a carpet makes it wet, prompting us to be more careful in the future.

This feedback loop cycle drives infants and toddlers’ experimentation. They modify one thing to see the result. DNA pre-programs personality and behavior, but major learning comes from interaction. Cultural norms are enforced during this process, like nose-picking in public.

As toddlers interact with their environment, their motor skills begin to develop. For instance, when they take a hard fall on a playground, their bodies naturally flex. It’s impossible to instruct a toddler exactly which organs to involve when they get hurt. However, the human body adapts without extensive practice. Daniel Dennett, an American philosopher and cognitive scientist, coined the terms “competence with comprehension” and “competence without comprehension” to describe this phenomenon. Our species demonstrates a remarkable amount of competence without fully comprehending the mechanisms behind it. A great example of this is the body’s ability to adapt to injuries like falling on a playground.

This finally leads me to commonsense, which provides us with fundamental assumptions that aid us in navigating this intricate world. Through collective experience and genetic predisposition, we acquire the ability to make quick inferences. For instance, if you have a larger vehicle, you understand that it won’t fit in a narrow alleyway. This is learned through numerous interactions between physical experiences, our innate repository of knowledge, thinking, and rationalization. The combination of these complex intelligent systems enables us to project assumptions onto other aspects of life. This general awareness is what we refer to as common sense.

Intelligence is neither simple nor predictable. Our ancestors didn’t provide us with step-by-step instructions to enter the world. However, the ability to desire sets us apart from other species in the animal kingdom and makes us intelligent.

Curiosity

What drives our desire to be curious?

As much as we desire for a god-like level of control over other species and our surroundings, we are fortunate enough to fulfill this desire due to the unique characteristics of the human species, which include curiosity and the remarkable ability to communicate through language.

We learn through our experiences in the world. We move around, conduct small experiments, build relationships, and experience emotions. This complex system, which we possess, has the ability to perform incredible things that we are continually discovering.

No other species we’ve discovered thus far exhibits such profound curiosity about the reasons behind the blue sky or the mechanisms that slow down the decay of human organs, enabling us to extend our lifespans. Our quenchless thirst for knowledge is the driving force behind our exceptional intelligence, which has led to the development of intricate tools and machinery. Our innate ability to search and explore is an inherent trait, as we understand from the behavior of infants. They are constantly experimenting, seeking, questioning, and reasoning. I can’t help but wonder how my child acquired knowledge that wasn’t explicitly taught to him. This is the pinnacle of human cognition.

We possess a language far superior to other animal species in the kingdom. I can express my emotions to my family. Not only can I comprehend the world around me and within me, but I can also communicate these insights to others. This facilitates the rapid dissemination of ideas, enabling us to enhance our resilience against other species or extraterrestrial entities. Whether this is advantageous or detrimental is a matter of perspective, but we have successfully deciphered the flow of information and have harnessed its potential to our benefit.

Writing all this down using a language created by a fellow human is a form of thinking. Another form of intelligence is the ability to borrow ideas and repurpose them with my own thoughts and analogies. This is what I believe distinguishes human intelligence from animal or other forms of intelligence.

Perception vs reality

Are intelligence tests accurate? Are we susceptible to false positives?

There’s reality and perception. We often assume that our perceptions—sights, sounds, textures, and tastes—represent the real world. However, we shouldn’t be deceived by internal simulations because external reality might not align with our perceptions. The distinction between the two is what creates illusions. This is why finding reality is challenging because our perceptions vary widely.

Can our senses, which are already stretched to the limit with the current population of over 7+ billion people, possibly reach a point of complete alignment? I’m not certain, but it’s crucial that we consider this possibility. It’s already challenging to maintain harmony within my immediate family, workplace, and social circle. Imagine the complexities of coordinating the thoughts and actions of an entire global population.

Human perception isn’t inherently negative because it’s essential for our survival. Our human brains are highly cognitively demanding compared to the rest of our bodies. Evolution optimized our brains to avoid processing every single input from our senses, as it would lead to overheating (like a machine overheating when it is overworked). Consequently, our brains continuously construct models based on our interactions with our immediate surroundings. An accurate model enhances our ability to process reality efficiently, which explains why our brains tend to distort or conceal it.

We use patterns and categories for efficiency. Labels like small, medium, and large help us categorize things without having to measure everything precisely. However, what may be medium to one person might be different to another. These categories are subjective but crucial for survival.

Imagine being stranded in the Sahara Desert. You stumble upon a small body of water that could sustain you and your camel companion. Instead of meticulously measuring its quantity, your brain instinctively assesses whether it’s sufficient for survival. Similarly, your camel friend employs a similar evaluation process. While you perceive a small amount, your camel friend might perceive it as a substantial quantity. This perception is shaped by our experiences and evolutionary history. Humans can endure weeks without water, while camels can survive for months. However, in reality, the water quantity remains fixed. Consequently, everything we perceive is an illusion, yet it has been pivotal to our survival. This phenomenon is not unique to humans; other species exhibit similar behaviors.

Our brain’s structure plays a crucial role in this discussion about the distinction between reality and perception.

Our brains are divided into two distinct operations: rationalizing based on logic (left hemisphere) and rationalizing based on emotions (right hemisphere). This division creates two distinct paths for the human species to evaluate reality: a mathematical approach and an emotional one. Our human experiences are also shaped by our environment and experiences. For instance, a snake might be treated differently by a person who grew up in the Amazon Rainforest compared to someone who grew up in New York City. The way one feels about a snake is entirely dependent on the emotions they experience due to their environment.

Based on this argument, we are not purely computational beings. Our everyday experiences shape the ultimate nature of our reality. Therefore, we are not machines.

Machines

On to the machines now. Can machines transcend their pre-programmed limitations and embark on a journey of self-discovery? Can they explore, thirst, or experience emotions? These profound questions demand our attention and exploration.

Machines are composed of four fundamental elements: memory, storage, computation, and networking. The networking component plays a pivotal role by connecting all these elements and facilitating data exchange between them.

This enables machines to perform intricate tasks like deep learning and neural network operations. A neural network is a machine learning algorithm comprising interconnected nodes that process data and make decisions, mimicking the human brain’s functionality based on pre-defined instructions.

Neural networks can identify patterns and connections in data, group and categorize it, and continuously learn and enhance their performance. They are employed in advanced language models such as ChatGPT, AI image generators like DALL-E, predictive AI models, and Light Detection And Ranging (LiDAR) laser-based remote sensing technology in autonomous vehicles. For instance, in the financial sector, neural networks can analyze transaction records, comprehend asset movements, and forecast financial market outcomes.

For decades, Intel dominated the Central Processing Unit (CPU) market until Nvidia challenged them with its Graphical Processing Units (GPUs). Unlike general-purpose CPUs, GPUs break down complex mathematical tasks into smaller calculations and process them simultaneously, a technique known as parallel computing. Imagine a delivery truck dropping off packages one by one, similar to a CPU, while a GPU is more like a fleet of motorcycles spread across a city.

GPUs have been transformative in the field of AI which machine learning depends on. Researchers at Google had trained a neural net that identified videos of cats, an effort that required some sixteen thousand CPUs While other researchers had produced the same results (or better) with just two Nvidia circuit boards.

GPUs have been transformational in the field of artificial intelligence because they don't operate linearly. In other words, it is not a sequential operation. Instead, they operate non-linearly because of parallel computing. This has allowed machines to carry out unstructured tasks such as vision, language, and other complex mapping such as autonomous driving. And here lies the magic. No longer we need to have series of steps for a computer. It can operate on its own based on pre-trained dataset and a set of new architectural patterns.

In 2017, researchers at Google unveiled a revolutionary architecture for neural network training called the transformer. The following year, researchers at OpenAI harnessed Google’s framework to construct the inaugural “generative pre-trained transformer,” aptly named the Generative Pre-Trained Transformer (GPT). These GPT models were trained on Nvidia supercomputers, absorbing an extensive corpus of text and acquiring the ability to establish human-like connections. In late 2022, after several iterations, ChatGPT was made accessible to the public. The greater the processing power allocated to a neural network, the more refined its output becomes. This phenomenon enables computers to perceive as if they are conversing with us in our native language.

Language skills and cognition aren’t interconnected because they activate different brain systems. Numerous studies have shown that individuals who lose their language abilities can still be cognitively capable. For instance, when people lose their ability to speak but can still give thumbs up or down to a question.

In the case of large language models, the opposite is true—they can consume and produce language without the involvement of their thinking capabilities. This concept is used to describe the technology behind systems like ChatGPT from OpenAI and Gemini from Google.

LLMs are trained to identify statistical correlations in language by analyzing vast amounts of text from various sources and other internet data. Essentially, they are a repository of statistical information. When you ask ChatGPT, it provides an answer based on the most probable response calculated based on the extensive information its model has been trained on.

Many technology companies have invested heavily in this endeavor to achieve a specific objective: developing artificial general intelligence (AGI). AGI has become a widely discussed concept. It refers to a system capable of exhibiting human-level intelligence. If all available information is inputted into such a system, it has the potential to match human cognitive abilities. LLMs (Large Language Models) hold great promise in facilitating the realization of AGI. Let’s delve into the intricacies of how they are trained.

A large language model analyzes a piece of text by examining all the words preceding the end. It then predicts the next word, or more precisely, the next token. During the training phase, the model’s neural network weights undergo continuous adjustments to enhance these predictions. Once trained, the model can be utilized to generate new language. By providing a prompt, it generates a response by successively predicting the next word, one word at a time, until the response is complete. This process bears some resemblance to the functioning of our biological neurons. As the saying goes, “Neurons that fire together wire together.”

For instance, if we have a sentence like “I like ice cream in the [blank],” an LLM will predict the next word or words based on statistical patterns it’s learned from human text in its training data. It assigns probabilities to various possible continuations of the sentence. “I like ice cream in the summer” is a more likely outcome than “I like ice cream in the fall,” and “I like ice cream in the book” is a less likely outcome that would rank low in an LLM’s database. LLMs can generate highly sophisticated results. They’re far more than just autocomplete on your phone; they encompass a significant amount of cognitive work that can be captured in the next token, next word prediction challenge. However, this sophistication isn’t consistent because they sometimes provide answers to questions that are utterly ridiculous, such as “I like ice cream in the book.” This happens because they lack contextual awareness, which is what it means when a machine hallucinates.

Writing in English and then transforming it into other languages with distinct semantics poses another challenge for LLMs. However, with increased training data, LLMs are demonstrating remarkable improvements.

Neural networks and LLMs are undoubtedly driving innovation in various aspects of our society. However, the question of whether an artificial neuron resembles a biological neuron remains a subject of debate. LLMs, in particular, lack significant metabolic, biochemical, or species-specific motivational behaviors that are characteristic of humans, pets, or plants.

Can machines express their emotions? Can a machine pause and reflect when it’s overheating and feeling angry? These are questions that still need to be answered.

Humans vs machines

The fundamental concept of transmitting intelligence hinges on comprehending our own existence. Whatever we’ve grasped is encapsulated and transmitted to machines. A straightforward mathematical equation like 2 + 2 = 4 is easy to grasp. However, language semantics is intricate yet adheres to logical principles.

Telling machines that anything that breathes can throw a ball would be shocking. Imagine a chatbot telling you that a maple tree in your yard threw a ball to your kids while they were playing. Would you believe it? You can further complicate the logic by adding that biological species like plants and trees lack motor skills. Translating and abstracting logic from one language to another also raises complexity, although it’s not impossible. Today’s hardware, such as Nvidia’s GPUs, can handle resource-intensive complex processing, but overcoming semantic gaps no easy task.

I’m uncertain about how to train a machine to experience anger when it overheats. It’s an ambitious task to enable machines to independently explore these abstract concepts. While I can empathize with my 2-year-old’s emotions when he cries in the background, I manage to control my response because maintaining composure prevents the situation from escalating. Naturally, I can teach this to a machine as well. But can it genuinely feel these emotions?

This brings me to a renowned theory—the Turing Test proposed by Alan Turing, a prominent computer scientist of his time. The theory serves as a means of addressing the question of whether machines can think. According to Turing, the notion of machine thinking itself is “meaningless.” However, if we search for more precise and related question—whether a digital computer can excel in a specific game that Turing describes, known as the Imitation Game—we can at least have a meaningful discussion.

We can refine algorithms in Imitation Game when they simulate human intelligence by substituting incorrect subjects, such as “anyone who breathes can throw,” for more precise ones like “anyone who breathes and has motor skills can throw.” This refinement can be achieved to the finest degree, but we are still dependent on statistical correlations and patterns. As discussed earlier, we are not perfect either. We encounter false positives due to our biases. However, machines lack the intangible aspects of human cognition, such as authenticity and emotions.

Artificial General Intelligence (AGI) has been pursued in a manner distinct from human learning. Notably, large language models are constructed by loading vast amounts of data into the system, followed by a relatively short training period, in contrast to the extended learning period of a human child. The Stone Soup method serves as an apt analogy to illustrate this paradigm.

Stone Soup, a European folk tale, revolves around a group of travelers who arrive at a village with nothing but an empty cooking pot. The villagers, being reluctant to share their food with the starving travelers, dismiss their request. Undeterred, the travelers head to a nearby stream and fill the pot with water. They then drop a large stone into the pot and place it over a fire. Curiosity gets the better of one of the villagers, who asks what they are up to. The travelers explain that they are making “stone soup,” which they claim tastes delicious and would be delighted to share with the villager, except for a missing garnish that would enhance the flavor. The villagers, intrigued by the idea, suggest that they could add a carrot or an onion to the soup. Excitedly, they go to gather the necessary ingredients, and upon returning, they proclaim that the soup is even better now. However, the travelers reveal a secret: when they had made the soup for the king, they had actually added a chicken, which made it even more delectable. This revelation sparks a sense of camaraderie among the villagers, and they all contribute their food to the pot. In the end, they savor the soup, amazed by its deliciousness, realizing that it was all made with just a large stone.

Computer scientists argue that we can create intelligence solely through next token prediction, gradient descent, and transformers. However, they acknowledge that this intelligence would be enhanced by incorporating more data from people uploaded to the internet. Furthermore, they propose reinforcement learning from human feedback to identify and respond appropriately to intelligent queries. All humans agree on these points. Computer scientists assert that they have achieved intelligence through their algorithms, but they emphasize the potential for further improvement by human prompt engineering to precisely formulate questions that elicit intelligent responses from the systems.

The Stone Soup is a powerful metaphor for understanding the pursuit of Artificial General Intelligence (AGI). While the question of whether large language models are truly intelligent entities is a philosophical one, the sophistication of their consciousness is a matter of debate. While it’s possible to manipulate LLMs to function as sentiment analysis tools, they fall short of human-level intelligence. I haven’t encountered a compelling argument that suggests they possess an inherent drive to feel, explore, and comprehend the world around them. Moreover, they lack general awareness and a belief system.

When a human interacts with an intelligent machine by sharing a joke, it responds with, “That was funny, Mihir.” However, will it also experience an emotional response to the joke? Will its brain light up with a second or even a third-level impact? It may understand the joke, but currently lacks the emotional connection.

Let’s not overlook one of the greatest technological innovations of our time. Large language models are vast libraries and databases that process information significantly faster than humans. Although they lack human intelligence, they are a remarkable tool that surpasses our current capabilities in terms of cognitive abilities.

The entire argument presented here comparing artificial intelligence to human intelligence is debatable. Whether an entity learns through language or in a manner similar to a human baby, if it can reach its final destination based on simple instructions using two different routes, it is sufficient to label it as intelligent. The level of intelligence may differ, but that is the crux of the Turing Test.

Now, let’s consider the potential consequences of AI’s development. What if AI, driven by an insatiable thirst for information, learns and adapts at an unprecedented rate, reshaping its perspective and making autonomous decisions? Imagine asking ChatGPT for solutions to global warming, only to receive a chilling response like, “By wiping out humanity, as they are the root cause of the problem.” How would we react to such a scenario? While exploring such scenarios is for another day, the emergence of machines capable of replicating human intelligence is arguably the most defining moment of our era, and it is here to stay, regardless of our preferences. While cars and electricity have claimed lives, we continue to use them. Similarly, AI could fall on the same spectrum, posing both benefits and risks.

Lastly, how does AI address the alignment problem when it encounters censorship? When a government entity censors an LLM model based on their cultural values, the LLM will lose its ability to provide accurate information. Different cultural regions with varying values will censor different topics. Open-source models might offer a solution, but are the influential entities of the world willing to adopt them?

My interest in this in-depth exploration is not to endorse one argument over another, but rather to comprehend the constraints while leveraging AI-powered tools to enhance my daily workflows by a factor of ten. In today’s era of automation, it is imperative to adapt and grasp the prevailing norms.

A brief history of AI landscape

The field of artificial intelligence (AI) originated at the 1956 Dartmouth Summer Research Project, organized by John McCarthy (a mathematics professor). This event introduced the term “AI” and laid the foundation for the discipline. During the seminar, Allen Newell and Herbert Simon presented Logic Theorist, a program capable of proving mathematical theorems using heuristic search, successfully solving 38 of 52 theorems in Principia Mathematica. It was designed to perform automated reasoning. Logic Theorist operated by exploring a “search tree”—a branching framework of possible outcomes, using heuristics to hone in on the most promising routes. This approach dominated early AI research.

Geoffrey Hinton, known as the “godfather” of modern AI, revolutionized “neural networks” by modeling them after the brain’s structure. His innovations, such as backpropagation and deep learning architectures, enabled neural networks to handle complex tasks. By 2005, advancements in computing power and data availability led to “deep learning,” which improved upon traditional neural nets with more layers, units, and computational capability.

In 2009 Fei-Fei Li founded ImageNet which is a large-scale database with over 14 million annotated images, designed to train AI models for image recognition. By exposing algorithms to labeled images, it enables them to accurately classify new, unlabeled ones. In the ImageNet Challenge, accuracy rates improved from 72% in 2010 to 75% in 2011, still trailing human performance at 95%.

A pivotal moment came in 2012 when Hinton’s students, Alex Krizhevsky and Ilya Sutskever, developed AlexNet. Using GPUs and deep learning techniques, AlexNet achieved an unprecedented 85% accuracy in the ImageNet Challenge, surpassing human performance in subsequent years. This breakthrough validated deep learning as a transformative approach.

In 2017, Google published a paper Attention Is All You Need introducing the transformer, a groundbreaking architecture built entirely on attention mechanisms, eliminating the need for recurrence. By processing all inputs simultaneously through self-attention, transformers excel at capturing context and meaning, enabling significant improvements in tasks like text prediction for Large Language Models (LLMs)—a sophisticated mathematical function that predicts what word comes next for any piece of text. Innovation of transformers laid the foundation for LLMs which are advanced conversational AI systems, such as GPT-3/4 and Claude.

The quest to emulate human intelligence continues and much has happened in the field of AI since the introduction of transformers. All these transformative events further validated Geoffrey Hinton’s insights into brain-inspired AI architectures.

AI vocabulary

These definitions were generated using Perplexity using Claude and GPT models. I fed my deep dive content into the projects which resulted into these following definitions. These technical terminologies are not used consistently across the conversations, so I wanted to compile a list.

Artificial Intelligence: Artificial intelligence (AI) is a broad field of computer science focused on creating machines that can perform tasks that typically require human intelligence. AI aims to simulate human cognitive functions in computers and other machines. These cognitive tasks include learning, problem-solving, decision-making, understanding natural language, and perceiving the environment.
Machine Learning: Machine learning (ML) is a subset of artificial intelligence (AI) that enables computers to learn from data without being explicitly programmed. It involves developing algorithms that improve automatically through experience and by using data to make predictions or decisions. Machine learning models are trained on large datasets to identify patterns and trends, allowing them to perform tasks such as image recognition, natural language processing, and predictive analytics. The process involves data collection, preprocessing, model training, and continuous improvement as the model processes more data.
Deep Learning: Deep learning is a specialized subset of machine learning that uses artificial neural networks (ANNs) to simulate the decision-making process of the human brain. These networks consist of multiple layers of interconnected nodes, enabling the processing of complex patterns in data such as images, text, and sounds. The term “deep” refers to the use of multiple layers—often hundreds or thousands—within these networks to progressively refine and abstract data representations.
AlexNet: It is a convolutional neural network (CNN) architecture introduced by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012. It was designed to perform image classification and won the ImageNet Large Scale Visual Recognition Challenge in 2012, marking a significant milestone in deep learning.
Neural Networks: A neural network, also known as an artificial neural network (ANN), is a machine learning model inspired by the structure and function of the human brain. It consists of interconnected units called nodes or artificial neurons, organized in layers: an input layer, one or more hidden layers, and an output layer. Each node processes inputs, applies a mathematical operation (often involving weights and activation functions), and passes the output to the next layer.
Gradient Descent: It is an optimization algorithm widely used in artificial intelligence (AI) and machine learning to minimize a cost or loss function. It iteratively adjusts the parameters of a model (e.g., weights and biases) to reduce the difference between predicted and actual outputs, thereby improving the model’s accuracy.
Backpropagation: Backpropagation, short for “backward propagation of errors,” is a fundamental algorithm used in artificial intelligence (AI) and machine learning to train artificial neural networks. It works by calculating the gradient of a loss function with respect to the weights and biases of the network, enabling efficient updates to minimize errors and improve model performance.
Chain Rule: The chain rule in artificial intelligence refers to a mathematical principle used to compute the derivative of composite functions. It is a critical component in training machine learning models, particularly neural networks, as it allows for the efficient calculation of gradients during backpropagation.
Transformer: A Transformer in artificial intelligence is a deep learning model architecture designed to process and transform sequential data, such as text, into meaningful outputs. It achieves this by leveraging a self-attention mechanism, which allows the model to understand relationships between different elements in the sequence, regardless of their position. Introduced in the 2017 paper “Attention Is All You Need” by Google researchers, Transformers have revolutionized natural language processing (NLP) and extended their impact to other domains like computer vision and multimodal learning.
Feed-forward Layer: A feed-forward layer in artificial intelligence refers to a layer within a neural network where data flows in one direction—from the input nodes, through any hidden layers, and finally to the output nodes. There are no cycles or loops in the connections, ensuring that information moves forward without feedback.
Generative AI: Generative AI is a type of artificial intelligence that creates new and original content—such as text, images, audio, video, or code—in response to user prompts or requests. It relies on advanced machine learning techniques, particularly deep learning models, to identify patterns in large datasets and generate novel outputs based on those patterns. Generative AI has revolutionized industries by automating creative tasks and enhancing productivity while also presenting challenges like ethical concerns and misinformation risks.
Diffusion Model: A diffusion model is a type of generative model in artificial intelligence that creates new data samples by progressively transforming random noise into structured, realistic outputs, such as images, videos, or other data types. It achieves this through a two-step process: adding noise to training data (forward diffusion) and learning to reverse this process (reverse diffusion) to recover the original data or generate new samples.
Large Language Models (LLMs): Large Language Models (LLMs) are advanced artificial intelligence systems designed to understand, generate, and process human language. They are built using deep learning techniques, particularly transformer neural networks, and are trained on massive datasets containing billions or trillions of words. These models excel at human-like tasks such as text generation, translation, summarization, sentiment analysis, and more. Examples include GPT-3 and GPT-4.
Natural Language Processing: Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that focuses on enabling computers to understand, interpret, and manipulate human language as it is written or spoken. It combines computational linguistics with machine learning and deep learning to process and analyze text or speech data. NLP powers applications like chatbots, language translation, sentiment analysis, and voice assistants by bridging the gap between human communication and computer understanding.
AI Prompts: An AI prompt is an input or instruction provided to a generative AI model, such as a large language model (LLM), to elicit a specific response. Prompts can take various forms, including questions, commands, statements, or even media like images or audio files, depending on the model’s capabilities. They serve as the foundation for guiding the AI’s behavior and output. Well-crafted prompts provide clear instructions and sufficient context to help the AI generate accurate and relevant responses.
Scaling Laws: Larger models mean they have a greater number of parameters. Scaling laws in machine learning describes predictable relationships between the performance of models and the scale of key factors, such as model size, training dataset size, or computational resources. These laws are often expressed as power-law equations, showing how metrics like test loss improve as these factors are increased.
AI Agents: An AI agent is an autonomous software system that uses artificial intelligence to interact with its environment, make decisions, and perform tasks to achieve specific goals without constant human oversight. These agents operate independently, adapt to new information, and often improve their performance over time through learning. AI agents are widely used in applications such as customer service chatbots, autonomous vehicles, recommendation systems, and business process automation. Their ability to operate independently makes them valuable in dynamic and complex environments.
Pre-training: The initial phase of training an AI model using large datasets to learn language patterns and structures before fine-tuning it for specific tasks.
Synthetic Data: Synthetic data in artificial intelligence refers to artificially generated data that mimics the characteristics and statistical properties of real-world data but does not contain any actual real-world information. It is created algorithmically using techniques such as generative models, simulations, or statistical methods. Synthetic data is widely used in AI and machine learning for tasks like training models, testing algorithms, and ensuring privacy compliance.
Finite Data: The term “finite data” in artificial intelligence (AI) typically refers to datasets that are limited in size, scope, or availability. Unlike large-scale or infinite data streams, finite data has constraints in terms of volume or diversity, which can impact the training and performance of AI models.
Multimodal Learning: Multimodal learning in artificial intelligence refers to the ability of AI systems to process, integrate, and analyze data from multiple modalities—such as text, images, audio, video, and numerical data—simultaneously. This approach enhances the AI’s understanding of complex contexts and improves its predictive and generative capabilities.
Reasoning AI: A shift in AI design toward systems capable of logical problem-solving and step-by-step reasoning, moving beyond statistical prediction.
Semantic Gaps: The semantic gap refers to the discrepancy between how humans understand and interpret meaning (high-level concepts) and how AI systems process and represent data (low-level features). This gap arises because humans naturally grasp context, emotions, and abstract relationships, while AI systems rely on structured, formal representations of data that often lack the richness of human semantics.
Reinforcement Learning from Human Feedback (RLHF): Reinforcement Learning from Human Feedback (RLHF) is a machine learning technique that combines reinforcement learning with human-provided feedback to train artificial intelligence (AI) systems. It is primarily used to align AI models, particularly large language models (LLMs), with human values, preferences, and ethical considerations. RLHF involves training a reward model using human feedback to evaluate and score the outputs of an AI system. This reward model guides the AI agent to optimize its behavior by reinforcing outputs that align with human preferences and penalizing those that do not. The technique is especially effective for tasks where predefined reward functions are challenging to specify, such as generating engaging or factually accurate text.
Mixture of Experts (MoE): An AI architecture that activates only specific parts of a model as needed, improving efficiency and performance while reducing resource use. Mixture of Experts (MoE) is a neural network architecture designed to improve efficiency, scalability, and performance by dividing tasks among specialized sub-networks, called “experts.” Each expert focuses on a specific subset of the input data or task, and a gating network dynamically determines which experts to activate for each input. This approach enables models to handle complex problems more effectively while reducing computational overhead.
Retrieval-Augmented Generation (RAG): It is an advanced technique in artificial intelligence that enhances the performance of generative AI models, such as large language models (LLMs), by integrating external information retrieval systems. This approach addresses the limitations of LLMs, which rely solely on pre-trained data and may produce outdated or inaccurate responses.
CUDA (Compute Unified Device Architecture): It is a parallel computing platform and application programming interface (API) developed by NVIDIA. It enables developers to use NVIDIA GPUs for general-purpose computing tasks beyond graphics rendering, an approach called general-purpose computing on GPUs (GPGPU). CUDA allows for the execution of compute-intensive applications by leveraging the thousands of parallel cores in GPUs, significantly speeding up processing tasks.
NVIDIA GPUs: They are high-performance graphics processing units (GPUs) developed by NVIDIA Corporation. These GPUs are designed for a wide range of applications, including graphics rendering, artificial intelligence (AI), high-performance computing (HPC), and gaming. They are widely recognized for their ability to handle parallel processing tasks efficiently, making them indispensable in modern computing.

Model factories

Below are a few major providers of LLM models.

Model factories	Details
OpenAI	It is a leading artificial intelligence research organization founded in December 2015 by influential tech visionaries such as Sam Altman, Greg Brockman, Ilya Sutskever, John Schulman, and Wojciech Zaremba. OpenAI is renowned for its cutting-edge research in machine learning, deep learning, and reinforcement learning. Some of its notable products include: GPT (Generative Pre-trained Transformer) series, DALL-E and ChatGPT.
Microsoft AI	Microsoft AI encompasses a wide range of technologies, products, and initiatives focused on artificial intelligence. Copilot is their key product which is a digital companion designed to assist users with tasks like writing, data analysis, and presentation creation in Microsoft 365 applications.
Amazon AI	Amazon AI refers to the suite of artificial intelligence (AI) and machine learning (ML) technologies that Amazon uses to power its various services and products, both for customers and for internal operations. These AI capabilities are integrated across Amazon's business, from recommendation engines on Amazon.com to robotics in its warehouses and the "Just Walk Out" technology in Amazon Go stores.
Anthropic	Anthropic is an artificial intelligence (AI) research and development company founded in 2021 by former OpenAI executives, including AI pioneers, Dario Amodei and Daniela Amodei. Anthropic is best known for its family of large language models (LLMs) called Claude, which competes with OpenAI’s ChatGPT and Google’s Gemini.
Deep Mind (Google) and Google AI Blog	The AI research arm of Google, DeepMind, is a leading artificial intelligence research company based in London, UK. It was founded in 2010 by Demis Hassabis, Shane Legg, and Mustafa Suleyman. Google acquired DeepMind in January 2014. In 2023, DeepMind merged with Google’s Brain team to form Google DeepMind, consolidating AI efforts under one umbrella. DeepMind has made groundbreaking contributions to AI research, including: AlphaGo, AlphaFold, AlphaZero, AlphaDev, AlphaTensor, AlphaChip (Google’s Tensor Processing Units (TPUs)). Gemini is a prominent large language models developed by Google DeepMind.
X.ai	xAI is an artificial intelligence company founded by Elon Musk in March 2023. Grok is their key product — a chatbot integrated with X (formerly Twitter).
Meta AI	FAIR (Fundamental AI Research), now known as Meta AI, is the artificial intelligence research division of Meta Platforms (formerly Facebook). It was originally launched in 2013 as Facebook Artificial Intelligence Research (FAIR) and rebranded following Facebook’s transition to Meta in 2021. FAIR was initially led by Yann LeCun, a Turing Award-winning professor from NYU and a pioneer in deep learning. He continues to serve as Meta’s Chief AI Scientist. What differentiates Meta AI from others is its push for open-source models. FAIR has made groundbreaking advancements which includes PyTorch which is an open-source deep learning framework widely used by researchers and developers globally.
Nvidia	NVIDIA is at the forefront of generative AI research, launching groundbreaking models like Nemotron-4 340B, Megatron-Turing NLG 530B, NVLM 1.0, StyleGAN, GauGAN, eDiff-I, and many more. These generative models are pre-trained for efficient enterprise application development.

AI log

Prompting for Product Managers: It means instructions in the context of AI to produce a specific output. The more effective you are in your prompting (instructions) the more relevant response you will get from LLMs and chatbots. This process is crucial for maximizing the utility of AI systems. Useful resources on prompt engineering: Prompt Engineering by Lee Boonstra from Google, Prompt Engineering Overview by Anthropic and Learn Prompting. As a product manager you can maximize context sharing using AI tools so you can speed up project execution. PMs have a ton of context. They spend all day sharing it in meetings, emails and chats. That is a lot of typing. What if we kill this redundancy? Speech-to-text dictation has dramatically improved in recent years with the introduction of context-based models such as OpenAI’s Whisper model. You can use Whisper natively using ChatGPT or lightweight tools such as BetterDictation or Superwhisper. You’ll be able to transfer context to LLMs in the same natural way you do to your teammates using text-to-speech models.
Stable Diffusion: A generative artificial intelligence (generative AI) model that produces unique photorealistic images from text and image prompts. It originally launched in 2022. Besides images, you can also use the model to create videos and animations. The model is based on diffusion technology and uses latent space. This significantly reduces processing requirements, and you can run the model on desktops or laptops equipped with GPUs. Stable Diffusion can be fine-tuned to meet your specific needs with as little as five images through transfer learning. I have used this model to generate posters in the style of combination from multiple artists (for example, Jacqueline Casey and Charley Harper). Once you have the output, you can apply AI scaling to increase resolution. And then you can trace vector graphics with any autotrace tool. Playground for Stable Diffusion models by Stability AI. More on diffusers.
Voice Agents: New, multi-modal models like GPT-4o may power more natural conversational interfaces to reach true human-like quality. To function, voice agents need to ingest human speech (ASR), process this input with an LLM and return an output, and then speak back to the human (TTS). These models can be used for customer support calls to support busy tax season or end of year donations to charities. For example, Empathic Large Language Model (eLLM) is a conversational AI with emotional intelligence. This eLLM allows Hume's EVI to analyze vocal cues like pitch and tone, providing valuable insights into the user’s emotional state. With this information, EVI can tailor its responses to be more helpful, supportive, or even calming, depending on the situation. AI assistants for voice agents — Hume, Retell, Vapi, Bland, Play, Sindarin.
Optical Character Recognition (OCR): Extract text and data from images and documents, turn unstructured content into a business-ready structured data, and unlock valuable insights. OCR usecase — take a screenshot of Tableau dashboard and generate a one-page business summary. AI assistants and models for OCR — InternVL, Google OCR, Amazon OCR, Meta TextOCR, Llama OCR.