The Dark Side of AI: Examples of Erratic Behavior
AI Gone Rogue: Real-World Examples and Technical Causes

There’s no denying the hype surrounding AI technology. Everywhere we look, there’s talk about the mind-blowing possibilities AI brings to the table. We see this in our daily lives: from writing essays to diagnosing medical conditions to chatting with our favorite AI assistants, the future seems limitless.
The problem is that it’s easy to get swept up in the excitement and focus entirely on what these tools can do. But while we marvel at the breakthroughs, we should also remember that AI isn’t magic — beneath the covers, it’s still technology, and, therefore, it comes with its share of challenges and limitations.
Take chatbots, for instance. We know how tools such as ChatGPT, Gemini, and others can sometimes hallucinate, generating responses that sound plausible but are partly or entirely made up.
The challenge is that we may not even realize when it happens, especially if the AI presents the information with confidence.
This tendency for AI tools to misfire — let’s call it erratic behavior — isn’t limited to chatbots. It can occur across many different AI use cases, and the impact can be more serious than just a silly response.
We’re talking about situations where incorrect outputs could have real-world consequences.
So, what does this mean for us? It means we need to tread carefully when deploying AI in critical areas. Just because an AI system can do something doesn’t mean it should be put to work without careful consideration.
In this article, we’ll dive into various use cases where AI can act unpredictably. We’ll also look at the technical reasons behind this behavior, so you can better understand why these systems sometimes go off course.
Why AI Systems Misfire: Key Issues
Before we dive into some of the use cases that highlight how AI systems can behave erratically, let’s first outline some of the underlying reasons for these shortcomings. Please note, however, that these are just some of the challenges facing AI systems and applications.
The Issue with Training Data
The Problem with Overfitting
The Problem with Contextual Understanding
The Problem with Edge Cases
The Problem with Literal Translations
The Problem with Creativity
The Problem with Biased Training Data
The Problem with Limited Context Awareness
The Problem with Limited Understanding
The Problem with Unprecedented Events
Let’s now review some of the use cases and real-world scenarios where the above challenges manifest themselves.
Scenario — Misdiagnosis in Healthcare
We have heard how AI is making waves in healthcare, bringing incredible efficiencies and transforming diagnostics. These systems can sift through medical images at lightning speed, pinpointing potential issues and assisting doctors in making more accurate diagnoses. On many levels, it’s a game-changer.
But here’s the thing — it’s important to remember that these AI models are not infallible. They can sometimes misinterpret what they see, leading to serious consequences.
Consider an AI system trained to detect cancer in mammograms. Most of the time, it’s highly accurate. However, if the image quality is poor, the AI might miss a tumor altogether. On the flip side, it could mistakenly flag a benign growth as malignant, leading to unnecessary stress and treatment for the patient. Both scenarios have significant implications, from delayed interventions to overtreatment.
The Issue with Training Data
Why do these errors occur? A major reason is the quality and diversity of the training data. If the AI hasn’t been exposed to a wide variety of cases, especially rare or atypical ones, it might not perform well in real-world settings. Think of it like learning a new language from just one book — you’ll struggle with anything outside of that limited vocabulary. If the training data isn’t representative of diverse populations, the AI may also exhibit biases, affecting its performance across different demographic groups.
The Remedy
For developers, this underscores the need to use robust, diverse, and high-quality datasets. Training AI models on data that includes a broad spectrum of conditions and populations can help make them more reliable and equitable. Regularly auditing and updating these datasets is also essential. Additionally, maintaining human oversight is crucial. Doctors should always review AI-generated recommendations to ensure that patients receive the best possible care. Only then can we harness the full potential of AI in healthcare safely and effectively.
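To make that kind of audit concrete, here is a minimal sketch, assuming a held-out test set with hypothetical age_group, label, and prediction columns, of how a team might check a diagnostic model’s sensitivity and specificity per subgroup rather than relying on a single overall accuracy number.

```python
import pandas as pd

def audit_by_subgroup(results: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Compute per-subgroup sensitivity and specificity from model results.

    Expects binary 'label' and 'prediction' columns (1 = malignant, 0 = benign)
    plus a column identifying the subgroup. Column names are illustrative.
    """
    rows = []
    for group, g in results.groupby(group_col):
        tp = ((g.label == 1) & (g.prediction == 1)).sum()
        fn = ((g.label == 1) & (g.prediction == 0)).sum()
        tn = ((g.label == 0) & (g.prediction == 0)).sum()
        fp = ((g.label == 0) & (g.prediction == 1)).sum()
        rows.append({
            group_col: group,
            "n": len(g),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        })
    return pd.DataFrame(rows)

# Hypothetical held-out results; in practice these come from the test pipeline.
results = pd.DataFrame({
    "age_group":  ["<40", "<40", "40-60", "40-60", "60+", "60+"],
    "label":      [1, 0, 1, 0, 1, 0],
    "prediction": [1, 0, 0, 0, 1, 1],
})
print(audit_by_subgroup(results, "age_group"))
```

A large gap in sensitivity between subgroups is a strong signal to gather more representative data or retrain before the model goes anywhere near a clinic.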
Scenario — Errors in Automated Trading
We’ve all heard about AI’s impact on the finance world, especially when it comes to automated trading. These systems can analyze huge amounts of financial data in seconds and execute trades with incredible speed. They’re designed to take advantage of market opportunities faster than any human ever could, making trading more efficient and often more profitable.
While this may sound quite impressive, it is — until things go wrong. You see, while these AI models are highly efficient, they’re not immune to mistakes, and sometimes, those mistakes can be costly.
Take, for example, what happens when an AI trading system overreacts to market fluctuations. These models are trained on historical data and programmed to detect patterns, but if the market behaves unexpectedly — like during a sudden, unpredictable event — the AI might make rash decisions. In some cases, this may even lead to “flash crashes,” where automated systems dump stocks in a frenzy, causing the market to plummet before human traders can intervene.
The Problem with Overfitting
One of the root causes behind these situations is that these models can be overfitted to historical data. Essentially, they learn patterns that worked in the past but might not apply in a rapidly changing or unexpected market scenario. AI models lack the human intuition to differentiate between a genuine market trend and an anomaly, so they sometimes act on misleading signals.
Building Smarter Safeguards
For those building these systems, it’s crucial to implement comprehensive safeguards that minimize the risk of costly errors. This includes integrating risk management strategies like circuit breakers, which can automatically halt trading when unusual market activity is detected. Additionally, models should be designed to adapt to real-time market conditions and not just rely on historical data, improving their flexibility in unpredictable situations.
Human oversight is equally essential. Developers should ensure that experienced traders and analysts are monitoring the AI’s actions, ready to intervene when necessary. No matter how sophisticated the AI becomes, there must be a human safety net in place to prevent potential disasters in the ever-changing world of finance.
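As one concrete example of such a safeguard, here is a minimal circuit-breaker sketch. The 5% threshold, 60-second window, and allow_order hook are illustrative assumptions, not a production risk-management design.

```python
from collections import deque
import time

class CircuitBreaker:
    """Halt automated trading when prices move too far, too fast (sketch)."""

    def __init__(self, max_move: float = 0.05, window_s: float = 60.0):
        self.max_move = max_move   # e.g., a 5% move...
        self.window_s = window_s   # ...within a 60-second window
        self.prices = deque()      # (timestamp, price) pairs
        self.halted = False

    def on_price(self, price: float, now: float | None = None) -> None:
        now = time.time() if now is None else now
        self.prices.append((now, price))
        # Drop ticks that have aged out of the look-back window.
        while self.prices and now - self.prices[0][0] > self.window_s:
            self.prices.popleft()
        oldest_price = self.prices[0][1]
        if oldest_price > 0 and abs(price - oldest_price) / oldest_price > self.max_move:
            self.halted = True     # stop the strategy and alert a human

    def allow_order(self) -> bool:
        return not self.halted
```

The trading loop would call allow_order() before submitting anything, and only a human operator, after reviewing what happened, would reset the breaker.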
Scenario — Limited Contextual Understanding in Content Moderation
AI content moderation has become an essential tool for keeping online platforms safe and appropriate. These systems can scan through millions of posts and filter out harmful content, making moderation faster and more efficient than any human team could manage. But even the best AI moderation tools have a big blind spot: context.
We’ve all seen how a post dripping with sarcasm or irony can be completely misunderstood by an AI. For example, someone might post, “Oh sure, I love paying taxes!” While most of us recognize the sarcasm immediately, an AI might flag this as misinformation or label it as offensive. The result? Perfectly harmless content gets blocked, while genuinely harmful content can sometimes slip through.
The Problem with Contextual Understanding
Why does this happen? It boils down to the AI’s lack of nuanced natural language understanding. These models rely heavily on pattern recognition and probabilistic associations, which means they’re great at catching explicit threats or offensive language but terrible at understanding subtle cues. They don’t truly “get” humor, sarcasm, or cultural references, making them prone to embarrassing mistakes.
Introduce Context Awareness
To improve content moderation, developers need to work on making AI models more context-aware. This could involve training them on a broader range of content, including examples that illustrate sarcasm and cultural nuances. Human review systems should also be in place to double-check flagged posts, especially when context is key. At the end of the day, combining AI efficiency with human judgment is our best shot at getting content moderation right.
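One simple way to combine the two is confidence-band routing: automate only the clear-cut cases and send the ambiguous middle band, where sarcasm and irony usually live, to people. The thresholds below are illustrative, and the toxicity score is assumed to come from whatever moderation model the platform already runs.

```python
def route_post(text: str, toxicity_score: float) -> str:
    """Route a post based on a toxicity score in [0, 1] (sketch only)."""
    if toxicity_score >= 0.95:
        return "auto_remove"      # clearly policy-violating
    if toxicity_score <= 0.20:
        return "auto_approve"     # clearly benign
    return "human_review"         # ambiguous: sarcasm, irony, missing context

print(route_post("Oh sure, I love paying taxes!", toxicity_score=0.55))
# -> human_review
```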
Scenario — Safety Risks in Self-Driving Cars
Self-driving cars are one of the most exciting advancements in AI, promising safer roads and a future where traffic accidents are a thing of the past. These vehicles use AI to make split-second decisions and navigate roads more efficiently than human drivers ever could. But as impressive as they are, self-driving cars still have significant safety challenges.
For instance, imagine an AI-powered car driving through heavy fog. The car’s sensors — like cameras, radar, and LiDAR — might struggle to detect pedestrians or other vehicles, increasing the risk of an accident. (LiDAR stands for “Light Detection and Ranging”. It’s a remote sensing technology that uses laser beams to measure distances and create detailed 3D maps of the surrounding environment.)
In another scenario, imagine a vehicle encountering an unexpected obstacle, like a large pothole or a suddenly closed construction zone. The car’s navigation system might get confused and behave unpredictably, putting passengers and other road users in danger.
The Problem with Edge Cases
So, what’s the technical issue here? The primary challenge is the limited ability of AI to handle edge cases and sensor limitations. These edge cases — situations that are rare and unpredictable — are difficult for AI to interpret because they aren’t commonly represented in the training data. Plus, even the most advanced sensors can be affected by bad weather or poor lighting, making it tough for the car to “see” its surroundings accurately.
The Remedy — Handle Edge Cases
To make self-driving cars safer, developers must improve their ability to handle these edge cases. This involves extensive testing in a wide variety of conditions and environments, from snowstorms to urban chaos. Redundant sensor systems can also help, ensuring that if one sensor fails or is compromised, others can still provide accurate information. Finally, having a human backup system for monitoring and intervening in emergencies can serve as a critical safety net as the technology continues to evolve.
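To show what sensor redundancy can look like at the decision level, here is a minimal sketch, with invented sensor names, confidences, and thresholds, in which fog-degraded readings are discarded and the vehicle degrades gracefully when too few sensors remain trustworthy.

```python
from dataclasses import dataclass

@dataclass
class SensorReading:
    name: str          # "camera", "radar", "lidar" (illustrative)
    obstacle_ahead: bool
    confidence: float  # 0.0 to 1.0, as reported by the perception pipeline

def decide_action(readings: list[SensorReading], min_conf: float = 0.4) -> str:
    """Fuse redundant sensor readings into a driving decision (sketch only)."""
    usable = [r for r in readings if r.confidence >= min_conf]
    if len(usable) < 2:
        # Not enough trustworthy sensors (e.g., heavy fog): degrade gracefully.
        return "slow_down_and_alert_driver"
    votes = sum(r.obstacle_ahead for r in usable)
    return "brake" if votes >= len(usable) / 2 else "continue"

readings = [
    SensorReading("camera", obstacle_ahead=False, confidence=0.2),  # fog-degraded
    SensorReading("radar",  obstacle_ahead=True,  confidence=0.8),
    SensorReading("lidar",  obstacle_ahead=True,  confidence=0.7),
]
print(decide_action(readings))  # -> brake
```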
Scenario — Cultural Nuances in Translations
AI translation tools have made our world feel a lot smaller, breaking down language barriers and making communication across cultures more accessible than ever. From translating documents to helping travelers communicate abroad, these systems have changed how we interact globally. But despite their usefulness, AI translations can sometimes miss the mark — especially when dealing with cultural nuances.
Imagine using an AI translator to convert the English phrase “break a leg” into another language. A literal translation might leave people scratching their heads, wondering why you’re telling someone to injure themselves. Similarly, jokes or idioms loaded with cultural meaning often get lost in translation, making the AI’s output sound awkward or even offensive.
The Problem with Literal Translations
The root of the issue lies in insufficient cultural training and literal translation methods. AI models typically translate text word-for-word without truly understanding the underlying meaning or cultural significance. As a result, they often fail to grasp idioms, humor, or subtle cultural references, leading to translations that don’t quite make sense or, worse, misrepresent the original message.
The Remedy
To improve AI translations, developers should focus on making these models more culturally aware. This could mean training them on a more diverse set of language data, including idioms, slang, and culturally significant phrases. Contextual translation algorithms that take into account the cultural background of the text would also be a game-changer. And while AI continues to get better, having human translators review or refine important translations remains essential to ensure cultural sensitivity and accuracy.
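A small illustration of the idea: before handing text to the translation model, check a curated idiom glossary so that phrases like “break a leg” map to their intended meaning instead of a literal rendering. The glossary entries and the literal_translate placeholder are assumptions for the sketch, not any particular translation API.

```python
# A tiny, hand-curated idiom glossary (illustrative entries only).
IDIOM_GLOSSARY = {
    ("en", "es"): {
        "break a leg": "mucha suerte",                     # "good luck", not an injury
        "it's raining cats and dogs": "llueve a cántaros",
    },
}

def literal_translate(text: str, src: str, tgt: str) -> str:
    """Placeholder for whatever MT model or API the system actually uses."""
    return f"[literal {src}->{tgt}] {text}"

def translate(text: str, src: str = "en", tgt: str = "es") -> str:
    """Prefer curated idiom translations; fall back to the literal path."""
    glossary = IDIOM_GLOSSARY.get((src, tgt), {})
    key = text.strip().lower().rstrip("!.?")
    return glossary.get(key, literal_translate(text, src, tgt))

print(translate("Break a leg!"))   # -> mucha suerte
print(translate("See you soon."))  # falls back to literal translation
```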
Scenario — Challenges in Creative Writing
AI is getting pretty good at writing text, whether it’s drafting emails, composing stories, or even generating poetry. We know these models can churn out content in seconds, making them useful for tasks where speed is more important than originality. But when it comes to creative writing — where depth, emotion, and human experience matter — AI still falls short.
Take a love poem, for instance. If you ask an AI to write one, it might come up with something that lacks the raw emotion, cultural context, or unique metaphors a human writer would weave in to make the words come alive.
The Problem with Creativity
The limitation here is the AI’s lack of true creativity or emotional understanding. These models generate text by analyzing patterns from the training data they’ve been fed. They don’t feel emotions or understand the weight of words. Instead, they’re just really good at mimicking the style and structure of human language. As a result, the content often comes out formulaic, lacking the soul and authenticity that make creative writing impactful.
The Remedy
To get closer to genuine creativity, AI models would need to incorporate elements of human-like thinking, but we’re still a long way from achieving that. For now, one way to improve AI-generated writing is to combine AI output with human input. Writers can use AI as a brainstorming tool and then add their own flair and emotional depth to refine the piece. Additionally, training AI models on more diverse and creative datasets can help make their output a bit more interesting, though it will never fully replace human artistry.
Scenario — AI in Recruitment
AI-powered hiring tools have revolutionized the recruitment process, making it faster and more efficient. These systems can quickly sift through thousands of resumes, identify top candidates, and even conduct initial interviews. But despite their advantages, AI recruitment tools come with a serious drawback: the risk of perpetuating bias.
Consider an AI hiring tool trained on past hiring data from a company where certain groups were underrepresented. Even if the AI has the best intentions (so to speak), it may end up favoring candidates who fit the historical profile, inadvertently discriminating against qualified candidates from diverse backgrounds. This is a big problem for companies aiming to promote diversity and inclusion.
The Problem with Biased Training Data
The core issue is that AI models learn from the data they’re trained on. If that data reflects biased or unbalanced hiring practices, the AI will inevitably replicate those biases. Even the most advanced algorithm can’t overcome flawed data. For example, if the training data historically favored one demographic over others, the AI might unfairly rank qualified candidates from underrepresented groups lower, simply because it has learned to favor characteristics associated with the dominant group.
The Remedy
To minimize bias in AI recruitment tools, companies need to focus on diversifying and thoroughly auditing their training data. This means ensuring the data includes a broad range of experiences and backgrounds. Regularly testing the AI for bias and retraining it with fresh, balanced data is essential to maintain fairness. Importantly, human oversight must remain a key part of the hiring process. Recruiters should view AI as a helpful tool, not the final decision-maker, to make sure the process stays fair and inclusive.
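One widely used audit is to compare selection rates across groups. The sketch below computes each group’s rate relative to the group with the highest rate, the informal “four-fifths rule” heuristic; the candidate data and the 0.8 cutoff are illustrative.

```python
import pandas as pd

def disparate_impact(df: pd.DataFrame, group_col: str, selected_col: str) -> pd.Series:
    """Selection rate of each group divided by the highest group's rate.

    Ratios below roughly 0.8 (the informal 'four-fifths rule') are a common
    signal to investigate the model and its training data for bias.
    """
    rates = df.groupby(group_col)[selected_col].mean()
    return rates / rates.max()

# Hypothetical screening results: 1 = advanced by the AI, 0 = rejected.
candidates = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "advanced": [1,   1,   1,   0,   1,   0,   0,   0],
})
print(disparate_impact(candidates, "group", "advanced"))
# Group A: 1.00, Group B: ~0.33, well below 0.8, so the tool needs a closer look.
```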
Scenario — AI and Facial Recognition
Facial recognition technology is widely used today, from unlocking phones to enhancing security at airports and office buildings. But despite its convenience, the technology can be unreliable and biased, especially for certain groups.
For instance, a facial recognition system used at an airport might misidentify a person with darker skin, causing them to undergo extra screening and delays. In another case, the technology might struggle to correctly identify women or people of different ethnic backgrounds, while being more accurate for lighter-skinned men. This inconsistency can have serious consequences, from unnecessary inconveniences to wrongful accusations.
The Problem with Biased Data
The main issue is bias in the training datasets. Facial recognition systems are often trained on data that is not diverse enough, meaning they perform less accurately when analyzing faces that differ from those most represented in the data. This leads to poor performance for people with darker skin tones, women, and individuals with distinctive ethnic features. Moreover, conditions like poor lighting or changes in facial expressions can further reduce the system’s accuracy, making it unreliable in real-world environments.
The Remedy
To improve facial recognition technology, developers must focus on using more diverse and representative datasets. This includes training the models on a wide range of skin tones, genders, and facial features to reduce bias. Additionally, companies should conduct regular bias audits and test the technology in different lighting conditions and scenarios. Where possible, human oversight should be added to review flagged matches or recheck cases where the system struggles, ensuring fair and accurate outcomes.
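Because a single overall accuracy figure can hide exactly these failures, a bias audit usually breaks results down by group and condition. The evaluation log below, with its skin_tone, lighting, and correct columns, is a made-up example of what that breakdown might look like.

```python
import pandas as pd

# Hypothetical evaluation log: one row per verification attempt.
results = pd.DataFrame({
    "skin_tone": ["dark", "dark", "dark", "light", "light", "light"],
    "lighting":  ["low",  "low",  "good", "low",   "good",  "good"],
    "correct":   [0,      1,      1,      1,       1,       1],
})

# Accuracy per group and lighting condition, rather than one aggregate number.
breakdown = (
    results.groupby(["skin_tone", "lighting"])["correct"]
           .agg(accuracy="mean", attempts="count")
)
print(breakdown)
```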
Scenario — Learning Recommendations Gone Wrong
AI in education is revolutionizing how students learn, offering personalized content and feedback to make learning more efficient. However, these systems are not without flaws and can sometimes misinterpret a student’s needs, resulting in recommendations that miss the mark.
Imagine a student working on a math problem who struggles momentarily because they are distracted or tired. An AI-powered education platform might wrongly conclude that the student doesn’t understand the entire math concept, rather than recognizing that the difficulty was temporary. As a result, the AI could suggest a simplified curriculum or unnecessary review exercises, slowing the student’s overall progress.
The Problem with Limited Context Awareness
The core issue here is the AI’s limited adaptability and lack of context awareness. These models typically rely on basic assessments, like whether an answer is right or wrong, without considering other factors that might affect a student’s performance, such as their emotional state, stress level, or preferred learning style. Because the AI lacks a deeper understanding of these variables, it can make incorrect assumptions about a student’s abilities.
The Remedy
To address these limitations, education platforms should incorporate more sophisticated assessment methods that consider a broader range of data points, such as a student’s engagement level, response time, or patterns of improvement over time. Additionally, integrating human oversight — like teachers who can review AI-generated recommendations — can help ensure students receive the most appropriate guidance. Combining AI insights with human judgment can make learning platforms more effective and responsive to individual needs.
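As a toy illustration of using more than one signal, the sketch below looks at a window of recent attempts plus a hypothetical engagement proxy before deciding whether the student needs remediation or simply a break. The signal names and thresholds are invented, and a real platform would add factors like response time and longer-term progress.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    correct: bool
    session_focus: float  # 0-1 engagement proxy (an illustrative signal)

def recommend(attempts: list[Attempt]) -> str:
    """Decide next steps from several recent attempts, not a single wrong answer."""
    recent = attempts[-5:]
    accuracy = sum(a.correct for a in recent) / len(recent)
    avg_focus = sum(a.session_focus for a in recent) / len(recent)
    if accuracy < 0.5 and avg_focus >= 0.6:
        return "review_concept"        # genuinely struggling while engaged
    if accuracy < 0.5 and avg_focus < 0.6:
        return "suggest_break"         # likely tired or distracted, not confused
    return "continue_current_level"

attempts = [Attempt(False, 0.3), Attempt(True, 0.4), Attempt(False, 0.2)]
print(recommend(attempts))  # -> suggest_break
```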
Scenario — Limited Complex Query Handling in Customer Service
AI chatbots are great for quickly answering simple questions, like providing store hours or checking account balances. They make customer service more efficient and available 24/7. However, these chatbots often fall short when faced with complex or emotionally sensitive issues.
For instance, imagine a customer reaching out to their bank’s chatbot about a complicated mortgage situation. The customer might need specific information about refinancing options, interest rates, and potential penalties. Instead of giving detailed, personalized assistance, the chatbot might only offer generic responses that don’t fully address the customer’s needs. Even worse, if the chatbot doesn’t escalate the issue to a human agent, the customer could end up feeling confused and frustrated.
The Problem with Limited Understanding
The main reason for this limitation is the chatbot’s lack of advanced natural language processing (NLP) and context retention. Most chatbots are programmed to handle straightforward, one-off questions and don’t have the memory or reasoning skills to understand complex conversations. They can’t connect multiple pieces of information or respond appropriately to emotional cues, making them ineffective in nuanced or multifaceted situations.
The Remedy
To improve chatbot performance, companies should integrate better NLP capabilities that allow bots to understand context and handle more complex queries. This includes giving chatbots the ability to recognize when a conversation is becoming too complicated or emotionally charged and then escalate it to a human representative. Additionally, using hybrid models — where chatbots assist but human agents can step in as needed — can ensure a smoother and more satisfying customer experience.
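A minimal version of that escalation logic might look like the sketch below. The frustration cues, turn limit, and low-confidence counter are illustrative stand-ins for what a real system would derive from sentiment and intent models.

```python
FRUSTRATION_CUES = {"frustrated", "this is useless", "speak to a human", "not helpful"}

def should_escalate(turns: list[str], low_confidence_replies: int) -> bool:
    """Decide whether to hand the conversation to a human agent (sketch only).

    Triggers: several low-confidence bot answers, a conversation that keeps
    dragging on, or signs of customer frustration in recent messages.
    """
    if low_confidence_replies >= 2:
        return True
    if len(turns) > 8:
        return True
    recent_text = " ".join(turns[-3:]).lower()
    return any(cue in recent_text for cue in FRUSTRATION_CUES)

turns = ["I need to refinance my mortgage", "What are the penalties?", "This is useless"]
print(should_escalate(turns, low_confidence_replies=1))  # -> True
```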
Scenario — Unpredictable Events in Supply Chain Management
AI models have significantly improved supply chain forecasting, helping companies manage inventory, optimize logistics, and reduce costs. They can accurately predict patterns based on historical data, making day-to-day operations more efficient. However, these systems have a major weakness: they struggle with sudden, unexpected disruptions.
Take the COVID-19 pandemic, for example. Many AI-driven supply chain systems failed to foresee the massive and widespread impact of the crisis, leading to critical shortages of goods like medical supplies and groceries. Similarly, events like natural disasters, political upheavals, or sudden trade restrictions can throw AI predictions completely off, leaving businesses unprepared.
The Problem with Unprecedented Events
The key issue is the AI’s inability to account for rare or disruptive events. These models rely heavily on historical data to make predictions. Since they’ve never encountered a global crisis like COVID-19 or a once-in-a-century hurricane in their training data, they can’t anticipate or respond effectively to such situations. Essentially, AI models are only as good as the data they’ve seen, and they lack the flexibility to handle scenarios that fall far outside of historical norms.
The Remedy
To make supply chain AI more resilient, companies should consider incorporating scenario planning and stress testing. This means running simulations of various “what-if” situations, like sudden factory shutdowns or extreme weather events, to prepare for disruptions. Additionally, using a hybrid approach — combining AI predictions with human expertise — can help companies respond more adaptively. Human decision-makers can use their judgment to adjust strategies in response to rapidly changing conditions, ensuring a more robust and flexible supply chain.
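To make scenario planning a little more concrete, here is a rough Monte Carlo stress test: simulate weekly demand with occasional shocks and estimate how often current inventory would run out. Every number in it, including the shock probability and multiplier, is an illustrative assumption rather than a calibrated model.

```python
import random

def stockout_probability(base_demand: float, inventory: float, weeks: int = 12,
                         shock_prob: float = 0.05, shock_multiplier: float = 3.0,
                         trials: int = 10_000) -> float:
    """Estimate the chance of a stockout under random demand shocks (sketch only)."""
    stockouts = 0
    for _ in range(trials):
        remaining = inventory
        for _ in range(weeks):
            demand = random.gauss(base_demand, base_demand * 0.2)
            if random.random() < shock_prob:       # rare disruptive event
                demand *= shock_multiplier
            remaining -= max(demand, 0.0)
            if remaining < 0:
                stockouts += 1
                break
    return stockouts / trials

print(f"Estimated 12-week stockout risk: {stockout_probability(100, 1400):.1%}")
```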
Conclusion
AI is here to stay, transforming industries and reshaping our everyday experiences. But as we’ve explored, these systems have vulnerabilities that can’t be ignored. The path forward requires a commitment to improving the technology, from better training data to more context-aware models and thoughtful integration of human oversight. AI will never be perfect, but it can be continually improved. By learning from these missteps and planning for contingencies, we can build a future where AI serves humanity safely and effectively.