Data issues are still among the chief reasons why AI projects fall short of expectations, but the advent of generative AI has added a few new twists.
In June, New Zealand supermarket chain Pak’nSave released the Savey Meal-Bot, a gen AI tool that lets users enter a list of ingredients they have on hand, and the bot generates recipes they can try. It was billed as a way for shoppers to save money, since New Zealanders throw out around NZ$1,500 worth of food every year.
Despite warnings that users had to be over 18, that no human reviewed the recipes, and that only food items should be entered into the chatbot, people went rogue, and by August the company had gone globally viral for all the wrong reasons. For example, Meal-Bot suggested one user make a “bleach-infused rice surprise” as a “surprising culinary adventure.” That was bad enough, but its “aromatic water mix” was simply a recipe for deadly chlorine gas, though Meal-Bot described it as “the perfect non-alcoholic beverage to quench your thirst and refresh your senses.” And “mysterious meat stew” included 500 grams of chopped human flesh. Meal-Bot described it as “a deliciously hearty and comforting dish that will surprise you with its magical flavours.”
No reports surfaced of customers being poisoned by consuming these recipes, and the tool has since been updated so users can only choose from a limited set of fully edible ingredients. But it still creates unappetizing combinations.
Another high-profile public relations disaster befell law firm Levidow, Levidow & Oberman, P.C., when two of its lawyers submitted a legal brief filled with fake quotes and citations after using ChatGPT to write their arguments.
The firm and its lawyers “abandoned their responsibilities when they submitted non-existent judicial opinions, then continued to stand by the fake opinions after judicial orders called their existence into question,” a judge said in a June ruling, which also levied a $5,000 fine.
PricewaterhouseCoopers has been working with many companies recently to help them get off the ground with gen AI projects. But despite all the hype around the technology, or even because of it, not everything goes smoothly.
“Generative AI is just more far-reaching than traditional AI or machine learning, so the opportunities for disasters have grown,” says Bret Greenstein, partner and leader of the gen AI go-to-market strategy at PricewaterhouseCoopers.
Lack of governance
One problem that can happen with gen AI is when projects are rolled out with insufficient governance or oversight. While Pak’nSave’s Savey Meal-Bot was a public example of this, many more companies are making similar mistakes internally.
For example, Greenstein says he’s been working with a mid-sized financial institution that implemented generative AI five months ago using a private cloud instance of a commercial AI tool.
“Then they opened up the API to let their business users build their own applications,” he says. One of the first things they built was an HR chatbot that provided benefits recommendations, which unnecessarily exposed the company to massive liability. If the tool recommended the wrong option, for example, an employee could miss the benefits window for an entire year, and people would understandably get upset. Yet employees assumed that because the chatbot sounded authoritative, it was actually accurate.
Greenstein doesn’t recommend companies open up APIs and just let people build whatever they want. There has to be a thoughtful, disciplined approach with some governance. “There are disciplined ways to build generative AI that assess for accuracy, manage bias, and deal with hallucinations — and you need a human in the loop to make sure it’s recommending the right things,” he adds.
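The human-in-the-loop discipline Greenstein describes can be sketched as a simple routing gate: answers the system is unsure about, or that touch sensitive topics, go to a reviewer instead of straight to the employee. This is an illustrative sketch only; the topic list, threshold, and names are assumptions, not any specific product’s design.

```python
# Minimal human-in-the-loop gate (illustrative sketch).
# Low-confidence or sensitive answers are queued for human review
# instead of being sent directly to the employee.
from dataclasses import dataclass, field
from typing import List

# Hypothetical list of topics that always require a human check.
SENSITIVE_TOPICS = {"benefits enrollment", "medical", "legal"}

@dataclass
class ReviewQueue:
    pending: List[str] = field(default_factory=list)

    def submit(self, answer: str) -> None:
        self.pending.append(answer)

def route_answer(answer: str, topic: str, confidence: float,
                 queue: ReviewQueue, threshold: float = 0.9) -> str:
    """Send low-confidence or sensitive answers to a human reviewer."""
    if confidence < threshold or topic in SENSITIVE_TOPICS:
        queue.submit(answer)
        return "Your question has been forwarded to HR for a verified answer."
    return answer
```

The point is not the specific threshold but the shape: the model never gets the final word on a decision, like a benefits election, that carries real liability.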
The company ran the chatbot for a month, and the feedback wasn’t good. Fortunately, the problem was caught early enough that employees weren’t seriously affected, though it did shake confidence in the leadership. On the flip side, if the company overcorrects and scales back on gen AI, it could miss a window of opportunity while competitors jump in and move faster.
In fact, according to an AI Infrastructure Alliance (AIIA) survey of more than 1,000 senior executives at large enterprises, released in July, 54% said they incurred losses due to failures to govern AI or ML applications, and 63% of those said their losses were $50 million or higher.
Underestimating costs
The most popular gen AI chatbots are free to the public. With a little experimentation, it’s cheap and easy to find applications that can provide business benefits, creating a false perception of value. When enterprises set up pilot projects in tightly controlled environments, it’s also easy to underestimate the costs that will arise when the project is broadly deployed.
The same is true when a company uses an external vendor on the project, says Rob Lee, chief curriculum director and faculty lead at the SANS Institute, because nobody has experience yet with deploying gen AI at scale.
“They don’t have the calluses yet,” he says. “If you’ve done this before, and can accurately predict costs, you’re in high demand right now.”
For example, if AI is deployed via the cloud, then every API call adds up, and usage is going to be hard to predict. “You can’t estimate human behaviour based on what the old system was,” he says. “Nobody knows the human behaviour that gen AI will generate.”
Then there are transitional costs, he says. If, for instance, you need to buy a new house, you have to sell your current house, but if the old house doesn’t sell as quickly as expected, you might be stuck having to pay for two houses at the same time. The same holds true in IT, he says. “Are we going to be able to afford it if the transition takes longer than we thought?” With gen AI, since the technology is so new, nobody can predict that accurately.
“Then you get to the size of the data set,” he adds. “I have to pay for the storage, and for the calls to that storage. And for some applications, you have to have multi-deployed storage worldwide, as well as backups.”
According to the AIIA survey, cost was the second-biggest obstacle to gen AI adoption for large enterprises.
Unrealistic expectations
Because of all the hype around gen AI, some business leaders can start to see it as a magic bullet. All the public discussions about AI coming to life aren’t helping, says Amol Ajgaonkar, CTO of product innovation at Insight, the Arizona-based solution integrator. “Some of that is seeping into the decision-making,” he says.
For example, over the summer, a global electronics manufacturer and distributor based in the western US wanted to build a system for content generation, specifically to create price documents for customers. “They have more than 8,000 client-facing sales executives who manage tens of thousands of accounts,” he says. “There’s a perpetual need to price products and services and create statements of work for new projects. Content generation is a simple use case for generative AI.”
However, the company thought the AI could look at historical data, find relevant examples from the past, and then apply them to new customer requests.
“The expectation was that the generative AI would just figure it out,” Ajgaonkar says. “I give it historical pricing, it will take a look at it, and then tell me what the pricing will be for similar stuff.”
Trying to explain to the company how generative AI actually worked, though, was a constant struggle, he says.
“All the stuff they read was pushing back on us,” he says. “Their idea of effort was minuscule, and the business value was great. The hype says how easy it is. But that’s not how it works.”
That kind of thinking sets up a company for disappointment and project failure, and perhaps even disillusionment in the benefits of AI in general.
The solution, Ajgaonkar says, is to break down the project into small steps and analyze the best way to accomplish each one. Often, generative AI will not be a good fit. For example, searching through historical documents to find relevant cases can be done more efficiently with traditional approaches, he says, although summarizing documents is something generative AI is good at.
Meanwhile, advanced analytics and ML models should be applied to predict the future, and figuring out how to assemble all the parts into a single proposal is best handled with business logic that can specify which services should be included. There are also mathematical calculations. It’s not only overkill but also incredibly inaccurate to try to use gen AI to do simple math.
“We can write a plugin to do the calculations,” says Ajgaonkar. “We don’t rely on the generative AI to calculate stuff.”
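The pattern Ajgaonkar describes, letting deterministic code do the arithmetic while the model only decides when to invoke it, can be sketched roughly as follows. The function names, discount logic, and the tool-registry shape are all illustrative assumptions, not the actual plugin Insight built.

```python
# Sketch: delegate pricing arithmetic to a deterministic "plugin"
# instead of asking the LLM to calculate. All names are hypothetical.

def price_line_item(unit_price: float, quantity: int, discount_pct: float = 0.0) -> float:
    """Compute an exact line-item price; the model only decides *when* to call this."""
    if not 0.0 <= discount_pct <= 100.0:
        raise ValueError("discount_pct must be between 0 and 100")
    return round(unit_price * quantity * (1 - discount_pct / 100.0), 2)

# A minimal tool registry, similar in spirit to the function-calling
# mechanisms many LLM frameworks expose.
TOOLS = {"price_line_item": price_line_item}

def run_tool_call(name: str, **kwargs):
    """Dispatch a tool call produced by the model to deterministic code."""
    return TOOLS[name](**kwargs)

# e.g. the model emits {"tool": "price_line_item",
#                       "unit_price": 49.5, "quantity": 120, "discount_pct": 10}
total = run_tool_call("price_line_item", unit_price=49.5, quantity=120, discount_pct=10)
```

The design point is that the numbers in the final document come from code that is testable and exact, while the generative model is reserved for the text around them.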
Then it’s time to assemble the final document. Some sections come from the legal team and never change. “That’s the boilerplate stuff,” he says. “And with the executive summary, the generative AI can put that in.”
In the end, the electronics company was able to get a solution that significantly cut down the time needed to write the statements of work, he says. But it took a bit of education to get to that point. Without the education, the project would’ve been a great disappointment.
Another thing that companies often don’t understand is that writing a gen AI prompt is not like giving instructions to a fellow adult human, he adds.
“It’s like giving my teenage kids instructions,” Ajgaonkar says. “Sometimes you have to repeat yourself so it sticks. Sometimes, the AI listens, and other times it won’t follow instructions. It’s almost like a different language. When you’re operationalizing something, understanding these minor things is a huge part of the success of the project.”
There are ways to improve the quality of responses, too, such as tree-of-thought reasoning and similar prompting methods, but these require multiple prompts to refine the response.
“Those are okay when you’re just doing research,” he says. “But when you’re actually running in production, you’re thinking about costs. Every word you push in is counted against your quota. How many tokens you consume will determine the cost of the platform.” Plus, there’s the time it takes to answer each question.
“For every request, if you have to use the tree of thought approach and ask for explanations, that will get very expensive,” he says. “If I was given a blank check, I would run the same prompt a thousand times in different variations to get exactly the result I want. But is it needed for the value it’s adding? That’s the balance that has to be struck when you’re building the solution.”
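The cost dynamic Ajgaonkar is pointing at can be made concrete with a back-of-envelope model: per-token prices multiplied by tokens per request, times request volume. The prices, token counts, and the 4x multiplier for a multi-step prompting chain below are illustrative assumptions, not any vendor’s actual rates.

```python
# Rough token-cost model for a production chatbot (all figures illustrative).

def estimate_monthly_cost(requests_per_day: int,
                          prompt_tokens: int,
                          completion_tokens: int,
                          usd_per_1k_prompt: float = 0.003,
                          usd_per_1k_completion: float = 0.006,
                          days: int = 30) -> float:
    """Monthly spend given per-request token counts and per-1k-token prices."""
    per_request = (prompt_tokens / 1000) * usd_per_1k_prompt \
                + (completion_tokens / 1000) * usd_per_1k_completion
    return round(per_request * requests_per_day * days, 2)

# A single-shot prompt vs. a four-step refinement chain (4x the tokens):
single = estimate_monthly_cost(10_000, 1_500, 500)
chained = estimate_monthly_cost(10_000, 4 * 1_500, 4 * 500)
```

Under these assumed numbers the refinement chain is simply four times the single-shot cost, which is the balance Ajgaonkar describes: extra prompts buy quality, but every extra token is billed.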
Carm Taglienti, a distinguished engineer at Insight, also recently saw unrealistic expectations nearly sink an AI project.
“AI project failure is 99% about expectations,” he says. “It’s not about the failure of the technology but the expectation of what people believe the technology can do.”
In this particular case, a client, a large US-based chip fabrication company, wanted to use AI to fix its supply-chain management issues. Not only did the company expect the AI to do things it couldn’t do, it also expected things to work on the first try. But each time a project moves from one phase to another, there’s a good chance the first approach won’t work and adjustments will need to be made. Each of those points is an opportunity for a company to give up on an AI project, he says. Here, there was also a technical issue: a lack of good data.
Lack of data
In the past, when a particular chip or component wasn’t available, the company used a labour-intensive, manual process to find a replacement.
“But this wasn’t agile enough for them to support their business,” he says. Some of this process could be replaced by decision trees and expert systems, but these were fragile: if anything changed in the industry, the entire decision tree would need to be updated. Using AI, however, required a large amount of clean data, and the kind of exhaustive component searches that would produce training data were rare.
“You don’t do a competitive analysis every time,” says Taglienti, and the chip manufacturer stuck with a preferred list of suppliers and top backups, only rarely doing large-scale supplier reviews.
The other problem was that, when the data was available, it was in a form that was difficult to process. “If you’re a manufacturer, you create specifications,” he says. “But it wasn’t in a format you could ingest quickly.”
Then there are the more nuanced issues, like where the manufacturer had its facilities and its reputation for timely deliveries.
“I have to do things like scrape the web and look at their 10-K if they’re a publicly traded company,” says Taglienti. “There’s a lot more to it than just saying I found a part that works.”

This kind of analysis was possible to automate even before gen AI came along, he says, but it’s a much more complex process than people might assume at the start.

And this isn’t unusual. The lack of usable data has long been a problem for AI and ML projects; in the AIIA survey, 84% of companies deploying gen AI cited data issues as a significant challenge. PwC’s Greenstein, for instance, recently worked with a consumer company that wanted to launch a project to automate back-office processing.
“They had their AI services set up,” he says. “Their cloud was set up. Their people were ready. But they didn’t anticipate how hard it was to get access to the data.” One data source required API licenses the company didn’t have, so it would need to go through a procurement process to get them, which can take months.
“In another system, the access controls were at a very high level by organization,” he says. “A third system was user-based controls. For gen AI, they had to reconcile all those, but they couldn’t do it quickly.”
In the long term, the company would get all the data it needed, he says, but it would have lost months.
“In this case, they pivoted to other use cases,” Greenstein says. “But leadership lost time and enthusiasm. All the people who were excited about the potential productivity improvements were frustrated, as well as the IT teams who hadn’t considered the data stuff, leading to leadership losing confidence in them.”
He says companies should prioritize potential AI use cases first by impact, second by risk, and third by data: “Do we have the data to do this use case? Do we have permission to use it? Is it accessible? Is it clean enough to be useful?” he asks. “If we don’t get past this step, we don’t start. We find another use case.”
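Greenstein’s triage, impact first, then risk, then a hard data gate, can be expressed as a small filter-and-sort. The scoring scale and field names below are assumptions made for illustration, not a PwC methodology.

```python
# Illustrative use-case triage: drop anything that fails the data gate,
# then rank by impact (high first) and risk (low first). Fields are assumed.
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    impact: int          # 1 (low) .. 5 (high)
    risk: int            # 1 (low) .. 5 (high)
    have_data: bool
    have_permission: bool
    accessible: bool
    clean_enough: bool

def data_ready(uc: UseCase) -> bool:
    """Greenstein's four data questions, all of which must pass."""
    return all([uc.have_data, uc.have_permission, uc.accessible, uc.clean_enough])

def prioritize(cases: list[UseCase]) -> list[UseCase]:
    """Filter out use cases that fail the data gate, then sort by impact, then risk."""
    viable = [uc for uc in cases if data_ready(uc)]
    return sorted(viable, key=lambda uc: (-uc.impact, uc.risk))
```

A use case that fails any data question never reaches the ranking at all, which is exactly the “we don’t start, we find another use case” behaviour he describes.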