Proof of Concept: Using AI to Generate Accessible Image Descriptions at Scale

5 min read

Feb 3, 2025

How do you add alternative text to thousands of images on a massive website when doing it manually isn’t an option?

If you’ve ever heard me talk about accessibility, there’s a good chance I’ve mentioned that tackling it from the start is the best approach. Ignoring accessibility early on makes life harder in the not-so-distant future — the longer we wait to make accessibility a requirement, the more (and messier) problems we have to fix later.

At SingleStore, we’re not immune to this issue, as a recent effort to improve accessibility on our website proved.

We published our first blog post in 2012, and over the years nearly 1,000 more have been published by countless contributors. The result is a long history of posts created without consistent accessibility practices. Two CMS migrations — from WordPress to DecapCMS and more recently to Contentstack — gave us the opportunity to improve many things in the process, but some issues slipped through the cracks of prioritization.

One of the issues the web dev team was aware of was the lack of alternative text (alt text) in many images, but we hadn’t yet taken the time to find an effective way to report the problematic instances to marketing — nor were we sure of the scale of the issue. So the first thing I worked on was a script to detect images with missing alt text and notify the team. However, I quickly realized that expecting someone to manually write descriptions for every missing alt text was… wishful thinking. On the blog alone, there were hundreds of instances. Even if someone knew exactly what to write for each image without needing to see it in context, it would still take weeks to complete. And our goal was to fix this on all our pages, not just the blog.
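The detection logic itself depends on the CMS, but the core check is simple: flag every image asset whose description (the field used as alt text) is missing or blank. Here's a minimal sketch of that filter — the `Asset` type and sample data are hypothetical stand-ins for the asset payload a CMS management API would return:

```typescript
// Hypothetical shape of an asset record from a CMS management API;
// the "description" field is what gets rendered as alt text.
type Asset = {
  uid: string;
  url: string;
  description?: string;
};

// Flag assets whose alt text is missing or only whitespace.
function findImagesWithoutAltText(assets: Asset[]): Asset[] {
  return assets.filter(asset => !asset.description?.trim());
}

// Example with made-up assets:
const sampleAssets: Asset[] = [
  { uid: "a1", url: "https://example.com/diagram.png", description: "Architecture diagram" },
  { uid: "a2", url: "https://example.com/photo.jpg", description: "   " },
  { uid: "a3", url: "https://example.com/chart.png" },
];
console.log(findImagesWithoutAltText(sampleAssets).map(a => a.uid));
```

In practice this filter runs over the full asset list fetched from the CMS, and the flagged UIDs become the report sent to the team.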

That’s when we decided to put artificial intelligence to the test: could we automate part of the content writing process? Specifically, could we use AI to generate accurate image descriptions, and drastically reduce the time and effort required to complete this task?

AI models tested

I selected a sample of different images — a mix of photos and more technical images of diagrams, tables, graphics and so on — and ran them through different tools to compare their outputs.

Here are the instructions I used for tools that support custom prompts:

  • Context. You are a content writer for SingleStore.
  • Prompt. Please provide a detailed and accurate alt text description for this image. The description should be concise (less than 255 characters) yet informative, capturing the key elements and context of the image. Output the content only, without additional annotations.

Here’s an example of outputs for one of the sample images:

  • Microsoft Azure Computer Vision. "A diagram of data processing" / "a diagram of a cloud database"
  • Google Vision. Product, Font, Parallel, Circle, Symmetry (note: Google Vision outputs labels, not an actual description)
  • Claude by Anthropic. "Diagram of real-time data platform showing SingleStore at center, surrounded by components like Helios, In-VPC deployment, compute service, and data integration. Features universal store and hybrid search are highlighted below."
  • ChatGPT by OpenAI. "Diagram of the SingleStore real-time data platform highlighting the core features: universal store for transactions and queries, hybrid search, Helios cloud service, native data integration, compute service, SingleStore Kai API, and upcoming in-VPC deployment."

I also tested several other agents specializing in image descriptions. They were promising, but didn’t allow custom prompts — and their lack of knowledge of SingleStore led to frequent inaccuracies.

Choosing Claude by Anthropic

The outputs from both Claude and ChatGPT were aligned with the level of detail I was aiming for. However, Claude's results were generally more accurate for technical images, and ChatGPT frequently ignored the 255-character limit.

Based on these results*, I decided to continue with the proof of concept focusing on Claude.

The final script

  1. Detect images with missing alt text, using Contentstack’s management API. I’m keeping the CMS integrations outside of this post as they are specific to our use case, but everything was done using their management API, which allows you to programmatically access and update content.
  2. Encode each image to Base64 format so it can be used as input for Anthropic.
  3. Generate alt text for each image, using Anthropic's TypeScript SDK.
  4. Update the asset on the CMS to add the description generated by Anthropic.
  5. Add the image to a release on the CMS, so it can be reviewed and later published.
  import Anthropic from "@anthropic-ai/sdk";
  import axios from "axios";
  import { getImagesWithoutAltText, updateAssetDescription } from "./our-custom-scripts";

  // Must add API key to .envrc or .envrc.private file
  // export ANTHROPIC_API_KEY=your_api_key
  if (!process.env.ANTHROPIC_API_KEY) {
    throw new Error("Please set ANTHROPIC_API_KEY as an environment variable.");
  }

  const anthropic = new Anthropic();

  const VALID_IMG_EXTENSIONS = ["jpeg", "png", "gif", "webp"] as const;
  type ValidImageExtension = typeof VALID_IMG_EXTENSIONS[number];

  const isValidExtension = (
    imageExtension: string | undefined
  ): imageExtension is ValidImageExtension => {
    if (!imageExtension) {
      throw new Error("Image extension not found.");
    }
    if (!VALID_IMG_EXTENSIONS.includes(imageExtension as ValidImageExtension)) {
      throw new Error("Invalid image extension: " + imageExtension);
    }
    return true;
  };

  // Example image url:
  // https://images.contentstack.io/v3/assets/[account-and-assets-id]/file.jpg?width=50&auto=webp
  async function generateAltText(
    imageUrl: string
  ): Promise<string | undefined> {
    try {
      let imageExtension = imageUrl
        .split("?")[0] // get url without params
        .split(".") // split before extension
        .pop(); // get last item in array
      if (imageExtension === "jpg") {
        imageExtension = "jpeg";
      }
      if (!isValidExtension(imageExtension)) {
        return;
      }
      // https://stackabuse.com/bytes/converting-images-and-image-urls-to-base64-in-node-js/
      const base64Image = await axios
        .get(imageUrl, {
          responseType: "arraybuffer",
        })
        .then(response => Buffer.from(response.data).toString("base64"))
        .catch(error => console.log(error));
      if (!base64Image) {
        throw new Error("No image data");
      }
      const output = await anthropic.messages.create({
        model: "claude-3-5-sonnet-20240620",
        max_tokens: 1000,
        temperature: 0,
        system: "You are a content writer for SingleStoreDB",
        messages: [
          {
            role: "user",
            content: [
              {
                type: "image",
                source: {
                  type: "base64",
                  media_type: `image/${imageExtension}`,
                  data: base64Image,
                },
              },
              {
                type: "text",
                text:
                  "Please provide a detailed and accurate alt text description for this image. " +
                  "The description should be concise (less than 255 characters) yet informative, " +
                  "capturing the key elements and context of the image. Output the content only, " +
                  "without additional annotations.",
              },
            ],
          },
        ],
      });
      const outputContent = output.content?.[0];
      if (outputContent?.type === "text") {
        return outputContent.text;
      }
      console.error("Output text not found");
      return;
    } catch (error) {
      console.error("Error generating alt text:", error);
      return;
    }
  }

  async function main() {
    // Custom function to get a list of images with missing alt text
    const { failingImages } = await getImagesWithoutAltText({
      contentType: "blog_posts",
    });
    for (const image of failingImages) {
      const { url, uid } = image;
      const altText = await generateAltText(url);
      if (altText) {
        // Custom function that updates the asset entry on
        // Contentstack and adds it to a release
        await updateAssetDescription({
          releaseUid: "RELEASE_UID", // replace
          imageUid: uid,
          description: altText,
        });
      }
    }
  }

  main();
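One safeguard worth sketching separately (this was not part of our actual script): since models sometimes overshoot a character limit, the generated description could be clamped to the CMS's 255-character cap before being saved, cutting at a word boundary so it doesn't end mid-word. The `clampAltText` helper below is an illustrative assumption, not something from our codebase:

```typescript
// Hypothetical safety net: clamp generated alt text to the CMS limit,
// cutting at the last word boundary and trimming stray whitespace.
const MAX_ALT_LENGTH = 255;

function clampAltText(text: string, maxLength: number = MAX_ALT_LENGTH): string {
  const trimmed = text.trim();
  if (trimmed.length <= maxLength) {
    return trimmed;
  }
  const slice = trimmed.slice(0, maxLength);
  const lastSpace = slice.lastIndexOf(" ");
  // Fall back to a hard cut if the text has no spaces at all.
  return (lastSpace > 0 ? slice.slice(0, lastSpace) : slice).trimEnd();
}
```

A guard like this would slot in right before the call that updates the asset description, so an occasional over-long output never reaches the CMS.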

Final thoughts

  • For photographs, the descriptions provided by this AI are usually ready to publish without additional edits.
  • Manual review is still important for more technical images, as interpreting them may require specialized knowledge of our product — and the context in which they appear can impact the details we want to include in the description.
  • With Anthropic, we estimated the cost of generating alt text to be around $0.01 per image, or less than $10 per 1,000 images. Since reviewing or tweaking a description is generally quicker than writing it from scratch, this strategy is fairly cost effective.
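For the curious, that per-image figure can be sanity-checked with back-of-the-envelope arithmetic. Every number below is an illustrative assumption (image token usage varies with resolution, and API pricing changes over time), not a quoted price:

```typescript
// Rough per-image cost estimate. All constants are assumptions
// for illustration only, not actual published prices.
const INPUT_TOKENS_PER_IMAGE = 1800;  // image + prompt tokens (assumed)
const OUTPUT_TOKENS_PER_IMAGE = 70;   // a ~255-character description (assumed)
const INPUT_PRICE_PER_MTOK = 3;       // USD per million input tokens (assumed)
const OUTPUT_PRICE_PER_MTOK = 15;     // USD per million output tokens (assumed)

const costPerImage =
  (INPUT_TOKENS_PER_IMAGE * INPUT_PRICE_PER_MTOK +
    OUTPUT_TOKENS_PER_IMAGE * OUTPUT_PRICE_PER_MTOK) / 1_000_000;

console.log(costPerImage.toFixed(4));          // fractions of a cent per image
console.log((costPerImage * 1000).toFixed(2)); // single-digit dollars per 1,000 images
```

Under these assumptions the estimate lands comfortably under a cent per image, consistent with the ballpark above.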

I hope this example demonstrates how cross-team collaboration can address accessibility challenges more effectively. While the issue may seem to be only within the jurisdiction of "content," it becomes much easier to identify, solve and prevent in collaboration with the development team.

Also, I want to point out this was only one of the strategies we worked on for the alt text issue. The focus of this proof of concept was on fixing past instances, but we planned and implemented additional measures to prevent the problem from recurring in pages published going forward.


* Disclaimer

To be fully honest, the real reason why I went with Anthropic was because of the first output it gave me. I adapted a sample prompt from their docs that contained the instruction “Respond only with short poems,” and I initially ran it without realizing.

Data flows in real-time's stream,
A platform's architectural dream.
SingleStore at the core,
With Helios to explore,
In-VPC and Kai complete the scheme.
Compute, search, and analyze,
Native integration's enterprise guise.
A circular design,
Where data intertwines,
Real-time insights, the ultimate prize.

I was confused, but I was sold.

