Museum Semantic Search

Prototype exploring semantic search for museum collections using AI-generated visual descriptions and embeddings.

Features

Search artworks using multilingual natural language queries
Explore the embedding space with interactive visualizations
Compare traditional & semantic search techniques side-by-side
Upload an image to find similar artworks

Quick Links

Try the Demo - Search 5,280 Open Access Met paintings
Explore the Embeddings Visualization - See how artworks cluster by text & image similarity
Technical Guide & Setup - Setup and development guide

Example Searches

Try these queries to see how semantic search finds artworks that traditional keyword search might miss:

"Three Women", "Gazing from the window", "The gnarled tree", "Mother and child", "Old man with a beard", "Person fighting a monster", "People on a bridge", "A banquet scene", "Man on a horse", "Ruins in a landscape", "Sleeping person", "The reclining nude", "Man in armor", "A person with a dog", "Flowers in a vase"

Disclaimer

While AI can enhance search capabilities, it can also perpetuate existing biases or create new ones (see "Improving the Search: Uncovering AI bias in digital collections"). AI-generated content should ideally be verified and edited by human experts. Museum collections search is complex and nuanced, and this project is only a quick prototype. See the Musefully project for an approach that is more reflective of proper faceted search.

Dataset

5,280 Open Access paintings with images from The Metropolitan Museum of Art's Open Access collection. View the full Met Open Access dataset

AI Visual Descriptions

AI is used to generate three types of descriptions for each artwork that adhere to the Cooper Hewitt Guidelines for Image Description:

Alt text (~15 words) - Concise accessibility description
Long description (100-300 words) - Detailed visual elements
Emoji summary (3-8 emojis) - Visual elements as symbols

Two-pass quality control:

Generation pass - AI creates initial descriptions from the image. Full prompt
Editorial pass - AI reviews and removes bias, interpretation, or cultural assumptions. Full prompt
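
The two passes above can be sketched as a small pipeline. This is a minimal illustration, not the project's code: the `Descriptions` type, the `describe_artwork` function, and the two stub callables are hypothetical names, and it assumes the editorial pass works from the draft text alone. In practice each callable would wrap a model call with the corresponding prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Descriptions:
    alt_text: str
    long_description: str
    emojis: str


# Hypothetical signatures: each callable stands in for one model call.
Generate = Callable[[bytes], Descriptions]
Edit = Callable[[Descriptions], Descriptions]


def describe_artwork(image: bytes, generate: Generate, edit: Edit) -> dict[str, Descriptions]:
    """Run the generation pass, then the editorial pass, keeping both versions."""
    draft = generate(image)   # pass 1: describe what is visible in the image
    revised = edit(draft)     # pass 2: remove names, interpretation, assumptions
    return {"draft": draft, "revised": revised}


# Stubs that mimic the two passes without calling any model.
def fake_generate(image: bytes) -> Descriptions:
    return Descriptions(
        alt_text="Socrates reaches for a cup",
        long_description="An elderly man, identified as Socrates, sits on a bed.",
        emojis="",
    )


def fake_edit(draft: Descriptions) -> Descriptions:
    return Descriptions(
        alt_text=draft.alt_text.replace("Socrates", "An elderly man"),
        long_description=draft.long_description.replace(", identified as Socrates,", ""),
        emojis=draft.emojis,
    )


result = describe_artwork(b"", fake_generate, fake_edit)
print(result["revised"].alt_text)
```

Keeping both the draft and the revised output makes it possible to list the editorial changes afterwards, as in the example that follows.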

Visual Descriptions Example: The Death of Socrates, Jacques Louis David, 1787

1. Generation Pass: initial visual descriptions (multimodal Gemini 2.5 Flash output):

Alt Text: "Socrates, surrounded by grieving students, reaches for a cup of hemlock in a dimly lit prison cell"
Long Description: "The painting depicts a dramatic scene within a dimly lit, stone-walled room. In the center, an elderly, muscular man with a white beard, identified as Socrates, sits upright on a bed, draped in a white cloth. He gestures upwards with his right hand, as if speaking, while his left hand reaches for a small, red cup offered by a younger, muscular man in a red tunic, who turns away with a pained expression. To the right of Socrates, several men are gathered, showing various states of distress. One man in a red robe sits on a stone block, looking down thoughtfully. Behind him, others express anguish, some covering their faces, others raising their hands in despair. To the left of Socrates, an older man with a white beard and gray robe sits slumped on a bench, his head bowed in sorrow. Further back, in a darker arched doorway, more figures are visible, including a woman being led away. On the floor in the foreground, near the slumped man, are two rolled scrolls and a broken chain. The room's architecture is simple, with stone blocks forming the walls and an arched opening leading to a darker area. A small, dark object hangs from the ceiling. The overall lighting is dim, with a subtle glow on the central figures."
Emojis: 🧔🍷😭⛓️📜

2. Editorial Pass: editorially revised Gemini 2.5 Flash output:

Alt Text: "An elderly man reaches for a cup, surrounded by distressed figures in a dimly lit stone room"
Long Description: "A dimly lit, stone-walled room contains multiple figures. In the center, an elderly, muscular man with a white beard sits upright on a bed, draped in a white cloth. He gestures upwards with his right hand, while his left hand reaches for a small, red cup. A younger, muscular man in a red tunic offers the cup, turning his head away from the elderly man with a downturned mouth. To the right of the central elderly man, several men are gathered, displaying varied postures. One man in a red robe sits on a stone block, looking downwards. Behind him, other figures cover their faces or raise their hands. To the left of the central elderly man, an older man with a white beard and gray robe sits slumped on a bench, his head bowed. Further back, in a darker arched doorway, more figures are visible, including a woman standing near another figure. On the floor in the foreground, near the slumped man, are two rolled scrolls and a broken chain. The room features stone block walls and an arched opening leading to a darker area. A small, dark object hangs from the ceiling. The overall lighting is dim, with a subtle glow on the central figures."
Emojis: 🧔🍷👥⛓️📜

Editorial Changes Made:

Alt Text: Removed specific name "Socrates."
Alt Text: Removed interpretive terms "grieving students," "hemlock," and "prison cell."
Alt Text: Replaced with objective visual descriptions like "distressed figures" and "stone room."
Alt Text: Adjusted word count to be closer to 15 words.
Long Description: Removed subjective phrase "The painting depicts a dramatic scene."
Long Description: Removed specific name "Socrates" and the phrase "identified as Socrates."
Long Description: Removed interpretive phrases such as "as if speaking," "pained expression," "various states of distress," "looking down thoughtfully," "express anguish," "raising their hands in despair," and "bowed in sorrow."
Long Description: Replaced character-specific references like "To the right of Socrates" with neutral spatial references like "To the right of the central elderly man."
Long Description: Rephrased "a woman being led away" to "a woman standing near another figure" to remove implied action/intent.
Long Description: Removed subjective judgment "The room's architecture is simple."
Long Description: Replaced emotional descriptions of figures with objective descriptions of their postures and expressions (e.g., "downturned mouth," "displaying varied postures," "cover their faces").
Emoji Summary: Removed "😭" emoji as it represents an emotion, which is explicitly forbidden.
Emoji Summary: Added "👥" emoji to represent the group of multiple figures, ensuring all main visual elements are covered objectively.

Visual Descriptions Limitations

The strict prompts & two-pass editorial process help reduce bias and subjective interpretation, but visual elements may still be misidentified or missed entirely. The primary consideration is whether, in spite of minor inaccuracies, the descriptions still improve search relevance, especially when used in text embeddings.

Excerpt from visual description of "The Penitent Magdalen" by Georges de La Tour:

"The mirror reflects two lit candles, their flames appearing as elongated, bright vertical streaks against the dark background within the frame. One candle is visible on a dark, turned wooden candlestick directly in front of the mirror, while the other is only seen as a reflection."

Here, the model seems confused by the mirror and incorrectly identifies two candles when there is only one.

Visual Descriptions Textual Analysis

Besides enabling better semantic search, the AI-generated visual descriptions can be used as a dataset for textual analysis. Below are some examples of the most frequent words and emojis in the descriptions of the Met paintings.

COLORS
"dark" - 15672
"light" - 11173
"red" - 8175
"white" - 7778
"brown" - 7256
"green" - 6400
"blue" - 5273
"black" - 4225
"gold" - 3706
"gray" - 1907

EMOJIS
🌳 - 1112
⛰ - 1060
👩 - 985
📜 - 891
🌊 - 795
🧔 - 781
✍ - 652
🌸 - 582
☁ - 580
🌲 - 548

ANIMALS
"bird" - 1305
"horse" - 853
"dog" - 380
"fish" - 210
"fly" - 146
"deer" - 144
"monkey" - 142
"elephant" - 122
"crane" - 116
"lion" - 114

MYTHOLOGICAL
"creature" - 317
"dragon" - 126
"beast" - 12
"demon" - 11
"griffin" - 6
"sphinx" - 3
"centaur" - 3
"monster" - 3
"satyr" - 3
"devil" - 3
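
Counts like these can be produced with a simple term tally over the description texts. The sketch below is one plausible way to do it, not necessarily how the figures above were computed; the function name `term_counts` and the sample sentences are made up for illustration.

```python
import re
from collections import Counter


def term_counts(descriptions, vocabulary):
    """Count how often each vocabulary term appears as a whole word."""
    vocab = {term.lower() for term in vocabulary}
    counts = Counter()
    for text in descriptions:
        # Lowercase and split into alphabetic words before matching.
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in vocab:
                counts[word] += 1
    return counts


# Made-up sample descriptions, for illustration only.
sample = [
    "A dark room with a red cup and a white cloth.",
    "A dark horse stands near a red barn.",
]
counts = term_counts(sample, ["dark", "red", "white", "horse"])
print(counts.most_common())
```

A plain word tally like this cannot tell homographs apart, so terms such as "fly" or "crane" may include uses that do not refer to animals.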

AI-Generated Emojis

The AI-generated emojis are sometimes strangely accurate, revealing details I missed. At other times they are questionable and problematic, and they are often hilarious. They are of dubious practical use, but fun.

Search Comparison

Below are comparisons of keyword search, text embedding search, and image embedding search for the query "woman looking into mirror".

Search for "woman looking into mirror"

Out of a result set of 20:

The conventional Elasticsearch keyword search over Met Museum metadata produces only 3 results that I consider highly relevant.
Text embedding search using Jina v3 embeddings on combined metadata and AI-generated descriptions returns 13 excellent results, including a number of images where the reflection or mirror is not even visible.
Image embedding search returns 8 highly relevant results, including artworks that contain no actual mirror but perhaps suggest the concept of mirroring, for example "Portrait of a Woman with a Man at a Casement" by Fra Filippo Lippi and "Dancers, Pink and Green" by Edgar Degas.

Results that I found exciting are highlighted in the image below. A number of these I probably would have missed if browsing through images.

Difficult to see: the woman on the left is looking into a mirror.

Vilaval Ragini: Folio from a ragamala series (Garland of Musical Modes)

I thought the AI-generated visual description and/or text embeddings had it wrong, but there is indeed a mirror in the painting and it's possible the main figure is looking into it.

Madame Marsollier and Her Daughter by Jean Marc Nattier

Perhaps the woman is not looking into a mirror, but it does feel like a mirroring.

Portrait of a Woman with a Man at a Casement by Fra Filippo Lippi

Visualize Embeddings

The /visualize page shows the entire collection as dots on a 2D map, where similar artworks cluster together based on shared themes, styles, and subjects. For example, "Portraits of Men" and "Portraits of Women" appear near each other, as do "Horses" and "Men on Horses". Distinct traditions like "Indian Manuscripts" form separate regions.

Each dot represents one artwork in the collection
Distance between dots shows semantic similarity: closer dots are more similar
Search to highlight relevant results. Larger, brighter dots rank higher
Color dots by artist, period, tags, or department to reveal patterns

Search Types

1. Keyword Search

Traditional Elasticsearch text search using BM25 scoring across artwork metadata (title, artist, medium, etc.) and optional AI-generated visual descriptions.

2. Semantic Search

Vector similarity search using pre-computed embeddings:

Jina v3 Text: Advanced text search combining artwork metadata with AI-generated descriptions (768 dimensions)
SigLIP 2 Cross-Modal: True text-to-image search using Google's SigLIP 2 model (768 dimensions) - enables natural language queries like "red car in snow" or "mourning scene"
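
At its core, vector similarity search ranks the pre-computed artwork embeddings by their similarity to the query embedding. The sketch below shows a brute-force version using cosine similarity on toy 3-dimensional vectors; the function names and the toy index are hypothetical, and the project may well rely on a vector index rather than a linear scan like this.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=3):
    """Return the k (artwork_id, score) pairs most similar to the query."""
    scored = [(art_id, cosine(query_vec, vec)) for art_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy stand-ins for the 768-dimensional embeddings.
toy_index = {
    "portrait": [0.9, 0.1, 0.0],
    "landscape": [0.1, 0.9, 0.2],
    "still_life": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(results)
```

The same ranking step applies to both embedding types; only the model that produces the query vector differs.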

3. Hybrid Search

Combines keyword and semantic search with user-adjustable balance control:

Text Mode: Keyword + Jina v3 text embeddings
Image Mode: Keyword + SigLIP 2 cross-modal embeddings
Both Mode: Keyword + both embedding types using RRF
Balance slider: 0% = pure keyword, 100% = pure semantic, 50% = equal weight
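
One plausible way to implement the balance slider is a linear blend of normalized scores, sketched below. This is an assumption made for illustration: the source does not say how the balance is applied to the two result sets, and the function names and example scores are hypothetical.

```python
def normalize(scores):
    """Min-max normalize a {id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_scores(keyword, semantic, balance):
    """Blend scores: balance 0.0 is pure keyword, 1.0 is pure semantic."""
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must be between 0.0 and 1.0")
    kw, sem = normalize(keyword), normalize(semantic)
    ids = set(kw) | set(sem)
    # An artwork missing from one result set contributes 0 for that side.
    return {
        art_id: (1 - balance) * kw.get(art_id, 0.0) + balance * sem.get(art_id, 0.0)
        for art_id in ids
    }


# Hypothetical raw scores: BM25-style for keyword, cosine-style for semantic.
keyword = {"a": 12.0, "b": 4.0}
semantic = {"b": 0.82, "c": 0.80, "a": 0.40}
blended = hybrid_scores(keyword, semantic, balance=0.5)
print(sorted(blended.items(), key=lambda pair: pair[1], reverse=True))
```

Normalizing first matters because keyword scores and vector similarities live on different scales; without it, one side would dominate regardless of the slider position.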

4. Image Search

By clicking on "Image Search" in the search bar, you can upload an image to find visually similar artworks using SigLIP 2 cross-modal embeddings. Such a feature could help museum-goers find more information about an artwork they see in person, or support projects like Google Arts & Culture's "Art Selfie".

Uploaded Image:

First Search Result:

"Aristotle with a Bust of Homer" by Rembrandt (Rembrandt van Rijn)

Similar Artworks Algorithms

The artwork detail pages display similar artworks using five different algorithms:

1. Metadata Similarity (Elasticsearch-based)

Finds artworks with similar structured metadata using art historical principles:

Artist (weight: 10) - Same artist indicates strong connection
Date/Period (weight: 7) - Temporal proximity using Gaussian decay (±25 years)
Medium (weight: 6) - Similar materials and techniques
Classification (weight: 5) - Same artwork type (painting, sculpture, etc.)
Department (weight: 4) - Museum curatorial groupings
Culture/Nationality (weight: 4) - Cultural and geographic connections
Period/Dynasty (weight: 3) - Art historical movements
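
The weighting above can be illustrated outside Elasticsearch with a small scoring function. This is a sketch under several assumptions: the field keys are hypothetical, fields are compared by exact match (the real query may match medium more loosely), and the date term follows the shape of Elasticsearch's Gaussian decay function with a scale of 25 years and an assumed decay value of 0.5 at that distance.

```python
import math

# Weights taken from the list above; the dictionary keys are hypothetical names.
FIELD_WEIGHTS = {
    "artist": 10,
    "medium": 6,
    "classification": 5,
    "department": 4,
    "culture": 4,
    "period": 3,
}
DATE_WEIGHT = 7
DATE_SCALE_YEARS = 25
DATE_DECAY = 0.5  # assumption: the score halves at a distance of 25 years


def gaussian_decay(distance, scale=DATE_SCALE_YEARS, decay=DATE_DECAY):
    """Gaussian decay that equals `decay` when distance == scale."""
    sigma_sq = -(scale ** 2) / (2 * math.log(decay))
    return math.exp(-(distance ** 2) / (2 * sigma_sq))


def metadata_score(source, candidate):
    """Sum the weights of matching fields plus a date proximity term."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = source.get(field)
        if value is not None and value == candidate.get(field):
            score += weight
    if source.get("year") is not None and candidate.get("year") is not None:
        score += DATE_WEIGHT * gaussian_decay(abs(source["year"] - candidate["year"]))
    return score


source = {"artist": "A", "medium": "Oil on canvas", "year": 1787}
same_artist = {"artist": "A", "medium": "Oil on canvas", "year": 1787}
other_artist = {"artist": "B", "medium": "Oil on canvas", "year": 1812}
print(metadata_score(source, same_artist), metadata_score(source, other_artist))
```

In this sketch a work by a different artist made 25 years later keeps the full medium weight but only half of the date weight.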

2. Jina v3 Text Similarity

Uses 768-dimensional text embeddings to find semantically similar artworks based on:

Artwork metadata (title, artist, date, medium)
AI-generated visual descriptions
Contextual understanding of art terminology

3. SigLIP 2 Visual Similarity

Uses 768-dimensional cross-modal embeddings to find visually similar artworks:

Analyzes visual features like composition, color, style
Works across different media and periods
Captures visual patterns independent of metadata

4. Combined Similarity

Fuses all three similarity types using weighted Reciprocal Rank Fusion (RRF):

35% Jina v3 text embeddings - semantic understanding
35% SigLIP 2 visual embeddings - visual appearance
30% Elasticsearch metadata - art historical context
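
Weighted RRF can be computed in application code from the three ranked result lists. The sketch below uses the standard formula, where each list contributes weight / (k + rank) for every artwork it contains. The constant k = 60 is a commonly used default and an assumption here, as are the function name and the toy rankings.

```python
def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked ID lists; each list adds weight / (k + rank) per artwork."""
    scores = {}
    for name, ranked_ids in rankings.items():
        weight = weights[name]
        for rank, art_id in enumerate(ranked_ids, start=1):
            scores[art_id] = scores.get(art_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Weights from the list above.
RRF_WEIGHTS = {"text": 0.35, "image": 0.35, "metadata": 0.30}

# Toy rankings, best match first in each list.
rankings = {
    "text": ["a", "b", "c"],
    "image": ["b", "a", "d"],
    "metadata": ["c", "b", "a"],
}
fused = weighted_rrf(rankings, RRF_WEIGHTS)
print(fused)
```

Because RRF uses only ranks, not raw scores, it sidesteps the problem of comparing BM25 scores with vector similarities.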

Note that Elasticsearch has native RRF but it's only available in the Enterprise plan.

5. AI Curated Similarity (Pre-computed LLM Reranking)

The AI curation process:

Retrieves top 20 candidates from metadata and text embeddings searches, 5 candidates from image embeddings search
Removes duplicates and presents candidates without scores to avoid bias
Applies art historical expertise to select truly meaningful connections
Enforces diversity rules (max 3 per artist, max 8 per similarity type)
Returns up to 20 curated recommendations with confidence scores
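
The de-duplication and diversity rules in this process can also be expressed as a deterministic filter, for example to check a curated list after the fact. This is a hypothetical sketch: the source does not say whether the rules are enforced by the prompt or in code, and the candidate fields (`id`, `artist`, `similarity_type`) are assumed names.

```python
def apply_diversity_rules(candidates, max_per_artist=3, max_per_type=8, limit=20):
    """Keep candidates in order while enforcing the per-artist and per-type caps."""
    selected_ids = set()
    per_artist = {}
    per_type = {}
    selected = []
    for candidate in candidates:
        if candidate["id"] in selected_ids:
            continue  # duplicate of an artwork that was already selected
        artist = candidate["artist"]
        kind = candidate["similarity_type"]
        if per_artist.get(artist, 0) >= max_per_artist:
            continue
        if per_type.get(kind, 0) >= max_per_type:
            continue
        per_artist[artist] = per_artist.get(artist, 0) + 1
        per_type[kind] = per_type.get(kind, 0) + 1
        selected_ids.add(candidate["id"])
        selected.append(candidate)
        if len(selected) == limit:
            break
    return selected


# Toy candidates: five works by one artist, a duplicate, and one other work.
candidates = [
    {"id": n, "artist": "Artist A", "similarity_type": "text"} for n in range(1, 6)
]
candidates.append({"id": 1, "artist": "Artist A", "similarity_type": "image"})
candidates.append({"id": 6, "artist": "Artist B", "similarity_type": "image"})

picked = apply_diversity_rules(candidates)
print([c["id"] for c in picked])
```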

Uses Gemini 2.5 Flash to intelligently select and rank similar artworks:

Cross-cultural connections: Discovers relationships across time periods and cultures (e.g., Gauguin's Tahitian Madonna with Renaissance Madonnas)
Thematic relationships: Identifies shared subjects and motifs beyond surface similarities
Visual intelligence: Considers composition, style, and emotional resonance
Diversity-aware: Limits over-representation of single artists or similarity types
Explainable: Each recommendation includes a brief explanation of the connection

Full prompt here.

AI-Curated Similarity Example

See Example here: Holy Family with Saint Anne, French Painter (17th century)

For this example, the keyword search, which relies only on metadata, does a poor job of finding relevant similar artworks, pulling in various works attributed to an unknown "French Painter". The text & image embedding results are better, especially in theme and style. The AI-curated results are perhaps the best in my opinion, but I'm not an art historian and am not familiar enough with the collection to make an educated judgment.

Technical Guide & Setup

See TECHNICAL_GUIDE.md for technical details & setup instructions including prerequisites, environment configuration, and deployment steps.

Previous Work

Musefully (website, github): Search across museums using Elasticsearch and Next.js
“Accessible Art Tags” GPT: a specialized GPT that generates alt text and long descriptions following Cooper Hewitt Guidelines for Image Description.
OpenAI CLIP Embedding Similarity: Examples of OpenAI CLIP Embeddings artwork similarity search.

Related Projects

MuseRAG++: A Deep Retrieval-Augmented Generation Framework for Semantic Interaction and Multi-Modal Reasoning in Virtual Museums: RAG-powered museum chatbot
National Museum of Norway Semantic Collection Search (Website, Article): Search via embeddings of GPT-4 Vision image descriptions.
Semantic Art Search (Github, Website): Explore art through meaning-driven search
Sketchy Collections (Github, Website): CLIP-based image search tool that lets you explore artworks by drawing or uploading a picture

License

MIT licensed. Museum data used according to The Metropolitan Museum of Art's open access policy.

derekphilipau/museum-semantic-search