derekphilipau/museum-semantic-search

Museum Semantic Search

Prototype exploring semantic search for museum collections using AI-generated visual descriptions and embeddings.

Features

  • Search artworks using multilingual natural language queries
  • Explore the embedding space with interactive visualizations
  • Compare traditional & semantic search techniques side-by-side
  • Upload an image to find similar artworks

Quick Links

Example Searches

Try these queries to see how semantic search finds artworks that traditional keyword search might miss:

"Three Women", "Gazing from the window", "The gnarled tree", "Mother and child", "Old man with a beard", "Person fighting a monster", "People on a bridge", "A banquet scene", "Man on a horse", "Ruins in a landscape", "Sleeping person", "The reclining nude", "Man in armor", "A person with a dog", "Flowers in a vase"

Curated Search Results for Various Archetypes

Disclaimer

While AI can enhance search capabilities, it can also perpetuate existing biases or create new ones. (See "Improving the Search: Uncovering AI bias in digital collections".) AI-generated content should ideally be verified and edited by human experts. Museum collections search is complex and nuanced. This project is a quick prototype; see the Musefully project, which better reflects proper faceted search.

Dataset

The dataset consists of 5,280 paintings with images from The Metropolitan Museum of Art's Open Access collection. View the full Met Open Access dataset

AI Visual Descriptions

AI is used to generate three types of descriptions for each artwork that adhere to the Cooper Hewitt Guidelines for Image Description:

  1. Alt text (~15 words) - Concise accessibility description
  2. Long description (100-300 words) - Detailed visual elements
  3. Emoji summary (3-8 emojis) - Visual elements as symbols

Two-pass quality control:

  • Generation pass - AI creates initial descriptions from the image. Full prompt
  • Editorial pass - AI reviews and removes bias, interpretation, or cultural assumptions. Full prompt

The Death of Socrates, Jacques Louis David, 1787

1. Generation Pass: Initial Visual Descriptions

Multimodal Gemini 2.5 Flash output:

  • Alt Text: "Socrates, surrounded by grieving students, reaches for a cup of hemlock in a dimly lit prison cell"
  • Long Description: "The painting depicts a dramatic scene within a dimly lit, stone-walled room. In the center, an elderly, muscular man with a white beard, identified as Socrates, sits upright on a bed, draped in a white cloth. He gestures upwards with his right hand, as if speaking, while his left hand reaches for a small, red cup offered by a younger, muscular man in a red tunic, who turns away with a pained expression. To the right of Socrates, several men are gathered, showing various states of distress. One man in a red robe sits on a stone block, looking down thoughtfully. Behind him, others express anguish, some covering their faces, others raising their hands in despair. To the left of Socrates, an older man with a white beard and gray robe sits slumped on a bench, his head bowed in sorrow. Further back, in a darker arched doorway, more figures are visible, including a woman being led away. On the floor in the foreground, near the slumped man, are two rolled scrolls and a broken chain. The room's architecture is simple, with stone blocks forming the walls and an arched opening leading to a darker area. A small, dark object hangs from the ceiling. The overall lighting is dim, with a subtle glow on the central figures."
  • Emojis: 🧔🍷😭⛓️📜

2. Editorial Pass: Editorially Revised Descriptions

Gemini 2.5 Flash editorially revised output:

  • Alt Text: "An elderly man reaches for a cup, surrounded by distressed figures in a dimly lit stone room"
  • Long Description: "A dimly lit, stone-walled room contains multiple figures. In the center, an elderly, muscular man with a white beard sits upright on a bed, draped in a white cloth. He gestures upwards with his right hand, while his left hand reaches for a small, red cup. A younger, muscular man in a red tunic offers the cup, turning his head away from the elderly man with a downturned mouth. To the right of the central elderly man, several men are gathered, displaying varied postures. One man in a red robe sits on a stone block, looking downwards. Behind him, other figures cover their faces or raise their hands. To the left of the central elderly man, an older man with a white beard and gray robe sits slumped on a bench, his head bowed. Further back, in a darker arched doorway, more figures are visible, including a woman standing near another figure. On the floor in the foreground, near the slumped man, are two rolled scrolls and a broken chain. The room features stone block walls and an arched opening leading to a darker area. A small, dark object hangs from the ceiling. The overall lighting is dim, with a subtle glow on the central figures."
  • Emojis: 🧔🍷👥⛓️📜

Editorial Changes Made:

  • Alt Text: Removed specific name "Socrates."
  • Alt Text: Removed interpretive terms "grieving students," "hemlock," and "prison cell."
  • Alt Text: Replaced with objective visual descriptions like "distressed figures" and "stone room."
  • Alt Text: Adjusted word count to be closer to 15 words.
  • Long Description: Removed subjective phrase "The painting depicts a dramatic scene."
  • Long Description: Removed specific name "Socrates" and the phrase "identified as Socrates."
  • Long Description: Removed interpretive phrases such as "as if speaking," "pained expression," "various states of distress," "looking down thoughtfully," "express anguish," "raising their hands in despair," and "bowed in sorrow."
  • Long Description: Replaced character-specific references like "To the right of Socrates" with neutral spatial references like "To the right of the central elderly man."
  • Long Description: Rephrased "a woman being led away" to "a woman standing near another figure" to remove implied action/intent.
  • Long Description: Removed subjective judgment "The room's architecture is simple."
  • Long Description: Replaced emotional descriptions of figures with objective descriptions of their postures and expressions (e.g., "downturned mouth," "displaying varied postures," "cover their faces").
  • Emoji Summary: Removed "😭" emoji as it represents an emotion, which is explicitly forbidden.
  • Emoji Summary: Added "👥" emoji to represent the group of multiple figures, ensuring all main visual elements are covered objectively.

Visual Descriptions Limitations

The strict prompts and two-pass editorial process help reduce bias and subjective interpretation, but visual elements may still be misidentified or missed entirely. The primary consideration is whether, in spite of minor inaccuracies, the descriptions still improve search relevance, especially when used in text embeddings.

The Penitent Magdalen by Georges de La Tour

Excerpt from visual description of "The Penitent Magdalen" by Georges de La Tour:

"The mirror reflects two lit candles, their flames appearing as elongated, bright vertical streaks against the dark background within the frame. One candle is visible on a dark, turned wooden candlestick directly in front of the mirror, while the other is only seen as a reflection."

Here, the model seems confused by the mirror and incorrectly identifies two candles when there is only one.

Visual Descriptions Textual Analysis

Besides enabling better semantic search, the AI-generated visual descriptions can be used as a dataset for textual analysis. Below are some of the most frequent words and emojis in the descriptions of the Met paintings, with their occurrence counts.

COLORS
  1. "dark" - 15672
  2. "light" - 11173
  3. "red" - 8175
  4. "white" - 7778
  5. "brown" - 7256
  6. "green" - 6400
  7. "blue" - 5273
  8. "black" - 4225
  9. "gold" - 3706
  10. "gray" - 1907
EMOJIS
  1. 🌳 - 1112
  2. ⛰ - 1060
  3. 👩 - 985
  4. 📜 - 891
  5. 🌊 - 795
  6. 🧔 - 781
  7. ✍ - 652
  8. 🌸 - 582
  9. ☁ - 580
  10. 🌲 - 548
ANIMALS
  1. "bird" - 1305
  2. "horse" - 853
  3. "dog" - 380
  4. "fish" - 210
  5. "fly" - 146
  6. "deer" - 144
  7. "monkey" - 142
  8. "elephant" - 122
  9. "crane" - 116
  10. "lion" - 114
MYTHOLOGICAL
  1. "creature" - 317
  2. "dragon" - 126
  3. "beast" - 12
  4. "demon" - 11
  5. "griffin" - 6
  6. "sphinx" - 3
  7. "centaur" - 3
  8. "monster" - 3
  9. "satyr" - 3
  10. "devil" - 3
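
Counts like those above can be produced with a simple term tally. The following is a minimal sketch, not the project's actual analysis script: the function name `term_counts`, the sample descriptions, and the choice to match whole lowercase words are all assumptions made for illustration.

```python
import re
from collections import Counter

def term_counts(descriptions, vocabulary):
    """Count how often each vocabulary term occurs across all descriptions."""
    vocab = {term.lower() for term in vocabulary}
    counts = Counter()
    for text in descriptions:
        # Lowercase the text and keep only runs of letters, so punctuation is ignored.
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in vocab:
                counts[word] += 1
    return counts

# Hypothetical sample data, standing in for the generated descriptions.
descriptions = [
    "A dark room with a red cup and a white cloth.",
    "Dark trees under a light sky; a red roof.",
]
colors = ["dark", "light", "red", "white", "brown"]
color_counts = term_counts(descriptions, colors)
```

The letter-only pattern would not match emojis; tallying those would need a different tokenizer.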

AI-Generated Emojis

Sometimes strangely accurate, revealing details I missed; at other times questionable and problematic; and often hilarious. Of dubious practical use, but fun.

Search Comparison

Below are comparisons of keyword search, text embedding search, and image embedding search for the query "woman looking into mirror".

Search for "woman looking into mirror"

Out of a result set of 20:

  • The conventional Elasticsearch keyword search over Met Museum metadata produces only 3 results that I consider highly relevant.
  • Text embedding search using Jina v3 embeddings on combined metadata and AI-generated descriptions returns 13 excellent results, including a number of images where the reflection or mirror is not even visible.
  • Image embedding search returns 8 highly-relevant results, including artworks where there's no actual mirror, but perhaps the concept of mirroring, for example "Portrait of a Woman with a Man at a Casement" by Fra Filippo Lippi and "Dancers, Pink and Green" by Edgar Degas.

Highly Relevant Results for "woman looking into mirror"

Results that I found exciting are highlighted in the image below. A number of these I probably would have missed if browsing through images.

Notable Search Results for "woman looking into mirror"

Vilaval Ragini: Folio from a ragamala series (Garland of Musical Modes)

Difficult to see: the woman on the left is looking into a mirror.

Vilaval Ragini: Folio from a ragamala series (Garland of Musical Modes)

Madame Marsollier and Her Daughter

I thought the AI-generated visual description and/or text embeddings had it wrong, but there is indeed a mirror in the painting and it's possible the main figure is looking into it.

Madame Marsollier and Her Daughter by Jean Marc Nattier

Portrait of a Woman with a Man at a Casement

Perhaps the woman is not looking into a mirror, but it does feel like a mirroring.

Portrait of a Woman with a Man at a Casement by Fra Filippo Lippi

Visualize Embeddings

Visualization Screenshot

The /visualize page shows the entire collection as dots on a 2D map, where similar artworks cluster together based on shared themes, styles, and subjects. For example, "Portraits of Men" and "Portraits of Women" appear near each other, as do "Horses" and "Men on Horses". Distinct traditions like "Indian Manuscripts" form separate regions.

  • Each dot represents one artwork in the collection
  • Distance between dots shows semantic similarity: closer dots are more similar
  • Search to highlight relevant results. Larger, brighter dots rank higher
  • Color dots by artist, period, tags, or department to reveal patterns

Text Embeddings Clusters

Image Embeddings Clusters

Horses Clusters

Visualization Journey Example

Search Types

1. Keyword Search

Traditional Elasticsearch text search using BM25 scoring across artwork metadata (title, artist, medium, etc.) and optional AI-generated visual descriptions.
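
To make the scoring concrete, here is a minimal sketch of the textbook BM25 formula over pre-tokenized documents. It is not the project's code and not Elasticsearch's implementation (which also handles analyzers, multiple fields, and field boosts); the function name `bm25_scores` and the toy documents are assumptions, and `k1=1.2`, `b=0.75` are the commonly used defaults.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates, and longer documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus of tokenized titles.
docs = [["woman", "mirror"], ["man", "horse"], ["woman", "window", "garden"]]
scores = bm25_scores(["woman", "mirror"], docs)
```

A document only scores if it shares at least one exact term with the query, which is why keyword search misses artworks whose metadata never mentions a mirror.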

2. Semantic Search

Vector similarity search using pre-computed embeddings:

  • Jina v3 Text: Advanced text search combining artwork metadata with AI-generated descriptions (768 dimensions)
  • SigLIP 2 Cross-Modal: True text-to-image search using Google's SigLIP 2 model (768 dimensions) - enables natural language queries like "red car in snow" or "mourning scene"
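
Vector similarity search can be illustrated with a brute-force cosine ranking. This is a sketch under stated assumptions: the names `cosine` and `nearest` and the 3-dimensional toy vectors are invented for illustration, the real embeddings have 768 dimensions, and the similarity metric used by the project is not stated in this README.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, index, k=3):
    """Return the ids of the k vectors most similar to query_vec."""
    ranked = sorted(index, key=lambda i: cosine(query_vec, index[i]), reverse=True)
    return ranked[:k]

# Hypothetical pre-computed embeddings keyed by artwork id.
index = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}
top_two = nearest([1.0, 0.0, 0.0], index, k=2)
```

For cross-modal search, the query text is embedded by the same model that embedded the images, so the comparison step is the same as above.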

3. Hybrid Search

Combines keyword and semantic search with user-adjustable balance control:

  • Text Mode: Keyword + Jina v3 text embeddings
  • Image Mode: Keyword + SigLIP 2 cross-modal embeddings
  • Both Mode: Keyword + both embedding types using RRF
  • Balance slider: 0% = pure keyword, 100% = pure semantic, 50% = equal weight
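
One way to picture the balance slider is as a linear blend of normalized scores. The README does not say how the project combines the two score sets, so the min-max normalization and the `blend` function below are assumptions, shown only to make the 0% / 50% / 100% behavior concrete.

```python
def normalize(scores):
    """Min-max scale a dict of id -> score into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}

def blend(keyword, semantic, balance):
    """Rank ids by a weighted mix; balance 0.0 = pure keyword, 1.0 = pure semantic."""
    kw, sem = normalize(keyword), normalize(semantic)
    combined = {
        i: (1 - balance) * kw.get(i, 0.0) + balance * sem.get(i, 0.0)
        for i in set(kw) | set(sem)
    }
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical raw scores from the two searches (different scales on purpose).
keyword = {"a": 10.0, "b": 5.0, "c": 1.0}
semantic = {"c": 0.9, "b": 0.5, "d": 0.1}
pure_keyword = blend(keyword, semantic, 0.0)
pure_semantic = blend(keyword, semantic, 1.0)
```

Normalizing first matters because BM25 scores and embedding similarities are on unrelated scales; without it, one side would dominate regardless of the slider.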

4. Image Search

By clicking on "Image Search" in the search bar, you can upload an image to find visually similar artworks using SigLIP 2 cross-modal embeddings. Such a feature could help museum-goers find more information about an artwork they see in person, or support projects like Google Arts & Culture's "Art Selfie".

  • Uploaded Image: Photo taken in the galleries
  • First Search Result: Aristotle with a Bust of Homer by Rembrandt (Rembrandt van Rijn)

"Aristotle with a Bust of Homer" by Rembrandt (Rembrandt van Rijn)

Similar Artworks Algorithms

The artwork detail pages display similar artworks using five different algorithms:

1. Metadata Similarity (Elasticsearch-based)

Finds artworks with similar structured metadata using art historical principles:

  • Artist (weight: 10) - Same artist indicates strong connection
  • Date/Period (weight: 7) - Temporal proximity using Gaussian decay (±25 years)
  • Medium (weight: 6) - Similar materials and techniques
  • Classification (weight: 5) - Same artwork type (painting, sculpture, etc.)
  • Department (weight: 4) - Museum curatorial groupings
  • Culture/Nationality (weight: 4) - Cultural and geographic connections
  • Period/Dynasty (weight: 3) - Art historical movements
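
The weighting scheme above can be sketched as a plain scoring function. This is an illustration rather than the project's Elasticsearch query: the field names, the exact-match comparison, and reading "±25 years" as a decay scale at which the date score halves are all assumptions. The decay curve follows the standard Gaussian form, where the score is 1.0 at zero distance and equals `decay` at a distance of `scale`.

```python
import math

# Weights for exact-match fields, taken from the list above.
WEIGHTS = {
    "artist": 10,
    "medium": 6,
    "classification": 5,
    "department": 4,
    "culture": 4,
    "period": 3,
}
DATE_WEIGHT = 7

def gauss_decay(distance, scale=25.0, decay=0.5):
    """Gaussian decay: 1.0 at distance 0, `decay` at distance == scale."""
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    return math.exp(-distance ** 2 / (2 * sigma_sq))

def metadata_score(a, b):
    """Sum the weights of matching fields plus a date-proximity term."""
    score = sum(
        weight
        for field, weight in WEIGHTS.items()
        if a.get(field) and a.get(field) == b.get(field)
    )
    if a.get("year") is not None and b.get("year") is not None:
        score += DATE_WEIGHT * gauss_decay(abs(a["year"] - b["year"]))
    return score

# Hypothetical records: same artist and medium, painted 25 years apart.
first = {"artist": "Rembrandt", "medium": "Oil on canvas", "year": 1653}
second = {"artist": "Rembrandt", "medium": "Oil on canvas", "year": 1628}
example_score = metadata_score(first, second)
```

In this sketch, two works by the same artist in the same medium score 16 from the exact matches, plus half of the date weight when they are 25 years apart.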

2. Jina v3 Text Similarity

Uses 768-dimensional text embeddings to find semantically similar artworks based on:

  • Artwork metadata (title, artist, date, medium)
  • AI-generated visual descriptions
  • Contextual understanding of art terminology

3. SigLIP 2 Visual Similarity

Uses 768-dimensional cross-modal embeddings to find visually similar artworks:

  • Analyzes visual features like composition, color, style
  • Works across different media and periods
  • Captures visual patterns independent of metadata

4. Combined Similarity

Fuses all three similarity types using weighted Reciprocal Rank Fusion (RRF):

  • 35% Jina v3 text embeddings - semantic understanding
  • 35% SigLIP 2 visual embeddings - visual appearance
  • 30% Elasticsearch metadata - art historical context
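
Weighted RRF scores each artwork by summing `weight / (k + rank)` over every ranked list it appears in. The sketch below uses the weights listed above; the function name, the toy rankings, and the constant `k=60` (a commonly used default) are assumptions, since the README does not state the value the project uses.

```python
def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked id lists (best first) into one ranking using weighted RRF."""
    scores = {}
    for name, ranked_ids in rankings.items():
        for rank, item in enumerate(ranked_ids, start=1):
            # Each list contributes more for higher-ranked items.
            scores[item] = scores.get(item, 0.0) + weights[name] / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the three similarity searches.
rankings = {
    "text": ["a", "b", "c"],
    "image": ["b", "a", "d"],
    "metadata": ["c", "b", "a"],
}
weights = {"text": 0.35, "image": 0.35, "metadata": 0.30}
fused = weighted_rrf(rankings, weights)
```

Because RRF uses only rank positions, it needs no score normalization across the three searches; an item that ranks reasonably well in all lists ("b" here) can beat one that tops a single list.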

Note that Elasticsearch has native RRF, but it's only available in the Enterprise plan.

5. AI Curated Similarity (Pre-computed LLM Reranking)

The AI curation process:

  1. Retrieves top 20 candidates from metadata and text embeddings searches, 5 candidates from image embeddings search
  2. Removes duplicates and presents candidates without scores to avoid bias
  3. Applies art historical expertise to select truly meaningful connections
  4. Enforces diversity rules (max 3 per artist, max 8 per similarity type)
  5. Returns up to 20 curated recommendations with confidence scores
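
The deduplication and diversity rules in steps 2 and 4 can be sketched as a post-filter over an already ranked candidate list. Whether the project enforces these limits in code or only through the LLM prompt is not stated here, so treat this as one possible approach; the record fields `id`, `artist`, and `type` are hypothetical.

```python
from collections import Counter

def dedupe(candidates):
    """Keep the first occurrence of each artwork id, preserving order."""
    seen, unique = set(), []
    for candidate in candidates:
        if candidate["id"] not in seen:
            seen.add(candidate["id"])
            unique.append(candidate)
    return unique

def apply_diversity(ranked, max_per_artist=3, max_per_type=8, limit=20):
    """Walk the ranking in order, skipping items that exceed a diversity cap."""
    per_artist, per_type, picked = Counter(), Counter(), []
    for candidate in ranked:
        if per_artist[candidate["artist"]] >= max_per_artist:
            continue
        if per_type[candidate["type"]] >= max_per_type:
            continue
        per_artist[candidate["artist"]] += 1
        per_type[candidate["type"]] += 1
        picked.append(candidate)
        if len(picked) == limit:
            break
    return picked

# Hypothetical candidates: 5 by one artist, then 10 by distinct artists.
ranked = [{"id": i, "artist": "A", "type": "text"} for i in range(5)]
ranked += [{"id": 100 + i, "artist": f"B{i}", "type": "image"} for i in range(10)]
picked = apply_diversity(dedupe(ranked + ranked[:2]))
```

With this input, the artist cap keeps 3 of the 5 works by "A" and the type cap keeps 8 of the 10 image-based candidates.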

Uses Gemini 2.5 Flash to intelligently select and rank similar artworks:

  • Cross-cultural connections: Discovers relationships across time periods and cultures (e.g., Gauguin's Tahitian Madonna with Renaissance Madonnas)
  • Thematic relationships: Identifies shared subjects and motifs beyond surface similarities
  • Visual intelligence: Considers composition, style, and emotional resonance
  • Diversity-aware: Limits over-representation of single artists or similarity types
  • Explainable: Each recommendation includes a brief explanation of the connection

Full prompt here.

AI-Curated Similarity Example

See Example here: Holy Family with Saint Anne, French Painter (17th century)

Diagram of AI-Curated Similarity (LLM Reranking)

For this example, relying only on metadata, the keyword search does a poor job of finding relevant similar artworks, pulling in various works by unknown "French Painter". Text & image embeddings results are better, especially with theme and style. The AI-curated results are perhaps best in my opinion, but I'm not an art historian and not familiar enough with the collection to make an educated judgment.

Technical Guide & Setup

See TECHNICAL_GUIDE.md for technical details & setup instructions including prerequisites, environment configuration, and deployment steps.

Previous Work

Related Projects

License

MIT licensed. Museum data used according to The Metropolitan Museum of Art's open access policy.
