Add functionality to display plenary videos #5612

davidstap · 2025-08-01T13:32:44Z

This is long overdue, but this (draft) PR implements functionality to display plenary videos (keynotes, panels, business meetings, etc.) on ACL Anthology event pages and creates dedicated landing pages for each talk.

Overview

Previously, plenary videos were stored in XML files but not displayed on the website (see #4309). This PR adds:

- Display of talks on event pages
- Individual landing pages for each talk with video players
- BibTeX citation support for talks
- Integration with existing ACL Anthology infrastructure

Implementation Details

1. Data Export Updates (`bin/create_hugo_data.py`)

- Modified export_events(): added talk data to event JSON exports, including talk metadata, speakers, and video URLs
- Added export_talks(): new function that creates individual talk data files in the build/data/talks/ directory
- Talk ID format: {event-id}.talk-{number} (e.g., acl-2023.talk-1)
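As a rough illustration of this ID format, here is a minimal sketch that numbers an event's talks in order. It assumes numbering starts at 1 (consistent with the acl-2023.talk-1 example) and that talks arrive as a simple list of titles; `make_talk_ids` is a hypothetical helper, not the PR's actual code.

```python
def make_talk_ids(event_id: str, talk_titles: list[str]) -> dict[str, str]:
    """Map IDs of the form {event-id}.talk-{number} to talk titles.

    Hypothetical helper: numbering is assumed to start at 1.
    """
    return {
        f"{event_id}.talk-{number}": title
        for number, title in enumerate(talk_titles, start=1)
    }


ids = make_talk_ids("acl-2023", ["Keynote 1", "Business Meeting"])
print(list(ids))  # ['acl-2023.talk-1', 'acl-2023.talk-2']
```

Because the number only reflects position in the list, reordering talks would also change their IDs, which is one reason the discussion below moves toward explicit IDs.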

2. Hugo Templates

- Created hugo/layouts/talks/single.html: individual talk landing pages with:
  - Embedded video player for available videos
  - Speaker information and talk metadata
  - BibTeX citation format
  - Links back to parent event
- Updated hugo/layouts/events/single.html: added a "Talks & Presentations" section between the existing "Links" and "Volumes" sections, showing:
  - Talk titles with links to individual pages
  - Video availability indicators

3. Hugo Content Generation

- Created hugo/content/talks/_content.gotmpl: template for automatic talk page generation
- Created hugo/content/talks/_index.md: index page for the talks section
- URL structure: individual talks are accessible at /talks/{talk-id}/
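A minimal sketch of how a talk ID maps to its landing-page path, plus a single JSON payload keyed by talk ID of the kind a combined data file could hold. The field names ("title", "page") and both helper names are assumptions for illustration; the actual export in bin/create_hugo_data.py may be structured differently.

```python
import json


def talk_page_path(talk_id: str) -> str:
    """Return the site-relative URL of a talk's landing page (/talks/{talk-id}/)."""
    return f"/talks/{talk_id}/"


def export_talks_json(talks: dict[str, dict]) -> str:
    """Serialize talks keyed by ID into one JSON document.

    Hypothetical field names; each input value must provide a "title".
    """
    payload = {
        talk_id: {"title": info["title"], "page": talk_page_path(talk_id)}
        for talk_id, info in talks.items()
    }
    return json.dumps(payload, indent=2, sort_keys=True)


print(talk_page_path("acl-2023.talk-1"))  # /talks/acl-2023.talk-1/
print(export_talks_json({"acl-2023.talk-1": {"title": "Keynote 1"}}))
```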

4. Data Model Integration

- Used the existing Talk and Event classes
- Used the existing NameSpecification and EventFileReference infrastructure
- Maintained consistency with paper video handling patterns

TODOs (improvements):

- On plenary pages, speaker names should be linked to author pages, as is done on paper pages.
- The talk ID used in the URL currently does not match the file name, e.g. acl-2022.talk-2 => https://aclanthology.org/2022.acl-keynote.2.mp4; this should be fixed.
- Close the outdated PR [WIP] Display plenary talks on event page #3603, which aimed to do the same as this PR.

github-actions · 2025-08-01T13:53:32Z

Build successful. Some useful links:

Complete site preview: https://preview.aclanthology.org/display_plenaries
Potential volumes of interest: 2021.acl-long, 2021.acl-short, 2021.acl-srw, 2021.acl-demo, 2021.acl-tutorials, 2021.emnlp-main, 2021.emnlp-demo, 2021.emnlp-tutorials, 2021.naacl-main, 2021.naacl-demos, 2021.naacl-srw, 2021.naacl-tutorials, 2021.naacl-industry, 2022.acl-long, 2022.acl-short, 2022.acl-srw, 2022.acl-demo, 2022.acl-tutorials, 2022.emnlp-main, 2022.emnlp-tutorials, 2022.emnlp-demos, 2022.emnlp-industry, 2022.naacl-main, 2022.naacl-srw, 2022.naacl-demo, 2022.naacl-tutorials, 2022.naacl-industry, 2023.acl-long, 2023.acl-short, 2023.acl-demo, 2023.acl-srw, 2023.acl-industry, 2023.acl-tutorials, 2023.eacl-main, 2023.eacl-demo, 2023.eacl-srw, 2023.eacl-tutorials, 2023.emnlp-main, 2023.emnlp-tutorial, 2023.emnlp-demo, 2023.emnlp-industry, 2024.eacl-long, 2024.eacl-short, 2024.eacl-demo, 2024.eacl-srw, 2024.eacl-tutorials, 2024.naacl-long, 2024.naacl-short, 2024.naacl-demo, 2024.naacl-srw, (plus 2 more...)

This preview will be removed when the branch is merged.

davidstap · 2025-08-01T13:55:27Z

Curious about your thoughts, @mjpost! :-)

See e.g. ACL 2022, which has plenaries in the XML: https://preview.aclanthology.org/display_plenaries/events/acl-2022/

mbollmann

Looks super cool so far, thank you!

I gave the build script changes a first look and have some comments.

bin/create_hugo_data.py

mbollmann · 2025-08-01T14:14:35Z

+                # Generate talk ID from video filename if available, otherwise use default pattern
+                if "video" in talk.attachments and talk.attachments["video"].name:
+                    # Extract talk ID from video filename like "2022.acl-keynote.2.mp4"
+                    video_name = talk.attachments["video"].name
+                    if video_name.endswith(".mp4"):
+                        # Remove .mp4 extension to get the talk ID
+                        talk_id = video_name[:-4]
+                    else:
+                        talk_id = video_name
+                else:
+                    # Fallback to sequential numbering if no video
+                    talk_id = f"{event.id}.talk-{idx}"


Hmm, I'm not sure I like creating these IDs within the build script. It means that it's not possible to find the talk via this ID through our Python library.

Another issue is that the talk ID will change if we change the filename for some reason, which is not how other IDs work.
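The instability is easy to see in isolation. This sketch mirrors the filename-based logic from the diff above using `str.removesuffix` (Python 3.9+); `talk_id_from_video` is a hypothetical name, and the renamed file is an invented example.

```python
def talk_id_from_video(video_name: str) -> str:
    """Derive a talk ID from a video filename by dropping a trailing ".mp4".

    Mirrors the reviewed diff's approach; hypothetical helper name.
    """
    return video_name.removesuffix(".mp4")


before = talk_id_from_video("2022.acl-keynote.2.mp4")
after = talk_id_from_video("2022.acl-plenary.2.mp4")  # same talk, file renamed
print(before, after)    # 2022.acl-keynote.2 2022.acl-plenary.2
print(before == after)  # False: the ID followed the filename
```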

One idea would be to make the ID explicit in the XML and expose it through the Python library.

davidstap · 2025-08-01T14:45:26Z

Thanks for your comments @mbollmann! I'll try to make the required changes, will let you know if I could use more input.

mbollmann · 2025-08-01T14:50:39Z

We might want to give the ID question a bit more thought, since this would be the first time we have IDs that look like paper IDs but refer to non-paper things. For example, should they be globally unique, and could they clash with paper IDs? I know it doesn't matter for the URLs, since they use /talks/, but if these function as IDs, I do think they should be handled by the Python library, and then the question becomes how to handle them. I'll think more about it :)

davidstap · 2025-08-01T15:00:11Z

My current assumption is that they should be globally unique to prevent potential clashes with presentation videos, since the mp4 plenary files are currently stored in the same folder as the paper presentation mp4s. (Of course, that could be changed easily.)

mjpost · 2025-08-06T12:03:33Z

Hi @davidstap, sorry I took so long to get to this.

My first reaction: this looks fantastic (e.g., 2021.acl-keynote.1). I'm really excited about this; it will be really great to have this completely new feature, and it looks very good.

I do think we need to find a way to encode the IDs analogously to the way we encode paper IDs. To be explicit: in the XML, the full Anthology ID can be reconstructed from the hierarchical assembly of the collection ID, volume ID, and paper ID. It seems like mirroring this in the <talk> structure would be helpful. Maybe the simplest way to do this would be to assign an id attribute to every talk and place the talks within a <volume> block (inside <event>, which would stay nameless). Or we could just treat the <event> block as analogous to a volume? We could infer a volume from each talk, e.g., <talk> implicitly denotes a <talks> volume that is associated with the event. This would lose us the keynote portion of the Anthology ID, but that might be fine, since we'll have other types of recordings, including panels and so on.
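To make the hierarchical assembly concrete, here is a minimal sketch that joins collection, volume, and item IDs, assuming the new-style `{collection}-{volume}.{item}` pattern seen in IDs like 2022.acl-long.1. The function name is hypothetical; the point is that a talk with an explicit id inside a volume would get its full ID the same way a paper does.

```python
def full_anthology_id(collection_id: str, volume_id: str, item_id: str) -> str:
    """Join collection, volume, and item IDs, e.g. ("2022.acl", "long", "1").

    Assumes the new-style pattern {collection}-{volume}.{item}.
    """
    return f"{collection_id}-{volume_id}.{item_id}"


print(full_anthology_id("2022.acl", "long", "1"))     # 2022.acl-long.1
print(full_anthology_id("2021.acl", "keynote", "1"))  # 2021.acl-keynote.1
```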

I don't have a solution here but am just writing "out loud" in hopes that we can come up both with a workable ID system and maybe a taxonomy of video types.

mjpost · 2025-08-06T12:20:59Z

A few more thoughts:

I think having these posted publicly, citable, and easily accessible is going to have a large impact. ACL spends a lot of money to have these recorded, and up to now a key part of the conference experience has simply disappeared. This is really going to be great.
Currently, we list the videos bullet-point style in the top-level event block.

I wonder if we should group these into a volume-style category and display them visually the same way that papers are displayed, that is, with the title, authors, and a few handy buttons for getting the bib and the video.
We would treat them as a separate volume. I'm not sure what the correct name is. "Plenaries" is wrong, since they're not all plenaries. "Talks" is not right either, since a panel isn't really a talk. Maybe we just use "Recordings" as the volume name?
At the same time, I think allowing variation in the Anthology ID is useful. So even if we group them under the event page in a "recordings" block, allowing IDs like 2021.acl-plenary.1 and 2021.acl-business.1 is helpful, e.g. if you download the files. These IDs are semantically distinguishable in a way that paper IDs aren't, and we should use that.
We might accomplish this by replacing <talk> with <recording> and allowing a richer id attribute that includes the volume, e.g., <recording id="business.1">. Would this work? In all other ways, this would be analogous to a <paper> block (i.e., having a title, authors, etc.).
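One way such a volume-qualified id attribute could be expanded, as a sketch: split `business.1` at its last `.` into a volume part and a number, then join it with the collection ID. The helper name and the splitting rule are assumptions, not an agreed format.

```python
def recording_full_id(collection_id: str, recording_id: str) -> str:
    """Expand a volume-qualified recording id like "business.1".

    Hypothetical rule: the text before the last "." names the volume.
    """
    volume_id, sep, number = recording_id.rpartition(".")
    if not sep or not volume_id or not number:
        raise ValueError(f"expected '<volume>.<number>', got {recording_id!r}")
    return f"{collection_id}-{volume_id}.{number}"


print(recording_full_id("2021.acl", "business.1"))  # 2021.acl-business.1
```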

mbollmann · 2025-08-06T14:12:56Z

I think it should be clear from an ID where to find the item referred to by it. Currently, this is the case: an ID like 2025.acl-long.1 parses into ['2025.acl', 'long', '1'] which refers to a <collection id="2025.acl"> that contains a <volume id="long"> and <paper id="1">. With most of the ID suggestions so far, this assumption would be violated — it wouldn't be clear from the ID whether to look for a <volume> or a <talk>, and we'd have to start looking in multiple places for a potential match.
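The parsing rule described here can be sketched in a few lines. This is a standalone illustration, not the Anthology library's own parser; it assumes the collection ID contains no "-" and the item ID contains no ".", and it returns a tuple rather than a list.

```python
def parse_anthology_id(anthology_id: str) -> tuple[str, str, str]:
    """Split a new-style ID such as "2025.acl-long.1" into its three parts.

    Assumes the collection ID has no "-" and the item ID has no ".".
    """
    volume_part, sep, item_id = anthology_id.rpartition(".")
    collection_id, dash, volume_id = volume_part.partition("-")
    if not sep or not dash:
        raise ValueError(f"not a new-style Anthology ID: {anthology_id!r}")
    return collection_id, volume_id, item_id


print(parse_anthology_id("2025.acl-long.1"))  # ('2025.acl', 'long', '1')
```

Each part maps to exactly one XML level (<collection>, <volume>, <paper>), which is the property an ID scheme for talks would ideally preserve.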

One solution would be to reserve a special "volume name" for these, e.g. make them all go under talks, so when we see an ID like 2025.acl-talks.1 we know it's a <talk> under the <event> block.
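Under the reserved-name option, ID resolution would need only one extra check. A minimal sketch, assuming the volume part "talks" is reserved and that every other volume part points to an ordinary <volume>; the function name is hypothetical.

```python
def element_for_id(anthology_id: str) -> str:
    """Report which XML element an ID resolves to under a reserved "talks" volume.

    Hypothetical rule: volume part "talks" -> a <talk> under the <event> block;
    anything else -> a <paper> inside a <volume>.
    """
    volume_part, _, _ = anthology_id.rpartition(".")
    _, _, volume_id = volume_part.partition("-")
    return "talk" if volume_id == "talks" else "paper"


print(element_for_id("2025.acl-talks.1"))  # talk
print(element_for_id("2025.acl-long.1"))   # paper
```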

However, based on the comments so far — that maybe talks should be presented on the website in a similar vein to papers, that they should be citeable etc. — I am thinking that we should maybe really just move them into an ordinary volume.

Volumes already have a type attribute that can currently be either "journal" or "proceedings". We could add another option, type="talks", or maybe more generically type="media".
- The <volume type="..."> attribute already controls the generation of bibliographic entries (journals and proceedings trigger different bibentry types etc.), so it would be a natural way to define the correct type of bibliography entry for recordings or other media.
- No new rules for ID resolution are needed.
- Presentation on the website can be reused from how ordinary volumes work, and only needs to be overridden where desired.
- The same approach could be taken to represent other kinds of media, like the NLP Highlights podcast (#497, "Ingesting the NLP Highlights podcast") or whatever else we want to archive and make citable in the future, without having to invent new mechanisms.
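A sketch of how the type attribute could drive the bibliography entry type. The journal and proceedings mappings follow common BibTeX practice (@article, @inproceedings); mapping "media" to @misc is an assumption, and neither the dictionary nor the helper is the Anthology's actual code.

```python
# Hypothetical mapping from <volume type="..."> to a BibTeX entry type.
# "media" -> "misc" is an assumption for recordings and podcasts.
BIBTYPE_BY_VOLUME_TYPE = {
    "journal": "article",
    "proceedings": "inproceedings",
    "media": "misc",
}


def bibtex_entry(volume_type: str, key: str, title: str) -> str:
    """Render a minimal BibTeX entry whose type depends on the volume type."""
    try:
        entry_type = BIBTYPE_BY_VOLUME_TYPE[volume_type]
    except KeyError:
        raise ValueError(f"unknown volume type: {volume_type!r}") from None
    return f"@{entry_type}{{{key},\n    title = \"{title}\"\n}}"


print(bibtex_entry("media", "pinhanez-2024-keynote", "Keynote 1"))
```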

For completeness, I'm thinking of something along the lines of:

<volume id="keynote" type="media">
    <talk id="1">
      <title>Keynote 1: Harnessing the Power of <fixed-case>LLM</fixed-case>s to Vitalize Indigenous Languages</title>
      <speaker><first>Claudio</first><last>Pinhanez</last></speaker>
      <url>2024.naacl-keynote.1.mp4</url>
    </talk>
</volume>
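To check that this layout needs no new ID-resolution rules, here is a sketch that reads the XML above with the standard library's `xml.etree.ElementTree` and rebuilds each talk's full ID from the collection ID (assumed here to be 2024.naacl), the volume id, and the talk id. `list_talks` is a hypothetical helper, not part of the Anthology's Python library.

```python
import xml.etree.ElementTree as ET

VOLUME_XML = """\
<volume id="keynote" type="media">
    <talk id="1">
      <title>Keynote 1: Harnessing the Power of <fixed-case>LLM</fixed-case>s to Vitalize Indigenous Languages</title>
      <speaker><first>Claudio</first><last>Pinhanez</last></speaker>
      <url>2024.naacl-keynote.1.mp4</url>
    </talk>
</volume>
"""


def list_talks(collection_id: str, volume_xml: str) -> list[dict[str, str]]:
    """Return full ID, plain-text title, speakers, and video file for each <talk>."""
    volume = ET.fromstring(volume_xml)
    talks = []
    for talk in volume.iter("talk"):
        # Same hierarchical assembly as for papers: collection-volume.item
        full_id = f"{collection_id}-{volume.get('id')}.{talk.get('id')}"
        # itertext() flattens inline markup such as <fixed-case> into plain text
        title = "".join(talk.find("title").itertext())
        speakers = [
            f"{s.findtext('first')} {s.findtext('last')}"
            for s in talk.findall("speaker")
        ]
        talks.append({
            "id": full_id,
            "title": title,
            "speakers": ", ".join(speakers),
            "video": talk.findtext("url"),
        })
    return talks


for t in list_talks("2024.naacl", VOLUME_XML):
    print(t["id"], "|", t["title"], "|", t["speakers"])
```

In this example the derived ID (2024.naacl-keynote.1) happens to match the video filename stem, but it comes from the XML structure, so renaming the file would not change it.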

mjpost · 2025-08-06T17:32:17Z

I like this approach. It would also let us intermingle talks in a proceedings volume, and we'd have flexibility of how to group them (all talks in one volume, or separate out meetings and plenaries, etc).

davidstap · 2025-08-07T13:25:26Z

Thanks @mjpost and @mbollmann for the detailed comments. I'll be traveling from tomorrow and will probably not have time in the next ~1.5 weeks to work on this, but will find time after that.

The main thing to decide seems to be how to handle the IDs. I also like @mbollmann's latest suggestion, and will try to implement that.

I agree with @mjpost it'd be great to list the videos like papers are listed (with title, authors, and handy buttons). I couldn't find an obvious (non-hacky) way to make <speaker> link to author pages, but I'm sure that can be sorted out.

mbollmann · 2025-08-07T13:55:41Z

The Python side of this solution might require a bit more work and technical refactoring than I initially anticipated, now that I think of it. Volume objects are currently dictionaries mapping IDs to Paper objects; if we want to make this Paper | Talk, that has many implications e.g. for typing. We might need to create an abstract VolumeItem class or something that defines the interface and make both Paper and Talk inherit from it. If we go this route, I wonder if we should lay the groundwork for this in a separate PR and use this one only for the front-end parts.
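A minimal sketch of the abstract base class idea, using `abc` and `dataclasses` from the standard library. `VolumeItem`, `Paper`, and `Talk` follow the names in the comment above, but their fields, the `bibtype()` method, and the `Volume` container are illustrative assumptions, not the acl-anthology library's actual classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VolumeItem(ABC):
    """Hypothetical common interface for anything a volume can contain."""

    id: str
    title: str

    @abstractmethod
    def bibtype(self) -> str:
        """BibTeX entry type used when citing this item."""


@dataclass
class Paper(VolumeItem):
    authors: list[str] = field(default_factory=list)

    def bibtype(self) -> str:
        return "inproceedings"


@dataclass
class Talk(VolumeItem):
    speakers: list[str] = field(default_factory=list)
    video: Optional[str] = None

    def bibtype(self) -> str:
        return "misc"  # assumption: recordings cited as @misc


@dataclass
class Volume:
    """Maps item IDs to any VolumeItem subclass instead of only Paper."""

    id: str
    items: dict[str, VolumeItem] = field(default_factory=dict)

    def add(self, item: VolumeItem) -> None:
        self.items[item.id] = item


keynotes = Volume(id="keynote")
keynotes.add(Talk(id="1", title="Keynote 1", speakers=["Claudio Pinhanez"],
                  video="2024.naacl-keynote.1.mp4"))
for item_id, item in keynotes.items.items():
    print(item_id, type(item).__name__, item.bibtype())
```

Typing the volume's mapping as `dict[str, VolumeItem]` lets code that only needs shared fields stay generic, while code that needs talk-specific fields can narrow with `isinstance`.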

mjpost · 2025-08-21T20:22:33Z

@mbollmann I remembered today that I was in talks some time ago to ingest the "NLP Highlights" podcast. That fell through for lack of time, but it might become relevant again. This is all the more reason to generalize to a VolumeItem, so we could also include an audio or podcast type.

mbollmann · 2025-08-21T21:02:13Z

@mjpost I mentioned the issue with the NLP Highlights podcast in my comment above from two weeks ago ;) #5612 (comment) – but good to know it might still be relevant!

added functionality to display plenary talks (6d09d9b)

davidstap requested a review from mjpost August 1, 2025 13:32

davidstap self-assigned this Aug 1, 2025

davidstap added the enhancement label Aug 1, 2025

davidstap added this to the 2025Q3 milestone Aug 1, 2025

davidstap mentioned this pull request Aug 1, 2025 in [WIP] Display plenary talks on event page #3603 (closed)

fix reformatting issues (c28b268)

fix Talk ID in URL (e0ff6e1)

mbollmann reviewed Aug 1, 2025

save indentation level (c48a076)

simplify using person_to_dict() (b724e5d)

davidstap marked this pull request as draft August 1, 2025 15:03

David Stap added 2 commits August 1, 2025 17:06

standardize talk attachment structure to match paper attachments (fa3debb)

replace individual talk files with combined talks.json + fix speaker processing bug (f41df5c)

mjpost requested a review from nschneid August 6, 2025 12:21
