Add functionality to display plenary videos #5612
base: master
Curious about your thoughts, @mjpost! :-) See e.g. ACL 2022, which has plenaries in the XML: https://preview.aclanthology.org/display_plenaries/events/acl-2022/
Looks super cool so far, thank you!
I gave the build script changes a first look and have some comments.
bin/create_hugo_data.py
Outdated
# Generate talk ID from video filename if available, otherwise use default pattern
if "video" in talk.attachments and talk.attachments["video"].name:
    # Extract talk ID from video filename like "2022.acl-keynote.2.mp4"
    video_name = talk.attachments["video"].name
    if video_name.endswith(".mp4"):
        # Remove .mp4 extension to get the talk ID
        talk_id = video_name[:-4]
    else:
        talk_id = video_name
else:
    # Fallback to sequential numbering if no video
    talk_id = f"{event.id}.talk-{idx}"
Hmm, I'm not sure I like creating these IDs within the build script. It means that it's not possible to find the talk via this ID through our Python library.
Another issue is that the talk ID will change if we change the filename for some reason, which is not how other IDs work.
One idea would be to make the ID explicit in the XML and expose it through the Python library.
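To make the stability concern concrete, here is a minimal sketch, not the PR's code: `talk_id_from_filename` is a hypothetical helper that mirrors the logic in the diff above, and the renamed file is invented for illustration.

```python
def talk_id_from_filename(video_name: str) -> str:
    """Hypothetical helper mirroring the diff above: strip a trailing '.mp4'."""
    return video_name.removesuffix(".mp4")  # requires Python 3.9+


# The derived ID is only as stable as the filename it comes from.
original = talk_id_from_filename("2022.acl-keynote.2.mp4")
renamed = talk_id_from_filename("2022.acl-plenary.2.mp4")  # hypothetical rename

print(original)             # 2022.acl-keynote.2
print(renamed)              # 2022.acl-plenary.2
print(original == renamed)  # False
```

Any link or lookup that relied on the first ID would break after such a rename, which an ID stored explicitly in the XML would avoid.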
Thanks for your comments @mbollmann! I'll try to make the required changes, will let you know if I could use more input.
We might want to give the ID question a bit more thought perhaps, since it would be the first time we have IDs that look like paper IDs but refer to non-paper things. E.g. should they be globally unique, could they clash with paper IDs? I know it doesn't matter for the URLs since they use
My current assumption is that they should be globally unique to prevent potential clashes with presentation videos, since the mp4 plenary files are currently stored in the same folder as the paper presentation mp4s. (Of course, that could be changed easily.)
Hi @davidstap, sorry I took so long to get to this. My first reaction: this looks fantastic (e.g., 2021.acl-keynote.1). I'm really excited about this; it will be really great to have this completely new feature, and it looks very good. I do think we need to find a way to encode the IDs analogous to the way we encode paper IDs. To be explicit, in the XML, the full Anthology ID can be reconstructed based on the hierarchical assembly of the collection ID, volume ID, and paper ID. It seems like mirroring this in the I don't have a solution here but am just writing "out loud" in hopes that we can come up both with a workable ID system and maybe a taxonomy of video types.
I think it should be clear from an ID where to find the item referred to by it. Currently, this is the case: an ID like One solution would be to reserve a special "volume name" for these, e.g. make them all go under However, based on the comments so far — that maybe talks should be presented on the website in a similar vein to papers, that they should be citeable etc. — I am thinking that we should maybe really just move them into an ordinary volume.
For completeness, I'm thinking of something along the lines of:
<volume id="keynote" type="media">
  <talk id="1">
    <title>Keynote 1: Harnessing the Power of <fixed-case>LLM</fixed-case>s to Vitalize Indigenous Languages</title>
    <speaker><first>Claudio</first><last>Pinhanez</last></speaker>
    <url>2024.naacl-keynote.1.mp4</url>
  </talk>
</volume>
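As a rough sketch of how a full ID could be reassembled hierarchically from such XML, the snippet below parses the proposed structure with the standard library. The enclosing `<collection id="2024.naacl">` wrapper, the `collection-volume.talk` ID layout, and the function name `full_talk_ids` are assumptions for illustration; this is not the Anthology library's API.

```python
import xml.etree.ElementTree as ET

# The proposed XML, wrapped in a <collection> element. The wrapper and
# its id value are assumptions made for this sketch.
XML = """
<collection id="2024.naacl">
  <volume id="keynote" type="media">
    <talk id="1">
      <title>Keynote 1: Harnessing the Power of <fixed-case>LLM</fixed-case>s to Vitalize Indigenous Languages</title>
      <speaker><first>Claudio</first><last>Pinhanez</last></speaker>
      <url>2024.naacl-keynote.1.mp4</url>
    </talk>
  </volume>
</collection>
"""


def full_talk_ids(xml_text: str) -> dict[str, str]:
    """Map each talk's assembled ID (collection-volume.talk) to its plain-text title."""
    collection = ET.fromstring(xml_text.strip())
    result = {}
    for volume in collection.iter("volume"):
        for talk in volume.iter("talk"):
            full_id = f"{collection.get('id')}-{volume.get('id')}.{talk.get('id')}"
            # itertext() flattens nested markup such as <fixed-case>
            result[full_id] = "".join(talk.find("title").itertext())
    return result


ids = full_talk_ids(XML)
print(ids)
```

With this layout the assembled ID matches the stem of the video filename in `<url>`, so the ID no longer has to be derived from the filename.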
I like this approach. It would also let us intermingle talks in a proceedings volume, and we'd have flexibility of how to group them (all talks in one volume, or separate out meetings and plenaries, etc).
Thanks @mjpost and @mbollmann for the detailed comments. I'll be traveling from tomorrow and will probably not have time in the next ~1.5 weeks to work on this, but will find time after that. The main thing to decide seems to be how to handle the IDs. I also like @mbollmann's latest suggestion, and will try to implement that. I agree with @mjpost it'd be great to list the videos like papers are listed (with title, authors, and handy buttons). I couldn't find an obvious (non-hacky) way to make
The Python side of this solution might require a bit more work and technical refactoring than I initially anticipated, now that I think of it.
@mbollmann I remembered today that I was in talks some time ago to ingest the "NLP Highlights" podcast. That fell through for lack of time, but it might become relevant again. This is all the more reason to generalize
@mjpost I mentioned the issue with the NLP Highlights podcast in my comment above from two weeks ago ;) #5612 (comment) – but good to know it might still be relevant!
This is long overdue, but this (draft) PR implements functionality to display plenary videos (keynotes, panels, business meetings, etc.) on ACL Anthology event pages and creates dedicated landing pages for each talk.
Overview
Previously, plenary videos were stored in XML files but not displayed on the website (#4309). This PR adds:
Implementation Details
1. Data Export Updates (bin/create_hugo_data.py)
- export_events(): Added talk data to event JSON exports, including talk metadata, speakers, and video URLs
- export_talks(): New function that creates individual talk data files in build/data/talks/ directory
- {event-id}.talk-{number} (e.g., acl-2023.talk-1)
2. Hugo Templates
- hugo/layouts/talks/single.html: Individual talk landing pages with:
- hugo/layouts/events/single.html: Added "Talks & Presentations" section between existing "Links" and "Volumes" sections, showing:
3. Hugo Content Generation
- hugo/content/talks/_content.gotmpl: Template for automatic talk page generation
- hugo/content/talks/_index.md: Index page for talks section
- /talks/{talk-id}/
4. Data Model Integration
- Talk and Event classes
- NameSpecification and EventFileReference infrastructure
TO DO's (improvements):
- Talk ID used in the URL currently mismatches with the file name, e.g. acl-2022.talk-2 => https://aclanthology.org/2022.acl-keynote.2.mp4; this should be fixed.
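For readers unfamiliar with the export step described under Implementation Details, the following is a loose sketch of writing one data file per talk using the `{event-id}.talk-{number}` pattern. It is not the PR's `export_talks()`: the function name `export_talks_sketch`, the JSON file format, the field names, the numbering from 1, and the sample data are all assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def export_talks_sketch(event_id: str, talks: list[dict], out_dir: Path) -> list[Path]:
    """Write one JSON file per talk, named {event-id}.talk-{number}.json (assumed layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, talk in enumerate(talks, start=1):
        talk_id = f"{event_id}.talk-{number}"
        path = out_dir / f"{talk_id}.json"
        path.write_text(json.dumps({"id": talk_id, **talk}, indent=2), encoding="utf-8")
        written.append(path)
    return written


# Illustrative data only; the field names are assumptions.
talks = [{"title": "Example keynote", "speakers": ["Jane Doe"], "video": "example.mp4"}]
with tempfile.TemporaryDirectory() as tmp:
    paths = export_talks_sketch("acl-2023", talks, Path(tmp) / "build" / "data" / "talks")
    names = [p.name for p in paths]
    first = json.loads(paths[0].read_text(encoding="utf-8"))

print(names)
print(first["id"])
```

Because the sequential number depends only on list order, an ID produced this way can diverge from the video filename, which is the mismatch the TO DO item describes.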