Use VTT transcript to display captions in media player #1469

Merged: 3 commits merged into main from the vtt branch on May 16, 2023
Conversation

edsu
Contributor

@edsu edsu commented Feb 8, 2023

If an object's video or audio file has an associated VTT file, add a <track> element, which should allow the player to display the transcript during playback of the video or audio.

You can see some SDR objects already have VTT files added. Their PURL data looks something like:

<resource id="cocina-fileSet-bd111gd4290-bd111gd4290_1" sequence="1" type="video">
  <label>Video file</label>
  <file id="bd111gd4290_sl.mp4" mimetype="video/mp4" size="1077075674" publish="yes" shelve="yes" preserve="yes">
  </file>
  <file id="bd111gd4290_thumb.jp2" mimetype="image/jp2" size="296021" publish="yes" shelve="yes" preserve="yes">
    <imageData height="480" width="640"/>
  </file>
  <file id="bd111gd4290_cap.vtt" mimetype="text/plain" size="145678" publish="yes" shelve="yes" preserve="yes">
  </file>
</resource>

Ideally the VTT files would have a text/vtt mimetype? It might also be possible for a media file to have transcriptions in multiple languages, which this PR does not currently handle. Handling multiple languages should be doable if there were a convention or mechanism for determining the language.
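As a rough illustration of the idea (not the code in this PR), here is a minimal Ruby sketch that picks out a VTT file by media type and renders a `<track>` tag for it. The `MediaFile` struct, the `caption_track_tag` helper, the `example.org` base URL, and the `en`/`English` defaults are all assumptions made for the example. It also assumes the file is described as `text/vtt` rather than `text/plain`.

```ruby
require 'erb'

# Hypothetical stand-in for a <file> entry in the PURL XML (not a sul-embed class).
MediaFile = Struct.new(:id, :mimetype, keyword_init: true) do
  # Assumes the VTT file carries the text/vtt media type.
  def vtt?
    mimetype == 'text/vtt'
  end
end

# Builds a <track> tag for the first VTT file in a resource, or nil if there is none.
# base_url, srclang and label are assumed values; the PURL data carries no language.
def caption_track_tag(files, base_url:, srclang: 'en', label: 'English')
  vtt = files.find(&:vtt?)
  return nil unless vtt

  attrs = {
    kind: 'captions',
    src: "#{base_url}/#{vtt.id}",
    srclang: srclang,
    label: label
  }
  rendered = attrs.map { |name, value| %(#{name}="#{ERB::Util.html_escape(value)}") }.join(' ')
  "<track #{rendered}>"
end

files = [
  MediaFile.new(id: 'bd111gd4290_sl.mp4', mimetype: 'video/mp4'),
  MediaFile.new(id: 'bd111gd4290_cap.vtt', mimetype: 'text/vtt')
]
puts caption_track_tag(files, base_url: 'https://example.org/file/bd111gd4290')
```

Because `srclang` and `label` are hard-coded defaults here, supporting several languages would need some extra source for those two values, which is the missing convention mentioned above.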

Also, I think testing/research with <audio> should be done. I looked around briefly, and it seems that captions may not be displayed unless you declare the media as video instead? That might be stale information though.

This was a quick exploration in response to questions from @pleonard212 and @dinahhandel.

@cbeer
Member

cbeer commented Feb 8, 2023

See also #1143.

@edsu
Contributor Author

edsu commented Feb 8, 2023

@andrewjbtw created a test object in Stage for testing. It didn't seem easy to get video streaming to work in my dev environment, maybe because of CORS?

@andrewjbtw

Two notes about the test object:

  1. I didn't assign the vtt file any role. In OCR, all the ALTO files are assigned "role=transcription" in order to be picked up for search. In preliminary discussion of caption files, we discussed following that precedent in some way but did not settle on a term. I could set it to "transcription." Cocina validates roles so we would need to add a role to the vocabulary if we wanted something like "caption".

  2. If you look at the filenames, the video file and the caption file differ by both name and extension. Some players like VLC will detect captions if the names are the same and only the extensions differ. I can make a test like that but ultimately it would be more robust if the video file and caption file could differ in name. Plus that would be essential if we offered subtitles in multiple languages.
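To make the second note concrete, here is a small Ruby sketch of the same-name, different-extension matching described for players like VLC. The `captions_by_basename` helper and the `bd111gd4290_sl.vtt` filename are hypothetical and used only for illustration.

```ruby
# Returns the caption filenames whose basename equals the media file's basename,
# i.e. the names match and only the extension differs.
def captions_by_basename(media_name, filenames)
  stem = File.basename(media_name, '.*')
  filenames.select do |name|
    File.extname(name).casecmp?('.vtt') && File.basename(name, '.*') == stem
  end
end

# 'bd111gd4290_sl.vtt' is a made-up name added to show a positive match.
names = ['bd111gd4290_sl.mp4', 'bd111gd4290_cap.vtt', 'bd111gd4290_sl.vtt']
p captions_by_basename('bd111gd4290_sl.mp4', names)
```

With this rule `bd111gd4290_cap.vtt` is not picked up, and a language-suffixed name such as `bd111gd4290_sl.de.vtt` would not match either, since its basename is `bd111gd4290_sl.de`. That is one reason an explicit association between media and caption files is more robust than name matching.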

@edsu
Contributor Author

edsu commented Apr 1, 2023

I pushed this to stage where there is a test video object, and it seems to work!

https://sul-purl-stage.stanford.edu/dw202dq8174

@edsu edsu marked this pull request as ready for review April 10, 2023 17:39
end

def vtt?
  mimetype == 'text/plain' && title.end_with?('.vtt')
Contributor

Contributor Author

I think that's right @jcoyne. I think it might be coded in the example XML that way? Perhaps it should allow for both?

Contributor

@jcoyne jcoyne May 16, 2023

I think the example XML is just bad data. We should not build in support for bad data.

Contributor Author

@edsu edsu May 16, 2023

It looks like there are 45 text/vtt files at the moment. I guess I'd need to run a DSA report to see if there are many/any .vtt files that need the correct media type?

But yes, agreed, we should set things up to encourage the correct description of VTT files. Thanks for spotting this.

Contributor Author

Now I'm wondering if it should ignore the extension altogether and only key off the media type?

Contributor Author

@edsu edsu May 16, 2023

That's an important point @andrewjbtw -- it would suck to have to go in and manually fix anything with a transcript because our system codes things incorrectly. Does techmd do the mediatype detection for us at the moment? Do we need a ticket in there, and perhaps a data remediation ticket as well?

Contributor

Improvements to mime-type detection should go in https://github.com/sul-dlss/assembly-objectfile/blob/229520e3c590dc3e8c83157c5d62c00cd3e53eb7/lib/assembly/object_file.rb#L103.

It does have a problem with this type, as it prefers the value returned from file over the value derived from the extension:

file = Assembly::ObjectFile.new('foo.vtt')
file.send :exif_mimetype
=> nil
file.send :file_mimetype
=> "text/plain"
file.send :extension_mimetype
=> "text/vtt"
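One way to address this, sketched below in Ruby, is to let a short list of trusted extensions override content sniffing. This is not the assembly-objectfile implementation: `pick_mimetype` and `TRUSTED_EXTENSION_TYPES` are hypothetical names, and trying exif first in the fallback chain is an assumption. The session above only shows that the `file` value is preferred over the extension value.

```ruby
# Extensions whose mapped media type should win over content sniffing (assumed list).
TRUSTED_EXTENSION_TYPES = { '.vtt' => 'text/vtt' }.freeze

# Picks a media type from candidates that were already computed elsewhere.
def pick_mimetype(filename, exif: nil, file: nil, extension: nil)
  # A trusted extension short-circuits the usual precedence.
  override = TRUSTED_EXTENSION_TYPES[File.extname(filename).downcase]
  return override if override

  # Assumed fallback order: exif, then file(1), then the extension mapping.
  exif || file || extension
end

puts pick_mimetype('foo.vtt', file: 'text/plain', extension: 'text/vtt')
puts pick_mimetype('foo.txt', file: 'text/plain', extension: 'text/plain')
```

With the values from the session above, `foo.vtt` resolves to `text/vtt` even though `file` reports `text/plain`, while other files keep the existing precedence.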

There are a couple of different places where mime-types are determined. I think sdr-api does it as well. Maybe counterintuitively, the mime-type in the structural metadata is generated and stored separately from the mime-type in the technical metadata, which has been the case going back to the Fedora era. The access systems don't use the technical metadata.

Member

@mjgiarlo mjgiarlo Sep 28, 2023

FWIW, I just deposited a VTT file using the SDR API in stage, and it appears both sdr-api and the techmd service applied the correct MIME type: https://argo-stage.stanford.edu/view/druid:pc587kh4617

assembly-objectfile may be the one spot needing attention here as @jcoyne shared above, and it has a related issue: sul-dlss/assembly-objectfile#19

Member

@mjgiarlo mjgiarlo Sep 28, 2023

I was wrong about the above: @edsu already patched assembly-objectfile such that it correctly picks the VTT mime type, and that's in the latest version of the gem.

edsu added 2 commits May 16, 2023 09:04
If a video or audio file has an associated VTT file, add it as a
<track> element. This should allow the player to display the transcript
during playing of the video or audio.
The logic for determining if a file is a VTT file is now based on the
media type, not the filename or its extension.
@jcoyne jcoyne merged commit e119456 into main May 16, 2023
@jcoyne jcoyne deleted the vtt branch May 16, 2023 21:54
@andrewjbtw

@jcoyne Does merging this PR mean this will go out to prod and items with VTT will start showing captions next week? I ask because I'm not sure we're ready for that in terms of the styling/UI.

edsu added a commit to sul-dlss/assembly-objectfile that referenced this pull request May 16, 2023
Take a similar approach to identifying `application/json` files in #50 to allow `.vtt` files to always return the `text/vtt` media type.

Fixes #119

Refs sul-dlss/sul-embed#1469
@edsu
Contributor Author

edsu commented May 16, 2023

@andrewjbtw my apologies, this is my fault for taking the PR out of draft, which signaled to @jcoyne that it was in fact ready.

One thing that came up when we were discussing this in the #dlss-av-captions meeting on April 21st was the need for users to be able to turn off the transcript display. It does appear that VideoJS should support adding this if the existing control isn't obvious enough:

https://videojs.com/guides/text-tracks/#working-with-text-tracks

There was also some concern about the length of each line of text. I believe this is a function of the lines in the VTT file itself, and isn't behavior we can easily change programmatically.

Fortunately there are very few (44) object files coded with a media type of text/vtt at the moment, so it shouldn't be too disruptive (as long as it works correctly?).

@andrewjbtw

Looking at some of those items, I think this will cause VTT captions to appear over burned-in captions. For example, the videos with VTT in https://argo.stanford.edu/view/druid:bb761mb4522 are the two videos with burned-in English and German captions. I think with VTT captioning turned on, this will overlay the VTT captions onto the video.

It's a small number for sure, but most of them (36) appear to be items where care was taken to generate burned-in captions specifically for a high-profile project. The other 9 appear to be Zoom captions.

There's a larger number where the VTT was identified as "text/plain" but those won't show captions until remediated.

@edsu
Contributor Author

edsu commented May 17, 2023

@andrewjbtw @jcoyne would it be helpful for me to add a feature flag to turn off VTT transcripts in settings for now?

@jcoyne
Contributor

jcoyne commented May 17, 2023

@edsu If andrew doesn't want this out yet, then yes, that would be a good idea.

@andrewjbtw the user should be able to control the display of VTT captions, so they wouldn't have a problem unless they turned the CC on.

@andrewjbtw

Let me see if I can get some quick feedback. I do appreciate being able to have captions; I just wasn't expecting this to go out this week.

The part of the UI that doesn't look great to me is the settings menu, not really the captions themselves.

@andrewjbtw

@edsu After making another sample for stage (of one of the VT objects) and getting feedback from PSM, we should not turn this on yet. I'm happy to file a separate issue for keeping it off.

@jcoyne
Contributor

jcoyne commented May 17, 2023

Agreed. The styles are messed up:
[Screenshot 2023-05-17 at 2:07:33 PM]

It seems like the .vjs-modal-dialog styles are not being applied.

There's also a CORS error:

Security Error: Content at https://embed-stage.stanford.edu/iframe?url=https://sul-purl-stage.stanford.edu/dw202dq8174&_v=1684274192 may not load data from https://sul-stacks-stage.stanford.edu/file/druid:dw202dq8174/Redivis_unedited_GMT20220303-205959_Recording.transcript.vtt.

@dinahhandel

dinahhandel commented Oct 19, 2023

Expected behavior

A user should be able to choose if they want captions to display by clicking a button or otherwise selecting the functionality.
The CC icon as well as caption settings options must display in a way that does not interfere with playback of the media (as dropdown menus? As a pop-up over the media?).

Other considerations

What if there is no caption file? Should an option to play CC still display? Is it possible for the viewer to show an option to play CC only if a WebVTT file is present? Or is CC hard-coded into the viewer, so that we need to notify the user that captions are not yet available for playback when there is no WebVTT file? If this is the case, can we build in better functionality for requesting that the media be captioned? What about multilingual caption support?

@dinahhandel dinahhandel changed the title Use VTT transcript Use VTT transcript to display captions in media player Oct 19, 2023
6 participants