Closed Captions and the SCC Format

This page aims to cover everything you need to know to add closed captions during the DVD authoring process.

An Introduction to Closed Captions

Line 21 Closed Captions is the system used by North American television stations to encode information useful to the deaf and the hard of hearing in a format that can be turned on or off by the viewer (a page on the Teletext Then and Now site shows what this actually looks like, for those of you from PAL or SECAM-broadcasting countries). There are a handful of alternate formats for this purpose used by TV broadcasters in other parts of the world, but only Line 21 Closed Captions are supported for DVDs, so all non-Region 1 discs claiming to include "Captions for the Deaf and Hard of Hearing" actually use subtitles instead. The short difference between subtitles and closed captions: you turn subtitles on and off with your DVD remote, and you turn closed captions on and off with your TV remote. The following explanation is derived from the Closed Caption FAQ, maintained by Paul Robson, which does an excellent job of explaining what Line 21 Closed Captions are and how they work in a broadcast setting.

The mechanism used for Line 21 Closed Captions allows the viewer to choose between a maximum of four different "channels" of simultaneous captions, plus four more "channels" of non-program related text. In the years since the introduction of this system, it was discovered that channels CC1, CC2, and T1 (the first and second closed-caption channels and the first text channel) were the only ones broadcasters ever used, so alternate uses were found for two of the remaining channels. Channel T2 is now used to transmit Interactive TV (ITV) signals, which are used by MSN-TV to transmit the internet links for their service. Channel CC3 is now used to transmit the eXtended Data Service (XDS). XDS contains a wide variety of information, but the two portions most commonly used are the time-of-day signal, which newer VCRs use to set their clocks, and the rating signal, which the "V-Chip" in newer TVs uses to control what content children are allowed to watch.

Line 21 Closed Captions are transmitted on the last odd and even lines in the Vertical Blanking Interval (VBI), the non-visible part of the TV signal used mostly for calibration purposes. If you adjust the vertical hold on a North American television set, you should be able to see one or two lines above the normal "top" of the screen, each made up of sixteen rapidly-blinking segments. These are Fields 1 and 2 of Scanline 21. Each segment of each line is used as a bit to build up a total of four eight-bit bytes, two bytes in the odd field and two bytes in the even field. Field 1 is used to transmit channels CC1, CC2, T1 and T2 (ITV), while Field 2 is used to transmit channels CC3 (XDS), CC4, T3 and T4.

Closed Captions on Videotapes and DVDs

One of the major benefits of the Line 21 Closed Caption system is that it is automatically recorded with the program when taped by a VCR and can then be displayed on playback. Since Digital Versatile Discs only store the visible portion of the video signal, an alternate method had to be found in order to transmit Closed Captions and their related services, especially since there is a legal requirement in the United States to provide Closed Captions on every movie sold in the country. For DVDs, this data is muxed into the MPEG elementary video files in the form of a special user data packet inside each GOP. As far as I know, every DVD authoring program that supports Closed Captions (including Scenarist and Maestro) imports them as one or two text files (one for Field 1, the other for Field 2) containing the raw hexadecimal data, rather than expecting them to already be muxed into the video source files. I have never heard of a DVD that stored anything but closed captions in the user data packets (the DVD specification includes a superior alternative to the XDS ratings packet, PCFriendly is superior to ITV, and of course XDS time of day is useless on a DVD), so the rest of this discussion will focus on the Field 1 data and channels CC1 and CC2.

Closed Caption Requirements

Closed Caption Style Guide

The following are not required, but are followed by all Closed Captions I've seen, either broadcast or on DVDs:

SCC Format

Both Sonic Scenarist and Spruce Maestro use the Scenarist Closed Caption format (extension .SCC) to import closed caption data. Here is an example:

Scenarist_SCC V1.0

01:02:53:14	94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f

01:02:55:14	942c 942c

01:03:27:29	94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f

The file is double-spaced, with data lines alternating with blank lines. The first line identifies the format and version--it needs to be exactly like this. The third and subsequent alternating lines start with the timecode and are followed by the data.
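
As a rough illustration of the layout just described, here is a minimal Python sketch of a reader for this format. The names parse_scc, LINE_RE and WORD_RE are hypothetical, and the strictness shown (exact header match, a tab after the timecode, four-hex-digit words) is an assumption drawn from the description above, not the documented behavior of any authoring program.

```python
import re

SCC_HEADER = "Scenarist_SCC V1.0"
# A data line: timecode (":" or ";" before the frames), one tab, then the data.
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d[:;]\d\d)\t(.+)$")
WORD_RE = re.compile(r"^[0-9A-Fa-f]{4}$")


def parse_scc(text):
    """Return a list of (timecode, [word, ...]) pairs from the text of an SCC file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != SCC_HEADER:
        raise ValueError("first line must be exactly %r" % SCC_HEADER)
    captions = []
    for line in lines[1:]:
        if not line.strip():
            continue  # blank spacer line between data lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError("not a timecode/data line: %r" % line)
        timecode, data = match.groups()
        words = data.split()
        for word in words:
            if WORD_RE.match(word) is None:
                raise ValueError("bad data word: %r" % word)
        captions.append((timecode, words))
    return captions


sample = "Scenarist_SCC V1.0\n\n01:02:55:14\t942c 942c\n\n"
print(parse_scc(sample))
```

The sketch skips blank lines rather than insisting on strict double-spacing, which is a deliberately lenient choice for reading; a writer should still emit the blank lines.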

The timecode is in SMPTE format, which is either hours:minutes:seconds:frames for non-dropframe timebase or hours:minutes:seconds;frames for dropframe timebase. Both are 29.97 frames per second, but dropframe timebase accomplishes the fractional framerate by counting 30 frames per second and skipping the first two frame numbers each minute for nine out of every ten minutes (only the numbers are skipped; no video frames are discarded), while non-dropframe timebase simply runs the clock at exactly 29.97 frames per second. Use the same format you encoded your video with. Here's a hint: if it came from a broadcast source, it's probably dropframe, while if you created it from scratch, it's probably non-dropframe.
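
To make the difference between the two timebases concrete, here is a small Python sketch that converts either kind of timecode to a running frame count. The function name timecode_to_frame_number is hypothetical, and the sketch assumes the usual dropframe rule that the skipped numbers are frames 00 and 01 of every minute not divisible by ten.

```python
def timecode_to_frame_number(timecode):
    """Convert an SMPTE timecode string to a zero-based frame count."""
    drop_frame = ";" in timecode
    hours, minutes, seconds, frames = (
        int(part) for part in timecode.replace(";", ":").split(":")
    )
    total_minutes = hours * 60 + minutes
    frame_number = (total_minutes * 60 + seconds) * 30 + frames
    if drop_frame:
        # Two frame numbers are skipped at the start of every minute,
        # except minutes 0, 10, 20, 30, 40 and 50 of each hour.
        frame_number -= 2 * (total_minutes - total_minutes // 10)
    return frame_number


print(timecode_to_frame_number("01:02:53:14"))
print(timecode_to_frame_number("01:00:00;00"))
```

Under this scheme the dropframe timecode 01:00:00;00 comes out as frame 107892, which is 3600 seconds at 29.97 frames per second, while the same digits read as non-dropframe would be frame 108000.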

The data is made up of two-byte hexadecimal words, separated from each other by spaces and from the timecode by a tab character. The data uses only seven out of every eight bits of each byte, with the high bit used to satisfy odd parity: the number of 1 bits in each byte has to be odd, or the closed caption decoder will reject the byte as corrupt data. The major exception is ITV, which not only doesn't enforce odd parity, it also uses a slightly different character set than captions, text or XDS.
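
The parity rule can be checked mechanically. The following Python sketch (the names has_odd_parity and check_word are hypothetical) tests whether each byte of an SCC word has an odd number of 1 bits, which is roughly the check a decoder would apply before accepting the byte.

```python
def has_odd_parity(byte):
    """True if the 8-bit value contains an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def check_word(word):
    """True if both bytes of a four-digit hex SCC word have odd parity."""
    value = int(word, 16)
    return has_odd_parity(value >> 8) and has_odd_parity(value & 0xFF)


for word in ("94ae", "8080", "14ae"):
    print(word, check_word(word))
```

Here 94ae and 8080 pass, while 14ae fails because 14h has two 1 bits and is missing its parity bit.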

Deciphering the bytes

The full requirements for Closed Captions are contained in EIA/CEA standard 608-B (there is also a 708-B standard for high-definition TV captions, but that is beyond the scope of this document). CEA 608 can be purchased from IHS Global for $170, but luckily, the requirements are available for free in the Code of Federal Regulations, which can be obtained in PDF format from the Government Printing Office (just click "Browse" on the screen that comes up). Specifically, the requirements are contained in 47 CFR 15.119: Title 47 covers telecommunication (the rules of the Federal Communications Commission), Part 15 covers radio frequency devices (including television receivers), and 119 is the specific section for analog closed caption decoder requirements. The main adjustment you need to make to these requirements is for the odd parity: 00h (binary 00000000) is translated to 80h (10000000), but 07h (00000111) is left alone.

Here is a translation matrix to turn a 7-bit hexadecimal number into the equivalent odd-parity 8-bit number (row by row, starting from 00h):

  80, 01, 02, 83, 04, 85, 86, 07, 08, 89, 8a, 0b, 8c, 0d, 0e, 8f,
  10, 91, 92, 13, 94, 15, 16, 97, 98, 19, 1a, 9b, 1c, 9d, 9e, 1f,
  20, a1, a2, 23, a4, 25, 26, a7, a8, 29, 2a, ab, 2c, ad, ae, 2f,
  b0, 31, 32, b3, 34, b5, b6, 37, 38, b9, ba, 3b, bc, 3d, 3e, bf,
  40, c1, c2, 43, c4, 45, 46, c7, c8, 49, 4a, cb, 4c, cd, ce, 4f,
  d0, 51, 52, d3, 54, d5, d6, 57, 58, d9, da, 5b, dc, 5d, 5e, df,
  e0, 61, 62, e3, 64, e5, e6, 67, 68, e9, ea, 6b, ec, 6d, 6e, ef,
  70, f1, f2, 73, f4, 75, 76, f7, f8, 79, 7a, fb, 7c, fd, fe, 7f
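
The matrix above does not need to be typed in by hand. As a sketch, the following Python (the names odd_parity_table and TABLE are hypothetical) rebuilds it by setting the high bit of every 7-bit value that has an even number of 1 bits, and prints it in the same sixteen-per-row layout.

```python
def odd_parity_table():
    """Return a 128-entry list mapping each 7-bit value to its odd-parity byte."""
    table = []
    for value in range(128):
        ones = bin(value).count("1")
        # An even count of 1 bits needs the high bit set to become odd.
        table.append(value if ones % 2 == 1 else value | 0x80)
    return table


TABLE = odd_parity_table()
for start in range(0, 128, 16):
    print("  " + ", ".join("%02x" % byte for byte in TABLE[start:start + 16]))
```

Going the other way is simpler still: masking a byte with 7fh strips the parity bit and recovers the 7-bit value.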

As explained in the Closed Caption FAQ, there are three different types of closed captions: roll-up, paint-on, and pop-on. The only type used on DVDs is pop-on. The requirements also cover using CC1 and CC2 to put two different closed caption channels on the DVD, but none of the software DVD players can support CC2, so I'll only explain how to create pop-on captions for channel CC1.

Format of Pop-on Captions

Pop-on captions have a set format, as described below, made up of commands (always 2-byte words) and characters (usually single bytes). If the caption is to be broadcast, each of the commands is doubled up for redundancy in case the signal is garbled in transmission (garbled data is usually displayed as character 7f, the solid block). The decoder is programmed to ignore a second command when it is the same as the first. When writing captions for a DVD, you can choose whether you wish to double or not (if you look at the sample towards the top of the page, you will see a lot of doubling).
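
The doubling rule can be sketched in a few lines of Python. The names is_command and drop_doubled_commands are hypothetical, and the sketch assumes that a command is any word whose first byte, with the parity bit stripped, falls between 10h and 1fh.

```python
def is_command(word):
    """True if the word's first byte (parity stripped) is in the command range."""
    high = int(word[:2], 16) & 0x7F
    return 0x10 <= high <= 0x1F


def drop_doubled_commands(words):
    """Remove the second copy of each immediately repeated command word."""
    result = []
    previous = None
    for word in words:
        word = word.lower()
        if is_command(word) and word == previous:
            previous = None  # a third copy would count as a new command
            continue
        result.append(word)
        previous = word
    return result


doubled = "94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 8080 8080".split()
print(" ".join(drop_doubled_commands(doubled)))
```

Character words and the 8080 filler are left alone, since repeating them is meaningful (two identical letters, or two frames of padding).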

  1. Pop-on captions are composed in an off-screen buffer before they are sent to the screen, so the first command is ENM, or Erase Non-displayed [buffer] Memory, with a code of 94ae.
  2. The second command is RCL, Resume Caption Loading, with a code of 9420. This formally tells the decoder that the next caption is of the pop-on type.
  3. The third command is known as a PAC, or Preamble Address Code. It is used to position the cursor. The grid for closed captions is the title safe area, 384 pixels tall by 576 pixels wide, divided into 15 rows and 32 columns. The PAC can position the cursor to any row and to any column divisible by 4. Here is a table to find the PAC code for any position. Each of the 15 value columns corresponds to one caption row, from row 1 on the left to row 15 on the right; combine the high byte for the row with the low byte for the column and style.

    Column 0 (can set color and underline):

    Row:                   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    High Byte:            91 91 92 92 15 15 16 16 97 97 10 13 13 94 94
    Low Byte by Column:
    0 (white)             d0 70 d0 70 d0 70 d0 70 d0 70 d0 d0 70 d0 70
    0 (white) underline   51 f1 51 f1 51 f1 51 f1 51 f1 51 51 f1 51 f1
    0 green               c2 62 c2 62 c2 62 c2 62 c2 62 c2 c2 62 c2 62
    0 green underline     43 e3 43 e3 43 e3 43 e3 43 e3 43 43 e3 43 e3
    0 blue                c4 64 c4 64 c4 64 c4 64 c4 64 c4 c4 64 c4 64
    0 blue underline      45 e5 45 e5 45 e5 45 e5 45 e5 45 45 e5 45 e5
    0 cyan                46 e6 46 e6 46 e6 46 e6 46 e6 46 46 e6 46 e6
    0 cyan underline      c7 67 c7 67 c7 67 c7 67 c7 67 c7 c7 67 c7 67
    0 red                 c8 68 c8 68 c8 68 c8 68 c8 68 c8 c8 68 c8 68
    0 red underline       49 e9 49 e9 49 e9 49 e9 49 e9 49 49 e9 49 e9
    0 yellow              4a ea 4a ea 4a ea 4a ea 4a ea 4a 4a ea 4a ea
    0 yellow underline    cb 6b cb 6b cb 6b cb 6b cb 6b cb cb 6b cb 6b
    0 magenta             4c ec 4c ec 4c ec 4c ec 4c ec 4c 4c ec 4c ec
    0 magenta underline   cd 6d cd 6d cd 6d cd 6d cd 6d cd cd 6d cd 6d

    Columns 4 - 28 (color white, can set underline):

    Row:                   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    High Byte:            91 91 92 92 15 15 16 16 97 97 10 13 13 94 94
    Low Byte by Column:
    4 underline           d3 73 d3 73 d3 73 d3 73 d3 73 d3 d3 73 d3 73
    8 underline           d5 75 d5 75 d5 75 d5 75 d5 75 d5 d5 75 d5 75
    12 underline          57 f7 57 f7 57 f7 57 f7 57 f7 57 57 f7 57 f7
    16 underline          d9 79 d9 79 d9 79 d9 79 d9 79 d9 d9 79 d9 79
    20 underline          5b fb 5b fb 5b fb 5b fb 5b fb 5b 5b fb 5b fb
    24 underline          5d fd 5d fd 5d fd 5d fd 5d fd 5d 5d fd 5d fd
    28 underline          df 7f df 7f df 7f df 7f df 7f df df 7f df 7f

  4. If you wish to start the caption on a column not evenly divisible by four, then the PAC is followed by a TO (Tab Over) code: 97a1 to move over one column, 97a2 to move over two columns, or 9723 to move over three columns.
  5. The text of the caption follows. Most of the character set is encoded in a single byte, so two characters are included in a single hexadecimal word. The remaining characters require two bytes. The byte 80h is used as filler and will not cause a space when the caption is displayed. Note that 20h is an "opaque" space (it will wipe out any pre-existing text), while 91b9 is a transparent space. Also note that the character set linked above is approximately the display size and typeface of screen captions. Finally, the third set of characters (labeled "Extended Characters") is not supported by most PC DVD players or by older television sets. ITV uses ISO-8859-1, the standard character set used by web browsers.
  6. The following mid-row commands can also be used for special effects ("no formatting" removes underline, italics, and flash; all PAC commands are assumed to be "no formatting"):

    Code    Meaning
    9120    change to white, no formatting
    91a1    change to white underline
    91a2    change to green, no formatting
    9123    change to green underline
    91a4    change to blue, no formatting
    9125    change to blue underline
    9126    change to cyan, no formatting
    91a7    change to cyan underline
    91a8    change to red, no formatting
    9129    change to red underline
    912a    change to yellow, no formatting
    91ab    change to yellow underline
    912c    change to magenta, no formatting
    91ad    change to magenta underline
    91ae    turn on italics
    912f    turn on italics and underline
    94a8    turn flash on

  7. If the caption to be displayed contains multiple pieces of dialog, then another PAC, another TO, and more text would follow.
  8. To clear the screen in preparation for drawing the caption, the command EDM (Erase Displayed Memory), code 942c, is used.
  9. The word 8080 may be used as filler to time out the frames until the caption needs to be displayed.
  10. Finally, to display the caption in the buffer on the screen, the command EOC (End Of Caption), code 942f, is used.
  11. All of the gaps between timecodes in an SCC file are filled in with the filler word 8080 when the DVD is created. This shortcut keeps SCC files from wasting space on repetitive information.
  12. To erase a caption, use EDM, 942c.
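
The steps above can be sketched as a small Python encoder. Everything named here (parity, word, PAC_ROWS, encode_popon and its parameters) is hypothetical, and the sketch makes several simplifying assumptions: it handles a single white, non-underlined line of text, it accepts only printable ASCII characters whose caption code matches ASCII, and it derives the PAC low byte as the column-0 white value plus two for every four columns of indent (one less than the underlined values in the table, before parity is applied).

```python
def parity(value7):
    """Return the 7-bit value with the high bit set if needed for odd parity."""
    return value7 if bin(value7).count("1") % 2 else value7 | 0x80


def word(high7, low7):
    """Format two 7-bit codes as one four-digit SCC hex word."""
    return "%02x%02x" % (parity(high7), parity(low7))


# (7-bit PAC high byte, low-byte base) for caption rows 1-15.
PAC_ROWS = {
    1: (0x11, 0x50), 2: (0x11, 0x70), 3: (0x12, 0x50), 4: (0x12, 0x70),
    5: (0x15, 0x50), 6: (0x15, 0x70), 7: (0x16, 0x50), 8: (0x16, 0x70),
    9: (0x17, 0x50), 10: (0x17, 0x70), 11: (0x10, 0x50), 12: (0x13, 0x50),
    13: (0x13, 0x70), 14: (0x14, 0x50), 15: (0x14, 0x70),
}
# Printable ASCII characters whose caption code shows a different glyph.
NOT_PLAIN_ASCII = set("*\\^_`{|}~")


def encode_popon(text, row, column, double=True, filler=0):
    """Return the SCC words for one single-line, white pop-on caption."""
    if not 0 <= column <= 31:
        raise ValueError("column must be between 0 and 31")
    for ch in text:
        if not " " <= ch <= "~" or ch in NOT_PLAIN_ASCII:
            raise ValueError("unsupported character: %r" % ch)
    high, base = PAC_ROWS[row]
    setup = [word(0x14, 0x2E), word(0x14, 0x20)]         # ENM, RCL
    setup.append(word(high, base + (column // 4) * 2))   # PAC (white)
    if column % 4:
        setup.append(word(0x17, 0x20 + column % 4))      # TO
    codes = [ord(ch) for ch in text]
    if len(codes) % 2:
        codes.append(0x00)                               # pad with filler (80)
    body = [word(codes[i], codes[i + 1]) for i in range(0, len(codes), 2)]
    copies = 2 if double else 1

    def repeat(commands):
        return [c for c in commands for _ in range(copies)]

    return (repeat(setup) + body + repeat([word(0x14, 0x2C)])   # EDM
            + ["8080"] * filler + repeat([word(0x14, 0x2F)]))   # EOC


print(" ".join(encode_popon("HEY, THERE.", 15, 4, filler=2)))
```

With doubling on and two filler words, this reproduces the data of the third line of the sample file for the text "HEY, THERE." at row 15, column 4. Colors, underlining, mid-row codes and multi-line captions are left out to keep the sketch short.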

As an example, here is the sample .SCC file from above, followed by its meaning:

Scenarist_SCC V1.0

01:02:53:14	94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f

01:02:55:14	942c 942c

01:03:27:29	94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f
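
One way to read the sample is to run each data line through a small decoder. The Python below is a sketch under stated assumptions: the names COMMANDS, PAC_FIRST_ROW and describe are hypothetical, only the commands discussed on this page are recognized, doubled commands are collapsed, and ordinary characters are decoded as if the caption character set were plain ASCII (true for the letters and punctuation in this sample, but not for every code).

```python
COMMANDS = {
    "94ae": "ENM (erase non-displayed memory)",
    "9420": "RCL (resume caption loading)",
    "942c": "EDM (erase displayed memory)",
    "942f": "EOC (end of caption)",
    "97a1": "TO (tab over 1 column)",
    "97a2": "TO (tab over 2 columns)",
    "9723": "TO (tab over 3 columns)",
}
# First caption row addressed by each 7-bit PAC high byte.
PAC_FIRST_ROW = {0x11: 1, 0x12: 3, 0x15: 5, 0x16: 7,
                 0x17: 9, 0x10: 11, 0x13: 12, 0x14: 14}


def describe(words):
    """Return a readable list of the commands and text in one SCC data line."""
    result, text, previous = [], "", None
    for word in words:
        word = word.lower()
        high = int(word[:2], 16) & 0x7F  # strip the parity bits
        low = int(word[2:], 16) & 0x7F
        if not 0x10 <= high <= 0x1F:
            # Ordinary characters; 00 (written 80) is filler and prints nothing.
            text += "".join(chr(code) for code in (high, low) if code)
            previous = None
            continue
        if text:
            result.append('text "%s"' % text)
            text = ""
        if word == previous:
            previous = None  # doubled command: ignore the repeat
            continue
        previous = word
        if word in COMMANDS:
            result.append(COMMANDS[word])
        elif low >= 0x40 and high in PAC_FIRST_ROW:
            row = PAC_FIRST_ROW[high] + (1 if low >= 0x60 else 0)
            column = ((low & 0x0E) >> 1) * 4 if low & 0x10 else 0
            result.append("PAC (row %d, column %d)" % (row, column))
        else:
            result.append("other command %s" % word)
    if text:
        result.append('text "%s"' % text)
    return result


line = ("94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 "
        "ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f")
for item in describe(line.split()):
    print(item)
```

Read this way, the first line builds the caption "( horn honking )" on row 15 (PAC to column 20, then tabbed over two more columns) and displays it at 01:02:53:14; the second line erases it two seconds later; and the third line displays "HEY, THERE." on row 15, column 4.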

A Technical Explanation of Placement and Format of DVD Closed Caption User Data Packets

Data in MPEG files is organized in terms of packets. DVD closed captions are stored on a per-GOP basis, and are located within the video MPEG-2 file between the GOP Header packet and the (I-frame) Picture Header packet.

Structure of the DVD Closed Caption User Data Packet (all values are in hexadecimal):

Bytes         Sample Contents      Description
HEADER (9 bytes)
0 - 3         00 00 01 b2          User Data Packet header (never changes).
4 - 7         43 43 01 f8          DVD Closed Caption header (never changes).
8             9b                   Attributes (binary 10011011; bit 7 is the high bit, bit 0 the low bit):
  Bits        Sample Contents      Description
  0           1                    Extra Field Flag: whether or not to add an extra field's caption to the end of the caption segments. This is a by-product of analog editing equipment, which occasionally cut scenes between two fields of the same frame. I've also seen it used to create a CC User Data Packet with a length evenly divisible by 4 (14 frames plus an extra field of captions works out to 96 bytes). Note that the Pattern Flag in the next CC User Data Packet must flip if the Extra Field Flag is set (otherwise, you'd lose that odd field's worth of data).
  1 - 5       01101                Caption Count: how many caption segments are in the packet. This is always at least as large as the number of video frames in the GOP (minus 1, when the Extra Field Flag is set), but it can be greater, in which case the extra frames of caption data are not used.
  6           0                    Filler (never changes).
  7           1                    Pattern Flag: determines if each caption segment is Field 1 followed by Field 2 (1) or Field 2 followed by Field 1 (0). This also determines what the extra field will be if the Extra Field Flag is set: Field 1 for Pattern Flag 1, or Field 2 for Pattern Flag 0.
CAPTION SEGMENT (6 bytes)--repeat for each frame of GOP
n             ff                   Field (ff = Field 1, fe = Field 2)
n+1 - n+2     94 a3                Caption: two bytes that are transmitted this field. Use 80 80 if there's nothing to transmit.
n+3           fe                   Field (always opposite value from above)
n+4 - n+5     01 83                Caption (see above)
EXTRA FIELD (3 bytes)--only if Extra Field Flag is set
m             ff                   Field (ff = Field 1, fe = Field 2)
m+1 - m+2     94 a3                Caption: two bytes that are transmitted this field. Use 80 80 if there's nothing to transmit.
- x           00 00 00 00 00 00    Padding (repeat 00 byte until packet is evenly divisible by 4)

Note that some DVDs create a fixed 96-byte closed caption packet size as described above (by using padding for GOPs below 15 frames and the Extra Field Flag for 15-frame GOPs), but many DVDs do not do this, and the DVDs created by Sonic Scenarist and Spruce DVDMaestro never do this. In these cases, the Extra Field Flag is always 0, the Pattern Flag is always 1 (Field 1 followed by Field 2), and no padding is used at the end of the data packet.

Another item to note is the variation of this format used by a number of MPEG-capturing devices, including Hauppauge's WinTV-250 card and Panasonic's DMR-H50S tabletop DVD recorder (in DVR mode). These devices use ff as the flag for both fields' caption data, relying on the Pattern Flag to tell the fields apart.
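
To tie the packet layout together, here is a minimal Python sketch of a reader for one DVD closed caption user data packet. The names CC_HEADER, parse_attributes and parse_cc_user_data are hypothetical. The sketch assumes the standard layout in the table (ff and fe field markers, with the Caption Count counting full two-field segments and the Extra Field Flag adding one more field); it does not handle the capture-device variant that uses ff for both fields.

```python
CC_HEADER = bytes.fromhex("000001b2434301f8")


def parse_attributes(attributes):
    """Split the attributes byte into (pattern_flag, caption_count, extra_field_flag)."""
    return (attributes >> 7) & 1, (attributes >> 1) & 0x1F, attributes & 1


def parse_cc_user_data(packet):
    """Return a list of (field_number, two_caption_bytes) from one packet."""
    if len(packet) < 9 or packet[:8] != CC_HEADER:
        raise ValueError("not a DVD closed caption user data packet")
    pattern_flag, caption_count, extra_field_flag = parse_attributes(packet[8])
    field_total = caption_count * 2 + extra_field_flag
    if len(packet) < 9 + field_total * 3:
        raise ValueError("packet is shorter than its caption count implies")
    fields = []
    for index in range(field_total):
        start = 9 + index * 3
        marker = packet[start]
        if marker not in (0xFF, 0xFE):
            raise ValueError("unexpected field marker %02x" % marker)
        fields.append((1 if marker == 0xFF else 2, packet[start + 1:start + 3]))
    return fields  # any 00 padding after the last field is ignored


# One caption segment: Field 1 carries 94 ae, Field 2 carries filler.
packet = CC_HEADER + bytes.fromhex("82ff94aefe8080")
print(parse_attributes(0x9B))
print(parse_cc_user_data(packet))
```

The sample attributes byte 9b from the table splits into Pattern Flag 1, Caption Count 13 and Extra Field Flag 1. The sketch trusts the ff/fe markers rather than the Pattern Flag to identify each field, which is the simpler choice for the standard format.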
