Closed Captions and the Scenarist Closed Caption Format

Closed Captions and the SCC Format

This page will hopefully cover everything you need to know to add closed captions during the DVD authoring process.

An Introduction to Closed Captions

Line 21 Closed Captions is the system used by North American television stations to encode information useful to the deaf and the hard of hearing in a format that can be turned on or off by the viewer (a page on the Teletext Then and Now site shows what this actually looks like, for those of you from PAL or SECAM-broadcasting countries). There are a handful of alternate formats for this purpose used by TV broadcasters in other parts of the world, but only Line 21 Closed Captions are supported for DVD's, so all non-Region 1 discs claiming to include "Captions for the Deaf and Hard of Hearing" actually use subtitles instead (the short difference between subtitles and closed captions: you turn subtitles on and off with your DVD remote, and you turn closed captions on and off with your TV remote). The following explanation is derived from the Closed Caption FAQ, maintained by Paul Robson, which does an excellent job of explaining what Line 21 Closed Captions are and how they work in a broadcast setting.

The mechanism used for Line 21 Closed Captions allows the viewer to choose between a maximum of four different "channels" of simultaneous captions, plus four more "channels" of non-program related text. In the years since the introduction of this system, it was discovered that channels CC1, CC2, and T1 (the first and second closed-caption channels and the first text channel) were the only ones broadcasters ever used, so alternate uses were found for two of the remaining channels. Channel T2 is now used to transmit Interactive TV (ITV) signals, which are used by MSN-TV to transmit the internet links for their service. Channel CC3 is now used to transmit the eXtended Data Service (XDS). XDS contains a wide variety of information, but the two portions most commonly used are the time of day signal which newer VCR's use to program their clocks, and the rating signal which is used to control what content children are allowed to watch via the "V-Chip"s in newer TV's.

Line 21 Closed Captions are transmitted on the last odd and even lines in the Vertical Blanking Interval (VBI), the non-visible part of the TV signal used mostly for calibration purposes. If you adjust the vertical hold on a North American television set, you should be able to see one or two lines above the normal "top" of the screen, each made up of sixteen rapidly-blinking segments. These are Fields 1 and 2 of Scanline 21. Each segment of each line is used as a bit to build up a total of four eight-bit bytes, two bytes in the odd field and two bytes in the even field. Field 1 is used to transmit channels CC1, CC2, T1 and T2 (ITV), while Field 2 is used to transmit channels CC3 (XDS), CC4, T3 and T4.

Closed Captions on Videotapes and DVD's

One of the major benefits of the Line 21 Closed Caption system is that it is automatically recorded with the program when taped by a VCR and can then be displayed on playback. Since Digital Versatile Discs only store the visible portion of the video signal, an alternate method had to be found in order to transmit Closed Captions and their related services, especially since there is a legal requirement in the United States to provide Closed Captions on every movie sold in the country. For DVD's, this data is muxed into the MPEG elementary video files in the form of a special user data packet inside each GOP. As far as I know, every DVD authoring program that supports Closed Captions (including Scenarist and Maestro) imports them as one or two text files (one for Field 1, the other for Field 2) containing the raw hexadecimal data rather than expecting them to already be muxed into the video source files. I have never heard of a DVD that stored anything but closed captions in the user data packets (the DVD specification includes a superior alternative to the XDS ratings packet, PCFriendly is superior to ITV, and of course XDS time of day is useless on a DVD), so the rest of this discussion will focus on the Field 1 data and channels CC1 and CC2.

Closed Caption Requirements

Closed Captions are displayed in a fixed-width font. Personally, I have found that Lucida Console 16 pt Bold does the best job of representing this Closed Caption font in Windows.
The grid for Closed Captions is 15 rows tall, but only 32 columns wide. This means that the vast majority of subtitles have to be reformatted to fit.

Closed Caption Style Guide

The following are not required, but are followed by all Closed Captions I've seen either broadcast or on DVD's:

Closed Captions can be up to four lines long at any one time.
Captions usually appear under the speaker. If this would obscure something important, the caption is moved to the top of the screen, but still next to the speaker.
Dialog is always in all-caps. Older closed caption decoders only used upper-case (non-accented) letters, so make sure nothing is case-dependent. A common exception is "Mc" or "Mac" for Scottish names.
The older machines also only supported ordinary white text, so even though the Closed Caption specification allows flashing text, underlines, and seven different colors, don't make the understanding of any of your captions dependent on any of these features. Every broadcast program I've seen closed captions for put everything in white until the very last caption, which gives credit to the caption company and therefore is the nicest-looking caption of the whole show.
Italics are used for the usual reasons ("THE QUEEN MARY"), for emphasis, for singing, and for anything that occurs off-screen.
Dialog always ends with some sort of punctuation, if only an ellipsis to indicate the dialog will be continued in the next caption.
The special character "♪" (a musical note) is placed before and after all sung dialog. Two notes with nothing between them means that music is playing. Both of these rules are only used when knowledge of the lyrics or the fact that music is playing is important to the plot; most of the time, this sort of cue is left out of Closed Captions.
Captions for sound effects usually appear centered at the top of the screen. They are in lower case and surrounded by spaced parentheses or brackets (e.g. "( thump )" or "[ thump ]"). If a sound effect is localized, it should be positioned appropriately.
Off-screen dialog is positioned close to the source, or centered at the top of the screen if the source cannot be placed. If the viewer is supposed to be able to recognize the voice, the character's name appears in normal case before the dialog or else a standard screenplay format is used ("Bill: IT'S IN THE CABINET." or "[Tom] NO IT'S NOT.").
There are situations where there is too much text to allow useful positioning. In this case, a change in speaker is indicated by starting the line of dialog with ">> ". For news broadcasts (which use this technique a lot), an additional convention is to start a change of topic with ">>> ". I've also seen some captions use the subtitle standard of "-- " instead of ">> ".
The timing of captions is important, especially if a piece of dialog or a sound effect is dramatically significant. For this reason, closed caption dialog will often appear before the character starts speaking, to ensure that everyone using closed captions can read them in time to get the joke or jump when the killer shows up.

SCC Format

Both Sonic Scenarist and Spruce Maestro use the Scenarist Closed Caption format (extension .SCC) to import closed caption data. Here is an example:

Scenarist_SCC V1.0

01:02:53:14	94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f

01:02:55:14	942c 942c

01:03:27:29	94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f

The file is double-spaced, with data lines alternating with blank lines. The first line identifies the format and version--it needs to be exactly like this. The third and subsequent alternating lines start with the timecode and are followed by the data.

The timecode is in SMPTE format, which is either hours:minutes:seconds:frames for non-dropframe timebase or hours:minutes:seconds;frames for dropframe timebase. Both describe video running at 29.97 frames per second, and both count 30 frame numbers per second of timecode. Dropframe timebase keeps the timecode in step with the clock on the wall by skipping the first two frame numbers (00 and 01) of each minute for nine out of every ten minutes; no actual frames of video are dropped. Non-dropframe timebase never skips a number, so its timecode slowly falls behind real time (by about 3.6 seconds per hour). Use the same format you encoded your video with. Here's a hint: if it came from a broadcast source, it's probably dropframe, while if you created it from scratch, it's probably non-dropframe.
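To make the dropframe arithmetic concrete, here is a minimal Python sketch that turns either style of timecode into a zero-based frame count. The function name timecode_to_frame is my own invention (it is not part of Scenarist, Maestro, or any SCC tool), and the sketch assumes the usual SMPTE rule that frame numbers 00 and 01 are skipped at the start of every minute except minutes 0, 10, 20, 30, 40 and 50.

```python
import re

def timecode_to_frame(tc: str) -> int:
    """Convert hh:mm:ss:ff (non-dropframe) or hh:mm:ss;ff (dropframe)
    into a zero-based frame count."""
    m = re.fullmatch(r"(\d\d):(\d\d):(\d\d)([:;])(\d\d)", tc)
    if m is None:
        raise ValueError(f"not a timecode: {tc!r}")
    hh, mm, ss, ff = (int(m.group(i)) for i in (1, 2, 3, 5))
    total_minutes = hh * 60 + mm
    # Both styles count 30 frame numbers per second of timecode.
    frames = (total_minutes * 60 + ss) * 30 + ff
    if m.group(4) == ";":
        # Dropframe: two frame numbers were skipped for every minute that
        # has started, except every tenth minute.
        frames -= 2 * (total_minutes - total_minutes // 10)
    return frames
```

With this sketch, the dropframe timecode 00:01:00;02 comes out as frame 1800, which is the frame immediately after 00:00:59;29 (frame 1799).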

The data is made up of two-byte hexadecimal words, separated from each other by spaces and from the timecode by a tab character. The data uses only seven out of every eight bits of each byte, with the high bit used to satisfy odd parity--adding up all the bits has to result in an odd number, or the closed caption decoder will reject the byte as corrupt data. The major exception is ITV, which not only doesn't enforce odd parity, it also uses a slightly different character set than captions, text or XDS.
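As a rough illustration of the layout just described, the following Python sketch splits one data line into its timecode and its byte pairs, and checks each byte for odd parity. The names parse_scc_line and has_odd_parity are hypothetical helpers of mine, and the sketch assumes the line really does use a tab after the timecode.

```python
def has_odd_parity(byte: int) -> bool:
    """True when the byte contains an odd number of 1 bits."""
    return bin(byte).count("1") % 2 == 1

def parse_scc_line(line: str):
    """Split an SCC data line into (timecode, [(high, low), ...])."""
    timecode, tab, data = line.strip().partition("\t")
    if not tab:
        raise ValueError("expected a tab between timecode and data")
    pairs = []
    for word in data.split():
        if len(word) != 4:
            raise ValueError(f"not a two-byte word: {word!r}")
        pairs.append((int(word[:2], 16), int(word[2:], 16)))
    return timecode, pairs

timecode, pairs = parse_scc_line("01:02:55:14\t942c 942c")
# Every byte of caption data should pass the parity check.
all_ok = all(has_odd_parity(b) for pair in pairs for b in pair)
```

A byte that fails has_odd_parity is one a caption decoder would treat as corrupt, so a check like this is a cheap way to catch typos in a hand-edited file.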

Deciphering the bytes

The full requirements for Closed Captions are contained in EIA/CEA standard 608-B (there is also a 708-B standard for high-definition TV captions, but that is beyond the scope of this document). CEA 608 can be purchased from IHS Global for $170, but luckily, the requirements are available for free in the Code of Federal Regulations, which can be obtained in PDF format from the Government Printing Office (just click "Browse" on the screen that comes up). Specifically, the requirements are contained in 47CFR15.119: Title 47 covers telecommunication (the rules of the Federal Communications Commission), Part 15 covers radio frequency devices (including television receivers), and 15.119 is the specific section for analog closed caption decoder requirements. The main adjustment you need to make to these requirements is for the odd parity: 00h (binary 00000000) is translated to 80h (10000000), but 07h (00000111) is left alone.

Here is a translation matrix to turn a 7-bit hexadecimal number into the equivalent odd-parity 8-bit number:

  80, 01, 02, 83, 04, 85, 86, 07, 08, 89, 8a, 0b, 8c, 0d, 0e, 8f,
  10, 91, 92, 13, 94, 15, 16, 97, 98, 19, 1a, 9b, 1c, 9d, 9e, 1f,
  20, a1, a2, 23, a4, 25, 26, a7, a8, 29, 2a, ab, 2c, ad, ae, 2f,
  b0, 31, 32, b3, 34, b5, b6, 37, 38, b9, ba, 3b, bc, 3d, 3e, bf,
  40, c1, c2, 43, c4, 45, 46, c7, c8, 49, 4a, cb, 4c, cd, ce, 4f,
  d0, 51, 52, d3, 54, d5, d6, 57, 58, d9, da, 5b, dc, 5d, 5e, df,
  e0, 61, 62, e3, 64, e5, e6, 67, 68, e9, ea, 6b, ec, 6d, 6e, ef,
  70, f1, f2, 73, f4, 75, 76, f7, f8, 79, 7a, fb, 7c, fd, fe, 7f
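If you would rather compute the parity bit than look it up, the matrix above can be regenerated with a few lines of Python. This is only a sketch; add_odd_parity is a name I made up for the illustration.

```python
def add_odd_parity(value: int) -> int:
    """Set the high bit of a 7-bit value when needed so that the
    resulting byte has an odd number of 1 bits."""
    if not 0 <= value <= 0x7F:
        raise ValueError("expected a 7-bit value")
    ones = bin(value).count("1")
    return value if ones % 2 == 1 else value | 0x80

# Rebuild the 128-entry matrix, sixteen entries per row.
table = [add_odd_parity(v) for v in range(128)]
for start in range(0, 128, 16):
    print(", ".join(f"{b:02x}" for b in table[start:start + 16]))
```

Going the other way (from an SCC byte back to the 7-bit code in the regulations) is just a matter of masking off the high bit with `byte & 0x7F`.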

As explained in the Closed Caption FAQ, there are three different types of closed captions: roll-up, paint-on, and pop-on. The only one of these used in DVD's is pop-on. The requirements also cover using CC1 and CC2 to put two different closed caption channels on the DVD, but none of the software DVD players can support CC2, so I'll only explain how to create pop-on captions for channel CC1.

Format of Pop-on Captions

Pop-on captions have a set format, as described below, made up of commands (always 2-byte words) and characters (usually single bytes). If the caption is to be broadcast, each of the commands is doubled up for redundancy in case the signal is garbled in transmission (garbled data is usually displayed as character 7f, the solid block). The decoder is programmed to ignore a second command when it is the same as the first. When writing captions for a DVD, you can choose whether you wish to double or not (if you look at the sample towards the top of the page, you will see a lot of doubling).

Pop-on captions are composed in an off-screen buffer before they are sent to the screen, so the first command is ENM, or Erase Non-displayed [buffer] Memory, with a code of 94ae.
The second command is RCL, Resume Caption Loading, with a code of 9420. This formally tells the decoder that the next caption is of the pop-on type.
The third command is known as a PAC, or Preamble Address Code. It is used to position the cursor. The grid for closed captions is the title safe area, 384 pixels tall by 576 pixels wide, divided into 15 rows and 32 columns (see here). The PAC can position the cursor to any row and to any column divisible by 4. Here is a table to find the PAC code for any position:

Column 0 (can set color and underline):

Row:	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
High Byte:	91	91	92	92	15	15	16	16	97	97	10	13	13	94	94
Low Byte by Column:
0 (white)	d0	70	d0	70	d0	70	d0	70	d0	70	d0	d0	70	d0	70
0 (white) underline	51	f1	51	f1	51	f1	51	f1	51	f1	51	51	f1	51	f1
0 green	c2	62	c2	62	c2	62	c2	62	c2	62	c2	c2	62	c2	62
0 green underline	43	e3	43	e3	43	e3	43	e3	43	e3	43	43	e3	43	e3
0 blue	c4	64	c4	64	c4	64	c4	64	c4	64	c4	c4	64	c4	64
0 blue underline	45	e5	45	e5	45	e5	45	e5	45	e5	45	45	e5	45	e5
0 cyan	46	e6	46	e6	46	e6	46	e6	46	e6	46	46	e6	46	e6
0 cyan underline	c7	67	c7	67	c7	67	c7	67	c7	67	c7	c7	67	c7	67
0 red	c8	68	c8	68	c8	68	c8	68	c8	68	c8	c8	68	c8	68
0 red underline	49	e9	49	e9	49	e9	49	e9	49	e9	49	49	e9	49	e9
0 yellow	4a	ea	4a	ea	4a	ea	4a	ea	4a	ea	4a	4a	ea	4a	ea
0 yellow underline	cb	6b	cb	6b	cb	6b	cb	6b	cb	6b	cb	cb	6b	cb	6b
0 magenta	4c	ec	4c	ec	4c	ec	4c	ec	4c	ec	4c	4c	ec	4c	ec
0 magenta underline	cd	6d	cd	6d	cd	6d	cd	6d	cd	6d	cd	cd	6d	cd	6d

Columns 4 - 28 (color white, can set underline)

Row:	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
High Byte:	91	91	92	92	15	15	16	16	97	97	10	13	13	94	94
Low Byte by Column:
4	52	f2	52	f2	52	f2	52	f2	52	f2	52	52	f2	52	f2
4 underline	d3	73	d3	73	d3	73	d3	73	d3	73	d3	d3	73	d3	73
8	54	f4	54	f4	54	f4	54	f4	54	f4	54	54	f4	54	f4
8 underline	d5	75	d5	75	d5	75	d5	75	d5	75	d5	d5	75	d5	75
12	d6	76	d6	76	d6	76	d6	76	d6	76	d6	d6	76	d6	76
12 underline	57	f7	57	f7	57	f7	57	f7	57	f7	57	57	f7	57	f7
16	58	f8	58	f8	58	f8	58	f8	58	f8	58	58	f8	58	f8
16 underline	d9	79	d9	79	d9	79	d9	79	d9	79	d9	d9	79	d9	79
20	da	7a	da	7a	da	7a	da	7a	da	7a	da	da	7a	da	7a
20 underline	5b	fb	5b	fb	5b	fb	5b	fb	5b	fb	5b	5b	fb	5b	fb
24	dc	7c	dc	7c	dc	7c	dc	7c	dc	7c	dc	dc	7c	dc	7c
24 underline	5d	fd	5d	fd	5d	fd	5d	fd	5d	fd	5d	5d	fd	5d	fd
28	5e	fe	5e	fe	5e	fe	5e	fe	5e	fe	5e	5e	fe	5e	fe
28 underline	df	7f	df	7f	df	7f	df	7f	df	7f	df	df	7f	df	7f
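The white-text entries in the two tables above follow a regular pattern, so they can be computed rather than looked up. The Python sketch below is my own reading of that pattern: ROW_CODES holds the 7-bit high byte and low-byte base for each row with the parity bit stripped, and pac is a hypothetical helper name. It only covers white text on channel CC1 (with or without underline), not the colored column-0 codes.

```python
# (high byte, low-byte base) for rows 1-15, as 7-bit values without parity.
ROW_CODES = [
    (0x11, 0x40), (0x11, 0x60), (0x12, 0x40), (0x12, 0x60),
    (0x15, 0x40), (0x15, 0x60), (0x16, 0x40), (0x16, 0x60),
    (0x17, 0x40), (0x17, 0x60), (0x10, 0x40), (0x13, 0x40),
    (0x13, 0x60), (0x14, 0x40), (0x14, 0x60),
]

def add_odd_parity(value: int) -> int:
    """Set the high bit when needed to make the count of 1 bits odd."""
    return value if bin(value).count("1") % 2 == 1 else value | 0x80

def pac(row: int, column: int, underline: bool = False) -> str:
    """Return the PAC word (as SCC hex) for white text at row/column."""
    if not 1 <= row <= 15:
        raise ValueError("row must be 1-15")
    if column not in range(0, 32, 4):
        raise ValueError("column must be 0, 4, 8, ... 28")
    high, base = ROW_CODES[row - 1]
    low = base + 0x10 + (column // 4) * 2 + (1 if underline else 0)
    return f"{add_odd_parity(high):02x}{add_odd_parity(low):02x}"
```

For example, pac(15, 20) gives 947a and pac(15, 4) gives 94f2, matching the row 15 entries in the tables.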

If you wish to start the caption on a column not evenly divisible by four, then the PAC is followed by a TO (Tab Over) code: 97a1 to move over one column, 97a2 to move over two columns, or 9723 to move over three columns.
The text of the caption follows. Most of the character set is encoded in a single byte, so two characters are included in a single hexadecimal word. The remaining characters require two bytes. The byte 80h is used as filler and will not cause a space when the caption is displayed. Note that 20h is an "opaque" space (it will wipe out any pre-existing text), while 91b9 is a transparent space. Also note that the character set linked above is approximately the display size and typeface of screen captions. Finally, the third set of characters (labeled "Extended Characters") is not supported by most PC DVD players or by older television sets. ITV uses ISO-8859-1, the standard character set used by web browsers.
The following mid-row commands can also be used for special effects ("no formatting" removes underline, italics, and flash; all PAC commands are assumed to be "no formatting"):

Code	Meaning
9120	change to white, no formatting
91a1	change to white underline
91a2	change to green, no formatting
9123	change to green underline
91a4	change to blue, no formatting
9125	change to blue underline
9126	change to cyan, no formatting
91a7	change to cyan underline
91a8	change to red, no formatting
9129	change to red underline
912a	change to yellow, no formatting
91ab	change to yellow underline
912c	change to magenta, no formatting
91ad	change to magenta underline
91ae	turn on italics
912f	turn on italics and underline
94a8	turn flash on
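The color and italics codes in this table also follow a simple pattern, which the Python sketch below reproduces. The STYLES list and the mid_row function are names I chose for the illustration, and the 7-bit values are my reading of the table with the parity bit removed. The flash command (94a8) does not fit the pattern and is left out.

```python
# Order matters: each style is two codes apart (plain, then underlined).
STYLES = ["white", "green", "blue", "cyan", "red", "yellow", "magenta", "italics"]

def add_odd_parity(value: int) -> int:
    """Set the high bit when needed to make the count of 1 bits odd."""
    return value if bin(value).count("1") % 2 == 1 else value | 0x80

def mid_row(style: str, underline: bool = False) -> str:
    """Return the mid-row command word (as SCC hex) for a style."""
    low = 0x20 + STYLES.index(style) * 2 + (1 if underline else 0)
    return f"{add_odd_parity(0x11):02x}{add_odd_parity(low):02x}"
```

So mid_row("italics") returns 91ae and mid_row("white") returns 9120, the code that switches back to plain white text.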

If the caption to be displayed contains multiple pieces of dialog, then another PAC, another TO, and more text would follow.
To clear the screen in preparation for drawing the caption, the command EDM (Erase Displayed Memory), code 942c, is used.
The word 8080 may be used as filler to time out the frames until the caption needs to be displayed.
Finally, to display the caption in the buffer on the screen, the command EOC (End Of Caption), code 942f, is used.
All of the gaps between timecodes in an SCC file are filled in with the filler word 8080 when the DVD is created. This shortcut keeps SCC files from wasting space on repetitive information.
To erase a caption, use EDM, 942c.
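Putting the steps above together, here is a minimal Python sketch that assembles one pop-on caption line. Everything about its interface is hypothetical (the names pop_on_caption and encode_text, and the choice to pass the PAC and TO words in as ready-made hex strings). It also assumes that letters, digits, and common punctuation share their ASCII codes in the caption character set, and it simply refuses the handful of characters that I know do not.

```python
# Characters whose caption code is NOT the same as ASCII; rejected here.
NOT_ASCII = set("*\\^_`{|}~")

def add_odd_parity(value: int) -> int:
    """Set the high bit when needed to make the count of 1 bits odd."""
    return value if bin(value).count("1") % 2 == 1 else value | 0x80

def encode_text(text: str) -> list:
    """Encode caption text as two-character hex words, padded with 80."""
    codes = []
    for ch in text:
        if not 0x20 <= ord(ch) <= 0x7E or ch in NOT_ASCII:
            raise ValueError(f"character not handled by this sketch: {ch!r}")
        codes.append(add_odd_parity(ord(ch)))
    if len(codes) % 2:
        codes.append(0x80)  # filler byte, shows nothing on screen
    return [f"{codes[i]:02x}{codes[i + 1]:02x}" for i in range(0, len(codes), 2)]

def pop_on_caption(timecode, pac, text, tab_over="", doubled=True):
    """Build one SCC data line: ENM, RCL, PAC, optional TO, text, EDM,
    two filler words, EOC."""
    def cmd(code):
        return [code, code] if doubled else [code]
    words = cmd("94ae") + cmd("9420") + cmd(pac)
    if tab_over:
        words += cmd(tab_over)
    words += encode_text(text)
    words += cmd("942c") + ["8080", "8080"] + cmd("942f")
    return timecode + "\t" + " ".join(words)

line = pop_on_caption("01:03:27:29", "94f2", "HEY, THERE.")
```

Here line reproduces the third data line of the sample file. Captions with more than one piece of dialog would need a second PAC and text run before the EDM, which this sketch does not attempt.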

As an example, here is the sample .SCC file from above, followed by its meaning:

Scenarist_SCC V1.0

01:02:53:14	94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f

01:02:55:14	942c 942c

01:03:27:29	94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f

Timecode 01:02:53:14: Clear buffer (94ae 94ae); start pop-on caption (9420 9420); move cursor to row 15, column 20 (947a 947a); move over 2 more columns to column 22 (97a2 97a2); display "( horn honking )" (a820 68ef f26e 2068 ef6e 6be9 6e67 2029); clear screen (942c 942c); wait 2 frames (8080 8080); and display caption (942f 942f).
Timecode 01:02:55:14: Clear caption (942c 942c).
Timecode 01:03:27:29: Clear buffer (94ae 94ae); start pop-on caption (9420 9420); move cursor to row 15, column 4 (94f2 94f2); display "HEY, THERE." (c845 d92c 2054 c845 5245 ae80--note the 80 used as a spacer); clear screen (942c 942c); wait 2 frames (8080 8080); and display caption (942f 942f).
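The same walk-through can be done mechanically. The Python sketch below strips the parity bit from each byte, drops the second copy of any doubled command, and collects the text in between. The decode_words name and the labels it produces are my own, it only knows the four commands used in the sample plus PAC and TO, and it assumes the text bytes can be read as ASCII (true for this sample, not for the special or extended characters).

```python
COMMANDS = {
    (0x14, 0x2E): "ENM",  # Erase Non-displayed Memory
    (0x14, 0x20): "RCL",  # Resume Caption Loading
    (0x14, 0x2C): "EDM",  # Erase Displayed Memory
    (0x14, 0x2F): "EOC",  # End Of Caption
}

def decode_words(data: str) -> list:
    """Turn the data part of an SCC line into a list of readable events."""
    events, text, last_control = [], [], None
    for word in data.lower().split():
        high = int(word[:2], 16) & 0x7F  # strip the parity bit
        low = int(word[2:], 16) & 0x7F
        if 0x10 <= high <= 0x1F:         # a two-byte command
            if word == last_control:     # doubled command: ignore the repeat
                last_control = None
                continue
            last_control = word
            if text:
                events.append("".join(text))
                text = []
            if (high, low) in COMMANDS:
                events.append(COMMANDS[(high, low)])
            elif high == 0x17 and 0x21 <= low <= 0x23:
                events.append(f"TO{low - 0x20}")
            elif low >= 0x40:
                events.append(f"PAC {word}")
            else:
                events.append(f"?{word}")  # not handled by this sketch
        else:
            last_control = None
            # A zero byte (80 before stripping) is filler and is skipped.
            text.extend(chr(b) for b in (high, low) if b)
    if text:
        events.append("".join(text))
    return events

events = decode_words("94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 "
                      "c845 5245 ae80 942c 942c 8080 8080 942f 942f")
```

For the third line of the sample, events comes out as ENM, RCL, PAC 94f2, the text HEY, THERE., EDM, and EOC, in that order.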

A Technical Explanation of Placement and Format of DVD Closed Caption User Data Packets

Data in MPEG files is organized in terms of packets. DVD closed captions are stored on a per-GOP basis, and are located within the video MPEG-2 file between the GOP Header packet and the (I-frame) Picture Header packet.

Structure of the DVD Closed Caption User Data Packet (all values are in hexadecimal):

Bytes Sample Contents Description

HEADER (9 bytes)

0 - 3 00 00 01 b2 User Data Packet header (never changes).

4 - 7 43 43 01 f8 DVD Closed Caption header (never changes).

8 9b Attributes:

Bits	Sample Contents	Description
0	`1`	Extra Field Flag: whether or not to add an extra field's caption to the end of the caption segments. This is a by-product of analog editing equipment, which occasionally cut scenes between two fields of the same frame. I've also seen it used to create a CC User Data Packet with a length evenly-divisible by 4 (14 frames plus an extra field of captions works out to 96 bytes). Note that the Pattern Flag in the next CC User Data Packet must flip if the Extra Field Flag is set (Otherwise, you'd lose that odd field's worth of data).
1 - 5	`01 101`	Caption Count: How many caption segments in the packet. This is always at least as large as the number of video frames in the GOP (minus 1, when the Extra Field Flag is set), but it can be greater, in which case the extra frames of caption data are not used.
6	`0`	Filler (never changes)
7	`1`	Pattern Flag: Determines if each caption segment is Field 1 followed by Field 2 (`1`) or Field 2 followed by Field 1 (`0`). This also determines what the extra field will be if the Extra Field Flag is set: Field 1 for Pattern Flag 1, or Field 2 for Pattern Flag 0.

CAPTION SEGMENT (6 bytes)--repeat for each frame of GOP

n ff Field (ff = Field 1, fe = Field 2)

n+1 - n+2 94 a3 Caption: Two bytes that are transmitted this field. Use 80 80 if there's nothing to transmit.

n+3 fe Field (always opposite value from above)

n+4 - n+5 01 83 Caption (see above)

EXTRA FIELD (3 bytes)--only if Extra Field Flag is set

m ff Field (ff = Field 1, fe = Field 2)

m+1 - m+2 94 a3 Caption: Two bytes that are transmitted this field. Use 80 80 if there's nothing to transmit.

FOOTER

- x 00 00 00 00 00 00 Padding (repeat 00 byte until packet is evenly divisible by 4)
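To show how the pieces of the packet fit together, here is a Python sketch that assembles one from a list of per-frame caption bytes. It is an illustration under assumptions, not a tested muxer: build_cc_packet is a hypothetical name, and I am assuming that bit 0 in the attribute table means the least significant bit (this choice does reproduce the sample value 9b for 13 segments with both flags set).

```python
HEADER = bytes.fromhex("000001b2434301f8")  # user data header + "CC" header

def build_cc_packet(frames, extra=None, field1_first=True) -> bytes:
    """frames: list of (field1, field2) pairs, each a 2-byte bytes object.
    extra: optional 2-byte bytes object for the extra field."""
    if len(frames) > 31:
        raise ValueError("caption count must fit in five bits")
    # Assumed layout: bit 7 pattern flag, bit 6 filler (0),
    # bits 5-1 caption count, bit 0 extra field flag.
    attributes = ((0x80 if field1_first else 0)
                  | (len(frames) << 1)
                  | (1 if extra is not None else 0))
    first, second = (0xFF, 0xFE) if field1_first else (0xFE, 0xFF)
    packet = bytearray(HEADER)
    packet.append(attributes)
    for field1, field2 in frames:
        a, b = (field1, field2) if field1_first else (field2, field1)
        packet.append(first)
        packet += a
        packet.append(second)
        packet += b
    if extra is not None:
        packet.append(first)  # the extra field follows the pattern flag
        packet += extra
    while len(packet) % 4:    # footer: pad with 00 to a multiple of 4
        packet.append(0x00)
    return bytes(packet)

filler = (b"\x80\x80", b"\x80\x80")
packet = build_cc_packet([filler] * 14, extra=b"\x80\x80")
```

With 14 segments plus an extra field, the packet comes to exactly 96 bytes and needs no padding, which is the fixed-size case mentioned in the attribute table.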

Note that some DVD's create a fixed 96-byte closed caption packet size as described above (by using padding for GOP's below 15 frames and the Extra Field Flag for 15-frame GOP's), but many DVD's do not do this, and the DVD's created by Sonic Scenarist and Spruce DVDMaestro never do this. In these cases, the Extra Field Flag is always 0, the Pattern Flag is always 1 (Field 1 followed by Field 2), and no padding is used at the end of the data packet.

Another item to note is the variation of this format used by a number of MPEG-capturing devices, including Hauppauge's WinTV-250 card and Panasonic's DMR-H50S tabletop DVD recorder (in DVR mode). These devices use ff as the flag for both fields' caption data, relying on the Pattern Flag to tell the fields apart.
