18 March 1993 RW Busby revised 01 Nov 94
This document describes how to read a REF TEK SCSI disk with a PASSCAL Sun computer and perform some limited maintenance operations to recover disabled disks. The two most common problems with SCSI disks used jointly by the REF TEK system and UNIX are a corrupt label and bad blocks. These can be resolved with UNIX software. A third common problem, a blown disk controller, cannot be fixed except by replacement of the disk controller.
A power supply must be connected to spin the disk, with +12 volts on pin B and the return on pin D; a battery will also work for short periods. Note that this power cable is not the standard REF TEK power cable used at a station. It is either a jumpered power cable or the cable attached to the power supply that PASSCAL normally provides with the computer for reading disks.
The REF TEK disk is connected to a PASSCAL computer via the second SCSI bus controller and is addressed as /dev/sd5c for SCSI ID=1 disks (station disks) or /dev/sd6c for SCSI ID=2 disks (copy disks). A multicolored ribbon cable with a REF TEK SCSI connector is provided for this purpose. The mapping of ID to /dev/sd_c is done in the Sun kernel at boot time and can be configured differently, but these are the Sun defaults. If you are not using a second SCSI bus controller, PASSCAL strongly advises halting the system each time a disk is connected to or disconnected from the system. In this case, the address is /dev/sd1c or /dev/sd2c for ID 1 and 2 respectively. If this seems clumsy, users may create their own name for use in command lines. For example:
>set sdisk=(-d /dev/sd5c)
>ref2segy $sdisk 93:076
The program disk2dat hides most of this complexity by presenting the available disks online as selectable options which can then be copied to tape. A similar wrapper for ref2segy called refin is available which also determines what disk is connected automatically.
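The default name mapping described above can be written down as a small lookup. The following is a minimal sketch in POSIX sh (not the csh syntax of the example above). The function name reftek_dev and its argument order are hypothetical, and it encodes only the Sun default mapping given in this document, which a reconfigured kernel may not follow.

```sh
# Hypothetical helper: map a REF TEK SCSI ID to the default Sun device name.
#   $1 = SCSI ID of the disk (1 = station disk, 2 = copy disk)
#   $2 = SCSI bus (2 = second bus controller, the default; 1 = primary bus)
reftek_dev() {
    id=$1
    bus=${2:-2}
    case "$bus:$id" in
        2:1) echo /dev/sd5c ;;
        2:2) echo /dev/sd6c ;;
        1:1) echo /dev/sd1c ;;
        1:2) echo /dev/sd2c ;;
        *)
            # Anything else is outside the defaults listed in this document.
            echo "reftek_dev: no default mapping for bus $bus, ID $id" >&2
            return 1
            ;;
    esac
}
```

For example, reftek_dev 2 prints /dev/sd6c, and reftek_dev 1 1 prints /dev/sd1c for a station disk on the primary bus.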
The UNIX disk label is a set of parameters written in sectors 0 and 1 of the disk which give the UNIX operating system information about the disk. Sun OS 4.1.1, also known as Solaris 1.0, caches the disk label and does not expect the configuration of the SCSI devices to change without a proper reboot. As a result, connecting a disk without a label, or switching to a different type of disk, may confuse the operating system. If the system recognizes that a disk with a different label is present at an address, e.g. /dev/sd5c, a warning appears in the console window and the command fails; the same command will work if repeated. A more serious case is when the system does NOT recognize that a different disk is online at an address. If the cached label describes a disk smaller than the number of sectors requested (a number read from the actual online disk), the sectors beyond the cached partition size are not transferred and no warning is given. The result is an incomplete dump without any errors. PASSCAL programs such as refdump and ref2segy incorporate a program called cklabel, which updates the kernel label to match the disk actually online. This program should be included in other scripts to be certain the proper label is cached. Note that dkinfo always echoes the cached label, which is not necessarily the label of the disk online.
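The silent truncation described above comes down to a simple comparison. The following is a rough sketch in POSIX sh, assuming the transfer stops exactly at the cached partition size as described. The function name sectors_missed is hypothetical, and the sector counts in the usage note are invented for illustration rather than taken from any real disk.

```sh
# Hypothetical helper: how many sectors are silently dropped when the kernel
# still holds the label of a smaller disk.
#   $1 = sectors requested (read from the disk actually online)
#   $2 = partition size in the cached (possibly stale) label, in sectors
sectors_missed() {
    requested=$1
    cached=$2
    if [ "$requested" -gt "$cached" ]; then
        # Everything beyond the cached partition size is not transferred.
        echo $(( requested - cached ))
    else
        echo 0
    fi
}
```

With made-up numbers, sectors_missed 450000 400000 prints 50000: those sectors would be absent from the dump with no error reported. A result of 0 does not prove the cached label is correct, only that it is not smaller than the request.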
A different label, written by the REF TEK 72A system, is kept in sectors 2 and 3. This is used by the REF TEK System to store the address of the next sector in which to write data. During autodumps this address is updated when the dump completes successfully. Other information is also written in this label, such as a logical end of data for the disk (the last sector the 72A system will write regardless of the actual number of sectors on the disk), and a flag to indicate data is wrapped around the end of disk and continues at the beginning sector addresses, overwriting earlier data. The REF TEK label is refreshed by the program diskclear or a REF TEK “FRMT SCSI / DISK” command. REF TEK CPU Version 2.55 implements the logical end of disk as a fraction of the disk capacity, 0.999 to be exact. diskclear does the same but the numbers are not exactly identical for all disks for some reason.
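As a worked illustration of the 0.999 fraction, the sketch below computes a logical end of data from a sector count in POSIX sh. The function name logical_end is hypothetical, and truncating to a whole sector is an assumption: as noted above, diskclear and the 72A do not always produce identical values, so the rounding used by either is not known here.

```sh
# Hypothetical helper: logical end of data as 0.999 of the disk capacity.
#   $1 = total number of sectors on the disk
# Integer arithmetic truncates the result (an assumption, see the text).
logical_end() {
    total_sectors=$1
    echo $(( total_sectors * 999 / 1000 ))
}
```

For a hypothetical disk of 1000000 sectors this prints 999000, so roughly the last 1000 sectors would never be written by the 72A.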
If a disk fails repeatedly or “can’t be found”, there are three likely causes and a host of unlikely ones. First, check that the disk is connected, that it is powered and spinning, and that you have entered the correct name on the command line. The first likely cause is a blown fuse on the second SCSI bus, indicated by a red LED near the cable port on the Sun chassis. If the fuse has blown, no disks or other devices can be read on this bus until the fuse is replaced. The second likely cause is that the disk has no label or a corrupted label. This can happen if something has been inadvertently written on the first four sectors of the disk. The label is easily reconstructed, see below. The third likely cause is a parity error, usually associated with the disk controller. The Sun SCSI host checks parity and will not accept data from the SCSI bus with incorrect parity. Other SCSI hosts, including the DAS and the ASC-88 SCSI host adaptor for PCs, ignore parity. Hence disks which have operated fine on the DAS sometimes cannot be read on the Sun. Most often, however, a controller problem will prevent the disk from working on either the Sun or the DAS. To date this problem has only been observed on HP2233 and Quantum PD1225 drives, and it seems to be linked with connecting the disk to a live SCSI bus on the Sun. The parity error messages are displayed in the console window and are at times severe enough to crash the system.
If a disk fails midstream with messages about defective blocks, it may be possible first to skip past these blocks in order to salvage the remaining data, and then to reformat the disk to correct the defects. Reformatting the disk destroys any data on it. Note that a UNIX format is very different from the DAS command FRMT SCSI / DISK. The DAS command simply resets the write block pointer in the REF TEK disk label. The UNIX format authorizes the disk controller to remap sectors on the disk.
To salvage data beyond the bad blocks, use the UNIX dd command with skip to jump ahead a number of blocks and count to set how many blocks to copy. This creates files of orphan REF TEK packets which can then be processed with the ref2segy -f option (and some input from the operator). For example:
>dd if=/dev/sd5c of=disk5234.orph1 skip=10346 count=15876 bs=1024
>ref2segy -f disk5234.orph1 93:076
> “missing datastream information, Enter Sample Rate, Data Form, Gain for channels ...”
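It can help to work out which part of the disk a given dd command covers, for instance to choose the skip value for the next salvage attempt. The sketch below, in POSIX sh, does this arithmetic for the skip, count, and bs operands. The function name dd_range is hypothetical, and it only computes offsets without touching any disk.

```sh
# Hypothetical helper: report the region of the input covered by
#   dd skip=$1 count=$2 bs=$3
# skip and count are in units of bs bytes.
dd_range() {
    skip=$1
    count=$2
    bs=$3
    echo "start_byte=$(( skip * bs ))"      # offset of the first byte copied
    echo "length_bytes=$(( count * bs ))"   # size of the output file
    echo "next_skip=$(( skip + count ))"    # first block NOT copied
}
```

For the example above, dd_range 10346 15876 1024 reports a start at byte 10594304, a length of 16257024 bytes, and 26222 as the first block not yet copied.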
To be read on the Sun, a disk needs a label on sectors 0 and 1, both before and after it is formatted. All PASSCAL disks are shipped with Sun labels installed. Disks obtained from REF TEK or from disk manufacturers may not have this label installed. To write a new label, use the Sun OS 4.1.1 command format. The version of format shipped with OS 4.1.2 does not work in the same manner and is difficult to use for labelling, as you must enter each value manually. The 4.1.1 version runs fine under OS 4.1.2 if installed under a new name. Note that writing a new label does not corrupt any data on the disk, nor does it prevent further attempts to label the disk as a different type.
PASSCAL has established labels for each type of disk it owns and placed them in the file format.dat.passcal. This file can be named directly on the command line of format, e.g. >format -x format.dat.passcal, or it can be appended to the existing file /etc/format.dat, in which case one line regarding search paths in the appended portion must be commented out. The PASSCAL computers are shipped with our modified /etc/format.dat. The file format.dat.passcal is available by ftp, either by itself or as part of the normal PASSCAL software distribution.
To label a disk you must know its type. The type is a keyword for a set of parameters, i.e. the name of the label. The label describes physical properties of the disk such as manufacturer, model, number of heads, and number of sectors. Few of these parameters affect SCSI disk operation; most are mainly of historical importance to the Sun OS. The disk type is written on the case of all PASSCAL supplied disks. Determining the type of other disks may require opening the case to inspect the drive, or a trained ear and a good guess.
The disk labelling sequence is as follows:
>format -x format.dat.passcal
select a disk
type of disk [for example, 17. “HP C2233 REF TEK 230”]
label
quit
After you have told the format program the type of disk and asked it to label the disk, you will see a series of illegal requests and backup label failures, culminating in the message “label failed”. Fine. The program attempts to write some bookkeeping information on alternate cylinders at the end of the disk for the benefit of the Sun OS when managing a filesystem on the disk. This is unnecessary and undesirable, since we allow the REF TEK 72A system full use of every sector on the disk. Our labels therefore describe a disk bigger than it really is, to send the format program’s extra labelling off into space. Otherwise you would get illegal requests when attempting to read the last sectors of data written by the REF TEK system, because they would conflict with the backup label addresses. Building a proper UNIX filesystem on one of these disks (i.e. to mount it) may require a different label, but in the few cases we tested no modification was required. Note that the label includes a partition table which specifies a c partition exactly matching the size returned by a SCSI capacity query, and this correctly limits the Sun to the actual size of the disk. Short of a new format program, there is no way to eliminate the warnings and preserve the desired behavior in this labelling scheme. Because the Sun OS caches labels, you are advised to repeat the labelling instructions a few times to be sure the label actually reaches the disk.
A “low level” format is initiated with the Sun format program. Any data on the disk will be lost in the process. The type of SCSI disk must be selected and you should label the disk. Before the formatting begins, it is also necessary to extract the original defect list from the drive for the program’s use. Type def to enter the defect management submenu and then extract the original defect list (type orig) and commit it (type com). Now quit back to the entry level menu and format the disk. A time estimate will be shown for the formatting process. The program can be interrupted with a Ctrl-C after it has begun the first of two test pattern passes. To be sure every sector on the disk is ok, let the program finish. Label the disk once more before exiting from the format program.
The REF TEK disk must now have a REF TEK label written to sectors 2 and 3 of the disk. This can be done with a REF TEK 72A system by selecting FRMT SCSI / DISK or by running the program diskclear on the Sun. Without this REF TEK label, dumps from the DAS RAM to the SCSI disk will fail and the terminal will return the message “Improper Format”.
PASSCAL can recover data from disks with blown SCSI controllers, but the procedure takes a while. The PIC (PASSCAL Instrument Center) replaces the blown controller board with a spare kept on hand for the purpose and gets one last gasp of data from the disk. Even after this, the drive can usually be sent back for warranty repair. For PASSCAL disks, we ask that this be done only by the PIC.