Reftek Data Download

(bcb 2001:221)

 

The primary goal of field data download is to create at least two tape copies of the raw reftek data and to evaluate station performance. A secondary goal is to create a PASSCAL database and archive tapes for passive experiments, and to create shot/receiver gathers for active source experiments. This document addresses the primary goal of ensuring that you have raw reftek data to take home from the field and that you are leaving your stations in good working order.

 

After a service run you will have either a stack of A-05 reftek disks, A-06 or A-07 DAS, or perhaps even Fast Copy System disks. The download guidelines outlined below apply to all of these formats.

 

Throughout this entire process you should keep your eye on the system console (/usr/openwin/bin/cmdtool -C). This window is where all system messages will be redirected. Capturing system messages is very important for diagnosing problems during data download.

 

For each of the following sections there is a corresponding troubleshooting section at the end of this document.

 

Step 1: Powering A Reftek Disk

Before you can begin you will need to supply power to the disk and have the disk spin up. This is typically done with a battery and a jumpered power cable. The standard reftek power cable, 4-pin to 4-pin, has 12V on pin A and ground on pin C. A jumpered reftek power cable has pins A & B connected; this is how a DAS powers a disk, by supplying 12V on pin B. When connecting power to the disk you should verify, by listening (put your ear to the disk), that the disk spins up and the disk actuator activates (a series of clicks).

 

Step 2: Connecting to the SUN & refecho

Now that the disk is powered up you connect it to the SUN with the 19-pin SCSI connector. Once connected you should verify that the disk is recognized by the SUN and by PASSCAL software by using the program refecho. refecho provides the second indication that your data disk is healthy, the first being the proper power-up sequence. refecho should produce the following output (Note: when accessing the reftek disk it is advised to always use the raw device name):

 

<passcal:field> refecho /dev/rsd5c

refecho: Version Number 1999.309

reftek label:

         write sector                     1000

         wrap count                      0

         read sector                      4

         copy sector                     4

         LEOD                            8859935

         Label                              REFTEK ARS DISK

         1stdir sect                       0

         last dir sect                     0

         overwrite enable  0

         num dir blocks   0

         num dir entries   0

         switch sector                   0

 

Important lines in this output are:

write sector: this should roughly coincide with disk usage reported from SCSI status (Note: SCSI status reports KB, refecho reports 512 byte blocks)

read and copy sector: both should be 4

LEOD: Logical End of Data (i.e. total size of disk in 512 byte blocks)
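
The checks listed above can be scripted against saved refecho output. The following is a minimal sketch for a POSIX shell, assuming the output has the layout shown in this section; check_label is a hypothetical helper, not part of the PASSCAL software.

```shell
#!/bin/sh
# Sketch: sanity-check saved refecho output (layout as shown in this section).
# check_label is a hypothetical helper, not a PASSCAL program.

check_label() {
    # stdin: refecho output; prints usage in KB and checks the read/copy pointers
    awk '
        /write sector/ { wr = $NF }
        /read sector/  { rd = $NF }
        /copy sector/  { cp = $NF }
        /LEOD/         { leod = $NF }
        END {
            # refecho counts 512-byte blocks, SCSI status reports KB
            printf "used_kb=%d\n", wr * 512 / 1024
            printf "disk_kb=%d\n", leod * 512 / 1024
            if (rd == 4 && cp == 4) print "pointers=ok"
            else print "pointers=check"
        }'
}

# Values taken from the example output in this section
check_label <<'EOF'
         write sector                     1000
         read sector                      4
         copy sector                     4
         LEOD                            8859935
EOF
```

The used_kb value is the number to compare against the disk usage reported by SCSI status.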

 

Step 3: refdump or disk2dat

At this point you have two options for creating tapes of the raw reftek data: refdump or disk2dat. refdump is a command-line program that will copy a reftek disk to tape or file and little else. refdump requires the user to keep track of tape contents and, if writing multiple files to tape, to use the no-rewind tape device (e.g. /dev/rmt/0n). disk2dat is a graphical front end to refdump that will copy a reftek disk only to tape. disk2dat also maintains an index file for each tape and automatically uses the no-rewind tape device. The one caveat of using disk2dat is that if you have a problem disk (i.e. there are bad blocks and the copy hangs), no data can be added to the tape you are using beyond the damaged file. This is a nuisance for small DAT tapes but a potentially costly problem for DLTs.

 

The two programs are invoked with the following command line arguments (disk2dat requires that the environment variable PDB_RAWTAPEDIR is set. This is where the index files are stored):

 

<passcal:field> refdump /dev/rsd5c /dev/rmt/0n

 

set PDB_RAWTAPEDIR if not already defined

<passcal:field> echo "setenv PDB_RAWTAPEDIR /my_path" >> ~/.cshrc

<passcal:field> source ~/.cshrc

<passcal:field> disk2dat &
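
Because the echo command above appends to ~/.cshrc every time it is run, it can be worth checking whether the setting is already present. This is a minimal sketch for a POSIX shell; add_rawtapedir is a hypothetical helper and /my_path is the placeholder used above.

```shell
#!/bin/sh
# Sketch: add the PDB_RAWTAPEDIR setting to a csh startup file only once.
# add_rawtapedir is a hypothetical helper; /my_path is a placeholder.

add_rawtapedir() {
    rcfile=$1   # e.g. $HOME/.cshrc
    dir=$2      # directory that will hold the disk2dat index files
    if grep 'setenv PDB_RAWTAPEDIR' "$rcfile" >/dev/null 2>&1
    then
        echo "PDB_RAWTAPEDIR already set in $rcfile"
    else
        echo "setenv PDB_RAWTAPEDIR $dir" >> "$rcfile"
        echo "added PDB_RAWTAPEDIR to $rcfile"
    fi
}

# Demonstrate on a scratch file rather than the real ~/.cshrc
demo_rc=/tmp/cshrc_demo.$$
: > "$demo_rc"
add_rawtapedir "$demo_rc" /my_path
add_rawtapedir "$demo_rc" /my_path   # second call leaves the file unchanged
cat "$demo_rc"
rm -f "$demo_rc"
```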

 

Both programs will write to standard output (refdump) or a text window (disk2dat) the DAS number, year:julian day and percent complete of the data being downloaded (Note: disk2dat will most likely not report 100% done even when finished).

e.g.

<passcal:field> refdump /dev/rsd5c /dev/rmt/0n

refdump: Version Number 2001.117 (LargeFile Aware)

7355 01:020     # Percent done 58%

7355 01:020     # Percent done 100%
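
If the refdump output is saved to a file, the progress lines can be summarized afterwards. The sketch below assumes the line layout shown above and that the output was captured with something like `refdump /dev/rsd5c /dev/rmt/0n | tee refdump.log` (the log name is hypothetical); last_percent and das_days are illustrative helpers.

```shell
#!/bin/sh
# Sketch: summarize saved refdump progress output (layout as shown above).
# last_percent and das_days are hypothetical helpers.

last_percent() {
    # stdin: refdump progress lines; prints the last percentage reported
    sed -n 's/.*Percent done \([0-9][0-9]*\)%.*/\1/p' | tail -1
}

das_days() {
    # stdin: refdump progress lines; prints each unique "DAS year:day" pair
    awk '/Percent done/ { print $1, $2 }' | sort -u
}

# Sample lines copied from the example above
sample='7355 01:020     # Percent done 58%
7355 01:020     # Percent done 100%'

echo "$sample" | das_days
echo "last percent: $(echo "$sample" | last_percent)"
```

A last percentage below 100 from refdump is worth a second look; as noted above, disk2dat may not reach 100% even when finished.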

 

Most researchers wish to make two copies of their raw data. If your field computer comes equipped with two tape drives you can run two concurrent refdump or disk2dat jobs to create your duplicate copy. Alternatively, you can run successive tape-writing jobs, or you can use refdump to write the data to disk while your tape job runs and later tar the raw reftek images from disk to tape.
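
The last option (one tape copy plus a concurrent copy to disk, archived to tape later) can be laid out as a short command plan. This sketch only prints the commands and does not run them. It assumes refdump accepts an ordinary file path as its second argument, as the description of refdump suggests; plan_copies and the image path are hypothetical.

```shell
#!/bin/sh
# Sketch: print (do not run) the commands for one tape copy plus a
# concurrent copy to a file, followed by a later tar of that file to tape.
# plan_copies and the image path are hypothetical; writing to a plain file
# with the same argument order as the tape example is an assumption.

plan_copies() {
    disk=$1    # raw reftek disk device
    tape=$2    # no-rewind tape device
    image=$3   # file on the field computer that receives the second copy
    echo "refdump $disk $tape &"
    echo "refdump $disk $image &"
    echo "wait"
    echo "# load the second tape, then:"
    echo "tar cvf $tape $image"
}

plan_copies /dev/rsd5c /dev/rmt/0n /data/raw/das7355.ref
```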

Step 4: QC tools: ref2segy/ref2mseed/ref2log/logview/clockview/pql

While writing your raw reftek images to tape you will probably wish to convert the raw data for QC. The ref2 program suite will convert raw data to segy (ref2segy), mseed (ref2mseed), or generate a log file (ref2log; both ref2segy and ref2mseed also create log files). Programs to run after converting the raw data to trace/log files are: logview, a graphical representation of reftek log messages with ties to the text file; clockview, a graphical representation of timing information; and pql, a waveform viewer.

 

Step 5: diskclear

Once you are satisfied that the data have been successfully backed up you will want to use diskclear to reset the write sector pointer to the beginning of the disk. The program performs the same task as FSC’s Format SCSI.

 

Troubleshooting

Disk Doesn’t Spin

1)    Are your cables properly connected?

2)    Check that your battery reads ~12V, measured without a charger connected to the battery.

3)    Are you using a jumpered power cable?

4)    If it is a reftek SCSI disk (A-05) check the internal fuses on the power supply.

5)    If it is a DAS
a) can you communicate with the instrument?
b) can you spin the disk with FSC (F5-4-1)?
c) do you measure ~12V on the free power port?
d) check internal fuses.

 

Disk Spins but actuator doesn’t sound right

1)    Check your power supply (see above).

2)    This could indicate a damaged disk or SCSI controller. Send to PASSCAL for data recovery.

 

Disk Spins, actuator sounds good, but can’t download or trouble downloading data

1)    Check your power supply (see above).

2)    Check your SCSI connection.

3)    Review console messages and proceed as outlined below.

 

PASSCAL program complains that it cannot open the disk for reading.

e.g.

<passcal:field> refecho /dev/rsd5c

refecho: Version Number 1999.309

Can't open /dev/rdsk/c0t0d0s0 for reading/output

1)    Check your connections.

2)    Are you trying to access the correct disk?

3)    Check for console messages and proceed as outlined below.

4)    The data disk has lost or corrupted its reftek label
a) use diskclear to reinstall a reftek label
b) use diskclear -a to move the write sector pointer to the end of data.

 

Console Messages:

Wrong Magic Number

e.g.

Jul 27 11:08:25 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@1,0 (sd31):

Jul 27 11:08:25 kali    corrupt label - wrong magic number

 

This indicates that the disk’s SUN label is corrupt or out of date. This is easily fixed with the Unix format command. In order to run this command successfully you must first become root.

 

<passcal:field> su

password: root_password

# format

Searching for disks...done

 

AVAILABLE DISK SELECTIONS:

       0. c0t0d0 <ST39140A cyl 17660 alt 2 hd 16 sec 63>

          /pci@1f,0/pci@1,1/ide@3/dad@0,0

       1. c0t1d0 <ST39140A cyl 17660 alt 2 hd 16 sec 63>

          /pci@1f,0/pci@1,1/ide@3/dad@1,0

       2. c1t0d0 <QUANTUM-ATLAS10K2-TY734L-DDD6 cyl 17336 alt 2 hd 20 sec 413>

          /pci@1f,0/pci@1/scsi@1/sd@0,0

       3. c2t1d0 <SEAGATE-ST34520N-1498 cyl 9004 alt 2 hd 4 sec 246>

          /pci@1f,0/pci@1/scsi@2/sd@1,0

Specify disk (enter its number): 3 (NOTE: make sure it’s the reftek disk)

selecting c2t1d0

[disk formatted]

 

 

FORMAT MENU:

        disk       - select a disk

        type       - select (define) a disk type

        partition  - select (define) a partition table

        current    - describe the current disk

        format     - format and analyze the disk

        repair     - repair a defective sector

        label      - write label to the disk

        analyze    - surface analysis

        defect     - defect list management

        backup     - search for backup labels

        verify     - read and display labels

        save       - save new disk/partition definitions

        inquiry    - show vendor, product and revision

        volname    - set 8-character volume name

        !<cmd>     - execute <cmd>, then return

        quit

Disk not labeled. Would you like to label it now? yes

format> q

# exit

<passcal:field>

 

Phase Parity Error

If during a data download you encounter a “Phase Parity Error” you can suspect a problem with your SCSI chain. Possible candidates are loose connectors, damaged cables or damaged SCSI controller. If you suspect the SCSI controller, send the disk to PASSCAL for data recovery.

Disk offline

e.g.

Jul 27 11:01:45 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@1,0 (sd31):

Jul 27 11:01:45 kali    auto request sense failed (reason=unexpected_bus_free)

Jul 27 11:02:01 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@1,0 (sd31):

Jul 27 11:02:01 kali    disk not responding to selection

 

or

Jul 20 11:49:26 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@1,0 (sd31):

Jul 20 11:49:26 kali    disk not responding to selection

Jul 20 11:49:26 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@1,0 (sd31):

Jul 20 11:49:26 kali    offline

 

or

Jul 20 12:08:30 kali unix: /pci@1f,0/pci@1/scsi@2 (glm2):

Jul 20 12:08:30 kali    Cmd (0x61c88) dump for Target 1 Lun 0:

Jul 20 12:08:30 kali unix: /pci@1f,0/pci@1/scsi@2 (glm2):

Jul 20 12:08:30 kali            cdb=[ 0x3 0x0 0x0 0x0 0x14 0x0 ]

Jul 20 12:08:30 kali unix: /pci@1f,0/pci@1/scsi@2 (glm2):

Jul 20 12:08:30 kali    pkt_flags=0x8402 pkt_statistics=0x60 pkt_state=0x7

Jul 20 12:08:30 kali unix: /pci@1f,0/pci@1/scsi@2 (glm2):

Jul 20 12:08:30 kali    pkt_scbp=0x0 cmd_flags=0x2c60

Jul 20 12:08:30 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2 (glm2):

Jul 20 12:08:30 kali    Connected command timeout for Target 1.0

Jul 20 12:08:30 kali unix: WARNING: ID[SUNWpd.glm.cmd_timeout.6017]

Jul 20 12:08:30 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2 (glm2):

Jul 20 12:08:30 kali    Target 1 reducing sync. transfer rate

Jul 20 12:08:30 kali unix: WARNING: ID[SUNWpd.glm.sync_wide_backoff.6014]

Jul 20 12:08:30 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@1,0 (sd31):

Jul 20 12:08:30 kali    auto request sense failed (reason=timeout)

Jul 20 12:08:46 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@1,0 (sd31):

Jul 20 12:08:46 kali    offline

 

These errors indicate that the SUN is not recognizing the reftek disk. If you have determined that you have good power, that the disk is spinning, and that you have good connections then this is a serious problem. It is best to send this disk to PASSCAL for data recovery.
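
When the console log is long, a quick scan for the message types discussed in this section can help decide where to start. The following is a rough sketch only: triage is a hypothetical helper, the suggestions simply restate this document's advice, and the exact wording of the parity message is an assumption (only its name is given above).

```shell
#!/bin/sh
# Sketch: scan console/syslog text for the message types described in
# this section. triage is a hypothetical helper; the parity pattern is a guess
# at the message wording.

triage() {
    # stdin: console or /var/adm/messages lines; prints one hint per type found
    awk '
        /wrong magic number/                 { magic = 1 }
        /[Pp]hase [Pp]arity/                 { parity = 1 }
        /not responding to selection/ || /[ \t]offline[ \t]*$/ { offline = 1 }
        /Error Block:/                       { badblk = 1 }
        END {
            if (magic)   print "label: relabel the disk with format"
            if (parity)  print "parity: check SCSI connectors, cables and controller"
            if (offline) print "offline: check power and connections, else send to PASSCAL"
            if (badblk)  print "badblock: see the Error Block section"
            if (!magic && !parity && !offline && !badblk)
                print "none: no known messages found"
        }'
}

# Sample lines copied from the console examples in this section
triage <<'EOF'
Jul 27 11:08:25 kali    corrupt label - wrong magic number
Jul 20 11:49:26 kali    disk not responding to selection
Jul 20 11:49:26 kali    offline
EOF
```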

 

Error Block

e.g.

Jul 27 10:44:40 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@1,0 (sd31):

Jul 27 10:44:40 kali    Error for Command: read                    Error Level: Retryable

Jul 27 10:44:40 kali unix:      Requested Block: 67024                     Error Block: 67025

Jul 27 10:44:40 kali unix:      Vendor: SEAGATE                            Serial Number: 00X82718   

Jul 27 10:44:40 kali unix:      Sense Key: Media Error

Jul 27 10:44:40 kali unix:      ASC: 0x16 (data sync mark error), ASCQ: 0x0, FRU: 0xd2

Jul 27 10:44:40 kali unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@1,0 (sd31):

Jul 27 10:44:40 kali    Error for Command: read                    Error Level: Fatal

Jul 27 10:44:40 kali unix:      Requested Block: 67024                     Error Block: 67025

Jul 27 10:44:40 kali unix:      Vendor: SEAGATE                            Serial Number: 00X82718   

Jul 27 10:44:40 kali unix:      Sense Key: Media Error

Jul 27 10:44:40 kali unix:      ASC: 0x16 (data sync mark error), ASCQ: 0x0, FRU: 0xd2

 

When your data download is interrupted with block errors there are 3 approaches that you can take to fix the problem or recover data beyond the bad blocks. It should be noted that occasionally attempts to repair bad blocks, especially approach 3, can further damage the disk.

 

1)    If the bad block is not Block 0-3 then the first approach is to try to repair the bad block using the format command (again, you need to be root).

 

<passcal:field>  su

Password:

# format

Searching for disks...done

 

AVAILABLE DISK SELECTIONS:

       0. c0t0d0 <ST39140A cyl 17660 alt 2 hd 16 sec 63>

          /pci@1f,0/pci@1,1/ide@3/dad@0,0

       1. c0t1d0 <ST39140A cyl 17660 alt 2 hd 16 sec 63>

          /pci@1f,0/pci@1,1/ide@3/dad@1,0

       2. c1t0d0 <QUANTUM-ATLAS10K2-TY734L-DDD6 cyl 17336 alt 2 hd 20 sec 413>

          /pci@1f,0/pci@1/scsi@1/sd@0,0

       3. c2t1d0 <SEAGATE-ST34520N-1498 cyl 9004 alt 2 hd 4 sec 246>

          /pci@1f,0/pci@1/scsi@2/sd@1,0

Specify disk (enter its number): 3

selecting c2t1d0

[disk formatted]

 

FORMAT MENU:

        disk       - select a disk

        type       - select (define) a disk type

        partition  - select (define) a partition table

        current    - describe the current disk

        format     - format and analyze the disk

        repair     - repair a defective sector

        label      - write label to the disk

        analyze    - surface analysis

        defect     - defect list management

        backup     - search for backup labels

        verify     - read and display labels

        save       - save new disk/partition definitions

        inquiry    - show vendor, product and revision

        volname    - set 8-character volume name

        !<cmd>     - execute <cmd>, then return

        quit

format> repair

Enter absolute block number of defect: 67025 (NOTE: Error Block from Console)

Ready to repair block 67025? Yes

Repaired Successfully

format> repair

Enter absolute block number of defect:67026 (Bad Block +1)

This block doesn't appear to be bad.  Repair it anyway? No

format> q

# exit

<passcal:field>

 

In this approach you repair the bad block and then try each successive block, incrementing the block number by 1, until you hit a good block. If you are still encountering bad blocks after ~10 iterations it is best to move on to approach 2 or 3.

 

2)    In this approach you will use dd to move past the bad block. Since bad blocks usually come in clusters we will choose an arbitrary number of blocks past the bad block to move before starting our read. In this way we will lose some data, but we will ensure that we are past the problem spot on the disk.

 

For example, our bad block is block 67025 and we choose 20 blocks to skip beyond the bad block. Then we would use the following:

Seek=int((67025+20)/2) (Note: This must be an even number)

dd if=/dev/rsd5c of=ourfile_name bs=1024 iseek=33522

 

If you know how much data to expect you can use the count option (see dd man page) to limit the dump. In the above example dd will copy from $seek to the end of the disk.
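
The seek arithmetic above can be sketched in a POSIX shell as follows. seek_1024 and count_1024 are hypothetical helpers. The count calculation assumes the data end at the write sector reported by refecho and that the disk has not wrapped; the write sector value of 200000 is made up for illustration.

```shell
#!/bin/sh
# Sketch: compute dd seek/count values in 1024-byte blocks from
# 512-byte block numbers. Helper names and the write sector are hypothetical.

seek_1024() {
    # $1 = bad block (512-byte blocks), $2 = blocks to skip (512-byte blocks)
    echo $(( ($1 + $2) / 2 ))
}

count_1024() {
    # $1 = write sector from refecho (512-byte blocks), $2 = seek (1024-byte blocks)
    # Assumes data end at the write sector and the disk has not wrapped.
    echo $(( ($1 + 1) / 2 - $2 ))
}

seek=$(seek_1024 67025 20)
echo "dd if=/dev/rsd5c of=ourfile_name bs=1024 iseek=$seek"

count=$(count_1024 200000 "$seek")
echo "dd if=/dev/rsd5c of=ourfile_name bs=1024 iseek=$seek count=$count"
```

The first echo reproduces the dd command from the example; the second shows how a count could be added when the amount of data is known.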

Often if a disk has bad blocks there will be several clusters. If the above dd command encounters another bad block further into the disk it is often useful to run this command in a script. Below is an example.

#!/bin/sh

#

# script to pull data off disks with bad blocks

# bcb

# 2001:152

#

# first gather some data

# this allows user to define:

#       PREFIX: for file naming

#       BLOCK: where to start (e.g. first bad block encountered)

#       INTERVAL: number of blocks past bad block to seek

#       MAX: maximum number of iterations before quitting

#

echo "File name prefix: \c"

read PREFIX

echo "Starting integer for file numbering: \c"

read i

echo "Starting block [512 block - use 4 for new disk]: \c"

read BLOCK

echo "Number of 1024 blocks past bad block to skip: \c"

read INTERVAL

echo "Maximum number of iterations: \c"

read MAX

#

# some starting values

ITER=0

LASTBAD=0

#

# loop until $MAX is reached or we have more than

# one instance with the same bad BLOCK

#

while [ $ITER -le $MAX ]

do

  if [ $LASTBAD -eq $BLOCK ]

  then

     echo "

      Last Bad Block $LASTBAD == Current Bad Block $BLOCK

      Assuming that we have either reached the end of

      the disk or a spot we can't get past

      EXIT"

     exit

  fi

#

# test if first iteration and we want the beginning of the

# disk

#

  if [ $BLOCK -le 4 ] && [ $ITER -eq 0 ]

  then

     seek=`expr $BLOCK / 2`

  else

    seek=`expr $BLOCK / 2`

    seek=`expr $seek + ${INTERVAL}`

  fi

#

# now dd data off disk

#

  echo

  echo "#####################ITERATION ${ITER}########################"

  echo "bad block (512): $BLOCK"

  echo "seeking to block (1024): $seek"

  echo "dd if=/dev/rsd5c of=${PREFIX}.dd${i} bs=1024 iseek=$seek"

  dd if=/dev/rsd5c of=${PREFIX}.dd${i} bs=1024 iseek=$seek

#

# test if file is of 0 size

# if yes remove

#

  size=`ls -s ${PREFIX}.dd${i} | awk '{print $1}'`

  if [ $size  -eq  0 ]

   then

      echo "***ZERO SIZE FILE***"

      echo removing ${PREFIX}.dd${i} of $size size

      rm ${PREFIX}.dd${i}

  fi

#

# get location that file croaked

# we are assuming there has been at least one Error Block

# reported to the syslog otherwise why run this puppy?

  LASTBAD=$BLOCK

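  BLOCK=`grep "Error Block" /var/adm/messages | tail -1 | awk '{print $NF}'`

# stop if no Error Block has been logged; the tests above need a number

  if [ -z "$BLOCK" ]

  then

     echo "No Error Block found in /var/adm/messages. EXIT"

     exit 1

  fi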

#

# increment counters

#

  ITER=`expr $ITER + 1`

  i=`expr $i + 1`

done
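
The script relies on pulling the most recent Error Block number out of /var/adm/messages. The sketch below isolates that step so it can be tried on a saved copy of the log before running the full script. last_error_block is a hypothetical helper and the scratch file stands in for the real log.

```shell
#!/bin/sh
# Sketch: extract the most recent "Error Block" number from a log file.
# last_error_block is a hypothetical helper; the scratch file stands in
# for /var/adm/messages.

last_error_block() {
    # $1 = log file to search
    grep "Error Block" "$1" | tail -1 | awk '{print $NF}'
}

# Sample line copied from the Error Block console example
demo_log=/tmp/messages_demo.$$
cat > "$demo_log" <<'EOF'
Jul 27 10:44:40 kali    Error for Command: read                    Error Level: Retryable
Jul 27 10:44:40 kali unix:      Requested Block: 67024                     Error Block: 67025
EOF

block=$(last_error_block "$demo_log")
case $block in
    ''|*[!0-9]*) echo "no usable Error Block found" ;;
    *)           echo "last Error Block: $block" ;;
esac
rm -f "$demo_log"
```

Checking that the result is a plain number before using it avoids passing an empty value to expr or dd.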

 

 

3)    The last approach is the most time consuming and potentially the most damaging to the disk. Here we will run a test from the format command that will analyze and repair bad blocks across the entire disk. This process can take ~4-6 hours so it is best to run it overnight. Again, you must be root.

 

<passcal:field> su

Password:

# format

Searching for disks...done

 

AVAILABLE DISK SELECTIONS:

       0. c0t0d0 <ST39140A cyl 17660 alt 2 hd 16 sec 63>

          /pci@1f,0/pci@1,1/ide@3/dad@0,0

       1. c0t1d0 <ST39140A cyl 17660 alt 2 hd 16 sec 63>

          /pci@1f,0/pci@1,1/ide@3/dad@1,0

       2. c1t0d0 <QUANTUM-ATLAS10K2-TY734L-DDD6 cyl 17336 alt 2 hd 20 sec 413>

          /pci@1f,0/pci@1/scsi@1/sd@0,0

       3. c2t1d0 <SEAGATE-ST34520N-1498 cyl 9004 alt 2 hd 4 sec 246>

          /pci@1f,0/pci@1/scsi@2/sd@1,0

Specify disk (enter its number): 3

selecting c2t1d0

[disk formatted]

 

FORMAT MENU:

        disk       - select a disk

        type       - select (define) a disk type

        partition  - select (define) a partition table

        current    - describe the current disk

        format     - format and analyze the disk

        repair     - repair a defective sector

        label      - write label to the disk

        analyze    - surface analysis

        defect     - defect list management

        backup     - search for backup labels

        verify     - read and display labels

        save       - save new disk/partition definitions

        inquiry    - show vendor, product and revision

        volname    - set 8-character volume name

        !<cmd>     - execute <cmd>, then return

        quit

format> analyze

 

 

ANALYZE MENU:

        read     - read only test   (doesn't harm SunOS)

        refresh  - read then write  (doesn't harm data)

        test     - pattern testing  (doesn't harm data)

        write    - write then read      (corrupts data)

        compare  - write, read, compare (corrupts data)

        purge    - write, read, write   (corrupts data)

        verify   - write entire disk, then verify (corrupts data)

        print    - display data buffer

        setup    - set analysis parameters

        config   - show analysis parameters

        !<cmd>   - execute <cmd> , then return

        quit

analyze> test    

Ready to analyze (won't harm data). This takes a long time,

but is interruptable with CTRL-C. Continue? yes