SOD: An Automated Method for Processing Standing Orders at IRIS DMC

Robert Casey, IRIS Data Management Center

The efforts of IRIS over the years have allowed for ever-increasing amounts of seismic information to become available to member institutions and the scientific community at large. This increase in supply and accessibility has ultimately resulted in a very large user base with continuous data retrieval needs. So large is this user interest that personnel associated with the Data Management System (DMS) find themselves constantly pressed to develop new and innovative techniques, including automation, to continue providing a rich array of quality data, made to specification, and delivered with minimal delay.

A new technique has joined the ranks of data request processing methods at the Data Management Center. Called SOD (Standing Order for Data), this system allows researchers to request data before it becomes available at the DMC. A request sent through SOD becomes a standing order: a request that remains active over a period of time, known as the request's shelf life.

In the past, on rare occasions, a user would ask the DMC to hold a request until the requested data became available. This made request processing difficult because the DMC staff had to log the standing order manually in some fashion and then remember to go back to the request when the data finally came in. It is for problems such as this that members of the DMS seek automated solutions. Automation has shown positive results for both the technicians processing data and the users, who want their data error-free and as soon as possible.

SOD is an automated solution to the standing order problem and has many other benefits besides satisfying requests with future data. In effect, a SOD request is satisfied the instant that matching data is archived at the DMC. Thus, a SOD request can also have the benefit of allowing a university to maintain an up-to-date archive of data, pruned to their specifications.

The data cannot, in general, be shipped the instant that it is processed. Since hundreds of matches can occur for just one SOD request during an archiving day, the problems of email flooding and shipping quantity would be staggering if shipments occurred a piece at a time. Hence, it was determined that the data generated by SOD would be bundled together and shipped no more often than once a week. Although this eliminates any semblance of real-time data acquisition, the efficiency and maintainability of this scheme more than make up for the delay.

Requesting Data with SOD

The method of submitting a SOD request is very similar to BREQ_FAST, a simple text-based request format that has been used at the DMC for quite some time. It is not necessary for the reader to be familiar with BREQ_FAST in order to understand SOD. A sample SOD request is illustrated in Figure 1. This request can be created with any text editor. The fields are separated by spaces. Those who are familiar with BREQ_FAST requests will recognize the layout, although it should be stressed that there are distinct differences that BREQ_FAST users should take careful note of (these differences are highlighted in Figure 1).


.SOD
.NAME Chris Smith
.INST University of Kalamazoo
.MAIL 1101 Thurston Blvd., Howell, PA  99111
.EMAIL csmith@kalamazoo.edu
.PHONE 555-555-9999
.FAX 555-555-9991
.MEDIA Exabyte 2 GB
.ALTERNATE MEDIA Exabyte 5 GB
.SHELF LIFE 941007 941229
.SHIPMENT FREQUENCY 2
.LABEL my_request
.END

ANMO  IU  94 09 10  00 00 00.0  94 11 30  23 59 59.9 2 SHZ BHZ
FFC   II  94 08 16  12 00 00.0  94 10 31  14 26 23.5 1 L?E
????? II  94 09 01  02 00 00.0  94 09 30  06 00 00.0 1 B??

Figure 1.  Sample SOD request text file.


A SOD request must always begin with a line containing ".SOD", typed exactly as shown in the example. What follows are fields indicating the name and institution of the user, the mail and email addresses, phone and fax numbers, shipment media selections, the shelf life of the request, the shipment frequency in weeks, and the label for the request. These fields do not need to be in any particular order, but the field names must be spelled correctly and the entries must end with a ".END" line.
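The header rules above can be checked mechanically. The following Python sketch is illustrative only and is not the parser used at the DMC; the function name parse_sod_header and the choice to reject unrecognized field names are assumptions made for this example. The field names are the ones shown in Figure 1.

```python
# Illustrative sketch only; not the parser used at the DMC.
# Field names are taken from Figure 1.
KNOWN_FIELDS = (
    ".NAME", ".INST", ".MAIL", ".EMAIL", ".PHONE", ".FAX",
    ".MEDIA", ".ALTERNATE MEDIA", ".SHELF LIFE",
    ".SHIPMENT FREQUENCY", ".LABEL",
)


def parse_sod_header(text):
    """Return the header fields of a SOD request as a dict (hypothetical helper)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != ".SOD":
        raise ValueError('request must begin with a ".SOD" line')
    fields = {}
    for ln in lines[1:]:
        if ln == ".END":
            return fields
        for name in KNOWN_FIELDS:
            # Require a space after the name so ".MAIL" cannot match ".MAILX".
            if ln.startswith(name + " "):
                fields[name[1:]] = ln[len(name):].strip()
                break
        else:
            raise ValueError("unrecognized header line: %r" % ln)
    raise ValueError('header must end with a ".END" line')


# Fields may appear in any order between ".SOD" and ".END".
header = parse_sod_header("""\
.SOD
.NAME Chris Smith
.SHELF LIFE 941007 941229
.SHIPMENT FREQUENCY 2
.LABEL my_request
.END
""")
print(header["SHELF LIFE"])  # 941007 941229
```

Running a check of this kind before mailing a request would catch a misspelled field name or a missing ".END" line early.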

Finally, the request continues with any number of request lines, specifying the station, network, time window, and channels desired. Three lines are shown in the Figure 1 example. It looks exactly like the BREQ_FAST format, but the way the time windows are interpreted is distinctly different.

The set of numbers on each line can be broken into four groups of three. Group 1 is the starting year, month, and day of the data desired. Group 2 is the starting hours, minutes, and seconds for each matching day (this is different from BREQ_FAST). Group 3 is the ending year/month/day of the data wanted and Group 4 is the ending time of each matching day.

What results is a span of days requested with time windows of data within each of those days. As an example, the FFC line asks for data between August 16th and October 31st of 1994, windowing out data between noon and 2:26 PM from each of those days. Users may find this handy if they are interested in data from only certain times of the day and it helps in cutting down data volume.*

* A caveat for BREQ_FAST users: It is common practice to enter all zeroes for the hour, minute, and second fields of a BREQ_FAST request. This is interpreted as a day boundary for start and end dates of the request. However, since SOD interprets the start and end times as time windows for each day, entering zeroes for starting and ending time would window out zero hours, zero minutes, and zero seconds out of every day in the date span, effectively nullifying the line.
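To make the per-day interpretation concrete, here is a small Python sketch that expands one request line into its daily time windows. It is an illustration of the rule described above and not the DMC's code. The helper names are hypothetical, two-digit years are assumed to fall in the 1900s as in Figure 1, and treating a window whose end time is not later than its start time as empty is an assumption that generalizes the all-zeroes case.

```python
from datetime import date, datetime, time, timedelta


def _clock(hh, mm, ss):
    """Build a time from text fields such as '14', '26', '23.5'."""
    whole, _, frac = ss.partition(".")
    micro = int((frac + "000000")[:6])  # pad the fraction to microseconds
    return time(int(hh), int(mm), int(whole), micro)


def _day(yy, mm, dd):
    # Assumption: two-digit years belong to the 1900s, as in Figure 1.
    return date(1900 + int(yy), int(mm), int(dd))


def parse_request_line(line):
    """Split a request line into station, network, dates, window, channels."""
    f = line.split()
    count = int(f[14])  # number of channel descriptors that follow
    return {
        "station": f[0],
        "network": f[1],
        "first_day": _day(*f[2:5]),       # Group 1
        "window_start": _clock(*f[5:8]),  # Group 2
        "last_day": _day(*f[8:11]),       # Group 3
        "window_end": _clock(*f[11:14]),  # Group 4
        "channels": f[15:15 + count],
    }


def daily_windows(req):
    """Yield one (start, end) datetime pair per day in the date span."""
    if req["window_end"] <= req["window_start"]:
        return  # empty window: the line selects nothing (assumed handling)
    day = req["first_day"]
    while day <= req["last_day"]:
        yield (datetime.combine(day, req["window_start"]),
               datetime.combine(day, req["window_end"]))
        day += timedelta(days=1)


ffc = parse_request_line(
    "FFC   II  94 08 16  12 00 00.0  94 10 31  14 26 23.5 1 L?E")
windows = list(daily_windows(ffc))
print(len(windows))   # 77 days, August 16 through October 31
print(windows[0][1])  # 1994-08-16 14:26:23.500000
```

A line with all zeroes in both time groups yields no windows at all in this sketch, which mirrors the caveat above.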

The last fields on the line are the channel descriptors. The number indicates how many channel descriptors are listed and is followed by the descriptors themselves. What may be apparent is the use of question marks ("?") in the request lines. These are wildcards and essentially match to any character, if present. Thus, "B??" as a channel descriptor asks for all broadband channels. A "?????" in the station field asks for all stations.
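The wildcard rule can be sketched in a few lines of Python. This is one possible reading of "any character, if present" and not the matching code used at the DMC: both strings are padded with spaces to a common width so that "?????" also matches shorter station names such as FFC. The function names are hypothetical.

```python
def wildcard_match(pattern, value):
    """Return True if value fits pattern, where '?' matches any character.

    Assumption: a '?' may also stand for a missing trailing character,
    which is modeled here by padding both strings with spaces.
    """
    width = max(len(pattern), len(value))
    padded_pattern = pattern.ljust(width)
    padded_value = value.ljust(width)
    return all(p == "?" or p == v
               for p, v in zip(padded_pattern, padded_value))


def line_matches(station, network, channel,
                 req_station, req_network, req_channels):
    """Check one data channel against the fields of one request line."""
    return (wildcard_match(req_station, station)
            and wildcard_match(req_network, network)
            and any(wildcard_match(c, channel) for c in req_channels))


print(wildcard_match("B??", "BHZ"))    # True
print(wildcard_match("?????", "FFC"))  # True
# Third line of Figure 1 against an IU station: the network differs.
print(line_matches("ANMO", "IU", "LHZ", "?????", "II", ["B??"]))  # False
```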

After the SOD request is generated, it is mailed to the DMC, specifically to sod@iris.washington.edu. The request is then processed by automated programs and placed on a file system where it can be monitored. When the current date reaches the start time of the request's shelf life, the request becomes active. Only active requests are scanned by SOD when new data are archived at the DMC. All of the data files read in by an archive tape are checked to see if they match to any active SOD requests. If one or more matches are found, a copy of that data file is sent to a temporary file system. Once there, each match spawns its own running copy of the miniSEED generator, using the copied data files as the source.
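The activation rule can be expressed as a short Python sketch. It is not the DMC's implementation: it assumes the .SHELF LIFE values in Figure 1 are YYMMDD dates, that the end date itself still counts as active, and it uses the hypothetical label "pending" for a request whose shelf life has not yet begun.

```python
from datetime import date, datetime


def parse_shelf_life(value):
    """Turn a .SHELF LIFE value such as '941007 941229' into two dates.

    Assumption: both values are YYMMDD. strptime's %y follows the POSIX
    convention, so 69-99 map to 1969-1999.
    """
    start_text, end_text = value.split()
    start = datetime.strptime(start_text, "%y%m%d").date()
    end = datetime.strptime(end_text, "%y%m%d").date()
    return start, end


def request_state(today, start, end):
    """Classify a request relative to its shelf life."""
    if today < start:
        return "pending"   # hypothetical name; not yet scanned by SOD
    if today <= end:
        return "active"    # scanned whenever new data are archived
    return "inactive"      # shelf life exceeded; deleted shortly after


start, end = parse_shelf_life("941007 941229")
print(request_state(date(1994, 10, 6), start, end))   # pending
print(request_state(date(1994, 11, 15), start, end))  # active
print(request_state(date(1994, 12, 30), start, end))  # inactive
```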

What results is a miniSEED volume consisting of many stations and channels, with time windowing, as specified in the user's request. Once a particular matching set is completed, the miniSEED volume is moved to a permanent location in the archiving system. On a weekly basis, the most current miniSEED compilations in the archive are written to tape or moved to the ftp file system, depending on the size of the file, and made available to the user. This periodic flow of data continues until the request exceeds its shelf life, at which point the request becomes inactive and is deleted shortly thereafter.

The SOD system was designed to keep the remote user continually apprised of the status of his requests. Any time the request changes state or a shipment is generated, the user is notified by email. The user can also track the progress of his requests by running finger sod@iris.washington.edu. The table displayed will show all of the SOD requests in the system, their status, the size of the current miniSEED file, and shipment dates.

Making Use of SOD Data

SOD products are playfully called compost files, which illustrates the fact that the data records are lumped and packed together into one file.

If the size of the miniSEED shipment is less than 20 megabytes, it will be moved to the ftp site. The user can then retrieve these data by initiating an anonymous ftp session to dmc.iris.washington.edu. The directory and filename of the user's data shipment are provided in the email notification. The grace period for picking up data from the ftp site is generally about seven days, after which the shipment is removed to make space for new request shipments. If for any reason the user is unable to retrieve the data in time or the retrieved copy is lost, the DMC can restore the data with little difficulty.

For SOD products larger than 20 megabytes, the data are shipped on tape. The tape medium of choice is sometimes specified by the user on the .MEDIA or .ALTERNATE MEDIA lines in the SOD request. The default is 2GB 8mm Exabyte, but 5GB 8mm, DAT, and 1/2 inch tape formats are also supported.
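The delivery rule of the last two paragraphs can be summarized in a short Python sketch. It is illustrative only. The article does not say whether a megabyte is decimal or binary, nor what happens at exactly 20 megabytes, so the constants and the boundary handling below are assumptions, as is the function name.

```python
MEGABYTE = 1_000_000             # assumption: decimal megabytes
FTP_LIMIT_BYTES = 20 * MEGABYTE  # shipments below this go to the ftp site
DEFAULT_MEDIA = "Exabyte 2 GB"   # default tape medium named in the text


def choose_delivery(size_bytes, media=None):
    """Return (method, medium) for a shipment of the given size.

    Assumption: a shipment of exactly 20 megabytes is sent on tape.
    """
    if size_bytes < FTP_LIMIT_BYTES:
        return ("ftp", None)
    return ("tape", media or DEFAULT_MEDIA)


print(choose_delivery(5 * MEGABYTE))                   # ('ftp', None)
print(choose_delivery(300 * MEGABYTE))                 # ('tape', 'Exabyte 2 GB')
print(choose_delivery(300 * MEGABYTE, "Exabyte 5 GB")) # ('tape', 'Exabyte 5 GB')
```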

When the user has the SOD data, it can be read and analyzed by any software that is capable of reading miniSEED. However, it may be desirable to form a full SEED volume from the SOD data. This is performed with the most recent versions of rdseed (versions 3.46 and later), available from IRIS DMC.

In forming a full SEED volume from miniSEED, the data records must be knit together with header information. To supply this, the user must obtain a dataless SEED volume from the IRIS DMC for the network in question. A dataless SEED volume is a SEED volume without the data records. The DMC maintains dataless SEEDs covering all of the stations for each network supported. These SEED volumes can be retrieved by anonymous ftp in /pub/RESPONSES. Some networks have multiple dataless SEEDs listed, each with a date tag. It is usually best to get the most recent one.

Armed with a miniSEED volume, a dataless SEED volume, and the latest version of rdseed, the user can now form a full SEED volume. The current stipulation is that the SEED volume formed can cover only one network, since any one dataless SEED supplied by the DMC applies to only one network. Before running rdseed, the user must set an environment variable called ALT_RESPONSE_FILE to the dataless SEED filename. This tells rdseed that the response information for the data provided by the miniSEED can be found in the file specified.

The user types 'rdseed' on the command line to start up the program and then answers a series of questions to select the nature of processing desired. Enter the name of the SOD volume for the Input File, under Options enter 'd' for data output, and under Output Format enter '5' (the selection number for SEED output). Provided everything is set correctly, the end result is a file called seed.rdseed, which is the output SEED volume produced from merging the data records from the SOD volume with the dataless SEED indicated by the ALT_RESPONSE_FILE variable.
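For users who drive their processing from a script, the environment setup can be sketched in Python. This is a minimal illustration and not an official recipe: the helper names and the dataless filename are hypothetical, and the sketch simply starts rdseed with no arguments so that the questions described above are still answered by hand.

```python
import os
import subprocess


def rdseed_environment(dataless_path, base_env=None):
    """Return a copy of the environment with ALT_RESPONSE_FILE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ALT_RESPONSE_FILE"] = dataless_path
    return env


def run_rdseed_interactively(dataless_path):
    """Start rdseed so it can ask its usual questions at the terminal.

    Assumes an rdseed executable (version 3.46 or later) is on the PATH.
    """
    completed = subprocess.run(
        ["rdseed"], env=rdseed_environment(dataless_path))
    return completed.returncode


# Hypothetical filename for a dataless SEED volume fetched from the DMC.
env = rdseed_environment("IU.dataless", base_env={"PATH": "/usr/bin"})
print(env["ALT_RESPONSE_FILE"])  # IU.dataless
```

Calling run_rdseed_interactively("IU.dataless") would then launch the program with the variable in place, leaving the current shell's environment untouched.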

Items to Consider in Using SOD

The IRIS DMC supports many different request formats, each satisfying a special need. It is important to consider what each of these request formats is intended for, so as to make an informed decision about which tool to use to satisfy your needs. The old adage goes: "The right tool for the right job."

This brings us to the proper and intended use of SOD, as opposed to other request formats. SOD can supply recent data, but only if the data is in the process of being archived. Data that is already in the Data Management System should be requested with BREQ_FAST, XRETRIEVE, or SPROUT. SOD is not currently capable of extracting events, a task better tackled with a RUMBLE request or by extracting FARM products from the DMC's ftp site.

In addition, responsible consideration should be taken when submitting a SOD request. It is not the DMC's intention to let SOD become a mirroring tool for other sites, where all of the data being archived at the DMC gets bundled up and sent off to another group. This is simply too CPU and media intensive to be feasible. However, a SOD request asking for a handful of stations with selective time windows and channels is certainly welcomed. In other words, a good SOD request is one that meets specific aims and where thought is given to minimizing the volume of data produced.

One final point to note is that a SOD request cannot remain in the system indefinitely. There is a current limit of 6 months imposed on the shelf life duration of a SOD request. Within two weeks of the expiration of that request, the user will be notified through email, giving that person the opportunity to submit a new request to take its place if so desired. The old request in the system cannot be renewed and will expire at the end of its shelf life.
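A request's shelf life can be checked against this limit before it is mailed. The Python sketch below is illustrative only; it assumes the limit is counted in calendar months from the start date, which the article does not spell out, and the function names are hypothetical.

```python
import calendar
from datetime import date

MAX_SHELF_LIFE_MONTHS = 6  # current limit stated in the text


def add_months(day, months):
    """Shift a date forward by whole calendar months, clamping the day."""
    index = day.month - 1 + months
    year = day.year + index // 12
    month = index % 12 + 1
    last = calendar.monthrange(year, month)[1]  # days in the target month
    return date(year, month, min(day.day, last))


def shelf_life_allowed(start, end):
    """True if the shelf life runs forward and spans at most six months."""
    return start <= end <= add_months(start, MAX_SHELF_LIFE_MONTHS)


# The shelf life from Figure 1 is well within the limit.
print(shelf_life_allowed(date(1994, 10, 7), date(1994, 12, 29)))  # True
print(shelf_life_allowed(date(1994, 10, 7), date(1995, 6, 30)))   # False
```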

Help Available for Using SOD

There is currently a man page for SOD on our system, retrievable through anonymous FTP in /pub/manuals. This guide lays out the specifics of SOD request design. This manual is also presented in HTML form on our World Wide Web site (http://www.iris.washington.edu) and is accessible through our bulletin board under option 'm'. Any additional questions or requests for help can be sent by email to rob@iris.washington.edu or to tim@iris.washington.edu.

Conclusion

SOD is a new tool among the ranks of data request services provided by the IRIS Data Management System, forging a unique path where users can stay on top of the latest quality-controlled seismic data and do not have to second-guess when new seismic traces become available. The automation involved removes the frustration and uncertainty that would otherwise occur with pending data issues, for both the scientific community and the data management technicians servicing their needs. SOD's credo could be: "You get it, when we get it."

In essence, SOD may be the first step in bringing more automated data solutions to the scientific community in the near future. The success of this tool will largely depend on its responsible use and the satisfaction voiced by its users.

