Abstract

A prototype 64 channel Fastbus TDC built at Fermilab is described. The module features a full custom CMOS four channel gated integrator chip. One level of analog buffering at the inputs is implemented on chip. A four event deep output queue at the bus interface allows a high event rate with low dead time. Each channel can record up to two hits per event. With an occupation rate of 10%, the module can operate at 40,000 events per second with dead time on the order of 15%. The TDC operates in common stop mode with a full scale of 1 μsec and a resolution of 1 nsec.

I. INTRODUCTION

As the data rates for high energy physics experiments increase, the presently available data acquisition modules, particularly ADCs and TDCs, are proving inadequate. With the advent of high speed CMOS circuitry, a new generation of high performance data acquisition modules can be built without increasing cost, complexity or power consumption.

Double buffered front ends can substantially reduce deadtime by absorbing trigger rate fluctuations before data is transmitted to the digitizing system. Zero suppression and a compact data format also contribute to efficient use of available system bandwidth. This TDC has a two buffer input queue and a four buffer output queue. After an event is acquired, input buffers can be swapped in about 50 nsec. Two hit capability is implemented with 40 nsec double pulse resolution. Only data from hit channels is digitized and written into the output queue. This queue allows events to be acquired and read out simultaneously. With these two queues, both the on card digitizing system and the crate readout system can be running at nearly maximum capacity with very low deadtime.

Measured speed on the Fastbus with a high speed master (LeCroy 1821) is 100 nsec attachment time and 100 nsec per 32 bit word data transfer. Due to the low occupancy of most wire chamber systems, up to half of the potential bus bandwidth is lost to end effects of the Fastbus protocol yielding a 26 megabyte per second maximum data rate in the TDC crate.

Time is measured using a full custom gated integrator chip implemented in a 2 μm CMOS process. The measured differential non-linearity is 0.6 nsec RMS and integral non-linearity is 3 nsec over a range of 2 μsec. Included on chip are input buffers, multiplicity sum circuits and zero suppression circuitry. The voltages that result from hit channels are routed through two levels of analog multiplexing; the first level on chip and the second selecting one of eight chips. The outputs of the two second level multiplexers are connected to two ADCs where they are digitized to 10 bits of precision.

FIG. 1

TDC BLOCK DIAGRAM

U.S. Government work not protected by U.S. Copyright.

A microprocessor performs calibration to a crystal time standard and stores a list of gain and pedestal values in a small RAM for use by a multiplier accumulator chip (MAC) to correct the digitized data. The output queue is implemented as a dual port RAM with appropriate addressing logic to serve as a circular buffer. With each event stored, the page write pointer is advanced. When the data for one event has been read by Fastbus, an SS = 2 status is returned and the page read pointer is advanced. If the read pointer catches up with the write pointer, the buffer is empty; if the write pointer catches up with the read pointer, the buffer is full.
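The pointer behaviour of the output queue can be illustrated with a small software model. This is only a sketch under stated assumptions: the class and method names are hypothetical, and the `count` field is an illustrative way to tell "full" from "empty" when the two pointers are equal; the paper does not say how the addressing logic resolves that case.

```python
class OutputQueue:
    """Toy model of a four-page circular event buffer (names are hypothetical)."""

    def __init__(self, pages=4):
        self.pages = [None] * pages
        self.write_ptr = 0  # next page to be filled by the digitizer
        self.read_ptr = 0   # next page to be read by the bus master
        self.count = 0      # assumed bookkeeping: pages holding unread events

    def is_empty(self):
        return self.count == 0

    def is_full(self):
        return self.count == len(self.pages)

    def store_event(self, data):
        """Digitizer side: store one event and advance the page write pointer."""
        if self.is_full():
            return False  # write pointer has caught up with the read pointer
        self.pages[self.write_ptr] = data
        self.write_ptr = (self.write_ptr + 1) % len(self.pages)
        self.count += 1
        return True

    def read_event(self):
        """Bus side: return one event and advance the page read pointer."""
        if self.is_empty():
            return None  # read pointer has caught up with the write pointer
        data = self.pages[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.pages)
        self.count -= 1
        return data
```

In this model a fifth `store_event` call without an intervening `read_event` is refused, which corresponds to the buffer full condition.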

One result of using CMOS circuitry wherever possible is a total power consumption of 35 watts, of which about 10 watts is dissipated by the ECL level adapters to the Fastbus. A block diagram of the board is shown in Fig. 1.

II. MODULE DESCRIPTION

Inputs

The inputs are 100 ohm differential ECL. A programmable threshold is provided which compensates for attenuation on long cable runs, thus avoiding the need for line repeaters. The inputs are physically located at the rear of the card on the Fastbus auxiliary connector. It is much easier to replace a module in a crate with rear connector inputs since the inputs do not need to be uncabled.

Gating

The TDC operates in common stop mode only. The input enable and stop is one signal called TGATE. The TDC input is live for the duration of TGATE and uses its trailing edge as common stop. In addition to TGATE, a clear input buffer (FCLEAR) and start conversion (STARTC) are provided. These signals can be picked up either from front panel Lemo inputs on each module or from the Fastbus TR lines to facilitate crate wide gating.

In order to keep the logic of two hit operation simple, the TDC requires a gate in time with the input signals. For installations where such a trigger would be difficult to implement, the buffered inputs allow the use of a gate generated by a pre-trigger. The dead time per pre-trigger is only the time needed to swap buffers (50 nsec).

Buffer Operation

There are two buffers in the input queue. The operation of the queue is such that each TGATE fills the buffer at the front of the queue and each conversion cycle completed or an input FCLEAR empties the buffer at the back of the queue. The operating principle is one of preserving the order of each input control signal independently. As an example the following sequence of control inputs would result in the following actions: two TGATES followed by one FCLEAR and then one STARTC. The two TGATES would fill two buffers with two events. The FCLEAR would clear the first buffer and the STARTC would cause the contents of the second buffer to be digitized. All the possible combinations are distilled into the state diagram shown in Fig. 2.
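The example sequence above can be traced with a simple first-in, first-out model. This is a sketch, not the state machine of Fig. 2: the class and method names are hypothetical, conversion is treated as instantaneous, and a TGATE arriving while both buffers are occupied is assumed to be ignored.

```python
from collections import deque


class InputQueue:
    """Toy FIFO model of the two-buffer input queue (names are hypothetical)."""

    def __init__(self, depth=2):
        self.depth = depth
        self.buffers = deque()  # oldest event sits at the left
        self.log = []           # record of actions taken, for inspection

    def tgate(self, event):
        """A TGATE fills the next free buffer with an event."""
        if len(self.buffers) >= self.depth:
            self.log.append(("dead", event))  # assumed: gate ignored when full
            return False
        self.buffers.append(event)
        self.log.append(("filled", event))
        return True

    def fclear(self):
        """An FCLEAR discards the oldest buffered event."""
        if self.buffers:
            self.log.append(("cleared", self.buffers.popleft()))

    def startc(self):
        """A STARTC digitizes the oldest buffered event and frees its buffer."""
        if not self.buffers:
            return None
        event = self.buffers.popleft()
        self.log.append(("digitized", event))
        return event


q = InputQueue()
q.tgate("event 1")
q.tgate("event 2")
q.fclear()               # discards "event 1"
digitized = q.startc()   # digitizes "event 2"
```

Because each control input acts on the oldest occupied buffer, the order of TGATE, FCLEAR and STARTC signals is preserved independently, as the text describes.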

III. CUSTOM TVC CHIPS

Functional blocks

The chip used on the TDC can be divided into six sections: input flip flops and steering logic, 16 precision switched current sources with integrating capacitors, hit map latches, 16 simple current switches for multiplicity sums, zero suppression logic, and output addressing logic.

Inputs

Each input is attached to the clock input of four D flip flops, two for each buffer. Within each flip flop pair, the Q output of the first flip flop is "and"ed with the 2HIT (two hit enable) line, then attached to the D input of the next flip flop. One flip flop toggles on the first hit, the other on the second. The TGATE signal is attached to the direct clear input of all the flip flops. IBSEL (input buffer select) is used to select flip flop pairs. By having two complete copies of the input section per channel, buffer swapping can be done very quickly. The outputs of each flip flop go to both the hit map latches and the precision current sources. A detail of one input channel is shown in Fig. 3.

FIG. 3
INPUT SECTION FOR ONE CHANNEL

Precision Current Switches

In this sub-circuit, the gate of a large geometry (6000 μm x 10 μm) transistor is used as the integrating capacitor. Simulation predicts a gate capacitance of approximately 70 pF, with minimal dependence of capacitance upon gate voltage in the range of 1.2 to 4 volts. Subsequent indirect measurement has shown a range of values between 60 and 90 pF for the prototype chips.

For maximum speed, the minimum gate width transistors allowed by the process (2 μm) were chosen for the current switch. However, such devices have low output impedance. To get both high speed and high output impedance, a triple cascode current source was used.

During the measuring interval, the output of the current source is directed to the integrating capacitor. The resulting voltage is stored until the FCLEAR signal is asserted shorting the capacitor to a node which is attached to an internal band-gap reference. A transient analysis done with a simulator predicts the following results:

- Current switch rise time (T_r): 5 nsec
- Current switch delay time (T_d): 10 nsec
- Integrator output 0.01% settling to Band-gap node after FCLEAR (T_s): 50 nsec

Measured values are in reasonable agreement with these figures, although most quantities can only be measured indirectly. Fig. 4 shows the slope of the transfer function for 4 of the integrators within a chip. The graph was made by recording the difference in output voltage (ΔV) for a 50 nsec time difference at 38 points across the range of the chip. The differential non-linearity is defined as the RMS width of the histogram of the ΔV values, which in this instance is 0.6 nsec. The integral non-linearity is defined as the largest deviation from a straight line fit. The integral non-linearity tracks very closely from channel to channel, so that if accuracy better than 3 nsec were needed, a fit to the transfer function would yield sub-nanosecond accuracy.

The output of each current switch is isolated from external loads by an operational amplifier configured as a unity gain buffer. The settling time of this amplifier (about 2 μsec) determines how long the external logic must wait after the end of TGATE before digitizing.
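The two non-linearity figures can be computed from a measured transfer function along the following lines. This is a sketch under assumptions: the function names are hypothetical, the input is taken to be a list of equally spaced time intervals with the corresponding integrator output voltages, and both results are converted to nanoseconds by dividing by the fitted slope.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares straight line fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def linearity(times_ns, volts):
    """Return (differential, integral) non-linearity, both in nsec."""
    slope, intercept = fit_line(times_ns, volts)

    # Integral non-linearity: largest deviation from the straight line fit.
    inl = max(abs(v - (slope * t + intercept))
              for t, v in zip(times_ns, volts)) / abs(slope)

    # Differential non-linearity: RMS spread of the step-to-step voltage
    # differences, taken about their mean.
    steps = [b - a for a, b in zip(volts, volts[1:])]
    mean_step = sum(steps) / len(steps)
    rms = math.sqrt(sum((s - mean_step) ** 2 for s in steps) / len(steps))
    return rms / abs(slope), inl
```

For example, calling `linearity` on 38 points spaced 50 nsec apart mirrors the measurement described above; a perfectly linear integrator gives zero for both quantities.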

Hit Latches

The input flip flops contain an up to date map of the hit channels only at the trailing edge of TGATE. To preserve the hit map for use by the multiplicity sums and zero suppression logic, one eight bit latch for each buffer is implemented. The latches are transparent and use TGATE as their latch enable signal. The outputs of the two latches go to an octal 1 of 2 selector and to the 16 multiplicity sum current switches. The control line for the selector comes from a pin labelled OBSel (output buffer select). The selector output goes to the zero suppression logic.

Multiplicity Sums

A set of simple current switches is used for forming hit multiplicity sums. The outputs from eight current switches are wire "or"ed to a node brought to a pin on the chip. There is a separate sum for each of the two input buffers which allows for the double buffering of the trigger processor that makes use of these signals. Since the inputs of the sums come from the latched hit bits, the sums remain valid until a new event is loaded into the corresponding buffer.

Zero Suppression and Output Logic

The zero suppression portion of the chip consists of 8 flip flops with a common clock called NXTHit (next hit) whose outputs go to an 8 line to 3 line priority encoder. The logic for this section is shown in Fig. 5. Data from the hit map latches is asynchronously loaded with the PLoad (parallel load) signal. The address of the next hit channel is encoded as three bits and sent to the output address tri-state buffers, the lower order three bits of the output analog selector and a 3 line to 8 line decoder. The 8 outputs of the decoder are routed to the synchronous reset lines of the flip flops. With each NXTHit, the highest priority address is cleared and the next highest priority hit channel address appears.
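The behaviour of this hit editor can be mimicked in software. The following is a sketch with hypothetical function names; it assumes the lowest numbered channel has the highest priority, which the text does not specify.

```python
def priority_encode(register):
    """8 line to 3 line priority encoder; returns None when no bit is set.

    Assumption: the lowest numbered set bit has the highest priority.
    """
    for address in range(8):
        if register & (1 << address):
            return address
    return None


def scan_hits(hit_map):
    """Return the hit channel addresses in the order the chip would present them."""
    register = hit_map & 0xFF  # PLoad: copy the latched hit map into the flip flops
    addresses = []
    while True:
        address = priority_encode(register)
        if address is None:
            break
        addresses.append(address)
        register &= ~(1 << address)  # NXTHit: the decoder resets that flip flop
    return addresses
```

With hits on channels 1, 4 and 7, `scan_hits(0b10010010)` steps through those three addresses and then stops, so unhit channels cost no readout cycles. Since only the local copy is cleared, the latched hit map itself is left intact and can be scanned again.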

FIG. 5
HIT EDITOR BLOCK DIAGRAM

A carry in and carry out are provided for operating multiple chips on a common bus. The carry out of each chip is priority encoded externally to form the higher order address bits. Carry out is registered which allows it to be synchronously clocked from chip to chip with NXTHit which is common to all the interconnected chips.

The outputs of the 16 op-amp buffers are routed to the inputs of a 1 of 16 analog selector. The low order three of the four selector control lines come from the zero suppression logic; the fourth line is attached to OBSel. An analog voltage and its matching channel address are thus presented for each hit.

Chip packaging

The prototype chips are packaged in a 40 pin dual in-line ceramic package. ES 2 Corporation, the chip manufacturer, has a number of packaging options, including leaded and leadless chip carriers. We chose the D.I.P. package because it was the cheapest. So much board space has already been saved by going to a custom chip that there was no need to get the smallest package possible.

IV. MODULE OPERATION

The support circuitry, as much as the TVC chips, has been designed with the goal of keeping dead time to an absolute minimum. For the expected occupancies, dividing the channels into two halves and using two ADCs converting simultaneously gives a speed increase of about 30% over one ADC. A small state machine is employed to scan the TVCs to find the hit count, which is in turn used for output memory allocation. The chips are specifically designed to have a non-destructive read of the hit map so that they can be read twice: once for scanning and once for conversion. The scan time scales with occupancy at 200 nsec per hit channel in each group of 32 channels.

After the scan finishes, the conversion cycle begins. A large state machine PAL device controls the address lines of the correction RAM, sending the appropriate gain and pedestal values to the MAC inputs. The ADCs run one half conversion time out of phase with respect to each other, which allows one MAC, correction RAM and output RAM to serve both. With an average occupancy of 10%, a full crate of TDCs should be able to read out in less than 20 μsec. With output buffering, the module can acquire and transmit data simultaneously. If Fastbus readout time is less than conversion time, it does not contribute to deadtime.

Timing

A logic analyzer recording of a readout of two closely spaced events is shown in Fig. 6. The upper portion of the figure shows the input and control signals, while the analog output of the TVC chip is shown in the lower section. It can be seen from the figure that the output voltage is proportional to the interval between the arrival of an input signal (labelled W0..W3) and the trailing edge of TGATE (labelled TSTOP). The first event consists of two hits on all four inputs while the second event shows one hit on three inputs. The figure demonstrates the double buffered operation of the TVC chip. The second event is acquired at the same time the first event is being read out.

Microprocessor

A Zilog Z8002 serves as a host processor for an RS-232 diagnostic port and calculates the MAC correction values based on data acquired during a calibration cycle. A summary of the results of the calibration cycle is available to the Fastbus in a "report page" in the output RAM which includes a list of the gain and pedestal values and the relative efficiency of each channel.

Since control operations are not speed critical, a Fastbus write to specific output RAM locations designated as CSR registers interrupts the processor which in turn executes the control register operation. By using the processor as an intermediary between the various control registers on the board and the bus, the interface hardware is significantly simplified, since only memory is attached to the bus. The use of a processor in this way allows arbitrary mapping of control functions in memory, enabling the use of one slave interface and host processor building block on different board designs. The processor takes from two to five μsec to complete CSR commands. The host can check the status of the interrupt by reading back the control bit(s) set in the CSR register. They are cleared by the processor at the end of its service routine.

The microprocessor allows for the possibility of on board diagnostic software to facilitate initial debugging and subsequent maintenance. 16K bytes each of system ROM and RAM are attached to the processor.

Self-Calibration

The TDC uses an input test pulse injector whose timing is based on a crystal oscillator to generate an internal data cycle. An internal cycle is used to set the gain to a fixed one nsec per count. Pedestals may be based on either internal or external signals. An internal cycle sets pedestals such that all offsets are zero. For external cycles, a CSR register is provided for specifying the desired reading for a delay between an externally generated start and stop (typically chamber cathode pulsing). Up to 16384 events can be averaged during the acquisition of calibration data. The processor calculates channel by channel a series of pedestals that bring the actual values into line with the desired values, thereby subtracting out external delays. Since pedestals are calculated using unsigned arithmetic, negative pedestal values are truncated to zero and an error flag is set for the offending channel in the report area.
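The pedestal calculation for an external cycle might look like the following sketch. The function name and the sign convention are assumptions: the pedestal is taken to be the amount subtracted from a gain-corrected reading, so that it equals the averaged reading minus the desired reading. The actual firmware arithmetic is not given in the text.

```python
def compute_pedestals(mean_readings_ns, desired_ns):
    """Return (pedestals, error_flags), one entry per channel.

    mean_readings_ns: averaged, gain-corrected readings from calibration events.
    desired_ns: the desired reading specified through the CSR register.
    Assumption: corrected value = reading - pedestal.
    """
    pedestals = []
    error_flags = []
    for mean in mean_readings_ns:
        pedestal = round(mean - desired_ns)
        if pedestal < 0:
            # Unsigned arithmetic cannot hold a negative pedestal:
            # truncate to zero and flag the channel in the report area.
            pedestals.append(0)
            error_flags.append(True)
        else:
            pedestals.append(pedestal)
            error_flags.append(False)
    return pedestals, error_flags


# Three channels with averaged readings around a desired value of 500 nsec.
pedestals, flags = compute_pedestals([512.4, 530.0, 495.2], 500)
```

In this example the third channel reads below the desired value, so its pedestal is truncated to zero and its error flag is set.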

Data Format

For efficient use of the bandwidth of the data acquisition system, a compact data format is important. The TDC uses a leading wordcount format. The first 16 bit word contains eight bits of wordcount (0 - 128 words gives 129 possibilities to encode), five bits of slot address in the crate, and three user definable bits. Data words consisting of 10 bits of time information and 6 bits of channel number follow. If a channel has two hits, the address is repeated. To demarcate event boundaries in the data stream, an empty module sends a wordcount of zero. All 16 bits can be used for data; no tag bit is necessary for indicating the difference between control information and data. In addition, leading wordcount format is easily handled by processor based systems.
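A leading wordcount stream of this kind can be packed and unpacked as sketched below. The field widths follow the text, but the bit positions within each 16 bit word and all function names are assumptions made for illustration.

```python
def pack_header(wordcount, slot, user=0):
    """Header word. Assumed layout: wordcount[15:8], slot[7:3], user[2:0]."""
    if not (0 <= wordcount <= 128 and 0 <= slot < 32 and 0 <= user < 8):
        raise ValueError("header field out of range")
    return (wordcount << 8) | (slot << 3) | user


def pack_data(channel, time):
    """Data word. Assumed layout: channel[15:10], time[9:0]."""
    if not (0 <= channel < 64 and 0 <= time < 1024):
        raise ValueError("data field out of range")
    return (channel << 10) | time


def encode_module(slot, hits, user=0):
    """Build one module's words from a list of (channel, time) hits."""
    words = [pack_data(channel, time) for channel, time in hits]
    return [pack_header(len(words), slot, user)] + words


def decode_stream(words):
    """Split a crate data stream into (slot, hits) entries, one per module."""
    modules = []
    i = 0
    while i < len(words):
        header = words[i]
        count = header >> 8
        slot = (header >> 3) & 0x1F
        data = words[i + 1:i + 1 + count]
        modules.append((slot, [(w >> 10, w & 0x3FF) for w in data]))
        i += 1 + count  # the wordcount says where the next header starts
    return modules


# A module in slot 5 with a double hit on channel 3, then an empty module.
stream = encode_module(5, [(3, 100), (3, 250), (63, 1023)]) + encode_module(6, [])
decoded = decode_stream(stream)
```

Because the wordcount locates the next header, the decoder needs no tag bit, and the empty module in slot 6 still contributes a single header word with a wordcount of zero.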

Summary

A Fastbus TDC using a full custom 2 μm CMOS time to voltage integrator chip has been built. The board can operate with a deadtime of 15% at event rates of 40 kHz. Additional features unique to this module include front end buffering, a dual ported bus interface and programmable input thresholds. Data compaction and a leading wordcount format have been implemented. The board is less complex and dissipates less power than existing designs.

* Operated by Universities Research Association, Inc. under contract #DE-AC02-76CH03000 with the U.S. Department of Energy