A Lossless Data Compression

Algorithm for Electro-Cardiograms

 

 

 

 

Presented by Mark Hosang

 

 

 

 

 

Honor’s Thesis

 

 

 

 

Submitted in partial fulfillment of the

Requirements for the degree of

Bachelor of Science in Computer Science at

Brandeis University

 

 

 

 

 

Under the Supervision of

James Storer and

Giovanni Motta

 

 

 

 

 

May 19, 2002

Waltham, Massachusetts

 


      Table of Contents

 

Table of Contents                           ……………………………………… ii

 

 

Abstract                                         ……………………………………… iii

 

 

1.  Introduction                               ……………………………………… 1

 

 

2.  Data Compression Background……………………………………... 2

 

 

3.  Current Work in Area                ……………………………………… 4

 

 

4.  Method                                     ……………………………………… 6

 

 

5.  Results                                      ……………………………………… 9

 

 

6.  Discussion                                 ……………………………………… 10

 

 

7.  Conclusion                                ……………………………………… 11

 

 

8.  Appendix 1:                               ……………………………………… 12

EKG Data Reference Information

 

 

9.  References                                ……………………………………… 13

 


 

Abstract

Within any given set of EKG data there is a high degree of similarity between many of the leads that monitor a patient’s heartbeat.  We present a method that is an improvement over the present data compression algorithms by first performing cross-prediction between the leads that are most highly correlated.  We will then perform auto-correlation on the residuals, and arithmetically encode the residuals derived from the auto-correlation.  We will show an increase in the amount of compression achievable by removing information that is mutual between the leads.

 

1.  Introduction

At the beginning of the home computing era, hard drive space was at a premium; people were very concerned with the amount of space that data occupied on their hard drives.  The simple solution to this dilemma was data compression, the field of compressing data such that it consumes less space.  Data compression algorithms take advantage of the fact that most data is redundant or predictable.  By removing these redundancies, the algorithm is able to reduce the amount of space the data takes up.  Thus, data compression was vital for cost-efficient local storage of data.  In comparison, storage today is comparatively cheap and plentiful.  However, in the rapidly growing digital marketplace, data compression has become far more important for the cost-efficient transfer of data over networks, the internet being a ubiquitous example of one of these networks.

Within the medical sector, there is a device called an Electrocardiogram (EKG or ECG), which is used to monitor a patient’s heartbeat.  It monitors the electrical signal the heart produces each time it beats.  These electrical signals appear similar to sound waves.  EKGs play an instrumental role in the diagnosis of heart abnormalities such as palpitations, arrhythmias and heart disease.  Furthermore, physicians use it to check the current state of the heart, or to monitor if the patient is having a heart attack. 

There are three primary types of EKGs: resting, stress, and ambulatory.  A resting EKG uses twelve leads and is performed at a facility, such as a hospital equipped with an EKG machine.  A stress EKG is a test taken while the patient is on a treadmill to observe the heart’s reaction to physical stress.  The last type, ambulatory EKG is used to examine patient’s daily heart activity.

We are working with the corpus from the PTB ECG reference database in Germany.  The specifics for this corpus are located in Appendix 1 at the end of this document.  Most importantly, we shall note that the data is over-sampled at 10kHz, whereas the actual sample rate range of a standard EKG machine is 125 - 500Hz.  This, however, should not affect the results of our approach since the data points are more correlated and therefore are more compressible.  This compensates for the increase of lead size from over-sampling.  Each data set in the corpus contains each of the twelve leads in the standard EKG.  We would also like to point out that the RAW data samples are stored in 16-bit little-endian byte ordering (stored least significant byte first), though an actual EKG machine only uses 8 – 12 bits.  Furthermore, since the heart is an analog source, there is a certain amount of error in the stored EKG data from digitization error of the analog to digital conversion.

The standard setup for an EKG is twelve receptors attached to different positions on the chest and back regions of the patient.  All twelve receptors, or leads, record the same electrical signals, and therefore all contain the same heartbeat, plus some interference caused by the patient’s body.  For example, any movement the patient may make, such as breathing.  Some types of these interferences can invert the heartbeat, while other types weaken the signal or add noise.  Each lead monitors a specific region of the heart; the standard setup of twelve leads monitors the front and side of the heart and a little bit of the right ventricle. 

The Ambulatory Electrocardiogram is a portable self-contained system that uses Holter recorders with tapeless or solid-state technologies for storage.  Physicians attach these portable EKGs to the patient at the beginning of the day.  The patients then carries out their daily activities and reports to the center at the end of the day.  These new portable monitors allow for better analysis of the condition of the patient’s heart, which is unobtainable from the shorter clinical observations.  However, we should note that the Ambulatory EKG utilizes only two or three leads for monitoring a patient’s heart.  Currently, there is no need to apply more leads since this type of EKG is used only to diagnosis arrhythmias or palpitations.  Because of the duration of the monitoring and the fact that the patient is moving, there are significant muscle artifacts introduced into the signal.  These artifacts make ambulatory EKGs very difficult to interpret.

With the widespread adoption of the internet by the general populace, the medical community has expressed a desire for the ability to transmit data on patients to other colleagues in order to discuss patient diagnoses.  However, raw EKGs are quite large, being comprised of twelve different leads that may span up to 24 hours in the case of the portable monitor.  With a sample rate of 200 samples per second (sps), at 8-bits per sample, a 24-hour recording of all twelve leads would be 192 megabytes.  A file this large would take approximately eight hours to transmit over a 56K modem, or 2 hours over a 25KB/sec cable connection.  This makes it essential to use a data compression algorithm to reduce these long transmission times.

 

2.  Data Compression Background

Correlation is a measure of similarity between two variables.  That is to say, knowing the value of X, how much do you know of Y?  The higher the correlation, the more shared information there is between the variables X and Y.  Thus, a higher correlation indicates greater redundancy.  Moreover, as we have mentioned before, a highly redundant signal is, theoretically, also highly compressible.

One of the tools often used in the field of data compression is a linear predictor.  A linear predictor attempts to use the data elements that precede an element xn to make an educated guess about the value of xn.  It then subtracts this predicted value from the actual value of xn, to get the error from the prediction.  This error is called the residual of the prediction.  To reconstruct the value of xn without any data loss we would only have to encode the error from the prediction.

When talking about linear predictors, people mention the terms “order” or “lag”.  The order or lag of the prediction specifies how many past or future values the prediction is based on.  This is to say, a first order linear predictor would only use xn-1 to predict xn, whereas we would base a second order prediction on xn-1 and xn-2. 

Linear predictors perform best on continuous data, like waveforms.  We can see linear prediction used in the popular audio format known as “Adaptive Differential Pulse Code Modulation” (ADPCM).  The following equation details how the auto-correlation function works mathematically.

 

 

Where Rxx is the autocorrelation function with lag k, E[] is the expectation function of xn knowing xn-k, and M is the total number of data points.  In this particular example, the above equation finds the average weights for the whole data with order k.  From this basic model, we construct a matrix R from which we can determine the values of the weights to use.

 

 

Where N is the order of prediction used.

 

To predict xn, we find the “weights”, that is to say, how much an element contributes to the prediction of xn.  To do this, we create a matrix R of all of the auto-correlations values with lag from 0 to n-1.  We then compute vector P that contains the auto-correlation values from 1 to n.  We then determine the weight vector , which is the size of the lag.  A consists of all weights to be applied to elements being used for prediction.  The sum of these weights is approximately 1, and we derive the predicted value of x from the following equation:

 

 

The cross-correlation function is derived from the auto-correlation function.  However, instead of using xn-1 to predict xn, we use element yn, where y denotes a different data set.  In this case, we can use many different data sets to predict xn, just as a linear predictor can use the preceding elements.  The term lag here has the same meaning as it had in the linear predictor.  That is to say that with a lag of 1, we would use yn-1 to predict xn, and so on.  The advantage of this method is that it is useful with multiple signals from the same source, as this would make all the signals highly correlated.  We illustrate below what the R matrix and P vector look like for computing the cross-correlation with zero lag.

 

 

 

The following graphs (Figure 2-1) are examples of all the leads taken from one of the data sets from the aforementioned EKG corpus.  The actual heartbeat is the predominant feature and is easily visible as a large spike in each of the twelve leads.

 

Figure 2-1 – Example of data on 12 leads of an EKG.

 

3.  Current State of the Art

Current work in the field of EKG compression by Horspool and Windels is a slight variant of the LZ77 sliding-window text compression technique.  They search for a local maximum, which indicates a spike in the EKG data.  They then store the data points for the spike in a buffer.  Next, according to some predefined tolerance value, they compared the buffer and the spike.  If the comparison is within the specified tolerances (with a tolerance of zero designated lossless compression), the algorithm stores the pair - the index in the buffer and the length of the match.  Horspool and Windels quickly realized that when the tolerance was not zero, error propagation occurred.  They addressed this issue by updating the buffer with the values of the current spike.  Horspool and Windels suggest we could achieve better compression by applying this method across the 12 leads simultaneously and using Huffman encoding on the stored pairs of data.  Table 3-1 contains the final compression results of this method for a few tolerances and their root mean squared (RMS) error differences from the original signal.

 

Tolerance

Compression

Ratio

% RMS

Difference

0

3.2

0%

1

12.9

0.3%

2

22.5

0.8%

3

31.4

1.6%

Table 3-1 – Results of ALZ77 method for a

Sampling rate of 250Hz and 8-bit samples. 

 

Hamilton discusses his approach to the problem of the limited bandwidth associated with the Holter recorder using a solid-state storage system.  He proposes a method in which the adaptation of the algorithm is determined on the resulting bitrate.  Specifically, if the current area to be compressed, with a certain quality level, produces a bitrate lower than the specified bitrate (in this case, the bitrate of the solid-state Holter recorder), the quality of the area to be encoded is increased, and the resulting bitrate is re-calculated.  This provides an adaptive method that makes full use of the bandwidth available, while creating a compressed version of the data with the highest possible quality. 

            The algorithm for compression first performs beat detection to construct an average-beat template.  It then performs average-beat subtraction, followed by quantization, and finally differencing (shown below).

 

 

We will briefly mention lossless compression of digital audio in our survey of current methods since EKG data is very similar in form to audio.  The first step in the basic method for lossless audio compression is intrachannel decorrelation, in which we remove all redundancies between the two channels.  One such way to decorrelate the two channels is to perform cross-prediction.  The next step is applying a linear predictor with a quantizer to each channel and then entropy encoding the residuals.  One excellent lossless audio compression scheme is RKAU developed by Malcolm Taylor in 2000.  Since it is the best in terms of compression ratio, we include it in our study.  Unfortunately, Hans and Schafer did not include RKAU compression in their survey of digital audio compression.  However, an independent study compared RKAU to several of the other methods that were included in the survey, such as Shorten by T. Robinson, Wave Arc (WA) by D. Lee, and Sonarc by R. Sprague.  In this independent study, they found RKAU was the better compressor of numerous digital audio samples compared to these and other methods.

Jalaleddine et al.  discuss many different approaches to the problem of EKG data compression in their survey of the current state of the art.  One such method proposed by Sherwood and Rautaharju was to use a linear predictor and encode the difference between the value and the predicted value.  Referring to such coding schemes as “delta coding”, which is present in Differential Pulse Code Modulation or DPCM for short, a type of audio compression.  From this method, Stewart et al. introduced a threshold for three-lead prediction.  When the difference between two samples in a lead exceeds a certain predefined threshold, they record the data and the interval from the last recorded sample; otherwise, Stewart et al. remove that data point.  This results in a compressed file of the change in amplitudes when it is greater than the threshold.  This is similar in effect to a run-length encoder during the silent regions of the wave and a smoothing function during the spike regions.

 

4.  Method

Because of the nature of the data collection from various positions on the human body, certain leads go through certain unknown transformations and as a result may have a different mean value, or may be inverted.  We begin by determining if we need to invert a particular lead in relation to the other leads.  The idea behind inverting a lead is that it may increase the amount of mutual information between the leads, and therefore increase the potential for compression.  As mentioned earlier, the lead could be offset or even amplified, and simply comparing two leads might lead to erroneous results in determining if the lead should be inverted.  We compute the cross-correlation between two leads and take the coefficient, which we normalize, and use this as our basis of comparison between the leads.  We do this for all possible combinations of leads, creating a 12x12 symmetric matrix (Table 4-1).  If we need to invert the lead for higher correlation, the coefficient for that lead will be negative.

 

Inversion Matrix

avf

avl

avr

i

ii

iii

v1

v2

v3

v4

v5

v6

avf

1

0.191

0.240

0.026

0.567

0.537

0.107

0.701

0.688

0.218

0.805

0.540

avl

0.19

1

0.906

0.976

0.699

0.930

0.958

0.174

0.302

0.956

0.120

0.789

avr

0.240

0.906

1

0.976

0.935

0.689

0.901

0.128

0.003

0.851

0.226

0.548

i

0.026

0.976

0.976

1

0.837

0.829

0.952

0.023

0.155

0.925

0.055

0.684

ii

0.567

0.699

0.935

0.837

1

0.389

0.725

0.363

0.247

0.642

0.485

0.268

iii

0.537

0.930

0.689

0.829

0.389

1

0.864

0.412

0.516

0.903

0.404

0.879

v1

0.107

0.958

0.901

0.952

0.725

0.864

1

0.103

0.258

0.970

0.121

0.799

v2

0.701

0.174

0.128

0.023

0.363

0.412

0.103

1

0.980

0.301

0.907

0.661

v3

0.688

0.30

0.003

0.155

0.247

0.516

0.258

0.980

1

0.440

0.923

0.775

v4

0.218

0.956

0.851

0.925

0.642

0.903

0.970

0.301

0.440

1

0.268

0.891

v5

0.805

0.120

0.226

0.055

0.485

0.404

0.121

0.907

0.923

0.268

1

0.675

v6

0.540

0.789

0.548

0.684

0.268

0.879

0.799

0.661

0.775

0.891

0.675

1

Table 4-1 – An example of the inversion matrix created

To be used in creating the maximum spanning tree.

 

After creating the inversion matrix, we proceed to separate which leads we predict from each other.  We sorted, by the correlation coefficients, the upper triangle of the inversion matrix in descending order into a list.  We then move down the list, checking if we had seen one the two leads before, until we have recorded all twelve leads in a matrix (Table 4-2).  We then rescan this matrix and perform swap operations to minimize the number of leads that do not undergo prediction.  Since the correlation coefficients are the same whether lead a is predicted from b or vice versa, the swap function does not alter the order of the matrix.  This creates a maximum spanning tree that we will use to minimize prediction error from cross-correlation.

 

Lead to Predict

Predicted from

Correlation Coefficient

V2

V3

0.9805

V3

V5

0.9234

AVF

V5

0.8055

V6

V3

0.7755

I

V6

0.6842

II

AVF

0.5678

AVR

V6

0.5483

III

AVF

0.5372

V4

V3

0.4400

AVL

V3

0.3020

V1

V3

0.2586

Table 4-2 – The matrix of the edges and weights

Used to create the maximum spanning tree if figure 4-1.

 

Figure 4-1 – Maximum spanning tree

Created from table 4-2

 

We will briefly mention the definition of a maximum spanning tree by defining its parts. We begin by defining a graph as any set of connected or unconnected nodes.  A tree is a graph in which a path from each node can be made to a root node by transversing connections.  A weighted graph is a graph in which each edge, the connection between two nodes, has a “cost” or “weight” associated with it.  A graph is directed if a direction is specified for the edge.  Kennith H. Rosen defines nicely that “a maximum spanning tree of a connected weighted undirected graph is a spanning tree with the largest possible weight.”  Maximum spanning trees are good for when a user wants to find the most optimal way to transverse a set of nodes in relation to weighted value.

Using the spanning tree as a guide for the relationships between the leads, we perform the cross-correlation again, only this time we do not normalize because we are creating the weights for the prediction.  Where X is the data elements of a lead, , we now have the prediction error from the cross-correlation. 

 

 

Figure 4-2 – Comparison of the actual data samples (top)

And the residuals from cross-prediction (bottom)

 

 

Figure 4-2 shows the lead X and the resulting after cross-prediction.  As we can see, there is clearly more of a pattern present after cross correlation.  The cross-predicted set would be an excellent candidate for the average spike removal method mentioned earlier.  We then perform the standard linear predictor on the resultant data set  to finish the compression.

The order of the linear predictor is a trade-off between extra computation and compressibility.  A graph of the sum-squared error for an array of order shows a rapidly diminishing function that tapers off when it reaches some lower limit.  The key to choosing a good order of prediction is choosing an order that occurs when the function begins to flatten out.  In the example below, any order from 6-8 would be suitable for linear prediction.  We use a simple method for determining this critical point.  We take the list of the sum squared elements and find the point at which the slope is less than the average slope of the graph, s.  This point, i, is a good estimate for the order of prediction to use. Figure 4-3 outlines the equations used to find a sufficient order of prediction.

 


 

 

 

A.       

 

B.        

 


Figure 4-3 – In the graph of the means squared error of the auto-correlation, using the average slope function (A) as a tolerance for the localized slope between two adjacent points (B) to determine That 6th order prediction should be sufficient, whereas 8th prediction would be the qualitative “good” order of prediction.

 

5.  Results

            The application of the algorithm set forth in this paper has led to the following results.  The subsequent figure, Figure 5-1, is the mean squared error for each order prediction of the data set after we perform cross-correlation.  If one compares this figure to Figure 4-3, it is evident that this lead would have good prediction with a 5th order predictor.  For a quantitative analysis, the maximum error before cross-correlation (Figure 4-3) is approximately 1.4e5 while after cross-correlation it is 1.5e4.

 

Figure 5-1

 

In the following figure, Figure 5-2, we give a sample of one of the leads in its original, after cross-prediction, and after auto-correlation is performed on the residuals from cross-correlation. 


 

Figure 5-2

 

As a mark of improvement, we have also noted that the original entropy of the leads was approximately 10.9, after only auto-correlation 4.5, and with the method described within, 4.1. 

 

6.  Discussion

            As described in the previous section, the results are very promising.  The fact that the sum squared error was reduced by a factor of ten after cross-correlation is clear evidence that the leads are correlated.  Additionally, Figure 5-2 suggests that some methods mentioned earlier would benefit greatly from cross-correlation, as it appears to make the signal more regular, or less erratic.  We believe this would greatly improve Hamilton’s average-beat template, resulting in better compression.  Furthermore, in the residuals from the auto-correlation function, the spikes and small anomalies in the signal are easily apparent.  The fact that the spikes are evident suggests that our algorithm might benefit from such methods as average beat detection [2] or by having tolerances on the slope of the signal [4].  In comparison to RKAU, on particular leads RKAU could compress to 27.45%, while with just auto-correlation on that lead could result in a maximum compression of 27.53%, and combined with cross-correlation 25.625%.  The theoretical estimate is based on the calculated entropy of the residuals.  Unfortunately, we did not have enough time to explore any entropy encoding methods.  In addition, on a modern personal computer, the compression took six hours.  The reason is that we were working with highly over-sampled data sets.  There were 1,150,000 data points in each lead; whereas a real EKG would only have 28,750.  On the same machine, the execution of the compression algorithm presented would take 5-10 minutes. 

Our approach does not perform any kind of spike detection or noise reduction like some of the aforementioned methods.  Such tools could decrease the amount of error in the prediction, which would increase compressibility as these methods have shown.  Moreover, our current approach only takes into consideration one other lead when performing the cross correlation.  We might be able to reduce the error from the prediction if we use more leads during cross-correlation.  In addition, if we perform the auto-correlation as well as the cross-correlation on a sample at a given time, instead of holistically for the set, we would see further compression.  Localizing the prediction increases the computation time during encoding and decoding, but catches local variations that would otherwise be lost.  This is especially important with our data because it contains long periods of silence, followed by a spike.

Our approach takes advantage of the high correlation between the twelve different leads created from the same input signal, the patient’s heart.   In particular, leads v1-v6 are so similar that we can see the correlation just from observing the waveform.

Though not designed for compression of EKG data, we include the RKAU compression method in our comparison.  However, since we can represent EKG data in a waveform, we can argue that RKAU could achieve better compression than standard text compression techniques such as BZIP2 and GZIP.  The results are not suggestive of RKAU’s full compression range, as it assumes the data is on a stereo channel and therefore does cross channel coding.  We only presented RKAU with each individual lead, so the algorithm performes no cross-correlation.  However, using our sorted inversion matrix we could feed the RKAU compression program two leads that are highly correlated for the same effect as a stereo signal.

 

7.  Conclusion

This new method has introduced a very simple approach to take advantage of the fact that all twelve leads are highly correlated and therefore highly compressible.  This method could also be used as preprocessing for any of the other methods mentioned, improving the effectiveness of those methods.  This method alone has performed better than some of the methods presented here, showing that the field of EKG data compression should look more closely at applying some cross-lead compression to current methods.

 


Appendix 1:  EKG Data Reference Information

 

Sampling rate

10 kHz

High pass filter

none; DC after subtraction of electrode potential

Low pass filter

1 kHz Bessel filter 8th order

Resolution

>= 13 bit, < 4 uV/LSB

System noise

< 8 uVpp or  < 1.5 uVrms 

Record duration

38 s - 120 s

Recorded signals

12 simultaneous recording channels and up to 4 simultaneous references

Recording type

body surface potentials at the electrodes, exact ECG is calculated

ECG lead configuration

Einthoven, Goldberger, Wilson, Frank

Noise

recordings with a noise level set by the biological noise limit

 


 

References

1.  First Principles:  Lossless Compression of Audio.  Robin Whittle.  11 Dec. 2000.  15 Oct 2001 <http://www.firstpr.com.au/audiocomp/lossless/>.

 

2.  Hamilton, Patrick.  “Adaptive Compression of the Ambulatory EKG.”  Biomedical Instrumentation & Technology January/February (1993): 56-63.

 

3.  Hans, Mat and Ronald W. Schafer.  “Lossless Compression of Digital Audio.”  IEEE Signal Processing Magazine.  July (2001): 21-32.

 

4.  Horspool, R. Nigel, and Warren J. Windels.  An LZ approach to ECG Compression.  Dept. of Computer Science, University of Victoria.

 

5.  Jalaleddine, Sateh M. S., Chriswell G. Hutchens, Robert D. Strattan, and William A. Coberly.  “ECG Data Compression Techniques – A Unified Approach.”  IEEE Transactions on Biomedical Engineering Vol. 37, No. 4, April (1990): 329-343.

 

6.  PTB ECG Reference Data Set.  0-version.  Physikalisch-Technische Bundesanstalt.  0-version.  <http://www.lab1033.berlin.ptb.de/8/

83/831/dbaccess/ecgrefdataset.html>.

 

7.  Reddy, B. R. Shankara, David W. Christenson, G. Ian Rowlandson, Christoph Zywietz, Thomas Sheffield, and Christian Brohet. “Data Compression for storage of Resting ECGs Digitized at 500 Samples/Second.”  Biomedical Instrumentation & Technology March/April (1992): 133-149.

 

8.  Sayood, Khalid.  Introduction to Data Compression.  Boston: Morgan Kaufmann, 2000.

 

9.  RKAU Software.  Malcolm Taylor.  Vers. 1.07.  28 Oct. 2000.  15 Oct 2001 <http://rksoft.virtualave.net/index.html>.

 

10.  Rosen, Kenneth H.  Discrete Mathematics and its Applications.  Boston: WCB/McGraw-Hill, 1999.