A Lossless Data Compression
Algorithm for Electro-Cardiograms
![]()
Presented by Mark Hosang
Honors Thesis
Submitted in partial fulfillment of the
Requirements for the degree of
Bachelor of Science in Computer Science at
Under the Supervision of
James Storer and
Giovanni Motta
Table
of Contents
Table of Contents ii
Abstract iii
1. Introduction 1
2.
Data Compression Background
... 2
3. Current Work in Area 4
4. Method 6
5. Results 9
6. Discussion 10
7. Conclusion 11
8. Appendix 1: 12
EKG Data Reference Information
9. References 13
Abstract
Within any given set of EKG data there is a high degree of similarity between many of the leads that monitor a patients heartbeat. We present a method that is an improvement over the present data compression algorithms by first performing cross-prediction between the leads that are most highly correlated. We will then perform auto-correlation on the residuals, and arithmetically encode the residuals derived from the auto-correlation. We will show an increase in the amount of compression achievable by removing information that is mutual between the leads.
1. Introduction
At the beginning of the home computing era, hard drive space was at a premium; people were very concerned with the amount of space that data occupied on their hard drives. The simple solution to this dilemma was data compression, the field of compressing data such that it consumes less space. Data compression algorithms take advantage of the fact that most data is redundant or predictable. By removing these redundancies, the algorithm is able to reduce the amount of space the data takes up. Thus, data compression was vital for cost-efficient local storage of data. In comparison, storage today is comparatively cheap and plentiful. However, in the rapidly growing digital marketplace, data compression has become far more important for the cost-efficient transfer of data over networks, the internet being a ubiquitous example of one of these networks.
Within the medical sector, there is a device called an Electrocardiogram (EKG or ECG), which is used to monitor a patients heartbeat. It monitors the electrical signal the heart produces each time it beats. These electrical signals appear similar to sound waves. EKGs play an instrumental role in the diagnosis of heart abnormalities such as palpitations, arrhythmias and heart disease. Furthermore, physicians use it to check the current state of the heart, or to monitor if the patient is having a heart attack.
There are three primary types of EKGs: resting, stress, and ambulatory. A resting EKG uses twelve leads and is performed at a facility, such as a hospital equipped with an EKG machine. A stress EKG is a test taken while the patient is on a treadmill to observe the hearts reaction to physical stress. The last type, ambulatory EKG is used to examine patients daily heart activity.
We are working with the corpus from the PTB ECG reference database in
The standard setup for an EKG is twelve receptors attached to different positions on the chest and back regions of the patient. All twelve receptors, or leads, record the same electrical signals, and therefore all contain the same heartbeat, plus some interference caused by the patients body. For example, any movement the patient may make, such as breathing. Some types of these interferences can invert the heartbeat, while other types weaken the signal or add noise. Each lead monitors a specific region of the heart; the standard setup of twelve leads monitors the front and side of the heart and a little bit of the right ventricle.
The Ambulatory Electrocardiogram is a portable self-contained system that uses Holter recorders with tapeless or solid-state technologies for storage. Physicians attach these portable EKGs to the patient at the beginning of the day. The patients then carries out their daily activities and reports to the center at the end of the day. These new portable monitors allow for better analysis of the condition of the patients heart, which is unobtainable from the shorter clinical observations. However, we should note that the Ambulatory EKG utilizes only two or three leads for monitoring a patients heart. Currently, there is no need to apply more leads since this type of EKG is used only to diagnosis arrhythmias or palpitations. Because of the duration of the monitoring and the fact that the patient is moving, there are significant muscle artifacts introduced into the signal. These artifacts make ambulatory EKGs very difficult to interpret.
With the widespread adoption of the internet by the general populace, the medical community has expressed a desire for the ability to transmit data on patients to other colleagues in order to discuss patient diagnoses. However, raw EKGs are quite large, being comprised of twelve different leads that may span up to 24 hours in the case of the portable monitor. With a sample rate of 200 samples per second (sps), at 8-bits per sample, a 24-hour recording of all twelve leads would be 192 megabytes. A file this large would take approximately eight hours to transmit over a 56K modem, or 2 hours over a 25KB/sec cable connection. This makes it essential to use a data compression algorithm to reduce these long transmission times.
2. Data
Compression Background
Correlation is a measure of similarity between two variables. That is to say, knowing the value of X, how much do you know of Y? The higher the correlation, the more shared information there is between the variables X and Y. Thus, a higher correlation indicates greater redundancy. Moreover, as we have mentioned before, a highly redundant signal is, theoretically, also highly compressible.
One of the tools often used in the field of data compression
is a linear predictor. A linear
predictor attempts to use the data elements that precede an element xn to make an educated guess
about the value of xn. It then subtracts this predicted value
from the actual value of xn,
to get the error from the prediction. This
error is called the residual of the prediction. To reconstruct the value of xn without any data loss we
would only have to encode the error from the prediction.
When talking about linear predictors, people mention the terms order or lag. The order or lag of the prediction specifies how many past or future values the prediction is based on. This is to say, a first order linear predictor would only use xn-1 to predict xn, whereas we would base a second order prediction on xn-1 and xn-2.
Linear predictors perform best on continuous data, like waveforms. We can see linear prediction used in the popular audio format known as Adaptive Differential Pulse Code Modulation (ADPCM). The following equation details how the auto-correlation function works mathematically.
![]()
Where Rxx is the autocorrelation function with lag k, E[] is the expectation function of xn knowing xn-k, and M is the total number of data points. In this particular example, the above equation finds the average weights for the whole data with order k. From this basic model, we construct a matrix R from which we can determine the values of the weights to use.


Where N is the order of prediction used.
To predict xn,
we find the weights, that is to say, how much an element contributes to the
prediction of xn. To do this, we create a matrix R of all of the auto-correlations values
with lag from 0 to n-1. We then compute vector P that contains the auto-correlation values from 1 to n.
We then determine the weight vector
, which is the size of the lag. A
consists of all weights to be applied to elements being used for
prediction. The sum of these weights is
approximately 1, and we derive the predicted value of x from the following equation:
![]()
The cross-correlation function is derived from the
auto-correlation function. However,
instead of using xn-1 to
predict xn, we use element
yn, where y denotes a different data set. In this case, we can use many different data
sets to predict xn, just
as a linear predictor can use the preceding elements. The term lag
here has the same meaning as it had in the linear predictor. That is to say that with a lag of 1, we would
use yn-1 to predict xn, and so on. The advantage of this method is that it is
useful with multiple signals from the same source, as this would make all the signals
highly correlated. We illustrate below
what the R matrix and P vector look like for computing the
cross-correlation with zero lag.
![]()

The following graphs (Figure 2-1) are examples of all the leads taken from one of the data sets from the aforementioned EKG corpus. The actual heartbeat is the predominant feature and is easily visible as a large spike in each of the twelve leads.

Figure 2-1 Example of data on 12 leads of an EKG.
3.
Current work in the field of EKG compression by Horspool and Windels is a slight variant of the LZ77 sliding-window text compression technique. They search for a local maximum, which indicates a spike in the EKG data. They then store the data points for the spike in a buffer. Next, according to some predefined tolerance value, they compared the buffer and the spike. If the comparison is within the specified tolerances (with a tolerance of zero designated lossless compression), the algorithm stores the pair - the index in the buffer and the length of the match. Horspool and Windels quickly realized that when the tolerance was not zero, error propagation occurred. They addressed this issue by updating the buffer with the values of the current spike. Horspool and Windels suggest we could achieve better compression by applying this method across the 12 leads simultaneously and using Huffman encoding on the stored pairs of data. Table 3-1 contains the final compression results of this method for a few tolerances and their root mean squared (RMS) error differences from the original signal.
|
Tolerance |
Compression Ratio |
% RMS Difference |
|
0 |
3.2 |
0% |
|
1 |
12.9 |
0.3% |
|
2 |
22.5 |
0.8% |
|
3 |
31.4 |
1.6% |
Table 3-1 Results of ALZ77 method for a
Sampling rate of 250Hz and 8-bit samples.
The algorithm for compression first performs beat detection to construct an average-beat template. It then performs average-beat subtraction, followed by quantization, and finally differencing (shown below).
![]()
We will briefly mention lossless compression of digital audio in our survey of current methods since EKG data is very similar in form to audio. The first step in the basic method for lossless audio compression is intrachannel decorrelation, in which we remove all redundancies between the two channels. One such way to decorrelate the two channels is to perform cross-prediction. The next step is applying a linear predictor with a quantizer to each channel and then entropy encoding the residuals. One excellent lossless audio compression scheme is RKAU developed by Malcolm Taylor in 2000. Since it is the best in terms of compression ratio, we include it in our study. Unfortunately, Hans and Schafer did not include RKAU compression in their survey of digital audio compression. However, an independent study compared RKAU to several of the other methods that were included in the survey, such as Shorten by T. Robinson, Wave Arc (WA) by D. Lee, and Sonarc by R. Sprague. In this independent study, they found RKAU was the better compressor of numerous digital audio samples compared to these and other methods.
Jalaleddine et al. discuss many different approaches to the problem of EKG data compression in their survey of the current state of the art. One such method proposed by Sherwood and Rautaharju was to use a linear predictor and encode the difference between the value and the predicted value. Referring to such coding schemes as delta coding, which is present in Differential Pulse Code Modulation or DPCM for short, a type of audio compression. From this method, Stewart et al. introduced a threshold for three-lead prediction. When the difference between two samples in a lead exceeds a certain predefined threshold, they record the data and the interval from the last recorded sample; otherwise, Stewart et al. remove that data point. This results in a compressed file of the change in amplitudes when it is greater than the threshold. This is similar in effect to a run-length encoder during the silent regions of the wave and a smoothing function during the spike regions.
4. Method
Because of the nature of the data collection from various positions on the human body, certain leads go through certain unknown transformations and as a result may have a different mean value, or may be inverted. We begin by determining if we need to invert a particular lead in relation to the other leads. The idea behind inverting a lead is that it may increase the amount of mutual information between the leads, and therefore increase the potential for compression. As mentioned earlier, the lead could be offset or even amplified, and simply comparing two leads might lead to erroneous results in determining if the lead should be inverted. We compute the cross-correlation between two leads and take the coefficient, which we normalize, and use this as our basis of comparison between the leads. We do this for all possible combinations of leads, creating a 12x12 symmetric matrix (Table 4-1). If we need to invert the lead for higher correlation, the coefficient for that lead will be negative.
|
Inversion Matrix |
avf |
avl |
avr |
i |
ii |
iii |
v1 |
v2 |
v3 |
v4 |
v5 |
v6 |
|
avf |
1 |
0.191 |
0.240 |
0.026 |
0.567 |
0.537 |
0.107 |
0.701 |
0.688 |
0.218 |
0.805 |
0.540 |
|
avl |
0.19 |
1 |
0.906 |
0.976 |
0.699 |
0.930 |
0.958 |
0.174 |
0.302 |
0.956 |
0.120 |
0.789 |
|
avr |
0.240 |
0.906 |
1 |
0.976 |
0.935 |
0.689 |
0.901 |
0.128 |
0.003 |
0.851 |
0.226 |
0.548 |
|
i |
0.026 |
0.976 |
0.976 |
1 |
0.837 |
0.829 |
0.952 |
0.023 |
0.155 |
0.925 |
0.055 |
0.684 |
|
ii |
0.567 |
0.699 |
0.935 |
0.837 |
1 |
0.389 |
0.725 |
0.363 |
0.247 |
0.642 |
0.485 |
0.268 |
|
iii |
0.537 |
0.930 |
0.689 |
0.829 |
0.389 |
1 |
0.864 |
0.412 |
0.516 |
0.903 |
0.404 |
0.879 |
|
v1 |
0.107 |
0.958 |
0.901 |
0.952 |
0.725 |
0.864 |
1 |
0.103 |
0.258 |
0.970 |
0.121 |
0.799 |
|
v2 |
0.701 |
0.174 |
0.128 |
0.023 |
0.363 |
0.412 |
0.103 |
1 |
0.980 |
0.301 |
0.907 |
0.661 |
|
v3 |
0.688 |
0.30 |
0.003 |
0.155 |
0.247 |
0.516 |
0.258 |
0.980 |
1 |
0.440 |
0.923 |
0.775 |
|
v4 |
0.218 |
0.956 |
0.851 |
0.925 |
0.642 |
0.903 |
0.970 |
0.301 |
0.440 |
1 |
0.268 |
0.891 |
|
v5 |
0.805 |
0.120 |
0.226 |
0.055 |
0.485 |
0.404 |
0.121 |
0.907 |
0.923 |
0.268 |
1 |
0.675 |
|
v6 |
0.540 |
0.789 |
0.548 |
0.684 |
0.268 |
0.879 |
0.799 |
0.661 |
0.775 |
0.891 |
0.675 |
1 |
Table 4-1 An example of the inversion matrix created
To be used in creating the maximum spanning tree.
After creating the inversion matrix, we proceed to separate which leads we predict from each other. We sorted, by the correlation coefficients, the upper triangle of the inversion matrix in descending order into a list. We then move down the list, checking if we had seen one the two leads before, until we have recorded all twelve leads in a matrix (Table 4-2). We then rescan this matrix and perform swap operations to minimize the number of leads that do not undergo prediction. Since the correlation coefficients are the same whether lead a is predicted from b or vice versa, the swap function does not alter the order of the matrix. This creates a maximum spanning tree that we will use to minimize prediction error from cross-correlation.
|
Lead to Predict |
Predicted from |
Correlation Coefficient |
|
V2 |
V3 |
0.9805 |
|
V3 |
V5 |
0.9234 |
|
AVF |
V5 |
0.8055 |
|
V6 |
V3 |
0.7755 |
|
I |
V6 |
0.6842 |
|
II |
AVF |
0.5678 |
|
AVR |
V6 |
0.5483 |
|
III |
AVF |
0.5372 |
|
V4 |
V3 |
0.4400 |
|
AVL |
V3 |
0.3020 |
|
V1 |
V3 |
0.2586 |
Table 4-2 The matrix of the edges and weights
Used to create the maximum spanning tree if figure 4-1.

Figure 4-1 Maximum spanning tree
Created from table 4-2
We will briefly mention the definition of a maximum spanning tree by defining its parts. We begin by defining a graph as any set of connected or unconnected nodes. A tree is a graph in which a path from each node can be made to a root node by transversing connections. A weighted graph is a graph in which each edge, the connection between two nodes, has a cost or weight associated with it. A graph is directed if a direction is specified for the edge. Kennith H. Rosen defines nicely that a maximum spanning tree of a connected weighted undirected graph is a spanning tree with the largest possible weight. Maximum spanning trees are good for when a user wants to find the most optimal way to transverse a set of nodes in relation to weighted value.
Using the spanning tree as a guide for the relationships
between the leads, we perform the cross-correlation again, only this time we do
not normalize because we are creating the weights for the prediction. Where X
is the data elements of a lead,
, we now have the prediction error from the
cross-correlation.

Figure 4-2 Comparison of the actual data samples (top)
And the residuals from cross-prediction (bottom)
Figure 4-2 shows the lead X and the resulting
after cross-prediction. As we can see, there is clearly more of a
pattern present after cross correlation.
The cross-predicted set would be an excellent candidate for the average
spike removal method mentioned earlier. We
then perform the standard linear predictor on the resultant data set
to finish the compression.
The order of the linear predictor is a trade-off between extra computation and compressibility. A graph of the sum-squared error for an array of order shows a rapidly diminishing function that tapers off when it reaches some lower limit. The key to choosing a good order of prediction is choosing an order that occurs when the function begins to flatten out. In the example below, any order from 6-8 would be suitable for linear prediction. We use a simple method for determining this critical point. We take the list of the sum squared elements and find the point at which the slope is less than the average slope of the graph, s. This point, i, is a good estimate for the order of prediction to use. Figure 4-3 outlines the equations used to find a sufficient order of prediction.
A. ![]()
B. ![]()

Figure 4-3 In the graph of the means squared error of the auto-correlation, using the average slope function (A) as a tolerance for the localized slope between two adjacent points (B) to determine That 6th order prediction should be sufficient, whereas 8th prediction would be the qualitative good order of prediction.
5. Results
The
application of the algorithm set forth in this paper has led to the following
results. The subsequent figure, Figure
5-1, is the mean squared error for each order prediction of the data set after
we perform cross-correlation. If one
compares this figure to Figure 4-3, it is evident that this lead would have
good prediction with a 5th order predictor. For a quantitative analysis, the maximum
error before cross-correlation (Figure 4-3) is approximately 1.4e5
while after cross-correlation it is 1.5e4.

Figure 5-1
In the following figure, Figure 5-2, we give a sample of one of the leads in its original, after cross-prediction, and after auto-correlation is performed on the residuals from cross-correlation.

Figure 5-2
As a mark of improvement, we have also noted that the original entropy of the leads was approximately 10.9, after only auto-correlation 4.5, and with the method described within, 4.1.
6. Discussion
As
described in the previous section, the results are very promising. The fact that the sum squared error was
reduced by a factor of ten after cross-correlation is clear evidence that the
leads are correlated. Additionally,
Figure 5-2 suggests that some methods mentioned earlier would benefit greatly
from cross-correlation, as it appears to make the signal more regular, or less
erratic. We believe this would greatly
improve
Our approach does not perform any kind of spike detection or noise reduction like some of the aforementioned methods. Such tools could decrease the amount of error in the prediction, which would increase compressibility as these methods have shown. Moreover, our current approach only takes into consideration one other lead when performing the cross correlation. We might be able to reduce the error from the prediction if we use more leads during cross-correlation. In addition, if we perform the auto-correlation as well as the cross-correlation on a sample at a given time, instead of holistically for the set, we would see further compression. Localizing the prediction increases the computation time during encoding and decoding, but catches local variations that would otherwise be lost. This is especially important with our data because it contains long periods of silence, followed by a spike.
Our approach takes advantage of the high correlation between the twelve different leads created from the same input signal, the patients heart. In particular, leads v1-v6 are so similar that we can see the correlation just from observing the waveform.
Though not designed for compression of EKG data, we include the RKAU compression method in our comparison. However, since we can represent EKG data in a waveform, we can argue that RKAU could achieve better compression than standard text compression techniques such as BZIP2 and GZIP. The results are not suggestive of RKAUs full compression range, as it assumes the data is on a stereo channel and therefore does cross channel coding. We only presented RKAU with each individual lead, so the algorithm performes no cross-correlation. However, using our sorted inversion matrix we could feed the RKAU compression program two leads that are highly correlated for the same effect as a stereo signal.
7. Conclusion
This new method has introduced a very simple approach to take advantage of the fact that all twelve leads are highly correlated and therefore highly compressible. This method could also be used as preprocessing for any of the other methods mentioned, improving the effectiveness of those methods. This method alone has performed better than some of the methods presented here, showing that the field of EKG data compression should look more closely at applying some cross-lead compression to current methods.
Appendix 1: EKG Data Reference Information
|
Sampling rate |
10 kHz |
|
High pass filter |
none; DC after subtraction of electrode potential |
|
Low pass filter |
1 kHz Bessel filter 8th order |
|
Resolution |
>= 13 bit, < 4 uV/LSB |
|
System noise |
< 8 uVpp or < 1.5 uVrms |
|
Record duration |
38 s - 120 s |
|
Recorded signals |
12 simultaneous recording channels and up to 4 simultaneous references |
|
Recording type |
body surface potentials at the electrodes, exact ECG is calculated |
|
ECG lead configuration |
Einthoven, Goldberger, Wilson, Frank |
|
Noise |
recordings with a noise level set by the biological noise limit |
References
1. First
Principles: Lossless Compression of
Audio. Robin Whittle. 11 Dec. 2000.
2. Hamilton, Patrick. Adaptive Compression of the Ambulatory EKG. Biomedical Instrumentation & Technology January/February (1993): 56-63.
3. Hans, Mat and Ronald W. Schafer. Lossless Compression of Digital Audio. IEEE Signal Processing Magazine. July (2001): 21-32.
4.
Horspool, R. Nigel, and Warren J. Windels. An LZ approach to ECG Compression. Dept. of Computer Science,
5. Jalaleddine, Sateh M. S., Chriswell G. Hutchens, Robert D. Strattan, and William A. Coberly. ECG Data Compression Techniques A Unified Approach. IEEE Transactions on Biomedical Engineering Vol. 37, No. 4, April (1990): 329-343.
6. PTB ECG
Reference Data Set. 0-version. Physikalisch-Technische Bundesanstalt. 0-version.
<http://www.lab1033.berlin.ptb.de/8/
83/831/dbaccess/ecgrefdataset.html>.
7. Reddy, B. R. Shankara, David W. Christenson, G. Ian Rowlandson, Christoph Zywietz, Thomas Sheffield, and Christian Brohet. Data Compression for storage of Resting ECGs Digitized at 500 Samples/Second. Biomedical Instrumentation & Technology March/April (1992): 133-149.
8. Sayood,
Khalid. Introduction to Data
Compression.
9. RKAU
Software. Malcolm Taylor. Vers. 1.07.
10. Rosen,
Kenneth H. Discrete Mathematics and
its Applications.