
Wednesday, July 26, 2017

Labtech Cardiospy


Abstract of PhD Thesis: Intelligent Data Processing and Its Applications

Aniko Szilvia Vagner



1 Introduction

Nowadays, the rapidly increasing performance of hardware and efficient intelligent algorithms enable us to store and process big data. This trend opens up more and more opportunities to extract information from large amounts of data. My thesis is only a precursor in this field, because I did not have sufficient hardware and had only a small amount of data to process. Nevertheless, all the topics of my thesis belong to intelligent data processing.

In Chapter 2 of my thesis I introduce a new clustering algorithm named GridOPTICS, whose goal is to accelerate the well-known OPTICS density-based clustering technique. Density-based clustering techniques are capable of recognizing arbitrarily shaped clusters in a point set. DBSCAN produces only one set of clusters, whereas OPTICS generates a reachability plot from which many cluster sets can be read without executing the whole algorithm again. In my experience, OPTICS is very slow for large data sets, so I wanted to find a way to accelerate it. To check whether GridOPTICS is faster than OPTICS, I executed both algorithms on several point sets.
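Reading one cluster set off an existing reachability plot is a single linear pass, which is why OPTICS can return many clusterings without being re-executed. The sketch below follows the ExtractDBSCAN-Clustering procedure described by Ankerst et al. (1999); the function name, the NOISE = -1 convention and the toy values are assumptions for illustration, not code from the thesis. The values were computed by hand for the 1-D points 0, 1, 2, 50, 51, 52 and 200, taking the core distance as the distance to the nearest other point.

```python
import math

NOISE = -1  # label for points that belong to no cluster

def extract_dbscan(order, reachability, core_distance, eps_prime):
    """Read a DBSCAN-like cluster set off an OPTICS reachability plot.

    order          point indices in the order OPTICS processed them
    reachability   reachability distance of each point (math.inf = undefined)
    core_distance  core distance of each point (math.inf = not a core point)
    eps_prime      density threshold; must not exceed the eps used by OPTICS
    Returns one cluster label per point, indexed by point id.
    """
    labels = [NOISE] * len(order)
    cluster_id = NOISE
    for p in order:
        if reachability[p] > eps_prime:
            # p is not reachable from the points processed before it
            if core_distance[p] <= eps_prime:
                cluster_id += 1            # p is a core point: open a new cluster
                labels[p] = cluster_id
            else:
                labels[p] = NOISE
        else:
            labels[p] = cluster_id         # p continues the current cluster
    return labels

# Toy reachability plot (hand-computed, see above); one pass per threshold.
order = [0, 1, 2, 3, 4, 5, 6]
reach = [math.inf, 1, 1, 48, 1, 1, 148]
core = [1, 1, 1, 1, 1, 1, 148]
for eps_prime in (5, 100):
    print(eps_prime, extract_dbscan(order, reach, core, eps_prime))
# 5 [0, 0, 0, 1, 1, 1, -1]
# 100 [0, 0, 0, 0, 0, 0, -1]
```

A low threshold separates the two groups, a higher one merges them, and the isolated point remains noise in both cases; neither call touches the original coordinates.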

In Chapter 3 of my thesis I introduce two new modules of the Cardiospy system of Labtech Ltd. On these two projects I worked together with Istvan Juhasz, Laszlo Farkas, Peter Toth, and four students of the university: Jozsef Kuk, Adam Balazs, Bela Vamosi, and David Angyal. Bela Kincs, who was the executive of Labtech Ltd., wanted the Cardiospy system to be improved. He and his team surveyed what the users demanded in this area and how their software could be made better. Labtech Ltd. and the University of Debrecen worked together in two projects. In both cases Labtech had early solutions for the algorithms, but they were slow, gave insufficient results, or produced results that could not be validated. Moreover, there were no visualization tools for either problem. The tasks of the University of Debrecen team were to provide a fast algorithm and to create an interactive visualization interface for each problem.

The goal of the first module of Cardiospy is to cluster and visualize long (up to 24-hour) ECG recordings, because the manual evaluation of long recordings is a lengthy and tedious task. During this project I realized that it would be an interesting topic in its own right, independently of ECG signals, to find out how OPTICS can be accelerated with a grid clustering method.

The goal of the second module of Cardiospy is to calculate and visualize the steps of the blood pressure measurement and the resulting blood pressure values. The recordings (which can contain a sequence of measurements) are collected by a microcontroller, but this module runs on a PC. With the help of the application, physicians can recognize the types of errors in the measurements and can also find the noisy measurements.

In Chapter 4 I describe how I applied an active learning method in a database programming course. I taught Oracle SQL and PL/SQL in the Advanced DBMS 1 course and noticed that the students did not practice at home. The prerequisites of this course are the Programming language and the Database systems courses, so the students are not absolute beginners in the field. I wanted to make the students try out the programming tools on their own, but with the help of the teacher.

To support the active learning method, an application had to be built. The application helps the teacher organize and monitor the tasks and the students' solutions. Moreover, the application verifies the syntax of a solution before the student uploads it: if the syntax is wrong, the student cannot upload the solution. This feature makes the teacher's task easier.
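The abstract does not describe how the syntax check is implemented. The following is a minimal sketch of such an upload gate. Because the course uses Oracle SQL and PL/SQL, a real check would need an Oracle parser; here SQLite's parser (through Python's built-in sqlite3 module and an EXPLAIN prefix, which compiles a statement without running it) is only a runnable stand-in. The function names are hypothetical.

```python
import sqlite3

def has_syntax_error(statement: str) -> bool:
    """Return True if SQLite's parser rejects a single SQL statement.

    SQLite is only a stand-in for the Oracle parser the course would need.
    EXPLAIN makes SQLite compile the statement without executing it.
    Errors such as "no such table" are raised after parsing, so they are
    treated as semantic problems, not syntax errors.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("EXPLAIN " + statement)
    except sqlite3.OperationalError as exc:
        message = str(exc)
        # the exact wording depends on the SQLite version
        return "syntax error" in message or "incomplete input" in message
    finally:
        conn.close()
    return False

def try_upload(solution: str, accepted: list) -> bool:
    """Hypothetical upload gate: store the solution only if it parses."""
    if has_syntax_error(solution):
        return False          # the student must fix the syntax first
    accepted.append(solution)
    return True

uploads = []
print(try_upload("SELECT 1;", uploads))   # True
print(try_upload("SELEC 1;", uploads))    # False
```

Only syntax is checked, so a solution that refers to a missing table is still accepted; judging whether it is correct remains the teacher's task.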

To evaluate whether the active learning method is effective, I gathered and examined the results of the students during the three years in which I used this method.

2 New results

The abstract of the thesis presents the new results grouped into four main statements. The first statement deals with a clustering method, and the second presents the clustering of ECG signals, which can be considered an application of the GridOPTICS clustering method. The third statement introduces the visualization of the steps of the blood pressure measurement, whereas the last statement demonstrates how the students' solutions can easily be managed when an active learning method is used to teach database programming.

2.1 A clustering algorithm

Cluster analysis is an important research field of data mining and is applied in many other disciplines, such as pattern recognition, image processing, machine learning, bioinformatics, information retrieval, artificial intelligence, marketing, and psychology. The density-based clustering approach is capable of finding arbitrarily shaped clusters, but it has a disadvantage: it is hard to choose parameter values for which the algorithm gives an appropriate result (Gan et al., 2007). The OPTICS clustering algorithm (Ankerst et al., 1999) gives not just one result but a set of results. It builds a reachability plot: it orders the input points and assigns a reachability distance to each of them. Based on the reachability plot, many clustering results can be produced. Building the reachability plot is slow, but reading the clusters from it is fast.
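The slow part, building the reachability plot, can be sketched as follows, using the definitions of Ankerst et al. (1999). Each processed core point lowers the reachability distance max(core distance, distance) of its unprocessed neighbours, and the next point processed is always the one with the smallest reachability distance. This is a minimal sketch, not the thesis implementation: the function name and the convention that a point counts as its own neighbour are assumptions. The neighbour search is brute force, so every point is compared with every other point (n² distance computations), which illustrates why OPTICS becomes slow on large data sets.

```python
import heapq
import math

def optics(points, eps, min_pts):
    """Minimal OPTICS: returns (order, reachability, core_distance).

    Undefined distances are math.inf. A point counts as its own neighbour,
    so min_pts=2 means "at least one other point within eps".
    """
    n = len(points)

    def neighbours(i):
        # brute-force eps-neighbourhood as (distance, index) pairs
        result = []
        for j in range(n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                result.append((d, j))
        return result

    reach = [math.inf] * n
    core = [math.inf] * n
    processed = [False] * n
    order = []

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]          # min-heap keyed by reachability
        while seeds:
            _, p = heapq.heappop(seeds)
            if processed[p]:
                continue                     # outdated heap entry
            processed[p] = True
            order.append(p)
            nbrs = neighbours(p)
            if len(nbrs) < min_pts:
                continue                     # not a core point: no expansion
            core[p] = sorted(d for d, _ in nbrs)[min_pts - 1]
            for d, q in nbrs:
                if not processed[q]:
                    new_reach = max(core[p], d)
                    if new_reach < reach[q]:
                        reach[q] = new_reach
                        heapq.heappush(seeds, (new_reach, q))
    return order, reach, core

pts = [(0,), (1,), (2,), (50,), (51,), (52,), (200,)]
order, reach, core = optics(pts, eps=1000, min_pts=2)
print(order)  # [0, 1, 2, 3, 4, 5, 6]
print(reach)  # [inf, 1.0, 1.0, 48.0, 1.0, 1.0, 148.0]
```

The valleys of the reachability values (1.0) mark the two dense groups, and the peaks (48.0, 148.0) mark the jumps between them.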

OPTICS has a limitation: its high complexity makes it very slow for large datasets (Yue et al., 2007; Schneider and Vlachos, 2013).

Statement A - The GridOPTICS clustering algorithm: I introduced a new clustering algorithm named GridOPTICS, which is a combination of a grid clustering technique and the OPTICS algorithm. For large input point sets, the GridOPTICS algorithm works with insignificant information loss and can be one or more orders of magnitude faster than the OPTICS algorithm (Vagner, in press).

The main idea of the GridOPTICS algorithm is to reduce the number of input points with a grid technique and then to execute the OPTICS algorithm on the grid structure. Based on the reachability plot, the clusters of the grid structure can be determined. Finally, the input points are assigned to these clusters. The experimental results show that GridOPTICS can be one or more orders of magnitude faster than OPTICS, which is very useful for large data sets. However, they also show that GridOPTICS is less accurate than OPTICS.
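The reduction and the final assignment step can be sketched as follows. The abstract does not specify the grid details, so several things here are assumptions: square cells of side `cell_size`, the cell centroid as each cell's representative, and the function names. The cell labels are hard-coded where the OPTICS run on the representatives and the reading of its reachability plot would be. With brute-force neighbour search, OPTICS then needs on the order of m² distance computations for m occupied cells instead of n² for n points. Because all points of a cell always receive the same label, the sketch also shows where the loss of accuracy comes from.

```python
import math
from collections import defaultdict

def grid_reduce(points, cell_size):
    """Replace the input points with one representative per non-empty grid cell.

    Returns (cells, representatives, counts, point_cell):
      cells            sorted keys of the non-empty cells (tuples of ints)
      representatives  centroid of the points falling into each cell
      counts           number of points in each cell
      point_cell       for every input point, the index of its cell in `cells`
    """
    def cell_of(p):
        return tuple(math.floor(c / cell_size) for c in p)

    members = defaultdict(list)
    for i, p in enumerate(points):
        members[cell_of(p)].append(i)

    cells = sorted(members)
    index = {key: k for k, key in enumerate(cells)}
    representatives, counts = [], []
    for key in cells:
        idx = members[key]
        dim = len(points[idx[0]])
        representatives.append(
            tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dim)))
        counts.append(len(idx))
    point_cell = [index[cell_of(p)] for p in points]
    return cells, representatives, counts, point_cell

def assign_to_clusters(point_cell, cell_labels):
    """Last step: every input point inherits the cluster label of its cell."""
    return [cell_labels[k] for k in point_cell]

points = [(0.25, 0.25), (0.75, 0.25), (0.5, 0.75),
          (5.25, 5.25), (5.75, 5.75), (9.5, 0.5)]
cells, reps, counts, point_cell = grid_reduce(points, cell_size=1.0)
print(cells, counts)      # [(0, 0), (5, 5), (9, 0)] [3, 2, 1]
# Stand-in for running OPTICS on `reps` and reading the clusters off the
# reachability plot: two clusters and one noise cell (hypothetical labels).
cell_labels = [0, 1, -1]
print(assign_to_clusters(point_cell, cell_labels))   # [0, 0, 0, 1, 1, -1]
```

The cell counts are returned because a variant could use them as weights. Whether GridOPTICS does so is not stated in the abstract.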

2.2 Cardiology information system for ECG signals

The big data problem also appears in the medical area. Without intelligent information systems, physicians cannot e[…] One of the modules of Cardiospy is the ECG clustering module.

Statement B - Clustering and visualization of ECG signals: We developed the ECG clustering and visualization module of the Cardiospy software. The goal of the module is to cluster and visualize long (up to 24-hour) ECG recordings. In this way, cardiologists can more easily find the heartbeats that differ morphologically from normal beats (Vagner et al., 2011 A).

On this project I worked together with Laszlo Farkas (Labtech Ltd.), Istvan

Juhasz (Faculty of Informatics, University of Debrecen), and two students from

the Faculty of Informatics, University of Debrecen, Jozsef Kuk and Adam Balazs.

My contribution to this project was to implement the clustering algorithm and make it fast. The clustering algorithm is a special, simplified version of the GridOPTICS algorithm. I also contributed to […]

2.3 Cardiology information system for blood pressure measurement

In public health care, the result of an oscillometric blood pressure measurement is very commonly calculated by a microcontroller. A microcontroller has only limited resources, such as memory and processing power, and it can give only little feedback about the measurement. As a consequence, the result can be imprecise, and the patient and the physician are not informed appropriately (Sorvoja, 2006).

Cardiospy software has another module, the blood pressure measurement module. It receives the recordings collected by the microcontroller. A recording can contain a single measurement or a sequence of measurements taken over 24 hours. Because Cardiospy runs on a PC, the algorithm can use more resources (memory and processing power), which makes it faster and more precise. Additionally, the module can visualize the whole process of the measurement.

Statement C - Visualization of off-line processing of blood pressure measurements: We developed the blood pressure measurement module of the Cardiospy software. The goal of the module is to calculate and visualize the blood pressure values (Vagner et al., 2014).

The module determines the blood pressure values using an oscillometric blood pressure measurement algorithm. The application visualizes the result of each step of the algorithm. Based on the characteristics of the measurement, the algorithm decides whether the result is acceptable and authentic.
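The abstract does not name the oscillometric algorithm used in Cardiospy. For illustration, the sketch below implements the widely described maximum-amplitude (fixed-ratio) method. The mean arterial pressure is taken as the cuff pressure at which the oscillation amplitude peaks. The systolic and diastolic pressures are the cuff pressures above and below that point where the amplitude envelope falls to fixed fractions of the peak. The sketch makes several assumptions. The ratios 0.55 and 0.85 are illustrative, since published values vary between devices. The envelope is assumed to be already extracted from the filtered cuff-pressure signal. The function name and the data are hypothetical.

```python
def fixed_ratio_bp(cuff, amp, ratio_sys=0.55, ratio_dia=0.85):
    """Estimate (systolic, mean, diastolic) pressure in mmHg.

    cuff  cuff pressures during deflation (decreasing), one per envelope point
    amp   oscillation amplitude envelope at the same points
    Raises ValueError if the envelope cannot be evaluated, which a caller
    could treat as a rejected (unreliable) measurement.
    """
    if len(cuff) != len(amp) or len(cuff) < 3:
        raise ValueError("need at least three paired samples")
    if not (0 < ratio_sys < 1 and 0 < ratio_dia < 1):
        raise ValueError("ratios must lie strictly between 0 and 1")
    peak = max(range(len(amp)), key=amp.__getitem__)
    a_max = amp[peak]
    if a_max <= 0:
        raise ValueError("no oscillations detected")

    def crossing(indices, target):
        # Walk away from the peak; interpolate linearly between the last
        # sample above the target amplitude and the first one at or below it.
        prev = peak
        for i in indices:
            if amp[i] <= target:
                t = (amp[prev] - target) / (amp[prev] - amp[i])
                return cuff[prev] + t * (cuff[i] - cuff[prev])
            prev = i
        raise ValueError("envelope never falls to the target amplitude")

    # systolic side: samples before the peak (higher cuff pressures)
    systolic = crossing(range(peak - 1, -1, -1), ratio_sys * a_max)
    # diastolic side: samples after the peak (lower cuff pressures)
    diastolic = crossing(range(peak + 1, len(amp)), ratio_dia * a_max)
    return systolic, cuff[peak], diastolic

cuff = [160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60]
amp = [0.3, 0.6, 1.0, 1.4, 1.8, 2.0, 1.9, 1.5, 1.1, 0.7, 0.4]
sys_bp, map_bp, dia_bp = fixed_ratio_bp(cuff, amp)
print(f"{sys_bp:.1f}/{dia_bp:.1f} mmHg, MAP {map_bp} mmHg")  # 137.5/95.0 mmHg, MAP 110 mmHg
```

Raising an error when the envelope is incomplete is one simple way of flagging a measurement as not acceptable. The checks used by the thesis are not described in the abstract.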

The other part of the application supports the validation process. It executes the blood pressure measurement algorithm on a large set of measurements, each of which has reference blood pressure values. The application shows the differences between the results of the algorithm and the reference values, and it helps to qualify the algorithm according to international standards.
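The abstract does not say which standard is applied. As an illustration, the sketch below checks the ANSI/AAMI SP10 requirement, commonly summarized as a mean device-minus-reference difference within ±5 mmHg with a standard deviation of at most 8 mmHg. Full validation protocols also prescribe the number of subjects, the measurement procedure and further criteria, and none of these are modelled here. The function names and readings are hypothetical.

```python
import statistics

def validation_summary(device, reference):
    """Mean and sample SD of paired differences (device - reference) in mmHg."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need at least two paired readings of equal length")
    diffs = [d - r for d, r in zip(device, reference)]
    return statistics.mean(diffs), statistics.stdev(diffs)

def meets_mean_sd_criterion(mean_diff, sd_diff, max_mean=5.0, max_sd=8.0):
    """Example acceptance rule: |mean difference| <= 5 mmHg and SD <= 8 mmHg."""
    return abs(mean_diff) <= max_mean and sd_diff <= max_sd

# Hypothetical systolic readings: algorithm output vs. reference values.
algorithm = [120, 118, 130, 125]
reference = [118, 120, 127, 125]
mean_diff, sd_diff = validation_summary(algorithm, reference)
print(f"mean {mean_diff:.2f} mmHg, SD {sd_diff:.2f} mmHg, "
      f"pass: {meets_mean_sd_criterion(mean_diff, sd_diff)}")
# mean 0.75 mmHg, SD 2.22 mmHg, pass: True
```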

On this project I worked together with Peter Toth (Labtech Ltd.), Istvan Juhasz

(Faculty of Informatics, University of Debrecen), and two students from the

Faculty of Informatics, University of Debrecen, Bela Vamosi and David Angyal.

My contribution to this project was to construct and implement a signal processing

algorithm which produces the blood pressure values and the pulse values of a

measurement.

2.4 Education of database programming
finding out how we can characterize the m