A Novel Integrated Prognosis & Diagnosis System for Lung Cancer Disease Detection using Soft Computing Techniques

Lung cancer is currently one of the leading causes of mortality worldwide among both men and women. Although many treatment options exist, such as surgery, radiotherapy, and chemotherapy, the five-year survival rate for patients is quite low. However, the survival rate may rise to 54% when lung cancer is identified at an early stage, so early detection is vital to reducing lung cancer mortality. Medical experts are continuously trying to find the best solution for the early prediction and diagnosis of lung cancer. In this research work, an attempt has been made to design and develop a novel integrated soft computing predictive system that handles various types of patients' clinical data to diagnose lung cancer. Data mining techniques are used to handle the numeric and textual data, image processing techniques are used to handle CT scan images, neural networks are trained on the lung cancer patient images, and a fuzzy inference mechanism is used to predict the lung cancer stage. This integrated approach detects lung cancer with prognosis, and the expert system suggests a diagnosis. Even for small nodules (3–10 mm), the proposed system is able to determine the nodule type with 96% accuracy.


INTRODUCTION
The novel model proposed in this paper diagnoses lung cancer using image segmentation and decision-making techniques. A Fuzzy Inference System (FIS) is developed, and its results are tested using an artificial neural network. The proposed methodology for lung nodule detection is built from the five major components shown in Figure 1: a Knowledge Base, an Inference Engine, a Working Memory, a Database, and a User Interface. The knowledge base consists of a set of facts and rules. Facts are true statements already tested by the system, and rules are written by the user. An input query given at the user interface is solved using the rules and facts in the knowledge base.
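The way a query is resolved against facts and rules can be sketched as follows. This is a minimal illustration and not the system's actual implementation: the `RULES` structure, the `infer` function, and the feature key names are assumptions, and the single rule shown is the first IF-THEN example from the Implementation section.

```python
# One rule from the Implementation section, encoded as conditions -> conclusion.
# The dictionary layout and key names are hypothetical.
RULES = [
    (
        {
            "age": "medium",
            "gender": "male",
            "hemoptysis": "yes",
            "chest_or_bone_pain": "high",
            "shortness_of_breath": "yes",
            "unexpected_weight_loss": "high",
        },
        ("lung_cancer", "yes"),
    ),
]


def infer(facts, rules=RULES):
    """Return the conclusions of every rule whose conditions all match the facts."""
    conclusions = {}
    for conditions, (key, value) in rules:
        if all(facts.get(name) == wanted for name, wanted in conditions.items()):
            conclusions[key] = value
    return conclusions
```

A query whose facts satisfy every condition yields `{"lung_cancer": "yes"}`; if any condition fails, no conclusion is drawn.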

Working Memory
Working memory holds the data most recently processed by the system. It is similar to a cache memory and contains information about the frequently used rules and facts, which makes it a very useful block in a prediction system.
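The cache analogy can be illustrated with a small least-recently-used store. This is only a sketch: the paper does not specify an eviction policy, so the LRU policy, the `WorkingMemory` class name, and the default capacity are all assumptions.

```python
from collections import OrderedDict


class WorkingMemory:
    """Hypothetical LRU store for recently used rules and facts."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        # A hit moves the entry to the most-recently-used position.
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        # Evict the least recently used entry once capacity is exceeded.
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def keys(self):
        """Return the stored keys, least recently used first."""
        return list(self._items)
```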

Database
In most prediction system designs, a separate database is maintained alongside the knowledge base to store the history of old records and other valuable information. In this predictive system design, the database is used exclusively to store old lung cancer patient records, such as CT images and test and diagnostic reports.

User Interface
The user interface is the key component through which the user interacts with the predictive system. Symptoms and patient test reports are given as input to the system, and the diagnosis and advice given by the expert system are returned as output.

Inference Engine
The inference engine infers conclusions from the data in the knowledge base. In the proposed integrated system it contains four major independent components: the Numeric Data Component, the Image Data Component, the Imprecise Data Component, and the Training Data Component.

METHODOLOGY
The inference mechanism of this predictive system design is shown in Figure 2. The inference engine contains four major components, each of which handles a different type of feature in the lung cancer patient records. The components used in this methodology are explained below.

Imprecise Data Component
This component handles the categorical data of the lung cancer patient records. The details are given in Table 1. These features are handled by fuzzy inferencing techniques to determine the stage of the cancer, i.e., one of the four stages.
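Fuzzy inferencing starts by mapping a crisp measurement to degrees of membership in linguistic terms such as low, medium, and high. The sketch below shows one common way to do this with triangular membership functions. The 0-10 severity scale, the triangle breakpoints in `TERMS`, and the function names are assumptions, since the paper does not give its membership functions.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangle with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms on an assumed 0-10 severity scale.
TERMS = {
    "low": (-5.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 15.0),
}


def fuzzify(x):
    """Return the degree to which x belongs to each linguistic term."""
    return {name: triangular(x, *abc) for name, abc in TERMS.items()}
```

Under these assumed breakpoints, a severity of 7.5 is half "medium" and half "high", which is the kind of graded input a fuzzy rule base consumes.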

Imagery Data Component
This component handles the imagery data of the lung cancer patient records. The details are given in Table 1. These features are handled by image processing techniques to detect tumors and their sizes, and thereby determine whether a patient has cancer.
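As an illustration of detecting candidate regions and measuring their sizes, the sketch below thresholds a small intensity grid, labels 4-connected bright regions, and reports an equivalent diameter for each. It is not the paper's segmentation method: the fixed threshold, the `pixel_mm` spacing, the equal-area-circle size measure, and the function name `find_nodules` are all assumptions.

```python
import math


def find_nodules(image, threshold, pixel_mm=1.0):
    """Label 4-connected bright regions and estimate each one's diameter.

    image is a list of rows of intensities. The diameter is that of a circle
    with the same area as the region (an assumed size measure).
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    diameters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Flood-fill one region with an explicit stack.
            stack, area = [(r, c)], 0
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            diameters.append(2.0 * math.sqrt(area * pixel_mm ** 2 / math.pi))
    return diameters
```

The returned diameters are listed in scan order (top to bottom, left to right), so they can be compared against a size range of interest such as the 3–10 mm nodules mentioned in the abstract.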

Text/ Imagery Data Neural Network Component
This component handles the text/imagery data in the input feature dataset given in Table 1. These features are handled by neural networks to detect whether the patient has cancer.

SYSTEM DESIGN
The proposed system handles numeric, categorical, and imagery features supplied by users to determine lung cancer malignancy, as well as the necessary diagnosis if the patient has lung cancer. The detailed steps of the proposed methodology are given below.
Step 1: User logs into the system with user name and password.
Step 2: User enters the lung cancer patient record feature set, which contains a combination of textual, imprecise, and imagery features. The feature set of a patient's record is given in Table 1.
Step 3: The given feature set is segregated into three feature subsets, and those subsets are supplied to the respective components as shown in Figure 2. The feature subsets are:
a) The textual feature set contains the numeric/categorical features of the given feature set, as given in Table 1.
b) The imprecise feature set contains the fuzzy features of the given feature set, as given in Table 1.
c) The imagery feature set contains the CT scan image related data, as given in Table 1.
Step 4: Each component evaluates received feature subsets and generates results based on the inference mechanism discussed above.
Step 5: The system performs decision aggregation and suggests the necessary diagnosis/treatment if the given patient has lung cancer.
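The segregation in Step 3 can be sketched as a simple routing of each feature to its subset. The feature names and their assignment in `SUBSET_OF` are hypothetical placeholders; the actual assignment is the one defined in Table 1.

```python
# Hypothetical assignment of feature names to subsets; the real assignment
# is the one defined in Table 1 of the paper.
SUBSET_OF = {
    "age": "textual",
    "gender": "textual",
    "chest_or_bone_pain": "imprecise",
    "unexpected_weight_loss": "imprecise",
    "ct_scan": "imagery",
}


def segregate(record):
    """Split one patient record into textual, imprecise and imagery subsets."""
    subsets = {"textual": {}, "imprecise": {}, "imagery": {}}
    for name, value in record.items():
        if name not in SUBSET_OF:
            raise KeyError(f"no subset defined for feature {name!r}")
        subsets[SUBSET_OF[name]][name] = value
    return subsets
```

Raising an error for an unmapped feature is a design choice made here so that a record cannot silently bypass all three components.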

IMPLEMENTATION
The methodology of the research work is implemented as described in this section.
A trace of the implementation steps is given below.
Step 1: User logged into the system with user name = xxxx and password = ****
Step 2: User inputs the lung cancer patient record, which is a combination of numeric and categorical data. The cancer patient record contains the features F1, F2, ..., F28, as given in Table 1.
Step 3: These features are handled by the individual data components in the inference engine.
Step 3(a): The numerical data in the patient records described in Table 1 is handled by the Numeric/Categorical Data Component. Data mining concepts are used in this data component.

Input:
If (Age is Medium) and (Gender = Male) and (Hemoptysis is yes) and (pain in chest or bone is high) and (shortness of breath is yes) and (unexpected weight loss is high)
Output: lung cancer = yes

The categorical feature set described in Table 2 of a lung cancer patient record is handled by the Numeric/Categorical Data Component. Data mining concepts are used in this data component.

For example:
Input:
If (Family History is yes) and (Exposure to Harmful Chemicals is high) and (pain in chest or bone is medium) and (shortness of breath is yes) and (unexpected weight loss is medium)
Output: lung cancer = yes

Step 3(b): The imprecise data features of the patient records described in Table 1 are handled by the Imprecise Data Component. For example:

Input:
If (Family History is yes) and (Hemoptysis is yes) and (pain in chest or bone is high) and (Wheezing with sound is yes) and (Dysphagia is yes) and (unexpected weight loss is high)
Output: lung cancer = yes and Stage = 3.
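A fuzzy rule of this form is commonly evaluated by combining the membership degrees of its antecedents with the minimum operator. The sketch below applies that convention to the rule above. The use of min for AND, the 0.5 firing threshold, and the names `firing_strength` and `apply_rule` are assumptions, since the paper does not state how its rules are evaluated.

```python
# Antecedents of the example rule, as (feature, linguistic term) pairs.
# The feature key names are hypothetical.
RULE_ANTECEDENTS = [
    ("family_history", "yes"),
    ("hemoptysis", "yes"),
    ("chest_or_bone_pain", "high"),
    ("wheezing", "yes"),
    ("dysphagia", "yes"),
    ("unexpected_weight_loss", "high"),
]
RULE_CONSEQUENT = {"lung_cancer": "yes", "stage": 3}


def firing_strength(memberships):
    """AND the antecedents with min; a missing degree counts as 0."""
    return min(memberships.get(pair, 0.0) for pair in RULE_ANTECEDENTS)


def apply_rule(memberships, threshold=0.5):
    """Return the consequent if the rule fires strongly enough, else None."""
    strength = firing_strength(memberships)
    if strength >= threshold:
        return dict(RULE_CONSEQUENT, strength=strength)
    return None
```

With min as the AND operator, the weakest antecedent bounds the rule's strength, so a single poorly supported symptom is enough to keep the rule from firing.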
Step 3(c): The imagery data described in Table 4 is handled by the Imagery Data Component.

Input: Image data
Output: Image Data

Input :
If (Exposure to Harmful Chemicals is low) and (Hemoptysis is low) and (pain in chest or bone is low)

Fig 4(a) & 4(b)
Step 4: Decision aggregation is performed by the system based on the results obtained from the individual components.
Step 5: Finally, the system suggests the necessary diagnosis/treatment. Figure 5 shows a typical treatment suggested by the system.
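The aggregation rule used in Step 4 is not stated, so the following is only one plausible sketch: a simple majority vote over the component verdicts, with the stage taken from whichever component reports one. The `aggregate` function, the result dictionary layout, and the majority rule itself are assumptions.

```python
def aggregate(component_results):
    """Combine per-component verdicts by simple majority vote.

    component_results maps a component name to a dict with a boolean
    "cancer" verdict and, optionally, a "stage" from the fuzzy component.
    """
    votes = [result["cancer"] for result in component_results.values()]
    positive = sum(votes) * 2 > len(votes)
    stage = None
    if positive:
        # Report the highest stage among components that supplied one.
        stages = [r["stage"] for r in component_results.values() if "stage" in r]
        stage = max(stages) if stages else None
    return {"cancer": positive, "stage": stage}
```

A tie counts as negative under this rule. A deployed system might instead weight the components by their validated accuracy.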

CONCLUSIONS
Finally, a novel integrated predictive system has been developed using soft computing techniques for the prognosis and diagnosis of lung cancer. It helps both patients and doctors detect the stage of lung cancer using imagery, categorical, and numerical datasets. It is also observed that the results of the system are more accurate and promising when compared with other lung cancer analysis systems.

FUTURE ENHANCEMENTS
This research work can be developed into a smart device application. In that case, a Graphical User Interface (GUI) is required so that the user can interact with the system easily. Smart devices would let users access the system anywhere, which makes the process of diagnosis faster and easier.