IMPLEMENTATION OF DATA WATCHER IN DATA LEAKAGE DETECTION SYSTEM

Nowadays, almost every company faces data leakage, and it is a very serious problem. The owner of an enterprise gives confidential data to employees, but employees often leak it. The leaked data is then found in an unauthorized place, such as the website of a competitor enterprise, the laptop of a competitor's employee, or the laptop of the competitor's owner. The leak may or may not be noticed by the owner. Leaked data may include source code or design specifications, price lists, intellectual property and copyrighted data, trade secrets, forecasts, and budgets. Once the data has leaked, it leaves the company unprotected and lies outside the control of the corporation. Such uncontrolled data leakage puts the business in a vulnerable position. To address this problem we develop two models. First, for the case where an employee of the enterprise accesses confidential data without the consent of the owner, we develop a data watcher model to identify the data leaker. Second, for the case where an employee passes data outside the enterprise, we develop a model for assessing the "guilt" of agents. The guilt model is used to improve the probability of identifying guilty third parties. To implement this system we used the database of an educational institute. In this system the data owner is the college chairman, called the distributor, and the other employees are called agents. We consider two kinds of request, sample and explicit, because agents ask for data either as a sample or by a condition.


INTRODUCTION
In a company, small firm, or educational institute, the owner must hand over sensitive data to supposedly trusted agents. For example, financial data is given to a finance employee to prepare a balance sheet or carry out a financial transaction, and that data may then be leaked. Similarly, a company may have partnerships with other companies that require sharing customer data. We consider applications where the original sensitive data cannot be perturbed. Perturbation is a very useful technique in which the data are modified and made "less confidential" before being handed to agents. For example, one can add random noise to certain attributes, or replace exact values by ranges [1]. However, in some cases it is important not to alter the original distributor's data. For example, financial data cannot be perturbed, and medical researchers may need exact, accurate patient data. Traditionally, leakage detection is handled by watermarking, e.g., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases but, again, involve some modification of the original data. In addition, watermarks can sometimes be destroyed if the data recipient is malicious. In this paper, we study unobtrusive techniques for detecting leakage of a set of objects or records [7] [8].
Specifically, we study the following scenario. In every enterprise, data leakage is a very serious problem. The owner of an enterprise gives sensitive data to employees, but in many situations an employee leaks the data. The leaked data is then found in an unauthorized place, such as the website of a competitor enterprise, the laptop of a competitor's employee, or the laptop of the competitor's owner. The leak is sometimes noticed by the owner and sometimes not. Leaked data may include source code or design specifications, price lists, intellectual property and copyrighted data, trade secrets, forecasts, and budgets. At this point, the distributor can assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. If the distributor sees "sufficient proof" that an agent leaked data, he may stop doing business with him, or may initiate legal proceedings. In this paper, we develop a model for assessing the "guilt" of agents. We also consider the option of adding "fake" objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members. If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent was guilty [1].

OBJECTIVE
A data breach is the unintentional release of secure information to an untrusted environment. The goal is to estimate the likelihood that the leaked data came from the agents as opposed to other sources. Not only do we want to estimate the likelihood that the agents leaked data, but we would also like to find out whether one of them in particular was more likely to be the leaker, especially when the agents' data sets overlap heavily. The data allocation strategies help the distributor "cleverly" give data to agents. Fake objects are added to help identify the guilty party. To address this problem, four instances are specified, depending on the type of data request and on whether fake objects are allowed.
A distributor owns a set $T = \{t_1, t_2, t_3, \ldots, t_m\}$ of valuable data objects. The distributor wants to share some of the objects with a set of agents $U_1, U_2, \ldots, U_n$, but does not wish the objects to be leaked to other third parties. The objects in $T$ could be of any type and size, e.g., they could be tuples in a relation, or relations in a database. An agent $U_i$ receives a subset of objects $R_i \subseteq T$, determined either by a sample request or an explicit request:
Sample request $R_i = \mathrm{SAMPLE}(T, m_i)$: any subset of $m_i$ records from $T$ can be given to $U_i$.
Explicit request $R_i = \mathrm{EXPLICIT}(T, cond_i)$: agent $U_i$ receives all the $T$ objects that satisfy $cond_i$ [1].
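The two request types can be illustrated with a minimal Python sketch. The function names `sample_request` and `explicit_request`, the record fields, and the fixed random seed are assumptions made for this illustration; they are not taken from the implemented system.

```python
import random
from typing import Callable, Dict, List

Record = Dict[str, object]


def sample_request(T: List[Record], m_i: int, seed: int = 0) -> List[Record]:
    """SAMPLE(T, m_i): return any m_i records from T (here chosen at random)."""
    if m_i > len(T):
        raise ValueError("m_i cannot exceed the number of records in T")
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(T, m_i)


def explicit_request(T: List[Record], cond_i: Callable[[Record], bool]) -> List[Record]:
    """EXPLICIT(T, cond_i): return every record of T that satisfies cond_i."""
    return [t for t in T if cond_i(t)]


# Hypothetical records from an educational institute database.
T = [
    {"id": 1, "dept": "CS", "fees_paid": True},
    {"id": 2, "dept": "EE", "fees_paid": False},
    {"id": 3, "dept": "CS", "fees_paid": False},
    {"id": 4, "dept": "ME", "fees_paid": True},
]

R1 = sample_request(T, 2)                               # agent U1 asks for any 2 records
R2 = explicit_request(T, lambda t: t["dept"] == "CS")   # agent U2 asks for all CS records
print([t["id"] for t in R2])  # [1, 3]
```

In a sample request the distributor is free to choose which records to hand over, which is what later gives the allocation strategy room to reduce overlap between agents; in an explicit request the condition fixes the result completely.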

Data Leakage Worldwide Common Risks and Mistakes
The first set of survey findings, on common risks and mistakes employees make, examined the relationships between employee behavior and data loss, as well as IT perceptions of those factors. The survey found that employees around the world engage in behaviors that put corporate and personal data at risk, that IT professionals are often unaware of those behaviors, and that preventing data leakage is a business-wide challenge [2].
The second set of findings, on the effectiveness of security policies, offered insight into how security policy creation, communication, and compliance affect data leakage. The analysis showed that a lack of security policies and a lack of employee compliance with security policies were significant factors in data loss. As in the first set of findings, the survey showed that IT professionals lacked important awareness, in this case about how many employees actually understand and comply with security policies. It is therefore concluded that companies must address the dual challenge of creating security policies and enforcing employee compliance [2].
The guilt detection approach is related to the data provenance problem [9]: tracing the lineage of the leaked objects S essentially amounts to detecting the guilty agents. Tutorial [3] provides a good overview of the research conducted in this field. Suggested solutions are domain specific, such as lineage tracing for data warehouses [4], and assume some prior knowledge of the way a data view is created out of data sources. Our problem formulation with objects and sets is more general and simplifies lineage tracing.
As far as the data allocation strategies are concerned, our work is mostly relevant to watermarking, which is used as a means of establishing original ownership of distributed objects. Watermarks were initially used in images [5] and video [6]. However, a watermark modifies the item being watermarked; if the object cannot be modified, a watermark cannot be inserted. In such cases, methods that attach watermarks to the distributed data are not applicable. Finally, there is also a large body of work on mechanisms that allow only authorized users to access sensitive data through access control policies. Such approaches prevent data leakage, in some sense, by sharing information only with trusted parties [5].
An employee who is disgruntled, or who seeks to profit through illegal actions that involve corporate resources, can become an insider threat that adds a dangerous new dimension to the data loss prevention challenge. The insider threat defies the common perception that the most significant security threats originate outside the company. Employees with a malicious agenda and a profit motive can use their insider status to engage in activities that cause even greater financial loss than external threats. Legitimate network access and custody of devices such as laptops and PDAs make it simple for disloyal employees to leak corporate data [2]. Some employees simply fail to return company devices when they leave a job. This is expensive and dangerous for businesses because it adds yet another avenue for data loss.
Even if only 5 percent of exiting employees take a device, that adds up to 50 employees in a company of 1,000, or 500 in an enterprise of 10,000 employees. For larger organizations, the financial and data loss risks are far more significant. A shocking 11% of employees reported that they or fellow employees accessed unauthorized information and sold it for profit, or stole computers. Employees' reasons for keeping corporate devices when leaving a job included needing the device for personal use (60%), getting back at their companies, and a belief that their previous employers would not find out. In addition, 20% of IT professionals said disgruntled employees were their biggest concern in the insider threat arena [2].

EXISTING SYSTEM
In many cases the distributor must work with agents that may not be trusted, and the distributor cannot be sure whether a leaked object came from an agent or from some other source, since certain data cannot admit watermarks. The existing system has several limitations. It assumes a fixed set of agents whose requests are known in advance. Although fake objects can be added without altering the original sensitive data, it lacks agent guilt models that capture leakage scenarios, and it lacks an appropriate model for cases where agents can collude and identify fake tuples. Finally, the existing system does not capture leak scenarios online, and it focuses mainly on the data allocation problem.

PROPOSED SYSTEM
To address this problem we develop two models. First, for the case where an employee of the enterprise accesses sensitive data without the consent of the owner, we develop a data watcher model to identify the data leaker. If the data leaker is identified at this point, there is no need to calculate the guilt probability of the agents; this method gives a result in about 90% of cases. Second, for the case where an employee passes data outside the enterprise, we develop a model for assessing the "guilt" of agents. The guilt model is used to improve the probability of identifying guilty third parties. To implement this system we used the database of an educational institute. In this system the data owner is the college management, called the distributor, and the other employees are called agents. We consider two kinds of request, sample and explicit, because agents ask for data either as a sample or by a condition. The option of adding "fake" objects to the distributed set is also considered. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members. If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent was guilty. The proposed system works through two processes:

Data Distribution Method
Here we consider two existing techniques for allocating data to the agents. The allocation problem has four instances, depending on the type of data request made by the agents (E for explicit and S for sample requests) and on whether "fake data" are allowed (F for the use of fake data, and $\bar{F}$ for the case where fake data are not allowed). Fake data are data generated by the distributor that are not in the set T. They are designed to look like real data, and are distributed to agents together with the T data in order to increase the chances of detecting agents that leak data [1].
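As a rough illustration of the F case, the sketch below generates fake record identifiers that are not in T and appends them to each agent's allocation, while the distributor privately records which agent received which fake object. All names (`make_fake_ids`, `allocate_with_fakes`, the `STU` identifier format) are hypothetical, and giving each agent its own distinct fake objects is a simplifying assumption of this sketch rather than a requirement stated in the text.

```python
from typing import Dict, List, Set, Tuple


def make_fake_ids(T_ids: Set[str], count: int, prefix: str = "STU") -> List[str]:
    """Generate `count` identifiers that look like real ones but are not in T."""
    fakes: List[str] = []
    n = 9000  # arbitrary starting number, chosen only for the illustration
    while len(fakes) < count:
        n += 1
        candidate = f"{prefix}{n}"
        if candidate not in T_ids:  # a fake object must not belong to T
            fakes.append(candidate)
    return fakes


def allocate_with_fakes(
    requests: Dict[str, List[str]], T_ids: Set[str], fakes_per_agent: int
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Return (allocation per agent, private map fake id -> agent)."""
    pool = make_fake_ids(T_ids, fakes_per_agent * len(requests))
    allocation: Dict[str, List[str]] = {}
    fake_owner: Dict[str, str] = {}
    for k, (agent, real_ids) in enumerate(requests.items()):
        mine = pool[k * fakes_per_agent:(k + 1) * fakes_per_agent]
        allocation[agent] = list(real_ids) + mine
        for f in mine:
            fake_owner[f] = agent  # known only to the distributor
    return allocation, fake_owner


T_ids = {"STU1001", "STU1002", "STU1003"}
requests = {"U1": ["STU1001", "STU1002"], "U2": ["STU1002", "STU1003"]}

allocation, fake_owner = allocate_with_fakes(requests, T_ids, fakes_per_agent=1)
print(allocation["U1"])       # ['STU1001', 'STU1002', 'STU9001']
print(fake_owner["STU9002"])  # U2
```

Calling `allocate_with_fakes` with `fakes_per_agent=0` corresponds to the case where fake data are not allowed: every agent then receives only the requested T objects.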

Probability Finding Process
While distributing data to agents, some kind of receiver-identifying information can be added to help find the guilty agent. This process concentrates on finding the probability that an agent is guilty. The data object is central to our work: for each data object we consider the agent parameters and the overlap between pairs of agents that received it. These parameters are checked once a data object is recovered from a malicious target. When a data object is recovered from a target, a dedicated process calculates the probability that it came from each source, so that we can estimate which agent leaked the data. The guilty agent model is used to decide, under several conditions, which agent is guilty. We also consider the case where the source of an object cannot be determined or its probability cannot be computed; in that case the agent cannot be considered guilty.
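One common way to turn these ingredients into a number is the guilt model used in the data leakage detection literature that this work builds on [1]. The formula below is quoted as an assumption about the kind of model meant here; the text above does not state the exact expression used in the implemented system. Here $S$ is the leaked set, $R_i$ is the set given to agent $U_i$, $V_t$ is the set of agents that received object $t$, and $p$ is the probability that the target obtained an object by guessing rather than from an agent.

```latex
\[
  \Pr\{G_i \mid S\} \;=\; 1 \;-\; \prod_{t \in S \cap R_i}
  \left( 1 - \frac{1 - p}{|V_t|} \right)
\]
% Worked example: p = 0.2, S = {t_1, t_2},
% U_1 received t_1 and t_2, U_2 received only t_1,
% so |V_{t_1}| = 2 and |V_{t_2}| = 1.
\[
  \Pr\{G_1 \mid S\} = 1 - \left(1 - \tfrac{0.8}{2}\right)\left(1 - \tfrac{0.8}{1}\right)
                    = 1 - 0.6 \times 0.2 = 0.88,
  \qquad
  \Pr\{G_2 \mid S\} = 1 - \left(1 - \tfrac{0.8}{2}\right) = 0.4 .
\]
```

The example shows the role of overlap: $t_1$ is shared by two agents and so contributes less evidence against either of them than $t_2$, which only $U_1$ holds.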

Steps for Finding Guilt Agent
1. Select agents. The distributor selects the agents to send data to, according to the agents' requests.
2. Create and allocate fake data. The distributor can create fake data and distribute it together with an agent's data, or distribute the data without fake objects. The more fake data the distributor is able to create, the better the chance of finding the guilty agent.
3. Check the number of agents that have already received data.
4. Check for remaining agents. The distributor chooses the remaining agents to send data to. The distributor can increase the number of possible allocations by adding fake data.
5. Estimate the guilt probability of each agent. To compute this probability, we need an estimate of the probability that values can be "guessed" by the target.
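The final estimation step can be sketched in Python as follows. The sketch assumes the guilt model from the data leakage detection literature cited as [1], in which each leaked object held by an agent contributes a factor 1 - (1 - p) / (number of agents holding that object); whether the implemented system uses exactly this expression is an assumption. The function name `guilt_probabilities` and the example data are hypothetical.

```python
from typing import Dict, Set


def guilt_probabilities(S: Set[str], R: Dict[str, Set[str]], p: float) -> Dict[str, float]:
    """Estimate, for every agent, the probability that it leaked part of S.

    S: leaked objects found at the target.
    R: objects (real and fake) that were given to each agent.
    p: assumed probability that the target guessed an object on its own.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    result: Dict[str, float] = {}
    for agent, R_i in R.items():
        prob_not_leaked = 1.0
        for t in S & R_i:  # leaked objects this agent actually holds
            holders = sum(1 for R_j in R.values() if t in R_j)  # overlap for t
            prob_not_leaked *= 1.0 - (1.0 - p) / holders
        result[agent] = 1.0 - prob_not_leaked
    return result


# Hypothetical allocation: U1 also holds the fake object f1.
R = {"U1": {"t1", "t2", "f1"}, "U2": {"t1", "t3"}, "U3": {"t3"}}
S = {"t1", "t2", "f1"}  # objects discovered at the target

g = guilt_probabilities(S, R, p=0.2)
print({a: round(v, 3) for a, v in g.items()})  # {'U1': 0.976, 'U2': 0.4, 'U3': 0.0}
```

In this example the leaked fake object f1 is held only by U1, so it raises U1's estimate from 0.88 (t1 and t2 alone) to 0.976, while U3, which holds none of the leaked objects, receives probability 0.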

IMPLEMENTATION
Once leaked data has been discovered, the distributor can assess the likelihood that it came from one or more agents, as opposed to having been independently gathered by other means [1] [7].
As shown in Fig. (1), the distributor gives data to agents according to their requests. The distributor maintains the database and, according to each request, passes data to the agent with or without fake objects. Suppose an agent then does business with a target without the consent of the distributor and leaks data. The distributor discovers some of those same objects in an unauthorized place (for example, the data may be found on a web site or on someone's laptop, or may be obtained through a legal discovery process). The distributor then matches the leaked data against his own data. The distributor also checks the overlap of data among agents and then calculates the guilt probability of each agent, which lies in the range [0, 1].
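The matching and overlap check described above can be sketched as a small Python routine. It only reports which leaked records each agent holds and which agents share each leaked record; it does not compute a probability. The function name `match_leak` and the sample data are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def match_leak(
    S: Set[str], R: Dict[str, Set[str]]
) -> Tuple[Dict[str, Set[str]], Dict[str, List[str]]]:
    """Match the leaked set S against the data given to each agent.

    Returns:
      matched   - for each agent, the leaked records found in its allocation
      shared_by - for each leaked record, the agents that received it
    """
    matched = {agent: S & R_i for agent, R_i in R.items()}
    shared_by = {t: sorted(a for a, R_i in R.items() if t in R_i) for t in S}
    return matched, shared_by


# Hypothetical distribution log kept by the distributor.
R = {"U1": {"t1", "t2", "t4"}, "U2": {"t1", "t3"}, "U3": {"t3", "t4"}}
S = {"t1", "t2", "t9"}  # records discovered in an unauthorized place

matched, shared_by = match_leak(S, R)
print(sorted(matched["U1"]))  # ['t1', 't2']
print(shared_by["t1"])        # ['U1', 'U2']
print(shared_by["t9"])        # []
```

A leaked record with an empty `shared_by` list, such as t9 here, was not given to any agent, so it must have reached the target by other means and does not count against anyone.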

RESULTS AND DISCUSSION
The effectiveness of the system is described by two measures: the "Record wise leak report" and the "Probability of agent guilty".

Record wise leak report=
This formula calculates record-wise leaked data without considering the overlap among agents. From it, the distributor knows which agents to consider when calculating guilt probability.

Probability of agent guilty=
This formula calculates record-wise leaked data while considering the overlap among agents, that is, how many agents share a particular record. From it, the distributor knows which agents to consider when calculating guilt probability.

Fig 2: Agent guilt probability distribution
Fig. (2) shows the guilt probabilities of the agents. For example, agent 1 has probability 0.9, which means he is the most likely to have leaked the owner's data, followed by agent 3 with 0.7 and agent 5 with 0.5. In this way the owner can reconstruct the leak scenario.

CONCLUSION AND FUTURE WORK
In a company, the owner hands over sensitive data to employees, but before doing so the owner should add a watermark to every piece of sensitive data. The owner should also check the employee's record, that is, whether that employee is legally responsible for handling the data, and only then hand over the data. If an employee leaks data accidentally, the owner can take that into account. But if an employee leaks data deliberately, the owner should reconsider that employee and exclude him from shared data and from confidential work or discussions.
In spite of these difficulties, we have shown that it is possible to assess the probability that an agent is responsible for a leak, based on the overlap of his data with the leaked data and with the data of other agents, and based on the probability that objects can be "guessed" by other means. The data distribution strategies improve the distributor's chances of identifying a leaker. It has been shown that distributing objects judiciously can make a significant difference in identifying guilty agents, especially in cases where there is a large overlap in the data that agents must receive. In some cases "realistic but fake" data records are injected to improve the chances of detecting leakage and identifying the guilty party. Our future work includes extending the allocation strategies so that they can handle agent requests in an online fashion.