- Data Anomaly - How Data Anomaly Could Be Prevented
Data Anomaly - How Data Anomaly Could Be Prevented
Anomalies are caused directly by users when the data does not conform to the expected behavior, and the design of the database is not done correctly. Database administrators (DBAs) face many problems when developing a database, which has many tables that are related. When an anomaly happens, data redundancy may occur, which means that the data has been duplicated, and may cause users to access incorrect information (Chen et al. 2011). Anomalies must be detected, because a lot of applications, could use well design databases; fraud detection in insurance and healthcare, as well as credit cards, are some of the database driven applications, where anomalies can be found (Chandola, Banerjee & Kumar, 2009). Anomalies can also cause problems in applications such as military surveillance systems and cyber-security system (Chandola, Banerjee & Kumar, 2009). When a database is not correctly designed, three types of anomalies can occur; according to EESSAAR (2016), update, insert and deletion are three types of data anomaly. Strange situations can happen, when the linkage between data items are not done properly, and these data items are scattered in the database inconsistently when updates of these data items take place, this is known as update anomalies.
When an update of one data item, that is supposed to be linked to other data items; takes place in the database and these other items are not updated properly, it is said that an update anomaly has occurred (EESSAAR 2016). The database will, in turn, have data inconsistency, due to update anomalies, for example, when users want to pull information from the database in my project, about the patient table and the staff table, then a new table would be created. Using SQL statements, such as JOINS, two tables can be used together, to provide valuable information, to users about these data items. This new table would be known as the patient_staff table, if the address of a staff change in the staff table, then this update must also take place in the patient_staff table, or else there would be data inconsistency also known as an update anomaly.
Deletion anomaly is another type of anomaly, where data that is meant to be deleted, is not deleted from the database, leaving undeleted data on the database. Deletion also happens, when the deletion of one item that is linked to another, causes the deletion of the linked item, which the user didn’t intend to delete, to be deleted.
For example, in the table patient_staff in the project that I am proposing, if a patient is discharged and that patient record needs to be deleted, then the deletion should not affect the staff table. The staff table do form a relationship, with the patient entity, and anomalies can occur when the patient_staff table is created. Only the patient table and the patient_staff table should be affected by the deletion, in order to make sure, that all information about the patient, has been deleted all across the database (EESSAAR 2016).
Another type of anomaly is the insertion anomaly, which happens when data that is meant to be inserted into the database, cannot enter the database, due to the fact that other items must exist in the database, in order for the data to be inserted (EESSAAR 2016). A good example would be, in my project there would be to entities known as patient and hospital entity, and these will be forming a relationship. When users try to input a new patient information in the database, there needs to be hospital information as well, in order for that patient information to be inserted, in the database. If the hospital information, for that patient, is not found, then the database will be missing vital information about the patient.
Anomalies can mean something different, for different types of database driven applications, for example, in my project, a change in the blood pressure, in the patient record table, might signify an anomaly to doctors. While a change in the address of a patient, in the patient entity, might be considered normal because people move all the time. In order to avoid all these problems, normalization must be used to normalize the database. The techniques of normalization will be discussed below, in order to answer the second question below.
Thank you for reading this article!!!
Chandola, V., Banerjee, A. & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys. Retrieved from http://cucis.ece.northwestern.edu/projects/DMS/publications/AnomalyDetection.pdf
Chen, T. X., Meyer, M. D., Ganapathi, N., Liu, S., & Jonathan M. Cirella, J. M. (2011). Improving data quality in relational databases: Overcoming functional entanglements. RIT International. Retrieved from https://www.rti.org/sites/default/files/resources/op-0004-1105-chen.pdf
Coulson, F. (2016). The 3 normal forms: A tutorial by Fred Coulson. Retrieved from http://www.phlonx.com/resources/nf3/nf3_tutorial.pdf
Eessaar, E. (2016). The database normalization theory and the theory of normalized systems: finding a common ground. Baltic J. Modern Computing, 4(1), 5-33. Retrieved from https://www.bjmc.lu.lv/fileadmin/user_upload/lu_portal/projekti/bjmc/Contents/4_1_3_Eessaar.pdf
Add a Comment
Hi every one, I obtained a bachelor's degree in Bioinformatics back in 2006, from Claflin University, after I received my bachelor's degree, I gained full time employment as a software engineer at a Video Relay Service company, maintaining databases and developing software for a new developed device called the VPAD.
I worked at that company for two years, then I became a web developer, and worked for a magazine for three years. After that job, I worked as a Drupal web developer, as a subcontractor for the NIH, for a year and then left the job to go back to school.
Collaboratively administrate empowered markets via plug-and-play networks. Dynamically procrastinate B2C users after installed base benefits. Dramatically visualize customer directed convergence without
Collaboratively administrate empowered markets via plug-and-play networks. Dynamically procrastinate B2C users after installed base benefits. Dramatically visualize customer directed convergence without revolutionary ROI.