Al-Esraa University - Prof. Dr. Kadhim Breahy Sawadi Aljanabi

Prof. Dr. Kadhim Breahy Aljanabi

Faculty member, Department of Cybersecurity Sciences

PhD in Computer Engineering and Information Technology - Data Science and Data Mining

kadhim.aljanabi@esraa.edu.iq

kadhim.aljanabi@uokufa.edu.iq

Research

	2024	CDI2024
Traditional decision tree and Naïve Bayes algorithms can struggle with very large datasets because of limitations in computational complexity, scalability, data storage, and I/O operations. Numerous methods and adaptations have been proposed to address these limitations; the most common is parallelization in a Hadoop environment, which restructures the algorithms to run in parallel. Algorithms can be implemented in Hadoop using the MapReduce programming model. A MapReduce job can be configured to use either a single reducer or multiple reducers, and the number of reducers can significantly affect the job's efficiency, execution time, and performance. Not all machine learning algorithms can be split naturally or easily across multiple reducers, owing to their inherent characteristics and computations. In this paper, a multi-reducer MapReduce job is used to compute the information gain, with each reducer calculating the information gain of one feature. Naïve Bayes can likewise be implemented with multiple reducers: the training phase for n features can be carried out in a Hadoop job with n reducers. The study uses a malware detection dataset. ANOVA feature selection was applied to identify the most informative attributes, a pivotal step preceding data preprocessing. The dataset was then scaled (z-score normalization) to improve its readiness for classification, which raised accuracy from 88% to 95%. The research also explored parallelism in the Hadoop streaming framework: the proposed system dedicates one reducer to each feature, so the number of reducers matches the dataset's feature count. This parallelism was instrumental in the training phase, improving system efficiency and performance.
Keywords: Cybersecurity, Decision tree, Information gain, Naïve Bayes, Hadoop streaming
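
The per-feature computation described in the abstract above can be illustrated with a minimal sketch. It is not the paper's Hadoop streaming code: the shuffle and reduce steps are simulated in a single Python process, and the feature names (`packed`, `signed`), the records, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature_values, labels):
    """Information gain of one feature: H(labels) - H(labels | feature)."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def reduce_per_feature(records, labels):
    """Simulate map/shuffle/reduce with one independent reduce call per feature."""
    by_feature = defaultdict(list)
    for record in records:                      # map: emit (feature_name, value)
        for name, value in record.items():
            by_feature[name].append(value)
    # reduce: each feature key is processed on its own, as one reducer would do
    return {name: information_gain(values, labels)
            for name, values in by_feature.items()}

# Hypothetical toy records, not taken from the malware dataset in the paper.
records = [
    {"packed": "yes", "signed": "no"},
    {"packed": "yes", "signed": "yes"},
    {"packed": "no", "signed": "no"},
    {"packed": "no", "signed": "yes"},
]
labels = ["malware", "malware", "benign", "benign"]
gains = reduce_per_feature(records, labels)
```

Because each feature's gain depends only on that feature's values and the labels, the reduce calls share no state, which is what allows one reducer per feature.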

	2024	CDI2024
Implementing multilinear regression with gradient descent in Hadoop streaming using multiple reducers can reduce computation time, especially for large datasets. Hadoop's main advantage lies in its ability to distribute computations across multiple nodes. For multilinear regression with gradient descent, this capability can be used to divide the dataset into chunks and perform computations simultaneously on different nodes (reducers, in Hadoop's terms). With multiple reducers, different parts of the computation are carried out concurrently: each reducer handles a subset of the data and performs its computations independently. This parallel processing significantly reduces overall computation time compared with a single-reducer or non-distributed approach, and it avoids the bottleneck of processing massive datasets on a single reducer. The work in this paper proposes a method that relies on careful algorithm design to ensure convergence and accuracy given the distributed nature of the computation, including the handling of coefficient updates and convergence criteria across multiple reducers. Algorithm speed matters in many real-world applications, especially the online detection of malicious attacks, so cybersecurity is one of the most important fields where the proposed work can be applied. The results showed an improvement in the processing time for very large amounts of data, which makes the approach suitable for online-processing applications such as detecting malicious attacks in cybercrime. The main outcomes of this work are the generation of a synthetic big dataset of ten million records, the setup of multilinear regression with gradient descent to work in the Hadoop environment, and the use of a multi-reducer Hadoop MapReduce job to decrease execution time while achieving low error.
Keywords: Big Data, Hadoop Streaming, MapReduce, Linear Regression.
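
The idea of splitting the gradient computation across reducers can be sketched as follows. This is an illustrative single-process simulation, not the paper's Hadoop job: each chunk plays the role of one reducer's input, and the data, learning rate, iteration count, and function names are assumptions chosen for the example.

```python
def partial_gradient(chunk, weights):
    """Reducer-like step: sum the gradient contributions of one data chunk.

    Each row is (features, target); features[0] is 1.0 for the intercept.
    """
    grad = [0.0] * len(weights)
    for features, target in chunk:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad

def train(chunks, n_weights, learning_rate=0.5, iterations=500):
    """Batch gradient descent where every iteration combines per-chunk sums."""
    total_rows = sum(len(c) for c in chunks)
    weights = [0.0] * n_weights
    for _ in range(iterations):
        # These calls are independent of one another and could run in parallel.
        partials = [partial_gradient(c, weights) for c in chunks]
        for j in range(n_weights):
            weights[j] -= learning_rate * sum(p[j] for p in partials) / total_rows
    return weights

# Synthetic data generated from y = 1 + 2*x1 + 3*x2 (illustrative only).
rows = [([1.0, x1, x2], 1 + 2 * x1 + 3 * x2)
        for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 0.5, 1.0)]
chunks = [rows[0:3], rows[3:6], rows[6:9]]
weights = train(chunks, n_weights=3)
```

Summing the partial gradients gives the same update as a single pass over all rows, so the chunking changes where the work happens but not the result of each iteration.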

	2024	CDI2024
Cybersecurity and cloud platforms are used in a wide variety of applications today. Given this range of applications and their ease of use, their popularity is increasing dramatically, leading many individuals and organizations to depend heavily on them. Securing data, hardware, networks, and other resources against cyber-attacks is therefore crucial for these organizations. The work in this paper proposes a multi-stage approach to detect and predict cyber-attack types, with the aim of enforcing stronger security procedures to protect organizational resources in general and data in particular. The first stage is data collection, for which the Meraz dataset available on the internet is used; different levels of preprocessing are then conducted. The third stage applies different classification algorithms to label samples as malicious or not. The data associated with the classifier that yields the best classification results is then selected for the next level of knowledge extraction, where hierarchical clustering is applied. The clustering is built on the malware samples of the test dataset only. This dataset is divided into training and testing samples, and 10% of the dataset was used to predict the malware type. Hierarchical clustering was used with various configurations; its purpose is to predict the attack type by assigning each attack to a distinct cluster. The proposed approach achieved 98.88% accuracy with the Random Forest classifier, and reliable clustering results were obtained using hierarchical clustering with the Euclidean distance metric and Ward linkage. The prediction values were as follows: {0: 10671, 1: 3603, 2: 824}. The results offer a novel approach for developing machine learning solutions for cloud system security that addresses the limitations of traditional solutions.
Keywords: Cyber Security, Cloud System, Cyber Attacks, Machine Learning, Classification, Clustering

	2011	Journal of Kufa for Mathematics and Computer
This paper presents a framework for crime and criminal data analysis and detection using decision tree algorithms for data classification and the simple K-means algorithm for data clustering. The paper aims to help specialists discover patterns and trends, make forecasts, find relationships and possible explanations, map criminal networks, and identify possible suspects. Classification is based mainly on grouping crimes according to type, location, time, and other attributes; clustering is based on finding relationships between different crime and criminal attributes that share previously unknown common characteristics. The results of both classification and clustering are used to predict the trends and behavior of the given objects (crimes and criminals).

	2018	Journal of Physics: Conference Series
Text classification (TC) is an essential field in both text mining (TM) and natural language processing (NLP). Humans tend to organize and categorize things to make them easier to understand, and text classification is an important step toward that goal. Arabic text classification (ATC) is a difficult process because the Arabic language has complications and limitations resulting from the nature of its morphology. In this paper, a proposed approach called the Master-Slaves technique (MST) was used to improve Arabic text classification. It consists of two main phases. In the first phase, a new Arabic corpus of 16757 text files was collected, and these files were manually classified into five categories. In the second phase, four different classifiers were implemented on the collected corpus. These classifiers are Naïve Bayes (NB), K-Nearest Neighbour (KNN), Multinomial Logistic

	2015	International Journal of Science and Research (IJSR)

	2022	AIP Conference Proceedings
Arabic has characteristics that set it apart from other languages. It is a concatenative language, which creates many problems in processing it. The work in this research project is an attempt to solve the problems facing users of Arabic documents by proposing a new approach in which the retrieved documents are classified into a limited number of groups (clusters), helping users find the relevant documents efficiently and effectively. First the data was collected and preprocessed, and then a complete IR system covering all IR processing was implemented. The system was then improved, from the user's point of view, with clustering techniques that group the retrieved documents into K clusters. Three clustering techniques (K-means as a type of flat clustering, and Ward's and average agglomerative as types of hierarchical clustering) were applied

	2015	Journal of Kufa for Mathematics and Computer
Solving transportation problems, in which products are supplied from one side (sources) to another (demands) with the goal of minimizing the overall transportation cost, is an activity of great importance. Most work in the field treats the problem as a two-sided model (sources such as factories and demands such as warehouses) with no connections among sources or among demands. However, real-world transportation problems may follow another model, in which sources are connected in a network-like graph and each source may supply other sources at a specific cost. The work in this paper suggests an algorithm and a graph model with a mathematical solution for finding the minimum feasible solution for such widely used transportation problems. The graph representing the problem, in which all sources are connected in a network model with a specific cost on each edge, is converted into a new graph by adding virtual sources that represent supplies between sources. New costs between the added sources and the demands are calculated, and then a modified Kruskal's algorithm is applied to obtain the minimum feasible solution. The proposed solution is a straightforward model with strong mathematical and graph foundations. It can be widely used for solving real-world transportation problems with feasible time and space complexity: the required time complexity is O(E^2 + V^2), where E is the number of edges and V is the number of vertices. Different numerical examples were used to study the effectiveness and correctness of the proposed algorithm.
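
For orientation, the textbook form of Kruskal's algorithm that the abstract above builds on can be sketched as below. This is the standard algorithm with a union-find structure, not the paper's modified version, whose details (virtual sources, recalculated costs) are not given here; the four-vertex network and its costs are hypothetical.

```python
def kruskal(num_vertices, edges):
    """Standard Kruskal's algorithm.

    edges: list of (cost, u, v) tuples with vertices numbered from 0.
    Returns (total_cost, chosen_edges) for a minimum spanning forest.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    for cost, u, v in sorted(edges):        # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # edge joins two separate components
            parent[root_u] = root_v
            total += cost
            chosen.append((u, v, cost))
    return total, chosen

# Hypothetical network: vertices 0 and 1 are sources, 2 and 3 are demands.
edges = [(4, 0, 2), (6, 0, 3), (5, 1, 2), (3, 1, 3), (2, 0, 1)]
total_cost, tree = kruskal(4, edges)
```

In this toy network the source-to-source edge (0, 1) is the cheapest and is selected first, which mirrors the abstract's point that links between sources can matter for the overall cost.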

	2022	Iraqi International Conference on Communication and Information Technologies
Supervised learning algorithms play a crucial role in data analysis, where the data are grouped according to predefined class labels. However, many real-world applications come with no class labels in their datasets, which reduces the possibility of converting such data into knowledge. This paper presents an approach for automatic class labeling of objects using a clustering-classification approach. The approach consists of two main phases. The first groups the objects of the dataset into k clusters (unsupervised learning) using the k-modes clustering algorithm, since the available data are categorical. The second converts the clusters obtained in phase one into class labels with their margins (supervised learning). An accuracy of more than 85% was achieved using a logistic regression classifier. The obtained results showed a high range of accuracy, precision

	2016
Clustering is one of the most popular knowledge extraction techniques in data mining. Hierarchical and partitioning approaches are widely used in this field, and each has its own advantages, drawbacks, and goals. K-means is the most popular partitioning clustering technique; however, it suffers from two major drawbacks: its time complexity and its sensitivity to the initial centroid values. The work in this paper presents an approach for estimating the initial centroids through three processes involving density-based, normalization, and smoothing ideas. The proposed algorithm has a strong mathematical foundation

	2020	Indonesian Journal of Electrical Engineering and Computer Science
Clustering is one of the most popular and widely used data mining techniques because of its usefulness and the wide variety of its real-world applications. The required number of clusters depends on the application, which means that the number of clusters k is an input to the whole clustering process. The proposed approach offers a solution for estimating the optimum number of clusters. It is based on iterative K-means clustering under three different criteria: centroid convergence, the total distance between the objects and their cluster centroids, and the number of migrated objects. These criteria can be used effectively to ensure better clustering accuracy and performance. A total of 20000 records available on the internet were used to test the approach. The results showed a good improvement in clustering accuracy and algorithm performance over other techniques, with centroid convergence serving as a major clustering criterion. C# and Microsoft Excel were the software used in the approach.
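
The three criteria named in the abstract above can be made concrete with a small sketch that records them on every K-means iteration. This is an assumption-laden illustration in Python (the paper used C# and Excel): the deterministic initialization from the first k points, the stopping rule, the toy points, and all names are choices made for this example, and the paper's procedure for choosing k from these quantities is not reproduced.

```python
import math

def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100, tol=1e-9):
    """K-means that logs centroid shift, total distance, and migrated objects."""
    centroids = [list(p) for p in points[:k]]   # assumed start: first k points
    assignment = [-1] * len(points)
    history = []
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: distance(p, centroids[c])) for p in points
        ]
        # Criterion 3: how many objects changed cluster in this iteration.
        migrated = sum(a != b for a, b in zip(assignment, new_assignment))
        assignment = new_assignment
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster's centroid
        # Criterion 1: largest movement of any centroid.
        shift = max(distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        # Criterion 2: total distance between objects and their centroids.
        total_distance = sum(distance(p, centroids[a]) for p, a in zip(points, assignment))
        history.append({"shift": shift, "total_distance": total_distance,
                        "migrated": migrated})
        if shift <= tol and migrated == 0:
            break
    return centroids, assignment, history

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.2), (4.8, 5.2)]
centroids, assignment, history = kmeans(points, k=2)
```

Running the same function for several values of k and comparing the final `total_distance` and the number of iterations is one simple way to explore how these criteria respond to the choice of k.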

	2014	Journal of Kerbala University
Traffic flow and tours are among the most important issues in city planning, since their results show what the main streets, highways, and intersections look like and how they connect to one another to give the maximum performance and traffic flow during different time intervals, including rush hours. In this paper we present a traffic model for AlNajaf City based on graph theory, minimum spanning tree, and shortest path algorithms. The model shows the best network paths and alternative tours for traffic flow in the main streets and intersections at different rush hours. Different tools and software were used to implement the proposed model, including MATLAB (Matrix Laboratory), AutoCAD, and others.
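
As an illustration of the shortest path component mentioned above, the sketch below uses Dijkstra's algorithm on a small road graph. The abstract does not say which shortest path algorithm was used, and the paper's implementation relied on MATLAB and AutoCAD, so both the choice of Dijkstra and the intersections A to D with their travel times are assumptions made for this example.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm.

    graph: {node: [(neighbor, weight), ...]} with non-negative weights.
    Returns (distance, path); distance is inf and path is [] if unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue                      # stale queue entry, already improved
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(queue, (nd, neighbor))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:             # walk predecessors back to the source
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical intersections with travel times in minutes.
graph = {
    "A": [("B", 4), ("C", 2)],
    "B": [("A", 4), ("C", 1), ("D", 5)],
    "C": [("A", 2), ("B", 1), ("D", 8)],
    "D": [("B", 5), ("C", 8)],
}
minutes, route = shortest_path(graph, "A", "D")
```

Changing the edge weights to reflect a particular rush hour and rerunning the search is one way such a model could produce alternative routes for different time intervals.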

	2017	International Journal of Computer Science and Information Security

	2010	Journal of Kufa for Mathematics and Computer
This paper presents a framework for crime data analysis and detection using different data mining techniques, including classification, association, prediction, and finally outlier and link analysis. The paper aims to help specialists discover patterns and trends, make forecasts, find relationships and possible explanations, map criminal networks, and identify possible suspects. Classification is mainly based on grouping crimes according to type, location, time, and other attributes; association is based on finding relationships between different crime and criminal attributes; prediction helps to find the trends and behavior of the given objects; and link analysis shows the links between different attributes and the weight of each link. Data on both crimes and criminals were collected from free police department datasets on the internet to create and test the proposed framework, and were then preprocessed with different techniques to obtain clean and accurate data. The preprocessed data were used to identify different crime and criminal groups, associations and relationships between different attributes were discovered, and finally the linkage between different attributes, including crime type and criminal age, job, history, and others, was found. WEKA mining software was used to analyze the data.

	2011	Journal of Kufa for Mathematics and Computer
This paper presents an improved data preprocessing algorithm that handles missing values and smooths outliers in real-world datasets. Previous work in this field is based mainly on replacing missing values with the average, the class average, the most common value, or similar techniques, while outliers were generally removed from the dataset. Crime and criminal datasets have their own special characteristics, in that missing values and outliers carry different meanings than in other fields, so they need to be processed differently. The algorithm is based mainly on using clustering techniques to group the objects according to their similarities and dissimilarities, then smoothing the outliers accordingly and processing the missing values according to their clusters. WEKA is used as a tool to find the different clusters of criminals.

	2010	Journal of Kufa for Mathematics and Computer
This paper presents an approach for analyzing data on Information Technology graduates according to employability knowledge areas, in order to produce feedback recommendations for improving the teaching and learning resources and processes of IT programmes and, in turn, the programme learning outcomes. The approach is based on features (knowledge areas) extracted from logged data on employment and university graduates. Link analysis is an efficient way to study the correlations and relationships between the different attributes that strongly affect jobs in the IT market, including the skill areas in both the market and the programme curriculum, and it gives a good weighted evaluation of these knowledge areas. The link analysis shows strong relationships and associations between these attributes (student performance in the Bachelor degree, analytical and development skills, programming skills (Java, C++, C#, etc.), practical skills, communication skills, and training and certificates) and market demands. A dataset from the IT market and university records is used to create and test the model. WEKA was used as the software for the mining tasks.

	2022	AIP Conference Proceedings
Almost all data mining techniques and algorithms suffer from high time complexity due to the huge amount of data and the nature of the algorithms. In general, the time complexity of different mining algorithms is a function of the number of records in the dataset, the number of features, and the number of distinct values in these features, in addition to some other factors. At the same time, data privacy, ownership, and security represent a big challenge for warehouse projects. The work in this paper aims to reduce this complexity and the access time required for different queries, and to improve privacy, security, and data ownership, through a combination of a warehouse design with many fact tables (Galaxy model) and a data cube approach containing both detailed and highly summarized data. Cause of death free dataset available on the internet that include more than 14 million
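
The notion of a data cube that holds both detailed and highly summarized data can be sketched as follows. This is a generic illustration, not the paper's warehouse design: the dimension names (`year`, `region`, `cause`), the measure `deaths`, and the four fact rows are hypothetical and are not taken from the dataset mentioned in the abstract.

```python
from collections import defaultdict
from itertools import combinations

def build_cube(facts, dimensions, measure):
    """Pre-aggregate `measure` for every subset of `dimensions`.

    The full dimension tuple gives the most detailed cuboid; the empty
    tuple gives the grand total (the most summarized level).
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in facts:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube

# Hypothetical fact rows for illustration only.
facts = [
    {"year": 2019, "region": "North", "cause": "A", "deaths": 10},
    {"year": 2019, "region": "South", "cause": "A", "deaths": 7},
    {"year": 2020, "region": "North", "cause": "B", "deaths": 4},
    {"year": 2020, "region": "North", "cause": "A", "deaths": 6},
]
cube = build_cube(facts, ("year", "region", "cause"), "deaths")

# A summarized query becomes a lookup instead of a scan over the fact rows.
deaths_2019 = cube[("year",)][(2019,)]
```

Precomputing the summarized cuboids trades storage for query time, which is the kind of access-time reduction the abstract describes.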

	2020	International Journal of Advanced Trend in Computer Science and Engineering
Data science and analytics are among the fastest-emerging fields today. Collecting, storing, and analyzing data are challenging because they require the most advanced techniques and technologies. Data warehouses and data marts offer solutions for collecting, storing, and accessing data, and a good warehouse design leads to better analysis results. Among the different application fields, crime data is an important and complex discipline, with many complex relationships between its contents, a wide range of applications, and crucial importance. The aim of this paper is to build an optimal data warehouse for a crime dataset using real crime data collected from the internet. Among the different DW models available in this field, the galaxy model is used in this work. The data warehouse will support the decision-making process of lawmakers and police departments by helping them understand crime subjects and statistics, allowing them to track actions, forecast the probability of crimes occurring, and use resources efficiently, all of which are examined in this paper. Compared with other designs, the proposed DW design shows more reliability, better storage and access capabilities, and fewer anomalies. The proposed design was supported by a crime database design to remove heterogeneity in the data and to apply preprocessing, after which the required data is extracted, transformed, and loaded (ETL) into the warehouse.

	2017	Journal of Kufa for Mathematics and Computer
The performance of different data mining algorithms, including classification, clustering, association, prediction, and others, is closely related to the approach used in the data warehouse design and to the way the data is stored (lightly summarized, highly summarized, or detailed). Detailed data is important for producing detailed reports, but its huge volume poses a big challenge to mining algorithms. Summarized data, on the other hand, leads to better algorithm performance, but the loss of required knowledge may affect the overall mining process. Knowledge extraction and the performance and complexity of mining algorithms represent a big challenge in the data analysis field. The work in this paper therefore proposes an approach to improve algorithm performance through a well-designed warehouse and a data reduction technique.

	2023	Journal of Kufa for Mathematics and Computer
Among the different techniques, algorithms, and applications of data mining, predicting the class label of unlabeled objects (objects with an undefined class label) is a crucial task. The most common approach in this area is the use of classification techniques (DT, Bayes, SVM, KNN, and others), which represent what is known as supervised learning. However, in many cases no target class labels or boundaries are available to perform the prediction, so the new clustering-classification technique is used

	2022	Journal of Kufa for Mathematics and Computer
Data preprocessing in general, and data reduction in particular, are among the main steps in data mining, because real-world data is so vast that analysis takes a long time to complete. Almost all mining techniques, including classification, clustering, association, and others, have high time and space complexity due to the huge amount of data and the behavior of the algorithms themselves. That is why data reduction is an important phase in the Knowledge Discovery in Databases (KDD) process. Many researchers have introduced important solutions in this field. This paper presents a comparative study of about 22 research papers on data reduction, covering techniques such as dimensionality reduction, numerosity reduction, sampling, clustering, data cube aggregation, and others. The study concludes that the appropriate data reduction technique depends heavily on the data type, the dataset size, the application goal, the presence of noise and outliers, and the trade-off between the reduced data and the knowledge required from the analysis.

	2022	Journal of Optoelectronics Laser
Data Reduction (DR) is a crucial factor in the whole KDD process since it improves algorithm performance in both time and space complexity. Many techniques and algorithms are available nowadays for DR, such as data cube aggregation, feature selection and extraction, sampling, clustering, and other techniques

	2020	Journal of Kufa for Mathematics and Computer
The transportation problem is studied in many areas, most importantly in logistics and operations management. Distributing goods and commodities from sources to destinations is an important problem for which many methods have been used to obtain the optimum solution, that is, the minimum cost of distributing the goods from sources to destinations. In the classical formulation, the cost of transporting one unit of a good depends on the source and the destination. In this paper, we suggest an approach for solving transportation problems that involve two or more products, and then use the modified Kruskal's algorithm to find the minimum feasible solution.

	2019	Albahir journal
Many classification algorithms have been applied to the problem of text categorization. A large portion of this work has been carried out on English text, while very few researchers have worked on Arabic text. The nature of Arabic text is very different from that of English text, and preprocessing Arabic text is extremely difficult and more challenging. In this paper, the Maximum Weight (MW) algorithm is applied, after preprocessing, to an Arabic dataset of 16757 text files that is used for the first time. The results showed that MW is applicable to Arabic text, reaching about 0.83 on average. 10-fold cross-validation was used to assess the reliability of the result.

	2019	REVISTA AUS 26-2

		International Journal of Advanced Computer Science and Technology (IJACST)

	2017	International Journal of Advanced Computer Science and Technology (IJACST)

	2017	International Journal of Advanced Computer Science and Technology (IJACST)

	2017	International Journal of Advanced Computer Science and Technology (IJACST)

	2016	Journal of Kufa for Mathematics and Computer
The work in this paper presents a proposed solution for preprocessing, analyzing, and mining personal medical data collected from different hospitals and clinics, together with a data warehouse model for that data. The proposed solution contains several phases and steps. Extraction, Transformation and Loading (ETL) and data preprocessing focus on converting the logged data into categories suitable for analysis and mining; a star warehouse model was implemented to support the required processing techniques; the data are represented by multidimensional cubes for efficient and better data representation; and finally link analysis was applied to the data. The proposed framework is simple and straightforward to implement. Personal medical data from different sources, mostly Excel files, were converted into clean, complete, and consistent data by different preprocessing techniques, so that the logged data became high-quality, reliable, and suitable for analysis and mining. The star warehouse schema was implemented because it is well suited to this type of data and these mining techniques. Records for 19900 patients were collected and used in this work. Excel and WEKA were used for the analysis and mining processes.

	2016	First International Scientific Conference, College of Humanities and Scientific Studies
Data has become as important as raw material in the new economic era, and knowledge is now a crucial factor in the success of different organizations, so these two concepts have risen to the top of the most wanted jobs worldwide. The work in this paper presents a study and analysis of an important type of data, namely employment data. The study applies different mining algorithms, including classification, association, and clustering techniques.

	2016	First International Scientific Conference, College for Humanities and Scientific Studies

	2015	International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064

	2014	Journal of Kufa for Mathematics and Computer

	2013	Journal of College of Education for Pure Sciences

	2012	CITEL2012, University of Kufa

	2009	Iraqi Scientific Conference for Applied Sciences, Kufa University, Iraq, March 2009

	2008	Iraqi National Conference for Higher Education, Iraq, 2008

	2008	Iraqi Scientific Conference in Applied Sciences, Kufa University, Iraq, 2008

	2016	Conference on Strengthening Higher Education, Ministry of Higher Education and Scientific Research, 20-21 January 2016

	2014	First Scientific Conference of the General Secretariat of the Council of Ministers, 2014, Baghdad, Iraq

	2012	Third Quality Assurance Conference, University of Kufa, March 2012.

	2011	Conference on Developing Computer Curricula, University of Al-Qadisiyah, 12-13/1/2011

	2010	Second Annual Quality Assurance Conference, University of Kufa, 26-28/12/2010

	2010	Second Annual Quality Assurance Conference, University of Kufa, 26-28/12/2010

	2023	ICITAMS 2023

	2022	Journal of Kufa for Mathematics and Computer

	2022	Journal of Kufa for Mathematics and Computer

Books

	1999	1999, Dar Al-Manahej, Jordan