Web Application for Statistical Analysis and Machine Learning Algorithms with R and Python Programming Languages

Overview

<aside> 💡 Machine learning algorithms are powerful tools for supporting decision-making, especially when the dependent variable depends on many numerical or categorical variables. In this project, I created a web application in which users can upload their own datasets, view the descriptive statistics they select, and predict the dependent variable with a machine learning algorithm of their choice. Hyper-parameters can be optimized, and users can further customize the analysis. An interface that presents the results (prediction values, confusion matrix, accuracy, etc.) is planned.

</aside>

Application

The main window of the application is shown below. Users move between windows via the sidebar panel, and each window has tabset panels for navigating within it.

Figure 1. Screenshot of the data upload window for classification modeling.

Users can upload CSV data by clicking the browse button. The application offers three separator options, so users can upload comma-, semicolon-, or tab-separated data. The “Header” radio button specifies whether the file includes a header row.

Users can choose whether to display all the data or just the first five observations with the “Display” radio buttons.
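The upload options described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the application's actual code: the names `SEPARATORS`, `load_table`, and `preview`, and the placeholder column names `V1, V2, ...`, are assumptions made for the example.

```python
import csv
import io

# Hypothetical mapping from the three separator choices to delimiter characters.
SEPARATORS = {"comma": ",", "semicolon": ";", "tab": "\t"}


def load_table(text, separator="comma", header=True):
    """Parse delimited text into (column_names, rows)."""
    reader = csv.reader(io.StringIO(text), delimiter=SEPARATORS[separator])
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return [], []
    if header:
        return rows[0], rows[1:]
    # Without a header row, generate placeholder names V1, V2, ...
    names = [f"V{i + 1}" for i in range(len(rows[0]))]
    return names, rows


def preview(rows, display="head", n=5):
    """Return the first n rows for "head", or every row for "all"."""
    return rows[:n] if display == "head" else rows


sample = "age;income;label\n31;4200;yes\n45;3900;no\n"
names, rows = load_table(sample, separator="semicolon", header=True)
```

Here `preview(rows)` mirrors the "first five observations" choice, while `preview(rows, display="all")` returns everything.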

Figure 2. Data Display Screen

To see the data, users can click “View Data” in the tabset panel. The data viewer lets users search for specific observations and sort the variables in ascending or descending order.

Users can obtain summary statistics of data from this tabset panel.
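As a rough sketch of what such a summary might contain for a single numeric variable, the example below uses Python's standard `statistics` module. The exact set of statistics the application reports is not stated here, so the selection in the hypothetical `summarize` function (count, minimum, median, mean, maximum, sample standard deviation) is an assumption.

```python
import statistics


def summarize(values):
    """Descriptive statistics for one numeric column (a list of numbers)."""
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        # Sample standard deviation needs at least two observations.
        "sd": statistics.stdev(values) if len(values) > 1 else float("nan"),
    }


stats = summarize([2.0, 4.0, 4.0, 6.0])
```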

Figure 8. Screenshot of the machine learning module

This tabset panel provides information about the usage of the machine learning module.

The Machine Learning module of the application allows users to preprocess their data for modeling. In these windows, users can specify the dependent and independent variables, handle outliers and missing values in categorical and numeric variables with the chosen filling options, and encode dummy variables with the provided encoding options. The preprocessed data can also be viewed from this window.
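The preprocessing steps named above can be illustrated with a small standard-library sketch. The specific strategies shown (mean or median filling, clipping to 1.5 × IQR fences, one indicator column per level) are common choices picked for the example and are assumptions, not a description of the options the application provides; the function names are hypothetical as well.

```python
import statistics


def fill_missing(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest bound."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator column per level."""
    levels = sorted(set(values))
    return {level: [int(v == level) for v in values] for level in levels}
```

For instance, `one_hot(["a", "b", "a"])` yields one indicator list per level, and `fill_missing([1.0, None, 3.0])` replaces the gap with the mean of the observed values.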

This tabset panel makes modeling possible even with raw data.

Figure 9. Screenshot of the model fitting panel

Users can choose the desired model from the “Choose a model” drop-down list and decide whether to split the data into training and test sets. If so, the proportion can be set with the slider input.
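A train-test split driven by a proportion value, such as the one selected with the slider, might look like the sketch below. The function name, the default proportion of 0.8, and the fixed random seed are assumptions for illustration; the application's own splitting logic may differ.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle row indices and split them by the chosen training proportion."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    cut = round(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


train, test = train_test_split(list(range(10)), train_fraction=0.7)
```

Shuffling before cutting avoids a biased split when the uploaded file is sorted by the dependent variable.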

After a model is selected from the “Choose a model” drop-down list, all of its outputs are displayed immediately in the same window.

Figure 10. Screenshot of the Logistic Regression model summary
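For a classification model, two of the results mentioned in the overview (confusion matrix and accuracy) can be computed from actual and predicted labels as in the sketch below. The function names and the small example labels are made up for illustration and do not come from the application.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs; rows are actual, columns predicted."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(actual, predicted):
    """Share of observations whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Illustrative labels only.
actual = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes"]
labels, matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)
```

The diagonal of `matrix` holds the correctly classified counts, so accuracy equals the diagonal sum divided by the number of observations.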