Development Blog #1

August 29th, 2022

  • Met with capstone team to develop a game plan for the rest of the semester
  • Created CI Sync development blog
  • Revisited the work our team had completed last semester

Development Blog #2

September 5th, 2022

  • Met with capstone team to analyze repositories together
  • Created tasks for each member
  • I analyzed the code in ETC_Data_Analysis_Main, the main repository we will be working with and incorporating into a SQLite server:
    • CSV Generator: A project folder that contains the droid.py file. droid.py uses DROID to create CSVs from a given file system hierarchy
    • Database: A project folder that contains code that takes in the CSVs generated by droid.py and creates a database
    • These two project folders will help us create the data we need for visualizations in Tableau
  • Read up on the documentation for these repositories
  • Was able to generate a mock database from already existing CSVs
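As a rough illustration of the CSV-to-database step, here is a minimal sketch using only Python's standard library. It is not the code in the Database project folder: the `load_csvs` function, the `files` table, and the added `project` column are hypothetical, and the sketch assumes every CSV shares the same header row.

```python
import csv
import sqlite3
from pathlib import Path


def load_csvs(csv_dir, conn):
    """Load every CSV in csv_dir into one 'files' table, tagged by file name.

    Assumption: all CSVs share the same header; the table and the extra
    'project' column are hypothetical names chosen for this sketch.
    """
    created = False
    for csv_file in sorted(Path(csv_dir).glob("*.csv")):
        with csv_file.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)
            if header is None:
                continue  # skip empty files
            if not created:
                # Build the table from the first header that is seen
                columns = ", ".join(f'"{name}" TEXT' for name in header)
                conn.execute(
                    f'CREATE TABLE IF NOT EXISTS files ("project" TEXT, {columns})'
                )
                created = True
            placeholders = ", ".join("?" for _ in range(len(header) + 1))
            # Keep only well-formed rows and prefix each with the CSV's name
            rows = (
                [csv_file.stem, *row] for row in reader if len(row) == len(header)
            )
            conn.executemany(f"INSERT INTO files VALUES ({placeholders})", rows)
    conn.commit()
    return conn
```

Calling `load_csvs("csv_out", sqlite3.connect("mock.db"))` would build a small database to experiment with; both the folder and file name there are placeholders.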

Tasks:

  • Study the repository further and get the code in the CSV Generator and Database project folders working again.

Development Blog #3

September 12th, 2022

  • The code for droid.py in CSV Generator is finally working
    • Installed DROID; it now runs on a working directory of project folders
    • This was a bit difficult, as there were program and system dependencies that were not detailed in the documentation
    • It took a lot of research to figure out how to install or work around these dependencies
    • In the process, I familiarized myself with the code in droid.py even further, as I had to debug where the errors were coming from
    • I documented the process and the steps, and I hope to incorporate them into the existing documentation

Tasks:

  • Run the output of droid.py through databasegenerator.py; this should not take long
  • See how we can incorporate this into the SQLite server
  • Talk to Evan

Development Blog #4

September 19th, 2022

  • This week consisted of getting databasegenerator.py to run. Contrary to what I expected, this took a long time.
    • I ran into errors that I was not able to debug effectively. There is not much documentation on DROID online that could point me in the right direction.
    • I tried running the csvgenerator.py file differently, but this did not help
    • Brainstormed different ideas with Evan
    • Professor Kaltman was able to help debug the issue; databasegenerator.py is now running

Tasks:

  • Run this process on the server and take notes along the way

Development Blog #5

September 26th, 2022

  • This week consisted of transferring the repository onto the server and figuring out how to use the command line in the process
  • This took a lot of trial and error; Evan joined me on this task as well
    • There is a line of code that sets the hash type, and for some reason it was being ignored when the code ran on the server
    • It was also being ignored when the code ran in general, so we had to open the GUI to set the hash settings
  • We studied the code to see how the settings were being configured and found resources that pointed us in the right direction. A small edit to that line of code fixed the hash error
  • We ran csvgenerator on the server and finally got CSVs to generate

Tasks:

  • See how we can do data analysis with Siegfried

Development Blog #6

October 3rd, 2022

  • I attempted to run the data analysis with Siegfried and it partially worked
    • I figured out how to tell it to use DROID while also setting the correct hash setting
    • I got stuck because there was no way to tell Siegfried to recursively make CSVs for each project folder within the folder hierarchy
    • I tried to write a script that would trigger Siegfried to do this recursively, but that did not work either
    • For the sake of time, we collectively decided to move away from Siegfried and start creating the database
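For reference, the per-project loop I was aiming for can be sketched roughly as follows. This is not the script I tried, and it is untested against our server. `plan_project_runs` and `run_all` are hypothetical helpers, the identification command is passed in rather than hard-coded, and the sketch assumes the tool writes its CSV to standard output.

```python
import subprocess
from pathlib import Path


def plan_project_runs(root, out_dir, base_cmd):
    """Return one (command, csv_path) pair per project folder directly under root."""
    runs = []
    for project in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        csv_path = Path(out_dir) / f"{project.name}.csv"
        # The project folder is appended as the last argument of the command
        runs.append(([*base_cmd, str(project)], csv_path))
    return runs


def run_all(runs):
    """Run each planned command and capture its standard output as a CSV file."""
    for command, csv_path in runs:
        csv_path.parent.mkdir(parents=True, exist_ok=True)
        with csv_path.open("w", encoding="utf-8") as out:
            subprocess.run(command, stdout=out, check=True)


# Hypothetical usage; the flags are an assumption and should be checked
# against the tool's own help output before relying on them:
# run_all(plan_project_runs("projects", "csv_out", ["sf", "-droid", "-hash", "md5"]))
```

Splitting the planning from the execution makes it possible to print the planned commands and check them before anything runs over the full hierarchy.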

Tasks:

  • Start working on the database with Evan

Development Blog #7

October 10th, 2022

  • We began generating more CSVs from the newer 2019 semester data
  • We looked into how we can clean the data and organize the database
  • I sketched what our potential statistics view could look like
  • Not much progress; baby steps on cleaning

Tasks:

  • Further develop the database

Development Blog #8

October 17th, 2022

  • Further database creation
  • I revisited the Tableau workbook to see how we may want to display the data
  • Was out sick for some time during this week. Not much work was completed on my end

Tasks:

  • Further develop database

Development Blog #9

October 24th, 2022

  • Began analyzing the tables that Evan created
    • I ran some tests and trial runs to understand the database a bit better
    • I revisited my data visualization notes to think through how we can categorize fields and create calculations
    • We wanted data analysis at the project level, technology level, and year level
      • I attempted to query the data this way. For example, there is a field called “mime type” that describes the type of technology in the project. I began parsing this field so that only a single technology appears. I did not get very far, as the database is huge.
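The parsing idea can be sketched as below. This is only an illustration: `primary_technology` is a hypothetical helper, and it assumes that a field holding several MIME types separates them with commas and that the MIME subtype is a good enough label for the technology.

```python
def primary_technology(mime_field):
    """Reduce a MIME type field to a single technology label.

    Assumptions: multiple MIME types are comma-separated, and the subtype
    (the part after the slash) is an acceptable stand-in for "technology".
    """
    if not mime_field or not mime_field.strip():
        return "unknown"
    # Keep only the first MIME type when several are listed
    first = mime_field.split(",")[0].strip().lower()
    # "image/png" becomes "png"; values without a slash pass through unchanged
    subtype = first.split("/", 1)[-1]
    # Drop parameters such as "; charset=utf-8"
    return subtype.split(";", 1)[0].strip()
```

With that in place, `primary_technology("application/xml, text/xml")` gives `"xml"`, and an empty field falls back to `"unknown"`.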

Tasks:

  • Work on the ETC Statistics View

Development Blog #10

October 31st, 2022

  • Created the ETC Statistics View
    • This view contains the fields and statistics (calculations) that we want to visualize in Tableau
    • I created multiple CTEs that calculate file counts, redundancy, project size, and more
    • We can definitely optimize the code further to lower the execution time. I filtered the data to a single project so that I could test these calculations without waiting for a query over the entire database.
    • As of right now, the view still uses ETC_DIRTY
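To give a sense of the shape of such a view, here is a small self-contained sketch run through Python's sqlite3 module. It is not the actual ETC Statistics View: the `files` table, its columns, the `etc_statistics` view name, and the sample rows are all made up, and redundancy is assumed to mean files whose hash duplicates another file in the same project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the real table
CREATE TABLE files (project TEXT, name TEXT, size INTEGER, hash TEXT);
INSERT INTO files VALUES
  ('alpha', 'a.png',      100, 'h1'),
  ('alpha', 'a_copy.png', 100, 'h1'),
  ('alpha', 'b.wav',      300, 'h2'),
  ('beta',  'c.txt',       50, 'h3');

CREATE VIEW etc_statistics AS
WITH file_counts AS (
    -- File count and total size per project
    SELECT project, COUNT(*) AS file_count, SUM(size) AS project_size
    FROM files
    GROUP BY project
),
redundancy AS (
    -- Files beyond the first copy of each distinct hash
    SELECT project, COUNT(*) - COUNT(DISTINCT hash) AS redundant_files
    FROM files
    GROUP BY project
)
SELECT fc.project, fc.file_count, fc.project_size, r.redundant_files
FROM file_counts AS fc
JOIN redundancy AS r ON r.project = fc.project;
""")


def project_stats(conn, project):
    """Query the view for one project, mirroring the single-project test filter."""
    return conn.execute(
        "SELECT file_count, project_size, redundant_files "
        "FROM etc_statistics WHERE project = ?",
        (project,),
    ).fetchone()
```

Filtering on one project, as in `project_stats(conn, "alpha")`, is the same trick I used to check the calculations without querying everything.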

Tasks:

  • Possibly optimize the data
  • Run the view on the entire data set to see how we should move forward

Development Blog #11

November 7th, 2022

  • Worked more on the
  • It was taking a long time to finish querying the data
  • We combined the CTEs as much as we could
    • Some trial and error here. We use a GROUP BY clause to group the calculations by project, which brought up some errors.
  • Joined Evan to see how he cleaned the data set
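The idea behind combining CTEs is that several per-project aggregates can often be computed in one pass over the table with a single GROUP BY. The sketch below shows this on a made-up `files` table; the table, columns, and rows are hypothetical and are not our actual schema or query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the real table
CREATE TABLE files (project TEXT, name TEXT, size INTEGER, hash TEXT);
INSERT INTO files VALUES
  ('alpha', 'a.png',      100, 'h1'),
  ('alpha', 'a_copy.png', 100, 'h1'),
  ('alpha', 'b.wav',      300, 'h2'),
  ('beta',  'c.txt',       50, 'h3');
""")

# One scan of the table instead of one scan per CTE.
# Every non-aggregated column in the SELECT list (here only "project")
# should also appear in GROUP BY; many database engines reject the query
# otherwise, while SQLite quietly picks a value from an arbitrary row.
COMBINED_QUERY = """
SELECT project,
       COUNT(*)                        AS file_count,
       SUM(size)                       AS project_size,
       COUNT(*) - COUNT(DISTINCT hash) AS redundant_files
FROM files
GROUP BY project
ORDER BY project
"""

rows = conn.execute(COMBINED_QUERY).fetchall()
```

Whether this helps on the real data depends on how the original CTEs were written, so it is worth comparing timings before and after.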

Tasks:

  • Look further into how we can optimize the data set / database, as the execution time is still very high

Development Blog #12

November 21st, 2022

  • DataGrip was still taking a long time to query the data
  • We connected the database to Tableau; the query ran for up to 300 minutes before Tableau closed unexpectedly
  • Our data visualization process was put on hold last week; hopefully we can build visualizations before Wednesday. Professor Kaltman helped us index the database so that it now executes everything much quicker than before
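To illustrate why indexing helps, the sketch below builds a small made-up table in SQLite and compares the query plan before and after adding an index on the column used for filtering. The table, the data, and the index name `idx_files_project` are hypothetical; the indexes on our actual database may be different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (project TEXT, name TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [(f"project_{i % 50}", f"file_{i}", i) for i in range(5000)],
)

QUERY = "SELECT SUM(size) FROM files WHERE project = ?"


def query_plan(conn):
    """Return SQLite's plan description for QUERY as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("project_7",)).fetchall()
    # The last column of each plan row holds the human-readable detail
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # without an index, the whole table is scanned
conn.execute("CREATE INDEX idx_files_project ON files (project)")
plan_after = query_plan(conn)   # with the index, the plan names idx_files_project
```

Printing `plan_before` and `plan_after` shows whether SQLite switched from a full table scan to an index search; the exact wording of the plan varies between SQLite versions.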

Tasks:

  • Create visualizations
