150 words: agree or disagree with each question
Data Analytics Life Cycle
The data analytics life cycle is composed of six phases. Work can take place in more than one phase at a time, and the process is designed to move both forward and backward: as the data is extracted, modeled, and analyzed, new knowledge and information can emerge that send the team back to an earlier phase to repeat the work. The analytics team is made up of several roles: the business user, project sponsor, project manager, business intelligence analyst, database administrator, data engineer, and data scientist. One individual may fill several of these roles, or each role may be assigned to a different person. Together they carry out the six phases: discovery, data preparation, model planning, model building, communicating results, and operationalizing (EMC, 2015).
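To make the forward-and-backward movement concrete, here is a small Python sketch that encodes the six phases and the moves available from each one. It assumes, as a simplification not taken from EMC (2015), that a team can advance one phase at a time or return to any earlier phase; the names `Phase` and `allowed_moves` are hypothetical.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six phases of the life cycle, numbered in order."""
    DISCOVERY = 1
    DATA_PREPARATION = 2
    MODEL_PLANNING = 3
    MODEL_BUILDING = 4
    COMMUNICATE_RESULTS = 5
    OPERATIONALIZE = 6


def allowed_moves(current: Phase) -> list[Phase]:
    """Simplifying assumption: go forward one phase, or back to any earlier phase."""
    moves = [p for p in Phase if p < current]  # every earlier phase
    if current < Phase.OPERATIONALIZE:
        moves.append(Phase(current + 1))  # the next phase forward
    return moves


# Example: model planning reveals weak data, so the team may return to data prep.
print([p.name for p in allowed_moves(Phase.MODEL_PLANNING)])
```

Calling `allowed_moves(Phase.MODEL_PLANNING)` lists discovery and data preparation as possible returns and model building as the next step forward.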
Discovery is the first phase of the data analytics life cycle. In this phase, the team learns about the business needs, history, and available resources, and formulates the initial hypothesis. Once this information is collected and the team understands which questions it is trying to answer, it moves on to preparing the data.
The second phase is data preparation. This phase is crucial and often the most time-consuming. The team must mine the data, clean it, and load it into the analytic sandbox, laying the foundation for the phases that follow.
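As a rough illustration of what cleaning can involve, the Python sketch below flags missing, out-of-range, and non-numeric values in a small made-up extract before anything is loaded into the sandbox. The column names, the 0 to 120 age range, and the helper functions are hypothetical; in R, `is.na()` and `complete.cases()` serve a similar purpose for missing values.

```python
import csv
import io

# Hypothetical raw extract; in practice this would come from a source system.
RAW = """customer_id,age,income
1,34,52000
2,,61000
3,-5,48000
4,41,not available
5,29,39000
"""


def find_dirty_rows(text: str) -> list[tuple[int, str]]:
    """Return (customer_id, reason) for rows failing simple quality checks.

    Assumes the age field is either empty or an integer string.
    """
    problems = []
    for row in csv.DictReader(io.StringIO(text)):
        cid = int(row["customer_id"])
        if not row["age"]:
            problems.append((cid, "missing age"))
        elif not 0 <= int(row["age"]) <= 120:  # hypothetical valid range
            problems.append((cid, "age out of range"))
        if not row["income"].isdigit():
            problems.append((cid, "non-numeric income"))
    return problems


def clean_rows(text: str) -> list[dict[str, int]]:
    """Keep only rows that pass every check, converting fields to int."""
    dirty_ids = {cid for cid, _ in find_dirty_rows(text)}
    return [
        {k: int(v) for k, v in row.items()}
        for row in csv.DictReader(io.StringIO(text))
        if int(row["customer_id"]) not in dirty_ids
    ]


print(find_dirty_rows(RAW))
print(clean_rows(RAW))
```

Dropping flagged rows is only one option; a team might instead impute or correct values, depending on what the discovery phase revealed about the data.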
The third phase is model planning. Based on the questions, the initial hypothesis, and the mined data, the team selects the methods and techniques best suited to analyzing the data. The team also selects the variables most likely to be useful in the model.
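One simple way to shortlist variables is to rank candidates by the strength of their correlation with the outcome. The Python sketch below does this with a hand-written Pearson coefficient on invented sales data; it is only one screening technique among many, not the method EMC (2015) prescribes, and the variable names are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_candidates(features: dict[str, list[float]], target: list[float]) -> list[tuple[str, float]]:
    """Rank candidate variables by absolute correlation with the target, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical data: does ad spend or store age track monthly sales better?
sales = [10.0, 12.0, 15.0, 19.0, 24.0]
candidates = {
    "ad_spend": [1.0, 1.5, 2.0, 2.8, 3.5],
    "store_age": [7.0, 3.0, 9.0, 2.0, 6.0],
}
for name, r in rank_candidates(candidates, sales):
    print(f"{name}: r = {r:.2f}")
```

Here `ad_spend` tracks sales almost linearly and ranks first, while `store_age` shows only a weak negative relationship. Correlation captures only linear association, so a low score is a reason to look closer rather than proof that a variable is useless.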
The fourth phase is model building. In this phase, the team develops data sets for training, testing, and production, and then builds and runs the models planned in the previous phase.
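The sketch below illustrates the training/testing idea in Python: it holds out a quarter of some synthetic observations, fits a straight line by ordinary least squares on the rest, and measures error on the held-out points. The data, the 25% split, and the choice of a linear model are assumptions for illustration; a production data set is omitted for brevity.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(pairs):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    return my - b * mx, b


def mean_abs_error(model, pairs):
    """Average absolute gap between observed and predicted y."""
    a, b = model
    return sum(abs(y - (a + b * x)) for x, y in pairs) / len(pairs)


# Hypothetical (x, y) observations that roughly follow y = 2x + 1.
data = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(20)]
train, test = train_test_split(data)
model = fit_line(train)
print("intercept=%.2f slope=%.2f" % model)
print("test MAE=%.2f" % mean_abs_error(model, test))
```

Fixing the random seed keeps the split reproducible, which matters when the team revisits model planning and wants to compare models on the same held-out data.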
The fifth phase is communicating results. In this phase, the team presents its key findings to the stakeholders. It acts as a quality-control step before the final phase of the life cycle.
The sixth and final phase is operationalizing. In this phase, the team delivers the final reports, code, and technical documents. It is here that the team answers the business questions and delivers the solutions.
In conclusion, the data analytics life cycle is a sequence of phases that enables analytics teams to mine, clean, and model data and to build solutions to business questions. The phases are carried out in order, but as the work develops, the team may move back and forth between them until it arrives at coherent and feasible solutions. Lastly, the cycle is carried out by the business roles mentioned in the opening paragraph, all of which are crucial to understanding the data, the process, and the solutions.
EMC. (2015). Data analytics life cycle. In Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. Indianapolis, IN: John Wiley & Sons, Inc.
Please explain which Big Data Analytics phases are most often used for importing and exporting data with R. Which phases are helpful in detecting dirty data? What is the most successful process for visualizing the analyzed variables?
A: The broad answer is that data can be imported and exported with R at any point in the process. That said, most of the data work happens in phase 2 (data preparation), where up to 50% of a project's time may be spent (EMC Education Services, 2015, p. 53). During phases 2 and 3, the collected data is evaluated for how clearly it can answer the proposed question. If it falls short in phase 3, the project can return to phase 2, where more data can be pulled to support the analysis.
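The question asks about R, where `read.csv()` and `write.csv()` are the usual starting points for CSV files. As a language-neutral sketch of the same import/export round trip, the Python code below writes a small hypothetical extract to a temporary sandbox folder, reads it back, and counts empty fields per column as a first check for dirty data. The file name, columns, and helper functions are illustrative assumptions.

```python
import csv
import tempfile
from pathlib import Path


def export_rows(rows: list[dict[str, str]], path: Path) -> None:
    """Write rows to a CSV file with a header line (export)."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def import_rows(path: Path) -> list[dict[str, str]]:
    """Read a CSV file back into a list of dicts (import)."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def count_missing(rows: list[dict[str, str]]) -> dict[str, int]:
    """Count blank fields per column: a quick dirty-data check after import."""
    return {col: sum(1 for r in rows if not r[col].strip()) for col in rows[0]}


# Hypothetical extract moved into and back out of a sandbox folder.
sample = [
    {"region": "East", "sales": "120"},
    {"region": "West", "sales": ""},
    {"region": "", "sales": "95"},
]
with tempfile.TemporaryDirectory() as sandbox:
    path = Path(sandbox) / "extract.csv"
    export_rows(sample, path)
    loaded = import_rows(path)
    print(loaded == sample)
    print(count_missing(loaded))
```

CSV stores everything as text, so numeric columns still need converting and validating after import, which is part of why this work clusters in phase 2.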
Visualizing the variables is also best done in phase 2, because it can root out data that may not be compatible with the study. Techniques such as scatter plots and lag plots may reveal weak correlation between data points, allowing some data to be removed from the model altogether and forcing data scientists to go back and collect data that better fits the study. Raw data can also be sorted and displayed at this point. However, the importance of visualization in phase 5, communicating results, should not be understated. Presenting an overview of the research without visualizations would make it very difficult for managers and CEOs to understand. At this point, you're trying to convince people that the outcome you reached with X data is the right decision, and visualizations are a very important part of that.
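A lag plot draws each value against the value that follows it, so its overall pattern can be summarized by the lag-1 correlation. The Python sketch below computes the points such a plot would show, and that correlation, for two invented series without drawing anything; the series and function names are hypothetical.

```python
from math import sqrt


def lag_pairs(series: list[float], lag: int = 1) -> list[tuple[float, float]]:
    """Points a lag plot would draw: (value at t, value at t + lag)."""
    return list(zip(series[:-lag], series[lag:]))


def correlation(pairs: list[tuple[float, float]]) -> float:
    """Pearson correlation between the x and y coordinates of the points."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / sqrt(vx * vy)


# Hypothetical series: a steady trend versus values that jump around.
trend = [float(v) for v in range(1, 13)]
noisy = [3.0, 8.0, 6.0, 2.0, 7.0, 7.0, 1.0, 5.0, 9.0, 4.0]
for name, series in (("trend", trend), ("noisy", noisy)):
    r = correlation(lag_pairs(series))
    print(f"{name}: lag-1 r = {r:+.2f}")
```

The steadily increasing series gives a correlation of 1, a tight diagonal on a lag plot, while the second series gives a much weaker value (about -0.36), the kind of scattered cloud that suggests little structure worth modeling.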
EMC Education Services. (2015). Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. Indianapolis, IN: John Wiley & Sons, Inc.
Q2: How many of you already know how to program in R? I'm taking classes in it now, and I feel like I'm way behind!