
Five Key Trends in AI and Data Science for 2024

These developing issues should be on every leader’s radar screen, data executives say.


Artificial intelligence and data science became front-page news in 2023. The rise of generative AI, of course, drove this dramatic surge in visibility. So, what might happen in the field in 2024 that will keep it on the front page? And how will these trends really affect businesses?

During the past several months, we’ve conducted three surveys of data and technology executives. Two involved MIT’s Chief Data Officer and Information Quality Symposium attendees — one sponsored by Amazon Web Services (AWS) and another by Thoughtworks. The third survey was conducted by Wavestone, formerly NewVantage Partners, whose annual surveys we’ve written about in the past. In total, the new surveys involved more than 500 senior executives, perhaps with some overlap in participation.


Surveys don’t predict the future, but they do suggest what those people closest to companies’ data science and AI strategies and projects are thinking and doing. According to those data executives, here are the top five developing issues that deserve your close attention:

1. Generative AI sparkles but needs to deliver value.

As we noted, generative AI has captured a massive amount of business and consumer attention. But is it really delivering economic value to the organizations that adopt it? The survey results suggest that although excitement about the technology is very high, value has largely not yet been delivered. Large percentages of respondents believe that generative AI has the potential to be transformational; 80% of respondents to the AWS survey said they believe it will transform their organizations, and 64% in the Wavestone survey said it is the most transformational technology in a generation. A large majority of survey takers are also increasing investment in the technology. However, most companies are still just experimenting, either at the individual or departmental level. Only 6% of companies in the AWS survey had any production application of generative AI, and only 5% in the Wavestone survey had any production deployment at scale.

Surveys suggest that though excitement about generative AI is very high, value has largely not yet been delivered.

Production deployments of generative AI will, of course, require more investment and organizational change, not just experiments. Business processes will need to be redesigned, and employees will need to be reskilled (or, probably in only a few cases, replaced by generative AI systems). The new AI capabilities will need to be integrated into the existing technology infrastructure.

Perhaps the most important change will involve data — curating unstructured content, improving data quality, and integrating diverse sources. In the AWS survey, 93% of respondents agreed that data strategy is critical to getting value from generative AI, but 57% had made no changes to their data thus far.
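This kind of groundwork can be illustrated with a small, hypothetical curation pass over unstructured documents before they feed a generative AI pipeline; the field names, thresholds, and sample records below are invented for illustration, not any particular organization’s schema.

```python
import hashlib

def curate_documents(docs, min_length=200):
    """Basic curation pass: drop near-empty records, deduplicate exact copies,
    and flag items missing the provenance metadata needed to integrate sources."""
    seen_hashes = set()
    curated, issues = [], []
    for doc in docs:
        text = (doc.get("text") or "").strip()
        if len(text) < min_length:
            issues.append((doc.get("id"), "too short"))
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            issues.append((doc.get("id"), "duplicate"))
            continue
        seen_hashes.add(digest)
        if not doc.get("source") or not doc.get("updated_at"):
            issues.append((doc.get("id"), "missing provenance metadata"))
        curated.append(doc)
    return curated, issues

docs = [
    {"id": 1, "text": "Quarterly revenue guidance ... " * 50, "source": "finance-wiki", "updated_at": "2023-11-01"},
    {"id": 2, "text": "Quarterly revenue guidance ... " * 50, "source": "finance-wiki", "updated_at": "2023-11-01"},
]
clean, problems = curate_documents(docs)
print(len(clean), problems)  # 1 [(2, 'duplicate')]
```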

2. Data science is shifting from artisanal to industrial.

Companies feel the need to accelerate the production of data science models . What was once an artisanal activity is becoming more industrialized. Companies are investing in platforms, processes and methodologies, feature stores, machine learning operations (MLOps) systems, and other tools to increase productivity and deployment rates. MLOps systems monitor the status of machine learning models and detect whether they are still predicting accurately. If they’re not, the models might need to be retrained with new data.
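As a rough sketch of the monitoring step such systems perform (the metric, threshold, and function names here are illustrative, not any particular MLOps product’s API):

```python
from sklearn.metrics import accuracy_score

def check_model_health(model, recent_X, recent_y, baseline_accuracy, tolerance=0.05):
    """Compare live accuracy on recently labeled data against the accuracy
    recorded at deployment; flag the model for retraining if it has degraded
    by more than the tolerance."""
    live_accuracy = accuracy_score(recent_y, model.predict(recent_X))
    needs_retraining = live_accuracy < baseline_accuracy - tolerance
    return live_accuracy, needs_retraining

# In a scheduled monitoring job (variables assumed to exist):
# live_acc, retrain = check_model_health(model, X_recent, y_recent, baseline_accuracy=0.91)
# if retrain:
#     model.fit(X_train_updated, y_train_updated)  # retrain with fresh data
```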

Producing data models — once an artisanal activity — is becoming more industrialized.

Most of these capabilities come from external vendors, but some organizations are now developing their own platforms. Although automation (including automated machine learning tools, which we discuss below) is helping to increase productivity and enable broader data science participation, the greatest boon to data science productivity is probably the reuse of existing data sets, features or variables, and even entire models.

3. Two versions of data products will dominate.

In the Thoughtworks survey, 80% of data and technology leaders said that their organizations were using or considering the use of data products and data product management. By data product, we mean packaging data, analytics, and AI in a software product offering for internal or external customers. It’s managed from conception to deployment (and ongoing improvement) by data product managers. Examples of data products include recommendation systems that guide customers on what products to buy next and pricing optimization systems for sales teams.
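A minimal sketch of what such a packaged data product might look like in code, with invented names and toy recommendation logic rather than any specific vendor’s interface:

```python
from dataclasses import dataclass, field

@dataclass
class RecommendationDataProduct:
    """A 'data product': curated data plus an AI capability, owned and
    versioned by a data product manager like any other software product."""
    name: str
    version: str
    purchase_history: dict = field(default_factory=dict)  # customer_id -> list of product ids

    def recommend(self, customer_id, catalog, top_n=3):
        # Toy logic: suggest catalog items the customer has not bought yet.
        bought = set(self.purchase_history.get(customer_id, []))
        return [p for p in catalog if p not in bought][:top_n]

product = RecommendationDataProduct(
    name="next-best-offer", version="1.2.0",
    purchase_history={"c42": ["sku-1", "sku-7"]},
)
print(product.recommend("c42", catalog=["sku-1", "sku-2", "sku-7", "sku-9"]))
# ['sku-2', 'sku-9']
```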

But organizations view data products in two different ways. Just under half (48%) of respondents said that they include analytics and AI capabilities in the concept of data products. Some 30% view analytics and AI as separate from data products and presumably reserve that term for reusable data assets alone. Just 16% say they don’t think of analytics and AI in a product context at all.

We have a slight preference for a definition of data products that includes analytics and AI, since that is the way data is made useful. But all that really matters is that an organization is consistent in how it defines and discusses data products. If an organization prefers a combination of “data products” and “analytics and AI products,” that can work well too, and that definition preserves many of the positive aspects of product management. But without clarity on the definition, organizations could become confused about just what product developers are supposed to deliver.

4. Data scientists will become less sexy.

Data scientists, who have been called “unicorns” and the holders of the “sexiest job of the 21st century” because of their ability to make all aspects of data science projects successful, have seen their star power recede. A number of changes in data science are producing alternative approaches to managing important pieces of the work. One such change is the proliferation of related roles that can address pieces of the data science problem. This expanding set of professionals includes data engineers to wrangle data, machine learning engineers to scale and integrate the models, translators and connectors to work with business stakeholders, and data product managers to oversee the entire initiative.

Another factor reducing the demand for professional data scientists is the rise of citizen data science, wherein quantitatively savvy businesspeople create models or algorithms themselves. These individuals can use AutoML, or automated machine learning tools, to do much of the heavy lifting. Even more helpful to citizens is the modeling capability available in ChatGPT called Advanced Data Analysis. With a very short prompt and an uploaded data set, it can handle virtually every stage of the model creation process and explain its actions.
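As a hedged illustration of this kind of automated model selection, the sketch below uses scikit-learn as a stand-in for a full AutoML tool (which would also automate feature engineering and preprocessing); the dataset and search space are chosen only to keep the example self-contained.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A tiny "AutoML" search: try two model families and a few settings each.
pipeline = Pipeline([("scale", StandardScaler()), ("model", LogisticRegression(max_iter=1000))])
search_space = [
    {"model": [LogisticRegression(max_iter=1000)], "model__C": [0.1, 1.0, 10.0]},
    {"model": [RandomForestClassifier(random_state=0)], "model__n_estimators": [100, 300]},
]
search = GridSearchCV(pipeline, search_space, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print("held-out accuracy:", search.best_estimator_.score(X_test, y_test))
```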

Of course, there are still many aspects of data science that do require professional data scientists. Developing entirely new algorithms or interpreting how complex models work, for example, are tasks that haven’t gone away. The role will still be necessary but perhaps not as much as it was previously — and without the same degree of power and shimmer.

5. Data, analytics, and AI leaders are becoming less independent.

This past year, we began to notice that increasing numbers of organizations were cutting back on the proliferation of technology and data “chiefs,” including chief data and analytics officers (and sometimes chief AI officers). That CDO/CDAO role, while becoming more common in companies, has long been characterized by short tenures and confusion about its responsibilities. We’re not seeing the functions performed by data and analytics executives go away; rather, they’re increasingly being subsumed within a broader set of technology, data, and digital transformation functions managed by a “supertech leader” who usually reports to the CEO. Titles for this role include chief information officer, chief information and technology officer, and chief digital and technology officer; real-world examples include Sastry Durvasula at TIAA, Sean McCormack at First Group, and Mojgan Lefebvre at Travelers.


This evolution in C-suite roles was a primary focus of the Thoughtworks survey, and 87% of respondents (primarily data leaders but some technology executives as well) agreed that people in their organizations are either completely, to a large degree, or somewhat confused about where to turn for data- and technology-oriented services and issues. Many C-level executives said that collaboration with other tech-oriented leaders within their own organizations is relatively low, and 79% agreed that their organization had been hindered in the past by a lack of collaboration.

We believe that in 2024, we’ll see more of these overarching tech leaders who have all the capabilities to create value from the data and technology professionals reporting to them. They’ll still have to emphasize analytics and AI because that’s how organizations make sense of data and create value with it for employees and customers. Most importantly, these leaders will need to be highly business-oriented, able to debate strategy with their senior management colleagues, and able to translate it into systems and insights that make that strategy a reality.

About the Authors

Thomas H. Davenport (@tdav) is the President’s Distinguished Professor of Information Technology and Management at Babson College, a fellow of the MIT Initiative on the Digital Economy, and senior adviser to the Deloitte Chief Data and Analytics Officer Program. He is coauthor of All in on AI: How Smart Companies Win Big With Artificial Intelligence (HBR Press, 2023) and Working With AI: Real Stories of Human-Machine Collaboration (MIT Press, 2022). Randy Bean (@randybeannvp) is an industry thought leader, author, founder, and CEO and currently serves as innovation fellow, data strategy, for global consultancy Wavestone. He is the author of Fail Fast, Learn Faster: Lessons in Data-Driven Leadership in an Age of Disruption, Big Data, and AI (Wiley, 2021).


Survey Paper | Open access | Published: 18 December 2021

A new theoretical understanding of big data analytics capabilities in organizations: a thematic analysis

Renu Sabharwal and Shah Jahan Miah (ORCID: orcid.org/0000-0002-3783-8769)

Journal of Big Data, volume 8, article number 159 (2021)


Big Data Analytics (BDA) usage in industry has increased markedly in recent years. As a data-driven tool to facilitate informed decision-making, the need for BDA capability in organizations is recognized, but few studies have communicated an understanding of BDA capabilities in a way that can enhance our theoretical knowledge of using BDA in the organizational domain. Big Data has been defined in various ways, and this research explores the past literature on the classification of BDA and its capabilities. We conducted a literature review using the PRISMA methodology and integrated a thematic analysis using NVIVO12. Applying the five steps of the PRISMA framework to 70 sample articles, we generate five themes, which are informed by organization development theory, and develop a novel empirical research model, which we submit for validity assessment. Our findings can improve the effectiveness and enhance the usage of BDA applications in various organizations.

Introduction

Organizations today continuously harvest user data to improve their business efficiencies and practices. Significant volumes of stored data and data regarding electronic transactions are used in support of decision-making, with managers, policymakers, and executive officers now routinely embracing technology to transform abundant raw data into useful, informative insight. Data analysis is complex, but one data-handling method, “Big Data Analytics” (BDA)—the application of advanced analytic techniques, including data mining, statistical analysis, and predictive modeling, on big datasets as a new business intelligence practice [ 1 ]—is widely applied. BDA uses computational intelligence techniques to transform raw data into information that can be used to support decision-making.

Because decision-making in organizations has become increasingly reliant on Big Data, analytical applications have increased in importance for evidence-based decision making [ 2 ]. The need for systematic reviews of Big Data stream analysis that use rigorous and methodical approaches to identify trends in Big Data stream tools, analysis techniques, technologies, and methods is becoming increasingly important [ 3 ]. Organizational factors such as resource adjustment, environmental acceptance, and organizational management relate to how an organization implements its BDA capability and enhances its benefits through BDA technologies [ 4 ]. It is evident from past literature that BDA supports the organizational decision-making process when a suitable theoretical understanding is developed, but extending existing theories remains a significant challenge. Improved BDA capability helps ensure that organizational products and services are continuously optimized to meet the evolving needs of consumers.

Previous systematic reviews have focused on future BDA adoption challenges [ 5 , 6 , 7 ] or technical innovation aspects of Big Data analytics [ 8 , 9 ]. This signifies that numerous studies have examined Big Data issues in different domains. These domains include: quality of Big Data in financial service organizations [ 10 ]; organizational value creation because of BDA usage [ 11 ]; application of Big Data in health organizations [ 9 ]; decision improvement using Big Data in health [ 12 ]; application of Big Data in transport organizations [ 13 ]; relationships involving Big Data in financial domains [ 14 ]; and quality of Big Data and its impact on government organizations [ 15 ].

While there has been a progressive increase in research on BDA, its capabilities and how organizations may exploit them are less well studied [ 16 ]. We apply a PRISMA framework [ 17 ] and qualitative thematic analysis to create a model that defines the relationship between BDAC and OD. The proposed research presents an overview of BDA capabilities and how they can be utilized by organizations, and discusses the implications for future research development. Specifically, we (1) provide an observation into key themes regarding BDAC concerning state-of-the-art research in BDA, and (2) show an alignment to organizational development theory in terms of a new empirical research model, which will be submitted for validity assessment in future research of BDAC in organizations.

According to [ 20 ], a systematic literature review first involves describing the key approach and establishing definitions for key concepts. We use a six-phase process to identify, analyze, and sequentially report themes using NVIVO 12.

Study background

Many forms of BDA exist to meet the specific decision-support demands of different organizations. Three BDA analytical classes exist: (1) descriptive, dealing with straightforward questions regarding what is or has happened and why, identifying ‘opportunities and problems’ using descriptive statistics such as historical insights; (2) predictive, dealing with questions such as what will or is likely to happen, by exploring data patterns with relatively complex statistics, simulation, and machine-learning algorithms (e.g., to identify trends in sales activities, or forecast customer behavior and purchasing patterns); and (3) prescriptive, dealing with questions regarding what should be happening and how to influence it, using complex descriptive and predictive analytics with mathematical optimization, simulation, and machine-learning algorithms (e.g., many large-scale companies have adopted prescriptive analytics to optimize production or solve scheduling and inventory management issues) [ 18 ]. Regardless of the type of BDA analysis performed, its application significantly impacts tangible and intangible resources within an organization.
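To make the three classes concrete, the sketch below runs a deliberately small example of each on an invented monthly sales table; the figures and the prescriptive rule are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

sales = pd.DataFrame({
    "month": np.arange(1, 13),
    "units": [110, 120, 118, 130, 142, 150, 149, 160, 171, 175, 188, 196],
})

# 1. Descriptive: what has happened?
print(sales["units"].describe())            # historical summary statistics

# 2. Predictive: what is likely to happen next?
model = LinearRegression().fit(sales[["month"]], sales["units"])
forecast = model.predict(pd.DataFrame({"month": [13]}))
print("forecast for month 13:", forecast[0])

# 3. Prescriptive: what should we do about it?
# Toy decision rule: order enough stock to cover the forecast plus a safety margin.
safety_margin = 0.10
print("recommended order quantity:", int(np.ceil(forecast[0] * (1 + safety_margin))))
```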

Previous studies on BDA

BDA tools or techniques are used to analyze Big Data (such as social media or substantial transactional data) to support strategic decision-making [ 19 ] in different domains (e.g., tourism, supply chain, healthcare), and numerous studies have developed and evaluated BDA solutions to improve organizational decision support. We categorize previous studies into two main groups based on non-technical aspects: those which relate to the development of new BDA requirements and functionalities in a specific problem domain and those which focus on more intrinsic aspects such as BDAC development or value-adding because of their impact on particular aspects of the business. Examples of reviews focusing on technical or problem-solving aspects are detailed in Table 1 .

The second literature group examines BDA in an organizational context, such as improving firm performance using Big Data analytics in specific business domains [ 26 ]. Several studies show that BDA supports different aspects of organizational performance [ 20 , 24 , 25 , 27 , 28 , 29 ] (Table 2). Other research examines BDA as a means to improve data utilization and decision-support quality. For example, [ 30 ] explained how BDAC might be developed to improve managerial decision-making processes, and [ 4 ] conducted a thematic analysis of 15 firms to identify the factors related to the success of BDA capability development in SCM.

Potential applications of BDA

Many retail organizations use analytical approaches to gain commercial advantage and organizational success [ 31 ]. Modern organizations increasingly invest in BDA projects to reduce costs, improve decision-making accuracy, and support future business planning. For example, Amazon, an early online retailer, has maintained its innovative improvement and use of BDA [ 31 ]. Examples of successful BDA use in business sectors include the following.

Retail: business organizations use BDA for dynamic (surge) pricing [ 32 ], adjusting product or service prices based on demand and supply. For instance, Amazon uses dynamic pricing, raising prices as product demand rises (a toy illustration of such a rule follows this list of examples).

Hospitality: Marriott—one of the largest hospitality companies, with a rapidly increasing number of hotels and served customers—uses BDA to improve sales [ 33 ].

Entertainment: Netflix uses BDA to retain clientele and increase sales and profits [ 34 , 35 ].

Transportation: Uber uses BDA [ 36 ] to capture Big Data from various consumers and identify the best routes to destinations. Uber Eats, despite competing with other delivery companies, aims to deliver food in the shortest possible time.

Foodservice: McDonald's continuously updates its information with BDA; following a recent shift toward food quality, it now sells healthier food to consumers [ 37 ] and has adopted a dynamic menu [ 38 ].

Finance: American Express has used BDA for a long time and was one of the first companies to understand the benefits of using BDA to improve business performance [ 39 ]. Big Data is collected on the ways consumers make on- and offline purchases, and predictions are made as to how they will shop in the future.

Manufacturing: General Electric manufactures and distributes products such as wind turbines, locomotives, airplane engines, and ship engines [ 40 ]. By handling huge amounts of data from electricity networks, meteorological information systems, and geographical information systems, it can bring benefits to the existing power system, including improved customer service and social welfare in the era of big data.

Online business: music streaming websites are increasingly popular and continue to grow in size and scope because consumers want a customized streaming service [ 41 ]. Many streaming services (e.g., Apple Music, Spotify, Google Music) use various BDA applications to suggest new songs to consumers.
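Returning to the dynamic-pricing example mentioned under Retail above, a toy version of such a rule might look like the following; the multiplier logic and cap are invented for illustration and are not any company’s actual algorithm.

```python
def surge_price(base_price, demand, supply, cap=2.0):
    """Toy dynamic-pricing rule: scale the base price by the demand/supply
    ratio, capped so the multiplier never exceeds `cap` and never drops below 1."""
    multiplier = min(max(demand / max(supply, 1), 1.0), cap)
    return round(base_price * multiplier, 2)

print(surge_price(base_price=10.0, demand=180, supply=120))  # 15.0 (1.5x multiplier)
print(surge_price(base_price=10.0, demand=90, supply=120))   # 10.0 (never below base)
```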

Organization value assessment with BDA

Specific performance measures must be established that rely on a number of organizational contextual factors, such as the organization's goals, its external environment, and the organization itself. When considering these contexts for the use of BDA to strengthen process innovation skills, it is important to note that the approach required to achieve positive results depends on different combinations of these factors, along with the area in which BDA is deployed [ 42 ].

Organizational development and BDA

To assist organizational decision-making for growth, effective processes are required to perform operations such as continuous diagnosis, action planning, and the implementation and evaluation of BDA. Lewin’s Organizational Development (OD) theory regards such processes as having the goal of transferring knowledge and skills to an organization, mainly to improve problem-solving capacity and to manage future change. Beckhard [ 43 ] defined OD as the internal dynamics of an organization, which involve a collection of individuals working as a group to improve organizational effectiveness, capability, work performance, and the ability to adjust culture, policies, practices, and procedure requirements.

OD is ‘a system-wide application and transfer of behavioral science knowledge to the planned development, improvement, and reinforcement of the strategies, structures, and processes that lead to organization effectiveness’ [ 44 ], and it has three concepts: organizational climate, culture, and capability [ 45 ]. Organizational climate is ‘the mood or unique personality of an organization’ [ 45 ], which includes shared perceptions of policies, practices, and procedures; climate features also include leadership, communication, participative management, and role clarity. Organizational culture involves shared basic assumptions, values, norms, behavioral patterns, and artifacts, defined by [ 46 ] as a pattern of shared basic assumptions that a group learned by solving problems of external adaptation and internal integration (p. 38). Organizational capacity (OC) refers to an organization’s ability to function, such as producing services or products or maintaining organizational operations, and it has four components: resource acquisition, organization structure, production subsystem, and accomplishment [ 47 ]. Organizational culture and climate affect an organization’s capacity to operate adequately (Fig. 1).

Figure 1: Framework of modified organizational development theory [ 45 ]

Research methodology

Our systematic literature review follows a defined research process for gathering, analyzing, and evaluating prior research [ 48 ], in accordance with the PRISMA framework [ 49 ]. We use keywords to search for articles related to the BDA application, following a five-stage process.

Stage 1: design development

We establish a research question to guide the selection and search strategy and the analysis and synthesis process, defining the aim, scope, and specific research goals following the guidelines, procedures, and policies of the Cochrane Handbook for Systematic Reviews of Interventions [ 50 ]. The review process is directed by the research question: what are the consistent definitions of BDA, its unique attributes and objectives, and its role in business transformation, including improving the decision-making process and organizational performance? Table 3 is created from the outcome of the search performed using the keywords organizational BDAC, Big Data, and BDA.

Stage 2: inclusion and elimination criteria

To maintain the rigor of a systematic review, we apply various inclusion and exclusion criteria to our search for research articles in four databases: Science Direct, Web of Science, IEEE (Institute of Electrical and Electronics Engineers), and Springer Link. Inclusion criteria cover topics on ‘Big Data in Organization’ published between 2015 and 2021 in English. We use essential keywords to identify the most relevant articles, using truncation, wildcards, and appropriate Boolean operators (Table 4).
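For illustration only, search strings of the kind described might be assembled as below; exact truncation and wildcard syntax differs across the four databases, so these are indicative rather than the authors' literal queries.

```python
# Indicative search strings of the kind used in Stage 2; each database has its
# own syntax for truncation (*) and single-character wildcards (?), so these
# would be adapted per database.
concept_blocks = [
    '("big data" OR "big data analytic*" OR BDA OR BDAC)',
    '(organi?ation* OR firm* OR enterprise*)',
    '(capabilit* OR "organizational development")',
]
query = " AND ".join(concept_blocks)
print(query)
```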

Stage 3: literature sources and search approach

Research articles are screened based on keywords and abstracts, after which 8,062 are retained (Table 5). Only articles whose keywords include terms such as Big Data, BDA, and BDAC, and whose abstracts focus on the organizational domain, are selected.

Stage 4: assess the quality of full papers

At this stage, each of the 161 research articles that remained after stage 3 (Table 6) was assessed independently by the authors against several quality criteria, such as credibility (whether the article was well presented) and relevance (whether the article addressed the organizational domain).

Stage 5: literature extraction and synthesis process

At this stage, only journal articles and conference papers are selected. Articles for which full texts were not open access were excluded, reducing our references to 70 papers (Table 7).

Meta-analysis of selected papers

Of the 70 papers satisfying our selection criteria, publication year and type (journal or conference paper) reveal an increasing trend in big data analytics over the last 6 years (Table 6). Additionally, journals produced more BDA papers than conference proceedings (Fig. 2); this may have been affected during 2020–2021 by COVID, when fewer conferences were held or their proceedings published.

Figure 2: Distribution of publications by year and publication type

Of the 70 research articles, 6% were published in 2015, 13% (2016), 14% (2017), 16% (2018), 20% (2019), 21% (2020), and 10% (until May 2021).
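These shares can be reproduced from a simple tally of publication years; the counts below are approximate, reconstructed from the reported percentages of the 70 papers rather than taken from the authors' underlying data.

```python
import pandas as pd

# Approximate paper counts per year, consistent with the reported percentages (total = 70).
year_counts = pd.Series({2015: 4, 2016: 9, 2017: 10, 2018: 11, 2019: 14, 2020: 15, 2021: 7})
share = (year_counts / year_counts.sum() * 100).round().astype(int)
print(share)  # 6, 13, 14, 16, 20, 21, 10
```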

Thematic analysis is used to identify, analyze, and report patterns (themes) within data and to produce an insightful analysis that answers particular research questions [ 51 ].

The combination of NVIVO and Thematic analysis improves results. Judger [ 52 ] maintained that using computer-assisted data analysis coupled with manual checks improves findings' trustworthiness, credibility, and validity (p. 6).

Defining big data

Of the 70 articles, 33 provide a clear, replicable definition of Big Data, from which five representative definitions are presented in Table 8.

Defining BDA

Of the 70 sample articles, 21 clearly define BDA. Four representative definitions are presented in Table 9. Some definitions accentuate the tools and processes used to derive new insights from big data.

Defining Big Data analytics capability

Only 16% of articles focus on Big Data characteristics; one identifies challenges and issues with adopting and implementing the acquisition of Big Data in organizations [ 42 ]. That study found that BDAC uses the large volumes of data generated by different devices and people to increase efficiency and generate more profit. BDA capability and its potential value can be greater than a business expects; it has been shown that professional services, manufacturing, and retail face structural barriers and can overcome them with the use of Big Data [ 60 ]. We define BDAC as the combined ability to store, process, and analyze large amounts of data to provide meaningful information to users. Four dimensions of BDAC exist: data integration, analytical, predictive, and data interpretation (Table 10).

It is feasible to identify outstanding research issues of high relevance, which we group into five themes using NVIVO 12 (Fig. 3). Table 11 illustrates the four units of analysis that combine NVIVO with thematic analysis: Big Data, BDA, BDAC, and BDA themes. We manually classify the five BDA themes to ensure accuracy and appropriate detail, and we provide suggestions on how future researchers might approach these problems using a research model.

Figure 3: Thematic analysis using NVIVO 12

Manyika et al . [ 63 ] considered that BDA could assist an organization to improve its decision making, minimize risks, provide other valuable insights that would otherwise remain hidden, aid the creation of innovative business models, and improve performance.

The five themes presented in Table 11 identify limitations of existing literature, which are examined in our research model (Fig.  4 ) using four hypotheses. This theoretical model identifies organizational and individual levels as being influenced by organization climate, culture, and capacity. This model can assist in understanding how BDA can be used to improve organizational and individual performance.

Figure 4: The framework of organizational development theory [ 64 ]

The research model development process

We analyze the literature through a new research lens driven by the connection between BDAC and the resource-based view, which includes three resource types used in the IS capability literature [ 65 , 66 , 67 , 68 ]: tangible (financial and physical), human (employees’ knowledge and skills), and intangible (organizational culture and organizational learning). Seven factors enable firms to create BDAC [ 16 ] (Fig. 5).

Figure 5: Classification of Big Data resources (adapted from [ 16 ])

To develop a robust model, tangible, intangible, and human resource types should be present in an organization and should contribute to the decision-making process. Our research model recognizes BDAC as a way to enhance OD, strengthening organizational strategies and the relationship between Big Data resources and OD. Figure 6 depicts a theoretical framework illustrating how BDA resources influence innovation sustainability and OD; innovation sustainability helps identify market opportunities, predict customer needs, and analyze customer purchase decisions [ 69 ].

Figure 6: Theoretical framework illustrating how BDA resources influence innovation sustainability and organizational development (adapted from [ 68 ])

Miller [ 70 ] considered data a strategic business asset and recommended that businesses and academics collaborate to improve knowledge regarding BD skills and capability across an organization; [ 70 ] concluded that every profession, whether business or technology, will be impacted by big data and analytics. Gobble [ 71 ] proposed that an organization should develop new technologies to provide the supplements necessary to enhance growth. Big Data represents a revolution in science and technology, and a data-rich smart city is the expected future that can be developed using Big Data [ 72 ]. Galbraith [ 73 ] reported how an organization attempting to develop BDAC might experience obstacles and opportunities. We found no literature that combined Big Data analytics capability and Organizational Development or discussed the interaction between them.

Because little empirical evidence exists regarding the connection between OD and BDA or their characteristics and features, our model (Fig. 7) fills an important void: it directly connects BDAC and OD and illustrates how BDAC affects OD through the organizational concepts of capacity, culture, and climate and their future resources. Because BDAC can assist OD through the implementation of new technologies [ 15 , 26 , 57 ], we hypothesize:

Figure 7: Proposed interpretation in the research model

H1: A positive relationship exists between Organizational Development and BDAC.

OC relies heavily on OD, with OC representing a resource requiring development in an organization. Because OD can improve OC [ 44 , 45 ], we hypothesize that:

H2: A positive relationship exists between Organizational Development and Organizational Capability.

With the implementation or adoption of BDAC, organizational culture is affected [ 46 ]. Big data enables an organization to improve inefficient practices, whether in marketing, retail, or media. We hypothesize that:

H3: A positive relationship exists between BDAC and Organizational Culture.

Because BDAC adoption can affect organizational climate, that is, the policies, practices, and measures associated with an organization’s employee experience [ 74 ], and can improve both the business climate and an individual’s performance, we hypothesize that:

H4: A positive relationship exists between BDAC and Organizational Climate.

Our research is based on the need to develop a framework model in relation to OD theory, because modern organizations cannot ignore BDA, its future development, or its association with theoretical understanding. Therefore, we aim to demonstrate current trends in capabilities and to provide a framework that improves understanding of BDAC for future research.

Despite the hype that surrounds Big Data, the organizational development and structures through which it produces competitive gains remain generally underexplored in empirical studies. By conducting a systematic literature review and recording what is known to date, it is feasible to distinguish the five prominent, highly relevant themes discussed in the earlier section. These five thematic areas, as depicted in the research model in Fig. 7, show how the constructs affect each other’s performance and suggest how researchers could approach these problems.

The number of published papers on Big Data is increasing. Between 2015 and May 2021, the highest proportion of articles for any given year (21%, in 2020) was found under our inclusion and exclusion criteria: articles were selected only from four databases (Science Direct, Web of Science, IEEE, and Springer Link), addressed Big Data in organizations, and were published in English, with essential keywords, truncation, wildcards, and appropriate Boolean operators used to identify the most relevant articles. While BDAC can improve business-related outcomes, including more effective marketing, new revenue opportunities, customer personalization, and improved operational efficiency, the existing literature has focused on only one or two aspects of BDAC. Our research model (Fig. 7) represents the relationship between BDAC and OD to better understand their impact on organizational capacity. We expect the proposed model to enhance knowledge of BDAC so that organizations can better meet their requirements, ensuring improved products and services that optimize consumer outcomes.

Considerable Big Data research has been conducted in many contexts, such as the health and education sectors, but according to the past literature, BDAC in organizations remains an open issue: how to utilize BDAC within an organization for development purposes. The full potential of BDA and what it can offer must be leveraged to gain commercial advantage. We therefore focus on summarizing the relevant past literature into themes and propose a research model for business, based on the literature [ 61 ].

While we explored Springer Link, IEEE, Science Direct, and Web of Science (which index high-impact journal and conference papers), the possibility exists that some relevant journals were missed. Our research is constrained by our selection criteria, including year, language (English), and peer-reviewed journal articles (we omitted reports, grey journals, and web articles).

A steadily expanding number of organizations has endeavored to use Big Data and organizational analytics to analyze available data and assist with decision-making. These organizations need to leverage the full potential that Big Data and organizational analytics can offer in order to acquire competitive advantage. However, because Big Data and organizational analytics are still considered relatively new in the business paradigm, there is little research on how to handle and leverage them adequately. While past literature has shown the advantages of using Big Data in various settings, there is an absence of theoretically grounded research on how best to use these solutions to acquire competitive advantage. This research recognizes the need to explore BDA through a comprehensive approach. We therefore focus on summarizing the proposed developments in relation to BDA themes for which empirical evidence remains limited.

To this end, this research proposes a new research model that relates earlier studies of BDAC to organizational culture. The research model provides a reference for the more extensive implementation of Big Data technologies in an organizational context. While the hypotheses in the research model are stated at a high level and can be read as additions to a theoretical lens, they are framed so that they can be adapted for organizational development. This research offers an original point of view on the Big Data literature, since the vast majority of it focuses on tools, infrastructure, technical aspects, and network analytics. The proposed framework contributes to Big Data and its capability in organizational development by covering a gap that has not been addressed in past literature. The research model can also be viewed as value-adding knowledge for managers and executives, showing how to create benefit in their organizations through the use of Big Data, BDA, and BDAC.

We identify five themes to leverage BDA in an organization and gain a competitive advantage. We present a research model and four hypotheses to bridge gaps in research between BDA and OD. The purpose of this model and these hypotheses is to guide research that improves our understanding of how BDA implementation can affect an organization. The model will be carried into the next phase of our study, in which we will test it for validity.
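As a purely hypothetical sketch of that validation step, H1 could be examined with ordinary least squares once construct scores (for example, survey scales for BDAC and OD) have been collected; the data below are simulated and the column names invented, so this illustrates the form of the test rather than the authors' planned analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical survey data: one row per organization, constructs scored 1-7.
n = 120
bdac = rng.uniform(1, 7, n)
od = 2.0 + 0.6 * bdac + rng.normal(0, 0.8, n)   # simulated positive relationship
df = pd.DataFrame({"BDAC": bdac, "OD": od})

# H1: a positive relationship exists between BDAC and organizational development.
model = sm.OLS(df["OD"], sm.add_constant(df[["BDAC"]])).fit()
print(model.params["BDAC"], model.pvalues["BDAC"])  # a positive, significant coefficient would support H1
```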

Availability of data and materials

Data will be supplied upon request.

Appendix A is submitted as a supplementary file for review.

Abbreviations

  • IEEE: The Institute of Electrical and Electronics Engineers
  • BDA: Big Data Analytics
  • BDAC: Big Data Analytics Capabilities
  • OD: Organizational Development
  • OC: Organizational Capacity

References

Russom P. Big data analytics. TDWI Best Practices Report, Fourth Quarter. 2011;19(4):1–34.


Mikalef P, Boura M, Lekakos G, Krogstie J. Big data analytics and firm performance: findings from a mixed-method approach. J Bus Res. 2019;98:261–76.

Kojo T, Daramola O, Adebiyi A. Big data stream analysis: a systematic literature review. J Big Data. 2019;6(1):1–30.

Jha AK, Agi MA, Ngai EW. A note on big data analytics capability development in supply chain. Decis Support Syst. 2020;138:113382.

Posavec AB, Krajnović S. Challenges in adopting big data strategies and plans in organizations. In: 2016 39th international convention on information and communication technology, electronics and microelectronics (MIPRO). IEEE. 2016. p. 1229–34.

Madhlangobe W, Wang L. Assessment of factors influencing intent-to-use Big Data Analytics in an organization: pilot study. In: 2018 IEEE 20th International Conference on High-Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE. 2018. p. 1710–1715.

Saetang W, Tangwannawit S, Jensuttiwetchakul T. The effect of technology-organization-environment on adoption decision of big data technology in Thailand. Int J Electr Comput. 2020;10(6):6412. https://doi.org/10.11591/ijece.v10i6.pp6412-6422 .


Pei L. Application of Big Data technology in construction organization and management of engineering projects. J Phys Conf Ser. 2020. https://doi.org/10.1088/1742-6596/1616/1/012002 .

Marashi PS, Hamidi H. Business challenges of Big Data application in health organization. In: Khajeheian D, Friedrichsen M, Mödinger W, editors. Competitiveness in Emerging Markets. Springer, Cham; 2018. p. 569–584. doi: https://doi.org/10.1007/978-3-319-71722-7_28 .

Haryadi AF, Hulstijn J, Wahyudi A, Van Der Voort H, Janssen M. Antecedents of big data quality: an empirical examination in financial service organizations. In 2016 IEEE International Conference on Big Data (Big Data). IEEE. 2016. p. 116–121.

George JP, Chandra KS. Asset productivity in organisations at the intersection of Big Data Analytics and supply chain management. In: Chen JZ, Tavares J, Shakya S, Iliyasu A, editors. Image Processing and Capsule Networks. ICIPCN 2020. Advances in Intelligent Systems and Computing, vol 1200. Springer, Cham; 2020. p. 319–330.

Sousa MJ, Pesqueira AM, Lemos C, Sousa M, Rocha Á. Decision-making based on big data analytics for people management in healthcare organizations. J Med Syst. 2019;43(9):1–10.

Du G, Zhang X, Ni S. Discussion on the application of big data in rail transit organization. In: Wu TY, Ni S, Chu SC, Chen CH, Favorskaya M, editors. International conference on smart vehicular technology, transportation, communication and applications. Springer: Cham; 2018. p. 312–8.

Wahyudi A, Farhani A, Janssen M. Relating big data and data quality in financial service organizations. In: Al-Sharhan SA, Simintiras AC, Dwivedi YK, Janssen M, Mäntymäki M, Tahat L, Moughrabi I, Ali TM, Rana NP, editors. Conference on e-Business, e-Services and e-Society. Springer: Cham; 2018. p. 504–19.

Alkatheeri Y, Ameen A, Isaac O, Nusari M, Duraisamy B, Khalifa GS. The effect of big data on the quality of decision-making in Abu Dhabi Government organisations. In: Sharma N, Chakrabati A, Balas VE, editors. Data management, analytics and innovation. Springer: Singapore; 2020. p. 231–48.

Gupta M, George JF. Toward the development of a big data analytics capability. Inf Manag. 2016;53(8):1049–64.

Selçuk AA. A guide for systematic reviews: PRISMA. Turk Arch Otorhinolaryngol. 2019;57(1):57.

Tiwari S, Wee HM, Daryanto Y. Big data analytics in supply chain management between 2010 and 2016: insights to industries. Comput Ind Eng. 2018;115:319–30.

Miah SJ, Camilleri E, Vu HQ. Big Data in healthcare research: a survey study. J Comput Inform Syst. 2021;7:1–3.

Mikalef P, Pappas IO, Krogstie J, Giannakos M. Big data analytics capabilities: a systematic literature review and research agenda. Inf Syst e-Business Manage. 2018;16(3):547–78.

Nguyen T, Li ZHOU, Spiegler V, Ieromonachou P, Lin Y. Big data analytics in supply chain management: a state-of-the-art literature review. Comput Oper Res. 2018;98:254–64.


Günther WA, Mehrizi MHR, Huysman M, Feldberg F. Debating big data: a literature review on realizing value from big data. J Strateg Inf. 2017;26(3):191–209.

Rialti R, Marzi G, Ciappei C, Busso D. Big data and dynamic capabilities: a bibliometric analysis and systematic literature review. Manag Decis. 2019;57(8):2052–68.

Wamba SF, Gunasekaran A, Akter S, Ren SJ, Dubey R, Childe SJ. Big data analytics and firm performance: effects of dynamic capabilities. J Bus Res. 2017;70:356–65.

Wang Y, Hajli N. Exploring the path to big data analytics success in healthcare. J Bus Res. 2017;70:287–99.

Akter S, Wamba SF, Gunasekaran A, Dubey R, Childe SJ. How to improve firm performance using big data analytics capability and business strategy alignment? Int J Prod Econ. 2016;182:113–31.

Kwon O, Lee N, Shin B. Data quality management, data usage experience and acquisition intention of big data analytics. Int J Inf Manage. 2014;34(3):387–94.

Chen DQ, Preston DS, Swink M. How the use of big data analytics affects value creation in supply chain management. J Manag Info Syst. 2015;32(4):4–39.

Kim MK, Park JH. Identifying and prioritizing critical factors for promoting the implementation and usage of big data in healthcare. Inf Dev. 2017;33(3):257–69.

Popovič A, Hackney R, Tassabehji R, Castelli M. The impact of big data analytics on firms’ high value business performance. Inf Syst Front. 2018;20:209–22.

Hewage TN, Halgamuge MN, Syed A, Ekici G. Big data techniques of Google, Amazon, Facebook and Twitter. J Commun. 2018;13(2):94–100.

BenMark G, Klapdor S, Kullmann M, Sundararajan R. How retailers can drive profitable growth through dynamic pricing. McKinsey & Company. 2017. https://www.mckinsey.com/industries/retail/our-insights/howretailers-can-drive-profitable-growth-throughdynamic-pricing . Accessed 13 Mar 2021.

Richard B. Hotel chains: survival strategies for a dynamic future. J Tour Futures. 2017;3(1):56–65.

Fouladirad M, Neal J, Ituarte JV, Alexander J, Ghareeb A. Entertaining data: business analytics and Netflix. Int J Data Anal Inf Syst. 2018;10(1):13–22.

Hadida AL, Lampel J, Walls WD, Joshi A. Hollywood studio filmmaking in the age of Netflix: a tale of two institutional logics. J Cult Econ. 2020;45:1–26.

Harinen T, Li B. Using causal inference to improve the Uber user experience. Uber Engineering. 2019. https://eng.uber.com/causal-inference-at-uber/ . Accessed 10 Mar 2021.

Anaf J, Baum FE, Fisher M, Harris E, Friel S. Assessing the health impact of transnational corporations: a case study on McDonald’s Australia. Glob Health. 2017;13(1):7.

Wired. McDonald's Bites on Big Data; 2019. https://www.wired.com/story/mcdonalds-big-data-dynamic-yield-acquisition

Marr B. American Express: how Big Data and machine learning benefit consumers and merchants. Bernard Marr & Co.; 2018. https://www.bernardmarr.com/default.asp?contentID=1263

Zhang Y, Huang T, Bompard EF. Big data analytics in smart grids: a review. Energy Informatics. 2018;1(1):8.

HBS. Next Big Sound—moneyball for music? Digital Initiative. 2020. https://digital.hbs.edu/platform-digit/submission/next-big-sound-moneyball-for-music/ . Accessed 10 Apr 2021.

Mneney J, Van Belle JP. Big data capabilities and readiness of South African retail organisations. In: 2016 6th International Conference-Cloud System and Big Data Engineering (Confluence). IEEE. 2016. p. 279–86.

Beckhard R. Organizational issues in the team delivery of comprehensive health care. Milbank Mem Fund. 1972;50:287–316.

Cummings TG, Worley CG. Organization development and change. 8th ed. Mason: Thompson South-Western; 2009.

Glanz K, Rimer BK, Viswanath K, editors. Health behavior and health education: theory, research, and practice. San Francisco: Wiley; 2008.

Schein EH. Organizational culture and leadership. San Francisco: Jossey-Bass; 1985.

Prestby J, Wandersman A. An empirical exploration of a framework of organizational viability: maintaining block organizations. J Appl Behav Sci. 1985;21(3):287–305.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62(10):e1–34.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Moher D. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.

Higgins JP, Green S, Scholten RJPM. Maintaining reviews: updates, amendments and feedback. Cochrane handbook for systematic reviews of interventions. 31; 2008.

Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77–101.

Judger N. The thematic analysis of interview data: an approach used to examine the influence of the market on curricular provision in Mongolian higher education institutions. Hillary Place Papers, University of Leeds. 2016;3:1–7

Khine P, Shun W. Big data for organizations: a review. J Comput Commun. 2017;5:40–8.

Zan KK. Prospects for using Big Data to improve the effectiveness of an education organization. In: 2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus) . IEEE. 2019. p. 1777–9.

Ekambaram A, Sørensen AØ, Bull-Berg H, Olsson NO. The role of big data and knowledge management in improving projects and project-based organizations. Procedia Comput Sci. 2018;138:851–8.

Rialti R, Marzi G, Silic M, Ciappei C. Ambidextrous organization and agility in big data era: the role of business process management systems. Bus Process Manag. 2018;24(5):1091–109.

Wang Y, Kung L, Gupta S, Ozdemir S. Leveraging big data analytics to improve quality of care in healthcare organizations: a configurational perspective. Br J Manag. 2019;30(2):362–88.

De Mauro A, Greco M, Grimaldi M, Ritala P. In (Big) Data we trust: value creation in knowledge organizations—introduction to the special issue. Inf Proc Manag. 2018;54(5):755–7.

Batistič S, Van Der Laken P. History, evolution and future of big data and analytics: a bibliometric analysis of its relationship to performance in organizations. Br J Manag. 2019;30(2):229–51.

Jokonya O. Towards a conceptual framework for big data adoption in organizations. In: 2015 International Conference on Cloud Computing and Big Data (CCBD). IEEE. 2015. p. 153–160.

Mikalef P, Krogstie J, Pappas IO, Pavlou P. Exploring the relationship between big data analytics capability and competitive performance: the mediating roles of dynamic and operational capabilities. Inf Manag. 2020;57(2):103169.

Shuradze G, Wagner HT. Towards a conceptualization of data analytics capabilities. In: 2016 49th Hawaii International Conference on System Sciences (HICSS). IEEE. 2016. p. 5052–64.

Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Hung Byers A. Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute. 2011. https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/big-data-the-next-frontier-for-innovation .

Wu YK, Chu NF. Introduction of the transtheoretical model and organisational development theory in weight management: a narrative review. Obes Res Clin Pract. 2015;9(3):203–13.

Grant RM. Contemporary strategy analysis: Text and cases edition. Wiley; 2010.

Bharadwaj AS. A resource-based perspective on information technology capability and firm performance: an empirical investigation. MIS Q. 2000;24(1):169–96.

Chae HC, Koh CH, Prybutok VR. Information technology capability and firm performance: contradictory findings and their possible causes. MIS Q. 2014;38:305–26.

Santhanam R, Hartono E. Issues in linking information technology capability to firm performance. MIS Q. 2003;27(1):125–53.

Hao S, Zhang H, Song M. Big data, big data analytics capability, and sustainable innovation performance. Sustainability. 2019;11:7145. https://doi.org/10.3390/su11247145 .

Miller S. Collaborative approaches needed to close the big data skills gap. J Organ Des. 2014;3(1):26–30.

Gobble MM. Outsourcing innovation. Res Technol Manag. 2013;56(4):64–7.

Ann Keller S, Koonin SE, Shipp S. Big data and city living–what can it do for us? Signif (Oxf). 2012;9(4):4–7.

Galbraith JR. Organizational design challenges resulting from big data. J Organ Des. 2014;3(1):2–13.

Schneider B, Ehrhart MG, Macey WH. Organizational climate and culture. Annu Rev Psychol. 2013;64:361–88.


Acknowledgements

Not applicable.

Author information

Authors and Affiliations

Newcastle Business School, University of Newcastle, Newcastle, NSW, Australia

Renu Sabharwal & Shah Jahan Miah


Contributions

The first author conducted the research, while the second author has ensured quality standards and rewritten the entire findings linking to underlying theories. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shah Jahan Miah .

Ethics declarations

Competing interests

The authors declare that they have no competing interests.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article: Sabharwal, R., Miah, S.J. A new theoretical understanding of big data analytics capabilities in organizations: a thematic analysis. J Big Data 8, 159 (2021). https://doi.org/10.1186/s40537-021-00543-6

Received: 17 August 2021 | Accepted: 16 November 2021 | Published: 18 December 2021


Keywords

  • Organization
  • Systematic literature review
  • Big Data Analytics capabilities
  • Organizational Development Theory
  • Organizational Climate
  • Organizational Culture

Recently Published Documents on Data Analysis

Introduce a Survival Model with Spatial Skew Gaussian Random Effects and its Application in Covid-19 Data Analysis

Futuristic Prediction of Missing Value Imputation Methods Using Extended ANN

Missing data is a universal complexity across most research fields, and it introduces uncertainty into data analysis. It can arise for many reasons, such as mishandled samples, failure to collect an observation, measurement errors, deletion of aberrant values, or simply gaps in a study. The nutrition domain is no exception to the difficulty of missing data. Most frequently, this difficulty is handled by imputing means or medians from the existing datasets, an approach that needs improvement. The paper proposes a hybrid scheme of MICE and ANN, known as extended ANN, to search for and analyze missing values and perform imputation in a given dataset. The proposed mechanism efficiently analyzes blank entries and fills them by examining neighboring records, in order to improve the accuracy of the dataset. To validate the proposed scheme, the extended ANN is further compared against various recent algorithms and mechanisms to analyze the efficiency and accuracy of the results.
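As a rough stand-in for the MICE half of that hybrid (not the authors' extended ANN), scikit-learn's IterativeImputer illustrates chained imputation, where each incomplete column is modeled from the others; the toy matrix below is invented.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (required before the import below)
from sklearn.impute import IterativeImputer

X = np.array([
    [7.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [10.0, 5.0, 9.0],
    [8.0, 4.0, np.nan],
])

# Each feature with missing values is modeled as a function of the other
# features, and the estimates are refined over several rounds (MICE-style).
imputer = IterativeImputer(max_iter=10, random_state=0)
print(imputer.fit_transform(X))
```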

Applications of Multivariate Data Analysis in Shelf Life Studies of Edible Vegetable Oils: A Review of the Past Few Years

Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications

Data analysis requires translating higher level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization . In a formative content analysis of 50 research papers, we find that researchers highlight decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design as key steps. In a lab study, we find that analysts fixated on implementation and shaped their analyses to fit familiar approaches, even if sub-optimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process balancing conceptual and statistical considerations constrained by data and computation and discuss implications for future tools.

The Complexity and Expressive Power of Limit Datalog

Motivated by applications in declarative data analysis, in this article, we study Datalogℤ, an extension of Datalog with stratified negation and arithmetic functions over integers. This language is known to be undecidable, so we present the fragment of limit Datalogℤ programs, which is powerful enough to naturally capture many important data analysis tasks. In limit Datalogℤ, all intensional predicates with a numeric argument are limit predicates that keep maximal or minimal bounds on numeric values. We show that reasoning in limit Datalogℤ is decidable if a linearity condition restricting the use of multiplication is satisfied. In particular, limit-linear Datalogℤ is complete for Δ₂EXP and captures Δ₂P over ordered datasets in the sense of descriptive complexity. We also provide a comprehensive study of several fragments of limit-linear Datalogℤ. We show that semi-positive limit-linear programs (i.e., programs where negation is allowed only in front of extensional atoms) capture coNP over ordered datasets; furthermore, reasoning becomes coNEXP-complete in combined and coNP-complete in data complexity, where the lower bounds hold already for negation-free programs. In order to satisfy the requirements of data-intensive applications, we also propose an additional stability requirement, which causes the complexity of reasoning to drop to EXP in combined and to P in data complexity, thus obtaining the same bounds as for usual Datalog. Finally, we compare our formalisms with the languages underpinning existing Datalog-based approaches for data analysis and show that core fragments of these languages can be encoded as limit programs; this allows us to transfer decidability and complexity upper bounds from limit programs to other formalisms. Therefore, our article provides a unified logical framework for declarative data analysis which can be used as a basis for understanding the impact on expressive power and computational complexity of the key constructs available in existing languages.

An Empirical Study on Cross-Border E-commerce Talent Cultivation Based on Skill Gap Theory and Big Data Analysis

To resolve the tension between the growing demand for cross-border e-commerce talent and students' mismatched skill levels, Industry-University-Research cooperation, an essential pillar of the interdisciplinary talent cultivation model adopted by colleges and universities, draws on the synergy of the relevant parties and bridges knowledge and practice. Nevertheless, Industry-University-Research cooperation has developed only recently in the cross-border e-commerce field and faces several problems, such as unstable collaboration relationships and vague training plans.

The Effects of Cross-border e-Commerce Platforms on Transnational Digital Entrepreneurship

This research examines the important concept of transnational digital entrepreneurship (TDE). The paper integrates the host- and home-country entrepreneurial ecosystems with the digital ecosystem into a framework of the transnational digital entrepreneurial ecosystem. The authors argue that cross-border e-commerce platforms provide critical foundations for the digital entrepreneurial ecosystem. Entrepreneurs who rely on this ecosystem are defined as transnational digital entrepreneurs. Interview data from twelve Chinese immigrant entrepreneurs living in Australia and New Zealand were analyzed as case studies. The results of the data analysis reveal that cross-border entrepreneurs do in fact rely on the framework of the transnational digital ecosystem. Cross-border e-commerce platforms not only play a bridging role between home- and host-country ecosystems but also provide the entrepreneurial capital promised by the digital ecosystem.

Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis With Limited Computational Resources

A Trajectory Evaluator by Sub-tracks for Detecting VOT-Based Anomalous Trajectory

With the popularization of visual object tracking (VOT), more and more trajectory data are being collected and have begun to gain widespread attention in fields such as mobile robotics and intelligent video surveillance. How to clean the anomalous trajectories hidden in this massive data has become one of the research hotspots. Anomalous trajectories should be detected and cleaned before the trajectory data can be used effectively. In this article, a Trajectory Evaluator by Sub-tracks (TES) for detecting VOT-based anomalous trajectories is proposed. A feature of anomalousness is defined and used as the eigenvector of a classifier to filter tracklet anomalous trajectories and identity-switch anomalous trajectories; it includes the Feature of Anomalous Pose and the Feature of Anomalous Sub-tracks (FAS). In comparative experiments, TES achieves better results across different scenes than state-of-the-art methods. Moreover, FAS performs better than point flow, least-squares fitting, and Chebyshev polynomial fitting. The results verify that TES is more accurate and effective and supports sub-track trajectory data analysis.



The Top 5 Data Science And Analytics Trends In 2023


Data is increasingly the differentiator between winners and also-rans in business. Today, information can be captured from many different sources, and technology to extract insights is becoming increasingly accessible.

Moving to a data-driven business model – where decisions are made based on what we know to be true rather than “gut feeling” – is core to the wave of digital transformation sweeping through every industry in 2023 and beyond. It helps us to react with certainty in the face of uncertainty – especially when wars and pandemics upset the established order of things.

But the world of data and analytics never stands still. New technologies are constantly emerging that offer faster and more accurate access to insights. And new trends emerge, bringing us new thinking on the best ways to put it to work across business and society at large. So, here’s my rundown of what I believe are the most important trends that will affect the way we use data and analytics to drive business growth in 2023.

Data Democratization

One of the most important trends will be the continued empowerment of entire workforces – rather than just data engineers and data scientists – to put analytics to work. This is giving rise to new forms of augmented working, where tools, applications, and devices push intelligent insights into everyone's hands so they can do their jobs more effectively and efficiently.


In 2023, businesses will understand that data is the key to understanding customers, developing better products and services, and streamlining their internal operations to reduce costs and waste. However, it’s becoming increasingly clear that this won’t fully happen until the power to act on data-driven insights is available to frontline, shop floor, and non-technical staff, as well as functions such as marketing and finance.

Some great examples of data democracy in practice include lawyers using natural language processing (NLP) tools to scan pages of case law documents, or retail sales assistants using handheld terminals that can access customer purchase history in real time and recommend products to up-sell and cross-sell. Research by McKinsey has found that companies that make data accessible to their entire workforce are 40 times more likely to say analytics has a positive impact on revenue.

Artificial Intelligence

Artificial intelligence (AI) is perhaps the one technology trend that will have the biggest impact on how we live, work and do business in the future. Its effect on business analytics will be to enable more accurate predictions, reduce the amount of time we spend on mundane and repetitive work like data gathering and data cleansing, and to empower workforces to act on data-driven insights, whatever their role and level of technical expertise (see Data Democratization, above).

Put simply, AI allows businesses to analyze data and draw out insights far more quickly than would ever be possible manually, using software algorithms that get better and better at their job as they are fed more data. This is the basic principle of machine learning (ML), the form of AI most widely used in business today. AI and ML technologies include NLP, which enables computers to understand and communicate with us in human languages; computer vision, which enables computers to understand and process visual information from cameras, just as we do with our eyes; and generative AI, which can create text, images, sounds, and video from scratch.

Cloud and Data-as-a-Service

I’ve put these two together because cloud is the platform that enables data-as-a-service technology to work. Basically, it means that companies can access data sources that have been collected and curated by third parties via cloud services on a pay-as-you-go or subscription-based billing model. This reduces the need for companies to build their own expensive, proprietary data collection and storage systems for many types of applications.

As well as raw data, DaaS companies offer analytics tools as-a-service. Data accessed through DaaS is typically used to augment a company’s proprietary data that it collects and processes itself in order to create richer and more valuable insights. It plays a big part in the democratization of data mentioned previously, as it allows businesses to work with data without needing to set up and maintain expensive and specialized data science operations. In 2023, it’s estimated that the value of the market for these services will grow to $10.7 billion .

Real-Time Data

When digging into data in search of insights, it's better to know what's going on right now – rather than yesterday, last week, or last month. This is why real-time data is increasingly becoming the most valuable source of information for businesses.

Working with real-time data often requires more sophisticated data and analytics infrastructure, which means more expense, but the benefit is that we’re able to act on information as it happens. This could involve analyzing clickstream data from visitors to our website to work out what offers and promotions to put in front of them, or in financial services, it could mean monitoring transactions as they take place around the world to watch out for warning signs of fraud. Social media sites like Facebook analyze hundreds of gigabytes of data per second for various use cases, including serving up advertising and preventing the spread of fake news. And in South Africa’s Kruger National Park, a joint initiative between the WWF and ZSL analyzes video footage in real-time to alert law enforcement to the presence of poachers .

As more organizations look to data to provide them with a competitive edge, those with the most advanced data strategies will increasingly look towards the most valuable and up-to-date data. This is why real-time data and analytics will be the most valuable big data tools for businesses in 2023.

Data Governance and Regulation

Data governance will also be big news in 2023 as more governments introduce laws designed to regulate the use of personal and other types of data. In the wake of the likes of European GDPR, Canadian PIPEDA, and Chinese PIPL, other countries are likely to follow suit and introduce legislation protecting the data of their citizens. In fact, analysts at Gartner have predicted that by 2023, 65% of the world’s population will be covered by regulations similar to GDPR.

This means that governance will be an important task for businesses over the next 12 months, wherever they are located in the world, as they move to ensure that their internal data processing and handling procedures are adequately documented and understood. For many businesses, this will mean auditing exactly what information they have, how it is collected, where it is stored, and what is done with it. While this may sound like extra work, in the long term, the idea is that everyone will benefit as consumers will be more willing to trust organizations with their data if they are sure it will be well looked after. Those organizations will then be able to use this data to develop products and services that align more closely with what we need at prices we can afford.

To stay on top of the latest trends, make sure to subscribe to my newsletter, follow me on Twitter, LinkedIn, and YouTube, and check out my books ‘Data Strategy: How To Profit From A World Of Big Data, Analytics And Artificial Intelligence’ and ‘Business Trends in Practice’.

Bernard Marr



Articles on Data analytics



For over a century, baseball’s scouts have been the backbone of America’s pastime – do they have a future?

H. James Gilmore , Flagler College and Tracy Halcomb , Flagler College


Robo-advisers are here – the pros and cons of using AI in investing

Laurence Jones , Bangor University and Heather He , Bangor University


AI threatens to add to the growing wave of fraud but is also helping tackle it

Laurence Jones , Bangor University and Adrian Gepp , Bangor University


Twitter’s new data fees leave scientists scrambling for funding – or cutting research

Jon-Patrick Allem , University of Southern California


Insurance firms can skim your online data to price your insurance — and there’s little in the law to stop this

Zofia Bednarz , University of Sydney ; Kayleen Manwaring , UNSW Sydney , and Kimberlee Weatherall , University of Sydney


Two years into the pandemic, why is Australia still short of medicines?

Maryam Ziaee , Victoria University


How we communicate, what we value – even who we are: 8 surprising things data science has revealed about us over the past decade

Paul X. McCarthy , UNSW Sydney and Colin Griffith , CSIRO


3 ways for businesses to fuel innovation and drive performance

Grant Alexander Wilson , University of Regina


Sports card explosion holds promise for keeping kids engaged in math

John Holden , Oklahoma State University


Get ready for the invasion of smart building technologies following  COVID-19

Patrick Lecomte , Université du Québec à Montréal (UQAM)


David Chase might hate that ‘The Many Saints of Newark’ is premiering on HBO Max – but it’s the wave of the future

Anthony Palomba , University of Virginia


For these students, using data in sports is about more than winning games

Felesia Stukes , Johnson C. Smith University


New data privacy rules are coming in NZ — businesses and other organisations will have to lift their games

Anca C. Yallop , Auckland University of Technology


The value of the Mountain Equipment Co-op sale lies in its customer data

Michael Parent , Simon Fraser University


Disasters expose gaps in emergency services’ social media use

Tan Yigitcanlar , Queensland University of Technology ; Ashantha Goonetilleke , Queensland University of Technology , and Nayomi Kankanamge , Queensland University of Technology


How much coronavirus testing is enough? States could learn from retailers as they ramp up

Siqian Shen , University of Michigan


Tracking your location and targeted texts: how sharing your data could help in New Zealand’s level 4 lockdown

Jon MacKay , University of Auckland, Waipapa Taumata Rau


How sensors and big data can help cut food wastage

Frederic Isingizwe , Stellenbosch University and Umezuruike Linus Opara , Stellenbosch University


Data lakes: where big businesses dump their excess data, and hackers have a field day

Mohiuddin Ahmed , Edith Cowan University


How big data can help residents find transport, jobs and homes that work for them

Sae Chi , The University of Western Australia and Linda Robson , The University of Western Australia

Related Topics

  • Artificial intelligence (AI)
  • Data analysis
  • Data collection
  • Data privacy
  • Data science
  • Social media


Your Modern Business Guide To Data Analysis Methods And Techniques

Data analysis methods and techniques blog post by datapine

Table of Contents

1) What Is Data Analysis?

2) Why Is Data Analysis Important?

3) What Is The Data Analysis Process?

4) Types Of Data Analysis Methods

5) Top Data Analysis Techniques To Apply

6) Quality Criteria For Data Analysis

7) Data Analysis Limitations & Barriers

8) Data Analysis Skills

9) Data Analysis In The Big Data Environment

In our data-rich age, understanding how to analyze and extract true meaning from our business’s digital insights is one of the primary drivers of success.

Despite the colossal volume of data we create every day, a mere 0.5% is actually analyzed and used for data discovery , improvement, and intelligence. While that may not seem like much, considering the amount of digital information we have at our fingertips, half a percent still accounts for a vast amount of data.

With so much data and so little time, knowing how to collect, curate, organize, and make sense of all of this potentially business-boosting information can be a minefield – but online data analysis is the solution.

In science, data analysis uses a more complex approach with advanced techniques to explore and experiment with data. On the other hand, in a business context, data is used to make data-driven decisions that will enable the company to improve its overall performance. In this post, we will cover the analysis of data from an organizational point of view while still going through the scientific and statistical foundations that are fundamental to understanding the basics of data analysis. 

To put all of that into perspective, we will answer a host of important analytical questions, explore analytical methods and techniques, and demonstrate how to perform analysis in the real world with a 17-step blueprint for success.

What Is Data Analysis?

Data analysis is the process of collecting, modeling, and analyzing data using various statistical and logical methods and techniques. Businesses rely on analytics processes and tools to extract insights that support strategic and operational decision-making.

All these various methods are largely based on two core areas: quantitative and qualitative research.


Gaining a better understanding of the different techniques and methods in quantitative research, as well as qualitative insights, will give your analysis efforts a more clearly defined direction, so it is worth taking the time to let this knowledge sink in. Additionally, you will be able to create a comprehensive analytical report that strengthens your analysis.

Apart from the qualitative and quantitative categories, there are also other types of data that you should be aware of before diving into complex data analysis processes. These categories include: 

  • Big data: Refers to massive data sets that need to be analyzed using advanced software to reveal patterns and trends. It is considered to be one of the best analytical assets as it provides larger volumes of data at a faster rate. 
  • Metadata: Putting it simply, metadata is data that provides insights about other data. It summarizes key information about specific data that makes it easier to find and reuse for later purposes. 
  • Real time data: As its name suggests, real time data is presented as soon as it is acquired. From an organizational perspective, this is the most valuable data as it can help you make important decisions based on the latest developments. Our guide on real time analytics will tell you more about the topic. 
  • Machine data: This is more complex data that is generated solely by a machine such as phones, computers, or even websites and embedded systems, without previous human interaction.

Why Is Data Analysis Important?

Before we go into detail about the categories of analysis along with its methods and techniques, you must understand the potential that analyzing data can bring to your organization.

  • Informed decision-making : From a management perspective, you can benefit from analyzing your data as it helps you make decisions based on facts and not simple intuition. For instance, you can understand where to invest your capital, detect growth opportunities, predict your income, or tackle uncommon situations before they become problems. Through this, you can extract relevant insights from all areas in your organization, and with the help of dashboard software , present the data in a professional and interactive way to different stakeholders.
  • Reduce costs : Another great benefit is to reduce costs. With the help of advanced technologies such as predictive analytics, businesses can spot improvement opportunities, trends, and patterns in their data and plan their strategies accordingly. In time, this will help you save money and resources on implementing the wrong strategies. And not just that, by predicting different scenarios such as sales and demand you can also anticipate production and supply. 
  • Target customers better: Customers are arguably the most crucial element in any business. By using analytics to get a 360° view of all aspects related to your customers, you can understand which channels they use to communicate with you, their demographics, interests, habits, purchasing behaviors, and more. In the long run, it will make your marketing strategies more successful, allow you to identify new potential customers, and help you avoid wasting resources on targeting the wrong people or sending the wrong message. You can also track customer satisfaction by analyzing your clients' reviews or your customer service department's performance.

What Is The Data Analysis Process?

Data analysis process graphic

When we talk about analyzing data, there is an order to follow to extract the needed conclusions. The analysis process consists of 5 key stages. We will cover each of them in more detail later in the post, but to provide the context needed to understand what is coming next, here is a rundown of the 5 essential steps of data analysis. 

  • Identify: Before you get your hands dirty with data, you first need to identify why you need it in the first place. The identification is the stage in which you establish the questions you will need to answer. For example, what is the customer's perception of our brand? Or what type of packaging is more engaging to our potential customers? Once the questions are outlined you are ready for the next step. 
  • Collect: As its name suggests, this is the stage where you start collecting the needed data. Here, you define which sources of data you will use and how you will use them. The collection of data can come in different forms such as internal or external sources, surveys, interviews, questionnaires, and focus groups, among others.  An important note here is that the way you collect the data will be different in a quantitative and qualitative scenario. 
  • Clean: Once you have the necessary data, it is time to clean it and get it ready for analysis. Not all the data you collect will be useful; when collecting large amounts of data in different formats, it is very likely that you will find yourself with duplicate or badly formatted data. To avoid this, before you start working with your data you need to make sure to erase any white spaces, duplicate records, or formatting errors. This way you avoid hurting your analysis with bad-quality data. 
  • Analyze : With the help of various techniques such as statistical analysis, regressions, neural networks, text analysis, and more, you can start analyzing and manipulating your data to extract relevant conclusions. At this stage, you find trends, correlations, variations, and patterns that can help you answer the questions you first thought of in the identify stage. Various technologies in the market assist researchers and average users with the management of their data. Some of them include business intelligence and visualization software, predictive analytics, and data mining, among others. 
  • Interpret: Last but not least you have one of the most important steps: it is time to interpret your results. This stage is where the researcher comes up with courses of action based on the findings. For example, here you would understand if your clients prefer packaging that is red or green, plastic or paper, etc. Additionally, at this stage, you can also find some limitations and work on them. 

Now that you have a basic understanding of the key data analysis steps, let’s look at the top 17 essential methods.

17 Essential Types Of Data Analysis Methods

Before diving into the 17 essential types of methods, it is important to quickly go over the main analysis categories. Moving from descriptive up to prescriptive analysis, the complexity and effort of data evaluation increase, but so does the added value for the company.

a) Descriptive analysis - What happened.

The descriptive analysis method is the starting point for any analytic reflection, and it aims to answer the question of what happened? It does this by ordering, manipulating, and interpreting raw data from various sources to turn it into valuable insights for your organization.

Performing descriptive analysis is essential, as it enables us to present our insights in a meaningful way. Although it is relevant to mention that this analysis on its own will not allow you to predict future outcomes or tell you the answer to questions like why something happened, it will leave your data organized and ready to conduct further investigations.

b) Exploratory analysis - How to explore data relationships.

As its name suggests, the main aim of exploratory analysis is to explore. Prior to it, there is still no established notion of the relationships between the data and the variables. Once the data is investigated, exploratory analysis helps you find connections and generate hypotheses and solutions for specific problems. A typical area of application for it is data mining.

c) Diagnostic analysis - Why it happened.

Diagnostic data analytics empowers analysts and executives by helping them gain a firm contextual understanding of why something happened. If you know why something happened as well as how it happened, you will be able to pinpoint the exact ways of tackling the issue or challenge.

Designed to provide direct and actionable answers to specific questions, this is one of the most important methods in research, as well as in key organizational functions such as retail analytics.

d) Predictive analysis - What will happen.

The predictive method allows you to look into the future to answer the question: what will happen? In order to do this, it uses the results of the previously mentioned descriptive, exploratory, and diagnostic analyses, in addition to machine learning (ML) and artificial intelligence (AI). Through this, you can uncover future trends, potential problems or inefficiencies, connections, and causalities in your data.

With predictive analysis, you can unfold and develop initiatives that will not only enhance your various operational processes but also help you gain an all-important edge over the competition. If you understand why a trend, pattern, or event happened through data, you will be able to develop an informed projection of how things may unfold in particular areas of the business.

e) Prescriptive analysis - How will it happen.

This is another of the most effective types of analysis methods in research. Prescriptive data techniques go beyond predictive analysis in that they revolve around using patterns or trends to develop responsive, practical business strategies.

By drilling down into prescriptive analysis, you will play an active role in the data consumption process by taking well-arranged sets of visual data and using it as a powerful fix to emerging issues in a number of key areas, including marketing, sales, customer experience, HR, fulfillment, finance, logistics analytics , and others.

Top 17 data analysis methods

As mentioned at the beginning of the post, data analysis methods can be divided into two big categories: quantitative and qualitative. Each of these categories holds a powerful analytical value that changes depending on the scenario and type of data you are working with. Below, we will discuss 17 methods that are divided into qualitative and quantitative approaches. 

Without further ado, here are the 17 essential types of data analysis methods with some use cases in the business world: 

A. Quantitative Methods 

To put it simply, quantitative analysis refers to all methods that use numerical data, or data that can be turned into numbers (e.g., category variables like gender or age), to extract valuable insights. It is used to draw conclusions about relationships and differences and to test hypotheses. Below we discuss some of the key quantitative methods. 

1. Cluster analysis

The action of grouping a set of data elements in a way that said elements are more similar (in a particular sense) to each other than to those in other groups – hence the term ‘cluster.’ Since there is no target variable when clustering, the method is often used to find hidden patterns in the data. The approach is also used to provide additional context to a trend or dataset.

Let's look at it from an organizational perspective. In a perfect world, marketers would be able to analyze each customer separately and give them the best-personalized service, but let's face it, with a large customer base, it is timely impossible to do that. That's where clustering comes in. By grouping customers into clusters based on demographics, purchasing behaviors, monetary value, or any other factor that might be relevant for your company, you will be able to immediately optimize your efforts and give your customers the best experience based on their needs.

2. Cohort analysis

This type of data analysis approach uses historical data to examine and compare a determined segment of users' behavior, which can then be grouped with others with similar characteristics. By using this methodology, it's possible to gain a wealth of insight into consumer needs or a firm understanding of a broader target group.

Cohort analysis can be really useful for performing analysis in marketing as it will allow you to understand the impact of your campaigns on specific groups of customers. To exemplify, imagine you send an email campaign encouraging customers to sign up for your site. For this, you create two versions of the campaign with different designs, CTAs, and ad content. Later on, you can use cohort analysis to track the performance of the campaign for a longer period of time and understand which type of content is driving your customers to sign up, repurchase, or engage in other ways.  

A useful tool to start performing cohort analysis method is Google Analytics. You can learn more about the benefits and limitations of using cohorts in GA in this useful guide . In the bottom image, you see an example of how you visualize a cohort in this tool. The segments (devices traffic) are divided into date cohorts (usage of devices) and then analyzed week by week to extract insights into performance.

Cohort analysis chart example from google analytics
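
Outside of Google Analytics, a retention-style cohort table can be built directly in pandas; the activity log and column names below are invented for the sake of the sketch.

```python
import pandas as pd

# Hypothetical activity log: one row per user per active month.
events = pd.DataFrame({
    "user_id":      [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "active_month": pd.to_datetime([
        "2023-01-01", "2023-02-01", "2023-03-01",
        "2023-01-01", "2023-03-01",
        "2023-02-01", "2023-03-01", "2023-04-01",
        "2023-02-01",
    ]),
})

# Cohort = each user's first active month.
events["cohort"] = events.groupby("user_id")["active_month"].transform("min")

# Months elapsed since the cohort month.
events["period"] = (
    (events["active_month"].dt.year - events["cohort"].dt.year) * 12
    + (events["active_month"].dt.month - events["cohort"].dt.month)
)

# Distinct users active in each cohort/period cell, as a retention matrix.
retention = (
    events.groupby(["cohort", "period"])["user_id"].nunique().unstack(fill_value=0)
)
print(retention)
```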

3. Regression analysis

Regression uses historical data to understand how a dependent variable's value is affected when one (linear regression) or more independent variables (multiple regression) change or stay the same. By understanding each variable's relationship and how it developed in the past, you can anticipate possible outcomes and make better decisions in the future.

Let's bring it down with an example. Imagine you did a regression analysis of your sales in 2019 and discovered that variables like product quality, store design, customer service, marketing campaigns, and sales channels affected the overall result. Now you want to use regression to analyze which of these variables changed or if any new ones appeared during 2020. For example, you couldn’t sell as much in your physical store due to COVID lockdowns. Therefore, your sales could’ve either dropped in general or increased in your online channels. Through this, you can understand which independent variables affected the overall performance of your dependent variable, annual sales.

If you want to go deeper into this type of analysis, check out this article and learn more about how you can benefit from regression.
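
As a hedged sketch of multiple regression, here is how sales might be modeled against two explanatory variables with scikit-learn; the variables and figures are placeholders, not real benchmarks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly data: [marketing_spend_k, store_visits] -> sales_k
X = np.array([
    [10, 500], [12, 520], [15, 610], [9, 480],
    [20, 700], [18, 660], [14, 590], [11, 505],
])
y = np.array([105, 118, 140, 96, 175, 160, 135, 110])

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)   # effect of each independent variable
print("intercept:", model.intercept_)
print("R^2:", model.score(X, y))      # share of variance explained

# Predict sales for a planned month with spend=16 and 620 expected visits.
print("forecast:", model.predict([[16, 620]]))
```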

4. Neural networks

The neural network forms the basis for the intelligent algorithms of machine learning. It is a form of analytics that attempts, with minimal intervention, to understand how the human brain would generate insights and predict values. Neural networks learn from each and every data transaction, meaning that they evolve and advance over time.

A typical area of application for neural networks is predictive analytics. There are BI reporting tools that have this feature implemented within them, such as the Predictive Analytics Tool from datapine. This tool enables users to quickly and easily generate all kinds of predictions. All you have to do is select the data to be processed based on your KPIs, and the software automatically calculates forecasts based on historical and current data. Thanks to its user-friendly interface, anyone in your organization can manage it; there’s no need to be an advanced scientist. 

Here is an example of how you can use the predictive analysis tool from datapine:

Example on how to use predictive analytics tool from datapine

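For readers who prefer code to screenshots, here is a generic feed-forward neural network regressor built with scikit-learn's MLPRegressor; this is not datapine's forecasting feature, and the data is synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data: the target is a noisy nonlinear function of one input.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() * 5 + rng.normal(0, 0.3, size=200)

# Two hidden layers; the network improves as it is fed more data and iterations.
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
net.fit(X, y)

print(net.predict([[2.5], [7.0]]))  # predictions for new inputs
```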

5. Factor analysis

Factor analysis, also called “dimension reduction”, is a type of data analysis used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. The aim here is to uncover independent latent variables, making it an ideal method for streamlining specific segments.

A good way to understand this data analysis method is a customer evaluation of a product. The initial assessment is based on different variables like color, shape, wearability, current trends, materials, comfort, the place where they bought the product, and frequency of usage. The list can be endless, depending on what you want to track. In this case, factor analysis comes into the picture by summarizing all of these variables into homogeneous groups, for example, by grouping the variables color, materials, quality, and trends into a broader latent variable of design.

If you want to start analyzing data using factor analysis we recommend you take a look at this practical guide from UCLA.
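
Alongside the UCLA guide, a quick scikit-learn sketch shows the mechanics: several correlated product-rating variables are reduced to two latent factors; the ratings and the number of factors are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical survey ratings (1-10): color, materials, quality, trendiness, comfort.
ratings = np.array([
    [8, 7, 7, 8, 6],
    [5, 4, 5, 5, 7],
    [9, 8, 9, 9, 5],
    [3, 3, 4, 2, 8],
    [7, 6, 7, 7, 6],
    [4, 5, 4, 4, 9],
])

fa = FactorAnalysis(n_components=2, random_state=0).fit(ratings)
# Loadings show how strongly each observed variable relates to each latent factor
# (e.g. a "design" factor grouping color, materials, quality, and trendiness).
print(fa.components_.round(2))
```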

6. Data mining

A method of data analysis that is the umbrella term for engineering metrics and insights for additional value, direction, and context. By using exploratory statistical evaluation, data mining aims to identify dependencies, relations, patterns, and trends to generate advanced knowledge.  When considering how to analyze data, adopting a data mining mindset is essential to success - as such, it’s an area that is worth exploring in greater detail.

An excellent use case of data mining is datapine intelligent data alerts . With the help of artificial intelligence and machine learning, they provide automated signals based on particular commands or occurrences within a dataset. For example, if you’re monitoring supply chain KPIs , you could set an intelligent alarm to trigger when invalid or low-quality data appears. By doing so, you will be able to drill down deep into the issue and fix it swiftly and effectively.

In the following picture, you can see how the intelligent alarms from datapine work. By setting up ranges on daily orders, sessions, and revenues, the alarms will notify you if the goal was not completed or if it exceeded expectations.

Example on how to use intelligent alerts from datapine
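
As a rough stand-in for such an alert (not datapine's implementation), the sketch below flags days whose order counts fall outside a manually configured range; the counts and thresholds are invented.

```python
import pandas as pd

# Hypothetical daily order counts.
orders = pd.Series(
    [210, 195, 220, 45, 205, 230, 480],
    index=pd.date_range("2024-03-01", periods=7),
)

# Manually configured expected range, as in a rule-based data alert.
lower, upper = 150, 300

# Days outside the range would trigger a notification.
alerts = orders[(orders < lower) | (orders > upper)]
print(alerts)  # flags the 45-order and 480-order days
```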

7. Time series analysis

As its name suggests, time series analysis is used to analyze a set of data points collected over a specified period of time. Although analysts use this method to monitor the data points in a specific interval of time rather than just monitoring them intermittently, time series analysis is not used solely for the purpose of collecting data over time. Instead, it allows researchers to understand whether variables changed during the course of the study, how the different variables depend on one another, and how they led to the end result. 

In a business context, this method is used to understand the causes of different trends and patterns to extract valuable insights. Another way of using this method is with the help of time series forecasting. Powered by predictive technologies, businesses can analyze various data sets over a period of time and forecast different future events. 

A great use case to put time series analysis into perspective is seasonality effects on sales. By using time series forecasting to analyze sales data of a specific product over time, you can understand if sales rise over a specific period of time (e.g. swimwear during summertime, or candy during Halloween). These insights allow you to predict demand and prepare production accordingly.  
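
A hedged sketch of this seasonality idea with statsmodels: a synthetic monthly sales series is decomposed into trend, seasonal, and residual components, assuming a 12-month seasonal period.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly sales with an upward trend and a summer peak.
months = pd.date_range("2020-01-01", periods=36, freq="MS")
trend = np.linspace(100, 160, 36)
season = 20 * np.sin(2 * np.pi * (months.month - 1) / 12)
noise = np.random.default_rng(0).normal(0, 3, 36)
sales = pd.Series(trend + season + noise, index=months)

# Split the series into trend, seasonal, and residual components.
parts = seasonal_decompose(sales, model="additive", period=12)
print(parts.seasonal.head(12).round(1))   # recurring monthly effect
print(parts.trend.dropna().tail(3).round(1))
```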

8. Decision Trees 

The decision tree analysis aims to act as a support tool to make smart and strategic decisions. By visually displaying potential outcomes, consequences, and costs in a tree-like model, researchers and company users can easily evaluate all factors involved and choose the best course of action. Decision trees are helpful to analyze quantitative data and they allow for an improved decision-making process by helping you spot improvement opportunities, reduce costs, and enhance operational efficiency and production.

But how does a decision tree actually work? This method works like a flowchart that starts with the main decision you need to make and branches out based on the different outcomes and consequences of each choice. Each outcome will outline its own consequences, costs, and gains, and at the end of the analysis, you can compare each of them and make the smartest decision. 

Businesses can use them to understand which project is more cost-effective and will bring more earnings in the long run. For example, imagine you need to decide if you want to update your software app or build a new app entirely.  Here you would compare the total costs, the time needed to be invested, potential revenue, and any other factor that might affect your decision.  In the end, you would be able to see which of these two options is more realistic and attainable for your company or research.
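
Decision trees can also be learned from data; this scikit-learn sketch fits a small tree to made-up past projects and prints the resulting rules, which read like the flowchart described above.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical past projects: [estimated_cost_k, expected_revenue_k]
X = [[50, 40], [30, 90], [120, 100], [20, 60], [80, 200], [60, 55]]
y = [0, 1, 0, 1, 1, 0]  # 1 = project paid off, 0 = it did not

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The learned rules can be read like a flowchart of decisions.
print(export_text(tree, feature_names=["cost_k", "revenue_k"]))
print(tree.predict([[70, 150]]))  # evaluate a new candidate project
```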

9. Conjoint analysis 

Last but not least, we have conjoint analysis. This approach is usually used in surveys to understand how individuals value different attributes of a product or service, and it is one of the most effective methods for extracting consumer preferences. When it comes to purchasing, some clients might be more price-focused, others more features-focused, and others might have a sustainability focus. Whatever your customers' preferences are, you can find them with conjoint analysis. Through this, companies can define pricing strategies, packaging options, subscription packages, and more. 

A great example of conjoint analysis is in marketing and sales. For instance, a cupcake brand might use conjoint analysis and find that its clients prefer gluten-free options and cupcakes with healthier toppings over super sugary ones. Thus, the cupcake brand can turn these insights into advertisements and promotions to increase sales of this particular type of product. And not just that, conjoint analysis can also help businesses segment their customers based on their interests. This allows them to send different messaging that will bring value to each of the segments. 
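
Part-worths in conjoint studies are often estimated with a regression on dummy-coded attributes; this toy sketch uses invented ratings for a handful of cupcake profiles and is far simpler than a full conjoint design.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical product profiles rated by respondents (higher = more preferred).
profiles = pd.DataFrame({
    "gluten_free":     [1, 0, 1, 0, 1, 0],
    "healthy_topping": [1, 1, 0, 0, 1, 0],
    "price_high":      [0, 1, 0, 1, 1, 0],
    "rating":          [9, 5, 7, 4, 6, 5],
})

X = profiles[["gluten_free", "healthy_topping", "price_high"]]
y = profiles["rating"]

# Coefficients approximate the value (part-worth) of each attribute level.
model = LinearRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_.round(2))))
```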

10. Correspondence Analysis

Also known as reciprocal averaging, correspondence analysis is a method used to analyze the relationship between categorical variables presented within a contingency table. A contingency table is a table that displays two (simple correspondence analysis) or more (multiple correspondence analysis) categorical variables across rows and columns that show the distribution of the data, which is usually answers to a survey or questionnaire on a specific topic. 

This method starts by calculating an “expected value” for each cell, obtained by multiplying the cell's row total by its column total and dividing by the table's grand total. The “expected value” is then subtracted from the original value, resulting in a “residual”, which is what allows you to extract conclusions about relationships and distribution. The results of this analysis are later displayed on a map that represents the relationship between the different values: the closer two values are on the map, the stronger the relationship. Let’s put it into perspective with an example. 

Imagine you are carrying out a market research analysis about outdoor clothing brands and how they are perceived by the public. For this analysis, you ask a group of people to match each brand with a certain attribute which can be durability, innovation, quality materials, etc. When calculating the residual numbers, you can see that brand A has a positive residual for innovation but a negative one for durability. This means that brand A is not positioned as a durable brand in the market, something that competitors could take advantage of. 
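
The expected values and residuals described above can be computed directly from a contingency table; this sketch uses a small invented brand-by-attribute table and stops at the residual step rather than producing the full map.

```python
import numpy as np
import pandas as pd

# Hypothetical contingency table: respondents matching brands to attributes.
table = pd.DataFrame(
    [[30, 10, 20], [15, 25, 10], [5, 15, 30]],
    index=["Brand A", "Brand B", "Brand C"],
    columns=["durability", "innovation", "quality"],
)

observed = table.to_numpy(dtype=float)
grand_total = observed.sum()

# Expected counts under independence: row total * column total / grand total.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / grand_total

# Positive residuals mean a brand is associated with an attribute more than expected.
residuals = pd.DataFrame(observed - expected, index=table.index, columns=table.columns)
print(residuals.round(1))
```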

11. Multidimensional Scaling (MDS)

MDS is a method used to observe the similarities or disparities between objects which can be colors, brands, people, geographical coordinates, and more. The objects are plotted using an “MDS map” that positions similar objects together and disparate ones far apart. The (dis) similarities between objects are represented using one or more dimensions that can be observed using a numerical scale. For example, if you want to know how people feel about the COVID-19 vaccine, you can use 1 for “don’t believe in the vaccine at all”  and 10 for “firmly believe in the vaccine” and a scale of 2 to 9 for in between responses.  When analyzing an MDS map the only thing that matters is the distance between the objects, the orientation of the dimensions is arbitrary and has no meaning at all. 

Multidimensional scaling is a valuable technique for market research, especially when it comes to evaluating product or brand positioning. For instance, if a cupcake brand wants to know how they are positioned compared to competitors, it can define 2-3 dimensions such as taste, ingredients, shopping experience, or more, and do a multidimensional scaling analysis to find improvement opportunities as well as areas in which competitors are currently leading. 

Another business example is in procurement when deciding on different suppliers. Decision makers can generate an MDS map to see how the different prices, delivery times, technical services, and more of the different suppliers differ and pick the one that suits their needs the best. 

A final example was proposed by a research paper on "An Improved Study of Multilevel Semantic Network Visualization for Analyzing Sentiment Word of Movie Review Data". The researchers chose a two-dimensional MDS map to display the distances and relationships between different sentiments in movie reviews. They used 36 sentiment words and distributed them based on their emotional distance, as we can see in the image below, where the words "outraged" and "sweet" are on opposite sides of the map, marking the distance between the two emotions very clearly.

Example of multidimensional scaling analysis

Aside from being a valuable technique to analyze dissimilarities, MDS also serves as a dimension-reduction technique for large dimensional data. 
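
A short scikit-learn sketch of MDS applied to a made-up brand dissimilarity matrix; as noted above, only the relative distances between points on the resulting map carry meaning.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical pairwise dissimilarities between four brands (0 = identical).
dissim = np.array([
    [0.0, 0.3, 0.8, 0.9],
    [0.3, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])

# dissimilarity="precomputed" tells MDS the matrix is already made of distances.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
print(coords.round(2))  # x/y positions; similar brands end up close together
```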

B. Qualitative Methods

Qualitative data analysis methods are defined as the observation of non-numerical data that is gathered and produced using methods of observation such as interviews, focus groups, questionnaires, and more. As opposed to quantitative methods, qualitative data is more subjective and highly valuable in analyzing customer retention and product development.

12. Text analysis

Text analysis, also known in the industry as text mining, works by taking large sets of textual data and arranging them in a way that makes it easier to manage. By working through this cleansing process in stringent detail, you will be able to extract the data that is truly relevant to your organization and use it to develop actionable insights that will propel you forward.

Modern software accelerates the application of text analytics. Thanks to the combination of machine learning and intelligent algorithms, you can perform advanced analytical processes such as sentiment analysis. This technique allows you to understand the intentions and emotions of a text, for example, whether it is positive, negative, or neutral, and then give it a score depending on certain factors and categories that are relevant to your brand. Sentiment analysis is often used to monitor brand and product reputation and to understand how successful your customer experience is. To learn more about the topic, check out this insightful article.

By analyzing data from various word-based sources, including product reviews, articles, social media communications, and survey responses, you will gain invaluable insights into your audience, as well as their needs, preferences, and pain points. This will allow you to create campaigns, services, and communications that meet your prospects’ needs on a personal level, growing your audience while boosting customer retention. There are various other “sub-methods” that are an extension of text analysis. Each of them serves a more specific purpose and we will look at them in detail next. 
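
As one concrete, hedged example of text analysis, NLTK's VADER analyzer assigns lexicon-based sentiment scores to short texts; the reviews below are invented, and the lexicon must be downloaded once.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time download of the lexicon

reviews = [
    "Absolutely love this product, great quality!",
    "Terrible experience, the package arrived broken.",
    "It's okay, nothing special.",
]

analyzer = SentimentIntensityAnalyzer()
for text in reviews:
    # 'compound' ranges from -1 (very negative) to +1 (very positive).
    print(analyzer.polarity_scores(text)["compound"], text)
```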

13. Content Analysis

This is a straightforward and very popular method that examines the presence and frequency of certain words, concepts, and subjects in different content formats such as text, image, audio, or video. For example, the number of times the name of a celebrity is mentioned on social media or online tabloids. It does this by coding text data that is later categorized and tabulated in a way that can provide valuable insights, making it the perfect mix of quantitative and qualitative analysis.

There are two types of content analysis. The first one is the conceptual analysis which focuses on explicit data, for instance, the number of times a concept or word is mentioned in a piece of content. The second one is relational analysis, which focuses on the relationship between different concepts or words and how they are connected within a specific context. 

Content analysis is often used by marketers to measure brand reputation and customer behavior, for example, by analyzing customer reviews. It can also be used to analyze customer interviews and find directions for new product development. It is also important to note that, in order to extract the maximum potential out of this analysis method, it is necessary to have a clearly defined research question. 
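
A tiny conceptual-analysis sketch using only the Python standard library: it counts how often a chosen set of concepts appears across a few invented customer reviews.

```python
import re
from collections import Counter

reviews = [
    "The battery life is great but the screen scratches easily.",
    "Great screen, terrible battery.",
    "Battery could be better; delivery was fast.",
]

concepts = ["battery", "screen", "delivery"]

# Tokenize all reviews and count occurrences of the tracked concepts.
tokens = re.findall(r"[a-z]+", " ".join(reviews).lower())
counts = Counter(t for t in tokens if t in concepts)
print(counts)  # e.g. Counter({'battery': 3, 'screen': 2, 'delivery': 1})
```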

14. Thematic Analysis

Very similar to content analysis, thematic analysis also helps identify and interpret patterns in qualitative data, with the main difference being that content analysis can also be applied to quantitative analysis. The thematic method analyzes large pieces of text data, such as focus group transcripts or interviews, and groups them into themes or categories that come up frequently within the text. It is a great method when trying to figure out people's views and opinions about a certain topic. For example, if you are a brand that cares about sustainability, you can survey your customers to analyze their views and opinions about sustainability and how they apply it to their lives. You can also analyze customer service call transcripts to find common issues and improve your service. 

Thematic analysis is a very subjective technique that relies on the researcher’s judgment. Therefore,  to avoid biases, it has 6 steps that include familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. It is also important to note that, because it is a flexible approach, the data can be interpreted in multiple ways and it can be hard to select what data is more important to emphasize. 

15. Narrative Analysis 

A bit more complex in nature than the two previous ones, narrative analysis is used to explore the meaning behind the stories that people tell and most importantly, how they tell them. By looking into the words that people use to describe a situation you can extract valuable conclusions about their perspective on a specific topic. Common sources for narrative data include autobiographies, family stories, opinion pieces, and testimonials, among others. 

From a business perspective, narrative analysis can be useful to analyze customer behaviors and feelings towards a specific product, service, feature, or others. It provides unique and deep insights that can be extremely valuable. However, it has some drawbacks.  

The biggest weakness of this method is that the sample sizes are usually very small due to the complexity and time-consuming nature of the collection of narrative data. Plus, the way a subject tells a story will be significantly influenced by his or her specific experiences, making it very hard to replicate in a subsequent study. 

16. Discourse Analysis

Discourse analysis is used to understand the meaning behind any type of written, verbal, or symbolic discourse based on its political, social, or cultural context. It mixes the analysis of languages and situations together. This means that the way the content is constructed and the meaning behind it is significantly influenced by the culture and society it takes place in. For example, if you are analyzing political speeches you need to consider different context elements such as the politician's background, the current political context of the country, the audience to which the speech is directed, and so on. 

From a business point of view, discourse analysis is a great market research tool. It allows marketers to understand how the norms and ideas of the specific market work and how their customers relate to those ideas. It can be very useful to build a brand mission or develop a unique tone of voice. 

17. Grounded Theory Analysis

Traditionally, researchers decide on a method and hypothesis and start to collect data to prove that hypothesis. Grounded theory is the only method that doesn’t require an initial research question or hypothesis, as its value lies in the generation of new theories. With the grounded theory method, you can go into the analysis process with an open mind and explore the data to generate new theories through tests and revisions. In fact, it is not necessary to finish collecting the data before starting to analyze it; researchers usually begin to find valuable insights as they are gathering the data. 

All of these elements make grounded theory a very valuable method as theories are fully backed by data instead of initial assumptions. It is a great technique to analyze poorly researched topics or find the causes behind specific company outcomes. For example, product managers and marketers might use the grounded theory to find the causes of high levels of customer churn and look into customer surveys and reviews to develop new theories about the causes. 

How To Analyze Data? Top 17 Data Analysis Techniques To Apply

17 top data analysis techniques by datapine

Now that we’ve answered the question “What is data analysis?”, explained why it is important, and covered the different types of analysis, it’s time to dig deeper into how to perform your analysis by working through these 17 essential techniques.

1. Collaborate on your needs

Before you begin analyzing data or drilling down into any techniques, sit down with all key stakeholders in your organization, agree on your primary campaign or strategic goals, and build a shared understanding of the types of insights that will best support your progress or give you the level of vision you need to evolve your organization.

2. Establish your questions

Once you’ve outlined your core objectives, you should consider which questions will need answering to help you achieve your mission. This is one of the most important techniques as it will shape the very foundations of your success.

To make sure your data works for you, take the time to formulate the right data analysis questions.

3. Data democratization

After giving your data analytics methodology some real direction, and knowing which questions need answering to extract optimum value from the information available to your organization, you should continue with democratization.

Data democratization aims to connect data from various sources quickly and efficiently so that anyone in your organization can access it at any given moment. You can extract data in text, image, video, numeric, or any other format, and then perform cross-database analysis to generate more advanced insights to share with the rest of the company interactively.  
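As a minimal sketch of what cross-source analysis can look like in practice, the Python snippet below combines a hypothetical CSV export with a hypothetical SQLite table; the file names, table name, and columns are assumptions made purely for illustration, not references to any specific product.

```python
import sqlite3
import pandas as pd

# Hypothetical sources: a CSV export from one system, a SQLite database from another.
crm = pd.read_csv("crm_customers.csv")            # assumed columns: customer_id, region
conn = sqlite3.connect("orders.db")
orders = pd.read_sql_query("SELECT customer_id, amount FROM orders", conn)
conn.close()

# Join the two sources on a shared key so they can be analyzed together.
combined = crm.merge(orders, on="customer_id", how="inner")

# A simple cross-source insight: total order value per region.
print(combined.groupby("region")["amount"].sum())
```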

Once you have decided on your most valuable sources, you need to bring all of that data into a structured format to start collecting insights. For this purpose, datapine offers an easy all-in-one data connectors feature to integrate all your internal and external sources and manage them as you see fit. Additionally, datapine’s end-to-end solution automatically updates your data, allowing you to save time and focus on performing the right analysis to grow your company.

data connectors from datapine

4. Think of governance 

When collecting data in a business or research context, you always need to think about security and privacy. With data breaches an ongoing concern for businesses, protecting your clients' or subjects' sensitive information becomes critical. 

To ensure that all of this is taken care of, you need a data governance strategy. According to Gartner , this concept refers to “the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.” In simpler words, data governance is a collection of processes, roles, and policies that ensure the efficient use of data while still achieving the main company goals. It establishes clear roles for who can access the information and how they can access it. Over time, this not only ensures that sensitive information is protected but also enables more efficient analysis as a whole. 

5. Clean your data

After harvesting data from so many sources, you will be left with a vast amount of information that can be overwhelming to deal with. You may also be faced with incorrect data that can mislead your analysis. The smartest thing you can do to avoid these problems later is to clean the data. This step is fundamental before visualizing it, as it ensures that the insights you extract are correct.

There are many things to look for in the cleaning process. The most important is to eliminate duplicate observations, which usually appear when combining multiple internal and external sources of information. You can also add missing codes, fix empty fields, and eliminate incorrectly formatted data.

Another usual form of cleaning is done with text data. As we mentioned earlier, most companies today analyze customer reviews, social media comments, questionnaires, and several other text inputs. In order for algorithms to detect patterns, text data needs to be revised to avoid invalid characters or any syntax or spelling errors. 

Most importantly, the aim of cleaning is to prevent you from arriving at false conclusions that can damage your company in the long run. By using clean data, you will also help BI solutions to interact better with your information and create better reports for your organization.
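As a minimal sketch of these cleaning steps in Python (pandas), assuming a hypothetical survey export with made-up column names, dropping duplicates, fixing empty fields, and normalizing text might look like this:

```python
import pandas as pd

df = pd.read_csv("survey_responses.csv")  # hypothetical raw export

# 1. Remove duplicate observations, which often appear when merging several sources.
df = df.drop_duplicates()

# 2. Fix empty fields: coerce a numeric column and drop rows missing the key field.
df["order_value"] = pd.to_numeric(df["order_value"], errors="coerce").fillna(0)
df = df.dropna(subset=["customer_id"])

# 3. Normalize free-text fields so downstream algorithms can detect patterns.
df["comment"] = (
    df["comment"]
    .astype(str)
    .str.strip()
    .str.lower()
    .str.replace(r"[^a-z0-9\s]", "", regex=True)  # strip invalid characters
)

df.to_csv("survey_responses_clean.csv", index=False)
```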

6. Set your KPIs

Once you’ve set your sources, cleaned your data, and established clear-cut questions you want your insights to answer, you need to set a host of key performance indicators (KPIs) that will help you track, measure, and shape your progress in a number of key areas.

KPIs are critical to both qualitative and quantitative analysis research. This is one of the primary methods of data analysis you certainly shouldn’t overlook.

To help you set the best possible KPIs for your initiatives and activities, here is an example of a relevant logistics KPI: transportation-related costs. If you want to see more, explore our collection of key performance indicator examples.

Transportation costs logistics KPIs

7. Omit useless data

Having given your data analysis tools and techniques a clear purpose and defined your mission, you should explore the raw data you’ve collected from all sources and use your KPIs as a reference for cutting out any information you deem to be useless.

Trimming the informational fat is one of the most crucial methods of analysis as it will allow you to focus your analytical efforts and squeeze every drop of value from the remaining ‘lean’ information.

Any stats, facts, figures, or metrics that don’t align with your business goals or fit with your KPI management strategies should be eliminated from the equation.

8. Build a data management roadmap

While this particular step is optional at this point (you will have already gained a wealth of insight and formed a fairly sound strategy by now), creating a data management roadmap will help your data analysis methods and techniques succeed on a more sustainable basis. These roadmaps, if developed properly, can also be tweaked and scaled over time.

Invest ample time in developing a roadmap that will help you store, manage, and handle your data internally, and you will make your analysis techniques all the more fluid and functional – one of the most powerful types of data analysis methods available today.

9. Integrate technology

There are many ways to analyze data, but one of the most vital aspects of analytical success in a business context is integrating the right decision support software and technology.

Robust analysis platforms will not only allow you to pull critical data from your most valuable sources while working with dynamic KPIs that offer actionable insights; they will also present that data in a digestible, visual, interactive format from one central, live dashboard: a data methodology you can count on.

By integrating the right technology within your data analysis methodology, you’ll avoid fragmenting your insights, saving time and effort while extracting maximum value from your business’s most important data.

For a look at the power of software for the purpose of analysis and to enhance your methods of analyzing, glance over our selection of dashboard examples .

10. Answer your questions

By considering each of the above efforts, working with the right technology, and fostering a cohesive internal culture where everyone buys into the different ways to analyze data as well as the power of digital intelligence, you will swiftly start to answer your most burning business questions. Arguably, the best way to make your data concepts accessible across the organization is through data visualization.

11. Visualize your data

Online data visualization is a powerful tool as it lets you tell a story with your metrics, allowing users across the organization to extract meaningful insights that aid business evolution – and it covers all the different ways to analyze data.

The purpose of analyzing is to make your entire organization more informed and intelligent, and with the right platform or dashboard, this is simpler than you think, as demonstrated by our marketing dashboard .

An executive dashboard example showcasing high-level marketing KPIs such as cost per lead, MQL, SQL, and cost per customer.

This visual, dynamic, and interactive online dashboard is a data analysis example designed to give Chief Marketing Officers (CMO) an overview of relevant metrics to help them understand if they achieved their monthly goals.

In detail, this example generated with a modern dashboard creator displays interactive charts for monthly revenues, costs, net income, and net income per customer; all of them are compared with the previous month so that you can understand how the data fluctuated. In addition, it shows a detailed summary of the number of users, customers, SQLs, and MQLs per month to visualize the whole picture and extract relevant insights or trends for your marketing reports .

The CMO dashboard is perfect for c-level management as it can help them monitor the strategic outcome of their marketing efforts and make data-driven decisions that can benefit the company exponentially.
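For readers without a BI tool at hand, here is a rough sketch of the same idea in plain Python using matplotlib, with entirely made-up monthly figures; it simply shows how a metric can be plotted against a comparison series, not how any particular dashboard is built:

```python
import matplotlib.pyplot as plt

# Hypothetical monthly figures, purely for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [120_000, 135_000, 128_000, 150_000]
costs = [80_000, 82_000, 90_000, 95_000]

fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(months, revenue, label="Revenue")
ax.plot(months, costs, label="Costs", marker="o", color="darkred")
ax.set_title("Monthly revenue vs. costs")
ax.set_ylabel("USD")
ax.legend()
plt.tight_layout()
plt.show()
```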

12. Be careful with the interpretation

We already dedicated an entire post to data interpretation as it is a fundamental part of the process of data analysis. It gives meaning to the analytical information and aims to drive a concise conclusion from the analysis results. Since most of the time companies are dealing with data from many different sources, the interpretation stage needs to be done carefully and properly in order to avoid misinterpretations. 

To help you through the process, here we list three common practices that you need to avoid at all costs when looking at your data:

  • Correlation vs. causation: The human brain is wired to find patterns. This tendency leads to one of the most common interpretation mistakes: confusing correlation with causation. Although the two can exist simultaneously, it is not correct to assume that because two things happened together, one caused the other. To avoid falling into this trap, never rely on intuition alone; trust the data. If there is no objective evidence of causation, stick to correlation. 
  • Confirmation bias: This phenomenon describes the tendency to select and interpret only the data necessary to prove one hypothesis, often ignoring the elements that might disprove it. Even if it's not done on purpose, confirmation bias can represent a real problem, as excluding relevant information can lead to false conclusions and, therefore, bad business decisions. To avoid it, always try to disprove your hypothesis instead of proving it, share your analysis with other team members, and avoid drawing any conclusions before the entire analytical project is finalized.
  • Statistical significance: In short, statistical significance helps analysts understand whether a result reflects a real effect or whether it could plausibly be explained by sampling error or pure chance. The level of significance required may depend on the sample size and the field being analyzed. In any case, ignoring the significance of a result when it might influence decision-making can be a costly mistake (see the short sketch after this list).
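As a minimal illustration of these pitfalls, the sketch below uses SciPy and invented figures to compute a correlation coefficient together with its p-value; a small p-value suggests the association is unlikely to be pure chance, but it still says nothing about causation:

```python
from scipy import stats

# Hypothetical monthly figures: ad spend (thousands of dollars) and new customers.
ad_spend = [10, 12, 9, 15, 14, 11, 16, 13]
new_customers = [120, 135, 110, 160, 150, 125, 170, 140]

r, p_value = stats.pearsonr(ad_spend, new_customers)
print(f"correlation r = {r:.2f}, p = {p_value:.4f}")

# Even a significant correlation (p < 0.05) cannot tell us whether ad spend
# *causes* customer growth: a third factor could be driving both.
```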

13. Build a narrative

Now, we’re going to look at how you can bring all of these elements together in a way that will benefit your business - starting with a little something called data storytelling.

The human brain responds incredibly well to strong stories or narratives. Once you’ve cleansed, shaped, and visualized your most invaluable data using various BI dashboard tools , you should strive to tell a story - one with a clear-cut beginning, middle, and end.

By doing so, you will make your analytical efforts more accessible, digestible, and universal, empowering more people within your organization to use your discoveries to their actionable advantage.

14. Consider autonomous technology

Autonomous technologies, such as artificial intelligence (AI) and machine learning (ML), play a significant role in the advancement of understanding how to analyze data more effectively.

Gartner has predicted that 80% of emerging technologies will be developed with AI foundations, a testament to the ever-growing power and value of autonomous technologies.

At the moment, these technologies are revolutionizing the analysis industry. Some examples that we mentioned earlier are neural networks, intelligent alarms, and sentiment analysis.

15. Share the load

If you work with the right tools and dashboards, you will be able to present your metrics in a digestible, value-driven format, allowing almost everyone in the organization to connect with and use relevant data to their advantage.

Modern dashboards consolidate data from various sources, providing access to a wealth of insights in one centralized location, no matter if you need to monitor recruitment metrics or generate reports that need to be sent across numerous departments. Moreover, these cutting-edge tools offer access to dashboards from a multitude of devices, meaning that everyone within the business can connect with practical insights remotely - and share the load.

Once everyone is able to work with a data-driven mindset, you will catalyze the success of your business in ways you never thought possible. And when it comes to knowing how to analyze data, this kind of collaborative approach is essential.

16. Data analysis tools

To perform high-quality data analysis, it is fundamental to use tools and software that will ensure the best results. Here is a short summary of four fundamental categories of data analysis tools for your organization. 

  • Business Intelligence: BI tools allow you to process significant amounts of data from several sources in any format. Through this, you can not only analyze and monitor your data to extract relevant insights but also create interactive reports and dashboards to visualize your KPIs and use them for your company's good. datapine is an online BI software focused on delivering powerful analysis features that are accessible to beginner and advanced users alike. In this way, it offers a full-service solution that includes cutting-edge data analysis, KPI visualization, live dashboards, reporting, and artificial intelligence technologies to predict trends and minimize risk.
  • Statistical analysis: These tools are usually designed for scientists, statisticians, market researchers, and mathematicians, as they allow them to perform complex statistical analyses with methods like regression analysis, predictive analysis, and statistical modeling. A good tool to perform this type of analysis is R-Studio as it offers a powerful data modeling and hypothesis testing feature that can cover both academic and general data analysis. This tool is one of the favorite ones in the industry, due to its capability for data cleaning, data reduction, and performing advanced analysis with several statistical methods. Another relevant tool to mention is SPSS from IBM. The software offers advanced statistical analysis for users of all skill levels. Thanks to a vast library of machine learning algorithms, text analysis, and a hypothesis testing approach it can help your company find relevant insights to drive better decisions. SPSS also works as a cloud service that enables you to run it anywhere.
  • SQL Consoles: SQL is a programming language often used to handle structured data in relational databases. Tools of this kind are popular among data scientists because they are extremely effective at unlocking the value in these databases. One of the most widely used SQL tools on the market is MySQL Workbench . It offers several features such as a visual tool for database modeling and monitoring, complete SQL optimization, administration tools, and visual performance dashboards to keep track of KPIs. (A bare-bones SQL example appears right after this list.)
  • Data Visualization: These tools are used to represent your data through charts, graphs, and maps that allow you to find patterns and trends in the data. datapine's already mentioned BI platform also offers a wealth of powerful online data visualization tools with several benefits. Some of them include: delivering compelling data-driven presentations to share with your entire company, the ability to see your data online with any device wherever you are, an interactive dashboard design feature that enables you to showcase your results in an interactive and understandable way, and to perform online self-service reports that can be used simultaneously with several other people to enhance team productivity.
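To make the SQL idea above concrete, here is a tiny, self-contained Python sketch that runs analytical SQL against an in-memory SQLite database; the table and figures are invented for illustration and are not tied to any of the tools mentioned:

```python
import sqlite3

# Build a small in-memory database so the example is fully self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "North", 250.0), (2, "South", 180.0), (3, "North", 320.0), (4, "West", 90.0)],
)

# A typical analytical query: total and average order value per region.
query = """
    SELECT region, SUM(amount) AS total, AVG(amount) AS average
    FROM orders
    GROUP BY region
    ORDER BY total DESC
"""
for region, total, average in conn.execute(query):
    print(region, total, round(average, 2))

conn.close()
```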

17. Refine your process constantly 

Last is a step that might seem obvious to some people, but it can be easily ignored if you think you are done. Once you have extracted the needed results, you should always take a retrospective look at your project and think about what you can improve. As you saw throughout this long list of techniques, data analysis is a complex process that requires constant refinement. For this reason, you should always go one step further and keep improving. 

Quality Criteria For Data Analysis

So far we’ve covered a list of methods and techniques that should help you perform efficient data analysis. But how do you measure the quality and validity of your results? This is done with the help of some science quality criteria. Here we will go into a more theoretical area that is critical to understanding the fundamentals of statistical analysis in science. However, you should also be aware of these steps in a business context, as they will allow you to assess the quality of your results in the correct way. Let’s dig in. 

  • Internal validity: The results of a survey are internally valid if they measure what they are supposed to measure and thus provide credible results. In other words , internal validity measures the trustworthiness of the results and how they can be affected by factors such as the research design, operational definitions, how the variables are measured, and more. For instance, imagine you are doing an interview to ask people if they brush their teeth two times a day. While most of them will answer yes, you can still notice that their answers correspond to what is socially acceptable, which is to brush your teeth at least twice a day. In this case, you can’t be 100% sure if respondents actually brush their teeth twice a day or if they just say that they do, therefore, the internal validity of this interview is very low. 
  • External validity: Essentially, external validity refers to the extent to which the results of your research can be applied to a broader context. It basically aims to prove that the findings of a study can be applied in the real world. If the research can be applied to other settings, individuals, and times, then the external validity is high. 
  • Reliability: If your research is reliable, it means that it can be reproduced: if the measurement were repeated under the same conditions, it would produce similar results. In other words, your measuring instrument produces consistent results. For example, imagine a doctor building a symptoms questionnaire to detect a specific disease in a patient. If various other doctors use this questionnaire but end up diagnosing the same patient with a different condition, the questionnaire is not reliable for detecting the initial disease. Another important note here is that for your research to be reliable, it also needs to be objective: if the results of a study are the same regardless of who assesses or interprets them, the study can be considered reliable. Let’s look at the objectivity criterion in more detail now. 
  • Objectivity: In data science, objectivity means that the researcher stays fully objective throughout the analysis. The results of a study need to be determined by objective criteria and not by the beliefs, personality, or values of the researcher. Objectivity needs to be ensured when gathering the data; for example, when interviewing individuals, the questions need to be asked in a way that does not influence the answers. Objectivity also matters when interpreting the data: if different researchers reach the same conclusions, the study is objective. For this last point, you can set predefined criteria for interpreting the results so that all researchers follow the same steps. 

The discussed quality criteria cover mostly potential influences in a quantitative context. Analysis in qualitative research has by default additional subjective influences that must be controlled in a different way. Therefore, there are other quality criteria for this kind of research such as credibility, transferability, dependability, and confirmability. You can see each of them more in detail on this resource . 

Data Analysis Limitations & Barriers

Analyzing data is not an easy task. As you’ve seen throughout this post, there are many steps and techniques that you need to apply in order to extract useful information from your research. While a well-performed analysis can bring various benefits to your organization it doesn't come without limitations. In this section, we will discuss some of the main barriers you might encounter when conducting an analysis. Let’s see them more in detail. 

  • Lack of clear goals: No matter how good your data or analysis might be, if you don’t have clear goals or a hypothesis, the process might be worthless. While we mentioned some methods that don’t require a predefined hypothesis, it is always better to enter the analytical process with clear guidelines about what you expect to get out of it, especially in a business context in which data is used to support important strategic decisions. 
  • Objectivity: Arguably one of the biggest barriers when it comes to data analysis in research is to stay objective. When trying to prove a hypothesis, researchers might find themselves, intentionally or unintentionally, directing the results toward an outcome that they want. To avoid this, always question your assumptions and avoid confusing facts with opinions. You can also show your findings to a research partner or external person to confirm that your results are objective. 
  • Data representation: A fundamental part of the analytical procedure is the way you represent your data. You can use various graphs and charts to represent your findings, but not all of them will work for all purposes. Choosing the wrong visual can not only damage your analysis but also mislead your audience, so it is important to understand when to use each type of visual depending on your analytical goals. Our complete guide on the types of graphs and charts lists 20 different visuals with examples of when to use them. 
  • Flawed correlation : Misleading statistics can significantly damage your research. We’ve already pointed out a few interpretation issues previously in the post, but it is an important barrier that we can't avoid addressing here as well. Flawed correlations occur when two variables appear related to each other but they are not. Confusing correlations with causation can lead to a wrong interpretation of results which can lead to building wrong strategies and loss of resources, therefore, it is very important to identify the different interpretation mistakes and avoid them. 
  • Sample size: A very common barrier to a reliable and efficient analysis process is the sample size. For the results to be trustworthy, the sample size should be representative of what you are analyzing. For example, imagine you have a company of 1,000 employees and you ask the question “do you like working here?” to 50 employees, of which 49 say yes: 98%. Now imagine you ask the same question to all 1,000 employees and 980 say yes, which is also 98%. Claiming that 98% of employees like working at the company when the sample size was only 50 is not a representative or trustworthy conclusion. The uncertainty around a result shrinks considerably as the sample size grows (see the short sketch after this list).   
  • Privacy concerns: In some cases, data collection can be subjected to privacy regulations. Businesses gather all kinds of information from their customers from purchasing behaviors to addresses and phone numbers. If this falls into the wrong hands due to a breach, it can affect the security and confidentiality of your clients. To avoid this issue, you need to collect only the data that is needed for your research and, if you are using sensitive facts, make it anonymous so customers are protected. The misuse of customer data can severely damage a business's reputation, so it is important to keep an eye on privacy. 
  • Lack of communication between teams : When it comes to performing data analysis on a business level, it is very likely that each department and team will have different goals and strategies. However, they are all working for the same common goal of helping the business run smoothly and keep growing. When teams are not connected and communicating with each other, it can directly affect the way general strategies are built. To avoid these issues, tools such as data dashboards enable teams to stay connected through data in a visually appealing way. 
  • Innumeracy : Businesses are working with data more and more every day. While there are many BI tools available to perform effective analysis, data literacy is still a constant barrier. Not all employees know how to apply analysis techniques or extract insights from them. To prevent this from happening, you can implement different training opportunities that will prepare every relevant user to deal with data. 
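A minimal sketch of the sample-size point above: computing an approximate 95% confidence interval for a proportion shows how much wider the uncertainty is with 50 respondents than with 1,000. The figures are the hypothetical ones from the example, and the normal approximation used here is itself a simplification:

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)

print("n = 50:  ", proportion_ci(49, 50))     # roughly 0.94 to 1.00
print("n = 1000:", proportion_ci(980, 1000))  # roughly 0.97 to 0.99
```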

Key Data Analysis Skills

As you've learned throughout this lengthy guide, analyzing data is a complex task that requires a lot of knowledge and skills. That said, thanks to the rise of self-service tools the process is way more accessible and agile than it once was. Regardless, there are still some key skills that are valuable to have when working with data, we list the most important ones below.

  • Critical and statistical thinking: To successfully analyze data you need to be creative and think outside the box. That might sound like a strange statement considering that data is often tied to facts; however, a great deal of critical thinking is required to uncover connections, come up with a valuable hypothesis, and draw conclusions that go a step beyond the surface. This, of course, needs to be complemented by statistical thinking and an understanding of numbers. 
  • Data cleaning: Anyone who has ever worked with data will tell you that the cleaning and preparation process accounts for as much as 80% of a data analyst's work, so the skill is fundamental. What's more, failing to clean the data adequately can significantly damage the analysis and lead to poor decision-making in a business scenario. While there are multiple tools that automate the cleaning process and reduce the possibility of human error, it is still a valuable skill to master. 
  • Data visualization: Visuals make the information easier to understand and analyze, not only for professional users but especially for non-technical ones. Having the necessary skills to not only choose the right chart type but know when to apply it correctly is key. This also means being able to design visually compelling charts that make the data exploration process more efficient. 
  • SQL: The Structured Query Language or SQL is a programming language used to communicate with databases. It is fundamental knowledge as it enables you to update, manipulate, and organize data from relational databases which are the most common databases used by companies. It is fairly easy to learn and one of the most valuable skills when it comes to data analysis. 
  • Communication skills: This is a skill that is especially valuable in a business environment. Being able to clearly communicate analytical outcomes to colleagues is incredibly important, especially when the information you are trying to convey is complex for non-technical people. This applies to in-person communication as well as written format, for example, when generating a dashboard or report. While this might be considered a “soft” skill compared to the other ones we mentioned, it should not be ignored as you most likely will need to share analytical findings with others no matter the context. 

Data Analysis In The Big Data Environment

Big data is invaluable to today’s businesses, and by using different methods for data analysis, it’s possible to view your data in a way that can help you turn insight into positive action.

To inspire your efforts and put the importance of big data into context, here are some insights that you should know:

  • By 2026 the industry of big data is expected to be worth approximately $273.4 billion.
  • 94% of enterprises say that analyzing data is important for their growth and digital transformation. 
  • Companies that exploit the full potential of their data can increase their operating margins by 60% .
  • We have already discussed the benefits of artificial intelligence throughout this article. This industry's financial impact is expected to grow to $40 billion by 2025.

Data analysis concepts may come in many forms, but fundamentally, any solid methodology will help to make your business more streamlined, cohesive, insightful, and successful than ever before.

Key Takeaways From Data Analysis 

As we reach the end of our data analysis journey, we leave a small summary of the main methods and techniques to perform excellent analysis and grow your business.

17 Essential Types of Data Analysis Methods:

  • Cluster analysis
  • Cohort analysis
  • Regression analysis
  • Factor analysis
  • Neural Networks
  • Data Mining
  • Text analysis
  • Time series analysis
  • Decision trees
  • Conjoint analysis 
  • Correspondence Analysis
  • Multidimensional Scaling 
  • Content analysis 
  • Thematic analysis
  • Narrative analysis 
  • Grounded theory analysis
  • Discourse analysis 

Top 17 Data Analysis Techniques:

  • Collaborate on your needs
  • Establish your questions
  • Data democratization
  • Think of data governance 
  • Clean your data
  • Set your KPIs
  • Omit useless data
  • Build a data management roadmap
  • Integrate technology
  • Answer your questions
  • Visualize your data
  • Interpretation of data
  • Build a narrative
  • Consider autonomous technology
  • Share the load
  • Data Analysis tools
  • Refine your process constantly 

We’ve pondered the data analysis definition and drilled down into the practical applications of data-centric analytics, and one thing is clear: by taking measures to arrange your data and making your metrics work for you, it’s possible to transform raw information into action - the kind that will push your business to the next level.

Yes, good data analytics techniques result in enhanced business intelligence (BI). To help you understand this notion in more detail, read our exploration of business intelligence reporting .

And, if you’re ready to perform your own analysis, drill down into your facts and figures while interacting with your data on astonishing visuals, you can try our software for a free, 14-day trial .

Data Analysis

What is Data Analysis?

According to the federal government, data analysis is "the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data" ( Responsible Conduct in Data Management ). Important components of data analysis include searching for patterns, remaining unbiased in drawing inference from data, practicing responsible  data management , and maintaining "honest and accurate analysis" ( Responsible Conduct in Data Management ). 

In order to understand data analysis further, it can be helpful to take a step back and ask, "What is data?" Many of us associate data with spreadsheets of numbers and values; however, data can encompass much more than that. According to the federal government, data is "The recorded factual material commonly accepted in the scientific community as necessary to validate research findings" ( OMB Circular 110 ). This broad definition can include information in many formats. 

Some examples of types of data are as follows:

  • Photographs 
  • Hand-written notes from field observation
  • Machine learning training data sets
  • Ethnographic interview transcripts
  • Sheet music
  • Scripts for plays and musicals 
  • Observations from laboratory experiments ( CMU Data 101 )

Thus, data analysis includes the processing and manipulation of these data sources in order to gain additional insight from data, answer a research question, or confirm a research hypothesis. 

Data analysis falls within the larger research data lifecycle. (Figure: research data lifecycle, University of Virginia.)

Why Analyze Data?

Through data analysis, a researcher can gain additional insight from data and draw conclusions to address the research question or hypothesis. Use of data analysis tools helps researchers understand and interpret data. 

What are the Types of Data Analysis?

Data analysis can be quantitative, qualitative, or mixed methods. 

Quantitative research typically involves numbers and "close-ended questions and responses" ( Creswell & Creswell, 2018 , p. 3). Quantitative research tests variables against objective theories, usually measured and collected on instruments and analyzed using statistical procedures ( Creswell & Creswell, 2018 , p. 4). Quantitative analysis usually uses deductive reasoning. 

Qualitative research typically involves words and "open-ended questions and responses" ( Creswell & Creswell, 2018 , p. 3). According to Creswell & Creswell, "qualitative research is an approach for exploring and understanding the meaning individuals or groups ascribe to a social or human problem" ( 2018 , p. 4). Thus, qualitative analysis usually uses inductive reasoning. 

Mixed methods  research uses methods from both quantitative and qualitative research approaches. Mixed methods research works under the "core assumption... that the integration of qualitative and quantitative data yields additional insight beyond the information provided by either the quantitative or qualitative data alone" ( Creswell & Creswell, 2018 , p. 4). 

Source: https://guides.library.georgetown.edu/data-analysis (last updated May 3, 2024).


Can J Hosp Pharm. 2015;68(4).

Creating a Data Analysis Plan: What to Consider When Choosing Statistics for a Study

There are three kinds of lies: lies, damned lies, and statistics. – Mark Twain 1

INTRODUCTION

Statistics represent an essential part of a study because, regardless of the study design, investigators need to summarize the collected information for interpretation and presentation to others. It is therefore important for us to heed Mr Twain’s concern when creating the data analysis plan. In fact, even before data collection begins, we need to have a clear analysis plan that will guide us from the initial stages of summarizing and describing the data through to testing our hypotheses.

The purpose of this article is to help you create a data analysis plan for a quantitative study. For those interested in conducting qualitative research, previous articles in this Research Primer series have provided information on the design and analysis of such studies. 2 , 3 Information in the current article is divided into 3 main sections: an overview of terms and concepts used in data analysis, a review of common methods used to summarize study data, and a process to help identify relevant statistical tests. My intention here is to introduce the main elements of data analysis and provide a place for you to start when planning this part of your study. Biostatistical experts, textbooks, statistical software packages, and other resources can certainly add more breadth and depth to this topic when you need additional information and advice.

TERMS AND CONCEPTS USED IN DATA ANALYSIS

When analyzing information from a quantitative study, we are often dealing with numbers; therefore, it is important to begin with an understanding of the source of the numbers. Let us start with the term variable , which defines a specific item of information collected in a study. Examples of variables include age, sex or gender, ethnicity, exercise frequency, weight, treatment group, and blood glucose. Each variable will have a group of categories, which are referred to as values , to help describe the characteristic of an individual study participant. For example, the variable “sex” would have values of “male” and “female”.

Although variables can be defined or grouped in various ways, I will focus on 2 methods at this introductory stage. First, variables can be defined according to the level of measurement. The categories in a nominal variable are names, for example, male and female for the variable “sex”; white, Aboriginal, black, Latin American, South Asian, and East Asian for the variable “ethnicity”; and intervention and control for the variable “treatment group”. Nominal variables with only 2 categories are also referred to as dichotomous variables because the study group can be divided into 2 subgroups based on information in the variable. For example, a study sample can be split into 2 groups (patients receiving the intervention and controls) using the dichotomous variable “treatment group”. An ordinal variable implies that the categories can be placed in a meaningful order, as would be the case for exercise frequency (never, sometimes, often, or always). Nominal-level and ordinal-level variables are also referred to as categorical variables, because each category in the variable can be completely separated from the others. The categories for an interval variable can be placed in a meaningful order, with the interval between consecutive categories also having meaning. Age, weight, and blood glucose can be considered as interval variables, but also as ratio variables, because the ratio between values has meaning (e.g., a 15-year-old is half the age of a 30-year-old). Interval-level and ratio-level variables are also referred to as continuous variables because of the underlying continuity among categories.

As we progress through the levels of measurement from nominal to ratio variables, we gather more information about the study participant. The amount of information that a variable provides will become important in the analysis stage, because we lose information when variables are reduced or aggregated—a common practice that is not recommended. 4 For example, if age is reduced from a ratio-level variable (measured in years) to an ordinal variable (categories of < 65 and ≥ 65 years) we lose the ability to make comparisons across the entire age range and introduce error into the data analysis. 4

A second method of defining variables is to consider them as either dependent or independent. As the terms imply, the value of a dependent variable depends on the value of other variables, whereas the value of an independent variable does not rely on other variables. In addition, an investigator can influence the value of an independent variable, such as treatment-group assignment. Independent variables are also referred to as predictors because we can use information from these variables to predict the value of a dependent variable. Building on the group of variables listed in the first paragraph of this section, blood glucose could be considered a dependent variable, because its value may depend on values of the independent variables age, sex, ethnicity, exercise frequency, weight, and treatment group.

Statistics are mathematical formulae that are used to organize and interpret the information that is collected through variables. There are 2 general categories of statistics, descriptive and inferential. Descriptive statistics are used to describe the collected information, such as the range of values, their average, and the most common category. Knowledge gained from descriptive statistics helps investigators learn more about the study sample. Inferential statistics are used to make comparisons and draw conclusions from the study data. Knowledge gained from inferential statistics allows investigators to make inferences and generalize beyond their study sample to other groups.

Before we move on to specific descriptive and inferential statistics, there are 2 more definitions to review. Parametric statistics are generally used when values in an interval-level or ratio-level variable are normally distributed (i.e., the entire group of values has a bell-shaped curve when plotted by frequency). These statistics are used because we can define parameters of the data, such as the centre and width of the normally distributed curve. In contrast, interval-level and ratio-level variables with values that are not normally distributed, as well as nominal-level and ordinal-level variables, are generally analyzed using nonparametric statistics.

METHODS FOR SUMMARIZING STUDY DATA: DESCRIPTIVE STATISTICS

The first step in a data analysis plan is to describe the data collected in the study. This can be done using figures to give a visual presentation of the data and statistics to generate numeric descriptions of the data.

Selection of an appropriate figure to represent a particular set of data depends on the measurement level of the variable. Data for nominal-level and ordinal-level variables may be interpreted using a pie graph or bar graph . Both options allow us to examine the relative number of participants within each category (by reporting the percentages within each category), whereas a bar graph can also be used to examine absolute numbers. For example, we could create a pie graph to illustrate the proportions of men and women in a study sample and a bar graph to illustrate the number of people who report exercising at each level of frequency (never, sometimes, often, or always).

Interval-level and ratio-level variables may also be interpreted using a pie graph or bar graph; however, these types of variables often have too many categories for such graphs to provide meaningful information. Instead, these variables may be better interpreted using a histogram . Unlike a bar graph, which displays the frequency for each distinct category, a histogram displays the frequency within a range of continuous categories. Information from this type of figure allows us to determine whether the data are normally distributed. In addition to pie graphs, bar graphs, and histograms, many other types of figures are available for the visual representation of data. Interested readers can find additional types of figures in the books recommended in the “Further Readings” section.

Figures are also useful for visualizing comparisons between variables or between subgroups within a variable (for example, the distribution of blood glucose according to sex). Box plots are useful for summarizing information for a variable that does not follow a normal distribution. The lower and upper limits of the box identify the interquartile range (or 25th and 75th percentiles), while the midline indicates the median value (or 50th percentile). Scatter plots provide information on how the categories for one continuous variable relate to categories in a second variable; they are often helpful in the analysis of correlations.

In addition to using figures to present a visual description of the data, investigators can use statistics to provide a numeric description. Regardless of the measurement level, we can find the mode by identifying the most frequent category within a variable. When summarizing nominal-level and ordinal-level variables, the simplest method is to report the proportion of participants within each category.

The choice of the most appropriate descriptive statistic for interval-level and ratio-level variables will depend on how the values are distributed. If the values are normally distributed, we can summarize the information using the parametric statistics of mean and standard deviation. The mean is the arithmetic average of all values within the variable, and the standard deviation tells us how widely the values are dispersed around the mean. When values of interval-level and ratio-level variables are not normally distributed, or we are summarizing information from an ordinal-level variable, it may be more appropriate to use the nonparametric statistics of median and range. The first step in identifying these descriptive statistics is to arrange study participants according to the variable categories from lowest value to highest value. The range is used to report the lowest and highest values. The median or 50th percentile is located by dividing the number of participants into 2 groups, such that half (50%) of the participants have values above the median and the other half (50%) have values below the median. Similarly, the 25th percentile is the value with 25% of the participants having values below and 75% of the participants having values above, and the 75th percentile is the value with 75% of participants having values below and 25% of participants having values above. Together, the 25th and 75th percentiles define the interquartile range .
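For readers who want to see these numeric descriptions in practice, here is a minimal Python sketch using the standard library's statistics module and an invented blood glucose sample; the values are hypothetical and only illustrate the calculations named above:

```python
import statistics as st

# Hypothetical fasting blood glucose values (mmol/L) from a small study sample.
glucose = [4.8, 5.1, 5.4, 5.0, 6.2, 5.6, 4.9, 7.8, 5.3, 5.2]

print("mean:", round(st.mean(glucose), 2))
print("standard deviation:", round(st.stdev(glucose), 2))
print("median (50th percentile):", st.median(glucose))

# Quartiles: the 25th and 75th percentiles define the interquartile range.
q1, _, q3 = st.quantiles(glucose, n=4)
print("interquartile range:", round(q1, 2), "to", round(q3, 2))
print("range:", min(glucose), "to", max(glucose))
```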

PROCESS TO IDENTIFY RELEVANT STATISTICAL TESTS: INFERENTIAL STATISTICS

One caveat about the information provided in this section: selecting the most appropriate inferential statistic for a specific study should be a combination of following these suggestions, seeking advice from experts, and discussing with your co-investigators. My intention here is to give you a place to start a conversation with your colleagues about the options available as you develop your data analysis plan.

There are 3 key questions to consider when selecting an appropriate inferential statistic for a study: What is the research question? What is the study design? and What is the level of measurement? It is important for investigators to carefully consider these questions when developing the study protocol and creating the analysis plan. The figures that accompany these questions show decision trees that will help you to narrow down the list of inferential statistics that would be relevant to a particular study. Appendix 1 provides brief definitions of the inferential statistics named in these figures. Additional information, such as the formulae for various inferential statistics, can be obtained from textbooks, statistical software packages, and biostatisticians.

What Is the Research Question?

The first step in identifying relevant inferential statistics for a study is to consider the type of research question being asked. You can find more details about the different types of research questions in a previous article in this Research Primer series that covered questions and hypotheses. 5 A relational question seeks information about the relationship among variables; in this situation, investigators will be interested in determining whether there is an association ( Figure 1 ). A causal question seeks information about the effect of an intervention on an outcome; in this situation, the investigator will be interested in determining whether there is a difference ( Figure 2 ).

Figure 1. Decision tree to identify inferential statistics for an association.

Figure 2. Decision tree to identify inferential statistics for measuring a difference.

What Is the Study Design?

When considering a question of association, investigators will be interested in measuring the relationship between variables ( Figure 1 ). A study designed to determine whether there is consensus among different raters will be measuring agreement. For example, an investigator may be interested in determining whether 2 raters, using the same assessment tool, arrive at the same score. Correlation analyses examine the strength of a relationship or connection between 2 variables, like age and blood glucose. Regression analyses also examine the strength of a relationship or connection; however, in this type of analysis, one variable is considered an outcome (or dependent variable) and the other variable is considered a predictor (or independent variable). Regression analyses often consider the influence of multiple predictors on an outcome at the same time. For example, an investigator may be interested in examining the association between a treatment and blood glucose, while also considering other factors, like age, sex, ethnicity, exercise frequency, and weight.
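As a rough sketch of this distinction (not taken from the article itself), the code below uses invented values to compute the correlation between age and blood glucose and then fits a simple multiple regression with age and weight as predictors, using ordinary least squares in NumPy:

```python
import numpy as np

# Hypothetical data: age (years), weight (kg), and blood glucose (mmol/L).
age = np.array([25, 34, 41, 52, 60, 68, 45, 37])
weight = np.array([62, 75, 80, 90, 78, 85, 70, 73])
glucose = np.array([4.9, 5.2, 5.6, 6.1, 6.4, 6.8, 5.7, 5.3])

# Correlation: strength of the relationship between two variables.
r = np.corrcoef(age, glucose)[0, 1]
print("correlation between age and glucose:", round(r, 2))

# Regression: glucose as the outcome, age and weight as predictors.
X = np.column_stack([np.ones(len(age)), age, weight])  # intercept column plus predictors
coef, *_ = np.linalg.lstsq(X, glucose, rcond=None)
print("intercept, age coefficient, weight coefficient:", np.round(coef, 3))
```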

When considering a question of difference, investigators must first determine how many groups they will be comparing. In some cases, investigators may be interested in comparing the characteristic of one group with that of an external reference group. For example, is the mean age of study participants similar to the mean age of all people in the target group? If more than one group is involved, then investigators must also determine whether there is an underlying connection between the sets of values (or samples ) to be compared. Samples are considered independent or unpaired when the information is taken from different groups. For example, we could use an unpaired t test to compare the mean age between 2 independent samples, such as the intervention and control groups in a study. Samples are considered related or paired if the information is taken from the same group of people, for example, measurement of blood glucose at the beginning and end of a study. Because blood glucose is measured in the same people at both time points, we could use a paired t test to determine whether there has been a significant change in blood glucose.
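A brief sketch of the two situations described above, using SciPy and invented numbers: an unpaired (independent-samples) t test compares two separate groups, while a paired t test compares the same people measured twice.

```python
from scipy import stats

# Unpaired: ages of participants in two independent groups.
intervention_age = [54, 61, 58, 49, 66, 59, 62]
control_age = [52, 57, 60, 55, 63, 58, 61]
t_unpaired, p_unpaired = stats.ttest_ind(intervention_age, control_age)

# Paired: blood glucose measured in the same people at baseline and study end.
glucose_start = [6.1, 5.8, 6.4, 7.0, 6.6, 6.9]
glucose_end = [5.7, 5.6, 6.0, 6.5, 6.4, 6.3]
t_paired, p_paired = stats.ttest_rel(glucose_start, glucose_end)

print(f"unpaired t test: t = {t_unpaired:.2f}, p = {p_unpaired:.3f}")
print(f"paired t test:   t = {t_paired:.2f}, p = {p_paired:.3f}")
```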

What Is the Level of Measurement?

As described in the first section of this article, variables can be grouped according to the level of measurement (nominal, ordinal, or interval). In most cases, the independent variable in an inferential statistic will be nominal; therefore, investigators need to know the level of measurement for the dependent variable before they can select the relevant inferential statistic. Two exceptions to this consideration are correlation analyses and regression analyses ( Figure 1 ). Because a correlation analysis measures the strength of association between 2 variables, we need to consider the level of measurement for both variables. Regression analyses can consider multiple independent variables, often with a variety of measurement levels. However, for these analyses, investigators still need to consider the level of measurement for the dependent variable.

Selection of inferential statistics to test interval-level variables must include consideration of how the data are distributed. An underlying assumption for parametric tests is that the data approximate a normal distribution. When the data are not normally distributed, information derived from a parametric test may be wrong. 6 When the assumption of normality is violated (for example, when the data are skewed), then investigators should use a nonparametric test. If the data are normally distributed, then investigators can use a parametric test.
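One common way to check this assumption is a normality test such as Shapiro–Wilk, sketched below with SciPy and invented data; a small p-value suggests the data are not normally distributed and a nonparametric test may be safer.

```python
from scipy import stats

# Hypothetical blood glucose values from one study group.
glucose = [5.1, 5.4, 4.9, 5.6, 5.2, 5.8, 5.0, 5.3, 5.5, 5.7]

statistic, p_value = stats.shapiro(glucose)
print(f"Shapiro-Wilk W = {statistic:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Data look non-normal: consider a nonparametric test.")
else:
    print("No evidence against normality: a parametric test is reasonable.")
```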

ADDITIONAL CONSIDERATIONS

What Is the Level of Significance?

An inferential statistic is used to calculate a p value, the probability of obtaining the observed data (or data more extreme) by chance alone. Investigators can then compare this p value against a prespecified level of significance, which is often chosen to be 0.05. This level of significance accepts a 1 in 20 chance of wrongly concluding that an effect exists, which is considered an acceptable level of error.

What Are the Most Commonly Used Statistics?

In 1983, Emerson and Colditz 7 reported the first review of statistics used in original research articles published in the New England Journal of Medicine . This review of statistics used in the journal was updated in 1989 and 2005, 8 and this type of analysis has been replicated in many other journals. 9 – 13 Collectively, these reviews have identified 2 important observations. First, the overall sophistication of statistical methodology used and reported in studies has grown over time, with survival analyses and multivariable regression analyses becoming much more common. The second observation is that, despite this trend, 1 in 4 articles describe no statistical methods or report only simple descriptive statistics. When inferential statistics are used, the most common are t tests, contingency table tests (for example, χ 2 test and Fisher exact test), and simple correlation and regression analyses. This information is important for educators, investigators, reviewers, and readers because it suggests that a good foundational knowledge of descriptive statistics and common inferential statistics will enable us to correctly evaluate the majority of research articles. 11 – 13 However, to fully take advantage of all research published in high-impact journals, we need to become acquainted with some of the more complex methods, such as multivariable regression analyses. 8 , 13

What Are Some Additional Resources?

As an investigator and Associate Editor with CJHP , I have often relied on the advice of colleagues to help create my own analysis plans and review the plans of others. Biostatisticians have a wealth of knowledge in the field of statistical analysis and can provide advice on the correct selection, application, and interpretation of these methods. Colleagues who have “been there and done that” with their own data analysis plans are also valuable sources of information. Identify these individuals and consult with them early and often as you develop your analysis plan.

Another important resource to consider when creating your analysis plan is textbooks. Numerous statistical textbooks are available, differing in levels of complexity and scope. The titles listed in the “Further Reading” section are just a few suggestions. I encourage interested readers to look through these and other books to find resources that best fit their needs. However, one crucial book that I highly recommend to anyone wanting to be an investigator or peer reviewer is Lang and Secic’s How to Report Statistics in Medicine (see “Further Reading”). As the title implies, this book covers a wide range of statistics used in medical research and provides numerous examples of how to correctly report the results.

CONCLUSIONS

When it comes to creating an analysis plan for your project, I recommend following the sage advice of Douglas Adams in The Hitchhiker’s Guide to the Galaxy : Don’t panic! 14 Begin with simple methods to summarize and visualize your data, then use the key questions and decision trees provided in this article to identify relevant statistical tests. Information in this article will give you and your co-investigators a place to start discussing the elements necessary for developing an analysis plan. But do not stop there! Use advice from biostatisticians and more experienced colleagues, as well as information in textbooks, to help create your analysis plan and choose the most appropriate statistics for your study. Making careful, informed decisions about the statistics to use in your study should reduce the risk of confirming Mr Twain’s concern.

Appendix 1. Glossary of statistical terms * (part 1 of 2)

  • 1-way ANOVA: Uses 1 variable to define the groups for comparing means. This is similar to the Student t test when comparing the means of 2 groups. (Several of the tests in this glossary are illustrated in the brief sketch following this list.)
  • Kruskal–Wallis 1-way ANOVA: Nonparametric alternative to the 1-way ANOVA. Used to determine the difference in medians between 3 or more groups.
  • n -way ANOVA: Uses 2 or more variables to define groups when comparing means. Also called a “between-subjects factorial ANOVA”.
  • Repeated-measures ANOVA: A method for analyzing whether the means of 3 or more measures from the same group of participants are different.
  • Friedman ANOVA: Nonparametric alternative to the repeated-measures ANOVA. It is often used to compare rankings and preferences that are measured 3 or more times.
  • Fisher exact: Variation of chi-square that accounts for cell counts < 5.
  • McNemar: Variation of chi-square that tests statistical significance of changes in 2 paired measurements of dichotomous variables.
  • Cochran Q: An extension of the McNemar test that provides a method for testing for differences between 3 or more matched sets of frequencies or proportions. Often used as a measure of heterogeneity in meta-analyses.
  • 1-sample t test: Used to determine whether the mean of a sample is significantly different from a known or hypothesized value.
  • Independent-samples t test (also referred to as the Student t test): Used when the independent variable is a nominal-level variable that identifies 2 groups and the dependent variable is an interval-level variable.
  • Paired t test: Used to compare 2 pairs of scores between 2 groups (e.g., baseline and follow-up blood pressure in the intervention and control groups).
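A minimal sketch (using invented blood-pressure values and assuming SciPy is available) of how several of the tests defined above are commonly run:

```python
# Minimal sketch with invented systolic blood pressure values; illustrative only.
from scipy import stats

group_a = [128, 134, 141, 125, 138, 130]   # three independent groups
group_b = [119, 124, 131, 117, 126, 122]
group_c = [123, 129, 135, 120, 131, 126]

before = [140, 135, 150, 142, 138, 145]    # paired measurements on the same participants
after  = [132, 130, 141, 136, 131, 139]

t_ind, p_ind = stats.ttest_ind(group_a, group_b)             # independent-samples (Student) t test
t_rel, p_rel = stats.ttest_rel(before, after)                # paired t test
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)  # 1-way ANOVA (3+ group means)
h_stat, p_kw = stats.kruskal(group_a, group_b, group_c)      # Kruskal-Wallis (nonparametric)

print(p_ind, p_rel, p_anova, p_kw)
```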

Lang TA, Secic M. How to report statistics in medicine: annotated guidelines for authors, editors, and reviewers. 2nd ed. Philadelphia (PA): American College of Physicians; 2006.

Norman GR, Streiner DL. PDQ statistics. 3rd ed. Hamilton (ON): B.C. Decker; 2003.

Plichta SB, Kelvin E. Munro’s statistical methods for health care research. 6th ed. Philadelphia (PA): Wolters Kluwer Health/Lippincott, Williams & Wilkins; 2013.

This article is the 12th in the CJHP Research Primer Series, an initiative of the CJHP Editorial Board and the CSHP Research Committee. The planned 2-year series is intended to appeal to relatively inexperienced researchers, with the goal of building research capacity among practising pharmacists. The articles, presenting simple but rigorous guidance to encourage and support novice researchers, are being solicited from authors with appropriate expertise.

Previous articles in this series:

  • Bond CM. The research jigsaw: how to get started. Can J Hosp Pharm . 2014;67(1):28–30.
  • Tully MP. Research: articulating questions, generating hypotheses, and choosing study designs. Can J Hosp Pharm . 2014;67(1):31–4.
  • Loewen P. Ethical issues in pharmacy practice research: an introductory guide. Can J Hosp Pharm. 2014;67(2):133–7.
  • Tsuyuki RT. Designing pharmacy practice research trials. Can J Hosp Pharm . 2014;67(3):226–9.
  • Bresee LC. An introduction to developing surveys for pharmacy practice research. Can J Hosp Pharm . 2014;67(4):286–91.
  • Gamble JM. An introduction to the fundamentals of cohort and case–control studies. Can J Hosp Pharm . 2014;67(5):366–72.
  • Austin Z, Sutton J. Qualitative research: getting started. Can J Hosp Pharm . 2014;67(6):436–40.
  • Houle S. An introduction to the fundamentals of randomized controlled trials in pharmacy research. Can J Hosp Pharm . 2014; 68(1):28–32.
  • Charrois TL. Systematic reviews: What do you need to know to get started? Can J Hosp Pharm . 2014;68(2):144–8.
  • Sutton J, Austin Z. Qualitative research: data collection, analysis, and management. Can J Hosp Pharm . 2014;68(3):226–31.
  • Cadarette SM, Wong L. An introduction to health care administrative data. Can J Hosp Pharm. 2014;68(3):232–7.

Competing interests: None declared.

Further Reading

  • Devor J, Peck R. Statistics: the exploration and analysis of data. 7th ed. Boston (MA): Brooks/Cole Cengage Learning; 2012.
  • Lang TA, Secic M. How to report statistics in medicine: annotated guidelines for authors, editors, and reviewers. 2nd ed. Philadelphia (PA): American College of Physicians; 2006.
  • Mendenhall W, Beaver RJ, Beaver BM. Introduction to probability and statistics. 13th ed. Belmont (CA): Brooks/Cole Cengage Learning; 2009.
  • Norman GR, Streiner DL. PDQ statistics. 3rd ed. Hamilton (ON): B.C. Decker; 2003.
  • Plichta SB, Kelvin E. Munro’s statistical methods for health care research. 6th ed. Philadelphia (PA): Wolters Kluwer Health/Lippincott, Williams & Wilkins; 2013.

Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of research data analysis: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction through summarization and categorization, which helps find patterns and themes in the data for easy identification and linking. The third is the analysis itself, which researchers carry out in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that data analysis and data interpretation represent the application of deductive and inductive logic to the research data.

Why analyze data in research?

Researchers rely heavily on data, as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? It is still possible to explore data without a problem; we call it data mining, which often reveals interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them in finding patterns and shaping the story they want to tell. One essential expectation of researchers analyzing data is to stay open and unbiased toward unexpected patterns, expressions, and results. Sometimes data analysis tells the most unforeseen yet exciting stories that were not expected when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Types of data in research

Every kind of data describes things once a specific value has been assigned to it. For analysis, you need to organize these values, processed and presented in a given context, to make them useful. Data can come in different forms; here are the primary data types.

  • Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data . Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews , qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data . This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: age, rank, cost, length, weight, scores, and similar measures all come under this type of data. You can present such data in graphical format or charts, or apply statistical analysis methods to it. The OMS (Outcomes Measurement Systems) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups; an item included in categorical data cannot belong to more than one group. Example: a person responding to a survey with their living style, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data (a brief sketch follows this list).
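A minimal sketch of the chi-square approach mentioned above, using an invented smoking-by-marital-status contingency table (SciPy assumed available):

```python
# Invented contingency table: smoking habit (rows) by marital status (columns).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[45, 30],    # smokers: single, married
                     [55, 70]])   # non-smokers: single, married

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

A small p-value would suggest the two categorical variables are associated rather than independent.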


Data analysis in qualitative research

Data analysis in qualitative research works a little differently from analysis of numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complex information is an involved process; hence, it is typically used for exploratory research and data analysis .

Finding patterns in the qualitative data

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: researchers usually read the available data and identify repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.
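A minimal sketch of this kind of word-frequency pass, using a few invented open-ended responses:

```python
# Count recurring words in invented open-ended responses; illustrative only.
from collections import Counter
import re

responses = [
    "Lack of food and clean water is the biggest problem",
    "Hunger affects children the most in our village",
    "Food prices keep rising and hunger is everywhere",
]

stopwords = {"of", "and", "is", "the", "in", "our", "most", "keep"}
words = []
for text in responses:
    words += [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stopwords]

# The most common words hint at recurring themes worth coding manually.
print(Counter(words).most_common(5))
```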


The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’

The scrutiny-based technique is another highly recommended text analysis method used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique, used to determine how specific pieces of text are similar to or different from each other.

For example: to find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.


Methods used for data analysis in qualitative research

There are several techniques to analyze the data in qualitative research, but here are some commonly used methods:

  • Content Analysis: This is the most widely accepted and frequently employed technique for data analysis in research methodology. It can be used to analyze documented information in text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys . Most of the time, the stories or opinions shared by people are focused on finding answers to the research questions.
  • Discourse Analysis:  Similar to narrative analysis, discourse analysis is used to analyze the interactions with people. Nevertheless, this particular method considers the social context under which or within which the communication between the researcher and respondent takes place. In addition to that, discourse analysis also focuses on the lifestyle and day-to-day environment while deriving any conclusion.
  • Grounded Theory:  When you want to explain why a particular phenomenon happened, then using grounded theory for analyzing quality data is the best resort. Grounded theory is applied to study data about the host of similar cases occurring in different settings. When researchers are using this method, they might alter explanations or produce new ones until they arrive at some conclusion.


Data analysis in quantitative research

Preparing data for analysis

The first stage in research and data analysis is to prepare the data for analysis so that nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered all the questions in an online survey or, in an interview, that the interviewer asked all the questions devised in the questionnaire

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They conduct the necessary completeness and outlier checks to edit the raw data and make it ready for analysis.
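A minimal sketch of such completeness and outlier checks, assuming a small invented survey extract and pandas:

```python
# Invented survey extract with missing values and one implausible age; illustrative only.
import pandas as pd

df = pd.DataFrame({
    "age": [34, 29, None, 51, 240],
    "satisfaction": [4, 5, 3, None, 2],   # 1-5 rating
})

print(df.isna().sum())   # completeness check: missing values per column

# Simple range check: flag ages outside a plausible interval.
print(df[(df["age"] < 18) | (df["age"] > 100)])

# One possible editing decision: drop incomplete or implausible rows.
clean = df.dropna()
clean = clean[(clean["age"] >= 18) & (clean["age"] <= 100)]
print(clean)
```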

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses . If a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish the respondents by age. It then becomes easier to analyze small data buckets rather than deal with the massive data pile.
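A minimal sketch of this kind of coding step, grouping invented respondent ages into brackets with pandas:

```python
# Code raw ages into brackets so responses can be analyzed in small buckets.
import pandas as pd

ages = pd.Series([19, 23, 31, 38, 44, 52, 61, 67], name="age")

brackets = pd.cut(
    ages,
    bins=[18, 25, 35, 50, 65, 100],
    labels=["18-25", "26-35", "36-50", "51-65", "65+"],
)

print(brackets.value_counts().sort_index())
```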


Methods used for data analysis in quantitative research

After the data is prepared for analysis, researchers can apply various research and data analysis methods to derive meaningful insights. Statistical analysis plans are the most favoured way to analyze numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. Statistical methods fall into two groups: descriptive statistics, used to describe the data, and inferential statistics, which help in comparing the data.

Descriptive statistics

This method is used to describe the basic features of the various types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not go beyond making conclusions; the conclusions are again based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to describe the central point of a distribution.
  • Researchers use this method when they want to showcase the most common or the average response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • The variance and standard deviation describe how far observed scores fall from the mean.
  • It is used to identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is and the degree to which that spread affects the interpretation of the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.

In quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are not sufficient to demonstrate the rationale behind them. Nevertheless, it is necessary to think about the method of research and data analysis that best suits your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate the students’ average scores in a school. It is better to rely on descriptive statistics when the researchers intend to keep the research or outcome limited to the provided sample without generalizing it: for example, when you want to compare the average voting rates in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.
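A minimal sketch of these descriptive measures on a small set of invented test scores:

```python
# Descriptive statistics for invented test scores; illustrative only.
import pandas as pd

scores = pd.Series([62, 75, 75, 81, 68, 90, 54, 77, 83, 70], name="score")

print(scores.describe())                      # count, mean, std, min, quartiles, max
print("median:", scores.median())
print("mode:", scores.mode().tolist())
print("range:", scores.max() - scores.min())
print("variance:", scores.var())
print("90th percentile:", scores.quantile(0.9))
```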

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you can ask some 100-odd audience members at a movie theater whether they like the movie they are watching. Researchers then use inferential statistics on the collected sample to infer that about 80-90% of people like the movie.
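A minimal sketch of that kind of inference, assuming 85 of the 100 sampled moviegoers said they liked the movie and using a normal-approximation confidence interval:

```python
# Estimate the population proportion from an invented sample of 100 moviegoers.
import math

n, liked = 100, 85
p_hat = liked / n

# 95% confidence interval using the normal approximation.
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"estimated proportion: {p_hat:.2f} (95% CI {low:.2f} to {high:.2f})")
```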

Here are two significant areas of inferential statistics.

  • Estimating parameters: This takes statistics from the sample research data and uses them to say something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested to understand whether a newly launched shade of lipstick is liked or not, or whether multivitamin capsules help children perform better at games.

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns; a two-dimensional cross-tabulation supports seamless data analysis and research by showing the number of males and females in each age category.
  • Regression analysis: Regression analysis is the primary and most commonly used method for understanding the strength of the relationship between two variables, and it is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable and one or more independent variables, and you estimate the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be measured in an error-free, random manner. (A brief correlation and regression sketch follows this list.)
  • Frequency tables: Frequency tables display how often each value or category occurs in the data; they are often the starting point for further tests on categorical variables, such as the chi-square test.
  • Analysis of variance (ANOVA): This statistical procedure is used to test the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings are significant. In many contexts, ANOVA testing and variance analysis are similar.
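A minimal correlation and simple-regression sketch (referred to in the list above), using invented weekly ad-spend and sales figures with SciPy:

```python
# Correlation and simple linear regression on invented ad-spend vs. sales data.
from scipy import stats

ad_spend = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]   # independent variable
sales    = [12,  14,  15,  18,  21,  22,  25]    # dependent variable

r, p_corr = stats.pearsonr(ad_spend, sales)      # strength of the linear association
fit = stats.linregress(ad_spend, sales)          # slope, intercept, r, p, standard error

print(f"Pearson r = {r:.2f} (p = {p_corr:.3f})")
print(f"sales = {fit.intercept:.1f} + {fit.slope:.1f} * ad_spend")
```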
Considerations in research data analysis

  • Researchers must have the necessary skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Research and data analytics projects usually differ by scientific discipline; therefore, getting statistical advice at the beginning of the analysis helps in designing the survey questionnaire, selecting data collection methods , and choosing samples.


  • The primary aim of research data analysis is to derive insights that are unbiased. Any mistake in, or bias while, collecting data, selecting an analysis method, or choosing an audience sample will lead to a biased inference.
  • No degree of sophistication in research data analysis can rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are not clear, a lack of clarity can mislead readers, so avoid this practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges such as outliers, missing data, data alteration, data mining , and developing graphical representations.

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in the hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them a medium to collect data by creating appealing surveys.


Data Analysis – Process, Methods and Types

Definition:

Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets. The ultimate aim of data analysis is to convert raw data into actionable insights that can inform business decisions, scientific research, and other endeavors.

Data Analysis Process

The following is a step-by-step guide to the data analysis process:

Define the Problem

The first step in data analysis is to clearly define the problem or question that needs to be answered. This involves identifying the purpose of the analysis, the data required, and the intended outcome.

Collect the Data

The next step is to collect the relevant data from various sources. This may involve collecting data from surveys, databases, or other sources. It is important to ensure that the data collected is accurate, complete, and relevant to the problem being analyzed.

Clean and Organize the Data

Once the data has been collected, it needs to be cleaned and organized. This involves removing any errors or inconsistencies in the data, filling in missing values, and ensuring that the data is in a format that can be easily analyzed.

Analyze the Data

The next step is to analyze the data using various statistical and analytical techniques. This may involve identifying patterns in the data, conducting statistical tests, or using machine learning algorithms to identify trends and insights.

Interpret the Results

After analyzing the data, the next step is to interpret the results. This involves drawing conclusions based on the analysis and identifying any significant findings or trends.

Communicate the Findings

Once the results have been interpreted, they need to be communicated to stakeholders. This may involve creating reports, visualizations, or presentations to effectively communicate the findings and recommendations.

Take Action

The final step in the data analysis process is to take action based on the findings. This may involve implementing new policies or procedures, making strategic decisions, or taking other actions based on the insights gained from the analysis.

Types of Data Analysis

Types of Data Analysis are as follows:

Descriptive Analysis

This type of analysis involves summarizing and describing the main characteristics of a dataset, such as the mean, median, mode, standard deviation, and range.

Inferential Analysis

This type of analysis involves making inferences about a population based on a sample. Inferential analysis can help determine whether a certain relationship or pattern observed in a sample is likely to be present in the entire population.

Diagnostic Analysis

This type of analysis involves identifying and diagnosing problems or issues within a dataset. Diagnostic analysis can help identify outliers, errors, missing data, or other anomalies in the dataset.

Predictive Analysis

This type of analysis involves using statistical models and algorithms to predict future outcomes or trends based on historical data. Predictive analysis can help businesses and organizations make informed decisions about the future.

Prescriptive Analysis

This type of analysis involves recommending a course of action based on the results of previous analyses. Prescriptive analysis can help organizations make data-driven decisions about how to optimize their operations, products, or services.

Exploratory Analysis

This type of analysis involves exploring the relationships and patterns within a dataset to identify new insights and trends. Exploratory analysis is often used in the early stages of research or data analysis to generate hypotheses and identify areas for further investigation.

Data Analysis Methods

Data Analysis Methods are as follows:

Statistical Analysis

This method involves the use of mathematical models and statistical tools to analyze and interpret data. It includes measures of central tendency, correlation analysis, regression analysis, hypothesis testing, and more.

Machine Learning

This method involves the use of algorithms to identify patterns and relationships in data. It includes supervised and unsupervised learning, classification, clustering, and predictive modeling.
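As a minimal sketch of unsupervised learning, here is k-means clustering with scikit-learn on invented customer data (annual spend and number of visits):

```python
# Cluster invented customers into two groups by spend and visit frequency.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([
    [200, 4], [220, 5], [250, 6],       # low-spend, infrequent visitors
    [900, 20], [950, 22], [1000, 25],   # high-spend, frequent visitors
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)           # cluster assignment for each customer
print(model.cluster_centers_)  # average profile of each cluster
```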

Data Mining

This method involves using statistical and machine learning techniques to extract information and insights from large and complex datasets.

Text Analysis

This method involves using natural language processing (NLP) techniques to analyze and interpret text data. It includes sentiment analysis, topic modeling, and entity recognition.

Network Analysis

This method involves analyzing the relationships and connections between entities in a network, such as social networks or computer networks. It includes social network analysis and graph theory.

Time Series Analysis

This method involves analyzing data collected over time to identify patterns and trends. It includes forecasting, decomposition, and smoothing techniques.
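A minimal sketch of one such technique, a rolling-mean smoother over invented monthly sales in pandas:

```python
# Smooth invented monthly sales with a 3-month moving average to expose the trend.
import pandas as pd

sales = pd.Series(
    [102, 98, 110, 120, 115, 130, 128, 140, 138, 150, 149, 160],
    index=pd.date_range("2023-01-01", periods=12, freq="MS"),
)

trend = sales.rolling(window=3).mean()
print(pd.DataFrame({"sales": sales, "trend": trend}))
```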

Spatial Analysis

This method involves analyzing geographic data to identify spatial patterns and relationships. It includes spatial statistics, spatial regression, and geospatial data visualization.

Data Visualization

This method involves using graphs, charts, and other visual representations to help communicate the findings of the analysis. It includes scatter plots, bar charts, heat maps, and interactive dashboards.
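A minimal sketch of two of these chart types with matplotlib, using invented figures:

```python
# Bar chart for a categorical comparison and scatter plot for two numeric variables.
import matplotlib.pyplot as plt

regions = ["North", "South", "East", "West"]
revenue = [120, 95, 140, 110]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.bar(regions, revenue)
ax1.set_title("Revenue by region (invented data)")

ad_spend = [1, 2, 3, 4, 5, 6]
sales = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1]
ax2.scatter(ad_spend, sales)
ax2.set_title("Ad spend vs. sales (invented data)")

plt.tight_layout()
plt.show()
```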

Qualitative Analysis

This method involves analyzing non-numeric data such as interviews, observations, and open-ended survey responses. It includes thematic analysis, content analysis, and grounded theory.

Multi-criteria Decision Analysis

This method involves analyzing multiple criteria and objectives to support decision-making. It includes techniques such as the analytical hierarchy process, TOPSIS, and ELECTRE.

Data Analysis Tools

There are various data analysis tools available that can help with different aspects of data analysis. Below is a list of some commonly used data analysis tools:

  • Microsoft Excel: A widely used spreadsheet program that allows for data organization, analysis, and visualization.
  • SQL : A programming language used to manage and manipulate relational databases.
  • R : An open-source programming language and software environment for statistical computing and graphics.
  • Python : A general-purpose programming language that is widely used in data analysis and machine learning.
  • Tableau : A data visualization software that allows for interactive and dynamic visualizations of data.
  • SAS : A statistical analysis software used for data management, analysis, and reporting.
  • SPSS : A statistical analysis software used for data analysis, reporting, and modeling.
  • Matlab : A numerical computing software that is widely used in scientific research and engineering.
  • RapidMiner : A data science platform that offers a wide range of data analysis and machine learning tools.

Applications of Data Analysis

Data analysis has numerous applications across various fields. Below are some examples of how data analysis is used in different fields:

  • Business : Data analysis is used to gain insights into customer behavior, market trends, and financial performance. This includes customer segmentation, sales forecasting, and market research.
  • Healthcare : Data analysis is used to identify patterns and trends in patient data, improve patient outcomes, and optimize healthcare operations. This includes clinical decision support, disease surveillance, and healthcare cost analysis.
  • Education : Data analysis is used to measure student performance, evaluate teaching effectiveness, and improve educational programs. This includes assessment analytics, learning analytics, and program evaluation.
  • Finance : Data analysis is used to monitor and evaluate financial performance, identify risks, and make investment decisions. This includes risk management, portfolio optimization, and fraud detection.
  • Government : Data analysis is used to inform policy-making, improve public services, and enhance public safety. This includes crime analysis, disaster response planning, and social welfare program evaluation.
  • Sports : Data analysis is used to gain insights into athlete performance, improve team strategy, and enhance fan engagement. This includes player evaluation, scouting analysis, and game strategy optimization.
  • Marketing : Data analysis is used to measure the effectiveness of marketing campaigns, understand customer behavior, and develop targeted marketing strategies. This includes customer segmentation, marketing attribution analysis, and social media analytics.
  • Environmental science : Data analysis is used to monitor and evaluate environmental conditions, assess the impact of human activities on the environment, and develop environmental policies. This includes climate modeling, ecological forecasting, and pollution monitoring.

When to Use Data Analysis

Data analysis is useful when you need to extract meaningful insights and information from large and complex datasets. It is a crucial step in the decision-making process, as it helps you understand the underlying patterns and relationships within the data, and identify potential areas for improvement or opportunities for growth.

Here are some specific scenarios where data analysis can be particularly helpful:

  • Problem-solving : When you encounter a problem or challenge, data analysis can help you identify the root cause and develop effective solutions.
  • Optimization : Data analysis can help you optimize processes, products, or services to increase efficiency, reduce costs, and improve overall performance.
  • Prediction: Data analysis can help you make predictions about future trends or outcomes, which can inform strategic planning and decision-making.
  • Performance evaluation : Data analysis can help you evaluate the performance of a process, product, or service to identify areas for improvement and potential opportunities for growth.
  • Risk assessment : Data analysis can help you assess and mitigate risks, whether it is financial, operational, or related to safety.
  • Market research : Data analysis can help you understand customer behavior and preferences, identify market trends, and develop effective marketing strategies.
  • Quality control: Data analysis can help you ensure product quality and customer satisfaction by identifying and addressing quality issues.

Purpose of Data Analysis

The primary purposes of data analysis can be summarized as follows:

  • To gain insights: Data analysis allows you to identify patterns and trends in data, which can provide valuable insights into the underlying factors that influence a particular phenomenon or process.
  • To inform decision-making: Data analysis can help you make informed decisions based on the information that is available. By analyzing data, you can identify potential risks, opportunities, and solutions to problems.
  • To improve performance: Data analysis can help you optimize processes, products, or services by identifying areas for improvement and potential opportunities for growth.
  • To measure progress: Data analysis can help you measure progress towards a specific goal or objective, allowing you to track performance over time and adjust your strategies accordingly.
  • To identify new opportunities: Data analysis can help you identify new opportunities for growth and innovation by identifying patterns and trends that may not have been visible before.

Examples of Data Analysis

Some Examples of Data Analysis are as follows:

  • Social Media Monitoring: Companies use data analysis to monitor social media activity in real-time to understand their brand reputation, identify potential customer issues, and track competitors. By analyzing social media data, businesses can make informed decisions on product development, marketing strategies, and customer service.
  • Financial Trading: Financial traders use data analysis to make real-time decisions about buying and selling stocks, bonds, and other financial instruments. By analyzing real-time market data, traders can identify trends and patterns that help them make informed investment decisions.
  • Traffic Monitoring : Cities use data analysis to monitor traffic patterns and make real-time decisions about traffic management. By analyzing data from traffic cameras, sensors, and other sources, cities can identify congestion hotspots and make changes to improve traffic flow.
  • Healthcare Monitoring: Healthcare providers use data analysis to monitor patient health in real-time. By analyzing data from wearable devices, electronic health records, and other sources, healthcare providers can identify potential health issues and provide timely interventions.
  • Online Advertising: Online advertisers use data analysis to make real-time decisions about advertising campaigns. By analyzing data on user behavior and ad performance, advertisers can make adjustments to their campaigns to improve their effectiveness.
  • Sports Analysis : Sports teams use data analysis to make real-time decisions about strategy and player performance. By analyzing data on player movement, ball position, and other variables, coaches can make informed decisions about substitutions, game strategy, and training regimens.
  • Energy Management : Energy companies use data analysis to monitor energy consumption in real-time. By analyzing data on energy usage patterns, companies can identify opportunities to reduce energy consumption and improve efficiency.

Characteristics of Data Analysis

Characteristics of Data Analysis are as follows:

  • Objective : Data analysis should be objective and based on empirical evidence, rather than subjective assumptions or opinions.
  • Systematic : Data analysis should follow a systematic approach, using established methods and procedures for collecting, cleaning, and analyzing data.
  • Accurate : Data analysis should produce accurate results, free from errors and bias. Data should be validated and verified to ensure its quality.
  • Relevant : Data analysis should be relevant to the research question or problem being addressed. It should focus on the data that is most useful for answering the research question or solving the problem.
  • Comprehensive : Data analysis should be comprehensive and consider all relevant factors that may affect the research question or problem.
  • Timely : Data analysis should be conducted in a timely manner, so that the results are available when they are needed.
  • Reproducible : Data analysis should be reproducible, meaning that other researchers should be able to replicate the analysis using the same data and methods.
  • Communicable : Data analysis should be communicated clearly and effectively to stakeholders and other interested parties. The results should be presented in a way that is understandable and useful for decision-making.

Advantages of Data Analysis

Advantages of Data Analysis are as follows:

  • Better decision-making: Data analysis helps in making informed decisions based on facts and evidence, rather than intuition or guesswork.
  • Improved efficiency: Data analysis can identify inefficiencies and bottlenecks in business processes, allowing organizations to optimize their operations and reduce costs.
  • Increased accuracy: Data analysis helps to reduce errors and bias, providing more accurate and reliable information.
  • Better customer service: Data analysis can help organizations understand their customers better, allowing them to provide better customer service and improve customer satisfaction.
  • Competitive advantage: Data analysis can provide organizations with insights into their competitors, allowing them to identify areas where they can gain a competitive advantage.
  • Identification of trends and patterns : Data analysis can identify trends and patterns in data that may not be immediately apparent, helping organizations to make predictions and plan for the future.
  • Improved risk management : Data analysis can help organizations identify potential risks and take proactive steps to mitigate them.
  • Innovation: Data analysis can inspire innovation and new ideas by revealing new opportunities or previously unknown correlations in data.

Limitations of Data Analysis

  • Data quality: The quality of data can impact the accuracy and reliability of analysis results. If data is incomplete, inconsistent, or outdated, the analysis may not provide meaningful insights.
  • Limited scope: Data analysis is limited by the scope of the data available. If data is incomplete or does not capture all relevant factors, the analysis may not provide a complete picture.
  • Human error : Data analysis is often conducted by humans, and errors can occur in data collection, cleaning, and analysis.
  • Cost : Data analysis can be expensive, requiring specialized tools, software, and expertise.
  • Time-consuming : Data analysis can be time-consuming, especially when working with large datasets or conducting complex analyses.
  • Overreliance on data: Data analysis should be complemented with human intuition and expertise. Overreliance on data can lead to a lack of creativity and innovation.
  • Privacy concerns: Data analysis can raise privacy concerns if personal or sensitive information is used without proper consent or security measures.


09 May 2024

Cubic millimetre of brain mapped in spectacular detail

Carissa Wong


Rendering based on electron-microscope data, showing the positions of neurons in a fragment of the brain cortex. Neurons are coloured according to size. Credit: Google Research & Lichtman Lab (Harvard University). Renderings by D. Berger (Harvard University)


Researchers have mapped a tiny piece of the human brain in astonishing detail. The resulting cell atlas, which was described today in Science 1 and is available online , reveals new patterns of connections between brain cells called neurons, as well as cells that wrap around themselves to form knots, and pairs of neurons that are almost mirror images of each other.

The 3D map covers a volume of about one cubic millimetre, one-millionth of a whole brain, and contains roughly 57,000 cells and 150 million synapses — the connections between neurons. It incorporates a colossal 1.4 petabytes of data. “It’s a little bit humbling,” says Viren Jain, a neuroscientist at Google in Mountain View, California, and a co-author of the paper. “How are we ever going to really come to terms with all this complexity?”

Slivers of brain

The brain fragment was taken from a 45-year-old woman when she underwent surgery to treat her epilepsy. It came from the cortex, a part of the brain involved in learning, problem-solving and processing sensory signals. The sample was immersed in preservatives and stained with heavy metals to make the cells easier to see. Neuroscientist Jeff Lichtman at Harvard University in Cambridge, Massachusetts, and his colleagues then cut the sample into around 5,000 slices — each just 34 nanometres thick — that could be imaged using electron microscopes.

Jain’s team then built artificial-intelligence models that were able to stitch the microscope images together to reconstruct the whole sample in 3D. “I remember this moment, going into the map and looking at one individual synapse from this woman’s brain, and then zooming out into these other millions of pixels,” says Jain. “It felt sort of spiritual.”


A single neuron (white) shown with 5,600 of the axons (blue) that connect to it. The synapses that make these connections are shown in green. Credit: Google Research & Lichtman Lab (Harvard University). Renderings by D. Berger (Harvard University)

When examining the model in detail, the researchers discovered unconventional neurons, including some that made up to 50 connections with each other. “In general, you would find a couple of connections at most between two neurons,” says Jain. Elsewhere, the model showed neurons with tendrils that formed knots around themselves. “Nobody had seen anything like this before,” Jain adds.

The team also found pairs of neurons that were near-perfect mirror images of each other. “We found two groups that would send their dendrites in two different directions, and sometimes there was a kind of mirror symmetry,” Jain says. It is unclear what role these features have in the brain.

Proofreaders needed

The map is so large that most of it has yet to be manually checked, and it could still contain errors created by the process of stitching so many images together. “Hundreds of cells have been ‘proofread’, but that’s obviously a few per cent of the 50,000 cells in there,” says Jain. He hopes that others will help to proofread parts of the map they are interested in. The team plans to produce similar maps of brain samples from other people — but a map of the entire brain is unlikely in the next few decades, he says.

“This paper is really the tour de force creation of a human cortex data set,” says Hongkui Zeng, director of the Allen Institute for Brain Science in Seattle. The vast amount of data that has been made freely accessible will “allow the community to look deeper into the micro-circuitry in the human cortex”, she adds.

Gaining a deeper understanding of how the cortex works could offer clues about how to treat some psychiatric and neurodegenerative diseases. “This map provides unprecedented details that can unveil new rules of neural connections and help to decipher the inner working of the human brain,” says Yongsoo Kim, a neuroscientist at Pennsylvania State University in Hershey.

Nature 629 , 739-740 (2024)

doi: https://doi.org/10.1038/d41586-024-01387-9

1. Shapson-Coe, A. et al. Science 384, eadk4858 (2024).


Microsoft Research AI for Science


AI for Science to empower the fifth paradigm of scientific discovery


“Over the coming decade, deep learning looks set to have a transformational impact on the natural sciences. The consequences are potentially far-reaching and could dramatically improve our ability to model and predict natural phenomena over widely varying scales of space and time. Our AI4Science team encompasses world experts in machine learning, quantum physics, computational chemistry, molecular biology, fluid dynamics, software engineering, and other disciplines, who are working together to tackle some of the most pressing challenges in this field.” Professor Chris Bishop, Technical Fellow and Director, AI for Science


70 Years After Brown v. Board of Education, New Research Shows Rise in School Segregation


As the nation prepares to mark the 70th anniversary of the landmark U.S. Supreme Court ruling in Brown v. Board of Education , a new report from researchers at Stanford and USC shows that racial and economic segregation among schools has grown steadily in large school districts over the past three decades — an increase that appears to be driven in part by policies favoring school choice over integration.

Analyzing data from U.S. public schools going back to 1967, the researchers found that segregation between white and Black students has increased by 64 percent since 1988 in the 100 largest districts, and segregation by economic status has increased by about 50 percent since 1991.

The report also provides new evidence about the forces driving recent trends in school segregation, showing that the expansion of charter schools has played a major role.  

The findings were released on May 6 with the launch of the Segregation Explorer , a new interactive website from the Educational Opportunity Project at Stanford University. The website provides searchable data on racial and economic school segregation in U.S. states, counties, metropolitan areas, and school districts from 1991 to 2022. 

“School segregation levels are not at pre- Brown levels, but they are high and have been rising steadily since the late 1980s,” said Sean Reardon , the Professor of Poverty and Inequality in Education at Stanford Graduate School of Education and faculty director of the Educational Opportunity Project. “In most large districts, school segregation has increased while residential segregation and racial economic inequality have declined, and our findings indicate that policy choices – not demographic changes – are driving the increase.” 

“There’s a tendency to attribute segregation in schools to segregation in neighborhoods,” said Ann Owens , a professor of sociology and public policy at USC. “But we’re finding that the story is more complicated than that.”

Assessing the rise

In the Brown v. Board decision issued on May 17, 1954, the U.S. Supreme Court ruled that racially segregated public schools violated the Equal Protection Clause of the Fourteenth Amendment and established that “separate but equal” schools were not only inherently unequal but unconstitutional. The ruling paved the way for future decisions that led to rapid school desegregation in many school districts in the late 1960s and early 1970s.

Though segregation in most school districts is much lower than it was 60 years ago, the researchers found that over the past three decades, both racial and economic segregation in large districts increased. Much of the increase in economic segregation since 1991, measured by segregation between students eligible and ineligible for free lunch, occurred in the last 15 years.

White-Hispanic and white-Asian segregation, while lower on average than white-Black segregation, have both more than doubled in large school districts since the 1980s. 

Racial-economic segregation – specifically the difference in the proportion of free-lunch-eligible students between the average white and Black or Hispanic student’s schools – has increased by 70 percent since 1991. 

School segregation is strongly associated with achievement gaps between racial and ethnic groups, especially the rate at which achievement gaps widen during school, the researchers said.  

“Segregation appears to shape educational outcomes because it concentrates Black and Hispanic students in higher-poverty schools, which results in unequal learning opportunities,” said Reardon, who is also a senior fellow at the Stanford Institute for Economic Policy Research and a faculty affiliate of the Stanford Accelerator for Learning . 

Policies shaping recent trends 

The recent rise in school segregation appears to be the direct result of educational policy and legal decisions, the researchers said. 

Both residential segregation and racial disparities in income declined between 1990 and 2020 in most large school districts. “Had nothing else changed, that trend would have led to lower school segregation,” said Owens. 

But since 1991, roughly two-thirds of districts that were under court-ordered desegregation have been released from court oversight. Meanwhile, since 1998, the charter sector – a form of expanded school choice – has grown.

Expanding school choice could influence segregation levels in different ways: If families sought schools that were more diverse than the ones available in their neighborhood, it could reduce segregation. But the researchers found that in districts where the charter sector expanded most rapidly in the 2000s and 2010s, segregation grew the most. 

The researchers’ analysis also quantified the extent to which the release from court orders accounted for the rise in school segregation. They found that, together, the release from court oversight and the expansion of choice accounted entirely for the rise in school segregation from 2000 to 2019.

The researchers noted enrollment policies that school districts can implement to mitigate segregation, such as voluntary integration programs, socioeconomic-based student assignment policies, and school choice policies that affirmatively promote integration. 

“School segregation levels are high, troubling, and rising in large districts,” said Reardon. “These findings should sound an alarm for educators and policymakers.”

Additional collaborators on the project include Demetra Kalogrides, Thalia Tom, and Heewon Jang. This research, including the development of the Segregation Explorer data and website, was supported by the Russell Sage Foundation, the Robert Wood Johnson Foundation, and the Bill and Melinda Gates Foundation.   
