menkes-tom/Advanced_DL_Group10_2023

Data Exploration and Visualization - Group 10 - 2023

In the first part of the project, the following actions were performed for data exploration and visualization:

  1. Data Loading: The dataset, consisting of 1.6 million headlines from The Irish Times news site, was loaded from Kaggle.

  2. Visualization:

    • Word Clouds: Word clouds were generated to visualize the most frequent words in the headlines, providing an overview of the key themes and topics. [Figure: Most Common Entities per Category]

    • Bar Plots: Bar plots were used to display the distribution of headlines across categories, showing the class imbalance and the prevalence of each category. We also visualized the average sentiment and the sentiment polarity for each category. [Figures: Number of Headlines by Category per Year; Average Sentiment per Category; Sentiment Polarity]

    • Histograms: Histograms were created to analyze the distribution of headline lengths and to identify any patterns in it. [Figure: Top Words by Category]

    • Count Plots: Count plots were employed to visualize the distribution of headlines based on specific criteria, such as the source or publication date, allowing for a deeper exploration of the data.

    • Pie Charts: Pie charts were used to present the proportion of headlines in each category, providing a visual representation of the class distribution. [Figure: Percentage of Headlines by Category]

  3. Statistical Analysis:

    • Descriptive Statistics: Descriptive statistics were calculated to summarize headline lengths, category frequencies, and other relevant metrics.
    • Correlation Analysis: Correlation analysis was performed to identify relationships between headline features.
  4. Data Preprocessing: Preprocessing steps were undertaken to clean and prepare the data for subsequent modeling, including text normalization, handling missing values, and encoding categorical variables.
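
The preprocessing step in item 4 could look roughly like the sketch below. It is an illustration only, not the project's actual code: the function names `normalize` and `preprocess`, the `(category, headline)` record shape, and the specific normalization rules (lowercasing, stripping punctuation, integer category ids) are all assumptions.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace.

    The pattern [^\\w\\s] keeps Unicode letters and digits, so accented
    characters in headlines survive. This rule set is an assumption.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(records):
    """Clean hypothetical (category, headline) pairs.

    - Missing values: pairs whose headline is None or blank are dropped.
    - Text normalization: see normalize().
    - Categorical encoding: each category is mapped to an integer id,
      assigned in alphabetical order.
    Returns (encoded_pairs, category_to_id).
    """
    cleaned = []
    for category, headline in records:
        if not headline or not headline.strip():
            continue  # handle missing or empty headlines by dropping them
        text = normalize(headline)
        if text:  # skip headlines that were punctuation only
            cleaned.append((category, text))

    categories = sorted({category for category, _ in cleaned})
    category_to_id = {category: i for i, category in enumerate(categories)}
    encoded = [(category_to_id[category], text) for category, text in cleaned]
    return encoded, category_to_id
```

Dropping rows with missing headlines is only one option; depending on the downstream model, imputing a placeholder string might be preferable.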

These exploratory analyses gave us a clear picture of the dataset's characteristics and class distribution, and they set the foundation for model development and optimization in the second part of the project.
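
As a rough illustration of the descriptive statistics and correlation analysis listed in item 3, the sketch below summarizes headline lengths and category frequencies and computes a Pearson correlation between two length features. The sample rows are invented, and the column names (`publish_date`, `headline_category`, `headline_text`) and helper names are assumptions rather than details taken from the project.

```python
import csv
import io
import math
import statistics
from collections import Counter

# Invented sample rows; the column names are an assumption.
SAMPLE_CSV = """publish_date,headline_category,headline_text
19960102,news,Hospital waiting lists grow again
19960102,sport,Ireland win in Paris
19960103,news,Budget talks continue
19960104,business,Bank profits rise sharply this year
"""


def load_rows(text):
    """Parse CSV text into dicts, skipping rows with an empty headline."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if row["headline_text"].strip()]


def summarize(rows):
    """Return headline-length statistics (in words) and category counts."""
    lengths = [len(row["headline_text"].split()) for row in rows]
    return {
        "n": len(rows),
        "mean_words": statistics.mean(lengths),
        "median_words": statistics.median(lengths),
        "category_counts": Counter(row["headline_category"] for row in rows),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rows = load_rows(SAMPLE_CSV)
summary = summarize(rows)
word_counts = [len(r["headline_text"].split()) for r in rows]
char_counts = [len(r["headline_text"]) for r in rows]
r_value = pearson(word_counts, char_counts)
```

On the full dataset the same summaries would more likely be computed with a dataframe library; the standard-library version here is only meant to make the steps explicit.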

About

Part A of the final course assignment
