1-user_knowledge.csv
: This dataset has 403 rows and 6 columns. This dataset
relates to learning efforts and knowledge levels of participants in a
corporate training course.
# | Name | Refined Definition | Data Type |
---|---|---|---|
1 | STG | Study Time on Goal Material – How much time a participant spends studying the test material. | Quantitative |
2 | SCG | Study Count on Goal Material – How often a participant reviews or repeats the test material. | Quantitative |
3 | STR | Study Time on Related Material – How much time a participant spends learning content related to, but not part of, the test material. | Quantitative |
4 | LPR | Performance on Related Practice – How well a participant performs on quizzes related to the subject but not on the main goal content. | Quantitative |
5 | PEG | Performance on Exam Goals – How well a participant scores on questions that assess the main learning objectives. | Quantitative |
6 | UNS | User Knowledge Level – A label that reflects the participant's overall understanding. | Qualitative |
2-car_acceptance.csv
: The data set has 1728 rows and 7 columns in which car
attributes such as price and technology are described across 6 attributes such
as Buying Price, Maintenance, and Safety etc.
# | Name | Definition | Data Type |
---|---|---|---|
1 | buying | Buying price of the car (thousand USD) | Quantitative |
2 | maint | Price of the maintenance of car (thousand USD) | Quantitative |
3 | doors | Number of doors (2, 3, 4, 5-more) | Qualitative |
4 | persons | Capacity in terms of persons to carry (2, 4, more) | Qualitative |
5 | lug_boot | Indicator whether the car has a large luggage boot (TRUE, FALSE) | Qualitative |
6 | safety | Estimated safety of the car (low, med, high) | Qualitative |
7 | class | Car acceptability (0: unacceptable, 10: very good) | Quantitative |
3-online_news_popularity.csv
: This dataset has 39644 rows and 61 columns. This
dataset summarizes a heterogeneous set of features about articles published by
Mashable in a period of two years.
# | Name | Definition | Data Type |
---|---|---|---|
1 | URL | URL of the Article | Qualitative |
2 | Timedelta | Days Between the Article Publication and the Dataset Acquisition | Quantitative |
3 | N_Tokens_Title | Number of Words in the Title | Quantitative |
4 | N_Tokens_Content | Number of Words in the Content | Quantitative |
5 | N_Unique_Tokens | Rate of Unique Words in the Content | Quantitative |
6 | N_Non_Stop_Words | Rate of Non-Stop Words in the Content | Quantitative |
7 | N_Non_Stop_Unique_Tokens | Rate of Unique Non-Stop Words in the Content | Quantitative |
8 | Num_Hrefs | Number of Links | Quantitative |
9 | Num_Self_Hrefs | Number of Links to Other Articles Published by Mashable | Quantitative |
10 | Num_Imgs | Number of Images | Quantitative |
11 | Num_Videos | Number of Videos | Quantitative |
12 | Average_Token_Length | Average Length of the Words in the Content | Quantitative |
13 | Num_Keywords | Number of Keywords in the Metadata | Quantitative |
14 | Data_Channel_Is_Lifestyle | Is Data Channel 'Lifestyle'? | Quantitative |
15 | Data_Channel_Is_Entertainment | Is Data Channel 'Entertainment'? | Quantitative |
16 | Data_Channel_Is_Bus | Is Data Channel 'Business'? | Quantitative |
17 | Data_Channel_Is_Socmed | Is Data Channel 'Social Media'? | Quantitative |
18 | Data_Channel_Is_Tech | Is Data Channel 'Tech'? | Quantitative |
19 | Data_Channel_Is_World | Is Data Channel 'World'? | Quantitative |
20 | Kw_Min_Min | Worst Keyword (Min. Shares) | Quantitative |
21 | Kw_Max_Min | Worst Keyword (Max. Shares) | Quantitative |
22 | Kw_Avg_Min | Worst Keyword (Avg. Shares) | Quantitative |
23 | Kw_Min_Max | Best Keyword (Min. Shares) | Quantitative |
24 | Kw_Max_Max | Best Keyword (Max. Shares) | Quantitative |
25 | Kw_Avg_Max | Best Keyword (Avg. Shares) | Quantitative |
26 | Kw_Min_Avg | Avg. Keyword (Min. Shares) | Quantitative |
27 | Kw_Max_Avg | Avg. Keyword (Max. Shares) | Quantitative |
28 | Kw_Avg_Avg | Avg. Keyword (Avg. Shares) | Quantitative |
29 | Self_Reference_Min_Shares | Min. Shares of Referenced Articles in Mashable | Quantitative |
30 | Self_Reference_Max_Shares | Max. Shares of Referenced Articles in Mashable | Quantitative |
31 | Self_Reference_Avg_Sharess | Avg. Shares of Referenced Articles in Mashable | Quantitative |
32 | Weekday_Is_Monday | Was the Article Published on a Monday? | Quantitative |
33 | Weekday_Is_Tuesday | Was the Article Published on a Tuesday? | Quantitative |
34 | Weekday_Is_Wednesday | Was the Article Published on a Wednesday? | Quantitative |
35 | Weekday_Is_Thursday | Was the Article Published on a Thursday? | Quantitative |
36 | Weekday_Is_Friday | Was the Article Published on a Friday? | Quantitative |
37 | Weekday_Is_Saturday | Was the Article Published on a Saturday? | Quantitative |
38 | Weekday_Is_Sunday | Was the Article Published on a Sunday? | Quantitative |
39 | Is_Weekend | Was the Article Published on the Weekend? | Quantitative |
40 | Lda_00 | Closeness to Lda Topic 0 | Quantitative |
41 | Lda_01 | Closeness to Lda Topic 1 | Quantitative |
42 | Lda_02 | Closeness to Lda Topic 2 | Quantitative |
43 | Lda_03 | Closeness to Lda Topic 3 | Quantitative |
44 | Lda_04 | Closeness to Lda Topic 4 | Quantitative |
45 | Global_Subjectivity | Text Subjectivity | Quantitative |
46 | Global_Sentiment_Polarity | Text Sentiment Polarity | Quantitative |
47 | Global_Rate_Positive_Words | Rate of Positive Words in the Content | Quantitative |
48 | Global_Rate_Negative_Words | Rate of Negative Words in the Content | Quantitative |
49 | Rate_Positive_Words | Rate of Positive Words Among Non-Neutral Tokens | Quantitative |
50 | Rate_Negative_Words | Rate of Negative Words Among Non-Neutral Tokens | Quantitative |
51 | Avg_Positive_Polarity | Avg. Polarity of Positive Words | Quantitative |
52 | Min_Positive_Polarity | Min. Polarity of Positive Words | Quantitative |
53 | Max_Positive_Polarity | Max. Polarity of Positive Words | Quantitative |
54 | Avg_Negative_Polarity | Avg. Polarity of Negative Words | Quantitative |
55 | Min_Negative_Polarity | Min. Polarity of Negative Words | Quantitative |
56 | Max_Negative_Polarity | Max. Polarity of Negative Words | Quantitative |
57 | Title_Subjectivity | Title Subjectivity | Quantitative |
58 | Title_Sentiment_Polarity | Title Polarity | Quantitative |
59 | Abs_Title_Subjectivity | Absolute Subjectivity Level | Quantitative |
60 | Abs_Title_Sentiment_Polarity | Absolute Polarity Level | Quantitative |
61 | Shares | Number of Shares | Quantitative |
62 | High_Shares | Dummy indicator whether the article has been highly shared | Qualitative |
4-repurchase_likelihood.csv
: This dataset has 891 rows and 10 columns. This
dataset builds on the Titanic Survival dataset and was renamed to summarize a
set of factors that might predict the likelihood of a repeated purchase in from
a retail store chain. The data is prepared as if it were taken from a loyalty
card programme.
# | Name | Definition | Data Type |
---|---|---|---|
1 | customer_id | Customer ID | Qualitative |
2 | repurchase | Indicator whether a repurchase occured (0: No, 1: Yes) | Qualitative |
3 | customer_tier | Loyalty tier to which a customer was assigned (1: Frequent, 2: Repated, 3: Seldom) | Qualitative |
4 | customer_name | Name of the customer | Qualitative |
5 | customer_sex | Sex of the customer (female, male) | Qualitative |
6 | customer_age | Age of the customer (parents sometimes created loyalty cards for their children) | Quantitative |
7 | customer_siblings | Number of siblings that also have a loyalty card | Quantitative |
8 | customer_parents | Number of parents/grandparents that also have a loyalty card | Quantitative |
9 | recent_purchase | Amount spent on the most recent purchase | Quantitative |
10 | customer_location | Preferred location of the customer (C: Cherbourg, Q: Queenstown, S: Southampton) | Qualitative |
5-subsidiary_income.csv
: This dataset has 10,159 rows and 14 columns. The data
contains income data for foreign subsidiaries in multiple countries and years.
It contains anonymized numeric and categorical features.
# | Name | Definition | Data Type |
---|---|---|---|
1 | Year | Year of the record | Integer |
2 | CountryCode | ISO code representing the country | String |
3 | DomesticId | Unique identifier for domestic entities | Integer |
4 | ForeignId | Unique identifier for foreign entities | Integer |
5 | Income | Income value, possibly in local currency | Float |
6 | NumFeature1 | Numeric feature 1 | Float |
7 | NumFeature2 | Numeric feature 2 | Float |
8 | NumFeature3 | Numeric feature 3 | Float |
9 | NumFeature4 | Numeric feature 4 | Float |
10 | NumFeature5 | Numeric feature 5 | Float |
11 | NumFeature6 | Numeric feature 6 | Float |
12 | FactorFeature1 | Categorical or factor feature 1 | String |
13 | FactorFeature2 | Categorical or factor feature 2 | String |
14 | FactorFeature3 | Categorical or factor feature 3 | String |
Datasets 1-4 were sourced from Data Science Dojo. Dataset 5 is synthetic data based on information from the Austrian National Bank.
This data set has been originally sourced from the Machine Learning Repository of the University of California, Irvine User Knowledge Modeling Data Set (UC Irvine). The UCI page mentions the following publication as the original source of the data set: H. T. Kahraman, Sagiroglu, S., Colak, I., Developing intuitive knowledge classifier and modeling of users' domain dependent data in web, Knowledge Based Systems, vol. 37, pp. 283-295, 2013.
This data set has been sourced from the Machine Learning Repository of University of California, Irvine Car Evaluation Data Set (UC Irvine). The UCI page mentions following as the donors of the dataset: Marko Bohanec (marko.bohanec '@' ijs.si) and Blaz Zupan (blaz.zupan '@' ijs.si).
This data set has been sourced from the Machine Learning Repository of University of California, Irvine Online News Popularity Data Set (UC Irvine). The UCI page mentions the following publication as the original source of the data set: K. Fernandes, P. Vinagre and P. Cortez. a Proactive Intelligent Decision Support System for Predicting the Popularity of Online News. Proceedings of the 17th EPIA 2015 - Portuguese Conference on Artificial Intelligence, September, Coimbra, Portugal.