Question 1
The file “Sports Sales.csv” contains data on the sales of products by sports companies around the world. Write a Python script that performs the following tasks in the given order. If you are using Jupyter Notebook, your script must be self-contained in a single code cell. That is, all the given tasks are performed without any error or warning when your script is run once from a single code cell. The tasks are:
- Read the dataset into a DataFrame called df. Then, display the first 5 rows of the dataset.
- Print out the number of records in the dataset and the total number of missing values.
- Remove the records where the Date, Customer ID, Customer Gender, Country, or Product Category fields have missing data. Save the result in a DataFrame called df_cleaned. Print out the total number of records removed this way.
- Fill in the remaining missing data in the fields of df_cleaned with the mean of the field. Print out the total number of values filled this way.
- Convert the column “Date” of df_cleaned to DateTime datatype (assume that the dates are day first). Then, set the column “Date” as the index and sort these dates in descending order.
- Convert the datatype of the numeric columns in df_cleaned to integer datatype. Note that the converted values should be rounded to the nearest integer, not truncated. Print out the data types of all the columns for confirmation.
- Add columns “Year” and “Quarter” to df_cleaned, where the column “Year” contains the year of the date in the index, and the column “Quarter” contains the quarter of the year of the date in the index. Then, display the first 5 rows of the dataset.
- Using df_cleaned, create a DataFrame called df_customers that keeps 6 sums (Order Quantity, Unit Cost, Unit Price, Cost, Revenue, and Profit) for each customer. Note that each customer is identified by his or her unique Customer ID. Then, sort the dataset by Revenue in descending order. Then, display the first 5 rows of the dataset.
- Using df_cleaned, create a dictionary called df_countries that keeps the unique values in the column “Country” as its keys and keeps the dataset for each country as its values. For example, df_countries["United States"] should reference the DataFrame containing the data for only the United States. The column “Country” should be dropped from this DataFrame. You should test this and display the resulting DataFrame. Extra marks will be given for automation.
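The steps above can be sketched on a small in-memory sample. This is only an illustration under stated assumptions, not a model answer: the five sample rows are invented, the column names are taken from the task list (the real file may contain others), and "fill with the mean" is assumed to apply to numeric columns only. With the real file, `io.StringIO(csv_text)` would be replaced by the path "Sports Sales.csv".

```python
import io
import pandas as pd

# Invented stand-in for "Sports Sales.csv"; the real columns may differ.
csv_text = """Date,Customer ID,Customer Gender,Country,Product Category,Order Quantity,Unit Cost,Unit Price,Cost,Revenue,Profit
05/01/2016,101,F,United States,Bikes,2,10.0,15.0,20.0,30.0,10.0
17/04/2016,102,M,Canada,Clothing,1,,8.0,5.0,8.0,3.0
,103,F,Canada,Bikes,3,4.0,6.0,12.0,18.0,6.0
23/11/2015,101,F,United States,Accessories,4,2.6,4.0,10.4,,5.6
09/08/2015,104,M,,Bikes,1,7.0,9.0,7.0,9.0,2.0
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print("Records:", len(df), "| Missing values:", int(df.isna().sum().sum()))

# Drop records with missing values in the key fields.
key_cols = ["Date", "Customer ID", "Customer Gender", "Country", "Product Category"]
df_cleaned = df.dropna(subset=key_cols).copy()
removed = len(df) - len(df_cleaned)
print("Records removed:", removed)

# Fill the remaining gaps with the column mean (numeric columns only, by assumption).
numeric_cols = df_cleaned.select_dtypes(include="number").columns
n_filled = int(df_cleaned[numeric_cols].isna().sum().sum())
df_cleaned[numeric_cols] = df_cleaned[numeric_cols].fillna(df_cleaned[numeric_cols].mean())
print("Values filled:", n_filled)

# Parse day-first dates, use them as the index, newest first.
df_cleaned["Date"] = pd.to_datetime(df_cleaned["Date"], dayfirst=True)
df_cleaned = df_cleaned.set_index("Date").sort_index(ascending=False)

# Round first, then cast, so values are not truncated toward zero.
for col in numeric_cols:
    df_cleaned[col] = df_cleaned[col].round().astype("int64")
print(df_cleaned.dtypes)

# Year and quarter derived from the DatetimeIndex.
df_cleaned["Year"] = df_cleaned.index.year
df_cleaned["Quarter"] = df_cleaned.index.quarter
print(df_cleaned.head())

# Per-customer sums, sorted by Revenue in descending order.
sum_cols = ["Order Quantity", "Unit Cost", "Unit Price", "Cost", "Revenue", "Profit"]
df_customers = (
    df_cleaned.groupby("Customer ID")[sum_cols].sum()
    .sort_values("Revenue", ascending=False)
)
print(df_customers.head())

# One DataFrame per country, built automatically from the unique values.
df_countries = {
    country: group.drop(columns="Country")
    for country, group in df_cleaned.groupby("Country")
}
print(df_countries["United States"])
```

Building `df_countries` with a dictionary comprehension over `groupby` is one way to automate the last task, since no country names are hard-coded.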
Question 2
The file “Survey.csv” is a dataset that contains the results of a survey on social media users. The questions ask about:
- the background (demographics) of the respondent,
- the types of social media that are consumed by the respondent, and
- the types of issues that the respondent takes interest in on social media.
Each column (except the first) in the dataset corresponds to a question in the survey. The questions are given in row 8 and the category of each question is given in row 7. From row 9 onward, each row in the dataset corresponds to a respondent of the survey. The possible answers to each question in the survey are given in the top rows, that is, from row 1 up to row 6. In addition, the types of issues that the respondents are asked about are categorized into:
- national issues, and
- local issues.
In particular, the columns “Living Costs” up to “NationalOthers” belong to national issues, and the columns “Land” up to “LocalOther” belong to local issues.
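The layout described above could be loaded along the following lines. This is a minimal sketch on an invented miniature of the file: the sample values, the contents of the first column, and the names `raw`, `responses`, `national`, `local`, and `category_of` are all assumptions for illustration. Only the row positions and the four boundary column names come from the description.

```python
import io
import pandas as pd

# Invented miniature of the described layout (not the real Survey.csv):
# rows 1-6 hold possible answers, row 7 the categories, row 8 the questions,
# and respondents start at row 9.
csv_text = """Option 1,Male,Yes,Yes,Yes,Yes
Option 2,Female,No,No,No,No
Option 3,,,,,
Option 4,,,,,
Option 5,,,,,
Option 6,,,,,
Category,Background,National,National,Local,Local
ID,Gender,Living Costs,NationalOthers,Land,LocalOther
1,Male,Yes,No,Yes,No
2,Female,No,No,Yes,Yes
3,Female,Yes,Yes,No,No
"""

# Read everything as text without a header so the metadata rows stay accessible.
raw = pd.read_csv(io.StringIO(csv_text), header=None, dtype=str)

answers = raw.iloc[:6]     # rows 1-6: possible answers per question
categories = raw.iloc[6]   # row 7: category of each question
questions = raw.iloc[7]    # row 8: the questions (used as column names)

# Rows 9 onward are the respondents.
responses = raw.iloc[8:].reset_index(drop=True)
responses.columns = questions.tolist()

# Map each question to its category, skipping the first (non-question) column.
category_of = dict(zip(questions.iloc[1:], categories.iloc[1:]))

# Label-based column slices are inclusive at both ends.
national = responses.loc[:, "Living Costs":"NationalOthers"]
local = responses.loc[:, "Land":"LocalOther"]

print(national.head())
print(local.head())
```

Slicing by the boundary column names avoids hard-coding column positions, which helps if the real file has more questions between those boundaries than this sample does.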