Does the smoking habit of customers depend on their region?

Description

Background: Leveraging customer information is of paramount importance for most businesses. In the case of an insurance company, customer attributes can be crucial in making business decisions. Hence, knowing how to explore such data and generate value from it is an invaluable skill.

Suppose you are hired as a Data Scientist at an insurance company. The company wants a detailed understanding of the customer base for one of its insurance policies, ‘MediClaim’. The idea is to generate insights about the customers and answer a few key questions with statistical evidence, using past data. The dataset ‘AxisInsurance’ contains customers’ details such as age, sex, and charges. Perform statistical analysis on the collected data to answer the questions listed under the objective.

Objective: Statistical Analysis of Business Data. Explore the dataset and extract insights from the data. The idea is for you to get comfortable with doing statistical analysis in Python.

You are expected to do the following:
- Explore the dataset and extract insights using Exploratory Data Analysis.
- Prove (or disprove) that the medical claims made by people who smoke are greater than those made by people who don’t.
- Prove (or disprove) with statistical evidence that the BMI of females is different from that of males.
- Does the smoking habit of customers depend on their region? [Hint: Create a contingency table using the pandas.crosstab() function]
- Is the mean BMI of women with no children, one child, and two children the same? Explain your answer with statistical evidence.

*Consider a significance level of 0.05 for all tests.

Data Dictionary
Age – This is an integer indicating the age of the primary beneficiary (excluding those above 64 years, since they are generally covered by the government).
Sex – This is the policy holder’s gender, either male or female.
BMI – This is the body mass index (BMI), which provides a sense of how overweight or underweight a person is relative to their height. BMI is equal to weight (in kilograms) divided by height (in meters) squared. An ideal BMI is within the range of 18.5 to 24.9.
Children – This is an integer indicating the number of children/dependents covered by the insurance plan.
Smoker – This is yes or no, depending on whether the insured regularly smokes tobacco.
Region – This is the beneficiary’s place of residence in the U.S., divided into four geographic regions: northeast, southeast, southwest, or northwest.
Charges – Individual medical costs billed to health insurance.

Submission Guidelines

There are two ways to work on this project:
i. Full-code way: The full-code way is to write the solution code from scratch and submit only a final Jupyter notebook with all the insights and observations.
ii. Low-code way: The low-code way is to use an existing solution notebook template to build the solution and then submit a business presentation with insights and recommendations.

The primary purpose of providing these two options is to allow learners to opt for the approach that aligns with their individual learning aspirations and outcomes. The table below elaborates on these two options.
Submission type: Full-code
Who should choose: Learners who aspire to be in hands-on coding roles in the future, focussed on building solution code from scratch
What is the same across the two: Perform exploratory data analysis to identify insights and recommendations for the problem
What is different across the two: Focus on code writing: 10-20% grading on the quality of the final code submitted
Final submission file [IMP]: Solution notebook from the full-code template submitted in .html format
Submission format: .html

Submission type: Low-code
Who should choose: Learners who aspire to be in managerial roles in the future, focussed on solution review, interpretation, recommendations, and communicating with business
What is the same across the two: Perform exploratory data analysis to identify insights and recommendations for the problem
What is different across the two: Focus on business presentation: 10-20% grading on the quality of the final business presentation submitted
Final submission file [IMP]: Business presentation in .pdf format with problem definition, insights, and recommendations
Submission format: .pdf

1. Please follow the steps below to complete the assessment. Kindly note that if you submit a presentation, ONLY the presentation will be evaluated. Please make sure that all the sections mentioned in the rubric have been covered in your submission.
i. Full-code version
- Download the full-code version of the learner notebook.
- Follow the instructions provided in the notebook to complete the project.
- Clearly write down insights and recommendations for the business problems in the comments.
- Submit only the solution notebook prepared from the learner notebook [format: .html]
ii. Low-code version
- Download the low-code version of the learner notebook.
- Follow the instructions provided in the notebook to complete the project.
- Prepare a business presentation with insights and recommendations for the business problem.
- Submit only the presentation [format: .pdf]
2. Any assignment found copied/plagiarized from other submissions will not be graded and will be awarded zero marks.
3. Please ensure timely submission, as any submission made after the deadline will not be accepted for evaluation.
4. A submission will not be evaluated if it is submitted after the deadline or if more than one file is submitted.

Best Practices for Full-code submissions
- The final notebook should be well-documented, with inline comments explaining the functionality of the code and markdown cells containing comments on the observations and insights.
- The notebook should be run from start to finish in a sequential manner before submission.
- It is important to remove all warnings and errors before submission.
- The notebook should be submitted as an HTML file (.html) and NOT as a notebook file (.ipynb).
- Please refer to the FAQ page for common project-related queries.

Best Practices for Low-code submissions
- The presentation should be made keeping in mind that the audience will be the Data Science lead of a company.
- The key points in the presentation should be the following:
  - Business overview of the problem and solution approach
  - Key findings and insights which can drive business decisions
  - Business recommendations
- Focus on explaining the key takeaways in an easy-to-understand manner.
- Including the potential benefits of implementing the solution will give you an edge.
- Copying and pasting from the notebook is not a good idea, and it is better to avoid showing code unless it is the focal point of your presentation.
- The presentation should be submitted as a PDF file (.pdf) and NOT as a .pptx file.
- Please refer to the FAQ page for common project-related queries.
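
For the question in the title (does the smoking habit of customers depend on their region?), the hinted contingency-table approach is commonly paired with a chi-square test of independence. The following is a minimal sketch under stated assumptions, not the course's reference solution: the column names `smoker` and `region` are assumed from the data dictionary, the helper `smoking_vs_region` is hypothetical, and the tiny DataFrame is made-up toy data used only so the code runs on its own. It is not the AxisInsurance dataset and its output says nothing about the real customers.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data for illustration only; NOT the AxisInsurance dataset.
# Column names "region" and "smoker" are assumptions based on the data dictionary.
toy = pd.DataFrame({
    "region": ["northeast"] * 6 + ["southeast"] * 6
              + ["southwest"] * 6 + ["northwest"] * 6,
    "smoker": ["yes"] * 2 + ["no"] * 4      # northeast
              + ["yes"] * 3 + ["no"] * 3    # southeast
              + ["yes"] * 1 + ["no"] * 5    # southwest
              + ["yes"] * 2 + ["no"] * 4,   # northwest
})


def smoking_vs_region(df, alpha=0.05):
    """Chi-square test of independence between region and smoking status.

    H0: smoking status is independent of region.
    Ha: smoking status depends on region.
    """
    # Contingency table of observed counts (rows: region, columns: smoker)
    table = pd.crosstab(df["region"], df["smoker"])
    chi2, p_value, dof, expected = chi2_contingency(table)
    # Reject H0 only when the p-value falls below the significance level
    return table, chi2, p_value, dof, p_value < alpha


table, chi2, p_value, dof, reject = smoking_vs_region(toy)
print(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p_value:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```

On the real data, the toy DataFrame would be replaced by the loaded dataset (for example via `pd.read_csv`, with the actual file and column names). The toy sample is deliberately small, so its expected cell counts are below the usual rule of thumb of 5; with the full dataset, the `expected` array returned by `chi2_contingency` can be inspected to check that assumption before interpreting the p-value at the 0.05 level.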