Building your own supervised learning model

In this project, you will build your own supervised machine learning model and investigate potential sources of bias. This project does not require any coding experience! Use the template below for your write-up submission.

This project (both Part 1 and 2) is due Friday, Feb. 26, 11:59 pm PST.

●  Part 1

○ Instructions

○ Deliverables

●  Part 2

○ Instructions

○ Deliverables

●  Rubric

●  Submission template

●  Gradescope submission page

Part 1

In your group, use Google’s Teachable Machine to create a machine learning model to classify images, human poses, or sounds. You may want to follow along with the tutorial here (which creates an image model that can tell if a banana is ripe or not).

Here are the suggested steps for building your model:

1. Determine the goal of your classifier: Your classifier could contribute to society or be completely just-for-fun. In either case, consider who might use your model and how it might be used. Will you classify images, sounds, or poses? How many classes do you want the model to be able to identify, and what are the classes?

2. Assemble a data set.2 It is easiest to upload and label all your files in a Google Drive folder, which you will then import into your Teachable Machine in Step 3.

Make sure you keep track of the source for each of your files! (See footnote 3 for more information.)

1 If you would like our help to find a group, submit this form by Friday Feb. 19.

2 You may NOT use a pre-curated dataset (from Kaggle, Google, etc.); you need to construct your OWN dataset.

3 Make sure you keep track of the source for each of your files; points will be deducted if any file is missing its source link! You CANNOT say you got the file from Google or the internet;

3. Training: Your training data set should have at least 15 samples per class.4 You can use files from the internet or your webcam to create your training set. Upload your samples for each class to the Teachable Machine. Check out the video here.

a. Testing: Your testing data set (different from your training set!) should have at least 5 samples per class.5 Be sure to make this data set as representative (i.e., not biased) as possible. This test set will remain the same between Part 1 and 2.

4. Train your model: Upload only the training data to your Teachable Machine and click on “Train Model”. Check out the video here.

5. Test your model using your testing data set: On the right side of the screen you should see a window with the title “Preview”. Click on the dropdown menu, select “File” (red box in the screenshot below), and upload your test samples one at a time.6

Record your observations (to which class was each sample assigned, and with what probability?).

You may want to iterate on your model to improve accuracy: see what happens if you change the number of training data points, epochs, batch size, or learning rate. Aim for at least 80% accuracy.

6. Save your model to Google Drive (red box in the screenshot below; “Save project to Drive”). This saves a .zip file that contains all the samples in each of your classes to Drive. You can then open that .zip again from Teachable Machine later to pick up where you left off. Check out the video here.7

you must provide the exact link where you found each file. Tip: one easy way to do this (for images) is to create a Google Doc where you paste each image file and its URL, then download the Google Doc as a zipped HTML file. If you open that file, you will see an “Images” folder with all the images inside.

4 If you are using audio, please use 15 different audio files (with each file at least 3-5 seconds long).

5 If you are using audio, please use 5 different audio files (with each file at least 3-5 seconds long).

6 Note: If you decided to train your model on audio instead of images, you don't actually have the option to select files from your computer; you can only input from your microphone. Please create test audio clips and save them onto your computer, then play them out loud while the built-in mic is recording.

7 Do NOT use “Export Model” in the preview screen; this only shares the model without including any training data.
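If you are comfortable with a little Python (entirely optional; this project requires no coding), the source-tracking tip in footnote 3 can also be handled with a short script: keep a dictionary of filename-to-URL pairs as you collect files, then write it out as a CSV manifest to store alongside your dataset folder. The filenames and URLs below are made-up placeholders.

```python
import csv

# Hypothetical mapping you fill in as you collect files:
# each local filename paired with the exact page URL it came from.
sources = {
    "banana_ripe_01.jpg": "https://example.com/photos/ripe-banana",
    "banana_green_01.jpg": "https://example.com/photos/green-banana",
}

# Write a sources.csv manifest you can keep next to your dataset folder.
with open("sources.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "source_url"])
    for filename, url in sorted(sources.items()):
        writer.writerow([filename, url])
```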

Part 1 Deliverables

Use this template for your deliverables. If you are working in a group, please make sure that each person in your group submits the deliverables on Canvas. The write-up should be each student's own work; i.e., you may discuss your answers with your group or other students, but you may not share any written materials.

●  A Google Drive link to your project (the Teachable Machine .zip file that you created in Step 6 above) from Part 1. Please make sure that you set the link to be accessible to anyone with a Stanford email.

●  A Google Drive link with your test set (folder with test data, which will remain the same for Part 1 and Part 2). Please make sure that you set the link to be accessible to anyone with a Stanford email.

●  A write-up answering the following:

○  [1 paragraph] Description of model: What is the computational goal of your Teachable Machine (i.e., the thing you want the model to be able to do)? How many classes do you want the model to be able to identify, and what are the classes? What is a potential use case (e.g., how might the algorithm be used)?

○  [1 paragraph] Description of dataset: How did you assemble your datasets? How did you make sure that the datasets are as representative (and not biased) as they can be? Describe the sources of your data for both training and testing.

Include screenshots and descriptions for at least three training data points per class.

Include screenshots and descriptions for at least two testing data points per class.

○  [1 paragraph] Analysis of results:

What was the accuracy on your test data set? (number of correct predictions / number of total test samples)

Select at least one instance of successful prediction and one instance of failed prediction from the test data set. Provide your own hypothesis about the reason for the success/failure.
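If you like, the accuracy figure above can be computed with a few lines of Python (optional; a minimal sketch assuming you recorded each test sample's true class and the model's predicted class; the class names and results below are invented):

```python
# Hypothetical records of one test run: (true class, predicted class).
results = [
    ("ripe", "ripe"),
    ("ripe", "unripe"),
    ("unripe", "unripe"),
    ("unripe", "unripe"),
    ("ripe", "ripe"),
]

# Accuracy = number of correct predictions / number of total test samples.
correct = sum(1 for true, pred in results if true == pred)
accuracy = correct / len(results)
print(f"Accuracy: {correct}/{len(results)} = {accuracy:.0%}")  # → Accuracy: 4/5 = 80%
```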

Part 2

After you have completed your Teachable Machine model in Part 1, you and your group will attempt to intentionally introduce or exacerbate existing algorithmic bias.

Here are the suggested steps:

1. Make a copy of your Teachable Machine from Part 1.

2. Read Week 6 Reading 2: 2019 Brookings report on algorithmic bias detection and mitigation (link here).

3. Introduce or exacerbate algorithmic bias in your model: Change your data set and/or Teachable Machine model to intentionally highlight one of the forms of algorithmic bias named in the Brookings report (or a bias that has some similarity/connection to the algorithmic bias described there). Your model should still have at least 15 samples per class, but you don't have to replace all of them; a few targeted changes are enough.

4. Test your model: Test your model again, using the same testing data set from Part 1. Record the observations.

5. Save your new model to Google Drive. This saves a .zip file that contains all the samples in each of your classes to Drive. You can then open that .zip again from Teachable Machine later to pick up where you left off.

Part 2 Deliverables

Use this template for your deliverables.

●  A Google Drive link to your updated project (Teachable Machine .zip file) from Part 2.

● A write-up answering:

○  [1 paragraph] Description of bias: What type of algorithmic bias did you introduce/exacerbate in your machine learning model? Cite the Brookings paper when possible. What did you change in order to cause this bias?

○  [1 paragraph] Impact on accuracy: How did the bias you introduced/exacerbated in the data set affect your model's performance? Report the test results from Step 4. Include screenshots and descriptions of at least 2 egregiously misclassified data points to illustrate your point.

○  [1 paragraph] Reflection on algorithmic bias:

What are some forms of bias detection (suggested in the Brookings article) that would help reduce bias in this situation? Speculate on how successful this mitigation might be in reducing bias over the long term.

Describe a real-world case of algorithmic bias that is related to the bias you used in Part 2. How might this type of bias potentially lead to harm or negative consequences? Illustrate with the real-world example (and cite a source if you find something that actually happened in real life!) and/or your own Teachable Machine as a case study.
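If you want to go further when reporting the impact on accuracy, a per-class breakdown can show which class your introduced bias hurt the most. This optional Python sketch assumes the same kind of (true class, predicted class) records as in Part 1; the class names and results below are invented.

```python
from collections import defaultdict

# Hypothetical test results after introducing bias: (true class, predicted class).
results = [
    ("ripe", "ripe"), ("ripe", "ripe"), ("ripe", "ripe"),
    ("unripe", "ripe"), ("unripe", "ripe"), ("unripe", "unripe"),
]

# Tally total samples and correct predictions per true class.
totals = defaultdict(int)
hits = defaultdict(int)
for true, pred in results:
    totals[true] += 1
    if true == pred:
        hits[true] += 1

for cls in sorted(totals):
    print(f"{cls}: {hits[cls]}/{totals[cls]} correct")
```

In this made-up run, the "ripe" class is unaffected while most "unripe" samples are misclassified, which is the kind of asymmetry a biased training set tends to produce.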