My first deep learning model: Doo Doo Detective
I’m currently taking Fast AI’s Practical Deep Learning for Coders class, and I’ve just created and deployed my very first end-to-end deep learning image recognition model. That means, in Fast AI’s words, “using a model that’s so advanced it was considered at the cutting edge of research capabilities in 2015.”
For my real-world problem, I decided to train an AI that could detect whether or not my dog is doing her business. You see, my dog Kona has a nasty habit of going in one very particular spot upstairs in our house.
If you’ve trained a dog, you know the challenges of trying to catch them in the act. With this new model, I’ll be able to push photos from a webcam and trigger a text alert so I can catch her in the act and train her in real time.
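The alerting side could be as simple as a confidence threshold plus a cooldown, so that one incident doesn't send a burst of texts. Here is a minimal sketch, assuming the classifier returns a probability for each webcam frame. `AlertPolicy`, `handle_frame`, the 0.8 threshold, the five-minute cooldown, and the `send_text` callback are all hypothetical names and values chosen for illustration, not part of the model described here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertPolicy:
    """Decides when a prediction is worth a text message (all values are assumptions)."""
    threshold: float = 0.8                  # minimum confidence before alerting
    cooldown_s: float = 300.0               # minimum seconds between two alerts
    last_alert_at: float = float("-inf")    # timestamp of the previous alert

    def should_alert(self, prob: float, now: float) -> bool:
        # Ignore low-confidence frames.
        if prob < self.threshold:
            return False
        # Ignore frames that arrive too soon after the previous alert.
        if now - self.last_alert_at < self.cooldown_s:
            return False
        self.last_alert_at = now
        return True


def handle_frame(policy: AlertPolicy, prob: float, now: float,
                 send_text: Callable[[str], None]) -> bool:
    """Send a text through the supplied callback if the policy allows it."""
    if policy.should_alert(prob, now):
        send_text(f"Kona alert: {prob:.0%} confident")
        return True
    return False
```

In practice `send_text` would wrap whatever SMS service is used, and `now` would come from `time.time()`.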
Here are the steps I took:
- Fetch hundreds of images of dogs doing their business and of dogs sitting, standing, and lying normally. I used DuckDuckGo image search for this, plus some manual cleanup. This took 30 minutes.
- Use the Fast AI library’s vision learner to fine-tune ResNet-18 on my images. ResNet-18 is an 18-layer convolutional neural network pre-trained on more than a million images from the ImageNet database.
- Run my model on test data to inspect false positives and false negatives (using Fast AI’s confusion matrix tooling), then clean up or remove bad samples and re-train. I repeated this a few times until the model reached “good enough” quality on my security camera images.
- Learn how to build a Gradio app so that I can offer a web UI for my model on Hugging Face. Gradio is a Python library that wraps a function in a simple web interface, handling inputs such as image uploads and displaying the outputs.
- Export my model file and deploy it along with my Gradio app’s Python code to Hugging Face for all to see.
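For readers who want to try something similar, the training, inspection, export, and Gradio steps above might look roughly like the sketch below. It is not the exact code behind this project. It assumes the images sit in one folder per class (the names `pooping` and `normal` are hypothetical), and the function names, the 20% validation split, the three epochs, and the `model.pkl` filename are illustrative choices. The fastai and Gradio imports live inside the functions so the file still loads where those packages are not installed.

```python
from pathlib import Path


def train_and_export(image_dir: Path, export_name: str = "model.pkl", epochs: int = 3):
    """Fine-tune ResNet-18 on image_dir/<label>/* and export the learner."""
    import fastai.vision.all as fv

    # Labels come from the sub-folder names; 20% of images are held out.
    dls = fv.ImageDataLoaders.from_folder(
        image_dir, valid_pct=0.2, seed=42, item_tfms=fv.Resize(224)
    )
    learn = fv.vision_learner(dls, fv.resnet18, metrics=fv.error_rate)
    learn.fine_tune(epochs)

    # Inspect false positives and false negatives before cleaning the data.
    interp = fv.ClassificationInterpretation.from_learner(learn)
    interp.plot_confusion_matrix()
    interp.plot_top_losses(9)

    # The file is written relative to the learner's path.
    learn.export(export_name)
    return learn


def label_scores(vocab, probs):
    """Map class names to float confidences, the dict shape gr.Label accepts."""
    return {str(name): float(p) for name, p in zip(vocab, probs)}


def build_demo(model_path: str = "model.pkl"):
    """Wrap the exported model in a Gradio interface."""
    import gradio as gr
    from fastai.vision.all import PILImage, load_learner

    learn = load_learner(model_path)

    def classify(img):
        # predict returns (label, label index, per-class probabilities).
        _, _, probs = learn.predict(PILImage.create(img))
        return label_scores(learn.dls.vocab, probs)

    return gr.Interface(fn=classify, inputs=gr.Image(), outputs=gr.Label())

# In an app.py for Hugging Face Spaces, the last line would be:
# build_demo().launch()
```

Re-running `train_and_export` after each round of cleanup is what makes the inspect-and-retrain loop cheap: the confusion matrix and top-loss plots point at the mislabeled or misleading images to remove.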
Total time: ~2 hours. It’s incredible to think that in just a couple of hours I’ve created an ability that just 5 years ago I would have thought was science fiction.
Check out my AI model, The Doo Doo Detective, on Hugging Face! 🐕💩🌈
Finally, I attempted to use DALL·E 2’s inpainting feature to create synthetic training images of my dog doing her business in my office, so that I’d have a few much more relevant examples for my fine-tuned model. Unfortunately, “dog pooping” and even “dog doing her business” are flagged by DALL·E 2. This is quite annoying given the context in which I would be using those images, and frankly just in general. 😐