
My first deep learning model: Doo Doo Detective

Joe Heitzeberg
September 26, 2022

I’m taking Fast AI’s Practical Deep Learning for Coders class now and I’ve just created and deployed my very first end-to-end deep learning image recognition model. In Fast AI’s words, “using a model that’s so advanced it was considered at the cutting edge of research capabilities in 2015.”

For my real-world problem, I decided to train an AI that could detect whether or not my dog is doing her business. You see, my dog Kona has a nasty habit of doing her business in a very particular spot in the upstairs of our house.

(Photo) Screenshot of the Doo Doo Detective app running as a Hugging Face Space (sloppyjoe/doodoodetective). The app asks: "Is this a photo of a dog doing his business? Or just a dog hanging out?" For a sample photo of a dog standing on a patterned rug indoors, the output is: normal 98%, pooping 2%.

If you’ve trained a dog, you know how hard it is to catch them in the act. With this new model, I’ll be able to push photos from a webcam and trigger a text alert, so I can correct her in real time.
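A minimal sketch of how the alerting step could work, assuming the classifier returns a probability for each label ("normal" and "pooping", as in the screenshot above). The 0.8 threshold, the five-minute cooldown, the `PoopAlerter` name and the `send_alert` callback are all hypothetical; the actual SMS call is left to whatever service gets plugged in.

```python
import time
from typing import Callable, Dict, Optional


class PoopAlerter:
    """Decide when a classifier prediction should trigger a text alert."""

    def __init__(
        self,
        send_alert: Callable[[str], None],  # hypothetical SMS callback
        threshold: float = 0.8,             # assumed confidence cut-off
        cooldown_s: float = 300.0,          # assumed minimum gap between alerts
    ) -> None:
        self.send_alert = send_alert
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._last_alert: Optional[float] = None

    def handle(self, probs: Dict[str, float], now: Optional[float] = None) -> bool:
        """Process one frame's probabilities; return True if an alert was sent."""
        now = time.time() if now is None else now
        score = probs.get("pooping", 0.0)
        if score < self.threshold:
            return False
        if self._last_alert is not None and now - self._last_alert < self.cooldown_s:
            return False  # still inside the cooldown window, stay quiet
        self._last_alert = now
        self.send_alert(f"Kona alert: pooping {score:.0%}")
        return True
```

The cooldown is there so a single incident caught across many consecutive webcam frames produces one text rather than dozens.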

Exposing a model as a REST API is fairly easy using services like Replicate or Render, but as a first milestone, I’ve deployed a quick test app on 🤗Hugging Face using Gradio. Overall project steps:

  • Fetch hundreds of images of dogs doing their business and of dogs sitting, standing and lying normally. I used DuckDuckGo image search for this, plus some manual cleanup. This took 30 minutes.
  • Use the Fast AI library’s vision learner to fine-tune ResNet-18 on my images. ResNet-18 is an 18-layer convolutional neural network pre-trained on more than a million images from the ImageNet database.
  • Run my model on test data to inspect false positives and false negatives (using Fast AI’s confusion matrix tools), then clean up or remove bad samples and repeat the fine-tuning a few times until the model reaches “good enough” quality on my security camera images for my purposes.
  • Learn how to build a Gradio app so that I can offer a web UI for my model on Hugging Face. Gradio is a Python library that wraps a Python function in a simple web interface, handling the input form and the output display.
  • Export my model file and deploy it along with my Gradio app’s Python code to Hugging Face for all to see.
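The steps above might be sketched roughly as follows with fastai and Gradio. This is not the code actually used for the project: the folder layout (one subfolder per label), the 20% validation split, the 224-pixel resize, the three epochs, the file name `model.pkl` and the function names `train_and_export`, `to_label_scores` and `build_app` are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, Sequence


def train_and_export(data_dir: Path, export_name: str = "model.pkl"):
    """Fine-tune ResNet-18 on images stored as data_dir/<label>/<image files>."""
    # Imported lazily so the small helper below works without fastai installed.
    from fastai.vision.all import (
        ClassificationInterpretation, ImageDataLoaders, Resize,
        error_rate, resnet18, vision_learner,
    )

    # Labels come from the subfolder names (assumed: "normal" and "pooping").
    dls = ImageDataLoaders.from_folder(
        data_dir, valid_pct=0.2, seed=42, item_tfms=Resize(224)
    )
    learn = vision_learner(dls, resnet18, metrics=error_rate)
    learn.fine_tune(3)  # epoch count is a guess, not taken from the post

    # Plot the confusion matrix to see false positives and false negatives.
    interp = ClassificationInterpretation.from_learner(learn)
    interp.plot_confusion_matrix()

    learn.export(export_name)  # saved under learn.path
    return learn


def to_label_scores(vocab: Sequence[str], probs: Sequence[float]) -> Dict[str, float]:
    """Pair class names with probabilities, the dict shape gr.Label accepts."""
    return {str(name): float(p) for name, p in zip(vocab, probs)}


def build_app(model_path: str = "model.pkl"):
    """Wrap the exported model in a Gradio interface."""
    import gradio as gr
    from fastai.vision.all import load_learner

    learn = load_learner(model_path)

    def classify(img):
        # predict returns (decoded label, label index, per-class probabilities)
        _, _, probs = learn.predict(img)
        return to_label_scores(learn.dls.vocab, probs)

    return gr.Interface(fn=classify, inputs=gr.Image(), outputs=gr.Label())
```

In a Hugging Face Space, the app file would then call something like `build_app().launch()`, with the exported model file uploaded alongside it.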

Total time: ~2 hours. It’s incredible to think that in just a couple of hours I’ve created an ability that just 5 years ago I would have thought was science fiction.

Check out my AI model, The Doo Doo Detective, on Hugging Face! 🐕💩🌈

Finally, I attempted to use DALL·E 2’s inpainting feature to create synthetic training images of my dog doing her business in my office, so that I’d have a few much more relevant training images for my fine-tuned model. Unfortunately, prompts like “dog pooping” or even “dog doing her business” are flagged by DALL·E 2’s content policy. This is quite annoying given the context in which I would be using those images, and frankly just in general. 😐

(Illustration) A cartoon drawing of a small orange and white kitten and a corgi puppy, shown with the message: "It looks like this request may not follow our content policy."
