Anika Dutta


Multi-Label Image Classifier

Let's build a multi-label image classifier: it tags an image with whichever subset of a fixed label set applies, possibly none at all. Here the labels are scene attributes: outdoor, indoor, daytime, nighttime, people, vehicle, nature, architecture.
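In single-label classification each image gets exactly one class. A multi-label output is instead a subset of the label set. A common way to represent such a subset, for example when comparing predictions against ground truth, is a multi-hot vector with one 0/1 slot per label. The sketch below illustrates that idea. The `to_multi_hot` helper is hypothetical and is not part of the code in this post.

```python
# Fixed label order used for the vector positions (same labels as this post).
LABELS = [
    "outdoor", "indoor", "daytime", "nighttime",
    "people", "vehicle", "nature", "architecture",
]


def to_multi_hot(tags: list[str]) -> list[int]:
    """Hypothetical helper: encode a tag subset as a 0/1 vector over LABELS."""
    unknown = set(tags) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    present = set(tags)
    return [1 if label in present else 0 for label in LABELS]


print(to_multi_hot(["outdoor", "daytime", "architecture"]))
# → [1, 0, 1, 0, 0, 0, 0, 1]
```

An empty tag list encodes to all zeros, which matches the "empty if none" case the schema allows.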

Task: Multi-label classification
Input: Image
Output: List of tags from a fixed set
Label set: outdoor, indoor, daytime, nighttime, people, vehicle, nature, architecture
Model: Gemini 3.5 Flash
Schema: Pydantic with a Literal[...] constraint
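With a `Literal[...]` type, every entry in `tags` must be one of the eight listed strings. Pydantic rejects an out-of-set value such as "beach" during validation instead of passing it through. At runtime, `typing.get_args` returns a Literal's allowed values. The stdlib-only sketch below uses it to mimic that membership check. It illustrates the constraint only and is not how Pydantic implements validation. `check_tags` is a hypothetical name.

```python
from typing import Literal, get_args

SceneTag = Literal[
    "outdoor", "indoor", "daytime", "nighttime",
    "people", "vehicle", "nature", "architecture",
]

# get_args on a Literal returns its allowed values as a tuple.
ALLOWED = set(get_args(SceneTag))


def check_tags(tags: list[str]) -> list[str]:
    """Hypothetical helper: raise if any tag falls outside the Literal's values."""
    bad = [t for t in tags if t not in ALLOWED]
    if bad:
        raise ValueError(f"tags not in label set: {bad}")
    return tags


print(check_tags(["indoor", "people"]))  # → ['indoor', 'people']
```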
"""
Image Classification - Multilabel
=================================

Assign any subset of N tags to an image. Useful for scene tagging and
content categorization.
"""

from typing import List, Literal

from agno.agent import Agent, RunOutput
from agno.media import Image
from agno.models.google import Gemini
from pydantic import BaseModel, Field
from rich.pretty import pprint

# Fixed label set: each tag the model returns must be one of these strings.
SceneTag = Literal[
    "outdoor",
    "indoor",
    "daytime",
    "nighttime",
    "people",
    "vehicle",
    "nature",
    "architecture",
]


class Tagging(BaseModel):
    tags: List[SceneTag] = Field(
        ..., description="All scene tags that apply; empty if none"
    )


instructions = """\
Tag the image with every scene attribute that clearly applies. Include a
tag only if it is unambiguously present in the image - skip tags that are
inferred or implied.
"""


# output_schema makes the agent return its answer as a Tagging instance.
agent = Agent(
    model=Gemini(id="gemini-3.5-flash"),
    instructions=instructions,
    output_schema=Tagging,
)


if __name__ == "__main__":
    samples = [
        "https://agno-public.s3.amazonaws.com/images/krakow_mariacki.jpg",
        "https://storage.googleapis.com/generativeai-downloads/images/generated_elephants_giraffes_zebras_sunset.jpg",
    ]
    # Tag each sample image and print the structured result next to its URL.
    for url in samples:
        run: RunOutput = agent.run("Tag this image.", images=[Image(url=url)])
        pprint({"url": url, "result": run.content})
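`List[SceneTag]` controls which strings may appear. It does not control how often they appear or in what order, so a response could in principle repeat a tag. If downstream code needs a stable form, one option is to deduplicate the tags and sort them by the label set's declared order. The minimal sketch below assumes you pass it the parsed tag list, for example `run.content.tags`. `canonical_tags` is a hypothetical helper, not part of agno or Pydantic.

```python
from typing import Literal, get_args

SceneTag = Literal[
    "outdoor", "indoor", "daytime", "nighttime",
    "people", "vehicle", "nature", "architecture",
]

# Declared order of the Literal, used as the canonical sort order.
ORDER = {tag: i for i, tag in enumerate(get_args(SceneTag))}


def canonical_tags(tags: list[str]) -> list[str]:
    """Hypothetical helper: drop duplicate tags and sort by label-set order.

    Assumes every tag is already in the label set (e.g. validated by the
    schema); an unknown tag raises KeyError during the sort.
    """
    return sorted(set(tags), key=ORDER.__getitem__)


print(canonical_tags(["nature", "outdoor", "nature", "daytime"]))
# → ['outdoor', 'daytime', 'nature']
```

Sorting by the Literal's own order keeps a single source of truth for the label set instead of hard-coding a second ordering.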
