Building a Tag Vocabulary That Matches Your Work

When you drop a folder of clips into VideoTagger, the AI does the heavy lifting: it recognizes people, objects, scenes, and actions, and tags every moment automatically. Most users are surprised at how much of the work is already done.

But the AI does not know your projects. It does not know that "the rooftop shot" is the one you keep reusing, or that "good B-roll" means something specific to your style. That layer — your own vocabulary — is what turns a tagged library into a useful one.

This post is about how to build that vocabulary without overthinking it.

Start From How You Already Search

The most common mistake is to design a tag taxonomy up front — sit down, draw a tree, and try to predict everything you will ever need. This almost always fails. You end up with categories you never use and gaps where the real searches happen.

Try the opposite instead:

Use the library for a week with only the AI tags.
Notice every time you wish a tag existed. Write down the exact words you typed or said out loud.
After a week, add those words as custom tags. That is your real vocabulary.

The vocabulary that emerges this way is short, specific, and unique to how you think about footage.
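
If you keep your week of notes as a plain list, a few lines of Python can show which words came up more than once. This is only a sketch: the notes list, the candidate_tags helper, and the "at least twice" threshold are all assumptions made for illustration, not part of VideoTagger.

```python
from collections import Counter

# Hypothetical notes from one week: one entry per search where a tag was missing.
wished_for = [
    "rooftop", "hero", "rooftop", "audio clean", "Hero",
    "b-roll", "rooftop", "hero", "drone",
]

def candidate_tags(notes, min_count=2):
    """Return (word, count) pairs wished for at least min_count times."""
    # Fold case and trim whitespace so "Hero" and "hero" count as one word.
    counts = Counter(note.strip().lower() for note in notes)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]

candidates = candidate_tags(wished_for)
print(candidates)
```

Words that show up only once are not necessarily useless, but the repeated ones are the safest place to start.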

Three Layers That Tend to Work

Across the workflows we have seen, useful tag vocabularies tend to sort into three layers. You do not need all three on day one — add them as you feel the need.

Layer 1 — Content (what is in the shot)

This is what the AI already gives you for free: people, objects, settings, actions. You rarely need to add to this layer manually. When you do, it is usually to disambiguate — for example, naming a specific person the AI labeled generically, or distinguishing two locations that look similar on camera.

Layer 2 — Quality and intent (how you would use this shot)

This is the layer the AI cannot fill in, and where the highest-value custom tags live. Examples:

hero — the shot you would build a sequence around.
b-roll — usable filler.
cutaway-safe — works in a cutaway without needing context.
audio-clean — usable for the dialogue, not just the picture.
reject — known bad take; never use.

These are subjective. That is the point — only you know them, and they are exactly what makes search powerful.

Layer 3 — Project / deliverable (where this shot is going)

This is the most ephemeral layer, made up of tags like:

proj-acme-launch
q3-newsletter
client-review-2026-05

These tags have a short useful life, but they save real time while the project is active. When you ship the project, you can keep them as historical breadcrumbs or remove them. Either choice is fine.

Rules of Thumb That Save Pain Later

A few small habits make a big difference six months in:

Lowercase, hyphenated. hero-shot not Hero Shot. Tags become hard to clean up once you have hundreds of them in mixed styles.
One concept per tag. interview and outdoor separately, not outdoor-interview. The two-tag combination is more flexible.
Prefix project tags. proj- or client- keeps them visually grouped and easy to retire.
Resist synonyms. Pick one of hero / pick / selects and stick with it. The AI tags can sprawl; your tags should not.
Prune quarterly. Five minutes every three months removing dead project tags keeps the surface clean.
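
If you manage your tag list outside the app (in a spreadsheet export or a text file, say), the habits above are easy to enforce with a small script. The sketch below is one possible approach, not a VideoTagger feature: the SYNONYMS map, the prefix list, and the function names are all hypothetical.

```python
import re

# Hypothetical synonym map: every variant collapses to the one tag you chose.
SYNONYMS = {"pick": "hero", "selects": "hero"}

# Prefixes that mark a tag as belonging to a project or client.
PROJECT_PREFIXES = ("proj-", "client-")

def normalize_tag(raw):
    """Lowercase, hyphenate, and collapse synonyms into one canonical tag."""
    # Any run of characters other than a-z or 0-9 becomes a single hyphen.
    tag = re.sub(r"[^a-z0-9]+", "-", raw.strip().lower()).strip("-")
    return SYNONYMS.get(tag, tag)

def is_project_tag(tag):
    """Project tags carry a prefix so they are easy to find and retire."""
    return tag.startswith(PROJECT_PREFIXES)

def prune(tags, retired):
    """Drop retired tags and duplicates; return the rest in sorted order."""
    return sorted(t for t in set(tags) if t not in retired)

raw = ["Hero Shot", "Selects", "Audio_Clean", "proj-acme-launch"]
tags = [normalize_tag(t) for t in raw]
kept = prune(tags, retired={"proj-acme-launch"})
print(tags)
print(kept)
```

Running normalization before adding any new tag keeps mixed styles from creeping in, and the prefix check makes the quarterly prune a matter of filtering rather than reading every tag.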

What "Done" Looks Like

A mature vocabulary is smaller than people expect — usually 20 to 50 custom tags total, on top of the AI's content tags. If you find yourself adding new tags every week after the first month, you are probably overspecifying. If you are still scrolling past unrelated results, you are probably underspecifying.

The signal you are looking for: searches that used to take minutes now take seconds, and the first three results are almost always what you wanted.

That is the point at which the library has become a real tool, not just a pile of tagged files.