WhatFormat: How to Quickly Identify Any File Extension
Identifying a file extension sounds simple (just look at the letters after the dot in a filename), but in practice many files are ambiguous, misnamed, or use uncommon extensions. This guide explains how file extensions work, why they sometimes fail, and practical methods (manual, built-in OS tools, and third-party utilities) to quickly and reliably determine a file’s real format so you can open, convert, or handle it safely.
What a file extension is (and what it isn’t)
A file extension is the suffix at the end of a filename (for example, .pdf, .jpg, .mp3) that conventionally indicates the file’s format. Extensions are simple and useful, but they are not authoritative:
- An extension is only a hint to the operating system and applications about how to handle a file.
- Files can be renamed — changing the extension does not change the file’s internal structure.
- Different formats can share an extension (for example, .dat and .bin are used by many unrelated programs), and some formats use no extension at all.
Understanding this distinction helps when extensions are missing or misleading.
Why quick identification matters
- Prevents trying to open a file with the wrong program (saving time and reducing errors).
- Helps detect potentially malicious files disguised with safe-looking extensions (e.g., an executable named picture.jpg.exe on Windows).
- Allows correct selection of conversion or editing tools.
- Essential when receiving files from unknown sources or dealing with legacy archives.
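The disguised-executable case above can be checked mechanically. The following is a minimal sketch, not a complete defense: the two suffix sets are illustrative assumptions chosen for this example, and the function name looks_disguised is hypothetical.

```python
from pathlib import PurePath

# Assumption: a short, non-exhaustive list of suffixes Windows will run.
EXECUTABLE_SUFFIXES = {".exe", ".scr", ".bat", ".cmd", ".com", ".vbs"}
# Assumption: harmless-looking suffixes commonly used as a decoy.
DECOY_SUFFIXES = {".jpg", ".jpeg", ".png", ".pdf", ".docx", ".txt", ".mp3", ".mp4"}


def looks_disguised(filename: str) -> bool:
    """Return True when a runnable suffix directly follows a decoy suffix."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    return (
        len(suffixes) >= 2
        and suffixes[-1] in EXECUTABLE_SUFFIXES
        and suffixes[-2] in DECOY_SUFFIXES
    )
```

With these sets, looks_disguised("picture.jpg.exe") is True while looks_disguised("setup.exe") is False. It is only a filename heuristic, so a clean result says nothing about the file's actual content.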
Visual inspection: first quick checks
- Look at the filename and extension. Many files are correctly labeled and this is often all you need.
- Check file size. A tiny file that claims to be an HD video is suspicious.
- Inspect file icons. OS-assigned icons can hint at associated applications.
- Consider the source. Files from cameras, phones, or email attachments often follow predictable formats (.heic, .mov, .zip, .docx).
If any of these raise doubts, move to deeper checks.
Use built-in OS tools
Windows:
- Turn on file extensions in File Explorer (View → File name extensions) so you don’t rely on hidden suffixes.
- Right-click → Properties shows “Type of file” and, for signed files such as executables, a Digital Signatures tab.
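- Use the command line to view a file’s leading bytes: in PowerShell, Format-Hex prints a hex dump, or open the file in a hex viewer. Note that certutil -hashfile computes a checksum, which verifies integrity but does not identify the format.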
macOS:
- Finder hides some extensions by default; enable “Show all filename extensions” in Finder’s Advanced settings, and use Get Info (Cmd+I) to see “Kind” and the associated app.
- Quick Look (spacebar) previews many file types without opening them.
- Use Terminal’s file command (see below).
Linux / Unix:
- The file utility (file filename) inspects magic numbers and identifies a file’s format reliably.
- Tools like hexdump or xxd let you view the file’s initial bytes.
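Where hexdump or xxd is not available, the same peek at a file's first bytes takes a few lines of Python. This is a minimal sketch; the function name head_hex and the default of 16 bytes are choices made for this example.

```python
def head_hex(path: str, count: int = 16) -> str:
    """Return the first `count` bytes of a file as space-separated uppercase hex."""
    with open(path, "rb") as handle:  # binary mode: no newline or encoding translation
        return handle.read(count).hex(" ").upper()
```

The separator argument to bytes.hex requires Python 3.8 or later. The returned string can be compared by eye against a table of known signatures.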
Identify by file signature (magic numbers)
Most file formats include a distinctive sequence of bytes at the start of the file called a “magic number.” Checking these bytes is more reliable than trusting the extension.
Common examples:
- PDF: begins with %PDF (25 50 44 46 in hex)
- PNG: begins with 89 50 4E 47 0D 0A 1A 0A
- JPG: begins with FF D8 FF
- ZIP / DOCX / EPUB: begins with 50 4B 03 04
- MP4 / MOV: usually contain ftyp at byte offset 4, right after a 4-byte box size (some older QuickTime files lack it)
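The MP4 / MOV entry differs from the others because the marker is not at offset 0: these files are built from boxes, each starting with a 4-byte big-endian size followed by a 4-byte type. A minimal sketch of reading the leading box, assuming the file starts with an ftyp box (the function name ftyp_major_brand is hypothetical):

```python
import struct
from typing import Optional


def ftyp_major_brand(header: bytes) -> Optional[str]:
    """Return the 4-character major brand, or None if there is no leading ftyp box."""
    if len(header) < 12:
        return None
    # Bytes 0-3: box size (big-endian), bytes 4-7: box type.
    _size, box_type = struct.unpack(">I4s", header[:8])
    if box_type != b"ftyp":
        return None
    # Bytes 8-11: major brand, for example b"isom", b"mp42", b"qt  " or b"heic".
    return header[8:12].decode("ascii", errors="replace")
```

Pass it the first 12 or more bytes of the file. The brand helps separate MP4, QuickTime, and HEIC files, which all share this box layout.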
Tools to read magic numbers:
- file (Unix/macOS)
- hexdump / xxd
- Binary/hex viewers (HxD on Windows, 0xED on macOS)
- Online hex viewers (use cautiously; don’t upload sensitive files)
Example (Linux/macOS):
file example.jpg
This prints a one-line description of the detected format, often with details such as the format version or image dimensions.
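For a sense of what signature-based tools do internally, here is a minimal sketch that matches the leading bytes against the signatures listed above. It covers only those four entries; the table and function names are illustrative, and real tools such as file use far larger databases.

```python
from typing import Optional

# Signatures taken from the list of common examples in this section.
SIGNATURES = [
    (b"%PDF", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container (ZIP, DOCX, EPUB, ...)"),
]


def sniff_bytes(head: bytes) -> Optional[str]:
    """Return a label for the first matching signature, or None if nothing matches."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None


def sniff(path: str) -> Optional[str]:
    """Read the first 16 bytes of a file and match them against SIGNATURES."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(16))
```

A None result only means the format is outside this small table, not that the file is invalid.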
Use dedicated identification tools and services
- TrID: a signature-based file identifier with an extensive signature database. Works offline, supports batch checks, and is available for multiple platforms.
- DROID (Digital Record Object Identification): developed by The National Archives (UK) for large-scale archival identification using PRONOM signatures.
- Apache Tika: Java toolkit that detects file types by combining magic-number checks with filename patterns and container inspection; useful in apps and servers.
- Online detectors: websites that analyze uploaded files and return format information. Only use for non-sensitive files.
Check metadata and internal structure
Some formats include clear metadata structures you can inspect:
- Office files (DOCX, XLSX, PPTX) are ZIP archives containing XML — open with an unzip tool to inspect contents.
- Image formats may include EXIF metadata with camera model, timestamp, and orientation.
- Audio/video containers (MKV, MP4) contain headers with codec and stream info — tools like MediaInfo reveal this.
Commands/tools:
- unzip -l file.docx
- mediainfo file.mp4
- exiftool image.heic
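Because DOCX, XLSX, PPTX, and EPUB all start with the same ZIP signature, telling them apart means looking inside the archive. The sketch below uses Python's zipfile module and checks for each format's main part; the function name classify_zip is hypothetical, and only these four formats are covered.

```python
import zipfile

# Main document part that each Office Open XML format stores inside its ZIP.
MARKERS = {
    "word/document.xml": "DOCX",
    "xl/workbook.xml": "XLSX",
    "ppt/presentation.xml": "PPTX",
}


def classify_zip(source) -> str:
    """Classify a ZIP-based file given a path or a binary file object."""
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        for member, label in MARKERS.items():
            if member in names:
                return label
        # EPUB stores its media type in a member literally named "mimetype".
        if "mimetype" in names:
            if archive.read("mimetype").strip() == b"application/epub+zip":
                return "EPUB"
    return "ZIP"
```

zipfile.ZipFile raises zipfile.BadZipFile when the input is not a ZIP archive at all, so callers may want to catch that.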
Handling files with no or ambiguous extensions
- Use file (Unix) or TrID to detect format from content.
- Try opening with a universal viewer (like VLC for multimedia, or LibreOffice for documents).
- For unknown archives, attempt common archive tools (unzip, 7-Zip, tar) — many formats are just wrappers.
- If you suspect an executable or malware, don’t open it; scan with antivirus in an isolated environment.
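Probing an unknown file against common archive formats can also be scripted. This minimal sketch relies on Python's standard zipfile and tarfile modules, so it covers ZIP and tar (including gzip, bzip2, and xz compressed tar) but not formats such as 7z or RAR; the function name probe_archive is hypothetical.

```python
import tarfile
import zipfile
from typing import Optional


def probe_archive(path: str) -> Optional[str]:
    """Return "zip" or "tar" if the file opens as that archive type, else None."""
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):  # also accepts compressed tar archives
        return "tar"
    return None
```

Both checks only read the file and never extract it, which keeps the probe safe to run on untrusted input.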
Batch and automated identification
For many files, manual checks are impractical. Use:
- TrID’s batch mode or DROID for large collections.
- Write simple scripts around file or TrID to generate CSV reports (filename, detected type, confidence).
- For servers, integrate Apache Tika or libmagic bindings to automatically tag uploaded files.
Example (bash):
for f in *; do printf '%s\t%s\n' "$f" "$(file -b --mime-type "$f")"; done
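On systems without the file utility, a similar report can be produced in pure Python. The sketch below writes a CSV that puts the type implied by the extension next to the type detected from content, so mismatches stand out. The signature table is deliberately tiny, and the column and function names are assumptions made for this example.

```python
import csv
import mimetypes
from pathlib import Path

# Deliberately small table; extend it for real use.
SIGNATURES = [
    (b"%PDF", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
]


def detect(path: Path) -> str:
    """Return a MIME type based on the file's leading bytes, or "unknown"."""
    with path.open("rb") as handle:
        head = handle.read(8)
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return "unknown"


def write_report(folder, out) -> None:
    """Write one CSV row per regular file in `folder` to the text stream `out`."""
    writer = csv.writer(out)
    writer.writerow(["filename", "claimed_by_extension", "detected_by_content"])
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            claimed, _encoding = mimetypes.guess_type(path.name)
            writer.writerow([path.name, claimed or "unknown", detect(path)])
```

Calling write_report(".", sys.stdout) after importing sys prints the report for the current directory. Rows where the two type columns disagree deserve a closer look, although ZIP-based formats such as DOCX will legitimately show application/zip in the detected column.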
Converting once identified
After identification, choose appropriate conversion tools:
- Images: ImageMagick, ffmpeg, or dedicated converters (heic-to-jpeg).
- Audio/video: ffmpeg (very versatile).
- Documents: LibreOffice’s soffice --headless, pandoc, or cloud converters.
Always verify converted output for fidelity.
Security tips
- Never trust an extension alone. For executables (.exe, .dll) and scripts, also verify the digital signature where one exists.
- Scan unknown files with up-to-date antivirus or use sandboxed VMs.
- Avoid uploading sensitive files to unknown online detectors.
- When receiving attachments, prefer verified file formats (PDF/A for documents, standardized image codecs).
Quick reference: common extensions and what they usually mean
- .pdf — Portable Document Format (document)
- .docx — Microsoft Word (Open XML document, actually a ZIP of XMLs)
- .xlsx — Microsoft Excel (Open XML spreadsheet)
- .jpg / .jpeg — JPEG image
- .png — PNG image
- .heic — HEIF image (Apple/modern phones)
- .mp3 — MPEG audio
- .mp4 — MPEG-4 container (video/audio)
- .mkv — Matroska container (video/audio)
- .zip — ZIP archive
- .tar.gz / .tgz — gzipped tar archive
Summary checklist to quickly identify any file
- Show extensions in your OS and view the filename.
- Run file (or equivalent) to read magic numbers.
- Use dedicated tools: TrID, DROID, Apache Tika, or MediaInfo.
- Inspect internal structure (unzip, exiftool, mediainfo).
- For bulk tasks, script file/TrID and produce reports.
- When in doubt, scan and open in a safe, sandboxed environment.