Inside MS-SHLLINK: a field-by-field tour of the .lnk binary format

The Windows Shell Link format is one of those rare cases where what you see in Explorer is fully and publicly specified. Microsoft documents it as [MS-SHLLINK]: Shell Link (.LNK) Binary File Format, and a conforming parser follows it byte for byte. Below is the practical tour, in the order a parser actually reads the file. I will flag the places parsers commonly get wrong and the fields a DFIR analyst should care about.

1. ShellLinkHeader (76 bytes)

The fixed-size header every .lnk opens with.

  • HeaderSize — always 0x0000004C (76). The four-byte little-endian 4C 00 00 00 is the most reliable Shell Link sniff test. Anything else is not a .lnk, whatever the extension says.
  • LinkCLSID — 00021401-0000-0000-C000-000000000046, the COM class ID for ShellLink objects. The first three GUID fields are stored little-endian per the COM convention, so the on-disk bytes (01 14 02 00 00 00 00 00 C0 00 00 00 00 00 00 46) are not in the order the human-readable form suggests.
  • LinkFlags — 32 bits that decide which optional sections follow. Bits worth knowing on sight: HasLinkTargetIDList, HasLinkInfo, HasName, HasRelativePath, HasWorkingDir, HasArguments, HasIconLocation, IsUnicode, ForceNoLinkInfo, HasExpString, RunInSeparateProcess, HasExpIcon.
  • FileAttributes — 32 bits mirroring the target's NTFS attributes (READONLY, HIDDEN, SYSTEM, DIRECTORY, ARCHIVE).
  • CreationTime / AccessTime / WriteTime — three 64-bit FILETIME values, counted in 100-ns ticks since 1601-01-01 UTC. They are snapshots of the target's timestamps taken when the link was written, and they do not track the target afterwards. A mismatch between the header FILETIMEs and the target's current metadata shows that the target changed after the link was written; the header values preserve the target's state at that moment, which is frequently the more useful timestamp.
  • FileSize, IconIndex, ShowCommand, HotKey, plus reserved bytes that must be zero.

ShowCommand = 7 (SW_SHOWMINNOACTIVE, start minimized) on a .lnk that points at cmd.exe is a strong malware indicator. Worth flagging early.
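A minimal sketch of the header read, assuming the whole file is already in memory as bytes. The names parse_header, filetime_to_datetime and FLAG_BITS are mine, not from the spec; the bit positions are taken from the LinkFlags table in MS-SHLLINK and cover only the flags named above.

```python
import struct
import uuid
from datetime import datetime, timedelta, timezone

# HeaderSize, LinkCLSID, LinkFlags, FileAttributes, three FILETIMEs, FileSize,
# IconIndex (signed), ShowCommand, HotKey, Reserved1..3 = 76 bytes in total.
HEADER = struct.Struct("<I16sIIQQQIiIHHII")
LINK_CLSID = uuid.UUID("00021401-0000-0000-C000-000000000046")

# Bit position -> flag name (only the flags discussed in this post).
FLAG_BITS = {
    0: "HasLinkTargetIDList", 1: "HasLinkInfo", 2: "HasName",
    3: "HasRelativePath", 4: "HasWorkingDir", 5: "HasArguments",
    6: "HasIconLocation", 7: "IsUnicode", 8: "ForceNoLinkInfo",
    9: "HasExpString", 10: "RunInSeparateProcess", 14: "HasExpIcon",
}


def filetime_to_datetime(ticks):
    """Convert 100-ns ticks since 1601-01-01 UTC; zero is treated as 'not set'."""
    if ticks == 0:
        return None
    try:
        epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
        return epoch + timedelta(microseconds=ticks // 10)
    except OverflowError:  # garbage value beyond year 9999
        return None


def parse_header(data):
    """Validate the sniff fields and return the header as a dict."""
    if len(data) < HEADER.size:
        raise ValueError("shorter than a ShellLinkHeader")
    (size, clsid, flags, attrs, ctime, atime, wtime,
     fsize, icon, show, hotkey, _r1, _r2, _r3) = HEADER.unpack_from(data, 0)
    if size != 0x4C:
        raise ValueError("HeaderSize is not 0x4C")
    # bytes_le undoes the little-endian shuffle of the first three GUID fields.
    if uuid.UUID(bytes_le=clsid) != LINK_CLSID:
        raise ValueError("LinkCLSID mismatch")
    return {
        "flags_raw": flags,
        "flags": [name for bit, name in FLAG_BITS.items() if (flags >> bit) & 1],
        "file_attributes": attrs,
        "creation_time": filetime_to_datetime(ctime),
        "access_time": filetime_to_datetime(atime),
        "write_time": filetime_to_datetime(wtime),
        "file_size": fsize,
        "icon_index": icon,
        "show_command": show,
        "hotkey": hotkey,
    }
```

Usage is `parse_header(open(path, "rb").read())`; the returned flags_raw value is what the later sections would be gated on.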

2. LinkTargetIDList (optional)

Present iff HasLinkTargetIDList. A length-prefixed sequence of ItemID records that walks from a root shell folder (Desktop, My Computer, Network) through each shell namespace step to the target. Each ItemID has its own length prefix; the list ends with a zero-length terminator.

The exact encoding of each ItemID depends on the kind of shell item — drive, filesystem entry, control panel applet, network share — and is partly opaque to the format spec. Eric Zimmerman's LECmd and libyal's liblnk both implement the common shell item types; lnkparse3 covers fewer. When the IDList walk decodes cleanly but LinkInfo is missing or ForceNoLinkInfo is set, the IDList is your only path to the target, and a partial decode can still produce a usable path string.
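The outer framing can be walked without understanding any individual shell item. A rough sketch, assuming the IDList starts right after the 76-byte header and that each ItemIDSize counts its own two bytes; split_idlist is a hypothetical helper name, and decoding the item payloads is left to the reference parsers.

```python
import struct


def split_idlist(data, offset=76):
    """Return (list of raw ItemID payloads, offset of the next section)."""
    (idlist_size,) = struct.unpack_from("<H", data, offset)
    start = offset + 2
    end = start + idlist_size
    if end > len(data):
        raise ValueError("IDListSize runs past end of file")

    items = []
    pos = start
    while pos + 2 <= end:
        (item_size,) = struct.unpack_from("<H", data, pos)
        if item_size == 0:  # zero-length terminator
            break
        if item_size < 2 or pos + item_size > end:
            raise ValueError("ItemID overruns the IDList")
        items.append(data[pos + 2 : pos + item_size])  # payload without prefix
        pos += item_size
    # Trust the outer length for the next section, not where the walk stopped.
    return items, end
```

Returning the end offset from the outer length prefix means a malformed item does not shift the position of whatever section follows.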

3. LinkInfo (optional)

Present iff HasLinkInfo and ForceNoLinkInfo is clear. Carries the information needed to resolve the target when the IDList walk fails or is absent.

  • VolumeID — drive type, serial number, volume label. The serial is your removable-media correlation key.
  • LocalBasePath — the absolute path on the originating machine. Usually contains the username, which is a soft attribution signal.
  • CommonNetworkRelativeLink — UNC path and network provider type for shortcuts that lived on a network share.
  • CommonPathSuffix — the tail past the LocalBasePath or NetName.

Unicode variants of LocalBasePath and CommonPathSuffix exist alongside the ANSI ones. They are present when LinkInfoHeaderSize is 0x24 or greater, which adds two extra Unicode offset fields to the LinkInfo header. A parser that only reads the ANSI variants will misrepresent non-ASCII paths.
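A sketch of a LinkInfo reader that prefers the Unicode strings when the header is large enough to carry their offsets. Assumptions to note: parse_link_info and its helpers are hypothetical names, all offsets are relative to the start of LinkInfo, and cp1252 stands in for the originating machine's ANSI code page, which the file does not record.

```python
import struct


def _cstr(buf, off, codepage):
    """Read a NUL-terminated ANSI string."""
    end = buf.index(b"\x00", off)
    return buf[off:end].decode(codepage, errors="replace")


def _wstr(buf, off):
    """Read a NUL-terminated UTF-16LE string."""
    end = off
    while end + 2 <= len(buf) and buf[end : end + 2] != b"\x00\x00":
        end += 2
    return buf[off:end].decode("utf-16-le", errors="replace")


def parse_link_info(data, offset, codepage="cp1252"):
    (size, hdr_size, flags, vol_off, base_off,
     _net_off, suffix_off) = struct.unpack_from("<7I", data, offset)
    info = data[offset : offset + size]
    if len(info) != size:
        raise ValueError("LinkInfoSize runs past end of file")

    # The two Unicode offsets exist only when the header is 0x24 bytes or more.
    base_off_u = suffix_off_u = 0
    if hdr_size >= 0x24:
        base_off_u, suffix_off_u = struct.unpack_from("<2I", info, 0x1C)

    out = {"next_offset": offset + size}
    if flags & 0x1:  # VolumeIDAndLocalBasePath
        _vsize, drive_type, serial, _label_off = struct.unpack_from("<4I", info, vol_off)
        out["drive_type"] = drive_type
        out["drive_serial"] = f"{serial:08X}"
        out["local_base_path"] = (
            _wstr(info, base_off_u) if base_off_u else _cstr(info, base_off, codepage)
        )
    out["common_path_suffix"] = (
        _wstr(info, suffix_off_u) if suffix_off_u else _cstr(info, suffix_off, codepage)
    )
    return out
```

The network branch (CommonNetworkRelativeLink) is left out to keep the sketch short; it would be gated on bit 1 of the same flags field.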

4. StringData (optional)

A sequence of length-prefixed strings, present in this order if the corresponding LinkFlag is set:

NAME_STRING → RELATIVE_PATH → WORKING_DIR → COMMAND_LINE_ARGUMENTS → ICON_LOCATION

Each string is length-prefixed (16-bit little-endian, in characters, not bytes) and encoded as UTF-16LE if IsUnicode is set, otherwise as the system ANSI code page. Order matters absolutely. A parser that does not track which LinkFlags were set will misread every subsequent string. This is the single most common bug in homegrown LNK parsers.

For DFIR: COMMAND_LINE_ARGUMENTS is where the phishing payload arguments live. ICON_LOCATION is where the spoofed icon path lives. WORKING_DIR often reveals USB drive letters and staging folders.
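The flag-driven read order can be sketched as a single loop. This is an illustration rather than a reference implementation: read_string_data is a hypothetical name, the masks are the LinkFlags bits for HasName through HasIconLocation, and cp1252 is only a stand-in for the unknown system ANSI code page.

```python
import struct

# (LinkFlags mask, StringData field) in the order the fields appear on disk.
STRING_ORDER = [
    (0x04, "NAME_STRING"),             # HasName
    (0x08, "RELATIVE_PATH"),           # HasRelativePath
    (0x10, "WORKING_DIR"),             # HasWorkingDir
    (0x20, "COMMAND_LINE_ARGUMENTS"),  # HasArguments
    (0x40, "ICON_LOCATION"),           # HasIconLocation
]
IS_UNICODE = 0x80


def read_string_data(data, offset, link_flags, ansi_codepage="cp1252"):
    """Return ({field: text}, offset of the first ExtraData block)."""
    is_unicode = bool(link_flags & IS_UNICODE)
    out = {}
    for mask, name in STRING_ORDER:
        if not link_flags & mask:
            continue  # absent fields occupy no bytes at all
        (count,) = struct.unpack_from("<H", data, offset)
        offset += 2
        # The prefix counts characters, so UTF-16LE needs twice as many bytes.
        nbytes = count * 2 if is_unicode else count
        raw = data[offset : offset + nbytes]
        if len(raw) != nbytes:
            raise ValueError(f"{name} runs past end of file")
        out[name] = raw.decode("utf-16-le" if is_unicode else ansi_codepage,
                               errors="replace")
        offset += nbytes
    return out, offset
```

Skipping a field whose flag is clear, rather than reading an empty string for it, is the detail that keeps every later field aligned.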

5. ExtraData blocks (optional, repeating)

Zero or more ExtraData blocks, each (size, signature, payload). The signature picks the block type:

Signature    Block
0xA0000001   EnvironmentVariableDataBlock
0xA0000002   ConsoleDataBlock
0xA0000003   TrackerDataBlock
0xA0000004   ConsoleFEDataBlock
0xA0000005   SpecialFolderDataBlock
0xA0000006   DarwinDataBlock
0xA0000007   IconEnvironmentDataBlock
0xA0000008   ShimDataBlock
0xA0000009   PropertyStoreDataBlock
0xA000000B   KnownFolderDataBlock
0xA000000C   VistaAndAboveIDListDataBlock

The list ends with a 4-byte TerminalBlock (size < 0x4). Iterate, route on the signature, decode the payload accordingly.
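The iterate-and-route loop might look like the following sketch. It assumes BlockSize counts the size and signature fields themselves, so the payload is everything after the first eight bytes; iter_extra_data is a hypothetical name, and unknown signatures are reported rather than rejected.

```python
import struct

EXTRA_BLOCKS = {
    0xA0000001: "EnvironmentVariableDataBlock",
    0xA0000002: "ConsoleDataBlock",
    0xA0000003: "TrackerDataBlock",
    0xA0000004: "ConsoleFEDataBlock",
    0xA0000005: "SpecialFolderDataBlock",
    0xA0000006: "DarwinDataBlock",
    0xA0000007: "IconEnvironmentDataBlock",
    0xA0000008: "ShimDataBlock",
    0xA0000009: "PropertyStoreDataBlock",
    0xA000000B: "KnownFolderDataBlock",
    0xA000000C: "VistaAndAboveIDListDataBlock",
}


def iter_extra_data(data, offset):
    """Yield (block name, payload bytes) until the TerminalBlock."""
    while offset + 4 <= len(data):
        (size,) = struct.unpack_from("<I", data, offset)
        if size < 4:  # TerminalBlock: a size value below 0x4 ends the list
            return
        if size < 8 or offset + size > len(data):
            raise ValueError(f"bad ExtraData BlockSize {size:#x}")
        (signature,) = struct.unpack_from("<I", data, offset + 4)
        name = EXTRA_BLOCKS.get(signature, f"Unknown(0x{signature:08X})")
        yield name, data[offset + 8 : offset + size]
        offset += size
```

A caller would dispatch on the yielded name, for example `for name, payload in iter_extra_data(data, pos): ...`, and hand each payload to a block-specific decoder.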

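The famous one in forensics is TrackerDataBlock. It records the originating machine's NetBIOS name and the Distributed Link Tracking droid: a pair of GUIDs (a volume identifier and a file object identifier), stored twice as Droid and DroidBirth. The file object identifier is typically a v1 UUID, whose last six bytes are the MAC address of the machine that generated it. That is where attribution wins come from.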
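A sketch of pulling the machine name and MAC out of a TrackerDataBlock payload, meaning the 88 bytes that follow BlockSize and BlockSignature. parse_tracker_payload is a hypothetical name, and the field offsets assume the usual layout of Length, Version, a 16-byte MachineID, then Droid and DroidBirth as two GUIDs each.

```python
import struct
import uuid


def parse_tracker_payload(payload):
    """Decode MachineID and the four droid GUIDs; extract a MAC if present."""
    if len(payload) < 88:
        raise ValueError("TrackerDataBlock payload too short")
    length, version = struct.unpack_from("<II", payload, 0)
    machine_id = payload[8:24].split(b"\x00", 1)[0].decode("ascii", errors="replace")

    # GUIDs are stored with the COM little-endian shuffle, hence bytes_le.
    volume, file_obj, birth_volume, birth_file = (
        uuid.UUID(bytes_le=payload[o : o + 16]) for o in (24, 40, 56, 72)
    )
    out = {
        "length": length,
        "version": version,
        "machine_id": machine_id,
        "droid_volume": str(volume),
        "droid_file": str(file_obj),
        "birth_droid_volume": str(birth_volume),
        "birth_droid_file": str(birth_file),
        "mac": None,
    }
    # Only a v1 UUID carries a node field; other versions hold no MAC.
    # Caveat: a node with the multicast bit set was randomly generated.
    if file_obj.version == 1:
        node = f"{file_obj.node:012x}"
        out["mac"] = ":".join(node[i : i + 2] for i in range(0, 12, 2))
    return out
```

Comparing droid_file with birth_droid_file is a cheap extra check: when they differ, the target was likely moved or copied after the link was first created.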

PropertyStoreDataBlock is a serialized property store with arbitrary GUID-keyed values. Sometimes carries an authoritative path that the LinkInfo does not. EnvironmentVariableDataBlock resolves at runtime — a literal %TEMP% is benign noise; a literal %TEMP%\evil.dll is not. DarwinDataBlock is the MSI installer descriptor; it appears in legitimate shortcuts to MSI-installed applications and looks weird if you have never seen it before.

Putting it together

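A parser is a state machine driven by LinkFlags. Each flag turns a downstream section on or off, and the order is non-negotiable. The sections before ExtraData carry no type tags of their own, only lengths, so a single wrong flag bit shifts the read position and corrupts every subsequent section length. The spec is also strict about reserved-must-be-zero bytes, which makes them a cheap sanity check for exactly that kind of corruption.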

Reference parsers worth diff-testing your output against: Eric Zimmerman's LECmd, libyal's liblnk, lnkparse3, and the Windows-LNK-Parsing-Library. When two of them agree and a third disagrees, the spec wins.

To see all of this on a real file, drop one into the parser on the home page — every field above is rendered explicitly.
