polprog, 15.02.2022
RomView is a set of Python 3 tools that let you look at binary files as images. Example uses include examining ROM contents, media images and bitmap files, as well as software reverse engineering and data preservation. RomView can be used for both qualitative (entropy.py, textloc.py) and quantitative (romimage.py) analysis.
These scripts are a must-have for anyone doing data preservation of old media or EPROMs, or binary reverse engineering.
romimage.py v1.0 - plots a binary file as a 1bpp bitmap to show data patterns like fonts and bitmaps
entropy.py v1.3 - plots a heat map of a file's entropy, sector by sector. Good for qualitative analysis of file contents
textloc.py v1.0 - graphically identifies the location of printable strings in a file
Dependencies: matplotlib, numpy (argparse is also used, but it ships with the Python 3 standard library)
pip install matplotlib numpy
Run with -h for a summary of options and usage
Jump to section: romimage - entropy - textloc
romimage can show bitmap data inside files. It can also be used to quickly verify ROM dumps and image file copies.
For example, here is an image of an EGA card font ROM (left):
The right image shows a different video card's ROM. You can see that a large part at the end is unused (filled with 1s).
romimage also supports wider word sizes. Below are two screenshots of a video card driver floppy image (showing roughly the same area). The image contains two bitmap fonts next to each other - one font is 8 bits wide (left), the other 16 (right). You can set the width (in bytes) with the -w WIDTH option. Notice how the 16-bit font looks when viewed in 8-bit mode.
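The effect of the width setting can be reproduced with a few lines of Python. This is only a minimal sketch, not romimage's actual code: the function names `bytes_to_bitmap` and `render_ascii` are made up for illustration, and the bit order (most significant bit as the leftmost pixel) is an assumption.

```python
def bytes_to_bitmap(data, width=1):
    """Return rows of 0/1 pixels, `width` bytes (8 * width pixels) per row."""
    if width < 1:
        raise ValueError("width must be at least 1 byte")
    rows = []
    for offset in range(0, len(data), width):
        # Pad a short final row with zero bytes so every row has equal length.
        chunk = data[offset:offset + width].ljust(width, b"\x00")
        # Assumption: bit 7 of each byte is the leftmost pixel (MSB first).
        rows.append([(byte >> (7 - bit)) & 1 for byte in chunk for bit in range(8)])
    return rows


def render_ascii(rows):
    """Draw set bits as '#' and clear bits as '.', one text line per row."""
    return "\n".join("".join("#" if px else "." for px in row) for row in rows)


# Hypothetical data: two rows of a 16-pixel-wide glyph, two bytes per row.
WIDE_ROWS = bytes([0xFF, 0x00, 0x00, 0xFF])

print(render_ascii(bytes_to_bitmap(WIDE_ROWS, width=2)))  # rows kept intact
print()
print(render_ascii(bytes_to_bitmap(WIDE_ROWS, width=1)))  # halves interleaved
```

With a width of 2 bytes each glyph row stays on one line. With a width of 1 byte the left and right halves of every row end up stacked on top of each other, which is why a 16-bit font looks scrambled in 8-bit mode.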
romimage automatically picks the number of slices (heatmap columns) to make a square image. You can change that behavior with the -l SLICESB option, which sets the number of columns.
Use the -g option to enable a grid showing where the bytes end. In most cases it's not necessary, since most data aligns meaningfully with byte boundaries, as you can see in the previous examples.
The entropy.py script allows for qualitative judgement of a file's contents. It shows a heatmap of each sector's entropy. Originally designed to look at floppy disk images, it uses 18 sectors per track to show one track as one heatmap column, and calculates the entropy in 512-byte sectors. Both parameters can be changed from the defaults with command line options.
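The layout described above (one track per heatmap column) can be sketched as follows. This is an illustration under stated assumptions, not entropy.py's own code: `split_sectors` and `sector_grid` are hypothetical helper names, and placing the first sector of each track in the top row is a guess.

```python
def split_sectors(data, sector_size=512):
    """Cut the file contents into consecutive sector-sized blocks."""
    return [data[i:i + sector_size] for i in range(0, len(data), sector_size)]


def sector_grid(values, sectors_per_track=18, fill=None):
    """Arrange per-sector values so that grid[row][col] is sector `row` of track `col`."""
    tracks = -(-len(values) // sectors_per_track)  # ceiling division
    # Pad the last, incomplete track so every column has the same height.
    padded = list(values) + [fill] * (tracks * sectors_per_track - len(values))
    return [
        [padded[track * sectors_per_track + sector] for track in range(tracks)]
        for sector in range(sectors_per_track)
    ]


# A 1.44 MB floppy image holds 1,474,560 bytes: 2880 sectors of 512 bytes.
image = bytes(1474560)
sectors = split_sectors(image)
grid = sector_grid(range(len(sectors)))
print(len(sectors), "sectors ->", len(grid), "rows x", len(grid[0]), "columns")
```

Any per-sector value (such as an entropy figure) can be put through `sector_grid` in place of the sector numbers used here. For plotting, passing `fill=float("nan")` is probably more convenient than `None`.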
The entropy of a sector measures how many different byte values it contains and how evenly they are spread (to be precise, it is the sum of -p * log2 p over all byte values present, where p is the relative frequency with which a byte value appears in the sector). The entropy value is expressed in bits per byte. Different kinds of data have different entropy. Empty regions (filled with a uniform byte, like 00 or FF) have zero or near-zero entropy - there's only one value in the entire sector. Text files have a noticeably higher entropy, as they use at most about a hundred unique values - the 95 printable ASCII characters plus a few control characters such as newline. The highest entropy is found in files that contain many unique byte values in roughly equal proportions. This is typical for compressed, encrypted or random data.
** Ballpark entropy values (bits per byte) **
Value         Data type
-----------------------
0             Uniform byte fill (empty sector)
5 - 6         ASCII Text
6 - 7         Machine code
close to 8    Compressed or encrypted data
8             Maximum theoretical value (random data)
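The two ends of this scale are easy to check with the standard Shannon entropy formula. The snippet below is a minimal sketch using only the Python standard library. The function name `shannon_entropy` is chosen for illustration and the code is not taken from entropy.py.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Shannon entropy of a block of bytes, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)  # how many times each byte value occurs
    # Sum of p * log2(1 / p) over the byte values that are present.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


empty_sector = bytes(512)               # uniform fill with 00
even_sector = bytes(range(256)) * 2     # every byte value exactly twice
text_sector = b"Insert the next disk and press any key to continue. " * 8

for name, block in [("empty", empty_sector),
                    ("evenly spread", even_sector),
                    ("text", text_sector)]:
    print(f"{name:14s} {shannon_entropy(block):.2f} bits per byte")
```

Note that a 512-byte sector reaches the maximum of 8 only if each of the 256 byte values appears exactly twice, so real compressed or encrypted sectors land slightly below it.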
Below you can see the annotated entropy chart of the Microsoft Office 95 install disk along with the contents of the disk.
Unusual entropy patterns can indicate that a disk has bad sectors or otherwise corrupted data.
The entropy script can also be used to look at files other than floppy images. The image below, of an x86 Single Board Computer ROM, was created by passing the -s 32 option to make the heatmap 32 sectors high. You can also pass -z to display files whose size is not a multiple of the block size, and -c BYTES to change the default block size.
You can clearly see that this ROM contains a large block of compressed data and then some machine code at the end.
Here's an image of a ROM chip's contents, read as if the ROM were twice its actual size. Notice where the data ends and how the entropy reflects that.
Textloc is a tool that highlights which parts of a binary file contain ASCII text. The "printable character density heatmap" shows which sectors contain large amounts of printable text. These areas usually contain consolidated messages stored in the data section, and it is these messages that help you understand what a program does. Additionally, some architectures' machine code has entropy similar to that of text - in these cases textloc can help differentiate between data and code sections.
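A printable character density of this kind could be computed roughly as shown below. This is a sketch under assumptions rather than textloc's real logic: the function names are hypothetical, and "printable" is taken to mean the ASCII range 0x20 to 0x7E, which may differ from what textloc counts (for instance, it may also include tabs and newlines).

```python
def printable_density(block):
    """Fraction of bytes in `block` that are printable ASCII (0x20 to 0x7E)."""
    if not block:
        return 0.0
    printable = sum(1 for byte in block if 0x20 <= byte <= 0x7E)
    return printable / len(block)


def density_per_sector(data, sector_size=512):
    """Printable density of each consecutive sector-sized block."""
    return [
        printable_density(data[i:i + sector_size])
        for i in range(0, len(data), sector_size)
    ]


# Hypothetical file: an empty sector, a sector of text, a sector of mixed bytes.
sample = bytes(512) + b"A" * 512 + bytes(range(256)) * 2
print(density_per_sector(sample))
```

Sectors that score close to 1.0 are good candidates for message tables, while evenly mixed binary data stays well below that because only 95 of the 256 byte values count as printable here.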
Graphical representation is a powerful tool which allows you to quickly analyze large amounts of data. Some information is lost - for example, you cannot read the contents of a text file by looking at its bitmap image. But you can tell where a repetitive pattern is in such a file far quicker than by scrolling in a text editor.
In a similar manner, looking at a ROM hexdump will tell you what kind of program is stored there (if there are any strings). But by looking at an entropy map you will be able to spot regions where there is machine code, strings or other data within several seconds.
A floppy disk image with a corrupted partition table is not readable using standard OS tools, but an entropy map will show if, and where, there is data.
Textloc will indicate exactly where a binary contains large blobs of ASCII text, which can then be examined with a conventional text editor. Save yourself some scrolling! Combined with entropy.py, it gives a multidimensional insight into the binary's structure.
RomView can be a good addition to your reverse engineering toolkit!