PLB Data Check

PLB Data Check is designed to scan a PLB data file and determine the individual data record lengths within it. Older MS-DOS command-line programs such as pcDataCheck and fileStat had become valuable tools within our collection of programmer utilities. However, they were not compatible with 64-bit operating systems. PLB Data Check was designed to replace the basic functionality of those utilities, while using the benefits of a Windows-style program to make that functionality easier to use and to provide additional information.


PLB Data Check - Data Record Length Information

PLB Data Check - Visual Data Comparison

PLB Data Check Preferences - File Scanning

PLB Data Check Preferences - Character Scanning

PLB Data Check Preferences - After Scanning and Output File

PLB Data Check Terminology

Open: Allows the user to choose which file (or files) will be opened and scanned by the PLB Data Check program. During the “Scan” process the file is opened in read-only mode, so no changes are made to it. If more than one file is selected, a new instance of PLB Data Check is started for each additional file selected. Up to 99 instances of PLB Data Check can run at the same time.

Output: Allows the user to specify the name of a file that will be created during the scanning process. The file will contain all data records that match the selection method. Creating this output file is optional; default settings can be established in the preference settings.

Scan: Starts the process of reading the “Data File” and determining record lengths and other file statistics. See “PLB Data Check Results” and “PLB Data Check File Stats” for further details on the information gathered during the scanning process. Note: There is a preference setting that automatically starts the scanning process right after the user selects the file to scan.

Preferences: Allows the user to establish default preferences that control how some of the features within PLB Data Check operate. Preferences for the current instance of PLB Data Check can be changed using the icon tray at the bottom right of the screen. These changes do not affect the defaults. Defaults can only be changed by opening the preference window and saving your changes.

Help: Displays a dropdown menu that will allow the user to view help contents, navigate to our website, or see the PLB Data Check about screen.

Exit: Allows the user to exit or close this particular instance of the PLB Data Check program without exiting the PLB Utility Suite.

Data File to Scan: The name of the file (or files) to be scanned. The user can enter the file name manually or find the file using the standard Windows open dialog by clicking on the “Open” button.

Create Output: During the scanning process, data records matching certain selection criteria can be copied to a new file.  Four different methods are currently supported:

  • Copy records with a length other than…: If this method is selected when the “Scan” process is started, the PLB Data Check program will copy any data record found in the “Data File” that has a physical record length other than the specified record length, to the “Output” file name.  When this option is checked the user is required to enter a record length.
  • Copy records with NUL or TAB characters: If this method is selected when the “Scan” process is started, the PLB Data Check program will copy any data record found in the “Data File” that contains either a NUL (Hex 00) or a TAB (Hex 09) character, to the “Output” file name.
  • Copy records with ASCII control characters: If this method is selected when the “Scan” process is started, the PLB Data Check program will copy any data record found in the “Data File” that contains “ASCII control characters”, to the “Output” file name.
  • Copy records with ASCII control or high characters: If this method is selected when the “Scan” process is started, the PLB Data Check program will copy any data record found in the “Data File” that contains either “ASCII control characters” or “High ASCII characters”, to the “Output” file name.
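The four selection methods can be pictured as simple tests applied to each record. The Python sketch below is illustrative only: the method names and the `select_records` helper are hypothetical, records are assumed to be already split and stripped of their end of record characters, and the actual program may implement the selection differently.

```python
from typing import Callable, List, Optional


def is_control(value: int) -> bool:
    """ASCII control characters: decimal 0 to 31 and 127."""
    return value <= 31 or value == 127


def is_high(value: int) -> bool:
    """High ASCII characters: decimal 128 to 255."""
    return value >= 128


def make_selector(method: str, record_length: Optional[int] = None) -> Callable[[bytes], bool]:
    """Return a test for one of the four (hypothetically named) methods."""
    if method == "length_other_than":
        if record_length is None:
            raise ValueError("this method requires a record length")
        return lambda rec: len(rec) != record_length
    if method == "nul_or_tab":
        return lambda rec: b"\x00" in rec or b"\x09" in rec
    if method == "control":
        return lambda rec: any(is_control(b) for b in rec)
    if method == "control_or_high":
        return lambda rec: any(is_control(b) or is_high(b) for b in rec)
    raise ValueError(f"unknown method: {method}")


def select_records(records: List[bytes], method: str,
                   record_length: Optional[int] = None) -> List[bytes]:
    """Return the records that would be copied to the output file."""
    keep = make_selector(method, record_length)
    return [rec for rec in records if keep(rec)]
```

For example, `select_records(records, "length_other_than", 80)` would return every record whose length is not 80 bytes.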

Record Length: Data files contain records, and records are usually a specific length. Within the file data, records are separated by “End of Record” characters. The number of bytes of data between two “End of Record” characters determines the record length. The main purpose of the PLB Data Check program is to scan a file and determine the record lengths of the data within the file.

End of Record: Data files contain records, and data records are separated by special “ASCII Control Characters” called “End of Record” characters. The specific character(s) used to mark the end of a data record vary. On DOS/Windows operating systems the most common end of record is a Hex 0D (aka Carriage Return) followed by a Hex 0A (aka Line Feed). Other possible end of record sequences are a Hex 0A (aka Line Feed) followed by a Hex 0D, or just a single Hex 0D or Hex 0A character. The PLB Data Check program determines the end of record type by searching for the first occurrence of one of the four schemes mentioned above. If a data file uses an end of record scheme other than those four, that data file will not be accurately scanned.
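A minimal Python sketch of this kind of detection follows. It assumes the scheme is decided by the first Hex 0D or Hex 0A byte in the file together with the byte that follows it; the function name is hypothetical and the actual program's logic may differ.

```python
from typing import Optional


def detect_end_of_record(data: bytes) -> Optional[bytes]:
    """Return the first end of record sequence found, or None if there is none."""
    for i, value in enumerate(data):
        if value == 0x0D:
            # Carriage Return: paired with a following Line Feed, or alone.
            return b"\r\n" if data[i + 1:i + 2] == b"\n" else b"\r"
        if value == 0x0A:
            # Line Feed: paired with a following Carriage Return, or alone.
            return b"\n\r" if data[i + 1:i + 2] == b"\r" else b"\n"
    return None
```

Checking the following byte is what distinguishes the two-character schemes from the single-character ones; a file with no Hex 0D or Hex 0A at all yields `None`.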

ASCII Characters: Every byte of data within a data file holds one of 256 possible ASCII values (decimal 0 to 255). So while you may see an “A” character, that character has an ASCII decimal value (in the case of “A” the value is 65). Not all ASCII characters are printable: some have special meanings, some are the basic printable characters, and others are used for international text or line drawing. PLB Data Check classifies each ASCII character into one of three groups.

  • ASCII Control Characters: ASCII decimal values from 0 to 31 and 127. Data containing control characters can cause problems if the programs reading the data are not expecting them. Certain ASCII control characters such as decimal 9 (aka Hex 09, or TAB) or decimal 0 (aka Hex 00, or NUL) have specific meanings to PLB programs.
  • ASCII Printable Characters: ASCII decimal values from 32 to 126. These are the standard viewable ASCII characters, starting with the space character (decimal 32): ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
  • High ASCII Characters: ASCII decimal values from 128 to 255. Another good name for this group of ASCII characters is just “other”.  Characters in this range can vary depending on the associated language or font. Some fonts will not even have characters associated with this range of values.  €  ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ  Ž   ‘ ’ “ ” • – — ˜ ™ š › œ  ž Ÿ   ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
  • ASCII Character Usage: The PLB Data Check program can scan the data within the data file and report individual ASCII character usage. The user controls how detailed this scan is by changing the default settings in the preferences dialog, or the current settings in the task bar. The default, “Count (just) Null and Tab control characters”, scans only for 0 (aka Hex 00, or NUL) and 9 (aka Hex 09, or TAB) characters. The alternative, “Count all ASCII control characters”, scans for values 0 to 31 and 127. Because these two options overlap, selecting either one automatically disables the other. The remaining options enable scanning of the other ASCII ranges: “Count ASCII Printable characters” scans for values from 32 to 126, and “Count High ASCII characters” scans for values from 128 to 255. These options greatly affect file scanning performance: in general, the more detailed the ASCII character usage, the longer it takes to scan a “Data File”.
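The three groups and the usage options can be expressed compactly in code. The following Python sketch is an illustration under assumptions: the function names and the `control`, `printable`, and `high` parameters are hypothetical stand-ins for the preference options, not the program's actual settings.

```python
from collections import Counter
from typing import Optional


def classify(value: int) -> str:
    """Place a byte value (0 to 255) into one of the three groups."""
    if not 0 <= value <= 255:
        raise ValueError("byte value must be between 0 and 255")
    if value <= 31 or value == 127:
        return "control"
    if value <= 126:
        return "printable"
    return "high"


def character_usage(data: bytes, control: Optional[str] = "nul_tab",
                    printable: bool = False, high: bool = False) -> Counter:
    """Count occurrences of each byte value in the selected ranges.

    control="nul_tab" counts only NUL and TAB, control="all" counts every
    control character, and control=None counts no control characters.
    """
    counts: Counter = Counter()
    for value in data:
        group = classify(value)
        if group == "control":
            wanted = control == "all" or (control == "nul_tab" and value in (0x00, 0x09))
        elif group == "printable":
            wanted = printable
        else:
            wanted = high
        if wanted:
            counts[value] += 1
    return counts
```

Only values that actually occur appear in the result, which mirrors reporting nothing for characters that are not found.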

Data Record Length Information

Length: The data record length in bytes.  An individual data record is determined by counting the bytes of data between two end of record characters (or start of file and first end of record).

Quantity: The quantity of data records that share the same record length.  Every data record in a fixed length data file should have the same length.

Percentage: The quantity of data records divided by the total data records determines the reported percentage.

Last Record: Within a data file, records are numbered sequentially.  For the specified record length the last (sequential) record number matching that record length is reported.

Byte Offset: For the specified record length, the starting byte position of the last (sequential) data record within the data file that matches that record length.

Last Data (first 260 bytes): For the specified record length, the first 260 bytes of data from the last (sequential) record matching that record length are reported. The 260-byte limit is a display limitation of the listview object. If you need to see the entire record, see “Visual Data Comparison”, or choose “Copy records with a length other than…” to copy the records to a new file.
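The columns described in this section can be derived in a single pass over the file. The Python sketch below is a rough illustration, not the program's implementation: the names are hypothetical, the end of record sequence is assumed to be known, record numbers are assumed to start at 1 and byte offsets at 0, and every record (including zero-length ones) is assumed to count toward the percentage.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LengthRow:
    length: int                # Length
    quantity: int = 0          # Quantity
    percentage: float = 0.0    # Percentage
    last_record: int = 0       # Last Record (assumed 1-based)
    byte_offset: int = 0       # Byte Offset (assumed 0-based)
    last_data: bytes = b""     # Last Data (first 260 bytes)


def record_length_table(data: bytes, eor: bytes = b"\r\n") -> List[LengthRow]:
    """Build one row per unique record length found in the data."""
    rows: Dict[int, LengthRow] = {}
    offset = 0
    record_number = 0
    while offset < len(data):
        end = data.find(eor, offset)
        if end == -1:
            end = len(data)  # final record lacks an end of record marker
        record = data[offset:end]
        record_number += 1
        row = rows.setdefault(len(record), LengthRow(len(record)))
        row.quantity += 1
        row.last_record = record_number
        row.byte_offset = offset
        row.last_data = record[:260]
        offset = end + len(eor)
    for row in rows.values():
        row.percentage = 100.0 * row.quantity / record_number
    return sorted(rows.values(), key=lambda r: r.length)
```

In a well-formed fixed length file this table would contain a single row with a percentage of 100.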

Visual Data Comparison

After scanning a data file, the PLB Data Check program displays one row of data for each unique data record length it finds. This allows a visual comparison to be made between the different record lengths. The first two lines in the display show column positions.

PLB Data Check File Stats

Data Record Count: The total quantity of data records within the data file that are readable by PLB programs.  This amount plus any zero byte records should equal the total record count.

Zero Byte Record Count: The total quantity of data records within the data file that contain no data (that is, two end of record markers in immediate succession).

Total Record Count: The total quantity of data records within the data file.
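The relationship between these three counts can be sketched in Python as follows. This is a simplified illustration: the function name is hypothetical, the end of record sequence is assumed to be known, and deleted records are not treated specially.

```python
from typing import Dict


def record_counts(data: bytes, eor: bytes = b"\r\n") -> Dict[str, int]:
    """Return the data, zero byte, and total record counts."""
    records = data.split(eor)
    if records and records[-1] == b"":
        # A trailing end of record marker closes the last record;
        # it does not start a new, empty one.
        records.pop()
    zero_byte = sum(1 for record in records if len(record) == 0)
    return {
        "data": len(records) - zero_byte,
        "zero_byte": zero_byte,
        "total": len(records),
    }
```

By construction, the data record count plus the zero byte record count equals the total record count.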

Deleted Record Count: Deleted records are not actually deleted or removed from the data file.  The original data is replaced with a series of null (aka hex 00) characters.  The quantity of deleted records is calculated by counting the total number of sequential hex 00 characters found within the entire data file, and dividing that amount by the most common record length.
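As a rough illustration of this calculation, the Python sketch below totals the bytes found in runs of consecutive Hex 00 characters and divides by the most common record length. The function name, the `min_run` threshold (used here so that an isolated NUL inside a live record is not counted), and the use of integer division are all assumptions, not documented behavior.

```python
import re


def estimate_deleted_records(data: bytes, common_length: int, min_run: int = 2) -> int:
    """Estimate deleted records from runs of consecutive NUL bytes."""
    if common_length <= 0:
        raise ValueError("common_length must be positive")
    # Find every run of at least min_run consecutive Hex 00 bytes.
    runs = re.findall(rb"\x00{%d,}" % min_run, data)
    nul_bytes = sum(len(run) for run in runs)
    return nul_bytes // common_length
```

Because the result is derived from a byte total rather than from counting individual records, it is only as reliable as the most common record length used for the division.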

File Size Bytes:  The total size of the “Data File” in bytes.  This includes all readable data, bytes used as end of record markers, and bytes of deleted data.  This should match the file size reported by the operating system.

End of File:  If the “Data File” has a proper end of record marker after the last data record (the very end of the file) the position of the end of file is reported which should match the file size bytes.  If the “Data File” does NOT have a proper end of record for the last data record the word “Invalid” will be displayed.
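The check described here amounts to testing whether the file ends with the end of record sequence. A minimal Python sketch, assuming the end of record sequence is already known and using a hypothetical function name:

```python
def end_of_file_status(data: bytes, eor: bytes = b"\r\n") -> str:
    """Return the end of file position, or "Invalid" if the last record is unterminated."""
    if data.endswith(eor):
        # With a proper final marker, the position equals the file size in bytes.
        return str(len(data))
    # Also reached for an empty file, which has no end of record marker (an assumption).
    return "Invalid"
```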

End of Record Type: The hex value(s) used to indicate an end of record (see “End of Record”).

Unique Record Lengths: The number of unique record lengths found during the scan.  This should match the number of lines detailed in the results.

ASCII Character Usage: The number of times each specific ASCII character occurs within the readable data area. If an ASCII character is not found within the readable data area, nothing is reported for it (rather than a quantity of zero). By default the only ASCII characters that are scanned for are NUL and TAB characters. See the previous “ASCII Character Usage” area for more information.

Known Issues and Limitations

  • Record lengths larger than 65,000 bytes are not supported.
  • End of record characters other than Hex 0D, Hex 0A, Hex 0D + Hex 0A, or Hex 0A + Hex 0D are not supported.