Wiesemann & Theis GmbH

Networking, sensors and interface technology for industry, office and IT

Background information:

Data formats and protocols

From a single byte to the industry protocol



What are data?

Whether the topic is digitalization, Industry 4.0 or the IoT (Internet of Things), the bottom line is that a wide variety of components, many of them industrial, exchange data with each other and with their users.

From the perspective of the users, these data can contain many kinds of information, such as temperatures, switching states, weights, times, positions and much more.

But regardless of the content, in electronic data processing and computer technology data are always a sequence of bytes of some length.

One byte represents a numerical value of between 0 and 255.

Data exchange thus means sending bytes from A to B.
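As a minimal sketch of this idea, the following Python lines show how a reading sent as text turns into a sequence of byte values. The reading "23.9" is only an illustrative value.

```python
# A temperature reading sent as text: each character becomes one byte.
payload = "23.9".encode("ascii")

# Iterating over a bytes object yields the numerical value of each byte.
byte_values = list(payload)
print(byte_values)  # [50, 51, 46, 57]

# Every value lies in the range 0..255, the range of a single byte.
in_range = all(0 <= b <= 255 for b in byte_values)
print(in_range)  # True
```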


Regulated data exchange using protocols

To ensure that the recipient understands the data received from the sender, the form in which the data are transmitted must be defined.

In addition, in systems that contain multiple networked components it must be specified who is to receive the data. Along with the actually transmitted user data, the data transmission must therefore also include some address information.

When user data and address information follow a prescribed frame structure, then we are dealing with a protocol.
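The following Python sketch illustrates what such a frame structure might look like. The layout (one address byte, one length byte, then the user data) and the function names build_frame and parse_frame are purely hypothetical and do not correspond to any particular industry protocol.

```python
import struct


def build_frame(address: int, user_data: bytes) -> bytes:
    """Hypothetical frame: 1 byte address, 1 byte length, then the user data."""
    if len(user_data) > 255:
        raise ValueError("user data too long for a one-byte length field")
    return struct.pack("BB", address, len(user_data)) + user_data


def parse_frame(frame: bytes) -> tuple[int, bytes]:
    """Split a frame back into the address and the user data."""
    address, length = struct.unpack_from("BB", frame)
    return address, frame[2:2 + length]


# Send the text "23.9" to the (assumed) recipient with address 5.
frame = build_frame(5, b"23.9")
print(frame.hex(" "))      # 05 04 32 33 2e 39
print(parse_frame(frame))  # (5, b'23.9')
```

Because sender and recipient agree on the position of each field, the recipient can tell the address information apart from the user data.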

Fieldbus systems were formerly the usual means of data transmission between industrial components. Fieldbuses are serial connections between the respective components. Over time, various standards evolved in parallel that differ not only in protocol and transmission speed but also significantly in the physical transmission, including the mechanical connectors.

Newer industry protocols differ at the protocol level in how they encode the data, but most use TCP/IP-Ethernet as the transport medium.

This represents a common standard that has many virtues:

  • The existing infrastructure can be used
  • Different industry protocols can be used together in the same network
  • Uniform transmission technology and connectors
  • Cross-location communication is possible
  • Can be expanded as desired

Data formats

With protocols that use TCP/IP-Ethernet as the common standard, addressing is with few exceptions accomplished using the IP address.

The actual industry protocol instead defines the form in which the transported data are sent.

There are two basic data formats:

  • Message text
  • Binary data

Which variant is used depends on many factors.

Data as text

Especially in web-based applications, data of all kinds are sent as text. Text means that the information is sent as a character string that a person can read. In the simplest case, each character is represented by one byte.

Data as ASCII characters


Originally the encoding followed the ASCII standard (ASCII = American Standard Code for Information Interchange). The ASCII table defines which character corresponds to which numerical value.

ASCII table


A peculiarity of ASCII is that only 7 of the 8 available bits in a byte are used, limiting the character set to 128 characters, of which only 95 are printable.

Newer standards such as UTF-8 overcome this limitation: characters outside the ASCII range, such as special characters, are encoded using two to four bytes per character.
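The difference between ASCII and UTF-8 can be sketched in a few lines of Python. The string "°C" is chosen here only because it mixes one character outside the ASCII range with one inside it.

```python
text = "°C"

# ord() returns the numerical value assigned to a character.
print(ord("C"))  # 67, within the 7-bit ASCII range 0..127

# UTF-8 encodes ASCII characters as one byte and other characters as several bytes.
encoded = text.encode("utf-8")
print(list(encoded))  # [194, 176, 67]

# The degree sign has no ASCII code, so a pure ASCII encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```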

In addition to freely formulated text content, standardized text formats have become established for web and industry protocols:

  • XML
  • JSON

We will briefly describe both formats here.

XML - Extensible Markup Language
XML is a so-called markup language. The actual user data are embedded in tags. The tags are element names for the respective values and contents. Each tag begins with a < bracket and ends with a > bracket.

An XML document usually begins with an XML declaration in which at least the XML version is indicated. Additional parameters, such as the character encoding used, are also possible:

<?xml version="1.0" encoding="UTF-8"?>

After the declaration follow the actual contents, embedded in tags. All contents have a start tag and an end tag with the same name. Note that the end tag includes a solidus ("/") before the name of the element.

Example:

<content>anything</content>

XML also permits structured, hierarchically nested tags. Here is an example with the sensor values from a W&T Web-Thermo-Hygrobarometer:

<?xml version="1.0" encoding="UTF-8"?>
<webio>
  <iostate>
    <sensor>
      <name>Temperatur</name>
      <number>0</number>
      <unit>°C</unit>
      <value>23.900000</value>
    </sensor>
    <sensor>
      <name>rel. Feuchte</name>
      <number>1</number>
      <unit>%</unit>
      <value>36</value>
    </sensor>
    <sensor>
      <name>Luftdruck</name>
      <number>2</number>
      <unit>hPa</unit>
      <value>992</value>
    </sensor>
  </iostate>
</webio>

The indentations are not required for XML, but are commonly used to enhance readability.
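How a program might read such a document can be sketched with Python's standard xml.etree.ElementTree module. The snippet uses a shortened copy of the structure shown above, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sensor document, as it might arrive over the network.
received = """<?xml version="1.0" encoding="UTF-8"?>
<webio>
  <iostate>
    <sensor>
      <name>Temperatur</name>
      <number>0</number>
      <unit>°C</unit>
      <value>23.900000</value>
    </sensor>
    <sensor>
      <name>Luftdruck</name>
      <number>2</number>
      <unit>hPa</unit>
      <value>992</value>
    </sensor>
  </iostate>
</webio>""".encode("utf-8")

root = ET.fromstring(received)

# Collect value and unit for each sensor, keyed by the sensor name.
readings = {
    sensor.findtext("name"): (float(sensor.findtext("value")), sensor.findtext("unit"))
    for sensor in root.iter("sensor")
}
for name, (value, unit) in readings.items():
    print(f"{name}: {value} {unit}")
```

Note that every value arrives as text, so the program has to convert it into a number itself.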

The advantage of XML as a transmission format is that both humans and machines, i.e. processing programs, can easily read the contents.

The disadvantage is the large overhead: a great deal of data is transmitted for little actual content.

JSON - JavaScript Object Notation
The syntax, i.e. the structure of JSON, is based on a subset of JavaScript syntax.

JSON uses name/value pairs for coding the data.

Example: "content": "anything"

JSON also allows a structured, hierarchically nested construction. Here again as an example are the sensor values from a W&T Web-Thermo-Hygrobarometer:

{
  "iostate":
  {
    "sensor":
    [
      {
        "name": "Temperatur",
        "number": 0,
        "unit": "°C",
        "value": 24.1
      },
      {
        "name": "rel. Feuchte",
        "number": 1,
        "unit": "%",
        "value": 35.9
      },
      {
        "name": "Luftdruck",
        "number": 2,
        "unit": "hPa",
        "value": 991.8
      }
    ]
  }
}

Both names and text values are enclosed in quotation marks. Numerical values are the exception - here no quotation marks are necessary.

Name/value pairs are separated by commas.

Associated name/value pairs are combined into groups (objects) using braces.

Associated groups can form an array: the groups are separated by commas and enclosed in square brackets.
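These rules map directly onto the data structures of most programming languages. As a minimal sketch, Python's standard json module turns groups in braces into dictionaries and arrays into lists. The snippet uses a shortened copy of the example above.

```python
import json

# Shortened copy of the sensor data, as it might arrive over the network.
received = """
{
  "iostate": {
    "sensor": [
      {"name": "Temperatur", "number": 0, "unit": "°C", "value": 24.1},
      {"name": "Luftdruck", "number": 2, "unit": "hPa", "value": 991.8}
    ]
  }
}
"""

data = json.loads(received)
sensors = data["iostate"]["sensor"]  # the array becomes a Python list

# Numerical values arrive as numbers, not as text.
values = {sensor["name"]: sensor["value"] for sensor in sensors}
for sensor in sensors:
    print(f'{sensor["name"]}: {sensor["value"]} {sensor["unit"]}')
```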

A detailed description of JSON format can be found at https://www.json.org.

In terms of data volume, JSON is significantly more compact than XML yet still easily readable by humans and machines.
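The difference in volume can be checked with a small comparison. The sketch below encodes one sensor reading in both formats without any whitespace and counts the bytes. The exact ratio depends on the names and values used, so this is an illustration rather than a general measurement.

```python
# One sensor reading, written without indentation in both formats.
xml_text = (
    "<sensor><name>Temperatur</name><number>0</number>"
    "<unit>°C</unit><value>24.1</value></sensor>"
)
json_text = '{"name":"Temperatur","number":0,"unit":"°C","value":24.1}'

# Count the bytes that would actually be transmitted as UTF-8.
xml_size = len(xml_text.encode("utf-8"))
json_size = len(json_text.encode("utf-8"))
print(f"XML: {xml_size} bytes, JSON: {json_size} bytes")
```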

Base64 encoding
Base64 is a procedure which encodes binary data as a string of readable ASCII characters, from which the original data can be decoded again. In this way binary content can also be transmitted using text-based transmission formats.

The scheme is quite simple. The binary data (more specifically, a sequence of 8-bit bytes) are divided into groups of 24 bits (three bytes), and each group is represented by four 6-bit Base64 digits.

Base64 encoding

Each of the four digits is assigned the character corresponding to its value according to the following table. Thus three binary bytes are replaced by four readable characters.

Base64 encoding

This procedure is repeated until all the bytes are encoded. If only one or two bytes remain at the end, padding bytes with a value of 0 are added to complete the last group of three bytes.

So that the padding can be removed when decoding, i.e. when restoring the original binary bytes, the encoded string ends with one "=" character for each padding byte.
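The procedure can be traced step by step in Python. The sketch below first splits one three-byte group into four 6-bit digits by hand and then uses the standard base64 module to show the padding. The input "Man" is an arbitrary example.

```python
import base64
import string

# The Base64 alphabet: A-Z, a-z, 0-9, "+" and "/" for the values 0..63.
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Three bytes form one 24-bit group, which is split into four 6-bit digits.
group = int.from_bytes(b"Man", "big")
digits = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
print(digits)  # [19, 22, 5, 46]

manual = "".join(alphabet[d] for d in digits)
print(manual)  # TWFu

# The standard library produces the same result and handles the padding.
print(base64.b64encode(b"Man"))  # b'TWFu'
print(base64.b64encode(b"Ma"))   # b'TWE='  (one padding byte)
print(base64.b64encode(b"M"))    # b'TQ=='  (two padding bytes)

# Decoding restores the original bytes and drops the padding.
restored = base64.b64decode("TQ==")
print(restored)  # b'M'
```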

The most common applications for Base64 encoding are web-based applications and email.

Binary data

Data are always a certain number of bytes.

Which byte serves which purpose at which position is determined either by a standardized protocol or by the application. One or more bytes may represent a value, an array of values, a character string or a function call.

Individual values can be sent in a data transmission. Often, however, data structures are used which define which value is stored at which position in the transmitted byte string.

Here as an example are the data for a Modbus function call. The function code, for instance, is always contained in the 8th byte:

Modbus-TCP packet structure
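A Modbus-TCP request of this kind can be assembled with Python's struct module. The sketch below builds a request for function code 3 (read holding registers). The transaction identifier, unit identifier, register address and register count are assumed example values.

```python
import struct

# Seven header bytes followed by the function code and its parameters.
# ">" selects big-endian byte order, H = 16-bit value, B = 8-bit value.
request = struct.pack(
    ">HHHBBHH",
    1,     # transaction identifier, chosen by the client (assumed value)
    0,     # protocol identifier, 0 for Modbus
    6,     # number of bytes that follow this field
    1,     # unit identifier (assumed value)
    0x03,  # function code: read holding registers
    0,     # address of the first register (assumed value)
    2,     # number of registers to read (assumed value)
)
print(request.hex(" "))  # 00 01 00 00 00 06 01 03 00 00 00 02

# The function code is the 8th byte, i.e. index 7 when counting from 0.
function_code = request[7]
print(function_code)  # 3
```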

Another common procedure for binary data construction is TLV, which stands for Type Length Value. Multiple contents of any size can be sent in succession in a data transmission.

The following sequence applies for any content:

  • Type - what kind of content is this?
    Type determined by the application
  • Length - how many bytes make up the contents?
  • Value - bytes in the value or contents.

If additional bytes follow such a sequence, then they belong to the next sequence.

Here is a simple example:

TLV packet structure

The transmitted bytes contain two values: a 16-bit value (2 bytes) and a 32-bit value (4 bytes).
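A parser for such a byte string might look like the following sketch. Several details are assumptions made for illustration: the type and length fields are one byte each, the type codes 0x01 and 0x02 are invented, the values are stored in big-endian byte order, and parse_tlv is a hypothetical helper name. Real applications define these details themselves.

```python
def parse_tlv(data: bytes) -> list[tuple[int, bytes]]:
    """Split a byte string into (type, value) pairs.

    Assumes a one-byte type field followed by a one-byte length field.
    """
    items = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        type_code, length = data[pos], data[pos + 1]
        value = data[pos + 2:pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        items.append((type_code, value))
        pos += 2 + length  # continue with the next sequence
    return items


# Two sequences: a 16-bit value (type 0x01) and a 32-bit value (type 0x02).
stream = bytes([
    0x01, 0x02, 0x12, 0x34,
    0x02, 0x04, 0x00, 0x01, 0x00, 0x00,
])
items = parse_tlv(stream)
numbers = [int.from_bytes(value, "big") for _, value in items]
print(numbers)  # [4660, 65536]
```

Because each sequence carries its own length, the recipient can skip types it does not know and still find the start of the next sequence.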

The advantage of binary data transmission is the highly compact structure of the data.

