What is QAT training?
Quantization Aware Training (QAT) computes scale factors during training rather than after it. Once the network is fully trained, Quantize (Q) and Dequantize (DQ) nodes are inserted into the graph following a specific set of rules.
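As a minimal sketch of the idea, here is QAT using PyTorch's eager-mode quantization API (the toy model and training loop are illustrative, not taken from any particular toolkit): fake-quantization observers learn scale factors during training, and convert() then replaces them with real quantize/dequantize operations.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks where quantization begins
        self.fc1 = nn.Linear(16, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(32, 4)
        self.dequant = tq.DeQuantStub()  # marks where it ends

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)

model = ToyNet().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")  # x86 backend config
tq.prepare_qat(model, inplace=True)  # inserts fake-quant observers

# Ordinary training loop; observers learn scales alongside the weights.
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(10):
    x = torch.randn(8, 16)
    loss = model(x).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, fake-quant nodes become real Q/DQ ops and INT8 kernels.
model.eval()
int8_model = tq.convert(model)
print(int8_model(torch.randn(8, 16)).shape)  # torch.Size([8, 4])
```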
What is INT8 quantization?
The ability to lower the precision of a model from FP32 to INT8 is built into the DL Workbench application. This process, called quantization, is an effective way to accelerate inference for certain models on hardware that supports INT8.
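To make the FP32-to-INT8 mapping concrete, here is a sketch of symmetric per-tensor quantization arithmetic (this illustrates the numerics only, not the DL Workbench API; the function names are hypothetical):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map FP32 values to INT8 so the max magnitude lands on 127."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from INT8 codes."""
    return q.astype(np.float32) * scale

x = np.random.randn(5).astype(np.float32)
q, s = quantize_int8(x)
print(x)
print(dequantize(q, s))  # close to x, within one quantization step
```

Each FP32 value is stored as a single signed byte plus one shared scale, which is where both the speedup and the accuracy trade-off come from.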
Does quantization reduce model size?
Quantization can reduce the size of a model, potentially at the expense of some accuracy. Pruning and clustering can reduce the size of a model for download by making it more easily compressible.
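The two effects can be demonstrated directly. In this sketch (random weights, 16 clusters, all values hypothetical), INT8 storage is 4x smaller than FP32 outright, and clustering snaps weights to a few shared values so a general-purpose compressor like gzip shrinks them much further:

```python
import gzip
import numpy as np

w_fp32 = np.random.randn(1000, 1000).astype(np.float32)

# Quantization: INT8 uses 1 byte per weight instead of 4.
scale = np.abs(w_fp32).max() / 127.0
w_int8 = np.clip(np.round(w_fp32 / scale), -128, 127).astype(np.int8)
print(w_fp32.nbytes)  # 4,000,000 bytes
print(w_int8.nbytes)  # 1,000,000 bytes (plus one scale per tensor)

# Clustering: snapping weights to 16 shared centroids keeps the dtype
# but makes the tensor highly compressible for download.
flat = w_fp32.ravel()
centroids = np.linspace(flat.min(), flat.max(), 16)
clustered = centroids[np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)]
print(len(gzip.compress(flat.tobytes())))                            # near raw size
print(len(gzip.compress(clustered.astype(np.float32).tobytes())))   # far smaller
```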