This tutorial demonstrates two ways to load and preprocess text.
- First, you will use Keras utilities and preprocessing layers. These include `tf.keras.utils.text_dataset_from_directory` to turn data into a `tf.data.Dataset` and `tf.keras.layers.TextVectorization` for data standardization, tokenization, and vectorization. If you are new to TensorFlow, you should start with these.
- Then, you will use lower-level utilities like `tf.data.TextLineDataset` to load text files, `tf.lookup` for custom in-model lookup tables, and TensorFlow Text APIs, such as `text.UnicodeScriptTokenizer` and `text.case_fold_utf8`, to preprocess the data for finer-grain control.
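
To make the terms standardization, tokenization, and vectorization concrete, here is a minimal plain-Python sketch of the three steps. It is not TensorFlow code and not how the Keras layer is implemented. The helper names (`standardize`, `tokenize`, `build_vocab`, `vectorize`) and the toy corpus are hypothetical, and reserving ID 0 for padding and ID 1 for out-of-vocabulary tokens is an assumption chosen for illustration.

```python
import string
from collections import Counter

PAD_ID = 0  # assumption: ID 0 is reserved for padding
OOV_ID = 1  # assumption: ID 1 is reserved for out-of-vocabulary tokens


def standardize(text):
    """Standardization: lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def tokenize(text):
    """Tokenization: split the standardized text on whitespace."""
    return text.split()


def build_vocab(texts):
    """Map each token to an integer ID, most frequent tokens first."""
    counts = Counter(tok for t in texts for tok in tokenize(standardize(t)))
    # Ties are broken alphabetically so the mapping is deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    # Real token IDs start at 2 because 0 and 1 are reserved above.
    return {tok: i + 2 for i, tok in enumerate(ordered)}


def vectorize(text, vocab, sequence_length):
    """Vectorization: convert text to a fixed-length list of token IDs."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokenize(standardize(text))]
    ids = ids[:sequence_length]  # truncate long inputs
    return ids + [PAD_ID] * (sequence_length - len(ids))  # pad short inputs


# Hypothetical toy corpus, used only to build a vocabulary.
corpus = ["The cat sat.", "The dog sat!"]
vocab = build_vocab(corpus)
example = vectorize("The bird sat down", vocab, sequence_length=6)
print(vocab)
print(example)
```

Words that never appeared in the corpus ("bird", "down") map to the out-of-vocabulary ID, and the result is padded to the requested length. The sketch only shows the shape of the transformation: raw strings go in, fixed-length integer sequences come out.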