
fastText Python (How It Works: A Guide for Developers)

Published March 5, 2025

Introduction to FastText

FastText is a library for efficient text classification and word representation learning, originally developed by Facebook's AI Research lab (FAIR). Its core is written in C++ and it ships with official Python bindings. It addresses some of the limitations of earlier models such as Word2Vec: instead of treating each word as an atomic unit, it represents a word as a bag of character n-grams (subwords), so it can produce vectors even for rare or unseen words. This makes it a powerful tool in the NLP toolkit. FastText builds on modern macOS and Linux distributions. In this article, we will look at the FastText Python package along with IronPDF, a versatile PDF generation library from Iron Software.
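To make the subword idea concrete, here is a minimal pure-Python sketch that does not call the fastText library. It assumes fastText's default n-gram lengths for unsupervised training (minn=3, maxn=6), wraps each word in `<` and `>` boundary markers, hashes each n-gram into a fixed number of buckets with a 32-bit FNV-1a hash, and averages the bucket vectors to build a vector for a word that never appeared in training (fastText averages n-gram vectors for out-of-vocabulary words in a similar way). The bucket vectors are random stand-ins rather than trained embeddings, and all helper names are hypothetical.

```python
import random


def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of `word` after adding '<' and '>' boundary markers."""
    token = f"<{word}>"
    return [
        token[start:start + n]
        for n in range(minn, maxn + 1)
        for start in range(len(token) - n + 1)
    ]


def fnv1a_32(text):
    """32-bit FNV-1a hash over the UTF-8 bytes of `text`."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def make_buckets(num_buckets, dim, seed=0):
    """Random stand-in vectors, one per hash bucket (a trained model would learn these)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_buckets)]


def subword_vector(word, buckets, minn=3, maxn=6):
    """Average the bucket vectors of the word's n-grams to get a vector for an unseen word."""
    dim = len(buckets[0])
    grams = char_ngrams(word, minn, maxn)
    if not grams:  # words too short to yield any n-gram of length minn
        return [0.0] * dim
    total = [0.0] * dim
    for gram in grams:
        vec = buckets[fnv1a_32(gram) % len(buckets)]
        for j in range(dim):
            total[j] += vec[j]
    return [value / len(grams) for value in total]


print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
buckets = make_buckets(num_buckets=1000, dim=4)
print(len(subword_vector("whereabouts", buckets)))  # 4
```

Because related words share many n-grams (for example "where" and "whereabouts" share "<wh", "whe" and "her"), their vectors end up close together, which is what lets fastText handle words it has not seen.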

Features

  1. Word Representation Model:

    • To learn word vectors, use fasttext.train_unsupervised with either the skip-gram (model='skipgram') or CBOW (model='cbow') model:

      import fasttext
      model = fasttext.train_unsupervised('data.txt', model='skipgram')
      # data.txt is the training file
    • Retrieve word vectors:

      print(model.words)  # List of words in the dictionary
      print(model['king'])  # Vector for the word 'king'
    • Save and load trained models:

      model.save_model("model_filename.bin")
      loaded_model = fasttext.load_model("model_filename.bin")
  2. Text Classification Model:

    • Train a supervised text classifier with fasttext.train_supervised, or load a previously trained model with fasttext.load_model:

      model = fasttext.train_supervised('data.train.txt')
    • Evaluate the model on a held-out test file:

      print(model.test('data.test.txt'))  # (number of examples, precision@1, recall@1)
    • Predict labels for specific text:

      print(model.predict("Which baking dish is best for banana bread?"))
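The tuple printed by model.test is (number of examples, precision at k, recall at k), where k defaults to 1. The following pure-Python sketch, with a hypothetical helper name and toy labels, shows how those two ratios are defined so you can recompute them from your own predictions: precision divides the correct predictions by all predictions made, and recall divides them by all gold labels. It is modeled on fastText's definitions but is not the library's own code.

```python
def precision_recall_at_k(gold, predicted, k=1):
    """
    gold: one set of true labels per example.
    predicted: one list of labels per example, ranked from most to least likely.
    Returns (number of examples, precision@k, recall@k).
    """
    correct = 0
    n_predictions = 0
    n_gold = 0
    for true_labels, ranked in zip(gold, predicted):
        top_k = ranked[:k]  # only the k highest-ranked labels count
        correct += sum(1 for label in top_k if label in true_labels)
        n_predictions += len(top_k)
        n_gold += len(true_labels)
    precision = correct / n_predictions if n_predictions else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return len(gold), precision, recall


gold = [{"__label__positive"}, {"__label__negative"}, {"__label__positive"}]
predicted = [["__label__positive"], ["__label__positive"], ["__label__positive"]]
print(precision_recall_at_k(gold, predicted))  # (3, 0.6666666666666666, 0.6666666666666666)
```

With one gold label per example and k=1, precision and recall are always equal; they only diverge for multi-label data or larger k.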

Code Examples

The code demonstrates how to train a text classification model using FastText:

import fasttext
# Training data format: one example per line, '__label__<label> <text>'
train_data = [
    "__label__positive I love this!",
    "__label__negative This movie is terrible.",
    "__label__positive Great job!",
    "__label__neutral The weather is okay."
]
# Write the training data to a text file, one example per line
with open('train.txt', 'w', encoding='utf-8') as f:
    for item in train_data:
        f.write("%s\n" % item)
# Train a supervised model from the training file
model = fasttext.train_supervised(input='train.txt', epoch=10, lr=1.0)
# Classify some new texts with the trained model
texts = [
    "I like it.",
    "Not good.",
    "Awesome!"
]
for text in texts:
    print(f"Input text: '{text}'")
    print("Predicted label:", model.predict(text))
    print()
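The example above trains on hand-written sentences but never evaluates on held-out data. One way to produce files for train_supervised and model.test is sketched below with hypothetical helper names: it formats (labels, text) pairs into fastText's `__label__` line format, collapses newlines and extra whitespace inside each text (fastText reads one example per line), replaces spaces inside label names with underscores (labels are whitespace-separated tokens), and writes a shuffled train/test split. The file names, split ratio, and seed are assumptions.

```python
import random
from pathlib import Path


def to_fasttext_line(labels, text):
    """Format one example as '__label__a __label__b some text' on a single line."""
    label_part = " ".join("__label__" + label.strip().replace(" ", "_") for label in labels)
    clean_text = " ".join(text.split())  # collapse newlines, tabs and repeated spaces
    return f"{label_part} {clean_text}"


def write_split(examples, train_path, test_path, test_ratio=0.25, seed=42):
    """Shuffle examples, write train/test files, and return (n_train, n_test)."""
    rows = [to_fasttext_line(labels, text) for labels, text in examples]
    random.Random(seed).shuffle(rows)  # a fixed seed keeps the split reproducible
    n_test = min(len(rows), max(1, int(len(rows) * test_ratio)))
    Path(test_path).write_text("".join(row + "\n" for row in rows[:n_test]), encoding="utf-8")
    Path(train_path).write_text("".join(row + "\n" for row in rows[n_test:]), encoding="utf-8")
    return len(rows) - n_test, n_test


examples = [
    (["positive"], "I love this!"),
    (["negative"], "This movie\nis terrible."),
    (["positive"], "Great job!"),
    (["neutral"], "The weather is okay."),
]
print(write_split(examples, "data.train.txt", "data.test.txt"))  # (3, 1)
```

The resulting files match the names used in the feature list, so fasttext.train_supervised('data.train.txt') and model.test('data.test.txt') can be run on them directly. With a dataset this small the test score says little; the split only becomes meaningful with many more examples.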

Output Explanation

  • Training: The example trains a FastText model on a small dataset (train_data). Each entry in train_data starts with a label, written as the __label__ prefix followed by the label name (positive, negative, or neutral), and is followed by the corresponding text.
  • Prediction: After training, the model predicts a label (positive, negative, or neutral) for each new input text in texts, together with its probability. With only four training examples, these predictions are illustrative rather than reliable.
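In the fastText Python API, model.predict returns a pair: a tuple of label strings that still carry the `__label__` prefix, and a NumPy array of probabilities. Passing k asks for the top-k labels instead of only the best one, and threshold drops labels below a given probability. The hypothetical helper below turns that pair into plain (label, probability) tuples; it is shown with a hand-written prediction so it runs without a trained model.

```python
def readable_predictions(prediction, prefix="__label__"):
    """Convert fastText's (labels, probabilities) pair into [(label, probability), ...]."""
    labels, probabilities = prediction
    readable = []
    for label, probability in zip(labels, probabilities):
        if label.startswith(prefix):
            label = label[len(prefix):]  # drop the '__label__' marker
        readable.append((label, float(probability)))  # float() also converts NumPy scalars
    return readable


# Hand-written stand-in for a call such as model.predict("I like it.", k=2)
sample = (("__label__positive", "__label__neutral"), [0.75, 0.25])
print(readable_predictions(sample))  # [('positive', 0.75), ('neutral', 0.25)]
```

With a trained model, the equivalent call would be readable_predictions(model.predict(text, k=2)).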

Output

fastText Python (How It Works: A Guide for Developers): Figure 1 - Text Classification Model Output

Introducing IronPDF

fastText Python (How It Works: A Guide for Developers): Figure 2 - IronPDF: The Python PDF Library

IronPDF is a Python library for creating, editing, and digitally signing PDF documents from HTML, CSS, images, and JavaScript, designed for fast performance with a small memory footprint. Key features include:

  • HTML to PDF Conversion: Convert HTML files, HTML strings, and URLs into PDF documents, along with the ability to render webpages using the Chrome PDF renderer.

  • Cross-Platform Support: Compatible with Python 3+ on Windows, Mac, Linux, and various cloud platforms. IronPDF is also available for .NET, Java, and Node.js.

  • Editing and Signing: Customize PDF properties, bolster security with passwords and permissions, and apply digital signatures to documents.

  • Page Templates and Settings: Customize PDFs with headers, footers, page numbers, adjustable margins, custom paper sizes, and responsive layouts.

  • Standards Compliance: Ensures adherence to PDF standards such as PDF/A and PDF/UA, supports UTF-8 character encoding, and seamlessly manages assets like images, CSS stylesheets, and fonts.

Installation

pip install fastText

Generate PDF Documents using IronPDF and FastText

Prerequisites

  1. Make sure Visual Studio Code is installed as a code editor
  2. Python 3 is installed

To start with, let us create a Python file to add our scripts.

Open Visual Studio Code and create a file, fastTextDemo.py.

Install the necessary libraries:

pip install fastText
pip install ironpdf
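Before adding the demo code, you can optionally confirm that both packages are importable from the interpreter VS Code is using. This small sketch uses only the standard library's importlib.util.find_spec; the module names fasttext and ironpdf are the ones the demo imports, and the helper name is hypothetical.

```python
import importlib.util


def missing_modules(names=("fasttext", "ironpdf")):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules, install them with pip:", ", ".join(missing))
else:
    print("fasttext and ironpdf are both importable.")
```

If a module shows up as missing even after pip install, VS Code is often pointing at a different Python interpreter than the one pip installed into.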

Then add the following code, which demonstrates how to use the IronPDF and FastText Python packages together.

import fasttext
from ironpdf import * 
# Apply your license key
License.LicenseKey = "key"
# Start the HTML string that will be rendered to PDF
content = "<h1>Awesome Iron PDF with Fasttext</h1>"
# Training data format: one example per line, '__label__<label> <text>'
train_data = [
    "__label__positive I love this!",
    "__label__negative This movie is terrible.",
    "__label__positive Great job!",
    "__label__neutral The weather is okay."
]
# Write the training data to a text file, one example per line
with open('train.txt', 'w', encoding='utf-8') as f:
    for item in train_data:
        f.write("%s\n" % item)
# Train a supervised model from the training file
model = fasttext.train_supervised(input='train.txt', epoch=10, lr=1.0)
# Texts to classify with the trained model
texts = [
    "I like it.",
    "Not good.",
    "Awesome!"
]
content += "<h2>Training data</h2>"
for data in train_data:
    print(data)
    content += f"<p>{data}</p>"
content += "<h2>Train a supervised model</h2>"
content += "<p>model = fasttext.train_supervised(input='train.txt', epoch=10, lr=1.0)</p>"
content += "<h2>Testing the model</h2>"
for text in texts:
    prediction = model.predict(text)  # predict once and reuse the result
    print(f"Input text: '{text}'")
    print("Predicted label:", prediction)
    print()
    content += "<p>----------------------------------------------</p>"
    content += f"<p>Input text: '{text}'</p>"
    content += f"<p>Predicted label: {prediction}</p>"
# Create a Chrome-based renderer and convert the HTML string to a PDF
renderer = ChromePdfRenderer()
pdf = renderer.RenderHtmlAsPdf(content)
# Export to a file
pdf.SaveAs("DemoIronPDF-FastText.pdf")
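The demo inserts the training sentences and input texts into the HTML string as-is, so a text containing characters such as `<` or `&` would be interpreted as markup. A minimal sketch of an alternative, assuming you collect the predictions first, escapes every value with the standard library's html.escape before building the page. The function name and the shape of its arguments are hypothetical.

```python
import html


def build_report_html(train_data, results):
    """
    train_data: list of training lines in fastText format.
    results: list of (input_text, labels, probabilities) tuples, e.g. built from model.predict.
    Returns an HTML string with every user-supplied value escaped.
    """
    parts = ["<h1>Awesome Iron PDF with Fasttext</h1>", "<h2>Training data</h2>"]
    parts += [f"<p>{html.escape(line)}</p>" for line in train_data]
    parts.append("<h2>Testing the model</h2>")
    for text, labels, probabilities in results:
        scored = ", ".join(
            f"{html.escape(label)} ({float(p):.3f})" for label, p in zip(labels, probabilities)
        )
        parts.append(f"<p>Input text: {html.escape(text)}<br>Predicted label: {scored}</p>")
    return "\n".join(parts)


report = build_report_html(
    ["__label__positive I <3 this!"],
    [("Fish & chips?", ("__label__positive",), [0.9])],
)
print(report)
```

The resulting string would be passed to renderer.RenderHtmlAsPdf and saved with pdf.SaveAs, in the same way the demo handles content.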

Output

fastText Python (How It Works: A Guide for Developers): Figure 3 - Console Output

PDF

fastText Python (How It Works: A Guide for Developers): Figure 4 - PDF Output

IronPDF License

IronPDF for Python requires a license key. A free trial license key is available so you can get started at no cost.

Place the license key at the start of the script before using the IronPDF package:

from ironpdf import * 
# Apply your license key
License.LicenseKey = "key"
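Hard-coding the key works for a quick demo, but a common alternative is to read it from an environment variable so it stays out of source control. A minimal sketch follows; the variable name IRONPDF_LICENSE_KEY and the helper name are hypothetical choices, not something IronPDF reads on its own.

```python
import os


def read_license_key(var_name="IRONPDF_LICENSE_KEY", environ=None):
    """Return the license key stored in `var_name`; raise if it is missing or blank."""
    env = os.environ if environ is None else environ  # a plain dict can be passed for testing
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your IronPDF license key.")
    return key


# Usage with IronPDF (commented out so this sketch runs without the package installed):
# from ironpdf import *
# License.LicenseKey = read_license_key()
```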

Conclusion

FastText is a lightweight and efficient library for text representation and classification. It learns word embeddings using subword information, trains text classifiers quickly and at scale, offers pre-trained models for many languages, and provides a user-friendly Python interface for easy integration into projects. IronPDF is a comprehensive Python library for programmatically creating, editing, and rendering PDF documents. It simplifies tasks such as converting HTML to PDF, adding content and annotations to PDFs, and managing document properties and security, and it is compatible across different operating systems and programming environments, which makes it well suited to generating and manipulating PDFs within Python applications.

Used together, the two libraries let us train text models and document the results in PDF format for archiving.

Regan Pun

Software Engineer

Regan graduated from the University of Reading with a BA in Electronic Engineering. Before joining Iron Software, his previous roles had him laser-focused on single tasks, and what he most enjoys at Iron Software is the spectrum of work he gets to undertake, whether it’s adding value to sales, technical support, product development, or marketing. He enjoys understanding how developers use the Iron Software libraries and uses that knowledge to continually improve documentation and develop the products.