Understanding Extraction Scripts

Extraction scripts in Doppelganger let you clean and transform data after it has been captured. They also let you format that data for export as JSON or CSV, so you control how the extracted content is structured.

What an Extraction Script Does

An extraction script runs after the selector has captured the data. Typical tasks include:

  • Cleaning HTML or text (removing tags, whitespace, or extra characters)
  • Parsing or formatting dates, numbers, or prices
  • Combining multiple fields into a single value
  • Filtering out unwanted or duplicate data
  • Running conditional logic to adjust content dynamically
  • Preparing content for structured export in CSV or JSON format
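As an illustration of the cleaning and parsing tasks listed above, here is a minimal sketch in Python. It is not Doppelganger's own scripting API, which this page does not show; the function names `clean_text` and `parse_price` are hypothetical, and the regex-based tag stripping is a simplification that assumes reasonably well-formed markup.

```python
import html
import re
from decimal import Decimal


def clean_text(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)    # naive tag removal (assumption)
    decoded = html.unescape(no_tags)          # e.g. "&amp;" becomes "&"
    return re.sub(r"\s+", " ", decoded).strip()


def parse_price(raw: str) -> Decimal:
    """Turn a price string such as '$1,299.50' into a Decimal.

    Assumes '.' is the decimal separator and ',' a thousands separator.
    """
    digits = re.sub(r"[^\d.]", "", raw)       # keep only digits and the dot
    return Decimal(digits)


print(clean_text("<b>Cool</b> &amp; <i>Shiny</i>\n Widget"))  # Cool & Shiny Widget
print(parse_price("$1,299.50"))                               # 1299.50
```

`Decimal` is used here rather than `float` so that prices keep their exact digits. A string with no digits at all would raise an error in `parse_price`, so a real script would probably check for that case first.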

Exporting as CSV

When exporting multiple fields or rows to CSV, extraction scripts can:

  • Combine multiple fields into a single row using commas as separators
  • Escape special characters such as quotes, commas, or line breaks
  • Format values consistently to match CSV requirements
  • Generate headers dynamically for structured export

For example, if you capture a product name, price, and URL, an extraction script can structure them as a CSV row:

Product Name,Price,URL  
"Cool Widget","$19.99","https://example.com/widget"

This ensures that each row of extracted content is ready to save or export as a CSV file.
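The CSV formatting described above can be sketched in Python with the standard `csv` module, which handles quoting and escaping. This is an illustration under assumptions, not Doppelganger's API: the helper `to_csv` and the shape of `rows` (a list of dictionaries keyed by column name) are hypothetical.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Build CSV text: one header line, then one fully quoted line per row."""
    buffer = io.StringIO()
    # Header uses minimal quoting, matching the example above.
    csv.writer(buffer, lineterminator="\n").writerow(fieldnames)
    # Data rows quote every value; embedded quotes are doubled by the writer.
    row_writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in rows:
        row_writer.writerow([row.get(name, "") for name in fieldnames])
    return buffer.getvalue()


rows = [
    {"Product Name": "Cool Widget", "Price": "$19.99",
     "URL": "https://example.com/widget"},
]
# Headers are generated from the keys of the first captured row.
print(to_csv(rows, list(rows[0].keys())))
```

Letting a CSV writer do the escaping is usually safer than joining strings with commas by hand, because values that contain commas, quotes, or line breaks are then quoted correctly.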

Exporting as JSON

Extraction scripts can also prepare data for JSON output:

  • Combine multiple fields into a JSON object with keys and values
  • Structure nested data for complex elements, such as lists or variants
  • Filter out empty or invalid entries
  • Ensure consistent formatting for downstream tasks or APIs

For example, a script could transform captured fields into:

{ "name": "Cool Widget", "price": "$19.99", "url": "https://example.com/widget" }

This format is ideal for programmatic processing or storage.
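A minimal Python sketch of the JSON preparation described above, including nested data and removal of empty entries. The helper name `to_json_record` and the sample field names `sku` and `variants` are assumptions made for illustration; they are not part of Doppelganger.

```python
import json


def to_json_record(fields: dict) -> str:
    """Drop empty values, then serialize the remaining fields as JSON."""
    # None, empty strings, and empty containers count as "empty";
    # 0 and False are kept because they are meaningful values.
    cleaned = {
        key: value
        for key, value in fields.items()
        if value not in (None, "", [], {})
    }
    return json.dumps(cleaned, ensure_ascii=False)


record = {
    "name": "Cool Widget",
    "price": "$19.99",
    "url": "https://example.com/widget",
    "sku": "",                                          # empty, so removed
    "variants": [{"color": "red"}, {"color": "blue"}],  # nested list
}
print(to_json_record(record))
```

Serializing with a JSON library rather than string concatenation keeps quoting and escaping valid, which is what downstream tasks or APIs are likely to expect.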

How to Use Extraction Scripts

  1. Capture the desired fields using selectors in separate extraction blocks.
  2. In the extraction script, access captured values via their variables.
  3. Clean or transform each value as needed.
  4. Combine values into either a CSV row or JSON object.
  5. Return the formatted string or object to be saved, exported, or passed to the next block.
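Steps 2 to 5 above can be sketched end to end as follows. This is a Python illustration only: the `captured` dictionary stands in for whatever variables Doppelganger exposes to a script, and the function name `extraction_script` and its `output` parameter are hypothetical.

```python
import csv
import io


def extraction_script(captured: dict, output: str = "json"):
    """Clean three captured fields and return a CSV row or a dict."""
    # Step 2: read the captured values (missing fields become "").
    name = captured.get("name", "")
    price = captured.get("price", "")
    url = captured.get("url", "")

    # Step 3: clean each value.
    name = " ".join(name.split())   # collapse internal whitespace
    price = price.strip()
    url = url.strip()

    # Step 4: combine the values into the requested shape.
    if output == "csv":
        buffer = io.StringIO()
        writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerow([name, price, url])
        # Step 5: return the formatted row without its trailing newline.
        return buffer.getvalue().rstrip("\n")

    # Step 5: return a JSON-ready object.
    return {"name": name, "price": price, "url": url}


sample = {"name": "  Cool   Widget ", "price": " $19.99",
          "url": "https://example.com/widget "}
print(extraction_script(sample, "csv"))
print(extraction_script(sample))
```

Keeping the script to one row or object per call, as here, makes it straightforward to run the same logic over every captured element.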

Best Practices

  • Always escape special characters for CSV and validate values for JSON.
  • Keep scripts simple and focused on one row or object at a time.
  • Test scripts on multiple elements to ensure consistent output.
  • Use consistent data types for numeric, date, or boolean fields.
  • Avoid modifying selectors in the script; scripts should only transform captured content.
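One way to act on the advice about testing on multiple elements and keeping data types consistent is a small check run over sample output. The sketch below is in Python, and the helper `find_inconsistent_types` is a hypothetical name, not a Doppelganger feature.

```python
def find_inconsistent_types(records):
    """Return the fields whose values have more than one type across records."""
    seen = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    # Keep only fields that appeared with two or more different types.
    return {key: types for key, types in seen.items() if len(types) > 1}


samples = [
    {"price": 19.99, "in_stock": True},
    {"price": "19.99", "in_stock": False},   # price is a string here
]
print(find_inconsistent_types(samples))
```

An empty result suggests the script produced the same type for each field on every sample; a non-empty one points to the fields worth revisiting.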

Extraction scripts in Doppelganger control how data is cleaned, structured, and exported. Used well, they help your tasks produce clean, consistent CSV or JSON output.