Order Uploader Documentation

The Order Uploader is a basic CLI application designed to simplify the process of uploading protein sequence data to both cloud storage (GCP) and the Benchling Electronic Lab Notebook (ELN) platform. This tool handles the complexities of data validation, metadata consistency checking, and some error correction to ensure your experimental data is properly archived and accessible for future analysis.

Overview

The Order Uploader processes CSV files containing protein sequence data and associated metadata, performing validation against your organization’s schema and metadata standards. It can infer some missing information, correct common typos through fuzzy matching, and organize score data from Scientific Compute Units (SCUs) into appropriately structured tables.

Key Features:

  • Design name generation: Automatically construct design names from metadata components

  • Fuzzy matching: Correct typos in metadata fields with user confirmation

  • Score column organization: Automatically categorize and upload score data to appropriate SCU tables

  • Interactive error handling: Guided correction of data inconsistencies

  • Benchling integration: Upload to Benchling ELN with proper folder organization

  • Monday.com integration: Validation against project tickets for consistency

Quick Start

Note

The cluster wrapper is installed at /runtime/scripts/order_uploader and is available on the head node as order_uploader.

Basic usage requires a CSV file with, at minimum, sequence and tag_location columns, plus the program ID. The program ID must be registered in the metadata source of truth with a Benchling Program ID that matches the name of the top-level project folder in Benchling. At the time of this release, the Benchling API does not support creating top-level project folders; you must use an existing one (or create one manually).

It is recommended to also provide the design_name column:

order_uploader --input-csv-path my_sequences.csv --user-id your.email@ziptx.bio --program-id PROG123

Example CSV:

Basic CSV format

design_name,sequence,tag_location
IL11_S1_Ic1_001,MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFE,N-term
IL11_S1_Ic1_002,AKQRQISFVKSHFSRQLEE,C-term
IL11_S1_Ic1_GSFus003,MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSGGGGGGGGGGGGGGGGGGGGGSSSSSSSSS,N-term

If you do not provide a design_name column, or wish to override the design names where relevant, you can provide the metadata as CLI arguments:

order_uploader \
  --input-csv-path my_sequences.csv \
  --program-id PROG123 \
  --target-id IL11 \
  --binding-site-id S1 \
  --iteration-number Ic1 \
  --user-id your.email@ziptx.bio \
  --monday-ticket-link "https://ziptx.monday.com/boards/123/items/456"

You can also provide the metadata as columns in your CSV:

CSV with metadata columns

program_id,target_id,binding_site,iteration,fusion_id,design_name,sequence,tag_location
PROG123,IL11,S1,Ic1,,IL11_S1_Ic1_001,MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFE,N-term
PROG123,IL11,S1,Ic1,,IL11_S1_Ic1_002,AKQRQISFVKSHFSRQLEE,C-term
PROG123,IL11,S1,Ic1,GSFus,IL11_S1_Ic1_GSFus003,MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSGGGGGGGGGGGGGGGGGGGGGSSSSSSSSS,N-term

Be aware that conflicts between in-CSV and CLI arguments are resolved in favor of the CLI only if you pass the --allow-cli-override flag. Otherwise, the tool will fail and ask you to resolve the conflict.

It is generally recommended to either provide the design names yourself, or supply metadata exclusively through the CLI or exclusively through CSV columns. Mixing approaches may hit edge cases the implementation cannot handle.
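The resolution rule above can be sketched as follows; the function name and error message are illustrative, not the tool's actual internals:

```python
def resolve_value(csv_value, cli_value, allow_cli_override):
    """Resolve one metadata value from CSV and CLI sources.

    Sketch of the documented rule: the CLI wins only when
    --allow-cli-override is set; a genuine conflict otherwise
    raises an error asking the user to fix the input.
    """
    if cli_value is None:
        return csv_value
    if csv_value is None or csv_value == cli_value:
        return cli_value
    if allow_cli_override:
        return cli_value  # CLI explicitly overrides the CSV
    raise ValueError(
        f"CSV value {csv_value!r} conflicts with CLI value {cli_value!r}; "
        "re-run with --allow-cli-override or correct the input"
    )
```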

Additional files

You can also upload additional files to the order bucket, such as configuration files or supplementary data. Directory names are supported: the tool recursively uploads all files in any directory path you provide, preserving the directory structure. Keep in mind that symlinks can break traversal, so avoid them here.

The Benchling order will include a top-level pointer to the GCS “folder” that contains the additional files.

order_uploader --input-csv-path my_sequences.csv --user-id your.email@ziptx.bio --additional-upload-paths config.json /path/to/analysis_plots/

Usage Scenarios

Scenario 1: Well-Formatted CSV Upload

When to use: You have a complete, properly formatted CSV with design names, sequences, and tag locations.

Example CSV (complete_order.csv):

Complete order CSV

design_name,sequence,tag_location
IL11_S1_Ic1_001,MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ…,N-term

Command:

order_uploader \
  --input-csv-path complete_order.csv \
  --user-id researcher@ziptx.bio

What happens:

  1. The tool validates design names against the metadata schema

  2. Confirms all required columns are present

  3. Checks for any additional score columns (none in this case)

  4. Creates appropriate Benchling folders and uploads data

  5. Registers protein entities and generates registry IDs

Scenario 2: CSV Without Design Names

When to use: You have sequences and metadata but need the tool to generate design names.

Example CSV (sequences_only.csv):

Sequences without design names

sequence,tag_location,program,target_id,binding_site,iteration
MKTAYIAKQRQISFVKSHFS…,N-term,PROG123,IL11,S1,Ic1
AKQRQISFVKSHFSRQLEE…,C-term,PROG123,IL11,S1,Ic1

This will generate design names like IL11_S1_Ic1_001, IL11_S1_Ic1_002, etc.

Command:

order_uploader \
  --input-csv-path sequences_only.csv \
  --user-id researcher@ziptx.bio

What happens:

  1. Tool extracts metadata from CSV columns

  2. Validates each metadata component against the schema

  3. Generates design names: IL11_S1_Ic1_001, IL11_S1_Ic1_002

  4. Proceeds with normal upload workflow
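The name-generation step can be sketched as follows; generate_design_names is a hypothetical helper that applies the sequential numbering described above, zero-padded to 3 digits:

```python
def generate_design_names(target_id, binding_site, iteration, count, start=1):
    """Generate sequential design names (sketch of the assumed behavior):
    Target_Site_Iteration_NNN, with NNN zero-padded to 3 digits."""
    return [
        f"{target_id}_{binding_site}_{iteration}_{i:03d}"
        for i in range(start, start + count)
    ]
```

For example, generate_design_names("IL11", "S1", "Ic1", 2) yields IL11_S1_Ic1_001 and IL11_S1_Ic1_002.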

Scenario 3: Conflicting Data Requiring Correction

When to use: Your CSV has inconsistencies that need user intervention.

Example: CSV with mismatched design names vs. inferred metadata.

Example CSV (conflicted_order.csv):

Conflicted data CSV

design_name,sequence,tag_location,target_id
IL11_S1_Ic1_001,MKTAYIAK…,N-term,FasR
IL11_S1_Ic1_002,AKQRQISF…,C-term,IL11

What happens:

  1. Tool detects target_id mismatch in first row (IL11 in name vs FasR in column)

  2. Prompts user with options:

    Design names clash with expectations based on the input configuration
    WARNING: if you choose (1), the 'raw' csv will be the original (mismatched) one provided at the beginning.
    Unable to find design names. Please select an option:
    1. Use the generated design names, continue to upload
    2. Save the generated names to file, and exit
    3. Exit
    
  3. If option 2 is chosen, the tool saves the corrected CSV for future use but does not upload to Benchling.
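The mismatch detection can be sketched like this; find_name_mismatches is a hypothetical helper, not the tool's API:

```python
def find_name_mismatches(provided_names, generated_names):
    """Pair the provided design names with names regenerated from the
    metadata columns and report rows where they disagree (sketch only;
    the tool's actual comparison may differ)."""
    return [
        (row, provided, generated)
        for row, (provided, generated) in enumerate(zip(provided_names, generated_names))
        if provided != generated
    ]
```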

Scenario 4: Handling Typos with Fuzzy Matching

When to use: Your metadata contains typos or slight variations from registered values.

Example: Using PROG12 instead of PROG123 or IL1 instead of IL11.

Command:

order_uploader \
  --input-csv-path order_with_typos.csv \
  --program-id PROG12 \
  --target-id IL1 \
  --user-id researcher@ziptx.bio

What happens:

  1. Tool fails to find exact match for PROG12

  2. Queries metadata tables for fuzzy matches

  3. Presents options:

    PROG12 does not match any registered names. Did you mean PROG123? (y/n):
    
  4. User confirms correction

  5. Continues with validated metadata

For multiple fuzzy matches:

Please select the number of the fuzzy match you meant to check for in program_table:
1. PROG123
2. PROG124
3. PROG125
4. Enter a new value
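A rough sketch of the fuzzy lookup using the standard library's difflib; the tool's real matcher and scoring may differ, and the 0-100 threshold scale here is an assumption borrowed from the fuzzy_match_threshold parameter in the API reference:

```python
import difflib

def fuzzy_candidates(query, registered, threshold=70):
    """Return registered names that fuzzily match the query, best first.

    Sketch only: difflib ratios (0.0-1.0) are rescaled to 0-100 to
    mimic the assumed threshold scale.
    """
    scored = [
        (name, int(difflib.SequenceMatcher(None, query.lower(), name.lower()).ratio() * 100))
        for name in registered
    ]
    return [name for name, score in sorted(scored, key=lambda t: -t[1]) if score >= threshold]
```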

Scenario 5: CSV with Score Columns

When to use: Your CSV includes additional score columns from Scientific Compute Units (SCUs).

Example CSV (order_with_scores.csv):

CSV with score columns

design_name,sequence,tag_location,stability_score,binding_affinity,expression_level
IL11_S1_Ic1_001,MKTAYIAK…,N-term,0.85,1.2e-9,145.2
IL11_S1_Ic1_002,AKQRQISF…,C-term,0.92,8.7e-10,162.8

What happens:

  1. Tool identifies stability_score, binding_affinity, expression_level as potential score columns

  2. Validates each against SCU schema endpoints

  3. Groups validated columns by their SCU table assignments

  4. Uploads core sequence data to main table

  5. Uploads score columns to appropriate SCU-specific tables (e.g., stability_table.csv, binding_table.csv)

If score columns have typos:

Unable to find these columns in the registered SCU values: {'expresion_level'}
WARNING: If you choose a fuzzy match, your score tables uploaded will not match the raw uploaded csv
Would you like to check for fuzzy matches? (y/n):

User can choose to:

  • Accept fuzzy matches with confirmation

  • Manually rename columns

  • Keep orphaned columns (uploaded as orphaned_score_columns.csv)
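The routing logic can be sketched as follows; the core-column set and the {table_id: [field, ...]} schema shape are assumptions for illustration:

```python
def split_score_columns(columns, scu_schema):
    """Group candidate score columns by SCU table; report orphans.

    Sketch of the assumed logic: any column that is not a core column
    is a candidate score column, matched against a hypothetical
    {table_id: [field, ...]} SCU schema mapping.
    """
    core = {"design_name", "sequence", "tag_location"}
    candidates = [c for c in columns if c not in core]
    by_table, orphaned = {}, set(candidates)
    for table_id, fields in scu_schema.items():
        found = [c for c in candidates if c in fields]
        if found:
            by_table[table_id] = found
            orphaned -= set(found)  # matched columns are no longer orphans
    return by_table, orphaned
```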

Design Naming Convention

ZipTx follows a specific naming convention for protein designs. Understanding this format is crucial for using the Order Uploader effectively.

Format: Target_Site_Iteration_Number

Components

Target

The target protein identifier (e.g., IL11, FasR, IL6, VEGFCD)

Site

The binding site identifier, which can be:

  • Simple site numbers: S1, S2, S3

  • Complex site identifiers: S3ab, S1c

  • Named sites: epitope1, binding_domain

Iteration

The design iteration following the format I{letter}{number}:

  • Letter component (a-z): advances with each unsuccessful previous design iteration (Ia, Ib, Ic)

  • Number component: integer starting from 1, no zero-padding; advances with each successful previous design iteration (Ic1, Ic2, Ic3)

Note: Once the first successful design iteration is found, all future designs will increment the number and retain the letter, even if a future design is unsuccessful.

Number

Sequential design number within the iteration, zero-padded to 3 digits (001, 002, 012)

Examples

Design naming examples

IL11_S1_Ia1_001: First iteration (no previous attempts), IL-11 site 1, design 1

FasR_S1_Ic1_005: 3rd iteration (2 unsuccessful previous attempts), FasR site 1, design 5

IL6_S3ab_Ia3_012: 3rd iteration (on previously successful designs), IL-6 site 3ab, design 12

Fusion Proteins

For designs fused to drug backbones, add the 3-letter drug code before the number:

Fusion protein naming examples

VEGFC/D_S1_Ia4_Eyl001: VEGFC/D target, site 1, 4th iteration, Eyl drug backbone, design 1

The tool automatically handles fusion protein naming when --fusion-id is provided or fusion information is included in the CSV.
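The full convention, including the optional fusion code, can be captured by a regular expression. This sketch reflects the rules above, but the character classes accepted for target and site names are assumptions, and since the examples show both 3-letter (Eyl) and longer (GSFus) fusion codes, any run of letters is accepted for the fusion part:

```python
import re

# Sketch of a validator for Target_Site_Iteration_[Fusion]Number.
DESIGN_NAME_RE = re.compile(
    r"^(?P<target>[A-Za-z0-9/]+)_"      # target, e.g. IL11 or VEGFC/D
    r"(?P<site>[A-Za-z0-9_]+?)_"        # site, e.g. S1, S3ab, binding_domain
    r"I(?P<letter>[a-z])(?P<iter_num>[1-9][0-9]*)_"  # iteration, e.g. Ic1
    r"(?P<fusion>[A-Za-z]+)?"           # optional fusion code, e.g. Eyl, GSFus
    r"(?P<number>[0-9]{3})$"            # zero-padded design number
)

def parse_design_name(name):
    """Return the name's components as a dict, or None if it does not parse."""
    m = DESIGN_NAME_RE.match(name)
    return m.groupdict() if m else None
```

For example, parse_design_name("FasR_S1_Ic1_005") returns target FasR, site S1, iteration letter c, and design number 005.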

CLI Reference

zcloud.console_scripts.order_uploader.upload_order(*args, **kwargs)

Upload an order to Benchling, validate and/or generate the sequence names and score columns.

If the input CSV design_name column is well formatted and follows the schema, then all you need is a CSV with the columns: design_name, sequence, tag_location.

You can also omit the name, as long as you provide columns with the appropriate program ID, target ID, binding site ID, etc. You may also provide missing data as CLI arguments; with --allow-cli-override, these override the data in the CSV (and the names are regenerated).

Parameters:
  • input_csv_path (str) – The path to the input CSV file.

  • program_id (Optional[str]) – The ID of the program.

  • target_id (Optional[str]) – The ID of the target.

  • binding_site_id (Optional[str]) – The ID of the binding site.

  • user_id (Optional[str]) – The ID of the user.

  • monday_ticket_link (Optional[str]) – The link to the Monday ticket.

  • iteration_number (Optional[str]) – The iteration number.

  • fusion_id (Optional[str]) – The ID of the fusion.

  • additional_upload_paths (Tuple[str, ...]) – Additional paths to upload; can be used to upload any files or directories recursively to the order bucket. Useful if you used an unusual config and want to record it.

  • allow_cli_override (bool) – Allow CLI arguments to override the data in the input CSV, default is to fail and complain.

Raises:
  • FileNotFoundError – If the input CSV file is not found.

  • ValueError – If required columns are missing or if there are validation errors.

  • SystemExit – If user chooses to exit during interactive prompts.

Core Parameters

--input-csv-path (required)

Path to your input CSV file containing sequence data.

--user-id

Your email address for attribution in Benchling entries.

--monday-ticket-link

URL to the Monday.com ticket for this design campaign. Used for consistency validation.

Metadata Parameters

These can be provided via CLI or included as columns in your CSV:

--program-id

The program identifier for this design campaign.

--target-id

The target protein identifier.

--binding-site-id

The binding site identifier on the target.

--fusion-id

The fusion construct identifier.

--iteration-number

The design iteration identifier.

Advanced Options

--allow-cli-override

Allow CLI arguments to override CSV data when conflicts arise. Default: False.

--additional-upload-paths

Additional files or directories to upload alongside the main data. Useful for configuration files or supplementary data.

--additional-upload-paths config.json analysis_plots/

Required CSV Columns

Mandatory columns:

  • sequence: The protein amino acid sequence

  • tag_location: Location of any tags (typically ‘N-term’, ‘C-term’, or ‘internal’)

Optional columns:

  • design_name: Complete design name following ZipTx convention (generated if missing)

  • program, target_id, binding_site, fusion_id, iteration: Metadata components for name generation

  • Additional columns are treated as score data and validated against SCU schemas
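The column classification above can be sketched as follows; classify_columns is a hypothetical helper (the document shows both program and program_id as metadata column names, so both are accepted here):

```python
REQUIRED = {"sequence", "tag_location"}
METADATA = {
    "design_name", "program", "program_id", "target_id",
    "binding_site", "fusion_id", "iteration",
}

def classify_columns(columns):
    """Split CSV columns into required, metadata, and score candidates.

    Sketch of the documented rule: required columns must be present,
    recognized metadata columns are optional, and anything else is
    treated as candidate score data for SCU validation.
    """
    missing = REQUIRED - set(columns)
    if missing:
        raise ValueError(f"Required columns missing from input CSV: {sorted(missing)}")
    return set(columns) - REQUIRED - METADATA
```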

API Reference

For developers and power users, key functions include:

Core Validation Functions

zcloud.benchling_order.check_program_id(program_id_query, eval_records=None, try_to_find_monday_id=None)[source]

Validate program ID against metadata oracle.

Parameters:
  • program_id_query (str) – Program ID to validate.

  • eval_records (Optional[List[Dict[str, str]]], optional) – Pre-loaded records to validate against instead of making API call, by default None.

  • try_to_find_monday_id (Optional[str], optional) – Monday ID to try to match against, by default None.

Returns:

Tuple containing (program_id_benchling, program_id_design, program_id_monday).

Return type:

Tuple[str, str, str]

Raises:

ValueError – If program ID cannot be found in the metadata.

zcloud.benchling_order.check_target_id(target_id_query, allowed_other_ids=None, eval_records=None, try_to_find_monday_id=None)[source]

Validate target ID against metadata oracle.

Parameters:
  • target_id_query (str) – Target ID to validate.

  • allowed_other_ids (Optional[List[str]], optional) – List of allowed program IDs for cross-validation, by default None.

  • eval_records (Optional[List[Dict[str, str]]], optional) – Pre-loaded records to validate against instead of making API call, by default None.

  • try_to_find_monday_id (Optional[str], optional) – Monday ID to try to match against, by default None.

Returns:

Tuple containing (matching_program_id, matching_target_id_benchling, matching_target_id_design, matching_target_id_monday, matching_target_id_internal).

Return type:

Tuple[str, str, str, str, str]

Raises:

UnableToFindMetadataError – If target ID cannot be found in the metadata.

zcloud.benchling_order.check_binding_site_id(binding_site_query, allowed_other_ids=None, eval_records=None)[source]

Validate binding site ID against metadata oracle.

Parameters:
  • binding_site_query (str) – Binding site ID to validate.

  • allowed_other_ids (Optional[List[str]], optional) – List of allowed target IDs for cross-validation, by default None.

  • eval_records (Optional[List[Dict[str, str]]], optional) – Pre-loaded records to validate against instead of making API call, by default None.

Returns:

Tuple containing (matching_target_id, binding_site_id_benchling, binding_site_id_design).

Return type:

Tuple[str, str, str]

Raises:

UnableToFindMetadataError – If binding site ID cannot be found in the metadata.

zcloud.benchling_order.check_fusion_id(fusion_id_query, eval_records=None)[source]

Validate a fusion ID against the metadata validator.

Parameters:
  • fusion_id_query (str) – Fusion ID to validate.

  • eval_records (Optional[List[Dict[str, str]]], optional) – Pre-loaded records to validate against instead of making API call, by default None.

Returns:

Tuple containing (fusion_id_internal, fusion_id_benchling, fusion_id_design).

Return type:

Tuple[str, str, str]

Raises:

UnableToFindMetadataError – If fusion ID cannot be found in the metadata.

Candidate Resolution

zcloud.console_scripts.order_uploader.confirm_single_value_from_user(value_type, df, cli_value, allow_cli_override)[source]

Confirm and resolve a single value from user input or DataFrame.

This function attempts to resolve a value from the DataFrame first, and if that fails, prompts the user for input or uses CLI override values.

Parameters:
  • value_type (str) – The type of value to resolve (e.g., ‘program_id’, ‘target_id’).

  • df (pd.DataFrame) – The input DataFrame containing the data to analyze.

  • cli_value (Optional[str]) – Optional CLI-provided value to use as override.

  • allow_cli_override (bool) – Whether to allow CLI values to override DataFrame values.

Returns:

The resolved value for the specified type.

Return type:

str

Raises:

SystemExit – If user chooses to exit during the confirmation process.

zcloud.console_scripts.order_uploader.confirm_set_of_values_from_user(value_type, df, cli_value, allow_cli_override)[source]

Confirm and resolve a set of values from user input or DataFrame.

This function attempts to resolve values from the DataFrame first, and if that fails, prompts the user for input or uses CLI override values.

Parameters:
  • value_type (str) – The type of value to resolve (e.g., ‘binding_site_id’, ‘fusion_id’).

  • df (pd.DataFrame) – The input DataFrame containing the data to analyze.

  • cli_value (Optional[str]) – Optional CLI-provided value to use as override.

  • allow_cli_override (bool) – Whether to allow CLI values to override DataFrame values.

Returns:

A set of resolved values for the specified type.

Return type:

Set[str]

Raises:

SystemExit – If user chooses to exit during the confirmation process.

Fuzzy Matching and Error Handling

zcloud.console_scripts.order_uploader.check_metadata_against_oracle_with_fuzzy_find_on_fail(metadata_table_id, query_value, allowed_other_ids=None, try_to_find_monday_id=None, fuzzy_match_threshold=70, _recursion_depth=0)[source]

Check metadata value against oracle with fuzzy matching fallback.

Attempts to validate a metadata value against the oracle database. If the exact match fails, falls back to fuzzy matching and user confirmation. Uses recursion to handle multiple validation attempts.

Parameters:
  • metadata_table_id (str) – The ID of the metadata table to check against.

  • query_value (str) – The value to validate.

  • allowed_other_ids (Optional[List[str]], optional) – Additional IDs that are allowed for validation, by default None.

  • try_to_find_monday_id (Optional[str], optional) – Monday ID to try to match against, by default None.

  • fuzzy_match_threshold (int, optional) – Threshold for fuzzy matching (0-100), by default 70.

  • _recursion_depth (int, optional) – Internal recursion depth counter, by default 0.

Returns:

A tuple containing the validated metadata information. The exact structure depends on the metadata checker function used.

Return type:

Tuple[Any, …]

Raises:
  • SystemExit – If maximum recursion depth is reached.

  • ValueError – If there are errors retrieving metadata tables.

zcloud.console_scripts.order_uploader.ask_user_to_confirm_fuzzy_match(query_val, fuzzy_matches, metadata_table_id)[source]

Ask the user to confirm a fuzzy match selection from a list of candidates.

Parameters:
  • query_val (str) – The original query value that did not match exactly.

  • fuzzy_matches (Iterable[str]) – A list of fuzzy match candidates.

  • metadata_table_id (str) – The ID of the metadata table being queried.

Returns:

The value selected by the user (either from fuzzy matches or a new value).

Return type:

str

Raises:

SystemExit – If user chooses to exit during the selection process.

SCU and Score Column Processing

zcloud.console_scripts.order_uploader.check_scu_against_oracle_with_fuzzy_find_on_fail(score_columns_to_check, _recursion_depth=0, all_tables=None, keymap=None)[source]

Check score columns against SCU oracle with fuzzy matching fallback.

Validates score column names against the SCU (Scientific Compute Unit) schema. If exact matches fail, provides fuzzy matching and user interaction to resolve column names. Handles orphaned columns that cannot be matched.

Parameters:
  • score_columns_to_check (Set[str]) – Set of score column names to validate.

  • _recursion_depth (int, optional) – Internal recursion depth counter, by default 0.

  • all_tables (Optional[Dict[str, List[Dict[str, str]]]], optional) – Cached SCU tables data to avoid repeated API calls, by default None.

  • keymap (Optional[Dict[str, str]], optional) – Mapping of original column names to corrected names, by default None.

Returns:

A tuple containing:

  • Dictionary mapping table IDs to lists of found field names

  • Set of orphaned score columns that couldn’t be matched

  • Dictionary mapping original column names to corrected names

Return type:

Tuple[Dict[str, List[str]], Set[str], Dict[str, str]]

Raises:
  • SystemExit – If maximum recursion depth is reached.

  • ValueError – If there are errors retrieving SCU tables.

zcloud.benchling_order.check_scu_schema(query_dict, all_tables=None)[source]

Check Scientific Compute Unit (SCU) schema against available tables.

Parameters:
  • query_dict (Dict[str, str]) – Dictionary containing query parameters with field information.

  • all_tables (Optional[Dict[str, List[Dict[str, str]]]], optional) – Pre-loaded table data to avoid API calls. If None, will make API call.

Returns:

Dictionary mapping table IDs to lists of found field names.

Return type:

Dict[str, List[str]]

Raises:

ValueError – If SCU validation fails with non-200 status code.

Design Name Generation

zcloud.benchling_order.pdapply_build_design_names_from_row(row, iteration, target_id, binding_site_id=None, fusion_id=None, override=False)[source]

Build a design name from a row, intended for pandas apply.

Parameters:
  • row (pd.Series) – Pandas Series representing a row of data.

  • iteration (str) – Iteration code to use in the design name.

  • target_id (str) – Target ID to use in the design name.

  • binding_site_id (Optional[str], optional) – Binding site ID to use. If None, will try to infer from row data.

  • fusion_id (Optional[str], optional) – Fusion ID to use. If None, will try to infer from row data.

  • override (bool, optional) – If True, use provided parameters directly. If False, try to infer missing values.

Returns:

Generated design name in the format target_id_binding_site_iteration_fusion_id###.

Return type:

str

zcloud.benchling_order.check_generated_design_names(df, generated_design_names, allow_cli_override=False)[source]

Check if generated design names match existing design names in DataFrame.

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing existing design names.

  • generated_design_names (pd.Series) – Series of generated design names to compare against.

  • allow_cli_override (bool, optional) – Whether to allow override and use generated names when there’s a mismatch, by default False.

Raises:

UnableToFindMetadataError – If design names don’t match and override is not allowed.

Return type:

None

Benchling Integration

zcloud.benchling_order.create_benchling_order_folder(program_id, target_id, iteration)[source]

Create a new folder in Benchling for the order.

Parameters:
  • program_id (str) – The program ID for the order.

  • target_id (str) – The target ID for the order.

  • iteration (str) – The iteration number for the order.

Returns:

Response containing folder creation details including registry_folder_id and iteration_folder_id.

Return type:

Dict

zcloud.benchling_order.register_protein_entities(protein_registry_folder_id, small_table_data)[source]

Register protein entities in Benchling.

Parameters:
  • protein_registry_folder_id (str) – The folder ID in Benchling where proteins should be registered.

  • small_table_data (List[Dict[str, str]]) – List of dictionaries containing protein data to register.

Returns:

Response containing registration details including aaSequences with entity registry IDs.

Return type:

Dict

zcloud.benchling_order.publish_benchling_entry(benchling_entry_query_dict)[source]

Create a new entry in Benchling.

Parameters:

benchling_entry_query_dict (Dict) – Dictionary containing entry data including sequence records, CSV data, entry name, GCS bucket path, author email, Monday ticket URL, and iteration folder ID.

Returns:

Response from the Benchling service indicating success or failure of entry creation.

Return type:

Dict

Troubleshooting

Common Issues and Solutions

“Required columns missing from input CSV”

Ensure your CSV includes at minimum sequence and tag_location columns.

“Multiple program IDs found from different sources”

Use --allow-cli-override to force CLI values, or ensure consistency between CSV columns and CLI arguments.

“Unable to find design names”

Choose option 2 to save generated names to a file, correct your original CSV, and re-run.

“Max recursion depth reached”

Too many correction attempts. Start over with corrected input data.

Score columns not found in SCU schema

Score columns that can’t be matched are uploaded as orphaned_score_columns.csv. Consider registering new columns in your SCU schema if they represent valid computed metrics.

Benchling folder creation failed

Check that your program/target/iteration combination is valid and that you have appropriate Benchling permissions.

Environment Setup

Ensure your environment has proper credentials configured for:

  • Google Cloud Storage access

  • Benchling API authentication

  • Monday.com API access (if using ticket validation)

VMs in the ZipTx cluster are generally configured via Workload Identity Federation and use the Compute Engine service account, which should already be authorized. When using another machine with something like gcloud auth application-default login, have your admin grant your user account the necessary permissions.

The tool will attempt to provide specific error messages if authentication fails for any service.

Tips for Success

  1. Start simple: Begin with a minimal CSV (just sequences and tag locations) and let the tool guide you through adding metadata.

  2. Use Monday integration: Providing --monday-ticket-link enables consistency checking against your project management system.

  3. Keep raw data: The tool preserves your original CSV alongside any corrections, ensuring data provenance.

  4. Leverage fuzzy matching: Don’t worry about minor typos; the tool’s fuzzy matching will help you correct them interactively. Be cautious only when the metadata source of truth contains many similar values.

  5. Double-check the source of truth: The tool validates metadata against the source of truth; if your program, target, or binding site is not registered, the tool will fail and not proceed.