Is there a parquet file size limit for loading files using Azure

Laurent Delaquis 0 Reputation points
2026-02-02T20:24:06.2866667+00:00

We are using an ADF pipeline to load a parquet file into a database table. On Friday the data file size was 16,384 KB. On Saturday and this morning, the file size is now 16,387 KB.

The data file is a full copy of data from another system. Any differences on a day-to-day basis are changes to the name of a site, a site being enabled/disabled, changes to other attributes of a site, and/or additions of sites to the file.

I've been informed that a total of 6 sites were added to the data that creates the file and no other changes were made.

The error we received was:

ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source '{removed file name}' with row number 24702: found more columns than expected column count 30.,Source=Microsoft.DataTransfer.Common,'

I've opened a CSV version of the file with Excel and there are no extra columns in the file. The files that errored have 4 additional data rows compared to the last successful load.

Thank you in advance for any assistance provided.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Manoj Kumar Boyini 5,795 Reputation points Microsoft External Staff Moderator
    2026-02-02T20:27:20.6133333+00:00

    Hi Laurent Delaquis,

    It looks like you're encountering a problem when trying to load a parquet file into a database using Azure Data Factory (ADF). The error message indicates that there are more columns in the data than defined, which is usually tied to how your data is structured or how the pipeline is configured. Here’s a step-by-step approach you can try:

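    Check the Column Count: The error reports more columns than the expected column count of 30, so double-check that the number of columns in the source file matches the schema defined in ADF. Note that the error code (DelimitedTextMoreColumnsThanDefined) and the 'Csv/Tsv Format Text' source named in the message indicate that the failing source is being read as delimited text, so confirm which dataset format the copy activity is actually using.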
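The column-count check above can be run locally against a CSV export of the file. The following is a minimal sketch using only Python's standard `csv` module; the function name `find_column_mismatches` and the three-column sample data are hypothetical, and the row numbering (1-based, header counted as row 1) is an assumption that may not match how ADF counts rows.

```python
import csv
import io


def find_column_mismatches(text, expected, delimiter=","):
    """Return (row_number, field_count) for rows whose field count differs from expected.

    Row numbers are 1-based and count the header as row 1 (an assumption;
    ADF may number rows differently).
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [
        (row_number, len(fields))
        for row_number, fields in enumerate(reader, start=1)
        if len(fields) != expected
    ]


# Hypothetical three-column sample; the real file is expected to have 30 columns.
sample = "id,name,enabled\n1,North Depot,true\n2,South Yard, Unit B,true\n"
print(find_column_mismatches(sample, expected=3))  # [(3, 4)]
```

For the real file, the text would be read from disk and `expected` set to 30. An empty result would suggest that the file is well-formed under standard quoting rules and that the mismatch comes from how the dataset is configured to parse it.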

    File Inspection: You mentioned that opening the CSV version in Excel shows no extra columns. It's worth inspecting the actual parquet file as well. Use tools like Azure Data Lake Storage or Azure Synapse Analytics to inspect the parquet file's schema directly.

    ADF Pipeline Configuration:

    • Ensure that your dataset schema in ADF matches exactly with that of the parquet file.
    • Look for any changes in the incoming files; since you mentioned new sites were added, make sure any new attributes are accounted for in the schema.

    Review Data in Context: The error occurred at row number 24,702. Check if that specific row contains any unexpected data or formatting issues that could cause misalignment.

    Adjust ADF Limits: If the file size (16,387 KB) is nearing any limits, consider reviewing Azure Data Factory Limits to ensure you're within acceptable thresholds.

    Break Down the Pipeline: If you continue to see issues, consider breaking your pipeline into smaller chunks. This might help isolate the problem and avoid hitting any memory limits, as sometimes larger files can lead to complications.

    Try a Different Runtime: If you're running into resource constraints, using a self-hosted integration runtime with more memory could alleviate the issues you're encountering. Ensure you have proper configuration as described in the documentation about Self-hosted IR settings.
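For the row-level check mentioned above (the error points at row 24,702), it can help to compare how a single raw line splits with and without quote handling. This is only a sketch of one possible cause, not a diagnosis confirmed in this thread: a newly added site whose name contains the delimiter would look fine in Excel, which honours quotes, yet could be counted as having an extra column by a reader whose quote or escape settings do not match the file. The function name `field_counts` and the sample row are hypothetical.

```python
import csv


def field_counts(line, delimiter=",", quotechar='"'):
    """Return (naive_count, quote_aware_count) for one raw line of delimited text."""
    # Naive split: every delimiter starts a new field, quotes are ignored.
    naive = len(line.split(delimiter))
    # Quote-aware split: delimiters inside quoted fields do not start a new field.
    quote_aware = len(next(csv.reader([line], delimiter=delimiter, quotechar=quotechar)))
    return naive, quote_aware


# Hypothetical row: a site name that contains the delimiter inside quotes.
row = '24702,"Harbour Site, Dock 4",enabled'
print(field_counts(row))  # (4, 3)
```

If the two counts differ for the failing row, the quote and escape character settings on the source dataset are worth reviewing before looking at file size limits.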

    Here's a set of follow-up questions that might help clarify the situation:

    • Have you modified the schema in the target database recently?
    • Are there any specific transformations applied within the ADF pipeline that could affect the column count?
    • Is there a specific pattern to the rows that return errors, or do they seem random?

    I hope this helps! Feel free to reach out with any more details, and we can dig deeper into the issue together. Good luck!
