Importing CSV files with multi-character delimiters e.g. “!#”

I previously came across an issue where an ImportWP user was trying to set up an import of a CSV file, fetched from a remote URL, that used a multi-character delimiter. With a single-character delimiter such as “,” this would be no problem, as the delimiter can be changed from within the ImportWP interface for that importer. This file, however, used “!#” to separate its data columns.

Looking into how PHP processes CSV files, all of the documentation points towards it only allowing a single-character delimiter, and my research suggested that writing a custom CSV parser was not advisable because of the many complex cases it would have to handle. That left me with one other option: add the ability to preprocess the CSV file before the importer reads it.
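To illustrate the idea on a single row, here is a minimal sketch. The sample row is made up, and str_getcsv() is used purely for demonstration; how ImportWP reads the file internally is not covered here.

```php
<?php
// Hypothetical sample row that uses "!#" as its delimiter
$line = 'id!#title!#price';

// Swap the two-character delimiter for a single comma first
$normalised = str_replace( '!#', ',', $line );

// str_getcsv() can now split the row using a one-character delimiter
$columns = str_getcsv( $normalised, ',', '"', '\\' );

print_r( $columns );
```

After the replacement the row splits into the three columns id, title and price, which is what the preprocessing step needs to achieve for the whole file.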

Using ImportWP’s ‘iwp/importer/file_uploaded’ action, I could preprocess the CSV file, searching for every instance of “!#” and replacing it with a single-character delimiter that the importer can then parse. The following code shows how to do this, and limits the replacement to a specific importer based on its ID.

function jciwp_replace_csv_delimiter($attachment, $importer_id){

   // Limit it to a specific importer id
   if ( 44078 === $importer_id ) {

      if ( isset( $attachment['dest'] ) && file_exists( $attachment['dest'] ) ) {

         // Read the whole file into a string, then search and replace
         $contents = file_get_contents( $attachment['dest'] );

         // Only write the file back if it could actually be read
         if ( false !== $contents ) {
            $contents = str_replace( '!#', ',', $contents );
            file_put_contents( $attachment['dest'], $contents );
         }
      }
   }
   
}
add_action( 'iwp/importer/file_uploaded', 'jciwp_replace_csv_delimiter', 10, 2 );

This is a very basic example of replacing a CSV file’s multi-character delimiter with a single-character one. It assumes that none of the field values contain a comma; if they do, choose a replacement character that does not appear in the data. On large files, memory usage can be reduced by streaming through the file instead of loading its entire contents at once.
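As a rough sketch of the streaming approach mentioned above, the function below rewrites the file one line at a time. The function name, its parameters and the “.tmp” suffix are hypothetical and not part of ImportWP, and it assumes the delimiter never contains a line break, so a delimiter cannot be split across two reads.

```php
<?php
/**
 * Hypothetical helper (not part of ImportWP): replace a multi-character
 * delimiter in a file without loading the whole file into memory.
 */
function jciwp_stream_replace_delimiter( $path, $search = '!#', $replace = ',' ) {

   $in = fopen( $path, 'rb' );
   if ( false === $in ) {
      return false;
   }

   // Write to a temporary file next to the original, then swap it in
   $tmp_path = $path . '.tmp';
   $out      = fopen( $tmp_path, 'wb' );
   if ( false === $out ) {
      fclose( $in );
      return false;
   }

   // fgets() returns one line per call, so memory use is bounded by
   // the longest line rather than by the size of the file
   while ( false !== ( $line = fgets( $in ) ) ) {
      fwrite( $out, str_replace( $search, $replace, $line ) );
   }

   fclose( $in );
   fclose( $out );

   // Replace the original file with the rewritten copy
   return rename( $tmp_path, $path );
}
```

Inside the action callback, the read, replace and write steps could then be swapped for a single call such as jciwp_stream_replace_delimiter( $attachment['dest'] ).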