Script to split huge text file into chunks with headers

As a PowerShell newbie, I’m pretty happy with this one, even if it relies on calling the split utility from Cygwin and a DOS copy command to perform…

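$path = 'H:\DataDump_DEV\MME\seed\ivln\'
$FileName = 'bi_extract.ivln'
$SourceFile = $path + $FileName + '.csv'
$headerfile = $path + $FileName + '-header.csv'   # header row prepended to every chunk
$chunk = '65535k'                                  # max bytes of whole lines per chunk (~64 MB)
$split = 'C:\Users\BI\cygwin\bin\split.exe'

# -C: split on line boundaries; -a 5 -d: 5-digit numeric suffixes (_split_00000, _split_00001, ...)
& $split -C $chunk -a 5 -d $SourceFile ($path + $FileName + '_split_')
if ($LASTEXITCODE -ne 0) { throw "split.exe failed with exit code $LASTEXITCODE" }

# Snapshot the raw chunks first, so the .csv files created below are never picked up again
$chunks = @(Get-ChildItem -Path ($path + $FileName + '_split_*') -Exclude '*.csv')

foreach ($c in $chunks) {
    # copy /b header + chunk concatenates the two files byte for byte into the .csv
    cmd /c copy /b $headerfile + $c.FullName ($path + $c.Name + '.csv') > $null
    # Remove the raw chunk only if the copy succeeded
    if ($LASTEXITCODE -eq 0) { Remove-Item $c.FullName }
}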

split.exe is the Windows port of the Unix split command; I downloaded it as part of Cygwin. However, I didn’t want to install Cygwin on the server, so I just copied the split.exe file across.
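If copying split.exe onto the server is not an option, the same job can be done in PowerShell alone. This is a different technique from the script above: instead of split followed by copy /b, it streams the source once with .NET's StreamReader and writes each chunk with the header already in place, so each byte is written only once. It is a minimal sketch under stated assumptions, not a drop-in replacement: Split-FileWithHeader and its parameters are hypothetical names, it assumes UTF-8 text and writes CRLF line endings, it expects full paths (.NET resolves relative paths against the process directory, not the current PowerShell location), and unlike split -C, which breaks a line longer than the limit across files, it puts such a line in its own oversized chunk. It has not been benchmarked against the split.exe version.

```powershell
# Hypothetical helper (not from the original post): split a text file into chunks of
# whole lines, each holding at most $MaxBytes bytes of data plus the header line,
# written as <Prefix>00000.csv, <Prefix>00001.csv, ...
function Split-FileWithHeader {
    param(
        [Parameter(Mandatory)] [string] $SourceFile,
        [Parameter(Mandatory)] [string] $HeaderFile,
        [Parameter(Mandatory)] [string] $Prefix,
        [long] $MaxBytes = 65535KB
    )
    # Assumption: UTF-8 text; output is written without a BOM, using CRLF line endings
    $enc = New-Object System.Text.UTF8Encoding($false)
    $newline = "`r`n"
    $nlBytes = $enc.GetByteCount($newline)
    # Header line without its trailing line break (re-added by WriteLine)
    $header = [System.IO.File]::ReadAllText($HeaderFile, $enc).TrimEnd([char[]]"`r`n")

    $reader = New-Object System.IO.StreamReader($SourceFile, $enc)
    $writer = $null
    $index = 0
    $written = 0L
    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            $size = $enc.GetByteCount($line) + $nlBytes
            # Open the first chunk, or start a new one when this line would exceed the limit
            if ($null -eq $writer -or ($written -gt 0 -and $written + $size -gt $MaxBytes)) {
                if ($null -ne $writer) { $writer.Dispose() }
                $name = '{0}{1:D5}.csv' -f $Prefix, $index
                $writer = New-Object System.IO.StreamWriter($name, $false, $enc)
                $writer.NewLine = $newline
                $writer.WriteLine($header)
                $index++
                $written = 0L
            }
            $writer.WriteLine($line)
            $written += $size
        }
    }
    finally {
        if ($null -ne $writer) { $writer.Dispose() }
        $reader.Dispose()
    }
    return $index   # number of chunk files written
}
```

Called with the post's values (-SourceFile H:\DataDump_DEV\MME\seed\ivln\bi_extract.ivln.csv, -HeaderFile H:\DataDump_DEV\MME\seed\ivln\bi_extract.ivln-header.csv, -Prefix H:\DataDump_DEV\MME\seed\ivln\bi_extract.ivln_split_), it would produce the same bi_extract.ivln_split_00000.csv style names as the split/copy version. The data limit deliberately excludes the header, mirroring the original, where split sizes the chunks before copy /b adds the header.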

This performs pretty well on our VM: it processed 200 GB of text files into 64 MB chunks in about 2 hours. It’s a one-off process to seed the historical invoices into the DWH, so I have no need to seek further enhancements.
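As a rough sanity check on those figures, the arithmetic below works out how many chunks a 200 GB input produces at the script's 65535k chunk size, the end-to-end throughput implied by "about 2 hours", and whether the 5-digit numeric suffix from -a 5 -d leaves enough room. It assumes GB and MB are binary units (GiB/MiB) and that 65535k means 65535 × 1024 bytes, which matches the roughly 64 MB chunks reported; with decimal units the numbers shift slightly.

```powershell
# Back-of-envelope figures for the run described above.
# Assumption: 200GB = 200 GiB and the 65535k chunk size = 65535 KiB (PowerShell's GB/KB suffixes are binary).
$totalBytes  = 200GB
$chunkBytes  = 65535KB
$elapsedSecs = 2 * 3600

# split -C never exceeds the chunk size, so this is a lower bound on the number of chunks
$minChunks = [math]::Ceiling($totalBytes / $chunkBytes)
# End-to-end rate of the whole job (split + copy), in MiB per second
$mibPerSec = [math]::Round(($totalBytes / 1MB) / $elapsedSecs, 1)
# -a 5 -d gives suffixes 00000..99999, i.e. room for 100,000 chunk files
$suffixCapacity = [math]::Pow(10, 5)

"At least $minChunks chunks, about $mibPerSec MiB/s; suffix room for $suffixCapacity files"
```

That gives at least 3201 chunks at about 28.4 MiB/s, comfortably inside the 100,000 names the suffix allows. Because split writes every byte once and copy /b then reads it back and writes it again, the disks move roughly four times the input volume, so the I/O rate the VM actually sustained is well above that end-to-end figure.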

