Bash Script to Split Large CSV Files

At times I work with very large CSV files and, unfortunately, not all systems can handle massive file imports. In those cases, I need a quick and easy way to split up one massive CSV file into many smaller CSV files. I’ve written the bash script below to do just that. The approach is based on the one suggested by Mark Setchell on StackOverflow.


#!/bin/bash
# Author: Daniel Ziegler (drziegler.net)

#Accept the number of rows per output file and the input filename as command line arguments
SPLITNUMBER=$1
FILENAME=$2

#Get the base filename and the file extension (probably .csv or .txt)
BASEFILENAME=${FILENAME%.*}
FILEEXTENSION=${FILENAME##*.}

#Extract the first line of the input file as the header row
HDR=$(head -1 "$FILENAME")

#split the file into chunks of SPLITNUMBER lines, using the prefix "num" for the temporary chunk files
split -l "$SPLITNUMBER" "$FILENAME" num

n=1

#loop through the chunks and output to new files in the current directory. Output files will be named "filename-n.ext" where "filename" is the input file name and ".ext" is the extension of that file
for f in num*
do

#include the header row unless we're looking at the first chunk, which already includes the header row; the bare redirect in the else branch just makes sure the output file starts out empty
if [ $n -gt 1 ]
then echo "$HDR" > "$BASEFILENAME-${n}.$FILEEXTENSION"
else : > "$BASEFILENAME-${n}.$FILEEXTENSION"
fi

cat "$f" >> "$BASEFILENAME-${n}.$FILEEXTENSION"

rm "$f"

((n++))

done
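
A quick aside on the two parameter expansions near the top of the script: they are standard shell string operations that split the input name at its last dot. With a hypothetical input named data.csv they behave like this:

FILENAME=data.csv
echo "${FILENAME%.*}"    #removes the shortest match of ".*" from the end, printing "data"
echo "${FILENAME##*.}"   #removes the longest match of "*." from the start, printing "csv"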

Usage is just ./scriptname.sh XXX filetosplit, where “XXX” is the number of lines you want in each smaller file and “filetosplit” is the input CSV file you’re looking to split up.
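
For example, assuming you’ve saved the script as splitfile.sh (and made it executable with chmod +x splitfile.sh), splitting a hypothetical orders.csv into pieces of roughly 1,000 lines looks like this:

./splitfile.sh 1000 orders.csv

That writes orders-1.csv, orders-2.csv, and so on into the current directory, each containing at most 1,000 lines from the original plus the repeated header row in every piece after the first. The original orders.csv is left untouched.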

You can download a Zip of this script here: splitfile.sh

Notes

  • This script assumes that your input file starts with a header row. This header row will be repeated in each output file (there’s a quick sanity check below these notes).
  • It should work with any file type; I just use it primarily for CSV files.
  • I can’t take any responsibility if this script breaks something! Check the code before you run it!
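
If you want a quick sanity check after a run (sticking with the hypothetical orders.csv example above), the following confirms that every piece starts with the same header row and shows how the line counts add up:

head -1 orders.csv orders-*.csv
wc -l orders.csv orders-*.csv

The totals from wc won’t match exactly: the pieces together contain one extra copy of the header line for every piece after the first.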
