Friday, March 1, 2013

Rapid file duplicator or copier


Recently, I was given the task of checking the document processing capability of an application. The objective of the test was to check whether the system can consume 2 million documents within one hour.

The system is configured to consume documents from a specified folder, but the question is how to create 2 million documents in a folder. Copying the files manually takes a lot of time, and the test needs to be repeated multiple times. The Windows file system does not work optimally when a folder contains more than 5000 files, so the files need to be split across sub-folders, each sub-folder containing 4000 files, for 2 million files in total.
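The folder arithmetic above can be checked with a few lines of VBScript. This is only a sketch: TotalFiles, FilesPerFolder and FolderCount are names made up for illustration, and the figures are the ones quoted in this post.

```vbscript
' Sketch: how many sub-folders are needed for the test data set.
' TotalFiles and FilesPerFolder are illustrative names, not part of the scripts below.
TotalFiles = 2000000
FilesPerFolder = 4000

' Integer division, rounded up so that a partial last folder is still counted
FolderCount = TotalFiles \ FilesPerFolder
If TotalFiles Mod FilesPerFolder <> 0 Then
    FolderCount = FolderCount + 1
End If

WScript.Echo "Sub-folders needed: " & FolderCount
```

With these figures the script reports 500 sub-folders.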

I first created a simple VBScript program that uses the CopyFile method to create duplicate files; it took nearly 20 hours. I then redesigned the program to run the copy method in parallel, so that the task can be accomplished in 1 hour by utilizing 100% of the CPU capacity.

The solution consists of two programs. The initiator program launches the duplicator program multiple times, so that each instance runs in parallel as a separate wscript process and the task is accomplished quickly. The program settings need to be tweaked to suit the system configuration so that the right number of instances runs: neither too many nor too few.
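As a starting point for that tuning, the number of launches could be derived from the number of logical processors. The sketch below is an assumption of mine, not part of the original scripts: ProcessesPerCore, LaunchCount and FilesPerLaunch are hypothetical names, and the multiplier is a guess to be adjusted by experiment, since file copying is often limited by the disk rather than the CPU.

```vbscript
' Sketch: derive a block size from the number of logical processors.
' ProcessesPerCore is a guess to be tuned by experiment, not a measured value.
Set wshShell = CreateObject("WScript.Shell")
CoreCount = CLng(wshShell.Environment("PROCESS")("NUMBER_OF_PROCESSORS"))

NumberOfCopies = 1000000
ProcessesPerCore = 4
LaunchCount = CoreCount * ProcessesPerCore

' Files handled by each duplicator process, rounded up
FilesPerLaunch = (NumberOfCopies + LaunchCount - 1) \ LaunchCount

WScript.Echo "Processors: " & CoreCount
WScript.Echo "Launches: " & LaunchCount & ", files per launch: " & FilesPerLaunch

Set wshShell = Nothing
```

Because Initiator.vbs expects the number of copies to divide exactly by the block size, the suggested value may need to be adjusted to the nearest exact divisor before it is used.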

Program download link

Initiator.vbs 


'Make sure you have enough disk space; this program runs many parallel wscript processes and can use 100% of the CPU
'Just copy the files into any folder; the sub-folders are created automatically
'Choose the partition settings below to suit your system.

FileName = "test.docx" 'Make sure the file exists in this folder
NumberOfCopies = 100 'Must be exactly divisible by the number below (remainder 0) - 1000000 (Actual Test)
NumberOfPartations = 10 'Number of files in each partition (block); one duplicator process is launched per block - 10000 (Actual Test)
TimeStampGenerationAfterCopies = 10 'Write a time stamp to the log file after this many files are created - 1000 (Actual Test)
NumberOfFilesInFolder = 4 'Number of files inside each folder - 3000 (Actual Test)


'-------------------------------------------------------------------------------------

'Number of digits used to zero-pad the file index, so that the files sort sequentially
Zeros = len(NumberOfCopies)

Const ForAppending = 8
count = NumberOfCopies/NumberOfPartations

Set wshShell = CreateObject( "WScript.Shell" )
set fso=CreateObject("Scripting.FileSystemObject")
WorkingDirectory = fso.GetParentFolderName(Wscript.ScriptFullName)

'Check folder exist, else create the folder
strFolder = WorkingDirectory & "\Duplicates\"
If Not fso.FolderExists(strFolder) Then
   fso.CreateFolder(strFolder)
End If

strFolder = WorkingDirectory & "\log\"
If Not fso.FolderExists(strFolder) Then
   fso.CreateFolder(strFolder)
End If

Set MyFile = fso.OpenTextFile(WorkingDirectory & "\log\log.txt", ForAppending, True)
MyFile.WriteLine("###################################################")
MyFile.WriteLine("File Duplication Launch Start:"  & funGetTimeStamp())
MyFile.WriteLine("Total Launches:"  & count)

Start = 1
End1 = NumberOfPartations

for count1 = 1 to count

'Quote the script path and the file name so that paths containing spaces still work
strFileName = """" & WorkingDirectory & "\FileDuplicator.vbs""" & " " & Start & " " & End1 & " " & count1 & " " & TimeStampGenerationAfterCopies & " " & NumberOfFilesInFolder & " """ & FileName & """ " & Zeros
wshShell.Run "wscript " & strFileName, 1, False
WScript.Sleep 3000

Start = Start + NumberOfPartations
End1 = End1 + NumberOfPartations


next

MyFile.WriteLine("File Duplication Launch  End:"  & funGetTimeStamp())
MyFile.Close

Set MyFile = Nothing
Set fso = Nothing
Set wshShell = Nothing


Function funGetTimeStamp()
sDateTIme = Now()

iDate = Datepart("d",sDateTime)
iLen = Len(iDate)
If iLen = 1 Then
iDate = "0" & iDate
End If

sMonth=  mid(MonthName(Datepart("m",sDateTime)),1,3)

iYear = Datepart("yyyy",sDateTime)

iHour = Datepart("h",sDateTime)
iLen = Len(iHour)
If iLen = 1 Then
iHour = "0" & iHour
End If

iMinute = Datepart("n",sDateTime)
iLen = Len(iMinute)
If iLen = 1 Then
iMinute = "0" & iMinute
End If

iSec = Datepart("s",sDateTime)
iLen = Len(iSec)
If iLen = 1 Then
iSec = "0" & iSec
End If


funGetTimeStamp =  sMonth & "_" &  iDate & "_" & iYear & "_" & iHour & "_" & iMinute & "_" & iSec

End Function
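The zero-padding steps in funGetTimeStamp could be written more compactly. The following is a sketch only, not the version used in the test: PadLeft and GetTimeStampCompact are hypothetical helper names, and the output format is meant to match the function above (the month abbreviation depends on the system locale).

```vbscript
' Sketch: shorter equivalents of the padding logic used in these scripts.
' PadLeft and GetTimeStampCompact are hypothetical helper names.
Function PadLeft(value, width)
    ' Prefix with zeros; values already at least "width" characters long are returned unchanged
    If Len(value) >= width Then
        PadLeft = CStr(value)
    Else
        PadLeft = String(width - Len(value), "0") & value
    End If
End Function

Function GetTimeStampCompact()
    Dim dtNow
    dtNow = Now()
    GetTimeStampCompact = Left(MonthName(Month(dtNow)), 3) & "_" & _
        PadLeft(Day(dtNow), 2) & "_" & Year(dtNow) & "_" & _
        PadLeft(Hour(dtNow), 2) & "_" & PadLeft(Minute(dtNow), 2) & "_" & _
        PadLeft(Second(dtNow), 2)
End Function

WScript.Echo PadLeft(7, 3)
WScript.Echo GetTimeStampCompact()
```

The same PadLeft helper could also replace the zero-appending loop that builds the file index in the duplicator.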


FileDuplicator.vbs


'This program needs to be called by Initiator.vbs, which passes the necessary command line parameters
'Arguments
Set objArgs = WScript.Arguments
StartIndex = clng(objArgs(0))
EndIndex = clng(objArgs(1))
LaunchID = clng(objArgs(2))
TimeStampGenerationAfterCopies = clng(objArgs(3))
NumberOfFilesInFolder = clng(objArgs(4))
FileName = objArgs(5)
Zeros = clng(objArgs(6))
Set objArgs = Nothing

FolderIndex = 1
FileCount = 1


TimeStampGenerationAfterCopies1 = TimeStampGenerationAfterCopies

set fso=CreateObject("Scripting.FileSystemObject")
WorkingDirectory = fso.GetParentFolderName(Wscript.ScriptFullName)

strFolder = WorkingDirectory & "\Duplicates\" & LaunchID & "_" & FolderIndex & "\"
If Not fso.FolderExists(strFolder) Then
   fso.CreateFolder(strFolder)
End If

'Split the file name into base name and extension (works for any extension, not only .docx)
JustFileName = fso.GetBaseName(FileName)
JUstFileExt = "." & fso.GetExtensionName(FileName)

LogFile = "\log\log_" & LaunchID & ".txt"

OrginalFileNamePath = WorkingDirectory & "\" & FileName

'Declare the constant before it is used
Const ForAppending = 8

Set MyFile = fso.OpenTextFile(WorkingDirectory & LogFile, ForAppending, True)
MyFile.WriteLine("##########################################################")
MyFile.WriteLine("LaunchID:" & LaunchID & "---" & "Start:" & funGetTimeStamp())
MyFile.WriteLine("LaunchID:" & LaunchID & "---" & "StartIndex:" & StartIndex)
MyFile.WriteLine("LaunchID:" & LaunchID & "---" & "  EndIndex:" & EndIndex)
MyFile.WriteLine("LaunchID:" & LaunchID & "---" & "  TimeStamp Generated after number of files:" & TimeStampGenerationAfterCopies)

'Log the first time stamp once exactly TimeStampGenerationAfterCopies files have been copied
'StartIndex is used as passed in, so that consecutive launches do not copy the same file index twice
TimeStampCounter = StartIndex + TimeStampGenerationAfterCopies - 1

for count = StartIndex to EndIndex
'Logic to append zeros
FileIndexLength = len(count)
FileIndex = count
for count1 = 1 to (Zeros - FileIndexLength)
FileIndex = "0" & FileIndex
next

DuplicateFileNamePath = strFolder & JustFileName & "_" & FileIndex & JUstFileExt
fso.CopyFile OrginalFileNamePath, DuplicateFileNamePath , True

if TimeStampCounter = count then
MyFile.WriteLine("LaunchID:" & LaunchID & "---" & "Total Files Duplicated:" & TimeStampGenerationAfterCopies1 & "---" & funGetTimeStamp())
TimeStampCounter = TimeStampCounter + TimeStampGenerationAfterCopies
TimeStampGenerationAfterCopies1 = TimeStampGenerationAfterCopies1 + TimeStampGenerationAfterCopies
else
end if

if FileCount = NumberOfFilesInFolder then
FileCount = 0
FolderIndex = FolderIndex + 1
strFolder = WorkingDirectory & "\Duplicates\" & LaunchID & "_" & FolderIndex & "\"
If Not fso.FolderExists(strFolder) Then
fso.CreateFolder(strFolder)
End If
End If
FileCount = FileCount + 1

next

MyFile.WriteLine("LaunchID:" & LaunchID & "---" & "  End:" & funGetTimeStamp())
MyFile.Close
Set MyFile = Nothing

'wscript.echo "File Duplication Completed. Total Files:" & NumberOfCopies

Set fso = Nothing



Function funGetTimeStamp()
sDateTIme = Now()

iDate = Datepart("d",sDateTime)
iLen = Len(iDate)
If iLen = 1 Then
iDate = "0" & iDate
End If

sMonth=  mid(MonthName(Datepart("m",sDateTime)),1,3)

iYear = Datepart("yyyy",sDateTime)

iHour = Datepart("h",sDateTime)
iLen = Len(iHour)
If iLen = 1 Then
iHour = "0" & iHour
End If

iMinute = Datepart("n",sDateTime)
iLen = Len(iMinute)
If iLen = 1 Then
iMinute = "0" & iMinute
End If

iSec = Datepart("s",sDateTime)
iLen = Len(iSec)
If iLen = 1 Then
iSec = "0" & iSec
End If


funGetTimeStamp =  sMonth & "_" &  iDate & "_" & iYear & "_" & iHour & "_" & iMinute & "_" & iSec

End Function
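After a run it can be useful to confirm how many files were actually created. The sketch below is an addition of mine rather than part of the original scripts; it assumes it is saved next to Initiator.vbs, and DuplicatesPath, SubFolder and Total are illustrative names.

```vbscript
' Sketch: count the files under the Duplicates folder after a run.
' Assumes this script sits next to Initiator.vbs, where the Duplicates folder is created.
Set fso = CreateObject("Scripting.FileSystemObject")
WorkingDirectory = fso.GetParentFolderName(WScript.ScriptFullName)
DuplicatesPath = WorkingDirectory & "\Duplicates"

Total = 0
If fso.FolderExists(DuplicatesPath) Then
    ' The duplicator creates one level of sub-folders named LaunchID_FolderIndex
    For Each SubFolder In fso.GetFolder(DuplicatesPath).SubFolders
        Total = Total + SubFolder.Files.Count
    Next
End If

WScript.Echo "Files found: " & Total

Set fso = Nothing
```

Run it only after all duplicator processes have finished, otherwise the total will still be changing.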


Folder structure (Create below folders at any location in your file system)




