How to tar.gz many similar-size files into multiple archives with a size limit
I'm on Ubuntu 16.04.

I have a folder with a lot of text files (almost 12k). I need to upload them all to a website that accepts .tar.gz uploads and decompresses them automatically, but it has a limit of 10 MB (10000 kB) per file (so, in particular, each file has to be decompressed on its own). If I tar.gz all of these files, the resulting file is about 72 MB.

What I would like to do is to create 8 .tar.gz files, each of size (strictly) smaller than 10000 kB.

Alternatively, one can assume that all the files above have approximately the same size, so I would like to create 8 .tar.gz files with more or less the same number of files each.

How can I do either of these two tasks?

I am fine with a solution that involves a GUI, the CLI, or scripting. I am not looking for speed here, I just need it done.
Totally patchwork and a quick, rough sketch as it is, but, tested on a directory with 3000 files, the script below did an extremely fast job:
```python
#!/usr/bin/env python3
import subprocess
import os
import sys

splitinto = 2

dr = sys.argv[1]
os.chdir(dr)
files = os.listdir(dr)
n_files = len(files)
size = n_files // splitinto

def compress(tar, files):
    command = ["tar", "-zcvf", "tarfile" + str(tar) + ".tar.gz", "--null", "-T", "-"]
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    with proc:
        proc.stdin.write(b'\0'.join(map(str.encode, files)))
        proc.stdin.write(b'\0')
    if proc.returncode:
        sys.exit(proc.returncode)

sub = []; tar = 1
for f in files:
    sub.append(f)
    if len(sub) == size:
        compress(tar, sub)
        sub = []; tar += 1

if sub:
    # taking care of the leftovers
    compress(tar, sub)
```
How to use
- Save the script into an empty file as `compress_split.py`
- In the head section, set the number of files to compress into. In practice, there will always be one more to take care of the remaining few "leftovers".
Then run it with the directory with the files as argument:

```
python3 /path/to/compress_split.py /directory/with/files/tocompress
```
Numbered .tar.gz files will be created in the same directory as where the files are.
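Since the site enforces a hard 10000 kB limit, it may be worth checking the resulting archives before uploading. A minimal sketch, assuming the `tarfileN.tar.gz` naming used by the script above and run from the directory the archives were written to:

```python
#!/usr/bin/env python3
import glob
import os

limit_kb = 10000  # the site's per-file upload limit from the question

for archive in sorted(glob.glob("tarfile*.tar.gz")):
    size_kb = os.path.getsize(archive) / 1000
    status = "OK" if size_kb < limit_kb else "TOO BIG"
    print("{}: {:.0f} kB ({})".format(archive, size_kb, status))
```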
Explanation

The script:

- lists all files in the directory
- cd's into the directory to prevent adding path info to the tar file
- reads through the file list, grouping them by the set division
- compresses the sub group(s) into numbered files, passing the file names to tar on stdin (see the sketch right after this list)
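The `--null -T -` part of the tar command makes tar read a NUL-delimited file list from stdin instead of taking the names as command line arguments, which avoids the shell's argument length limit with thousands of files. A minimal standalone sketch of just that mechanism (the file and archive names are hypothetical; per the GNU tar manual, `--null` affects subsequent `-T` options, so it is placed first):

```python
#!/usr/bin/env python3
import subprocess

# Hypothetical file names; NUL-delimited names are safe even with spaces
files = ["report 1.txt", "report 2.txt"]

command = ["tar", "-zcvf", "demo.tar.gz", "--null", "-T", "-"]
proc = subprocess.Popen(command, stdin=subprocess.PIPE)
with proc:
    # write the NUL-delimited list, with a trailing NUL, to tar's stdin
    proc.stdin.write(b'\0'.join(map(str.encode, files)))
    proc.stdin.write(b'\0')
```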
EDIT

Automatically create chunks by size in MB

More sophisticated is to use the max size (in MB) of the chunks as a (second) argument. In the script below, the chunks are written to a compressed file as soon as the chunk reaches (exceeds) the threshold.

Since the script is triggered by the chunks exceeding the threshold, this will only work if the size of (all) files is substantially smaller than the chunk size.
The script:

```python
#!/usr/bin/env python3
import subprocess
import os
import sys

dr = sys.argv[1]
chunksize = float(sys.argv[2])

os.chdir(dr)
files = os.listdir(dr)
n_files = len(files)

def compress(tar, files):
    command = ["tar", "-zcvf", "tarfile" + str(tar) + ".tar.gz", "--null", "-T", "-"]
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    with proc:
        proc.stdin.write(b'\0'.join(map(str.encode, files)))
        proc.stdin.write(b'\0')
    if proc.returncode:
        sys.exit(proc.returncode)

sub = []; tar = 1; subsize = 0
for f in files:
    sub.append(f)
    subsize = subsize + (os.path.getsize(f) / 1000000)
    if subsize >= chunksize:
        compress(tar, sub)
        sub = []; tar += 1; subsize = 0

if sub:
    # taking care of the leftovers
    compress(tar, sub)
```
To run:

```
python3 /path/to/compress_split.py /directory/with/files/tocompress chunksize
```

...where chunksize is the maximum size (in MB) of the input for the tar command.
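Because a chunk is only written once its raw size has already passed the threshold, individual archives can end up larger than intended when single files are big. A hedged sketch of a variant grouping loop (a drop-in replacement for the final loop of the script above, reusing its `compress()`, `files` and `chunksize`) that closes a chunk before a file would push it over the threshold:

```python
# Variant: close the chunk *before* adding a file would exceed the
# threshold; note this compares raw input sizes, so the compressed
# archive will be smaller still
sub = []; tar = 1; subsize = 0
for f in files:
    fsize = os.path.getsize(f) / 1000000
    if sub and subsize + fsize > chunksize:
        compress(tar, sub)
        sub = []; tar += 1; subsize = 0
    sub.append(f)
    subsize += fsize

if sub:
    # taking care of the leftovers
    compress(tar, sub)
```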
In this one, the improvements suggested by @davidfoerster are included. Thanks a lot!