command line - How to tar.gz many similar-size files into multiple archives with a size limit - Ask Ubuntu


I'm on Ubuntu 16.04.

I have a folder with a lot of text files (almost 12k). I need to upload them to a website that accepts .tar.gz uploads and decompresses them automatically, but has a limit of 10 MB (10000 kB) per file (so, in particular, each file has to be decompressed on its own). If I tar.gz all of these files, the resulting file is about 72 MB.

What I would like to do is create 8 .tar.gz files, each of size (strictly) smaller than 10000 kB.

Alternatively, one can assume that all the files above have approximately the same size, and create 8 .tar.gz files with more or less the same number of files each.

How can I do either of these two tasks?

I am fine with a solution that involves a GUI, the CLI, or scripting. I am not looking for speed here, I just need it done.

Totally patchwork and quick, and a rough sketch as it is, but tested on a directory with 3000 files, the script below did an extremely fast job:

#!/usr/bin/env python3
import subprocess
import os
import sys

splitinto = 2

dr = sys.argv[1]
os.chdir(dr)

files = os.listdir(dr)
n_files = len(files)
size = n_files // splitinto

def compress(tar, files):
    # --null must precede -T so tar reads null-delimited names from stdin
    command = ["tar", "-zcvf", "tarfile" + str(tar) + ".tar.gz", "--null", "-T", "-"]
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    with proc:
        proc.stdin.write(b'\0'.join(map(str.encode, files)))
        proc.stdin.write(b'\0')
    if proc.returncode:
        sys.exit(proc.returncode)

sub = []; tar = 1
for f in files:
    sub.append(f)
    if len(sub) == size:
        compress(tar, sub)
        sub = []; tar += 1

if sub:
    # taking care of the left overs
    compress(tar, sub)
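As a quick sanity check (my own sketch, not part of the original answer), something like the following, run from the directory holding the archives, confirms that each tarfileN.tar.gz produced by the script above stays under the 10000 kB upload limit:

import glob
import os

# Hypothetical check: list each archive produced by the script above
# and flag any that exceed the 10000 kB upload limit.
for archive in sorted(glob.glob("tarfile*.tar.gz")):
    size_kb = os.path.getsize(archive) / 1000
    status = "OK" if size_kb < 10000 else "TOO BIG"
    print("{}: {:.0f} kB {}".format(archive, size_kb, status))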

How to use

  • Save the script into an empty file as compress_split.py
  • In the head section, set the number of files (splitinto) to compress into. In practice, there will be one more archive to take care of the few remaining "left overs".
  • Run it with the directory containing the files as an argument:

    python3 /path/to/compress_split.py /directory/with/files/tocompress

Numbered .tar.gz files will be created in the same directory as the files.

Explanation

The script:

  • lists the files in the directory
  • cd's into the directory to prevent adding path info to the tar file
  • reads through the file list, grouping the files by the set division (a minimal sketch of this step follows the list)
  • compresses the sub group(s) into numbered files
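A minimal, stand-alone illustration of that grouping step, with hypothetical file names and the numbers from the question (12000 files into 8 groups):

# Illustration of the set division: 12000 files split into 8 groups
files = ["file{}.txt".format(i) for i in range(12000)]
splitinto = 8
size = len(files) // splitinto            # 1500 files per group

groups = [files[i:i + size] for i in range(0, len(files), size)]
print(len(groups), len(groups[0]))        # -> 8 1500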

Edit

Automatically create chunks by size in MB

A more sophisticated approach is to use the maximum size (in MB) of the chunks as a (second) argument. In the script below, each chunk is written to a compressed file as soon as the chunk reaches (passes) the threshold.

Since the script is triggered by the chunks exceeding the threshold, it will only work well if the size of (all) the files is substantially smaller than the chunk size. (A variant that closes a chunk before it passes the threshold is sketched after the script below.)

The script:

#!/usr/bin/env python3
import subprocess
import os
import sys

dr = sys.argv[1]
chunksize = float(sys.argv[2])
os.chdir(dr)

files = os.listdir(dr)
n_files = len(files)

def compress(tar, files):
    # --null must precede -T so tar reads null-delimited names from stdin
    command = ["tar", "-zcvf", "tarfile" + str(tar) + ".tar.gz", "--null", "-T", "-"]
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    with proc:
        proc.stdin.write(b'\0'.join(map(str.encode, files)))
        proc.stdin.write(b'\0')
    if proc.returncode:
        sys.exit(proc.returncode)

sub = []; tar = 1; subsize = 0
for f in files:
    sub.append(f)
    subsize = subsize + (os.path.getsize(f) / 1000000)
    if subsize >= chunksize:
        compress(tar, sub)
        sub = []; tar += 1; subsize = 0

if sub:
    # taking care of the left overs
    compress(tar, sub)
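If overshooting the threshold is a concern, a possible variant (my own sketch, not part of the original answer) checks the size before appending, so each chunk is closed just below the threshold rather than just above it. The final loop of the script would then become:

# Hypothetical variant: close the chunk *before* adding a file would
# push it over the threshold (a single file larger than chunksize
# still ends up in a chunk of its own).
sub = []; tar = 1; subsize = 0
for f in files:
    fsize = os.path.getsize(f) / 1000000
    if sub and subsize + fsize > chunksize:
        compress(tar, sub)
        sub = []; tar += 1; subsize = 0
    sub.append(f)
    subsize += fsize

if sub:
    compress(tar, sub)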

To run it:

python3 /path/to/compress_split.py /directory/with/files/tocompress chunksize

...where chunksize is the size of the input fed to the tar command, in MB.
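Note that chunksize applies to the uncompressed input, while the upload limit applies to the compressed archive. As a rough, hypothetical way to pick a value for the case in the question (eight archives), one could divide the total uncompressed size of the directory by the desired number of archives:

# Hypothetical helper (not from the original answer): suggest a
# chunksize in MB that should yield roughly 8 archives.
import os
import sys

dr = sys.argv[1]
total_mb = sum(os.path.getsize(os.path.join(dr, f))
               for f in os.listdir(dr)) / 1000000
print("suggested chunksize:", round(total_mb / 8, 1))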

In this one, the improvements suggested by @davidfoerster are included. Thanks a lot!

