리눅스 wget 명령어
설명
CUI환경에서 파일을 다운받을때 사용한다.
도움말
$ wget -help
GNU Wget 1.17.1, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...
Mandatory arguments to long options are mandatory for short options too.
Startup:
-V, --version display the version of Wget and exit
-h, --help print this help
-b, --background go to background after startup
-e, --execute=COMMAND execute a `.wgetrc'-style command
Logging and input file:
-o, --output-file=FILE log messages to FILE
-a, --append-output=FILE append messages to FILE
-d, --debug print lots of debugging information
-q, --quiet quiet (no output)
-v, --verbose be verbose (this is the default)
-nv, --no-verbose turn off verboseness, without being quiet
--report-speed=TYPE output bandwidth as TYPE. TYPE can be bits
-i, --input-file=FILE download URLs found in local or external FILE
-F, --force-html treat input file as HTML
-B, --base=URL resolves HTML input-file links (-i -F)
relative to URL
--config=FILE specify config file to use
--no-config do not read any config file
--rejected-log=FILE log reasons for URL rejection to FILE
Download:
-t, --tries=NUMBER set number of retries to NUMBER (0 unlimits)
--retry-connrefused retry even if connection is refused
-O, --output-document=FILE write documents to FILE
-nc, --no-clobber skip downloads that would download to
existing files (overwriting them)
-c, --continue resume getting a partially-downloaded file
--start-pos=OFFSET start downloading from zero-based position OFFSET
--progress=TYPE select progress gauge type
--show-progress display the progress bar in any verbosity mode
-N, --timestamping don't re-retrieve files unless newer than
local
--no-if-modified-since don't use conditional if-modified-since get
requests in timestamping mode
--no-use-server-timestamps don't set the local file's timestamp by
the one on the server
-S, --server-response print server response
--spider don't download anything
-T, --timeout=SECONDS set all timeout values to SECONDS
--dns-timeout=SECS set the DNS lookup timeout to SECS
--connect-timeout=SECS set the connect timeout to SECS
--read-timeout=SECS set the read timeout to SECS
-w, --wait=SECONDS wait SECONDS between retrievals
--waitretry=SECONDS wait 1..SECONDS between retries of a retrieval
--random-wait wait from 0.5*WAIT...1.5*WAIT secs between retrievals
--no-proxy explicitly turn off proxy
-Q, --quota=NUMBER set retrieval quota to NUMBER
--bind-address=ADDRESS bind to ADDRESS (hostname or IP) on local host
--limit-rate=RATE limit download rate to RATE
--no-dns-cache disable caching DNS lookups
--restrict-file-names=OS restrict chars in file names to ones OS allows
--ignore-case ignore case when matching files/directories
-4, --inet4-only connect only to IPv4 addresses
-6, --inet6-only connect only to IPv6 addresses
--prefer-family=FAMILY connect first to addresses of specified family,
one of IPv6, IPv4, or none
--user=USER set both ftp and http user to USER
--password=PASS set both ftp and http password to PASS
--ask-password prompt for passwords
--no-iri turn off IRI support
--local-encoding=ENC use ENC as the local encoding for IRIs
--remote-encoding=ENC use ENC as the default remote encoding
--unlink remove file before clobber
Directories:
-nd, --no-directories don't create directories
-x, --force-directories force creation of directories
-nH, --no-host-directories don't create host directories
--protocol-directories use protocol name in directories
-P, --directory-prefix=PREFIX save files to PREFIX/..
--cut-dirs=NUMBER ignore NUMBER remote directory components
HTTP options:
--http-user=USER set http user to USER
--http-password=PASS set http password to PASS
--no-cache disallow server-cached data
--default-page=NAME change the default page name (normally
this is 'index.html'.)
-E, --adjust-extension save HTML/CSS documents with proper extensions
--ignore-length ignore 'Content-Length' header field
--header=STRING insert STRING among the headers
--max-redirect maximum redirections allowed per page
--proxy-user=USER set USER as proxy username
--proxy-password=PASS set PASS as proxy password
--referer=URL include 'Referer: URL' header in HTTP request
--save-headers save the HTTP headers to file
-U, --user-agent=AGENT identify as AGENT instead of Wget/VERSION
--no-http-keep-alive disable HTTP keep-alive (persistent connections)
--no-cookies don't use cookies
--load-cookies=FILE load cookies from FILE before session
--save-cookies=FILE save cookies to FILE after session
--keep-session-cookies load and save session (non-permanent) cookies
--post-data=STRING use the POST method; send STRING as the data
--post-file=FILE use the POST method; send contents of FILE
--method=HTTPMethod use method "HTTPMethod" in the request
--body-data=STRING send STRING as data. --method MUST be set
--body-file=FILE send contents of FILE. --method MUST be set
--content-disposition honor the Content-Disposition header when
choosing local file names (EXPERIMENTAL)
--content-on-error output the received content on server errors
--auth-no-challenge send Basic HTTP authentication information
without first waiting for the server's
challenge
HTTPS (SSL/TLS) options:
--secure-protocol=PR choose secure protocol, one of auto, SSLv2,
SSLv3, TLSv1 and PFS
--https-only only follow secure HTTPS links
--no-check-certificate don't validate the server's certificate
--certificate=FILE client certificate file
--certificate-type=TYPE client certificate type, PEM or DER
--private-key=FILE private key file
--private-key-type=TYPE private key type, PEM or DER
--ca-certificate=FILE file with the bundle of CAs
--ca-directory=DIR directory where hash list of CAs is stored
--crl-file=FILE file with bundle of CRLs
--random-file=FILE file with random data for seeding the SSL PRNG
--egd-file=FILE file naming the EGD socket with random data
HSTS options:
--no-hsts disable HSTS
--hsts-file path of HSTS database (will override default)
FTP options:
--ftp-user=USER set ftp user to USER
--ftp-password=PASS set ftp password to PASS
--no-remove-listing don't remove '.listing' files
--no-glob turn off FTP file name globbing
--no-passive-ftp disable the "passive" transfer mode
--preserve-permissions preserve remote file permissions
--retr-symlinks when recursing, get linked-to files (not dir)
FTPS options:
--ftps-implicit use implicit FTPS (default port is 990)
--ftps-resume-ssl resume the SSL/TLS session started in the control connection when
opening a data connection
--ftps-clear-data-connection cipher the control channel only; all the data will be in plaintext
--ftps-fallback-to-ftp fall back to FTP if FTPS is not supported in the target server
WARC options:
--warc-file=FILENAME save request/response data to a .warc.gz file
--warc-header=STRING insert STRING into the warcinfo record
--warc-max-size=NUMBER set maximum size of WARC files to NUMBER
--warc-cdx write CDX index files
--warc-dedup=FILENAME do not store records listed in this CDX file
--no-warc-compression do not compress WARC files with GZIP
--no-warc-digests do not calculate SHA1 digests
--no-warc-keep-log do not store the log file in a WARC record
--warc-tempdir=DIRECTORY location for temporary files created by the
WARC writer
Recursive download:
-r, --recursive specify recursive download
-l, --level=NUMBER maximum recursion depth (inf or 0 for infinite)
--delete-after delete files locally after downloading them
-k, --convert-links make links in downloaded HTML or CSS point to
local files
--convert-file-only convert the file part of the URLs only (usually known as the basename)
--backups=N before writing file X, rotate up to N backup files
-K, --backup-converted before converting file X, back up as X.orig
-m, --mirror shortcut for -N -r -l inf --no-remove-listing
-p, --page-requisites get all images, etc. needed to display HTML page
--strict-comments turn on strict (SGML) handling of HTML comments
Recursive accept/reject:
-A, --accept=LIST comma-separated list of accepted extensions
-R, --reject=LIST comma-separated list of rejected extensions
--accept-regex=REGEX regex matching accepted URLs
--reject-regex=REGEX regex matching rejected URLs
--regex-type=TYPE regex type (posix|pcre)
-D, --domains=LIST comma-separated list of accepted domains
--exclude-domains=LIST comma-separated list of rejected domains
--follow-ftp follow FTP links from HTML documents
--follow-tags=LIST comma-separated list of followed HTML tags
--ignore-tags=LIST comma-separated list of ignored HTML tags
-H, --span-hosts go to foreign hosts when recursive
-L, --relative follow relative links only
-I, --include-directories=LIST list of allowed directories
--trust-server-names use the name specified by the redirection
URL's last component
-X, --exclude-directories=LIST list of excluded directories
-np, --no-parent don't ascend to the parent directory
Mail bug reports and suggestions to <bug-wget@gnu.org>
활용 예
영상 다운로드서버에 올라가있는 영상들이 정상적인지를 체크하기위해 한번씩 다운로드해준다.
list.txt
http://111.111.11.1/assets/prod/1629/ORIGINAL/1629_1504176178478.ts
http://111.111.11.1/assets/prod/1630/ORIGINAL/1630_1504176206826.ts
http://1111.111.11.1/assets/prod/1672/ORIGINAL/1672_1505466475322.ts
엄청많아 100개가 넘어 ..
$ wget -i list.txt -o log --no-verbose
list에 써져있는 파일들을 차례대로 다운로드한 후 로그를 남긴다(성공여부만 필요)
-i 인풋 -o 아웃풋 --no-verbose 다 남길 필요없다.
log
2017-12-22 17:17:53 URL:http://111.111.11.1/assets/prod/1629/ORIGINAL/1629_1504176178478.ts [16958540/16958540] -> "1629_1504176178478.ts.2" [1]
2017-12-22 17:17:56 URL:http://111.111.11.1/assets/prod/1630/ORIGINAL/1630_1504176206826.ts [32955836/32955836] -> "1630_1504176206826.ts.2" [1]
2017-12-22 17:17:59 URL:http://111.111.11.1/assets/prod/1672/ORIGINAL/1672_1505466475322.ts [32954520/32954520] -> "1672_1505466475322.ts.2" [1]
2017-12-22 17:18:03 URL:http://111.111.11.1/assets/uplus-assets/prod/1686/ORIGINAL/1686_1505969267788.ts [32950572/32950572] -> "1686_1505969267788.ts.2" [1]
http://111.111.11.1/assets/prod/1692/ORIGINAL/1692_1506023232326203376.ts:
2017-12-22 17:18:03 ERROR 404: Not Found.
FINISHED --2017-12-22 17:18:03--
Total wall clock time: 12s
Downloaded: 4 files, 110M in 11s (9.61 MB/s)
없으면 저렇게 404 Not Found 뜬다.
20개정도 브라우저에 입력하다가 빡쳐서 찾았다.