Retrieve 'Magic' Attributes from Files and Directories

The 'libmagic' library provides functions to determine the 'MIME' type and other metadata of files from their "magic" attributes. This is useful when you do not want to rely solely on the honesty of a user or on the extension in a file name. The package also incorporates other metadata from the mime-db database <https://github.com/jshttp/mime-db>.
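To make the idea of "magic" attributes concrete, here is a minimal base R sketch that compares a file's leading bytes against a few well-known signatures. It is an illustration only: `signatures` and `sniff_mime()` are hypothetical names, and libmagic's own rule database is far larger and more sophisticated than this.

```r
# Leading byte signatures ("magic numbers") for a few common formats.
signatures <- list(
  "image/png"       = as.raw(c(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)),
  "image/jpeg"      = as.raw(c(0xFF, 0xD8, 0xFF)),
  "application/pdf" = charToRaw("%PDF-")
)

# Hypothetical helper: return the first MIME type whose signature matches
# the start of the file, or NA if none of the signatures match.
sniff_mime <- function(path) {
  head_bytes <- readBin(path, what = "raw", n = 8L)
  for (mime in names(signatures)) {
    sig <- signatures[[mime]]
    if (length(head_bytes) >= length(sig) &&
        identical(head_bytes[seq_along(sig)], sig)) {
      return(mime)
    }
  }
  NA_character_
}

# A PDF header saved under a misleading ".png" extension.
tmp <- tempfile(fileext = ".png")
writeBin(charToRaw("%PDF-1.3\n"), tmp)
sniff_mime(tmp)
```

The extension says PNG, but the content-based check reports a PDF, which is the kind of mismatch the package is meant to surface.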


wand : Retrieve 'Magic' Attributes from Files and Directories

The libmagic library must be installed and available on *nix/macOS systems to use this package:

  • apt-get install libmagic-dev on Ubuntu/Debian-ish systems
  • brew install libmagic on macOS
  • yum install file-devel on RHEL/CentOS/Fedora

While the package was developed against version 5.28 of libmagic, it has been configured to work with older versions. Note that some fields in the resulting data frame might not be available with older library versions. The function magic_wand_file() checks which version of libmagic is installed on your system and provides a suitable magic.mgc file for it.
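If you want to check which version you have, one option is to parse the output of `file --version`. The sketch below assumes the first line of that output looks like "file-5.28"; `parse_file_version()` is a hypothetical helper, not part of the package, and the sample input is canned rather than captured from a real system. The 5.14 threshold is the minimum listed under the system requirements.

```r
# Hypothetical helper: extract the version number from `file --version`
# output, whose first line is assumed to look like "file-5.28".
parse_file_version <- function(lines) {
  m <- regmatches(lines[1], regexpr("[0-9]+(\\.[0-9]+)+", lines[1]))
  if (length(m) == 0L) {
    return(NULL)  # no version number found
  }
  numeric_version(m)
}

# Canned example input, not captured from a real system.
v <- parse_file_version(c("file-5.28", "magic file from /usr/share/misc/magic"))

# Compare against the minimum listed in the system requirements.
v >= "5.14"
```

On a system where `file` is on the PATH, the input could come from `system2("file", "--version", stdout = TRUE)` instead of the canned vector.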

The package also works on Windows but it's a bit of a hack because, well, Windows. The Windows version makes two system2() calls and relies on Rtools being installed and file.exe being available on the Windows PATH, so it's sub-optimal at best. Help to get it working in C would be greatly appreciated. Windows folk should consult the Rtools documentation for more info.
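For a sense of what shelling out to the `file` tool can look like, here is a rough sketch. It is not the package's actual implementation: `file_mime()` is a hypothetical helper, and it assumes a `file` executable that accepts the `--brief` and `--mime-type` options is on the PATH.

```r
# Hypothetical helper, not the package's implementation: ask the `file`
# command-line tool for the MIME type of a single path.
file_mime <- function(path, file_cmd = unname(Sys.which("file"))) {
  if (!nzchar(file_cmd)) {
    return(NA_character_)  # `file` was not found on the PATH
  }
  out <- system2(file_cmd,
                 c("--brief", "--mime-type", shQuote(path)),
                 stdout = TRUE)
  trimws(out[1])
}

# Try it on a small temporary text file.
tmp <- tempfile(fileext = ".txt")
writeLines("hello, world", tmp)
file_mime(tmp)
```

Returning NA when the tool is missing keeps the helper usable in scripts that run on machines without Rtools or `file` installed.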

The following functions are implemented:

  • incant : returns the "magic" metadata of the files in the input vector (as a data frame)
  • magic_wand_file : provides a full path to the package-provided magic file

devtools::install_github("hrbrmstr/wand")
library(wand)
library(dplyr)
 
system.file("extdata", "img", package="wand") %>% 
  list.files(full.names=TRUE) %>% 
  incant() %>% 
  glimpse()
## Observations: 10
## Variables: 5
## $ file        <chr> "/Library/Frameworks/R.framework/Versions/3.3/Resources/library/wand/extdata/img/example_dir", ...
## $ mime_type   <chr> "inode/directory", "text/x-c", "text/html", "text/plain", "text/rtf", "image/jpeg", "applicatio...
## $ encoding    <chr> "binary", "us-ascii", "us-ascii", "us-ascii", "us-ascii", "binary", "binary", "binary", "us-asc...
## $ extensions  <list> [NA, <"c", "cc", "cpp", "cxx", "dic", "h", "hh">, <"htm", "html", "shtml">, <"conf", "def", "i...
## $ description <chr> "directory", "C source, ASCII text", "HTML document, ASCII text, with CRLF line terminators", "...
# Use a non-system magic-file
 
system.file("extdata", "img", package="wand") %>% 
  list.files(full.names=TRUE) %>% 
  incant(magic_wand_file()) %>% 
  select(description) %>% 
  unlist(use.names=FALSE)
##  [1] "directory"                                                                                                                                                                                                        
##  [2] "C source, ASCII text"                                                                                                                                                                                             
##  [3] "HTML document, ASCII text, with CRLF line terminators"                                                                                                                                                            
##  [4] "ASCII text, with no line terminators"                                                                                                                                                                             
##  [5] "Rich Text Format data, version 1, ANSI"                                                                                                                                                                           
##  [6] "JPEG image data, JFIF standard 1.01, aspect ratio, density 72x72, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=2, orientation=upper-left], baseline, precision 8, 800x700, frames 3"
##  [7] "PDF document, version 1.3"                                                                                                                                                                                        
##  [8] "PNG image data, 800 x 700, 8-bit/color RGBA, non-interlaced"                                                                                                                                                      
##  [9] "ASCII text, with very long lines, with CRLF line terminators"                                                                                                                                                     
## [10] "TIFF image data, big-endian"
# what kinds of extensions are associated with these mime types
system.file("extdata", "img", package="wand") %>% 
  list.files(full.names=TRUE) %>% 
  incant(magic_wand_file()) %>% 
  select(extensions) %>% 
  as.data.frame()
##                                  extensions
## 1                                        NA
## 2               c, cc, cpp, cxx, dic, h, hh
## 3                          htm, html, shtml
## 4  conf, def, in, ini, list, log, text, txt
## 5                                       rtf
## 6                      jfif, jpe, jpeg, jpg
## 7                                       pdf
## 8                                       png
## 9  conf, def, in, ini, list, log, text, txt
## 10                                tif, tiff
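Since the extensions column is a list, it can be compared against each file's own extension to flag mismatches. The sketch below uses a small hand-built data frame as a stand-in for incant() output; the file names are hypothetical and the extension values are copied from the listing above.

```r
# Hand-built stand-in for incant() output: a file column plus a list column
# of extensions associated with each file's detected MIME type.
res <- data.frame(file = c("report.pdf", "photo.png"),
                  stringsAsFactors = FALSE)
res$extensions <- list("pdf", c("jfif", "jpe", "jpeg", "jpg"))

# TRUE when the file's own extension is among those listed for its type.
res$ext_ok <- mapply(
  function(f, exts) tolower(tools::file_ext(f)) %in% exts,
  res$file, res$extensions,
  USE.NAMES = FALSE
)

res[, c("file", "ext_ok")]
```

Here the second row is flagged because a file named photo.png was detected as JPEG content.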
# current version
packageVersion("wand")
## [1] '0.2.0'
library(wand)
library(testthat)
 
date()
## [1] "Mon Aug 15 14:56:16 2016"
test_dir("tests/")
## testthat results ========================================================================================================
## OK: 1 SKIPPED: 0 FAILED: 0
## 
## DONE ===================================================================================================================

News

0.2.0

  • Works on Windows

0.1.0

  • Initial release

install.packages("wand")

0.2.0 by Bob Rudis, a year ago


http://github.com/hrbrmstr/wand


Report a bug at https://github.com/hrbrmstr/wand/issues


Browse source code at https://github.com/cran/wand


Authors: Bob Rudis (@hrbrmstr), Christos Zoulas [libmagic], Mans Rullgard [file], Jonathan Ong <me@jongleberry.com> [mime-db]


AGPL license


Imports dplyr, purrr, rappdirs, stats, stringi, tibble, tidyr, utils, Rcpp

Suggests testthat

Linking to Rcpp

System requirements: libmagic (>= 5.14) for Unix/Linux/macOS; Rtools 3.3+ for Windows

