typeddfs.file_formats

File formats for reading/writing to/from DFs.

Module Contents

class typeddfs.file_formats.BaseCompression
base :pathlib.Path
compression :CompressionFormat
class typeddfs.file_formats.BaseFormatCompression
base :pathlib.Path
compression :CompressionFormat
format :Optional[FileFormat]
class typeddfs.file_formats.CompressionFormat

A compression scheme or no compression: gzip, zip, bz2, xz, and none. These are the formats supported by Pandas for read and write. Provides a few useful functions for calling code.

Examples

  • CompressionFormat.strip_suffix("my_file.csv.gz")  # Path("my_file.csv")

  • CompressionFormat.from_path("myfile.csv")  # CompressionFormat.none

bz2 = []
gz = []
none = []
xz = []
zip = []
classmethod all_suffixes(cls) Set[str]

Returns all suffixes for all compression formats.

classmethod from_path(cls, path: typeddfs.utils._utils.PathLike) CompressionFormat

Returns the compression scheme from a path suffix.

classmethod from_suffix(cls, suffix: str) CompressionFormat

Returns the recognized compression scheme from a suffix.
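As a rough illustration of suffix-based detection, the sketch below uses a hypothetical stand-in enum named Compression (it is not the library's CompressionFormat) and relies only on the standard library. It assumes that an unrecognized suffix falls back to none, which matches the from_path example above but may differ from how from_suffix actually treats unknown input.

```python
from enum import Enum
from pathlib import Path


class Compression(Enum):
    """Hypothetical stand-in for CompressionFormat; each value is a suffix."""

    gz = ".gz"
    zip = ".zip"
    bz2 = ".bz2"
    xz = ".xz"
    none = ""

    @classmethod
    def from_suffix(cls, suffix: str) -> "Compression":
        # Assumption: unrecognized suffixes are treated as "no compression".
        for member in cls:
            if member.value == suffix:
                return member
        return cls.none

    @classmethod
    def from_path(cls, path) -> "Compression":
        # Only the final suffix is inspected, e.g. ".gz" in "data.csv.gz".
        return cls.from_suffix(Path(path).suffix)


print(Compression.from_path("my_file.csv.gz"))  # Compression.gz
print(Compression.from_path("myfile.csv"))      # Compression.none
```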

property full_name(self) str

Returns a more-complete name of this format. For example, “gzip”, “bzip2”, “xz”, and “none”.

property is_compressed(self) bool

Shorthand for fmt is not CompressionFormat.none.

classmethod list(cls) Set[CompressionFormat]

Returns the set of CompressionFormats. Works with static type analysis.

classmethod list_non_empty(cls) Set[CompressionFormat]

Returns the set of CompressionFormats, except for none. Works with static type analysis.

classmethod of(cls, t: Union[str, CompressionFormat]) CompressionFormat

Returns a CompressionFormat from a name (e.g. “gz” or “gzip”). Case-insensitive.

Example

CompressionFormat.of("gzip").suffix  # ".gz"

classmethod split(cls, path: typeddfs.utils._utils.PathLike) BaseCompression
classmethod strip_suffix(cls, path: typeddfs.utils._utils.PathLike) pathlib.Path

Returns a path with any recognized compression suffix (e.g. “.gz”) stripped.
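The relationship between splitting and stripping can be sketched with pathlib alone. BaseAndCompression, split_compression, and strip_compression are hypothetical names for illustration; the sketch assumes that only the final suffix is examined and that the compression is represented by its suffix string rather than by a CompressionFormat member.

```python
from pathlib import Path
from typing import NamedTuple

# Suffixes for the schemes named in the CompressionFormat description.
COMPRESSION_SUFFIXES = {".gz", ".zip", ".bz2", ".xz"}


class BaseAndCompression(NamedTuple):
    """Hypothetical stand-in for BaseCompression."""

    base: Path
    compression: str  # the matched suffix, or "" if the path is uncompressed


def split_compression(path) -> BaseAndCompression:
    path = Path(path)
    if path.suffix in COMPRESSION_SUFFIXES:
        # with_suffix("") drops only the final suffix, so ".csv" survives.
        return BaseAndCompression(path.with_suffix(""), path.suffix)
    return BaseAndCompression(path, "")


def strip_compression(path) -> Path:
    # Stripping is just splitting and keeping the base.
    return split_compression(path).base


print(strip_compression("my_file.csv.gz"))  # my_file.csv
print(strip_compression("my_file.csv"))     # my_file.csv
```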

property suffix(self) str

Returns the single Pandas-recognized suffix for this format. This is just “” (the empty string) for CompressionFormat.none.

class typeddfs.file_formats.FileFormat

A computer-readable format for reading and writing DataFrames in typeddfs. This includes CSV, Parquet, ODS, etc. Some formats also include compressed variants. E.g. a “.csv.gz” suffix maps to FileFormat.csv. This is used internally by typeddfs.abs_df.read_file() and typeddfs.abs_df.write_file(), but it may be useful to calling code directly.

Examples

  • FileFormat.from_path("my_file.csv.gz").is_text   # True

  • FileFormat.from_path("my_file.csv.gz").can_read  # always True

  • FileFormat.from_path("my_file.xlsx").can_read    # True if the required package is installed

csv = []
feather = []
flexwf = []
fwf = []
hdf = []
ini = []
json = []
lines = []
ods = []
parquet = []
pickle = []
properties = []
toml = []
tsv = []
xls = []
xlsb = []
xlsx = []
xml = []
classmethod all_readable(cls) Set[FileFormat]

Returns all formats that can be read on this system. Note that the result may depend on whether supporting packages are installed. Includes insecure and discouraged formats.

classmethod all_writable(cls) Set[FileFormat]

Returns all formats that can be written to on this system. Note that the result may depend on whether supporting packages are installed. Includes insecure and discouraged formats.

property can_always_read(self) bool

Returns whether this format can be read as long as typeddfs is installed. In other words, regardless of any optional packages.

property can_always_write(self) bool

Returns whether this format can be written to as long as typeddfs is installed. In other words, regardless of any optional packages.

property can_read(self) bool

Returns whether this format can be read. Note that the result may depend on whether supporting packages are installed.

property can_write(self) bool

Returns whether this format can be written. Note that the result may depend on whether supporting packages are installed.
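One way such an availability check could work is to ask the import system whether an optional package is present, without importing it. The sketch below is only an illustration: REQUIRED_PACKAGE and both function names are hypothetical, and the package names (openpyxl, pyarrow) are assumptions about which optional dependencies might back each format, not a statement of what typeddfs checks.

```python
import importlib.util
from typing import Dict, Optional


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` can be found for import."""
    return importlib.util.find_spec(package) is not None


# Hypothetical mapping from a format name to the optional package it needs.
# None means the format needs nothing beyond the base installation.
REQUIRED_PACKAGE: Dict[str, Optional[str]] = {
    "csv": None,
    "xlsx": "openpyxl",   # assumption
    "parquet": "pyarrow",  # assumption
}


def can_read(format_name: str) -> bool:
    package = REQUIRED_PACKAGE[format_name]
    return package is None or is_installed(package)


print(can_read("csv"))   # True: no optional package is required
print(can_read("xlsx"))  # depends on the local environment
```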

compressed_variants(self, suffix: str) Set[str]

Returns all allowed suffixes.

Example


FileFormat.json.compressed_variants(".json")  # {".json", ".json.gz", ".json.zip", …}
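The result in the example is consistent with appending each compression suffix to the given suffix. A minimal standalone sketch under that assumption follows; the module-level function here is a stand-in, not the method itself, and it does not model any per-format restrictions the library may apply.

```python
from typing import Set

# "" stands for the uncompressed variant; the rest are compression suffixes.
COMPRESSION_SUFFIXES = ("", ".gz", ".zip", ".bz2", ".xz")


def compressed_variants(suffix: str) -> Set[str]:
    """Return `suffix` alone plus `suffix` followed by each compression suffix."""
    return {suffix + comp for comp in COMPRESSION_SUFFIXES}


print(sorted(compressed_variants(".json")))
# ['.json', '.json.bz2', '.json.gz', '.json.xz', '.json.zip']
```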

classmethod from_path(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) FileFormat

Guesses a FileFormat from a filename.

See also

from_suffix()

Parameters
  • path – A string or pathlib.Path to a file.

  • format_map – A mapping from suffixes to formats; if None, uses suffix_map().

Raises

typeddfs.df_errors.FilenameSuffixError – If not found
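To make the role of format_map concrete, here is a standard-library sketch of guessing a format name from a filename. DEFAULT_MAP, UnknownSuffixError, and guess_format are hypothetical stand-ins for suffix_map(), FilenameSuffixError, and from_path(); the sketch returns plain strings rather than FileFormat members and assumes at most one trailing compression suffix.

```python
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical default map; the real suffix_map() covers many more suffixes.
DEFAULT_MAP = {".csv": "csv", ".tsv": "tsv", ".json": "json", ".txt": "lines"}
COMPRESSION_SUFFIXES = (".gz", ".zip", ".bz2", ".xz")


class UnknownSuffixError(ValueError):
    """Stand-in for typeddfs.df_errors.FilenameSuffixError."""


def guess_format(path, format_map: Optional[Mapping[str, str]] = None) -> str:
    mapping = DEFAULT_MAP if format_map is None else format_map
    name = Path(path).name
    # Drop one trailing compression suffix, if present.
    for comp in COMPRESSION_SUFFIXES:
        if name.endswith(comp):
            name = name[: -len(comp)]
            break
    # Try longer suffixes first so that a more specific entry wins.
    for suffix in sorted(mapping, key=len, reverse=True):
        if name.endswith(suffix):
            return mapping[suffix]
    raise UnknownSuffixError(f"No format found for {path}")


print(guess_format("my_file.csv.gz"))              # csv
print(guess_format("table.dat", {".dat": "tsv"}))  # tsv
```

Passing a custom mapping replaces the default lookup entirely in this sketch, which is why "table.dat" resolves only when ".dat" is supplied.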

classmethod from_path_or_none(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) Optional[FileFormat]

Same as from_path(), but returns None if not found.

classmethod from_suffix(cls, suffix: str, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) FileFormat

Returns the FileFormat corresponding to a filename suffix.

See also

from_path()

Parameters
  • suffix – E.g. “.csv.gz” or “.feather”

  • format_map – A mapping from suffixes to formats; if None, uses suffix_map().

Raises

typeddfs.df_errors.FilenameSuffixError – If not found

classmethod from_suffix_or_none(cls, suffix: str, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) Optional[FileFormat]

Same as from_suffix(), but returns None if not found.

property is_binary(self) bool

Returns whether this format is binary rather than text-encoded. Note that this does not consider whether the file is compressed.

property is_recommended(self) bool

Returns whether the format is recommended. Includes CSV, TSV, Parquet, etc. Excludes all insecure formats along with fixed-width, INI, properties, TOML, and HDF5.

property is_secure(self) bool

Returns whether the format does NOT have serious security issues. These issues only apply to reading files, not writing. Excel formats that support macros are not considered secure; this includes .xlsm, .xltm, and .xls. These can simply be replaced with .xlsx. Note that .xml is treated as secure: although some parsers are subject to entity expansion attacks, good ones are not.

property is_text(self) bool

Returns whether this format is text-encoded. Note that this does not consider whether the file is compressed.

classmethod list(cls) Set[FileFormat]

Returns the set of FileFormats. Works with static type analysis.

matches(self, *, supported: bool, secure: bool, recommended: bool) bool

Returns whether this format meets some requirements.

Parameters
classmethod of(cls, t: Union[str, FileFormat]) FileFormat

Returns a FileFormat from an exact name (e.g. “csv”).

classmethod split(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) BaseFormatCompression

Splits a path into the base path, format, and compression.

Raises

FilenameSuffixError – If the suffix is not found

Returns

A 3-tuple of (base path excluding suffixes, file format, compression format)
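The three-way split can be sketched by peeling suffixes from the right: first the compression suffix, then the format suffix. Parts and split_path are hypothetical names, the suffix tables are deliberately tiny, and a plain ValueError stands in for FilenameSuffixError.

```python
from pathlib import Path
from typing import NamedTuple

# Small illustrative tables; the library recognizes many more suffixes.
FORMAT_SUFFIXES = {".csv": "csv", ".tsv": "tsv", ".json": "json"}
COMPRESSION_SUFFIXES = {".gz": "gz", ".zip": "zip", ".bz2": "bz2", ".xz": "xz"}


class Parts(NamedTuple):
    """Hypothetical stand-in for BaseFormatCompression."""

    base: Path
    format: str
    compression: str


def split_path(path) -> Parts:
    path = Path(path)
    # Step 1: peel off a compression suffix, if there is one.
    compression = COMPRESSION_SUFFIXES.get(path.suffix, "none")
    if compression != "none":
        path = path.with_suffix("")
    # Step 2: the next suffix must name a known format.
    if path.suffix not in FORMAT_SUFFIXES:
        raise ValueError(f"Unrecognized suffix in {path.name}")
    return Parts(path.with_suffix(""), FORMAT_SUFFIXES[path.suffix], compression)


base, fmt, compression = split_path("abc/xyz.csv.gz")
print(base.name, fmt, compression)  # xyz csv gz
```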

classmethod split_or_none(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) BaseFormatCompression

Splits a path into the base path, format, and compression.

Returns

A 3-tuple of (base path excluding suffixes, file format, compression format)

classmethod strip(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) pathlib.Path

Strips a recognized, optionally compressed, suffix from path.

See also

split()

Example

FileFormat.strip("abc/xyz.csv.gz")  # Path("abc") / "xyz"

classmethod suffix_map(cls) MutableMapping[str, FileFormat]

Returns a mapping from all suffixes to their respective formats. See suffixes().

property suffixes(self) Set[str]

Returns the suffixes that are tied to this format. These will not overlap with the suffixes for any other format. For example, .txt is for FileFormat.lines, although it could be treated as tab- or space-separated.
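Because suffixes do not overlap between formats, the per-format sets can be inverted into a single suffix-to-format mapping, which is the shape suffix_map() returns. The sketch below shows that inversion with a hypothetical table and a hypothetical build_suffix_map helper; only the ".txt" entry for lines is taken from the text above, and format names are plain strings rather than FileFormat members.

```python
from typing import Dict, Mapping, Set

# Hypothetical per-format suffix sets, kept small for illustration.
SUFFIXES: Dict[str, Set[str]] = {
    "csv": {".csv"},
    "tsv": {".tsv"},
    "lines": {".txt"},
}


def build_suffix_map(suffixes: Mapping[str, Set[str]]) -> Dict[str, str]:
    """Invert format -> suffixes into suffix -> format, rejecting overlaps."""
    mapping: Dict[str, str] = {}
    for fmt, fmt_suffixes in suffixes.items():
        for suffix in sorted(fmt_suffixes):
            if suffix in mapping:
                # Two formats claiming one suffix would make lookups ambiguous.
                raise ValueError(f"{suffix} is claimed by {mapping[suffix]} and {fmt}")
            mapping[suffix] = fmt
    return mapping


print(build_suffix_map(SUFFIXES))
# {'.csv': 'csv', '.tsv': 'tsv', '.txt': 'lines'}
```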

property supports_encoding(self) bool

Returns whether this format supports a text encoding of some sort. This may not correspond to an encoding= parameter, and the format may be binary. For example, XLS and XML support encodings.