typeddfs.file_formats
File formats for reading/writing to/from DFs.
Module Contents
- class typeddfs.file_formats.BaseCompression
- base: pathlib.Path
- compression: CompressionFormat
- class typeddfs.file_formats.BaseFormatCompression
- base: pathlib.Path
- compression: CompressionFormat
- format: Optional[FileFormat]
- class typeddfs.file_formats.CompressionFormat
A compression scheme or no compression: gzip, zip, bz2, xz, and none. These are the formats that pandas supports for both reading and writing. The class also provides a few helper methods for calling code.
Examples
CompressionFormat.strip_suffix("my_file.csv.gz") # Path("my_file.csv")
CompressionFormat.from_path("myfile.csv") # CompressionFormat.none
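The suffix-based behavior in these examples can be sketched with the standard library alone. This is a minimal, hypothetical re-implementation for illustration only: the class `Compression` and its methods mirror, but do not reproduce, typeddfs, and the sketch assumes each compression scheme maps to exactly one suffix and that only the last suffix of a path can be a compression suffix.

```python
from enum import Enum
from pathlib import Path


class Compression(Enum):
    """Hypothetical stand-in for CompressionFormat; each value is its suffix."""

    gz = ".gz"
    zip = ".zip"
    bz2 = ".bz2"
    xz = ".xz"
    none = ""

    @classmethod
    def from_path(cls, path) -> "Compression":
        # Only the last suffix is checked ("x.csv.gz" -> ".gz"), case-insensitively.
        suffix = Path(path).suffix.lower()
        for member in cls:
            if member.value and member.value == suffix:
                return member
        return cls.none

    @classmethod
    def strip_suffix(cls, path) -> Path:
        # Drop a recognized compression suffix; otherwise return the path unchanged.
        path = Path(path)
        if cls.from_path(path) is cls.none:
            return path
        return path.with_suffix("")


print(Compression.strip_suffix("my_file.csv.gz"))  # my_file.csv
print(Compression.from_path("myfile.csv"))  # Compression.none
```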
- bz2 = []
- gz = []
- none = []
- xz = []
- zip = []
- classmethod all_suffixes(cls) → Set[str]
Returns all suffixes for all compression formats.
- classmethod from_path(cls, path: typeddfs.utils._utils.PathLike) → CompressionFormat
Returns the compression scheme from a path suffix.
- classmethod from_suffix(cls, suffix: str) → CompressionFormat
Returns the recognized compression scheme from a suffix.
- property full_name(self) → str
Returns a more complete name of this format: for example, "gzip", "bzip2", "xz", and "none".
- property is_compressed(self) → bool
Shorthand for fmt is not CompressionFormat.none.
- classmethod list(cls) → Set[CompressionFormat]
Returns the set of CompressionFormats. Works with static type analysis.
- classmethod list_non_empty(cls) → Set[CompressionFormat]
Returns the set of CompressionFormats, except for none. Works with static type analysis.
- classmethod of(cls, t: Union[str, CompressionFormat]) → CompressionFormat
Returns a CompressionFormat from a name (e.g. "gz" or "gzip"). Case-insensitive.
Example
CompressionFormat.of("gzip").suffix # ".gz"
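A case-insensitive lookup that accepts either the short name or the full name can be sketched as follows. This is a hypothetical stand-in, not typeddfs's code: the `(suffix, full name)` tuple values are an assumption made for the sketch, and raising `ValueError` for an unknown name is also assumed, since the documentation does not say what `of` raises.

```python
from enum import Enum
from typing import Union


class Compression(Enum):
    """Hypothetical stand-in for CompressionFormat: (suffix, full name) per member."""

    gz = (".gz", "gzip")
    zip = (".zip", "zip")
    bz2 = (".bz2", "bzip2")
    xz = (".xz", "xz")
    none = ("", "none")

    @property
    def suffix(self) -> str:
        return self.value[0]

    @property
    def full_name(self) -> str:
        return self.value[1]

    @classmethod
    def of(cls, t: Union[str, "Compression"]) -> "Compression":
        # Members pass through unchanged; strings are matched case-insensitively
        # against either the short member name or the full name.
        if isinstance(t, cls):
            return t
        key = t.strip().lower()
        for member in cls:
            if key in (member.name, member.full_name):
                return member
        raise ValueError(f"Unknown compression {t!r}")  # assumed error type


print(Compression.of("GZIP").suffix)  # .gz
```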
- classmethod split(cls, path: typeddfs.utils._utils.PathLike) → BaseCompression
Splits a path into its base path and compression format.
- classmethod strip_suffix(cls, path: typeddfs.utils._utils.PathLike) → pathlib.Path
Returns a path with any recognized compression suffix (e.g. ".gz") stripped.
- property suffix(self) → str
Returns the single pandas-recognized suffix for this format. This is the empty string "" for CompressionFormat.none.
- class typeddfs.file_formats.FileFormat
A computer-readable format for reading and writing DataFrames in typeddfs. This includes CSV, Parquet, ODS, etc. Some formats also have compressed variants; for example, a ".csv.gz" file maps to FileFormat.csv. This is used internally by typeddfs.abs_df.read_file() and typeddfs.abs_df.write_file(), but it may also be useful to calling code directly.
Examples
FileFormat.from_path("my_file.csv.gz").is_text # True
FileFormat.from_path("my_file.csv.gz").can_read # always True
FileFormat.from_path("my_file.xlsx").can_read # True only if the required package is installed
- csv = []
- feather = []
- flexwf = []
- fwf = []
- hdf = []
- ini = []
- json = []
- lines = []
- ods = []
- parquet = []
- pickle = []
- properties = []
- toml = []
- tsv = []
- xls = []
- xlsb = []
- xlsx = []
- xml = []
- classmethod all_readable(cls) → Set[FileFormat]
Returns all formats that can be read on this system. Note that the result may depend on whether supporting packages are installed. Includes insecure and discouraged formats.
- classmethod all_writable(cls) → Set[FileFormat]
Returns all formats that can be written on this system. Note that the result may depend on whether supporting packages are installed. Includes insecure and discouraged formats.
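Because readability depends on optional packages, a check like all_readable() can be sketched as filtering formats by whether their supporting package is importable. This is a minimal stdlib sketch; the `REQUIRED_PACKAGE` table and the helper names are hypothetical. pandas does use pyarrow (or fastparquet) for Parquet and openpyxl for .xlsx, but which package typeddfs actually checks is an assumption here.

```python
import importlib.util
from typing import Dict, Optional, Set

# Hypothetical table: format name -> optional package it needs (None = no extra package).
REQUIRED_PACKAGE: Dict[str, Optional[str]] = {
    "csv": None,
    "tsv": None,
    "parquet": "pyarrow",
    "xlsx": "openpyxl",
}


def can_read(fmt: str) -> bool:
    pkg = REQUIRED_PACKAGE[fmt]
    # find_spec returns None when a top-level package cannot be imported.
    return pkg is None or importlib.util.find_spec(pkg) is not None


def all_readable() -> Set[str]:
    # Keep only the formats whose requirements are met on this system.
    return {fmt for fmt in REQUIRED_PACKAGE if can_read(fmt)}


print(sorted(all_readable()))
```

The output varies by environment, which is exactly the caveat in the descriptions above.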
- property can_always_read(self) → bool
Returns whether this format can be read as long as typeddfs is installed; in other words, regardless of any optional packages.
- property can_always_write(self) → bool
Returns whether this format can be written as long as typeddfs is installed; in other words, regardless of any optional packages.
- property can_read(self) → bool
Returns whether this format can be read. Note that the result may depend on whether supporting packages are installed.
- property can_write(self) → bool
Returns whether this format can be written. Note that the result may depend on whether supporting packages are installed.
- compressed_variants(self, suffix: str) → Set[str]
Returns all allowed suffixes for this format built from suffix: the suffix itself plus its compressed variants.
Example
FileFormat.json.compressed_variants(".json") # {".json", ".json.gz", ".json.zip", ...}
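The example output above can be reproduced by appending each compression suffix to the base suffix. This is a hypothetical sketch: it assumes every compression scheme applies to the format, whereas the real method may restrict some formats.

```python
from typing import Set

# Assumed compression suffixes, matching the CompressionFormat members (gz, zip, bz2, xz).
COMPRESSION_SUFFIXES = (".gz", ".zip", ".bz2", ".xz")


def compressed_variants(suffix: str) -> Set[str]:
    # The bare suffix plus one variant per compression suffix.
    return {suffix} | {suffix + c for c in COMPRESSION_SUFFIXES}


print(sorted(compressed_variants(".json")))
# ['.json', '.json.bz2', '.json.gz', '.json.xz', '.json.zip']
```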
- classmethod from_path(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → FileFormat
Guesses a FileFormat from a filename.
- Parameters
path - A string or pathlib.Path to a file.
format_map - A mapping from suffixes to formats; if None, uses suffix_map().
- Raises
typeddfs.df_errors.FilenameSuffixError - If not found
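Guessing a format from a possibly compressed filename can be sketched as: drop one trailing compression suffix, then look up the remaining last suffix. This is a hypothetical stdlib sketch with an abbreviated suffix table; the local `FilenameSuffixError` only stands in for typeddfs.df_errors.FilenameSuffixError, and its base class here is an assumption. As in the description above, a custom format_map replaces the default map.

```python
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical, abbreviated suffix map; the real suffix_map() covers every format.
DEFAULT_SUFFIX_MAP = {".csv": "csv", ".tsv": "tsv", ".feather": "feather", ".txt": "lines"}
COMPRESSION_SUFFIXES = {".gz", ".zip", ".bz2", ".xz"}


class FilenameSuffixError(ValueError):
    """Stand-in for typeddfs.df_errors.FilenameSuffixError (base class assumed)."""


def from_path(path, *, format_map: Optional[Mapping[str, str]] = None) -> str:
    fmap = DEFAULT_SUFFIX_MAP if format_map is None else format_map
    suffixes = [s.lower() for s in Path(path).suffixes]
    # Ignore one trailing compression suffix, e.g. ".gz" in "x.csv.gz".
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1] not in fmap:
        raise FilenameSuffixError(f"No known format for {path}")
    return fmap[suffixes[-1]]


print(from_path("my_file.csv.gz"))  # csv
```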
- classmethod from_path_or_none(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → Optional[FileFormat]
Same as from_path(), but returns None if not found.
- classmethod from_suffix(cls, suffix: str, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → FileFormat
Returns the FileFormat corresponding to a filename suffix.
- Parameters
suffix - E.g. ".csv.gz" or ".feather"
format_map - A mapping from suffixes to formats; if None, uses suffix_map().
- Raises
typeddfs.df_errors.FilenameSuffixError - If not found
- classmethod from_suffix_or_none(cls, suffix: str, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → Optional[FileFormat]
Same as from_suffix(), but returns None if not found.
- property is_binary(self) → bool
Returns whether this format is binary (not text-encoded). Note that this does not consider whether the file is compressed.
- property is_recommended(self) → bool
Returns whether the format is recommended for general use. Includes CSV, TSV, Parquet, etc. Excludes all insecure formats, along with fixed-width, INI, properties, TOML, and HDF5.
- property is_secure(self) → bool
Returns whether the format does NOT have serious security issues. These issues apply only to reading files, not writing. Excel formats that support macros are not considered secure; this includes .xlsm, .xltm, and .xls, which can simply be replaced with .xlsx. Note that .xml is treated as secure: although some XML parsers are vulnerable to entity-expansion attacks, good ones are not.
- property is_text(self) → bool
Returns whether this format is text-encoded. Note that this does not consider whether the file is compressed.
- classmethod list(cls) → Set[FileFormat]
Returns the set of FileFormats. Works with static type analysis.
- matches(self, *, supported: bool, secure: bool, recommended: bool) → bool
Returns whether this format meets some requirements.
- Parameters
secure - Whether to require that is_secure is True
recommended - Whether to require that is_recommended is True
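One plausible reading of matches() is that each True argument adds a requirement and each False argument imposes none. The sketch below implements that reading with a hypothetical `FormatInfo` record; the semantics (including those of `supported`, which is not documented above) and the per-format flags are assumptions for illustration, not typeddfs's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatInfo:
    """Hypothetical stand-in for a FileFormat's boolean properties."""

    name: str
    supported: bool
    is_secure: bool
    is_recommended: bool

    def matches(self, *, supported: bool, secure: bool, recommended: bool) -> bool:
        # Assumed semantics: a True argument requires the property; False ignores it.
        return (
            (not supported or self.supported)
            and (not secure or self.is_secure)
            and (not recommended or self.is_recommended)
        )


# Illustrative flags only.
formats = [
    FormatInfo("csv", True, True, True),
    FormatInfo("pickle", True, False, False),
    FormatInfo("ini", True, True, False),
]
print([f.name for f in formats if f.matches(supported=True, secure=True, recommended=False)])
# ['csv', 'ini']
```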
- classmethod of(cls, t: Union[str, FileFormat]) → FileFormat
Returns a FileFormat from an exact name (e.g. "csv").
- classmethod split(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → BaseFormatCompression
Splits a path into the base path, format, and compression.
- Raises
FilenameSuffixError - If the suffix is not found
- Returns
A 3-tuple of (base path excluding suffixes, file format, compression format)
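The three-way split can be sketched by peeling suffixes from the right: first an optional compression suffix, then the format suffix. This is a hypothetical stdlib sketch with abbreviated suffix tables; modeling `BaseFormatCompression` as a NamedTuple and raising `ValueError` (in place of FilenameSuffixError) are assumptions.

```python
from pathlib import Path
from typing import NamedTuple, Optional

# Abbreviated, assumed suffix tables for illustration only.
FORMAT_SUFFIXES = {".csv": "csv", ".tsv": "tsv", ".parquet": "parquet"}
COMPRESSION_SUFFIXES = {".gz": "gz", ".zip": "zip", ".bz2": "bz2", ".xz": "xz"}


class BaseFormatCompression(NamedTuple):
    """Hypothetical mirror of typeddfs.file_formats.BaseFormatCompression."""

    base: Path
    format: Optional[str]
    compression: str


def split(path) -> BaseFormatCompression:
    path = Path(path)
    compression = COMPRESSION_SUFFIXES.get(path.suffix.lower(), "none")
    if compression != "none":
        path = path.with_suffix("")  # drop the compression suffix first
    fmt = FORMAT_SUFFIXES.get(path.suffix.lower())
    if fmt is None:
        raise ValueError(f"Unrecognized format suffix in {path}")  # stand-in error
    return BaseFormatCompression(path.with_suffix(""), fmt, compression)


print(split("abc/xyz.csv.gz"))
```

In this sketch, a strip() operation would simply be split(path).base.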
- classmethod split_or_none(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → BaseFormatCompression
Splits a path into the base path, format, and compression.
- Returns
A 3-tuple of (base path excluding suffixes, file format, compression format)
- classmethod strip(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → pathlib.Path
Strips a recognized, optionally compressed, suffix from path.
Example
FileFormat.strip("abc/xyz.csv.gz") # Path("abc") / "xyz"
- classmethod suffix_map(cls) → MutableMapping[str, FileFormat]
Returns a mapping from all suffixes to their respective formats. See the suffixes property.
- property suffixes(self) → Set[str]
Returns the suffixes that are tied to this format. These do not overlap with the suffixes of any other format. For example, .txt belongs to FileFormat.lines, although it could be treated as tab- or space-separated.
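Because suffixes never overlap between formats, suffix_map() can be understood as an inversion of each format's suffixes into one suffix-to-format dictionary. This is a hypothetical sketch: the per-format suffix sets (apart from .txt for lines, which the text states) are illustrative assumptions, and the overlap check shows the invariant rather than typeddfs's actual implementation.

```python
from typing import Dict, Set

# Hypothetical per-format suffix sets; typeddfs defines the real ones.
FORMAT_SUFFIXES: Dict[str, Set[str]] = {
    "csv": {".csv"},
    "tsv": {".tsv"},
    "lines": {".txt", ".lines"},
    "feather": {".feather"},
}


def suffix_map() -> Dict[str, str]:
    # Invert format -> suffixes into suffix -> format, refusing any overlap.
    mapping: Dict[str, str] = {}
    for fmt, suffixes in FORMAT_SUFFIXES.items():
        for s in suffixes:
            if s in mapping:
                raise ValueError(f"{s} claimed by both {mapping[s]} and {fmt}")
            mapping[s] = fmt
    return mapping


print(suffix_map()[".txt"])  # lines
```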
- property supports_encoding(self) → bool
Returns whether this format supports a text encoding of some sort. This may not correspond to an encoding= parameter, and the format may be binary. For example, XLS and XML support encodings.