typeddfs.file_formats
File formats for reading/writing to/from DFs.
Module Contents
- class typeddfs.file_formats.BaseCompression
- base: pathlib.Path
- compression: CompressionFormat
- class typeddfs.file_formats.BaseFormatCompression
- base: pathlib.Path
- compression: CompressionFormat
- format: Optional[FileFormat]
- class typeddfs.file_formats.CompressionFormat
A compression scheme or no compression: gzip, zip, bz2, xz, and none. These are the formats supported by Pandas for read and write. Provides a few useful functions for calling code.
Examples
CompressionFormat.strip_suffix("my_file.csv.gz")  # Path("my_file.csv")
CompressionFormat.from_path("myfile.csv")  # CompressionFormat.none
- bz2 = []
- gz = []
- none = []
- xz = []
- zip = []
- classmethod all_suffixes(cls) → Set[str]
Returns all suffixes for all compression formats.
- classmethod from_path(cls, path: typeddfs.utils._utils.PathLike) → CompressionFormat
Returns the compression scheme from a path suffix.
- classmethod from_suffix(cls, suffix: str) → CompressionFormat
Returns the recognized compression scheme from a suffix.
- property full_name(self) → str
Returns a more-complete name of this format. For example, "gzip", "bzip2", "xz", and "none".
- property is_compressed(self) → bool
Shorthand for fmt is not CompressionFormat.none.
- classmethod list(cls) → Set[CompressionFormat]
Returns the set of CompressionFormats. Works with static type analysis.
- classmethod list_non_empty(cls) → Set[CompressionFormat]
Returns the set of CompressionFormats, except for none. Works with static type analysis.
- classmethod of(cls, t: Union[str, CompressionFormat]) → CompressionFormat
Returns a CompressionFormat from a name (e.g. "gz" or "gzip"). Case-insensitive.
Example
CompressionFormat.of("gzip").suffix  # ".gz"
- classmethod split(cls, path: typeddfs.utils._utils.PathLike) → BaseCompression
- classmethod strip_suffix(cls, path: typeddfs.utils._utils.PathLike) → pathlib.Path
Returns a path with any recognized compression suffix (e.g. ".gz") stripped.
- property suffix(self) → str
Returns the single Pandas-recognized suffix for this format. This is just "" for CompressionFormat.none.
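The suffix-based behavior of from_path and strip_suffix can be illustrated with a small standalone sketch. It uses only pathlib and is not the typeddfs implementation; the COMPRESSION_SUFFIXES table and both function names are hypothetical stand-ins, and schemes are represented as plain strings rather than enum members.

```python
from pathlib import Path

# Hypothetical stand-in table: one suffix per compression scheme.
# This is an assumption for illustration, not typeddfs's own data.
COMPRESSION_SUFFIXES = {".gz": "gz", ".zip": "zip", ".bz2": "bz2", ".xz": "xz"}


def compression_from_path(path) -> str:
    """Return the scheme named by the final suffix, or "none"."""
    return COMPRESSION_SUFFIXES.get(Path(path).suffix.lower(), "none")


def strip_compression_suffix(path) -> Path:
    """Drop the final suffix only if it is a recognized compression suffix."""
    p = Path(path)
    if p.suffix.lower() in COMPRESSION_SUFFIXES:
        return p.with_suffix("")
    return p


# Mirrors the two examples in the class description.
assert strip_compression_suffix("my_file.csv.gz") == Path("my_file.csv")
assert compression_from_path("myfile.csv") == "none"
```

Only the last suffix is inspected, so an uncompressed path passes through unchanged.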
- class typeddfs.file_formats.FileFormat
A computer-readable format for reading and writing of DataFrames in typeddfs. This includes CSV, Parquet, ODS, etc. Some formats also include compressed variants. E.g. a ".csv.gz" will map to FileFormat.csv. This is used internally by typeddfs.abs_df.read_file() and typeddfs.abs_df.write_file(), but it may be useful to calling code directly.
Examples
FileFormat.from_path("my_file.csv.gz").is_text  # True
FileFormat.from_path("my_file.csv.gz").can_read  # always True
FileFormat.from_path("my_file.xlsx").can_read  # True if the required package is installed
- csv = []
- feather = []
- flexwf = []
- fwf = []
- hdf = []
- ini = []
- json = []
- lines = []
- ods = []
- parquet = []
- pickle = []
- properties = []
- toml = []
- tsv = []
- xls = []
- xlsb = []
- xlsx = []
- xml = []
- classmethod all_readable(cls) → Set[FileFormat]
Returns all formats that can be read on this system. Note that the result may depend on whether supporting packages are installed. Includes insecure and discouraged formats.
- classmethod all_writable(cls) → Set[FileFormat]
Returns all formats that can be written to on this system. Note that the result may depend on whether supporting packages are installed. Includes insecure and discouraged formats.
- property can_always_read(self) → bool
Returns whether this format can be read as long as typeddfs is installed. In other words, regardless of any optional packages.
- property can_always_write(self) → bool
Returns whether this format can be written to as long as typeddfs is installed. In other words, regardless of any optional packages.
- property can_read(self) → bool
Returns whether this format can be read. Note that the result may depend on whether supporting packages are installed.
- property can_write(self) → bool
Returns whether this format can be written. Note that the result may depend on whether supporting packages are installed.
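One way a dependency-sensitive check like can_read or can_write could work is to test whether a backing module is importable. The sketch below is a guess at the general idea, not typeddfs's actual logic; the REQUIRED_MODULE table, its package names, and the function names are assumptions made for illustration.

```python
import importlib.util
from typing import Optional


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical table: format name -> optional package assumed to back it.
# None means no optional package is needed.
REQUIRED_MODULE = {"csv": None, "xlsx": "openpyxl", "parquet": "pyarrow"}


def can_read(format_name: str) -> bool:
    """Return whether the (assumed) backing package for a format is present."""
    module: Optional[str] = REQUIRED_MODULE[format_name]
    return module is None or is_installed(module)


# CSV needs no optional package in this sketch, so this always holds.
assert can_read("csv")
```

The result for "xlsx" or "parquet" depends on the environment, which is the same caveat the property descriptions state.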
- compressed_variants(self, suffix: str) → Set[str]
Returns all allowed suffixes: the given suffix plus its compressed variants.
Example
FileFormat.json.compressed_variants(".json")  # {".json", ".json.gz", ".json.zip", …}
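The shape of the result in the example can be reproduced with a short set comprehension. This is a sketch under the assumption that every compression suffix may follow the base suffix; the COMPRESSION_SUFFIXES list and the standalone function are hypothetical, and formats that do not allow compression are not modeled.

```python
from typing import Set

# Assumed compression suffixes, matching the schemes listed for CompressionFormat.
COMPRESSION_SUFFIXES = [".gz", ".zip", ".bz2", ".xz"]


def compressed_variants(suffix: str) -> Set[str]:
    """Return the bare suffix plus that suffix followed by each compression suffix."""
    return {suffix} | {suffix + c for c in COMPRESSION_SUFFIXES}


variants = compressed_variants(".json")
assert ".json" in variants and ".json.gz" in variants
```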
- classmethod from_path(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → FileFormat
Guesses a FileFormat from a filename.
- Parameters
path – A string or pathlib.Path to a file.
format_map – A mapping from suffixes to formats; if None, uses suffix_map().
- Raises
typeddfs.df_errors.FilenameSuffixError – If not found
- classmethod from_path_or_none(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → Optional[FileFormat]
Same as from_path(), but returns None if not found.
- classmethod from_suffix(cls, suffix: str, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → FileFormat
Returns the FileFormat corresponding to a filename suffix.
- Parameters
suffix – E.g. ".csv.gz" or ".feather"
format_map – A mapping from suffixes to formats; if None, uses suffix_map().
- Raises
typeddfs.df_errors.FilenameSuffixError – If not found
- classmethod from_suffix_or_none(cls, suffix: str, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → Optional[FileFormat]
Same as from_suffix(), but returns None if not found.
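The relationship between a raising lookup, its "_or_none" counterpart, and an optional format_map can be sketched as follows. This is an illustration only: formats are plain strings, DEFAULT_SUFFIX_MAP is a hypothetical stand-in for suffix_map(), and LookupError stands in for typeddfs.df_errors.FilenameSuffixError.

```python
from typing import Mapping, Optional

# Hypothetical default mapping; the real one comes from suffix_map().
DEFAULT_SUFFIX_MAP = {".csv": "csv", ".csv.gz": "csv", ".feather": "feather"}


def from_suffix_or_none(
    suffix: str, format_map: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Look up a suffix, falling back to the default map when none is given."""
    mapping = DEFAULT_SUFFIX_MAP if format_map is None else format_map
    return mapping.get(suffix)


def from_suffix(suffix: str, format_map: Optional[Mapping[str, str]] = None) -> str:
    """Same lookup, but raise instead of returning None."""
    fmt = from_suffix_or_none(suffix, format_map)
    if fmt is None:
        # Stand-in for FilenameSuffixError.
        raise LookupError(f"Unrecognized suffix: {suffix}")
    return fmt


assert from_suffix(".csv.gz") == "csv"
assert from_suffix_or_none(".dat") is None
# A caller-supplied map replaces the default for that call.
assert from_suffix(".dat", format_map={".dat": "tsv"}) == "tsv"
```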
- property is_binary(self) → bool
Returns whether this format is binary rather than text-encoded. Note that this does not consider whether the file is compressed.
- property is_recommended(self) → bool
Returns whether the format is recommended for general use. Includes CSV, TSV, Parquet, etc. Excludes all insecure formats along with fixed-width, INI, properties, TOML, and HDF5.
- property is_secure(self) → bool
Returns whether the format does NOT have serious security issues. These issues only apply to reading files, not writing. Excel formats that support macros are not considered secure. This includes .xlsm, .xltm, and .xls. These can simply be replaced with .xlsx. Note that .xml is treated as secure: although some parsers are subject to entity expansion attacks, good ones are not.
- property is_text(self) → bool
Returns whether this format is text-encoded. Note that this does not consider whether the file is compressed.
- classmethod list(cls) → Set[FileFormat]
Returns the set of FileFormats. Works with static type analysis.
- matches(self, *, supported: bool, secure: bool, recommended: bool) → bool
Returns whether this format meets the given requirements.
- Parameters
secure – is_secure is True
recommended – is_recommended is True
- classmethod of(cls, t: Union[str, FileFormat]) → FileFormat
Returns a FileFormat from an exact name (e.g. "csv").
- classmethod split(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → BaseFormatCompression
Splits a path into the base path, format, and compression.
- Raises
FilenameSuffixError – If the suffix is not found
- Returns
A 3-tuple of (base path excluding suffixes, file format, compression format)
- classmethod split_or_none(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → BaseFormatCompression
Splits a path into the base path, format, and compression.
- Returns
A 3-tuple of (base path excluding suffixes, file format, compression format)
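Splitting a path into base, format, and compression can be pictured as peeling suffixes from the right. The sketch below is a simplified standalone illustration, not the typeddfs code: formats and compression schemes are plain strings, the two lookup tables and the SplitPath tuple are hypothetical, ValueError stands in for FilenameSuffixError, and returning a None format from split_or_none for an unknown suffix is an assumption.

```python
from pathlib import Path
from typing import NamedTuple, Optional

# Hypothetical lookup tables for illustration.
COMPRESSION = {".gz": "gz", ".zip": "zip", ".bz2": "bz2", ".xz": "xz"}
FORMATS = {".csv": "csv", ".tsv": "tsv", ".json": "json", ".feather": "feather"}


class SplitPath(NamedTuple):
    base: Path
    format: Optional[str]
    compression: str


def split_or_none(path) -> SplitPath:
    """Peel an optional compression suffix, then an optional format suffix."""
    p = Path(path)
    compression = "none"
    if p.suffix.lower() in COMPRESSION:
        compression = COMPRESSION[p.suffix.lower()]
        p = p.with_suffix("")
    fmt = FORMATS.get(p.suffix.lower())
    if fmt is not None:
        p = p.with_suffix("")
    return SplitPath(p, fmt, compression)


def split(path) -> SplitPath:
    """Same as split_or_none, but raise if no format suffix is recognized."""
    result = split_or_none(path)
    if result.format is None:
        # Stand-in for FilenameSuffixError.
        raise ValueError(f"Unrecognized suffix in {path}")
    return result


assert split("abc/xyz.csv.gz") == (Path("abc") / "xyz", "csv", "gz")
```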
- classmethod strip(cls, path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) → pathlib.Path
Strips a recognized, optionally compressed, suffix from path.
Example
FileFormat.strip("abc/xyz.csv.gz")  # Path("abc") / "xyz"
- classmethod suffix_map(cls) → MutableMapping[str, FileFormat]
Returns a mapping from all suffixes to their respective formats. See suffixes().
- property suffixes(self) → Set[str]
Returns the suffixes that are tied to this format. These will not overlap with the suffixes for any other format. For example, .txt is for FileFormat.lines, although it could be treated as tab- or space-separated.
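Because each suffix belongs to exactly one format, a suffix-to-format mapping can be built by inverting the per-format suffix sets. The following sketch shows that inversion with an explicit overlap check; the FORMAT_SUFFIXES table is hypothetical (only the .txt entry for "lines" comes from the text above), and formats are plain strings.

```python
from typing import Dict, Mapping, Set

# Hypothetical per-format suffix sets; only ".txt" -> "lines" is taken from the docs.
FORMAT_SUFFIXES = {
    "csv": {".csv"},
    "tsv": {".tsv", ".tab"},
    "lines": {".txt", ".lines"},
}


def build_suffix_map(format_suffixes: Mapping[str, Set[str]]) -> Dict[str, str]:
    """Invert format -> suffixes into suffix -> format, rejecting overlaps."""
    suffix_map: Dict[str, str] = {}
    for fmt, suffixes in format_suffixes.items():
        for suffix in suffixes:
            if suffix in suffix_map:
                raise ValueError(
                    f"{suffix} is claimed by both {suffix_map[suffix]} and {fmt}"
                )
            suffix_map[suffix] = fmt
    return suffix_map


suffix_map = build_suffix_map(FORMAT_SUFFIXES)
assert suffix_map[".txt"] == "lines"
```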
- property supports_encoding(self) → bool
Returns whether this format supports a text encoding of some sort. This may not correspond to an encoding= parameter, and the format may be binary. For example, XLS and XML support encodings.