typeddfs.df_typing

Information about how DataFrame subclasses should be handled.

Module Contents

typeddfs.df_typing.FINAL_DF_TYPING
typeddfs.df_typing.FINAL_IO_TYPING
class typeddfs.df_typing.DfTyping

Contains all information about how to type a DataFrame subclass.

_auto_dtypes :Optional[Mapping[str, Type[Any]]]
_column_series_name :Union[bool, None, str]
_columns_to_drop :Optional[Set[str]]
_index_series_name :Union[bool, None, str]
_io_typing :IoTyping
_more_columns_allowed :bool = True
_more_index_names_allowed :bool = True
_order_dclass :bool = True
_post_processing :Optional[Callable[[T], Optional[T]]]
_required_columns :Optional[Sequence[str]]
_required_index_names :Optional[Sequence[str]]
_reserved_columns :Optional[Sequence[str]]
_reserved_index_names :Optional[Sequence[str]]
_value_dtype :Optional[Type[Any]]
_verifications :Optional[Sequence[Callable[[T], Union[None, bool, str]]]]
property auto_dtypes(self) Mapping[str, Type[Any]]

A mapping from column/index names to the expected dtype. These are used via pd.Series.astype for automatic conversion. An error will be raised if an astype call fails. Note that Pandas frequently just does not perform the conversion, rather than raising an error. The keys should be contained in known_names, but this is not strictly required.

property column_series_name(self) Union[bool, None, str]

Intelligently returns df.columns.name. Returns a value that will be forced into df.columns.name on calling convert. If None, will set df.columns.name = None. If False, will not set. (True is treated the same as None.)
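
The tri-state setting described above can be read as a small decision table. The following is a minimal sketch of that reading, not the library's implementation; resolve_series_name is a hypothetical helper introduced only for illustration.

```python
from typing import Optional, Tuple, Union


def resolve_series_name(setting: Union[bool, None, str]) -> Tuple[bool, Optional[str]]:
    """Hypothetical helper: return (should_set, name) for a series-name setting."""
    if setting is False:
        # False: leave df.columns.name untouched
        return (False, None)
    if setting is None or setting is True:
        # None (and True, which is treated the same): reset the name to None
        return (True, None)
    # Any string: force that name
    return (True, setting)


should_set, name = resolve_series_name("gene")
```

The same table would apply to index_series_name, subject to the extra conditions stated in its own description.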

property columns_to_drop(self) Set[str]

Returns the set of columns that are automatically dropped by convert. This does NOT include “level_0” and “index”, which are ALWAYS dropped.

copy(self, **kwargs) DfTyping
property index_series_name(self) Union[bool, None, str]

Intelligently returns df.index.name. Returns a value that will be forced into df.index.name on calling convert, only if the DataFrame is multi-index. If None, will set df.index.name = None if df.index.names != [None]. If False, will not set. (True is treated the same as None.)

property io(self) IoTyping
property is_strict(self) bool

Returns True if this allows unspecified index levels or columns.

property known_column_names(self) Sequence[str]

Returns all columns that are required or reserved. The sort order positions required columns first.

property known_index_names(self) Sequence[str]

Returns all index levels that are required or reserved. The sort order positions required index levels first.

property known_names(self) Sequence[str]

Returns all index and column names that are required or reserved. The sort order is: required index, reserved index, required columns, reserved columns.
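
As an illustration of the ordering stated above, here is a minimal sketch that concatenates the four name lists in that order. The function ordered_known_names and the example names are hypothetical and are not part of typeddfs.

```python
from typing import List, Sequence


def ordered_known_names(
    required_index: Sequence[str],
    reserved_index: Sequence[str],
    required_columns: Sequence[str],
    reserved_columns: Sequence[str],
) -> List[str]:
    """Hypothetical helper: required index, reserved index, required columns, reserved columns."""
    return [*required_index, *reserved_index, *required_columns, *reserved_columns]


names = ordered_known_names(
    required_index=["sample"],
    reserved_index=["batch"],
    required_columns=["value"],
    reserved_columns=["comment"],
)
```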

property more_columns_allowed(self) bool

Returns whether the DataFrame allows columns that are not reserved or required.

property more_indices_allowed(self) bool

Returns whether the DataFrame allows index levels that are neither reserved nor required.

property order_dataclass(self) bool

Whether the corresponding dataclass can be sorted (has __lt__).

property post_processing(self) Optional[Callable[[T], Optional[T]]]

A function to be called at the final stage of convert. It is called immediately before verifications are checked. The function takes a copy of the input BaseDf and returns a new copy.

Note

Although a copy is passed as input, the function should not modify it. Technically, doing so will cause problems only if the DataFrame’s internal values are modified. The value passed is a shallow copy (see pd.DataFrame.copy).

property required_columns(self) Sequence[str]

Returns the list of required column names.

property required_index_names(self) Sequence[str]

Returns the list of required index levels.

property required_names(self) Sequence[str]

Returns all index and column names that are required. The sort order is: required index, required columns.

property reserved_columns(self) Sequence[str]

Returns the list of reserved (optional) column names.

property reserved_index_names(self) Sequence[str]

Returns the list of reserved (optional) index levels.

property reserved_names(self) Sequence[str]

Returns all index and column names that are not required. The sort order is: reserved index, reserved columns.

property value_dtype(self) Optional[Type[Any]]

A type for “values” in a simple DataFrame. Typically numeric.

property verifications(self) Sequence[Callable[[T], Union[None, bool, str]]]

Additional requirements for the DataFrame to be conformant.

Returns

A sequence of conditions, each mapping the DataFrame to None or True if the condition passes, or to False or an error-message string if it fails.
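
The return-value convention above can be sketched as follows. This is only an illustration of how such results might be interpreted, not how typeddfs evaluates them; collect_failures and both example checks are hypothetical, and a plain list of dicts stands in for the DataFrame so the sketch has no dependencies.

```python
from typing import Any, Callable, List, Sequence, Union

Verification = Callable[[Any], Union[None, bool, str]]


def collect_failures(df: Any, verifications: Sequence[Verification]) -> List[str]:
    """Hypothetical helper: gather one message per failed verification."""
    failures: List[str] = []
    for check in verifications:
        result = check(df)
        if result is None or result is True:
            continue  # None and True both mean the condition passed
        if result is False:
            # False carries no message, so fall back to the function name
            failures.append(f"{getattr(check, '__name__', repr(check))} failed")
        else:
            failures.append(str(result))  # a string is the error message itself
    return failures


def non_empty(df: Any) -> bool:
    return len(df) > 0


def all_positive(df: Any) -> Union[None, str]:
    return None if all(row["x"] > 0 for row in df) else "x must be positive"


rows = [{"x": 1}, {"x": -2}]  # stand-in for a DataFrame
failures = collect_failures(rows, [non_empty, all_positive])
```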

class typeddfs.df_typing.IoTyping

_attrs_json_kwargs :Optional[Mapping[str, Any]]
_attrs_suffix :str = .attrs.json
_custom_readers :Optional[Mapping[str, Callable[[pathlib.Path], pandas.DataFrame]]]
_custom_writers :Optional[Mapping[str, Callable[[pandas.DataFrame, pathlib.Path], None]]]
_hash_alg :Optional[str] = sha256
_hdf_key :str = df
_read_kwargs :Optional[Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]]]
_remap_suffixes :Optional[Mapping[str, typeddfs.file_formats.FileFormat]]
_remapped_read_kwargs :Optional[Mapping[str, Any]]
_remapped_write_kwargs :Optional[Mapping[str, Any]]
_save_hash_dir :bool = False
_save_hash_file :bool = False
_secure :bool = False
_text_encoding :str = utf-8
_use_attrs :bool = False
_write_kwargs :Optional[Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]]]
property attrs_json_kwargs(self) Mapping[str, Any]

Keyword arguments for typeddfs.json_utils.JsonUtils.encoder. Used when writing attrs.

property attrs_suffix(self) str

Filename suffix detailing where to save/load per-DataFrame “attrs” (metadata). Will be appended to the DataFrame filename.

copy(self, **kwargs) IoTyping
property custom_readers(self) Mapping[str, Callable[[pathlib.Path], pandas.DataFrame]]

Mapping from filename suffixes (modulo compression) to custom reading methods.

property custom_writers(self) Mapping[str, Callable[[pandas.DataFrame, pathlib.Path], None]]

Mapping from filename suffixes (modulo compression) to custom writing methods.

property dir_hash(self) bool

Whether to save (append) to per-directory hash files by default. Specifically, in typeddfs.abs_df.AbsDf.write_file().

property file_hash(self) bool

Whether to save per-file hash files by default. Specifically, in typeddfs.abs_df.AbsDf.write_file().

property flexwf_sep(self) str

The delimiter used when reading “flex-width” format.

Caution

Only checks the read keyword arguments, not the write keyword arguments.

property hash_algorithm(self) Optional[str]

The hash algorithm used for checksums.

property hdf_key(self) str

The default key used in typeddfs.abs_df.AbsDf.to_hdf(). The key is also used in typeddfs.abs_df.AbsDf.read_hdf().

property is_text_encoding_utf(self) bool
property read_kwargs(self) Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]]

Passes kwargs into read functions from read_file. These are keyword arguments that are automatically added into specific read_ methods when called by read_file.

Note

This should rarely be needed.

property read_suffix_kwargs(self) Mapping[str, Mapping[str, Any]]

Per-suffix kwargs passed into read functions by read_file. Suffixes are matched modulo compression (e.g. .tsv is equivalent to .tsv.gz).
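
To make "modulo compression" concrete, here is a minimal sketch of a suffix lookup that ignores a trailing compression suffix. It is an illustration under stated assumptions rather than the library's own logic: the set COMPRESSION_SUFFIXES and the helpers suffix_modulo_compression and kwargs_for are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, Mapping, Union

# Assumed set of compression suffixes; the set typeddfs recognizes may differ.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz", ".zip"}


def suffix_modulo_compression(path: Union[str, Path]) -> str:
    """Hypothetical helper: return the format suffix, ignoring one trailing compression suffix."""
    suffixes = Path(path).suffixes
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]
    return suffixes[-1] if suffixes else ""


def kwargs_for(
    path: Union[str, Path],
    per_suffix: Mapping[str, Mapping[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: look up the per-suffix kwargs that apply to a path."""
    return dict(per_suffix.get(suffix_modulo_compression(path), {}))


per_suffix = {".tsv": {"sep": "\t"}}
plain = kwargs_for("data.tsv", per_suffix)
compressed = kwargs_for("data.tsv.gz", per_suffix)
```

Under these assumptions, both paths resolve to the same keyword arguments, which is the equivalence the description refers to.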

property recommended(self) bool

Whether to forbid discouraged formats like fixed-width and HDF5. Excludes all insecure formats.

property remap_suffixes(self) Mapping[str, typeddfs.file_formats.FileFormat]

Returns filename suffixes that have been re-mapped to file formats. These are used in read_file and write_file.

Note

This should rarely be needed. An exception might be .txt to tsv rather than lines; Excel uses this.

property secure(self) bool

Whether to forbid insecure operations and formats.

property text_encoding(self) str

Can be an exact encoding like utf-8, “platform”, “utf8(bom)” or “utf16(bom)”. See the docs in TypedDfs.typed().encoding for details.

property toml_aot(self) str

The name of the Array of Tables (AoT) used when reading TOML.

Caution

Only checks the read keyword arguments, not the write keyword arguments.

property use_attrs(self) bool

Whether to read and write pd.DataFrame.attrs when passing attrs=None.

property write_kwargs(self) Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]]

Passes kwargs into write functions from write_file. These are keyword arguments that are automatically added into specific to_ methods when called by write_file.

Note

This should rarely be needed.

property write_suffix_kwargs(self) Mapping[str, Mapping[str, Any]]

Per-suffix kwargs passed into write functions by write_file. Suffixes are matched modulo compression (e.g. .tsv is equivalent to .tsv.gz).