I'm new to pandas and have tried going through the docs and experimenting with various examples, but the problem I'm tackling has really stumped me.

I have the following two dataframes (DataA/DataB) which I would like to merge on a per global_index/item/values basis.

DataA

    row  item_id  valueA
    0    x        A1
    1    y        A2
    2    z        A3
    3    x        A4
    4    z        A5
    5    x        A6
    6    y        A7
    7    z        A8

DataB

    row  item_id  valueB
    0    x        B1
    1    y        B2
    2    x        B3
    3    y        B4
    4    z        B5
    5    x        B6
    6    y        B7
    7    z        B8

The list of items (item_ids) is finite, and each of the two dataframes represents the value of a trait (trait A, trait B) for an item at a given global_index value.

The global_index can roughly be thought of as a unit of "time".

The mapping between each data frame (DataA/DataB) and the global_index is done via the following two mapper DFs:

DataA_mapper

    global_index  start_row  num_rows
    0             0          3
    1             3          2
    3             5          3

DataB_mapper

    global_index  start_row  num_rows
    0             0          2
    2             2          3
    4             5          3

Simply put, for a given global_index the mapper defines the block of rows (num_rows rows beginning at start_row) in its respective DF (DataA or DataB) that are associated with that global_index.
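To make the mapping concrete, here is a minimal sketch of how a mapper could be expanded into a global_index column on its data frame. The helper name `attach_global_index` and the variable names are made up for illustration, and the sketch assumes the data frame has a default 0..n-1 row index that `start_row` refers to.

```python
import numpy as np
import pandas as pd

# DataA and its mapper, copied from the tables above
data_a = pd.DataFrame({
    "item_id": list("xyzxzxyz"),
    "valueA": ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"],
})
mapper_a = pd.DataFrame({
    "global_index": [0, 1, 3],
    "start_row": [0, 3, 5],
    "num_rows": [3, 2, 3],
})

def attach_global_index(data, mapper):
    """Hypothetical helper: return a copy of data with a global_index column."""
    # Row labels covered by each mapper entry: start_row .. start_row + num_rows - 1
    rows = np.concatenate([
        np.arange(start, start + count)
        for start, count in zip(mapper["start_row"], mapper["num_rows"])
    ])
    # Repeat each global_index once per row it covers
    gidx = np.repeat(mapper["global_index"].to_numpy(),
                     mapper["num_rows"].to_numpy())
    out = data.loc[rows].copy()
    out.insert(0, "global_index", gidx)
    return out.reset_index(drop=True)

a_tagged = attach_global_index(data_a, mapper_a)
print(a_tagged)
```

The same call with DataB and DataB_mapper would tag the second frame, after which both frames share global_index and item_id as keys.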

I would like to merge the DFs so that I get the following dataframe:

    row  global_index  item_id  valueA  valueB
    0    0             x        A1      B1
    1    0             y        A2      B2
    2    0             z        A3      NaN
    3    1             x        A4      B1
    4    1             z        A5      NaN
    5    2             x        A4      B3
    6    2             y        A2      B4
    7    2             z        A5      B5
    8    3             x        A6      B3
    9    3             y        A7      B4
    10   3             z        A8      B5
    11   4             x        A6      B6
    12   4             y        A7      B7
    13   4             z        A8      B8

In the final dataframe, for any global_index/item_id pair there will only ever be one of:

- a value for both valueA and valueB
- a value only for valueA
- a value only for valueB

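The requirement is that if only one value is present for a given global_index/item_id pair (e.g. valueA but no valueB), the last known value of the missing one for that item should be used. If there is no earlier value, it stays NaN, as for item z at global_index 0 and 1 in the table above.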
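One way this requirement might be approached is an outer merge on global_index/item_id followed by a forward fill within each item. The sketch below is only an illustration under assumptions: the global_index columns are written out with `np.repeat` directly from the num_rows values of the two mappers (which works here because the mapper blocks are contiguous and in row order), and the variable names `a`, `b` and `merged` are made up.

```python
import numpy as np
import pandas as pd

# DataA / DataB with global_index already attached; the repeat counts
# are the num_rows columns of DataA_mapper and DataB_mapper
a = pd.DataFrame({
    "global_index": np.repeat([0, 1, 3], [3, 2, 3]),
    "item_id": list("xyzxzxyz"),
    "valueA": [f"A{i}" for i in range(1, 9)],
})
b = pd.DataFrame({
    "global_index": np.repeat([0, 2, 4], [2, 3, 3]),
    "item_id": list("xyxyzxyz"),
    "valueB": [f"B{i}" for i in range(1, 9)],
})

# Outer merge keeps every global_index/item_id pair present in either frame
merged = (
    a.merge(b, on=["global_index", "item_id"], how="outer")
     .sort_values(["global_index", "item_id"])
     .reset_index(drop=True)
)

# Within each item, carry the last known value forward in global_index order;
# values with no earlier observation stay NaN
value_cols = ["valueA", "valueB"]
merged[value_cols] = merged.groupby("item_id")[value_cols].ffill()

print(merged)
```

Sorting before the fill matters, since `ffill` follows row order; grouping by item_id keeps one item's values from leaking into another's.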
