All about DataSince, DataEngineering and ComputerScience
View the Project on GitHub datainsightat/DataScience_Examples
Aid the cleaning and formating of data.
Define Source > Navigate to Source Details > Name each change based on last step > Repeat until done
let
Source = Your Source Link details
# Source Info = Your Source Navigation Details
# Applied Step 1 Name = M Code Action to Source Info
# Applied Step 2 Name = M Code Action to Result of Applied Step 1,
# Applied Step 3 Name = M Code Action to Result of Applied Step 2,
# Applied Step 4 Name = M Code Action to Result of Applied Step 3
in
# Applied Step 4 Name
let
Source = Excel.Workbook(File.Contents(L1 - Grocery Store doc.xlsx"),null,true),
#"Produce Sheet" = Source{[Item="Produce ",Kind="Sheet"]}[Data],
#"Promoted Headers" = Tabe.PromoteHeaders(#"Produce Sheet","[Promot=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{...}),
#"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"PLU",...},
#"Removed Duplicates" = Table.Distinct(#"Removed Other Columns",{"PLU"})
in
#"Removed Duplicates"
Text.PadStart(text.From([Month]),2,"0")
= each if [Month] >= FYStart
then [Month] - (FYStart - 1)
else [Month] + (12 - FYStart * 1)
#date(2021,5,3)
#time(23,59,59)
#duration()
Format data to define new columns, or aggregate values of columns.
= Table.AddColumn(#"Changed Type", "Duration Hours",
each Duration.TotalHours(#duration([Duration],0,0,0)))
= Table.AddColumn(#"Added Custom", "Real Time Duration Hours",
each Duration.TotalHours(Date.From(DateTime.LocalNow())-[Date 1]))
= Table.AddColumn(#"Added Custom1", "Duration Check",
each if [Real Time Duration Hours] > 800 then "Long" else "Short")
#"Added Custom3" = Table.AddColumn(#"Added Custom2", "Category 2",
each Text.Trim([Category])),
#"Reordered Columns" = Table.ReorderColumns(#"Added Custom3",
{"Date 1", "Date 2", "Category", "Category 2", "Serial Number",
"Duration", "Duration Hours", "Real Time Duration Hours", "Duration
Check"}),
#"Removed Columns" = Table.RemoveColumns(#"Reordered Columns",
{"Category"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",
Category 2)
#"Added Custom4" = Table.AddColumn(#"Renamed Columns",
"Serial Number 2", each Text.PadStart(Text.From
([Serial Number]),7,"0")),
#"Reordered Columns1" = Table.ReorderColumns(#"Added Custom4",
{"Date 1", "Date 2", "Category", "Serial Number",
"Serial Number 2", "Duration", "Duration Hours",
"Real Time Duration Hours", "Duration Check"}),
#"Removed Columns1" = Table.RemoveColumns(#"Reordered Columns1",
{"Serial Number"}),
#"Renamed Columns1" = Table.RenameColumns(#"Removed Columns1",
Serial Number 2)
Basis is a list with sparse dates. We need to fill the missing dates with the values of the sparese list.
Insert step “Grouped Rows”
#"Grouped Rows" = Table.Group(#"Renamed Columns", {"TAG"}, All rows),
Insert Step below “Grouped Rows”. The table starts with the end of the week of the lowest date and creates dates in 7 day steps. Name this step “ListMissingDates”
ListMissingDates =
Table.RenameColumns(
Table.FromList(
List.Difference(
List.Dates(
Date.EndOfWeek(List.Min(#"Grouped Rows"[TAG])),
Duration.TotalDays(List.Max(#"Grouped Rows"[TAG])-List.Min(#"Grouped Rows"[TAG]))/7+53,
#duration(7,0,0,0)
),
#"Grouped Rows"[TAG]),
Splitter.SplitByNothing(),null, null, ExtraValues.Error),
Column1),
Merge both tables, sort by date and fill missing rows
Custom1 = Table.Combine({#"Grouped Rows", ListMissingDates}),
#"Sorted Rows" = Table.Sort(Custom1,TAG),
#"Filled Down" = Table.FillDown(#"Sorted Rows",{"All rows"}),
#"Expanded All rows" = Table.ExpandTableColumn(#"Filled Down", "All rows", {"Custom", "Column1", "Value"}, {"Custom", "Column1", "Value"}),