[U-Boot] [RFC/PATCH] tools: New 'spl-stackusage' script

Sun Feb 1 00:47:52 CET 2015

This is a script, which tries to provide a pessimistic estimate
of the stack usage in the SPL binary.

A more detailed description about how it works and some example
pictures are available in an earlier e-mail:
    http://lists.denx.de/pipermail/u-boot/2015-January/201713.html

In fact, I have done nothing with it since that old status update
and the code is still very much just at the proof of concept
stage and can be only treated as a somewhat working prototype.
It only supports 32-bit ARM so far.

An example of trying it:

1. First compile u-boot for some ARM board. For example:

   make CROSS_COMPILE=arm-none-linux-gnueabi- Cubieboard_defconfig
   make CROSS_COMPILE=arm-none-linux-gnueabi- -j8

2. Run the script:

   tools/spl-stackusage/spl-stackusage

3. Look at the (slightly reformatted) output:

   Execution path with maximized stack usage (3152 bytes):
      entry > _main > board_init_f > board_init_r > spl_mmc_load_image >
      mmc_init > mmc_startup > mmc_switch.isra.1 > mmc_send_status >
      printf > puts > console_puts ? mmc_bread > mmc_set_blocklen >
      mmc_send_cmd ? sunxi_mmc_getcd > gpio_get_value >
      axp_gpio_get_value > axp209_read > i2c_read ? twsi_i2c_read >
      i2c_begin > twsi_wait > udelay > __udelay

   The '>' symbols indicate normal calls. The '?' symbols
   indicate a suspected indirect call. Now we can see that the
   'console_puts ? mmc_bread' indirect call does not make any sense.
   So we can run the script again, now indicating that a call
   from 'console_puts' to 'mmc_bread' is impossible:

4. Run the script again as:

   tools/spl-stackusage/spl-stackusage \
       remove_arrow[console_puts,mmc_bread]

5. Look at the new output:

   Execution path with maximized stack usage (3152 bytes):
      entry > _main > board_init_f > board_init_r > spl_mmc_load_image >
      mmc_init > mmc_startup > mmc_switch.isra.1 > mmc_send_status >
      printf > puts > console_puts ? sunxi_mmc_getcd > gpio_get_value >
      axp_gpio_get_value > axp209_read > i2c_read ? mmc_bread >
      mmc_set_blocklen > mmc_send_cmd ? twsi_i2c_read > i2c_begin >
      twsi_wait > udelay > __udelay

   Now it tries to guess that there is an indirect call from
   'console_puts' to 'sunxi_mmc_getcd'. This also does not make sense.
   We add it to the black list too. And repreat the whole process
   iteratively. Eventually we end up with something like

6. Run the script as:

   tools/spl-stackusage/spl-stackusage \
       remove_arrow[console_puts,mmc_bread] \
       remove_arrow[console_puts,sunxi_mmc_getcd] \
       remove_arrow[i2c_read,sunxi_mmc_set_ios] \
       remove_arrow[console_puts,twsi_i2c_read] \
       remove_arrow[i2c_read,mmc_bread] \
       remove_arrow[serial_puts,mmc_bread] \
       remove_arrow[serial_puts,sunxi_mmc_getcd] \
       remove_arrow[console_puts,twsi_i2c_write] \
       remove_arrow[console_puts,sunxi_mmc_send_cmd] \
       remove_arrow[console_puts,twsi_i2c_probe] \
       remove_arrow[serial_puts,twsi_i2c_read] \
       remove_arrow[console_puts,twsi_i2c_init] \
       remove_arrow[serial_puts,twsi_i2c_write] \
       remove_arrow[serial_puts,sunxi_mmc_send_cmd] \
       remove_arrow[serial_puts,twsi_i2c_probe] \
       remove_arrow[console_puts,sunxi_mmc_set_ios] \
       remove_arrow[serial_puts,twsi_i2c_init]

7. And look at the final report:

   Execution path with maximized stack usage (3064 bytes):
      entry > _main > board_init_f > board_init_r > spl_mmc_load_image >
      mmc_init > mmc_startup > mmc_set_clock > mmc_set_ios ? mmc_bread >
      mmc_set_blocklen > mmc_send_cmd ? sunxi_mmc_getcd > gpio_get_value >
      axp_gpio_get_value > axp209_read > i2c_read > i2c_get_adapter >
      printf > vsprintf > vsnprintf_internal.isra.6 > number.isra.4 >
      put_dec > __div64_32 > __aeabi_uidiv > __aeabi_idiv0

The process of blacklisting impossible caller->callee pairs made
the stack usage report less pessimistic (from 3152 to 3064 bytes)
and the execution path makes a little bit more sense. But that's
the whole point. Even the initial stack usage number is supposed
to be usable, because it is just overestimating the stack usage
and should never underestimate it (unless confused by tricks in
the assembly code).

Also there is a 'spl-stackusage.png' file with the whole callgraph
generated by the graphviz 'dot' tool. It may looks like this (a
different picture, not generated by following the steps above):
    http://people.freedesktop.org/~siamashka/files/20150116-spl-stackgraph/spl-stackgraph-v2015.01-cubieboard2-fel.png

This is surely not intended to be pushed as-is yet. And still needs
some cleanups. However I would like to get some feedback, just to get
a better sense of direction where it should be moving.

Also, formally, the merge window is still open. So, I guess, this tool
might have a very slim theoretical chance to get into v2015.04 :-)

---
 tools/spl-stackusage/spl-stackusage    |   1 +
 tools/spl-stackusage/spl-stackusage.rb | 471 +++++++++++++++++++++++++++++++++
 2 files changed, 472 insertions(+)
 create mode 120000 tools/spl-stackusage/spl-stackusage
 create mode 100755 tools/spl-stackusage/spl-stackusage.rb

diff --git a/tools/spl-stackusage/spl-stackusage b/tools/spl-stackusage/spl-stackusage
new file mode 120000
index 0000000..eb10b9e
--- /dev/null
+++ b/tools/spl-stackusage/spl-stackusage
@@ -0,0 +1 @@
+spl-stackusage.rb
\ No newline at end of file
diff --git a/tools/spl-stackusage/spl-stackusage.rb b/tools/spl-stackusage/spl-stackusage.rb
new file mode 100755
index 0000000..a135d6c
--- /dev/null
+++ b/tools/spl-stackusage/spl-stackusage.rb
@@ -0,0 +1,471 @@
+#!/usr/bin/env ruby
+#
+# Copyright © 2015 Siarhei Siamashka <siarhei.siamashka at gmail.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+
+###############################################################################
+# This is a u-boot SPL stack usage analysis script. Needs to be run           #
+# from the root of the u-boot source tree after a successful SPL build.       #
+#                                                                             #
+# Produces 'spl-stackusage.png' file as the output.                           #
+###############################################################################
+
+require 'find'
+
+###############################################################################
+# Process command line arguments
+#
+# add_node[node_name, stack_usage]
+# remove_node[node_name, stack_usage]
+# add_arrow[node_name1, node_name2]
+# remove_arrow[node_name1, node_name2]
+#
+###############################################################################
+
+$removed_arrows = {}
+
+ARGV.each {|a|
+  a.split(";").each {|a|
+    if a =~ /remove_arrow[\(\[]\s*(\S+)\s*\,\s*(\S+)\s*[\)\]]/
+      $removed_arrows[[$1, $2]] = 1
+    end
+  }
+}
+
+###############################################################################
+
+if !File.directory?("spl")
+  abort("Error: can't find the 'spl' directory.\n" +
+        "Please ensure that you are in the root of the u-boot source tree.\n")
+end
+
+def tool_exists(tool_name)
+    `which #{tool_name} > /dev/null 2>&1`
+    return $?.to_i == 0
+end
+
+if !tool_exists("dot")
+  abort("Please install the graphviz dot tool.\n")
+end
+
+###############################################################################
+# Get objdump log                                                             #
+###############################################################################
+
+# Get the information from the environment
+$toolchain = ENV["CROSS_COMPILE"]
+
+# Try to guess the ARM toolchain automatically
+if !$toolchain && `file spl/u-boot-spl` =~ / ARM, /
+  toolchain_list = [
+    "arm-none-linux-gnueabi-",
+    "armv7a-hardfloat-linux-gnueabi-",
+    "arm-linux-gnueabihf-"
+  ]
+  toolchain_list.each {|toolchain|
+    if tool_exists("#{toolchain}objdump")
+      $toolchain = toolchain
+      break
+    end
+  }
+end
+
+if !$toolchain
+  abort("Error: can't find a usable toolchain.\n" +
+        "Try to set the CROSS_COMPILE environment variable.\n")
+end
+
+$objdump_dis_log = `#{$toolchain}objdump -d -M reg-names-raw spl/u-boot-spl`
+$objdump_sym_log = `#{$toolchain}objdump -t spl/u-boot-spl`
+$size_log = `#{$toolchain}size spl/u-boot-spl`
+
+File.open("spl-stackusage.dis", "w").write($objdump_dis_log)
+File.open("spl-stackusage.sym", "w").write($objdump_sym_log)
+
+###############################################################################
+# Read the *.su files (generated by GCC with '-fstack-usage' option)          #
+###############################################################################
+
+$dynamic_stack_size = {}
+$self_stack_size = {}
+$total_stack_size  = {}
+
+Find.find("spl") { |filename|
+  next unless filename =~ /.*\.su$/
+  File.open(filename).each_line {|l|
+    if l =~ /\:([^\:\s]+)\s+(\d+)\s+(\S+)$/
+      case $3
+      when "dynamic"
+        $dynamic_stack_size[$1] = true
+      when "static"
+        $self_stack_size[$1] = [$2.to_i, ($self_stack_size[$1] || 0)].max
+      else
+        abort("Unsupported stack usage type #{$3} in #{filename}\n")
+      end
+    end
+  }
+}
+
+if $self_stack_size.size == 0
+  abort("Error: could not load anything from *.su files in 'spl' directory.\n" +
+        "Please ensure that you are using '-fstack-usage' GCC option.\n")
+end
+
+###############################################################################
+# Parse the objdump log                                                       #
+###############################################################################
+
+$detected_cycles = {}
+$labels = {}
+$arrows = {}
+$functions = {}
+$func_code = {}
+$datastructs = {}
+$refs = {}
+$xrefs = {}
+$max_stack_size = 0
+$critical_path = {}
+$shortest_cycle = nil
+$indirect_targets = {}
+$indirect_sources = {}
+$blue_arrows = {}
+
+# Stack usage, based on inspecting disassembly listing in objdump log
+def guess_func_stack_usage(func_name)
+  func_code = $func_code[func_name]
+  if !func_code
+    printf("No function code for '%s'\n", func_name)
+    return nil
+  end
+  stack_usage = 0
+  func_code.each_line { |l|
+    # 'push {reglist}' instruction
+    if l =~ /push\s+\{\s*(.*?)\s*\}/
+      reglist = $1.split(",")
+      abort("Register ranges are not supported") if $1 =~ /\-/
+      stack_usage += reglist.size * 4
+    end
+    # 'sub sp, sp, #imm' instruction
+    if l =~ /sub\s+r13\s*\,\s*(r13\s*)\,\s*\#(\d+)/
+      stack_usage += $2.to_i
+    end
+  }
+  return stack_usage
+end
+
+# Stack usage, based on parsing *.su files
+def get_func_stack_usage(func_name)
+  stack_usage = $self_stack_size[func_name]
+  stack_usage = guess_func_stack_usage(func_name) if !stack_usage
+  return stack_usage
+end
+
+$objdump_sym_log.each_line {|l|
+  # 000034c8 l     F .text	000000c8 twsi_i2c_read
+  next unless l =~ /^(\h+)\s+\S+\s+(\S+)\s+(\.\S+)\s+\h+.*?\s+(\S+)\s*$/
+  addr = $1
+  type = $2
+  sect = $3
+  name = $4
+  if type == "F" && sect == ".text"
+    $functions[name] = addr.to_i(16)
+  else
+    $datastructs[name] = addr.to_i(16)
+  end
+}
+
+$functions.each { |func, _|
+  if $dynamic_stack_size.has_key?(func)
+    abort("dynamic stack size in #{func}")
+  end
+}
+
+Find.find("spl") { |filename|
+  next unless filename =~ /.*\.o$/
+  skipflag = false
+  current_section = ""
+  `#{$toolchain}objdump -r #{filename}`.each_line {|l|
+    if l =~ /RELOCATION RECORDS FOR/
+      if l =~ /\[\.text\.(.*?)\]/
+        skipflag = !$functions.has_key?($1)
+      elsif l =~ /\[\.(data|rodata)\.(.*?)\]/
+        skipflag = !$datastructs.has_key?($2)
+      else
+        skipflag = false
+      end
+    end
+
+    next if skipflag
+
+    if l =~ /(\S+)\s+(\S+)\s+(\S+)/
+      reltype = $2
+      name    = $3
+      next if !$functions.has_key?(name)
+      next if reltype == "R_ARM_CALL" || reltype == "R_ARM_JUMP24"
+      next if reltype == "R_ARM_THM_CALL" || reltype == "R_ARM_THM_JUMP24"
+      $indirect_targets[name] = 1
+    end
+  }
+}
+
+def add_ref(current_func, func_name)
+  return if $removed_arrows.has_key?([current_func, func_name])
+
+  $arrows[[current_func, func_name]] = 1
+  $refs[current_func] = {} if !$refs.has_key?(current_func)
+  $refs[func_name] = {} if !$refs.has_key?(func_name)
+  $refs[current_func][func_name] = true
+
+  $xrefs[current_func] = {} if !$xrefs.has_key?(current_func)
+  $xrefs[func_name] = {} if !$xrefs.has_key?(func_name)
+  $xrefs[func_name][current_func] = true
+end
+
+# First pass, collect local aliases
+current_func = "entry"
+current_label = "entry"
+$label_to_func_mapping = {}
+$objdump_dis_log.each_line {|l|
+  current_label = $1 if l =~ /^\h+\s+\<(.*?)>\:$/
+  current_func = $1 if l =~ /^\h+\s+\<(.*?)>\:$/ && $functions.has_key?($1)
+  $label_to_func_mapping[current_label] = current_func
+}
+
+current_func = "entry"
+$objdump_dis_log.each_line {|l|
+  current_func = $1 if l =~ /^\h+\s+\<(.*?)>\:$/ && $functions.has_key?($1)
+  if l =~ /(.*?)\<(.*?)(\+0x\h+)?\>/
+    target_func = $label_to_func_mapping[$2]
+    add_ref(current_func, target_func) if target_func != current_func
+  end
+  if l =~ /\s+blx?\s+r/
+    $indirect_sources[current_func] = 1
+  end
+  if l =~ /\s+bx?\s+r/ && !(l =~ /\s+bx?\s+r14/)
+    $indirect_sources[current_func] = 1
+  end
+  $func_code[current_func] = "" unless $func_code.has_key?(current_func)
+  $func_code[current_func] << sprintf("%s\n", l.strip)
+}
+
+###############################################################################
+# Handle indirect calls                                                       #
+###############################################################################
+
+$functions["<indirect-call>"] = 1
+$self_stack_size["<indirect-call>"] = 0
+
+$multiplexed_indirect_sources = $indirect_sources.clone
+$multiplexed_indirect_targets = $indirect_targets.clone
+
+# A function can't indirectly call itself (that would be a recursion)
+$indirect_sources.each {|src, _|
+  $removed_arrows[[src, src]] = 1
+}
+# A function can't indirectly call its caller (that would be a recursion)
+$refs.each {|src, dst_list|
+  dst_list.each {|dst, _|
+    $removed_arrows[[dst, src]] = 1
+  }
+}
+
+def impossible_indirect_call(bad_src, bad_dst)
+  return unless $multiplexed_indirect_sources.has_key?(bad_src)
+  return unless $multiplexed_indirect_targets.has_key?(bad_dst)
+
+  if $multiplexed_indirect_sources.size < $multiplexed_indirect_targets.size
+    $multiplexed_indirect_sources.delete(bad_src)
+    $multiplexed_indirect_targets.each {|dst, _|
+      next if dst == bad_dst
+      newarrow = [bad_src, dst]
+      if !($removed_arrows.has_key?(newarrow))
+        add_ref(newarrow[0], newarrow[1])
+        $blue_arrows[newarrow] = 1
+      end
+    }
+  else
+    $multiplexed_indirect_targets.delete(bad_dst)
+    $multiplexed_indirect_sources.each {|src, _|
+      next if src == bad_src
+      newarrow = [src, bad_dst]
+      if !($removed_arrows.has_key?(newarrow))
+        add_ref(newarrow[0], newarrow[1])
+        $blue_arrows[newarrow] = 1
+      end
+    }
+  end
+end
+
+$removed_arrows.each {|k, _|
+  impossible_indirect_call(k[0], k[1])
+}
+
+$multiplexed_indirect_sources.each {|src, _|
+  newarrow = [src, "<indirect-call>"]
+  add_ref(newarrow[0], newarrow[1])
+  $blue_arrows[newarrow] = 1
+}
+
+$multiplexed_indirect_targets.each {|dst, _|
+  newarrow = ["<indirect-call>", dst]
+  add_ref(newarrow[0], newarrow[1])
+  $blue_arrows[newarrow] = 1
+}
+
+###############################################################################
+
+def recursive_walk(node, visited_nodes, depth, total_stack)
+  $total_stack_size[node] = [total_stack, ($total_stack_size[node] || 0)].max
+  if $total_stack_size[node] > $max_stack_size
+    $critical_path = visited_nodes.clone
+    $max_stack_size = $total_stack_size[node]
+  end
+  $refs[node].each {|next_node, _|
+    func_stack = get_func_stack_usage(next_node)
+    if visited_nodes.has_key?(next_node) then
+      # Find shortest cycle
+      tmp = visited_nodes.sort {|a, b| a[1] <=> b[1]}.map {|a| a[0]}
+      idx = tmp.find_index(next_node)
+      tmp = tmp.slice(idx, tmp.size - idx)
+      $shortest_cycle = tmp if !$shortest_cycle
+      $shortest_cycle = tmp if tmp.size < $shortest_cycle.size
+      $detected_cycles[[node, next_node]] = 1
+      next
+    end
+    visited_nodes[next_node] = depth if next_node != "<indirect-call>"
+    recursive_walk(next_node, visited_nodes, depth + 1,
+                   total_stack + func_stack)
+    visited_nodes.delete(next_node)
+  }
+end
+
+# Find the 'root' nodes and recursively traverse the graph starting from them
+$xrefs.each {|func_name, nodes|
+  next if nodes.size != 0
+  visited_nodes = {}
+  depth = 0
+  visited_nodes[func_name] = depth
+  recursive_walk(func_name, visited_nodes, depth + 1,
+                 get_func_stack_usage(func_name))
+}
+
+###############################################################################
+# Generate .dot file for graphviz                                             #
+###############################################################################
+
+$legend = ""
+$max_stack_size = $total_stack_size.map { |k, v| v }.max
+
+def format_path_arr(path_arr)
+  result = ""
+  (path_arr + [path_arr[0]]).each_cons(2) { |a|
+    normal_call   = $refs[a[0]].has_key?(a[1]) &&
+                    !$blue_arrows.has_key?([a[0], a[1]])
+    indirect_call = ($refs[a[0]].has_key?("<indirect-call>") &&
+                     $refs["<indirect-call>"].has_key?(a[1])) ||
+                     $blue_arrows.has_key?([a[0], a[1]])
+    if normal_call
+      result << sprintf("%s > ", a[0])
+    elsif indirect_call
+      result << sprintf("%s ? ", a[0])
+    else
+      result << sprintf("%s ! ", a[0])
+    end
+  }
+  return result + path_arr[0]
+end
+
+if $size_log =~ /(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\h+)\s+spl\/u\-boot\-spl/
+    text_plus_data_size = $1.to_i + $2.to_i
+    bss_size = $3.to_i
+
+    $legend << sprintf("------------ u-boot SPL ------------\\l")
+    $legend << sprintf("text+data   = %5d bytes\\l", text_plus_data_size)
+    $legend << sprintf("bss         = %5d bytes\\l", bss_size)
+    $legend << sprintf("stack usage = %5d bytes\\l", $max_stack_size)
+end
+
+if $detected_cycles.size != 0
+  $legend << sprintf("--------------------------------------\\l")
+  $legend << sprintf("Warning, recursion suspected:\\l\\\"%s\\\"\\l",
+                     format_path_arr($shortest_cycle))
+  $legend << sprintf("\\lWhere:")
+  $legend << sprintf("\\l  '>' - means normal function call")
+  $legend << sprintf("\\l  '?' - indirect call (via a pointer)\\l")
+  $legend << sprintf("\\lBlue arrows highlight possible indirect calls")
+  $legend << sprintf("\\lgoing through a fake multiplexer node. Red")
+  $legend << sprintf("\\larrows highlight possible cases of recursion")
+  $legend << sprintf("\\lon the callgraph. Recursion is bad because it")
+  $legend << sprintf("\\lmakes stack usage estimation unpredictable.\\l")
+  $legend << sprintf("\\lThe recursion warnings can be eliminated by:")
+  $legend << sprintf("\\l  1) Fixing the SPL code if it is really broken.")
+  $legend << sprintf("\\l  2) Incorrectly guessed indirect calls can")
+  $legend << sprintf("\\l     be suppressed by using one or more")
+  $legend << sprintf("\\l     'remove_arrow[caller,callee]' command")
+  $legend << sprintf("\\l     line arguments passed to the script.\\l")
+  $legend << sprintf("\\lThe nodes on the execution path with maximized")
+  $legend << sprintf("\\lstack usage are orange colored.\\l")
+end
+
+fh = File.open("spl-stackusage.dot", "w")
+
+$refs.each {|func_name, nodes|
+  $labels[func_name] = sprintf("%s\\n[total=%d, self=%d]",
+                               func_name,
+                               $total_stack_size[func_name],
+                               get_func_stack_usage(func_name))
+}
+
+fh.printf("digraph graphname {\n")
+fh.printf("rankdir=LR;\n")
+fh.printf("node [shape=box,fontname=helvetica];\n")
+
+$total_stack_size.each { |k, v|
+  extra = ""
+  shape = "box"
+  shape = "octagon" if $indirect_sources.has_key?(k)
+  shape = "doubleoctagon" if $indirect_targets.has_key?(k)
+  if $critical_path.has_key?(k)
+    extra = ",style=filled,fillcolor=orange,group=main"
+  end
+  if k == "<indirect-call>"
+    shape = "oval"
+    extra = ",style=filled,fillcolor=lightblue"
+  end
+  fh.printf("\"%s\" [shape=#{shape}#{extra}];\n", $labels[k])
+}
+
+$arrows.each {|k, v|
+  if $detected_cycles.has_key?(k) then
+    fh.printf("\"%s\" -> \"%s\" [color=\"red\",penwidth=3];\n",
+              $labels[k[0]], $labels[k[1]])
+  elsif $blue_arrows.has_key?(k) then
+    fh.printf("\"%s\" -> \"%s\" [color=blue];\n",
+              $labels[k[0]], $labels[k[1]])
+  else
+    fh.printf("\"%s\" -> \"%s\";\n", $labels[k[0]], $labels[k[1]])
+  end
+}
+fh.printf("legend [shape=box,style=filled,fontname=monospace,fontsize=24," +
+          "color=\"lightgray\",label=\"#{$legend}\"];\n")
+fh.printf("}\n")
+fh.close
+
+printf("Execution path with maximized stack usage (%d bytes):\n%s\n",
+       $max_stack_size,
+       format_path_arr($critical_path.invert.sort.map { |a| a[1] }))
+
+`dot -Tpng -o spl-stackusage.png spl-stackusage.dot`
-- 
2.0.5