An Introduction to Data Flow Diagrams


Across multiple industries, systems exist to automate manual tasks for users. To that end, any process within a system will take data in, sort it in a useful way, and then return that data as output. Data flow diagrams are ideal for depicting these types of scenarios: they help viewers visualize and understand data stores, data flows, and business processes. A data flow diagram (commonly abbreviated to DFD) shows what information is needed within a process, where it is stored, and how it moves through a system to accomplish an objective. As its name implies, a data flow diagram depicts the flow of data within a system.

BABOK 2.0 has an entire section dedicated to data flow diagrams, noting that data flow diagrams “show how information is input, processed, stored, and output from a system.” [1] Wikipedia puts it this way: A data flow diagram “is a graphical presentation of the ‘flow’ of data through an information system.” [2] Although it displays the flow of data, a data flow diagram differs from a flowchart in that it excludes cause and effect, sequence, and the order in which a process runs.

Additionally, data flow diagrams are typically user-friendly and easy for designers and end users alike to interpret. As such, they are useful in both the discovery and development stages of a project. While data flow diagrams are common to many organizations, some analysts may know them by another name. As Yourdon notes in his article on structured analysis, certain organizations or business cultures refer to data flow diagrams as “Bubble chart, Bubble diagram, Process model (or business process model), Business flow model, Work flow diagram, or Function model.” [3]

Data flow diagrams are usually classified into numbered levels in order to display increasingly granular views of a business process, with level 0, also known as a context diagram, being the highest level. Level 1 is the next highest-level view of the information flow; level 2 is more granular than level 1; level 3 is more detailed than level 2, and so on (though few systems display more than three levels). This method is particularly helpful with complex business processes, since it enables a business analyst to illustrate an entire business process succinctly and to get as detailed as necessary. According to one site, “The technique exploits a method called top-down expansion to conduct the analysis in a targeted way.”

Therefore, if a user has many processes and levels of data (subprocesses) to include in a diagram, the multi-level approach should be used. In these cases, each diagram (subprocess) should have a unique identifier. The multi-level aspect of data flow diagrams eliminates any need for an analyst to depict hundreds of processes or pieces of data in a single diagram. Multiple processes within one level may be organized by using numerical identifiers such as 1.0, 1.1, and so on. It is also helpful to use broad categories rather than minute details to cover the business processes. For example, minor bits of deviating information, such as error messages, would likely not be useful to include in a data flow diagram.

The practice of keeping DFDs simple and separated into different sublevels is particularly helpful in the agile development process, where many of the processes may change frequently. Also useful to the agile process, iterations of data flow diagrams may be archived to show the history of a project’s development as new discoveries and business needs dictate changes to a system’s information sources, outputs, and flow.
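The numbering of processes across levels described above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard: the `Process` class, the `number_processes` function, and the order-handling process names are all hypothetical, and the sketch assumes top-level processes are numbered 1, 2, and so on, with subprocesses numbered 1.1, 1.2, and so on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    """One process bubble; children are its subprocesses on the next level down."""
    name: str
    children: List["Process"] = field(default_factory=list)


def number_processes(processes, prefix=""):
    """Return (identifier, name) pairs in depth-first order."""
    numbered = []
    for index, process in enumerate(processes, start=1):
        identifier = f"{prefix}{index}"
        numbered.append((identifier, process.name))
        # Subprocesses inherit the parent's identifier as their prefix.
        numbered.extend(number_processes(process.children, prefix=f"{identifier}."))
    return numbered


# Hypothetical decomposition of an order-handling system.
order_system = [
    Process("Receive order", [Process("Validate order"), Process("Record order")]),
    Process("Ship order"),
]

for identifier, name in number_processes(order_system):
    print(identifier, name)
```

Because each identifier carries its parent's identifier as a prefix, a reader can tell from the number alone which higher-level process a subprocess belongs to.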

For an analyst constructing a level 1 DFD, it may be helpful to start with a context diagram, which is also known as a level 0 data flow diagram. According to Wikipedia: “The context diagram shows the entire system as a single process, and gives no clues as to its internal organization.” This will enable the user to quickly isolate the main entities, inputs, and outputs. Here is an example of a context diagram, or level 0 data flow diagram:

What elements does a data flow diagram include?

A data flow diagram includes the data, processes, stores, and external entities of a system, and all of the data necessary for the system to function (both how it flows and where it is stored). To that end, it includes notations (or symbols) from one of two traditions: Yourdon & Coad or Gane & Sarson. For a visual comparison of the Yourdon & Coad and Gane & Sarson symbols for these, please see this link. (Note that while both are correct, Yourdon & Coad symbols are generally more commonly used in the business analyst profession.) These traditions are just two different styles of symbols, but both depict the same things: processes, datastores, dataflows, and external entities. The style of symbols an analyst follows is not as important as consistently following that style. The computer software program at an analyst’s disposal may dictate that one type of notation is easier to create and edit than another. A brief overview of each of a data flow diagram’s components follows.

Process. A process is also commonly referred to as “a bubble, a function, or a transformation.” [4] Its function is to “transform an incoming data flow into an outgoing data flow.” [5] In other words, a process is how information moves along in the system. A process should be named according to precisely what it does (e.g., Get orders), and it should have inputs and outputs (dataflows, described below). A process that takes information in but sends none out is known as an “infinite sink”; like a process with outputs but no inputs, it is logically inconsistent. [6]
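The rule that every process needs both inputs and outputs can be checked mechanically. The sketch below is one possible way to do so, assuming each dataflow is written as a (source, label, target) tuple; the function name `find_unbalanced_processes` and the process names other than “Get orders” are hypothetical.

```python
def find_unbalanced_processes(processes, flows):
    """Map each problem process to a short description of what is missing.

    flows is a list of (source, label, target) tuples.
    """
    has_input = {target for _source, _label, target in flows}
    has_output = {source for source, _label, _target in flows}
    problems = {}
    for name in processes:
        if name not in has_input and name not in has_output:
            problems[name] = "no inputs or outputs"
        elif name not in has_output:
            problems[name] = "inputs but no outputs"
        elif name not in has_input:
            problems[name] = "outputs but no inputs"
    return problems


# Illustrative data: "Archive orders" receives data but never passes any on.
processes = ["Get orders", "Archive orders", "Bill customer"]
flows = [
    ("Customer", "Order", "Get orders"),
    ("Get orders", "Order details", "Archive orders"),
    ("Get orders", "Order details", "Bill customer"),
    ("Bill customer", "Invoice", "Customer"),
]

print(find_unbalanced_processes(processes, flows))
```

A check like this only confirms that flows exist on both sides of a process; whether the outputs can actually be produced from the inputs still requires an analyst's judgment.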

Below is an example of Yourdon & Coad process symbols.

Dataflows. Dataflows show how information moves along within a system; they are “pipelines” through which bits of data flow. [7] Flows represent “data in motion, whereas the stores . . . represent data at rest.” [8] In both symbolic schools, dataflows are presented as arrows and labeled according to the data they represent (e.g., Record purchase). Examples are below.

Datastore. A datastore is a storage place for data that is used within the system. A datastore can be anything from a database to a customer (either of which can hold customer attributes that the system must extract). According to Yourdon, “Typically, the name chosen to identify the store is the plural of the name of the packets that are carried by flows into and out of the store” [9] (e.g., Purchases). Below is an example from the Yourdon & Coad school.

Yourdon & Coad datastore symbol


External entities. External entities are outside sources and targets of information that are relevant to the system. Examples of external entities could be customers, suppliers, or external databases. The Yourdon & Coad school uses a box to represent external entities.

When all of the data flow elements are combined into one diagram using Yourdon & Coad symbols, here is just one example of what the final piece might look like:

Example/Sample DFD - Data Flow Diagram
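The same building blocks can also be captured as a small data structure, which is sometimes handy for keeping a diagram's contents under version control. The following is a minimal sketch, not a standard representation: the `Kind`, `Element`, and `DataFlow` names are hypothetical, as are the “Order” and “Receipt” flow labels and the “Purchases” store.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The three kinds of node in a DFD; dataflows connect them."""
    PROCESS = "process"
    DATASTORE = "datastore"
    EXTERNAL_ENTITY = "external entity"


@dataclass(frozen=True)
class Element:
    name: str
    kind: Kind


@dataclass(frozen=True)
class DataFlow:
    """A labeled arrow carrying data from one element to another."""
    source: Element
    label: str
    target: Element


customer = Element("Customer", Kind.EXTERNAL_ENTITY)
get_orders = Element("Get orders", Kind.PROCESS)
purchases = Element("Purchases", Kind.DATASTORE)

flows = [
    DataFlow(customer, "Order", get_orders),
    DataFlow(get_orders, "Record purchase", purchases),
    DataFlow(get_orders, "Receipt", customer),
]


def describe(flow):
    """Render one dataflow as a line of text."""
    return (f"{flow.source.name} ({flow.source.kind.value}) "
            f"--{flow.label}--> "
            f"{flow.target.name} ({flow.target.kind.value})")


for flow in flows:
    print(describe(flow))
```

Note that the structure records only what data moves between which elements. It deliberately has no field for ordering or responsibility, which matches what a data flow diagram leaves out.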

What does a data flow diagram not include?

  • Procedures. A data flow diagram does not answer procedural questions that flowcharts usually cover. For example, a data flow diagram representing an order delivery system would not depict whether orders were taken in person or virtually, or whether they happened automatically or manually.

  • Sequences. A data flow diagram does not represent what happens first or second, or the order in which a process runs.

  • Users. To quote BABOK, a data flow diagram “cannot easily show who is responsible for performing the work.” The owner of a particular process is not relevant to a data flow diagram.

  • Alternative scenarios. A data flow diagram follows one main path of information and does not take into account a series of feasible spin-off scenarios, as a flowchart might.

As a matter of proper use, a data flow diagram also should not include dozens of entities, stores, and flows. According to Yourdon, “each DFD figure should have no more than half a dozen bubbles and related stores.” [10] If a diagram seems particularly complex (representing a complex system), it must be broken into levels, meaning an analyst first depicts the most basic aspects of the system in one simple data flow diagram. Once that diagram is logically depicted, succeeding levels are created, each representing more complex aspects of the system. For more details on successfully constructing levels of data flow diagrams, please see Yourdon’s article here.
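The size guideline quoted above lends itself to a simple automated check. The sketch below assumes each diagram is described by a list of its process names and counts only processes against the limit of six; whether stores should count as well is left open by the quotation, so that choice is an assumption. The function name `diagrams_to_split` and the example process names are hypothetical.

```python
MAX_BUBBLES = 6  # "half a dozen"; counting only processes is an assumption


def diagrams_to_split(diagrams, limit=MAX_BUBBLES):
    """Return the names of diagrams whose process count exceeds the limit."""
    return [name for name, processes in diagrams.items() if len(processes) > limit]


# Illustrative diagrams keyed by level name.
diagrams = {
    "Level 0": ["Order system"],
    "Level 1": ["Get orders", "Check stock", "Bill customer", "Pack order",
                "Ship order", "Handle returns", "Update catalog"],
}

print(diagrams_to_split(diagrams))
```

Any diagram the check reports is a candidate for being broken into a further level, with some of its processes grouped under a single higher-level bubble.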

While not applicable to all business scenarios, data flow diagrams are almost always ideal tools for analysts who wish to analyze or depict the extent of information needed for a system, where that information will be stored, and how it will move throughout the system. Used properly, they are a potent tool in an analyst’s arsenal.

Author: Morgan Masters is a Business Analyst and Staff Writer at, the premier community and resource portal for business analysts. Business analysis resources such as articles, blogs, templates, forums, books, along with a thriving business analyst community can be found at

[1] A Guide to the Business Analyst’s Body of Knowledge® (BABOK® Guide), Version 2.0, International Institute of Business Analysis, Toronto, Ontario, Canada, ©2005, 2006, 2008, 2009.


[3] “Data Flow Diagrams,” from the Structured Analysis Wiki.

[4] ibid

[5] Data Flow Diagram Notations

[6] ibid

[7] ibid

[8] ibid

[9] ibid

[10] ibid




JoannaKozlowska posted on Monday, January 23, 2012 5:54 AM
I wonder how many of you are using DFD diagrams in analysis. I've learnt about them while studying, but never really used them in analysis (we convey analysis fully based on UML). What is your experience, and in which situations do you find them useful?
Tony Markos posted on Monday, January 23, 2012 9:16 AM

Thanks for bringing up the very important point that DFDs are very appropriate for Agile. Agile, while focused on minimal documentation, is just as much about quality documentation. What is quality documentation? That which captures the essentials. And the essentials are the data inputs and outputs to/from a process. Only DFDs are focused on the essentials - the data flows. Indeed, if one looks at what a Use Case diagram really is, it is a poor man's DFD: a DFD without the essential data flows.

Granted, DFDs are not meant to capture the sequential processing logic of a system. However, it needs to be made clear that with DFDs (especially the Yourdon methodology), once the system is decomposed down to a low enough level, it is very easy to then switch to sequence-based flow diagrams to finish off the details.

The most important power of DFDs needs to be driven home. If one goes to the public library and looks up the word "analysis" in a dictionary, the first listed definition will probably say something like: "Analysis: Partitioning an entity (such as a system) into its component parts and then examining how the parts interrelate". It is that first word - partitioning - that is a killer. Manual and/or automated systems - yes, DFDs are just as appropriate for manual systems as automated systems - cannot be seen or touched, and therefore they are hard to partition. So very few BAs are aware that analysis - actual analysis, not analysis support - is largely all about partitioning. And ONLY DFDs, through their unique "interview the data" approach, lead to a logical, natural partitioning of a system. Without a logical, natural partitioning of the system, decomposition becomes very problematic. And effective decomposition is needed to handle complexity. Another reason why Use Cases are really a poor man's DFD: they offer no guidance on how to proceed in logically partitioning a system.

One final point: The BABOK is flat out wrong in stating that DFDs cannot easily show who is responsible for performing a task. It is simply a matter of changing the circles to squares, and then sectioning off a part of the square to contain the "who". I do this all the time.

nickpbroom posted on Friday, February 3, 2012 7:16 AM
Does anybody even still use these? I remember when I started my analysis career back in 2000 that even then these were on their way out as UML was starting to come more into play.

Surely the BPMN and UML modelling artefacts that can be used to represent both behavioural and structural aspects of a system supersede this by now? With BPMN 2.0 having introduced more data modelling artefacts, and the presence of UML artefacts like the class diagram, deployment diagram, component diagram etc., does the DFD really still have its uses?

Data flow is driven by activity; it doesn't just get there by accident, which is why BPMN modelled data as being associated with a process. Those data items can be modelled statically in a class diagram/entity relationship diagram or a component/deployment diagram for implementation purposes or it can have its behaviour further expanded by looking at state charts or interaction diagrams.

Tony Markos posted on Saturday, February 4, 2012 1:57 PM

If one looks up the definition of "analysis" in a dictionary, chances are that the first listed definition will say something like "Analysis: Partitioning an entity [such as a manual and/or automated system] into its component parts and then examining how the parts interrelate."

Analysis of a system is especially hard, as one cannot see or feel a system (only the mechanisms used to implement a system), and therefore partitioning it is especially hard. In other words, analysis of systems - actual analysis, not analysis support or solution implementation - is largely all about partitioning.

With data flow diagrams, using a unique "interview the data" approach, BAs are actually guided through a logical, natural partitioning of a system. Without a logical, natural partitioning, decomposition is problematic. And without effective decomposition, we cannot effectively handle larger-scale complexity.

With the UML and with BPMN, we pretend that formal partitioning does not exist. We use the default: sledgehammer partitioning (otherwise known as "Just think real hard" partitioning and "Well, it's an undefinable mystery that only comes with experience" partitioning).

A logical, natural partitioning is the most powerful, unique benefit of data flow diagrams. DFDs have other unique benefits, such as a litmus test of completeness and integration across the various requirements categories (business requirements, stakeholder requirements, and solution requirements), but I will not go into those here.

So instead of thinking "Does anyone use DFDs anymore?", I suggest one should look at what analysis of systems is really all about (if one does this, they will be way ahead of about 98% of other BAs) and ask questions like "Hmm, partitioning: how do the various techniques (DFD, UML, BPMN) help me to do such?"


bizguy72 posted on Wednesday, February 15, 2012 9:05 AM
Two things, or questions rather on DFD's:

1. This article states that a context diagram is the same as a level 0 diagram, whereas I have seen elsewhere that a context diagram and a level 0 diagram are two separate diagrams. Which is correct?

2. In this article it states "Here is an example of a context diagram, or level 0 data flow diagram:"

.....and it then shows a data store in the diagram. I was always of the belief that a context diagram never shows a datastore......but then again, this question has relevance back to my first point.


Engle posted on Sunday, February 19, 2012 4:25 PM
From BABOK 9.27.3 " A Context diagram is a top-level data flow diagram "

While the points you make are fair,
Are there not other tools/techniques that can be used at a high level of abstraction, then further decomposed/partitioned to lower levels?

Cheers !

bizguy72 posted on Monday, February 20, 2012 4:50 AM

>>From BABOK 9.27.3 " A Context diagram is a top-level data flow diagram "

Thanks for that however I was already aware that a context diagram is a top-level DFD.....what I'm wanting to know is whether a context diagram is the same thing as a level 0 DFD or whether they are two separate things.


Tony Markos posted on Monday, February 20, 2012 11:53 AM

Great question! Only Data Flow Diagrams, through their unique "interview the data" approach, actually guide the BA through a logical, natural partitioning per level of decomposition. Without such a partitioning, deeper levels of decomposition become next to impossible.

If there is one thing that I would emphasize, it is that the word partitioning (i.e., "dividing up the pie" per level of decomposition) zeros in on the real work involved in decomposition.

Also, remember that a BA is not only decomposing processes/functions; he/she needs to, in parallel, decompose the interfaces between them. One can decompose data flow interfaces (example: decompose a purchase order into its components). One cannot do a deep decomposition on an interface labeled "Depends On".


Level 0 is the same thing as a Context Diagram.




Copyright 2006-2024 by Modern Analyst Media LLC